
Kindle Formatting: The Complete Guide to Formatting Books for the Amazon Kindle


by Joshua Tallent



 

CHAPTER I



 

A SHIFTING REEF

 



 


  style='font-size:13.5pt;color:black'>The year 1866 was signalised by a

  remarkable incident, a mysterious and puzzling phenomenon, which doubtless no

  one has yet forgotten. Not to mention rumours which agitated the maritime

  population and excited the public mind, even in the interior of continents,

  seafaring men were particularly excited. Merchants, common sailors, captains of

  vessels, skippers, both of Europe and America, naval officers of all countries,

  and the Governments of several States on the two continents, were deeply

  interested in the matter.



 


  style='font-size:13.5pt;color:black'>For some time past vessels had been met by

  “an enormous thing,” a long object, spindle-shaped, occasionally

  phosphorescent, and infinitely larger and more rapid in its movements than a

  whale.



  Notice that the headings and paragraphs all have margins and other styles in the style attribute, and that further styles, such as font size and color, are added to the span tags. Here is what that same text would look like when it is cleaned up:

 

PART ONE



 

CHAPTER I



 

A SHIFTING REEF



 

The year 1866 was signalised by a remarkable incident, a mysterious and puzzling phenomenon, which doubtless no one has yet forgotten. Not to mention rumours which agitated the maritime population and excited the public mind, even in the interior of continents, seafaring men were particularly excited. Merchants, common sailors, captains of vessels, skippers, both of Europe and America, naval officers of all countries, and the Governments of several States on the two continents, were deeply interested in the matter.



 

For some time past vessels had been met by “an enormous thing,” a long object, spindle-shaped, occasionally phosphorescent, and infinitely larger and more rapid in its movements than a whale.



  As you can tell, this code is much cleaner and easier to understand. The formatting has been trimmed down, and all the extraneous styles and tags have been removed.

  PDF HTML

  Adobe PDF files produce HTML that is even more bloated and messy than Word's. I took the same Word document we used above, created a PDF from it using Adobe Acrobat, and exported it as HTML from Acrobat. Here is what it gave me:

 



 
  >PART ONE
  >
  >
  >



 



 
  >CHAPTER I
  >



 



 
  >A SHIFTING REEF
  >
  >
  >



 



 
  >The year 1866 wa
  >
  >s signalised by a remarkable incident, a mysterious and puzzling phenomenon, which doubtless no one has yet forgotten. Not to mention rumours which agitated the maritime population and excited the public mind, even in the interior of continents, seafaring men were particularly excited. Merchants, common sailors, captains of vessels, skippers, both of Europe and America, naval officers of all countries, and the Governments of several States on the two continents, were deeply interested in the matter.
  >
  >
  >



 



 
  >For some time past vessels had been met by “an enormous thing,†a long object, spindle-shaped, occasionally phosphorescent, and infinitely larger and more rapid in its movements than a whale.
  >
  >
  >



  There are a lot of differences between this output and the Word output above. First, there is a lot more code added to the file. There are more tags, more attributes, and even some added ids. Second, the line breaks are added inside the tags themselves and at odd places. Third, the curly quotes (“ and ”), which were fine in the Word document, came over as garbled text from the PDF (â€œ and â€).

  These differences can cause many problems as you try to clean up the code and make it more usable. This is why I suggest that you convert the PDF to Word before converting to HTML. When you go that route, the code may look something like this:

 


  style='font-size:20.0pt;color:black'>PART ONE

  style='font-size:24.0pt;color:black'>



 

CHAPTER

  I



 

A SHIFTING REEF



 

The year 1866 was signalised by a remarkable incident, a

  mysterious and puzzling phenomenon, which doubtless no one has yet forgotten.

  Not to mention rumours which agitated the maritime population and excited the

  public mind, even in the interior of continents, seafaring men were

  particularly excited. Merchants, common sailors, captains of vessels, skippers,

  both of Europe and America, naval officers of all countries, and the

  Governments of several States on the two continents, were deeply interested in

  the matter.



 


  style='font-size:13.5pt'>For some time past vessels had been met by “an

  enormous thing,” a long object, spindle-shaped, occasionally phosphorescent,

  and infinitely larger and more rapid in its movements than a whale.
  style='font-size:11.5pt'>



  This HTML is not exactly like the code we got from Word directly, but it is certainly cleaner than the HTML we got from the PDF.

  Mobipocket HTML

  Mobipocket Creator does a better job of creating clean HTML, but it also has some issues of which you should be aware.

  PART ONE


  CHAPTER I


  A SHIFTING REEF


 

The year 1866 was signalised by a remarkable incident, a mysterious and

  puzzling phenomenon, which doubtless no one has yet forgotten. Not to mention

  rumours which agitated the maritime population and excited the public mind, even in

  the interior of
continents, seafaring men were particularly excited. Merchants,

  common sailors, captains of vessels, skippers, both of Europe and America, naval

  officers of all countries, and the Governments of several States on the two continents,

  were deeply interested in the matter.



 

For some time past vessels had been met by “an enormous thing,” a long object,

  spindle-shaped, occasionally phosphorescent, and infinitely larger and more rapid in

  its movements than a whale.



  Notice that the bloat is all gone, but the heading is not in a heading tag and there are some other issues that will make formatting a bit harder to do. Overall, though, the code could be much easier to work with.

  Joining Paragraph Lines

  One thing you may have noticed in the above examples, and which you will see in your own file after you convert it into HTML, is that there are line breaks added throughout the file. These line breaks are not a problem for HTML, since it will only start a new paragraph when you have a <p> tag; however, they do make editing the file more difficult, especially if you are using regular expressions and making a lot of changes to your file.

  The easiest way to remove these line breaks is to create a Perl script that will do the work for you, and run it on your file. Here is a simple script that will work well for that purpose:

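  #!/usr/bin/perl
  # Joins the lines inside each block-level element of an HTML file.
  my $book;
  my $in = "MyBook.html";
  my $out = "MyBook.linebreaksremoved.html";
  {
  # Read the whole input file into $book in one go.
  open(IN, "<", $in) or die "Cannot open $in: $!";
  local $/;
  $book = <IN>;
  close IN;
  }
  # Work only on the body element. Inside it, replace the newlines within
  # each p, h1-h6, td, li, dt, and dd element with spaces, squeeze runs of
  # whitespace into a single space, and then collapse blank lines.
  $book =~ s{(<body.*?</body>)}{
  my $body = $1;
  $body =~ s{<(p|h[1-6]|td|li|dt|dd)\b.*?</\1>}{
  my $all = $&;
  $all =~ s/\n/ /g;
  $all =~ s/\s\s+/ /g;
  $all;
  }gesi;
  while ($body =~ s{\n\n}{\n}g) {}
  "$body";
  }esi;
  # Write the result to a new file so the original stays untouched.
  open(OUT, ">", $out) or die "Cannot write $out: $!";
  print OUT $book;
  close OUT;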

  This script is also available in the Book Tools section on my website.

  Removing Extraneous Styles and Tags

  The next step in cleaning up your document is to remove the unneeded styles and tags that were inserted by Word or Acrobat. Which styles and tags you remove will be completely up to you, but I highly suggest that you strip the HTML down to its most basic tags. Doing so will remedy most display problems and make the book consistent throughout.

  As you are stripping out extra tags and styles, you will want to replace them with tags and formatting that work well in the Kindle. For instance, if all of your chapter headings look like this:

 

Chapter 1



  you will want to turn them into actual heading tags in the HTML file, like this:

 

Chapter 1



  If you only turn that into a regular paragraph (<p>Chapter 1</p>), you will have to go back later and change it into a heading during the formatting stage. In other words, you need to think out your book layout a little bit before starting your cleanup, and you need to know what you want to do with the elements of your book before you completely remove a style. To that end, I highly suggest that you read Chapter 5 before starting on your cleanup so you will know what tags and styles will work.

  The majority of tags and styles present in your file will actually be helpful in your efforts to convert the file to clean, Kindle-ready HTML. You can use unneeded styles like margins to help you give headings the right spacing, or to find places where your file has a blank line between paragraphs to show a scene change. The difficulty is that there are most likely also margins in your file that are really not needed. Discerning what to use and what to remove will require some investigation.

  When I am cleaning up a file I usually start with the easy pickings, like the regular paragraphs. In most books the paragraphs just need to be formatted with a plain <p> tag, but you will probably see something more like one of these examples in your HTML:

 



 



 



 



  Notice the variety in formatting. All of that is due to the settings the authors used when they were formatting their books in Word. In most books, changing these to plain <p> tags will make the book code much more manageable. Be careful to ensure that the tags you replace are actually the regular paragraphs, not a specially styled paragraph, a poem, or something else. You will want to handle those individually.
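  As a rough sketch of this replacement step, the Python snippet below rewrites Word-styled paragraph tags as a bare <p>. It assumes the regular paragraphs carry Word's MsoNormal class; the sample markup, the `WORD_PARAGRAPH` pattern, and the `normalize_paragraphs` name are illustrative assumptions, so adjust the pattern to whatever your own file actually contains.

```python
import re

# Assumption: regular paragraphs use the class "MsoNormal" (quoted or not).
# The lookahead keeps similarly named classes (e.g. MsoNormalIndent) untouched.
WORD_PARAGRAPH = re.compile(
    r"<p\s+class=['\"]?MsoNormal['\"]?(?=[\s>])[^>]*>",
    re.IGNORECASE,
)


def normalize_paragraphs(html: str) -> str:
    """Replace Word-styled regular paragraph tags with a bare <p>."""
    return WORD_PARAGRAPH.sub("<p>", html)


# Hypothetical input: two regular paragraphs and one specially styled one.
sample = (
    "<p class=MsoNormal style='margin-bottom:0in;text-indent:.25in'>First.</p>\n"
    '<p class="MsoNormal" style="text-align:justify">Second.</p>\n'
    "<p class=Poem style='margin-left:1in'>A poem line.</p>"
)

print(normalize_paragraphs(sample))
```

  The specially styled paragraph is deliberately left alone, so it can be handled individually later.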

  Next, it is usually best to attack the chapter headings, and any subheadings your book may have. Just as with paragraphs, you may find a variety of styles applied to headings. The main difference is that they will probably not be as consistent as the paragraphs to replace. You may find that searching in the HTML file for “Chapter” is the easiest way to find them all. You may also notice a pattern in the font size formatting for the various headings, such as all top-level (chapter) headings being formatted in “font-size:20.0pt;” and all the second-level subheadings being formatted in “font-size:16.0pt;”. The key, as in all of the cleanup process, is to look for patterns and put them to good use.
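  One way to look for such a pattern is to count how often each font size occurs in the file. The following Python sketch does that for inline styles; the sample markup and the `font_size_counts` name are made up for illustration. Sizes that appear only a handful of times are good candidates for headings, while the most frequent size is usually the body text.

```python
import re
from collections import Counter


def font_size_counts(html: str) -> Counter:
    """Count how often each point-based font-size value appears."""
    return Counter(re.findall(r"font-size:\s*([0-9.]+pt)", html))


# Hypothetical Word-style markup with three different sizes.
sample = (
    "<p><span style='font-size:20.0pt;color:black'>Chapter 1</span></p>"
    "<p><span style='font-size:16.0pt'>A Subheading</span></p>"
    "<p><span style='font-size:13.5pt'>Body text.</span></p>"
    "<p><span style='font-size:13.5pt'>More body text.</span></p>"
    "<p><span style='font-size:20.0pt'>Chapter 2</span></p>"
)

for size, count in font_size_counts(sample).most_common():
    print(size, count)
```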

  In that vein, let’s work out a RegEx that might come in handy with your headings. Say you have a chapter heading like this one:

 

Chapter 1



  but when you look at Chapter 2 you see that it is slightly different, with a top margin of .50in. To catch both of these in one fell swoop, you will want to create a RegEx that ignores the top margin and bases its search on something else that you know is standardized, like the font-size. Here is an example of what that could look like:

  Find: <p[^>]*><span[^>]+font-size:20.0pt[^>]*>(Chapter [^<]+)</span></p>



  Replace: <h1>\1</h1>



  Of course, there are other RegExes you can use in a situation like this, but that should give you the general idea.
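  For readers who prefer a script to an editor's find-and-replace dialog, the same idea can be sketched in Python. This is only an illustration under stated assumptions: the headings are taken to be a paragraph wrapping a single span, the sample margins are invented, and the `<h1>` level and the `convert_headings` name are choices made for the example.

```python
import re

# Ignore the paragraph's own attributes (such as the top margin) and key
# the match on the span's font size, which is assumed to be standardized.
HEADING = re.compile(
    r"<p[^>]*><span[^>]+font-size:20\.0pt[^>]*>(Chapter [^<]+)</span></p>"
)


def convert_headings(html: str) -> str:
    """Turn 20pt 'Chapter ...' paragraphs into <h1> headings."""
    return HEADING.sub(r"<h1>\1</h1>", html)


# Hypothetical input: two headings that differ only in their top margin.
sample = (
    "<p class=MsoNormal style='margin-top:.25in'>"
    "<span style='font-size:20.0pt'>Chapter 1</span></p>\n"
    "<p class=MsoNormal style='margin-top:.50in'>"
    "<span style='font-size:20.0pt'>Chapter 2</span></p>"
)

print(convert_headings(sample))
```

  Because the pattern requires the text to begin with "Chapter", other 20pt text in the file is left untouched.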

  The next step I usually take is to get rid of all the span tags, since they are the worst bloat-creators in program-generated HTML. You will want to search for the opening and closing span tags and remove them.
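  A minimal Python sketch of stripping the span tags follows. The `SPAN_TAG` pattern and the `strip_spans` name are assumptions made for this example. It removes every opening and closing span tag but keeps the text between them, so check first that none of the spans in your file carries formatting you still need.

```python
import re

# Matches any opening span tag (with or without attributes) and any
# closing span tag. The \b keeps other tag names from matching.
SPAN_TAG = re.compile(r"</?span\b[^>]*>", re.IGNORECASE)


def strip_spans(html: str) -> str:
    """Remove all span tags while keeping their contents."""
    return SPAN_TAG.sub("", html)


# Hypothetical Word-style paragraph.
sample = "<p><span style='font-size:13.5pt;color:black'>The year 1866</span></p>"

print(strip_spans(sample))
```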
  When you have finished those three pieces of your process, you have probably handled the majority of the basic cleanup your file needs. Now it is time to learn about the formatting that the Kindle supports and how to make your book look great on the device.

  Chapter 5

  Formatting Your Book

  While the Kindle format is essentially HTML, the device only supports a small portion of the tags and styles that are supported in most Web browsers and other HTML viewers. That actually works out well for you as an author or publisher, since it removes some complexity from the formatting process.

  In this chapter I will cover the HTML tags and styles that work in the Kindle. I have also included a list of supported tags and styles in Appendix A, and there is a printable copy of the same information in the Book Tools section of my website.

  Font Formatting

  To start out, let’s take a look at some of the basic text formatting tools you have at your disposal.

  Bold and Italics

  To make text bold in your book, you will need to apply the <b> tag, and to italicize text in your book, you will need to apply the <i> tag. For example:

  I entered, and found Captain Nemo deep in algebraical calculations of x and other quantities.

  You can also apply bold to any tag in your style sheet using the font-weight: bold; property, and italics using the font-style: italic; property.

  The <em> tag and <strong> tag are often thought of as replacements for the <i> and <b> tags. These tags are intended for use in specific situations when the text being marked up requires emphasis or strong emphasis. Like most browsers, the Kindle will format <em> as italics and <strong> as bold.

  Underline

  To underline text in the Kindle, use the <u> tag.

  Henry, O. The Four Million. New York: McClure, Phillips & Co., 1906.

  You can also apply an underline style to any tag in your style sheet using the text-decoration: underline; property.

  Big and Small

  There are times when making some text bigger or smaller than the default size is necessary. While the Kindle does allow a small amount of tweaking with the CSS font-size property, the easiest and most consistent way to adjust font sizes in your text is by using the <big> and <small> tags. These tags can also be nested to enhance the effect.

  Three examples of the use of <big> and <small> come to mind. The first is using the <big> tag to create a drop cap of sorts. Since the Kindle does not allow floating elements, the large letter will not actually “drop,” but the overall effect is similar. For example:

 

There were two or three things...



  The second example is using the <small> tag on a copyright page. I do this by default in most of my books because it more closely matches most hardcopies.

 

The Four Million, copyright © 1906 by O. Henry.



  The third example is using the <small> tag to create the impression of small caps. The default font of the Kindle does not, unfortunately, allow the use of small caps, but to give the same effect just put <small> tags around the small caps text, like this:

  WILLIAM SYDNEY PORTER
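  If a book contains many such names, the markup can be generated rather than typed by hand. The Python sketch below shows one reading of the technique: each word is uppercased, the first letter stays at full size, and the remaining letters are wrapped in <small> tags. The `small_caps` helper is hypothetical and is not taken from any Kindle tool.

```python
def small_caps(text: str) -> str:
    """Fake small caps: full-size initial, remaining letters in <small>."""
    words = []
    for word in text.split():
        upper = word.upper()
        if len(upper) > 1:
            # Keep the first letter full size and shrink the rest.
            words.append(f"{upper[0]}<small>{upper[1:]}</small>")
        else:
            # Single-letter words stay at full size.
            words.append(upper)
    return " ".join(words)


print(small_caps("William Sydney Porter"))
```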

  Superscript and Subscript

 
