Kindle Formatting: The Complete Guide to Formatting Books for the Amazon Kindle

by Joshua Tallent


  QuarkXPress

  QuarkXPress is another book layout program that is used by some publishers. The latest version of the software has an HTML export feature, but, as with InDesign, your mileage may vary. To use the export feature, open the QuarkXPress file and duplicate the layout through Layout, Duplicate. While duplicating, set the Medium Type to Web in the Duplicate Layout dialog. This web layout can then be exported as HTML from File, Export. The program will probably save each page of your book as a separate HTML file, so you will need to find a way to combine them. (I have a sample Perl script in the book downloads section of my website.) Also, be sure to look at the exported HTML files and ensure that you did not lose any major formatting in the process.
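
  As a rough illustration of the combining step, here is a minimal Python sketch. It is not the Perl script mentioned above. It assumes the exported pages carry their page numbers in their file names (page1.html, page2.html, and so on) and that each page keeps its content inside a body tag. The function names and the combined.html output name are hypothetical.

```python
import re
from pathlib import Path

# Matches everything between <body ...> and </body>, ignoring case.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def natural_key(name):
    """Sort key so that page2.html comes before page10.html."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def extract_body(page_html):
    """Return the contents of the body tag, or the whole text if there is none."""
    match = BODY_RE.search(page_html)
    return match.group(1).strip() if match else page_html.strip()


def combine_pages(pages, title="My Book"):
    """Join (filename, html) pairs into one HTML document, in page order."""
    ordered = sorted(pages, key=lambda page: natural_key(page[0]))
    bodies = [extract_body(page_html) for _, page_html in ordered]
    return ("<html>\n<head><title>" + title + "</title></head>\n<body>\n"
            + "\n".join(bodies) + "\n</body>\n</html>\n")


def combine_directory(folder, output_name="combined.html"):
    """Read every .html file in a folder and write one combined file there."""
    folder = Path(folder)
    pages = [(path.name, path.read_text(encoding="utf-8", errors="replace"))
             for path in folder.glob("*.html") if path.name != output_name]
    (folder / output_name).write_text(combine_pages(pages), encoding="utf-8")
```

  Calling combine_directory on the export folder would write a single combined.html next to the page files. Each page's head section is discarded, so any styles defined there would need to be carried over by hand.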

  PDF

  Adobe’s PDF format is the most common file type that authors and publishers have available to them, and there are a variety of ways to get a PDF file into usable HTML. The HTML resulting from these conversion processes will vary greatly. Some tools will give you sparse code that does not contain all of the original formatting. Others will preserve all of your formatting, but in code that is bloated and messy. I suggest you try all of these options and compare the outputs before deciding which one to go with.

  Adobe Online Conversion. Adobe has a free online conversion tool intended for the vision-impaired that will convert a PDF file into HTML. You can use this free tool by e-mailing the file to pdf2html@adobe.com. Adobe also offers a PDF-to-text conversion using the e-mail address pdf2txt@adobe.com. Both of these services will respond to your e-mail with an attachment containing the converted version of the file you sent (HTML or text, depending on the address you used).

  Third-party conversion tools. Other companies have produced PDF-to-HTML conversion software programs. A simple search on any Web search engine will provide you with a list of options. Some programs are as inexpensive as $50.00, and other programs may be free. While all of these programs will export HTML from your PDF, the quality of the HTML and how much work will be required to clean it up will vary based on the quality of the program. Most of the programs available will provide you with the option of downloading a trial version which you can use for a limited amount of time. These trial versions are usually fully functional and you should be able to convert your book without any problems. If you find that the conversion looks great and is usable, you should consider purchasing the program.

  Mobipocket Creator. Another option for getting a PDF file into HTML is to use Mobipocket Creator. Since the Kindle uses the Mobipocket eBook format, the HTML that results from a Mobipocket Creator conversion is supposed to be at least a little bit closer to what you want in the final file. This is sometimes the case, and I must say that Mobipocket does a great job of reducing the amount of extraneous code and formatting you see in most other conversion processes. The main drawback is that it occasionally loses or discards some of the formatting you may want it to keep. Also, it is currently only available for Windows computers, so Mac and Linux users will have to use a Windows emulator or dual boot to Windows. That being said, I heartily endorse Mobipocket Creator for the majority of PDF to HTML conversion jobs.

  Follow these steps to import your file to Mobipocket Creator and find the resulting HTML:

  Download Mobipocket Creator from:

  http://www.mobipocket.com/en/downloadSoft/ProductDetailsCreator.asp

  Install the software on your Windows computer.

  Open Creator. You will by default see the “Home” page.

  Drag and drop your PDF file onto the Creator window. Alternatively, you can click on “Import from Existing File, Adobe PDF”, then click the Browse button and find your PDF file that way.

  Click Import.

  Open your “My Publications” directory, which is usually placed in your “My Documents” directory by default. In the “My Publications” directory you will find another directory with the same name as the PDF file you uploaded (“My Publications/MyBook”). Open that directory.

  Inside you will see an HTML file, any images included in the book, the original PDF file, and two or three other files that are not relevant to the current discussion (XML, OPF, PRC). You can leave the HTML and image files there or move them somewhere else for formatting.
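
  If you would rather pick out the files you need with a script, the following Python sketch lists the HTML and image files in a publication directory and ignores everything else. The function name and the lists of file extensions are my own assumptions for illustration, not something Mobipocket Creator provides.

```python
from pathlib import Path

# Assumed extensions; adjust them to match the files you actually see.
HTML_EXTENSIONS = {".html", ".htm"}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".gif", ".png", ".bmp"}


def find_book_files(book_dir):
    """Return the names of the HTML and image files in a publication directory."""
    html_files, image_files = [], []
    for path in sorted(Path(book_dir).iterdir()):
        suffix = path.suffix.lower()
        if suffix in HTML_EXTENSIONS:
            html_files.append(path.name)
        elif suffix in IMAGE_EXTENSIONS:
            image_files.append(path.name)
    # The PDF, XML, OPF, and PRC files fall through and are left out.
    return {"html": html_files, "images": image_files}
```

  You would pass it the path of the directory named after your book, for example the “My Publications/MyBook” directory described above.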

  Convert to Word. I have found that converting a PDF into HTML is much easier when you first convert it into the Microsoft Word format. This is even true of PDF files converted using Adobe Acrobat. While Microsoft Word does not do a perfect job of converting its documents into HTML, the number of HTML tags that are created by Word is usually much lower than the number of tags created directly by Acrobat (see an example of this in Chapter 4). With that in mind there are also tools available online that will convert a PDF file into a Microsoft Word document. Again, the quality of the resulting file will vary based on the quality of the tool.
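
  One rough way to compare the output of two conversion routes is to count the opening tags in each HTML file. The Python sketch below uses the standard library's html.parser module. The class and function names are hypothetical, and a raw tag count is only a crude indicator of how bloated the code is.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Tally every opening tag seen in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        # Tag names arrive already lowercased by the parser.
        self.tags[tag] += 1


def count_tags(html_text):
    """Return a Counter mapping tag names to how often each one is opened."""
    counter = TagCounter()
    counter.feed(html_text)
    counter.close()
    return counter.tags


lean = "<p>Hello <i>world</i></p>"
bloated = "<p><span><font><span>Hello</span> <i>world</i></font></span></p>"
print(sum(count_tags(lean).values()), sum(count_tags(bloated).values()))
```

  Running count_tags on the full text of each converted file and comparing the totals, or calling most_common() on the results, gives a quick sense of which conversion will need more cleanup.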

  Adobe Acrobat Professional. If you have a copy of Adobe Acrobat, or if you have downloaded the trial version, I suggest that you export your PDF as a Word document first and follow the instructions below to convert that Word document into HTML. That is the process that I use for most of the PDF books that come to me in my Kindle formatting business. To export the file, simply go to the File menu in Acrobat, select the Export option, and choose the Word document option from the dropdown menu. You can also use the Save As feature with the same results.

  If your book is very large or has a large number of images, you may find that Acrobat will stop responding during the conversion. The best way to remedy this problem is to split your file into smaller pieces and convert each piece separately.

  Go to the Document menu.

  Select Extract Pages.

  In the dialog box, choose the pages you want to extract.

  Those pages will be pulled into a new PDF file that you will need to save and convert as explained above.

  Word and RTF

  Whether you have written your book in Microsoft Word or have converted it from another format, the process of creating HTML from the Word document is fairly simple. This conversion process only works natively on Word 2002/XP and later, but there is a plug-in available for Word 2000. Go to the File menu (in Word 2007, click the Office Button) and select the Save As option. In the Save As dialog box, click on the dropdown next to “Save as Type.” From that list, choose “Web Page, Filtered.” You will be given a couple of warning messages, but they are usually not a problem.

  If your file is large or has lots of images, Word may lock up during the save and become unresponsive. If this happens, you should split the file into pieces and save each one as HTML individually. The easiest way to do this is to cut and paste sections of the book into new Word documents.

  If you have an RTF or WordPerfect document you can open it in Word and follow these same steps to get the file into HTML. WordPerfect also has a “Publish To” feature that allows you to save the file as HTML, and you may find that that feature works best for your book.

  Of course, you may want to do some cleanup on the Word file before you save it as HTML. If that is the case, follow the instructions in Chapter 2 first.

  Note that you can also import the Word document into Mobipocket Creator and let it convert the file into HTML. The resulting code will be much less bloated than what you get in the Save as HTML function, but it may also be missing some formatting you needed. If you use this process, be sure to look carefully through your file to ensure your formatting is still in place.

  Follow these steps to import your file and find the resulting HTML:

  Download Mobipocket Creator from:

  http://www.mobipocket.com/en/downloadSoft/ProductDetailsCreator.asp

  Install the software on your Windows computer.

  Open Creator. You will by default see the “Home” page.

  Drag and drop your Word file onto the Creator Window. Alternatively, you can click on “Import from Existing File, MS Word document”, then click the Browse button and find your Word file that way.

  Click Import.

  Open your “My Publications” directory, which can usually be found in your “My Documents” directory by default. In the “My Publications” directory you will find another directory with the same name as the Word file you uploaded (“My Publications/MyBook”). Open that directory.

  Inside you will see an HTML file, any images included in the book, the original Word file, and two or three other files that are not relevant to the current discussion (XML, OPF, PRC). You can leave the HTML and image files there or move them somewhere else for formatting.

  Text Documents

  If your file is in a text-only format (i.e., with no formatting), it is not too difficult to prepare it for publication on the Kindle. You can add the HTML mark-up yourself (see Chapter 5) or paste the text into a Word document and follow the formatting procedures listed in Chapter 2.
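
  As a minimal sketch of the do-it-yourself route, the Python below wraps each paragraph of a plain-text file in paragraph tags. It assumes that paragraphs are separated by blank lines, which may not be true of your file, and the function name is hypothetical. It produces only bare paragraphs; headings, italics, and other mark-up still have to be added afterward.

```python
import html
import re


def text_to_html(text, title="Untitled"):
    """Wrap blank-line-separated paragraphs of plain text in <p> tags."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    lines = []
    for paragraph in paragraphs:
        # Collapse line breaks inside a paragraph into single spaces.
        joined = " ".join(paragraph.split())
        if joined:
            # Escape &, <, and > so they display as text instead of mark-up.
            lines.append("<p>" + html.escape(joined, quote=False) + "</p>")
    return ("<html>\n<head><title>" + html.escape(title, quote=False)
            + "</title></head>\n<body>\n"
            + "\n".join(lines) + "\n</body>\n</html>\n")
```

  Reading the text file into a string, passing it to text_to_html, and writing the result to a .html file gives you a starting point for the formatting work.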

  HTML and XML

  If your document is already in HTML or XML, especially if the code is relatively clean, you are already a long way toward the goal of getting your book into the Kindle format. You can move on to Chapter 3 and start your process there.

  No Digital File

  There are times when an author or publisher only has a physical copy of the book they want to publish on the Kindle. This is most common with out-of-print books, but it can also happen when the rights to the book revert to the author and the publisher, for whatever reason, does not have a copy of the book in PDF or another digital format. The easiest way to get the book back into a digital format is to scan it and run it through an Optical Character Recognition (OCR) software program.

  There are a variety of options available to the do-it-yourself person or to the pay-someone-else person. The main benefit to doing the process yourself is saving money, but you may find that having some help in the process is easier and faster.

  The first step in the OCR process is to have your book scanned. This is a process where each page of your book is turned into an image that can be loaded into the OCR program. There are a variety of places that will do scanning for you, or you can tackle the process yourself. Some copy and print stores (like FedEx/Kinko’s) offer scanning services, but you will often find the best prices at companies that specialize in scanning documents onto microfiche. Some of these companies even have machines that can automate the scanning process by automatically turning the pages of the book.

  Be aware that the easiest way to scan a book on regular consumer scanners is to cut off the binding, which will effectively ruin the book. If your book is rare and you want to keep it intact, you should make sure the scanning company knows to handle it gently and to not cut off the binding. There is one consumer scanner called the OpticBook 3600 that is specifically designed for book scanning. That device is built in a way that allows a good scan of the pages without cutting the binding off or breaking the binding by forcing the book into unnatural positions on a flat surface.

  If you decide to scan the book yourself, you will need a flatbed or feed scanner. These devices are available at most electronics and computer stores and at various retailers online. They can be inexpensive or very expensive, depending on the options included and the quality of the scanner, and you may find that the available options are overwhelming. In general, any low-end scanner will do the job, but you may want to ensure that it comes with a built-in OCR program (more on that in a moment). Flatbed scanners will require you to position each page, while feed scanners make the process a bit faster by pulling the pages in one at a time like a copier. Realize, though, that if you are only going to scan one book you will spend almost as much money on the scanner as you will sending the book to a professional scanning company.

  The next step in the OCR process is running the page images through an OCR software program. If you are not interested in handling the OCR process yourself, there are many companies out there that can do the OCR work for you. In addition to searching for these companies online, you should ask the company that scans your book if they can suggest someone to do the OCR process. They may even offer those services in-house.

  If you are scanning the book yourself, the software installed with your scanner may include an option to OCR the text of the scanned pages and save it in Microsoft Word or another format. Many times the software used by these scanners is a “lite” version of ABBYY FineReader, which is, in my opinion, the best OCR software on the market. The scanned text will undoubtedly have some errors, but you may find that scanning at a higher DPI or adding more contrast to the images improves the OCR results significantly. Just remember to name your Word files in a consistent order so that you can easily combine them and edit them later.
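
  One common way to keep file names in a consistent order is to pad the page or section number with leading zeros, so that an alphabetical listing matches the order of the book. The short Python sketch below shows the idea. The function name, the “mybook” prefix, and the four-digit width are assumptions for illustration.

```python
def scan_file_name(page_number, prefix="mybook", width=4, extension=".doc"):
    """Build a zero-padded file name such as mybook_0007.doc."""
    return f"{prefix}_{page_number:0{width}d}{extension}"


names = [scan_file_name(n) for n in (1, 2, 10, 100)]
print(names)
# Padded names sort alphabetically in page order.
print(sorted(names) == names)
```

  Without the padding, a name like page10.doc sorts ahead of page2.doc in most file listings, which makes it easy to combine the pieces in the wrong order.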

  If you are converting a large number of books using an OCR process, you should consider investing in an OCR software program. I have used a variety of OCR programs over the years, and I cannot suggest any program except ABBYY FineReader for large-scale processes. ABBYY has a built-in document viewer, which allows you to easily make changes to the OCR output and fix the errors that ABBYY is not sure about. It also exports the output to a variety of formats, including HTML and Word.

  Chapter 2

  Formatting your book in Microsoft Word

  Most authors are not familiar with HTML code and are not in a position to learn it just for the purpose of preparing their book for the Kindle. The fact is, you can easily format a simple Kindle book in Microsoft Word without the need to work with the HTML very much unless you really want to. The key to this formatting process is mastering the use of Word’s built-in Styles and understanding how certain formatting will look on the Kindle itself.

  The instructions and comments below are based on the assumption that you have your book text in a Word document. If you have converted it into HTML, you can skip to Chapter 3 to learn how to work with the HTML code. However, if you like, you can also open your HTML file in Word and save it as a .doc file, then follow the instructions below.

  Word’s Styles and Formatting Options

  Microsoft Word has a Styles feature that allows you to easily format a document in a very consistent way. When you apply styles to the headings, paragraphs, and other items in your book, you can then make changes to those items all in one place. The changes made will automatically be applied to every item formatted in that style, cutting down drastically on the amount of work needed to make sweeping changes to your book.

  Another benefit to using Styles is that the foundational code ends up being much cleaner. If you decide to do some manual HTML cleanup before publishing, the styles you used in Word will be easy to change in the HTML file. Also, if you decide to upload the Word doc itself to Amazon’s Digital Text Platform (DTP), the chances of seeing major formatting issues after the Kindle conversion decrease dramatically.

  Understanding Styles

  When you first create a new Word document it is assigned a small set of default styles. However, if you are working with a book that was saved from another format or that was styled in Word without using the built-in Styles feature, the list of styles can be fairly long. Essentially, every paragraph that has a unique format, every heading with a slight variation, and every phrase that has its own special formatting will have its own style listed. Your goal in this process is to pare down your list to the most essential styles so that you have fewer variables to deal with.

  To get started, you will need to open the Styles and Formatting sidebar in Word. In older editions of Word, select the Format menu at the top of the window and choose “Styles and Formatting...” from the dropdown list (Figure 2.1). You will now see the sidebar on the right side of the Word window, complete with a list of the styles that are being used or are available for the document (Figure 2.2).

  In Word 2007, the interface is a little bit different. The Styles are shown in the Home tab (Figure 2.3), and you can click on the dropdown arrow to see the full list of available styles (Figure 2.4). To open the Styles sidebar, click on the pop-out arrow under the “Change Styles” button (Figure 2.5). The sidebar is a floating, always-on-top window that you can position anywhere on your screen (Figure 2.6).

  You should take a few minutes to familiarize yourself with the sidebar. Notice that if you click the dropdown arrow next to a style or right-click on the style name you will be given some options, including one to select all of the places in the document that use that style. This can be useful as you consolidate styles and make the formatting more consistent. You will also see an option to modify the style. When you select that option a dialog box will pop up with all of the modification options you have available. You can change the font size and style, the paragraph formatting (if the style you are modifying can be applied to paragraphs), and even the name of the style.

  [Figures 2.1 to 2.6]

  You will also see a dropdown menu at the bottom of the sidebar that lets you switch between the available formatting and the formatting that is actually in use. This is a useful feature that will help you find styles that you have not yet addressed.

  Removing All Styles

  One of the options in this Styles and Formatting sidebar is called “Clear Formatting.” If your book was saved from PDF, InDesign, or Quark into Word you might want to remove all of the formatting and start with a clean slate. The easy way to do that is to select the text in the entire book (Ctrl + A) and click on “Clear Formatting” in the sidebar. The problem with this method is that you will not just lose your paragraph formatting; you will also lose any bold, italics, and underlines in the document.

 
