dteviot.github.io

Github hosted website for my projects

How to convert a new site using the Default Parser

Sometimes WebToEpub cannot work out which content on a web page should be packed into the EPUB. When this happens, it asks you to identify the element on the web page that holds that content. On the Default Parser page, you do this by supplying the Cascading Style Sheets selector (CSS Selector) for the element containing the wanted content. (If you’re not familiar with CSS Selectors, they’re a shorthand notation for specifying elements on an HTML page. Please see https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors for an excellent description of CSS selectors.)
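If selectors are new to you, the following minimal Python sketch may help show what the simplest forms mean. It is only an illustration: the page, the element names and the helper functions `matches` and `select_all` are all made up for this example, it handles only the `tag`, `.class`, `tag.class` and `#id` forms, and it is not how WebToEpub or a browser evaluates selectors.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page used only for illustration.
PAGE = """
<html><body>
  <div id="sidebar" class="widget">Links</div>
  <div class="chapter text">Once upon a time...</div>
  <p class="chapter">Not a div</p>
</body></html>
"""

def matches(element, selector):
    """True if element matches a simple 'tag', '.class', 'tag.class' or '#id' selector."""
    if "#" in selector:
        # '#id' or 'tag#id': compare the id attribute (and the tag, if given).
        tag, _, wanted_id = selector.partition("#")
        return (not tag or element.tag == tag) and element.get("id") == wanted_id
    # 'tag', '.class' or 'tag.class': either part may be empty.
    tag, _, wanted_class = selector.partition(".")
    if tag and element.tag != tag:
        return False
    classes = element.get("class", "").split()
    return not wanted_class or wanted_class in classes

def select_all(root, selector):
    """Return every element under root (in document order) that matches."""
    return [el for el in root.iter() if matches(el, selector)]

root = ET.fromstring(PAGE)
print([el.text for el in select_all(root, ".chapter")])     # any element with class "chapter"
print([el.text for el in select_all(root, "div.chapter")])  # only div elements with that class
print([el.text for el in select_all(root, "#sidebar")])     # the element whose id is "sidebar"
```

Note how `.chapter` matches both the `div` and the `p`, while `div.chapter` narrows the match to the `div` alone. Picking a selector that is specific enough to match only the element you want is the main skill needed on the Default Parser page.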

As seen in this screenshot, the Default Parser has 5 text inputs and 3 buttons. In order, the inputs are:

  1. Hostname: This is the hostname portion of the web site URL. It's filled in automatically, so you can ignore it.
  2. URL of first chapter: The URL of the first chapter of the story. If you don't want to test the selector(s) you supply, you can leave this blank.
  3. CSS Selector for the content: This is where you tell WebToEpub which element holds the content.
  4. CSS Selector for Title of Chapter: Sometimes the title of each chapter isn't in the same element that holds the rest of the chapter's text, e.g. http://www.ironteethserial.com/dark-fantasy-story/story-interlude/prologue/. When this happens, you can use this input to tell WebToEpub which element holds the chapter's title, and WebToEpub will add that element to the front of the text content it fetches. This field can be left blank if this isn't an issue.
  5. CSS Selector for Elements to remove: Sometimes the content contains things that are not wanted in the EPUB, e.g. advertisements, share links, etc. Use this input to say which elements are to be removed from the content before it is packed into the EPUB. This field can be left blank if this isn't an issue.

The buttons are:

  1. Help: Brings up this web page.
  2. Test: Tests the CSS Selectors you have provided. Clicking this button gets WebToEpub to fetch the first chapter from the internet and run the CSS Selectors against it. The chapter as it would appear in the EPUB is shown in the box below the Test button.
  3. Finished: Tells WebToEpub you've finished configuring the CSS Selectors.

Worked Example

Let’s assume you want to convert “The Iron Teeth” into an EPUB. Looking at the above page, you can see that the first chapter of this story is at http://www.ironteethserial.com/dark-fantasy-story/story-interlude/prologue/

  1. The first step is to copy the URL of the first chapter into the control labelled "URL of first chapter".
  2. The next step is to discover the HTML element that contains the content to put into each chapter of the EPUB. To do this:
    1. Open a chapter of the story (e.g. http://www.ironteethserial.com/dark-fantasy-story/story-interlude/prologue/) in your web browser of choice.
    2. Open the browser's DOM Inspector. For example, on Firefox press Ctrl+Shift+C; on Chrome open "Developer Tools" and select the "Elements" tab. Alternatively, press the F12 key. On Chrome, this looks like:
    3. Find the HTML element that encloses the entire text you want in the EPUB. The simplest way to do this is:
      1. On the chapter page (NOT the DOM Inspector), move the mouse to the first word of the chapter's text.
      2. Click the right mouse button, then select "Inspect" from the drop-down menu that appears.
      3. The DOM Inspector will then highlight the element holding this text.
      4. You can then look at the parent elements until you find the first element that holds all text you are interested in.
      If you follow the above, you will find that the element holding all the chapter text is <div class="post_content">
    4. Figure out the CSS Selector for the element. In this case it's a div element with a class, so the CSS Selector is div.post_content
    5. Put this CSS Selector into the relevant input.
  3. You can now test the CSS Selector to see if it works. To do this:
    1. Click the "Test" button
    2. Examine the text that appears in the scroll box below the buttons.
    3. If the output is not what is wanted/expected, either fix the CSS Selector (if wrong) or use the CSS Selector for a different element.
  4. You should now see that the chapter text has appeared, but it's missing the chapter title. If you wish to add the title:
    1. Go back to the browser's DOM Inspector and find the CSS Selector for the element holding the chapter title (in this case it's h2.post-title).
    2. Copy the CSS Selector into the "CSS Selector for Title of Chapter" input.
    3. Run the test again
  5. If desired, you can use a similar process to find a CSS Selector for any unwanted elements in the content and put it into the "CSS Selector for Elements to remove" input.
  6. When satisfied with the test results, click the "Finished" button.
  7. You will now go to the usual "WebToEpub" page and you can continue as normal.
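
To make the effect of the three selectors concrete, here is a rough Python sketch of the steps above applied to a tiny stand-in page. It is an illustration under stated assumptions, not WebToEpub's actual code: the page markup is invented, the `div.share` selector and the helpers `matches`, `find_first` and `build_chapter` are hypothetical, and only simple `tag.class` selectors are handled. The selectors `div.post_content` and `h2.post-title` are the ones found in the walkthrough.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a chapter page. The class names
# post_content and post-title come from the walkthrough; "share" is invented.
PAGE = """
<html><body>
  <h2 class="post-title">Prologue</h2>
  <div class="post_content">
    <p>First paragraph.</p>
    <div class="share">Share this</div>
    <p>Second paragraph.</p>
  </div>
</body></html>
"""

def matches(element, selector):
    """Match a simple 'tag.class' selector (the tag or class part may be omitted)."""
    tag, _, wanted_class = selector.partition(".")
    if tag and element.tag != tag:
        return False
    return not wanted_class or wanted_class in element.get("class", "").split()

def find_first(root, selector):
    """Return the first matching element in document order, or None."""
    return next((el for el in root.iter() if matches(el, selector)), None)

def build_chapter(root, content_sel, title_sel="", remove_sel=""):
    """Apply the content, title and remove selectors, in the spirit of the Test button."""
    content = find_first(root, content_sel)
    if content is None:
        return None  # the content selector matched nothing and needs fixing
    if remove_sel:
        # ElementTree has no parent pointers, so visit each parent
        # and drop any children that match the remove selector.
        for parent in list(content.iter()):
            for child in list(parent):
                if matches(child, remove_sel):
                    parent.remove(child)
    if title_sel:
        title = find_first(root, title_sel)
        if title is not None:
            content.insert(0, title)  # put the title at the front of the content
    return content

root = ET.fromstring(PAGE)
chapter = build_chapter(root, "div.post_content", "h2.post-title", "div.share")
print([child.tag for child in chapter])  # ['h2', 'p', 'p']
```

With only the content selector supplied, the result would still contain the share block and would have no title, which mirrors what you see after the first test in step 3.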