dteviot

GitHub-hosted website for my projects


Structure of WebToEpub code

Overview of Files

Files

| File | Description |
| --- | --- |
| popup.html | HTML that provides the UI for WebToEpub |
| js/main.js | Main logic behind popup.html's UI |
| js/ChapterUrlsUi.js | Logic for the "List of Chapters" on the UI |
| js/CoverImageUI.js | Logic for the "Select Cover Image" list for Baka-Tsuki on the UI |
| js/DefaultParserUI.js | Logic for the "Default Parser" on the UI |
| js/Download.js | Wraps Chrome's Download API. (Handles saving EPUB to hard drive.) |
| js/EpubItem.js | An item to pack into an EPUB file (e.g. an XHTML or image file) |
| js/EpubItemSupplier.js | Converts "files" from the internet into EpubItems to pack into an EPUB |
| js/EpubMetaInfo.js | Container holding metadata for an EPUB |
| js/EpubPacker.js | Assembles the EPUB using metadata and items from the supplier |
| js/ErrorLog.js | Records errors/warnings and displays them to the user and/or saves them to a file |
| js/Firefox.js | Code that is only needed by the Firefox version of WebToEpub |
| js/HttpClient.js | Wraps making HTTP calls to the internet: retry, decode response, etc. |
| js/ImageCollector.js | Fetches images from the internet, removes duplicates, rewrites image tags for EPUB, etc. |
| js/Imgur.js | Logic for fetching/processing images and galleries from Imgur |
| js/Parser.js | Base class for reading a site's HTML and converting it into EpubItems |
| js/ParserFactory.js | Logic to figure out which parser to use for a web page |
| js/ProgressBar.js | Code to manipulate the Progress Bar on the UI |
| js/Sanitize.js | Code to clean up HTML when converting it to XHTML |
| js/UserPreferences.js | UI logic for the user to set Options |
| js/Util.js | Library of miscellaneous functions |

Algorithms

Basic steps to create an EPUB

  1. Figure out parser to use for web page(s)
  2. Get URLs of web pages that need to be fetched from internet
  3. For each web page
    1. Fetch from internet
    2. Find content to put in EPUB
    3. Find and fetch any images needed on page
  4. Convert web pages into items for EPUB
    1. Find content to put in EPUB
    2. Remove junk (e.g. scripts) from content
    3. Fix up hyperlinks (e.g. footnotes) and remove next/previous chapter links (where possible)
    4. Rewrite image tags for EPUB
    5. Convert from HTML to XHTML
  5. Assemble the EPUB
    1. Generate Manifest
    2. Generate Table of Contents
    3. Pack items
  6. Save EPUB to hard drive

Note: because hyperlinks (e.g. footnotes) may cross chapters and need to be fixed up, web pages can't be converted until all pages have been collected.
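
The ordering constraint in the note can be illustrated with a minimal two-phase sketch: collect every page first, then convert, because rewriting a cross-chapter link needs to know which EPUB file each chapter URL will become. All names here (`collectPages`, `convertPages`, `buildEpubItems`, `fetchPage`, the `chapter-NNNN.xhtml` naming) are hypothetical and are not taken from WebToEpub's code. Pages are modelled as plain objects with a `links` array instead of a DOM to keep the example self-contained.

```javascript
// Hypothetical sketch: none of these names come from WebToEpub itself.

// Phase 1: fetch every page before converting anything.
async function collectPages(urls, fetchPage) {
    const pages = [];
    for (const url of urls) {
        // fetchPage is an injected function resolving to { text, links }.
        const page = await fetchPage(url);
        pages.push({ url, ...page });
    }
    return pages;
}

// Phase 2: convert pages, now that every chapter URL is known.
function convertPages(pages) {
    // Map each source URL to the (assumed) file name it gets in the EPUB.
    const fileForUrl = new Map(
        pages.map((page, i) => [page.url, `chapter-${String(i).padStart(4, "0")}.xhtml`])
    );
    return pages.map(page => ({
        fileName: fileForUrl.get(page.url),
        text: page.text,
        // Links to other collected chapters point at EPUB files;
        // anything else keeps its original web address.
        links: page.links.map(href => fileForUrl.get(href) ?? href),
    }));
}

async function buildEpubItems(urls, fetchPage) {
    const pages = await collectPages(urls, fetchPage); // all pages first
    return convertPages(pages);                        // then convert
}
```

If conversion ran inside the fetch loop instead, a link from chapter 1 to chapter 2 could not be rewritten, since chapter 2's file name would not exist yet.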

Choosing parser for web page

ToDo: cover the special case of an EPUB built from multiple sites with different formats, each requiring a different parser.
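
As a rough illustration only, one way to choose a parser is a registry keyed on hostname that falls back to a default parser. This is a sketch under that assumption; `ParserRegistry`, `register`, and `parserFor` are hypothetical names, and the actual logic in js/ParserFactory.js may work differently.

```javascript
// Hypothetical registry; the real ParserFactory.js may differ.
class ParserRegistry {
    constructor(makeDefaultParser) {
        this.makers = new Map();                  // hostname -> function returning a parser
        this.makeDefaultParser = makeDefaultParser;
    }

    // Treat "www.example.com" and "example.com" as the same site.
    static normalize(hostName) {
        return hostName.toLowerCase().replace(/^www\./, "");
    }

    register(hostName, makeParser) {
        this.makers.set(ParserRegistry.normalize(hostName), makeParser);
    }

    // Return a parser for the URL, or the default parser for unknown sites.
    parserFor(url) {
        const hostName = ParserRegistry.normalize(new URL(url).hostname);
        const makeParser = this.makers.get(hostName) ?? this.makeDefaultParser;
        return makeParser();
    }
}

// Usage with stand-in parser objects.
const registry = new ParserRegistry(() => ({ name: "Default" }));
registry.register("archiveofourown.org", () => ({ name: "ArchiveOfOurOwn" }));
const parser = registry.parserFor("https://www.archiveofourown.org/works/1");
```

Because the lookup is done per URL rather than per EPUB, each chapter could in principle be given its own parser, which is one possible direction for the multi-site case.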

Solutions to site issues that require special coding

| Problem | See Parser |
| --- | --- |
| Site does not use UTF-8 encoding or declare which encoding it uses | 69shu |
| Each chapter spans multiple HTML pages | YushuboParser.js |
| Chapter "links" in Table of Contents (ToC) are not hyperlinks | ArchiveOfOurOwn |
| Walk multiple ToC pages to get all chapters | BabelChain, Scribblehub |
| Make multiple REST calls to get all ToC "pages" for all chapters | Novelsect |
| ToC requires REST call(s) to list all chapters | GravityTales, Lnmtl |
| ToC across multiple HTML pages | AsianHobbyist, Novelfull, Shinsori, ZenithNovels |
| Assemble chapter content from JSON | Novelsect |
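
To make the "walk multiple ToC pages" case concrete, here is a minimal sketch that follows a "next page" link until none remains, collecting chapter URLs in order and dropping duplicates. It is not how any of the listed parsers is implemented: `collectChapterUrls`, `fetchTocPage`, and the `{ chapterUrls, nextPageUrl }` return shape are assumptions made for illustration.

```javascript
// Hypothetical sketch; fetchTocPage and its return shape are assumptions.
async function collectChapterUrls(firstTocUrl, fetchTocPage, maxPages = 1000) {
    const chapters = [];
    const seenChapters = new Set();
    const visitedPages = new Set();
    let pageUrl = firstTocUrl;

    // Stop on a missing "next" link, a repeated page (loop), or the page cap.
    while (pageUrl && !visitedPages.has(pageUrl) && visitedPages.size < maxPages) {
        visitedPages.add(pageUrl);
        const { chapterUrls, nextPageUrl } = await fetchTocPage(pageUrl);
        for (const url of chapterUrls) {
            if (!seenChapters.has(url)) {   // keep first occurrence, preserve order
                seenChapters.add(url);
                chapters.push(url);
            }
        }
        pageUrl = nextPageUrl;
    }
    return chapters;
}
```

The visited-page set and the page cap guard against sites whose last ToC page links back to an earlier one.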