Structure of WebToEpub code

Overview of Files

popup.html and js/main.js provide WebToEpub’s core UI.
- js/ChapterUrlsUi.js, js/CoverImageUI.js, js/DefaultParserUI.js, js/ProgressBar.js and js/UserPreferences.js provide the rest of the UI functionality.
js/ParserFactory.js selects the Parser (derived from js/Parser.js) to use to process each web page.
- js/ImageCollector.js (and js/Imgur.js) are used by the Parsers to process images from the web page.
- js/HttpClient.js is used to fetch web pages (and images, JSON or anything else) from the internet.
js/EpubPacker.js assembles the EPUB file.
- js/EpubItem.js and js/EpubItemSupplier.js are a “bridge” to convert the HTML collected by Parsers into items to put into an EPUB.
js/Download.js handles saving the EPUB file to the hard drive. This file is called “Download” because it uses the Download API to do the save. (Yup, it’s a hack.)
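
As a rough illustration of how a factory can pick a Parser for a web page, the sketch below keys parsers by host name and falls back to a default parser when no site-specific one is registered. The names ParserRegistry, register, parserFor, ExampleSiteParser and the siteName field are assumptions made for this example, not the actual contents of js/Parser.js or js/ParserFactory.js.

```javascript
// Illustrative sketch only: all class and method names are hypothetical.
class Parser {
    constructor(siteName) {
        this.siteName = siteName;
    }
}

// Assumed fallback used when no site-specific parser is registered.
class DefaultParser extends Parser {
    constructor() { super("default"); }
}

// Stand-in for a site-specific parser derived from the base class.
class ExampleSiteParser extends Parser {
    constructor() { super("example.com"); }
}

class ParserRegistry {
    constructor() {
        // Maps a host name to a function that constructs the parser.
        this.constructorsByHost = new Map();
    }

    register(hostName, makeParser) {
        this.constructorsByHost.set(hostName, makeParser);
    }

    parserFor(url) {
        // Ignore a leading "www." so both host name forms match.
        const hostName = new URL(url).hostname.replace(/^www\./, "");
        const makeParser = this.constructorsByHost.get(hostName);
        return makeParser ? makeParser() : new DefaultParser();
    }
}

const registry = new ParserRegistry();
registry.register("example.com", () => new ExampleSiteParser());
console.log(registry.parserFor("https://www.example.com/novel/1").siteName); // prints "example.com"
```

Registering a constructor function rather than a parser instance means each lookup gets a fresh parser, so no state leaks from one EPUB to the next.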

File	Description
popup.html	HTML that provides the UI for WebToEpub
js/main.js	Main logic behind popup.html's UI
js/ChapterUrlsUi.js	Logic for the "List of Chapters" on the UI
js/CoverImageUI.js	Logic for the "Select Cover Image" list for Baka-Tsuki on the UI
js/DefaultParserUI.js	Logic for the "Default Parser" on the UI
js/Download.js	Wraps Chrome's Download API. (Handles saving EPUB to hard drive.)
js/EpubItem.js	An item to pack into an EPUB file. (e.g. an XHTML or image file)
js/EpubItemSupplier.js	Converts "files" from internet into EpubItems to pack into an EPUB
js/EpubMetaInfo.js	Container holding metadata for an EPUB.
js/EpubPacker.js	Assembles EPUB using metadata and items from supplier
js/ErrorLog.js	Records errors/warnings and displays to user and/or saves to file
js/Firefox.js	Code that is only needed by Firefox version of WebToEpub
js/HttpClient.js	Wraps making HTTP calls to internet. Retry, decode response, etc.
js/ImageCollector.js	Fetches images from internet, removes duplicates, rewrites image tags for EPUB, etc.
js/Imgur.js	Logic for fetching/processing images and galleries from Imgur
js/Parser.js	Base class for reading a site's HTML and converting into EpubItems
js/ParserFactory.js	Logic to figure out which parser to use for a web page
js/ProgressBar.js	Code to manipulate the Progress Bar on the UI
js/Sanitize.js	Code to clean up when converting HTML to XHTML
js/UserPreferences.js	UI logic for user to set Options
js/Util.js	Library of miscellaneous functions

Figure out parser to use for web page(s)
Get URLs of web pages that need to be fetched from internet
For each web page
1. Fetch from internet
2. Find content to put in EPUB
3. Find and fetch any images needed on page
Convert web pages into items for EPUB
1. Find content to put in EPUB
2. Remove junk (e.g. scripts) from content
3. Fix up hyperlinks (e.g. footnotes), remove next/previous chapter links (where possible)
4. Rewrite image tags for EPUB
5. Convert from HTML to XHTML
Assemble the EPUB
1. Generate Manifest
2. Generate Table of Contents
3. Pack items
Save EPUB to hard drive

Note: because hyperlinks may cross chapters and need to be fixed up, web pages can’t be converted until all pages have been collected.
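
To show why conversion has to wait until every page is collected, here is a minimal sketch of a two-phase approach: first build a map from each collected page's source URL to its file name inside the EPUB, then rewrite hyperlinks using that map. The function names, the page shape ({ sourceUrl, html }) and the chapterNNNN.xhtml naming are assumptions for illustration. The regular expression is a simplification too; real code would more likely walk the DOM.

```javascript
// Illustrative sketch only: names and file naming scheme are hypothetical.

// Phase 1 result: every collected page gets a file name in the EPUB.
function buildUrlMap(pages) {
    const urlMap = new Map();
    pages.forEach((page, index) => {
        const number = String(index + 1).padStart(4, "0");
        urlMap.set(page.sourceUrl, `chapter${number}.xhtml`);
    });
    return urlMap;
}

// Phase 2: point links at other collected pages to the EPUB-internal file.
// Links to pages that were not collected are left unchanged.
function rewriteLinks(html, urlMap) {
    return html.replace(/href="([^"]*)"/g, (match, href) => {
        const [base, fragment] = href.split("#");
        const target = urlMap.get(base);
        if (target === undefined) {
            return match;
        }
        return `href="${target}${fragment ? "#" + fragment : ""}"`;
    });
}

function convertAll(pages) {
    // The map needs ALL pages, which is why conversion cannot start early.
    const urlMap = buildUrlMap(pages);
    return pages.map(page => ({
        fileName: urlMap.get(page.sourceUrl),
        content: rewriteLinks(page.html, urlMap)
    }));
}
```

If chapter 1 links to a footnote in chapter 2, the target file name is only known once chapter 2 has been collected, so converting pages one at a time as they arrive would leave such links pointing at the original site.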

ToDo - include special case of EPUB that has multiple sites with different formats requiring different parsers.

Problem	See Parser
Site does not use UTF-8 encoding or declare the encoding used	69shu
Each chapter spans multiple HTML pages	YushuboParser.js
Chapter "links" in Table of Contents (ToC) are not hyperlinks	ArchiveOfOurOwn
Walk multiple ToC pages to get all Chapters	BabelChain, Scribblehub
Make multiple REST calls to get all ToC "pages" for all Chapters	Novelsect
ToC requires REST call(s) to list all chapters	GravityTales, Lnmtl
ToC across multiple HTML pages	AsianHobbyist, Novelfull, Shinsori, ZenithNovels
Assemble chapter content from JSON	Novelsect
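
As an illustration of the “walk multiple ToC pages” problem in the table above, the following is a minimal sketch, not code taken from any of the listed parsers. It assumes a hypothetical callback fetchTocPage(url) that resolves to { chapters, nextPageUrl }, and it follows the “next page” link until there is none. The function name, the callback and the maxPages limit are all assumptions for this example.

```javascript
// Illustrative sketch only: collectChaptersFromToc and fetchTocPage are
// hypothetical names. fetchTocPage(url) is assumed to resolve to
// { chapters: [{ title, url }], nextPageUrl: string or null }.
async function collectChaptersFromToc(firstPageUrl, fetchTocPage, maxPages = 1000) {
    const chapters = [];
    const visited = new Set();
    let url = firstPageUrl;
    // Stop when there is no next page, when a page repeats (guards against
    // a site whose last page links back to an earlier one), or at maxPages.
    while (url && !visited.has(url) && visited.size < maxPages) {
        visited.add(url);
        const page = await fetchTocPage(url);
        chapters.push(...page.chapters);
        url = page.nextPageUrl;
    }
    return chapters;
}
```

Passing the fetch step in as a callback keeps the walking logic independent of how a ToC page is obtained, so the same loop could in principle serve both the HTML-page and REST-call cases listed in the table.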