GitHub - hongy20/export-pdf-to-html

This project explores various tools for converting PDF to HTML/CSS. The evaluation is based on the following perspectives:

How to run

# Clean up the workspace
just clean

# Install
brew install poppler
yarn

# Run the experiments with node-poppler
just run-node-poppler

This library depends on external binaries (the Poppler tools installed above).
It can export PDF to HTML:
- With the complexOutput option, it preserves the original layout using CSS 👍;
- Fonts are not extracted, so the text renders differently from the original PDF 👎;
- Links are extracted, but they are rendered with default styling, so the page looks different from the original PDF 👍👎;
- It provides an option to "use data URLs instead of external images in HTML", but enabling this option crashes the conversion 👎;
It can export PDF to SVG:
- The output looks pixel perfect 👍;
- Each font symbol is represented by a glyph tag, which says nothing about the actual character 👎;
- The text is not accessible to screen readers 👎;
- No link integration 👎;
- The footprint is huge (246KB vs 1.8MB) 👎;
