Client-side PDF rendering #46

RichardLitt · 2015-01-26T21:27:25Z

I had misunderstood the role of PDF.js. I had been assuming that it was accessing the natively rendered PDF in Chrome. It isn't: it reads the file itself, parses the binary in JS in a worker, and works from that, bypassing the version rendered by Chrome. This means that accessing selected text in Chrome's native PDF viewer is not possible through PDF.js or, in fact, at all. Chrome uses PDFium, a project based on Foxit's PDF engine, a third-party system with an SDK but no API we can call, largely for security reasons.

This means that in order to get selected text, we have to render the PDF ourselves: load it into local storage, stop Chrome from rendering it, re-render it, and then use the PDF.js API to access it. This is significantly slower, but it is the only feasible way of accessing the PDF the way we want to.
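
A rough sketch of that last step, assuming the PDF is already loaded with PDF.js: `page.getTextContent()` resolves to an object whose `items` each carry a `str` fragment, so the text of a page can be rebuilt by joining them. The helper name `pageText`, the single-space join, and the mock object are my assumptions for illustration, not something taken from the extension.

```javascript
// Join the text fragments PDF.js reports for one page.
// `textContent` is assumed to have the shape that page.getTextContent()
// resolves to: { items: [{ str: '...' }, ...] }.
function pageText(textContent) {
  return textContent.items
    .map(function (item) { return item.str; })
    // Skip items without text and whitespace-only fragments.
    .filter(function (str) { return typeof str === 'string' && str.trim().length > 0; })
    .join(' ');
}

// Stand-in for a real getTextContent() result, for illustration only.
var mockContent = {
  items: [{ str: 'Client-side' }, { str: ' ' }, { str: 'PDF rendering' }]
};
console.log(pageText(mockContent));
```

Matching a user's selection against this text would still need the positions PDF.js attaches to each item, which this sketch ignores.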

Most platforms host the PDF on their own servers instead of rendering it locally (peerlibrary, for instance). I'm still not sure how Hypothes.is does it, but that will be worth checking out. For now, loading the PDF locally will give us the added benefit of being able to easily integrate the viewer components that come with the PDF.js examples. Converting these into React may be a good move in itself, if it hasn't already been done. Once we're using an instance of pdfViewer, we can access and control highlighting easily.

Current goals:

Research how to get text selections from the PDF
Load the PDF into local storage
Render the PDF from the extension, possibly using the PDF.js examples
Integrate the sidebar with the PDF rendering (CSS work)
Develop ability to clip content from the PDF
Save clips to local storage
Set up persistent storage
Set up user groups and a mailer
Enable mailing and sharing
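
For the "save clips to local storage" goal, a minimal sketch might look like the following. It assumes clips are kept as a JSON array under a single key in a Web Storage-like object (anything with `getItem`/`setItem`, such as `window.localStorage`). The key name, the clip fields, and the in-memory stand-in are hypothetical.

```javascript
// Key under which the clip array is stored (hypothetical name).
var CLIPS_KEY = 'clips';

// Read the stored clips, or an empty array if nothing was saved yet.
function loadClips(storage) {
  var raw = storage.getItem(CLIPS_KEY);
  return raw ? JSON.parse(raw) : [];
}

// Append one clip and write the whole array back; returns the new count.
function saveClip(storage, clip) {
  var clips = loadClips(storage);
  clips.push(clip);
  storage.setItem(CLIPS_KEY, JSON.stringify(clips));
  return clips.length;
}

// In-memory stand-in for localStorage, so the sketch runs outside a browser.
function memoryStorage() {
  var data = {};
  return {
    getItem: function (key) {
      return Object.prototype.hasOwnProperty.call(data, key) ? data[key] : null;
    },
    setItem: function (key, value) { data[key] = String(value); }
  };
}

var store = memoryStorage();
saveClip(store, { url: 'http://example.com/paper.pdf', page: 3, text: 'selected text' });
console.log(loadClips(store).length);
```

Passing the storage object in, rather than reaching for a global, leaves room to swap in persistent storage later without changing the callers.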

RichardLitt · 2015-01-30T23:39:51Z

Loading the PDF into local storage may not be necessary. Hypothes.is doesn't do this: they just redirect from the PDF to a URL for their PDF.js viewer, with the PDF URL escaped, like so. PDF.js is barely modified; they load the entire example through their manifest.json and remove the example PDF. Following this strategy would be good.
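
Roughly, building that redirect target could be sketched as below. The viewer path, the helper name, and the example URL are assumptions and not taken from the Hypothes.is code; the `file` query parameter is the one the stock PDF.js viewer reads.

```javascript
// Build the URL of a bundled PDF.js viewer page for a given PDF.
// `viewerBase` is hypothetical: inside an extension it would typically come
// from something like chrome.runtime.getURL('.../viewer.html').
function viewerUrlFor(pdfUrl, viewerBase) {
  // Escape the whole PDF URL so it survives as a single query value.
  return viewerBase + '?file=' + encodeURIComponent(pdfUrl);
}

console.log(viewerUrlFor('http://example.com/a b.pdf', 'viewer/web/viewer.html'));
```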

I ran into access issues for local files, which may be an artifact of using the node make server example viewer from PDF.js and not actually loading it into the extension. Next steps involve loading the PDF.js examples into the extension, à la Hypothes.is, and emulating their approach. If possible, I should also modularize their extension code. That shouldn't be too hard, but I should ask first.

RichardLitt · 2015-01-30T23:41:08Z

In other news, I've got levelup installed, using level-browserify, but I'm having an issue where the db instance isn't reliably persisting across tabs or across sessions. Spinning up an S3 instance now may be advisable. @jbenet, what do you think? Branch pdf-text has that work, rough as it is.

jbenet · 2015-02-02T09:59:55Z

> Spinning up an s3 instance now may be advisable. @jbenet, what do you think?

go for it, though careful getting distracted by s3. it can present challenges (like slowness, security, etc).


RichardLitt · 2015-02-11T22:11:02Z

Publish chrome extension
