Any possibility to do something like http://ix.io/4gtf? #11

zenny · 2022-11-21T17:20:21Z

Hi,

I was referred to this package on the Emacs IRC channel. What I am trying to achieve is described in http://ix.io/4gtf, with the following details:

Note Capturing from Research papers

Objective

Capture an annotation from a PDF and automatically create a note file, or ask the user to choose one. Before creating or asking for the note location, the capture extracts the BibTeX entry from online resources, creates a bibfile, and populates the note with a bibliography tag and with references that include the page number in the original document in parentheses.

Workflow

1. The user selects a section/annotation in a PDF document.
2. The capture mode goes online and extracts the metadata.
3. The metadata, including the page number of the annotation, is inserted into a newly created or existing note file, and a bibliography file with the same name as the note is created (or reused) and updated.
4. The annotation is copied to the new or existing note file.

Example

1. Say I read /Emacs as a Tool of Modern Science/ by Timothy Johnson at https://technology.matthey.com/article/66/2/122-129/ .
2. Say I annotate /Findable, Accessible, Interoperable, Reusable (FAIR)/ in the 1st column, 1st paragraph, lines 10-11 on page 122 (/actually page 1/).
3. Once I make the selection, I am asked whether to create a new Org file or use an existing one for note-taking, and a bibfile with the same name is then created.
4. Once the file name is chosen, it automatically searches for BibTeX entries, online or offline, based on the metadata of the PDF or a chosen local bibfile (if any).
5. Thereafter, the annotation and the reference (for example /[[Johnson Timothy, 2022] [pp. 122, col. 1, ln 10-11]]/) are inserted into the Org note file created in (3), and the entry is appended to (or replaced in) the bibliography file created there.
6. Here /[Johnson Timothy, 2022]/ is exported, while /[pp. 122, col. 1, ln 10-11]/ remains a clickable reference for the researcher's future use.

Help
Any help appreciated! Thanks!

yantar92 · 2022-11-22T01:41:42Z

zenny writes:

1. I read say /Emacs as a Tool of Modern Science/ by Timothy Johnson from https://technology.matthey.com/article/66/2/122-129/ .
2. Say I annotated /Findable, Accessible, Interoperable, Reusable (FAIR)/ from the 1st column, 1st Para, line 10-11 on page 122 (/actually page 1/)
3. Once I select the selection, I will be asked to create a new/use existing orgfile for notetaking and subsequently creates a bibfile with the same name.

This is doable. https://github.com/weirdNox/org-noter does something similar in terms of extracting the location in pdf and creating a new Org note. org-noter does not do anything Bibtex-wise though.

4. Once the file name is chosen, it will automatically search for bibtex entries either online/offline based on the metadata of the pdf or chosen local bibfile (if any).

Not many PDFs contain useful metadata. For example, I just downloaded the paper you referenced, and it has the following:

    title:Johnson_Apr22
    author:
    subject:
    keywords-raw:
    keywords:
    creator:Adobe InDesign 17.0 (Windows)
    producer:Adobe PDF Library 16.0.3
    format:PDF-1.4
    created:Thu Jan 20 00:07:39 2022
    modified:Fri Jan 28 21:51:51 2022

Nothing useful if you want to search for a BibTeX entry online.

The only somewhat useful approach to get DOI data from PDFs is what org-ref does in `org-ref-extract-doi-from-pdf`. It simply converts the PDF to text and matches the text against `org-ref-pdf-doi-regex`. Which kind of works. Sometimes. For some research journals. Sometimes it also fails or catches unrelated DOIs from the references section or from an extra page the journal puts into the PDF for advertisement.

Actual web pages usually contain much more reliable metadata. So, I usually start from the paper's web page, scrape the metadata using org-capture-ref into an Org heading, and attach the PDF to the heading. Then, paper notes are simply in the heading where the paper PDF is attached.
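To make the text-matching idea concrete, here is a minimal Python sketch of scanning already-extracted PDF text for DOI-like strings. It is not org-ref's implementation: the pattern `DOI_PATTERN`, the helper names, and the "take the first hit on the first page" heuristic are all assumptions chosen for illustration, and converting the PDF to text is assumed to happen elsewhere.

```python
import re

# Assumed pattern for DOI-like strings ("10.<registrant>/<suffix>").
# This is an illustrative guess, NOT the value of org-ref-pdf-doi-regex.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def find_doi_candidates(text):
    """Return DOI-like strings in order of first appearance, without duplicates."""
    seen = []
    for match in DOI_PATTERN.finditer(text):
        # Trim punctuation that usually belongs to the sentence, not the DOI.
        doi = match.group(0).rstrip(".,;)]")
        if doi not in seen:
            seen.append(doi)
    return seen


def guess_paper_doi(first_page_text):
    """Guess the paper's own DOI: the first candidate on the first page, or None."""
    candidates = find_doi_candidates(first_page_text)
    return candidates[0] if candidates else None
```

Restricting the search to the first page is one way to avoid picking up DOIs from the references section, but it still fails when the journal prints no DOI there or prepends a cover page, which is the unreliability described above. The trailing-punctuation trim can also damage the rare DOI that legitimately ends in `)`.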

5. Thereafter, the annotation and the reference (for example /[[Johnson Timothy, 2022] [pp. 122, col. 1, ln 10-11]]/ be inserted to the org note file created above in (3) and also appends/replaces to the bibliography file created.

This should be doable. What you need is: (1) extract PDF page info somehow (it depends on where you view the PDF); (2) find the associated heading/BibTeX entry for the paper and extract the @key; (3) Insert a citation (Org does support citations, including page references now; or you can also use org-ref). See https://orgmode.org/manual/Citation-handling.html or https://github.com/jkitchin/org-ref
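Steps (2) and (3) can be sketched in a few lines of Python, assuming the BibTeX entry is already available as a string and the page number is already known from the PDF viewer. The helper names and the key `johnson2022emacs` are hypothetical; the output follows the Org citation syntax `[cite:@key suffix]`, where the text after the key acts as the locator.

```python
import re

# Matches the entry type and key at the start of a BibTeX entry,
# e.g. "@article{johnson2022emacs,".
ENTRY_KEY_PATTERN = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def bibtex_key(entry):
    """Return the citation key of a single BibTeX entry string."""
    match = ENTRY_KEY_PATTERN.search(entry)
    if match is None:
        raise ValueError("no BibTeX key found")
    return match.group(1)


def org_citation(key, page=None, detail=None):
    """Build an Org citation such as [cite:@key p. 122, col. 1]."""
    locator_parts = []
    if page is not None:
        locator_parts.append(f"p. {page}")
    if detail:
        locator_parts.append(detail)
    suffix = " " + ", ".join(locator_parts) if locator_parts else ""
    return f"[cite:@{key}{suffix}]"
```

For the example in this thread, `org_citation("johnson2022emacs", 122, "col. 1, ln 10-11")` yields `[cite:@johnson2022emacs p. 122, col. 1, ln 10-11]`. The `detail` text must not contain a semicolon, because Org uses semicolons to separate the references inside one citation.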

6. Where /[Johnson Timothy, 2022]/ will be exported but /[pp. 122, col. 1, ln 10-11]/ remains as a clickable reference for the researcher for future references.

Citations in Org and org-ref are exported as expected and are clickable.

…

-- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92>

zenny · 2022-11-28T06:30:40Z

@yantar92

Thanks for your useful inputs.

zenny writes:

1. I read say /Emacs as a Tool of Modern Science/ by Timothy Johnson from https://technology.matthey.com/article/66/2/122-129/ .
2. Say I annotated /Findable, Accessible, Interoperable, Reusable (FAIR)/ from the 1st column, 1st Para, line 10-11 on page 122 (/actually page 1/)
3. Once I select the selection, I will be asked to create a new/use existing orgfile for notetaking and subsequently creates a bibfile with the same name.
This is doable. https://github.com/weirdNox/org-noter does something similar in terms of extracting the location in pdf and creating a new Org note. org-noter does not do anything Bibtex-wise though.

That is why I am here: org-noter does not cover the BibTeX side. Your repo appears to bring everything under a single umbrella, which is what I liked.

4. Once the file name is chosen, it will automatically search for bibtex entries either online/offline based on the metadata of the pdf or chosen local bibfile (if any).
Not many pdfs contain useful metadata.

I agree with you. I was hoping that one could capture the DOI from the web page using this repo. Is that possible?

For example, I just downloaded the paper you referenced, and it has the following:

    title:Johnson_Apr22
    author:
    subject:
    keywords-raw:
    keywords:
    creator:Adobe InDesign 17.0 (Windows)
    producer:Adobe PDF Library 16.0.3
    format:PDF-1.4
    created:Thu Jan 20 00:07:39 2022
    modified:Fri Jan 28 21:51:51 2022

Nothing useful if you want to search for a BibTeX entry online.

The only somewhat useful approach to get DOI data from PDFs is what org-ref does in `org-ref-extract-doi-from-pdf`. It simply converts the PDF to text and matches the text against `org-ref-pdf-doi-regex`. Which kind of works. Sometimes. For some research journals. Sometimes it also fails or catches unrelated DOIs from the references section or from an extra page the journal puts into the PDF for advertisement.

Actual web pages usually contain much more reliable metadata. So, I usually start from the paper's web page, scrape the metadata using org-capture-ref into an Org heading, and attach the PDF to the heading. Then, paper notes are simply in the heading where the paper PDF is attached.

Thanks for the pointer.

5. Thereafter, the annotation and the reference (for example /[[Johnson Timothy, 2022] [pp. 122, col. 1, ln 10-11]]/ be inserted to the org note file created above in (3) and also appends/replaces to the bibliography file created.
This should be doable. What you need is: (1) extract PDF page info somehow (it depends on where you view the PDF); (2) find the associated heading/BibTeX entry for the paper and extract the @key; (3) Insert a citation (Org does support citations, including page references now; or you can also use org-ref). See https://orgmode.org/manual/Citation-handling.html or https://github.com/jkitchin/org-ref

I tried org-ref, but it needs a bibfile that already exists. What I am trying to achieve is to create a bibfile from the DOI, capture both annotations and bookmarks into either an existing or a new note, and append the bibfile information at the bottom (org-ref does this with the bibliography: tag, as you know).

Where /[Johnson Timothy, 2022]/ will be exported but /[pp. 122, col. 1, ln 10-11]/ remains as a clickable reference for the researcher for future references.
Citations in Org and org-ref are exported as expected and are clickable.

yantar92 · 2022-12-11T10:50:14Z

zenny writes:

That is why I am here: org-noter does not cover the BibTeX side. Your repo appears to bring everything under a single umbrella, which is what I liked.

Not everything. The main purpose of this package is extracting metadata from web pages and Emacs buffers, and then using it for org-capture.

I agree with you. I was hoping that one could capture the DOI from the web page using this repo. Is that possible?

Yes. See `org-capture-ref-capture-doi`. This can also be done from the browser, but I have only implemented qutebrowser support. I do not use the major browsers, though a simple bookmarklet may do.

I tried org-ref, but it needs a bibfile that already exists. What I am trying to achieve is to create a bibfile from the DOI, capture both annotations and bookmarks into either an existing or a new note, and append the bibfile information at the bottom (org-ref does this with the `bibliography:` tag, as you know).

You may be able to create a bibfile from a DOI. The BibTeX entry is available through `(org-capture-ref-get-bibtex-field :bibtex-string)` inside a capture template. Or you can use `org-capture-ref-get-bibtex-from-url`. I also have `org-capture-ref-capture-at-point`, which is able to extract BibTeX metadata at point in an Emacs buffer. However, as I said, I am not sure how I can reliably get a DOI from a PDF at point. If I knew, I could implement it.
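Independently of org-capture-ref, the "bibfile from a DOI" step can be sketched in Python with DOI content negotiation: requesting `https://doi.org/<doi>` with the `Accept: application/x-bibtex` header returns a BibTeX entry for DOIs whose registration agency supports it (Crossref and DataCite do). The function names and the plain-text duplicate check below are assumptions made for illustration, not how the package works.

```python
import urllib.parse
import urllib.request
from pathlib import Path


def bibtex_request(doi):
    """Build a request asking the DOI resolver for a BibTeX rendering."""
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": "application/x-bibtex"})


def fetch_bibtex(doi, timeout=10):
    """Download the BibTeX entry for a DOI (needs network access)."""
    with urllib.request.urlopen(bibtex_request(doi), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def append_entry(bibfile, entry):
    """Append entry to bibfile unless the exact text is already there.

    Returns True when the file was changed.
    """
    path = Path(bibfile)
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    if entry.strip() in existing:
        return False
    with path.open("a", encoding="utf-8") as handle:
        handle.write(entry.strip() + "\n\n")
    return True
```

A call such as `append_entry("notes.bib", fetch_bibtex(doi))` would then create the bibfile on first use and skip entries that are already present verbatim. Comparing by key rather than by exact text would be more robust, and the hard part raised above, getting a trustworthy DOI out of the PDF in the first place, is left untouched by this sketch.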

