arxivbot

Feature Requests

provide cleaner abstracts
- parse URLs and hyperlink them
- parse LaTeX and render it as equations
provide integration via Telegram so that users can send a link to an authenticated channel, with GitHub Actions running uploads nightly or, better, in response to an event sent from Telegram
write abstract to entries as text
add check for duplicate entries before adding to database (i.e. if arXiv ID is already in database)
I would love to read arXiv papers as epubs on my Kindle
- https://tex.stackexchange.com/questions/1551/use-latex-to-produce-epub
- https://www.reddit.com/r/MachineLearning/comments/5xtnl4/d_reading_arxiv_preprints_on_an_ereader/
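
A minimal sketch of the duplicate check requested above, assuming the IDs already in the database can be loaded into a Python set. The function names and the regular expression are illustrative, not part of arxivbot, and only new-style arXiv IDs (e.g. 1705.09406) are handled.

```python
import re
from typing import Optional, Set

# New-style arXiv IDs look like 1705.09406, optionally followed by a version (v2).
# Old-style IDs (e.g. hep-th/9901001) are deliberately not covered in this sketch.
ARXIV_ID_RE = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?")


def extract_arxiv_id(url_or_id: str) -> Optional[str]:
    """Return the version-less arXiv ID found in a URL or bare ID, else None."""
    match = ARXIV_ID_RE.search(url_or_id)
    return match.group(1) if match else None


def is_duplicate(url_or_id: str, known_ids: Set[str]) -> bool:
    """True if the paper's arXiv ID is already in the set of stored IDs."""
    arxiv_id = extract_arxiv_id(url_or_id)
    return arxiv_id is not None and arxiv_id in known_ids
```

Stripping the version suffix means that v1 and v2 of the same paper count as one entry, which may or may not be the desired behaviour.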

Credentials

credentials_template.env is a template for credentials.env, which should be placed in the same (top-level) directory and should contain your Notion integration token for authentication. This is more convenient than exporting the token each time you use the tool.
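
For illustration, a small standard-library sketch of how a credentials.env file could be read into the environment. The variable name NOTION_TOKEN and the helper load_env_file are assumptions; check credentials_template.env for the key that arxivbot actually expects.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines from a .env-style file and export them to os.environ."""
    loaded = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        loaded[key.strip()] = value.strip().strip('"').strip("'")
    os.environ.update(loaded)
    return loaded
```

Usage would be along the lines of `load_env_file("credentials.env")` followed by `os.environ["NOTION_TOKEN"]`, with NOTION_TOKEN again being a placeholder name.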

Semantic Scholar Resources

Move the arxivbot onto the Semantic Scholar (S2) Academic Graph API.

Motivations:

S2 is more general than an arXiv-only approach
S2 aggregates data from a wide array of sources: in addition to arXiv, we can collect directly from e.g. ACL, Nature etc. more easily
papers are easily queryable by extracting the S2 paper SHA from the URL (its terminal path segment)
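
A sketch of the SHA extraction mentioned in the last point, assuming S2 paper page URLs end in a 40-character lowercase hex ID, as the example URL further down does. The function name is hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# S2 paper IDs in page URLs appear to be 40 lowercase hex characters (assumption
# based on the example URL in this README).
SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def extract_s2_paper_id(url: str) -> Optional[str]:
    """Return the final path segment of an S2 paper URL if it looks like a paper SHA."""
    path = urlparse(url).path.rstrip("/")
    candidate = path.rsplit("/", 1)[-1]
    return candidate if SHA_RE.match(candidate) else None


if __name__ == "__main__":
    print(extract_s2_paper_id(
        "https://www.semanticscholar.org/paper/"
        "WavTokenizer%3A-an-Efficient-Acoustic-Discrete-Codec-Ji-Jiang/"
        "ebdbded60f48131ed7ba73807c3c086993a96f89"
    ))
```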

Suggested usage: Exactly as arxivbot but with S2 URL.

Ideally would accept arXiv or S2 URL
- can we extract the same metadata from both?
- maybe just fall back to the existing arXiv-based arxivbot if an arXiv ID/URL is passed?
What does S2 give us (response) that we might want to add to the paper (meta)data?
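
One possible way to accept either kind of URL is to normalise both to an identifier that the S2 Academic Graph API understands. The sketch below assumes the API accepts an `ARXIV:<id>` prefixed identifier in place of the paper SHA, which is how I read the S2 documentation; verify this before relying on it. The function name is hypothetical.

```python
import re
from urllib.parse import urlparse


def to_s2_api_id(url: str) -> str:
    """Map an arXiv or S2 paper URL to an identifier usable in an S2 API path."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    last_segment = parsed.path.rstrip("/").rsplit("/", 1)[-1]

    if host.endswith("semanticscholar.org"):
        # S2 page URLs end in the 40-character paper SHA.
        if re.fullmatch(r"[0-9a-f]{40}", last_segment):
            return last_segment
    elif host.endswith("arxiv.org"):
        # Handles /abs/1705.09406, /abs/1705.09406v2 and /pdf/1705.09406v2.pdf.
        match = re.match(r"(\d{4}\.\d{4,5})", last_segment)
        if match:
            return f"ARXIV:{match.group(1)}"

    raise ValueError(f"Unrecognised paper URL: {url}")
```

With this approach an arXiv URL would not need a separate code path, provided S2 has indexed the paper; a fallback to the arXiv API would still be needed for papers S2 does not know about yet.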

S2 API Tutorial

Get Open Access PDFs
- this is via Open Access
Get details about a paper - Academic Graph API
- See the contents of Response Schema (200 OK) in that same section for a list of all available fields that can be returned and image below
See API usage examples and Python examples
relevant unofficial Python wrapper of S2 API: https://github.com/danielnsilva/semanticscholar
- retrieve multiple items at once with danielnsilva/semanticscholar; check that this uses the batch endpoints, e.g. "Get details for multiple papers at once"
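
For reference, a sketch of calling the batch endpoint directly with the standard library rather than through the wrapper. It assumes the endpoint is `POST /graph/v1/paper/batch` with a JSON body of the form `{"ids": [...]}` and a `fields` query parameter, and that the API key goes in an `x-api-key` header; confirm these against the S2 API documentation. The function name is hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request
from typing import Iterable, Optional
from urllib.parse import urlencode

BATCH_URL = "https://api.semanticscholar.org/graph/v1/paper/batch"


def build_batch_request(
    paper_ids: Iterable[str],
    fields: Iterable[str],
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for several papers at once."""
    url = f"{BATCH_URL}?{urlencode({'fields': ','.join(fields)})}"
    body = json.dumps({"ids": list(paper_ids)}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["x-api-key"] = api_key
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Sending it would look like `json.load(urllib.request.urlopen(request))`, which should return one entry per requested ID.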

I filled out the S2 API key request form on 2024-12-21.

Example S2 paper page URL:

https://www.semanticscholar.org/paper/WavTokenizer%3A-an-Efficient-Acoustic-Discrete-Codec-Ji-Jiang/ebdbded60f48131ed7ba73807c3c086993a96f89

Example S2 Academic Graph API query

https://api.semanticscholar.org/graph/v1/paper/ebdbded60f48131ed7ba73807c3c086993a96f89?fields=url,year,authors,externalIds,abstract,venue,references,influentialCitationCount,fieldsOfStudy

Example based on: CLI_cURL_Papers_with_Key example.
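
The same query can be built and sent from Python. This is a sketch using only the standard library; the function names are hypothetical, and the `x-api-key` header is assumed to be how the S2 API key is passed.

```python
import json
import urllib.request
from typing import Iterable, Optional
from urllib.parse import urlencode

API_ROOT = "https://api.semanticscholar.org/graph/v1/paper"


def build_paper_url(paper_id: str, fields: Iterable[str]) -> str:
    """Build the Academic Graph API URL for one paper and a list of fields."""
    # safe="," keeps the commas in the fields list unescaped, as in the example URL.
    query = urlencode({"fields": ",".join(fields)}, safe=",")
    return f"{API_ROOT}/{paper_id}?{query}"


def fetch_paper(paper_id: str, fields: Iterable[str], api_key: Optional[str] = None) -> dict:
    """Send the request and return the decoded JSON response (needs network access)."""
    request = urllib.request.Request(build_paper_url(paper_id, fields))
    if api_key:
        request.add_header("x-api-key", api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `build_paper_url` with the SHA and the nine fields from the example query reproduces the URL shown above.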

Example response

Another Example S2 Academic Graph API query

https://api.semanticscholar.org/graph/v1/paper/6bc4b1376ec2812b6d752c4f6bc8d8fd0512db91?fields=url,year,authors,externalIds,abstract,venue,influentialCitationCount,fieldsOfStudy

{
  "paperId": "6bc4b1376ec2812b6d752c4f6bc8d8fd0512db91",
  "externalIds": {
    "ArXiv": "1705.09406",
    "DBLP": "journals/pami/BaltrusaitisAM19",
    "MAG": "2951127645",
    "DOI": "10.1109/TPAMI.2018.2798607",
    "CorpusId": 10137425,
    "PubMed": "29994351"
  },
  "url": "https://www.semanticscholar.org/paper/6bc4b1376ec2812b6d752c4f6bc8d8fd0512db91",
  "abstract": "Our experience of the world is multimodal - we see objects, hear sounds, feel texture, smell odors, and taste flavors. Modality refers to the way in which something happens or is experienced and a research problem is characterized as multimodal when it includes multiple such modalities. In order for Artificial Intelligence to make progress in understanding the world around us, it needs to be able to interpret such multimodal signals together. Multimodal machine learning aims to build models that can process and relate information from multiple modalities. It is a vibrant multi-disciplinary field of increasing importance and with extraordinary potential. Instead of focusing on specific multimodal applications, this paper surveys the recent advances in multimodal machine learning itself and presents them in a common taxonomy. We go beyond the typical early and late fusion categorization and identify broader challenges that are faced by multimodal machine learning, namely: representation, translation, alignment, fusion, and co-learning. This new taxonomy will enable researchers to better understand the state of the field and identify directions for future research.",
  "venue": "IEEE Transactions on Pattern Analysis and Machine Intelligence",
  "year": 2017,
  "influentialCitationCount": 135,
  "fieldsOfStudy": [
    "Computer Science",
    "Medicine"
  ],
  "authors": [
    {
      "authorId": "1756344",
      "name": "T. Baltrušaitis"
    },
    {
      "authorId": "118242121",
      "name": "Chaitanya Ahuja"
    },
    {
      "authorId": "49933077",
      "name": "Louis-Philippe Morency"
    }
  ]
}
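
A sketch of flattening a response like the one above into the metadata arxivbot might store. The output key names (s2_id, arxiv_id, and so on) and the function name are illustrative choices, not an existing schema.

```python
from typing import Any, Dict


def summarise_paper(response: Dict[str, Any]) -> Dict[str, Any]:
    """Pick out the fields of an S2 paper response that are useful as page metadata."""
    # externalIds may be missing or null if the field was not requested.
    external_ids = response.get("externalIds") or {}
    return {
        "s2_id": response.get("paperId"),
        "arxiv_id": external_ids.get("ArXiv"),
        "doi": external_ids.get("DOI"),
        "authors": [author["name"] for author in response.get("authors") or []],
        "venue": response.get("venue"),
        "year": response.get("year"),
        "url": response.get("url"),
    }
```

Note that `externalIds.ArXiv` gives the arXiv ID directly, which would make a duplicate check against existing arXiv-based entries straightforward.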

Resources

arxiv
arxiv API
https://www.notion.so/my-integrations
https://developers.notion.com/reference/update-a-database
what you actually want is to update or add a row of a database:
- https://developers.notion.com/reference/patch-page
- https://developers.notion.com/reference/post-page
- the page's properties must match the property schema of its parent database: https://developers.notion.com/reference/property-object
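
To make the "add a row" case concrete, here is a sketch of the request body for the create-page endpoint, built with the standard library instead of a Notion SDK. The property names "Name" and "URL" are assumptions and must match the columns of your own database; the function name is hypothetical, and "2022-06-28" is one valid Notion-Version value rather than necessarily the one arxivbot uses.

```python
import json
import urllib.request

NOTION_PAGES_URL = "https://api.notion.com/v1/pages"
NOTION_VERSION = "2022-06-28"


def build_create_page_request(
    token: str, database_id: str, title: str, paper_url: str
) -> urllib.request.Request:
    """Build (but do not send) a request that adds one row to a Notion database."""
    payload = {
        "parent": {"database_id": database_id},
        "properties": {
            # "Name" is assumed to be the database's title column.
            "Name": {"title": [{"text": {"content": title}}]},
            # "URL" is assumed to be a column of type url.
            "URL": {"url": paper_url},
        },
    }
    headers = {
        "Authorization": f"Bearer {token}",
        "Notion-Version": NOTION_VERSION,
        "Content-Type": "application/json",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(NOTION_PAGES_URL, data=body, headers=headers, method="POST")
```

If a property name or type does not match the parent database, the API is expected to reject the request, which is why the property-object reference above matters.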

similar projects:

https://github.com/wangjksjtu/arxiv2notionplus

Directory structure

.
├── LICENSE
├── README.md
├── arxivbot
│   ├── __init__.py
│   ├── constants.py
│   ├── credentials.env
│   ├── credentials_template.env
│   ├── find_arxiv_links.py
│   ├── ieee_api.py
│   ├── ieee_scrape.py
│   ├── migrate_notion_obsidian.py
│   ├── notion_importer.py
│   ├── obsidian_importer.py
│   └── utils.py
├── docs
├── notion-sdk-py-examples
│   ├── README.md
│   ├── assets
│   │   └── notion-api-client-docs-map.jpg
│   ├── authenication.py
│   ├── db_read.py
│   ├── db_write.py
│   ├── page_read.py
│   └── page_write.py
├── requirements.txt
└── tests
    └── example_inputs
        └── ieee
            └── 9381661.html

8 directories, 22 files

anilkeshwani/arxivbot
