- provide cleaner abstracts
- parse URLs and hyperlink them
- parse LaTeX and render it as equations
- provide integration via Telegram so that users can send a link to an authenticated channel; GitHub Actions would then trigger uploads nightly or, better, in response to an event sent from Telegram
- write abstract to entries as text
- add check for duplicate entries before adding to database (i.e. if arXiv ID is already in database)
- I would love to read arXiv papers as epubs on my Kindle
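The duplicate check in the list above could look roughly like the following. This is a minimal sketch, assuming the arXiv IDs already stored in the database have been fetched into a Python set; `normalise_arxiv_id` and `is_duplicate` are hypothetical helper names, not existing arxivbot functions.

```python
import re
from typing import Optional, Set

# Matches new-style arXiv IDs such as "1705.09406" or "1705.09406v2".
# Old-style IDs (e.g. "hep-th/9901001") are deliberately not handled here.
_ARXIV_ID = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?")


def normalise_arxiv_id(text: str) -> Optional[str]:
    """Return the version-less arXiv ID found in `text` (an ID or URL), or None."""
    match = _ARXIV_ID.search(text)
    return match.group(1) if match else None


def is_duplicate(candidate: str, existing_ids: Set[str]) -> bool:
    """True if `candidate` refers to a paper whose ID is already in `existing_ids`."""
    arxiv_id = normalise_arxiv_id(candidate)
    return arxiv_id is not None and arxiv_id in existing_ids
```

Dropping the version suffix means `1705.09406v1` and `1705.09406v2` count as the same entry, which is probably the desired behaviour for a reading list.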
credentials_template.env is a template for credentials.env, which should be placed in the same (top-level) directory and contain your Notion Integration token for authentication. This is more convenient than exporting the token each time you use the tool.
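As an illustration of how such a file might be read, here is a small sketch using only the standard library. The variable name `NOTION_TOKEN` and the helper names are assumptions for this example; use whatever key credentials_template.env actually defines.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def load_env_file(path: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values: Dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_token(env_path: str = "credentials.env",
              var_name: str = "NOTION_TOKEN") -> Optional[str]:
    """Prefer an exported environment variable, then fall back to the file."""
    token = os.environ.get(var_name)
    if token:
        return token
    if Path(env_path).exists():
        return load_env_file(env_path).get(var_name)
    return None
```

Checking the environment first keeps the existing "export it each time" workflow working alongside the file.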
Move the arxivbot onto the Semantic Scholar (S2) Academic Graph API.
Motivations:
- this is more general
- aggregates data from a wide array of sources - in addition to arXiv, we can directly collect from e.g. ACL, Nature etc. more easily
- easily queryable by extracting the S2 paper SHA from the URL (its final path segment)
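Extracting the SHA from a paper page URL can be sketched as below. It assumes the paper ID is always the last path segment and is a 40-character lowercase hex string, which holds for the example URLs in this README; `s2_paper_id_from_url` is a hypothetical name.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# S2 paper IDs in the example URLs are 40-character lowercase hex strings.
_SHA_RE = re.compile(r"[0-9a-f]{40}")


def s2_paper_id_from_url(url: str) -> Optional[str]:
    """Return the final path segment of `url` if it looks like an S2 paper SHA."""
    path = urlparse(url).path.rstrip("/")
    last_segment = path.rsplit("/", 1)[-1]
    return last_segment if _SHA_RE.fullmatch(last_segment) else None
```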
Suggested usage: Exactly as arxivbot but with S2 URL.
- Ideally would accept arXiv or S2 URL
- can we extract the same metadata from both?
- maybe just fall back to the existing arXiv-based arxivbot if an arXiv ID/URL is passed?
- What does S2 give us (response) that we might want to add to the paper (meta)data?
- Get Open Access PDFs
- this is via Open Access
- Get details about a paper - Academic Graph API
- See the contents of Response Schema (200 OK) in that same section for a list of all available fields that can be returned and image below
- See API usage examples and Python examples
- relevant unofficial Python wrapper of S2 API: https://github.com/danielnsilva/semanticscholar
- retrieve multiple items at once with danielnsilva/semanticscholar - check that this uses the batch endpoints e.g. Get details for multiple papers at once
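One way to accept either kind of URL is to map both to an identifier the Graph API understands, since it accepts arXiv IDs in the form `ARXIV:<id>` as well as S2 SHAs. The sketch below also prepares the pieces of a batch request (`POST /graph/v1/paper/batch` with an `ids` JSON body and a `fields` query parameter) without sending it. The function names are hypothetical, and only new-style arXiv IDs are handled.

```python
import re
from typing import Dict, List, Sequence, Tuple
from urllib.parse import urlparse

BATCH_URL = "https://api.semanticscholar.org/graph/v1/paper/batch"


def to_s2_identifier(url: str) -> str:
    """Map an arXiv or S2 paper URL to an ID string for the Graph API."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    last = parsed.path.rstrip("/").rsplit("/", 1)[-1]
    if host.endswith("arxiv.org"):
        # Drop a trailing ".pdf" and version suffix, e.g. "1705.09406v2.pdf".
        arxiv_id = re.sub(r"v\d+$", "", re.sub(r"\.pdf$", "", last))
        return f"ARXIV:{arxiv_id}"
    if host.endswith("semanticscholar.org") and re.fullmatch(r"[0-9a-f]{40}", last):
        return last
    raise ValueError(f"Unrecognised paper URL: {url}")


def build_batch_request(
    urls: Sequence[str], fields: Sequence[str]
) -> Tuple[str, Dict[str, str], Dict[str, List[str]]]:
    """Return (endpoint, query params, JSON body) for a batch lookup."""
    params = {"fields": ",".join(fields)}
    body = {"ids": [to_s2_identifier(u) for u in urls]}
    return BATCH_URL, params, body
```

The returned triple can be passed to any HTTP client, for example `requests.post(endpoint, params=params, json=body)`. Whether danielnsilva/semanticscholar uses this endpoint internally still needs checking, as noted above.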
I filled out the S2 API key request form on 2024-12-21.
Example S2 paper page URL, followed by the corresponding Graph API request:
https://www.semanticscholar.org/paper/WavTokenizer%3A-an-Efficient-Acoustic-Discrete-Codec-Ji-Jiang/ebdbded60f48131ed7ba73807c3c086993a96f89
https://api.semanticscholar.org/graph/v1/paper/ebdbded60f48131ed7ba73807c3c086993a96f89?fields=url,year,authors,externalIds,abstract,venue,references,influentialCitationCount,fieldsOfStudy
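A request URL of this shape can be assembled from a paper ID and a list of field names. The sketch below only builds the URL and headers; the helper names are hypothetical, and the `x-api-key` header applies once an API key has been issued.

```python
from typing import Dict, Iterable, Optional
from urllib.parse import urlencode

GRAPH_API_BASE = "https://api.semanticscholar.org/graph/v1/paper"


def build_paper_url(paper_id: str, fields: Iterable[str]) -> str:
    """Build the 'details about a paper' URL for the given ID and fields."""
    # safe="," keeps the comma-separated field list readable in the query string.
    query = urlencode({"fields": ",".join(fields)}, safe=",")
    return f"{GRAPH_API_BASE}/{paper_id}?{query}"


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Send the API key via the x-api-key header when one is available."""
    return {"x-api-key": api_key} if api_key else {}
```

The result can be fetched with any HTTP client, for instance `requests.get(build_paper_url(pid, fields), headers=build_headers(key))`.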
Example based on: CLI_cURL_Papers_with_Key example.
https://api.semanticscholar.org/graph/v1/paper/6bc4b1376ec2812b6d752c4f6bc8d8fd0512db91?fields=url,year,authors,externalIds,abstract,venue,influentialCitationCount,fieldsOfStudy
{
  "paperId": "6bc4b1376ec2812b6d752c4f6bc8d8fd0512db91",
  "externalIds": {
    "ArXiv": "1705.09406",
    "DBLP": "journals/pami/BaltrusaitisAM19",
    "MAG": "2951127645",
    "DOI": "10.1109/TPAMI.2018.2798607",
    "CorpusId": 10137425,
    "PubMed": "29994351"
  },
  "url": "https://www.semanticscholar.org/paper/6bc4b1376ec2812b6d752c4f6bc8d8fd0512db91",
  "abstract": "Our experience of the world is multimodal - we see objects, hear sounds, feel texture, smell odors, and taste flavors. Modality refers to the way in which something happens or is experienced and a research problem is characterized as multimodal when it includes multiple such modalities. In order for Artificial Intelligence to make progress in understanding the world around us, it needs to be able to interpret such multimodal signals together. Multimodal machine learning aims to build models that can process and relate information from multiple modalities. It is a vibrant multi-disciplinary field of increasing importance and with extraordinary potential. Instead of focusing on specific multimodal applications, this paper surveys the recent advances in multimodal machine learning itself and presents them in a common taxonomy. We go beyond the typical early and late fusion categorization and identify broader challenges that are faced by multimodal machine learning, namely: representation, translation, alignment, fusion, and co-learning. This new taxonomy will enable researchers to better understand the state of the field and identify directions for future research.",
  "venue": "IEEE Transactions on Pattern Analysis and Machine Intelligence",
  "year": 2017,
  "influentialCitationCount": 135,
  "fieldsOfStudy": [
    "Computer Science",
    "Medicine"
  ],
  "authors": [
    {
      "authorId": "1756344",
      "name": "T. Baltrušaitis"
    },
    {
      "authorId": "118242121",
      "name": "Chaitanya Ahuja"
    },
    {
      "authorId": "49933077",
      "name": "Louis-Philippe Morency"
    }
  ]
}
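A response like the one above can be flattened into a simple record before it is written anywhere. This is only a sketch: `summarise_paper` and the output key names are hypothetical, and missing or null fields are tolerated rather than treated as errors.

```python
from typing import Any, Dict


def summarise_paper(paper: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a Graph API paper response into a record of plain values."""
    # Fields may be absent or null depending on the `fields` requested.
    external_ids = paper.get("externalIds") or {}
    authors = paper.get("authors") or []
    return {
        "paper_id": paper.get("paperId"),
        "arxiv_id": external_ids.get("ArXiv"),
        "doi": external_ids.get("DOI"),
        "authors": ", ".join(a["name"] for a in authors if a.get("name")),
        "venue": paper.get("venue"),
        "year": paper.get("year"),
        "fields_of_study": paper.get("fieldsOfStudy") or [],
        "url": paper.get("url"),
    }
```

Keeping the arXiv ID in the record means the existing arXiv-based duplicate check can still be applied to papers fetched through S2.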
- arxiv
- arxiv API
- https://www.notion.so/my-integrations
- https://developers.notion.com/reference/update-a-database
- what you actually want - to update or add a row of a database
- https://developers.notion.com/reference/patch-page
- https://developers.notion.com/reference/post-page
- info about property of page's parent db (must match) https://developers.notion.com/reference/property-object
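Adding a row to a database amounts to creating a page whose parent is that database (the post-page endpoint linked above). The sketch below builds such a request body. The property names `Name`, `Authors`, `URL` and `Year` are assumptions for illustration and must match the names and types defined in the target database; `build_page_payload` is a hypothetical helper.

```python
from typing import Any, Dict


def build_page_payload(database_id: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Build the JSON body for creating a page (row) in a Notion database."""
    return {
        "parent": {"database_id": database_id},
        "properties": {
            # Each value must use the shape of its property type in the database.
            "Name": {"title": [{"text": {"content": record["title"]}}]},
            "Authors": {"rich_text": [{"text": {"content": record["authors"]}}]},
            "URL": {"url": record["url"]},
            "Year": {"number": record["year"]},
        },
    }
```

With the notion-sdk-py client, a body like this could be passed as `notion.pages.create(**payload)`.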
similar projects:
.
├── LICENSE
├── README.md
├── arxivbot
│   ├── __init__.py
│   ├── constants.py
│   ├── credentials.env
│   ├── credentials_template.env
│   ├── find_arxiv_links.py
│   ├── ieee_api.py
│   ├── ieee_scrape.py
│   ├── migrate_notion_obsidian.py
│   ├── notion_importer.py
│   ├── obsidian_importer.py
│   └── utils.py
├── docs
├── notion-sdk-py-examples
│   ├── README.md
│   ├── assets
│   │   └── notion-api-client-docs-map.jpg
│   ├── authenication.py
│   ├── db_read.py
│   ├── db_write.py
│   ├── page_read.py
│   └── page_write.py
├── requirements.txt
└── tests
    └── example_inputs
        └── ieee
            └── 9381661.html
8 directories, 22 files