The pylitsense
package is an unofficial Python wrapper for the NCBI LitSense API. It allows you to query the LitSense API and get back a list of LitSenseResult
objects, which contain the text of relevant sentences and a score of their similarity to your query.
NCBI describes LitSense as follows:
LitSense is a unique search system for making sense of the biomedical literature at the sentence level, providing a unified access to over half a billion statements extracted from PubMed and PubMed Central.
Given a query, LitSense finds the best-matching sentences based on overlapping terms as well as semantic similarity via a cutting-edge neural embedding approach.
For example, given this query:
"Breast cancers with HER2 amplification"
You will receive an array of results, each with a score of how relevant it is to your query, like this:
[
LitSenseResult(
text='HER2 amplification/overexpression accounts for aggressive clinical features of HER2 positive breast cancer.',
score=0.9443494081497192,
annotations=[
'79|4|gene|2064',
'93|13|disease|MESH:D001943',
'0|4|gene|2064'
],
pmid=28800870,
pmcid=None,
section='abstract'
),
...
]
pip install pylitsense
from pylitsense.pylitsense import PyLitSense
# Initialize
pls = PyLitSense()
# Query
results = pls.query("your query here")
# Print results
for result in results:
print(result.text, result.score)
query_str
(str): The query string for the API.rerank
(bool, optional): Whether to rerank the results. Defaults to True. It seems that this refers to reranking using "semantic vectors" (embeddings).limit
(int, optional): Limit the number of results. Defaults to None (no limit).min_score
(float, optional): Minimum score threshold for results. Defaults to None (no minimum score).
The LitSense API itself has the following limitations:
- Maximum of 100 results per query
- Rate limit of one request per user per second
In addition, this package is:
- Only tested on Python 3.9, 3.10, and 3.11