Alternative way to get genomic positions #144
Labels
enhancement
New feature or request
Comments
This works very well for me as well:

import pandas as pd
import scanpy as sc
from anndata import AnnData


def query_biomart() -> pd.DataFrame:
    """
    Extract gene annotations from Biomart.

    Returns
    -------
    pd.DataFrame
        DataFrame with gene annotations from Biomart.
    """
    # Requires network access; use_cache=True lets pybiomart cache the request.
    annot = sc.queries.biomart_annotations(
        "hsapiens",
        [
            "ensembl_gene_id",
            "hgnc_symbol",
            "start_position",
            "end_position",
            "chromosome_name",
        ],
        use_cache=True,
    ).rename(
        columns={
            "ensembl_gene_id": "gene_ids",
            "hgnc_symbol": "gene_symbol",
            "start_position": "start",
            "end_position": "end",
            "chromosome_name": "chromosome",
        }
    )
    return annot


def annotate_var(
    adata: AnnData, annotation: pd.DataFrame, index_key: str = "gene_ids"
) -> None:
    """
    Annotate the features of an AnnData object in place.

    Parameters
    ----------
    adata : AnnData
        Input AnnData object.
    annotation : pd.DataFrame
        Gene annotation DataFrame.
    index_key : str, optional
        Column present in both `adata.var` and `annotation` that is used to match genes.
    """
    for col in ["start", "end", "chromosome", index_key]:
        assert (
            col in annotation.columns
        ), f"Annotation DataFrame must contain the column named `{col}`."
    for col in annotation:
        if col == index_key:
            # Already present in adata.var; rewriting it would blank unmatched genes.
            continue
        # Key the lookup on `index_key`: the default integer index of the
        # Biomart result never matches the gene IDs stored in adata.var.
        var_dict = dict(zip(annotation[index_key], annotation[col]))
        adata.var[col] = [var_dict.get(x) for x in adata.var[index_key]]
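
The matching step in `annotate_var` can be tried without scanpy or a network connection. The following is only a minimal sketch: the helper name `annotate_table`, the gene IDs, and the coordinates are made up for illustration, and a plain DataFrame stands in for `adata.var`. It assumes the lookup is keyed on the `index_key` column.

```python
import pandas as pd

# Hypothetical stand-in for adata.var: one row per feature with an ID column.
var = pd.DataFrame(
    {"gene_ids": ["ENSG_A", "ENSG_B", "ENSG_C"]},
    index=["GeneA", "GeneB", "GeneC"],
)

# Hypothetical stand-in for the Biomart result (made-up coordinates).
annotation = pd.DataFrame(
    {
        "gene_ids": ["ENSG_A", "ENSG_B"],
        "chromosome": ["1", "X"],
        "start": [100, 5000],
        "end": [900, 7500],
    }
)


def annotate_table(
    var: pd.DataFrame, annotation: pd.DataFrame, index_key: str = "gene_ids"
) -> pd.DataFrame:
    """Return a copy of `var` with the annotation columns matched on `index_key`."""
    out = var.copy()
    for col in annotation.columns:
        if col == index_key:
            continue  # the key column is already present in `var`
        # Build an ID -> value lookup for this annotation column.
        lookup = dict(zip(annotation[index_key], annotation[col]))
        out[col] = [lookup.get(x) for x in out[index_key]]
    return out


annotated = annotate_table(var, annotation)
print(annotated)
```

Genes without a match (here `GeneC`) end up with missing values, so it is worth checking how many features were actually annotated before relying on the positions.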
very nice 🤩
Description of feature
Currently, the only way to get genomic positions is by reading a GTF file. This is (a) slow, and (b) gtfparse repeatedly causes problems.
It could be more convenient to retrieve this information from online sources such as Biomart or Bioconductor AnnotationHub.
gtfparse could then become an optional dependency.
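
One possible way to make a package optional is to import it lazily, only on the code path that needs it. The sketch below is an assumption about how this could look, not the project's actual implementation; the helper name `require` is hypothetical.

```python
import importlib
from types import ModuleType


def require(module_name: str, purpose: str) -> ModuleType:
    """Import `module_name` on demand, or raise an error naming the missing extra."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{purpose} requires the optional dependency '{module_name}'."
        ) from err


# Hypothetical usage inside the GTF-reading code path only:
#     gtfparse = require("gtfparse", "Reading GTF files")
```

With this pattern, users who obtain positions from an online source never hit the import, and users who pass a GTF file get a clear message about what to install.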