BioScraper

Uniprot Scraper
HMDB Scraper

These scripts scrape substance information from biomedical databases.

Running: run each script directly with Python 3 (dependencies: pandas, requests, BeautifulSoup with the lxml parser, urllib3).

Before running a scraper, check that your environment satisfies these dependencies.
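One way to check the dependencies is a short script like the sketch below. It is not part of BioScraper; the helper name `missing_modules` is made up for illustration, and it only tests whether each module can be located, not which version is installed. BeautifulSoup is listed under its import name `bs4`.

```python
import importlib.util

# Import names for the dependencies listed above.
# BeautifulSoup is imported as "bs4".
REQUIRED_MODULES = ["pandas", "requests", "bs4", "lxml", "urllib3"]


def missing_modules(modules):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All dependencies are available.")
```

If anything is reported missing, install it with your usual package manager before starting a scraper.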

Uniprot Scraper

Version 1.1: searches Uniprot for protein subcellular location and selects secreted proteins.

Update:

For some proteins, no Uniprot subcellular location annotation is available in the database.

Gene Ontology (Cellular Component) terms have been added to the search targets to solve this issue. The final table contains two subcellular location columns, one for the Uniprot annotation and one for the GO annotation.

Uniprot Scraper accepts 3 arguments:

1. Input file: path to your ID file, which contains a single column of protein Uniprot IDs; csv or xlsx format is recommended.

e.g. D:\Users\work_dir\test.csv

2. Output file: path of the output file (csv format) containing Uniprot IDs and their corresponding subcellular locations.

A filename extension is not needed and is not recommended; the output filename will end with sub_loc.csv automatically.

e.g. D:\Users\work_dir\out_test ---> D:\Users\work_dir\out_test_sub_loc.csv

3. Secreted protein selection:

Accepts uppercase Y or N. Y writes the secreted proteins to a separate file xxx_secreted.csv, where xxx is the output name from argument 2.
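The naming rules for arguments 2 and 3 can be sketched as follows. This is an illustration, not the code in Uniprot_scraper.py; the function name `output_paths` is hypothetical, and the sketch assumes the suffixes are simply appended to the output name you enter.

```python
def output_paths(base, want_secreted):
    """Build the output file paths from the output name given in argument 2.

    `want_secreted` is the answer to argument 3: uppercase "Y" or "N".
    """
    if want_secreted not in ("Y", "N"):
        raise ValueError("expected uppercase Y or N")
    # The subcellular location table is always written.
    paths = [base + "_sub_loc.csv"]
    # The secreted protein table is written only on request.
    if want_secreted == "Y":
        paths.append(base + "_secreted.csv")
    return paths


print(output_paths(r"D:\Users\work_dir\out_test", "Y"))
```

This is why an output name without an extension is recommended: the suffix and extension are added for you.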

Example

You can find an example Uniprot ID file (csv or xlsx) in Example_data/Uniprot_scraper_example. Here is an example run:

When the program shows:

Please input your Uniprot ID list file directory (csv format recommended, e.g. D:\Users\work_dir\test.csv):

Enter the path of the input file Protein-20240315.csv on your computer. For example, if it is in the folder C:\User\Desktop\, enter:

C:\User\Desktop\Protein-20240315.csv

When the program shows:

Please input your output file directory (with output name you want, e.g. D:\Users\work_dir\out_test):

For example, to put the result in the folder C:\User\Desktop\Result\ and name it 20240315, enter:

C:\User\Desktop\Result\20240315

When the program shows:

Do you want to select secreted proteins as an independent file? (Y/N):

For example, to select all secreted proteins or proteins located in the extracellular space, enter Y. You will get output files matching 20240315_sub_loc.csv and 20240315_secreted.csv in Example_data/Uniprot_scraper_example.
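To give a rough idea of what the secreted protein selection does, here is a minimal sketch that filters a location table by keyword. Everything in it is an assumption made for illustration: the column names, the placeholder IDs and rows, and the keyword matching itself. The actual script may use different columns and criteria.

```python
import csv
import io

# Hypothetical column names and made-up placeholder rows;
# the real output table may look different.
SAMPLE = """Uniprot_ID,Uniprot_location,GO_location
ID_A,Secreted,extracellular space
ID_B,Nucleus,nucleus
ID_C,,extracellular region
"""

# Assumed keywords for "secreted or located in the extracellular space".
KEYWORDS = ("secreted", "extracellular")


def is_secreted(row):
    """True if either location column mentions one of the keywords."""
    text = " ".join([row["Uniprot_location"], row["GO_location"]]).lower()
    return any(keyword in text for keyword in KEYWORDS)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
secreted_ids = [row["Uniprot_ID"] for row in rows if is_secreted(row)]
print(secreted_ids)  # ['ID_A', 'ID_C']
```

ID_C has no Uniprot annotation in this sample but is still selected through its GO annotation, which is the case the second location column is meant to cover.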

HMDB Scraper

Version 1.0: searches HMDB for metabolite descriptions by CAS ID.

HMDB Scraper accepts 3 arguments:

1. Input file: same as for Uniprot Scraper, except that the column lists CAS IDs of metabolites.

e.g. D:\Users\work_dir\test.csv

2. Output file: path of the output file (csv or xlsx format, matching your input) containing CAS IDs and their corresponding descriptions in HMDB.

A filename extension is not recommended; the output filename will end with .csv or .xlsx automatically (same as your input).

e.g. D:\Users\work_dir\out_test ---> D:\Users\work_dir\out_test.csv

3. Sheet in your input file

You need this parameter only when your input is an xlsx file. It determines which sheet the program processes.

e.g. 1 = Sheet1, 2 = Sheet2. You can also enter the name of the sheet. To search all sheets, just press Enter.
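The sheet rules above can be sketched as a small function. This is not the code in HMDB_scraper.py; `resolve_sheets` is a hypothetical name, and checking sheet names before 1-based numbers is an assumption about how an ambiguous answer would be handled.

```python
def resolve_sheets(answer, sheet_names):
    """Map the answer to the sheet prompt onto a list of sheet names."""
    answer = answer.strip()
    # Pressing Enter (empty answer) selects every sheet.
    if answer == "":
        return list(sheet_names)
    # An exact sheet name selects that sheet.
    if answer in sheet_names:
        return [answer]
    # A number is treated as a 1-based position: 1 = first sheet.
    if answer.isdecimal():
        index = int(answer)
        if 1 <= index <= len(sheet_names):
            return [sheet_names[index - 1]]
    raise ValueError(f"no such sheet: {answer!r}")


sheets = ["Sheet1", "Sheet2"]
print(resolve_sheets("2", sheets))       # ['Sheet2']
print(resolve_sheets("Sheet1", sheets))  # ['Sheet1']
print(resolve_sheets("", sheets))        # ['Sheet1', 'Sheet2']
```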

Example

You can find an example xlsx file in Example_data/HMDB_scraper_example. Here is an example run:

When the program shows:

Please input your list file directory of metabolites (csv or xlsx format recommended, e.g. D:\Users\work_dir\test.csv):

Enter the path of the input file Metabolite_searching.xlsx on your computer. For example, if it is in the folder C:\User\Desktop\, enter:

C:\User\Desktop\Metabolite_searching.xlsx

When the program shows:

Please input your output file directory (with output name you want, e.g. D:\Users\work_dir\out_test):

For example, to put the result in the folder C:\User\Desktop\Result\ and name it Result_HMDB, enter:

C:\User\Desktop\Result\Result_HMDB

When the program shows:

Which sheet do you want to search for?

For example, to search all sheets in the input file, just press Enter. You will get an output file matching Result_HMDB.xlsx in Example_data/HMDB_scraper_example. Have a try!

Thanks to Yusong Zhang from Shandong University for asking me to develop these tools.

I will keep updating these scrapers as new requirements and worthwhile information come up.

If you run into any issues, please contact [email protected]
