HW_18 #4

Open
wants to merge 29 commits into
base: main
29 commits
fa64ed0
Create README.md
LinaWhite15 Oct 8, 2023
278379a
Create protein_toolkit.py
LinaWhite15 Oct 8, 2023
e05d8eb
Create fastq_toolkit.py
LinaWhite15 Oct 8, 2023
fada259
Relocate protein_toolkit.py to /modules
LinaWhite15 Oct 8, 2023
5aa6e1f
Relocate fastq_toolkit.py to /modules
LinaWhite15 Oct 8, 2023
28a43bf
Create main script.py
LinaWhite15 Oct 8, 2023
59937fc
Update README.md
LinaWhite15 Oct 8, 2023
a0a7345
Update main script.py
LinaWhite15 Oct 8, 2023
bae0802
Add functions read_fastq and write_fastq
LinaWhite15 Oct 18, 2023
c44c77d
Add functions read_fastq and write_fastq
LinaWhite15 Oct 18, 2023
12ad015
Delete modules directory
LinaWhite15 Feb 25, 2024
466c6cd
Delete main script.py
LinaWhite15 Feb 25, 2024
a13186e
Update main_script.py
LinaWhite15 Feb 25, 2024
af1bef9
Create requirements.txt
LinaWhite15 Feb 25, 2024
ba11f49
Update requirements.txt
LinaWhite15 Feb 25, 2024
59fc34c
Delete main_script.py
LinaWhite15 May 1, 2024
5a6e924
Add bio_files_processor.py
LinaWhite15 May 1, 2024
fd2fa02
Add bioinfUtils.py
LinaWhite15 May 1, 2024
3a2353c
Add test_bioinf_utils.py
LinaWhite15 May 1, 2024
1ff2082
Add Showcases.ipynb
LinaWhite15 May 1, 2024
0272078
Add custom_random_forest.py
LinaWhite15 May 1, 2024
3a0366e
Create data folder
LinaWhite15 May 1, 2024
3825362
Add BRCA2.fasta
LinaWhite15 May 1, 2024
a2ef22f
Add SOWAHA.fasta
LinaWhite15 May 1, 2024
334ffa4
Update requirements.txt
LinaWhite15 May 1, 2024
64c1987
Update README.md
LinaWhite15 May 1, 2024
4547450
Fix README.md
LinaWhite15 May 1, 2024
7d5643c
Update README.md
LinaWhite15 May 1, 2024
1fd02af
Fix link README.md
LinaWhite15 May 1, 2024
66 changes: 66 additions & 0 deletions README.md
@@ -0,0 +1,66 @@
# Bioinformatics toolkit for beginner <a href=""><img src="https://cdn-icons-png.flaticon.com/512/8662/8662375.png" align="right" width="150" ></a> <h3> Final homework of Python course at the BI</h3>

This repository contains homework for the Python course of the retraining program at the [Bioinformatics Institute (2023/2024)](https://bioinf.me/).

The repository contains scripts for working with several biological data storage formats (such as FASTA and GBK), processing biological sequences, parsing, and interacting with a Telegram bot.

## Table of contents
+ [Installation](https://github.com/LinaWhite15/Bioinformatics_toolkit_for_beginner/edit/Development/README.md#installation)
+ [Usage](https://github.com/LinaWhite15/Bioinformatics_toolkit_for_beginner/edit/Development/README.md#usage)
+ [Content](https://github.com/LinaWhite15/Bioinformatics_toolkit_for_beginner/edit/Development/README.md#content)
+ [Special gratitudes](https://github.com/LinaWhite15/Bioinformatics_toolkit_for_beginner/edit/Development/README.md#Special_gratitudes)
+ [Credits](https://github.com/LinaWhite15/Bioinformatics_toolkit_for_beginner/edit/Development/README.md#credits)

## Installation
To install the toolkit, download the files from the main directory and, additionally, the contents of the `data` folder.
OR
You can simply clone this repository using
```
git clone git@github.com:LinaWhite15/Bioinformatics_toolkit_for_beginner.git
```
(for Linux and WSL users)
**Python3 is required.**

## Usage
Before using the toolkit, import the required modules into your own script. For example:
```
from bioinfUtils import genscan, GenscanOutput
```
or
```
import custom_random_forest
```
## Content

### `bioinfUtils.py`

`genscan` - [GENSCAN](http://hollywood.mit.edu/GENSCAN.html) API

`telegram_logger` - decorator function that launches a Telegram bot to log function execution

`DNASequence`, `RNASequence` and `AminoAcidSequence` - classes for processing of biological sequences.

### `bio_files_processor.py`
`OpenFasta`, `FastaRecord` - context manager and data class for handling .fasta files. They provide less resource-intensive storage and iteration.
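
To illustrate why a context manager paired with a data class can be less resource-intensive, here is a minimal sketch that reads records one at a time instead of loading the whole file. It is not the code of `OpenFasta` or `FastaRecord`: the names `LazyFasta` and `Record`, their fields, and the header-splitting rule are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Record:
    """Hypothetical record type holding one FASTA entry."""
    id: str
    description: str
    seq: str


class LazyFasta:
    """Hypothetical context manager that yields FASTA records lazily."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.handle = None

    def __enter__(self) -> "LazyFasta":
        self.handle = open(self.path)
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Always close the file, even if iteration raised an error.
        if self.handle is not None:
            self.handle.close()

    def __iter__(self) -> Iterator[Record]:
        header, chunks = None, []
        for line in self.handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    yield self._make(header, chunks)
                header, chunks = line[1:], []
            elif line and header is not None:
                chunks.append(line)
        if header is not None:
            yield self._make(header, chunks)

    @staticmethod
    def _make(header: str, chunks: List[str]) -> Record:
        # Assumption: the ID is the text before the first space in the header.
        seq_id, _, description = header.partition(" ")
        return Record(seq_id, description, "".join(chunks))
```

Because `__iter__` is a generator, only one record is held in memory at a time, and the `with` statement guarantees that the file is closed.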

`convert_multiline_fasta_to_oneline` - converts a multi-line FASTA file to one-line (wide) format, in which each sequence occupies a single line

`select_genes_from_gbk_to_fasta` - function for processing GenBank files. It selects the genes flanking the given ones and saves their translated sequences into a FASTA file.
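
As an illustration of the multi-line to one-line conversion described above, here is a minimal sketch. It is not the code from `bio_files_processor.py`: the function name `fasta_to_oneline` and its signature are hypothetical, and it works on an iterable of text lines rather than on file paths.

```python
from typing import Iterable, Iterator


def fasta_to_oneline(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with each record's sequence joined onto one line.

    Hypothetical helper for illustration; not part of the toolkit's API.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header
                yield "".join(chunks)
            header = line
            chunks = []
        elif header is not None:
            # Sequence lines before the first header are ignored.
            chunks.append(line)
    if header is not None:
        yield header
        yield "".join(chunks)


example = [">seq1", "ACGT", "TTAA", ">seq2", "GGCC"]
print(list(fasta_to_oneline(example)))
# ['>seq1', 'ACGTTTAA', '>seq2', 'GGCC']
```

Passing an open text file instead of a list works the same way, since a file object iterates over its lines.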

### `custom_random_forest.py`
`RandomForestClassifier` - custom implementation of the random forest algorithm with parallelization

### `test_bioinf_utils.py`
Contains tests for functions from the modules `bioinfUtils.py` and `bio_files_processor.py`.

### `Showcases.ipynb`
Notebook demonstrating the functionality of `RandomForestClassifier`, `genscan`, `DNASequence`, and `AminoAcidSequence`.

## Special gratitudes
Many thanks to the team of the Bioinformatics Institute, especially to the teachers and assistants of the Python program, for preparing and supporting such a wonderful course! It was a wonderful journey, or hopefully just the beginning of one.

## Contacts
If you have any comments or suggestions regarding this software, you can reach me using the contact details below:

Belikova Angelina - [email protected]
