-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HW4_bredov #28
base: main
Are you sure you want to change the base?
HW4_bredov #28
Conversation
Hw4 bredov
Update README.md, add information about protein_mass and protein_formula function
…', 'seq_transform'. Rename 'check_seq' to 'check_and_procees_seq'
Add "aa_content_check" and "aa_chain_charge functions" functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Хорошая работа! Очень понравилась проверка аминокислотных последовательностей. Главная функция хорошо выполнена. Здорово, что поработали над стилистикой кода по PEP8.
Замечания:
- Обратите внимание на отступы констант.
- README написан слабо: не хватает примеров, не написано, что делают функции, нет примера установки или импорта модуля.
- Функции protein_mass и protein_formula написаны без использования словарей. Выглядит неоптимально.
- Переменные в Python (за исключением констант) пишутся прописными буквами.
- Не хватает единообразия кода. Например, где-то кавычки одинарные, где-то двойные.
- Комментарии к коммитам не все удачные. Хороший комментарий к коммиту: Add protein_formula function.
Итого:
- README 1.7 балла (-1 сняла за недостаточную информативность, не хватает примеров)
- 5 функций * 1.5 = 7 баллов (-0.5 за отсутствие словарей в функциях protein_mass и protein_formula)
- штрафы: -0.5 за комментарии к коммитам, -0.3 за не всегда удачный нейминг, -0.2 за доктринга главной функции, оформленный в произвольном стиле -0.2 = -1.2 балла
Итого: 7.5 баллов.
def aa_content_check(seq: str) -> dict: | ||
"Returns aminoacids content of the protein" | ||
seq_content = dict.fromkeys(AA_UNIPROT_CONTENT.keys(), 0) | ||
for AAcd in seq.upper(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
В Python переменные (за исключением констант) пишут прописными буквами.
for AAcd in seq.upper(): | |
for aa_cd in seq.upper(): |
|
||
seq_length = len(seq) | ||
for AAcd, occurence in seq_content.items(): | ||
seq_content[AAcd] = 100 * occurence / seq_length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Лучше округлять значения, чтобы не было длинных хвостов после запятой.
return seq_content | ||
|
||
|
||
def check_and_procees_seq(seq: str, abbreviation: int = 1) -> Tuple[bool, str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Вот эта функция прям отличная)
seq_set = set(seq.upper()) | ||
exit_code = bool(seq) and seq_set.issubset(set(AA_TR_DICT.values())) | ||
if exit_code: | ||
seq_content, uniprot_content = aa_content_check(seq).values(), AA_UNIPROT_CONTENT.values() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
В этой строке происходят 2 разные операции: работа функции aa_content_check и получение долей аминокислотных остатков в последовательности и получение значений из константы AA_UNIPROT_CONTENT. Не нужно неоправданно сокращать код таким образом. Лучше написать это в 2 строки.
exit_code = bool(seq) and seq_set.issubset(set(AA_TR_DICT.values())) | ||
if exit_code: | ||
seq_content, uniprot_content = aa_content_check(seq).values(), AA_UNIPROT_CONTENT.values() | ||
seq_Mann_Whitney_U = Mann_Whitney_U(seq_content, uniprot_content) if len(seq_set) == 20 else True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Не совсем логичный нейминг переменной. В нее запись идет True/False. Лучше написать passed_mann_whitney_test.
И названия переменных в Python пишут строчными буквами.
for seq_index, seq in enumerate(seqs): | ||
is_seq_valid, seq_alt = check_and_procees_seq(seq, abbreviation) | ||
if is_seq_valid: | ||
result.append(OPERATIONS[operation](seq_alt)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
||
res_len, cor_seq_len = len(result), len(corrupt_seqs) | ||
result = result[0] if res_len == 1 else result | ||
corrupt_seqs = corrupt_seqs[0] if cor_seq_len == 1 else corrupt_seqs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Я бы сюда еще добавила условие, что если cor_seq_len == 0, то не возвращать corrupt_seqs в принципе.
AA_UNIPROT_CONTENT = { | ||
"A": 9.03, "R": 5.84, "N": 3.79, "D": 5.47, "C": 1.29, | ||
"Q": 3.80, "E": 6.24, "G": 7.27, "H": 2.22, "I": 5.53, | ||
"L": 9.85, "K": 4.93, "M": 2.33, "F": 3.88, "P": 4.99, | ||
"S": 6.82, "T": 5.55, "W": 1.30, "Y": 2.88, "V": 6.86 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Не хватает отступов.
AA_UNIPROT_CONTENT = { | |
"A": 9.03, "R": 5.84, "N": 3.79, "D": 5.47, "C": 1.29, | |
"Q": 3.80, "E": 6.24, "G": 7.27, "H": 2.22, "I": 5.53, | |
"L": 9.85, "K": 4.93, "M": 2.33, "F": 3.88, "P": 4.99, | |
"S": 6.82, "T": 5.55, "W": 1.30, "Y": 2.88, "V": 6.86 | |
} | |
AA_UNIPROT_CONTENT = {"A": 9.03, "R": 5.84, "N": 3.79, "D": 5.47, "C": 1.29, | |
"Q": 3.80, "E": 6.24, "G": 7.27, "H": 2.22, "I": 5.53, | |
"L": 9.85, "K": 4.93, "M": 2.33, "F": 3.88, "P": 4.99, | |
"S": 6.82, "T": 5.55, "W": 1.30, "Y": 2.88, "V": 6.86, | |
} |
if not len_corr_seq: | ||
print(f"All {len_seqs} sequence(s) processed successfully") | ||
elif len_corr_seq: | ||
for i in corrupt_seqs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i - плохой нейминг
Provides interface for 5 operations from `OPERATIONS` dictionary. Takes various number of positional arguments and one keyword-only argument: | ||
|
||
- First `n` arguments - protein sequences; | ||
- Latter positional argument - desired operation from list: "content_check", "seq_length", "protein_formula", "protein_mass", "charge"; | ||
- `abbrevition` keyword-only argument. Should be type integer, 1 for 1-letter abbreviation and 3 for 3-letter. | ||
|
||
Returns tuple containing two list: | ||
|
||
- `result` - list with operation results for each valid sequence; | ||
- `corrupt_seqs` - list with non-valid sequences and their indices; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
У докстрингов есть определенные стили, выберите понравившийся. У вас он более произвольный.
Merge "protein_analyzer_tool.py" and "README.md" inside"HW4_Bredov" folder from "HW4_Bredov" branch