Pytest #87

Merged: 7 commits, Oct 28, 2023
Original file line number Diff line number Diff line change
@@ -1,14 +1,14 @@
name: python auto-format
name: requirements_lint_test
on:
# Triggers the workflow on push to all the branches and on pull request to the "master" or the "dev" branch
push:
pull_request:
branches: [ "master", "dev"]
branches: [ "master", "dev", "backend", "pytest"]
# Allows you to run this workflow manually from the Actions tab
workflow_dispatch:

jobs:
lint:
test-n-lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
@@ -17,6 +17,7 @@ jobs:

- name: Install dependencies
run: |
python -m pip install -r src/backend/requirements.txt
python -m pip install -r requirements_git_actions.txt

- name: apply_black
@@ -28,4 +29,9 @@ jobs:
options: "--check --verbose"
src: "."
jupyter: true
version: "~= 22.0"
version: "~= 22.0"

- name: Test with pytest
run: |
pytest

204 changes: 204 additions & 0 deletions notebooks/languages.csv
@@ -0,0 +1,204 @@
Abkhazian
Afar
Afrikaans
Akan
Albanian
Amharic
Arabic
Aragonese
Armenian
Assamese
Avaric
Avestan
Aymara
Azerbaijani
Bambara
Bashkir
Basque
Belarusian
Bengali
Bislama
Bosnian
Breton
Bulgarian
Burmese
Catalan
Valencian
Chamorro
Chechen
Chichewa
Chewa
Nyanja
Chinese
Church Slavonic
Chuvash
Cornish
Corsican
Cree
Croatian
Czech
Danish
Divehi
Dhivehi
Maldivian
Dutch
Flemish
Dzongkha
English
Esperanto
Estonian
Ewe
Faroese
Fijian
Finnish
French
Western Frisian
Fulah
Gaelic
Scottish Gaelic
Galician
Ganda
Georgian
German
Greek
Kalaallisut
Greenlandic
Guarani
Gujarati
Haitian
Haitian Creole
Hausa
Hebrew
Herero
Hindi
Hiri Motu
Hungarian
Icelandic
Ido
Igbo
Indonesian
Interlingua
Inuktitut
Inupiaq
Irish
Italian
Japanese
Javanese
Kannada
Kanuri
Kashmiri
Kazakh
Central Khmer
Kikuyu
Gikuyu
Kinyarwanda
Kirghiz
Kyrgyz
Komi
Kongo
Korean
Kuanyama
Kwanyama
Kurdish
Lao
Latin
Latvian
Limburgan
Limburger
Limburgish
Lingala
Lithuanian
Luba-Katanga
Luxembourgish
Letzeburgesch
Macedonian
Malagasy
Malay
Malayalam
Maltese
Manx
Maori
Marathi
Marshallese
Mongolian
Nauru
Navajo
Navaho
North Ndebele
South Ndebele
Ndonga
Nepali
Norwegian
Norwegian Bokmål
Norwegian Nynorsk
Sichuan Yi
Nuosu
Occitan
Ojibwa
Oriya
Oromo
Ossetian
Ossetic
Pali
Pashto
Pushto
Persian
Polish
Portuguese
Punjabi
Panjabi
Quechua
Romanian
Moldavian
Moldovan
Northern Sami
Samoan
Sango
Sanskrit
Sardinian
Serbian
Shona
Sindhi
Sinhala
Sinhalese
Slovak
Slovenian
Somali
Southern Sotho
Spanish
Castilian
Sundanese
Swahili
Swati
Swedish
Tagalog
Tahitian
Tajik
Tamil
Tatar
Telugu
Thai
Tibetan
Tigrinya
Tonga
Tsonga
Tswana
Turkish
Turkmen
Twi
Uighur
Ukrainian
Urdu
Uzbek
Venda
Vietnamese
Volapük
Walloon
Welsh
Wolof
Xhosa
Yiddish
Yoruba
Zhuang
Chuang
Zulu
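Review note: a single-column list like this one is easy to check for repeated entries before it is used as lookup data. The following is a minimal sketch, assuming one name per line and no header row; the helper name `find_duplicates` and the inline sample data are hypothetical and not part of this PR.

```python
import csv
import io
from collections import Counter
from typing import Iterable, List


def find_duplicates(rows: Iterable[List[str]]) -> List[str]:
    """Return the names that occur more than once, sorted alphabetically."""
    # Keep the first column of every non-empty row, ignoring surrounding whitespace.
    names = [row[0].strip() for row in rows if row and row[0].strip()]
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical sample standing in for the real file contents.
sample = io.StringIO("Thai\nTsonga\nTsonga\nTswana\n")
print(find_duplicates(csv.reader(sample)))  # ['Tsonga']
```

To run it against the real file, the same function could be fed `csv.reader(open(path, newline="", encoding="utf-8"))`, where `path` points at the CSV.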
3 changes: 2 additions & 1 deletion requirements_git_actions.txt
@@ -1,2 +1,3 @@
black==23.9.1
black[jupyter]
black[jupyter]
pytest==7.4.3
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -1,4 +1,5 @@
from typing import List
import torch
from sklearn.metrics.pairwise import cosine_similarity


@@ -27,3 +28,9 @@ def get_free_text_match(
return 0

return cosine_similarity(candidate_embeddings, job_embeddings)[0][0]

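if __name__ == "__main__":
    # Opposite vectors should score -1; round() avoids int() truncating -0.9999... to 0.
    score = get_free_text_match(
        torch.tensor([[1, 0, 0]]),
        torch.tensor([[-1, 0, 0]]),
    )
    print(round(score) == -1)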
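Review note on the check above: cosine similarity comes back as a float, and converting it with int() truncates toward zero, so a result such as -0.9999999 becomes 0 rather than -1. The following is a minimal pure-Python sketch of the idea. The helper name `cosine` and the choice to return 0.0 for a zero-length vector are assumptions made for illustration; they are not the implementation in this PR, which calls scikit-learn.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption for this sketch: treat a zero vector as "no similarity".
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Rounding error can leave the result just short of the exact value,
# so compare with a tolerance instead of ==.
print(math.isclose(cosine([1, 2, 3], [-1, -2, -3]), -1.0))  # True

# int() truncates toward zero; round() goes to the nearest integer.
almost_minus_one = -0.9999999
print(int(almost_minus_one))    # 0
print(round(almost_minus_one))  # -1
```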
File renamed without changes.
19 changes: 19 additions & 0 deletions src/backend/api/migrations/0032_skills_skill_new_col.py
@@ -0,0 +1,19 @@
# Generated by Django 4.2.5 on 2023-10-25 08:48

from django.db import migrations, models


class Migration(migrations.Migration):

dependencies = [
('api', '0031_rename_main_contact_first_name_companies_first_name_and_more'),
]

operations = [
migrations.AddField(
model_name='skills',
name='skill_new_col',
field=models.CharField(default='some string', max_length=255),
preserve_default=False,
),
]
17 changes: 17 additions & 0 deletions src/backend/api/migrations/0033_remove_skills_skill_new_col.py
@@ -0,0 +1,17 @@
# Generated by Django 4.2.5 on 2023-10-25 08:55

from django.db import migrations


class Migration(migrations.Migration):

dependencies = [
('api', '0032_skills_skill_new_col'),
]

operations = [
migrations.RemoveField(
model_name='skills',
name='skill_new_col',
),
]
@@ -0,0 +1,18 @@
# Generated by Django 4.2.5 on 2023-10-26 15:57

from django.db import migrations, models


class Migration(migrations.Migration):

dependencies = [
('api', '0033_remove_skills_skill_new_col'),
]

operations = [
migrations.AddField(
model_name='candidates',
name='aboutme_experinece_embedded',
field=models.TextField(blank=True, null=True),
),
]
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
1 change: 1 addition & 0 deletions src/backend/api/skill_db_relax_20.json

Large diffs are not rendered by default.

File renamed without changes.
1 change: 1 addition & 0 deletions src/backend/api/token_dist.json

Large diffs are not rendered by default.

@@ -1,7 +1,6 @@
import torch
from typing import List
from transformers import AutoTokenizer, AutoModel
from sklearn.metrics.pairwise import cosine_similarity


MODEL_NAME = "bert-base-uncased"
@@ -46,3 +45,6 @@ def generate_embeddings(text: str, model_name: str=MODEL_NAME) -> List[List]:
text_embeddings = text_outputs.last_hidden_state.mean(dim=1)
return text_embeddings

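if __name__ == "__main__":
    # Sanity check against a reference value; compare with a tolerance, since exact float equality is fragile.
    s = ""
    print(abs(generate_embeddings(s)[0][0].item() - (-0.00922924280166626)) < 1e-6)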
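Review note on the check above: an embedding value can differ in its last digits across hardware, library versions, or dtypes, so a comparison against a hard-coded float is usually safer with a tolerance. The following is a minimal sketch using the standard library. The reference value is the one from this diff; the helper name `matches_reference` and the tolerance values are assumptions chosen for illustration.

```python
import math

# Reference value taken from the __main__ check in this diff.
EXPECTED = -0.00922924280166626


def matches_reference(
    value: float,
    expected: float = EXPECTED,
    rel_tol: float = 1e-5,
    abs_tol: float = 1e-7,
) -> bool:
    """True if value agrees with expected within the given tolerances."""
    # Tolerances here are illustrative; pick them to suit the model's precision.
    return math.isclose(value, expected, rel_tol=rel_tol, abs_tol=abs_tol)


print(matches_reference(-0.0092292428))  # True: differs only beyond the tolerance
print(matches_reference(0.0))            # False
```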
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
Empty file.
File renamed without changes.
File renamed without changes.
Empty file added tests/__init__.py
Empty file.
31 changes: 31 additions & 0 deletions tests/test_embedding_similarity.py
@@ -0,0 +1,31 @@
import pytest
import torch
from src.backend.api.tokenization_n_embedding import tokenize_text, generate_embeddings
from src.backend.api.matching_algorithm import get_free_text_match


def test_tokenize_text():
    # The tokenizer output should expose exactly these keys, even for empty input.
    text_tokens_keys = {"input_ids", "token_type_ids", "attention_mask"}
    assert set(tokenize_text("test").keys()) == text_tokens_keys
    assert set(tokenize_text("").keys()) == text_tokens_keys


def test_generate_embeddings():
    # One mean-pooled vector of size 768 per input text.
    embeddings = generate_embeddings("any text")
    assert len(embeddings) == 1
    assert len(embeddings[0]) == 768
    assert generate_embeddings("").dtype == torch.float32


def test_get_free_text_match_text():
    # Cosine similarity is computed in floating point, so compare with a tolerance.
    assert get_free_text_match(
        torch.tensor([[1, 2, 3]]),
        torch.tensor([[-1, -2, -3]]),
    ) == pytest.approx(-1)
    assert get_free_text_match(
        torch.tensor([[1, 2, 3]]),
        torch.tensor([[1, 2, 3]]),
    ) == pytest.approx(1)
    assert get_free_text_match(
        torch.tensor([[1, 0, 0]]),
        torch.tensor([[0, 1, 0]]),
    ) == pytest.approx(0)