[SYSTEMDS-3179] Add Co-occurrence matrix for GloVe word embedding #2200
I suggest editing the PR's title to indicate that it is related to the issue [SYSTEMDS-3179], for example, #2206. Moreover, I suggest giving more details regarding the code in this PR the proposed |
c7dc94a to 0293dbc
Clean cooc matrix code and add more accurate descriptions in comments.
Rename cooccur.dml to cooccurrenceMatrix.dml for clearer naming in tests
add cooccurrenceMatrix dml script in Builtins.java
Add a small test dataset containing different characters to check that cooccurrenceMatrix.dml preprocesses text properly.
The test checks the result of cooccurrenceMatrix.dml for a small dataset.
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #2200     +/-   ##
============================================
+ Coverage     72.27%   72.37%   +0.10%
- Complexity    44978    45307     +329
============================================
  Files          1452     1467      +15
  Lines        169309   170607    +1298
  Branches      33037    33257     +220
============================================
+ Hits         122364   123477    +1113
- Misses        37630    37736     +106
- Partials       9315     9394      +79
add licenses to BuiltinCooccurrenceMatrixTest.java
This PR adds an implementation of the co-occurrence matrix in DML.
The co-occurrence matrix can be used by word-embedding methods such as GloVe and word2vec.
The implementation is based on this repository:
GloVe: Global Vectors for Word Representation
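
For readers unfamiliar with the data structure, here is a minimal Python sketch of a windowed, symmetric co-occurrence count. It is not the DML script from this PR. The function names (`tokenize`, `cooccurrence`), the preprocessing rule (lowercase, letters only), the window size, and the 1/distance weighting are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only.

    This preprocessing rule is an assumption for the sketch, not
    necessarily what cooccurrenceMatrix.dml does.
    """
    return re.findall(r"[a-z]+", text.lower())


def cooccurrence(sentences, window=2, distance_weighting=True):
    """Build a sparse, symmetric co-occurrence table.

    Returns (vocab, counts): vocab maps each word to an integer id in
    order of first appearance, and counts[(i, j)] is the (optionally
    1/distance-weighted) number of times words i and j appear within
    `window` tokens of each other in the same sentence.
    """
    vocab = {}
    counts = defaultdict(float)
    for sentence in sentences:
        ids = [vocab.setdefault(tok, len(vocab)) for tok in tokenize(sentence)]
        for pos, center in enumerate(ids):
            # Look back at most `window` tokens; the symmetric update
            # below also covers the forward direction.
            for ctx_pos in range(max(0, pos - window), pos):
                weight = 1.0 / (pos - ctx_pos) if distance_weighting else 1.0
                context = ids[ctx_pos]
                counts[(center, context)] += weight
                counts[(context, center)] += weight
    return vocab, dict(counts)


if __name__ == "__main__":
    vocab, counts = cooccurrence(["The cat sat.", "The cat!"], window=2)
    for (i, j), value in sorted(counts.items()):
        print(i, j, value)
```

The sparse dictionary keeps the sketch short. A dense matrix indexed by the same ids would hold the same values and is closer to what a matrix-oriented language such as DML would work with.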