Document updated text-reuse detection process #8

piconti · 2024-05-02T09:01:27Z

Once text-reuse clusters have been identified on the new corpus using both passim v1 (Spark/Scala-based) and v2 (Python-based), the better of the two processes should be chosen and its processing steps documented in detail for future reference.

Action items:

Document the steps for passim v1 (Scala)
Document the steps for passim v2 (Python)
Describe the differences between the two (where exactly to document them is still to be defined)

piconti · 2025-01-24T17:20:56Z

This is only relevant if we switch to the Python version.
Putting this on hold, as a more recent issue on documenting the current approach has been created.

piconti mentioned this issue May 2, 2024

Prepare and launch text-reuse detection with Passim #5

Open

9 tasks

e-maud assigned piconti May 2, 2024
