OPT20031: Planning for Toolkit Pipeline Implementation #88

tenzin3 · 2024-12-03T09:41:56Z

Description

The data team, the toolkit developers, and pecha.org are all working on this at once, so a lot of work and updates are in progress. To keep these efforts coordinated, we are manually preparing all the necessary inputs (from the data team) and the output JSON files for pecha.org. This should keep all parties clear and aligned.

Important Note

This card is strictly for planning and data preparation.
Implementation will not begin until drupchen and ngawangtrinley approve it.

Diagram

Implementation Diagram

Transfer mechanism for Translation and Commentary Diagram:

Dummy Data

available here

Input (all Google Docs)

Pecha display segments with their aligned translations
- Tibetan root text, segmented (Google Docs)
- English root text, segmented in alignment with the Tibetan root text (Google Docs)
- Chinese root text, segmented in alignment with the Tibetan root text (Google Docs)
  Note: There could be more than one translation for a single language.
All commentaries with their corresponding root texts
- Tibetan commentary text aligned to its root text
- English commentary text aligned to its root text
- Chinese commentary text aligned to its root text

Note: The root text used here is specific to its commentary, and there could be more than one commentary for a single language.

Output (all JSON files)

Translation alignment
- JSON file for the Tibetan root text with its aligned English text: Test-Root-Text.json
- JSON file for the aligned Chinese translation of the root text: Chinese-Test.json
  Note:
  Each segment must start with its root segment id mapping, if one exists.
  The root segment id will be used to map segments to the commentary.
Commentary alignment
- JSON file for the commentary text.

Note:
A commentary segment associated with a root segment should start with that root segment id.
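
To make the root segment id convention concrete, here is a minimal sketch. The `<id>` prefix syntax, the field names `title` and `segments`, and the helper names are all assumptions for illustration; this card does not specify the JSON schema that pecha.org expects.

```python
import json
from typing import List, Optional, Tuple


def tag_segment(text: str, root_id: Optional[int]) -> str:
    """Prefix a segment with its root segment id, if a mapping exists."""
    # The "<id>" prefix syntax is an assumption made for illustration only.
    if root_id is None:
        return text
    return f"<{root_id}>{text}"


def build_commentary_json(title: str, segments: List[Tuple[Optional[int], str]]) -> str:
    """Serialize (root_id, text) pairs into a JSON string."""
    payload = {
        "title": title,  # hypothetical field name
        "segments": [tag_segment(text, root_id) for root_id, text in segments],
    }
    return json.dumps(payload, ensure_ascii=False, indent=2)


example = build_commentary_json(
    "Test-Commentary",
    [
        (1, "Commentary on the first root segment."),
        (None, "Introductory remark with no root mapping."),
        (2, "Commentary on the second root segment."),
    ],
)
```

Segments without a mapping are left unprefixed, which matches the "if one exists" wording in the note above.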

View on Pecha.org

Tibetan Root Text and Its Alignment: https://staging.pecha.org/texts/Test

Tasks at hand

Parsers:

Convert root text Google Docs to OPF
Convert commentary text Google Docs to OPF
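
As a rough sketch of what such a parser has to produce, the example below splits a segmented document into a base text plus character spans, one per segment. It assumes the Google Doc has been exported as plain text with one segment per line. `Span` and `parse_segmented_text` are hypothetical names, not the toolkit's API, and the real OPF layout may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Span:
    start: int  # inclusive character offset into the base text
    end: int    # exclusive character offset into the base text


def parse_segmented_text(doc_text: str) -> Tuple[str, List[Span]]:
    """Split a one-segment-per-line document into a base text and spans."""
    base_parts = []
    spans = []
    cursor = 0
    for line in doc_text.splitlines():
        segment = line.strip()
        if not segment:
            continue  # skip blank lines between segments
        spans.append(Span(cursor, cursor + len(segment)))
        base_parts.append(segment)
        cursor += len(segment) + 1  # +1 for the newline joining segments
    return "\n".join(base_parts), spans


base, spans = parse_segmented_text("segment one\n\nsegment two\n")
```

Slicing `base[s.start:s.end]` for each span recovers the original segment text, which is the property the later annotation transfer step relies on.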

Serializer:

Serialize the root text OPF and its translation OPFs into JSON
Serialize commentary OPFs into a separate JSON

Annotation Transfer Scripts:

Base Update Script: Updates the commentary-aligned root text so that it matches the translation-aligned root text.
Layer Update Script: Adjusts segment and span values to reflect the segmentation changes between the commentary-aligned root text and the translation-aligned root text.
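
One possible way to approach the span adjustment is to diff the two base texts and remap character offsets. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; it is an illustrative stand-in, not the toolkit's actual transfer logic, and `build_offset_map` and `transfer_span` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def build_offset_map(old_base: str, new_base: str) -> List[int]:
    """Map every boundary position in old_base (0..len) to a position in new_base."""
    mapping = [0] * (len(old_base) + 1)
    matcher = SequenceMatcher(None, old_base, new_base, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        for i in range(i1, i2):
            if tag == "equal":
                mapping[i] = j1 + (i - i1)  # unchanged text shifts by a fixed amount
            else:
                mapping[i] = j1  # replaced or deleted text collapses to the change start
    mapping[len(old_base)] = len(new_base)
    return mapping


def transfer_span(span: Tuple[int, int], mapping: List[int]) -> Tuple[int, int]:
    """Translate a (start, end) span from the old base text to the new one."""
    start, end = span
    return mapping[start], mapping[end]


old_base = "first segment second segment"
new_base = "title first segment second segment"
offsets = build_offset_map(old_base, new_base)
new_span = transfer_span((14, 28), offsets)
```

In this example the span covering "second segment" in the old base text moves six characters to the right, because "title " was added at the start of the new base text. Spans that fall inside heavily rewritten text would collapse or stretch, so a real script would likely need to flag those for manual review.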

tenzin3 assigned ta4tsering and tenzin3 Dec 3, 2024

OpenPecha deleted a comment from tenzin3 Dec 6, 2024

tenzin3 commented Dec 3, 2024 •

edited by ta4tsering
