
Meeting Note #4 21.03.2019

ugurcanarikan edited this page Mar 21, 2019 · 2 revisions

Location: Bogazici University Computer Engineering Building

Date/Time: 21.03.2019 / 12:00

Attendees:

  • Suzan Üsküdarlı
  • Onur Güngör
  • Uğurcan Arıkan

1. Preparation Before Meeting

  • 1.1. Split the corpus
  • 1.2. Start creating pretraining data on one of the chunks

2. Agenda

  • 2.1. Pretraining BERT after the Turkish vocabulary has been created

3. Discussion

  • 3.1. Memory issues during BERT pretraining caused by the size of the corpus were discussed
  • 3.2. The steps for creating pretraining data and pretraining BERT were discussed
  • 3.3. The current status of the project was discussed
  • 3.4. Memory issues in the pretraining data creation process were discussed

4. Outcomes

  • 4.1. BERT pretraining
    • 4.1.1. Due to its large size, the corpus will be split into 50 smaller chunks before pretraining

5. TO-DO list

Deadline: 28.03 12:00 Assignee: Uğurcan Arıkan

  • 5.1. Split the corpus into smaller chunks of 1.5 million lines each
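A minimal sketch of task 5.1, assuming the corpus is a single UTF-8 text file with one sentence per line (the input format BERT's create_pretraining_data.py expects). The function name `split_corpus` and the `chunk_NNN.txt` naming scheme are hypothetical. The file is streamed line by line, so memory use does not grow with the corpus size.

```python
from pathlib import Path

# Chunk size agreed on in the meeting (task 5.1).
LINES_PER_CHUNK = 1_500_000


def split_corpus(corpus_path, out_dir, lines_per_chunk=LINES_PER_CHUNK):
    """Write consecutive chunks of `lines_per_chunk` lines to out_dir.

    Hypothetical helper: streams the input so only one line is held in
    memory at a time. Returns the list of chunk paths in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    dst = None
    try:
        with open(corpus_path, encoding="utf-8") as src:
            for line_no, line in enumerate(src):
                # Start a new chunk file every `lines_per_chunk` lines.
                if line_no % lines_per_chunk == 0:
                    if dst is not None:
                        dst.close()
                    chunk_path = out_dir / f"chunk_{len(chunk_paths):03d}.txt"
                    dst = open(chunk_path, "w", encoding="utf-8")
                    chunk_paths.append(chunk_path)
                dst.write(line)
    finally:
        if dst is not None:
            dst.close()
    return chunk_paths
```

Because chunks are cut at fixed line counts, a boundary can fall inside a document (documents are separated by blank lines in BERT's input format). This only loses the sentence-pair context across that one boundary.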

Deadline: 28.03 12:00 Assignee: Uğurcan Arıkan

  • 5.2. Create pretraining data for BERT
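A sketch of task 5.2, assuming the project uses `create_pretraining_data.py` from the google-research/bert repository. It builds one command per chunk, so each process only loads a single chunk into memory. The helper name, output paths and `dupe_factor` value are illustrative placeholders. The other hyperparameters mirror that repository's README example. `do_lower_case` defaults to `False` here only as an assumption; it must match how the Turkish vocabulary was built.

```python
from pathlib import Path


def pretraining_data_commands(chunk_paths, out_dir, vocab_file,
                              do_lower_case=False, max_seq_length=128,
                              max_predictions_per_seq=20, dupe_factor=5):
    """Return one create_pretraining_data.py command (argv list) per chunk.

    Hypothetical helper: running the script once per chunk bounds memory
    use by the chunk size rather than the whole corpus.
    """
    commands = []
    for chunk in map(Path, chunk_paths):
        output_file = Path(out_dir) / f"{chunk.stem}.tfrecord"
        commands.append([
            "python", "create_pretraining_data.py",
            f"--input_file={chunk}",
            f"--output_file={output_file}",
            f"--vocab_file={vocab_file}",
            # Must agree with the casing used to build the Turkish vocabulary.
            f"--do_lower_case={do_lower_case}",
            f"--max_seq_length={max_seq_length}",
            f"--max_predictions_per_seq={max_predictions_per_seq}",
            "--masked_lm_prob=0.15",
            "--random_seed=12345",
            f"--dupe_factor={dupe_factor}",
        ])
    return commands
```

Each command can then be executed with `subprocess.run(cmd, check=True)`, sequentially or a few at a time, depending on available memory.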

Deadline: 28.03 12:00 Assignee: Uğurcan Arıkan

  • 5.3. Pretrain BERT on the generated pretraining data
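A sketch of task 5.3, again assuming google-research/bert's `run_pretraining.py`, whose `--input_file` flag accepts a glob or a comma-separated list of TFRecord files. A single glob therefore lets one training run read every chunk. The helper name, paths, batch size, learning rate and step counts are hypothetical placeholders. `--init_checkpoint` is omitted on the assumption that the model is trained from scratch with the new Turkish vocabulary.

```python
from pathlib import Path


def pretraining_command(tfrecord_dir, output_dir, bert_config_file,
                        num_train_steps, num_warmup_steps,
                        train_batch_size=32, learning_rate=1e-4,
                        max_seq_length=128, max_predictions_per_seq=20):
    """Build one run_pretraining.py command (argv list) covering all chunks."""
    # Glob matching every per-chunk TFRecord produced in task 5.2.
    input_pattern = str(Path(tfrecord_dir) / "chunk_*.tfrecord")
    return [
        "python", "run_pretraining.py",
        f"--input_file={input_pattern}",
        f"--output_dir={output_dir}",
        "--do_train=True",
        "--do_eval=True",
        f"--bert_config_file={bert_config_file}",
        f"--train_batch_size={train_batch_size}",
        # Must equal the values used when creating the pretraining data.
        f"--max_seq_length={max_seq_length}",
        f"--max_predictions_per_seq={max_predictions_per_seq}",
        f"--num_train_steps={num_train_steps}",
        f"--num_warmup_steps={num_warmup_steps}",
        f"--learning_rate={learning_rate}",
    ]
```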