Meeting Note #4 21.03.2019
ugurcanarikan edited this page Mar 21, 2019
Location: Bogazici University Computer Engineering Building
Date/Time: 21.03.2019 / 12:00
- Suzan Üsküdarlı
- Onur Güngör
- Uğurcan Arıkan
- 1.1. Split the corpus
- 1.2. Start creating pretraining data on one of the chunks
- 2.1. Pretraining BERT after the Turkish vocabulary has been created
- 3.1. The memory issue during BERT pretraining caused by the size of the corpus has been discussed
- 3.2. Creating the pretraining data and pretraining BERT have been discussed
- 3.3. The current status of the project has been discussed
- 3.4. Memory issues in the pretraining data creation process have been discussed
- 4.1. BERT pretraining
- 4.1.1. Due to its massive size, the corpus will be split into 50 smaller chunks before pretraining
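The split in 4.1.1 (pieces of 1.5 million lines, per action item 5.1) can be done without ever loading the whole corpus into memory. Below is a minimal Python sketch, assuming the corpus is a single UTF-8 text file. The function name `split_corpus`, its parameters and the `chunk_000.txt` naming scheme are hypothetical. The file is streamed line by line, so memory use stays flat regardless of corpus size.

```python
import os
from typing import List


def split_corpus(input_path: str, output_dir: str,
                 lines_per_chunk: int = 1_500_000,
                 prefix: str = "chunk") -> List[str]:
    """Stream `input_path` into files of at most `lines_per_chunk` lines.

    Only the current line is held in memory, so the corpus size does not
    affect memory use. Returns the chunk file paths in order.
    """
    if lines_per_chunk <= 0:
        raise ValueError("lines_per_chunk must be positive")
    os.makedirs(output_dir, exist_ok=True)
    paths: List[str] = []
    out = None
    count = 0
    try:
        with open(input_path, encoding="utf-8") as src:
            for line in src:
                # Open a new chunk when none is open yet or the current one is full.
                if out is None or count == lines_per_chunk:
                    if out is not None:
                        out.close()
                    path = os.path.join(output_dir, f"{prefix}_{len(paths):03d}.txt")
                    out = open(path, "w", encoding="utf-8")
                    paths.append(path)
                    count = 0
                out.write(line)
                count += 1
    finally:
        if out is not None:
            out.close()
    return paths
```

For example, `split_corpus("corpus.txt", "chunks")` (hypothetical paths) would write `chunks/chunk_000.txt`, `chunks/chunk_001.txt`, and so on. Cutting at exact line counts can place one document in two chunks. BERT's reference input format separates documents with blank lines, so a refinement could close each chunk only at the next blank line.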
- 5.1. Split the corpus into smaller pieces of 1.5 million lines each
Deadline: 28.03 12:00 Assignee: Uğurcan Arıkan
- 5.2. Create pretraining data for BERT
Deadline: 28.03 12:00 Assignee: Uğurcan Arıkan
- 5.3. Run BERT pretraining on the created pretraining data
Deadline: 28.03 12:00 Assignee: Uğurcan Arıkan
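Action items 5.2 and 5.3 tie back to the memory issues in 3.1 and 3.4. This sketch assumes the project uses the reference google-research/bert scripts. That repository's README notes that `create_pretraining_data.py` keeps all examples for an input file in memory, and recommends sharding large inputs and calling the script once per shard. The function names, directory layout and hyperparameter values below are assumptions. The values mirror the README's example invocation.

`--do_lower_case=False` is also an assumption. With lowercasing enabled, BERT's basic tokenizer strips accents, which would remove diacritics from Turkish characters such as ç, ğ and ş.

```python
import glob
import os
import subprocess
import sys
from typing import List


def pretraining_data_commands(chunk_dir: str, output_dir: str, vocab_file: str,
                              script: str = "create_pretraining_data.py",
                              max_seq_length: int = 128,
                              max_predictions_per_seq: int = 20) -> List[List[str]]:
    """Build one create_pretraining_data.py invocation per chunk file (hypothetical layout)."""
    commands = []
    for chunk in sorted(glob.glob(os.path.join(chunk_dir, "*.txt"))):
        name = os.path.splitext(os.path.basename(chunk))[0]
        commands.append([
            sys.executable, script,
            f"--input_file={chunk}",
            f"--output_file={os.path.join(output_dir, name + '.tfrecord')}",
            f"--vocab_file={vocab_file}",
            "--do_lower_case=False",  # assumption: keep Turkish casing and diacritics
            f"--max_seq_length={max_seq_length}",
            f"--max_predictions_per_seq={max_predictions_per_seq}",
            "--masked_lm_prob=0.15",
            "--random_seed=12345",
            "--dupe_factor=5",
        ])
    return commands


def run_all(commands: List[List[str]]) -> None:
    # Run chunks sequentially: each process exits (freeing its memory)
    # before the next one starts, so peak memory is bounded by one chunk.
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

The resulting shards can then be passed to `run_pretraining.py` as a file glob (for example `--input_file=out/chunk_*.tfrecord`), which is how the README suggests consuming sharded pretraining data.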