Initially created for the PhD Seminar on AI-Assisted Software Engineering workshop, Track 3:
- Fine-tuning a transformer model on a text corpus (Day 1)
- Developing a custom transformer architecture (Day 2)
The `main` branch contains all the code necessary to run the notebooks (i.e. the correct answers for the students).
The `google-colab` branch is intended for students and requires manually coding some parts of the notebooks. To use the notebooks in Google Colab:
- Go to https://colab.research.google.com/
- Open a new notebook and select the GitHub tab
- For username, enter `karmus89`
- For repository, select `transformers-seminar-workshop`
- For branch, select `google-colab`
- Remember to change the runtime to GPU for training
- The `google-colab` branch is kept up to date with respect to the `main` branch, though some of the code cells have been omitted
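- Alternatively, a notebook can usually be opened directly via a URL of the form `https://colab.research.google.com/github/karmus89/transformers-seminar-workshop/blob/google-colab/<notebook>.ipynb`, where `<notebook>.ipynb` is a placeholder for the notebook's file path in the repository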
The repository comes bundled with an already fine-tuned BERT for the data, so that everyone can follow along even if they don't have sufficient resources to perform the fine-tuning a) themselves or b) in a timely manner.
To get the fine-tuned models:
- Download the already trained models used in the notebooks from their corresponding Hugging Face repositories and persist them in the `model` folder:
  - Fine-tuned MLM BERT: `git clone https://huggingface.co/karmus89/bert-base-uncased-finetuned`
  - Fine-tuned MLM BERT with classification head: `git clone https://huggingface.co/karmus89/classifier-fine`
  - Pre-trained MLM BERT with classification head: `git clone https://huggingface.co/karmus89/classifier-pre`
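Once the environment described below is set up, a downloaded model can be sanity-checked with a quick load-and-predict, for example as follows (a minimal sketch; the path assumes the fine-tuned MLM BERT was cloned into `model/bert-base-uncased-finetuned`):

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Assumed local path: the Hugging Face repo cloned into the model/ folder
model_path = "model/bert-base-uncased-finetuned"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForMaskedLM.from_pretrained(model_path)

# Fill in a masked token as a quick sanity check
inputs = tokenizer("The workshop covers [MASK] models.", return_tensors="pt")
logits = model(**inputs).logits

mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
predicted_id = logits[0, mask_index].argmax(-1).item()
print(tokenizer.decode([predicted_id]))
```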
To run the notebooks locally:
- Install Miniconda
- Create a `conda` environment: `conda env create -f environment.yml`
- Install PyTorch (prefer `pip` over `conda`): `pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118`
- Use the correct `conda` environment with the notebooks.
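After installation, the PyTorch and GPU setup can be verified from within the activated environment, for example:

```python
import torch

# Expect a CUDA build of PyTorch (e.g. a version ending in +cu118)
print(torch.__version__)
# True if a GPU is visible to PyTorch, False otherwise
print(torch.cuda.is_available())
```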
The custom BERT of workshop 2 does not incorporate MLM pre-training, but is essentially a transformer-based classifier:
- For learning about transformers, this suffices
- For learning about BERT and how fine-tunable, transfer-learnable models work, this falls a bit short
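For reference, below is a minimal, illustrative sketch of such a transformer-based classifier in PyTorch. It is not the exact architecture built in the workshop notebook; the names and layer sizes here are arbitrary choices:

```python
import torch
import torch.nn as nn

class TransformerClassifier(nn.Module):
    """Transformer encoder with a classification head and no MLM pre-training."""

    def __init__(self, vocab_size, num_classes, d_model=256, nhead=4, num_layers=2, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, input_ids):
        # Token embeddings plus learned positional embeddings
        positions = torch.arange(input_ids.size(1), device=input_ids.device)
        x = self.tok_emb(input_ids) + self.pos_emb(positions)
        x = self.encoder(x)
        # Classify from the first token's representation ([CLS]-style)
        return self.classifier(x[:, 0])

# Example usage: random token ids for a batch of 2 sequences of length 16
model = TransformerClassifier(vocab_size=30522, num_classes=2)
logits = model(torch.randint(0, 30522, (2, 16)))  # shape: (2, 2)
```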
Attention Is All You Need (2017)
- arXiv
- The Annotated Transformer: a notebook version of the paper with code
- Stanford CS224N: NLP with Deep Learning | Winter 2019 | Lecture 14 – Transformers and Self-Attention
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018)
HuggingFace Course
Jay Alammar's blog posts about core concepts
- The Illustrated Transformer
- The Illustrated BERT
- The Illustrated GPT-2 (Visualizing Transformer Language Models)
- Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)
Udemy course "Natural Language Processing: NLP With Transformers in Python"
Peter Bloem's blog post "Transformers from scratch"
neptune.ai's blog post on creating BERT in PyTorch
Better introductions to the notebooks regarding their structure
- Add general image descriptions for the steps that are taken