Notebook llama #739

Merged: 56 commits, Oct 26, 2024

Commits
- 8b5fd85: init (init27, Oct 20, 2024)
- d05c3ec: chat-starter-pack (init27, Oct 20, 2024)
- e979f06: Create 1B-debating-script.py (init27, Oct 20, 2024)
- c9fffe8: Gradio-App (init27, Oct 20, 2024)
- 6ef9d09: Added notes (init27, Oct 20, 2024)
- 8f379c2: Test-1 (init27, Oct 21, 2024)
- f00e1cc: added notes (init27, Oct 21, 2024)
- 92821d3: Tested idea of re-writing (init27, Oct 21, 2024)
- 1ca1665: starting (init27, Oct 22, 2024)
- ac54e6c: Starter nb (init27, Oct 22, 2024)
- e79e925: update nb (init27, Oct 22, 2024)
- 13c4010: brute force (init27, Oct 22, 2024)
- d47c442: grid search (init27, Oct 22, 2024)
- 0bd41ea: Sweeps added (init27, Oct 22, 2024)
- 045bf70: Notes (init27, Oct 22, 2024)
- 1654f20: 1B-init (init27, Oct 22, 2024)
- 11a00e7: iterate (init27, Oct 22, 2024)
- 2c5ad4d: Step-1 Notebook (init27, Oct 22, 2024)
- 0c03327: Update Step-1 PDF Pre-Processing Logic.ipynb (init27, Oct 22, 2024)
- 83436a1: Update Step-1 PDF Pre-Processing Logic.ipynb (init27, Oct 22, 2024)
- a6dd5e7: Update Step-1 PDF Pre-Processing Logic.ipynb (init27, Oct 22, 2024)
- 75b5b39: Happy push (init27, Oct 22, 2024)
- 716a220: Create Step-2-Bark-Multiple-Speaker-Workflow.ipynb (init27, Oct 22, 2024)
- 8795999: Update README.md (init27, Oct 22, 2024)
- 05a178d: Update README.md (init27, Oct 22, 2024)
- 867d99e: Create Parler-Testing.ipynb (init27, Oct 22, 2024)
- 083b593: Add notes (init27, Oct 22, 2024)
- e002c0d: Fix conflicts (init27, Oct 23, 2024)
- eba932b: Update Prompt_testing.md (init27, Oct 23, 2024)
- d6b2f42: rename (init27, Oct 23, 2024)
- e6e08e0: changed to pipeline (init27, Oct 23, 2024)
- e84dc56: Polish out notebooks and worflow (init27, Oct 23, 2024)
- ca0221f: Semi-Final-runs (init27, Oct 23, 2024)
- 5ce0b09: final_runs (init27, Oct 23, 2024)
- ae04cd3: Update README.md (init27, Oct 23, 2024)
- 2a32851: update ReadMe (init27, Oct 23, 2024)
- 766184a: rm (init27, Oct 23, 2024)
- 38646b0: Update README.md (init27, Oct 24, 2024)
- 3d2da18: Add notes (init27, Oct 24, 2024)
- 5b55693: added some notes (init27, Oct 24, 2024)
- 96ea541: Added all notes (init27, Oct 24, 2024)
- 75cd0f4: Final nb1 (init27, Oct 24, 2024)
- 5d430e3: Notebook 2 finalise (init27, Oct 24, 2024)
- 73dc1c6: Notebook 3 finalise (init27, Oct 24, 2024)
- 46efd25: Nb-4 Fin (init27, Oct 24, 2024)
- ca19fd6: Fix spellings (init27, Oct 24, 2024)
- d12ba36: add example (init27, Oct 25, 2024)
- 20197fc: fix path (init27, Oct 25, 2024)
- ff91819: fix (init27, Oct 25, 2024)
- 2ad6caf: Update recipes/quickstart/NotebookLlama/README.md (init27, Oct 25, 2024)
- 584a101: Update recipes/quickstart/NotebookLlama/README.md (init27, Oct 25, 2024)
- 9057ba4: Update recipes/quickstart/NotebookLlama/README.md (init27, Oct 25, 2024)
- 572f44d: Address comments, add the file (init27, Oct 25, 2024)
- 95db0fe: Address PR Comments (init27, Oct 25, 2024)
- be79a2d: One more fix (init27, Oct 25, 2024)
- 62c1005: Add an image (init27, Oct 26, 2024)
93 changes: 93 additions & 0 deletions recipes/quickstart/NotebookLlama/README.md
@@ -0,0 +1,93 @@
## NotebookLlama: An Open Source version of NotebookLM

![NotebookLlama](./resources/Outline.jpg)

[Listen to audio from the example here](./resources/_podcast.mp3)

This is a guided series of tutorials/notebooks that can be used as a reference or course for building a PDF-to-podcast workflow.

You will also learn from our experiments with text-to-speech models.

It assumes zero knowledge of LLMs, prompting, and audio models; everything is covered in the respective notebooks.

### Outline:

Here is the step-by-step thought (pun intended) process for the task:

- Step 1: Pre-process PDF: Use `Llama-3.2-1B-Instruct` to pre-process the PDF and save it as a `.txt` file.
- Step 2: Transcript Writer: Use the `Llama-3.1-70B-Instruct` model to write a podcast transcript from the text.
- Step 3: Dramatic Re-Writer: Use the `Llama-3.1-8B-Instruct` model to make the transcript more dramatic.
- Step 4: Text-To-Speech Workflow: Use `parler-tts/parler-tts-mini-v1` and `suno/bark` to generate a conversational podcast.

Note 1: In Step 1, we prompt the 1B model not to modify or summarize the text, but strictly to clean up extra characters or garbage characters that might get picked up due to PDF encoding. Please see the prompt in Notebook 1 for more details.

Note 2: For Step 2, you can also use the `Llama-3.1-8B-Instruct` model; we recommend experimenting to see whether you notice any differences. The 70B model was used here because it gave slightly more creative podcast transcripts for the tested examples.
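
Conceptually, the four notebooks chain together as sketched below. The function names and the `paper.pdf` input are placeholders to illustrate the data flow, not code that ships with the recipe.

```python
# Conceptual data flow across the four notebooks (placeholder names only).

def preprocess_pdf(pdf_path: str) -> str:
    """Notebook 1 (Llama-3.2-1B-Instruct): extract and clean the PDF text."""
    ...

def write_transcript(clean_text: str) -> str:
    """Notebook 2 (Llama-3.1-70B-Instruct): write a two-speaker podcast transcript."""
    ...

def dramatize(transcript: str) -> list[tuple[str, str]]:
    """Notebook 3 (Llama-3.1-8B-Instruct): re-write with drama, as (speaker, line) pairs."""
    ...

def synthesize(dialogue: list[tuple[str, str]]) -> str:
    """Notebook 4 (parler-tts-mini-v1 + bark): render and stitch the audio."""
    ...

podcast = synthesize(dramatize(write_transcript(preprocess_pdf("paper.pdf"))))
```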

### Detailed steps on running the notebook:

Requirements: a GPU server or an API provider for using the 70B, 8B, and 1B Llama models.
For running the 70B model, you will need a GPU with aggregated memory of around 140 GB to run inference in bfloat16 precision (roughly 70 billion parameters × 2 bytes per bfloat16 weight, before activation and KV-cache overhead).

Note: For our GPU-poor friends, you can also use the 8B and lower models for the entire pipeline. There is no strong recommendation; the pipeline below is simply what worked best in the first few tests. You should try it out and see what works best for you!

- Before getting started, please make sure to log in using the `huggingface-cli` and then launch your Jupyter notebook server, so that you are able to download the Llama models.

You'll need your Hugging Face access token, which you can get from your Settings page [here](https://huggingface.co/settings/tokens). Then run `huggingface-cli login` and paste your access token to complete the login, so that the scripts can download Hugging Face models when needed.
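
If you prefer to authenticate from inside a notebook rather than the shell, the `huggingface_hub` library exposes the same login flow; a minimal sketch:

```python
# Programmatic alternative to `huggingface-cli login`: prompts for your
# Hugging Face access token and caches it locally for later model downloads.
from huggingface_hub import login

login()
```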

- First, please install the requirements from `requirements.txt` by running the following commands:

```
git clone https://github.com/meta-llama/llama-recipes
cd llama-recipes/recipes/quickstart/NotebookLlama/
pip install -r requirements.txt
```

- Notebook 1:

This notebook processes the PDF with the new featherlight model and saves the result into a `.txt` file.

Update the first cell with the link to the PDF you would like to use. It can be any link, but please remember to update the first cell of the notebook with the right one.

Please try changing the prompts for the `Llama-3.2-1B-Instruct` model and see if you can improve the results.
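
To make the idea concrete, here is a minimal sketch of what Step 1 does, assuming a local `paper.pdf`; the file names, chunking, and system prompt are simplified placeholders, and the actual notebook uses a more detailed prompt and chunking logic.

```python
# Minimal sketch of Step 1: extract raw text from a PDF and ask the 1B model
# to clean it up without summarizing. File names, chunking, and the system
# prompt are simplified placeholders.
import torch
from PyPDF2 import PdfReader
from transformers import pipeline

reader = PdfReader("paper.pdf")
raw_text = "\n".join(page.extract_text() or "" for page in reader.pages)

cleaner = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

SYS_PROMPT = (
    "You are a text cleaner. Remove garbage characters, broken line breaks and "
    "encoding artifacts from the user's text. Do NOT summarize or rewrite it."
)

messages = [
    {"role": "system", "content": SYS_PROMPT},
    {"role": "user", "content": raw_text[:4000]},  # a single chunk, for brevity
]
cleaned = cleaner(messages, max_new_tokens=2048)[0]["generated_text"][-1]["content"]

with open("clean_extracted_text.txt", "w", encoding="utf-8") as f:
    f.write(cleaned)
```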

- Notebook 2:

This notebook takes the processed output from Notebook 1 and creatively converts it into a podcast transcript using the `Llama-3.1-70B-Instruct` model. If you are GPU-rich, please feel free to test with the 405B model!

Please try experimenting with the system prompt for the model to see if you can improve the results, and also try the 8B model here to see if there is a big difference!
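
A minimal sketch of the Step 2 call, reusing the placeholder file names from the previous sketch; the system prompt shown is illustrative, not the one shipped in the notebook.

```python
# Minimal sketch of Step 2: turn the cleaned text into a two-speaker podcast
# transcript with the 70B model (swap in the 8B model if memory is limited).
import torch
from transformers import pipeline

writer = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-70B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

SYS_PROMPT = (
    "You are a world-class podcast writer. Rewrite the provided text as an "
    "engaging dialogue between Speaker 1 (the expert) and Speaker 2 (a curious "
    "co-host), staying faithful to the source material."
)

with open("clean_extracted_text.txt", encoding="utf-8") as f:
    source_text = f.read()

messages = [
    {"role": "system", "content": SYS_PROMPT},
    {"role": "user", "content": source_text},
]
transcript = writer(messages, max_new_tokens=4096, do_sample=True)[0]["generated_text"][-1]["content"]

with open("podcast_transcript.txt", "w", encoding="utf-8") as f:
    f.write(transcript)
```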

- Notebook 3:

This notebook takes the transcript from the previous step and prompts `Llama-3.1-8B-Instruct` to add more dramatization and interruptions to the conversation.

There is also a key factor here: we return the conversation as a tuple, which makes our lives easier later. Yes, studying Data Structures 101 was actually useful for once!

For our TTS logic, we use two different models that behave differently with certain prompts, so we prompt the model to add specifics for each speaker accordingly.

Please again try changing the system prompt and see if you can improve the results. We encourage testing the featherlight 3B and 1B models at this stage as well.
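
A minimal sketch of the Step 3 idea, under the assumption that the model is asked to emit a Python-literal list of `(speaker, line)` tuples; the exact prompt and parsing in the notebook may differ.

```python
# Minimal sketch of Step 3: re-write the transcript with more drama and
# interruptions, and ask for a Python-literal list of (speaker, line) tuples
# so the TTS step can iterate over it. Prompt and parsing are simplified.
import ast
import torch
from transformers import pipeline

rewriter = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

SYS_PROMPT = (
    "Re-write this podcast transcript to be more dramatic, with natural "
    "interruptions and filler words. Return ONLY a Python list of tuples, "
    'e.g. [("Speaker 1", "..."), ("Speaker 2", "...")].'
)

with open("podcast_transcript.txt", encoding="utf-8") as f:
    transcript = f.read()

messages = [
    {"role": "system", "content": SYS_PROMPT},
    {"role": "user", "content": transcript},
]
raw_output = rewriter(messages, max_new_tokens=4096)[0]["generated_text"][-1]["content"]

# Parse the model output back into Python objects; real output may need some
# cleanup before literal_eval succeeds.
dialogue = ast.literal_eval(raw_output)
print(dialogue[:2])
```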

- Notebook 4:

Finally, we take the results from the last notebook and convert them into a podcast. We use the `parler-tts/parler-tts-mini-v1` and `suno/bark` models for the conversation.

The speakers and the prompt for the Parler model were decided based on experimentation and suggestions from the model authors. Please keep experimenting; you can find more details in the resources section.
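
As a rough sketch of how the two TTS models can be driven for one line each, with Parler-TTS steered by a natural-language voice description and Bark by a voice preset; the voice description, preset, example lines, and output file names are illustrative assumptions, not the choices made in the notebook.

```python
# Sketch of the Step 4 idea: render one line per speaker, Parler-TTS for
# Speaker 1 and Bark for Speaker 2. Voice choices and file names are
# illustrative placeholders.
import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoProcessor, AutoTokenizer, BarkModel

# Speaker 1: Parler-TTS mini, steered by a free-text voice description.
parler = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-mini-v1")
parler_tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-v1")
description = "A clear, expressive female voice speaking at a moderate pace."  # assumption
desc_ids = parler_tokenizer(description, return_tensors="pt").input_ids
text_ids = parler_tokenizer("Welcome to the show!", return_tensors="pt").input_ids
audio_1 = parler.generate(input_ids=desc_ids, prompt_input_ids=text_ids).cpu().numpy().squeeze()
sf.write("speaker1_line.wav", audio_1, parler.config.sampling_rate)

# Speaker 2: Bark, steered by a voice preset.
bark_processor = AutoProcessor.from_pretrained("suno/bark")
bark = BarkModel.from_pretrained("suno/bark")
bark_inputs = bark_processor("Umm, thanks for having me!", voice_preset="v2/en_speaker_6")
audio_2 = bark.generate(**bark_inputs).cpu().numpy().squeeze()
sf.write("speaker2_line.wav", audio_2, bark.generation_config.sample_rate)

# The notebook then loops over the (speaker, line) tuples from Step 3 and
# stitches the per-line clips into a single podcast audio file.
```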


#### Note: Right now there is one known issue: Parler requires `transformers` 4.43.3 or earlier, while steps 1 to 3 of the pipeline need the latest version, so we simply switch `transformers` versions in the last notebook.

### Next improvements/further ideas:

- Speech model experimentation: the TTS model is the main limit on how natural the result will sound. This can probably be improved with a better pipeline and with the help of someone more knowledgeable; PRs are welcome! :)
- LLM vs. LLM debate: another approach to writing the podcast would be to have two agents debate the topic of interest and write the podcast outline. Right now we use a single LLM (70B) to write the podcast outline.
- Testing 405B for writing the transcripts
- Better prompting
- Support for ingesting websites, audio files, YouTube links, and more. Again, we welcome community PRs!

### Resources for further learning:

- https://betterprogramming.pub/text-to-audio-generation-with-bark-clearly-explained-4ee300a3713a
- https://colab.research.google.com/drive/1dWWkZzvu7L9Bunq9zvD-W02RFUXoW-Pd?usp=sharing
- https://colab.research.google.com/drive/1eJfA2XUa-mXwdMy7DoYKVYHI1iTd9Vkt?usp=sharing#scrollTo=NyYQ--3YksJY
- https://replicate.com/suno-ai/bark?prediction=zh8j6yddxxrge0cjp9asgzd534
- https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c
