Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Browser use with Llama 3.2 Vision Quickstart #799

Merged
merged 7 commits into from
Dec 6, 2024

Conversation

miguelg719
Copy link
Contributor

@miguelg719 miguelg719 commented Nov 21, 2024

Browser Use Llama-Recipe

This is an example notebook on how to create a Llama 3.2 vision-powered agent that can interact with web browsers on your behalf. It includes a detailed explanation of every section and example use cases.

Features

  • Visual understanding of web pages through screenshots
  • Autonomous navigation and interaction
  • Natural language instructions for web tasks
  • Persistent browser session management

For example, you can ask the agent to:

  • Search for a product on Amazon
  • Find the cheapest flight to Tokyo
  • Buy tickets for the next Warriors game

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Thanks for contributing 🎉!

Copy link
Contributor

@HamidShojanazeri HamidShojanazeri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome! thanks @miguelg719 for the PR! it would be great if you would like to add a short video demoing it as well.

"outputs": [],
"source": [
"few_shot_example_1 = \"\"\"\n",
"User Input: \"How much did Nvidia stock gain today?\"\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we please change the query to a more neutral example.

"source": [
"import base64\n",
"from IPython.display import Markdown\n",
"imagePath= \"screenshot.png\"\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to add this screenshot in this folder as well? right now its been missing

@miguelg719
Copy link
Contributor Author

Demo video added! @HamidShojanazeri

@HamidShojanazeri
Copy link
Contributor

Thanks @miguelg719 great PR!

@HamidShojanazeri HamidShojanazeri merged commit 03c61ae into meta-llama:main Dec 6, 2024
4 checks passed
@aidando73
Copy link
Contributor

That's cool

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants