gpt4o notebook over tesla impact docs (#180)
Co-authored-by: Logan Markewich <[email protected]>
1 parent 4572f00, commit 0d2ad9f
Showing 2 changed files with 327 additions and 0 deletions.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,327 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "f20600ce-d57a-446e-b033-3aadeec39c1b", | ||
"metadata": {}, | ||
"source": [ | ||
"# LlamaParse with GPT-4o\n", | ||
"\n", | ||
"\n", | ||
"<a href=\"https://colab.research.google.com/github/run-llama/llama_parse/blob/main/examples/demo_gpt4o/gpt4o_tesla_impact_report.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n", | ||
"\n", | ||
"GPT-4o is a [fully multimodal model by OpenAI](https://openai.com/index/hello-gpt-4o/) released in May 2024. It matches GPT-4 Turbo performance in text and code, and has significantly improved vision and audio capabilities.\n", | ||
"\n", | ||
"The expanded vision/audio capabilities mean that it can be used for document parsing, by treating each page as an image and performing document extraction. We support using GPT-4o natively in LlamaParse for document parsing. The notebook below walks you through an example of using GPT-4o over the Tesla impact report.\n", | ||
"\n", | ||
"**NOTE**: The pricing for LlamaParse + gpt4o is an order more expensive than using LlamaParse by default. Currently, every page parsed with gpt4o counts for 200 pages in the LlamaParse usage tracker.\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "86b173ac-9fce-4813-bdf1-6dd7d93a491d", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import nest_asyncio\n", | ||
"\n", | ||
"nest_asyncio.apply()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "ecc5eba5-96ce-4db7-bba1-f9ece33e681c", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import os" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "b805592b-d1a5-4cd2-b916-348f66ca7941", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"os.environ[\"LLAMA_CLOUD_API_KEY\"] = \"<LLAMA_CLOUD_API_KEY>" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "6e73e3c4-9e09-4cba-805f-326c82be812d", | ||
"metadata": {}, | ||
"source": [ | ||
"### Use LlamaParse with `gpt4o_mode=True`\n", | ||
"\n", | ||
"By turning on gpt4o, we use GPT-4o multimodal capabilities to do document parsing per page instead of the LlamaParse default pipeline.\n", | ||
"\n", | ||
"We load a snippet of the [2019 Tesla impact report](https://www.tesla.com/ns_videos/2019-tesla-impact-report.pdf). To help you save tokens, we only load 4 pages of this report (which will count as 800 pages of LlamaParse pages). \n", | ||
"\n", | ||
"You can optionally choose to provide a `gpt4o_api_key`. If you do this, then we will use your API key to make GPT-4o calls, and your LlamaParse usage will be counted as if `gpt4o_mode` was not turned on (each page will be counted as a page instead of 200 pages). " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "f46991c1-031b-461f-b9a6-9237a821f4c8", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from llama_parse import LlamaParse\n", | ||
"\n", | ||
"parser_gpt4o = LlamaParse(\n", | ||
" result_type=\"markdown\",\n", | ||
" # base_url=\"https://api.staging.llamaindex.ai\",\n", | ||
" api_key=api_key,\n", | ||
" gpt4o_mode=True,\n", | ||
" # gpt4o_api_key=\"<gpt4o_api_key>\"\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "1136ba82-074b-489d-9b0a-d609ccbf02b6", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"documents_gpt4o = parser_gpt4o.load_data(\"./2019-tesla-impact-report-short.pdf\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "9e65c54f-3e4c-4c78-b1e8-a55ebeba1f24", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"# Impact Report\n", | ||
"## 2019\n", | ||
"\n", | ||
"![Tesla Logo](https://www.tesla.com/sites/default/files/blog_images/tesla_announcement_social.jpg)\n", | ||
"\n", | ||
"![Earth and Car in Space](https://www.tesla.com/sites/default/files/blog_images/tesla_announcement_social.jpg)\n", | ||
"---\n", | ||
"# Introduction 03\n", | ||
"\n", | ||
"# Mission and Tesla Ecosystem 04\n", | ||
"\n", | ||
"# Environmental Impact 06\n", | ||
"- Lifecycle Analysis of Tesla Vehicles versus Average ICE\n", | ||
"- Battery Recycling\n", | ||
"- NOx, Particulates and Other Pollutants\n", | ||
"- Water Used per Vehicle Manufactured\n", | ||
"- Emissions Credits\n", | ||
"- Net Energy Impact of Our Products\n", | ||
"\n", | ||
"# Product Impact 20\n", | ||
"- Price Equivalency\n", | ||
"- Primary Driver\n", | ||
"- Long Distance Travel\n", | ||
"- Active Safety\n", | ||
"- Passive Safety\n", | ||
"- Tesla Safety Awards\n", | ||
"- Fire Safety\n", | ||
"- Cyber Security\n", | ||
"- Disaster Relief\n", | ||
"- Resilience of the Grid\n", | ||
"- Megapack\n", | ||
"- Solar Roof\n", | ||
"\n", | ||
"# Supply Chain 33\n", | ||
"- Responsible Material Sourcing\n", | ||
"- Cobalt Sourcing\n", | ||
"\n", | ||
"# People and Culture 37\n", | ||
"- Our Environmental, Health, and Safety Strategy\n", | ||
"- Safety Improvements\n", | ||
"- Case Study: Ergonomics and Model Y Design\n", | ||
"- Rewarding the Individual\n", | ||
"- Culture of Diversity and Inclusion\n", | ||
"- Workforce Development\n", | ||
"- Community Engagement\n", | ||
"- Employee Mobility and Transportation Programs\n", | ||
"- Corporate Governance\n", | ||
"\n", | ||
"# Appendix 52\n", | ||
"---\n", | ||
"# Introduction\n", | ||
"\n", | ||
"The very purpose of Tesla’s existence is to accelerate the world’s transition to sustainable energy. In furtherance of this mission, we are excited to publish our second annual Impact Report. Transparency and disclosure are important for our customers, employees, and shareholders, which is why we have expanded the Impact Report’s content this year.\n", | ||
"\n", | ||
"While many environmental reports focus on emissions generated by the manufacturing phase of products and future goals for energy consumption, we highlight the totality of the environmental impact of our products today. After all, the vast majority of emissions generated by vehicles today occur in the product-use phase—that is, when consumers are driving their vehicles. We believe that providing information on both sides of the manufacturing and consumer-use equation provides a clearer picture of the environmental impact of Tesla products, and we have done so this year largely through a lifecycle analysis detailed in this report.\n", | ||
"\n", | ||
"Tesla aims to continue to increase the proportion of renewable energy usage at our factories in an effort to minimize the carbon footprint for every mile traveled by our products and their components in our supply chain. All of the factories that we built from the ground-up, such as Gigafactory Nevada and Gigafactory Shanghai, and our forthcoming Gigafactories in Berlin and North America, are designed from the beginning to use energy from renewable sources.\n", | ||
"\n", | ||
"Making a significant and lasting impact on environmental sustainability is difficult to achieve without securing financial sustainability for the long term. We generated positive Free Cash Flow (operating cash flow less capex) of more than $1 billion for the first time in 2019. We believe the notion that a sustainable future is not economically feasible is no longer valid.\n", | ||
"---\n", | ||
"# Mission & Tesla Ecosystem\n", | ||
"\n", | ||
"Climate change is reaching alarming levels in large part due to emissions from burning fossil fuels for transportation and electricity generation. In 2016, carbon dioxide (CO2) concentration levels in the atmosphere exceeded the 400 parts per million threshold on a sustained basis - a level that climate scientists believe will have a catastrophic impact on the environment. Worse, annual global CO2 emissions continue to increase and have approximately doubled over the past 50 years to over 43 gigatons in 2019. The world’s current path is unwise and unsustainable.\n", | ||
"\n", | ||
"The world cannot reduce CO2 emissions without addressing both energy generation and consumption. And the world cannot address its energy habits without first directly reducing emissions in the transportation and energy sectors. We are focused on creating a complete energy and transportation ecosystem from solar generation and energy storage to all-electric vehicles that produce zero tailpipe emissions.\n", | ||
"\n", | ||
"Since the onset of shelter-in-place orders and travel restrictions due to COVID-19, we have seen dramatic increases in air quality across the planet, as well as projections for CO2 emissions to drop in excess of 4% in 2020 compared to pre-COVID-19 levels, according to researchers. Because these improvements in air quality and reductions in CO2 are a result of a global economic disruption and not due to systemic changes in how we produce and consume energy, they are not expected to be sustained absent intervention. However, these changes have shown us the positive impacts of reduced pollution in a very short period of time. At Tesla, we believe that we all have an unprecedented opportunity to learn from this disruption and accelerate the deployment of clean energy solutions as part of a recovery for all economies throughout the world, and we will actively continue to advocate for the realization of these long-term changes.\n", | ||
"\n", | ||
"## Global Greenhouse Gas (GHG) Emissions by Economic Sector\n", | ||
"\n", | ||
"| Sector | Percentage |\n", | ||
"|---------------------------------------------|------------|\n", | ||
"| Electricity & Heat Production* | 31% |\n", | ||
"| Agriculture, Forestry & Other Land Use | 20% |\n", | ||
"| Industry | 18% |\n", | ||
"| Transportation* | 16% |\n", | ||
"| Other Energy | 9% |\n", | ||
"| Buildings | 6% |\n", | ||
"\n", | ||
"*Tesla-related sectors. Source: World Resources Institute\n", | ||
"\n", | ||
"According to the Global Carbon project, when fully tallied, total carbon emissions from 2019 are expected to hit another record high of over 43 gigatons for the year. Energy use through electricity and heat production (31%) and transportation (16%) are significant drivers of these GHG emissions.\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"print(documents_gpt4o[0].get_content())" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "d62cbb62-37ea-4370-9411-d979aa3a627e", | ||
"metadata": {}, | ||
"source": [ | ||
"## Build RAG pipeline over the Parsed Report\n", | ||
"\n", | ||
"We now try building a RAG pipeline over this parsed report. It's not a lot of text, but we split it into chunks and load it into a simple in-memory vector store.\n", | ||
"\n", | ||
"We ask a question over the parsed markdown table and get back the right answer! We also ask a question over the text." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "f624e243-1878-4d87-841d-69e57360a7d9", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from llama_index.core import VectorStoreIndex\n", | ||
"from llama_index.core.node_parser import SentenceSplitter\n", | ||
"\n", | ||
"\n", | ||
"splitter = SentenceSplitter()\n", | ||
"nodes = splitter.get_nodes_from_documents(documents_gpt4o)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "d8b7c3ad-2147-448c-bcbe-3e6fcd8d5361", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"vector_index = VectorStoreIndex(nodes)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "8013351a-180d-4947-9f81-513042175c19", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"query_engine = vector_index.as_query_engine()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "795dc5c4-e122-4ff3-94d2-747fa51d5add", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"response = query_engine.query(\n", | ||
" \"What are the greenhouse emissions for agriculture and transportation?\"\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "39d2e6bd-3316-49b5-9a5d-5b4b95343e5a", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Agriculture accounts for 20% of global greenhouse gas emissions, while transportation contributes 16% to these emissions.\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"print(str(response))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "9beb5cd4-4041-48c7-b22b-de5540f92a6d", | ||
"metadata": {}, | ||
"source": [ | ||
"Let's also try asking a question over another piece of the text." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "543c8b63-5cd1-47a1-a8a1-81abbfd3e52b", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"response = query_engine.query(\"What does Tesla aim to do?\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "e739eabf-732b-4f59-9628-972c4bf6c857", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Tesla aims to accelerate the world's transition to sustainable energy by creating a complete energy and transportation ecosystem that includes solar generation, energy storage, and all-electric vehicles with zero tailpipe emissions.\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"print(str(response))" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "llama_parse", | ||
"language": "python", | ||
"name": "llama_parse" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
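
The usage accounting described in the notebook's NOTE and in the `gpt4o_api_key` paragraph can be sketched as a small calculation. This is only an illustration of the arithmetic stated in the notebook (200 counted pages per parsed page in gpt4o mode, one counted page per parsed page when you supply your own GPT-4o key). The helper `billed_pages` and the constant `GPT4O_PAGE_MULTIPLIER` are hypothetical names and are not part of the `llama_parse` package; the actual multiplier may change over time.

```python
# Hypothetical helper for estimating LlamaParse usage; NOT part of llama_parse.
# The multiplier is taken from the notebook text ("counts as 200 pages").
GPT4O_PAGE_MULTIPLIER = 200


def billed_pages(
    num_pages: int,
    gpt4o_mode: bool = False,
    own_gpt4o_key: bool = False,
) -> int:
    """Return how many pages would be counted in the usage tracker.

    Assumes the rules stated in the notebook:
    - default pipeline: each parsed page counts as one page
    - gpt4o_mode without your own key: each parsed page counts as 200 pages
    - gpt4o_mode with your own gpt4o_api_key: each parsed page counts as one page
    """
    if num_pages < 0:
        raise ValueError("num_pages must be non-negative")
    if gpt4o_mode and not own_gpt4o_key:
        return num_pages * GPT4O_PAGE_MULTIPLIER
    return num_pages


if __name__ == "__main__":
    # The 4-page snippet used in the notebook
    print(billed_pages(4, gpt4o_mode=True))                      # 800
    print(billed_pages(4, gpt4o_mode=True, own_gpt4o_key=True))  # 4
    print(billed_pages(4))                                       # 4
```

Running an estimate like this before calling `load_data` makes it easier to decide whether to trim the PDF or pass your own `gpt4o_api_key`.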