From cdbddef86dda86573ba9a512fd840a690ef3f05e Mon Sep 17 00:00:00 2001 From: Ravi Theja Date: Thu, 5 Dec 2024 22:08:34 +0530 Subject: [PATCH] Add demo videos notebooks (#529) --- examples/demo_starter_multimodal.ipynb | 415 ++++++++++++++++++ .../demo_starter_parse_selected_pages.ipynb | 181 ++++++++ 2 files changed, 596 insertions(+) create mode 100644 examples/demo_starter_multimodal.ipynb create mode 100644 examples/demo_starter_parse_selected_pages.ipynb diff --git a/examples/demo_starter_multimodal.ipynb b/examples/demo_starter_multimodal.ipynb new file mode 100644 index 0000000..3ae3c06 --- /dev/null +++ b/examples/demo_starter_multimodal.ipynb @@ -0,0 +1,415 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "97c79c38-38a3-40f3-ba2e-250649347d63", + "metadata": { + "id": "97c79c38-38a3-40f3-ba2e-250649347d63" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "id": "4e081457", + "metadata": {}, + "source": [ + "# Multimodal Parsing using LlamaParse\n", + "\n", + "This cookbook shows you how to use LlamaParse to parse documents with multimodal LLMs from Anthropic and OpenAI.\n", + "\n", + "LlamaParse allows you to plug in external multimodal model vendors for parsing; we handle the error correction, validation, and scalability/reliability for you.\n" + ] + }, + { + "cell_type": "markdown", + "id": "qOdqBxCS51Ow", + "metadata": { + "id": "qOdqBxCS51Ow" + }, + "source": [ + "### Installation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "H_Vqcylb50vm", + "metadata": { + "id": "H_Vqcylb50vm" + }, + "outputs": [], + "source": [ + "!pip install llama-parse" + ] + }, + { + "cell_type": "markdown", + "id": "15e60ecf-519c-41fc-911b-765adaf8bad4", + "metadata": { + "id": "15e60ecf-519c-41fc-911b-765adaf8bad4" + }, + "source": [ + "### Setup\n", + "\n", + "Here we set up `LLAMA_CLOUD_API_KEY` for using `LlamaParse`."
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "91a9e532-1454-40e0-bbf0-fd442c350121", + "metadata": { + "id": "91a9e532-1454-40e0-bbf0-fd442c350121" + }, + "outputs": [], + "source": [ + "import nest_asyncio\n", + "\n", + "nest_asyncio.apply()\n", + "\n", + "import os\n", + "\n", + "# API access to llama-cloud\n", + "os.environ[\"LLAMA_CLOUD_API_KEY\"] = \"\"" + ] + }, + { + "cell_type": "markdown", + "id": "LGwBNPNotZRQ", + "metadata": { + "id": "LGwBNPNotZRQ" + }, + "source": [ + "## Download Data\n", + "\n", + "For this demonstration, we will use the recent arXiv paper `Evaluation of OpenAI o1: Opportunities and Challenges of AGI` (arXiv:2409.18486)." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "IjtKDQRLrylI", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "IjtKDQRLrylI", + "outputId": "31df0fac-51f2-4697-f78b-0b7c0b8cd145" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2024-12-05 18:54:24-- https://arxiv.org/pdf/2409.18486\n", + "Resolving arxiv.org (arxiv.org)... 151.101.67.42, 151.101.131.42, 151.101.3.42, ...\n", + "Connecting to arxiv.org (arxiv.org)|151.101.67.42|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 13986265 (13M) [application/pdf]\n", + "Saving to: ‘o1.pdf’\n", + "\n", + "o1.pdf 100%[===================>] 13.34M 11.8MB/s in 1.1s \n", + "\n", + "2024-12-05 18:54:26 (11.8 MB/s) - ‘o1.pdf’ saved [13986265/13986265]\n", + "\n" + ] + } + ], + "source": [ + "!wget \"https://arxiv.org/pdf/2409.18486\" -O \"o1.pdf\"" + ] + }, + { + "cell_type": "markdown", + "id": "4e29a9d7-5bd9-4fb8-8ec1-4c128a748662", + "metadata": { + "id": "4e29a9d7-5bd9-4fb8-8ec1-4c128a748662" + }, + "source": [ + "## Initialize LlamaParse\n", + "\n", + "Initialize LlamaParse in multimodal mode and specify the vendor model with `vendor_multimodal_model_name`.\n", + "\n", + "**NOTE**: optionally, you can specify your own Anthropic or OpenAI API key.
If you do, LlamaParse will charge you only 1 credit (0.3c) per page.\n", + "\n", + "\n", + "Using your own API key may incur additional costs from your model provider and could result in failed pages or documents if you do not have sufficient usage limits." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "dc921729-3446-42ca-8e1b-a6fd26195ed9", + "metadata": { + "id": "dc921729-3446-42ca-8e1b-a6fd26195ed9" + }, + "outputs": [], + "source": [ + "from llama_index.core.schema import TextNode\n", + "from typing import List\n", + "\n", + "def get_text_nodes(json_list: List[dict]) -> List[TextNode]:\n", + " text_nodes = []\n", + " for page in json_list:\n", + " text_node = TextNode(text=page[\"md\"], metadata={\"page\": page[\"page\"]})\n", + " text_nodes.append(text_node)\n", + " return text_nodes" + ] + }, + { + "cell_type": "markdown", + "id": "1b5d6da6", + "metadata": {}, + "source": [ + "### With anthropic-sonnet-3.5" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "f2e9d9cf-8189-4fcb-b34f-cde6cc0b59c8", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "f2e9d9cf-8189-4fcb-b34f-cde6cc0b59c8", + "outputId": "a337cbdd-60db-4a73-b66b-2bd6159e81f2" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Started parsing the file under job_id dd9d5e0f-160e-486a-89a2-6005e5a1c2ac\n" + ] + } + ], + "source": [ + "from llama_parse import LlamaParse\n", + "\n", + "parser = LlamaParse(\n", + " result_type=\"markdown\",\n", + " use_vendor_multimodal_model=True,\n", + " vendor_multimodal_model_name=\"anthropic-sonnet-3.5\",\n", + " target_pages=\"24\",\n", + " # invalidate_cache=True\n", + ")\n", + "json_objs = parser.get_json_result(\"o1.pdf\")\n", + "json_list = json_objs[0][\"pages\"]\n", + "docs = get_text_nodes(json_list)" + ] + }, + { + "cell_type": "markdown", + "id": "4f3c51b0-7878-48d7-9bc3-02b516500128", + "metadata": { + "id":
"4f3c51b0-7878-48d7-9bc3-02b516500128" + }, + "source": [ + "### With GPT-4o\n", + "\n", + "For comparison, we will also parse the same page using GPT-4o." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "6fc3f258-50ae-4988-b904-c105463a498f", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "6fc3f258-50ae-4988-b904-c105463a498f", + "outputId": "89c525c4-2b93-4909-9657-55646e034637" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Started parsing the file under job_id 6a4dea44-4f90-406b-b290-9e98620b1232\n" + ] + } + ], + "source": [ + "from llama_parse import LlamaParse\n", + "\n", + "parser_gpt4o = LlamaParse(\n", + " result_type=\"markdown\",\n", + " use_vendor_multimodal_model=True,\n", + " vendor_multimodal_model_name=\"openai-gpt4o\",\n", + " target_pages=\"24\",\n", + " # invalidate_cache=True\n", + ")\n", + "json_objs_gpt4o = parser_gpt4o.get_json_result(\"o1.pdf\")\n", + "json_list_gpt4o = json_objs_gpt4o[0][\"pages\"]\n", + "docs_gpt4o = get_text_nodes(json_list_gpt4o)" + ] + }, + { + "cell_type": "markdown", + "id": "44c20f7a-2901-4dd0-b635-a4b33c5664c1", + "metadata": { + "id": "44c20f7a-2901-4dd0-b635-a4b33c5664c1" + }, + "source": [ + "### View Results\n", + "\n", + "Let's print each model's parsed markdown for the selected page. Note that `target_pages` is 0-indexed, so `\"24\"` shows up as `page: 25` in the node metadata."
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "778698aa-da7e-4081-b3b5-0372f228536f", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "778698aa-da7e-4081-b3b5-0372f228536f", + "outputId": "bb89e323-7041-4fc3-d835-95e373189d02" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "page: 25\n", + "\n", + "| Participant_ID | clinical Description Reference |\n", + "|-----------------|----------------------------------|\n", + "| Attribute | Value | Basic Personal Information: Subject 098_S_0896 is a 72.0-year-old Female who has completed 15 years of education. The ethnicity is Not Hisp/Latino and race is White. Marital status is Married. Initially diagnosed as AD, as of the date 2007-10-24, the final diagnosis was Dementia. |\n", + "| Age | 72.0 |\n", + "| Sex | Female |\n", + "| Education | 15 |\n", + "| Race | White | Biomarker Measurements: The subject's genetic profile includes an ApoE4 status of 0.0... |\n", + "| DX_bl | AD |\n", + "| DX | Dementia |\n", + "| ... | ... | Cognitive and Neurofunctional Assessments: The Mini-Mental State Examination score stands at 29.0. The Clinical Dementia Rating, sum of boxes, is 1.0. ADAS 11 and 13 scores are 4.67 and 4.67 respectively, with a score of 1.0 in delayed word recall... |\n", + "| APOE4 | 1.0 |\n", + "| TAU | 212.5 |\n", + "| ... | ... |\n", + "| MMSE | 29.0 | Volumetric Data: Under MRI conditions at a field strength of 1.5 Tesla MRI Tesla, using Cross Sectional FreeSurfer (FreeSurfer Version 4.3), the imaging data recorded includes ventricles volume at 54422.0, hippocampus volume at 6677.0, whole brain volume at 1147980.0, entorhinal cortex volume at 2782.0, fusiform gyrus volume at 19432.0, and middle temporal area volume at 24951.0. The intracranial volume measured is 1799580.0.... |\n", + "| CDRSB | 0.0 |\n", + "| ... | ... 
|\n", + "| FLDSTRENG | 1.5 Tesla MRI |\n", + "| Ventricles | 84599 |\n", + "| Hippocampus | 5319 |\n", + "| ... | ... |\n", + "\n", + "Figure 2: An example of a patient table and its corresponding clinical description.\n", + "\n", + "skills. Mathematics, as a highly structured and logic-driven discipline, provides an ideal testing ground for evaluating this reasoning ability. To investigate o1-preview's performance, we designed a series of tests covering various difficulty levels. We begin with high school-level math competition problems in this section, followed by college-level mathematics problems in the next section, allowing us to observe the model's logical reasoning across varying levels of complexity.\n", + "\n", + "In this section, we selected two primary areas of mathematics: algebra and counting and probability in this section. We chose these two topics because of their heavy reliance on problem-solving skills and their frequent use in assessing logical and abstract thinking [46]. The dataset used in testing is from the MATH dataset [46]. The problems in the dataset cover a wide range of subjects, including Prealgebra, Intermediate Algebra, Algebra, Geometry, Counting and Probability, Number Theory, and Precalculus. Each problem is categorized based on difficulty, ranked from level 1 to 5, according to the Art of Problem Solving (AoPS). The dataset mainly comprises problems from various high school math competitions, including the American Mathematics Competitions (AMC) 10 and 12, as well as the American Invitational Mathematics Examination (AIME), and other similar contests. Each problem comes with detailed reference solutions, allowing for a comprehensive comparison of o1-preview's solutions.\n", + "\n", + "In addition to evaluating the final answers produced by o1-preview, our analysis delves into the step-by-step reasoning process of the o1-preview's solutions. 
By comparing o1-preview's solutions with the dataset's solutions, we assess its ability to engage in logical reasoning, handle abstract problem-solving tasks, and apply structured approaches to reach correct answers. This deeper analysis offers insights into o1-preview's overall reasoning capabilities, using mathematics as a reliable indicator for logical and structured thought processes.\n" + ] + } + ], + "source": [ + "# using Sonnet-3.5\n", + "print(docs[0].get_content(metadata_mode=\"all\"))" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "1511a30f-3efc-4142-9668-7dc056a24d0c", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "1511a30f-3efc-4142-9668-7dc056a24d0c", + "outputId": "2e5e8e20-2b41-4183-f21f-dff503a03089" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "page: 25\n", + "\n", + "\n", + "| Participant_ID | clinical Description Reference |\n", + "|----------------|--------------------------------|\n", + "| **Attribute** | **Value** |\n", + "| Age | 72.0 |\n", + "| Sex | Female |\n", + "| Education | 15 |\n", + "| Race | White |\n", + "| DX_bl | AD |\n", + "| DX | Dementia |\n", + "| ... | ... |\n", + "| APOE4 | 1.0 |\n", + "| TAU | 212.5 |\n", + "| ... | ... |\n", + "| MMSE | 29.0 |\n", + "| CDRSB | 0.0 |\n", + "| ... | ... |\n", + "| FLDSTRENG | 1.5 Tesla MRI |\n", + "| Ventricles | 84599 |\n", + "| Hippocampus | 5319 |\n", + "| ... | ... |\n", + "\n", + "**Basic Personal Information:** Subject 098_S_0896 is a 72.0-year-old Female who has completed 15 years of education. The ethnicity is Not Hisp/Latino and race is White. Marital status is Married. Initially diagnosed as AD, as of the date 2007-10-24, the final diagnosis was Dementia.\n", + "\n", + "**Biomarker Measurements:** The subject's genetic profile includes an ApoE4 status of 0.0...\n", + "\n", + "**Cognitive and Neurofunctional Assessments:** The Mini-Mental State Examination score stands at 29.0. 
The Clinical Dementia Rating, sum of boxes, is 1.0. ADAS 11 and 13 scores are 4.67 and 4.67 respectively, with a score of 1.0 in delayed word recall...\n", + "\n", + "**Volumetric Data:** Under MRI conditions at a field strength of 1.5 Tesla MRI Tesla, using Cross-Sectional FreeSurfer (FreeSurfer Version 4.3), the imaging data recorded includes ventricles volume at 84422.0, hippocampus volume at 6677.0, whole brain volume at 1147980.0, entorhinal cortex volume at 27820.0, fusiform gyrus volume at 19432.0, and middle temporal area volume at 24951.0. The intracranial volume measured is 1799580.0...\n", + "\n", + "Figure 2: An example of a patient table and its corresponding clinical description.\n", + "\n", + "----\n", + "\n", + "Skills. Mathematics, as a highly structured and logic-driven discipline, provides an ideal testing ground for evaluating this reasoning ability. To investigate o1-preview’s performance, we designed a series of tests covering various difficulty levels. We begin with high school-level math competition problems in this section, followed by college-level mathematics problems in the next section, allowing us to observe the model’s logical reasoning across varying levels of complexity.\n", + "\n", + "In this section, we selected two primary areas of mathematics: algebra and counting and probability in this section. We chose these two topics because of their heavy reliance on problem-solving skills and their frequent use in assessing logical and abstract thinking [46]. The dataset used in testing is from the MATH dataset [46]. The problems in the dataset cover a wide range of subjects, including Prealgebra, Intermediate Algebra, Algebra, Geometry, Counting and Probability, Number Theory, and Precalculus. Each problem is categorized based on difficulty, ranked from level 1 to 5, according to the Art of Problem Solving (AoPS). 
The dataset mainly comprises problems from various high school math competitions, including the American Mathematics Competitions (AMC) 10 and 12, as well as the American Invitational Mathematics Examination (AIME), and other similar contests. Each problem comes with detailed reference solutions, allowing for a comprehensive comparison of o1-preview’s solutions.\n", + "\n", + "In addition to evaluating the final answers produced by o1-preview, our analysis delves into the step-by-step reasoning process of the o1-preview’s solutions. By comparing o1-preview’s solutions with the dataset’s solutions, we assess its ability to engage in logical reasoning, handle abstract problem-solving tasks, and apply structured approaches to reach correct answers. This deeper analysis offers insights into o1-preview’s overall reasoning capabilities, using mathematics as a reliable indicator for logical and structured thought processes.\n" + ] + } + ], + "source": [ + "# using GPT-4o\n", + "print(docs_gpt4o[0].get_content(metadata_mode=\"all\"))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1c75bb85", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "llamacloud", + "language": "python", + "name": "llamacloud" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.4" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/examples/demo_starter_parse_selected_pages.ipynb b/examples/demo_starter_parse_selected_pages.ipynb new file mode 100644 index 0000000..7e9ffbd --- /dev/null +++ b/examples/demo_starter_parse_selected_pages.ipynb @@ -0,0 +1,181 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\"Open" + ] + }, + { + 
+ "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Parse Selected Pages\n", + "\n", + "In this notebook, we demonstrate how to parse only selected pages of a document using LlamaParse." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Installation\n", + "\n", + "Here we install `llama-parse`, which we use to parse the document." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install llama-parse" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Set API Key" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# llama-parse is async-first; running async code in a notebook requires nest_asyncio\n", + "import nest_asyncio\n", + "\n", + "nest_asyncio.apply()\n", + "\n", + "import os\n", + "\n", + "# API access to llama-cloud\n", + "os.environ[\"LLAMA_CLOUD_API_KEY\"] = \"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Download Data\n", + "\n", + "Here we download Uber's 2021 10-K SEC filing for the demonstration." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2024-12-05 11:40:59-- https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf\n", + "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8000::154, 2606:50c0:8002::154, 2606:50c0:8003::154, ...\n", + "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8000::154|:443... connected.\n", + "HTTP request sent, awaiting response... 
200 OK\n", + "Length: 1880483 (1.8M) [application/octet-stream]\n", + "Saving to: ‘./uber_2021.pdf’\n", + "\n", + "./uber_2021.pdf 100%[===================>] 1.79M --.-KB/s in 0.1s \n", + "\n", + "2024-12-05 11:40:59 (14.2 MB/s) - ‘./uber_2021.pdf’ saved [1880483/1880483]\n", + "\n" + ] + } + ], + "source": [ + "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O './uber_2021.pdf'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Parse selected pages of the PDF\n", + "\n", + "Here we parse only the first three pages of the PDF (`target_pages` is 0-indexed, so `\"0,1,2\"` selects pages 1 to 3) and get the text in `markdown` format." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Started parsing the file under job_id ad1087c1-b085-4dc7-9aa8-d13cdd440f2b\n" + ] + } + ], + "source": [ + "from llama_parse import LlamaParse\n", + "\n", + "parser = LlamaParse(\n", + " target_pages=\"0,1,2\",\n", + " result_type=\"markdown\"\n", + ")\n", + "\n", + "documents = parser.load_data('./uber_2021.pdf')" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[Document(id_='d0b34f4a-27ef-48e2-a92a-386e5e265f4c', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, metadata_template='{key}: {value}', metadata_separator='\\n', text='# UNITED STATES SECURITIES AND EXCHANGE COMMISSION\\n\\n# Washington, D.C. 
20549\\n\\n# FORM 10-K\\n\\n(Mark One)\\n\\n☒ ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934\\n\\nFor the fiscal year ended December 31, 2021\\n\\nOR\\n\\n☐ TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934\\n\\nFor the transition period from _____ to _____\\n\\nCommission File Number: 001-38902\\n\\n# UBER TECHNOLOGIES, INC.\\n\\n(Exact name of registrant as specified in its charter)\\n\\nDelaware\\n\\n45-2647441\\n\\n(State or other jurisdiction of incorporation or organization) (I.R.S. Employer Identification No.)\\n\\n1515 3rd Street\\n\\nSan Francisco, California 94158\\n\\n(Address of principal executive offices, including zip code)\\n\\n(415) 612-8582\\n\\n(Registrant’s telephone number, including area code)\\n\\n# Securities registered pursuant to Section 12(b) of the Act:\\n\\n|Title of each class|Trading Symbol(s)|Name of each exchange on which registered|\\n|---|---|---|\\n|Common Stock, par value $0.00001 per share|UBER|New York Stock Exchange|\\n\\nSecurities registered pursuant to Section 12(g) of the Act: None\\n\\nIndicate by check mark whether the registrant is a well-known seasoned issuer, as defined in Rule 405 of the Securities Act. Yes ☒ No ☐\\n\\nIndicate by check mark whether the registrant is not required to file reports pursuant to Section 13 or Section 15(d) of the Act. Yes ☐ No ☒\\n\\nIndicate by check mark whether the registrant (1) has filed all reports required to be filed by Section 13 or 15(d) of the Securities Exchange Act of 1934 during the preceding 12 months (or for such shorter period that the registrant was required to file such reports), and (2) has been subject to such filing requirements for the past 90 days. 
Yes ☒ No ☐\\n\\nIndicate by check mark whether the registrant has submitted electronically every Interactive Data File required to be submitted pursuant to Rule 405 of Regulation S-T (§232.405 of this chapter) during the preceding 12 months (or for such shorter period that the registrant was required to submit such files). Yes ☒ No ☐\\n\\nIndicate by check mark whether the registrant is a large accelerated filer, an accelerated filer, a non-accelerated filer, a smaller reporting company, or an emerging growth company. See the definitions of “large accelerated filer,” “accelerated filer,” “smaller reporting company,” and “emerging growth company” in Rule 12b-2 of the Exchange Act.', mimetype='text/plain', start_char_idx=None, end_char_idx=None, metadata_seperator='\\n', text_template='{metadata_str}\\n\\n{content}'),\n", + " Document(id_='253b1141-a260-466e-b164-b39df67ef799', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, metadata_template='{key}: {value}', metadata_separator='\\n', text=\"# Large accelerated filer\\n\\n☒\\n\\n# Accelerated filer\\n\\n☐\\n\\n# Non-accelerated filer\\n\\n☐\\n\\n# Smaller reporting company\\n\\n☐\\n\\n# Emerging growth company\\n\\n☐\\n\\nIf an emerging growth company, indicate by check mark if the registrant has elected not to use the extended transition period for complying with any new or revised financial accounting standards provided pursuant to Section 13(a) of the Exchange Act.\\n\\n☐\\n\\nIndicate by check mark whether the registrant has filed a report on and attestation to its management’s assessment of the effectiveness of its internal control over financial reporting under Section 404(b) of the Sarbanes-Oxley Act (15 U.S.C. 7262(b)) by the registered public accounting firm that prepared or issued\\n\\n☒\\n\\nIndicate by check mark whether the registrant is a shell company (as defined in Rule 12b-2 of the Exchange Act). 
Yes\\n\\n☐\\n\\nNo\\n\\n☒\\n\\nThe aggregate market value of the voting and non-voting common equity held by non-affiliates of the registrant as of June 30, 2021, the last business day of the registrant's most recently completed second fiscal quarter, was approximately $90.5 billion based upon the closing price reported for such date on the New York Stock Exchange.\\n\\nThe number of shares of the registrant's common stock outstanding as of February 22, 2022 was 1,954,464,088.\\n\\n# DOCUMENTS INCORPORATED BY REFERENCE\\n\\nPortions of the registrant’s Definitive Proxy Statement relating to the Annual Meeting of Stockholders are incorporated by reference into Part III of this Annual Report on Form 10-K where indicated. Such Definitive Proxy Statement will be filed with the Securities and Exchange Commission within 120 days after the end of the registrant’s fiscal year ended December 31, 2021.\", mimetype='text/plain', start_char_idx=None, end_char_idx=None, metadata_seperator='\\n', text_template='{metadata_str}\\n\\n{content}'),\n", + " Document(id_='ad988239-3ab5-498d-85ba-a29241db24d4', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, metadata_template='{key}: {value}', metadata_separator='\\n', text='# UBER TECHNOLOGIES, INC.\\n\\n# TABLE OF CONTENTS\\n\\n|Special Note Regarding Forward-Looking Statements|2|\\n|---|---|\\n|PART I|PART I|\\n|Item 1. Business|4|\\n|Item 1A. Risk Factors|11|\\n|Item 1B. Unresolved Staff Comments|46|\\n|Item 2. Properties|46|\\n|Item 3. Legal Proceedings|46|\\n|Item 4. Mine Safety Disclosures|47|\\n|PART II|PART II|\\n|Item 5. Market for Registrant’s Common Equity, Related Stockholder Matters and Issuer Purchases of Equity Securities|47|\\n|Item 6. [Reserved]|48|\\n|Item 7. Management’s Discussion and Analysis of Financial Condition and Results of Operations|48|\\n|Item 7A. Quantitative and Qualitative Disclosures About Market Risk|69|\\n|Item 8. 
Financial Statements and Supplementary Data|70|\\n|Item 9. Changes in and Disagreements with Accountants on Accounting and Financial Disclosure|146|\\n|Item 9A. Controls and Procedures|147|\\n|Item 9B. Other Information|147|\\n|Item 9C. Disclosure Regarding Foreign Jurisdictions that Prevent Inspections|147|\\n|PART III|PART III|\\n|Item 10. Directors, Executive Officers and Corporate Governance|147|\\n|Item 11. Executive Compensation|147|\\n|Item 12. Security Ownership of Certain Beneficial Owners and Management and Related Stockholder Matters|148|\\n|Item 13. Certain Relationships and Related Transactions, and Director Independence|148|\\n|Item 14. Principal Accounting Fees and Services|148|\\n|PART IV|PART IV|\\n|Item 15. Exhibits, Financial Statement Schedules|148|\\n|Item 16. Form 10-K Summary|148|\\n|Exhibit Index|149|\\n|Signatures|152|', mimetype='text/plain', start_char_idx=None, end_char_idx=None, metadata_seperator='\\n', text_template='{metadata_str}\\n\\n{content}')]" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "documents" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "llamacloud", + "language": "python", + "name": "llamacloud" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.4" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}
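Both notebooks select pages with `target_pages`, and their outputs suggest the string is comma-separated and 0-indexed: `target_pages="24"` yields a node with `page: 25`, and `"0,1,2"` returns the cover page, the filer-status page, and the table of contents of the 10-K. The sketch below converts human-friendly 1-based page numbers into that string. `to_target_pages` is a hypothetical helper, not part of `llama-parse`, and it assumes only the comma-separated, 0-indexed format seen in these notebooks.

```python
from typing import Iterable


def to_target_pages(one_based_pages: Iterable[int]) -> str:
    """Build a `target_pages` string from 1-based page numbers.

    Hypothetical helper: assumes LlamaParse's `target_pages` takes
    comma-separated, 0-indexed page numbers, as the notebooks suggest.
    """
    zero_based = []
    for page in one_based_pages:
        if page < 1:
            raise ValueError(f"page numbers must be >= 1, got {page}")
        # Shift from 1-based (as printed in the PDF) to 0-based.
        zero_based.append(page - 1)
    # Deduplicate and sort so the same selection always yields the same string.
    return ",".join(str(p) for p in sorted(set(zero_based)))


print(to_target_pages([25]))       # 24
print(to_target_pages([3, 1, 2]))  # 0,1,2
```

The result can be passed straight to the constructor, for example `LlamaParse(target_pages=to_target_pages([1, 2, 3]), result_type="markdown")`, which reproduces the `"0,1,2"` selection used above.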
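The two multimodal parses of page 25 in the first notebook do not agree everywhere: in the volumetric description, the Sonnet output reports a ventricles volume of 54422.0 and an entorhinal cortex volume of 2782.0, while the GPT-4o output reports 84422.0 and 27820.0. Before trusting either vendor on numeric content, a number-level comparison can surface such disagreements. This is a minimal sketch in plain Python; `number_mismatches` is a hypothetical helper, and the two short strings stand in for the full page text.

```python
import re
from collections import Counter
from typing import Tuple

# Integers or decimals such as "29", "1.5" or "54422.0".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def number_mismatches(text_a: str, text_b: str) -> Tuple[Counter, Counter]:
    """Return numbers found more often in text_a than text_b, and vice versa."""
    counts_a = Counter(NUMBER.findall(text_a))
    counts_b = Counter(NUMBER.findall(text_b))
    # Counter subtraction drops zero and negative counts.
    return counts_a - counts_b, counts_b - counts_a


sonnet = "ventricles volume at 54422.0, entorhinal cortex volume at 2782.0"
gpt4o = "ventricles volume at 84422.0, entorhinal cortex volume at 27820.0"
only_sonnet, only_gpt4o = number_mismatches(sonnet, gpt4o)
print(sorted(only_sonnet))  # ['2782.0', '54422.0']
print(sorted(only_gpt4o))   # ['27820.0', '84422.0']
```

In the notebook, you could pass `docs[0].get_content()` and `docs_gpt4o[0].get_content()` instead of the sample strings. A mismatch only flags a disagreement; it does not tell you which model read the page correctly, so the original PDF remains the reference.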