diff --git a/examples/data/BP_Excel.xlsx b/examples/data/BP_Excel.xlsx
new file mode 100644
index 0000000..a2c14d5
Binary files /dev/null and b/examples/data/BP_Excel.xlsx differ
diff --git a/examples/o1_excel_rag.ipynb b/examples/o1_excel_rag.ipynb
new file mode 100755
index 0000000..684e6de
--- /dev/null
+++ b/examples/o1_excel_rag.ipynb
@@ -0,0 +1,960 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Advanced RAG with LlamaParse and Recursive Retrieval on an Excel Document\n",
+ "\n",
+ "This notebook compares advanced RAG capabilities using LlamaParse with `o1-preview`, `o1-mini`, and `gpt-4o-mini` on an Excel document.\n",
+ "\n",
+ "We will use the `2Q 2024 Group databook - xls` file from [bp.com](https://www.bp.com/en/global/corporate/investors/results-reporting-and-presentations/financial-disclosure-framework/archive.html) for the demonstration.\n",
+ "\n",
+ "When interacting with our enterprise customers, we've identified two prominent types of queries. Let's check how the o1 models handle each:\n",
+ "\n",
+ "1. Queries requesting exact values.\n",
+ "2. Queries using the greater-than/less-than (`>`/`<`) operators."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Installation"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# !pip install llama-index\n",
+ "# !pip install llama-parse"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Import"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import nest_asyncio\n",
+ "\n",
+ "from llama_index.llms.openai import OpenAI\n",
+ "from llama_index.core import VectorStoreIndex\n",
+ "from IPython.display import Markdown, display\n",
+ "\n",
+ "from llama_parse import LlamaParse\n",
+ "\n",
+ "from llama_index.core.node_parser import MarkdownElementNodeParser"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "\n",
+ "os.environ[\"OPENAI_API_KEY\"] = \"sk-...\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Some OpenAI and LlamaParse details"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# llama-parse is async-first; running async code in a notebook requires nest_asyncio\n",
+ "nest_asyncio.apply()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Setup LLM"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "llm_o1 = OpenAI(model=\"o1-mini\")\n",
+ "llm_gpt4o_mini = OpenAI(model=\"gpt-4o-mini\")\n",
+ "llm_o1_preview = OpenAI(model=\"o1-preview\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Using `LlamaParse` for Excel Parsing\n",
+ "\n",
+ "\n",
+ "We will use `MarkdownElementNodeParser` to parse the Markdown output of `LlamaParse` and build a recursive retriever query engine for generation."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### LlamaParse"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Started parsing the file under job_id 9481484f-4414-4f20-aad8-892dc57649a1\n"
+ ]
+ }
+ ],
+ "source": [
+ "parser = LlamaParse(\n",
+ " api_key=\"llx-...\",\n",
+ " result_type=\"markdown\",\n",
+ ")\n",
+ "\n",
+ "documents = parser.load_data(\"./data/BP_Excel.xlsx\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "44"
+ ]
+ },
+ "execution_count": null,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "len(documents)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "# Summary\n",
+ "\n",
+ "|Financial and Operating Information 2020 - 2024 | | | | | | | | | | | | | | | | | | | | | | | | | | |\n",
+ "|-------------------------------------------------------------------------|-----------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|---------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|------|------|-----------|\n",
+ "|Group information | | | | | | | | | | | | | | | | | | | | | | | | | | |\n",
+ "| | | | | | | | | | | | | | | | | | | | | | |Contents| | | | |\n",
+ "| | | | | | | | | | | | | | | | | | | | | | | | | | | |\n",
+ "|Summary | | | | | | | | | | | | | | | | | | | | | | | | | | |\n",
+ "| | | | | | | | | | | | | | | | | | | | | | | | | | | |\n",
+ "| | | | | | | | | | | | | | | | | | | | | | | | | | |$ million |\n",
+ "| |Footnotes |Q1 |Q2 |Q3 |Q4 |2020 |Q1 |Q2 |Q3 |Q4 |2021 |Q1 |Q2 |Q3 |Q4 |2022 |Q1 |Q2 |Q3 |Q4 |2023 |Q1 |Q2 |Q3 |Q4 |2024 |\n",
+ "|Profit (loss) attributable to bp shareholders | |(4,365) |(16,848)|(450) |1,358 |(20,305)|4,667 |3,116 |(2,544) |2,326 |7,565 |(20,384) |9,257 |(2,163) |10,803 |(2,487) |8,218 |1,792 |4,858 |371 |15,239 |2,263 |(129) | | |2,134 |\n",
+ "|Inventory holding (gains) losses, net of tax | |3,737 |(809) |(194) |(533) |2,201 |(1,342) |(736) |(390) |(358) |(2,826) |(2,664) |(1,607) |2,186 |1,066 |(1,019) |452 |549 |(1,212) |1,155 |944 |(657) |113 | | |(544) |\n",
+ "|Replacement cost profit (loss) attributable to bp shareholders | |(628) |(17,657)|(644) |825 |(18,104)|3,325 |2,380 |(2,934) |1,968 |4,739 |(23,048) |7,650 |23 |11,869 |(3,506) |8,670 |2,341 |3,646 |1,526 |16,183 |1,606 |(16) | | |1,590 |\n",
+ "|Net (favourable) adverse impact of adjusting items, net of tax | |1,419 |10,975 |730 |(710) |12,414 |(695) |418 |6,256 |2,097 |8,076 |29,293 |801 |8,127 |(7,062) |31,159 |(3,707) |248 |(353) |1,465 |(2,347) |1,117 |2,772 | | |3,889 |\n",
+ "|Underlying replacement cost profit (loss) attributable to bp shareholders| |791 |(6,682) |86 |115 |(5,690) |2,630 |2,798 |3,322 |4,065 |12,815 |6,245 |8,451 |8,150 |4,807 |27,653 |4,963 |2,589 |3,293 |2,991 |13,836 |2,723 |2,756 | | |5,479 |\n",
+ "| | | | | | | | | | | | | | | | | | | | | | | | | | | |\n",
+ "|Underlying effective tax rate (ETR) (%) | |55% |9% |64% |40% |-14% | | | | |-14% | | | | |-14% | | | | |-14% | | | | |-14% |\n",
+ "|Operating cash flow | |952 |3,737 |5,204 |2,269 |12,162 |6,109 |5,411 |5,976 |6,116 |23,612 |8,210 |10,863 |8,288 |13,571 |40,932 |7,622 |6,293 |8,747 |9,377 |32,039 |5,009 |8,100 | | |13,109 |\n",
+ "|Capital expenditure | |3,861 |3,067 |3,636 |3,491 |14,055 |3,798 |2,514 |2,903 |3,633 |12,848 |2,929 |2,838 |3,194 |7,369 |16,330 |3,625 |4,314 |3,603 |4,711 |16,253 |4,278 |3,691 | | |7,969 |\n",
+ "|Divestment and other proceeds | |681 |1,135 |597 |4,173 |6,586 |4,839 |215 |313 |2,265 |7,632 |1,181 |722 |606 |614 |3,123 |800 |88 |655 |300 |1,843 |413 |760 | | |1,173 |\n",
+ "|Surplus cash flow |c | | | | | |1,704 |655 |899 |2,955 |6,213 |4,037 |6,546 |3,496 |4,985 |19,065 |2,283 |(269) |3,107 |2,755 |7,876 | | | | | |\n",
+ "|Net issue (repurchase) of shares | |(776) |– |– |– |(776) |– |(500) |(926) |(1,725) |(3,151) |(1,592) |(2,288) |(2,876) |(3,240) |(9,996) |(2,448) |(2,073) |(2,047) |(1,350) |(7,918) |(1,750) |(1,751) | | |(3,501) |\n",
+ "|Net debt | |51,404 |40,920 |40,379 |38,941 |38,941 |33,313 |32,706 |31,971 |30,613 |30,613 |27,457 |22,816 |22,002 |21,422 |21,422 |21,232 |23,660 |22,324 |20,912 |20,912 |24,015 |22,614 | | |22,614 |\n",
+ "|ROACE% | | | | | |-3.8% | | | | |13.3% | | | | |30.5% | | | | |18.1% | | | | | |\n",
+ "|Adjusted EBIDA | | | | | |19,244 | | | | |30,783 | | | | |45,695 | | | | |34,345 | | | | | |\n",
+ "|upstream Production (mboe/d) | |2,579 |2,525 |2,243 |2,155 |2,375 |2,218 |2,120 |2,202 |2,332 |2,219 |2,252 |2,198 |2,298 |2,265 |2,254 |2,329 |2,272 |2,328 |2,320 |2,313 |2,378 |2,379 | | |2,379 |\n",
+ "|Announced dividend per ordinary share (cents per share) | |10.50 |5.25 |5.25 |5.25 |26.25 |5.25 |5.46 |5.46 |5.46 |21.63 |5.46 |6.01 |6.01 |6.61 |24.08 |6.61 |7.27 |7.27 |7.27 |28.42 |7.27 |8.000 | | |15.27 |\n",
+ "|RC profit (loss) per ordinary share (cents) | |(3.11) |(87.32) |(3.18) |4.08 |(89.53) |16.38 |11.74 |(14.57) |9.94 |23.53 |(118.11) |39.45 |0.12 |65.29 |(18.47) |48.46 |13.35 |21.19 |9.06 |93.21 |9.65 |(0.10) | | |9.59 |\n",
+ "|RC profit (loss) per ADS (dollars) | |(0.19) |(5.24) |(0.19) |0.24 |(5.37) |0.98 |0.70 |(0.87) |0.60 |1.41 |(7.09) |2.37 |0.01 |3.92 |(1.11) |2.91 |0.80 |1.27 |0.54 |5.59 |0.58 |(0.01) | | |0.58 |\n",
+ "|Underlying RC profit (loss) per ordinary share (cents) | |3.92 |(33.05) |0.42 |0.57 |(28.14) |12.95 |13.80 |16.48 |20.53 |63.65 |32.00 |43.58 |43.15 |26.44 |145.63 |27.74 |14.77 |19.14 |17.77 |79.69 |16.24 |16.61 | | |32.86 |\n",
+ "|Underlying RC profit (loss) per ADS (dollars) | |0.24 |(1.98) |0.03 |0.03 |(1.69) |0.78 |0.83 |0.99 |1.23 |3.82 |1.92 |2.61 |2.59 |1.59 |8.74 |1.66 |0.89 |1.15 |1.07 |4.78 |0.97 |1.00 | | |1.97 |\n",
+ "| | | | | | | | | | | | | | | | | | | | | | | | | | | |\n",
+ "|Resilient hydrocarbons | | | | | | | | | | | | | | | | | | | | | | | | | | |\n",
+ "|Total production - total hydrocarbons (mboe/d) | |3,715 |3,596 |3,318 |3,266 |3,473 |3,268 |3,215 |3,322 |3,458 |3,317 |3,002 |2,198 |2,298 |2,265 |3,004 |2,329 |2,272 |2,328 |2,320 |2,313 |2,378 |2,379 | | |2,379 |\n",
+ "|bp-operated upstream plant reliability* % (YTD) | |93.0 |94.2 |93.8 |94.0 |94.0 |93.0 |93.7 |94.3 |94.0 |94.0 |96.1 |95.3 |95.8 |96.0 |96.0 |95.5 |95.0 |95.7 |95.0 |95.0 |94.9 |95.5 | | |95.5 |\n",
+ "|upstream unit production costs* ($/boe) | |7.07 |6.13 |6.30 |6.39 |6.39 |7.36 |7.33 |6.96 |6.82 |6.82 |6.52 |6.53 |6.25 |6.07 |6.07 |5.73 |5.94 |5.88 |5.78 |5.78 |6.00 |6.17 | | |6.17 |\n",
+ "|bp-operated refining availability (%) | |96.1 |95.6 |96.2 |96.1 |96.0 |94.8 |93.5 |95.6 |95.4 |94.8 |95.0 |93.9 |94.3 |95.0 |94.5 |96.1 |95.7 |96.3 |96.1 |96.1 |90.4 |96.4 | | |93.4 |\n",
+ "|Biofuels production (kb/d) | | | | | |30 | | | | |26 | | | | |27 | | | | |32 | | | | | |\n",
+ "|Biogas supply volumes (mboe/d) | | | | | |11 | | | | |9 | | | | |12 | | | | |22 | | | | | |\n",
+ "|LNG Portfolio, Mtpa | | | | | |20 | | | | |18 | | | | |19 | | | | |23 | | | | | |\n",
+ "| | | | | | | | | | | | | | | | | | | | | | | | | | | |\n",
+ "|Convenience and mobility | | | | | | | | | | | | | | | | | | | | | | | | | | |\n",
+ "|Strategic convenience sites (#) |a |1,650 |1,650 |1,900 |1,900 |1,900 |1,950 |2,000 |2,050 |2,150 |2,150 |2,150 |2,200 |2,250 |2,400 |2,400 |2,450 |2,750 |2,750 |2,850 |2,850 |2,900 |2,950 | | |2,950 |\n",
+ "|Customer touchpoints (# millions) | | | | | |>11 | | | | |>12 | | | | |~12 | | | | |>12 | |– | | | |\n",
+ "|Electric vehicle charge points (#) |b | | | | |10,100 | | | | |13,100 | | | | |21,900 | | | | |29,300 | | | | | |\n",
+ "| | | | | | | | | | | | | | | | | | | | | | | | | | | |\n",
+ "|Low carbon energy | | | | | | | | | | | | | | | | | | | | | | | | | | |\n",
+ "|Developed renewables to FID (net), GW | |2.7 |2.8 |3.1 |3.3 |3.3 |3.3 |3.5 |3.6 |4.4 |4.4 |4.4 |4.4 |4.6 |5.8 |5.8 |5.9 |6.1 |6.1 |6.2 |6.2 |6.2 |6.5 | | |6.5 |\n",
+ "|Installed renewables capacity (net), GW | |1.1 |1.1 |1.2 |1.5 |1.5 |1.6 |1.6 |1.7 |1.9 |1.9 |1.9 |2.0 |2.0 |2.2 |2.2 |2.2 |2.4 |2.5 |2.7 |2.7 |2.7 |2.7 | | |2.7 |\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(documents[3].get_content())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### MarkdownElementNodeParser\n",
+ "\n",
+ "This will generate a summary for each node; if a table is present, it will also create a summary for the table."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "node_parser = MarkdownElementNodeParser(llm=llm_gpt4o_mini, num_workers=4)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Parse the documents"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "1it [00:00, 18558.87it/s]\n",
+ "0it [00:00, ?it/s]\n",
+ "0it [00:00, ?it/s]\n",
+ "1it [00:00, 8630.26it/s]\n",
+ "1it [00:00, 18157.16it/s]\n",
+ "1it [00:00, 8355.19it/s]\n",
+ "1it [00:00, 5053.38it/s]\n",
+ "1it [00:00, 6955.73it/s]\n",
+ "1it [00:00, 7626.01it/s]\n",
+ "1it [00:00, 4832.15it/s]\n"
+ ]
+ }
+ ],
+ "source": [
+ "nodes = node_parser.get_nodes_from_documents(documents[:10])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "base_nodes, objects = node_parser.get_nodes_and_objects(nodes)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(27, 11, 8)"
+ ]
+ },
+ "execution_count": null,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "len(nodes), len(base_nodes), len(objects)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "This table presents the financial and operating information of a group from 2020 to 2024, detailing the condensed group statement of comprehensive income, including profit or loss for the period, other comprehensive income, and total comprehensive income attributable to shareholders and non-controlling interests.,\n",
+ "with the following table title:\n",
+ "Financial and Operating Information 2020 - 2024,\n",
+ "with the following columns:\n",
+ "- Group information: None\n",
+ "- Condensed group statement of comprehensive income: None\n",
+ "- Profit (loss) for the period: None\n",
+ "- Other comprehensive income: None\n",
+ "- Total comprehensive income: None\n",
+ "- Attributable to bp shareholders: None\n",
+ "- Attributable to non-controlling interests: None\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(objects[3].get_content())"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(11, 8)"
+ ]
+ },
+ "execution_count": null,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "len(base_nodes), len(objects)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Build Recursive Retrieval Index"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# dump both indexed tables and page text into the vector index\n",
+ "recursive_index = VectorStoreIndex(nodes=base_nodes + objects, llm=llm_gpt4o_mini)\n",
+ "\n",
+ "recursive_query_engine_o1 = recursive_index.as_query_engine(\n",
+ " similarity_top_k=5, llm=llm_o1\n",
+ ")\n",
+ "\n",
+ "recursive_query_engine_o1_preview = recursive_index.as_query_engine(\n",
+ " similarity_top_k=5, llm=llm_o1_preview\n",
+ ")\n",
+ "\n",
+ "recursive_query_engine_gpt4o_mini = recursive_index.as_query_engine(\n",
+ " similarity_top_k=5, llm=llm_gpt4o_mini\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Testing queries"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Query 1\n",
+ "\n",
+ "Expected Answer:\n",
+ "\n",
+ "$105,944 million"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "query = \"What is the Sales and other operating revenues in 2020?\"\n",
+ "\n",
+ "response_recursive_o1 = recursive_query_engine_o1.query(query)\n",
+ "response_recursive_o1_preview = recursive_query_engine_o1_preview.query(query)\n",
+ "response_recursive_gpt4o_mini = recursive_query_engine_gpt4o_mini.query(query)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "----------------------RESPONSE WITH O1 MINI----------------------\n"
+ ]
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "In 2020, the Sales and Other Operating Revenues amounted to **$105,944 million**."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "----------------------RESPONSE WITH O1 PREVIEW----------------------\n"
+ ]
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "In 2020, the Sales and other operating revenues were $105,944 million."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "----------------------RESPONSE WITH GPT4O-MINI----------------------\n"
+ ]
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "The Sales and other operating revenues in 2020 amount to 105,944 million dollars."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "print(\"----------------------RESPONSE WITH O1 MINI----------------------\")\n",
+ "display(Markdown(f\"{response_recursive_o1}\"))\n",
+ "\n",
+ "print(\"----------------------RESPONSE WITH O1 PREVIEW----------------------\")\n",
+ "display(Markdown(f\"{response_recursive_o1_preview}\"))\n",
+ "\n",
+ "print(\"----------------------RESPONSE WITH GPT4O-MINI----------------------\")\n",
+ "display(Markdown(f\"{response_recursive_gpt4o_mini}\"))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Query 2\n",
+ "\n",
+ "Expected Answer:\n",
+ "\n",
+ "2021, 2022, 2023"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "query = \"In which years the Sales and other operating revenues is greater than $1,50,000 million?\"\n",
+ "\n",
+ "response_recursive_o1 = recursive_query_engine_o1.query(query)\n",
+ "response_recursive_o1_preview = recursive_query_engine_o1_preview.query(query)\n",
+ "response_recursive_gpt4o_mini = recursive_query_engine_gpt4o_mini.query(query)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "----------------------RESPONSE WITH O1 MINI----------------------\n"
+ ]
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "The years in which the Sales and other operating revenues exceeded $150,000 million are 2021, 2022, and 2023."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "----------------------RESPONSE WITH O1 PREVIEW----------------------\n"
+ ]
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "The Sales and other operating revenues were greater than $150,000 million in the years 2021, 2022, and 2023."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "----------------------RESPONSE WITH GPT4O-MINI----------------------\n"
+ ]
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "The Sales and other operating revenues exceed $150,000 million in the years 2022 and 2023."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "print(\"----------------------RESPONSE WITH O1 MINI----------------------\")\n",
+ "display(Markdown(f\"{response_recursive_o1}\"))\n",
+ "\n",
+ "print(\"----------------------RESPONSE WITH O1 PREVIEW----------------------\")\n",
+ "display(Markdown(f\"{response_recursive_o1_preview}\"))\n",
+ "\n",
+ "print(\"----------------------RESPONSE WITH GPT4O-MINI----------------------\")\n",
+ "display(Markdown(f\"{response_recursive_gpt4o_mini}\"))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Query 3"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "query = \"Which quarters and years has Total revenues and other income greater than $35K million?\"\n",
+ "\n",
+ "response_recursive_o1 = recursive_query_engine_o1.query(query)\n",
+ "response_recursive_o1_preview = recursive_query_engine_o1_preview.query(query)\n",
+ "response_recursive_gpt4o_mini = recursive_query_engine_gpt4o_mini.query(query)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "----------------------RESPONSE WITH O1 MINI----------------------\n"
+ ]
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "Between 2021 and 2023, all four quarters each year exceeded $35,000 million in total revenues and other income. Additionally, both the first and second quarters of 2024 also surpassed this threshold."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "----------------------RESPONSE WITH O1 PREVIEW----------------------\n"
+ ]
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "Total revenues and other income exceeded $35,000 million in the following periods:\n",
+ "\n",
+ "**Quarters:**\n",
+ "- Q1 2021\n",
+ "- Q2 2021\n",
+ "- Q3 2021\n",
+ "- Q4 2021\n",
+ "- Q1 2022\n",
+ "- Q2 2022\n",
+ "- Q3 2022\n",
+ "- Q4 2022\n",
+ "- Q1 2023\n",
+ "- Q2 2023\n",
+ "- Q3 2023\n",
+ "- Q4 2023\n",
+ "- Q1 2024\n",
+ "- Q2 2024\n",
+ "\n",
+ "**Years:**\n",
+ "- 2020\n",
+ "- 2021\n",
+ "- 2022\n",
+ "- 2023"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "----------------------RESPONSE WITH GPT4O-MINI----------------------\n"
+ ]
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "The quarters and years where Total revenues and other income exceeded $35,000 million are:\n",
+ "\n",
+ "- Q1 2021: $36,492 million\n",
+ "- Q2 2021: $37,598 million\n",
+ "- Q3 2021: $37,867 million\n",
+ "- Q4 2021: $52,238 million\n",
+ "- Q1 2022: $51,220 million\n",
+ "- Q2 2022: $69,506 million\n",
+ "- Q3 2022: $57,809 million\n",
+ "- Q4 2022: $70,356 million\n",
+ "- Q1 2023: $56,951 million\n",
+ "- Q2 2023: $49,479 million\n",
+ "- Q3 2023: $54,016 million\n",
+ "- Q4 2023: $52,586 million\n",
+ "- Q1 2024: $49,961 million\n",
+ "- Q2 2024: $48,250 million"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "print(\"----------------------RESPONSE WITH O1 MINI----------------------\")\n",
+ "display(Markdown(f\"{response_recursive_o1}\"))\n",
+ "\n",
+ "print(\"----------------------RESPONSE WITH O1 PREVIEW----------------------\")\n",
+ "display(Markdown(f\"{response_recursive_o1_preview}\"))\n",
+ "\n",
+ "print(\"----------------------RESPONSE WITH GPT4O-MINI----------------------\")\n",
+ "display(Markdown(f\"{response_recursive_gpt4o_mini}\"))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Query 4\n",
+ "\n",
+ "Expected Answer:\n",
+ "\n",
+ "Q1"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "query = (\n",
+ " \"Which quarters in 2023 is Total comprehensive income greater than $9000 million?\"\n",
+ ")\n",
+ "\n",
+ "response_recursive_o1 = recursive_query_engine_o1.query(query)\n",
+ "response_recursive_o1_preview = recursive_query_engine_o1_preview.query(query)\n",
+ "response_recursive_gpt4o_mini = recursive_query_engine_gpt4o_mini.query(query)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "----------------------RESPONSE WITH O1 MINI----------------------\n"
+ ]
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "In 2023, the Total Comprehensive Income exceeded $9,000 million in the first quarter."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "----------------------RESPONSE WITH O1 PREVIEW----------------------\n"
+ ]
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "In the second quarter (Q2) of 2023, the total comprehensive income exceeded $9,000 million."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "----------------------RESPONSE WITH GPT4O-MINI----------------------\n"
+ ]
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "In 2023, the quarters where Total comprehensive income is greater than $9000 million are Q2 and Q4."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "print(\"----------------------RESPONSE WITH O1 MINI----------------------\")\n",
+ "display(Markdown(f\"{response_recursive_o1}\"))\n",
+ "\n",
+ "print(\"----------------------RESPONSE WITH O1 PREVIEW----------------------\")\n",
+ "display(Markdown(f\"{response_recursive_o1_preview}\"))\n",
+ "\n",
+ "print(\"----------------------RESPONSE WITH GPT4O-MINI----------------------\")\n",
+ "display(Markdown(f\"{response_recursive_gpt4o_mini}\"))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Query 5\n",
+ "\n",
+ "Expected Answer:\n",
+ "\n",
+ "$392 million"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "query = \"what is the replacement cost profit (loss) in oil production & operations URCP post taxation in Q1 of 2020?\"\n",
+ "\n",
+ "response_recursive_o1 = recursive_query_engine_o1.query(query)\n",
+ "response_recursive_o1_preview = recursive_query_engine_o1_preview.query(query)\n",
+ "response_recursive_gpt4o_mini = recursive_query_engine_gpt4o_mini.query(query)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "----------------------RESPONSE WITH O1 MINI----------------------\n"
+ ]
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "The replacement cost profit in oil production & operations URCP post taxation for the first quarter of 2020 was $392 million."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "----------------------RESPONSE WITH O1 PREVIEW----------------------\n"
+ ]
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "The replacement cost profit in oil production and operations after taxation in the first quarter of 2020 is $392 million."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "----------------------RESPONSE WITH GPT4O-MINI----------------------\n"
+ ]
+ },
+ {
+ "data": {
+ "text/markdown": [
+ "The replacement cost profit (loss) in oil production & operations URCP post taxation in Q1 of 2020 is (2,798) million dollars."
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "print(\"----------------------RESPONSE WITH O1 MINI----------------------\")\n",
+ "display(Markdown(f\"{response_recursive_o1}\"))\n",
+ "\n",
+ "print(\"----------------------RESPONSE WITH O1 PREVIEW----------------------\")\n",
+ "display(Markdown(f\"{response_recursive_o1_preview}\"))\n",
+ "\n",
+ "print(\"----------------------RESPONSE WITH GPT4O-MINI----------------------\")\n",
+ "display(Markdown(f\"{response_recursive_gpt4o_mini}\"))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Observation\n",
+ "\n",
+ "`o1-mini` and `o1-preview` outperformed `gpt-4o-mini` on Queries 2 and 5, but on Query 4 only `o1-mini` matched the expected answer (Q1); `o1-preview` returned Q2.\n",
+ "\n",
+ "Careful evaluation is necessary before using the o1 models for Excel RAG."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "llamaindex",
+ "language": "python",
+ "name": "llamaindex"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}