From 690efebab2e81b6cedcafd1940e9f36105f63e5b Mon Sep 17 00:00:00 2001
From: killian <63927363+KillianLucas@users.noreply.github.com>
Date: Wed, 13 Dec 2023 17:28:24 -0800
Subject: [PATCH] Split `ROADMAP.md` into actionable steps for contributors

---
 docs/ROADMAP.md | 41 +++++++++++++++++++++++++++++++++--------
 1 file changed, 33 insertions(+), 8 deletions(-)

diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md
index c2a544d348..80ea95de1d 100644
--- a/docs/ROADMAP.md
+++ b/docs/ROADMAP.md
@@ -3,27 +3,50 @@
 ## New features
 
 - [ ] Add anonymous, opt-in data collection → open-source dataset, like `--contribute_conversations`
-- [ ] Add `interpreter --async` command (that OI itself can use) — simply prints the final resulting output — nothing intermediary.
+  - [ ] Make that flag send each message to the server
+  - [ ] Set up a receiving Replit server
+  - [ ] Add an option to send previous conversations
+  - [ ] Make the messaging really strong re: "We will be saving this, we will redact PII, we will open-source the dataset so we (and others) can train code-interpreting models"
+- [ ] Let OI use OI. Add an `interpreter.chat(async=True)` bool (`async` is a reserved word in Python, so the final flag will need another name). OI can use this to open OI on a new thread
+  - [ ] Also add `interpreter.get_last_assistant_messages()` to return the last assistant messages.
 - [ ] Allow for limited functions (`interpreter.functions`) using regex
-- [ ] Allow for custom llms (`interpreter.llm`) which conform to some class, properties like `.supports_functions` and `.supports_vision`
+  - [ ] If `interpreter.functions != []`:
+    - [ ] Set `interpreter.languages` to only use Python
+    - [ ] Use regex to ensure the output of code blocks conforms to just using those functions + other Python basics
+- [ ] Allow for custom LLMs (to be stored in `interpreter.llm`) which conform to some class
+  - [ ] Should be a generator that can be treated exactly like the OpenAI streaming API
+  - [ ] Has attributes `.supports_function_calling`, `.supports_vision`, and `.context_window`
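+  - A rough sketch of what such a class could look like (the class name, method name, and chunk shape below are illustrative, not a settled API):
+
+    ```python
+    from typing import Any, Dict, Generator, List
+
+    class CustomLLM:
+        """Example object a user could assign to `interpreter.llm`."""
+
+        supports_function_calling = False
+        supports_vision = False
+        context_window = 8000
+
+        def run(self, messages: List[Dict[str, Any]]) -> Generator[Dict[str, Any], None, None]:
+            # Yield chunks shaped like OpenAI's streaming deltas, so OI can
+            # consume this just like the OpenAI streaming API.
+            yield {"choices": [{"delta": {"content": "Hello"}}]}
+            yield {"choices": [{"delta": {"content": " world"}}]}
+
+    # Hypothetical usage: interpreter.llm = CustomLLM()
+    ```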
 - [ ] (Maybe) Allow for a custom embedding function (`interpreter.embed`) which will let us do semantic search
 - [ ] Allow for custom languages (`interpreter.computer.languages.append(class_that_conforms_to_base_language)`)
+  - [x] Make it so function calling dynamically uses the languages in `interpreter.computer.languages`
 - [ ] Add a skill library, or maybe expose post processing on code, so we can save functions for later & semantically search docstrings. Keep this minimal!
-- [ ] Improve partnership with `languagetools`
-- [ ] Allow for integrations
-- [ ] Expand "safe mode" to have proper, simple Docker support
+  - [ ]
+  - [ ] If `interpreter.skill_library == True`, we should add a decorator above all functions, then show OI how to search its skill library (see the sketch after this patch)
+- [ ] Allow for integrations somehow
+- [ ] Expand "safe mode" to have proper, simple Docker support, or maybe Cosmopolitan Libc
 - [ ] Make it so core can be run elsewhere from terminal package — perhaps split over HTTP (this would make docker easier too)
 
 ## Future-proofing
 
-- [ ] Figure out how to run us on [GAIA](https://huggingface.co/gaia-benchmark) and use a subset of that as our tests / optimization framework
-- [ ] Add more language models to tests (use Replicate, ask LiteLLM how they made their "mega key" to many different LLM providers)
+- [ ] Really good tests / optimization framework, to be run less frequently than GitHub Actions tests
+  - [ ] Figure out how to run us on [GAIA](https://huggingface.co/gaia-benchmark) and use a subset of that as our tests / optimization framework
+    - [ ] How do we just get the questions out of this thing?
+    - [ ] How do we assess whether or not OI has solved the task?
+  - [ ] Loop over GAIA, using a different language model every time (use Replicate, then ask LiteLLM how they made their "mega key" to many different LLM providers)
+  - [ ] Loop over that ↑ using a different prompt each time. Which prompt is best across all LLMs?
+  - [ ] (Future future) Use GPT-4 to assess each result, explaining each failure. Summarize. Send it all to GPT-4 + our prompt. Let it redesign the prompt, given the failures; rinse and repeat
 - [ ] Use Anthropic function calling
-- [ ] Stateless core python package (free of config settings) config passed in by TUI
+- [ ] Stateless (as in, doesn't use the application directory) core Python package. All `appdir` stuff should be only for the TUI
+  - [ ] `interpreter.__dict__` = a dict derived from the config is how the Python package should be set up, and this should come from the TUI. `interpreter` should not know about the config
+  - [ ] Move conversation storage out of the core and into the TUI. When we exit or error, save messages the same way the core currently does
 - [ ] Local and vision should be reserved for TUI, more granular settings for Python
+  - [ ] Rename `interpreter.local` → `interpreter.offline`, and implement ↑ custom LLMs with a `.supports_vision` attribute instead of `interpreter.vision`
 - [ ] Further split TUI from core (some utils still reach across)
 - [ ] Remove `procedures` (there must be a better way)
 - [ ] Better storage of different model keys in TUI / config file. All keys, to multiple providers, should be stored in there. Easy switching
+  - [ ] Automatically migrate users from the old config to the new config, and display a message about this
+- [ ] On update, check for a new system message and ask the user to overwrite theirs, or only let users pass in "custom instructions" which add to our system message
+  - [ ] I think we could have a config value like `system_message_version`. If `system_message_version` is below the current version, ask the user if we can overwrite it with the default config system message of that version
 
 ## Documentation
 
@@ -31,6 +54,8 @@
 - [ ] **Easy 🟢** Require documentation for PRs
 - [ ] Work with Mintlify to translate docs
 - [ ] Better comments throughout the package (they're like docs for contributors)
+- [ ] Document the New Computer Update
+- [ ] Make a migration guide for the New Computer Update
 
 ## Completed
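
A minimal sketch of the skill-library decorator idea from the roadmap above, assuming a hypothetical `skill` decorator and an in-memory registry searched over docstrings. None of these names are part of the current API, and a real version could swap the keyword match for semantic search over the docstrings:

```python
from typing import Callable, Dict, List

# Hypothetical in-memory skill library; a real implementation would persist skills to disk.
SKILL_LIBRARY: Dict[str, Callable] = {}

def skill(func: Callable) -> Callable:
    """Register a function in the skill library so OI can rediscover it later."""
    SKILL_LIBRARY[func.__name__] = func
    return func

def search_skills(query: str) -> List[str]:
    """Naive keyword search over docstrings; semantic search could replace this."""
    query = query.lower()
    return [
        name
        for name, func in SKILL_LIBRARY.items()
        if func.__doc__ and query in func.__doc__.lower()
    ]

@skill
def resize_image(path: str, width: int, height: int) -> None:
    """Resize an image on disk to the given width and height."""
    ...

print(search_skills("resize an image"))  # ['resize_image']
```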