From 690efebab2e81b6cedcafd1940e9f36105f63e5b Mon Sep 17 00:00:00 2001
From: killian <63927363+KillianLucas@users.noreply.github.com>
Date: Wed, 13 Dec 2023 17:28:24 -0800
Subject: [PATCH] Split `ROADMAP.md` into actionable steps for contributors

---
 docs/ROADMAP.md | 41 +++++++++++++++++++++++++++++++++--------
 1 file changed, 33 insertions(+), 8 deletions(-)

diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md
index c2a544d348..80ea95de1d 100644
--- a/docs/ROADMAP.md
+++ b/docs/ROADMAP.md
@@ -3,27 +3,50 @@
 ## New features
 
 - [ ] Add anonymous, opt-in data collection → open-source dataset, like `--contribute_conversations`
-- [ ] Add `interpreter --async` command (that OI itself can use) — simply prints the final resulting output — nothing intermediary.
+  - [ ] Make that flag send each message to the server
+  - [ ] Set up a receiving Replit server
+  - [ ] Add an option to send previous conversations
+  - [ ] Make the messaging really strong re: "We will be saving this, we will redact PII, we will open-source the dataset so we (and others) can train code-interpreting models"
+- [ ] Let OI use OI. Add an `interpreter.chat(async=True)` bool (`async` is a reserved word in Python, so the final flag will need another name). OI can use this to open OI on a new thread
+  - [ ] Also add `interpreter.get_last_assistant_messages()` to return the last assistant messages.
 - [ ] Allow for limited functions (`interpreter.functions`) using regex
-- [ ] Allow for custom llms (`interpreter.llm`) which conform to some class, properties like `.supports_functions` and `.supports_vision`
+  - [ ] If `interpreter.functions != []`:
+    - [ ] Set `interpreter.languages` to only use Python
+    - [ ] Use regex to ensure the output of code blocks conforms to just using those functions + other Python basics
+- [ ] Allow for custom LLMs (to be stored in `interpreter.llm`) which conform to some class
+  - [ ] Should be a generator that can be treated exactly like the OpenAI streaming API
+  - [ ] Has attributes `.supports_function_calling`, `.supports_vision`, and `.context_window`
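+  - A rough sketch of what such a class could look like (the class name, method name, and chunk shape below are illustrative, not a settled API):
+
+    ```python
+    from typing import Any, Dict, Generator, List
+
+    class CustomLLM:
+        """Example object a user could assign to `interpreter.llm`."""
+
+        supports_function_calling = False
+        supports_vision = False
+        context_window = 8000
+
+        def run(self, messages: List[Dict[str, Any]]) -> Generator[Dict[str, Any], None, None]:
+            # Yield chunks shaped like OpenAI's streaming deltas, so OI can
+            # consume this just like the OpenAI streaming API.
+            yield {"choices": [{"delta": {"content": "Hello"}}]}
+            yield {"choices": [{"delta": {"content": " world"}}]}
+
+    # Hypothetical usage: interpreter.llm = CustomLLM()
+    ```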
 - [ ] (Maybe) Allow for a custom embedding function (`interpreter.embed`) which will let us do semantic search
 - [ ] Allow for custom languages (`interpreter.computer.languages.append(class_that_conforms_to_base_language)`)
+  - [x] Make it so function calling dynamically uses the languages in `interpreter.computer.languages`
 - [ ] Add a skill library, or maybe expose post processing on code, so we can save functions for later & semantically search docstrings. Keep this minimal!
-- [ ] Improve partnership with `languagetools`
-- [ ] Allow for integrations
-- [ ] Expand "safe mode" to have proper, simple Docker support
+  - [ ]
+  - [ ] If `interpreter.skill_library == True`, we should add a decorator above all functions, then show OI how to search its skill library (see the sketch after this patch)
+- [ ] Allow for integrations somehow
+- [ ] Expand "safe mode" to have proper, simple Docker support, or maybe Cosmopolitan Libc
 - [ ] Make it so core can be run elsewhere from terminal package — perhaps split over HTTP (this would make docker easier too)
 
 ## Future-proofing
 
-- [ ] Figure out how to run us on [GAIA](https://huggingface.co/gaia-benchmark) and use a subset of that as our tests / optimization framework
-- [ ] Add more language models to tests (use Replicate, ask LiteLLM how they made their "mega key" to many different LLM providers)
+- [ ] Really good tests / optimization framework, to be run less frequently than GitHub Actions tests
+  - [ ] Figure out how to run us on [GAIA](https://huggingface.co/gaia-benchmark) and use a subset of that as our tests / optimization framework
+    - [ ] How do we just get the questions out of this thing?
+    - [ ] How do we assess whether or not OI has solved the task?
+  - [ ] Loop over GAIA, using a different language model every time (use Replicate, then ask LiteLLM how they made their "mega key" to many different LLM providers)
+  - [ ] Loop over that ↑ using a different prompt each time. Which prompt is best across all LLMs?
+  - [ ] (Future future) Use GPT-4 to assess each result, explaining each failure. Summarize. Send it all to GPT-4 + our prompt. Let it redesign the prompt, given the failures; rinse and repeat
 - [ ] Use Anthropic function calling
-- [ ] Stateless core python package (free of config settings) config passed in by TUI
+- [ ] Stateless (as in, doesn't use the application directory) core Python package. All `appdir` stuff should be only for the TUI
+  - [ ] `interpreter.__dict__` = a dict derived from the config is how the Python package should be set up, and this should come from the TUI. `interpreter` should not know about the config
+  - [ ] Move conversation storage out of the core and into the TUI. When we exit or error, save messages the same way the core currently does
 - [ ] Local and vision should be reserved for TUI, more granular settings for Python
+  - [ ] Rename `interpreter.local` → `interpreter.offline`, and implement ↑ custom LLMs with a `.supports_vision` attribute instead of `interpreter.vision`
 - [ ] Further split TUI from core (some utils still reach across)
 - [ ] Remove `procedures` (there must be a better way)
 - [ ] Better storage of different model keys in TUI / config file. All keys, to multiple providers, should be stored in there. Easy switching
+  - [ ] Automatically migrate users from the old config to the new config, and display a message about this
+- [ ] On update, check for a new system message and ask the user to overwrite theirs, or only let users pass in "custom instructions" which add to our system message
+  - [ ] I think we could have a config value like `system_message_version`. If `system_message_version` is below the current version, ask the user if we can overwrite it with the default config system message of that version
 
 ## Documentation
 
@@ -31,6 +54,8 @@
 - [ ] **Easy 🟢** Require documentation for PRs
 - [ ] Work with Mintlify to translate docs
 - [ ] Better comments throughout the package (they're like docs for contributors)
+- [ ] Document the New Computer Update
+- [ ] Make a migration guide for the New Computer Update
 
 ## Completed
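
A minimal sketch of the skill-library decorator idea from the roadmap above, assuming a hypothetical `skill` decorator and an in-memory registry searched over docstrings. None of these names are part of the current API, and a real version could swap the keyword match for semantic search over the docstrings:

```python
from typing import Callable, Dict, List

# Hypothetical in-memory skill library; a real implementation would persist skills to disk.
SKILL_LIBRARY: Dict[str, Callable] = {}

def skill(func: Callable) -> Callable:
    """Register a function in the skill library so OI can rediscover it later."""
    SKILL_LIBRARY[func.__name__] = func
    return func

def search_skills(query: str) -> List[str]:
    """Naive keyword search over docstrings; semantic search could replace this."""
    query = query.lower()
    return [
        name
        for name, func in SKILL_LIBRARY.items()
        if func.__doc__ and query in func.__doc__.lower()
    ]

@skill
def resize_image(path: str, width: int, height: int) -> None:
    """Resize an image on disk to the given width and height."""
    ...

print(search_skills("resize an image"))  # ['resize_image']
```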