
ell serialization refactor / distributed 2 #362

Draft
wants to merge 45 commits into
base: main
from


alex-dixon (Contributor) commented Nov 2, 2024

Rough overview:

  • Production environments may benefit from an architecture where LMP versions and invocation traces are written from multiple machines that do not share a file system.
  • Ideally, these setups should not miss out on any of ell’s features, including real-time updates in ell studio.
  • Data stores like Postgres do not offer an easy way to subscribe to real-time changes. We could build an API on top of Postgres change data capture (CDC). I’ve worked with this extensively, implementing materialized views from table changes with Debezium, Kafka/Redpanda, and Apache Spark. It is doable, but in the end it would support real-time updates for at most one data store. Ideally, ell and its feature set should be available with other data stores, including NoSQL ones. Wiring studio to an event bus lets us “read our own writes” without getting into the particulars of each data store’s real-time options. This makes new data stores much easier to integrate, because the real-time component is solved for all of them by default.
  • The event bus itself is pluggable, in the same way that SQLite and Postgres are pluggable storage backends.
  • We establish patterns for adding new functionality via conditional imports of modules and ell’s --extras. This removes barriers to adoption and encourages what are essentially single-file contributions to support additional event buses or data stores. See the celery library for an example of this pattern scaling to dozens of backends: https://github.com/celery/celery/tree/main/celery/backends
  • We continue separating ell studio from the core and lay groundwork for async support by introducing a serialization interface that is fully async, does not take SQLModel models as input, and does not necessarily depend on SQLAlchemy. Of all the changes, this is the only one that touches the core, and it is probably the hardest to get right. The interfaces are written and ready for use, but I have held off integrating them until we reach consensus on the code here.
  • Even with serialization code added to the core, ell users should notice no changes when updating the library, except perhaps some new configuration options. All changes are internal.
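The conditional-import pattern for pluggable backends can be sketched roughly as follows. This is a minimal illustration, not this PR's code: the registry, the `memory://` and `redis://` URL schemes, and module paths such as `ell_example_backends.redis` are hypothetical names chosen for the example.

```python
import importlib
from typing import Mapping
from urllib.parse import urlparse

# Hypothetical mapping from URL scheme to "module:ClassName" for each event-bus backend.
# Each backend lives in its own module, so adding one is roughly a single-file change.
PUBSUB_BACKENDS = {
    "memory": "ell_example_backends.memory:InMemoryPubSub",
    "redis": "ell_example_backends.redis:RedisPubSub",
}


class MissingBackendError(ImportError):
    """Raised when a backend's optional dependencies are not installed."""


def resolve_backend(url: str, backends: Mapping[str, str] = PUBSUB_BACKENDS):
    """Import and return the backend class for a URL like 'redis://localhost:6379'."""
    scheme = urlparse(url).scheme
    try:
        target = backends[scheme]
    except KeyError:
        raise ValueError(f"Unknown event bus scheme: {scheme!r}") from None
    module_name, _, class_name = target.partition(":")
    try:
        # Import lazily: only users who configure this backend need its extra installed.
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise MissingBackendError(
            f"The {scheme!r} event bus backend requires optional dependencies "
            "that are not installed."
        ) from exc
    return getattr(module, class_name)
```

Because the import happens inside `resolve_backend`, importing the core package stays cheap and never fails just because, for example, a Redis client is not installed.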

Subscriber = WebSocket


class PubSub(ABC):
Contributor Author

Needs documentation.
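Since `PubSub` still needs documentation, here is one possible shape for a documented interface, plus an in-memory implementation that could serve tests or single-process setups. The method names (`publish`, `subscribe`, `unsubscribe`) and the `Subscriber` protocol with an async `send_json` method are assumptions for illustration; in the diff, `Subscriber` is aliased to FastAPI's `WebSocket`, which does provide an async `send_json`.

```python
import asyncio
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Any, Dict, Protocol, Set


class Subscriber(Protocol):
    """Anything that can receive JSON messages, e.g. a Starlette/FastAPI WebSocket."""

    async def send_json(self, data: Any) -> None: ...


class PubSub(ABC):
    """Fans out events (for example, new invocations) from writers to studio clients.

    Implementations may be in-process or backed by an external broker.
    """

    @abstractmethod
    async def publish(self, topic: str, message: Any) -> None:
        """Deliver `message` to every subscriber of `topic`."""

    @abstractmethod
    def subscribe(self, topic: str, subscriber: Subscriber) -> None:
        """Start delivering messages published on `topic` to `subscriber`."""

    @abstractmethod
    def unsubscribe(self, topic: str, subscriber: Subscriber) -> None:
        """Stop delivering `topic` messages to `subscriber`; a no-op if not subscribed."""


class InMemoryPubSub(PubSub):
    """Single-process implementation that only delivers on exact topic matches."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, Set[Subscriber]] = defaultdict(set)

    async def publish(self, topic: str, message: Any) -> None:
        # Snapshot the subscribers so they can unsubscribe while messages are sent.
        targets = list(self._subscribers.get(topic, ()))
        await asyncio.gather(*(s.send_json(message) for s in targets))

    def subscribe(self, topic: str, subscriber: Subscriber) -> None:
        self._subscribers[topic].add(subscriber)

    def unsubscribe(self, topic: str, subscriber: Subscriber) -> None:
        self._subscribers.get(topic, set()).discard(subscriber)
```

Keeping delivery behind this interface is what would let a broker-backed implementation replace the in-memory one without studio changing.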


MAX_TOPIC_LENGTH = 65535

class TopicMatcher:
Contributor Author

Document this.
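`TopicMatcher` also needs documentation. Its matching rules are not visible in this excerpt, so the sketch below assumes shell-style wildcard patterns (via the standard library's `fnmatch.fnmatchcase`) and uses `MAX_TOPIC_LENGTH` to reject empty or oversized topics. The wildcard semantics and the method names `add`, `remove`, and `match` are assumptions.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, Set

MAX_TOPIC_LENGTH = 65535


class TopicMatcher:
    """Maps topic patterns to subscribers and finds who should receive a topic.

    Patterns use fnmatch-style wildcards (an assumption for this sketch):
    "lmp/*" matches "lmp/abc" and, because `*` also matches "/", "lmp/abc/def".
    Subscribers must be hashable.
    """

    def __init__(self) -> None:
        self._patterns: Dict[str, Set[Any]] = {}

    @staticmethod
    def _validate(topic: str) -> None:
        if not topic or len(topic) > MAX_TOPIC_LENGTH:
            raise ValueError(
                f"Topic length must be between 1 and {MAX_TOPIC_LENGTH} characters"
            )

    def add(self, pattern: str, subscriber: Any) -> None:
        self._validate(pattern)
        self._patterns.setdefault(pattern, set()).add(subscriber)

    def remove(self, pattern: str, subscriber: Any) -> None:
        subscribers = self._patterns.get(pattern)
        if subscribers is not None:
            subscribers.discard(subscriber)
            if not subscribers:
                del self._patterns[pattern]  # drop patterns nobody listens to

    def match(self, topic: str) -> Set[Any]:
        """Return the set of subscribers whose pattern matches `topic` (case-sensitive)."""
        self._validate(topic)
        matched: Set[Any] = set()
        for pattern, subscribers in self._patterns.items():
            if fnmatchcase(topic, pattern):
                matched.update(subscribers)
        return matched
```

Scanning every pattern is linear in the number of patterns; a trie keyed on topic segments would scale better if subscriptions grow large.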

app = FastAPI(
title="ell api",
description="ell api server",
version="0.1.0",
Contributor Author

This should be the ell version.
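One way to follow this suggestion without hard-coding a version string is to read the installed package metadata with the standard library. This assumes the distribution is named `ell-ai` (the name ell is published under on PyPI); the helper name `get_ell_version` and the `"unknown"` fallback are illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def get_ell_version(distribution: str = "ell-ai") -> str:
    """Return the installed version of the given distribution, or "unknown"
    when it is not installed (e.g. running from an uninstalled source checkout)."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return "unknown"
```

The app would then be created with `FastAPI(title="ell api", description="ell api server", version=get_ell_version())`.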



# Load the image using PIL
big_picture = Image.open(os.path.join(os.path.dirname(__file__), "bigpicture.jpg"))
Contributor Author

Maybe download the big picture instead of keeping it in the repo.
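A minimal download-and-cache helper could replace the checked-in `bigpicture.jpg`. Where the image would be hosted is not stated in this PR, so `BIG_PICTURE_URL` below is a placeholder, and the helper name `fetch_cached` and the cache directory are arbitrary choices for the sketch.

```python
import os
import urllib.request
from urllib.parse import urlparse

# Placeholder: the real hosting location of the image is not specified in this PR.
BIG_PICTURE_URL = "https://example.com/bigpicture.jpg"


def fetch_cached(url: str, cache_dir: str) -> str:
    """Download `url` into `cache_dir` once and return the local file path.

    Later calls reuse the cached file instead of hitting the network.
    """
    filename = os.path.basename(urlparse(url).path) or "download"
    path = os.path.join(cache_dir, filename)
    if not os.path.exists(path):
        os.makedirs(cache_dir, exist_ok=True)
        # Download under a temporary name so an interrupted download is never cached.
        tmp_path = path + ".part"
        urllib.request.urlretrieve(url, tmp_path)
        os.replace(tmp_path, path)
    return path
```

The example would then open `Image.open(fetch_cached(BIG_PICTURE_URL, os.path.join(os.path.dirname(__file__), ".cache")))`, with the cache directory added to `.gitignore`.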

@@ -97,7 +97,7 @@ def wrapper(

# Determine the type annotation
if param.annotation == inspect.Parameter.empty:
raise ValueError(f"Parameter {param_name} has no type annotation, and cannot be converted into a tool schema for OpenAI and other provisders. Should OpenAI produce a string or an integer, etc, for this parameter?")
raise ValueError(f"Parameter {param_name} has no type annotation, and cannot be converted into a tool schema for OpenAI and other providers. Should OpenAI produce a string or an integer, etc, for this parameter?")
Contributor Author

Something like

Suggested change
raise ValueError(f"Parameter {param_name} has no type annotation, and cannot be converted into a tool schema for OpenAI and other providers. Should OpenAI produce a string or an integer, etc, for this parameter?")
raise ValueError(
    f"The {tool_name} tool parameter {param_name} needs a type annotation.\n"
    "Example:\n"
    f"def {tool_name}({param_name}: str):\n"
    f"    ^^^ Add the type of {param_name} here\n"
    "Tool function annotations help ell create a tool schema for OpenAI and other providers."
)

Contributor Author

alex-dixon commented Nov 19, 2024

Last line of the error message should be: More info: https://docs.ell.so/core_concepts/tool_usage.html#defining-tools
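Combining the suggested wording with the request for a docs link, the message could be built by a small helper. `missing_annotation_error` and `check_tool_annotations` are hypothetical names for illustration, not functions in ell; the check mirrors the `inspect.Parameter.empty` test in the diff.

```python
import inspect
from typing import Callable

TOOL_DOCS_URL = "https://docs.ell.so/core_concepts/tool_usage.html#defining-tools"


def missing_annotation_error(tool_name: str, param_name: str) -> ValueError:
    """Build the error for a tool parameter without a type annotation."""
    message = (
        f"The {tool_name} tool parameter {param_name} needs a type annotation.\n"
        "Example:\n"
        f"    def {tool_name}({param_name}: str):\n"
        "Tool function annotations help ell create a tool schema for OpenAI and other providers.\n"
        f"More info: {TOOL_DOCS_URL}"  # docs link is the last line, as requested
    )
    return ValueError(message)


def check_tool_annotations(fn: Callable) -> None:
    """Raise if any parameter of `fn` lacks a type annotation."""
    for param_name, param in inspect.signature(fn).parameters.items():
        if param.annotation is inspect.Parameter.empty:
            raise missing_annotation_error(fn.__name__, param_name)
```

Returning the exception instead of raising it inside the helper keeps the `raise` at the call site, so tracebacks point at the validation code.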

@@ -1,6 +1,9 @@
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional, Tuple, Type, Union, cast

Contributor Author

Can be removed

src/ell/serialize/http.py (two outdated, resolved comment threads)
# """Serializes ell objects to json for writing to the database or wire protocols"""
# return json.dumps(
# pydantic_ltype_aware_cattr.unstructure(obj),
# sort_keys=True, default=repr, ensure_ascii=False)
Contributor Author

Address this. Will had a comment about serialize_object elsewhere.
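For reference, the commented-out body above amounts to the following stdlib-only sketch. The `unstructure` parameter is illustrative; in the original, that step is `pydantic_ltype_aware_cattr.unstructure`, an ell-internal converter that is not reproduced here.

```python
import json
from typing import Any, Callable


def serialize_object(obj: Any, unstructure: Callable[[Any], Any] = lambda o: o) -> str:
    """Serialize an object to JSON for writing to the database or wire protocols.

    `unstructure` turns rich objects into plain dicts/lists first (the original
    code used an ell-specific converter for this step).
    """
    return json.dumps(
        unstructure(obj),
        sort_keys=True,      # stable key order, so equal objects give equal strings
        default=repr,        # fall back to repr() for values json cannot encode
        ensure_ascii=False,  # keep non-ASCII characters instead of \u escapes
    )
```

Note that `default=repr` makes serialization lossy for unknown types: a deserializer cannot reconstruct them from their repr.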

from typing import Any
import ell.types.serialize

def utc_now() -> datetime:
Contributor Author

See if this already exists elsewhere, and maybe rename it to current_timestamp or something less awkward.
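If no existing helper turns up, a timezone-aware version under the proposed name might look like this; `current_timestamp` is only the name suggested in the comment, and returning UTC is an assumption based on the original `utc_now`.

```python
from datetime import datetime, timezone


def current_timestamp() -> datetime:
    """Return the current time as a timezone-aware datetime in UTC.

    Prefer this over datetime.utcnow(), which returns a naive datetime
    and is deprecated as of Python 3.12.
    """
    return datetime.now(timezone.utc)
```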
