feat(llm-observability): add langchain integration #159
Conversation
Force-pushed from 5d04873 to 2d6513e, then from 3f21b1f to d1fa606.
A few comments, but overall looking solid. Great test coverage.
```diff
@@ -51,10 +51,10 @@ jobs:
       with:
         fetch-depth: 1

       - name: Set up Python 3.8
```
Why the Python version change? Just wanna be sure we aren't silently dropping older Python versions. Though 3.8 should be fine to drop as it's EoL – we should just be explicit about that if it happens. (To be honest, I'm also surprised we don't use a matrix of Python versions for this Actions job.)
`langchain-community` requires Python >=3.9, so CI was failing on 3.8. It's listed under test requirements, so no extra package will be installed for people not using the AI integration.
posthog/ai/openai/openai_async.py (outdated)
```python
try:
    import openai
except ImportError:
    raise ModuleNotFoundError("Please install the OpenAI SDK to use this feature: 'pip install openai'")

import openai.resources
```
Good catch. For maximum readability, it would be great to put both `openai` imports under that `try`.
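Something like this, presumably (a sketch of the suggested consolidation):

```python
try:
    import openai
    import openai.resources  # both imports guarded by the same try, as suggested
except ImportError:
    raise ModuleNotFoundError("Please install the OpenAI SDK to use this feature: 'pip install openai'")
```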
posthog/ai/openai/openai.py (outdated)
```python
try:
    import openai
except ImportError:
    raise ModuleNotFoundError("Please install the OpenAI SDK to use this feature: 'pip install openai'")

import openai.resources
```
Same as with the other `openai` imports.
posthog/ai/langchain/callbacks.py (outdated)
```python
RunStorage = Dict[UUID, RunMetadata]


class PosthogCallbackHandler(BaseCallbackHandler):
```
The `Posthog` capitalization really bugs me 😅 But I also see it's what we're already using in the Python SDK. Maybe to avoid the `PostHog` vs. `Posthog` inconsistency, we can export just `CallbackHandler` – same as Langfuse (`from langfuse.callback import CallbackHandler`)?
Good call. I followed the `Client` example, but let's rename it.
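The rename could be as simple as re-exporting under the new name (a sketch; the module layout is an assumption):

```python
# posthog/ai/langchain/__init__.py (hypothetical layout)
from posthog.ai.langchain.callbacks import CallbackHandler

__all__ = ["CallbackHandler"]
```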
posthog/ai/langchain/callbacks.py (outdated)
```python
from posthog.ai.utils import get_model_params
from posthog.client import Client

PosthogProperties = Dict[str, Any]
```
This alias is not super useful – I think just `Dict[str, Any]` in function signatures might actually be a bit more obvious for SDK users.
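I.e., something like this (an illustrative stub, not the actual class):

```python
from typing import Any, Dict, Optional

class CallbackHandler:
    # Illustrative stub: the alias is inlined so SDK users see the plain type.
    def __init__(self, properties: Optional[Dict[str, Any]] = None):
        self._properties = properties or {}
```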
posthog/ai/langchain/callbacks.py (outdated)
```python
self._properties = properties
self._runs = {}
self._parent_tree = {}
self.log = logging.getLogger("posthog")
```
Looks like this can live at the top level of the file, given we're going to be reusing the same logger instance every time.
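E.g. (sketch of the suggested refactor):

```python
import logging

# One module-level logger, shared by every handler instance.
log = logging.getLogger("posthog")
```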
"$ai_model": run.get("model"), | ||
"$ai_model_parameters": run.get("model_params"), | ||
"$ai_input": run.get("messages"), | ||
"$ai_output": {"choices": output}, |
100% agree, @k11kirky any objections?
"$ai_output_tokens": output_tokens, | ||
"$ai_latency": latency, | ||
"$ai_trace_id": trace_id, | ||
"$ai_posthog_properties": self._properties, |
Why not unpack instead?

```diff
-            "$ai_posthog_properties": self._properties,
+            **self._properties,
```
I think it makes sense to unpack them, since then neither customers nor we need to unpack nested JSON values. I've followed the data model. @k11kirky what do you think?
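To illustrate the difference (hypothetical values):

```python
properties = {"plan": "scale"}

# Nested: consumers have to dig the custom properties out of a JSON value.
nested = {"$ai_trace_id": "t-1", "$ai_posthog_properties": properties}

# Unpacked: custom properties land at the top level of the event.
flattened = {"$ai_trace_id": "t-1", **properties}
```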
```python
output = [_extract_raw_esponse(generation) for generation in generation_result]

event_properties = {
    "$ai_provider": run.get("provider"),
```
We're missing `$ai_request_url` – is this metadata we have access to here?
I haven't seen that one. Let me check; we should have this information.
`base_url` is retrievable. However, the actual endpoint URL is tricky. I think the base API URL matters the most, so we can use it for the MVP; otherwise, we should postpone this for LangChain.
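For example, something along these lines (a hypothetical helper; the exact keys LangChain puts in its serialized payload vary by provider):

```python
from typing import Any, Dict, Optional

def _get_base_url(serialized: Dict[str, Any]) -> Optional[str]:
    # Hypothetical sketch: probe the kwargs LangChain serializes for a base URL.
    kwargs = (serialized or {}).get("kwargs", {})
    for key in ("base_url", "openai_api_base", "azure_endpoint"):
        if kwargs.get(key):
            return str(kwargs[key])
    return None
```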
posthog/ai/langchain/callbacks.py (outdated)

```python
def _get_http_status(error: BaseException) -> int:
    # OpenAI: https://github.com/anthropics/anthropic-sdk-python/blob/main/src/anthropic/_exceptions.py
```
Looks like the wrong link – the comment says OpenAI but points at the Anthropic SDK.
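A sketch of the corrected comment plus a typical status extraction (probing the `status_code` attribute is an assumption about the actual implementation):

```python
def _get_http_status(error: BaseException) -> int:
    # OpenAI: https://github.com/openai/openai-python/blob/main/src/openai/_exceptions.py
    # Anthropic: https://github.com/anthropics/anthropic-sdk-python/blob/main/src/anthropic/_exceptions.py
    # Both SDKs expose status_code on their APIStatusError subclasses.
    status = getattr(error, "status_code", None)
    return status if isinstance(status, int) else 0
```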
"$ai_posthog_properties": self._properties, | ||
} | ||
self._client.capture( | ||
distinct_id=self._distinct_id or trace_id, |
So this will result in lots of persons, as we discussed. Does error tracking do something useful here that we can reuse?
They explicitly mark the event as personless:

```python
if self._distinct_id is None:
    event_properties["$process_person_profile"] = False
```

Working on that now.
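Applied to this handler, that might look roughly like this (the `$ai_generation` event name and the helper's wiring are assumptions):

```python
def _capture_generation(client, distinct_id, trace_id, properties):
    # Hypothetical wrapper mirroring the handler's capture call.
    event_properties = {"$ai_trace_id": trace_id, **properties}
    if distinct_id is None:
        # No known user: keep the event personless instead of minting a person per trace.
        event_properties["$process_person_profile"] = False
    client.capture(
        distinct_id=distinct_id or trace_id,
        event="$ai_generation",  # assumed event name
        properties=event_properties,
    )
```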
Problem

Implements a callback handler for LangChain, which can be used as a global or a local instance; both usage modes are sketched below.
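A sketch of the two modes (constructor arguments are inferred from the snippets in this review and may not match the final API; per the discussion above, the handler may end up exported as `CallbackHandler`):

```python
from langchain_openai import ChatOpenAI
from posthog import Posthog
from posthog.ai.langchain import PosthogCallbackHandler

posthog = Posthog("<ph_project_api_key>", host="https://us.i.posthog.com")
handler = PosthogCallbackHandler(client=posthog, distinct_id="user_123")

# Global instance: attached at construction, traces every call through this model.
model = ChatOpenAI(model="gpt-4o-mini", callbacks=[handler])

# Local instance: passed per invocation via the runnable config.
model.invoke("Hello!", config={"callbacks": [handler]})
```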
Changes

The wrapped client is importable as `from posthog.ai.openai import OpenAI`, and the handler as `from posthog.ai.langchain import PosthogCallbackHandler`.
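For instance (a sketch; the `posthog_client` argument name is an assumption):

```python
from posthog import Posthog
from posthog.ai.openai import OpenAI

posthog = Posthog("<ph_project_api_key>", host="https://us.i.posthog.com")
client = OpenAI(posthog_client=posthog)  # assumed wiring of the PostHog client

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
```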