
feat(llm-observability): add langchain integration #159

Merged
skoob13 merged 11 commits into master from feat/llm-observability-v0.1-langchain on Jan 13, 2025

Conversation

skoob13
Contributor

@skoob13 skoob13 commented Jan 10, 2025

Problem

Implements a callback handler for LangChain, which can be used as a global or local instance.

Global instance:

from langchain_openai import ChatOpenAI
from posthog.ai.langchain import PosthogCallbackHandler

callback_handler = PosthogCallbackHandler(posthog_client)

def call_model():
    res = ChatOpenAI().invoke(["Foo"], callbacks=[callback_handler])

Local instance:

def call_model():
    # Construct a fresh handler for this call instead of sharing one.
    callback_handler = PosthogCallbackHandler(posthog_client)
    res = ChatOpenAI().invoke(["Foo"], callbacks=[callback_handler])

Changes

  • Add a callback handler.
  • Reorganize imports: the OpenAI wrapper is now imported as from posthog.ai.openai import OpenAI and the handler as from posthog.ai.langchain import PosthogCallbackHandler.
  • Add tests for the integration; the handler supports LangChain>=0.2.0, including the new 0.3.0.
  • Bump the CI Python version to 3.9, the minimum version supported by LangChain. This won't affect users who aren't using the AI wrappers.
  • Configure setup.py to install the integration's optional packages (see the sketch below).
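For reference, optional dependencies in setup.py are usually declared via extras_require. A minimal sketch (the extra name and version pin are illustrative assumptions, not the exact values in this PR):

from setuptools import setup

setup(
    name="posthog",
    # ... other package metadata ...
    extras_require={
        # hypothetical extra; installed with: pip install posthog[langchain]
        "langchain": ["langchain>=0.2.0"],
    },
)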

@skoob13 skoob13 force-pushed the feat/llm-observability-v0.1-langchain branch from 5d04873 to 2d6513e on January 11, 2025 14:41
@skoob13 skoob13 changed the title from "WIP: LangChain integration for AI Observability" to "feat(llm-observability): add langchain integration" on Jan 13, 2025
@skoob13 skoob13 requested review from k11kirky and Twixes on January 13, 2025 11:59
@skoob13 skoob13 marked this pull request as ready for review on January 13, 2025 11:59
@skoob13 skoob13 force-pushed the feat/llm-observability-v0.1-langchain branch from 3f21b1f to d1fa606 on January 13, 2025 12:01
Member

@Twixes Twixes left a comment


A few comments, but overall looking solid. Great test coverage.

@@ -51,10 +51,10 @@ jobs:
        with:
          fetch-depth: 1

      - name: Set up Python 3.8
Member


Why the Python version change? Just wanna be sure we aren't silently dropping older Python versions. Though 3.8 should be fine to drop as it's EoL – we should just be explicit about that event, if it happens. (To be honest, I'm also surprised we don't use a matrix of Python versions for this Actions job.)

Contributor Author


langchain-community requires Python >=3.9, so CI was failing on 3.8. It's listed under test requirements, so no extra packages will be installed for people not using the AI integration.

Comment on lines 5 to 10
try:
    import openai
except ImportError:
    raise ModuleNotFoundError("Please install the OpenAI SDK to use this feature: 'pip install openai'")

import openai.resources
Member


Good catch. For maximum readability, it would be great to put both openai imports under that try.
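Something like this (a sketch of the suggested restructuring):

try:
    import openai
    import openai.resources
except ImportError:
    raise ModuleNotFoundError("Please install the OpenAI SDK to use this feature: 'pip install openai'")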

Comment on lines 5 to 10
try:
    import openai
except ImportError:
    raise ModuleNotFoundError("Please install the OpenAI SDK to use this feature: 'pip install openai'")

import openai.resources
Member


Same as with the other openai imports

RunStorage = Dict[UUID, RunMetadata]


class PosthogCallbackHandler(BaseCallbackHandler):
Member


The Posthog capitalization really bugs me 😅 But I also see it's what we're already using in the Python SDK. Maybe to avoid the PostHog vs. Posthog inconsistency, we can export just CallbackHandler – same as Langfuse (from langfuse.callback import CallbackHandler)?

Contributor Author


Good call. I followed the Client example, but let's rename it.
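With the rename, the import would mirror Langfuse's (a sketch, using the module path from this PR):

from posthog.ai.langchain import CallbackHandler

callback_handler = CallbackHandler(posthog_client)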

from posthog.ai.utils import get_model_params
from posthog.client import Client

PosthogProperties = Dict[str, Any]
Member


This alias isn't super useful; I think plain Dict[str, Any] in function signatures might actually be a bit more obvious for SDK users.
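For example, written directly against the plain type (a hypothetical signature for illustration; the parameter list is an assumption):

from typing import Any, Dict, Optional

from posthog.client import Client

def __init__(self, client: Client, distinct_id: Optional[str] = None, properties: Optional[Dict[str, Any]] = None):
    ...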

        self._properties = properties
        self._runs = {}
        self._parent_tree = {}
        self.log = logging.getLogger("posthog")
Member


Looks like this can be just at the top level of the file, given we're going to be reusing the same logger instance every time
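i.e. roughly:

import logging

log = logging.getLogger("posthog")  # module level; every handler instance reuses the same logger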

"$ai_model": run.get("model"),
"$ai_model_parameters": run.get("model_params"),
"$ai_input": run.get("messages"),
"$ai_output": {"choices": output},
Member


Idea for simplifying the data model @skoob13 @k11kirky:

Suggested change
-            "$ai_output": {"choices": output},
+            "$ai_output_choices": output,

this should be just as readable, but with less nesting!

Contributor Author


100% agree, @k11kirky any objections?

"$ai_output_tokens": output_tokens,
"$ai_latency": latency,
"$ai_trace_id": trace_id,
"$ai_posthog_properties": self._properties,
Member


Why not unpack instead?

Suggested change
-            "$ai_posthog_properties": self._properties,
+            **self._properties,

Contributor Author


I think it makes sense to unpack them, since then neither customers nor we need to unpack nested JSON values. I had just followed the existing data model. @k11kirky what do you think?
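Roughly, the unpacked version (a sketch; it also guards against properties being None, since the argument is optional):

event_properties = {
    "$ai_trace_id": trace_id,
    # flatten the user-supplied properties into the event itself
    **(self._properties or {}),
}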

        output = [_extract_raw_esponse(generation) for generation in generation_result]

        event_properties = {
            "$ai_provider": run.get("provider"),
Member


We're missing $ai_request_url – is this metadata we have access to here?

Contributor Author


I haven't seen that one. Let me check; we should have this information.

Contributor Author


base_url is retrievable. However, the actual endpoint URL is tricky to get. I think the base API URL matters the most, so we can use it for the MVP; otherwise, we should postpone the full request URL for the LangChain integration.
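A rough sketch of the MVP idea (the attribute lookup and the property name are assumptions, not final):

# hypothetical: read the base API URL off the provider client, if it exposes one
base_url = getattr(client, "base_url", None)
if base_url is not None:
    event_properties["$ai_base_url"] = str(base_url)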



def _get_http_status(error: BaseException) -> int:
    # OpenAI: https://github.com/anthropics/anthropic-sdk-python/blob/main/src/anthropic/_exceptions.py
Member


Looks like the wrong link

"$ai_posthog_properties": self._properties,
}
self._client.capture(
distinct_id=self._distinct_id or trace_id,
Member


So this will result in lots of persons, as we discussed. Does error tracking do something useful here that we can reuse?

Contributor Author


They explicitly mark the event as personless:

if self._distinct_id is None:
    event_properties["$process_person_profile"] = False

Working on that now.
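Applied here, roughly (the event name is an assumed placeholder):

if self._distinct_id is None:
    event_properties["$process_person_profile"] = False
self._client.capture(
    distinct_id=self._distinct_id or trace_id,
    event="$ai_generation",  # assumed event name, for illustration
    properties=event_properties,
)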

@skoob13 skoob13 merged commit e51b883 into master Jan 13, 2025
2 checks passed
@skoob13 skoob13 deleted the feat/llm-observability-v0.1-langchain branch January 13, 2025 17:40