Custom ontology #262

prasmussen15 · 2025-02-07T18:27:40Z

Important

This pull request adds custom ontology support, enabling entity classification and attribute extraction, with updates to graphiti.py, node_db_queries.py, and extract_nodes.py.

Behavior:
- Adds support for custom ontology by allowing entity classification and attribute extraction in graphiti.py and node_operations.py.
- Introduces entity_types parameter in add_episode() in graphiti.py to specify custom entity models.
- Updates extract_nodes() in node_operations.py to classify nodes using custom entity types.
Database:
- Modifies ENTITY_NODE_SAVE and ENTITY_NODE_SAVE_BULK in node_db_queries.py to handle dynamic labels and attributes.
- Updates get_by_uuid() and similar functions in nodes.py to retrieve node attributes and labels.
Prompts:
- Adds classify_nodes prompt in extract_nodes.py for entity classification.
- Updates summarize_context in summarize_nodes.py to extract entity properties.
Misc:
- Deletes msc_eval.py, msc_runner.py, and parse_msc_messages.py from multi_session_conversation_memory examples.

^{This description was created by}^{for 46b279b. It will automatically update as commits are pushed.}

ellipsis-dev

❌ Changes requested. Reviewed everything up to 7ee2755 in 2 minutes and 22 seconds

More details

Looked at 871 lines of code in 12 files
Skipped 0 files when reviewing.
Skipped posting 8 drafted comments based on config settings.

1. graphiti_core/utils/maintenance/node_operations.py:166

Draft comment:
Using ast.literal_eval on LLM responses can be unsafe if the response is malformed. Consider using a safer JSON parsing approach if possible.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 80% vs. threshold = 80%

The comment raises a valid security concern since ast.literal_eval() could potentially execute malicious code if the input is crafted carefully. 2. However, the code is using a pydantic response_model (EntityClassification) to validate the LLM response first. 3. The string being parsed is accessed via .get() with a default of '{}', providing some safety. 4. Still, using ast.literal_eval() on any external input is generally risky, and JSON parsing would be safer.
The comment might be overstating the risk since there are some safeguards in place with the pydantic model. Also, ast.literal_eval() is actually designed to be a safe subset of eval() that only handles literals.
While there are safeguards, using json.loads() would be strictly safer and more appropriate for parsing what is expected to be a JSON structure. The risk, while mitigated, is still present.
Keep the comment as it suggests a concrete, actionable improvement to security by using a more appropriate parsing method for the expected JSON data.

2. graphiti_core/utils/maintenance/node_operations.py:320

Draft comment:
The creation of the dynamic Pydantic model (entity_attributes_model) by merging entity type classes and Summary is complex. Adding an inline comment explaining the rationale would improve maintainability.
Reason this comment was not posted:
Confidence changes required: 80% <= threshold 80%
None

3. graphiti_core/utils/maintenance/node_operations.py:127

Draft comment:
The iterative reflexion loop in the extract_nodes function is central for refining extracted entities. Consider adding more inline comments to clarify the iterative refinement process.
Reason this comment was not posted:
Confidence changes required: 70% <= threshold 80%
None

4. graphiti_core/utils/maintenance/node_operations.py:280

Draft comment:
Context preparation for LLM in node extraction is repeated across functions (extract_message_nodes, extract_text_nodes, etc.). Consider refactoring these into a helper function to follow DRY principles.
Reason this comment was not posted:
Confidence changes required: 80% <= threshold 80%
None

5. graphiti_core/utils/maintenance/node_operations.py:152

Draft comment:
Potential bug: 'entity_types.keys()' is called without checking if entity_types is None. Consider using 'entity_types.keys() if entity_types else []' to avoid an AttributeError.
Reason this comment was not posted:
Comment looked like it was already resolved.

6. graphiti_core/utils/maintenance/node_operations.py:332

Draft comment:
Danger: 'extracted_node.attributes.update(node_attributes_response)' may raise an error if attributes is None. Initialize attributes (e.g. to {}) before calling update.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 80%

EntityNode is imported but we can't see its definition. 2. The code is using pydantic models and type hints throughout. 3. This is in an async function that's part of a larger node processing system. 4. Given the professional nature of the codebase and use of pydantic, it's likely EntityNode is properly initialized. 5. Without seeing EntityNode's definition, we can't be 100% certain.
I can't see the EntityNode class definition, so I don't know for certain if attributes is guaranteed to be initialized. The codebase seems well-structured but we should be cautious about assumptions.
While true that we can't see the EntityNode definition, the extensive use of pydantic and type hints throughout the codebase suggests this is a well-structured system where basic initialization would be handled properly.
Without seeing the EntityNode definition, we can't be completely certain about the attributes initialization. The comment raises a valid concern but lacks strong evidence.

7. graphiti_core/utils/maintenance/node_operations.py:159

Draft comment:
Consider adding error handling for ast.literal_eval in parsing the LLM response, in case the output format is unexpected.
Reason this comment was not posted:
Confidence changes required: 80% <= threshold 80%
None

8. graphiti_core/utils/maintenance/node_operations.py:148

Draft comment:
Minor style: When building 'custom_prompt' with missed entities, the concatenation adds a trailing comma. Consider formatting the list of missed entities more cleanly.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 80%
None

Workflow ID: wflow_7R2PM5DPT5kOQfM1

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

graphiti_core/utils/maintenance/node_operations.py

paul-paliychuk

🚀🚀🚀🚀🚀🚀🚀

ellipsis-dev

👍 Looks good to me! Incremental review on 406b889 in 1 minute and 57 seconds

More details

Looked at 68 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 11 drafted comments based on config settings.

1. examples/podcast/podcast_runner.py:66

Draft comment:
Uncommenting clear_data call will clear the DB; ensure this is intended for production.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 80%
The comment is asking the author to ensure that uncommenting a line of code is intended for production. This is a form of asking for confirmation, which is against the rules. The comment does not provide a specific code suggestion or ask for a test to be written. Therefore, it should be removed.

2. graphiti_core/nodes.py:255

Draft comment:
Changing the default of 'attributes' to {} avoids None checks; ensure this doesn’t mask missing data.
Reason this comment was not posted:
Comment looked like it was already resolved.

3. graphiti_core/nodes.py:277

Draft comment:
Using 'self.attributes or {}' is a good safeguard, ensuring update does not fail if attributes is None.
Reason this comment was not posted:
Marked as duplicate.

4. graphiti_core/utils/maintenance/node_operations.py:156

Draft comment:
Safely handling entity_types with a conditional ensures keys() isn’t called on None.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 80%
This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply states what the code is doing without offering any critique or suggestion for change.

5. graphiti_core/utils/maintenance/node_operations.py:174

Draft comment:
Using node_classifications.get(name) without an explicit default is clean since it returns None by default.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 80%
This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply states that the current implementation is clean, which does not align with the rules provided.

6. graphiti_core/utils/maintenance/node_operations.py:247

Draft comment:
Type hint changed for entity_types (dict[str, BaseModel]) clarifies expectations; verify that downstream code accepts BaseModel values.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 80%
This comment is asking the PR author to verify something, which is against the rules. It doesn't provide a specific suggestion or point out a specific issue with the code. Therefore, it should be removed.

7. examples/podcast/podcast_runner.py:66

Draft comment:
Ensure that calling clear_data(client.driver) is intended in this example since it deletes existing graph data.
Reason this comment was not posted:
Marked as duplicate.

8. graphiti_core/nodes.py:255

Draft comment:
Defaulting 'attributes' to {} makes it non-nullable; consider removing the redundant 'or {}' in the update call.
Reason this comment was not posted:
Marked as duplicate.

9. graphiti_core/utils/maintenance/node_operations.py:156

Draft comment:
Using 'entity_types.keys() if entity_types is not None else []' safely handles the case when entity_types is None.
Reason this comment was not posted:
Marked as duplicate.

10. graphiti_core/utils/maintenance/node_operations.py:171

Draft comment:
Removing the explicit default None in 'node_classifications.get(name)' is cleaner, as dict.get returns None by default.
Reason this comment was not posted:
Marked as duplicate.

11. graphiti_core/utils/maintenance/node_operations.py:243

Draft comment:
The type for 'entity_types' parameter has been updated to use BaseModel; ensure that all provided models are BaseModel-derived and that this change is backward compatible.
Reason this comment was not posted:
Marked as duplicate.

Workflow ID: wflow_KPRFEnanaBNomjwC

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

❌ Changes requested. Incremental review on 25a43f8 in 2 minutes and 42 seconds

More details

Looked at 28 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 2 drafted comments based on config settings.

1. graphiti_core/utils/maintenance/node_operations.py:316

Draft comment:
Using BaseModel() as the fallback creates an instance, but later code expects a model class (to access fields). Consider using BaseModel (the class) instead.
Reason this comment was not posted:
Marked as duplicate.

2. graphiti_core/utils/maintenance/node_operations.py:314

Draft comment:
Providing a default value (BaseModel()) guarantees non-None items, making the filter(lambda x: x is not None, ...) redundant.
Reason this comment was not posted:
Marked as duplicate.

Workflow ID: wflow_GEa7gDg1VNjys4Xp

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

graphiti_core/utils/maintenance/node_operations.py

ellipsis-dev

❌ Changes requested. Incremental review on 8d2e7c6 in 2 minutes and 20 seconds

More details

Looked at 28 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 3 drafted comments based on config settings.

1. graphiti_core/utils/maintenance/node_operations.py:321

Draft comment:
Prefer using direct key access on 'summary_context' instead of 'get' since 'attributes' is initialized.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 80%
None

2. graphiti_core/utils/maintenance/node_operations.py:312

Draft comment:
Result of tuple addition isn’t assigned. This line should update entity_type_classes (e.g. using '+=' or assignment) so that the filtered tuple is actually used.
Reason this comment was not posted:
Marked as duplicate.

3. graphiti_core/utils/maintenance/node_operations.py:321

Draft comment:
Switching from fields to model_fields is acceptable in pydantic v2, but using summary_context.get('attributes', []) is unnecessary if the key is already set. Consider reverting to summary_context['attributes'].append(field_name) for clarity.
Reason this comment was not posted:
Confidence changes required: 66% <= threshold 80%
None

Workflow ID: wflow_h3TA5qSSsmUxyFQX

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

graphiti_core/utils/maintenance/node_operations.py

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

ellipsis-dev

👍 Looks good to me! Incremental review on 46b279b in 45 seconds

More details

Looked at 13 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 2 drafted comments based on config settings.

1. graphiti_core/utils/maintenance/node_operations.py:311

Draft comment:
Good update: using tuple[BaseModel, ...] correctly indicates a tuple of arbitrary length. This improves type safety.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 80%
None

2. graphiti_core/utils/maintenance/node_operations.py:311

Draft comment:
Good fix: Changing the tuple type hint to tuple[BaseModel, ...] correctly indicates a variadic tuple of BaseModel instances.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 80%
None

Workflow ID: wflow_0Y9Nz70jKSQPbqgg

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

prasmussen15 temporarily deployed to development February 7, 2025 18:27 — with GitHub Actions Inactive

prasmussen15 had a problem deploying to development February 7, 2025 18:27 — with GitHub Actions Failure

ellipsis-dev bot reviewed Feb 7, 2025

View reviewed changes

graphiti_core/utils/maintenance/node_operations.py Outdated Show resolved Hide resolved

paul-paliychuk approved these changes Feb 8, 2025

View reviewed changes

prasmussen15 temporarily deployed to development February 11, 2025 16:41 — with GitHub Actions Inactive

prasmussen15 had a problem deploying to development February 11, 2025 16:41 — with GitHub Actions Failure

ellipsis-dev bot reviewed Feb 11, 2025

View reviewed changes

prasmussen15 had a problem deploying to development February 11, 2025 16:59 — with GitHub Actions Failure

prasmussen15 temporarily deployed to development February 11, 2025 16:59 — with GitHub Actions Inactive

prasmussen15 had a problem deploying to development February 11, 2025 16:59 — with GitHub Actions Failure

ellipsis-dev bot reviewed Feb 11, 2025

View reviewed changes

graphiti_core/utils/maintenance/node_operations.py Outdated Show resolved Hide resolved

prasmussen15 added 8 commits February 12, 2025 14:22

ontology

bc1dc0c

extract and save node labels

8abae59

extract entity type properties

6eb199d

neo4j upgrade needed

1dd91d9

add entity types

668d317

update typing

7f78678

update types

9550e00

updates

8d2e7c6

prasmussen15 force-pushed the custom-ontology branch from 1fdba66 to 8d2e7c6 Compare February 12, 2025 19:24

prasmussen15 had a problem deploying to development February 12, 2025 19:24 — with GitHub Actions Failure

prasmussen15 temporarily deployed to development February 12, 2025 19:24 — with GitHub Actions Inactive

prasmussen15 had a problem deploying to development February 12, 2025 19:24 — with GitHub Actions Failure

ellipsis-dev bot reviewed Feb 12, 2025

View reviewed changes

graphiti_core/utils/maintenance/node_operations.py Outdated Show resolved Hide resolved

Update graphiti_core/utils/maintenance/node_operations.py

89965a2

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

prasmussen15 had a problem deploying to development February 12, 2025 19:28 — with GitHub Actions Failure

prasmussen15 temporarily deployed to development February 12, 2025 19:28 — with GitHub Actions Inactive

prasmussen15 had a problem deploying to development February 12, 2025 19:28 — with GitHub Actions Failure

fix warning

46b279b

prasmussen15 had a problem deploying to development February 12, 2025 19:29 — with GitHub Actions Failure

prasmussen15 deployed to development February 12, 2025 19:29 — with GitHub Actions Active

ellipsis-dev bot reviewed Feb 12, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Custom ontology #262

Custom ontology #262

prasmussen15 commented Feb 7, 2025 •

edited by ellipsis-dev bot

Loading

ellipsis-dev bot left a comment

paul-paliychuk left a comment

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

Custom ontology #262

Are you sure you want to change the base?

Custom ontology #262

Conversation

prasmussen15 commented Feb 7, 2025 • edited by ellipsis-dev bot Loading

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

paul-paliychuk left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

prasmussen15 commented Feb 7, 2025 •

edited by ellipsis-dev bot

Loading