Update datasources.py (#93)
* Update datasources.py

Remove the hardcoded 15,000-character limit on extracted PDF text.
We kept it for a demo, but it is now undocumented behavior.

* Bump version to 0.5.2

---------

Co-authored-by: Matthew Russo <[email protected]>
vitaglianog and mdr223 authored Jan 30, 2025
1 parent c4183e0 commit cb72e4e
Showing 3 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -16,7 +16,7 @@
 project = "Palimpzest"
 copyright = "2025, MIT Data Systems Group"
 author = "MIT Data Systems Group"
-release = "0.5.1"
+release = "0.5.2"
 
 # -- General configuration ---------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "palimpzest"
-version = "0.5.1" # if you update this, be sure to update package version in `docs/source/conf.py` as well
+version = "0.5.2" # if you update this, be sure to update package version in `docs/source/conf.py` as well
 description = "Palimpzest is a system which enables anyone to process AI-powered analytical queries simply by defining them in a declarative language"
 readme = "README.md"
 requires-python = ">=3.8"
2 changes: 1 addition & 1 deletion src/palimpzest/core/data/datasources.py
@@ -287,7 +287,7 @@ def get_item(self, idx: int) -> DataRecord:
 dr = DataRecord(self.schema, source_id=filepath)
 dr.filename = pdf_filename
 dr.contents = pdf_bytes
-dr.text_contents = text_content[:15000] # TODO Very hacky
+dr.text_contents = text_content
 
 return dr
 
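After this change, `get_item` stores the full extracted PDF text in `dr.text_contents` instead of the first 15,000 characters. Code that implicitly relied on the old cap (for instance, to keep downstream prompts short) now has to truncate explicitly. The sketch below assumes nothing about Palimpzest's API: `cap_text` and `LEGACY_PDF_TEXT_LIMIT` are hypothetical names, and where to apply the cap in a pipeline is left to the caller.

```python
from typing import Optional

# Hypothetical constant: the character limit that this commit removes.
LEGACY_PDF_TEXT_LIMIT = 15000


def cap_text(text: str, max_chars: Optional[int] = LEGACY_PDF_TEXT_LIMIT) -> str:
    """Return at most `max_chars` characters of `text`; None disables the cap."""
    if max_chars is None:
        return text
    if max_chars < 0:
        raise ValueError("max_chars must be non-negative")
    # Plain slicing, matching the removed `text_content[:15000]` expression.
    return text[:max_chars]
```

With the default limit, `cap_text(dr.text_contents)` yields the same string that version 0.5.1 stored. Making the cap an explicit, documented argument avoids the silent truncation this commit calls undocumented behavior.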