Caching error with groupby #88

Closed
lisadunlap opened this issue Jan 21, 2025 · 4 comments

lisadunlap commented Jan 21, 2025

Thanks for fixing the previous caching issue! I found one more bug.

Describe the bug
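sem_agg raises a caching error when group_by is used. The grouped path runs each group in a worker thread, and the cache's SQLite connection rejects use from any thread other than the one that created it.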

Input:

    summaries = results.sem_agg(
        create_reduce_prompt(num_final_vibes),
        group_by="cluster_id",
        suffix="reduced axes",
    )

Output:

Traceback (most recent call last):
  File "/home/lisabdunlap/VibeCheck/main.py", line 424, in <module>
    main(
  File "/home/lisabdunlap/VibeCheck/main.py", line 257, in main
    vibes = propose_vibes(
  File "/home/lisabdunlap/VibeCheck/main.py", line 197, in propose_vibes
    summaries = results.sem_agg(
  File "/home/lisabdunlap/miniconda3/envs/vibecheck/lib/python3.10/site-packages/lotus/cache.py", line 78, in wrapper
    result = func(self, *args, **kwargs)
  File "/home/lisabdunlap/miniconda3/envs/vibecheck/lib/python3.10/site-packages/lotus/sem_ops/sem_agg.py", line 197, in __call__
    return pd.concat(list(executor.map(SemAggDataframe.process_group, group_args)))
  File "/home/lisabdunlap/miniconda3/envs/vibecheck/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "/home/lisabdunlap/miniconda3/envs/vibecheck/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "/home/lisabdunlap/miniconda3/envs/vibecheck/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/home/lisabdunlap/miniconda3/envs/vibecheck/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/lisabdunlap/miniconda3/envs/vibecheck/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/lisabdunlap/miniconda3/envs/vibecheck/lib/python3.10/site-packages/lotus/sem_ops/sem_agg.py", line 150, in process_group
    return group.sem_agg(user_instruction, all_cols, suffix, None, progress_bar_desc=progress_bar_desc)
  File "/home/lisabdunlap/miniconda3/envs/vibecheck/lib/python3.10/site-packages/lotus/cache.py", line 71, in wrapper
    cached_result = model.cache.get(cache_key)
  File "/home/lisabdunlap/miniconda3/envs/vibecheck/lib/python3.10/site-packages/lotus/cache.py", line 156, in get
    with self.conn:
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 138740594186048 and this is thread id 138728762771136.

Input that works:

    summaries = results.sem_agg(
        create_reduce_prompt(num_final_vibes),
        suffix="reduced axes",
    )
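
The failure in the traceback can be reproduced outside of LOTUS with the standard library alone. The snippet below is a minimal sketch, not the LOTUS code: the `cache` table and the `lookup` helper are made up for illustration. It opens a connection in the main thread and then touches it from a `ThreadPoolExecutor` worker, which is the same pattern the grouped `sem_agg` path appears to follow.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Connection created in the main thread, like a cache opened at startup.
# The table name and schema are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, value TEXT)")


def lookup(key):
    # Runs in a worker thread, so it uses a connection it did not create.
    try:
        with conn:
            conn.execute("SELECT value FROM cache WHERE key = ?", (key,)).fetchone()
        return "ok"
    except sqlite3.ProgrammingError as exc:
        return f"ProgrammingError: {exc}"


with ThreadPoolExecutor(max_workers=1) as executor:
    worker_result = executor.submit(lookup, "a").result()

print(worker_result)
```

With the default `check_same_thread=True`, the worker reports a `ProgrammingError` like the one above, while the same query issued from the main thread succeeds.
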
lisadunlap added the bug (Something isn't working) label Jan 21, 2025
StanChan03 (Collaborator) commented Jan 22, 2025

@dhruviyer Any thoughts on this threading issue with sqlite?

dhruviyer (Collaborator) commented Jan 23, 2025

@StanChan03 I will look into it! It appears to be a problem with combining multithreading and caching.

dhruviyer self-assigned this Jan 23, 2025
StanChan03 (Collaborator) commented

Yeah, I was discussing this with Sid. The options are either to synchronize the threads or to use an alternative caching backend.
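
One way to read the "sync the threads" option is a single shared connection opened with `check_same_thread=False` and guarded by a lock. The sketch below is only an illustration of that idea under stated assumptions: `LockedCache`, its methods, and the table schema are hypothetical and are not taken from LOTUS.

```python
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedCache:
    """Hypothetical cache: one shared connection, all access serialized by a lock."""

    def __init__(self, path=":memory:"):
        # check_same_thread=False turns off sqlite3's thread check,
        # so the lock below has to serialize every use of the connection.
        self.conn = sqlite3.connect(path, check_same_thread=False)
        self.lock = threading.Lock()
        with self.lock, self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)"
            )

    def insert(self, key, value):
        # "with self.conn" commits on success and rolls back on error.
        with self.lock, self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)", (key, value)
            )

    def get(self, key):
        with self.lock:
            row = self.conn.execute(
                "SELECT value FROM cache WHERE key = ?", (key,)
            ).fetchone()
        return row[0] if row else None


cache = LockedCache()
cache.insert("a", "1")
with ThreadPoolExecutor(max_workers=4) as executor:
    values = list(executor.map(cache.get, ["a", "a", "missing"]))
print(values)
```

The trade-off is that every cache lookup waits on the same lock, so heavily threaded workloads serialize on the cache.
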

dhruviyer (Collaborator) commented

Tracking a potential fix for this in #92

liana313 pushed a commit that referenced this issue Feb 6, 2025
Fixes the issue described in #88 by creating a new SQLite connection per
thread and handling cleanup accordingly by storing the connection in a
threading.local() object.
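
The commit itself is not shown in this thread, so the following is only a rough sketch of the approach its message describes: one connection per thread, stored in a thread-local object. `ThreadLocalCache`, its method names, the table schema, and the temporary database path are all assumptions made for illustration, not the LOTUS implementation.

```python
import os
import sqlite3
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


class ThreadLocalCache:
    """Hypothetical cache that opens one SQLite connection per thread."""

    def __init__(self, path):
        self.path = path
        self._local = threading.local()  # each thread sees its own attributes
        with self._conn() as conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)"
            )

    def _conn(self):
        # Lazily create the calling thread's connection on first use.
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = sqlite3.connect(self.path)
            self._local.conn = conn
        return conn

    def insert(self, key, value):
        with self._conn() as conn:  # commits on success
            conn.execute(
                "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)", (key, value)
            )

    def get(self, key):
        row = self._conn().execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def close(self):
        # Closes only the calling thread's connection.
        conn = getattr(self._local, "conn", None)
        if conn is not None:
            conn.close()
            del self._local.conn


# A file-backed database is needed: each ":memory:" connection is a separate database.
db_path = os.path.join(tempfile.mkdtemp(), "cache.db")
cache = ThreadLocalCache(db_path)
cache.insert("a", "1")


def lookup(key):
    try:
        return cache.get(key)
    finally:
        cache.close()  # release this worker's connection after the task


with ThreadPoolExecutor(max_workers=4) as executor:
    values = list(executor.map(lookup, ["a", "a", "missing"]))
cache.close()
print(values)
```

Because no connection ever crosses a thread boundary, the default thread check in `sqlite3` stays enabled and no lock is needed around lookups.
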