Error in read_bb() in cluster_bins.py due to change in pandas df.groupby() #196

Open
BramLimSJ opened this issue Oct 31, 2023 · 4 comments


BramLimSJ commented Oct 31, 2023

When running the demo script (demo-complete.sh), I encountered the following error:

File "/software/team274/bl10/miniconda3/envs/hatchet/lib/python3.9/site-packages/hatchet/utils/cluster_bins.py", line 71, in main
    assert str(start_row[0]) == my_chr, (start_row[0], my_chr)
AssertionError: ('chr22', "('chr22',)")

The error appears to be due to line 144 in cluster_bins.py:

for ch, df0 in bb.groupby(['#CHR']):

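Newer versions of pandas yield ch = ('chr22',) instead of ch = 'chr22' when iterating over the groups, so the chromosome label is later stringified as "('chr22',)" and the assertion on line 71 fails. This occurs when a length-1 list is supplied to df.groupby(). Refer to this issue here.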

The error can be fixed by changing line 144 to:

for ch, df0 in bb.groupby('#CHR'):
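The difference can be reproduced outside hatchet with a small sketch. The toy DataFrame and the as_scalar helper below are illustrative assumptions, not code from cluster_bins.py. The helper unwraps a length-1 tuple key, so the loop behaves the same whether pandas yields 'chr22' or ('chr22',).

```python
import pandas as pd

# Toy stand-in for the bb table; only the '#CHR' column name comes from the issue.
bb = pd.DataFrame({"#CHR": ["chr21", "chr21", "chr22"], "START": [0, 100, 0]})

# String grouper: each group key is a scalar such as 'chr21'.
str_keys = [ch for ch, _ in bb.groupby("#CHR")]

# Length-1 list grouper: newer pandas yields 1-tuples such as ('chr21',).
list_keys = [ch for ch, _ in bb.groupby(["#CHR"])]


def as_scalar(key):
    """Hypothetical helper: unwrap a length-1 tuple key, pass anything else through."""
    if isinstance(key, tuple) and len(key) == 1:
        return key[0]
    return key


# After normalization both forms give the same chromosome labels.
normalized = [as_scalar(ch) for ch in list_keys]
print(str_keys, list_keys, normalized)
```

Passing the column name as a plain string, as proposed above, avoids the need for any such helper.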

In my environment, pandas (v2.1.2) was installed as a dependency when I installed hatchet (v1.1.1) using:

conda install hatchet=1.1.1

@BramLimSJ
Author

In addition, there is an extra comma in line 190 after chr_labels, which could potentially return a tuple. However, this does not seem to be the problem here.
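As a generic illustration of why a trailing comma matters in Python (the actual content of line 190 is not shown here, so the functions below are hypothetical and only assume that chr_labels is a list):

```python
chr_labels = ["chr21", "chr22"]  # assumed shape; the real variable is not shown in this thread


def with_trailing_comma():
    # The trailing comma makes this a 1-tuple that wraps the list.
    result = chr_labels,
    return result


def without_trailing_comma():
    # No comma: the list itself is returned.
    result = chr_labels
    return result


print(type(with_trailing_comma()).__name__)     # tuple
print(type(without_trailing_comma()).__name__)  # list
```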

@ronkesm

ronkesm commented Nov 3, 2023

An issue I've come up against as well. Can also be fixed by installing pandas==1.5.0 - there are no conflicts with other dependencies on install.

@vineetbansal
Collaborator

Thanks @ronkesm, @BramLimSJ - I'm inclined to pin the version of pandas in our pyproject.toml so that this error does not happen if you do a pip install of our package. Does this seem like an acceptable solution at this time, or are you using the latest features of pandas elsewhere in your code such that we need to fix this issue?

@BramLimSJ
Author

Pinning an older version of pandas could work. However, I personally feel the behaviour in the newer version of pandas is more consistent: iterating over df.groupby() with a list argument always yields a tuple key (including a length-1 tuple), while df.groupby() with a string argument always yields a scalar key (here, a string).


3 participants