
ValueError: Shape of passed values is (15376, 28211), indices imply (15375, 28211) #1

Open
mdurante1 opened this issue Mar 16, 2019 · 4 comments

Comments

@mdurante1

mdurante1 commented Mar 16, 2019

Hello,

I have tested your tool out on the example data that you provided and it seems to work very nicely. I proceeded to run my own data set with the default training set and received good results. I then tried to test the "tcell_subtype" dataset you describe in your manuscript and received the error below. Can you please provide any insight into the source of this error?

Best,
Michael

(base) mdurante@hlab4:~/software/ACTINN$ python actinn_format.py -I dataset.txt -o tcell_subset -f txt
Dimension of the matrix after removing non-zero rows: (22430, 16740)
(base) mdurante@hlab4:~/software/ACTINN$ python actinn_predict.py -trs ./test_data/tcell_subtype_ref.h5 -trl ./test_data/tcell_subtype_ref_label.txt -ts ./tcell_subset.h5 -lr 0.0001 -ne 50 -ms 128 -pc True
actinn_predict.py:286: FutureWarning: read_table is deprecated, use read_csv instead, passing sep='\t'.
  train_label = pd.read_table(args.train_label, header=None)
actinn_predict.py:37: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

  total_set = np.array(pd.concat(sets, axis=1), dtype=np.float32)
Traceback (most recent call last):
  File "actinn_predict.py", line 291, in <module>
    train_set, test_set = scale_sets([train_set, test_set])
  File "actinn_predict.py", line 37, in scale_sets
    total_set = np.array(pd.concat(sets, axis=1), dtype=np.float32)
  File "/home/mdurante/miniconda3/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 229, in concat
    return op.get_result()
  File "/home/mdurante/miniconda3/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 426, in get_result
    copy=self.copy)
  File "/home/mdurante/miniconda3/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 2065, in concatenate_block_managers
    return BlockManager(blocks, axes)
  File "/home/mdurante/miniconda3/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 114, in __init__
    self._verify_integrity()
  File "/home/mdurante/miniconda3/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 311, in _verify_integrity
    construction_error(tot_items, block.shape[1:], self.axes)
  File "/home/mdurante/miniconda3/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 1691, in construction_error
    passed, implied))
ValueError: Shape of passed values is (15376, 28211), indices imply (15375, 28211)
@mafeiyang
Owner

Hi Michael,

Thanks for trying the tool. This looks like a pandas DataFrame issue, and a single gene in your matrix is probably causing it (the passed values have one more row than the index implies).
Could you remove the lowly expressed genes (say, those with an average nUMI below 0.1) and try the tool again? It would also help to remove any "NA" entries from your input matrix.
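For anyone who wants to try this filtering, here is a minimal sketch and not part of ACTINN itself. It assumes the matrix is a pandas DataFrame with genes as rows and cells as columns; `filter_genes` and the toy values are hypothetical, and "NA" is read here as either a missing gene name or a missing value.

```python
import numpy as np
import pandas as pd

def filter_genes(expr: pd.DataFrame, min_mean: float = 0.1) -> pd.DataFrame:
    """Drop genes with missing names or values, then genes with a low mean count.

    Hypothetical helper; assumes genes are rows and cells are columns.
    """
    # Remove rows whose gene name is missing, then rows containing any NA value.
    expr = expr[expr.index.notna()].dropna(axis=0, how="any")
    # Keep genes whose average count across cells is at least min_mean.
    return expr[expr.mean(axis=1) >= min_mean]

# Toy genes-by-cells matrix with made-up values.
expr = pd.DataFrame(
    {
        "cell1": [5.0, 0.0, 1.0, np.nan],
        "cell2": [3.0, 0.0, 0.0, 2.0],
        "cell3": [4.0, 0.2, 2.0, 1.0],
    },
    index=["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
)

filtered = filter_genes(expr)
print(filtered.index.tolist())
```

In this toy example GENE_B is dropped for its low mean and GENE_D for its missing value.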

Best,
Feiyang

@raph06

raph06 commented Mar 26, 2019

Hi,
I happened to stumble upon the same issue a couple of days ago.
It arose from a duplicate gene in the training dataset (C2ORF15).
After removing this gene from the common_gene array, everything worked smoothly.

Edit: It also happened with another dataset, and C2ORF15 was the culprit there as well. This gene doesn't seem to be duplicated in the input dataset, although it is clearly duplicated in sets[0]. This is why the scale_sets([train_set, test_set]) call fails.
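A quick way to check a data frame for repeated gene labels is sketched below. The toy `train_set` is a made-up stand-in for sets[0], assuming genes are stored as the row index.

```python
import pandas as pd

# Toy stand-in for the training set: genes as rows, cells as columns.
train_set = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]],
    index=["GENE_A", "C2ORF15", "GENE_B", "C2ORF15"],
    columns=["cell1", "cell2"],
)

# keep=False marks every occurrence of a repeated label, not just the later ones.
dup_mask = train_set.index.duplicated(keep=False)
dup_genes = train_set.index[dup_mask].unique().tolist()

print(dup_genes)                    # labels that appear more than once
print(train_set.index.is_unique)    # False when any label is repeated
```

Running the same check on the real training and test frames before calling scale_sets should show which gene labels are repeated.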

Hope that helps
Best
Raphael

@Weiwen1992

I have the same problem. It turns out there is indeed a duplicate C2ORF15 entry in the training dataset.

@mafeiyang
Owner

Hi All,

Thanks for bringing the problem up.
I revised the code to remove duplicated genes from the datasets, so the pandas DataFrame shape error should no longer occur.
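For readers on an older copy of the script, one possible workaround is sketched here. It is not necessarily how the revised code does it: `drop_duplicate_genes` is a hypothetical helper that keeps the first row for each gene label, and the frames are toy data with genes as rows.

```python
import numpy as np
import pandas as pd

def drop_duplicate_genes(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the first row for each gene label (hypothetical helper)."""
    return df[~df.index.duplicated(keep="first")]

# Toy training set with a repeated gene label, and a clean toy test set.
train_set = pd.DataFrame(
    [[1.0], [2.0], [3.0]],
    index=["GENE_A", "C2ORF15", "C2ORF15"],
    columns=["train_cell"],
)
test_set = pd.DataFrame(
    [[4.0], [5.0]],
    index=["GENE_A", "C2ORF15"],
    columns=["test_cell"],
)

# De-duplicate each frame before the column-wise concat used in scale_sets.
sets = [drop_duplicate_genes(s) for s in (train_set, test_set)]
total_set = np.array(pd.concat(sets, axis=1), dtype=np.float32)
print(total_set.shape)
```

Keeping the first occurrence is an arbitrary choice; summing or averaging the duplicated rows may suit some datasets better.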

Best,
Feiyang
