Not sure if there's a bug with process_unlabeled or if it is my data. Fails with error - 'AttributeError: 'MatchingBatch' object has no attribute 'id'' #86

Open
iamkavinarasu opened this issue Jun 14, 2021 · 3 comments

Comments

@iamkavinarasu

[screenshot]
My training data looks like the screenshot above.

[screenshot]
My validation dataset looks like the screenshot above.

[screenshot]
And the unlabeled dataset looks like this.

Everything works fine through training, but it then fails on the unlabeled dataset with `AttributeError: 'MatchingBatch' object has no attribute 'id'`, as shown in this screenshot:
[screenshot]

This is what the code looks like:
[screenshot]

Please help me out on this.

@iamkavinarasu
Author

@sidharthms - I'm sorry for bothering you, but have you had time to look into this? It would be a big help if you could tell me whether my data is structured incorrectly or whether this is a bug. Thank you so much!

@spycherf

@iamkavinarasu I am not sure if you still need help with this, but for future reference: changing the encoding of my unlabeled dataset from UTF-8 BOM to UTF-8 fixed it for me.
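For future readers, one plausible explanation of why the encoding matters (an assumption, not confirmed by the maintainers in this thread): a UTF-8 BOM at the start of the file gets decoded as the character `\ufeff`, so a plain UTF-8 reader sees the first header as `\ufeffid` rather than `id`. The sketch below uses only the Python standard library; `first_header` and `strip_bom` are hypothetical helper names, not part of deepmatcher.

```python
import codecs
import csv


def first_header(path):
    """Return the first header field exactly as a plain UTF-8 reader sees it."""
    with open(path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f))[0]


def strip_bom(src, dst):
    """Copy src to dst, dropping a leading UTF-8 BOM if one is present."""
    with open(src, "rb") as f:
        data = f.read()
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    with open(dst, "wb") as f:
        f.write(data)
```

Calling `first_header` on the unlabeled CSV before and after `strip_bom` is a quick way to check whether a BOM was hiding in the `id` column name.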

@NPap0

NPap0 commented Sep 22, 2022

OK, so I was getting the same error. The cause was that the candidates dataframe has to have the same columns as the train dataframe (as per the documentation), so I had to drop the id column (or ignore it, as we did with the extra left/right id). That was the only way I could get dm.data.process_unlabeled to run, but when I went ahead and ran predictions I got an error saying there was no ID column.

And @spycherf's solution worked!
(When saving to csv I just added the encoding option and then I was able to run dm.data.process_unlabeled with the id column and run predictions afterwards)
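A rough way to check the column requirement described above before calling dm.data.process_unlabeled is to compare the two CSV headers directly. This is only a sketch: `read_header` and `header_mismatch` are hypothetical helpers, and the label column name `"label"` is an assumption about the training file. The files are deliberately read as plain UTF-8 so that a BOM shows up as a mismatched `\ufeffid` column instead of being silently hidden.

```python
import csv


def read_header(path):
    """Return the header row of a CSV file, read as plain UTF-8."""
    with open(path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f))


def header_mismatch(train_path, unlabeled_path, label_column="label"):
    """Return (missing, extra) columns of the unlabeled file relative to train.

    The label column is ignored on both sides, since unlabeled data
    is not expected to carry it.
    """
    train = [c for c in read_header(train_path) if c != label_column]
    unlabeled = [c for c in read_header(unlabeled_path) if c != label_column]
    missing = [c for c in train if c not in unlabeled]
    extra = [c for c in unlabeled if c not in train]
    return missing, extra
```

If both lists come back empty, the headers agree; a result with `id` missing and `\ufeffid` extra points at the encoding problem rather than at the data itself.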
