torchtext end-of-life and broken #829

Closed
egpbos opened this issue Jul 29, 2024 · 3 comments

@egpbos
Member

egpbos commented Jul 29, 2024

As noted in #827 and other recent PRs with breaking CI workflows, the torchtext package seems to be breaking down. It is no longer being developed (see pytorch/text#2250).

Some options:

  1. Find a workaround ourselves.
  2. Look for a fork that is still maintained and switch to that.
  3. Replace torchtext as a dependency.

Option 3 seems the most attractive to me, naively, but I haven't looked deeply into how unique the functionality we use is. We only use torchtext in two ways:

  • from torchtext.data import get_tokenizer in utils/tokenizer.py
  • from torchtext.vocab import Vectors in test/utils.py, in a couple of notebooks and in the dashboard.
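For the tokenizer, a pure-Python replacement may be small. The sketch below assumes that `utils/tokenizer.py` calls `get_tokenizer("basic_english")`; the exact argument is not stated here, so treat that as an assumption. The regex rules are modelled on torchtext's `basic_english` normalizer (lowercase, pad punctuation with spaces, drop double quotes and `<br />`, then split on whitespace). `basic_english_tokenize` and the `get_tokenizer` shim are hypothetical names, not an existing DIANNA API.

```python
import re
from typing import Callable, List

# Assumption: these (pattern, replacement) rules mirror torchtext's
# "basic_english" normalizer; they are applied in order.
_BASIC_ENGLISH_RULES = [
    (re.compile(r"\'"), " '  "),
    (re.compile(r"\""), ""),
    (re.compile(r"\."), " . "),
    (re.compile(r"<br \/>"), " "),
    (re.compile(r","), " , "),
    (re.compile(r"\("), " ( "),
    (re.compile(r"\)"), " ) "),
    (re.compile(r"\!"), " ! "),
    (re.compile(r"\?"), " ? "),
    (re.compile(r"\;"), " "),
    (re.compile(r"\:"), " "),
    (re.compile(r"\s+"), " "),
]


def basic_english_tokenize(line: str) -> List[str]:
    """Lowercase the line, separate punctuation and split on whitespace."""
    line = line.lower()
    for pattern, replacement in _BASIC_ENGLISH_RULES:
        line = pattern.sub(replacement, line)
    return line.split()


def get_tokenizer(name: str) -> Callable[[str], List[str]]:
    """Hypothetical drop-in shim: only "basic_english" is supported here."""
    if name == "basic_english":
        return basic_english_tokenize
    raise ValueError(f"unsupported tokenizer: {name!r}")
```

For example, `get_tokenizer("basic_english")("Hello, World!")` yields `["hello", ",", "world", "!"]`. Keeping the shim's name and call shape would let existing call sites switch by changing only the import.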

Can these easily be replaced? If not, a fourth option presents itself:

  4. Cannibalize torchtext for these parts only.
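For the `Vectors` side of option 4, a vendored loader might only need to read a whitespace-separated embedding file (GloVe or word2vec text style: a token followed by its float components) and expose `itos`, `stoi`, `dim` and token lookup. This is a minimal sketch assuming those are the only attributes the tests, notebooks and dashboard touch; `SimpleVectors` is a hypothetical name, it returns plain Python lists instead of torch tensors, and it does not download or cache files the way torchtext does.

```python
from typing import Dict, List, Optional


class SimpleVectors:
    """Hypothetical minimal stand-in for torchtext.vocab.Vectors."""

    def __init__(self, path: str) -> None:
        self.itos: List[str] = []           # index -> token
        self.stoi: Dict[str, int] = {}      # token -> index
        self.vectors: List[List[float]] = []
        self.dim: Optional[int] = None
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                # Split on single spaces only, as in the text embedding formats.
                token, *components = line.rstrip().split(" ")
                if len(components) <= 1:
                    # Skip blank lines and word2vec-style "<count> <dim>" headers.
                    continue
                if self.dim is None:
                    self.dim = len(components)
                elif len(components) != self.dim:
                    raise ValueError(f"inconsistent vector size for {token!r}")
                if token in self.stoi:
                    continue  # design choice of this sketch: keep the first occurrence
                self.stoi[token] = len(self.itos)
                self.itos.append(token)
                self.vectors.append([float(x) for x in components])

    def __len__(self) -> int:
        return len(self.itos)

    def __contains__(self, token: str) -> bool:
        return token in self.stoi

    def __getitem__(self, token: str) -> List[float]:
        if token in self.stoi:
            return list(self.vectors[self.stoi[token]])
        # Unknown tokens get a zero vector, like torchtext's default unk_init.
        return [0.0] * (self.dim or 0)
```

Usage would look like `vectors = SimpleVectors("glove.6B.50d.txt")` followed by `vectors["word"]`, where the file name is only an illustration. Callers that expect tensors could wrap the result with `torch.tensor(...)`.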
@loostrum
Member

loostrum commented Jul 31, 2024

Yikes, I guess we should have followed up on the deprecation warning earlier ;)
I'll see if we can replace torchtext. In the linked PyTorch issue they also suggest copying what you need from the torchtext source, so option 4 may not be the worst idea; we can just put those parts in a dianna text utils file.

@loostrum loostrum self-assigned this Jul 31, 2024
@cwmeijer
Member

cwmeijer commented Aug 7, 2024

spaCy, already a dependency of DIANNA, may provide both functionalities. Looking at their code, it seems so: I came across a vocab.Vectors and a get_tokenizer in their code. That would be a nice solution, I think (that's an option 3 for ya 😉).

@SarahAlidoost
Member

As a temporary fix to get CI green again, the torch version is pinned in #841. Once this issue is resolved, the pin can be removed.
