This repository has been archived by the owner on May 17, 2023. It is now read-only.

please implement the from_pretrained tokenizer method #12

Open
cregouby opened this issue Jan 29, 2022 · 0 comments

Comments

@cregouby
Contributor

Hello,

current behavior

The Hugging Face Quicktour ends with the following paragraph:

image

However, {hftokenizer} does not currently allow loading pretrained tokenizers.

expected behavior

I'd like to be able to reuse the pretrained tokenizers that already ship with publicly available language models (BERT, RoBERTa, and friends) and/or that sit in my local cache folder, so that I can feed those models with the result of {hftokenizer}'s tokenizer$encode()$ids.

I'd also like the Quicktour vignette to cover the API for doing this.

Thanks a lot!
