Bag of words - what is the delimiter? #129
Comments
The tokenizer will tokenize the string in the following way:
It's not splitting text into tokens using a comma delimiter. If you want the behavior to instead be three tokens
Do you have an example of how that would work?
You would need to pre-process your CSV using another tool. Alternatively, you can use an In the example linked above, the "chest_pain" column is specified as type "enum" with four variants.
For your dataset, you would specify that the Then, use the config file by passing
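For the pre-processing route, here is a rough sketch of one way it might look in Python, using only the standard library. It assumes the goal is to turn a comma-delimited text column into one 0/1 indicator column per distinct value; that encoding is my assumption, not something confirmed in this thread. The function name `expand_delimited_column` and the column name `tags` are hypothetical.

```python
import csv
import io


def expand_delimited_column(csv_text, column, delimiter=","):
    """Replace `column` with one 0/1 indicator column per distinct token.

    Hypothetical helper: the indicator-column layout is an assumption,
    not the behavior of any particular tool.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    # Every column except the one being expanded is copied through unchanged.
    kept = [name for name in (reader.fieldnames or []) if name != column]

    # Split each cell on the delimiter, dropping empty pieces and whitespace.
    split_cells = [
        [part.strip() for part in row[column].split(delimiter) if part.strip()]
        for row in rows
    ]
    # Sorted so the output column order is deterministic.
    vocabulary = sorted({token for cell in split_cells for token in cell})
    new_columns = [f"{column}_{token}" for token in vocabulary]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=kept + new_columns, lineterminator="\n")
    writer.writeheader()
    for row, tokens in zip(rows, split_cells):
        record = {name: row[name] for name in kept}
        for token, new_column in zip(vocabulary, new_columns):
            record[new_column] = "1" if token in tokens else "0"
        writer.writerow(record)
    return out.getvalue()


if __name__ == "__main__":
    sample = 'id,tags\n1,"a,b,c"\n2,"b"\n'
    print(expand_delimited_column(sample, "tags"))
```

Each generated column then holds only two values, so it could presumably be declared as an enum-like column in a config file, in the same spirit as the "chest_pain" example.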
Consider a table:
Am I using the commas to infer the bag of words correctly?