data sequence and customized csv dataset #2733
-
Hi. When the CSV is very wide (say, 1000 columns), it is impractical to manually define a struct with all the columns as fields. Is there an easier way? Also, how can I construct a tensor from a numeric string? Here is my intention:

```rust
let s = String::from("1, 2, 3, 4, 5, 6");
let tensor = Tensor::<B, 1>::from_floats(s); // I want the tensor [1, 2, 3, 4, 5, 6], but this does not work
```

This would be useful when implementing an LSTM, since sequences are needed.
Replies: 2 comments 3 replies
-
Burn's `InMemDataset` already provides a CSV reader (see `burn/crates/burn-dataset/src/dataset/in_memory.rs`, lines 72 to 88 at commit 245fbcd). If that doesn't fit your needs, you can implement your own parsing to create the dataset.
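For a very wide CSV, one way to avoid a 1000-field struct is to parse each row into a `Vec<f32>` instead. Below is a minimal standard-library sketch of that idea (a real loader would likely use the `csv` crate with serde, which can also deserialize a record straight into a `Vec<f32>`); the function name is illustrative, not part of Burn's API:

```rust
use std::error::Error;

/// Parse one CSV line of numeric fields into a Vec<f32>.
/// Assumes simple comma-separated values without quoting or escapes.
fn parse_row(line: &str) -> Result<Vec<f32>, Box<dyn Error>> {
    line.split(',')
        .map(|field| field.trim().parse::<f32>().map_err(Into::into))
        .collect()
}

fn main() -> Result<(), Box<dyn Error>> {
    // A wide row: no per-column struct is needed, every row becomes a Vec<f32>.
    let row = parse_row("1.0, 2.0, 3.0, 4.0")?;
    println!("{:?}", row); // [1.0, 2.0, 3.0, 4.0]
    Ok(())
}
```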
You cannot construct a tensor directly from a string. For NLP tasks, you need to go from the string representation to tokens. This can be done in many different ways, so the implementation is up to the user. Modern techniques involve tokenization, where strings (e.g., sentences) are split into smaller units (e.g., words, subwords, or characters) called tokens, and these tokens are mapped to unique integers using a vocabulary. See, for example, the tokenizer in the text classification example.
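To make the string-to-tokens step concrete, here is a toy whitespace tokenizer with a growable vocabulary. It is only a sketch of the idea; real pipelines (including Burn's text-classification example) use proper subword tokenizers, and the `Vocab` type here is entirely hypothetical:

```rust
use std::collections::HashMap;

/// Toy vocabulary: maps each word to a unique integer id.
struct Vocab {
    ids: HashMap<String, usize>,
}

impl Vocab {
    fn new() -> Self {
        Self { ids: HashMap::new() }
    }

    /// Return the id for `word`, assigning a fresh one if unseen.
    fn id(&mut self, word: &str) -> usize {
        let next = self.ids.len();
        *self.ids.entry(word.to_string()).or_insert(next)
    }

    /// Split on whitespace and map every word to its id.
    fn encode(&mut self, sentence: &str) -> Vec<usize> {
        sentence.split_whitespace().map(|w| self.id(w)).collect()
    }
}

fn main() {
    let mut vocab = Vocab::new();
    let tokens = vocab.encode("the cat sat on the mat");
    println!("{:?}", tokens); // [0, 1, 2, 3, 0, 4] — "the" gets the same id both times
}
```

The resulting integer ids are what you would then feed into an embedding layer as a tensor.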
-
There is also the [`DataframeDataset`](https://burn.dev/burn-book/building-blocks/dataset.html#storage) option that you can use. It uses a Polars DataFrame underneath. You can use Polars to read and manipulate CSV, so if you can achieve your goal via Polars, you can then load the dataframe as a dataset.
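Coming back to the original question, the missing step is simply parsing the numeric string into a `Vec<f32>` first; Burn tensors are built from numeric data, not strings. A minimal sketch (the `from_floats` call is shown as a comment since it needs a Burn backend in scope, and its exact signature may differ across Burn versions):

```rust
/// Parse a comma-separated numeric string into a Vec<f32>.
fn parse_floats(s: &str) -> Vec<f32> {
    s.split(',')
        .map(|v| v.trim().parse::<f32>().expect("invalid number"))
        .collect()
}

fn main() {
    let s = String::from("1, 2, 3, 4, 5, 6");
    let data = parse_floats(&s);

    // With a Burn backend `B` and a device in scope, the parsed data
    // can then become a rank-1 tensor (hypothetical sketch):
    // let tensor = Tensor::<B, 1>::from_floats(data.as_slice(), &device);
    println!("{:?}", data); // [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
}
```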