TorchData or Torch.Utils.Data ? #580
Replies: 8 comments 2 replies
-
Adding @michaelgsharp and @tarekgh for visibility. @GeorgeS2019 -- good timing! We're just now starting to internally discuss better coverage of NLP scenarios for both ML.NET and TorchSharp. We're first discussing what to do about tokenizers, which is what torch.data.utils is primarily about. Haven't gotten very far in our discussion yet.
-
@NiklasGustafsson @michaelgsharp @tarekgh I have had a preliminary look at how pytorch and ONNX approach deep NLP. I personally feel that Onnxruntime's C++ code is far ahead of pytorch's (torchtext, torch.utils.data). Both the Python and .NET communities are generally leaning in the onnxruntime direction when it comes to deep NLP. One of the key drivers is the easy availability of deep NLP ONNX models provided through e.g. Onnxruntime, the ONNX Model Zoo, and HuggingFace.

I personally feel that the priority of TorchSharp and ML.NET has always been supporting the .NET community in doing deep ML within the .NET framework. The decision to adhere to pytorch's Python syntax was correct; the .NET community's commitment to TorchSharp is testimony to that decision. For deep NLP, however, I am skeptical that TorchSharp needs to adhere to what TorchText and TorchData are doing. Even in the Python community, it seems preferable to adopt what has been implemented in onnxruntime for pre- and post-inference operators: e.g. instead of using the tokenizers provided by TorchText, it is more effective to use the SAME tokenizers provided by onnxruntime. In other words, when it comes to deep NLP, onnxruntime is a more complete environment than TorchText and TorchData.

I suggest TorchSharp and ML.NET take the onnxruntime path: TorchSharp provides the bridging and utility code typically performed by pytorch when it comes to building and manipulating models (e.g. ONNX) and training and inference (e.g. Onnxruntime). For deep NLP, reusing what has been developed in onnxruntime is key to delivering deep NLP through TorchSharp and ML.NET. Looking for feedback!
-
@NiklasGustafsson @michaelgsharp @tarekgh @luisquintanilla The following README from TorchText explains it: only the basics are provided, while onnxruntime provides not just the basics but a more complete environment, driven by contributions from industry users. This is why I think onnxruntime is a better choice. The key: organize what onnxruntime provides in a folder and namespace structure consistent with TorchText. This suggestion does not include the workflow and NLP pipeline planned for TorchData. The suggestion would have multiple impacts, not just for TorchSharp and ML.NET, but also for the Microsoft Bot Framework.
-
@SergeiAlonichau I included you here because the same tokenizers provided by TorchText and Onnxruntime are, to some extent, also addressed by BlingFire.
-
For ML.NET, we prefer having the tokenizers designed like the HuggingFace tokenizers. That design is sophisticated and more flexible. We are doing some prototyping to have the full implementation done in C#, to avoid the complexity that comes with native dependencies (producing bits for all supported architectures x86, x64, arm, etc.), to avoid packaging issues, and to avoid problems if the tokenizers need to be used inside WASM. Note that we want the tokenizers for global scenarios, not necessarily restricted to any specific runtime (ONNX, TorchSharp, ...).
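To make the design goal above concrete, here is a minimal, purely illustrative sketch (plain Python, names hypothetical) of what a runtime-agnostic tokenizer interface in the spirit of the HuggingFace `tokenizers` design looks like: `encode()` returns both vocabulary ids and character offsets, and the component has no dependency on any inference runtime, so the same shape could back ONNX Runtime, TorchSharp, or a WASM host:

```python
# Hypothetical sketch of a runtime-agnostic tokenizer interface, in the
# spirit of the HuggingFace `tokenizers` design: encode() returns surface
# tokens, vocabulary ids, and (start, end) character offsets. The vocab
# and class names here are illustrative, not any library's actual API.
from dataclasses import dataclass

@dataclass
class Encoding:
    tokens: list   # surface forms
    ids: list      # vocabulary ids
    offsets: list  # (start, end) character spans in the input

class WhitespaceTokenizer:
    def __init__(self, vocab, unk_id=0):
        self.vocab = vocab
        self.unk_id = unk_id

    def encode(self, text):
        tokens, ids, offsets = [], [], []
        pos = 0
        for tok in text.split():
            start = text.index(tok, pos)  # locate token to record offsets
            end = start + len(tok)
            pos = end
            tokens.append(tok)
            ids.append(self.vocab.get(tok, self.unk_id))
            offsets.append((start, end))
        return Encoding(tokens, ids, offsets)

vocab = {"[UNK]": 0, "hello": 1, "world": 2}
enc = WhitespaceTokenizer(vocab).encode("hello brave world")
print(enc.ids)      # → [1, 0, 2]
print(enc.offsets)  # → [(0, 5), (6, 11), (12, 17)]
```

Returning offsets alongside ids is what makes a tokenizer usable for post-processing tasks like NER span alignment, which is part of why the HuggingFace design is described as more flexible.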
-
The onnxruntime project, which involves many industry partners, has turned its use-case analysis into REUSABLE pre- and post-processing algorithms. That reusability (in both pre- and post-processing, the result of consensus feedback from those industry partners) is what we hope for in the C# approach you describe. With NativeAOT, the C# approach can deliver high performance without compromising the ease of maintainability that comes with C#.
-
@GeorgeS2019 -- as always, thank you for your enthusiasm and interest in making DL in .NET the best it can be. As far as TorchSharp goes, I want to make sure that whatever we do, we stick with the simple value prop -- it is .NET bindings to the native engine underlying Pytorch, i.e. libtorch. It is no more, no less. We can build other libraries and integrate other technologies into ML.NET, or SciSharp, or other things we haven't come up with yet, but TorchSharp should remain simple and focused. So, in one sense, if it's documented at pytorch.org, then it belongs in TorchSharp. If not, then it doesn't.
-
TorchData or Torch.Utils.Data ?
In TorchSharp, there are
I am posting this here more as a reflection on the TorchSharp NLP NER support issue.
It is an NLP use case, and yet it is not part of TorchText. The data-loading functions used in pytorch-ner are more related to torch.utils.data.
Question: is there a need for TorchSharp's data utility functions to align more closely with those of pytorch? Should TorchSharp.Data align with TorchData, or should TorchSharp.Utils align with torch.utils.data?
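For context on what "aligning with torch.utils.data" would mean in practice: the map-style dataset contract that pytorch's `torch.utils.data.DataLoader` consumes is just `__len__` plus `__getitem__`. Below is a minimal sketch of an NER dataset in that shape, written in plain Python with no torch dependency (the sentences, tags, and class name are illustrative placeholders, not from pytorch-ner):

```python
# Minimal sketch of the map-style dataset protocol that torch.utils.data
# expects: __len__ and __getitem__. An object of this shape can be handed
# to torch.utils.data.DataLoader for batching/shuffling. All data below
# is an illustrative placeholder.
class NerDataset:
    def __init__(self, sentences, labels, tag2id):
        assert len(sentences) == len(labels)
        self.sentences = sentences  # each item: list of tokens
        self.labels = labels        # each item: list of tag strings
        self.tag2id = tag2id        # tag string -> integer id

    def __len__(self):
        return len(self.sentences)

    def __getitem__(self, idx):
        tokens = self.sentences[idx]
        tag_ids = [self.tag2id[t] for t in self.labels[idx]]
        return tokens, tag_ids

tag2id = {"O": 0, "B-PER": 1, "I-PER": 2}
ds = NerDataset([["John", "Smith", "slept"]],
                [["B-PER", "I-PER", "O"]],
                tag2id)
print(len(ds))   # → 1
print(ds[0][1])  # → [1, 2, 0]
```

A TorchSharp.Utils aligned with torch.utils.data would presumably mirror this same two-method contract in C#, which is why the question of which pytorch library to track matters.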
Other discussion related to TorchData