Add HF dataset support to DPK #88

Open
deanwampler opened this issue Jan 21, 2025 · 4 comments

@deanwampler
Contributor

No description provided.

@blublinsky
Contributor

The initial implementation is here: blublinsky/dpk#2

@blublinsky
Contributor

blublinsky commented Jan 27, 2025

The official DPK PR is here: IBM/data-prep-kit#962

There is also the license_select transform (https://github.com/IBM/data-prep-kit/tree/dev/transforms/code/license_select), which does something very similar to what we need.

@deanwampler
Contributor Author

We'll work on a fork for now.

@blublinsky
Contributor

Currently, DPK allows only one data source to be defined for both the input and the output of the transformer.
I'm starting a new PR to allow separate definitions for input and output, so that, for example, you can transform data from S3 and write it to HF, or vice versa.
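
A minimal sketch of the separate input/output idea, for illustration only. Every name here (`DataAccess`, `InMemoryAccess`, `run_transform`) is hypothetical and is not DPK's actual API; the in-memory class stands in for real backends such as S3 or an HF dataset repository.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Protocol


class DataAccess(Protocol):
    """Hypothetical storage interface (an assumption, not DPK's API)."""

    def read(self, name: str) -> bytes: ...

    def write(self, name: str, data: bytes) -> None: ...


@dataclass
class InMemoryAccess:
    """Stand-in for a real backend such as S3 or an HF dataset repo."""

    label: str
    files: Dict[str, bytes] = field(default_factory=dict)

    def read(self, name: str) -> bytes:
        return self.files[name]

    def write(self, name: str, data: bytes) -> None:
        self.files[name] = data


def run_transform(
    source: DataAccess,
    sink: DataAccess,
    names: Iterable[str],
    transform: Callable[[bytes], bytes],
) -> None:
    """Read each file from `source`, transform it, and write it to `sink`."""
    for name in names:
        sink.write(name, transform(source.read(name)))


# Input and output are configured independently of each other.
s3 = InMemoryAccess("s3", {"a.txt": b"hello"})
hf = InMemoryAccess("hf")
run_transform(s3, hf, ["a.txt"], bytes.upper)
```

Swapping the two arguments would give the reverse direction (read from HF, write to S3) without touching the transform itself, which is the point of defining the two sides separately.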

Projects
Status: In Progress

2 participants