Issues: The-AI-Alliance/open-trusted-data-initiative
Issues list
Add DPK transformer to obtain individual documents license information for a given dataset
#101
opened Feb 6, 2025 by
blublinsky
Add support for splitting input and output data sources in DPK
#100
opened Feb 6, 2025 by
blublinsky
Leverage DataPerf?
data pipelines
Defining and implementing data processing pipelines
#99
opened Feb 5, 2025 by
deanwampler
Add a section to the website catalog page that reports on the HF summary statistics
data pipelines
Defining and implementing data processing pipelines
dataset catalog
All aspects of managing the catalog and its use
"Semi-automate" periodic gathering of HF dataset statistics
data pipelines
Defining and implementing data processing pipelines
Create a schedule of AIA blog posts for OTDI
evangelism
Anything related to public exposure
#95
opened Jan 31, 2025 by
deanwampler
Add link to blog post announcing OTDI at the AI Summit in Paris
documentation
Improvements or additions to documentation
Explore ideas for crowd-sourcing tools and processes for building datasets
contribution process
Steps for contributing datasets and validating contributions.
#90
opened Jan 23, 2025 by
deanwampler
Replace the form submission process that sends an email with an actual web service invocation.
#89
opened Jan 21, 2025 by
deanwampler
Open Source DPK components for processing FineWeb data
data pipelines
Defining and implementing data processing pipelines
#87
opened Jan 17, 2025 by
deanwampler
Evaluate ml-metadata project for possible use for our metadata management needs.
dataset catalog
All aspects of managing the catalog and its use
#82
opened Jan 13, 2025 by
deanwampler
Explore possible connection to IETF initiative for "AI prefs"
#81
opened Jan 10, 2025 by
deanwampler
Define the takedown process
dataset catalog
All aspects of managing the catalog and its use
dataset requirements
All aspects of the specification for acceptable datasets.
#79
opened Jan 8, 2025 by
deanwampler
Evaluate Open Metadata for the catalog
dataset catalog
All aspects of managing the catalog and its use
#78
opened Jan 8, 2025 by
deanwampler
Define the submission checklist used by review committee
contribution process
Steps for contributing datasets and validating contributions.
dataset requirements
All aspects of the specification for acceptable datasets.
Evaluate LinkedIn Data Hub as a catalog system
dataset catalog
All aspects of managing the catalog and its use
dataset requirements
All aspects of the specification for acceptable datasets.
help wanted
Extra attention is needed
Investigate using Databricks-sponsored Unity Catalog for metadata management
administration
Misc. admin. tasks, like organizing the work, recruiting participants, etc.
data pipelines
Defining and implementing data processing pipelines
dataset catalog
All aspects of managing the catalog and its use
#71
opened Dec 12, 2024 by
deanwampler
There are needs to support hidden/restricted data. Investigate what we might do
dataset catalog
All aspects of managing the catalog and its use
dataset requirements
All aspects of the specification for acceptable datasets.
#70
opened Dec 12, 2024 by
deanwampler
How should we integrate the unitxt catalog?
dataset catalog
All aspects of managing the catalog and its use