This folder contains the core implementation of the MLentory ETL (Extract, Transform, Load) pipeline, designed to collect and process machine learning model metadata from various sources.
MLentory Pipeline Architecture
code/
├── extractors/
├── transform/
└── load/
Platform-specific modules that extract ML model metadata from different sources:
- HuggingFace Hub extractor
- Future extractors for other platforms
For detailed information, see the extractors documentation.
Transforms extracted data into a standardized schema:
- Configurable transformation rules
- Field processing and validation
- Schema mapping
For detailed information, see the transform documentation.
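As a rough illustration of how configurable rules, field processing, validation, and schema mapping can fit together, here is a minimal Python sketch. It is not the actual MLentory transform code: the rule table `RULES`, the `REQUIRED` set, the source field names (`modelId`, `license`, `tags`), and the `transform` function are all hypothetical and chosen only for the example.

```python
from typing import Any, Callable

# Hypothetical rule table: target field -> (source field, processing function).
# In a real setup these rules would likely come from a configuration file.
RULES: dict[str, tuple[str, Callable[[Any], Any]]] = {
    "name": ("modelId", lambda v: str(v).strip()),
    "license": ("license", lambda v: v.lower() if isinstance(v, str) else None),
    "tags": ("tags", lambda v: sorted(set(v or []))),
}

# Hypothetical set of target fields that must be present and non-empty.
REQUIRED = {"name"}


def transform(record: dict[str, Any]) -> dict[str, Any]:
    """Map one extracted record onto the target schema and validate it."""
    out: dict[str, Any] = {}
    for target, (source, process) in RULES.items():
        if source in record:
            out[target] = process(record[source])

    # Validation: every required field must exist with a non-empty value.
    present = {key for key, value in out.items() if value not in (None, "")}
    missing = REQUIRED - present
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    return out


raw = {"modelId": " bert-base ", "license": "Apache-2.0", "tags": ["nlp", "bert", "nlp"]}
standardized = transform(raw)
```

Keeping the rules in a plain data structure means a new field can be supported by adding one entry rather than changing the transformation logic.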
Handles storage and versioning of processed data:
- PostgreSQL for relational data
- Virtuoso for RDF triples
- Elasticsearch for search capabilities
For detailed information, see the load documentation.
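The idea of writing each processed record to several backends while keeping a version history can be sketched as follows. This is a simplified, assumed design and not the project's loader: `Store`, `InMemoryStore`, and `load` are hypothetical names, and the in-memory class only stands in for real PostgreSQL, Virtuoso, or Elasticsearch clients.

```python
from typing import Any, Protocol


class Store(Protocol):
    """Hypothetical interface that each storage backend would implement."""

    def save(self, record: dict[str, Any]) -> None: ...


class InMemoryStore:
    """Stand-in for a real backend; keeps every distinct version of a record."""

    def __init__(self) -> None:
        self.versions: dict[str, list[dict[str, Any]]] = {}

    def save(self, record: dict[str, Any]) -> None:
        history = self.versions.setdefault(record["name"], [])
        # Append a new version only when the record differs from the latest one.
        if not history or history[-1] != record:
            history.append(dict(record))


def load(record: dict[str, Any], stores: list[Store]) -> None:
    """Write one processed record to every configured backend."""
    for store in stores:
        store.save(record)


relational, search = InMemoryStore(), InMemoryStore()
load({"name": "bert-base", "license": "apache-2.0"}, [relational, search])
load({"name": "bert-base", "license": "apache-2.0"}, [relational, search])  # unchanged
load({"name": "bert-base", "license": "mit"}, [relational, search])  # new version
```

After these three calls, each store holds two versions of `bert-base`, because the unchanged second write is skipped.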
To run the full extraction, transformation, and loading process, follow the instructions in the deployment documentation.
To run a single component, first install the prerequisites listed in the deployment documentation. Once they are installed, follow the instructions in that component's folder.