This demo project shows how to load nested data from separate API endpoints, where multiple endpoints depend on the response of a single parent endpoint. It walks through setting up dlt (data load tool) resources, including transformer resources and a source that merges them into a single dataset, along with a pipeline that handles the data ingestion. PostgreSQL is used as the storage destination, and the data is sourced from the Coinpaprika API.
- Docker Desktop. Visit their official page to download.
- DBeaver or a different database administration tool of your choice. Download DBeaver.
- Clone this repo.
- Install the necessary dependencies for Postgres:

  ```sh
  pip install -r requirements.txt
  ```
- Set up PostgreSQL using the public image:

  ```sh
  $ docker pull postgres
  ```
- Run a Docker container from the postgres:latest image with the command below. Replace the first `/data` with the absolute path to the directory on your local machine that you want to map to `/var/lib/postgresql/data` inside the container.

  ```sh
  $ docker run -itd -e POSTGRES_USER=loader -e POSTGRES_PASSWORD=password -p 5432:5432 -v /data:/var/lib/postgresql/data --name postgresql postgres
  ```

- Connect to the database:

  ```sh
  PGPASSWORD=password psql -h localhost -p 5432 -U loader
  ```
- Create a new database:

  ```sql
  CREATE DATABASE demo_data;
  ```
- Enter your credentials into `.dlt/secrets.toml`:

  ```toml
  [destination.postgres.credentials]
  database = "demo_data"
  username = "loader"
  password = "password"  # replace with your password
  host = "localhost"     # or the IP address of your database
  port = 5432
  connect_timeout = 15
  ```
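
  To confirm that dlt can resolve these values, you can run a quick check from the project root (a hypothetical snippet, not part of the repo; dlt reads `.dlt/secrets.toml` relative to the working directory):

  ```python
  import dlt

  # Resolves a single leaf value from .dlt/secrets.toml; this should
  # print "demo_data" if the file is set up correctly.
  print(dlt.secrets["destination.postgres.credentials.database"])
  ```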
- Run your pipeline:

  ```sh
  $ python3 dlt_pipeline_merged.py
  ```
- Understand your resources and sources. In dlt, a resource is a function that yields data for a table (here, responses from Coinpaprika endpoints), a transformer is a resource that receives each item from a parent resource and uses it to make the dependent requests, and a source groups related resources so they load into a single dataset, as sketched below.
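
  The concrete definitions live in `dlt_pipeline_merged.py`. As a minimal sketch of the pattern, assuming the Coinpaprika `/v1/coins` and `/v1/coins/{coin_id}` endpoints (the endpoint choice, limit, and function names are illustrative, not necessarily what the repo uses):

  ```python
  import dlt
  import requests

  BASE_URL = "https://api.coinpaprika.com/v1"

  @dlt.resource(write_disposition="replace")
  def coins(limit: int = 10):
      # Parent resource: the coin list drives all dependent requests.
      # Capped here so the sketch does not hammer the API.
      yield requests.get(f"{BASE_URL}/coins", timeout=30).json()[:limit]

  @dlt.transformer(data_from=coins)
  def coin_details(coin):
      # Transformer resource: receives each coin yielded by `coins` and
      # fetches the nested detail record that depends on its id.
      yield requests.get(f"{BASE_URL}/coins/{coin['id']}", timeout=30).json()

  @dlt.source
  def coinpaprika_source():
      # The source bundles the parent resource and its transformer so
      # both tables land in the same dataset.
      return coins, coin_details
  ```

  Because `coin_details` is declared with `data_from=coins`, dlt feeds it every item the parent yields, which is exactly the "multiple endpoints rely on one response" dependency described above.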
- Understand your pipeline. A dlt pipeline ties together a pipeline name, a destination (here Postgres), and a dataset name; calling `run()` on it extracts the source, normalizes the nested JSON into relational tables, and loads them into the database, as sketched below.
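
  The pipeline wiring lives in `dlt_pipeline_merged.py`; here is a minimal sketch of what it presumably looks like (the pipeline and dataset names are illustrative assumptions):

  ```python
  import dlt

  # Targets the local Postgres container; credentials are picked up
  # from the .dlt/secrets.toml configured above.
  pipeline = dlt.pipeline(
      pipeline_name="coinpaprika_demo",
      destination="postgres",
      dataset_name="coinpaprika_data",
  )

  # run() extracts the source, normalizes the nested JSON, and loads
  # the resulting tables into demo_data; load_info summarizes the load.
  load_info = pipeline.run(coinpaprika_source())
  print(load_info)
  ```

  Once the run completes, the loaded tables can be inspected in DBeaver by connecting to the `demo_data` database.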