-- Under construction --
Bookstore ETL is a data pipeline that extracts book metadata from the Open Library API, cleans and enriches it, and loads it into a PostgreSQL database. The longer-term goal is a personalized book recommendation system based on user preferences.
- Extract: Retrieves book metadata from the Open Library API.
- Transform: Cleans and enriches the data by processing publication years, genres, and other metadata.
- Load: Stores the structured data in a PostgreSQL database.
- Scoring System:
  - Subject Matching (40%)
  - Ratings & Popularity (30%)
  - Text Similarity (30%)
- Testing: Unit tests implemented using pytest.
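The scoring weights listed above could be combined as a simple weighted sum. The following is a minimal sketch, not the project's actual implementation: the function name `combined_score`, the component keys, and the assumption that each component score is already normalized to the range 0 to 1 are all hypothetical.

```python
# Weights come from the feature list above; every name here is hypothetical.
WEIGHTS = {
    "subject": 0.40,     # Subject Matching
    "popularity": 0.30,  # Ratings & Popularity
    "text": 0.30,        # Text Similarity
}


def combined_score(components: dict[str, float]) -> float:
    """Return the weighted sum of component scores, each expected in [0, 1]."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        # A missing component contributes nothing to the score.
        value = components.get(name, 0.0)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score must be in [0, 1], got {value}")
        total += weight * value
    return total


if __name__ == "__main__":
    print(combined_score({"subject": 0.8, "popularity": 0.5, "text": 0.6}))
```

Because the weights sum to 1, the combined score stays in the same 0 to 1 range as its inputs, which keeps scores comparable across books.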
- Python (ETL logic and API interactions)
- PostgreSQL (Database storage)
- SQLAlchemy (Database connection)
- pandas (Data transformation)
- pytest (Testing framework)
- Logging (Monitoring ETL processes)
bookstore_etl/
│── src/
│ ├── extract.py
│ ├── transform.py
│ ├── load.py
│ ├── utils.py
│ ├── main.py
│── tests/
│ ├── test_extract.py
│ ├── test_transform.py
│ ├── test_load.py
│── logs/
│ ├── etl.log
│── requirements.txt
│── README.md
│── .gitignore
- Clone the repository:
git clone https://github.com/janlodewijk/bookstore_etl.git
- Navigate to the project directory:
cd bookstore_etl
- Create a virtual environment:
python -m venv venv
- Activate the virtual environment:
  - Windows:
    venv\Scripts\activate
  - macOS/Linux:
    source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
Run the ETL pipeline with:
python src/main.py
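As a rough illustration of what an entry point such as src/main.py might do, the sketch below chains the three stages and logs a record count after each one. It is an assumption about the structure, not the repository's code: `run_pipeline` and the stand-in stage functions are hypothetical, and the real stages would call the Open Library API and PostgreSQL instead.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("bookstore_etl")

Records = list[dict[str, Any]]


def run_pipeline(
    extract: Callable[[], Records],
    transform: Callable[[Records], Records],
    load: Callable[[Records], int],
) -> int:
    """Run extract -> transform -> load and return the number of rows loaded."""
    raw = extract()
    logger.info("Extracted %d records", len(raw))
    clean = transform(raw)
    logger.info("Transformed %d records", len(clean))
    loaded = load(clean)
    logger.info("Loaded %d rows", loaded)
    return loaded


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Stand-in stages so the sketch runs without network or database access.
    sample = [{"title": "Dune", "first_publish_year": 1965}, {"title": None}]
    run_pipeline(
        extract=lambda: sample,
        transform=lambda rows: [r for r in rows if r.get("title")],
        load=lambda rows: len(rows),
    )
```

Passing the stages in as callables keeps the orchestration testable with stubs, independent of the API and the database.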
Create a .env file to store API keys and database credentials:
OPEN_LIBRARY_API_KEY=your_api_key
DATABASE_URL=postgresql://user:password@localhost:5432/bookstore
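Python does not read a .env file on its own, so the values have to reach the process environment first, for example by exporting them in the shell or through a loader such as python-dotenv (whether this project uses one is not stated here). Assuming `DATABASE_URL` is present in the environment, a sketch of reading and validating it with the standard library might look like this; `load_db_settings` is a hypothetical helper name.

```python
import os
from urllib.parse import urlsplit


def load_db_settings(env=os.environ) -> dict:
    """Read DATABASE_URL from the environment and split it into its parts."""
    url = env.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL is not set")
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"Unexpected database scheme: {parts.scheme!r}")
    # The password is deliberately left out so the result is safe to log.
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,  # 5432 is the PostgreSQL default port
        "user": parts.username,
        "database": parts.path.lstrip("/"),
    }
```

Failing early on a missing or malformed URL gives a clearer error than a connection failure later in the load step.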
Run tests using:
pytest tests/
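pytest collects functions named `test_*` from files named `test_*.py`. The sketch below shows what a test for the publication-year cleanup in the transform step could look like. The helper `clean_publication_year` is hypothetical and defined inline so the file runs on its own; the real function in src/transform.py may have a different name and behavior.

```python
# Hypothetical contents of a file such as tests/test_transform.py.
import re
from typing import Optional


def clean_publication_year(raw: object) -> Optional[int]:
    """Extract a four-digit year (1000-2099) from messy input, or return None."""
    if raw is None:
        return None
    match = re.search(r"\b(1[0-9]{3}|20[0-9]{2})\b", str(raw))
    return int(match.group(1)) if match else None


def test_plain_integer_year():
    assert clean_publication_year(1965) == 1965


def test_year_embedded_in_text():
    assert clean_publication_year("June 1965") == 1965


def test_missing_or_unparseable_year():
    assert clean_publication_year(None) is None
    assert clean_publication_year("unknown") is None
```

Each test covers one input shape, so a failure points directly at the case that broke.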
- Enhance the recommendation algorithm.
- Implement a front-end interface.
- Deploy the system for broader use.
This project is licensed under the MIT License.