Bookstore ETL

-- Under construction --

Project Overview

Bookstore ETL is a data pipeline that extracts book metadata from the Open Library API, cleans and enriches it, and loads it into a PostgreSQL database. The end goal is a personalized book recommendation system that ranks books against user preferences.
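
As a rough illustration of the extract step, the sketch below builds a request URL for Open Library's public search endpoint and flattens one result into a record. The helper names (`build_search_url`, `to_record`) and the choice of fields are assumptions made for this example, not the project's actual code, and the sample document is hand-written rather than a real API response.

```python
from urllib.parse import urlencode

# Open Library's public search endpoint.
SEARCH_URL = "https://openlibrary.org/search.json"


def build_search_url(query: str, limit: int = 10) -> str:
    """Build a search URL for the given query (hypothetical helper)."""
    return f"{SEARCH_URL}?{urlencode({'q': query, 'limit': limit})}"


def to_record(doc: dict) -> dict:
    """Flatten one search result into fields the pipeline might keep (assumed)."""
    return {
        "title": doc.get("title", "").strip(),
        "authors": doc.get("author_name", []),
        "first_publish_year": doc.get("first_publish_year"),
        "subjects": [s.lower() for s in doc.get("subject", [])],
    }


# Hand-written sample shaped like one entry of the response's "docs" list.
sample_doc = {
    "title": " Dune ",
    "author_name": ["Frank Herbert"],
    "first_publish_year": 1965,
    "subject": ["Science Fiction", "Deserts"],
}

print(build_search_url("dune", limit=5))
print(to_record(sample_doc))
```

Keeping URL construction and record flattening as pure functions means both can be unit-tested without network access.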

Features

Extract: Retrieves book metadata from the Open Library API.
Transform: Cleans and enriches the data by processing publication years, genres, and other metadata.
Load: Stores the structured data in a PostgreSQL database.
Scoring System: Ranks books by a weighted combination of three components:
- Subject Matching (40%)
- Ratings & Popularity (30%)
- Text Similarity (30%)
Testing: Unit tests implemented using pytest.
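
The scoring weights listed above could be combined as a simple weighted sum. This is a minimal sketch under stated assumptions: each component is assumed to be normalized to the range 0 to 1, and `subject_match` and `combined_score` are hypothetical names; how the project actually computes each component is not documented here.

```python
# Weights taken from the feature list above.
WEIGHTS = {"subject": 0.4, "popularity": 0.3, "text": 0.3}


def subject_match(user_subjects: set[str], book_subjects: set[str]) -> float:
    """Fraction of the user's preferred subjects that the book covers."""
    if not user_subjects:
        return 0.0
    return len(user_subjects & book_subjects) / len(user_subjects)


def combined_score(subject: float, popularity: float, text: float) -> float:
    """Weighted sum of three component scores, each expected in [0, 1]."""
    return (
        WEIGHTS["subject"] * subject
        + WEIGHTS["popularity"] * popularity
        + WEIGHTS["text"] * text
    )


# One of two preferred subjects matches, so the subject component is 0.5.
s = subject_match({"fantasy", "dragons"}, {"fantasy", "magic"})
score = combined_score(s, popularity=0.8, text=0.6)
print(round(score, 2))  # 0.62
```

Because the weights sum to 1, a book that scores 1.0 on every component gets a combined score of 1.0, which keeps results comparable across books.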

Technologies Used

Python (ETL logic and API interactions)
PostgreSQL (Database storage)
SQLAlchemy (Database connection)
pandas (Data transformation)
pytest (Testing framework)
Logging (Monitoring ETL processes)
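
For the logging entry, a setup along the following lines would write timestamped records to a file such as `logs/etl.log`. The function name `build_logger`, the log format, and the INFO level are assumptions for illustration; only Python's standard `logging` module is used.

```python
import logging
from pathlib import Path


def build_logger(name: str, log_file: str) -> logging.Logger:
    """Return a logger that appends timestamped records to log_file."""
    # Create the log directory if it does not exist yet.
    Path(log_file).parent.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(log_file, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

A pipeline stage could then call `build_logger("bookstore_etl", "logs/etl.log")` once and log progress with `logger.info(...)`.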

Project Structure

bookstore_etl/
├── src/
│   ├── extract.py
│   ├── transform.py
│   ├── load.py
│   ├── utils.py
│   └── main.py
├── tests/
│   ├── test_extract.py
│   ├── test_transform.py
│   └── test_load.py
├── logs/
│   └── etl.log
├── requirements.txt
├── README.md
└── .gitignore

Installation

Clone the repository:

git clone https://github.com/janlodewijk/bookstore_etl.git

Navigate to the project directory:
```
cd bookstore_etl
```
Create a virtual environment:
```
python -m venv venv
```
Activate the virtual environment:
- Windows:
```
venv\Scripts\activate
```
- macOS/Linux:
```
source venv/bin/activate
```
Install dependencies:
```
pip install -r requirements.txt
```

Usage

Run the ETL pipeline with:

python src/main.py

Environment Variables

Create a .env file to store API keys and database credentials:

OPEN_LIBRARY_API_KEY=your_api_key
DATABASE_URL=postgresql://user:password@localhost:5432/bookstore
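
Python does not read a `.env` file on its own, so the sketch below assumes the variables have already been placed in the process environment (for example by exporting them in the shell, or by a loader library, which this README does not confirm is used). The helper name `database_settings` is hypothetical; it reads `DATABASE_URL` and splits it into parts using only the standard library.

```python
import os
from urllib.parse import urlsplit


def database_settings(env=os.environ) -> dict:
    """Read DATABASE_URL from env and split it into its components."""
    url = env.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL is not set")
    parts = urlsplit(url)
    return {
        "url": url,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
        "user": parts.username,
    }
```

Failing early with a clear error when `DATABASE_URL` is missing is usually easier to debug than a connection failure later in the load step.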

Testing

Run tests using:

pytest tests/
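
For orientation, a pytest test in this layout might look like the sketch below. The helper `clean_publish_year` is hypothetical and is defined inline only to keep the example self-contained; in the project it would presumably be imported from the transform module, and the real tests may differ.

```python
# Illustrative test module, e.g. tests/test_transform.py (assumed contents).


def clean_publish_year(value):
    """Hypothetical transform helper: return a 4-digit year as int, else None."""
    text = str(value).strip()
    return int(text) if text.isdigit() and len(text) == 4 else None


def test_clean_publish_year_accepts_valid_years():
    assert clean_publish_year("1965") == 1965
    assert clean_publish_year(2001) == 2001


def test_clean_publish_year_rejects_garbage():
    assert clean_publish_year("c. 1900") is None
    assert clean_publish_year(None) is None
```

pytest collects any function whose name starts with `test_`, so a subset can be run with, for example, `pytest tests/test_transform.py -k publish_year -v`.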

Future Improvements

Enhance the recommendation algorithm.
Implement a front-end interface.
Deploy the system for broader use.

License

This project is licensed under the MIT License.
