wine-prediction

Wine-Prediction predicts the wine label from the columns below; every column except label is used as an input feature, and label is the classification target:

  • fixed acidity
  • volatile acidity
  • citric acid
  • residual sugar
  • chlorides
  • free sulfur dioxide
  • total sulfur dioxide
  • density
  • pH
  • sulphates
  • alcohol
  • quality
  • label
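A minimal sketch of how one row of the dataset splits into model inputs and target, assuming every column except `label` is an input feature. The CSV layout (a header row whose names match the list exactly, comma delimiter by default) is an assumption, and `split_row` / `load_dataset` are hypothetical helper names, not taken from the project.

```python
from __future__ import annotations

import csv
from typing import Iterator

# Input columns, in the order listed in this README (assumed to match the CSV header)
FEATURE_COLUMNS = [
    "fixed acidity",
    "volatile acidity",
    "citric acid",
    "residual sugar",
    "chlorides",
    "free sulfur dioxide",
    "total sulfur dioxide",
    "density",
    "pH",
    "sulphates",
    "alcohol",
    "quality",
]
TARGET_COLUMN = "label"  # the value the classifier predicts


def split_row(row: dict[str, str]) -> tuple[list[float], str]:
    """Convert one CSV row into (numeric feature vector, target label)."""
    features = [float(row[col]) for col in FEATURE_COLUMNS]
    return features, row[TARGET_COLUMN]


def load_dataset(path: str, delimiter: str = ",") -> Iterator[tuple[list[float], str]]:
    """Yield (features, label) pairs from a CSV file with a header row (hypothetical helper)."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f, delimiter=delimiter):
            yield split_row(row)
```

Keeping the column list in one constant means the training code, the API and the validation step can all agree on the same feature order.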

This application demonstrates a machine learning pipeline built with widely used technologies.

Dataset

The dataset is taken from the UCI Machine Learning Repository.

Architecture Diagram

[Image: architecture diagram]

Technologies Used

  • Flask
  • Python
  • Streamlit
  • PostgreSQL
  • Airflow 2.2
  • Grafana

Steps to Run the Application

  1. Install Dependencies
  2. Run API
  3. Run Airflow
  4. Run Frontend

Install Dependencies

  1. Create a virtual environment with python3 from the root of the project:
    python3 -m venv wine_prediction
  2. Activate the virtual environment (stay in the project root so that requirements.txt can be found):
    source wine_prediction/bin/activate
  3. Install the dependencies:
    pip install -r requirements.txt
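To confirm the virtual environment is active before installing or running anything, a small standard-library check like the one below can help. It relies on the fact that inside a `venv`, `sys.prefix` points at the environment while `sys.base_prefix` still points at the base interpreter. The module names passed to `missing_modules` are assumptions about what `requirements.txt` installs.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv (sys.prefix differs from the base interpreter's prefix)."""
    return sys.prefix != sys.base_prefix


def missing_modules(names: list) -> list:
    """Return the top-level module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    # Assumed import names for the project's main dependencies
    print("missing:", missing_modules(["flask", "streamlit", "airflow", "psycopg2"]))
```

Run it with the environment's `python` after activating the virtual environment; an empty "missing" list means the listed packages are importable.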

Run API

  1. Create a PostgreSQL database and add a .env file at api/.env, using the following template:
    DATABASE_NAME = YOUR_DATABASE
    DATABASE_PORT = 5432
    USER_NAME = YOUR_DATABASE_USER
    USER_PASSWORD = YOUR_DATABASE_USER_PASSWORD
  2. Navigate to the root of the project
  3. Set the environment variables:
    export FLASK_APP=app:create_app
    export APP_SETTINGS="api.config.DevelopmentConfig"
  4. Run Flask
    flask run
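The values in `api/.env` are typically combined into a PostgreSQL connection URL. A standard-library sketch of that step follows; `parse_env` is a simplified stand-in for a library such as python-dotenv, and the host (`localhost`) is an assumption because the template does not define one. How the project's `api.config` actually reads these values is not shown in this README.

```python
from __future__ import annotations

from urllib.parse import quote_plus


def parse_env(text: str) -> dict[str, str]:
    """Minimal KEY = VALUE parser: skips blank lines and # comments, strips optional quotes."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def database_url(env: dict[str, str], host: str = "localhost") -> str:
    """PostgreSQL URL built from the api/.env keys; `host` is an ASSUMPTION (not in the template)."""
    user = quote_plus(env["USER_NAME"])
    password = quote_plus(env["USER_PASSWORD"])  # escape characters such as @ or /
    return f"postgresql://{user}:{password}@{host}:{env['DATABASE_PORT']}/{env['DATABASE_NAME']}"
```

From the project root, `database_url(parse_env(open("api/.env").read()))` yields a URL of the form `postgresql://USER:PASSWORD@localhost:5432/DATABASE`.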

Run Frontend

  1. Navigate to the /frontend directory of the application
  2. Run the Streamlit application:
    streamlit run run.py
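The Streamlit frontend talks to the Flask API started in the previous section. A sketch of such a call using only the standard library follows; `flask run` serves on http://127.0.0.1:5000 by default, but the `/predict` route and the JSON payload shape are hypothetical, since this README does not document the API's endpoints.

```python
import json
import urllib.request

# `flask run` listens on 127.0.0.1:5000 by default; the /predict route is HYPOTHETICAL
API_URL = "http://127.0.0.1:5000/predict"


def build_prediction_request(sample: dict, url: str = API_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying one wine sample (feature name -> value)."""
    body = json.dumps(sample).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(sample: dict, url: str = API_URL, timeout: float = 5.0) -> dict:
    """Send the sample and decode the response, assumed to be a JSON object (API must be running)."""
    with urllib.request.urlopen(build_prediction_request(sample, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

In a Streamlit page, the sample dictionary could be assembled from `st.number_input` widgets and the result displayed with `st.write`.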

Run Airflow

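  1. Create a database and a database user with all privileges on it; Airflow will use this database as its metadata database.

    Run the following in a psql shell as a superuser (for example, postgres):

    CREATE DATABASE wine_airflow;
    CREATE USER airflow_user WITH ENCRYPTED PASSWORD 'airflow_pass';
    GRANT ALL PRIVILEGES ON DATABASE wine_airflow TO airflow_user;
    -- PostgreSQL 15+ also requires privileges on the public schema:
    \c wine_airflow
    GRANT ALL ON SCHEMA public TO airflow_user;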
    
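  2. Go to the root directory of the project and set AIRFLOW_HOME and the database connection, so that the following commands use PostgreSQL instead of Airflow's default SQLite database:

    export AIRFLOW_HOME=$PWD/airflow
    export AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow_user:airflow_pass@localhost/wine_airflow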
  3. Initialize the database:

    airflow db init
  4. Create an admin user (username: admin, password: admin) for the Airflow web application, which will run on http://localhost:8080:

    airflow users create --username admin --firstname admin --lastname admin --role Admin --email YOUR_EMAIL --password admin
  5. Start the Airflow scheduler:

    # Point Airflow at the project folder and the PostgreSQL metadata database (repeat in every new terminal, from the project root)
    export AIRFLOW_HOME=$PWD/airflow
    export AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow_user:airflow_pass@localhost/wine_airflow

    airflow scheduler
  6. Start the web server:

    # Point Airflow at the project folder and the PostgreSQL metadata database (repeat in every new terminal, from the project root)
    export AIRFLOW_HOME=$PWD/airflow
    export AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow_user:airflow_pass@localhost/wine_airflow

    airflow webserver
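Airflow reads any configuration option from an environment variable named `AIRFLOW__{SECTION}__{KEY}`, which is why `sql_alchemy_conn` in the `[core]` section becomes `AIRFLOW__CORE__SQL_ALCHEMY_CONN`. The sketch below builds the environment variables used in these steps from their parts and URL-encodes the credentials, which matters if a real password contains characters such as `@` or `/`. The helper names are hypothetical. In Airflow 2.3 and later the option moved to the `[database]` section (`AIRFLOW__DATABASE__SQL_ALCHEMY_CONN`).

```python
import os
import shlex
from urllib.parse import quote_plus


def airflow_env_var(section: str, key: str) -> str:
    """Name of the environment variable that overrides [section] key in airflow.cfg."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def postgres_conn_uri(user: str, password: str, database: str,
                      host: str = "localhost", port: int = 5432) -> str:
    """SQLAlchemy URL for PostgreSQL via psycopg2, with credentials URL-encoded."""
    return (f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}:{port}/{database}")


def airflow_env(project_root: str) -> dict:
    """Environment for the scheduler and webserver, using the values from the psql step."""
    return {
        "AIRFLOW_HOME": os.path.join(project_root, "airflow"),
        airflow_env_var("core", "sql_alchemy_conn"): postgres_conn_uri(
            "airflow_user", "airflow_pass", "wine_airflow"),
    }


if __name__ == "__main__":
    # Print shell export lines, assuming the current directory is the project root
    for name, value in airflow_env(os.getcwd()).items():
        print(f"export {name}={shlex.quote(value)}")
```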

Once the webserver is running, the Airflow dashboard is available at http://localhost:8080.

The data ingestion pipeline defined in Airflow is structured as follows:

[Image: Airflow data ingestion pipeline]

When data validation fails, Airflow sends an email to the responsible team member; this alert is configured by adding the following Airflow Variables. To test this scenario, enable the mimic_validation_fail Airflow Variable.

[Image: Airflow Variables used for validation-failure alerts]
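This README does not spell out which checks the validation task performs. As an illustration only, a validation step could check that every expected column is present, numeric and in a plausible range, with a flag that forces a failure the way the `mimic_validation_fail` variable does. Inside a DAG, that flag could be read with `Variable.get("mimic_validation_fail", default_var="false")` from `airflow.models`; the sketch below takes it as a plain argument so it runs without Airflow. The column list, the range rules and the function names are assumptions.

```python
from __future__ import annotations

import math

# Columns expected in each ingested row (assumed to match the dataset columns in this README)
REQUIRED_COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality",
]


def validate_rows(rows: list[dict], mimic_fail: bool = False) -> list[str]:
    """Return human-readable errors; an empty list means the batch passed validation."""
    if mimic_fail:
        # Force a failure, mirroring what the mimic_validation_fail variable is for
        return ["validation failure forced by mimic_validation_fail"]
    errors = []
    for i, row in enumerate(rows):
        missing = [col for col in REQUIRED_COLUMNS if col not in row]
        if missing:
            errors.append(f"row {i}: missing columns {missing}")
            continue
        for col in REQUIRED_COLUMNS:
            try:
                value = float(row[col])
            except (TypeError, ValueError):
                errors.append(f"row {i}: {col!r} is not numeric ({row[col]!r})")
                continue
            # Reject NaN and negative values; pH is also capped at 14 here (assumption)
            if math.isnan(value) or value < 0 or (col == "pH" and value > 14):
                errors.append(f"row {i}: {col!r} out of range ({value})")
    return errors


def run_validation(rows: list[dict], mimic_fail: bool = False) -> None:
    """Raise when validation finds problems, so Airflow would mark the task as failed."""
    errors = validate_rows(rows, mimic_fail)
    if errors:
        raise ValueError("; ".join(errors))
```

Raising an exception is what makes Airflow mark a task as failed, which its failure handling (such as `email_on_failure`) reacts to; how this project wires the alert email to its Variables is not shown here.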

Data Drift Report

A data drift report can be generated by running the Jupyter notebook at /notebooks/data_drift_report.ipynb. When the data has drifted, the report looks like this:

[Image: example data drift report]
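The notebook's drift method is not described in this README. One common per-feature check is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical cumulative distributions of a reference sample and a current sample. The standard-library sketch below computes it and flags columns above a threshold; the 0.2 threshold and the function names are arbitrary assumptions, and a production check would normally also use a p-value (for example from `scipy.stats.ks_2samp`).

```python
from __future__ import annotations


def ks_statistic(reference: list[float], current: list[float]) -> float:
    """Two-sample KS statistic: max |F_ref(x) - F_cur(x)| over all observed x."""
    a, b = sorted(reference), sorted(current)
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d = 0.0
    # Once one sample is exhausted its CDF is 1, so the gap can only shrink; stopping is safe
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance past every value <= x in both samples (handles ties correctly)
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def drifted_columns(reference: dict[str, list[float]], current: dict[str, list[float]],
                    threshold: float = 0.2) -> dict[str, float]:
    """Return {column: KS statistic} for shared columns whose statistic exceeds the threshold."""
    result = {}
    for col in reference.keys() & current.keys():
        stat = ks_statistic(reference[col], current[col])
        if stat > threshold:
            result[col] = stat
    return result
```

The statistic ranges from 0 (identical empirical distributions) to 1 (no overlap at all), so a fixed threshold gives a simple, model-independent first signal of drift per column.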