Load and clean taxi trip data using Python and Spark in an Azure environment

bssulfikkar/tripdata

Trip data cleansing and loading into Hive tables using Python and Spark

tripdata

Load trip data from several CSV files, clean it, and create Hive external tables. Build a few use cases on the cleaned data and store the final results in an ADLS location. Then connect Power BI to the Hive tables and create some simple reports.
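The flow described above could look roughly like the sketch below. It is not the code from this repository: the function names, the column names `trip_distance` and `passenger_count`, the cleaning rules, the Parquet output format, and the default table name `trips_clean` are all assumptions made for illustration. PySpark is imported inside the functions that need it, so the DDL helper can be tried without a Spark installation.

```python
"""Sketch of the load -> clean -> external-table flow (all names are assumptions)."""


def external_table_ddl(table, columns, location):
    """Build a Hive CREATE EXTERNAL TABLE statement over Parquet files.

    columns: list of (name, hive_type) pairs; location: storage path.
    """
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"STORED AS PARQUET\nLOCATION '{location}'"
    )


def clean_trips(df):
    """Drop duplicate rows and rows with missing or non-positive key fields."""
    # Imported lazily; requires pyspark. Column names are assumed, not from the repo.
    from pyspark.sql import functions as F

    return (
        df.dropDuplicates()
        .dropna(subset=["trip_distance", "passenger_count"])
        .filter((F.col("trip_distance") > 0) & (F.col("passenger_count") > 0))
    )


def run(csv_glob, output_path, table="trips_clean"):
    """Read CSVs, clean them, write Parquet, and register a Hive external table."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("tripdata")
        .enableHiveSupport()
        .getOrCreate()
    )
    raw = spark.read.csv(csv_glob, header=True, inferSchema=True)
    cleaned = clean_trips(raw)
    cleaned.write.mode("overwrite").parquet(output_path)

    # Reuse the inferred Spark schema for the Hive column definitions.
    columns = [(f.name, f.dataType.simpleString()) for f in cleaned.schema.fields]
    spark.sql(external_table_ddl(table, columns, output_path))
    return cleaned
```

On a cluster, `run` would be called with the input CSV location and an ADLS output path of your own, for example a path of the form `abfss://<container>@<account>.dfs.core.windows.net/...`. Because the table is external, dropping it in Hive leaves the cleaned files in place for Power BI or other readers.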

Loading and cleansing trip data into Hive tables using Python and Spark in an Azure environment

  1. Data is available at the URL below

Taxi Data: http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml

  2. Git code check-in

cd "local repository path"

git init

git add .

git commit -m "First commit"

git remote add origin "git hub url"

git remote -v

git push origin master -f
