Trip data cleansing and loading into Hive tables using Python and Spark
Load trip data from several CSV files, clean the data, and create Hive external tables. Define a few use cases, process the data, and store the final results in an ADLS location. Connect to the Hive tables from Power BI and create some simple reports.
- Data is available at the URL below
Taxi Data: http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
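
The clean-then-create-external-table flow described above could be sketched roughly as follows. This is only an illustration in plain Python, not the project's actual code: the column names (trip_distance, fare_amount), the table name trips_clean, the cleaning rule, and the placeholder ADLS path are all assumptions. The generated statement is ordinary HiveQL that could be passed to spark.sql(...) on a Hive-enabled Spark session.

```python
def is_valid_trip(row):
    """Return True for rows with a positive distance and a non-negative fare.

    `row` is a dict of column name -> raw string value, as produced by
    csv.DictReader. The column names are assumed, not taken from the project.
    """
    try:
        return float(row["trip_distance"]) > 0 and float(row["fare_amount"]) >= 0
    except (KeyError, ValueError, TypeError):
        # Missing column, non-numeric text, or None: treat the row as dirty.
        return False


def build_external_table_ddl(table, columns, location, delimiter=","):
    """Build a HiveQL CREATE EXTERNAL TABLE statement for delimited text files.

    `columns` is a list of (name, hive_type) pairs; `location` is the storage
    directory that already holds (or will hold) the cleaned files.
    """
    if not columns:
        raise ValueError("at least one column is required")
    cols = ",\n".join(f"  `{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{cols}\n"
        ")\n"
        "ROW FORMAT DELIMITED\n"
        f"FIELDS TERMINATED BY '{delimiter}'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


# Hypothetical schema and placeholder ADLS path; replace with real values.
TRIP_COLUMNS = [("trip_distance", "DOUBLE"), ("fare_amount", "DOUBLE")]
ddl = build_external_table_ddl(
    "trips_clean",
    TRIP_COLUMNS,
    "abfss://<container>@<account>.dfs.core.windows.net/trips/clean",
)
print(ddl)
```

Because the table is external, dropping it in Hive removes only the metadata and leaves the files at the storage location in place.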
- Git code check-in
cd "local repository path"
git init
git add .
git commit -m "First commit"
git remote add origin "git hub url"
git remote -v
git push origin master -f