Trip data cleansing and loading into Hive tables using Python and Spark

tripdata

Load trip data from several CSV files, clean the data, and create Hive external tables. Build a few use cases, process the data, and store the final results in an ADLS location. Connect to the Hive tables from Power BI and create some simple reports.
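The load, clean, and publish flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the column names in `TRIP_COLUMNS`, the cleaning rules, the Parquet storage format, and the helper names `external_table_ddl` and `clean_and_publish` are all assumptions made for the example. The real TLC files use different and more numerous columns.

```python
from typing import Dict

# Hypothetical column subset and Hive types; adjust to the actual CSV headers.
TRIP_COLUMNS: Dict[str, str] = {
    "pickup_datetime": "TIMESTAMP",
    "dropoff_datetime": "TIMESTAMP",
    "passenger_count": "INT",
    "trip_distance": "DOUBLE",
    "total_amount": "DOUBLE",
}


def external_table_ddl(table: str, location: str,
                       columns: Dict[str, str] = TRIP_COLUMNS) -> str:
    """Build a CREATE EXTERNAL TABLE statement over Parquet files at `location`."""
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


def clean_and_publish(spark, csv_path: str, table: str, location: str) -> None:
    """Read CSV files, apply simple cleaning rules, write Parquet, register the table.

    `spark` is assumed to be a SparkSession created with enableHiveSupport().
    """
    # Imported here so the DDL helper above works without PySpark installed.
    from pyspark.sql import functions as F

    raw = spark.read.csv(csv_path, header=True, inferSchema=True)
    cleaned = (
        raw.dropDuplicates()
        # Assumed rules: drop rows without timestamps or with impossible values.
        .dropna(subset=["pickup_datetime", "dropoff_datetime"])
        .filter((F.col("trip_distance") > 0) & (F.col("total_amount") >= 0))
        .select(*[F.col(c).cast(t.lower()).alias(c) for c, t in TRIP_COLUMNS.items()])
    )
    cleaned.write.mode("overwrite").parquet(location)
    spark.sql(external_table_ddl(table, location))


if __name__ == "__main__":
    # Prints the DDL only; no Spark session is needed for this part.
    print(external_table_ddl("tripdata.trips_clean", "/tmp/tripdata/clean"))
```

In an Azure setup, `location` would typically be an ADLS path (for ADLS Gen2, an `abfss://` URI), and the same table can then be queried from Power BI through the Hive or Spark connector.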

Loading and Cleansing Trip Data into Hive Tables Using Python and Spark in an Azure Environment

  1. The data is available at the URL below

Taxi Data: http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml

  2. Git code check-in

cd "local repository path"

git init

git add .

git commit -m "First commit"

git remote add origin "git hub url"

git remote -v

git push -f origin master