Title | Date |
---|---|
Data Cleaning Tasks | 01.10.2020 |
- Adjust Values: Open Refine and load it in your browser. You should see `127.0.0.1:XXXX` in your address bar. Add `/preferences` to this address and adjust the `value` limit to 10,000. This means that we can perform operations on larger datasets.
- Get Data: Go to the LIS 545 GitHub repository and find the `Data` tab. Download the file under `data` titled `Building_Permits.csv` (link). This is a dataset from the City of Seattle open data portal (read more about it at BUILDING PERMITS: CURRENT).
- Load Data: Upload this data to Refine by selecting the file from your desktop (or whatever directory you downloaded it to). Be sure to select `commas (.csv)` as the upload option.
- Replace all missing values with `N/A`
- For each column, trim leading and trailing whitespace
- Convert values written in all UPPERCASE to Name Case
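
The three cleanup rules above can be made concrete with a short sketch. This is plain Python run outside Refine, not the Refine menu steps themselves. The function name `clean_cell` and the sample rows are hypothetical, and the real contents of `Building_Permits.csv` may differ.

```python
def clean_cell(value):
    """Apply the three cleanup rules to a single cell value."""
    # Trim leading and trailing whitespace first, so a cell holding only
    # spaces is treated as missing.
    text = (value or "").strip()
    # Replace missing (None or empty) values with the placeholder N/A.
    if text == "":
        return "N/A"
    # Convert all-UPPERCASE values to Name Case; mixed-case values are kept.
    if text.isupper():
        return text.title()
    return text


# Hypothetical sample rows, used only to illustrate the rules.
rows = [
    {"Category": "  COMMERCIAL ", "Status": "Permit Issued"},
    {"Category": "", "Status": " CLOSED"},
]

cleaned = [{col: clean_cell(val) for col, val in row.items()} for row in rows]
print(cleaned)
```

Trimming before the missing-value check is a deliberate ordering choice: it keeps whitespace-only cells from slipping through as if they held data. Note that this sketch would also convert uppercase acronyms or codes, which may not be what you want for every column.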
- The `Category` and the `Status` columns have a series of codes. It would be helpful to know how many codes exist in this dataset. How would we find out?
- The `Value` column could be summarized as a range. Create a new column directly to the left of `Value` and title it `Value Range`. Now, cluster all values into High (> $1 million), Medium ($500,000 - $999,999), and Low ($1 - $499,999). How many cells did this affect?
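
As a rough illustration of the two tasks above, the following Python sketch counts the distinct codes in `Category` and `Status` and derives a `Value Range` column placed directly before `Value`. It runs outside Refine and rests on several assumptions: the sample rows and code names are hypothetical, values may carry `$` signs or commas, exactly $1 million is treated as High, and values under $1 are left unclassified as `N/A`.

```python
from collections import Counter


def value_range(raw):
    """Map a dollar amount to High, Medium, or Low; N/A if unclassifiable."""
    try:
        # Assumed formatting: strip "$" and thousands separators before parsing.
        amount = float(str(raw).replace("$", "").replace(",", ""))
    except ValueError:
        return "N/A"
    if amount >= 1_000_000:  # assumption: exactly $1 million counts as High
        return "High"
    if amount >= 500_000:
        return "Medium"
    if amount >= 1:
        return "Low"
    return "N/A"  # assumption: values under $1 stay unclassified


# Hypothetical sample rows; the real codes in the dataset may differ.
rows = [
    {"Category": "COMMERCIAL", "Status": "Permit Issued", "Value": "1500000"},
    {"Category": "SINGLE FAMILY / DUPLEX", "Status": "Permit Closed", "Value": "$750,000"},
    {"Category": "COMMERCIAL", "Status": "Permit Issued", "Value": "20000"},
    {"Category": "MULTIFAMILY", "Status": "Permit Issued", "Value": "0"},
]

# Count how many distinct codes each column holds and how often each occurs.
for column in ("Category", "Status"):
    counts = Counter(row[column] for row in rows)
    print(column, len(counts), dict(counts))

# Rebuild each row so that Value Range sits directly to the left of Value.
ranged = []
for row in rows:
    new_row = {}
    for column, cell in row.items():
        if column == "Value":
            new_row["Value Range"] = value_range(cell)
        new_row[column] = cell
    ranged.append(new_row)

# Number of cells that received a High, Medium, or Low label.
affected = sum(1 for row in ranged if row["Value Range"] != "N/A")
print(affected)
```

The count of distinct values per column is the same information a text facet in Refine would show, so the sketch can serve as a cross-check on what you see there.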
- The `Application Date` and the `Issue Date` tell us the lag time in the city's response to applications for building permits. Use the values in these two columns to create a new third column titled `Permit Issue Period`. In this column, calculate the time between Application Date and Issue Date. You should provide the value in days. What is the longest period you observe in this dataset?
- The `Location` field is static. We want these values to be linked to Google Maps so that a user coming to the dataset can simply click to see a housing permit site. Use the text string `"https://www.google.com/maps/place/"` (the quotations are important) to create this link.
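
The last two tasks can be sketched in plain Python as follows, outside Refine. The date format, the sample rows, and the helper names `permit_issue_period` and `maps_link` are all assumptions made for illustration; check how dates and locations are actually written in the file before relying on this. The percent-encoding step is an addition not described in the task, included so that addresses containing spaces still produce a valid URL.

```python
from datetime import datetime
from urllib.parse import quote

DATE_FORMAT = "%Y-%m-%d"  # assumption: adjust to match the dates in the file
MAPS_PREFIX = "https://www.google.com/maps/place/"


def permit_issue_period(application_date, issue_date):
    """Return the days between application and issue, or None if unusable."""
    try:
        applied = datetime.strptime(application_date, DATE_FORMAT)
        issued = datetime.strptime(issue_date, DATE_FORMAT)
    except (TypeError, ValueError):
        # Missing or malformed dates cannot produce a period.
        return None
    return (issued - applied).days


def maps_link(location):
    """Prefix the location with the Maps URL, encoding spaces but not commas."""
    return MAPS_PREFIX + quote(location, safe=",")


# Hypothetical sample rows used only to exercise the two helpers.
rows = [
    {"Application Date": "2019-01-15", "Issue Date": "2019-03-01",
     "Location": "47.6062,-122.3321"},
    {"Application Date": "2019-06-01", "Issue Date": "2019-06-11",
     "Location": "700 5th Ave, Seattle"},
]

for row in rows:
    row["Permit Issue Period"] = permit_issue_period(
        row["Application Date"], row["Issue Date"]
    )
    row["Location Link"] = maps_link(row["Location"])

# Longest lag, ignoring rows where the period could not be computed.
periods = [r["Permit Issue Period"] for r in rows if r["Permit Issue Period"] is not None]
longest = max(periods)
print(longest)
```

Returning `None` for unusable dates, rather than zero, keeps rows with missing dates from distorting the longest or shortest period.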