clementi_hw2_submission #13

Open
wants to merge 30 commits into
base: main
30 commits
520325c
Updated answers on readme, q1 & q2 working
johnatawnclementawn Oct 3, 2021
6093564
Update readme table formatting
johnatawnclementawn Oct 3, 2021
8c5496a
Current answers, questions with Q3 and Q6/7
johnatawnclementawn Oct 10, 2021
3ba7087
Functioning query, but results don't appear... hmmm
johnatawnclementawn Oct 10, 2021
daa1272
Rejoice! Query3 is running quickly using the spatial indicies!
johnatawnclementawn Oct 12, 2021
69a009f
Functioning q6, may need to be refined
johnatawnclementawn Oct 12, 2021
3e3c221
Needed to re-import parcels shp. Working Query 9
johnatawnclementawn Oct 12, 2021
b04d6a8
Functional q8. Found campus using owner and address
johnatawnclementawn Oct 12, 2021
f0fe0e5
Formatting changes to q3
johnatawnclementawn Oct 12, 2021
ddc301e
Speed improvements made, still not sure about the computations
johnatawnclementawn Oct 12, 2021
747c26b
Accessibility double counting checkpoint
johnatawnclementawn Oct 14, 2021
a74c0a1
Retrofitted q6, having problems with optimization
johnatawnclementawn Oct 14, 2021
caf0588
Hadn't saved
johnatawnclementawn Oct 14, 2021
ca16b07
Retrieve proper trip_headsign from trips dataset
johnatawnclementawn Oct 16, 2021
d3b7da2
Alter bus shapes name, add in bus routes, add notes to q2
johnatawnclementawn Oct 16, 2021
185cabc
Starting out w/ q10
johnatawnclementawn Oct 16, 2021
86d2531
Working distance calc for q10
johnatawnclementawn Oct 16, 2021
d13d4c3
Proper join of bus stops to bus routes
johnatawnclementawn Oct 16, 2021
d72c592
Progress on q10. Just need to format description field now
johnatawnclementawn Oct 16, 2021
98909d6
Format correct, now try checking for duplicate rail stop id's
johnatawnclementawn Oct 16, 2021
382dd6d
satisfice q10
johnatawnclementawn Oct 16, 2021
fcfc095
Q6 & 7 working but very slow. Do not know how to optimize any more
johnatawnclementawn Oct 16, 2021
66d57d6
Changed length of buffer radius for ease of analysis
johnatawnclementawn Oct 16, 2021
0eed192
Add answer description/philosphy
johnatawnclementawn Oct 16, 2021
c9b3a9d
Adding answers to readme
johnatawnclementawn Oct 16, 2021
8e6007e
Formatting changes
johnatawnclementawn Oct 16, 2021
5caad97
Answer for q7 added to readme, added screenshots of q6 and 7 results …
johnatawnclementawn Oct 16, 2021
1d96376
If only we were asking for block groups that completely contain Penn'…
johnatawnclementawn Oct 19, 2021
b62d837
But allas, we are not... here lies the number of block groups contain…
johnatawnclementawn Oct 19, 2021
1522c3a
Add links for readme
johnatawnclementawn Oct 19, 2021
Binary file added A2_Q6_queryResults.PNG
Binary file added A2_Q6_queryTime.PNG
Binary file added A2_Q7_queryTime.PNG
Binary file added A2_Q7_results.PNG
103 changes: 85 additions & 18 deletions README.md
@@ -22,9 +22,14 @@

## Questions

1. Which bus stop has the largest population within 800 meters? As a rough estimation, consider any block group that intersects the buffer as being part of the 800 meter buffer.
[1. Which bus stop has the largest population within 800 meters?](query01.sql)
As a rough estimation, consider any block group that intersects the buffer as being part of the 800 meter buffer.

2. Which bus stop has the smallest population within 800 meters?
|stop_name|Population|the_geom|
|:---:|:---:|:---:|
|"Passyunk Av & 15th St"|50867|"0101000020E6100000B1C398F4F7CA52C0D0807A336AF64340"|

[2. Which bus stop has the smallest population within 800 meters?](query02.sql)

**The queries to #1 & #2 should generate relations with a single row, with the following structure:**

@@ -35,8 +40,12 @@
the_geom geometry(Point, 4326) -- The geometry of the bus stop
)
```
|stop_name|Population|the_geom|
|:---:|:---:|:---:|
|"Charter Rd & Norcom Rd"|2|"0101000020E6100000C896E5EB32C052C0DF3312A1110C4440"|

3. Using the Philadelphia Water Department Stormwater Billing Parcels dataset, pair each parcel with its closest bus stop. The final result should give the parcel address, bus stop name, and distance apart in meters. Order by distance (largest on top).
[3. Using the Philadelphia Water Department Stormwater Billing Parcels dataset, pair each parcel with its closest bus stop. The final result should give the parcel address, bus stop name, and distance apart in meters.](query03.sql)
Order by distance (largest on top).

**Structure:**
```sql
@@ -46,8 +55,17 @@
distance_m double precision -- The distance apart in meters
)
```

4. Using the _shapes.txt_ file from GTFS bus feed, find the **two** routes with the longest trips. In the final query, give the `trip_headsign` that corresponds to the `shape_id` of this route and the length of the trip.
|address|stop_name|distance_m|
|:---:|:---:|:---:|
|"170 SPRING LN"|"Ridge Av & Ivins Rd"|1658.7873935682778|
|"150 SPRING LN"|"Ridge Av & Ivins Rd"|1620.287986054119|
|"130 SPRING LN"|"Ridge Av & Ivins Rd"|1610.9941677070408|
|"190 SPRING LN"|"Ridge Av & Ivins Rd"|1490.0758681771356|
|"630 ST ANDREW RD"|"Germantown Av & Springfield Av"|1418.391081836291|
|...|...|...|
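
One way the nearest-stop pairing could be written is a `LATERAL` join ordered by PostGIS's `<->` distance operator, which can use a GiST index for the ordering. This is a minimal sketch rather than the contents of the submitted query file: the `address` and `the_geom` columns on `pwd_parcels` are assumptions, and both geometries are passed through `ST_Transform(..., 32129)` so that distances come out in meters whichever SRID the tables are stored in (the SRID must be set on both).

```sql
-- Sketch only: column names on pwd_parcels are assumed, not confirmed.
SELECT
    p.address,
    nearest.stop_name,
    nearest.distance_m
FROM pwd_parcels AS p
CROSS JOIN LATERAL (
    -- For each parcel, keep only the single closest bus stop.
    SELECT
        s.stop_name,
        ST_Distance(
            ST_Transform(s.the_geom, 32129),
            ST_Transform(p.the_geom, 32129)
        ) AS distance_m
    FROM septa_bus_stops AS s
    ORDER BY ST_Transform(s.the_geom, 32129) <-> ST_Transform(p.the_geom, 32129)
    LIMIT 1
) AS nearest
ORDER BY nearest.distance_m DESC;
```

The `ORDER BY ... <-> ... LIMIT 1` form is what lets the planner walk a spatial index on the bus stops instead of computing every parcel-to-stop distance.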

[4. Using the _shapes.txt_ file from GTFS bus feed, find the **two** routes with the longest trips.](query04.sql)
In the final query, give the `trip_headsign` that corresponds to the `shape_id` of this route and the length of the trip.

**Structure:**
```sql
@@ -56,15 +74,39 @@
trip_length double precision -- Length of the trip in meters
)
```
|trip_headsign|trip_length|
|:---:|:---:|
|"Bucks County Community College"|46504.13530588818|
|NULL: no trip_headsign for 266697|45331.46753203432|
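
A possible way to get trip lengths from the shape points is to assemble each `shape_id` into a line with `ST_MakeLine`, ordered by `shape_pt_sequence`, and measure it after projecting to EPSG:32129. The sketch below assumes the `septa_bus_shapes` and `septa_bus_trips` tables from this submission; picking one headsign per shape with `DISTINCT ON` is an illustrative choice and not necessarily what the query file does.

```sql
WITH shape_lengths AS (
    -- Build one line per shape from its ordered points, then measure in meters.
    SELECT
        shape_id,
        ST_Length(
            ST_Transform(
                ST_MakeLine(the_geom ORDER BY shape_pt_sequence),
                32129
            )
        ) AS trip_length
    FROM septa_bus_shapes
    GROUP BY shape_id
),
shape_headsigns AS (
    -- Assumption: one headsign per shape is enough for labeling.
    SELECT DISTINCT ON (shape_id)
        shape_id,
        trip_headsign
    FROM septa_bus_trips
    ORDER BY shape_id, trip_headsign
)
SELECT
    h.trip_headsign,
    l.trip_length
FROM shape_lengths AS l
LEFT JOIN shape_headsigns AS h USING (shape_id)
ORDER BY l.trip_length DESC
LIMIT 2;
```

The `LEFT JOIN` keeps shapes that have no matching trip, which is how a missing `trip_headsign` can show up as `NULL` in the result.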

5. Rate neighborhoods by their bus stop accessibility for wheelchairs. Use Azavea's neighborhood dataset from OpenDataPhilly along with an appropriate dataset from the Septa GTFS bus feed. Use the [GTFS documentation](https://gtfs.org/reference/static/) for help. Use some creativity in the metric you devise in rating neighborhoods. Describe your accessibility metric:
[5. Rate neighborhoods by their bus stop accessibility for wheelchairs.](query05.sql)
Use Azavea's neighborhood dataset from OpenDataPhilly along with an appropriate dataset from the Septa GTFS bus feed. Use the [GTFS documentation](https://gtfs.org/reference/static/) for help. Use some creativity in the metric you devise in rating neighborhoods. Describe your accessibility metric:

**Description:**

6. What are the _top five_ neighborhoods according to your accessibility metric?

7. What are the _bottom five_ neighborhoods according to your accessibility metric?

The basic measure of accessibility is the equation A_i = ∑_j O_j * d_ij^(-b) (where X_y denotes that y is a subscript of X).
Here A_i is the accessibility of individual i, O_j is the number of quality opportunities (such as jobs) at location j,
d_ij is the separation between the individual’s starting location and those opportunities (measured in distance, time, or monetary cost),
and b is the degree to which accessibility to an opportunity declines with increasing separation.
Each opportunity count is weighted by d_ij^(-b), and the weighted counts are summed over all locations j.

Parcels (potential dwellings) will be substituted for job opportunities -> (O_j)
A rule-of-thumb used by transportation planners is that people are generally willing to walk up to 0.5 miles to access transit.
Since we are measuring wheelchair accessibility, we will use a much shorter distance and count the opportunities within 500 feet (approximately 152.5 meters) of each wheelchair accessible bus stop -> (d_ij)

This index will be aggregated at the neighborhood level, and paired with a count of the wheelchair accessible stops in each neighborhood.
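
The description above could be sketched in SQL roughly as follows. This is an illustration under several assumptions, not the submitted queries: the `name` column on `neighborhoods_philadelphia` is hypothetical, every table is assumed to expose a `the_geom` column with its SRID set, and the distance decay term is simplified to a hard 152.5 meter cutoff. `wheelchair_boarding` follows the GTFS convention (1 = accessible, 2 = not accessible).

```sql
-- Sketch only: `name` on neighborhoods_philadelphia is an assumed column name.
WITH stops AS (
    SELECT
        stop_id,
        wheelchair_boarding,
        ST_Transform(the_geom, 32129) AS geom_m
    FROM septa_bus_stops
),
stop_opportunities AS (
    -- O_j: parcels within 152.5 m of each wheelchair-accessible stop
    SELECT
        s.stop_id,
        count(*) AS parcel_count
    FROM stops AS s
    JOIN pwd_parcels AS p
        ON ST_DWithin(s.geom_m, ST_Transform(p.the_geom, 32129), 152.5)
    WHERE s.wheelchair_boarding = 1
    GROUP BY s.stop_id
)
SELECT
    n.name AS neighborhood_name,
    coalesce(sum(o.parcel_count), 0) AS accessibility_metric,
    count(*) FILTER (WHERE s.wheelchair_boarding = 1) AS num_bus_stops_accessible,
    count(*) FILTER (WHERE s.wheelchair_boarding = 2) AS num_bus_stops_inaccessible
FROM neighborhoods_philadelphia AS n
LEFT JOIN stops AS s
    ON ST_Contains(ST_Transform(n.the_geom, 32129), s.geom_m)
LEFT JOIN stop_opportunities AS o USING (stop_id)
GROUP BY n.name
ORDER BY accessibility_metric DESC;
```

Note that a parcel lying within range of several accessible stops is counted once per stop in this sketch, so the metric rewards redundancy of service as well as coverage.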

[6. What are the _top five_ neighborhoods according to your accessibility metric?](query06.sql)
[Screenshot of answer - queries take 45 minutes to run](A2_Q6_queryResults.PNG)
|neighborhood_name|accessibility_metric|num_bus_stops_accessible|num_bus_stops_inaccessible|
|:---:|:---:|:---:|:---:|
|COBBS_CREEK|10282|123|10|
|POINT_BREEZE|8943|83|0|
|OLNEY|8960|172|0|
|RICHMOND|8359|116|0|
|WEST_OAK_LANE|7889|124|0|

[7. What are the _bottom five_ neighborhoods according to your accessibility metric?](query07.sql)
[Screenshot of answer - queries take 45 minutes to run](A2_Q7_results.PNG)
**Both #6 and #7 should have the structure:**
```sql
(
@@ -74,26 +116,44 @@
num_bus_stops_inaccessible integer
)
```
|neighborhood_name|accessibility_metric|num_bus_stops_accessible|num_bus_stops_inaccessible|
|:---:|:---:|:---:|:---:|
|"WEST_PARK"|0|28|0|
|"BARTRAM_VILLAGE"|0|0|14|
|"PENNYPACK_PARK"|0|22|0|
|"MECHANICSVILLE"|0|0|0|
|"WEST_TORRESDALE"|2|1|0|

8. With a query, find out how many census block groups Penn's main campus fully contains. Discuss which dataset you chose for defining Penn's campus.
[8. With a query, find out how many census block groups Penn's main campus fully contains.](query08.sql)
Discuss which dataset you chose for defining Penn's campus.

**Structure (should be a single value):**
```sql
(
count_block_groups integer
)
```
|count_block_groups|
|:---:|
|1|

9. With a query involving PWD parcels and census block groups, find the `geo_id` of the block group that contains Meyerson Hall. ST_MakePoint() and functions like that are not allowed.
[9. With a query involving PWD parcels and census block groups, find the `geo_id` of the block group that contains Meyerson Hall.](query09.sql)
ST_MakePoint() and functions like that are not allowed.

**Structure (should be a single value):**
```sql
(
geo_id text
)
```
|geo_id|
|:---:|
|421010369001|

10. You're tasked with giving more contextual information to rail stops to fill the `stop_desc` field in a GTFS feed. Using any of the data sets above, PostGIS functions (e.g., `ST_Distance`, `ST_Azimuth`, etc.), and PostgreSQL string functions, build a description (alias as `stop_desc`) for each stop. Feel free to supplement with other datasets (must provide link to data used so it's reproducible), and other methods of describing the relationships. PostgreSQL's `CASE` statements may be helpful for some operations.
[10. You're tasked with giving more contextual information to rail stops to fill the `stop_desc` field in a GTFS feed.](query10.sql)
Using any of the data sets above, PostGIS functions (e.g., `ST_Distance`, `ST_Azimuth`, etc.), and PostgreSQL string functions, build a description (alias as `stop_desc`) for each stop. Feel free to supplement with other datasets (must provide link to data used so it's reproducible), and other methods of describing the relationships. PostgreSQL's `CASE` statements may be helpful for some operations.
As an example, your `stop_desc` for a station stop may be something like "37 meters NE of 1234 Market St" (that's only an example, feel free to be creative, silly, descriptive, etc.)
**Tip when experimenting:** Use subqueries to limit your query to just a few rows to keep query times faster. Once your query is giving you answers you want, scale it up. E.g., instead of `FROM tablename`, use `FROM (SELECT * FROM tablename limit 10) as t`.

**Structure:**
```sql
@@ -105,7 +165,14 @@
stop_lat double precision
)
```

As an example, your `stop_desc` for a station stop may be something like "37 meters NE of 1234 Market St" (that's only an example, feel free to be creative, silly, descriptive, etc.)

**Tip when experimenting:** Use subqueries to limit your query to just a few rows to keep query times faster. Once your query is giving you answers you want, scale it up. E.g., instead of `FROM tablename`, use `FROM (SELECT * FROM tablename limit 10) as t`.
I decided to list the closest bus stop to each rail stop, along with the bus routes that stop serves. This method is flawed in
how it accounts for multiple bus routes that service the same bus stop.

|stop_id|stop_name|stop_desc|stop_lon|stop_lat|
|:---:|:---:|:---:|:---:|:---:|
|91004|"30th St Lower Level"|"The closest bus stop is 33rd St & Race St and is 84.56 meters away. It is serviced by the City Hall to 76th-City route."|-75.1883333|39.9591667|
|90004|"30th Street Station"|"The closest bus stop is and is 168.65 meters away. It is serviced by the route."|-75.1816667|39.9566667|
|90314|"49th Street"|"The closest bus stop is 49th St & Chester Av - FS and is 46.76 meters away. It is serviced by the 50th-Parkside to Pier 70 route."|-75.2166667|39.9436111|
|90539|"9TH Street Lansdale"|"The closest bus stop is Broad St & Hatfield St - FS and is 259.03 meters away. It is serviced by the Telford to Montgomery Mall route."|-75.2791667|40.25|
|90404|"Airport Terminal A"|"The closest bus stop is and is 142.09 meters away. It is serviced by the route."|-75.2452778|39.8761111|
|...|...|...|...|...|
165 changes: 165 additions & 0 deletions createTbls.sql
@@ -0,0 +1,165 @@
-- CREATE EXTENSION postgis; --

-- PHL Bus stops --
DROP TABLE IF EXISTS septa_bus_stops;
CREATE TABLE septa_bus_stops (
stop_id NUMERIC(7) PRIMARY KEY NOT NULL,
stop_name VARCHAR(65) NOT NULL,
stop_lat FLOAT NOT NULL,
stop_lon FLOAT NOT NULL,
location_type NUMERIC(3),
parent_station NUMERIC(7),
zone_id NUMERIC(3),
wheelchair_boarding NUMERIC(3)
);

-- Import data into bus stop table --
COPY septa_bus_stops(stop_id, stop_name, stop_lat, stop_lon, location_type, parent_station, zone_id, wheelchair_boarding)
FROM 'C:\Users\Public\CloudComputing_data\google_bus\stops.csv'
DELIMITER ','
CSV HEADER;

-- Add geometry field to bus stop data --
ALTER TABLE septa_bus_stops ADD COLUMN the_geom geometry(Point, 4326);
UPDATE septa_bus_stops SET the_geom = ST_SetSRID(ST_MakePoint(stop_lon, stop_lat),4326);


-- PHL Bus Stop times --
DROP TABLE IF EXISTS septa_bus_stopTimes;
CREATE TABLE septa_bus_stopTimes (
trip_id NUMERIC,
-- Use VARCHAR for times because GTFS allows values past 24:00:00 (trips that run after midnight), which the TIME type rejects --
arrival_time VARCHAR,
departure_time VARCHAR,
stop_id NUMERIC,
stop_sequence NUMERIC
);

-- Import data into bus stop times table --
COPY septa_bus_stopTimes
FROM 'C:\Users\Public\CloudComputing_data\google_bus\stop_times.txt'
DELIMITER ','
CSV HEADER;
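
-- Sketch (not part of the original import): GTFS allows times past 24:00:00 for
-- trips that run after midnight. PostgreSQL's INTERVAL type accepts such values,
-- so the VARCHAR columns can be cast whenever a query needs to do arithmetic on
-- them. The NULLIF/trim guard against blank or padded strings is an assumption
-- about how the feed is formatted.
SELECT
    trip_id,
    stop_sequence,
    NULLIF(trim(arrival_time), '')::interval AS arrival_offset,
    NULLIF(trim(departure_time), '')::interval AS departure_offset
FROM septa_bus_stopTimes
LIMIT 10;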


-- PHL Bus Trips --
DROP TABLE IF EXISTS septa_bus_trips;
CREATE TABLE septa_bus_trips (
route_id NUMERIC,
service_id NUMERIC,
trip_id NUMERIC PRIMARY KEY NOT NULL,
trip_headsign VARCHAR(65) NOT NULL,
block_id NUMERIC,
direction_id NUMERIC,
shape_id NUMERIC
);

-- Import data into bus trips table --
COPY septa_bus_trips
FROM 'C:\Users\Public\CloudComputing_data\google_bus\trips.csv'
DELIMITER ','
CSV HEADER;


-- PHL Bus Routes --
DROP TABLE IF EXISTS septa_bus_routes;
CREATE TABLE septa_bus_routes (
route_id VARCHAR,
route_short_name VARCHAR,
route_long_name VARCHAR,
route_type NUMERIC,
route_color VARCHAR,
route_text_color VARCHAR,
route_url VARCHAR
);

-- Import data into bus routes table --
COPY septa_bus_routes
FROM 'C:\Users\Public\CloudComputing_data\google_bus\routes.csv'
DELIMITER ','
CSV HEADER;



-- Septa Bus shapes --
DROP TABLE IF EXISTS septa_bus_shapes;
CREATE TABLE septa_bus_shapes (
shape_id NUMERIC(7) NOT NULL,
shape_pt_lat FLOAT NOT NULL,
shape_pt_lon FLOAT NOT NULL,
shape_pt_sequence NUMERIC(5)
);

COPY septa_bus_shapes(shape_id, shape_pt_lat, shape_pt_lon,shape_pt_sequence)
FROM 'C:\Users\Public\CloudComputing_data\google_bus\shapes.csv'
DELIMITER ','
CSV HEADER;

-- Add geometry field to bus shapes data --
ALTER TABLE septa_bus_shapes ADD COLUMN the_geom geometry(Point, 4326);
UPDATE septa_bus_shapes SET the_geom = ST_SetSRID(ST_MakePoint(shape_pt_lon, shape_pt_lat),4326);



-- SEPTA rail stops --
DROP TABLE IF EXISTS septa_rail_stops;
CREATE TABLE septa_rail_stops(
stop_id numeric,
stop_name text,
stop_desc text,
stop_lat numeric,
stop_lon numeric,
zone_id text,
stop_url text
);

COPY septa_rail_stops
FROM 'C:\Users\Public\CloudComputing_data\google_rail\stops.csv'
DELIMITER ','
CSV HEADER;

-- Add geometry field to rail stops data --
ALTER TABLE septa_rail_stops ADD COLUMN the_geom geometry(Point, 4326);
UPDATE septa_rail_stops SET the_geom = ST_SetSRID(ST_MakePoint(stop_lon, stop_lat),4326);



-- PHL Census Block Group Population join w/ census_block_groups_2010 --
DROP TABLE IF EXISTS population;
CREATE TABLE population (
id VARCHAR(23) PRIMARY KEY NOT NULL,
name VARCHAR(75) NOT NULL,
total NUMERIC(7) NOT NULL
);

COPY population(id, name, total)
FROM 'C:\Users\Public\CloudComputing_data\PHL_2010_blockGroupPopulation\phl_2010_blockGroup_population.csv'
DELIMITER ','
CSV HEADER;


-- Edit block group shp geom column name & set its crs --
-- ALTER TABLE census_block_groups RENAME COLUMN geom TO the_geom;
--UPDATE census_block_groups SET the_geom = ST_Transform(ST_SetSRID(the_geom, 4326),32129);

-- Edit parcels shp geom column name & set its crs --
-- ALTER TABLE pwd_parcels RENAME COLUMN geom TO the_geom;
-- UPDATE pwd_parcels SET the_geom = ST_Transform(ST_SetSRID(the_geom, 4326),32129);


--ALTER TABLE neighborhoods_philadelphia RENAME COLUMN geom to the_geom;
--UPDATE neighborhoods_philadelphia SET the_geom = ST_Transform(ST_SetSRID(the_geom, 2272),32129);




-- Create spatial indices --
-- DROP INDEX IF EXISTS septa_bus_stops_the_geom_idx;
-- create index septa_bus_stops_the_geom_idx
-- on septa_bus_stops
-- using GiST(st_transform(the_geom, 32129));

-- DROP INDEX IF EXISTS pwd_parcels_the_geom_idx;
-- CREATE index pwd_parcels_the_geom_idx
-- on pwd_parcels
-- using GiST(the_geom);
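
-- Sketch (an assumption about usage, not part of the original script): an
-- expression index such as GiST(ST_Transform(the_geom, 32129)) is only
-- considered when the query repeats the same expression. Assuming that index
-- has been created and census_block_groups has a the_geom column, EXPLAIN can
-- be used to check whether the planner picks it up.
EXPLAIN
SELECT s.stop_name
FROM septa_bus_stops AS s
JOIN census_block_groups AS bg
    ON ST_DWithin(
        ST_Transform(s.the_geom, 32129),
        ST_Transform(bg.the_geom, 32129),
        800
    );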
18 changes: 7 additions & 11 deletions query01.sql
@@ -3,17 +3,15 @@
estimation, consider any block group that intersects the buffer as being part
of the 800 meter buffer.
*/


create index septa_bus_stops__the_geom__32129__idx
on septa_bus_stops
using GiST (ST_Transform(the_geom, 32129));
-- Answer:
-- stop_name Population the_geom --
-- "Passyunk Av & 15th St" 50867 "0101000020E6100000B1C398F4F7CA52C0D0807A336AF64340" --


with septa_bus_stop_block_groups as (
select
s.stop_id,
'1500000US' || bg.geoid10 as geo_id
'1500000US' || bg.geoid10 as geo_id -- prepend the Census summary-level prefix (150 = block group) so geoid10 matches population.id
from septa_bus_stops as s
join census_block_groups as bg
on ST_DWithin(
@@ -22,21 +20,19 @@ with septa_bus_stop_block_groups as (
800
)
),

septa_bus_stop_surrounding_population as (
select
stop_id,
sum(population) as estimated_pop_800m
sum(total) as estimated_pop_800m
from septa_bus_stop_block_groups as s
join census_population as p using (geo_id)
join population as p on s.geo_id = p.id
group by stop_id
)

select
stop_name,
estimated_pop_800m,
the_geom
from septa_bus_stop_surrounding_population
join septa_bus_stops using (stop_id)
order by estimated_pop_800m desc
limit 1
limit 1