Commit
added analysis text
tatasky09 committed Aug 14, 2024
1 parent 57be87d commit d3fda95
Showing 2 changed files with 9 additions and 6 deletions.
2 changes: 1 addition & 1 deletion questions.md
@@ -12,7 +12,7 @@ We are interested broadly in identifying patterns in transit use between ORCA ca

1) Are stops that are central in transit ridership networks shared across all card demographics? How do the networks vary in structure between card demographics? Is the structure of these networks reflected by the geographic layout of the transportation network?

2) Can we improve our prediction for people's home locations so we can provide better services?
2) How do transit users behave on the temporal scale? What types of users emerge on the temporal scale? Do certain types of users need better service quality?

3) Do hotspots of transfer have adequate shelters? Do low-income riders suffer more from not having adequate shelters?

13 changes: 8 additions & 5 deletions user_classification.md
@@ -21,16 +21,19 @@ The primary tools used are Python and PostgreSQL. The key packages leveraged are

**Processes**

We initially implemented the code using SQLAlchemy. Given the large volume of data, we refined the raw SQL generated by SQLAlchemy to optimize performance and generate results directly in SQL.

Our heuristic approach to classification began by analyzing three months of data, filtering out the least frequent users, and identifying key groups such as typical peak-time commuters, noon/afternoon commuters, and weekend users. Further analysis of the remaining data uncovered additional patterns, which led to new categories: pre-dawn and afternoon commuters (often night shift workers), short round-trip riders, one-way commuters, and more.
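
As an illustration only, a rule-based pass of this kind might look like the Python sketch below. The function name `classify_user`, the trip-count threshold, the hour windows, and the 60% cutoffs are all hypothetical; the project's actual rules are implemented in SQL and may differ.

```python
from collections import Counter
from datetime import datetime

# Hypothetical thresholds and time windows; the project's actual rules may differ.
MIN_TRIPS = 10                                   # fewer boardings counts as infrequent
PEAK_HOURS = set(range(6, 9)) | set(range(16, 19))
MIDDAY_HOURS = set(range(11, 16))


def classify_user(boardings, min_trips=MIN_TRIPS):
    """Assign one coarse temporal label to a user's list of boarding datetimes."""
    if len(boardings) < min_trips:
        return "infrequent"
    # Saturday and Sunday have weekday() values 5 and 6.
    weekend = sum(1 for t in boardings if t.weekday() >= 5)
    if weekend / len(boardings) > 0.5:
        return "weekend"
    # Count weekday boardings per hour of day.
    hours = Counter(t.hour for t in boardings if t.weekday() < 5)
    weekday_total = sum(hours.values())
    peak = sum(n for h, n in hours.items() if h in PEAK_HOURS)
    midday = sum(n for h, n in hours.items() if h in MIDDAY_HOURS)
    if peak / weekday_total >= 0.6:
        return "peak"
    if midday / weekday_total >= 0.6:
        return "noon/afternoon"
    return "unclassified"


# Made-up example: ten weekday boardings at 07:00 and 17:00 (5-9 Aug 2024 is Mon-Fri).
commuter = [datetime(2024, 8, day, hour) for day in range(5, 10) for hour in (7, 17)]
print(classify_user(commuter))  # peak
```

Users that fall through to `"unclassified"` correspond to the "remaining data" that would be inspected further for additional patterns.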



**Analyses**

What approaches did you try that didn’t work?
What analyses did you end up sticking with?
Our analysis of the temporal categories suggests that mean trip duration is not equal across categories. Specifically, pre-dawn and afternoon riders (mostly night shift workers) have significantly longer commute trips than typical peak commuters.
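
The text does not say which statistical test was used. One common way to compare means across several groups is a one-way ANOVA, and the sketch below computes its F statistic with the standard library only. The function name `one_way_anova_f` and the sample durations are made up for illustration; a p-value would additionally require the F distribution, for example via `scipy.stats.f_oneway`.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric samples."""
    k = len(groups)                              # number of groups
    n = sum(len(g) for g in groups)              # total number of observations
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up trip durations in minutes for three hypothetical categories.
durations = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(one_way_anova_f(durations))  # approximately 13
```

A large F value indicates that the between-category differences are large relative to the spread within each category.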

Additionally, our findings reveal that LIFT card users are more often associated with peak and noon/afternoon commutes, whereas disability card users have a higher proportion of dawn and pre-dawn trips, as well as shorter round trips.
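
A comparison like this comes down to the share of each temporal category within each card type. The sketch below shows one way to tabulate those shares, assuming one `(card_type, category)` pair per user; the function name `category_shares` and the records are hypothetical and do not reproduce the project's data.

```python
from collections import Counter, defaultdict


def category_shares(records):
    """Map each card type to {category: share of that card type's users}."""
    counts = defaultdict(Counter)
    for card_type, category in records:
        counts[card_type][category] += 1
    return {
        card: {cat: n / sum(c.values()) for cat, n in c.items()}
        for card, c in counts.items()
    }


# Made-up records, one (card_type, category) pair per user.
records = [
    ("LIFT", "peak"), ("LIFT", "peak"), ("LIFT", "pre-dawn"),
    ("Disability", "pre-dawn"), ("Disability", "short round trip"),
]
shares = category_shares(records)
```

Here `shares["LIFT"]["peak"]` is two thirds, because two of the three made-up LIFT users fall in the peak category.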

**Limitations**

What are the shortcomings of your approach?
How can your work be improved?
The major limitation of the process is that the classification relies on experience and intuition rather than on a data-driven method. A future direction is to classify temporal categories using machine learning techniques such as DBSCAN, k-means, or hierarchical clustering to check whether new patterns emerge from the data.
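
To make the k-means option concrete, here is a minimal sketch of Lloyd's algorithm on a single feature, a user's typical boarding hour. The function name `kmeans_1d`, the initial centers, and the data are assumptions for illustration. A real analysis would likely use several features and a library implementation, and would need to handle the fact that hours wrap around at midnight.

```python
def kmeans_1d(values, centers, iterations=20):
    """Lloyd's algorithm on scalars, starting from the given initial centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to its cluster mean.
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers


# Made-up typical boarding hours for six users.
hours = [5, 6, 7, 16, 17, 18]
print(kmeans_1d(hours, centers=[0, 24]))  # [6.0, 17.0]
```

The two recovered centers correspond to a morning group and a late-afternoon group in this toy data. Comparing such clusters with the heuristic categories is one way to check whether the data suggest groupings the rules missed.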
