
Peer Review - vvs24 #7

Open
virenvshah opened this issue Oct 7, 2019 · 0 comments


The project that I am reviewing is about predicting construction permits. The datasets used are numerous and include several from the site data.cityofnewyork.us. This site includes datasets about permits issued in New York, crime statistics, park zones, etc. The objective of this project is to predict whether a permit (residential or commercial) will be approved, based on features of the surrounding area such as average income, crime, and park zones.

Things I liked about the proposal:

  1. The proposal identifies a large number of datasets that the project can draw from. There is an abundance of features as well as a large number of data entries in each dataset which means that there are a lot of different directions this project can be taken in.

  2. The problem is very relevant. I'm sure many companies and homeowners would be interested in how likely they are to get a permit under various conditions. I think it's interesting that the proposal identifies permit prediction as a subproblem of the larger question of how different features affect the overall development of an area.

  3. I like that a potential reach question for your project relates development to time. I think there is a lot of potential here with analyzing the time series data in your datasets.

A few concerns that I have:

  1. The datasets have a large number of features, but the proposal does not describe a plan for how the datasets will be used or how their different features will be incorporated.

  2. Some of the features in the datasets will be hard to embed / vectorize / represent as inputs to a classification model. For example, I wish there was more description about how the park zone maps would be used as inputs to the problem.

  3. Be careful about using features that might result in a model that is offensive to certain groups of people. Your final model should not classify based on stereotypical features.
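
To make concern 2 more concrete, here is a minimal sketch of one way a park zone map could be reduced to numeric inputs: the distance from a permit site to the nearest park, plus a one-hot encoding of a zone label. Everything here is my own assumption rather than something from the proposal: the function names, the idea of representing each park by a single centroid, the zone categories, and the approximate example coordinates are all hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def nearest_park_km(site, park_centroids):
    """Distance from a permit site to the closest park centroid.

    Assumption: each park is summarized by one (lat, lon) centroid,
    which ignores the park's actual shape and size.
    """
    return min(haversine_km(site[0], site[1], p[0], p[1])
               for p in park_centroids)

def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over known categories."""
    return [1.0 if value == c else 0.0 for c in categories]

def permit_features(site, zone, park_centroids, zone_categories):
    """Build one feature vector: [distance to nearest park, zone one-hot...]."""
    return [nearest_park_km(site, park_centroids)] + one_hot(zone, zone_categories)

# Hypothetical example data (approximate coordinates, made-up zone labels).
PARKS = [(40.7829, -73.9654), (40.6602, -73.9690)]
ZONES = ["residential", "commercial", "mixed"]

features = permit_features((40.7580, -73.9855), "commercial", PARKS, ZONES)
```

The resulting list could be fed to any standard classifier alongside the other features. A centroid distance is only one option; the share of park area within a fixed radius of the site would be another.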

Overall great work!
