The project I am reviewing is about predicting construction permit approvals. It draws on numerous datasets, including several from data.cityofnewyork.us. This site hosts datasets on permits issued in New York, crime statistics, park zones, etc. The objective is to predict whether a permit (residential or commercial) will be approved based on features of the surrounding area, including average income, crime, and park zones.
Things I liked about the proposal:
The proposal identifies a large number of datasets that the project can draw from. Each dataset has an abundance of features and a large number of entries, so there are many different directions the project could take.
The problem is very relevant. I'm sure many companies and homeowners would be interested in how likely they are to get a permit under various conditions. I think it's interesting that the proposal frames permit prediction as a subproblem of the larger question of how different features affect the overall development of an area.
I like that a potential reach question for your project relates development to time. I think there is a lot of potential in analyzing the time series data in your datasets.
A few concerns that I have:
The datasets have a large number of features, but the proposal does not lay out a plan for how the datasets will be used or how their different features will be incorporated.
Some of the features in the datasets will be hard to embed / vectorize / represent as inputs to a classification model. For example, I wish there were more description of how the park zone maps would be used as inputs to the problem.
Be careful about using features that might result in a model that could be offensive to certain groups of people. Your final model should not classify based on stereotypical features.
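To make the park zone concern more concrete, here is a rough sketch of one way a map could be reduced to numeric inputs: the distance from a permit site to the nearest park, and the number of parks within a fixed radius. This is only an illustration, not something the proposal describes. The function names, the use of park centroids, the 1 km radius, and the approximate coordinates are all my own assumptions.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def park_features(permit_lat, permit_lon, park_centroids, radius_km=1.0):
    """Turn a list of park centroids into two numeric features for one permit.

    park_centroids is a hypothetical list of (lat, lon) pairs; a real park
    zone dataset would likely provide polygons that need reducing first.
    """
    distances = [haversine_km(permit_lat, permit_lon, lat, lon)
                 for lat, lon in park_centroids]
    if not distances:
        # No parks known: nearest distance is unbounded, count is zero.
        return {"dist_to_nearest_park_km": math.inf, "parks_within_radius": 0}
    return {
        "dist_to_nearest_park_km": min(distances),
        "parks_within_radius": sum(d <= radius_km for d in distances),
    }

# Illustrative, approximate coordinates (assumed, not taken from the datasets).
parks = [(40.7829, -73.9654), (40.6602, -73.9690)]
print(park_features(40.7800, -73.9700, parks))
```

Features like these can sit alongside income or crime columns in an ordinary feature table. Whether centroids are a good enough summary of a park zone is a design choice the project would need to justify.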
Overall great work!