4 Proposal Revision Process
Synthesis and Integration:
1. Decision Context:
"Make a definitive question to answer. You have a lot of questions but no decision context (literally a question asking "Should I do this or that?"). Without this you're not really finding an answer you're just looking at data and for this project we need to be finding an answer."
"Yelp is definitely a more, if not the most, popular food rating sites. However, since your question is fundamentally about food ratings across economic division it might be worthwhile to consider the user demographics of Yelp (I don’t know if it’s diverse or not). Alternatively, you might use data from a few other sites to get a broader mix of review."
"I agree with your likely issues of using the restaurant's locations and mapping that to a specific socio-economic zone. You might try narrowing your scope to a few of the same restaurants that appear in multiple "zones" and comparing those. That would really show the correlation and possible causation of the data."
"How it is framed currently it sounds like the following sentence is what you are getting at: "I love food, I rely heavily on Yelp for making my decisions on where to eat/what Yelp rating to give myself, so I whether I give my usual ratings or give harsher/more lenient ratings". I wonder if there isn't a more interesting decision maker 'neath the current setup."
"What if a chain restaurant(i.e. McDonald's ) is located in both affluent and non-affluent areas and receive similar ratings?"
"Could you provide a more specific effect that this research could have in society? You mentioned that you will be informing the yelp users that there is a bias in these ratings and they should be more careful making their decisions, but how will you be informing them? Will you be throwing in this research at them in email? Or have an extension in yelp on the side as a warning sign?"
"I believe you need to find a way to quantify the effects of affluence on restaurant reviews: give people a specific number (ex: this area was on average rated 0.5 stars lower than it should have been due to its area) and I think you'll have a much larger impact on their decision on where to eat."
To an extent, we agree that our former research question lacks a definite decision context. Reading this feedback, we realized that the question we are answering is not aimed at a specific stakeholder (it could be either a business owner or a restaurant-goer). Therefore, we are revising our question to address a single stakeholder so that we can fully explore the impact we have. We also acknowledge that our hypothesis about the impact of affluence on restaurant reviews and ratings may not be correct, and we take that feedback into consideration as well. Exploring the data in this project should give us more concrete evidence about the impact of affluence.
Proposed Integrations: We are narrowing our question to focus only on Yelp users, to help them evaluate reviews more holistically. Our goal is to raise awareness that Yelp users may judge a restaurant's quality without considering environmental factors, such as the demographics of the area in which the restaurant is located. We chose to focus only on Yelp because several reviewers noted that Yelp is one of the more popular rating and review platforms for restaurants. We propose creating a supplemental scale to Yelp reviews and ratings so that a restaurant's quality can be evaluated on a more weighted scale. Using this scale, each restaurant would receive a second, adjusted rating alongside its Yelp rating. We are considering presenting the score in a format similar to the feedback above: “this area was on average rated 0.5 stars lower than it should have been due to its area”. We believe this would help Yelp users compare the two scores and make a decision in that context.
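A statement such as "this area was on average rated 0.5 stars lower" requires a baseline to compare against, which we have not yet defined. The minimal sketch below assumes the baseline is the overall mean rating across all restaurants in the sample; that choice, the function name `area_offsets`, and the sample data are all hypothetical and only illustrate the format of the output.

```python
from collections import defaultdict
from statistics import mean

def area_offsets(restaurants):
    """Return each area's mean rating minus the overall mean rating.

    `restaurants` is a list of (area, star_rating) pairs. Using the overall
    mean as the baseline is an assumption, not a settled part of the metric.
    """
    by_area = defaultdict(list)
    for area, rating in restaurants:
        by_area[area].append(rating)
    overall = mean(rating for _, rating in restaurants)
    return {area: round(mean(vals) - overall, 2) for area, vals in by_area.items()}

# Hypothetical (area, star rating) pairs, not real Yelp data.
area_sample = [("North", 4.5), ("North", 4.0), ("South", 3.5), ("South", 3.0)]
offsets = area_offsets(area_sample)
for area, diff in offsets.items():
    print(f"{area}: {diff:+.2f} stars relative to the overall average")
```

A negative offset here only says that an area is rated below the overall average. It does not by itself show that affluence is the cause.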
2. API/Metrics Implementation:
"Under the Google Maps API bullet point what do you mean by “benchmark” and to which “directory” are you referring to?"
"I don't know if using Google Maps' API to search for restaurants would help. There is the possibility that they would hold more restaurants than Yelp, but it might be that they miss the same restaurants that Yelp fails to have."
One of the risks we mentioned concerns restaurants that are not listed on Yelp and therefore will not appear in the Yelp API or directory. One way to mitigate this is to cross-reference the list of restaurants from the Yelp API with the list from the Google Maps API (this is what we meant by “benchmarking”). At this stage, we do not know which source has the fuller directory of restaurants in the greater Seattle area, nor do we know whether the Google Maps API provides a better list than the Yelp API. Cross-referencing the two will resolve that uncertainty. The underlying purpose of building a fuller directory is to explore whether restaurants in non-affluent areas have a social media or Yelp presence. If they do not, it may mean that restaurants in non-affluent areas are less able to reach a mass audience than restaurants that are present on Yelp.
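The cross-referencing step can be sketched as a set comparison once both lists have been retrieved. The example below does not call either API. It assumes the restaurant names are already available as plain lists, and the `normalize` and `cross_reference` helpers and the sample names are hypothetical. Matching on name alone is a simplification; a real comparison would probably also need the address or coordinates.

```python
import re

def normalize(name: str) -> str:
    """Lowercase a restaurant name and strip punctuation and extra spaces."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())

def cross_reference(yelp_names, google_names):
    """Split restaurant names into those on both lists and those on only one."""
    yelp = {normalize(n) for n in yelp_names}
    google = {normalize(n) for n in google_names}
    return {
        "both": yelp & google,
        "yelp_only": yelp - google,
        "google_only": google - yelp,
    }

# Hypothetical sample data, not real API output.
yelp_list = ["Joe's Diner", "Pho House", "Taco Stop"]
google_list = ["Joes Diner", "Pho House", "Corner Cafe"]
result = cross_reference(yelp_list, google_list)
print("Missing from Yelp:", sorted(result["google_only"]))
```

The `google_only` set is the part of interest for this risk: restaurants that exist in the second directory but have no Yelp presence.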
"I’m not convinced that sorting by mode as opposed to median adjusts for variances in the number of reviews. If a restaurant has one five-star review, that data is not nearly as accurate as if another restaurant has one hundred five-star reviews."
"How will you take into account the number of reviews for the restaurants as well?"
"Accounting for exterior influence as well? For example, affluent areas might have access to better quality ingredients and may have more money to invest in better ingredients than those in more rural areas who are just trying to get by."
"How are you going to account for the population of the area? For example, these affluent areas might also have higher density in population and therefore have more reviews and better ratings."
On the question of whether to use the median or the mode, we consider the mode the better choice for our purposes because the median does not tell us which rating was chosen most often. The median identifies the middle value of all the ratings and could be misinterpreted. The number of reviews is handled separately: it enters as another weighted factor, along with the number of ratings and the level of ratings. As for population density, we have yet to find data on whether an area's population influences how many reviews a restaurant receives, and it is not part of the decision context we are trying to address.
Proposed Integrations: To weigh the reviews and ratings for a restaurant, we will compute a score from the ratings and reviews it has received. For each restaurant, we will break the ratings down into buckets (1 to 5 stars). We will count the ratings in each bucket and divide each count by the total number of ratings the restaurant received. The rating with the largest share of the distribution will be the main score for the restaurant. In addition, we are in the process of integrating ratings with reviews, affluence score, and median household income into our metric. We are still exploring how to compute the metric; there is no definitive scoring formula yet.
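The bucket computation described above can be sketched as follows. This covers only the rating distribution and the most common rating; the weighting by reviews, affluence score, and median household income is not included because it is not yet defined. The function names and the tie-breaking rule (the higher star value wins) are assumptions made for illustration.

```python
from collections import Counter

def rating_distribution(ratings):
    """Return the share of ratings in each star bucket (1 to 5)."""
    total = len(ratings)
    if total == 0:
        raise ValueError("restaurant has no ratings")
    counts = Counter(ratings)
    return {star: counts.get(star, 0) / total for star in range(1, 6)}

def main_score(ratings):
    """Return the star bucket with the largest share and the number of ratings."""
    shares = rating_distribution(ratings)
    # Ties are broken in favor of the higher star value; this is an assumption.
    best = max(shares, key=lambda star: (shares[star], star))
    return best, len(ratings)

# Hypothetical star ratings for one restaurant.
sample = [5, 4, 4, 3, 4, 5, 1]
score, n_ratings = main_score(sample)
print(f"Main score: {score} stars from {n_ratings} ratings")
```

Returning the number of ratings next to the score keeps the reviewers' concern visible: a single five-star rating and one hundred five-star ratings produce the same main score, so the count has to be carried into any later weighting.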
3. Out of the Scope of the Project:
"I agree with your likely issues of using the restaurant's locations and mapping that to a specific socio-economic zone. You might try narrowing your scope to a few of the same restaurants that appear in multiple "zones" and comparing those. That would really show the correlation and possible causation of the data."
"Another risk you might consider is that reviews tend to capture extreme reactions. As in, people who have extraordinarily good or bad experiences tend to leave reviews but those who have “standard” or “average” experiences tend not to."
"How will areas that have been historically low income but are now being gentrified be defined."
"Are you comparing price range and review primarily in addition to location?"
While all of these are valid points, they are out of scope for the problem we are trying to answer. On the first point, not every restaurant belongs to a chain, so restricting the comparison to chains would eliminate too many restaurants from the dataset and change our question to focus only on major chain restaurants. On the second point, we do not know of any data that supports that assumption, though we understand it is a possibility. Since there is no data to confirm it, we are not treating it as a factor. Our metric, which consists of the number of ratings, the level of ratings, and the number of reviews, should also help reduce that issue. To address recently gentrified areas, we are using data that goes back no more than one year so that we work with the most recent data. On the last point, we are not considering price range, as it is also out of scope for our project and incorporating it would make the project much more time-intensive.
4. Data types (structures/constraints)
"Why is household median income a good indicator of affluence? How will you divide Seattle neighborhoods? What if within those divisions there are sections with very extreme household incomes causing the median to be somewhat average?"
"Definitely identifying how you draw boundaries is going to be really important"
"How are different areas separated, and will there be some distinct difference between areas that are mostly houses and most apartments?"
"Where are going to be the borderlines for affluent areas? For example, how do you draw a line between Green Lake and University District? Or Belltown and the South Lake Union? Will you be using zip code as your border system? You mentioned focusing in and around the Seattle area, which as a "mix of socio-economic zones". I think this is a very interesting point, and because you are segmenting by these zones, I think it could be really useful to provide some context, example, or data to show what you mean by socio-economic zones."
The feedback we received on data types was insightful. Reviewers asked how we plan to draw the boundaries and what we plan to use to determine affluence. Our plan is to take the existing boundaries of the Seattle neighborhoods and determine affluent areas from there. We chose the existing neighborhood boundaries because they are already defined and we can find data on them. In addition, Yelp reviews include the neighborhood the restaurant is in, which would reduce uncertainty. That said, we intend to use the exact boundaries and use each restaurant's address or latitude/longitude to determine its exact neighborhood. Yelp's neighborhood location would be a fallback in case we are unable to set up those boundaries. To determine the affluence of an area, we will use its median household income.
Proposed Integrations: Based on the feedback we received, we decided to keep median household income as our measure, since it is a good indicator of which areas are affluent by definition. We will draw boundaries within Seattle using U.S. census tracts, identified by census tract ID, because they are federally defined. A restaurant's census tract can be determined from its geolocation, which also makes it easy to place on a map. We will also filter the Yelp data we collect. To keep restaurant ratings and reviews current, we will use a one-year window measured from the date we began scraping the data.
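The two steps above, assigning a restaurant to a tract from its coordinates and keeping only recent reviews, can be sketched in plain Python. The sketch uses a standard ray-casting point-in-polygon test. The tract shapes, tract IDs, and dates are hypothetical placeholders; real tract boundaries would come from census boundary files, and a geospatial library would likely be a better fit at scale. The window is assumed to cover the 365 days before scraping began.

```python
from datetime import date, timedelta

def in_polygon(lon, lat, polygon):
    """Ray-casting test: is the point inside the polygon (list of (lon, lat))?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def find_tract(lon, lat, tracts):
    """Return the ID of the first tract containing the point, or None."""
    for tract_id, polygon in tracts.items():
        if in_polygon(lon, lat, polygon):
            return tract_id
    return None

def within_last_year(review_date, scrape_start):
    """Keep reviews dated within the 365 days before scraping began."""
    return scrape_start - timedelta(days=365) <= review_date <= scrape_start

# Hypothetical square tracts in made-up coordinates, not real census shapes.
tracts = {
    "tract_A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "tract_B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
print(find_tract(1.5, 0.5, tracts))
print(within_last_year(date(2018, 1, 15), date(2018, 4, 1)))
```

Points that fall exactly on a shared tract edge are ambiguous under this test, so restaurants located on a boundary would need a tie-breaking rule.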
Proposal Review process (reflection):
Opposite perspective: While reading through the feedback, part of our decision process was to play devil's advocate, trying either to falsify or to justify the claims we had made.
Specific: We went deeper into specific questions with detailed evidence, as we realized our initial proposal was not clear or specific enough, which led to some confusion among our readers. With more evidence in our proposal revision, we will be able to better justify our project.
Evaluation: We started to brainstorm more ideas, solutions, and stakeholders, as we realized we should first consider a broad range of topics and stakeholders and then narrow them down to specific audiences, rather than going straight to one narrow view at the beginning. Through the evaluation, we were also able to go back and reevaluate our risks. We began to raise more questions, especially those relevant to our stakeholders. From there, our discussion revolved around approaches to answering them and whether they should be part of our project. This made us rethink the exact problem we are solving, down to the specific details of the type of data we are using.