Revised Project Proposal
What decision-making context will you support? What are some decisions in that context you might support?
Many people love dogs, and some would love to have a dog. However, it’s an important and often life-changing decision to make, as deciding to own a dog requires considering lifestyle compatibility and financial responsibility. Our decision-making context is getting a dog. One major decision in the space, arguably the most important, is whether to get a dog or not. We argue that this is the most important because every other decision depends on whether you get a dog. Some decisions that may be made if you do get a dog are: what to name your dog, what to feed your dog, how often to walk your dog, where to walk your dog, whether to hire a dog-sitter, whether to hire a dog-walker, what dog parks to go to based on popularity or crowdedness, as well as many other decisions regarding the dog’s life and how it will be treated.
Our main decision is to inform the user whether they should get a dog or not. We will also provide some other insights if the user does choose to get a dog, such as what breed may be the most compatible with their lifestyle.
Anyone who is thinking about getting a dog. These can be single owners, partners, or families with children. They may be people who are in college or working full time.
When thinking about getting a dog, there are many factors that a person has to consider. This is a very daunting decision to make blindly, and having some help would ease the process a lot. People who are thinking of adopting a dog may be worried about whether they are ready to take on the responsibility of having a dog. Some may not even know how hard taking care of a dog may be. However, having a dog has its benefits too. Dogs can bring happiness, ease feelings of loneliness, and encourage regular exercise for their owners through daily walks. Potential dog owners need to weigh the benefits against the responsibilities that come with owning a dog. Also, if they do decide to get a dog, the owner must be aware that different breeds of dogs require different types of living arrangements and lifestyles.
What data will you work with? Please include background on who collected the data, where you accessed it, and any additional information we should know about how this data came to be.
Datasets for analyzing current dog ownership (and dog names):
- http://americanpetproducts.org/Uploads/MemServices/GPE2017_NPOS_Seminar.pdf - “2017-2018 APPA National Pet Owners Survey”. This is an annual survey of American pet owners, performed by the American Pet Products Association. The purpose of these surveys is “to monitor consumer habits on an on-going basis to identify short and long-term trends, as well as new opportunities in pet ownership and pet product and service consumption.” It includes various statistics regarding pet owner demographics, pet owner behaviors and beliefs, and other information regarding their interactions with their pet. This study includes information on multiple pet types, but for dogs, 505 owners were surveyed. This resource provides a lot of insight into the kinds of people that own dogs and what dog ownership is like. It also outlines the costs and benefits (which can be used for the other sections below).
- https://data.seattle.gov/Community/Seattle-Pet-Licenses/jguv-t9rb - This data is provided by the City of Seattle Department of Finance and Administrative Services. This dataset contains active/current Seattle pet licenses, including animal type (species), pet’s name, breed and the owner’s ZIP code. It is a public government dataset that can be exported from their website. The list of pet licenses was created on January 24, 2017 and is current as of January 11th, 2017, and the data goes back to 2005.
- https://fusiontables.google.com/data?docid=1pKcxc8kzJbBVzLu_kgzoAMzqYhZyUhtScXjB0BQ#rows:id=1 - This is the source data of WNYC's 'Dogs of NYC' project, collected by the NYC Dept of Health and Mental Hygiene. This dataset is a list of licensed dogs in New York City, which seems to cover dogs born up until 2012. The fields are dog name, gender, breed, birth, dominant_color, secondary_color, third_color, spayed_or_neutered, guard_trainied, borough and zip_code. We initially found and analyzed a subset of this dataset from a GitHub user, which focused mainly on dog names (link: https://github.com/Kaz-A/dog_names/). That subset only included the dog’s name, breed, gender, age (as of 2015), and owner’s borough.
Data for analyzing costs of owning a dog:
- http://www.peteducation.com/article.cfm?c=2+2106&aid=1543 - This website summarizes the basic costs of owning a dog. The data is broken down by service or product (e.g., grooming, food, vaccines). It is also separated into first-year costs and annual costs. This information can help us estimate the initial and continuing financial costs of owning a dog. The data is provided by peteducation.com, which is a public organization that provides online resources regarding Pet Information and Animal Care Tips. The website was created and funded by veterinarians Dr. Smith and Dr. Foster, who are committed to pet education. The information provided is based on a 50 lb dog living in the rural Midwest. It should be noted that costs may differ for other locations and dog sizes. However, the dataset does provide “low” and “high” cost estimates for a more general range of costs.
- https://www.petfinder.com/pet-adoption/dog-adoption/annual-dog-care-costs/ - This dataset provides information regarding the financial costs of owning a dog, similar to the one above. It is also broken down by specific expense type and by first-year costs vs. costs for each following year. This data focuses on adopted dogs, and the values are based on a survey of pet owners across the country. This data is provided by petfinder.com, an online, searchable database of animals who need homes. It is also a directory of nearly 14,000 animal shelters and adoption organizations across the U.S., Canada and Mexico.
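As a rough illustration of how the first-year and annual figures from these two cost sources could be combined, the sketch below estimates a low and high total over several years of ownership. The `CostRange` type, the `total_cost` function, and every dollar amount are hypothetical placeholders chosen for the example; they are not values taken from either source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostRange:
    """A low/high cost estimate in dollars (placeholder values only)."""
    low: float
    high: float


def total_cost(first_year: CostRange, annual: CostRange, years: int) -> CostRange:
    """Estimate total ownership cost over `years` years.

    The first year uses the first-year estimate; every following year
    uses the annual estimate.
    """
    if years < 1:
        raise ValueError("years must be at least 1")
    following_years = years - 1
    return CostRange(
        low=first_year.low + following_years * annual.low,
        high=first_year.high + following_years * annual.high,
    )


# Hypothetical figures for illustration; real values would come from the sources above.
example = total_cost(CostRange(1000, 2000), CostRange(500, 900), years=10)
print(f"Estimated 10-year cost: ${example.low:,.0f} to ${example.high:,.0f}")
```

Keeping the low and high bounds separate lets the final tool show a range rather than a single number, which matches how the sources report their estimates.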
Data for analyzing benefits of owning a dog:
We will be scraping and extracting data from various research studies and articles related to the benefits of owning a dog, in order to compile their findings into a dataset. Potential fields in the dataset that we build may be Name of Study/Article, Source/Publisher, Year, Research Method, Number of Participants, Participant Type, Benefit Type, Benefit Description, Benefit Target Group, Benefit Supporting Data, etc. If raw data related to a study or article is available, we can also perform our own analysis on it.
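One possible way to pin down these fields early is a small record type that every extracted finding must fit. The sketch below is only an assumption about how the schema could look in Python: the `BenefitRecord` class, its field names, and the `records_to_csv` helper are illustrative names, and the "(to be extracted)" strings are placeholders rather than findings.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List, Optional


@dataclass
class BenefitRecord:
    """One benefit finding from a study or article (hypothetical schema)."""
    study_name: str
    source: str
    year: int
    research_method: str
    participant_count: Optional[int]  # None when the source does not report it
    participant_type: str
    benefit_type: str
    benefit_description: str
    benefit_target_group: str
    benefit_supporting_data: str


def records_to_csv(records: List[BenefitRecord]) -> str:
    """Serialize records to CSV text with one column per schema field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(BenefitRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Metadata taken from the source list below; remaining fields are placeholders.
example_record = BenefitRecord(
    study_name="Pet ownership and human health: a brief review of evidence and issues",
    source="British Medical Journal (BMJ)",
    year=2005,
    research_method="Review",
    participant_count=None,
    participant_type="(to be extracted)",
    benefit_type="(to be extracted)",
    benefit_description="(to be extracted)",
    benefit_target_group="(to be extracted)",
    benefit_supporting_data="(to be extracted)",
)
csv_text = records_to_csv([example_record])
print(csv_text)
```

A CSV export like this could then be read into R for the analysis stage.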
Preliminary list of potential data sources:
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1289326/ - “Pet ownership and human health: a brief review of evidence and issues.” This report examines the current evidence for a link between pet ownership and human health and discusses the importance of understanding the role of pets in people's lives. It was published by the British Medical Journal (BMJ) in 2005, and is accessed through the National Center for Biotechnology Information (NCBI) online database.
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3408111/ - “Psychosocial and Psychophysiological Effects of Human-Animal Interactions: The Possible Role of Oxytocin.” This report examines evidence from 69 original studies on human-animal interactions (HAI) and summarizes their many effects. It was published by Frontiers in Psychology in 2012, and is accessed through the National Center for Biotechnology Information (NCBI) online database.
- https://www.uclahealth.org/pac/animal-assisted-therapy - “Animal-Assisted Therapy Research Findings.” This UCLA Health webpage outlines the physical and mental health benefits of animal-assisted therapy. It also contains links to the specific Research Findings that they drew from, which we may also choose to explore.
- http://time.com/4870796/dog-owners-benefits/ - “More Evidence That Owning a Dog Is Really Good for You.” This article highlights recent research that supports the many benefits of owning a dog. It was published by Time Health magazine on July 24, 2017.
- http://apgr.wssp.edu.pl/kynotherapy-method-supporting-development-independence-children-syndrome/ - “Kynotherapy as a method of supporting the development of independence of children with Down syndrome.” This research study examined the benefits of kynotherapy (dog therapy) for children with Down syndrome. It was published by Vincent Pol University in Lublin, Poland in 2016.
- https://www.rover.com/blog/rover-trend-report/ - Rover conducted a survey of pet owners in order to understand the various ways dog ownership affects American lives. Their findings give insight into the way dog owners treat and interact with their pets, and indicate some benefits of having a dog. Rover is an online network for dog walkers and sitters. They help dog owners find dog sitting, walking and other pet services. Rover’s survey was conducted by Wakefield Research among 1,000 U.S. dog owners aged 18+ in July 2016.
Data for analyzing dog breed compatibility:
- http://www.akc.org/ - This website has a lot of information about specific dog breeds and their general behavior. It contains information that will help us identify breeds that are or are not compatible with each owner. We would have to extract the information ourselves through web scraping. The data is collected by the American Kennel Club. Their mission statement is: “The American Kennel Club is dedicated to upholding the integrity of its Registry, promoting the sport of purebred dogs and breeding for type and function. Founded in 1884, the AKC® and its affiliated organizations advocate for the purebred dog as a family companion, advance canine health and well-being, work to protect the rights of all dog owners and promote responsible dog ownership.” They are a reputable organization.
Who is affected by those decisions? Depending on the domain of your data, there may be a variety of audiences interested in using your analysis. You should hone in on one of these audiences.
Our main audience would be anyone who is thinking about getting a dog. This audience could be broken down further by financial status, mental state, and other factors. Financial status is important because a dog can be expensive, and the owner should know how much it costs and whether they can afford it. If they were to get a dog and later realize that they cannot afford it, it may become a burden that they did not foresee.
Furthermore, dogs are known to bring happiness to those around them, and various dogs are used for therapy. A person who is facing emotional issues may also consider getting a dog to aid their recovery. The choice of getting a dog for therapy would affect their well-being. A dog could be considered too much work or too expensive; however, the benefits could outweigh the costs.
Another audience is the dog itself. We want to make sure that the owner understands the factors to consider when getting a dog. If a person gets a dog and later realizes they cannot manage to keep it, the dog may be mistreated or left alone. This is an important factor, and we want to help prevent dogs from being harmed.
How will your project support decisions? List out at least one decision your project will support for your audience.
The main decision that we want to support is whether a person should get a dog or not, and if they do choose to get a dog, what breed they might want to get and what breed they might want to avoid. We will support this decision by collecting data from the user and comparing it with our analysis to see if they are fit to own a dog.
Example Factors that we will consider:
- About potential dog owner:
  - Age
  - Income
  - Money they are willing or would like to spend on the dog
  - Has children
  - Allergies
  - Location
    - Type of house
      - This can matter for specific breeds
    - Type of location (e.g., city, suburbs)
      - We can also ask if the person plans to move and provide an indication that our results are based on location
  - Previous ownership
    - Own any dogs currently or in the past
  - Lifestyle
    - Is the owner active, or do they prefer to stay home?
    - Work hours/schedule
  - Health Info (which may be related to certain benefits of owning a dog)
    - Mental Health Status
    - Physical Health Status
    - Disabilities
- About dog they want or could get:
  - Adopting vs. Buying
  - Breed
    - We may suggest a breed based on the information about them
  - Age of dog
What will be the format of your final product (Shiny app, HTML page or slideshow compiled with KnitR, etc.)?
The final product of our project will be an infographic and a Shiny app, or some other interactive web application, that showcases our findings and allows users to see whether they should get a dog or not. It will have an opening piece that highlights the benefits of owning a dog, such as happiness. For our breed analysis, it will include entry boxes for information such as income and family makeup (e.g., has children) that we will consider when presenting the decision. We will also add features for those who do choose to get a dog, such as which names are popular. Our definition of popular will be a simple analysis of how frequently each dog name has been used previously.
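The frequency-based definition of popularity could be sketched as follows. This is a minimal illustration that assumes names arrive as a plain list of strings; the `most_popular_names` function and the sample names are made up for the example and do not come from the license datasets.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def most_popular_names(names: Iterable[str], top_n: int = 3) -> List[Tuple[str, int]]:
    """Return the `top_n` most frequent names with their counts.

    Names are trimmed and title-cased first so that "max", "MAX" and
    "Max " are all counted as the same name. Blank entries are skipped.
    """
    cleaned = [name.strip().title() for name in names if name and name.strip()]
    return Counter(cleaned).most_common(top_n)


# Made-up sample; real input would be the name column of a pet license dataset.
sample = ["Bella", "max", "Bella ", "Charlie", "MAX", "bella", ""]
top = most_popular_names(sample, top_n=2)
print(top)
```

Normalizing case and whitespace before counting matters here because license records are typically entered by hand and may spell the same name inconsistently.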
- Different breeds require different levels of exercise (and therefore different amounts of time from the owner) and different levels of financial responsibility (some breeds tend to have fewer health problems than others), and some breeds are not a good fit for families (some are harder to train to be patient around children, for example).
- Need data on the average cost (monthly or weekly) for each breed
- Need data on whether a breed is good with children or not (could just be a boolean yes/no)
- For past dog name registration data, we might need to make a public records request in order to access the data.
- It may be challenging to collect consistent data when scraping various websites.
- We would also need to find a way to check the quality of the information we are receiving
- A potential option could be limiting our web scraping to reputable websites and including sources when presenting our decisions
- Cleaning and integrating data from multiple data sources may also be challenging
- Getting information about average costs for owning a dog based on breed and linking it with location may be difficult
- Finding meta-analyses about dog benefits and combining different sources to generate one conclusion
- The difficulty in this lies in:
- Finding the data
- Using aggregated results
- Combining numerous sources
- If we build a Shiny application, we will need to learn how to do that, as it will be the first time most of us have worked with it.
- We also need to learn what makes a good infographic and how to present our findings in a friendly way that can please the viewer rather than showing them just numbers
- We also might need to research new methods of analysis/modeling based on our specific project needs, such as the best or most appropriate way to define and calculate popularity/timelessness.
- We will also need to learn how to use and work with Beautiful Soup for scraping websites.
How will you conduct your analysis? Please include a detailed description of your intended modeling approach.
Once all required data is compiled and cleaned, we can perform analyses on the data by creating models. We can find trends in the data, which will help inform us on the design of our application. Finding trends can help answer questions about whether a specific factor is important to consider or not in our decision context. We can also highlight different studies done through meta-analysis and combine them to aid the decision process.
Our analysis will focus on identifying and weighing the different aspects that go into getting a dog. We will also build an optimization function/algorithm that takes these factors into account and then rates whether or not a person should get a dog. We will try to validate our models by comparing them to current dog ownership. The inputs would be provided by the potential dog owner regarding themselves and the intended home of the dog, such as their age, income, living location, lifestyle, etc. We can also ask them to highlight which features they feel are the most important. With this information we can adapt our model to what the user finds valuable. The model will use this information to assess the situation and balance the costs and benefits. The output will either be a simple yes/no on whether they should get a dog or a value on a scale of how strongly we recommend that they get a dog. We would also like to have these recommendation scores specified or ranked by dog breed, showing which breeds are most compatible with them and which are not. If we have time, we will also analyze dog name trends, in order to suggest popular or common dog names for certain qualities such as breed and gender. Once the user knows what dog is most compatible with them, they can use these suggestions to help inform their decision of what to name the dog, should they end up getting one.
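To make the inputs and outputs described above concrete, here is a minimal sketch of one way such a rating could work. It substitutes a plain weighted average for the optimization function, which has not been designed yet. The function names, the factor names, the 0-to-1 scoring scale, the weights, and the 0.5 threshold are all assumptions for illustration.

```python
from typing import Dict


def recommendation_score(factor_scores: Dict[str, float],
                         weights: Dict[str, float]) -> float:
    """Combine per-factor suitability scores into one score in [0, 1].

    `factor_scores` maps a factor name to a suitability score in [0, 1].
    `weights` maps a factor name to how much the user cares about it;
    factors without an explicit weight default to 1.0.
    """
    total_weight = sum(weights.get(name, 1.0) for name in factor_scores)
    if total_weight <= 0:
        raise ValueError("at least one factor must have a positive weight")
    weighted_sum = sum(score * weights.get(name, 1.0)
                       for name, score in factor_scores.items())
    return weighted_sum / total_weight


def recommend(score: float, threshold: float = 0.5) -> str:
    """Turn a score into the simple yes/no output (threshold is an assumption)."""
    return "yes" if score >= threshold else "no"


# Hypothetical user: budget fits well, free time is limited, housing is acceptable.
scores = {"budget": 0.8, "free_time": 0.4, "housing": 0.6}
user_weights = {"budget": 2.0, "free_time": 1.0, "housing": 1.0}
overall = recommendation_score(scores, user_weights)
print(round(overall, 2), recommend(overall))
```

Running the same function once per breed, with breed-specific factor scores, would give the per-breed ranking; the user-supplied weights are where the "which features matter most" input would plug in.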
The main challenge is finding data sources that showcase the benefits and costs of having a dog. Most of the data on this topic is already aggregated, and it is hard to trace it back to the original source. The challenge lies in finding multiple sources of aggregated data, cleaning that data, and combining it in order to come to a logical and accurate conclusion. We have to be able to identify what is important and learn how to combine data from different sources.
- Data Collection - Identify and gather data from data sources (completed by Nov 21)
- Research different meta-analyses and other reports
- Scrape data from websites
- Data Cleaning – Join and consolidate data so that it is ready for analysis (completed by Nov 27)
- Make sure data sources have all the necessary information we will need to study
- Find similarities and combine them into one set
- Decide on standard format/structure for the dataset (schema and instances)
- Decide what we need from the websites and how to export it
- Clean data and join data sources into the standardized format/structure
- Data Analysis – Understand factors that go into getting a dog (completed by Nov 30)
- Analyze costs and benefits of dog ownership overall
- Analyze dog ownership by breed
- Analyze dog names (if we have time)
- Build models
- Validate and test models
- Synthesize our findings
- Final Artifact – Build an interface to showcase our findings and develop an infographic highlighting our research (completed by Dec 11)
- Use Shiny App to build an interactive website
- Build an infographic that highlights benefits of getting a dog and potential costs of specific breeds/sizes of dogs
Akush has experience with web scraping and working with Shiny. He can help lead the data collection process and the building of our final artifact using Shiny. Trevor, Ariadna, Kathryn and Jillian will focus on the data cleaning and data analysis using R. Ariadna and Jillian also have experience with information visualization and design, so they can help create the infographic that displays our project’s findings.
- Not having access to necessary data or not having enough necessary data. Any data we cannot find a dataset for could be scraped from websites.
- Finding the source of aggregated results is close to impossible in many cases, so we need to make sure that when pulling from multiple sources we can actually combine them in a way that doesn’t misrepresent the original findings
- When recommending a compatible breed we need to make sure that we are transparent about why we chose that breed
- It can be unfair if we blindly report our suggestions and a person uses that as their sole research for getting a breed
- Another risk is scraping websites that may not be reputable
- We can mitigate this by making sure that the websites we do use are credible
- One way we can check this is to see whether the website is run by an established organization
- Data is unclean
- To mitigate this, we will need to make sure the datasets that we use have consistent formatting, and that unclean data (missing values, errors, etc.) is minimized
- We will need to figure out a standardized schema and instances for our dataset, so that our data can be consolidated in an organized way and then accurately analyzed
- However, if we take all the necessary steps to clean the data and there are still issues, then we just have to be mindful of how that affects our analysis and models and accommodate accordingly.