diff --git a/tools-appendix/modules/python/images/barcharts-plotly-aa.png b/tools-appendix/modules/python/images/barcharts-plotly-aa.png new file mode 100644 index 000000000..ead687787 Binary files /dev/null and b/tools-appendix/modules/python/images/barcharts-plotly-aa.png differ diff --git a/tools-appendix/modules/python/images/boxplot-plotly-aa.png b/tools-appendix/modules/python/images/boxplot-plotly-aa.png new file mode 100644 index 000000000..d913f004c Binary files /dev/null and b/tools-appendix/modules/python/images/boxplot-plotly-aa.png differ diff --git a/tools-appendix/modules/python/images/histograms-plotly-aa.png b/tools-appendix/modules/python/images/histograms-plotly-aa.png new file mode 100644 index 000000000..f56be5169 Binary files /dev/null and b/tools-appendix/modules/python/images/histograms-plotly-aa.png differ diff --git a/tools-appendix/modules/python/images/scatterplots-plotly-aa.png b/tools-appendix/modules/python/images/scatterplots-plotly-aa.png new file mode 100644 index 000000000..721281a7b Binary files /dev/null and b/tools-appendix/modules/python/images/scatterplots-plotly-aa.png differ diff --git a/tools-appendix/modules/python/matplotlib.adoc b/tools-appendix/modules/python/matplotlib.adoc deleted file mode 100644 index e69de29bb..000000000 diff --git a/tools-appendix/modules/python/pages/plotly-examples.adoc b/tools-appendix/modules/python/pages/plotly-examples.adoc index f76a13683..be5bc350f 100644 --- a/tools-appendix/modules/python/pages/plotly-examples.adoc +++ b/tools-appendix/modules/python/pages/plotly-examples.adoc @@ -1,6 +1,272 @@ -= Plotly Examples += Plotly + +Being able to analyze and create good visualizations is a skill that is invaluable in many fields. It can be pretty fun too! In this document, we will dive into ploting using `plotly`, a very popular source graphing library that can interact graphs online. + +Plotly Express is a simple way to make interactive graphs directly from your data. It works with pandas DataFrames. + +Once you create a graph (like a bar chart or scatter plot), you can edit it. You can add titles, change axis labels, adjust colors, or pick a different theme to make it look exactly how you want (similar to Matplotlib). Also, the graphs are interactive—you can hover over points, zoom in, and explore your data. Visit the https://plotly.com/python/plotly-express/[Plotly Express Documentation] for examples and detailed explanations. It’s a great resource to learn about all the different graph types and how to customize them. + +In this document we will: + +* Create visualizations using plotly. +* Modify axis labels, titles, and other graph elements. +* Customize plots with colors, shapes, and line types. +* Explore dataset preparation and handling missing values. +* Try to understanding trends and patterns in the data using Plotly. + + + +== Understanding the Data Prior to Plotting + +We will use the following dataset(s) to understand plotly: + +`/anvil/projects/tdm/data/zillow/Metro_time_series.csv` + +First, let's review the dataset's columns to identify which ones might be suitable for creating a plot. We can use the `.columns` function + +[source,python] +---- +import plotly.express as px +import pandas as pd + +myDF = pd.read_csv("/anvil/projects/tdm/data/zillow/Metro_time_series.csv") +myDF.columns +---- + +---- +Index(['Date', 'RegionName', 'AgeOfInventory', 'DaysOnZillow_AllHomes', + 'InventorySeasonallyAdjusted_AllHomes', 'InventoryRaw_AllHomes', + 'InventorySeasonallyAdjusted_BottomTier', + 'InventorySeasonallyAdjusted_MiddleTier', + 'InventorySeasonallyAdjusted_TopTier', + 'MedianListingPricePerSqft_1Bedroom', + 'MedianListingPricePerSqft_2Bedroom', + 'MedianListingPricePerSqft_3Bedroom', + 'MedianListingPricePerSqft_4Bedroom', + 'MedianListingPricePerSqft_5BedroomOrMore', + 'MedianListingPricePerSqft_AllHomes', + 'MedianListingPricePerSqft_CondoCoop', + 'MedianListingPricePerSqft_DuplexTriplex', + 'MedianListingPricePerSqft_SingleFamilyResidence', + 'MedianListingPrice_1Bedroom', 'MedianListingPrice_2Bedroom', + 'MedianListingPrice_3Bedroom', 'MedianListingPrice_4Bedroom', + 'MedianListingPrice_5BedroomOrMore', 'MedianListingPrice_AllHomes', + 'MedianListingPrice_CondoCoop', 'MedianListingPrice_DuplexTriplex', + 'MedianListingPrice_SingleFamilyResidence', + 'MedianPctOfPriceReduction_AllHomes', + 'MedianPctOfPriceReduction_CondoCoop', + 'MedianPctOfPriceReduction_SingleFamilyResidence', + 'MedianPriceCutDollar_AllHomes', 'MedianPriceCutDollar_CondoCoop', + 'MedianPriceCutDollar_SingleFamilyResidence', + 'MedianRentalPricePerSqft_1Bedroom', + 'MedianRentalPricePerSqft_2Bedroom', + 'MedianRentalPricePerSqft_3Bedroom', + 'MedianRentalPricePerSqft_4Bedroom', + 'MedianRentalPricePerSqft_5BedroomOrMore', + 'MedianRentalPricePerSqft_AllHomes', + 'MedianRentalPricePerSqft_CondoCoop', + 'MedianRentalPricePerSqft_DuplexTriplex', + 'MedianRentalPricePerSqft_MultiFamilyResidence5PlusUnits', + 'MedianRentalPricePerSqft_SingleFamilyResidence', + 'MedianRentalPricePerSqft_Studio', 'MedianRentalPrice_1Bedroom', + 'MedianRentalPrice_2Bedroom', 'MedianRentalPrice_3Bedroom', + 'MedianRentalPrice_4Bedroom', 'MedianRentalPrice_5BedroomOrMore', + 'MedianRentalPrice_AllHomes', 'MedianRentalPrice_CondoCoop', + 'MedianRentalPrice_DuplexTriplex', + 'MedianRentalPrice_MultiFamilyResidence5PlusUnits', + 'MedianRentalPrice_SingleFamilyResidence', 'MedianRentalPrice_Studio', + 'ZHVIPerSqft_AllHomes', 'PctOfHomesDecreasingInValues_AllHomes', + 'PctOfHomesIncreasingInValues_AllHomes', + 'PctOfHomesSellingForGain_AllHomes', + 'PctOfHomesSellingForLoss_AllHomes', + 'PctOfListingsWithPriceReductionsSeasAdj_AllHomes', + 'PctOfListingsWithPriceReductionsSeasAdj_BottomTier', + 'PctOfListingsWithPriceReductionsSeasAdj_CondoCoop', + 'PctOfListingsWithPriceReductionsSeasAdj_MiddleTier', + 'PctOfListingsWithPriceReductionsSeasAdj_SingleFamilyResidence', + 'PctOfListingsWithPriceReductionsSeasAdj_TopTier', + 'PctOfListingsWithPriceReductions_AllHomes', + 'PctOfListingsWithPriceReductions_BottomTier', + 'PctOfListingsWithPriceReductions_CondoCoop', + 'PctOfListingsWithPriceReductions_MiddleTier', + 'PctOfListingsWithPriceReductions_SingleFamilyResidence', + 'PctOfListingsWithPriceReductions_TopTier', 'PriceToRentRatio_AllHomes', + 'Sale_Counts_Msa', 'Sale_Counts_Seas_Adj_Msa', 'Sale_Prices_Msa', + 'InventoryTierShare_BottomTier_AllHomes', + 'InventoryTierShare_MiddleTier_AllHomes', + 'InventoryTierShare_TopTier_AllHomes', 'ZHVI_1bedroom', 'ZHVI_2bedroom', + 'ZHVI_3bedroom', 'ZHVI_4bedroom', 'ZHVI_5BedroomOrMore', + 'ZHVI_AllHomes', 'ZHVI_BottomTier', 'ZHVI_CondoCoop', 'ZHVI_MiddleTier', + 'ZHVI_SingleFamilyResidence', 'ZHVI_TopTier', 'ZRI_AllHomes', + 'ZRI_AllHomesPlusMultifamily', 'ZriPerSqft_AllHomes', + 'Zri_MultiFamilyResidenceRental', 'Zri_SingleFamilyResidenceRental'], + dtype='object') +---- + +As we can see, there are many columns in the dataset! It's essential to understand the available columns before creating any visualizations. + +You can use the .head() function and the .tail() function to get a further preview of the data. + + + +== Barplots Using Plotly + +Let's start by plotting a Barplot of `MedianListingPrice_1Bedroom` and the first 10 regions that are not null. There are some missing values in our dataset, so we will need to filter for when they are not null. + +[source,python] +---- +myDF = pd.read_csv("/anvil/projects/tdm/data/zillow/Metro_time_series.csv") +filteredDF = myDF[myDF['MedianListingPrice_1Bedroom'].notnull()][['RegionName', 'MedianListingPrice_1Bedroom']].head(10) +print(filteredDF) +---- + +---- + RegionName MedianListingPrice_1Bedroom +124298 12060 132130.0 +124313 12700 249900.0 +124352 14460 195000.0 +124359 14720 324000.0 +124367 15180 113750.0 +124412 17200 98500.0 +124594 25540 95000.0 +124605 25940 449000.0 +124799 34820 106900.0 +124810 35300 135000.0 +---- + +Now that we see the values we will be plotting, let's use the `plotly` library to plot a barchart of this data! + +[source,python] +---- +fig = px.bar( + filteredDF, + x='RegionName', + y='MedianListingPrice_1Bedroom', + title='Median Listing Price for 1-Bedroom Homes by Region (First 10)', + labels={'RegionName': 'Region', 'MedianListingPrice_1Bedroom': 'Median Listing Price ($)'} +) + +fig.update_layout( + xaxis_title="Region", + yaxis_title="Median Listing Price ($)", + template="plotly_white" +) + +fig.show() +---- +image::barcharts-plotly-aa.png[Initial Bar Plot using Plotly, width=792, height=500, loading=lazy, title="First bar plot"] + + +== Boxplots Using Plotly +Now let's use plotly to plot a box-plot. + +A box plot (also known as a box-and-whisker plot) is a way of displaying the distribution of data based on the statistics: + +* Minimum +* First quartile (Q1) +* Median (Q2) +* Third quartile (Q3) +* Maximum + +A box plot requires numerical data to display the distribution and summarize its key statistical properties. Let's use the column `DaysOnZillow_AllHomes` and create a new column called `Year` to create a box-plot in plotly. Note, we will need to clean up the `Date` column to be able to create the `Year` column. + + + +[source,python] +---- +import pandas as pd +import plotly.express as px +myDF = pd.read_csv("/anvil/projects/tdm/data/zillow/Metro_time_series.csv") + +# Clean up year +myDF['Year'] = pd.to_datetime(myDF['Date']).dt.year + +filtered_days_zillow = myDF[myDF['DaysOnZillow_AllHomes'].notnull()][['Year', 'DaysOnZillow_AllHomes']] + +fig = px.box( + filtered_days_zillow, + x="Year", + y="DaysOnZillow_AllHomes", + title="Box Plot of Days on Zillow by Year", +) +fig.show() +---- + +image::barcharts-plotly-aa.png[Box-plot using Plotly, width=792, height=500, loading=lazy, title="First Box-plot Plotly"] + + + +== Histograms with Plotly + +[source,python] +---- +import plotly.express as px +import pandas as pd + +myDF = pd.read_csv("/anvil/projects/tdm/data/zillow/Metro_time_series.csv") + +filtered_data = myDF[myDF['DaysOnZillow_AllHomes'].notnull()] + +fig = px.histogram( + filtered_data, + x="DaysOnZillow_AllHomes", + nbins=100, # Adjustable + title="Histogram of Days on Zillow", + labels={"DaysOnZillow_AllHomes": "Days on Zillow"} +) + +fig.show() +---- + +image::histograms-plotly-aa.png[Histograms using Plotly, width=792, height=500, loading=lazy, title="Histogram using Plotly"] + + +Notice how outliers in the histogram appear as really small bars at the extreme ends of the distribution, with very few values compared to the rest of the dataset. The histogram helps us understand the distribution of the number of days the houses are on zillow. Some properties might genuinely stay on Zillow for much longer due to location, pricing, or other factors. + +== Scatterplots with Plotly + +Scatterplots in Plotly visually show relationships between two variables. Each point on the plot represents a data observation, with its position determined by the x and y values. You can plot a scatterplot in Plotly just as you can using Matplotlib. + +Let's plot a scatterplot of `MedianListingPrice_1Bedroom` vs `DaysOnZillow_AllHomes`. + + +[source,python] +---- +import pandas as pd +import plotly.express as px + +myDF = pd.read_csv("/anvil/projects/tdm/data/zillow/Metro_time_series.csv") + +filteredDF_scatter = myDF.dropna(subset=['DaysOnZillow_AllHomes', 'MedianListingPrice_1Bedroom']) + +fig = px.scatter( + filteredDF_scatter, + x='DaysOnZillow_AllHomes', + y='MedianListingPrice_1Bedroom', + title='Days on Zillow vs Median Listing Price for 1-Bedroom Homes', + labels={'DaysOnZillow_AllHomes': 'Days on Zillow (All Homes)', + 'MedianListingPrice_1Bedroom': 'Median Listing Price (1-Bedroom)'} +) + +fig.show() + +---- + +image::scatterplots-plotly-aa.png[Scatterplots using Plotly, width=792, height=500, loading=lazy, title="Scatterplots using Plotly"] + + +Based on the scatterplot, we can see find see some interesting things! There appears to be a general downward trend between the number of days a home is listed on Zillow `DaysOnZillow_AllHomes` and the median listing price for 1-bedroom homes `MedianListingPrice_1Bedroom`. This suggests that higher-priced 1-bedroom homes tend to spend fewer days on Zillow. Also, we can see that a significant concentration of points is clustered around lower listing prices (below $400K) and shorter listing durations (less than 150 days). + + +Notice how the syntax for these visualizations in Plotly remains largely consistent. You only need to change the function call—whether it's px.scatter, px.box, or px.histogram—while keeping most of the parameters the same! + + +== Additional Examples with Plotly + +While this document provides an introduction for using Plotly, watching a live demonstration can significantly enhance your understanding. Dr. Ward has created a video that walks through how to use Plotly, explaining the process step-by-step. This video is particularly helpful for seeing how to apply these techniques and gain deeper insights into Plotly’s capabilities. + +I highly recommend taking some time to watch the video, as it complements this document and provides an additional example. -TDM 20200 Project 8 introduction ++++