Skip to content

Commit

Permalink
Editing the Plotly Document
Browse files Browse the repository at this point in the history
Still incorporating videos
  • Loading branch information
arroyo38 committed Jan 2, 2025
1 parent 5789d3f commit f8a2b08
Show file tree
Hide file tree
Showing 6 changed files with 268 additions and 2 deletions.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Empty file.
270 changes: 268 additions & 2 deletions tools-appendix/modules/python/pages/plotly-examples.adoc
Original file line number Diff line number Diff line change
@@ -1,6 +1,272 @@
= Plotly Examples
= Plotly

Being able to analyze and create good visualizations is a skill that is invaluable in many fields. It can be pretty fun too! In this document, we will dive into ploting using `plotly`, a very popular source graphing library that can interact graphs online.

Plotly Express is a simple way to make interactive graphs directly from your data. It works with pandas DataFrames.

Once you create a graph (like a bar chart or scatter plot), you can edit it. You can add titles, change axis labels, adjust colors, or pick a different theme to make it look exactly how you want (similar to Matplotlib). Also, the graphs are interactive—you can hover over points, zoom in, and explore your data. Visit the https://plotly.com/python/plotly-express/[Plotly Express Documentation] for examples and detailed explanations. It’s a great resource to learn about all the different graph types and how to customize them.

In this document we will:

* Create visualizations using plotly.
* Modify axis labels, titles, and other graph elements.
* Customize plots with colors, shapes, and line types.
* Explore dataset preparation and handling missing values.
* Try to understanding trends and patterns in the data using Plotly.
== Understanding the Data Prior to Plotting

We will use the following dataset(s) to understand plotly:

`/anvil/projects/tdm/data/zillow/Metro_time_series.csv`

First, let's review the dataset's columns to identify which ones might be suitable for creating a plot. We can use the `.columns` function

[source,python]
----
import plotly.express as px
import pandas as pd
myDF = pd.read_csv("/anvil/projects/tdm/data/zillow/Metro_time_series.csv")
myDF.columns
----

----
Index(['Date', 'RegionName', 'AgeOfInventory', 'DaysOnZillow_AllHomes',
'InventorySeasonallyAdjusted_AllHomes', 'InventoryRaw_AllHomes',
'InventorySeasonallyAdjusted_BottomTier',
'InventorySeasonallyAdjusted_MiddleTier',
'InventorySeasonallyAdjusted_TopTier',
'MedianListingPricePerSqft_1Bedroom',
'MedianListingPricePerSqft_2Bedroom',
'MedianListingPricePerSqft_3Bedroom',
'MedianListingPricePerSqft_4Bedroom',
'MedianListingPricePerSqft_5BedroomOrMore',
'MedianListingPricePerSqft_AllHomes',
'MedianListingPricePerSqft_CondoCoop',
'MedianListingPricePerSqft_DuplexTriplex',
'MedianListingPricePerSqft_SingleFamilyResidence',
'MedianListingPrice_1Bedroom', 'MedianListingPrice_2Bedroom',
'MedianListingPrice_3Bedroom', 'MedianListingPrice_4Bedroom',
'MedianListingPrice_5BedroomOrMore', 'MedianListingPrice_AllHomes',
'MedianListingPrice_CondoCoop', 'MedianListingPrice_DuplexTriplex',
'MedianListingPrice_SingleFamilyResidence',
'MedianPctOfPriceReduction_AllHomes',
'MedianPctOfPriceReduction_CondoCoop',
'MedianPctOfPriceReduction_SingleFamilyResidence',
'MedianPriceCutDollar_AllHomes', 'MedianPriceCutDollar_CondoCoop',
'MedianPriceCutDollar_SingleFamilyResidence',
'MedianRentalPricePerSqft_1Bedroom',
'MedianRentalPricePerSqft_2Bedroom',
'MedianRentalPricePerSqft_3Bedroom',
'MedianRentalPricePerSqft_4Bedroom',
'MedianRentalPricePerSqft_5BedroomOrMore',
'MedianRentalPricePerSqft_AllHomes',
'MedianRentalPricePerSqft_CondoCoop',
'MedianRentalPricePerSqft_DuplexTriplex',
'MedianRentalPricePerSqft_MultiFamilyResidence5PlusUnits',
'MedianRentalPricePerSqft_SingleFamilyResidence',
'MedianRentalPricePerSqft_Studio', 'MedianRentalPrice_1Bedroom',
'MedianRentalPrice_2Bedroom', 'MedianRentalPrice_3Bedroom',
'MedianRentalPrice_4Bedroom', 'MedianRentalPrice_5BedroomOrMore',
'MedianRentalPrice_AllHomes', 'MedianRentalPrice_CondoCoop',
'MedianRentalPrice_DuplexTriplex',
'MedianRentalPrice_MultiFamilyResidence5PlusUnits',
'MedianRentalPrice_SingleFamilyResidence', 'MedianRentalPrice_Studio',
'ZHVIPerSqft_AllHomes', 'PctOfHomesDecreasingInValues_AllHomes',
'PctOfHomesIncreasingInValues_AllHomes',
'PctOfHomesSellingForGain_AllHomes',
'PctOfHomesSellingForLoss_AllHomes',
'PctOfListingsWithPriceReductionsSeasAdj_AllHomes',
'PctOfListingsWithPriceReductionsSeasAdj_BottomTier',
'PctOfListingsWithPriceReductionsSeasAdj_CondoCoop',
'PctOfListingsWithPriceReductionsSeasAdj_MiddleTier',
'PctOfListingsWithPriceReductionsSeasAdj_SingleFamilyResidence',
'PctOfListingsWithPriceReductionsSeasAdj_TopTier',
'PctOfListingsWithPriceReductions_AllHomes',
'PctOfListingsWithPriceReductions_BottomTier',
'PctOfListingsWithPriceReductions_CondoCoop',
'PctOfListingsWithPriceReductions_MiddleTier',
'PctOfListingsWithPriceReductions_SingleFamilyResidence',
'PctOfListingsWithPriceReductions_TopTier', 'PriceToRentRatio_AllHomes',
'Sale_Counts_Msa', 'Sale_Counts_Seas_Adj_Msa', 'Sale_Prices_Msa',
'InventoryTierShare_BottomTier_AllHomes',
'InventoryTierShare_MiddleTier_AllHomes',
'InventoryTierShare_TopTier_AllHomes', 'ZHVI_1bedroom', 'ZHVI_2bedroom',
'ZHVI_3bedroom', 'ZHVI_4bedroom', 'ZHVI_5BedroomOrMore',
'ZHVI_AllHomes', 'ZHVI_BottomTier', 'ZHVI_CondoCoop', 'ZHVI_MiddleTier',
'ZHVI_SingleFamilyResidence', 'ZHVI_TopTier', 'ZRI_AllHomes',
'ZRI_AllHomesPlusMultifamily', 'ZriPerSqft_AllHomes',
'Zri_MultiFamilyResidenceRental', 'Zri_SingleFamilyResidenceRental'],
dtype='object')
----

As we can see, there are many columns in the dataset! It's essential to understand the available columns before creating any visualizations.

You can use the .head() function and the .tail() function to get a further preview of the data.



== Barplots Using Plotly

Let's start by plotting a Barplot of `MedianListingPrice_1Bedroom` and the first 10 regions that are not null. There are some missing values in our dataset, so we will need to filter for when they are not null.

[source,python]
----
myDF = pd.read_csv("/anvil/projects/tdm/data/zillow/Metro_time_series.csv")
filteredDF = myDF[myDF['MedianListingPrice_1Bedroom'].notnull()][['RegionName', 'MedianListingPrice_1Bedroom']].head(10)
print(filteredDF)
----

----
RegionName MedianListingPrice_1Bedroom
124298 12060 132130.0
124313 12700 249900.0
124352 14460 195000.0
124359 14720 324000.0
124367 15180 113750.0
124412 17200 98500.0
124594 25540 95000.0
124605 25940 449000.0
124799 34820 106900.0
124810 35300 135000.0
----

Now that we see the values we will be plotting, let's use the `plotly` library to plot a barchart of this data!

[source,python]
----
fig = px.bar(
filteredDF,
x='RegionName',
y='MedianListingPrice_1Bedroom',
title='Median Listing Price for 1-Bedroom Homes by Region (First 10)',
labels={'RegionName': 'Region', 'MedianListingPrice_1Bedroom': 'Median Listing Price ($)'}
)
fig.update_layout(
xaxis_title="Region",
yaxis_title="Median Listing Price ($)",
template="plotly_white"
)
fig.show()
----
image::barcharts-plotly-aa.png[Initial Bar Plot using Plotly, width=792, height=500, loading=lazy, title="First bar plot"]


== Boxplots Using Plotly
Now let's use plotly to plot a box-plot.

A box plot (also known as a box-and-whisker plot) is a way of displaying the distribution of data based on the statistics:

* Minimum
* First quartile (Q1)
* Median (Q2)
* Third quartile (Q3)
* Maximum

A box plot requires numerical data to display the distribution and summarize its key statistical properties. Let's use the column `DaysOnZillow_AllHomes` and create a new column called `Year` to create a box-plot in plotly. Note, we will need to clean up the `Date` column to be able to create the `Year` column.



[source,python]
----
import pandas as pd
import plotly.express as px
myDF = pd.read_csv("/anvil/projects/tdm/data/zillow/Metro_time_series.csv")
# Clean up year
myDF['Year'] = pd.to_datetime(myDF['Date']).dt.year
filtered_days_zillow = myDF[myDF['DaysOnZillow_AllHomes'].notnull()][['Year', 'DaysOnZillow_AllHomes']]
fig = px.box(
filtered_days_zillow,
x="Year",
y="DaysOnZillow_AllHomes",
title="Box Plot of Days on Zillow by Year",
)
fig.show()
----

image::barcharts-plotly-aa.png[Box-plot using Plotly, width=792, height=500, loading=lazy, title="First Box-plot Plotly"]



== Histograms with Plotly

[source,python]
----
import plotly.express as px
import pandas as pd
myDF = pd.read_csv("/anvil/projects/tdm/data/zillow/Metro_time_series.csv")
filtered_data = myDF[myDF['DaysOnZillow_AllHomes'].notnull()]
fig = px.histogram(
filtered_data,
x="DaysOnZillow_AllHomes",
nbins=100, # Adjustable
title="Histogram of Days on Zillow",
labels={"DaysOnZillow_AllHomes": "Days on Zillow"}
)
fig.show()
----

image::histograms-plotly-aa.png[Histograms using Plotly, width=792, height=500, loading=lazy, title="Histogram using Plotly"]


Notice how outliers in the histogram appear as really small bars at the extreme ends of the distribution, with very few values compared to the rest of the dataset. The histogram helps us understand the distribution of the number of days the houses are on zillow. Some properties might genuinely stay on Zillow for much longer due to location, pricing, or other factors.

== Scatterplots with Plotly

Scatterplots in Plotly visually show relationships between two variables. Each point on the plot represents a data observation, with its position determined by the x and y values. You can plot a scatterplot in Plotly just as you can using Matplotlib.

Let's plot a scatterplot of `MedianListingPrice_1Bedroom` vs `DaysOnZillow_AllHomes`.


[source,python]
----
import pandas as pd
import plotly.express as px
myDF = pd.read_csv("/anvil/projects/tdm/data/zillow/Metro_time_series.csv")
filteredDF_scatter = myDF.dropna(subset=['DaysOnZillow_AllHomes', 'MedianListingPrice_1Bedroom'])
fig = px.scatter(
filteredDF_scatter,
x='DaysOnZillow_AllHomes',
y='MedianListingPrice_1Bedroom',
title='Days on Zillow vs Median Listing Price for 1-Bedroom Homes',
labels={'DaysOnZillow_AllHomes': 'Days on Zillow (All Homes)',
'MedianListingPrice_1Bedroom': 'Median Listing Price (1-Bedroom)'}
)
fig.show()
----

image::scatterplots-plotly-aa.png[Scatterplots using Plotly, width=792, height=500, loading=lazy, title="Scatterplots using Plotly"]


Based on the scatterplot, we can see find see some interesting things! There appears to be a general downward trend between the number of days a home is listed on Zillow `DaysOnZillow_AllHomes` and the median listing price for 1-bedroom homes `MedianListingPrice_1Bedroom`. This suggests that higher-priced 1-bedroom homes tend to spend fewer days on Zillow. Also, we can see that a significant concentration of points is clustered around lower listing prices (below $400K) and shorter listing durations (less than 150 days).


Notice how the syntax for these visualizations in Plotly remains largely consistent. You only need to change the function call—whether it's px.scatter, px.box, or px.histogram—while keeping most of the parameters the same!


== Additional Examples with Plotly

While this document provides an introduction for using Plotly, watching a live demonstration can significantly enhance your understanding. Dr. Ward has created a video that walks through how to use Plotly, explaining the process step-by-step. This video is particularly helpful for seeing how to apply these techniques and gain deeper insights into Plotly’s capabilities.

I highly recommend taking some time to watch the video, as it complements this document and provides an additional example.

TDM 20200 Project 8 introduction

++++
<iframe id="kaltura_player" src="https://cdnapisec.kaltura.com/p/983291/sp/98329100/embedIframeJs/uiconf_id/29134031/partner_id/983291?iframeembed=true&playerId=kaltura_player&entry_id=1_jn003aiz&flashvars[streamerType]=auto&amp;flashvars[localizationCode]=en&amp;flashvars[leadWithHTML5]=true&amp;flashvars[sideBarContainer.plugin]=true&amp;flashvars[sideBarContainer.position]=left&amp;flashvars[sideBarContainer.clickToClose]=true&amp;flashvars[chapters.plugin]=true&amp;flashvars[chapters.layout]=vertical&amp;flashvars[chapters.thumbnailRotator]=false&amp;flashvars[streamSelector.plugin]=true&amp;flashvars[EmbedPlayer.SpinnerTarget]=videoHolder&amp;flashvars[dualScreen.plugin]=true&amp;flashvars[Kaltura.addCrossoriginToIframe]=true&amp;&wid=1_aheik41m" allowfullscreen webkitallowfullscreen mozAllowFullScreen allow="autoplay *; fullscreen *; encrypted-media *" sandbox="allow-downloads allow-forms allow-same-origin allow-scripts allow-top-navigation allow-pointer-lock allow-popups allow-modals allow-orientation-lock allow-popups-to-escape-sandbox allow-presentation allow-top-navigation-by-user-activation" frameborder="0" title="TDM 10100 Project 13 Question 1"></iframe>
Expand Down

0 comments on commit f8a2b08

Please sign in to comment.