Example of a correlation map #1945
Pull most recent changes
Original authors @vcuspinera and @AndresPitta
heatmap = alt.Chart(corrMatrix_line).encode(
    alt.Y('Var1:N', title = ''),
    alt.X('Var2:N', title = '', axis=alt.Axis(labelAngle=20))
).mark_rect().encode(
I would move mark_rect() to directly after alt.Chart() and only have a single call to encode(), like many of the other examples.
@@ -0,0 +1,52 @@
"""
Correlation matrix
--------------
I think it's important that the length of the underline matches the length of the title when the docs are compiled in Sphinx. You just need to add a few more dashes.
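One way to avoid the mismatch is to derive the underline from the title instead of typing it by hand. This is only an illustrative sketch; the variable names are made up here and are not part of the PR.

```python
# Build a docstring header whose underline has exactly one dash per
# title character, which is what the reviewer is asking for.
title = "Correlation matrix"
underline = "-" * len(title)
header = f"{title}\n{underline}"
print(header)
```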
Thanks @eitanlees I'll address your comments soon!
Sorry - this fell off my radar. Looking at it, it seems like a fairly immense amount of code to create a relatively straightforward chart, so I'm hesitant to add this example as-is to the main example gallery.
Maybe simplify it to something like this?

import altair as alt
from vega_datasets import data

df_iris = data.iris()

# Keep only numeric columns: corr() in pandas 2.x raises on the string "species" column.
corrMatrix = df_iris.select_dtypes('number').corr().reset_index().melt('index')
corrMatrix.columns = ['var1', 'var2', 'correlation']

# Shared base: keep one triangle of the matrix (strictly above the diagonal).
base = alt.Chart(corrMatrix).transform_filter(
    alt.datum.var1 < alt.datum.var2
).encode(
    x='var1',
    y='var2',
).properties(
    width=alt.Step(100),
    height=alt.Step(100)
)

rects = base.mark_rect().encode(
    color='correlation'
)

# White labels on dark cells, black labels on light cells.
text = base.mark_text(
    size=30
).encode(
    text=alt.Text('correlation', format=".2f"),
    color=alt.condition(
        "datum.correlation > 0.5",
        alt.value('white'),
        alt.value('black')
    )
)

rects + text
Or, if you want both versions of the chart together:

import altair as alt
from vega_datasets import data

df_iris = data.iris()

# Keep only numeric columns: corr() in pandas 2.x raises on the string "species" column.
corrMatrix = df_iris.select_dtypes('number').corr().reset_index().melt('index')
corrMatrix.columns = ['var1', 'var2', 'correlation']

chart = alt.Chart(corrMatrix).mark_rect().encode(
    x=alt.X('var1', title=None),
    y=alt.Y('var2', title=None),
    color=alt.Color('correlation', legend=None),
).properties(
    width=alt.Step(80),
    height=alt.Step(80)
)

# Layer the correlation values on top of the rectangles.
chart += chart.mark_text(size=25).encode(
    text=alt.Text('correlation', format=".2f"),
    color=alt.condition(
        "datum.correlation > 0.5",
        alt.value('white'),
        alt.value('black')
    )
)

# Full matrix on the left, one triangle on the right.
chart | chart.transform_filter("datum.var1 < datum.var2")
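To see what the reshaping and the var1 < var2 filter in the snippets above do to the data, the same steps can be reproduced in pandas alone. This is a sketch on a small made-up table (columns a, b, c), not on the iris data, and the boolean mask is only an approximation of what transform_filter does inside Vega-Lite.

```python
import pandas as pd

# Made-up numeric table standing in for the iris measurements.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 1.0, 4.0, 3.0],
    "c": [4.0, 3.0, 2.0, 1.0],
})

# Wide correlation matrix -> long form with one row per (var1, var2) pair.
corr_long = df.corr().reset_index().melt("index")
corr_long.columns = ["var1", "var2", "correlation"]

# Same predicate as the chart filter: a string comparison on variable names.
# It drops the diagonal and one of the two mirrored halves.
upper = corr_long[corr_long["var1"] < corr_long["var2"]]
print(upper)
```

Because the comparison is alphabetical, which triangle survives depends on the variable names, which is what motivates the custom-sort question later in the thread.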
Thanks, that is indeed much cleaner! I'm happy with the above and can submit a commit once the term is over...
@jakevdp @firasm Assuming that wanting to sort the labels of a heatmap in non-alphabetical order is not rare (I spent a lot of time on this personally), would it make sense to modify this example to allow for a custom sort? For example, if I want to have the rows and columns sorted in this order:

import altair as alt
from vega_datasets import data

# create corr map
source = data.iris()
# Keep only numeric columns: corr() in pandas 2.x raises on the string "species" column.
source_corr = source.select_dtypes('number').corr().reset_index().melt(id_vars='index')

# create dummy ordinal var
sort = {'petalWidth': 0, 'petalLength': 1, 'sepalWidth': 2, 'sepalLength': 3}

heatmap = alt.Chart(source_corr)\
    .mark_rect()\
    .transform_calculate(
        order_rows='%s [datum.index]' % sort,
        order_cols='%s [datum.variable]' % sort
    )\
    .transform_filter(alt.datum.order_rows <= alt.datum.order_cols)\
    .encode(
        alt.X('index:N', title=None, sort=list(sort.keys())),
        alt.Y('variable:N', title=None, sort=list(sort.keys())),
        alt.Color('value:Q', legend=None)
    )\
    .properties(width=300, height=300)

text = heatmap\
    .mark_text(size=25)\
    .encode(
        alt.Text('value:Q', format='.2f'),
        color=alt.condition(
            'datum.value > 0.5',
            alt.value('white'),
            alt.value('black')
        )
    )

heatmap + text

Adapted from this StackOverflow question.
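The trick in the snippet above is that formatting the Python dict into a string yields an object literal that the Vega expression language can index by name. The sketch below shows the generated expression and reproduces the rank-based filter in pandas as a rough equivalent; the names order, rank, pairs and keep are illustrative and do not come from the thread.

```python
import pandas as pd

sort = {'petalWidth': 0, 'petalLength': 1, 'sepalWidth': 2, 'sepalLength': 3}

# The string handed to transform_calculate: a dict literal indexed by the
# row's variable name, evaluated per datum by Vega.
expr = '%s [datum.index]' % sort
print(expr)

# Rough pandas equivalent of the calculate + filter steps.
order = list(sort.keys())
rank = dict(sort)
pairs = pd.DataFrame(
    [(r, c) for r in order for c in order], columns=["index", "variable"]
)
# Keep the pairs whose row rank does not exceed the column rank
# (one triangle plus the diagonal, in the custom order).
keep = pairs[pairs["index"].map(rank) <= pairs["variable"].map(rank)]
```

Using <= rather than < keeps the diagonal, so with four variables ten of the sixteen cells remain.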
I started working on a package to facilitate creating these plots that might be too complex for the gallery, and that you would want to have easily accessible when doing EDA etc. I included correlation plots, even if they look somewhat different from what is suggested here: You can see some more examples here. I haven't created a release on PyPI yet and I still need to fix some things, but I am happily accepting suggestions for what to include. Also @jakevdp, let me know if you want me to name it something else, in case
Is it possible to change the colors of jakevdp's chart from blue to red?
@pedromorais007 see this answer: #2779
Thanks mattijn for your suggestion.
With a normal heatmap this works:

import altair as alt
import numpy as np
import pandas as pd

# Compute x^2 + y^2 across a 2D grid
x, y = np.meshgrid(range(-5, 5), range(-5, 5))
z = x**2 + y**2

# Convert this grid to columnar data expected by Altair
source = pd.DataFrame({"x": x.ravel(), "y": y.ravel(), "z": z.ravel()})

c = alt.Chart(source, height=alt.Step(12), width=alt.Step(12)).mark_rect().encode(
    x="x:O",
    y="y:O",
    color=alt.Color("z:Q", scale=alt.Scale(scheme='reds'))
)

c + c.mark_text(size=7).encode(text=alt.Text("z"), color=alt.value("white"))

I suspect something is overriding the color scheme in Streamlit, which you seem to be using (
@pedromorais007 Does it work the way you want if you remove |
Here's a PR of a correlation map that my students (@vcuspinera and @AndresPitta) created.
the output of this example is
Not sure if this is something worth adding to the examples; admittedly, it is similar to the Layered heat map with text example.
I think it would be worth adding if I could show only half of the correlation matrix, like this example from here.