Removing hard dependencies on DataFrame libraries is worthwhile, but it requires special handling for all DataFrame-specific actions. To illustrate, consider the Great Tables output below, which is produced from a Pandas DataFrame:
import pandas as pd
import polars as pl
from great_tables import GT
GT(df_pandas)
Getting column names
The code below shows the different methods required to get column names as a list from Pandas and Polars.
Notice that the two lines of code aren’t too different: Pandas just requires an extra .tolist() call. We could create a special function that returns a list of names, depending on the type of the input DataFrame.
def get_column_names(data) -> list[str]:
    # pandas specific ----
    if isinstance(data, pd.DataFrame):
        return data.columns.tolist()
    # polars specific ----
    elif isinstance(data, pl.DataFrame):
        return data.columns
Inverting dependency with databackend
Inverting the dependency on DataFrame libraries means that we check whether an object is a specific type of DataFrame without importing the library itself. This is done through the package databackend, which we copied into Great Tables.
It works by creating placeholder classes, which stand in for the DataFrames they’re detecting:
from great_tables._databackend import AbstractBackend
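To make the placeholder idea concrete, here is a minimal, standard-library-only sketch of the pattern (the subclass-hook logic is a simplified stand-in for databackend’s actual implementation; PdDataFrame and PlDataFrame mirror the names used in Great Tables):

```python
import abc
import sys

class AbstractBackend(abc.ABC):
    """Simplified stand-in for databackend's placeholder base class."""

    # (module_name, class_name) pairs this placeholder detects
    _backends: list = []

    @classmethod
    def __subclasshook__(cls, subclass):
        # Look the module up in sys.modules instead of importing it, so the
        # check never creates a hard dependency on the DataFrame library.
        for mod_name, cls_name in cls._backends:
            mod = sys.modules.get(mod_name)
            target = getattr(mod, cls_name, None) if mod else None
            if target is not None and issubclass(subclass, target):
                return True
        return NotImplemented

class PdDataFrame(AbstractBackend):
    _backends = [("pandas", "DataFrame")]

class PlDataFrame(AbstractBackend):
    _backends = [("polars", "DataFrame")]
```

An isinstance(obj, PdDataFrame) check succeeds only when the user has already imported pandas themselves; if pandas was never imported, the check simply returns False.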
Separating concerns with singledispatch
While databackend removes dependencies, the use of singledispatch from the built-in functools module separates the logic for handling Polars DataFrames from the logic for Pandas DataFrames. This makes it easier to reason about one DataFrame library at a time, and it also gets us better type hinting.
Here’s a basic example, showing the get_column_names() function re-written using singledispatch:
from functools import singledispatch
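Here is a self-contained sketch of the dispatch mechanics, using toy stand-in classes rather than real DataFrames (PandasFrame and PolarsFrame are hypothetical; in Great Tables the registrations target the databackend placeholder classes instead):

```python
from functools import singledispatch

class PandasIndex:
    """Mimics pandas' Index: column names need .tolist()."""
    def __init__(self, names):
        self._names = list(names)
    def tolist(self):
        return list(self._names)

class PandasFrame:
    """Toy stand-in for pd.DataFrame (hypothetical, for illustration)."""
    def __init__(self, names):
        self.columns = PandasIndex(names)

class PolarsFrame:
    """Toy stand-in for pl.DataFrame: .columns is already a plain list."""
    def __init__(self, names):
        self.columns = list(names)

@singledispatch
def get_column_names(data) -> list[str]:
    # Fallback for unregistered types
    raise NotImplementedError(f"Unsupported type: {type(data).__name__}")

@get_column_names.register
def _(data: PandasFrame) -> list[str]:
    return data.columns.tolist()

@get_column_names.register
def _(data: PolarsFrame) -> list[str]:
    return data.columns
```

Each register block only has to reason about one library, and the annotations give type checkers a concrete class to work with.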
The use of PdDataFrame is what signifies “run this for Pandas DataFrames”.
With the get_column_names implementations defined, we can call it like a normal function:
get_column_names(df_pandas)  # pandas version
get_column_names(df_polars)  # polars version
Let’s look at an example of a simple table with actual data to tie this theory to practice.
Table Footer: a place for additional information pertaining to the table content
Here’s a table that takes advantage of the different components available in Great Tables. It contains the names and addresses of people.
from great_tables import GT, md, system_fonts
a compact integer value (fmt_integer()): 134K
The problem grows worse when values need to be conveyed as images or plots. If you’re a medical analyst, for example, you might need to effectively convey whether test results for a patient are improving or worsening over time. Reading such data as a sequence of numbers across a row can slow interpretation. But by using nanoplots, available as the fmt_nanoplot() formatting method, readers can spot trends right away. Here’s an example that provides test results over a series of days.
We enjoy working on Great Tables because we want everybody to easily make beautiful tables. Tables don’t have to be boring; they can be captivating and insightful. With every release we get closer and closer to realizing our mission and, as such, we’re happy to announce the v0.2.0 release that’s now on PyPI.
The really big feature that’s available with this release is the data_color() method. It gives you several options for colorizing data cells based on the underlying data. The method automatically scales color values according to the data in order to emphasize differences or reveal trends. The example below emphasizes large currency values with a "darkgreen" fill color.
from great_tables import GT, exibble

(
    GT(exibble)
    .data_color(columns="currency", palette=["lightblue", "darkgreen"])
)
Great Tables v0.2.0: Easy Data Coloring
Note that we use columns= to specify which columns get the colorizing treatment (just currency here) and the palette= is given as a list of color values. From this we can see that the 65100.0 value polarizes the data coloring process; it is "darkgreen" while all other values are "lightblue" (with no interpolated colors in between). Also, isn’t it nice that the text adapts to the background color?
The above example is suitable for emphasizing large values, but maybe you consider the extreme value to be out of bounds? For that, we can use the domain= and na_color= arguments to gray out the extreme values. We’ll also nicely format the currency column in this next example.
Now the very large value is in "lightgray", making all other values easier to compare. We did this by setting domain=[0, 50] and specifying na_color="lightgray", which caused the out-of-bounds value of 65100 to have a light gray background. Notice that the values are also formatted as currencies, thanks to fmt_currency(), which never interferes with styling.
Here’s a more inspirational example that uses a heavily-manipulated version of the countrypops dataset (thanks again, Polars!) along with a color treatment that’s mediated by data_color(). Here, the population values can be easily compared by the amount of "purple" within them.
from great_tables.data import countrypops
import polars as pl
import polars.selectors as cs
Before v0.3.0, you could not alter the widths of individual columns, which meant that your content largely decided each column’s width. Even though browsers do an adequate job of sizing table columns, the result isn’t always pleasing to look at. What if you want more space? Maybe you want consistently sized columns? There are many reasons to want a choice in the matter, and the new cols_width() method now makes this possible.
Here’s an example where the widths of all columns are set with our preferred length values (in px).
Setting options across the entire table with tab_options()
The new tab_options() method gives you the freedom to specify any of dozens of global style and layout options for the table. Want a font that’s used across all cells? Use the table_font_names= option. Do you need to make the text smaller, but only in the stub? Use stub_font_size= for that. The number of options is perhaps overwhelming at first but we think you’ll enjoy having them around nonetheless. It makes styling the table (and developing your own table themes) a relatively simple task.
Here’s an example that creates a table with a few common components and then uses tab_options() to set up a collection of fonts for the table with the (also new) system_fonts() function:
The system_fonts() helper function in Great Tables makes this easy by providing you with themed, local font stacks that are meant to work across different computing platforms.
Here’s another example where we set the width of the table to span across the entire page (or containing element).
Using the new opt_*() methods to do more complex tasks with table options
While tab_options() is a great method for setting global table options, sometimes you want to set a number of them at once for a combined effect. For that type of operation, we have the opt_*() series of methods. A common task is aligning the content in the table header; we can make that easy with opt_align_table_header():
gt_tbl.opt_align_table_header(align="left")
tab_options() to find the two args you need to get the job done.
The opt_all_caps() method transforms the text within the column labels, the stub, and in all row groups so that we get an all-capitalized (yet somewhat sized-down) look that better differentiates the labels from the data. It’s rather easy to use; just do this:
gt_tbl.opt_all_caps()
tab_options() all at once, making life generally easier.
Here’s one last example, this time using opt_vertical_padding(). You’d use that if you’re dissatisfied with the level of top/bottom padding within cells of all locations (e.g., in the table body, in the column labels, etc.). You can either make a table taller or more ‘compressed’ with a single argument: scale=. Here’s an example where the amount of vertical padding is reduced, resulting in a table taking up less vertical space.
gt_tbl.opt_vertical_padding(scale=0.5)
A new formatting method: fmt_image()
Wouldn’t it be great to add graphics to your table? The fmt_image() method provides an easy way to add image files on disk into table body cells. The cells need to contain some reference to an image file. The path= and file_pattern= arguments give you some flexibility in defining exactly where the image files live.
Here’s an example using the metro dataset that’s included within Great Tables.
The recent v0.4.0 release of Great Tables contains nanoplots as a major new feature. So, in this post I’ll concentrate on showing you all the things you can do with nanoplots. What are nanoplots? Well, with nanoplots you can do this:
from great_tables import GT, md
Great Tables v0.4.0: Nanoplots and More
Nanoplots, small interactive plots in your table
Nanoplots are small yet information-laden plots that fit nicely into table cells. They are interactive, allowing for more information to be shown on hovering (or through touch when that interaction is available). Nanoplots try to show individual data points with reasonably good visibility (space is limited, this is going in a table after all!) and the plot representations change depending on the data fed into them.
We can generate nanoplots via the fmt_nanoplot() method. Let’s make two nanoplots of the two different available plot types: "line" and "bar":
It’s possible to add in a reference line and a reference area to individual plots. These may be useful to highlight a particular statistic (e.g., median or minimum value) or a bounded region of interest (e.g., the area between the first and third quartiles). Here is an example of how to use these options via the reference_line= and reference_area= arguments:
We can also have single-value bar plots and line plots. These will run in the horizontal direction and such plots are meant for easy value comparisons (which works great in tables). To make this work, give fmt_nanoplot() a column of numeric values. The following example shows how fmt_nanoplot() can be used to create single-value bar and line plots.
We provide a lot of options for customizing your nanoplots. With the nanoplot_options() helper function, it’s possible to change the look and feel for a set of nanoplots. The options= argument of fmt_nanoplot() is where you’d need to invoke that helper function. Some possibilities for customization include determining which nanoplot elements are present, changing the sizes and colors of different elements, and a whole lot more! Here’s an example where both line- and bar-based nanoplots retain their basic compositional elements, but their appearance is quite different.
from great_tables import nanoplot_options
Let’s get right to making a display table with Great Tables. The package has quite a few datasets and so we’ll start by making use of the very small, but useful, exibble dataset. After importing the GT class and that dataset, we’ll introduce that Pandas table to GT().
from great_tables import GT, exibble

# Create a display table with the `exibble` dataset
gt_tbl = GT(exibble)

# Now, show the gt table
gt_tbl
A Basic Table
More Complex Tables
Let’s take things a bit further and create a table with the included gtcars dataset. Great Tables provides a large selection of methods that let you refine the table display. They were designed so that you can easily create a really presentable and beautiful table visualization.
For this next table, we’ll incorporate a Stub component and this provides a place for the row labels. Groupings of rows will be generated through categorical values in a particular column (we just have to cite the column name for that to work). We’ll add a table title and subtitle with tab_header(). The numerical values will be formatted with the fmt_integer() and fmt_currency() methods. Column labels will be enhanced via cols_label() and a source note will be included through use of the tab_source_note() method. Here is the table code, followed by the table itself.
    .tab_source_note(source_note="Source: the gtcars dataset within the Great Tables package.")
)
With the six different methods applied, the table looks highly presentable! The rendering you’re seeing here has been done through Quarto (this entire site has been generated with quartodoc). If you haven’t yet tried out Quarto, we highly recommend it!
For this next example we’ll use the airquality dataset (also included in the package; it’s inside the data submodule). With this table, two spanners will be added with the tab_spanner() method. This method is meant to be easy to use: you only need to provide the text for the spanner label and the columns associated with the spanner. We also make it easy to move columns around: you can use cols_move_to_start() (example of that below), and there are also the cols_move_to_end() and cols_move() methods.
from great_tables.data import airquality

airquality_mini = airquality.head(10).assign(Year=1973)
    .cols_move_to_start(columns=["Year", "Month", "Day"])
)
Formatting Table Cells
fmt(): set a column format with a formatting function
We strive to make formatting a simple task, but we also want to give the user a lot of power through advanced options, and we ensure that varied combinations of options work well. For example, most of the formatting methods have a locale= argument: we want as many users as possible to be able to format numbers, dates, and times in ways that are familiar to them and adapted to their own regional specifications. Now let’s take a look at an example of this with a smaller version of the exibble dataset:
    .fmt_time(columns="time", time_style="h_m_s_p")
)
Using Styles within a Table
We can use the tab_style() method in combination with loc.body() and various style.*() functions to set styles on cells of data within the table body. For example, the table-making code below applies a yellow background color to the targeted cells.
Aside from style.fill() we can also use style.text() and style.borders() to focus the styling on cell text and borders. Here’s an example where we perform several types of styling on targeted cells (the key is to put the style.*() calls in a list).
from great_tables import GT, style, exibble
Column Selection with Polars (and How It Helps with Styling)
Styles can also be specified using Polars expressions. For example, the code below uses the Temp column to set color to "lightyellow" or "lightblue".
import polars as pl
from great_tables import GT, from_column, style, loc
import polars.selectors as cs
Great Tables: The Polars DataFrame Styler of Your Dreams
However, there are fewer options for styling tables for presentation. You could convert from polars to pandas, and use the built-in pandas DataFrame styler, but this has one major limitation: you can’t use polars expressions.
As it turns out, polars expressions make styling tables very straightforward. The same polars code that you would use to select or filter combines with Great Tables to highlight, circle, or bold text.
In this post, I’ll show how Great Tables uses polars expressions to make delightful tables, like the one below.
import polars as pl
Creating GT object
First, we’ll import the necessary libraries, and do a tiny bit of data processing.
import polars as pl
import polars.selectors as cs
The default polars output above is really helpful for data analysis! By passing it to the GT constructor, we can start getting it ready for presentation.
gt_air = GT(pl_airquality)
gt_air
Set title and subtitle
The simplest method in gt is GT.tab_header(), which lets you add a title and subtitle.
(
    gt_air
    .tab_header(
        title="New York Air Quality",  # illustrative title text
        subtitle="Daily measurements in New York City (May 1973)",
    )
)
Set body styles
The .tab_style() method sets styles—like fill color, or text properties—on different parts of the table. Let’s use it twice with a polars expression: first to highlight the row with the maximum Wind value, and then to bold that value.
Notice that there are now labels for “Time” and “Measurement” sitting above the column names. This is useful for emphasizing columns that share something in common.
Use GT.cols_label() with html() to create human-friendly labels (e.g., converting names like cal_m_2 to cal/m2).
from great_tables import html
Set column spanners
Putting it all together
Finally, we’ll combine everything from the sections above into a single block of code, and use a few more rows of data.
Let’s say you choose the digits above, and write this as 4/7—meaning a final digit of 4 for home and 7 for away. You would mark yourself on this square:
Why analyze squares?
What squares are most likely to win?
We looked back at games for the KC Chiefs (away), and games for the San Francisco 49ers (home), and calculated the proportion of the time each team ended with a specific digit. Putting this together for the two teams, here is the chance of winning on a given square: