Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support mask= argument in LocBody #566

Merged
merged 13 commits into from
Jan 22, 2025
Merged

Conversation

jrycw
Copy link
Collaborator

@jrycw jrycw commented Dec 16, 2024

Related Issue: #389.

Hello team,

This is a rough idea for supporting the passing of Polars expressions to the mask= argument of LobBody.

The procedures are outlined as follows:

  1. Exclude the stub and group columns if necessary.
  2. Use df.select() to process the Polars expression and obtain a boolean-like matrix.
  3. Use two for-loops to iterate through the cells and select the desired cells based on whether their values are True.

This idea is preliminary but should provide a foundation for further discussions.

import polars as pl
from great_tables import GT, loc, style
from great_tables.data import gtcars
from polars import selectors as cs

df_mini = pl.from_pandas(gtcars).head().pivot(on="year", index=["trim", "mfr"], values="hp")
(
    GT(df_mini,rowname_col="trim", groupname_col="mfr")
    .tab_style(
        style=style.text(color="red"),
        locations=loc.body(mask=cs.numeric().gt(600))
    )
)

image

print(df_mini.select(cs.numeric().gt(600)))
shape: (3, 4)
┌────────┬────────┬────────┬────────┐
│ 2017.0 ┆ 2015.0 ┆ 2014.0 ┆ 2016.0 │
│ ---    ┆ ---    ┆ ---    ┆ ---    │
│ bool   ┆ bool   ┆ bool   ┆ bool   │
╞════════╪════════╪════════╪════════╡
│ true   ┆ null   ┆ null   ┆ null   │
│ null   ┆ false  ┆ false  ┆ true   │
│ null   ┆ false  ┆ null   ┆ null   │
└────────┴────────┴────────┴────────┘

Copy link

codecov bot commented Dec 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.73%. Comparing base (4043f14) to head (be7c141).
Report is 15 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #566      +/-   ##
==========================================
+ Coverage   90.67%   90.73%   +0.06%     
==========================================
  Files          46       46              
  Lines        5381     5416      +35     
==========================================
+ Hits         4879     4914      +35     
  Misses        502      502              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jrycw jrycw marked this pull request as draft December 16, 2024 08:36
@jrycw jrycw changed the title Support mask= argument in GT.tab_style() Support mask= argument in LocBody() Dec 16, 2024
@jrycw jrycw changed the title Support mask= argument in LocBody() Support mask= argument in LocBody Dec 16, 2024
cols_excl = [*(stub_var if excl_stub else []), *(group_var if excl_group else [])]

frame: PlDataFrame = data._tbl_data
df = frame.select(expr)
Copy link
Collaborator

@machow machow Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the rules for the mask were as follows?:

  • the mask expressions should--when run by frame.select(mask_expr)--return a DataFrame that:
    • columns: has the same or fewer columns (names must match original frame, but could be in a different order).
    • rows: has the same number of rows
  • after running the expression, we validate this right away
  • the mask is assumed to be in the original row order
  • in this case we can capture the column name and row number of each cell (e.g. with enumerate(mask_result.iter_rows())).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for summarizing the rules.

It seems that for the last part, we only need a single loop to gather all the information required by CellPos.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this version?

For prototyping, use assert for validation, which can later be replaced with raise ValueError().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've replaced assert with raise ValueError() and added more test cases.

@machow
Copy link
Collaborator

machow commented Jan 14, 2025

@jrycw this is looking really good! Anything else needed for this PR? I'm happy to help with any pieces!

@jrycw
Copy link
Collaborator Author

jrycw commented Jan 14, 2025

@machow One thing I’m unsure about is the naming of the function. Specifically, I’m not clear on what the i or c represents in resolve_rows_i and resolve_cols_c. Does resolve_mask_i make sense? Other than that, I think the feature is ready for users to experiment with. The idea of mask= is great, and I believe it will offer a lot of flexibility to users.

@rich-iannone
Copy link
Member

@machow One thing I’m unsure about is the naming of the function. Specifically, I’m not clear on what the i or c represents in resolve_rows_i and resolve_cols_c. Does resolve_mask_i make sense? Other than that, I think the feature is ready for users to experiment with. The idea of mask= is great, and I believe it will offer a lot of flexibility to users.

I can help!

i == index (int)
c == column (str)

this is admittedly a bit of a holdover from the R package (which also has the resolve_cols_i() and resolve_rows_l() functions). But based on this, maybe just resolve_mask is fine?

@jrycw
Copy link
Collaborator Author

jrycw commented Jan 15, 2025

@rich-iannone Thanks for the explanation! I agree that resolve_mask seems like a better fit here.

@machow
Copy link
Collaborator

machow commented Jan 15, 2025

From a quick glance it's looking really nice -- if it's ready for review I'm happy to take a closer look!

@jrycw
Copy link
Collaborator Author

jrycw commented Jan 16, 2025

The PR should be ready for review; please proceed.

@jrycw jrycw marked this pull request as ready for review January 16, 2025 01:18
@rich-iannone rich-iannone requested a review from machow January 16, 2025 06:14
Copy link
Collaborator

@machow machow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really great--thanks for taking the time to implement this! I left several small suggestions for some error wording, and tweaks. Always happy to pick things up from here if useful. Thanks again for all the work you put into this!

great_tables/_locations.py Outdated Show resolved Hide resolved
great_tables/_locations.py Outdated Show resolved Hide resolved
great_tables/_locations.py Outdated Show resolved Hide resolved
great_tables/_locations.py Outdated Show resolved Hide resolved
great_tables/_locations.py Show resolved Hide resolved
@jrycw
Copy link
Collaborator Author

jrycw commented Jan 22, 2025

@machow, I truly appreciate your review—all your suggestions are spot on! Also, it’s great to see all green checks again.

Copy link
Collaborator

@machow machow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really great, thanks! @rich-iannone do you want to merge?

@rich-iannone rich-iannone merged commit 15b293e into posit-dev:main Jan 22, 2025
14 checks passed
@jrycw jrycw deleted the style-mask branch January 23, 2025 01:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants