Test suite is failing with Vegafusion 2 #3701
Comments
Linking #3631 (comment). If I inspect https://github.com/vega/altair/actions/runs/11995228691/job/33438600170 it seems vegafusion is not being installed as a dependency. |
Reproducible:
import altair as alt
from vega_datasets import data
source = data.disasters.url
chart = alt.Chart(source).transform_filter(
    alt.datum.Entity != 'All natural disasters'
).mark_circle(
    opacity=0.8,
    stroke='black',
    strokeWidth=1,
    strokeOpacity=0.4
).encode(
    x=alt.X('Year:T', title=None, scale=alt.Scale(domain=['1899','2018'])),
    y=alt.Y(
        'Entity:N',
        sort=alt.EncodingSortField(field="Deaths", op="sum", order='descending'),
        title=None
    ),
    size=alt.Size('Deaths:Q',
        scale=alt.Scale(range=[0, 2500]),
        legend=alt.Legend(title='Deaths', clipHeight=30, format='s')
    ),
    color=alt.Color('Entity:N', legend=None),
    tooltip=[
        "Entity:N",
        alt.Tooltip("Year:T", format='%Y'),
        alt.Tooltip("Deaths:Q", format='~s')
    ],
).properties(
    width=450,
    height=320,
    title=alt.Title(
        text="Global Deaths from Natural Disasters (1900-2017)",
        subtitle="The size of the bubble represents the total death count per year, by type of disaster",
        anchor='start'
    )
).configure_axisY(
    domain=False,
    ticks=False,
    offset=10
).configure_axisX(
    grid=False,
).configure_view(
    stroke=None
)
chart.transformed_data()
VegaFusionRuntime.pre_transform_datasets(self, spec, datasets, local_tz, default_input_tz, row_limit, inline_datasets, trim_unused_columns, dataset_format)
527 # Serialize inline datasets
528 inline_arrow_dataset = self._import_inline_datasets(
529 inline_datasets,
530 inline_dataset_usage=get_inline_column_usage(spec)
531 if trim_unused_columns
532 else None,
533 )
--> 535 values, warnings = self.runtime.pre_transform_datasets(
536 spec,
537 pre_tx_vars,
538 local_tz=local_tz,
539 default_input_tz=default_input_tz,
540 row_limit=row_limit,
541 inline_datasets=inline_arrow_dataset,
542 )
544 def normalize_timezones(
545 dfs: list[nw.DataFrame[IntoFrameT] | nw.LazyFrame[IntoFrameT]],
546 ) -> list[DataFrameLike]:
547 # Convert to `local_tz` (or, set to UTC and then convert if starting
548 # from time-zone-naive data), then extract the native DataFrame to return.
549 processed_datasets = []
ValueError: DataFusion error: Execution error: Error parsing timestamp from '1900' using format '%B %d, %Y %H:%M': input contains invalid characters |
@mattijn does this work if you change to "Year:Q", or provide a format string somehow? I'm not sure what mini-language
Edit: Yeah it is https://github.com/apache/datafusion/blob/d9abdadda066808345b5d9f7ba234a51b8bb2d9c/Cargo.toml#L96 |
on my phone, but this should work if you provide a d3/vega time format string. I believe you can provide this in the chart constructor. The reason it's happening is that I switched to using datafusion's native time parsing logic for simplicity and performance, and I wasn't able to get chrono to automatically parse just the year as a date. I'll have time later today to look at this if still needed. |
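To illustrate the mismatch described above, here is a minimal sketch using Python's `datetime.strptime` as a stand-in. This is an assumption for illustration only: VegaFusion parses through DataFusion, not Python, so the exact behavior may differ. The long format string is copied from the error message, and `try_parse` is a hypothetical helper.

```python
from datetime import datetime
from typing import Optional

# Format string copied from the DataFusion error message in this issue.
FULL_FORMAT = "%B %d, %Y %H:%M"
# Year-only format suggested as the explicit alternative.
YEAR_FORMAT = "%Y"


def try_parse(value: str, fmt: str) -> Optional[datetime]:
    """Return the parsed datetime, or None when value does not match fmt."""
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        return None


# A bare year does not match the full month/day/time format...
print(try_parse("1900", FULL_FORMAT))
# ...but it does match an explicit year-only format.
print(try_parse("1900", YEAR_FORMAT))
```

The point is only that a bare year string needs a format that describes it; a format with month, day, and time components cannot match it.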
If there is any possibility to solve this on the
But I think since these examples are coming from a URL, this route might not work out.
Side note: @MarcoGorelli I feel like I remember reading some |
For the record, it is not because of the Altair v5.5.0 release, but because of the VegaFusion v2.0.0 release, which came out almost simultaneously. The issue can be reproduced with the following minimal specification:
import altair as alt
source = alt.Data(values=[{'Year': '1900'}])
chart = alt.Chart(source).mark_tick().encode(x='Year:T')
chart.transformed_data()
And when switching to |
Working on a PR now #3702 |
Yes, this is a VegaFusion 2 thing, not an Altair 5.5 thing. As I mentioned above:
This can be addressed by specifying the date format for the column as %Y.
source = alt.UrlData(
    data.disasters.url,
    format=alt.DataFormat(parse={"Year": "date:%Y"})
)
I'm torn on whether to make all date parsing a little less efficient by always checking for this as a special case (with a
Separately, as we look at reworking the vega_datasets logic, I wonder if something like |
@jonmmease IMO I don't think these should be interpreted as dates without a format specifier. A 4 digit number (string?) is so ambiguous |
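A small sketch of the ambiguity, again using Python's standard library purely as an illustration (not what VegaFusion or Vega does internally). The same four characters can be read as a year, as a time of day, or as a plain integer, depending on what the reader assumes:

```python
from datetime import datetime

value = "1900"

# Read as a calendar year: midnight on January 1st of that year.
as_year = datetime.strptime(value, "%Y")

# Read as hours and minutes: 19:00 on strptime's default date.
as_time = datetime.strptime(value, "%H%M")

# Read as a plain quantity with no temporal meaning at all.
as_number = int(value)

print(as_year.isoformat())
print(as_time.time().isoformat())
print(as_number)
```

All three interpretations succeed, which is why an explicit format specifier removes the guesswork.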
Somewhat of a drive-by, but I fixed the `ruff` directives so they apply to only the parameterize blocks. All other changes are to resolve rule violations that were hidden. #3701
I think it would be great if the default in altair was that four digit integers would be understood as dates/years when using |
Edit to the above. There was an error in VegaFusion 2.0.0 where it wasn't handling the custom date parse format. In 2.0.1, this URL format works:
source = alt.UrlData(
    data.disasters.url,
    format=alt.DataFormat(parse={"Year": "date:%Y"})
)
Rather than xfail the tests, how would folks feel about updating the examples to provide an explicit date format like this? |
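For reference, the `alt.DataFormat(parse=...)` argument ends up as a `format.parse` entry on the data block of the generated spec. The sketch below builds that entry with plain dictionaries so it runs without Altair installed. `with_date_parse` is a hypothetical helper, not part of Altair or VegaFusion, and the URL is a placeholder rather than the real dataset location.

```python
import json


def with_date_parse(data: dict, field: str, fmt: str) -> dict:
    """Return a copy of a data block with an explicit date format for field.

    Hypothetical helper for illustration; it copies the nested
    "format" and "parse" dictionaries so the input is left untouched.
    """
    updated = dict(data)
    format_block = dict(updated.get("format", {}))
    parse_block = dict(format_block.get("parse", {}))
    parse_block[field] = f"date:{fmt}"
    format_block["parse"] = parse_block
    updated["format"] = format_block
    return updated


# Placeholder URL (assumption), standing in for data.disasters.url.
original = {"url": "https://example.com/disasters.csv"}
data_block = with_date_parse(original, "Year", "%Y")

print(json.dumps(data_block, indent=2))
```

Applied to the failing examples, the equivalent change is the one-line `format=` argument shown above.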
@jonmmease I like the end result here, the main tweak I'd add is to use
Side note: This does make me wonder if there would be any value in being able to specify this in a less verbose way. |
What happened?
Running the test suite with the latest versions of all dependencies results in:
Full logs:
........................................................................ [ 4%]
........................................................................ [ 8%]
........................................................................ [ 12%]
........................................................................ [ 16%]
........................................................................ [ 20%]
........................................................................ [ 24%]
........................................................................ [ 28%]
........................................................................ [ 33%]
........................................................................ [ 37%]
........................................................................ [ 41%]
........................................................................ [ 45%]
........................................................................ [ 49%]
........................................................................ [ 53%]
........................................................................ [ 57%]
..........................................................F............. [ 62%]
.........................................F.............................. [ 66%]
..s.ssss...F............................................................ [ 70%]
.....F................X................................................. [ 74%]
........................................................................ [ 78%]
.X....X................................................................. [ 82%]
........................................................................ [ 86%]
........................................................................ [ 91%]
..............X......................................................... [ 95%]
.......................X.........x...................................... [ 99%]
.......... [100%]
=================================== FAILURES ===================================
_____ test_primitive_chart_examples[False-natural_disasters.py-686-cols28] _____
[gw2] linux -- Python 3.12.7 /opt/hostedtoolcache/Python/3.12.7/x64/bin/python
filename = 'natural_disasters.py', rows = 686, cols = ['Deaths', 'Year']
to_reconstruct = False
tests/test_transformed_data.py:82:
altair/vegalite/v5/api.py:4058: in transformed_data
return transformed_data(self, row_limit=row_limit, exclude=exclude)
altair/utils/_transformed_data.py:138: in transformed_data
datasets, _ = vf.runtime.pre_transform_datasets(
self = VegaFusionRuntime(cache_capacity=64, worker_threads=4)
spec = {'$schema': 'https://vega.github.io/schema/vega/v5.json', 'axes': [{'grid': False, 'labelFlush': True, 'labelOverlap':...: {'grid': False}, 'axisY': {'domain': False, 'offset': 10, 'ticks': False}, 'style': {'cell': {'stroke': None}}}, ...}
datasets = [('data_0', ())], local_tz = 'UTC', default_input_tz = None
row_limit = None, inline_datasets = {}, trim_unused_columns = False
dataset_format = 'auto'
E ValueError: DataFusion error: Execution error: Error parsing timestamp from '1900' using format '%B %d, %Y %H:%M': input contains invalid characters
/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/vegafusion/runtime.py:535: ValueError
_____ test_compound_chart_examples[True-falkensee.py-all_rows3-all_cols3] ______
[gw2] linux -- Python 3.12.7 /opt/hostedtoolcache/Python/3.12.7/x64/bin/python
filename = 'falkensee.py', all_rows = [2, 38, 38]
all_cols = [['event'], ['population'], ['population']], to_reconstruct = True
tests/test_transformed_data.py:142:
altair/vegalite/v5/api.py:4688: in transformed_data
return transformed_data(self, row_limit=row_limit, exclude=exclude)
altair/utils/_transformed_data.py:138: in transformed_data
datasets, _ = vf.runtime.pre_transform_datasets(
self = VegaFusionRuntime(cache_capacity=64, worker_threads=4)
spec = {'$schema': 'https://vega.github.io/schema/vega/v5.json', 'axes': [{'aria': False, 'domain': False, 'grid': True, 'gri...Finite(+datum["year"]))) && isValid(datum["population"]) && isFinite(+datum["population"])', 'type': 'filter'}]}], ...}
datasets = [('data_0', ()), ('data_1', ()), ('data_2', ())], local_tz = 'UTC'
default_input_tz = None, row_limit = None, inline_datasets = {}
trim_unused_columns = False, dataset_format = 'auto'
E ValueError: DataFusion error: Execution error: Error parsing timestamp from '1933' using format '%B %d, %Y %H:%M': input contains invalid characters
/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/vegafusion/runtime.py:535: ValueError
_____ test_compound_chart_examples[False-falkensee.py-all_rows3-all_cols3] _____
[gw3] linux -- Python 3.12.7 /opt/hostedtoolcache/Python/3.12.7/x64/bin/python
filename = 'falkensee.py', all_rows = [2, 38, 38]
all_cols = [['event'], ['population'], ['population']], to_reconstruct = False
tests/test_transformed_data.py:142:
altair/vegalite/v5/api.py:4688: in transformed_data
return transformed_data(self, row_limit=row_limit, exclude=exclude)
altair/utils/_transformed_data.py:138: in transformed_data
datasets, _ = vf.runtime.pre_transform_datasets(
self = VegaFusionRuntime(cache_capacity=64, worker_threads=4)
spec = {'$schema': 'https://vega.github.io/schema/vega/v5.json', 'axes': [{'aria': False, 'domain': False, 'grid': True, 'gri...m["population"])', 'type': 'filter'}], 'url': 'vegafusion+dataset://table_7c2f0057_a249_4dc8_8e0c_bd46ba873edb'}], ...}
datasets = [('source_0', ()), ('source_1', ()), ('source_2', ())]
local_tz = 'UTC', default_input_tz = None, row_limit = None
inline_datasets = {'table_0d3e98ef_595c_466b_8231_3067287bafae': start end event
0 1933 1945 Nazi Rule
1 ...9 40179
33 2010 40511
34 2011 40465
35 2012 40905
36 2013 41258
37 2014 41777}
trim_unused_columns = False, dataset_format = 'auto'
E ValueError: DataFusion error: Execution error: Error parsing timestamp from '1933' using format '%B %d, %Y %H:%M': input contains invalid characters
/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/vegafusion/runtime.py:535: ValueError
_____ test_primitive_chart_examples[True-natural_disasters.py-686-cols28] ______
[gw0] linux -- Python 3.12.7 /opt/hostedtoolcache/Python/3.12.7/x64/bin/python
filename = 'natural_disasters.py', rows = 686, cols = ['Deaths', 'Year']
to_reconstruct = True
tests/test_transformed_data.py:82:
altair/vegalite/v5/api.py:4058: in transformed_data
return transformed_data(self, row_limit=row_limit, exclude=exclude)
altair/utils/_transformed_data.py:138: in transformed_data
datasets, _ = vf.runtime.pre_transform_datasets(
self = VegaFusionRuntime(cache_capacity=64, worker_threads=4)
spec = {'$schema': 'https://vega.github.io/schema/vega/v5.json', 'axes': [{'grid': False, 'labelFlush': True, 'labelOverlap':...: {'grid': False}, 'axisY': {'domain': False, 'offset': 10, 'ticks': False}, 'style': {'cell': {'stroke': None}}}, ...}
datasets = [('data_0', ())], local_tz = 'UTC', default_input_tz = None
row_limit = None, inline_datasets = {}, trim_unused_columns = False
dataset_format = 'auto'
E ValueError: DataFusion error: Execution error: Error parsing timestamp from '1900' using format '%B %d, %Y %H:%M': input contains invalid characters
/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/vegafusion/runtime.py:535: ValueError
=========================== short test summary info ============================
FAILED tests/test_transformed_data.py::test_primitive_chart_examples[False-natural_disasters.py-686-cols28] - ValueError: DataFusion error: Execution error: Error parsing timestamp from '1900' using format '%B %d, %Y %H:%M': input contains invalid characters
FAILED tests/test_transformed_data.py::test_compound_chart_examples[True-falkensee.py-all_rows3-all_cols3] - ValueError: DataFusion error: Execution error: Error parsing timestamp from '1933' using format '%B %d, %Y %H:%M': input contains invalid characters
FAILED tests/test_transformed_data.py::test_compound_chart_examples[False-falkensee.py-all_rows3-all_cols3] - ValueError: DataFusion error: Execution error: Error parsing timestamp from '1933' using format '%B %d, %Y %H:%M': input contains invalid characters
FAILED tests/test_transformed_data.py::test_primitive_chart_examples[True-natural_disasters.py-686-cols28] - ValueError: DataFusion error: Execution error: Error parsing timestamp from '1900' using format '%B %d, %Y %H:%M': input contains invalid characters
What would you like to happen instead?
To have tests green as 🥦
Which version of Altair are you using?
5.6.0dev