
Plotting with plot_eai_exp_geom fails due to exposures vs impact shape mismatch #992

Open
luseverin opened this issue Jan 14, 2025 · 7 comments

@luseverin (Collaborator)

Describe the bug
I am using the lines_polys_handler module to compute impacts with lines as exposures and plotting the resulting EAI using plot_eai_exp_geom. However, plot_eai_exp_geom fails with a ValueError: Length of values (861875) does not match length of index (861881). Apparently, this comes from a mismatch between the shape of the original exposures and the reaggregated impacts, which causes the error in the following value assignment:

File ~/Documents/PhD/workspace/climada_python/climada/util/lines_polys_handler.py:357, in plot_eai_exp_geom(imp_geom, centered, figsize, **kwargs)
    355     kwargs["legend"] = True
    356 gdf_plot = gpd.GeoDataFrame(imp_geom.geom_exp)
--> 357 gdf_plot["impact"] = imp_geom.eai_exp

I have had a look, and apparently the error happens when the index of the exposures' GeoDataFrame has non-unique values, see e.g. rows 5 to 8 in the screenshot below:
[Screenshot 2025-01-14: exposures GeoDataFrame with repeated index values]

The error stems from an ill-defined Exposures object, but I assume we would want to catch that somehow? Any suggestion for a fix?
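
For illustration, the failing pattern can be reproduced in isolation: once the reaggregated eai_exp has fewer entries than the exposures GeoDataFrame, the column assignment in plot_eai_exp_geom raises exactly this ValueError. A minimal sketch (assumed, not CLIMADA code):

import numpy as np
import geopandas as gpd
from shapely.geometry import Point

# Minimal sketch (assumed, not CLIMADA code): assigning a column shorter
# than the frame raises the same ValueError as in plot_eai_exp_geom.
gdf = gpd.GeoDataFrame({"geometry": [Point(0, 0), Point(0, 1), Point(0, 2)]})
gdf["impact"] = np.array([1.0, 2.0])
# ValueError: Length of values (2) does not match length of index (3)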

To Reproduce
Code example:

import geopandas as gpd
from shapely.geometry import Point

data = {
    "name": ["A", "B", "C"],
    "value": [10, 20, 30],
    "geometry": [
        Point(1, 1),
        Point(1, 2),
        Point(1, 3),
    ],
}
gdf_test = gpd.GeoDataFrame(data, crs="EPSG:4326")
gdf_test.index = [0, 1, 0]  # need to specify a non-unique index to cause the error!

# CLIMADA imports
from climada.util.api_client import Client
import climada.util.lines_polys_handler as u_lp
from climada.entity import Exposures
from climada.entity.impact_funcs import ImpactFuncSet
from climada.entity.impact_funcs.storm_europe import ImpfStormEurope


haz = Client().get_hazard("storm_europe", name="test_haz_WS_nl", status="test_dataset")

# exposures
exp_line = Exposures(gdf_test)

# impact function
impf_line = ImpfStormEurope.from_welker()
impf_set = ImpactFuncSet([impf_line])

exp_line.data["impf_WS"] = 1  # specify the impact function to use

impact = u_lp.calc_geom_impact(
    exp=exp_line,
    impf_set=impf_set,
    haz=haz,
    res=500,
    to_meters=True,
    disagg_met=u_lp.DisaggMethod.FIX,
    disagg_val=1e5,
    agg_met=u_lp.AggMethod.SUM,
);

u_lp.plot_eai_exp_geom(impact);

Climada Version: 5.0.1.dev0

System Information:

  • Operating system and version: macOS 14.3.1
  • Python version: 3.9.18
@chahank (Member) commented Jan 14, 2025

Good catch; it makes sense that this fails at the moment. I guess this poses the fundamental question of whether we want to accept exposures with a repeating index. I am not sure what the advantage of allowing it would be. Any opinion @emanuel-schmid @peanutfun ?

For this case, we could also just handle it inside the lines-polygons module. But maybe before that, do you understand whether it is just a problem with the plotting, or is the disaggregation/aggregation result fundamentally wrong with repeating indices? If it is just the plotting, we can adjust the plotting method alone. If it is the result, then we need to reset the index for all disaggregation operations.

@luseverin (Collaborator, Author) commented Jan 14, 2025

But maybe before that, do you understand whether it is just a problem with the plotting, or is the disaggregation/aggregation result fundamentally wrong with repeating indices?

I did a quick test and the computed impact.eai_exp with the well-defined exposures looks like this:
array([0.08465482, 0.72462291, 2.84040059])
whereas for the ill-defined exposures, it looks like this:
array([0.80927773, 2.84040059])
So essentially the impacts of points [0, 1] of the well-defined exposures are aggregated onto the first point of the ill-defined exposures (I checked, and it is the exact sum of the two points). So apparently the problem already occurs in the reaggregation?
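
For intuition, here is a minimal pandas sketch (assumed; the actual grouping in lines_polys_handler may differ in detail) of how duplicate index labels collapse rows during an index-based reaggregation, leaving eai_exp with fewer entries than the exposures:

import pandas as pd

# Three disaggregated impact values carrying the exposures' labels [0, 1, 0].
eai = pd.Series([0.1, 0.7, 2.8], index=[0, 1, 0])

# An index-based aggregation merges the two rows sharing label 0,
# so the result has only two entries instead of three.
print(eai.groupby(level=0).sum())
# 0    2.9
# 1    0.7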

@chahank (Member) commented Jan 14, 2025

Thanks for checking! This is what I was expecting. So we should raise an error in the aggregation stage if there are repeated indices. We could also force a reindexing in the disaggregation stage, but I am not sure.

And this is at the level of the GeoDataFrame, not the Exposures object itself.

@luseverin (Collaborator, Author) commented

What would be the issue with a forced reindexing? Shouldn't a dataframe have a unique index in the first place anyway?
Once we decide how to deal with this, I can take care of implementing the fix.

@chahank (Member) commented Jan 14, 2025

I agree that unique indices are good. But, in general, the code should not modify the input data unless absolutely needed.

Here is an example of when resetting the index would be a problem. Suppose you have a large exposure for all of Europe. Now you run the computations for sub-regions separately (perhaps for performance reasons), but want to put the results together later on. If you have a unique index for all of Europe, this is an easy task. If the index is reset for each sub-region calculation, it becomes more complicated. A sketch of this workflow follows below.
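
To make this concrete, here is a small pandas sketch (an assumed workflow, not CLIMADA code) of recombining sub-region results via preserved unique labels:

import pandas as pd

# Full exposure with unique labels across all sub-regions.
exp = pd.DataFrame({"value": range(6)}, index=[10, 11, 12, 20, 21, 22])
north, south = exp.iloc[:3], exp.iloc[3:]      # sub-region computations
res_north = north["value"] * 2                 # stand-in for an impact calc
res_south = south["value"] * 3
combined = pd.concat([res_north, res_south])   # labels still match exp
assert (combined.index == exp.index).all()     # easy to join back
# Had each sub-region been reset to 0..N-1, the labels would collide
# and joining the results back onto the full exposure would be ambiguous.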

Hence, I would first raise an error if there are repeated indices (with instructions on how to resolve them). Where exactly in the code to raise the error may need a bit of reflection.
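
A hedged sketch of such a guard (the function name and message are illustrative, not existing CLIMADA API):

import geopandas as gpd

def check_unique_exposure_index(gdf: gpd.GeoDataFrame) -> None:
    """Raise if the exposures GeoDataFrame has repeated index labels."""
    if not gdf.index.is_unique:
        duplicates = gdf.index[gdf.index.duplicated()].unique().tolist()
        raise ValueError(
            f"The exposures GeoDataFrame has non-unique index labels "
            f"{duplicates}. Reset the index, e.g. with "
            f"gdf.reset_index(drop=True), before disaggregating."
        )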

@luseverin (Collaborator, Author) commented

Here is an example of when resetting the index would be a problem. Suppose you have a large exposure for all of Europe. Now you run the computations for sub-regions separately (perhaps for performance reasons), but want to put the results together later on. If you have a unique index for all of Europe, this is an easy task. If the index is reset for each sub-region calculation, it becomes more complicated.

That makes sense, thanks for the example!

Hence, I would first raise an error if there are repeated indices (with instructions on how to resolve them). Where exactly in the code to raise the error may need a bit of reflection.

I agree. I can look into that and make a suggestion on where raising the error is most appropriate.

@peanutfun (Member) commented

A very quick solution would probably be to call reset_index(drop=True) on the exposure GDF inside the lines-polygon handler. This returns a new object and does not modify the exposure.

The underlying problem here is that the code assumes that the index labels in the exposure GDF are unique (which they might not be), and even more so that the index is an integer one (which is not required by any means). If the index is indeed a 0..N-1 integer range, then the task is trivial: you can uniquely identify all objects in a dataframe by their position, using the length N of the frame and the .iloc accessor. Therefore, I think the code can be rewritten to completely ignore the index of the exposures GDF and just use an integer index internally, as sketched below.
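
A hedged sketch of that idea (illustrative helper, not existing CLIMADA API): work on a positionally indexed copy so the user's index never enters the disaggregation/aggregation logic:

import geopandas as gpd

def with_positional_index(gdf: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
    """Return a copy of gdf with a fresh 0..N-1 integer index.

    reset_index(drop=True) returns a new object, so the caller's
    GeoDataFrame (and its possibly non-unique index) stays untouched.
    """
    return gdf.reset_index(drop=True)

# Internally, every row is then uniquely addressable by position:
# gdf_internal.iloc[i] is well defined for 0 <= i < len(gdf_internal).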

Regarding the question of unique indices when concatenating exposures: the default behavior of pd.concat is to not modify indices. Since users might indeed use special indices for their exposures, it would be intrusive to just assume that they want to reset the index after concatenating. So I think the cleanest solution would be to add an ignore_index keyword to the exposures concatenation and make it behave just like pd.concat, as in the sketch below.
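
A minimal sketch of that keyword (the standalone helper and its name are illustrative only; the real change would go into the exposures concatenation method):

import pandas as pd

def concat_exposure_gdfs(gdfs, ignore_index=False):
    """Concatenate exposure GeoDataFrames, mirroring pd.concat:
    indices are kept by default and only reset on explicit request."""
    return pd.concat(gdfs, ignore_index=ignore_index)

# ignore_index=False (default): original labels kept, duplicates possible.
# ignore_index=True: the result gets a fresh 0..N-1 integer index.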
