
Plotting with plot_eai_exp_geom fails due to exposures vs impact shape mismatch #992

Open
luseverin opened this issue Jan 14, 2025 · 7 comments

@luseverin (Collaborator)

Describe the bug
I am using the lines_polys_handler module to compute impacts with lines as exposures and plotting the resulting EAI using plot_eai_exp_geom. However, plot_eai_exp_geom fails with a ValueError: Length of values (861875) does not match length of index (861881). Apparently, this comes from a mismatch between the shape of the original exposures and the reaggregated impacts, which causes the error in the following value assignment:

File ~/Documents/PhD/workspace/climada_python/climada/util/lines_polys_handler.py:357, in plot_eai_exp_geom(imp_geom, centered, figsize, **kwargs)
    355     kwargs["legend"] = True
    356 gdf_plot = gpd.GeoDataFrame(imp_geom.geom_exp)
--> 357 gdf_plot["impact"] = imp_geom.eai_exp

I have had a look, and apparently the error happens when the index of the exposures' GeoDataFrame has non-unique values, see e.g. rows 5 to 8 in the screenshot below:
[Screenshot 2025-01-14: exposures GeoDataFrame with repeated index values]

The error stems from an ill-defined Exposures object, but I assume we would want to catch that somehow? Any suggestion for a fix?
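
For illustration, the failing pattern can be reproduced in isolation: once the reaggregated eai_exp has fewer entries than the exposures GeoDataFrame, the column assignment in plot_eai_exp_geom raises exactly this ValueError. A minimal sketch (assumed, not CLIMADA code):

import numpy as np
import geopandas as gpd
from shapely.geometry import Point

# Minimal sketch (assumed, not CLIMADA code): assigning a column shorter
# than the frame raises the same ValueError as in plot_eai_exp_geom.
gdf = gpd.GeoDataFrame({"geometry": [Point(0, 0), Point(0, 1), Point(0, 2)]})
gdf["impact"] = np.array([1.0, 2.0])
# ValueError: Length of values (2) does not match length of index (3)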

To Reproduce
Code example:

import geopandas as gpd
from shapely.geometry import Point

data = {
    "name": ["A", "B", "C"],
    "value": [10, 20, 30],
    "geometry": [
        Point(1, 1),
        Point(1, 2),
        Point(1, 3),
    ],
}
gdf_test = gpd.GeoDataFrame(data, crs="EPSG:4326")
gdf_test.index = [0, 1, 0]  # need to specify a non-unique index to cause the error!

# CLIMADA imports
from climada.util.api_client import Client
import climada.util.lines_polys_handler as u_lp
from climada.entity import Exposures
from climada.entity.impact_funcs import ImpactFuncSet
from climada.entity.impact_funcs.storm_europe import ImpfStormEurope


haz = Client().get_hazard("storm_europe", name="test_haz_WS_nl", status="test_dataset")

# exposures
exp_line = Exposures(gdf_test)

# impact function
impf_line = ImpfStormEurope.from_welker()
impf_set = ImpactFuncSet([impf_line])

exp_line.data["impf_WS"] = 1  # specify the impact function to use

impact = u_lp.calc_geom_impact(
    exp=exp_line,
    impf_set=impf_set,
    haz=haz,
    res=500,
    to_meters=True,
    disagg_met=u_lp.DisaggMethod.FIX,
    disagg_val=1e5,
    agg_met=u_lp.AggMethod.SUM,
);

u_lp.plot_eai_exp_geom(impact);

Climada Version: 5.0.1.dev0

System Information:

  • Operating system and version: macOS 14.3.1
  • Python version: 3.9.18
@chahank (Member) commented Jan 14, 2025

Good catch; it makes sense that this fails at the moment. I guess this poses the fundamental question of whether we want to accept exposures with a repeating index. I am not sure what the advantage of allowing it would be. Any opinion @emanuel-schmid @peanutfun ?

For this case, we could also just handle it inside the lines-polygons module. But maybe before that, do you understand whether it is just a problem with the plotting, or is the disaggregation/aggregation result fundamentally wrong with repeating indices? If it is just the plotting, we can adjust the plotting method alone. If it is the result, then we need to reset the index for all disaggregation operations.

@luseverin (Collaborator, Author) commented Jan 14, 2025

But maybe before that, do you understand whether it is just a problem with the plotting, or is the disaggregation/aggregation result fundamentally wrong with repeating indices?

I did a quick test and the computed impact.eai_exp with the well-defined exposures looks like this:
array([0.08465482, 0.72462291, 2.84040059])
whereas for the ill-defined exposures, it looks like this:
array([0.80927773, 2.84040059])
So essentially the impacts of points [0, 1] of the well-defined exposures are aggregated onto the first point of the ill-defined exposures (I checked, and it is the exact sum of the two points). So apparently the problem already occurs in the reaggregation?
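
For intuition, here is a minimal pandas sketch (assumed; the actual grouping in lines_polys_handler may differ in detail) of how duplicate index labels collapse rows during an index-based reaggregation, leaving eai_exp with fewer entries than the exposures:

import pandas as pd

# Three disaggregated impact values carrying the exposures' labels [0, 1, 0].
eai = pd.Series([0.1, 0.7, 2.8], index=[0, 1, 0])

# An index-based aggregation merges the two rows sharing label 0,
# so the result has only two entries instead of three.
print(eai.groupby(level=0).sum())
# 0    2.9
# 1    0.7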

@chahank (Member) commented Jan 14, 2025

Thanks for checking! This is what I was expecting. So we should raise an error in the aggregation stage if there are repeated indices. We could also force a reindexing in the disaggregation stage, but I am not sure.

And this is at the level of the GeoDataFrame, not the Exposures object itself.

@luseverin (Collaborator, Author) commented

What would be the issue with a forced reindexing? Shouldn't a dataframe have a unique index in the first place anyway?
Once we decide how to deal with this, I can take care of implementing the fix.

@chahank (Member) commented Jan 14, 2025

I agree that unique indices are good. But, in general, the code should not modify the input data unless absolutely needed.

Here is an example of when resetting the index would be a problem. Suppose you have a large exposure for all of Europe. Now you run the computations for sub-regions separately (perhaps for performance reasons), but want to put the results together later on. If you have a unique index for all of Europe, this is an easy task. If the index is reset for each sub-region calculation, it becomes more complicated. A sketch of this workflow follows below.
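
To make this concrete, here is a small pandas sketch (an assumed workflow, not CLIMADA code) of recombining sub-region results via preserved unique labels:

import pandas as pd

# Full exposure with unique labels across all sub-regions.
exp = pd.DataFrame({"value": range(6)}, index=[10, 11, 12, 20, 21, 22])
north, south = exp.iloc[:3], exp.iloc[3:]      # sub-region computations
res_north = north["value"] * 2                 # stand-in for an impact calc
res_south = south["value"] * 3
combined = pd.concat([res_north, res_south])   # labels still match exp
assert (combined.index == exp.index).all()     # easy to join back
# Had each sub-region been reset to 0..N-1, the labels would collide
# and joining the results back onto the full exposure would be ambiguous.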

Hence, I would first raise an error if there are repeated indices (with instructions on how to resolve them). Where exactly in the code to raise the error may need a bit of reflection.
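
A hedged sketch of such a guard (the function name and message are illustrative, not existing CLIMADA API):

import geopandas as gpd

def check_unique_exposure_index(gdf: gpd.GeoDataFrame) -> None:
    """Raise if the exposures GeoDataFrame has repeated index labels."""
    if not gdf.index.is_unique:
        duplicates = gdf.index[gdf.index.duplicated()].unique().tolist()
        raise ValueError(
            f"The exposures GeoDataFrame has non-unique index labels "
            f"{duplicates}. Reset the index, e.g. with "
            f"gdf.reset_index(drop=True), before disaggregating."
        )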

@luseverin (Collaborator, Author) commented

Here is an example of when resetting the index would be a problem. Suppose you have a large exposure for all of Europe. Now you run the computations for sub-regions separately (perhaps for performance reasons), but want to put the results together later on. If you have a unique index for all of Europe, this is an easy task. If the index is reset for each sub-region calculation, it becomes more complicated.

That makes sense, thanks for the example!

Hence, I would first raise an error if there are repeated indices (with instructions on how to resolve them). Where exactly in the code to raise the error may need a bit of reflection.

I agree. I can look into that and make a suggestion on where raising the error is most appropriate.

@peanutfun (Member) commented

A very quick solution would probably be to call reset_index(drop=True) on the exposure GDF inside the lines-polygon handler. This returns a new object and does not modify the exposure.

The underlying problem here is that the code assumes that the index labels in the exposure GDF are unique (which they might not be), and even more so that the index is an integer one (which is not required by any means). If the index is indeed a 0..N-1 integer range, then the task is trivial: you can uniquely identify all objects in a dataframe by their position, using the length N of the frame and the .iloc accessor. Therefore, I think the code can be rewritten to completely ignore the index of the exposures GDF and just use an integer index internally, as sketched below.
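
A hedged sketch of that idea (illustrative helper, not existing CLIMADA API): work on a positionally indexed copy so the user's index never enters the disaggregation/aggregation logic:

import geopandas as gpd

def with_positional_index(gdf: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
    """Return a copy of gdf with a fresh 0..N-1 integer index.

    reset_index(drop=True) returns a new object, so the caller's
    GeoDataFrame (and its possibly non-unique index) stays untouched.
    """
    return gdf.reset_index(drop=True)

# Internally, every row is then uniquely addressable by position:
# gdf_internal.iloc[i] is well defined for 0 <= i < len(gdf_internal).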

Regarding the question of unique indices when concatenating exposures: the default behavior of pd.concat is to not modify indices. Since users might indeed use special indices for their exposures, it would be intrusive to just assume that they want to reset the index after concatenating. So I think the cleanest solution would be to add an ignore_index keyword to the exposures concatenation and make it behave just like pd.concat, as in the sketch below.
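
A minimal sketch of that keyword (the standalone helper and its name are illustrative only; the real change would go into the exposures concatenation method):

import pandas as pd

def concat_exposure_gdfs(gdfs, ignore_index=False):
    """Concatenate exposure GeoDataFrames, mirroring pd.concat:
    indices are kept by default and only reset on explicit request."""
    return pd.concat(gdfs, ignore_index=ignore_index)

# ignore_index=False (default): original labels kept, duplicates possible.
# ignore_index=True: the result gets a fresh 0..N-1 integer index.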
