Spatial join examples #767
Replies: 4 comments 1 reply
-
cc @jovan-stojanovic who worked on it
-
This is true, but the reason for doing this was that joining airports to weather meant joining 3 376 rows to 11 282 238 rows.
-
Yes, this is something that was directly extracted from the Global Historical Climatology Network website and was left uncurated, in the spirit of working with "real-world" dirty data. I don't know if there is a way to extract this additional information from other datasets from the website, but for this example, I used longitude and latitude.
-
If the task is to predict the delays of future flights, I wonder if we should use a TimeSeriesSplit in the flight delays examples rather than KFold?
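To illustrate the difference between the two splitters, here is a minimal sketch on toy data. It assumes the rows are already sorted by flight date, and the ten-row array stands in for the flights table; it is not taken from the example itself.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

# Ten toy samples, assumed to be already sorted by flight date.
X = np.arange(10).reshape(-1, 1)

# KFold (no shuffling): some training indices come after the test indices,
# so the model can be trained on flights from the "future".
kfold_leaks = any(
    train.max() > test.min() for train, test in KFold(n_splits=3).split(X)
)

# TimeSeriesSplit: every training index precedes every test index.
ts_splits = list(TimeSeriesSplit(n_splits=3).split(X))
ts_leaks = any(train.max() > test.min() for train, test in ts_splits)

print("KFold trains on the future:", kfold_leaks)
print("TimeSeriesSplit trains on the future:", ts_leaks)
```

Passing `cv=TimeSeriesSplit(...)` to `cross_val_score` would be one way to apply this, provided the flights are ordered chronologically first.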
-
It seems that joining weather data to the flights table does not help delay prediction.
Example 07 reports an accuracy of 0.58, which is also what I see when not joining anything to the flights table:
prints
I also see the same thing when using the interpolation join instead of the fuzzy
join used in example 7.
We can discuss ideas for other example datasets here rather than in #742 .
Also, in example 7 I wonder why we join weather to airports, and then the resulting join to the flights.
This causes some flights to be joined with the wrong airport because the second join involves the date.
IMO it would be more natural to join flights and airports, which come from the same database and have an exact correspondence, and then fuzzy-join the weather on top.
Finally, in the stations table "NAME" and other columns are empty and "STATE" doesn't seem to contain state names; there may be an issue with the dataset?
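A rough sketch of the join order suggested above (flights to airports exactly, then weather on top), using plain pandas and NumPy on toy tables. All column names and values here are hypothetical, and the nearest-station lookup is only a crude stand-in for the fuzzy join used in the example.

```python
import numpy as np
import pandas as pd

# Toy tables; all column names and values are made up for illustration.
flights = pd.DataFrame({
    "Origin": ["AAA", "BBB"],
    "date": pd.to_datetime(["2008-01-01", "2008-01-02"]),
})
airports = pd.DataFrame({
    "iata": ["AAA", "BBB"],
    "lat": [40.0, 34.0],
    "long": [-74.0, -118.0],
})
stations = pd.DataFrame({
    "station": ["S1", "S2"],
    "lat": [40.1, 33.9],
    "long": [-73.9, -118.1],
})
weather = pd.DataFrame({
    "station": ["S1", "S1", "S2", "S2"],
    "date": pd.to_datetime(["2008-01-01", "2008-01-02"] * 2),
    "TMAX": [1.0, 2.0, 20.0, 21.0],
})

# Step 1: exact join, since flights and airports share an airport code.
joined = flights.merge(airports, left_on="Origin", right_on="iata", how="left")

# Step 2: approximate spatial match, picking the nearest station per flight
# (squared distance in degrees, a simplification of a real fuzzy join).
d_lat = joined["lat"].to_numpy()[:, None] - stations["lat"].to_numpy()[None, :]
d_long = joined["long"].to_numpy()[:, None] - stations["long"].to_numpy()[None, :]
nearest = np.argmin(d_lat**2 + d_long**2, axis=1)
joined["station"] = stations["station"].to_numpy()[nearest]

# Step 3: attach the weather of that station on the exact flight date.
joined = joined.merge(weather, on=["station", "date"], how="left")
print(joined[["Origin", "date", "station", "TMAX"]])
```

With this order the airport is fixed by the exact join before the date is involved, so the date can no longer pull a flight towards the wrong airport.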