Better filtering for SNCF/Eurostar trains #748

Open
Altonss opened this issue Jan 3, 2025 · 10 comments

@Altonss
Collaborator

Altonss commented Jan 3, 2025

Most TGV/Ouigo trains seem to come from sbb.ch. With the new filtering capabilities, should we maybe filter them out? 🤔
The same goes for Eurostar, which seems to be even worse: some trains appear 3 times (twice under thalys, and once as Eurostar from NS data).

As long as the official data source is working fine, it makes more sense to use it. The only challenge with this filtering is ensuring that no important data is missed.

@jbruechert
Collaborator

For Eurostar, the problem is made worse by the separate feeds for Eurostar / Thalys, which, the last time I checked, still survived the merger. Maybe they have finally started to unify them?

@Altonss
Collaborator Author

Altonss commented Jan 3, 2025

> For Eurostar, the problem is made worse by the separate feeds for Eurostar / Thalys, which, the last time I checked, still survived the merger. Maybe they have finally started to unify them?

According to an official comment on https://transport.data.gouv.fr/datasets/eurostar-gtfs, this feed already contains the unified data. We could just use the one available in fr.json (so unskip it there, and remove it from eu.json).

@jbruechert
Collaborator

Whether it's in eu.json or fr.json is a tangential issue. Most importantly, we should probably drop thalys from eu.json.

@jbruechert
Collaborator

But in general I agree, if it's clear which source is the best one (up to date, realtime, quality) then feel free to add agency removal to the other sources.
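
For illustration, here is a minimal sketch of what agency removal amounts to at the GTFS level. It is not how Transitous or MOTIS implement it. The function name `drop_agencies` and the in-memory row format are hypothetical, and it assumes every route row carries an `agency_id` (which GTFS only requires when a feed has more than one agency).

```python
def drop_agencies(agency_rows, route_rows, trip_rows, agency_names):
    """Remove the named agencies, plus their routes and trips, from GTFS tables.

    Each *_rows argument is a list of dicts keyed by GTFS column names
    (agency.txt, routes.txt, trips.txt). Hypothetical helper for illustration.
    """
    # IDs of the agencies we want to drop, looked up by display name.
    dropped_ids = {
        a["agency_id"] for a in agency_rows if a["agency_name"] in agency_names
    }
    kept_agencies = [a for a in agency_rows if a["agency_id"] not in dropped_ids]
    # Routes reference their agency directly.
    kept_routes = [r for r in route_rows if r["agency_id"] not in dropped_ids]
    # Trips reference routes, so keep only trips whose route survived.
    kept_route_ids = {r["route_id"] for r in kept_routes}
    kept_trips = [t for t in trip_rows if t["route_id"] in kept_route_ids]
    return kept_agencies, kept_routes, kept_trips
```

A real filter would also have to prune `stop_times.txt` and any other table that references the removed trips.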

@Altonss
Collaborator Author

Altonss commented Jan 3, 2025

> But in general I agree, if it's clear which source is the best one (up to date, realtime, quality) then feel free to add agency removal to the other sources.

Actually, instead of removing an agency from sources, it would be nicer to be able to mark certain agencies as secondary for certain feeds. This way we could prioritize the best feed for an agency while keeping redundancy for the few routes that might only be available in secondary sources, so no data goes missing. A good example is #749, where I cannot be certain that dropping the SNCF agency entirely from the CH feed leaves no route missing.
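
A rough sketch of the "secondary agency" idea, to make the intended behaviour concrete. Everything here is hypothetical: the `select_trips` function, the feed names, and especially the assumption that two trips are the same when agency and train number match exactly (real data would need fuzzy matching).

```python
def select_trips(feeds, secondary):
    """Prefer primary feeds, fall back to secondary ones for uncovered trips.

    feeds:     {feed_name: [{"agency": ..., "train_number": ...}, ...]}
    secondary: set of (feed_name, agency) pairs marked as secondary.
    Returns a list of (feed_name, trip) tuples.
    """
    # Collect every trip that some primary (feed, agency) combination provides.
    primary_keys = set()
    for feed, trips in feeds.items():
        for trip in trips:
            if (feed, trip["agency"]) not in secondary:
                primary_keys.add((trip["agency"], trip["train_number"]))

    selected = []
    for feed, trips in feeds.items():
        for trip in trips:
            key = (trip["agency"], trip["train_number"])
            if (feed, trip["agency"]) in secondary and key in primary_keys:
                continue  # a primary feed already covers this trip
            # Note: two secondary feeds carrying the same uncovered trip
            # would both be kept here; this sketch does not deduplicate them.
            selected.append((feed, trip))
    return selected
```

With `secondary = {("ch", "SNCF")}`, SNCF trips from the `ch` feed are only kept when the `fr` feed does not have them, while other agencies in `ch` are unaffected.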

@jbruechert
Collaborator

This would basically boil down to improved (fuzzy) merging support in MOTIS, right? That would be great in any case

@Altonss
Collaborator Author

Altonss commented Jan 4, 2025

> This would basically boil down to improved (fuzzy) merging support in MOTIS, right? That would be great in any case

Yes exactly this 👍

@felixguendling
Contributor

If anyone is interested in improving fuzzy merging in nigiri, I am happy to answer questions. We could also set up a meeting if someone wants to work on it.

@Altonss
Collaborator Author

Altonss commented Jan 12, 2025

Noticed some merging of Eurostar isn't working well:

  • for example, with Eurostar 9450 (Dortmund Hbf -> Paris Nord), the data from Delfi and Eurostar isn't merged...

There are slight differences in the data: station names differ slightly, Delfi has platform data for some stops in Germany, and the arrival time is slightly off (3 min) 😬.
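
To illustrate why small differences like these can defeat merging, here is a toy duplicate check. It does not reproduce nigiri's actual logic: the function names, the default tolerances, and the stop times in the usage note are all made up, and it assumes both feeds list the same number of stops.

```python
from difflib import SequenceMatcher


def names_similar(a, b, min_ratio=0.8):
    """True if two station names are close enough (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= min_ratio


def is_duplicate(trip_a, trip_b, max_time_diff_min=5, min_name_ratio=0.8):
    """Compare two trips given as lists of (stop_name, minutes_since_midnight).

    Trips count as duplicates only if every stop pair is within the time
    tolerance and has a sufficiently similar name.
    """
    if len(trip_a) != len(trip_b):
        return False
    for (name_a, time_a), (name_b, time_b) in zip(trip_a, trip_b):
        if abs(time_a - time_b) > max_time_diff_min:
            return False
        if not names_similar(name_a, name_b, min_name_ratio):
            return False
    return True
```

For example, with the hypothetical trips `[("Dortmund Hbf", 360), ("Paris Nord", 700)]` and `[("Dortmund Hbf", 360), ("Paris-Nord", 703)]`, the pair is rejected with `max_time_diff_min=2` but accepted with the default of 5, so the tolerance alone decides whether a 3 min offset blocks the merge.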

@felixguendling
Contributor

Maybe it's sufficient to change the threshold for merging here?
https://github.com/motis-project/nigiri/blob/master/src/loader/merge_duplicates.cc#L141


3 participants