Proposal: Add GTFS-NewShapes as experimental #272

Merged
merged 7 commits into from
Aug 30, 2021

Conversation

ericouyang
Contributor

@ericouyang ericouyang commented May 10, 2021

Background

Detours are a major part of a dynamic urban environment. Deviations from the original route path can occur in a wide range of situations, including planned events and traffic incidents.

Today, GTFS-RT already lets producers mark a stop as SKIPPED when a detoured route will no longer visit it. Additionally, an Alert can be created with effect=DETOUR.

While these can communicate some of the impact of a detour, neither approach communicates the new shape of the route, which passenger applications need in order to display the path and assist in rider navigation.

Proposal

This proposal builds on a recent addition (#221) to support the specification of trip-level properties in real-time and adds in shape_id as a supported field. This can reference an existing shape in the static GTFS from shapes.txt or a new Shape specified in real-time via an encoded polyline.
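
To make the lookup described above concrete, here is a minimal sketch of how a consumer might resolve a trip's shape_id. It is an illustration only: the `resolve_shape` helper and the two dictionaries are hypothetical and not part of the proposal, and checking the real-time shapes before the static ones is an assumption.

```python
from typing import Any, Dict, Optional, Tuple


def resolve_shape(
    shape_id: str,
    realtime_shapes: Dict[str, Any],
    static_shapes: Dict[str, Any],
) -> Optional[Tuple[str, Any]]:
    """Return (source, shape) for a shape_id, or None if it is unknown.

    realtime_shapes: hypothetical map of shape_id -> Shape received in real time.
    static_shapes:   hypothetical map of shape_id -> shape loaded from shapes.txt.
    """
    # Assumption: real-time shapes are checked first.
    if shape_id in realtime_shapes:
        return ("realtime", realtime_shapes[shape_id])
    if shape_id in static_shapes:
        return ("static", static_shapes[shape_id])
    # Unknown shape_id: the caller decides how to degrade,
    # e.g. by keeping the trip's scheduled shape.
    return None
```

A caller would pass the shape_id taken from the trip's real-time properties and fall back to the scheduled shape when the result is `None`.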

Both the gtfs-realtime.proto file and documentation have been updated.

This pull request is a subset of the GTFS-ServiceChanges v3.1 spec:
https://bit.ly/gtfs-service-changes-v3_1

This proposal builds on prior art from @lionel-nj and @barbeau (see MobilityData#47)

@google-cla google-cla bot added the cla: yes label May 10, 2021
@ericouyang ericouyang force-pushed the gtfs-servicechanges-v3.1-newshapes branch from e12fb3a to 8d8fdfc Compare May 10, 2021 18:30
@gcamp
Contributor

gcamp commented May 10, 2021

I think it would be worthwhile not to duplicate exactly what is in GTFS for the NewShape paradigm. Sending all that information that way is far from compact for a real-time system. I would suggest sending an encoded polyline, but I'm open to any alternatives in the same vein.

@skinkie
Contributor

skinkie commented May 10, 2021

I agree with @gcamp. What is the reason you want to build a topological shape?

@barbeau
Collaborator

barbeau commented May 10, 2021

Encoded polylines are definitely a more compact way to represent lines, but they only encode lat/lon (to my knowledge). This means we'd still need a way to represent shape_dist_traveled and shape_pt_sequence if we want to retain all the information from static GTFS.

We could consider dropping shape_pt_sequence as this information should be implicit in the encoded polyline.

shape_dist_traveled is optional and could be a separate array (repeated) of float values, with a requirement that, if provided, its length equals the number of points in the line. That may be prone to error, though.
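
The length requirement above can be checked mechanically. The following is a sketch only, assuming the encoded polyline follows Google's Encoded Polyline Algorithm Format at 5 decimal places; `decode_polyline` and `validate_shape` are hypothetical helper names, and the non-decreasing check reflects how shape_dist_traveled behaves in shapes.txt rather than anything stated in this thread.

```python
from typing import List, Sequence, Tuple


def decode_polyline(encoded: str) -> List[Tuple[float, float]]:
    """Decode an encoded polyline (precision 1e-5) into (lat, lon) pairs.

    A truncated or malformed string raises IndexError in this sketch.
    """
    points: List[Tuple[float, float]] = []
    index = lat = lon = 0
    while index < len(encoded):
        for is_lon in (False, True):
            shift = result = 0
            while True:
                byte = ord(encoded[index]) - 63
                index += 1
                result |= (byte & 0x1F) << shift
                shift += 5
                if byte < 0x20:  # no continuation bit: last chunk of this value
                    break
            # The lowest bit carries the sign of the delta.
            delta = ~(result >> 1) if result & 1 else result >> 1
            if is_lon:
                lon += delta
            else:
                lat += delta
        points.append((lat / 1e5, lon / 1e5))
    return points


def validate_shape(
    encoded_points: str, shape_dist_traveled: Sequence[float] = ()
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    errors: List[str] = []
    points = decode_polyline(encoded_points)
    if len(points) < 2:
        errors.append("polyline must contain at least two points")
    if shape_dist_traveled:
        if len(shape_dist_traveled) != len(points):
            errors.append(
                f"shape_dist_traveled has {len(shape_dist_traveled)} values "
                f"but the polyline has {len(points)} points"
            )
        pairs = zip(shape_dist_traveled, shape_dist_traveled[1:])
        if any(b < a for a, b in pairs):
            errors.append("shape_dist_traveled must not decrease")
    return errors
```

Returning a list of problems rather than raising lets a feed validator report every issue for a shape in a single pass.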

@skinkie
Contributor

skinkie commented May 10, 2021

> Encoded polylines are definitely a more compact way to represent lines, but they only encode lat/lon (to my knowledge). This means we'd still need a way to represent shape_dist_traveled and shape_pt_sequence if we want to retain all the information from static GTFS.

But this information should be exchanged "once" to introduce a new shape. Given the current data exchange, it will be exchanged every time. Has any thought been given to how to detect which data is actually new and should be added to the system, as opposed to updating everything that arrives in a fetch?

> shape_dist_traveled is optional and could be a separate array (repeated) of float values, with a requirement that if provided its length is the same number of values as the points in the line. Although that may be prone to error.

I could argue that representing this information as a column store will result in a more compact representation. Hence the repeated approach. Within the format the number of elements is known, and it will have the exact same effect if one of the values in the row based format is 'forgotten'.

@ericouyang
Contributor Author

Thanks all for the feedback on this!

@gcamp - I'd love to get some more context in terms of how the payload size impacts Transit App. Does the GTFS-realtime pb feed get transmitted as-is to the end clients, hence needing to be sensitive from a mobile data consumption perspective? If so, to @skinkie's point, does your system today repeat exchange of largely static information that doesn't change very frequently, like GTFS-realtime Alerts, which probably would have similar update characteristics?

As suggested, we ran an experiment on our side to measure the impact of encoding on file size, and found the encoded-polyline payloads to be much smaller:

| Agency | Unencoded NewShapes Size (Original Proposal) | Encoded NewShapes Size (Alternate Proposal) | Example size of TripUpdates, for scale |
| --- | --- | --- | --- |
| Agency 1 (~150 routes) | 108KB | 24KB | ~1.5MB |
| Agency 2 (~120 routes) | 84KB | 20KB | ~850KB |
| Agency 3 (~110 routes) | 200KB | 36KB | ~250KB |
| Agency 4 (~70 routes) | 60KB | 12KB | ~200KB |

Our methodology here was to randomly generate a new shape, where each point was 100-200m away from the previous point, for 25% of all routes, as an approximation of a more extreme situation where a lot of routes are on detour.

Given that the sizes are much smaller, I'm open to updating this proposal accordingly. I'd love to hear more from other folks on this, particularly other producers & consumers on the tradeoff here between filesize, consistency with GTFS Static, and interpretability.

For reference, here's a snippet of what the .proto would look like for this alternate representation:

```protobuf
message Shape {
  // Identifier of the shape. Must be different than any shape_id defined in the (CSV) GTFS.
  // NOTE: This field is still experimental, and subject to change. It may be formally adopted in the future.
  required string shape_id = 1;

  // Encoded polyline representation of the shape. This polyline must contain at least two points.
  // NOTE: This field is still experimental, and subject to change. It may be formally adopted in the future.
  required string encoded_points = 2;

  // Optional list of actual cumulative distances traveled along the shape to each point.
  // See definition of shapes.shape_dist_traveled in (CSV) GTFS.
  // NOTE: This field is still experimental, and subject to change. It may be formally adopted in the future.
  repeated float shape_dist_traveled = 3;

  // The extensions namespace allows 3rd-party developers to extend the
  // GTFS Realtime Specification in order to add and evaluate new features and
  // modifications to the spec.
  extensions 1000 to 1999;

  // The following extension IDs are reserved for private use by any organization.
  extensions 9000 to 9999;
}
```

@gcamp
Contributor

gcamp commented Jul 12, 2021

To me, encoded polyline is a 👍. The only downside is that encoded polylines are not lossless: some precision is lost during encoding. Not to a level that a user will notice, but it might make some automated tests harder to design.

Consistency with GTFS is a non-issue for me as this is a trivial conversion.
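
To illustrate the precision point for test design, here is a hedged sketch that round-trips coordinates through an encoder and decoder and measures the worst-case error. It assumes the standard 5-decimal-place encoded polyline format; the function names are hypothetical, and the sample coordinates are made up for illustration.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def _encode_value(value: int) -> str:
    # Zig-zag the signed delta, then emit 5-bit chunks, lowest bits first.
    value = ~(value << 1) if value < 0 else value << 1
    chunks = []
    while value >= 0x20:
        chunks.append(chr((0x20 | (value & 0x1F)) + 63))
        value >>= 5
    chunks.append(chr(value + 63))
    return "".join(chunks)


def encode_polyline(points: Iterable[Point]) -> str:
    out = []
    prev_lat = prev_lon = 0
    for lat, lon in points:
        # Each coordinate is rounded independently to 1e-5 degrees,
        # so the error does not accumulate along the line.
        lat_i, lon_i = round(lat * 1e5), round(lon * 1e5)
        out.append(_encode_value(lat_i - prev_lat))
        out.append(_encode_value(lon_i - prev_lon))
        prev_lat, prev_lon = lat_i, lon_i
    return "".join(out)


def decode_polyline(encoded: str) -> List[Point]:
    points: List[Point] = []
    index = lat = lon = 0
    while index < len(encoded):
        for is_lon in (False, True):
            shift = result = 0
            while True:
                byte = ord(encoded[index]) - 63
                index += 1
                result |= (byte & 0x1F) << shift
                shift += 5
                if byte < 0x20:
                    break
            delta = ~(result >> 1) if result & 1 else result >> 1
            if is_lon:
                lon += delta
            else:
                lat += delta
        points.append((lat / 1e5, lon / 1e5))
    return points


def max_roundtrip_error(points: List[Point]) -> float:
    """Largest absolute lat or lon difference (degrees) after encode + decode."""
    decoded = decode_polyline(encode_polyline(points))
    return max(
        max(abs(a[0] - b[0]), abs(a[1] - b[1])) for a, b in zip(points, decoded)
    )


# Made-up sample coordinates with more precision than the format keeps.
sample = [(44.9777531, -93.2650108), (44.9778012, -93.2650749)]
error_deg = max_roundtrip_error(sample)
```

Under these assumptions the error per coordinate stays within about 0.000005 degrees, roughly half a metre of latitude, so automated tests can compare decoded points with a tolerance instead of exact equality.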

@botanize
Contributor

It looks like the current version of this pull-request defines a message Shape that includes an encoded polyline in the proto file, but the reference.md file documents a ShapePoint message that includes the individual coordinate pairs.

@botanize
Contributor

Changing or setting shape_id in TripUpdates works fine for last-minute changes, but doesn't work well for near-term changes. By limiting the application of NewShapes to TripUpdates, it's impossible to apply NewShapes to trips beginning the next service day through the next GTFS-static update, up to a week away. This gap exists because TripUpdates only apply to the current service day and most consumers will only consume GTFS-static weekly, or require multiple days to ingest the feed.

The ability to apply near-term service changes is critical. We often know a day or two ahead of a detour, and want to provide the best information to customers as soon as possible. For example, we know there will be a detour affecting many of our routes tomorrow and we'd like to show trip plans for tomorrow with correct detour routing using a new shape. But even with this proposal the best we can do is show an incorrect routing for tomorrow's trip and add a service alert. We don't like to rely on service alerts because people either don't read them, are overwhelmed by the number of alerts, or don't understand the impact of them the way they do a visualization of a detour (speaking for myself here).

To meet our needs for near-term detour communication we could add another message to this proposal, which like Alerts uses time ranges and selectors to apply the new shape defined in Shapes to one or more current or future trips. Trips that are currently active in TripUpdates could use the proposed mechanism (the TripProperties.shape_id field), or both, with the contents of TripUpdates taking precedence over the alerts style selector message.

@ericouyang
Contributor Author

Great catch, @botanize! I've updated the reference file now to reflect the updated proposal

That's a great point about near-to-mid-term changes. I can't find the conversation now, but I believe there was some previous alignment in the community around GTFS-ServiceChanges representing things for up to the next 7 days. Anything that's longer should instead look to be reflected in Static GTFS.

I think the way in which one would do this is by creating a TripUpdate entity where TripDescriptor.start_date would be for a future start date. I think doing something like this rather than adding a new message reduces potential ambiguity in terms of how to represent it.

@gcamp
Contributor

gcamp commented Aug 17, 2021

I would add a link to the encoded polyline doc so it's clear what is being returned.

@ericouyang
Contributor Author

ericouyang commented Aug 17, 2021

I've called for a vote for adoption of this experimental feature on the Google Group (https://groups.google.com/g/gtfs-realtime/c/YWY9IoMQF7g?pli=1).

Please vote with a +1 (in favor) or -1 (against) before Wednesday, Aug 25th at 23:59:59 UTC. Thanks!

@ericouyang
Contributor Author

Thanks for the catch, @gcamp. It was already in the .proto file but wasn't in the reference file. I've updated that so they're consistent, and also updated the PR description with a link to that resource.

Contributor

@juanborre juanborre left a comment

Reading the whole conversation, it is unclear to me why the dist_traveled field was dropped.

Although that information is not useful if all we want to do is to draw a line on a map, it is definitely useful for tools like a trip planner or a predictive model in order to calculate travel times.

Do we rely on the consumer calculating straight line distances based on the GPS coordinates of the polyline?

@ericouyang Could you provide more details about that decision? 🙏

@ericouyang
Contributor Author

> Reading the whole conversation, it is unclear to me why the dist_traveled field was dropped.
>
> Although that information is not useful if all we want to do is to draw a line on a map, it is definitely useful for tools like a trip planner or a predictive model in order to calculate travel times.
>
> Do we rely on the consumer calculating straight line distances based on the GPS coordinates of the polyline?
>
> @ericouyang Could you provide more details about that decision? 🙏

@juanborre - Good question! I removed it in the spirit of following the guiding principles to avoid speculative features and on the presumption that it's largely redundant information from the polyline itself. As a producer, if we were to populate this field, we would end up calculating the straight line distances anyways to pass along, which I would guess that trip planners already need to do today since shape_dist_traveled is optional in shapes.txt.
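
For context on what "calculating the straight line distances" could look like on the consumer side, here is a minimal sketch that derives cumulative distances from decoded polyline points. It assumes a spherical Earth with a mean radius of 6,371 km and the haversine formula; the function names and the choice of metres as the unit are illustrative, not something the proposal prescribes.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius


def haversine_m(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, h)))


def cumulative_distances(points: Sequence[Tuple[float, float]]) -> List[float]:
    """Distance travelled along the line to each point, starting at 0.0."""
    if not points:
        return []
    dists = [0.0]
    for prev, cur in zip(points, points[1:]):
        dists.append(dists[-1] + haversine_m(prev, cur))
    return dists
```

The result has one value per point, which is the same shape as the optional shape_dist_traveled column in shapes.txt.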

@scmcca scmcca added the Status: Voting Pull Requests where the advocate has called for a vote as described in the changes.md label Aug 21, 2021
@colemccarren

+1 RTA Maryland 👏

@lauramatson

> I think the way in which one would do this is by creating a TripUpdate entity where TripDescriptor.start_date would be for a future start date. I think doing something like this rather than adding a new message reduces potential ambiguity in terms of how to represent it.

I think this would work for us as a producer, but I want to make sure it wouldn't cause any issues for consumers. During summer construction & event season, the same trip may show up 5 times because there would be a different shape / combination of detours each weekday. Is it safe to assume consumers will be able to process the same trip id showing up multiple times as long as the start_date clarifies each unique trip instance?
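
One way a consumer could handle the scenario in this question is to key each trip instance by (trip_id, start_date). This is a sketch under assumptions: TripUpdates are represented as plain dictionaries rather than protobuf messages, the field layout mirrors the names used in this thread, and rejecting duplicate keys is a hypothetical policy rather than something the spec mandates.

```python
from typing import Dict, Iterable, Tuple

TripInstance = Tuple[str, str]  # (trip_id, start_date as YYYYMMDD)


def index_shape_overrides(trip_updates: Iterable[dict]) -> Dict[TripInstance, str]:
    """Map each trip instance to the shape_id it should use.

    Each item is a hypothetical dict of the form:
      {"trip": {"trip_id": ..., "start_date": ...},
       "trip_properties": {"shape_id": ...}}
    """
    overrides: Dict[TripInstance, str] = {}
    for update in trip_updates:
        shape_id = update.get("trip_properties", {}).get("shape_id")
        if shape_id is None:
            continue  # this update does not change the shape
        trip = update["trip"]
        key = (trip["trip_id"], trip["start_date"])
        if key in overrides:
            # Assumed policy: two shapes for one trip instance is an error.
            raise ValueError(f"duplicate shape override for {key}")
        overrides[key] = shape_id
    return overrides
```

With this keying, the same trip_id can appear once per weekday with a different shape each day without the entries colliding.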

@scmcca scmcca removed the Status: Voting Pull Requests where the advocate has called for a vote as described in the changes.md label Aug 25, 2021
@gcamp
Contributor

gcamp commented Aug 25, 2021

+1 Transit

@paulswartz
Contributor

+1 @mbta

@ericouyang
Contributor Author

Adoption of this proposal as experimental has been accepted, with 3 votes in favor and 0 votes opposed. Thanks so much to everyone for your input into this and excited to see this improving rider experiences!

@scmcca
Contributor

scmcca commented Aug 26, 2021

I noticed the voting period was extended by 1 day and that 2/3 votes happened outside of the original period that you announced on the Google Changes mailing list (ending on August 24th at 23:59:59 UTC). While article 7.4 of the Specification Amendment Process states:

> If the advocate continues the work on proposal then a new vote can be called for at any point in time.

I think the quiet extension of a voting period invalidates those last 2/3 votes.

Normally the practice is that another 7 day block must follow between content changes and vote recalls (albeit this is not strictly codified in the SAP, it should be!), especially if they are made during a voting period (article 6.2 allows only editorial changes to be made).

Let me know if I'm missing some context!

@ericouyang
Contributor Author

Thanks for flagging that, @scmcca! I had intended for the voting period to be slightly longer than the minimum requirement, but accidentally wrote "Wednesday, Aug 24th at 23:59:59 UTC" instead of "Wednesday, Aug 25th at 23:59:59 UTC" in the original announcement. Apologies for creating confusion there.

As there were only editorial changes made during this time period, it is our understanding that all votes are valid in support of this proposal.

Due to the incorrect original date, we'll keep this PR open until the end of the week, after which the intent is for the change to be merged as an experimental addition.

@barbeau barbeau merged commit 99fdfba into google:master Aug 30, 2021
@scmcca scmcca added proposal GTFS Realtime Issues and Pull Requests that focus on GTFS Realtime labels May 20, 2022
@alesk1978

Just want to make sure I understand correctly: the ShapePoint message that was in the proposal document is no longer used, and we only use the encoded polyline, correct?
