-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add formatters and parsers for standard time formats #4729
Conversation
I think we shouldn't have the links of RFCs hard-coded in the docs, but just specify the RFC or ISO number. |
@bew, there are already many links throughout the stdlib docs, I also see no reason why not to? |
I don't care if the links are hardcoded, though I don't see any harm and such links to specifications and definitions are all over the doc comments in stdlib. |
I'm not entirely sure we want an explicit method for each time format, it might lead to an explosion of methods. |
@RX14 I think those standard formats should be included in stdlib and be easily accessible. |
If we're not going to add more then this amount is probably fine, but i'd like to get the rest of the core team's opinion on this too. |
I'm not fond of adding many methods, but alternatives ain't better. If limited to these 3, I guess it's fine. Please use ISO8601 naming, it's a well known international standard; I never heard about RFC3339 before. We should harmonize how to represent RFC and ISO numbers. For example the methods don't have an underscore ( |
I have switched from calling it ISO8601 to RFC3339 because ISO8601 includes a number of formats and variations, that are not supported by this formatter. The purpose of RFC3339 is exactly to make ISO8601 better usable for internet protocols by limiting it to a specific subset that fits this use-case.
See also What's the difference between ISO 8601 and RFC 3339 Date Formats? Harmonizing the standards' names seems like a good idea. Though in text ISO/RFC should be separated from the number with a space - this is the usual naming scheme employed by the standards organisations themselves as well as most other resources. Apart from the occasional internal line break, this is easier readable (try for yourself: the first references is this comment are without, the rest with space) and people are used to this. |
We should be able to parse the ISO 8601 spec, and all it's corner cases, in the stdlib. And even if we emit RFC 3339 standard timestamps, they're automatically ISO 8601 timestamps so it's much more discoverable to call it |
src/time/format.cr
Outdated
RFC_3339 = new "%FT%X%:z" | ||
|
||
# The [RFC 1123](https://tools.ietf.org/html/rfc1123#page-55) datetime format. | ||
RFC_1123 = new "%a, %d %b %Y %H:%M:%S GMT", Time::Kind::Utc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't parse timezones, it's not up to spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it was initially intended for formatting only.
src/time/format.cr
Outdated
RFC_1123 = new "%a, %d %b %Y %H:%M:%S GMT", Time::Kind::Utc | ||
|
||
# The [RFC 2822](https://tools.ietf.org/html/rfc2822) datetime format. | ||
RFC_2822 = new "%a, %d %b %Y %H:%M:%S %z" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RFC 2822 obsoletes RFC 822, RFC 1123 updates RFC 822 to enforce 4 digit years: it's the same time format, except RFC 2822 defines some parts of it as obsolete. There's no need for 2 separate patterns here at all, they're different versions of the same spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. Keeping track of all these formats is really disturbing... So we should keep RFC 2822 and add notes, that it can also be used (mostly) for 822 and 1123?
Also note that rust's chrono uses hand-written parsers for ISO 8601 and RFC 2822, despite having a strftime-based parser. We should do the same: it's impossible to parse these complex standards using a simple format string. |
As I understand it, RFC 3339 should generally be preferred over ISO 8601 in the context of internet protocols. For these purposes, you'd deliberatly not want the full ISO 8601 spec, which also includes durations and date-only values. I would expect Edit: There are some further explanations about Rust's chrono in chronotope/chrono#43 (comment) |
I am aware that the formatters can't parse all variants of the full specs. But I'd suggest to implement custom parsers in a different PR. |
The date format used for HTTP applications (such as cookies) according to [RFC 2612](https://tools.ietf.org/html/rfc2616#section-3.3.1) is *not* an arbitrary [RFC 1123](https://tools.ietf.org/html/rfc1123#page-55) string but has a fixed timezone `GMT`. Therefore it is not the same as `Time#to_rfc2822` and the method should not be called `rfc1123_date`.
70c7e6a
to
558d19e
Compare
In 558d19e the original behaviour of I am not sure, if this formatter should better be moved to |
@straight-shoota Isn't that exactly the same as |
That's what I had initially assumed, but as these failed specs show, they differ in the notation of the time zone. |
I repeat myself and stand strong here: the stdlib must support ISO 8601 which is an international standard, not RFC 3339 which is an internet protocol. The stdlib musn't restrict itself to web applications. Renaming |
This PR supports ISO 8601 in due form of it's profile RFC 3339. It provide formatters and parsers for standard time formats to properly support exchanging time values in computer and network systems. For this purpose, we neither need nor want to support the full ISO 8601 standard as this would be very complicated and mostly unnecessary. We could call this ISO 8601 and technically, that wouldn't be wrong, because it essentially is ISO 8601 (though not all of it). But it would still be bad naming - similar to TL;DR: Renaming from RFC 3339 to ISO 8601 has no benefit and is misleading. Changing the behaviour from supporting RFC 3339 to full ISO 8601 would require much effort for little benefit and we'd probably still wan't to support RFC 3339 anyway. |
@straight-shoota RFC 3339 is a subset of ISO 8601. That means that emitting RFC 3339 means you support ISO 8601. However, we do not correctly parse ISO 8601. Nothing needs to be changed in the emitting side, but we should be able to parse ISO 8601 properly. And that's impossible using just a By emitting RFC 3339 and parsing ISO 8601, we have compatibility with both specs. Why not? |
Let's put it the other way around: what's the use case for parsing full ISO 8601 with all it's variations? Wouldn't RFC 3339 be enough? Parsing RFC 3339 correctly already requires a customized parser. And parsing ISO 8601 is a different story. |
@straight-shoota the usecase is parsing ISO 8601, some of which might not be RFC 3339 compatible. This is because i've literally never heard of RFC 3339 before, so why expect all ISO 8601 to be compatible with a standard which isn't even well-known? |
It seems like many people don't know ISO 8601 but call it that when they really mean RFC 3339. I am not entirely opposed to implementing a full ISO 8601 format parser but I think this would be a) to much for this PR and b) there should be a parser for the more strictly defined RFC 3339 format as well. |
Why not to keep |
@akzhan Because it's only a subset and not full ISO 8601. Therefore the parse method would be utterly misnamed (as it parses only one major variant), and the formatter would also only create one specific flavour of ISO 8601 - which is commonly known as RFC 3339. |
Please cool down. Starting a flame war over this is pointless. Seeing how Python, Rust, Ruby, along others, all use ISO 8601 as reference and naming, seeing that Wikipedia redirects the RFC article to the ISO page, the single fact that the RFC is a subset to the ISO standard, all confirm my opinion on this topic. Introducing RFC 3339 methods means that introducing an ISO 8601 parser later would lead to have somewhat duplicate methods (bad: duplication), or to rename the methods (bad: breaking change). Hence the strong opposition. A proper ISO 8601 parser should be introduced (now or later); RFC named methods should not. |
Sorry if this was perceived this way, but I have no intention to flame, just to defend my argument since I am convinced about my opinion as well ;) Rust does not use ISO 8601, but RFC 3339, as does Go. They are relatively modern languages. The duplication of having both RFC 3339 and ISO 8601 is negligible: Both formatters should produce the same string by default, but the ISO formatter should support other formats as well. Both parsers would understand the RFC 3339 format, but the ISO 8601 would also understand e.g. ordinal dates or week numbers. So it is not exactly the same but each has its case: RFC 3339 for a fast and simple strict mode, ISO 8601 for more broader, quirky uses. Though I am still not convinced that the standard lib should necessarily support full ISO 8601. As explained before, many languages don't have it. |
So, I think we could proceed in one of the following ways:
Did I miss something? What would you suggest @RX14 @ysbaddaden? Another issue is how to deal with alternative representations in the parser and formatter. |
I'd say option B2 is what I prefer @straight-shoota. |
About B2: I think that boolean parameters is bad choice. They aren't extendable. Flags enum is OK. [@Flags]
enum Iso8601Flavor
Rfc3339,
end
Rfc3339 = Iso8601Flavor::Rfc3339
parse_iso8601(rule : Iso8601Flavor) |
@akzhan You're probably right. But because of this I would rather prefer B3 in order to provide easy access to a parser for a neatly defined specification RFC 3339 and a different parser method for the more flexible ISO 8601 formats. Having to use an enum for this is quite cumbersome. If this would be advisable, we could just use the plain |
#5123 incorporates this and has a better implementation of the formats. |
This adds format and parse methods to
Time
for the commonly used standardized time formats:2016-02-15T00:00:00+0000
Mon, 15 Feb 2016 00:00:00 -0400
The formats are defined as
Time::Format::RFC_1123
andTime::Format::RFC_3339
.A use case is already include in
HTTP::Cookie
. These are common formats for exchanging time information in information networks and parsing/formatting should be directly supported byTime
class.