-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Symbol Graph Format Extensible #39
Make Symbol Graph Format Extensible #39
Conversation
67877ce
to
22adbd3
Compare
22adbd3
to
d735e76
Compare
3e3f3fa
to
d0b5e41
Compare
Hi @theMomax, I haven't reviewed these changes in detail yet, but I'm curious about these:
I'm not sure I follow why these are required for swiftlang/swift-docc#210, can you elaborate? Are we expecting clients like Swift-DocC to declare their own kind identifiers? |
Thanks, @franklinsch, these are very valid questions.
Actually, none of those are strictly required by swiftlang/swift-docc#335 to resolve swiftlang/swift-docc#210. However, before I get into detail on this, let me first answer your second question:
Yes, we do. This corresponds to the first point:
My PR swiftlang/swift-docc#335 makes use of this by declaring five new kind identifiers. This change is strictly required as the symbols with the new kind identifiers have to be processed and displayed differently from other symbols (see e.g. here). Now, coming back to your first point about why the lower three points are required. I see them as a pure consequence of
If we allow users of SymbolKit to define new symbol kind identifiers, these custom identifiers should be treated exactly like the predefined ones. If I hadn't introduced this change, parsing would behave like here in #7. This behavior from #7 could lead to a lot of trouble with the KindIdentifier's Equatable conformance, which simply compares the identifier string. I.e. the declared
I definitely expected Symbol/Relationship to consider custom mixins that conform to Mixins are an awesome way to attach additional information to a symbol graph, which could be very interesting for many applications. For example, imagine a tool that attaches authorship information or other git metadata for automated generation of change-protocols/differential documentation. If, however, custom mixins are not considered in the encoding/decoding process, then these applications have no other choice than maintaining a fork of SymbolKit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the summary above, that's super helpful. I agree that the work you're doing is a good opportunity to make the symbol graph coding logic more flexible.
I think the extension kinds you defined in swiftlang/swift-docc#335 could also belong directly in SymbolKit though, since they map well to existing kinds. I'd love to hear more of your thoughts about why you preferred the approach of declaring them in Swift-DocC instead. Should some existing kinds and mixins move to Swift-DocC and should SymbolKit treat them as opaque values instead?
I really appreciate how you've clearly spent time designing an API that makes it easy for clients to define their own mixins and kinds. I have some thoughts on how we could possibly simplify this code, as the extra flexibility it offers does result in more complexity.
/// - Note: When working in an uncontrolled environment where other parts of your executable might be disturbed by your | ||
/// modifications to the symbol graph structure, register identifiers on your coder instead using | ||
/// ``CustomizableCoder/register(symbolKinds:)`` and maintain your own list of ``allCases``. | ||
public static func register(_ identifiers: Self...) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Registering these at the static level on this model seems odd to me from an architecture perspective. It limits how SymbolKit can be used within the same client, e.g., if you're decoding symbol graphs from two different languages that use different kinds within the same client. Is there a reason we can't just parse the value as a string and just wrap it in a KindIdentifier
value, without requiring clients to configure something ahead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Registering these at the static level on this model seems odd to me from an architecture perspective. It limits how SymbolKit can be used within the same client, e.g., if you're decoding symbol graphs from two different languages that use different kinds within the same client.
That's exactly the reason why I provide the alternative of registering the custom kinds to the Encoder/Decoder. However, I think that working with the static context is easier, if working in a controlled environment where there is only one "view" of the symbol graph format. KindIdentifier
provides static functionality such as allCases
, and I think this functionality should still work when extending the graph format.
Is there a reason we can't just parse the value as a string and just wrap it in a KindIdentifier value, without requiring clients to configure something ahead?
There might be some consequences for memory allocation as already discussed in #7, but I think the biggest problem is the Equatable
conformance. For known kinds, SymbolKit removes the language specifier prefix, i.e. swift.enum
is recognized as just enum
and stored this way. Therefore, KindIdentifier(rawValue: "enum") == KindIdentifier(identifier: "swift.enum")
yields true (note that the right initializer is used for decoding, whereas the left one is used for declaring the valid kinds in code). This is the behavior we want, as the language tag is handled elsewhere.
If we don't know all valid kinds (because we don't allow custom kinds to be registered), we can't just throw away the prefix, as it might be important (as for enum.case
). Thus, swift.custom
must be stored as swift.custom
and KindIdentifier(rawValue: "custom") == KindIdentifier(identifier: "swift.custom")
yields false. This is very bad, if one wants to handle the kinds in a language agnostic way (as it is currently done in Swift-DocC).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two comments here:
- This implementation of
register(_:)
as written will create data races in Swift-DocC due to the way it loads symbol graphs and symbols concurrently. This is why change KindIdentifier into a struct to handle unknown symbol kinds #7 originally had its set synchronized, and ultimately did away with the set entirely. The fact that these are registered separately from decoding sidesteps the issue somewhat, but doesn't prevent a data race if registering these symbol kinds on-demand. - I briefly looked into this a while back, but is it possible to register the symbol's "current language" to the decoder before decoding the symbol kind while it's being parsed, then storing that language token to
KindIdentifier
as well as the parsed kind if it's read in the raw identifier? That way, we can reliably detect thatswift.class
andobjc.class
are the same kind, while also letting new languages participate in fully-parsed symbol kinds without needing to conform to SymbolKit's existing listing. This could also be an enhancement that happens later on, but i'd be curious to see whether we can make it work.
let encoder: ((Self, Mixin, inout KeyedEncodingContainer<CodingKeys>) throws -> ())? | ||
let decoder: ((Self, KeyedDecodingContainer<CodingKeys>) throws -> Mixin?)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be storing these in this model. Also, I can see why you've made these closures for type-erasure purposes, but could we instead just keep a mapping from keys to the types they encode to/decode from, such as a [String: Mixin.Type]
dictionary? Then, we'd look up the mixin type for a given key and use that for decoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where would you like them to be stored? I can only build the encoder
/decoder
closures from inside an extension to Mixin
, otherwise I cannot access the Self
I need for calling container.decode
/container.encode
.
I can see why you've made these closures for type-erasure purposes, but could we instead just keep a mapping from keys to the types they encode to/decode from, such as a [String: Mixin.Type] dictionary? Then, we'd look up the mixin type for a given key and use that for decoding.
I don't think that's possible with our target language version (might be possible from Swift 5.6):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not seeing the error you're seeing with Swift 5.7. It's worth trying with 5.6 as well, but we should consider bumping the Swift version to 5.6 here, or guarding this code in an #if swift(>=5.7)
check. It would remove the need for the type-erasing closures, and we can document that custom mixins are only available in 5.6/5.7. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would definitely be possible. I'd prefer to make a clean cut, though. If we want to go this route, I'd prefer to bump the language version for SymbolKit (and all of apple/swift-docc) to 5.7 and make Mixin
Hashable
and Codable
in one move. I think it would be really weird if bumping the language version to 5.7 in a project using SymbolKit would cause it to behave entirely different.
// However, if this `Mixin` does not conform to `Equatable`, | ||
// the `maybeEquatable` merely checks that the compared elements' | ||
// types match. | ||
var maybeEquatable: MaybeEquatable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These mechanisms to implement equality and hashing are ingenious, but I think they're overly-complicated for this use case. If Relationship
is Equatable but some of its members are not (e.g., some mixins), then maybe these should not contribute to equality checking of Relationship
. It would be the client's responsibility to verify equality for their custom mixins that are Equatable. I'm leaning towards SymbolKit checking equality of Equatable mixins defined in SymbolKit, and to document that clients should do equality checking for their own mixins if required.
What is the purpose of Relationship
and Symbol
being Equatable anyway? cc @QuietMisdreavus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relationship
is Hashable so that Swift-DocC can build sets of them when resolving external symbols. Equatable is there because Hashable requires it. As far as i can tell, Symbol
does not implement either of these protocols (though now it could, given that there's a mechanism for Mixin
to be hashed/equated).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm leaning towards SymbolKit checking equality of Equatable mixins defined in SymbolKit, and to document that clients should do equality checking for their own mixins if required.
I disagree. I think this would make handling Symbols/Relationships very annoying for clients with custom mixins. E.g. if you wanted to use the types as a key in a Dictionary, you'd always have to wrap them in a type that implements the additional equality constraints.
What would be a situation where you think my approach would be problematic @franklinsch? Or is this more about code cleanliness/performance concerns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The added flexibility is nice, my concern is more about SymbolKit having an internal expectation that mixins are Equatable when in fact that's not enforced by the type system. Should we just make Mixin
Equatable and Hashable, then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with making Mixin
Equatable and Hashable is that it restricts us to Swift 5.7 - before then there's no way to use Mixin generically since the Equatable definition needs to reference the type in question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my concern is more about SymbolKit having an internal expectation that mixins are Equatable when in fact that's not enforced by the type system
SymbolKit can rightfully make the assumption that everything important for equality will be considered in Symbol
/Relationship
's Equatable
conformance. SymbolKit knows for itself all predefined Mixins are Equatable
and the rest is up to those defining custom Mixins.
Should we just make Mixin Equatable and Hashable, then?
If we were to do that, Mixin
would get "self or associated type requirements", which (pre Swift 5.7) would break A LOT of source code. I'd only do that if we also bump SymbolKit's minimum language version to 5.7.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not opposed to bumping SymbolKit to 5.7 on the main
branch, which tracks the minor/major release after 5.7. The SymbolKit API is still evolving, and clients that need to use <5.7 can depend on SymbolKit's release/5.7
branch instead. Also, because SymbolKit is (mostly) a model library that evolves alongside the Swift compiler, it's not unreasonable to align SymbolKit's compiler version requirements with the toolchain version, even though of course, backwards compatibility wherever we can would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that would also mean bumping the language version for all of apple/swift-docc to 5.7 then? Don't get me wrong, I'm absolutely not opposed to adopting Swift 5.7 - after all, it would be a huge quality of life improvement for all maintainers. I'm just not sure if getting rid of this (admittedly rather complex and - consequently - very ugly) type-erasure code justifies such a huge breakage.
Sources/SymbolKit/Mixin.swift
Outdated
/// | ||
/// Next to logging warnings, the function allows for either re-throwing the error, | ||
/// skipping the errornous entry, or providing a default value. | ||
static func onDecodingError(_ error: Error) throws -> Mixin? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem like it belongs on the Mixin model type. It's purely related to the coding process of the model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could also implement this using the userInfo
on Encoder
/Decoder
meaning that one would have to also register
this behavior for custom Mixins. Does that sound good?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would be better from an API perspective, yes. Could it be registered at the same time as the mixin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sure. I'll try it that way!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved in b93d460.
I'm gradually reading through this; it'll take some time to digest! What i've read so far is pretty impressive. There are a couple issues i remember running into that are fixed by this PR. |
@swift-ci Please test |
I realized I had forgotten to answer this @franklinsch : To put it short, citing the SymbolKit Readme, SymbolKit is The specification and reference model for the Symbol Graph File Format. The types I add in swiftlang/swift-docc#335 are not part of the Symbol Graph File Format. I use them to bring the graph in a different structure that better fits DocC's document structure. However, this is an implementation detail of Swift-DocC and is irrelevant for the Swift compiler or other language integrations. |
17608f7
to
590659a
Compare
I just cleaned up the code again after moving the onDecodingError/onEncodingError functions out of the Mixin protocol. Since I needed to store said error handlers as closures anyway, I merged them with the type-erasure-closures for encoding/decoding. I feel like this looks way cleaner than before and makes me think again if we can avoid bumping the project to Swift 5.7. Please take another look at how it's implemented now. Of course, there still is the |
This looks much cleaner indeed, thanks for doing that! Regarding the rest of the type-erasure code, I'm personally ok with bumping to 5.7 on the
Is it an inconvenience? I expect most Mixins to be model types that would be able to get the synthesised Equatable/Hashable requirements. |
The The bigger inconvenience is that - even in Swift 5.7 - one would have to add a lot of Therefore, I suggest to do either of the following:
In both cases I could start two follow up PRs on this repo and apple/swift-docc that add the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some thoughts about the PR and its discussion so far:
I'm fine with keeping the type-erasure code in SymbolKit, though (as i mention below) i would prefer if it were refactored into its own file(s) and more thoroughly documented. It's clever, which is a good thing and a bad thing, but it also resolves an issue with SymbolKit's maintenance - that new mixins need to be accommodated in their parent types' corresponding Equatable/Hashable implementations.
I think updating SymbolKit and Swift-DocC to Swift 5.7 (and other versions in the future) is fine, specifically because of how tied they are to the Swift compiler itself. SymbolKit is effectively the model library for handling symbol graphs as they appear from the compiler. If we add something to the compiler, there's a corresponding change here. Similarly for Swift-DocC, since it's distributed with the Swift toolchains, there's less of an impetus to keep a strict backwards-compatibility story. In both of these cases, people who need to accommodate older compilers can use release branches with little loss of functionality. Specifically in SymbolKit's case, having the model match what you're expecting to parse is important - it's better to have exactly the information you need, instead of that information plus a bunch of empty fields and properties where future data would have gone. I also agree that that enhancement can come later on, to keep the source churn per-PR down.
Overall, i really like this PR. Great work!
/// - Note: When working in an uncontrolled environment where other parts of your executable might be disturbed by your | ||
/// modifications to the symbol graph structure, register identifiers on your coder instead using | ||
/// ``CustomizableCoder/register(symbolKinds:)`` and maintain your own list of ``allCases``. | ||
public static func register(_ identifiers: Self...) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two comments here:
- This implementation of
register(_:)
as written will create data races in Swift-DocC due to the way it loads symbol graphs and symbols concurrently. This is why change KindIdentifier into a struct to handle unknown symbol kinds #7 originally had its set synchronized, and ultimately did away with the set entirely. The fact that these are registered separately from decoding sidesteps the issue somewhat, but doesn't prevent a data race if registering these symbol kinds on-demand. - I briefly looked into this a while back, but is it possible to register the symbol's "current language" to the decoder before decoding the symbol kind while it's being parsed, then storing that language token to
KindIdentifier
as well as the parsed kind if it's read in the raw identifier? That way, we can reliably detect thatswift.class
andobjc.class
are the same kind, while also letting new languages participate in fully-parsed symbol kinds without needing to conform to SymbolKit's existing listing. This could also be an enhancement that happens later on, but i'd be curious to see whether we can make it work.
…nd make graph format extensible - add symbol and relationship kinds introduced in apple/swift #59047 (swiftlang/swift#59047) - change Symbol.KindIdentifier to use a struct-type with static properties for known values - change Codable conformance of Symbol and Relationship to allow for encoding and decoding unknown Mixins
…tering unknown Mixin types
…ion + unify error handling closures and coding closures
c2b8b54
to
9c622f0
Compare
I moved the code to separate files and added a detailed explanation. I changed the behavior a little. Previously, mixins that did not conform to
Now, mixins that are not The previous behavior can still be achieved by providing an "empty"
Yes, however, as you already pointed out, it is not intended to be used "on demand". I added a respective note about concurrent access to the documentation. If one wants to register kinds on demand in a concurrent decoding process, one can still do so by registering them on the decoder in a synchronized manner.
That is an interesting idea! Would the "current language" have to be set manually based on the file that is decoded? I think that would be rather annoying. One could also use the // Symbol.swift
public init(from decoder: Decoder) throws {
let container = try decoder.container(keyedBy: CodingKeys.self)
identifier = try container.decode(Identifier.self, forKey: .identifier)
kind = try container.decode(Kind.self, forKey: .kind)
kind.identifier = kind.identifier.trimming(identifier.interfaceLanguage) I'm not sure if the assumption |
The idea i had in mind (that i'm not sure is even possible) is to have the |
I think that is not possible while using What I dislike about this approach is that decoding behavior changes when decoding a |
@QuietMisdreavus any remaining feedback here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good to go now. Thanks so much!
@swift-ci Please test |
This PR is required to allow SwiftDocC to restructure the Symbol Graph Format after decoding the Symbol Graph Files. This is necessary to implement extensions to external types (fix swiftlang/swift-docc#210).
Summary
This PR contains four major changes:
Please refer to SymbolKit.docc/GraphFormatExtensions.md to see how the new APIs are to be used.
The most critical part of this implementation is how to handle decoding (and encoding) of custom
Symbol.KindIdentifier
s andMixin
s.Symbol
(andRelationship
's) Codable implementations must know how to encode/decode the custom types. This PR solves this problem by having the instance defining the custom extensions register the extensions to the Encoder/Decoder. This is realized viaEncoder
/Decoder
'suserInfo
. This choice has two key advantages over registering the extensions to the static context:Next to the user info based approach, this PR also offers the option to register
Symbol.KindIdentifier
to thestatic
context for convenience.#7 only provided a solution for
Symbol.KindIdentifier
which did not register custom identifiers in a global context, but stored the raw value in theSymbol.KindIdentifier
locally. This, however, has one disadvantage: Custom identifiers are parsed differently than those defined by SymbolKit. The language prefix cannot be removed, since the initializer cannot know, if the first section of the identifier is a language, or already important for the language-agnostic identifier.Dependencies
This PR should be merged after #45 as this PR is based on that branch.
Furthermore, this PR incorporates similar changes as #7 which seems to be stale.
Testing
I added test cases that:
Benchmark
Benchmarked this PR (e6478fd) against main (cfa80d7) using Swift-DocC's
bin/benchmark.swift
. Swift-DocC was on main (ed1770f974c3a1c6d2d1437cc07bc90a8f220c89). I used a bundle generated usingswift run make-test-bundle --sizeFactor 50
.There is no significant change in runtime:
Checklist
Make sure you check off the following items. If they cannot be completed, provide a reason.
./bin/test
script and it succeeded