Addressing HTTP servers over Unix domain sockets #577
Comments
It seems you don't need just addressing for this, but some kind of protocol as well. I recommend using https://wicg.io/ to see if there's interest to turn this into something more concrete. |
I'm not sure I understand why any additional protocol would be necessary. It's just HTTP over a stream socket. The server accepts connections and speaks HTTP just like it would for a TCP socket. Indeed, I can set up such a server today, and it works fine provided that the client provides a way to specify the socket, e.g., |
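To make the "it's just HTTP over a stream socket" point concrete, here is a minimal sketch using only the Python standard library. The socket location, the `Handler` and `UnixHTTPConnection` classes, and the `fetch_over_uds` and `demo` helpers are all illustrative assumptions, not anything proposed in this thread. The client has no URL syntax for the socket, so the path has to be passed out of band.

```python
import http.client
import os
import socket
import socketserver
import tempfile
import threading
from http.server import BaseHTTPRequestHandler


class Handler(BaseHTTPRequestHandler):
    """Ordinary HTTP handler; nothing here knows about the transport."""

    def do_GET(self):
        body = f"you asked for {self.path}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # The default logger assumes an (ip, port) peer address,
        # which an AF_UNIX peer does not have, so stay silent.
        pass


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP client that connects to a socket path instead of host:port."""

    def __init__(self, socket_path):
        super().__init__("localhost")  # only used for the Host header
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)


def fetch_over_uds(socket_path, target):
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", target)
        resp = conn.getresponse()
        return resp.status, resp.read().decode()
    finally:
        conn.close()


def demo():
    # Hypothetical socket location in a fresh temporary directory.
    socket_path = os.path.join(tempfile.mkdtemp(), "demo.sock")
    server = socketserver.UnixStreamServer(socket_path, Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_over_uds(socket_path, "/hello")
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` starts the server, issues one `GET /hello` over the socket, and returns the status code and body. The only transport-specific code is the `connect()` override and the choice of server class.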
I don't even understand how this is not a thing yet. Especially now that Windows started supporting AF_UNIX sockets natively, it seems to be the best, cross-platform way to connect web and native apps without consuming a TCP port. |
Let me take a step back, what exactly is the ask from the URL Standard here? |
The ask is for the URL standard to specify a syntax for referring to a page served via HTTP over a UNIX domain socket. Currently, applications that want to support connecting to an HTTP service have to pick from one of the following three:
None of these are ideal. Deciding on a standardized URL syntax allows different implementations to implement the functionality in a common, standards-compliant way. |
I see, https://wicg.io/ is the place for that. The URL standard defines the generic syntax. If you want to define the syntax for a particular URL scheme as well as behavior, you would do that in something that builds upon the URL standard. E.g., https://fetch.spec.whatwg.org/#data-urls for |
Let me rephrase: the specific ask for the URL standard is to provide an allowance in the URL syntax for specifying a UNIX domain socket, either in lieu of the port (e.g., |
I recommend using something like |
It's the same protocol over a stream socket, just a different address (i.e. the authority part). Ok, so it's a different protocol in the sense of IP, but so are IPPROTO_IP and IPPROTO_IPV6, and the URL standard doesn't treat those as different. The relevant comparison I think are address families for stream sockets, like AF_INET, AF_INET6 and AF_UNIX. Once the stream socket has been established (as specified by the authority part of the URL), HTTP software shouldn't care or even know how the stream is transported. Most invented, non-standard approaches for HTTP-over-unix-sockets seem to gravitate to something like a different scheme (since the authority part can't really be disambiguated from a hostname if relative socket paths are allowed from what I can see), like http+unix or https+unix, and then percent-encoding the socket into the authority part, and then everything works naturally from there from what I can see. I've also seen (and used) enclosing the socket path in [] in the authority part and keeping the scheme as http or https, but I think that namespace clashes with IPv6 style numeric addresses like [::1]:80. RFC 3986 (in section 3.2.2) kind of leaves space for this by anticipating future formats within the [], and providing a version prefix to disambiguate them. Overall I like this approach the best (it extends into the error space so it doesn't change the interpretation of any valid existing URL, lives in an extension space envisioned by the standard, minimally extends just the appropriate part of the standard (authority part), keeps the schemes http and https to mean "this is a resource we talk to this authority using the http(s) protocol for", and so preserves compatibility for software that uses the scheme to know what protocol to speak with the authority over the socket). |
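The `http+unix` convention mentioned above (percent-encode the socket path into the authority) can be sketched in a few lines. This is an illustration of the non-standard convention, not a specification; the helper names `build_uds_url` and `split_uds_url` and the example socket path are assumptions.

```python
from urllib.parse import quote, unquote, urlsplit


def build_uds_url(socket_path, target="/"):
    # safe="" forces "/" to be encoded as %2F so the whole socket path
    # fits inside the authority component.
    return "http+unix://" + quote(socket_path, safe="") + target


def split_uds_url(url):
    parts = urlsplit(url)
    if parts.scheme != "http+unix":
        raise ValueError("not an http+unix URL")
    # Use netloc rather than hostname: hostname is lowercased,
    # which would corrupt a case-sensitive filesystem path.
    socket_path = unquote(parts.netloc)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return socket_path, target
```

For example, `build_uds_url("/var/run/app.sock", "/status?x=1")` yields `http+unix://%2Fvar%2Frun%2Fapp.sock/status?x=1`, and `split_uds_url` recovers the socket path and the request target from it.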
Changing the syntax of URLs is not really something we're willing to do. That has a substantive cost on the overall ecosystem. The benefits would have to be tremendous. |
Syntax in
|
The strongest argument I can think of for this is: http(s) URLs have special parsing quirks which don't apply if the scheme is http+unix. So for a perfect 1:1 behaviour match, UDSs would need to use an actual http URL, not a custom scheme (similar to IP addresses). That said, I'm also not a fan of adding yet another kind of host (file paths). My preference would be to use a combination of:
This is a perfectly valid HTTP URL, and should be capable of representing any HTTP request target. Alternatively, you could try to get (Note: this would also mean that all UDS URLs have the same origin, although that could be remedied by adding a discriminator to the fake hostname to make your own zones of trust, e.g. |
I'm not sure using the fragment is really tenable for these use cases (and local web dev, especially). Many web applications use the fragment for their own purposes in JavaScript, whereas the host (at least in my experience) tends to be handled more opaquely. What would be the main drawback for allowing additional characters within [] for the host portion of an HTTP URL? |
Ah yes, you're right, it wouldn't work for local web development. I was thinking more about generic HTTP servers. The main drawbacks IMO are:
|
Yes, I think the place for the UDS socket is in the authority portion - that's the bit that has the responsibility for describing the endpoint of the stream socket to talk to for this resource. Putting it elsewhere feels like an abuse and likely to cause unforeseen problems (HTTP client software will certainly have the host portion of the URL available in the portion of the code that establishes the stream socket, but may not have the fragment). I think the namespace collision with IPv6 literals and syntax validation for UDS paths can be solved by:
It's up to the host to decode and translate the path into whatever native scheme that OS uses (just as it is for the path portion of the URI). For me the motivation for supporting HTTP over UDS goes way beyond web browsers (and I would see that as a minor use case for this) - for better or worse HTTP has become a lingua franca protocol for anything that wants to communicate on the Internet (consider websockets for some of the forces that drive this), and that is increasingly machine to machine. For example: we run an online marketplace that serves about 10 million requests a day over HTTP (excluding static resources offloaded to a CDN), but each of those involve several HTTP interactions with other services to construct the response: Elasticsearch queries, S3 to fetch image sources that are resized, etc, a whole host of REST services for shipping estimates, geocoding, ratings and reviews, federated authentication providers etc. So, by volume, the overwhelming majority of HTTP requests our webservers are party to are between them and other servers, and aren't transporting web pages. As the trend toward microservices and containerization continues this will only increase, and it's particularly there that I see HTTP-over-UDS being useful:
The other trend is for UIs to be implemented in HTML rather than some OS-native widget set (Android, iOS, GTK, QT, MacOS native controls, Windows native controls, etc), even when the application is entirely local on the user's device. There are very good reasons for this:
In this use case the hierarchical namespace issue is important and addresses a major downside to this pattern - choosing a port from the flat, system-wide shared namespace (ok, so the listening socket can specify 0 and have the OS pick a random unused port on some systems, but that's a bit ugly). Much nicer to use Finally, consider things like headless Chrome in an automated CI/CD pipeline - the software managing the tests being run on the deployment candidate version could start a number of headless chrome instances and run tests in parallel, easily addressing the websocket each provides with a UDS path like The tech already exists to make these obvious next steps in application provisioning and inter-service communication happen (even Windows supports Local sockets aka UDS), and the scope of the change for existing HTTP client software should be small and of limited scope (URL parsing, name resolution and stream socket establishment steps) but it can't happen unless there is a standardised way to address these sockets. |
What exactly is wrong with #577 (comment)? @karwa |
You ask the IETF, just like Personally, I'd go with something like:
Yes, the escaping is ugly, but it's much cleaner than overloading IPV6 in URLs. Alternatively, you might be able to get away with: |
@mnot any update on this? Was it implemented? Should this ticket be reopened? I'm also interested in this. |
I just left a comment with some context; I don't know that anything else has happened. |
I haven't read anything here that seems to justify breaking with the familiar pattern, "<protocol>://<domain>/<filepath>" or injecting a lot of special characters into the URL, or mimicking an IPv6 address. The protocol is simply "http". The domain is right there in the name, "Unix Domain Socket". Like any other top level domain - net, com, org - the domain is simply "unix". I don't know any reason that a web browser application cannot parse the domain from a URL, recognize a nonstandard domain name, and invoke a special handler for a non-network socket. The difficulty seems to be in distinguishing the path to the socket from the path to the resource file. The "HTTP with socket path as the port" option, above, makes the most sense. And since a special handler must already be invoked for this "unix domain", I expect that colons - ":" - can continue to be used as the "port" separator for the socket path. Altogether, that suggests a straightforward URL, as in: "http://unix:/var/run/server/ht.socket:/path/to/resource.html". Is there any reason that those repeating ":/" character sequences would pose a problem in a URL? This approach would not impose any limitation on the use of ":" in the resource path name, since a "unix domain" must be followed by a socket path, and that path will always be delimited by ":/". Any subsequent colons must then be part of the resource path name. And, of course, this URL format still supports specifying any arbitrary protocol, served through a unix domain socket. And there is nothing redundant or misleading in the URL, as would be the case with any format requiring the name "localhost" or involving special parameter passing. |
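As a sketch of how a special handler might split the form proposed in the comment above, the function below treats the first `:/` after the socket path as the boundary. The function name and the rule that the socket path must be absolute are assumptions made for illustration.

```python
def parse_unix_domain_url(url):
    """Split 'http://unix:<socket path>:<resource path>' into its two paths."""
    prefix = "http://unix:"
    if not url.startswith(prefix):
        raise ValueError("not a 'unix' domain URL")
    rest = url[len(prefix):]
    # The first ":/" ends the socket path; any later colons
    # belong to the resource path.
    socket_path, sep, resource = rest.partition(":/")
    if not sep or not socket_path.startswith("/"):
        raise ValueError("expected ':<absolute socket path>:/<resource>'")
    return socket_path, "/" + resource
```

With the URL from the comment, this returns `/var/run/server/ht.socket` and `/path/to/resource.html`. Note that a generic parser that has not been taught this form reads it differently: Python's `urlsplit`, for instance, reports the authority as `unix:` (host `unix`, empty port) and treats everything after it as the path.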
http+uds:///path/to/socket? |
@michael-o, that doesn't provide any means to specify the resource path, as it is putting the path to the socket where the resource path should go. |
What you absolutely don't want is the ability for any web server in the wild to use your browser to issue arbitrary HTTP requests to arbitrary Unix sockets. It is already quite difficult for people to grasp the notion that LAN-only services and localhost-only services can be attacked by remote web servers (CSRF, DNS rebinding attacks to LAN services or localhost services). If a web browser were to allow arbitrary websites to issue HTTP requests to arbitrary UNIX sockets, this would open up a wide range of attack opportunities (e.g. using DNS rebinding attacks to attack UNIX-socket-bound Docker servers), including attacks based on protocol confusion. If you wanted such a feature to be mostly safe, you would have to actively opt in:
Firefox currently allows using a SOCKS proxy over a UNIX socket (including multiple such proxies when using FoxyProxy). It would be possible to have a Unix-bound SOCKS proxy which would resolve some domain names to Unix sockets. |
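The opt-in idea can be sketched as a user-maintained table from hostnames to socket paths, where anything not listed is simply not reachable. The table contents, the `.test` hostname and the function name are hypothetical; a real proxy would also have to handle the SOCKS protocol itself.

```python
# Hypothetical user-configured mapping; only these names reach a socket.
UDS_HOSTS = {
    "docker.uds.test": "/var/run/docker.sock",
}


def socket_for_host(hostname, mapping=UDS_HOSTS):
    """Return the socket path the user opted in to, or None if unmapped."""
    # Normalise case and a trailing root dot before the lookup.
    return mapping.get(hostname.rstrip(".").lower())
```

A name such as `Docker.UDS.test.` resolves to the configured socket, while any other hostname returns `None` and would be handled as a normal network name.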
@randomstuff only because it is addressable doesn't mean it is reachable. And after all websites currently can already contain "file:///" urls or similar. |
You don't really want to put the UDS path in the URL's path, because somebody could write: <a href="/help">...</a> And that would overwrite the path to the UDS, meaning a broken link. Instead, you really want this to be part of the hostname. Hostnames are intrinsically abstract already, so there is no fundamental reason they can't resolve to a local socket. In other words, @randomstuff 's project is doing the conceptually correct thing by providing a mapping from hostnames to sockets. And perhaps most importantly, it shows that this need can be met without changing the URL standard. |
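The relative-link problem is easy to reproduce with a standard URL resolver. In this sketch both page URLs are made up: the first puts a socket path in the URL path, the second uses a hypothetical hostname that some local resolver would map to the socket.

```python
from urllib.parse import urljoin

# Hypothetical page URLs for the two designs.
PATH_STYLE_PAGE = "http://localhost/var/run/app.sock:/docs/index.html"
HOST_STYLE_PAGE = "http://app.uds.test/docs/index.html"


def resolve_link(page_url, href):
    """Resolve an href found on page_url, as a browser would."""
    return urljoin(page_url, href)
```

Resolving `/help` against `PATH_STYLE_PAGE` gives `http://localhost/help`, so the socket path is gone. Against `HOST_STYLE_PAGE` it gives `http://app.uds.test/help`, which still names the same endpoint.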
Reading back through this discussion, it has not at all been established that there is a consensus as to "where" the underlying issue should lie, and so, any "solution" offered can appear to simply "miss the point", depending upon your point of view. I find myself back-and-forth about the various approaches suggested, including my own. I can summarize at least four alternatives proposed here to the issue of, to generalize, "Addressing Unix Domain Sockets".
Without first saying which approach we are thinking about, the conversation can become kind of silly, since any solution which "works", works. Otherwise, it may be that I both enjoy, and cringe at, "bike shedding" as much as anyone else. |
For context about the pitfalls of stuffing/smuggling a Unix socket path in a HTTP URI, the Node.js Requests and got libraries would allow stuffing a Unix domain socket path in a HTTP URI like so: I would think that the ability to address arbitrary Unix domain sockets in HTTP(S) URIs is fraught with peril. If this were part of the URI standards, client applications and libraries would be expected to implement this feature and this would certainly end up generating a lot of vulnerabilities such as CVE-2022-33987: attacks on arbitrary Unix domain socket application through malicious redirects or more generally through malicious URIs. What might be useful is:
but this is really outside of the scope of the URL standard. |
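To illustrate the redirect concern, here is a hedged sketch of a client-side check that refuses to follow a redirect onto a Unix socket. It assumes two conventions seen earlier in this thread (a literal `unix` host and an `http+unix` scheme); the function name and the policy itself are illustrative, not what any particular library does.

```python
from urllib.parse import urljoin, urlsplit


def redirect_allowed(current_url, location):
    """Return False if a redirect would leave plain network HTTP(S)."""
    target = urlsplit(urljoin(current_url, location))
    if target.scheme not in ("http", "https"):
        return False  # e.g. http+unix://... is rejected outright
    if target.hostname == "unix":
        return False  # assumed 'unix' pseudo-host convention
    return True
```

A caller that deliberately wants to talk to a socket would pass the socket path explicitly rather than letting a server-supplied `Location` header choose it.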
While you have a good point it is sort of a shame to block UNIX sockets due to this. The same problems exist for local services, LAN servers (like routers) and even cloud VM metadata servers are open to vulnerabilities due to this. Really every redirect target should be carefully considered, and every DNS lookup should have the resulting IP treated with scrutiny. Unfortunately that isn't the world that we live in, developers are careless and many (most?) popular HTTP libraries don't even expose the primitives to do this. I am not aware of even a single library that prevents this by default. In practice things like However while this vulnerability is not specific to UNIX sockets it is maybe wise to avoid adding more surfaces that can be accessed via this common issue. |
Isn't this just security through obscurity? Or is the idea that the service hosting the domain socket needs to opt-in. Presumably because it has some sort of heuristics to block misdirected requests. |
Yes. One motivation of OP was access control:
However, in order to increase the security of some local application (reduction of the attack surface, rely on implicit authentication through UID and filesystem access control), this might end-up:
Some opt-in mechanism could mitigate these issues to some extent. |
While this may increase the attack surface of some services it will also decrease the attack surface of others as the original message explains. So it is important to weigh the benefits as well as consider possible mitigations that can make the tradeoffs more favourable. |
Given the ambiguity in addressing unix domain sockets, I am still inclined to fault the basic RFC 3986. So, here is a brief review, several rants, and another suggestion for unix domain socket addressing, simply using the square bracket "hack". Assuming the general concept of "Uniform Resource Identifier" from Section 1.1.3., the basic structure is defined in Section 3 as having 5 components: scheme, authority, path, query, and fragment. First off, then, what type of URI component is a unix domain socket (UDS) address? The original context here is "HTTP servers", and "http" is, itself, a type of "scheme". So, UDS as "scheme" is not my first choice. Now, RFC 3986 uses the term "resource" without much constraint, saying 'This specification does not limit the scope of what might be a resource; rather, the term "resource" is used in a general sense for whatever might be identified by a URI.' Effectively, a "resource" is whatever the user wants it to be. Is a UDS a "resource" itself? For the purpose here, "no". The "resource" implied by an HTTP server is some other specific data delivered using HTTP. Then, is a UDS a type of "path", "query", or "fragment"? From Section 3.3, "The path component contains data, usually organized in hierarchical form, that, along with data in the non-hierarchical query component (Section 3.4), serves to identify a resource within the scope of the URI's scheme and naming authority (if any)." Since the UDS is not the "resource", and, since the "path" identifies a "resource", then the UDS cannot be a "path". Similarly, from Sections 3.4. Query and 3.5 Fragment, both of these components are also references to the "resource". So the UDS is also not either a "query" or a "fragment". And that leads to the inference that the UDS must be a kind of "authority". RFC 3986 actually subdivides the "authority" component itself into three parts, in Section 3.2.:
And here, the same analysis can be applied. Is the UDS a type of "userinfo"? Section 3.2.1. says, "The userinfo subcomponent may consist of a user name and, optionally, scheme-specific information about how to gain authorization to access the resource." Hmm - "scheme-specific information about how to gain authorization to access the resource" - "how to gain authorization". Does the UDS tell "how to gain authorization"? Sort of - maybe - not really - I'd say "no". Is the UDS a type of "host"? From Section 3.2.2., "The host subcomponent of authority is identified by an IP literal encapsulated within square brackets, an IPv4 address in dotted- decimal form, or a registered name." Is, then, the UDS a type of "IP literal", "IPv4 address", or a "registered name"? Hmm - what is an "IP literal"? Again, from Section 3.2.2.:
Since a UDS is not any of an "IPv6address / IPvFuture", an "IPv4 address", or a "registered name", then "no", a UDS is also not any type of "host". And then, using RFC 3986, there is only one interpretation remaining. Is the UDS a type of "port"? From Section 3.2.3. Port:
Well, clearly, and as has been mentioned previously in this discussion, the UDS is not a "DIGIT". And here is where I find fault with RFC 3986, in its limited scope when defining "port". Except that, Section 3.2.3. goes on to say, "The type of port designated by the port number (e.g., TCP, UDP, SCTP) is defined by the URI scheme." And that statement suggests asking "What sort of Communication Protocol is UDS?" Of course a UDS is not itself a kind of communication protocol, but the relationship should become apparent. It may be more illuminating to ask the converse, "What sort of Sockets are TCP, UDP, and SCTP?" And then, the Unix - in this case Linux - man pages offer some guidance.
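For readers who want to see the component breakdown discussed above, the sketch below uses Python's `urlsplit`. It is not a strict RFC 3986 implementation, so take it as an approximation; the helper names are assumptions.

```python
from urllib.parse import urlsplit


def components(url):
    """Break a URL into the RFC 3986 components and authority subcomponents."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "userinfo": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


def port_is_accepted(url):
    """True if the parser accepts the port subcomponent of url."""
    try:
        urlsplit(url).port  # raises ValueError for a non-numeric port
        return True
    except ValueError:
        return False
```

`port_is_accepted("http://localhost:8080/")` is true, while a socket name in the port position, as in `http://localhost:ht.socket/`, is rejected because the port is not made of digits.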
And generally, "What is a 'socket'"? In part:
and then:
Here is my first rant about RFC 3986. The "port" component of the defined URI has presumed an Address Family, here implying AF_INET exclusively, along with what is a merely incidental association with a port "number". There is no explanation or justification given for this presumption. Alternatively, it might be supposed that this presumption of an Address Family is an erroneous interpretation by the reader of RFC 3986. It may instead be supposed that the "port" component of the URI is simply a general concept to be associated with any Address Family which might be included from the list given from man(2)socket. And so, I believe that this is the interpretation, while not "official", yet, that must be taken with RFC 3986. Then, "What is the 'port' subcomponent of authority of an Address Family AF_UNIX socket?" Here, man(7)unix tells us, "Traditionally, UNIX domain sockets can be either unnamed, or bound to a filesystem pathname (marked as being of type socket)." In our case, we are looking for a URI, so "unnamed" is not useful. Instead, the man page offers "a filesystem pathname". That seems clear enough. Therefore, an RFC 3986 URI "port" for an AF_UNIX socket might also be interpreted as simply "a filesystem pathname", instead of exclusively as a number. Allowing that, then the remaining problem only involves appropriate delimiters, to allow correctly parsing the resulting URI for the AF_UNIX "port". Referring again to Section 2.2.:
Incidentally, it may be noted that this RFC 3986 list of delimiters is missing the percent "%", from Section 2.1 Percent-Encoding, and the set of White Space characters generally. The reader is now well into the realm of "inferring", "guessing", and "interpreting", instead of specifically "defining". Here is my second rant about RFC 3986, related to the use of delimiters. The Section 3. URI syntax explicitly defines the ":" as separating the "scheme" from the "authority". Subsequently, in Section 3.2., it says 'The authority component is preceded by a double slash ("//") and is terminated by the next slash ("/"), question mark ("?"), or number sign ("#") character, or by the end of the URI.' Taken together, this double slash actually provides no information whatsoever in the URI and only serves to "poison" the parsing of the URI, by requiring the parser to distinguish potentially between ":///...", "://...", and ":/...". For instance, the "file" scheme, RFC 8089, supports optionally leaving out this useless "//" altogether. RFC 3986 offers no explanation or justification for this use of the double slash "//". The delimiter might as well have been defined explicitly as "://". This makes any use of the slash "/" as a delimiter in the URI potentially problematic, where it is also used as an essential component of any unix "filesystem pathname", when referring to the proposed UDS AF_UNIX "port", as well as, already, referring to an actual "resource" by pathname. A third rant regards Section 3.2.2 Host, which says:
The only reason that these square brackets are needed is because of the repeated and overloaded use of the colon ":" as a delimiter in the "authority", in Section 3.2 preceding the "port", and in Section 3.2.1, potentially subdividing the "userinfo". Considering that RFC 3513 defines the use of colon ":" as the field delimiter in an IPv6 address, this should have glaringly suggested that the same ":" would be a bad choice for a delimiter in the RFC 3986 "authority" component and subcomponents of the URI. And there are plenty of alternative characters to choose, from the small ASCII character set, for use as delimiters in the "authority". The use of the square brackets, then, is a "hack", consequent of a bad choice for delimiter in the "authority" component of the URI. Be that as it may, suppose that the prohibition "This is the only place where square bracket characters are allowed in the URI syntax", is ignored. Then, this same "hack" can be applied equally to the unfortunate choice of the slash "/" as a delimiter within the URI syntax with respect to the "port" subcomponent of the "authority", as with the "host" subcomponent. I propose now another alternative to addressing unix domain sockets. By example, using the square bracket "hack", the result would allow, for instance, all of:
All of these examples otherwise strictly follow the RFC 3986 URI syntax. That is the least intrusive "hack" to UDS addressing and merely extends an existing URI "hack". A "cleaner" revision to RFC 3986 would be to eliminate the use of either the colon ":" or the slash "/" as delimiters in the URI syntax delineating its components and subcomponents, except for the initial ":" separating the "scheme" and "authority". There are 11 other "sub-delims" defined in RFC 3986 that seem perfectly usable as delimiters in the URI "authority", which would obviate the need for using these square bracket "hacks" completely. With reference to previous remarks about security issues, it may be noted that man(7)unix describes AF_UNIX as supporting communication "between processes on the same machine", so there would be no "remote access" possible, despite the http/https "scheme", if that constraint were followed. And, since the UDS "port" is just a Unix "filesystem pathname", there are many existing security measures available. On the other hand, this suggested UDS AF_UNIX "port" addressing clearly does lend itself to replacing "localhost" with "some-remote-host", to access some UDS on, literally, a remote host. But then, any http/https "server" will be providing its own security measures, should it allow UDS addressing at all, so that's a different issue and not really a problem here. This does introduce another concept, access to a UDS by a local http/https server, as opposed to UDS access only by a local html display client. There is still the question of whether the http/https schemes would need to be formally updated to acknowledge any kind of UDS AF_UNIX "port" addressing. Reading at RFC 9110, Sections 4.2.1. http URI Scheme and 4.2.2. https URI Scheme:
By my reading, "no". The http/https schemes simply refer to the RFC 3986 URI "optional port number" definition, and would therefore follow any update to RFC 3986 itself. The much more difficult issue remains with any html display client, which must be taught to recognize any kind of UDS AF_UNIX "port" addressing. Again, strictly, that is a separate issue. But this does point out that the proposal here implies that there are two distinct "solution" arenas to confront: first, RFC 3986 itself, and second, the various de facto standard html display clients extant. The Node.js security issue mentioned by @randomstuff is - well - a Node.js security issue, as was mentioned. It's not a server security issue and has nothing to do with UDS AF_UNIX "port" addressing per se. Of course, that also doesn't mean that html display client security issues go away. It's just a separate problem - though, it's still a problem. It is interesting that this raises the question of security in the "reverse" direction, from a remote "server" potentially accessing a local "client resource", through a UDS. That is not something inherent in the original concept of http client/server communication, but a consequence of allowing the "client" to potentially act, itself, as a kind of "server", using some client facility, as with javascript, to access a local resource. The security model, then, requires simply that the client be smart enough not to do "anything stupid" at the behest of the server. Ha! |
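The address-family argument in the comment above can be observed directly in the sockets API: the same stream-socket type takes a (host, port number) pair for AF_INET and a filesystem pathname for AF_UNIX. The sketch below assumes a Unix-like system; the temporary socket name and the function name are arbitrary.

```python
import os
import socket
import tempfile


def bound_addresses():
    """Bind one TCP and one Unix stream socket and report their addresses."""
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    uds = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # AF_INET: the address is a (host, port) pair; port 0 asks the
        # OS to pick any free port number.
        tcp.bind(("127.0.0.1", 0))
        # AF_UNIX: the address is a filesystem pathname.
        path = os.path.join(tempfile.mkdtemp(), "demo.sock")
        uds.bind(path)
        return tcp.getsockname(), uds.getsockname(), path
    finally:
        tcp.close()
        uds.close()
```

The first returned value is a tuple ending in an integer port, while the second is just the pathname string. That pathname is the piece of information a URL syntax would need to carry.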
Lots of different proposals have been made above:
Changing the URL syntax requires coming up with a solution for all URLs, not just HTTP. Backwards compatibility needs to be considered for a very large ecosystem, and incremental deployment needs to be considered. As Anne said above, these factors raise the bar considerably for any proposal, and so should be a last resort (there's currently an effort by IPv6 people to do a similar thing, and it's not going well for these reasons). Creating a new TLD for one protocol isn't good architecture, and a lot of people are going to push back on it. Again, a proposal in this area is likely to hit friction from other, unrelated communities (in this case, DNS). Appending a suffix to the URL scheme implies that the suffix makes sense for other URL schemes. This means that wider review and discussion will need to take place to get it adopted. That makes defining a new URL scheme the approach that's most likely to succeed. Such a scheme could define itself to use an authority that is not grounded in DNS, so it could be something like:
Defining it as a new scheme would also provide an opportunity to answer a lot of questions like "is HTTP/1 or HTTP/2 used"? "does it use TLS"? and so on. But that's just my opinion. If there's interest in solving this problem, I'd suggest that someone write a document outlining a proposal and bring it to the IETF HTTP WG - there is a larger diversity of HTTP implementers represented there who can provide feedback. |
On reflection, I'm going to totally agree with that.
There is nothing in any of my, or several other, proposals that is specific to only the http/https "schemes", as the term is defined in RFC 3986. Again, RFC 8820, Section 2.1, "URI Schemes", strongly discourages the introduction of new "schemes". I have suggested three alternatives for - to put it generally - Address Family "port" addressing. Extending the overloaded use of the colon ":" delimiter:
Extending the square bracket hack:
Using alternate delimiters, eliminating the double slash "//", the square bracket hack "["..."]", and
More generally, any specific delimiter between RFC 3986 "authority" and "path" would solve the URI issue raised here. To illustrate, where RFC 3986 has defined:
This would instead become:
The essential problem for Address Family "port" addressing comes down to RFC 3986 failing to just define a specific delimiter between its "authority" and "path" components, or, stating this another way, failing to define a specific Section 3.3. even provides an unconvincing example of "path" while trying to "paper-over" this failure:
Why try to "shoehorn" mailto:fred@example.com into an example of "path"? "fred@example.com" looks like |
I wrote that RFC. That is not what Section 2.1 says. |
Hmm - copying the text:
and, https://www.rfc-editor.org/info/bcp35
Then, by "exceptional circumstance", you meant modifying, literally, the document BCP35 itself, and not the resulting list of registered "schemes" referencing BCP35? I stand corrected. Still, there is the problem of modifying existing, or creating new, applications able to utilize any particular scheme. I don't expect that my web browser actually supports the currently 374 different registered schemes available. In fact, the trend has been for, for instance, web browsers to drop support for less commonly used schemes - no more gopher, ftp, or mailto - with some functionality being replaced by specialized scheme applications or by "groupware" suites. I still don't agree that defining and registering a new scheme, exclusively to support html rendering from a local unix domain socket, is a good idea. Rather, that use case does serve to illuminate a deeper systemic fault in RFC 3986. I did rather like gopher, though, ... |
@mot: A while ago I already wrote to the IETF mailing lists about such a change, but they just forwarded me here. I don't remember all the details, as the whole thing started ages ago (ok, probably more like about one year), but I could try to look for these related mails. You already have seen my initial suggestion in another ticket? It would be backwards compatible by allowing for default values to be omitted. It would work with everything that currently uses the URL Schema (in a standards compliant way, at least). And it would also allow for the very verbose way of specifying all the protocols down to the wire.... |
Understand that RFC 8820 is best current practice for applications that use HTTP (what some people call "HTTP APIs" or "REST APIs") -- it's saying that it's exceptional that one of them would require a new scheme.
Browsers are going to have to change if they want to support anything that happens here, so that isn't a decisive factor regarding syntax.
HTTP isn't just for HTML. To be clear, I don't think a new scheme is the only way to do this; it's just more straightforward than other suggestions so far.
I hadn't, but that seems like a lot of work (and abstraction) to get to the goals here. Normally, protocols can negotiate transitions like this (see eg the evolution from HTTP 1-3). What's different here is that unix domain sockets have a completely different authority, and a subtly different transport (as opposed to TCP). |
Just wanted to chime in here with my own opinion on this. Unix domain sockets are an OS-specific transport. Windows has named pipes instead. Due to their local nature, simply embedding the Unix socket path or named pipe path would result in two different and somewhat incompatible representations for applications which must work on both Windows and Unix-like systems. The ideal solution would be to have some kind of alternative, ideally OS-neutral, namespace, perhaps under the IPv6 link-local or other reserved range, which maps directly to OS-specific network transports like Unix domain sockets or named pipes. Note that, contrary to what was stated above, the top-level domain or IPv6 prefix under which the Unix domain sockets or named pipes are mapped is expected to be configurable, to prevent collisions. For example, one could connect to The rationale for link-local here is that, at least on Linux, such addresses fail "closed", i.e. they will not result in any actual TCP connection if they are not recognized. The 127.0.0.0/8 range can also be used, in which case they can still leak a TCP connection, but the surface is still limited to the local host. |
AFAIU, AF_UNIX has come to Windows.
One benefit of filesystem sockets is that you can skip the numeric address part and directly map human-friendly names (virtual hostnames) into (human-friendly) paths. This way you avoid the cumbersome task of managing a mapping from human-friendly virtual hostnames to numeric addresses. |
True, I know that AF_UNIX does exist on Windows. But the idea of mapping IPv6 addressing to Unix sockets would not be limited to Unix sockets, but rather also to other stream-based TCP-like transports like
Hostnames tend to be stable; Unix domain sockets tend not to be. The same effect could be accomplished by putting one of those IP addresses into the The connection to The advantage of this mapping is that the set of allowed Unix domain sockets that could be connected to is naturally restricted to the end-user-defined mapping of IPv6 prefixes to filesystem path prefixes. Only the Unix sockets under path prefixes mentioned in a user-defined mapping would be visible to the application. (To put things another way, the |
That's what I was saying by "you don't have to manage numeric addresses". In your proposition, you would still have to maintain a (
You can achieve the same effect by directly mapping some domain names into Unix socket paths (without the intermediate IPv6 address). Note: if you map host names to special IP addresses which are mapped to Unix sockets, you then have to decide what happens when you receive one of those special IPv6 addresses from the DNS. Accepting them might be a vulnerability (and, for example, open up your service to DNS-rebinding attacks). Loopback IP addresses and private IP addresses are often filtered for this reason. |
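A minimal Python sketch of the direct hostname-to-socket mapping described in the comment above. The names `SOCKET_MAP` and `resolve_target`, the host name, and the socket path are all hypothetical and only illustrate the idea; nothing here is part of any standard or existing client.

```python
# Hypothetical, user-defined mapping from host names to Unix socket paths.
SOCKET_MAP = {
    "app.localhost": "/run/app/http.sock",
}


def resolve_target(host, socket_map):
    """Decide how a client should connect to `host`.

    The lookup happens before any DNS resolution, so an address returned
    by the DNS can never select a Unix socket.
    """
    key = host.rstrip(".").lower()  # host names are case-insensitive
    path = socket_map.get(key)
    if path is not None:
        return ("unix", path)
    # Anything not listed falls through to ordinary TCP handling.
    return ("tcp", host)
```

Because only names listed in the mapping reach a socket, the set of reachable sockets stays under the user's control.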
Sure, and I guess you might be kind of right about that. The mapping of IPv6 addressing to unix domain sockets in this manner was something that could be easily done in an LD_PRELOAD library, which ensured that it would also work with applications that resolved the domain name to an IP and connected to that IP without having to change the application. But the mapping might be a little bit more flexible because you can map multiple domains to a single IP, which can be useful for testing name based virtual hosting.
Any sane application will use security features like checking the Host header or rejecting an SSL certificate to prevent this. Besides, it doesn't need to be under fe8f::3:0:0/96, it could also be under an IPv4 loopback prefix. |
All of this is orthogonal to the problem, which is that we need a standardised way to express an http(s) URL that points to a Unix domain socket. There is no reason why an arbitrary (and legacy) difference between two arbitrary operating systems should place a limit on a standard like a URL definition. Other types of sockets on other platforms are discussions that should go under a separate issue. |
What is the next step, a draft RFC? I keep hitting this problem at https://github.com/apache/httpd; if an RFC is the way forward, I can make some time for it. |
You're right, this is the url repo on GitHub, and as such would be the place to define a URL standard for encoding a Unix domain socket path. The Unix socket paths would of course be interpreted in an OS-dependent manner, which is generally not a problem at all. Consider All of the comments above would effectively be means of encoding a file path string in place of the domain name or IP address of a URL string. This is not much different from the specification of an "interface" in a programming language like Java or Go, which generally specify what is available to a user and what functions an implementation has to implement, but they generally do not specify how they should be implemented. And as discussed above, Windows Which means that we can effectively generalize this issue from "encoding a Unix socket path" to "encoding a filesystem path like string which acts as the equivalent of a hostname, which could be interpreted in a system-dependent manner". And just like how I mentioned that
So I'm quite supportive of the issue at hand, so long as it's not restricted to AF_UNIX sockets, because if we need to support other types of transport, then we won't need to have all of this discussion again. |
But one thing I sort of need to point out in this context is still the fact that the URL standard is still an "interface". Which means that we need to differentiate between attempts to modify the "interface" of a URL by changing the URL standard, and attempts to modify the "implementation" of URLs by changing individual implementations, the former of which is merely a means of abstractly expressing a Unix domain socket or similar string in a URL with few constraints on the actual implementation, and the latter of which is the actual means of connecting to a Unix domain socket i.e. All of the above syntax proposals would effectively be modifying the "interface" of a URL to support an extra method of connecting to a Unix domain socket. In many cases, interface implementations are not required to implement every possible method, if it is known that users of the interface will not use that method, and that is certainly true in other contexts (such as the Java Collections API with immutable or unmodifiable collections used with functions that only attempt to read from the collection). Sorry, but the Liskov substitution principle is not very applicable, otherwise every client that implements URLs would have to support every single URL scheme, and that is simply infeasible. Which means that even if we do have a standard for encoding a Unix domain socket path or similar string in a URL string, we cannot guarantee that every implementation of URLs (such as in browsers) will honor it. This effectively means that many of the linked issues regarding non-support of Unix domain sockets in various clients that take in URLs might be considered to be wishful thinking, that is, even if a scheme for encoding a Unix domain socket path is devised, there is no guarantee that every app will end up supporting it. 
On the other hand, my and @randomstuff's proposals of proxy servers or LD_PRELOAD libraries to support the connection of clients to Unix domain sockets are means of changing the implementation of URLs. It is similar to adding support for a new filesystem in an operating system kernel: the new filesystem can be used by applications transparently, by referencing paths on that filesystem in file access APIs, without having to change the application, because all the different filesystems share the same interface. This is generally much more feasible to accomplish. LD_PRELOAD might not be possible for Go binaries at this moment, but this is being worked on. Ultimately, this means that the mere act of connecting to a Unix domain socket is not necessarily something that requires changing the URL standard, if it is possible to shoehorn it into some existing interface. It may seem very hacky or unsightly, but the major advantage is that client applications do not need to be changed, considering how many HTTP clients or web browsers exist in the wild. A similar issue exists in issue #392 where there is discussion on encoding an IPv6 link-local zone identifier in a URL. The mere act of connecting to an IPv6 link-local address is something that can simply be done by changing the implementation. For example, interpreting subdomains of the A more relatable example is the fact that the URL syntax did not need to be changed in order for connections to domain names to go over IPv6. If the URL standard did not have the square bracket notation, then it would have still been possible to connect to IPv6 websites on the network layer; the only limitation would have been that it would have required the use of a domain name to do so. The main reason why the URL standard ultimately did need to be changed in that case is because of the legitimate interest in connecting to IPv6 literal websites in the same way we could have done it with IPv4. |
It is often desirable to run various HTTP servers that are only locally connectable. These could be local daemons that expose an HTTP API and/or web GUI, a local dev instance of a web server, et cetera.
For these use cases, using Unix domain sockets provides two major advantages over TCP on localhost:
Indeed, due to these advantages, many servers/services already provide options for listening via a Unix domain socket rather than a local TCP port. Unfortunately, there is not currently an agreed-upon way to address such a service in a URL. As a result, clients that choose to support it end up creating their own bespoke approach (e.g., a special command-line flag, or a custom URL format), while others choose not to support it so as not to bring their URL parsing out-of-spec (among other potential concerns).
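To illustrate what such a bespoke client-side approach can look like, here is a minimal Python sketch that speaks ordinary HTTP over a Unix domain socket by overriding how the standard library's `http.client.HTTPConnection` connects. The class name `UnixHTTPConnection` and its `socket_path` parameter are assumptions made for this example, not an existing API.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix domain socket instead of TCP.

    Hypothetical helper for illustration only.
    """

    def __init__(self, socket_path, host="localhost", timeout=10):
        # `host` is only used to fill in the Host header; it is never resolved.
        super().__init__(host, timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        # Replace the usual TCP connect with an AF_UNIX stream socket.
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock
```

Usage would be along the lines of `conn = UnixHTTPConnection("/path/to/socket.sock")` followed by `conn.request("GET", "/resource")`. The socket path has to be passed out of band, which is exactly the gap a URL syntax would fill.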
Here are some of the various URL formats I've seen used or suggested:
- unix:/path/to/socket.sock. This lacks both the protocol and resource path, so it can only be used for clients that already know they'll be speaking to a specific HTTP API, and is not generally usable.
- http://localhost:[/path/to/socket.sock]/resource. Only allowed when host is localhost. Paths containing ] could either be disallowed or URL encoded.
- http+unix://%2Fpath%2Fto%2Fsocket.sock/resource. Distinct scheme allows existing http URL parsing to stay the same. URL encoding reduces read- and type-ability.
- http+unix://[/path/to/socket.sock]/resource or just http://[/path/to/socket.sock]/resource. (The latter would require using the leading / of the socket path to disambiguate from an IPv6 address.)

References:
Archived Google+ post suggesting the socket-as-port approach:
https://web.archive.org/web/20190321081447/https://plus.google.com/110699958808389605834/posts/DyoJ6W6ufET
My request for this functionality in Firefox, which sent me here:
https://bugzilla.mozilla.org/show_bug.cgi?id=1688774
Some previous discussion that was linked in the Firefox bug:
https://daniel.haxx.se/blog/2008/04/14/http-over-unix-domain-sockets/
https://bugs.chromium.org/p/chromium/issues/detail?id=451721
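For the percent-encoded `http+unix` form listed above, a minimal Python sketch of how a client might split such a URL into a socket path and a request target. The function name `split_http_unix_url` is hypothetical, and the scheme itself is only one of the proposals in this thread, not a registered or standardized one.

```python
from urllib.parse import urlsplit, unquote


def split_http_unix_url(url):
    """Split an http+unix URL into (socket_path, request_target).

    Assumes the socket path is percent-encoded in the authority component.
    """
    parts = urlsplit(url)
    if parts.scheme != "http+unix":
        raise ValueError("not an http+unix URL: %r" % url)
    # Use netloc rather than hostname: hostname would lowercase the path.
    socket_path = unquote(parts.netloc)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return socket_path, target
```

This keeps the stock URL parser unchanged, which is the advantage claimed for the distinct scheme; the cost is the unreadable `%2F` encoding.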