Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http_reply: please add support for http_reply(codes(Cs), ...) #9

Closed
triska opened this issue Jul 27, 2015 · 9 comments
Closed

http_reply: please add support for http_reply(codes(Cs), ...) #9

triska opened this issue Jul 27, 2015 · 9 comments

Comments

@triska
Copy link
Member

triska commented Jul 27, 2015

It would be very useful to reply from a list of codes using http_reply(codes(Cs), ...), especially since http_get/3 already supports reading to a list of codes. Please consider supporting this.

My immediate use case is the implementation of a simple reverse HTTP proxy: I query existing HTTP servers and relay their responses to the client. I can use a memory file for this, but would prefer to simply use a list of codes.

@JanWielemaker
Copy link
Member

I have my doubts. For example, http_get/3 uses a memory file as an intermediate for returning a list of codes. http_get/3 is basically superseded by http_open/3 Shouldn't the proxy do something along these lines (omitting details and error handling for clarity):

proxy(Request) :-
    <compute URL to fetch from>,
    http_open(Target, In, <some options>),
    <formulate header from options>,
    copy_stream_data(In, current_output),
    close(In),

@triska
Copy link
Member Author

triska commented Aug 11, 2015

Ideally, I want to reply simply with:

throw(http_reply(codes(Cs)))

So, if possible, I would like to avoid using copy_stream_data/2 and close/1: I hope that it is possible to channel keep-alive connections through the proxy? Closing the stream would prevent this.

@JanWielemaker
Copy link
Member

Using throw(http_reply(codes(Cs))) is a really inefficient approach. http_get/3 first makes a memory file, then creates a code list (which is a big blow up), then does a throw, which copies the code list, then needs to compute the content length in bytes (as the codes may contain non-ASCII, so we need to do multi-byte encoding) and then writes the whole lot. I think that is not something we should aim at.

Using the approach I proposed copies a lot less and even allows passing the data as it arrives if we use chunked encoding or a closed connection. Keep-alive is indeed not provided by the current http_open/3, but there is no real reason why this is not the case. It can use the same connection reuse as http_get/3, returning a range stream that will simply generate an end-of-file after the specified number of characters are read. The primitives for that are already in the HTTP package.

@triska
Copy link
Member Author

triska commented Oct 29, 2015

One requirement I also have: The proxy cannot just blindly copy everything that the target URL emits. Redirections are an example where rewriting of the emitted content is necessary, because the target web server emits URL redirections from its own point of view, yet the new target goals need to make sense in terms of the proxy's view, where the target service can be made available under a different umbrella URL, and redirections must match that.

In #11, you suggest the new option bytes/2, which makes a lot of sense. I guess that would also solve this issue.

@triska
Copy link
Member Author

triska commented Nov 3, 2015

I have now tried the copy_stream_data/2 method you show above. One issue I found: It does not faithfully relay the Content-type header field. I have a situation where a target server replies to the client — and therefore also to the reverse proxy — with:

HTTP/1.1 200 OK
Date: Tue, 03 Nov 2015 22:29:45 GMT
Last-modified: Tue, 03 Nov 2015 21:37:00 GMT
Connection: close
Content-Length: 372
Content-Type: text/html

The reverse proxy parses this reply correctly: I see from debug messages that content_type('text/html') is in the headers. The reverse proxy should imitate the target server and should therefore emit the exact same content type header field in its own HTTP handler of the client request.

But alas, invariably the charset UTF-8 is appended to the resulting content type, so the client is sent a different header when accessing the target via the proxy. This does not happen in the proxy code itself (see below for the relevant snippet), but happens somewhere in the libraries. When I telnet to the reverse proxy and ask it to deliver the same file as above, I get:

HTTP/1.1 200 OK
Date: Tue, 03 Nov 2015 22:31:55 GMT
Content-Type: text/html; charset=UTF-8
Connection: close
Content-Length: 376

and somewhere in this translation the UTF-8 encoded characters seem to be accidentally encoded twice.

The main code snippet I use to relay the message from the target server to the client is:

handle_request(Request) :-
        memberchk(request_uri(URI), Request),
        atomic_list_concat(['http://127.0.0.1:4041',URI], Target),
        memberchk(method(get), Request),
        http_open(Target, In, [headers(Hs)]),
        memberchk(content_type(Type), Hs),
        format("Content-type: ~w~n~n", [Type]),
        copy_stream_data(In, current_output),
        close(In).

I guess I need to set a stream option so that the content type is not changed?

The rudimentary reverse proxy implementation is available as Proloxy. Wouter and you now have write access, please feel free to try any changes if you want to experiment with this. In my experience, the reverse proxy helps to catch many edge cases regarding encodings and missing options to handle different use cases of the HTTP libraries.

@JanWielemaker
Copy link
Member

There is probably no clean way out. The contract is that text in Prolog is Unicode and if you need to send it it will be sent as UTF-8, being the only encoding that unambiguously represents all of Unicode, is supported by Prolog as well as all at least somewhat maintained clients.

You want to formulate a request using bytes with an unknown encoding that you claim to be text. The correct approach in the HTTP framework would be to discover the encoding of the source and translate it to Unicode (and consequently write the output as UTF-8). The only way out is to dump bytes and claim some content type for them is using the throw(http_reply(...)) interface.

I added bytes(Type, Bytes) to the supported data types, where Bytes is either a string or a list of 0..255 integers. Using strings avoid copying the data during the exception handling! Here is a working
proxy. Tried to as a proxy for SWISH which uses data with several encodings as well as POST messages.

:- use_module(library(http/http_open)).
:- use_module(library(http/http_client)).
:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).

server(Port) :-
        http_server(http_dispatch,
                    [ port(Port)
                    ]).

:- http_handler(/, proxy('http://localhost:3050'), [prefix]).

proxy(To, Request) :-
        memberchk(method(Method), Request),
        proxy(Method, To, Request).

proxy(Method, To, Request) :-
        data_method(Method), !,
        read_data(Request, Data),
        memberchk(request_uri(URI), Request),
        atomic_list_concat([To,URI], Target),
        http_open(Target, In,
                  [ method(Method),
                    post(Data),
                    header(content_type, ContentType)
                  ]),
        call_cleanup(
            read_string(In, _, Bytes),
            close(In)),
        throw(http_reply(bytes(ContentType, Bytes))).
proxy(Method, To, Request) :-
        memberchk(request_uri(URI), Request),
        atomic_list_concat([To,URI], Target),
        http_open(Target, In,
                  [ method(Method),
                    header(content_type, ContentType)
                  ]),
        call_cleanup(
            read_string(In, _, Bytes),
            close(In)),
        throw(http_reply(bytes(ContentType, Bytes))).

read_data(Request, bytes(ContentType, Bytes)) :-
        memberchk(input(In), Request),
        memberchk(content_type(ContentType), Request),
        (   memberchk(content_length(Len), Request)
        ->  read_string(In, Len, Bytes)
        ;   read_string(In, _, Bytes)
        ).

data_method(post).
data_method(put).

@triska
Copy link
Member Author

triska commented Nov 7, 2015

Many thanks Jan, this is a truly awesome improvement!

Could you please also account for this in the logging infrastructure, so that we see more compact log entries when bytes are emitted? I got it working by adding the following at the end of http_log.pl:

map_exception(http_reply(bytes(ContentType,Bytes),_), bytes(ContentType,L)) :-
        string_length(Bytes, L).

Otherwise, we see lots of error("Unknown message: http_reply(bytes(....))") terms in the log files. With this change, we see terms of the form:

completed(1, 0.001994, 8, 200, bytes('text/plain',8)).

@JanWielemaker
Copy link
Member

Added the log change. Dunno about the https thing. You'll have to understand the exact traffic and
the rules used by browsers when to use an HTTP request. Apparently something in the message
tells the browser to upgrade to HTTPS. In any case, please keep topics separated.

@triska
Copy link
Member Author

triska commented Nov 30, 2015

Thank you for adding bytes/2!

@triska triska closed this as completed Nov 30, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants