Reconsider HttpClient interface #874
Comments
I like the JS version, but given that many HTTP clients won't support streaming, or it will be challenging to implement, I think we could have a default implementation of the I am not sure why there is that
Mostly historical reasons: it comes from the first attempt at implementing HTTP clients. The
I like the refactored version in JS as well and agree about streaming. IMO it's acceptable if not all clients support streaming. A default implementation of the stream method in the base class, with a warning when it's not fully supported, would be an acceptable fallback.
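A minimal sketch of that fallback, assuming a simplified interface: the class and method names below are hypothetical and reduced to URL-in, bytes-out, not the actual crawlee signatures. The base class only requires send_request, and the default stream buffers the body and emits a warning.

```python
import asyncio
import warnings
from abc import ABC, abstractmethod
from collections.abc import AsyncIterator


class BaseHttpClient(ABC):
    """Hypothetical base class; not the actual crawlee interface."""

    @abstractmethod
    async def send_request(self, url: str) -> bytes:
        """Fetch the whole response body into memory."""

    async def stream(self, url: str) -> AsyncIterator[bytes]:
        """Default fallback: buffer the body, then yield it as one chunk."""
        warnings.warn(
            f"{type(self).__name__} does not support real streaming; "
            "the whole body is buffered in memory.",
            RuntimeWarning,
            stacklevel=2,
        )
        yield await self.send_request(url)


class InMemoryClient(BaseHttpClient):
    """Toy client that serves canned bodies instead of doing network I/O."""

    async def send_request(self, url: str) -> bytes:
        return f"body of {url}".encode()


async def collect(client: BaseHttpClient, url: str) -> list[bytes]:
    """Gather every chunk produced by the client's stream method."""
    return [chunk async for chunk in client.stream(url)]
```

A client that can stream properly would override stream and never hit the warning; one that cannot still satisfies the interface, at the cost of holding the full body in memory.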
This is preferable in the arguably common case when streaming cannot be implemented. I can't shake the feeling that stream-based implementations should be the default because it's easy to read a stream into memory when needed, but doing it the other way around nullifies the benefit of streams. Plus it's not nice to require people to implement both a streaming and a non-streaming version.
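The stream-first alternative could be sketched as follows. Again, all names are hypothetical and the signatures are simplified for illustration: here stream is the only method an implementer must provide, and send_request is derived from it by reading the stream into memory.

```python
import asyncio
from abc import ABC, abstractmethod
from collections.abc import AsyncIterator


class StreamFirstHttpClient(ABC):
    """Hypothetical base class where streaming is the only required method."""

    @abstractmethod
    def stream(self, url: str) -> AsyncIterator[bytes]:
        """Yield the response body chunk by chunk."""

    async def send_request(self, url: str) -> bytes:
        """Derived from stream: read every chunk into memory and join them."""
        return b"".join([chunk async for chunk in self.stream(url)])


class ChunkedClient(StreamFirstHttpClient):
    """Toy client that pretends the body arrives in fixed-size chunks."""

    def __init__(self, body: bytes, chunk_size: int = 4) -> None:
        self._body = body
        self._chunk_size = chunk_size

    async def stream(self, url: str) -> AsyncIterator[bytes]:
        for start in range(0, len(self._body), self._chunk_size):
            await asyncio.sleep(0)  # stand-in for waiting on the network
            yield self._body[start:start + self._chunk_size]
```

With this shape, implementers write one method and get the buffered variant without extra work, but a client that genuinely cannot stream has no honest way to implement the abstract method, which is the trade-off discussed above.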
As of now, the Python BaseHttpClient looks like this:
crawlee-python/src/crawlee/http_clients/_base.py, line 55 in beac9fa
It has two methods, send_request and crawl. This is the first iteration of decoupled HTTP clients.
Later on, we refactored the JS version to use this one:
https://github.com/apify/crawlee/blob/f912b8b06da2bc4f3f3db508cc39c936a5c87f23/packages/core/src/http_clients/base-http-client.ts#L179
It also has two methods, sendRequest and stream. Unlike the Python version, the signatures of those methods match quite well. It is worth noting that the two serious attempts at implementing this interface (so far) both couldn't manage to implement stream correctly. Although we could probably live without it in the most common case, streaming is paramount for downloading files (potentially large ones), which is a use case that we want to support.
We should simplify this interface and make it look the same in both versions. Any thoughts on how to achieve that? @vdusek @B4nan @barjin... and whoever else wants to chat 🙂