Support fetching external URLs #88

Open
pdawyndt opened this issue Dec 22, 2021 · 5 comments
Labels
wontfix This will not be worked on
Comments

@pdawyndt
Contributor

pdawyndt commented Dec 22, 2021

This code should output a random haiku (scraped from a web page), but access to external URLs does not seem to work. That is a strange limitation for a Python runtime that is already running in a browser (and has BeautifulSoup on board).

from urllib.request import urlopen

# download web page with random haiku
url = 'http://haikuguy.com/issa/random.php'
reader = urlopen(url)

# parse page until start of haiku is found
marker = '<p class="english">'
line = reader.readline().decode('utf-8')
while line and not line.startswith(marker):
    line = reader.readline().decode('utf-8')

# read the three haiku lines and display them
if line.startswith(marker):
    # first line: drop the marker prefix and the trailing '<br />' (6 chars)
    print(line[len(marker):].strip()[:-6])
    line = reader.readline().decode('utf-8')
    print(line.strip()[:-6])
    line = reader.readline().decode('utf-8')
    # last line ends in '</p>' (4 chars) instead of '<br />'
    print(line.strip()[:-4])
@alexmojaki
Contributor

See pyodide/pyodide#662 and pyodide/pyodide#375. You can read a URL with pyodide.open_url or js.fetch. I don't know why urllib isn't patched to use these in Pyodide.

In any case, the browser imposes security restrictions that cause trouble for most URLs. Your example isn't served over https, so it gets blocked as mixed content; other URLs are typically blocked by CORS.
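For example, a rough sketch of reading a page with pyodide.open_url (the fallback stub is invented here so the snippet also runs outside a browser; inside Pyodide the real function fetches synchronously and returns an io.StringIO):

```python
# Hedged sketch, not a definitive recipe.
try:
    from pyodide import open_url  # only available inside a Pyodide runtime
except ImportError:
    import io

    def open_url(url):
        # Stand-in for demonstration outside the browser.
        return io.StringIO(f"<stub page for {url}>")

# Only works for https URLs whose server allows CORS (or same-origin pages).
page = open_url("https://example.com/")
print(page.read())
```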

@winniederidder
Contributor

@alexmojaki Is providing an override of urllib something you are doing in your own packages, or should I make a Papyros-specific part? It would work like the input/matplotlib handling: make the urllib functions point to e.g. pyodide.open_url calls. The most common calls could easily be covered this way.
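A rough sketch of what such an override could look like (all helper names below are made up for illustration; inside Pyodide, _fetch_via_browser would simply delegate to pyodide.open_url, whereas this standalone version returns a canned page so the sketch runs anywhere):

```python
import io
import urllib.request

def _fetch_via_browser(url):
    # Inside Pyodide this would be: return pyodide.open_url(url)
    return io.StringIO('<p class="english">a stub haiku line<br /></p>')

def _patched_urlopen(url, *args, **kwargs):
    # urlopen yields bytes, while open_url yields text, so re-encode.
    return io.BytesIO(_fetch_via_browser(url).read().encode("utf-8"))

# Repoint urllib at the browser-backed fetcher.
urllib.request.urlopen = _patched_urlopen

# The original example's read loop now works unchanged:
line = urllib.request.urlopen("http://haikuguy.com/issa/random.php").readline()
print(line.decode("utf-8"))
```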

@alexmojaki
Contributor

I have no plans for this, go ahead.

@bmesuere
Member

bmesuere commented Feb 9, 2022

@winniederidder you may add this to papyros, but note that external connections will be blocked by the CSP (Content Security Policy) on Dodona.
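For illustration, a policy along these lines (not necessarily Dodona's exact header) restricts scripts to connecting only back to the serving origin, so fetch/XMLHttpRequest calls to other hosts are refused by the browser regardless of what the Python side does:

```
Content-Security-Policy: connect-src 'self'
```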

@winniederidder
Contributor

After some studying, this issue would affect at least the following libraries: urllib, urllib3, http.client, requests, ssl, websocket-client, websockets, and more. Doing this in a maintainable way on our side is therefore not feasible; it has to be fixed upstream and is thus a limitation of Pyodide. As @bmesuere mentioned, projects integrating Papyros will have protection mechanisms limiting the accessible URLs anyway.
I will open a new issue to discuss limitations of Pyodide for things like this.

@winniederidder added the wontfix label ("This will not be worked on") May 9, 2022
@winniederidder added this to the later milestone May 9, 2022
@winniederidder changed the title from "Support online file access" to "Support accessing external URLs" May 16, 2022
@winniederidder changed the title from "Support accessing external URLs" to "Support fetching external URLs" May 16, 2022
@winniederidder modified the milestones: later, Unplanned May 16, 2022