#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# The MIT License (MIT)
#
# Copyright (c) 2017 William Forde
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of
# this software and associated documentation files (the "Software"), to deal in
# the Software without restriction, including without limitation the rights to
# use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
# the Software, and to permit persons to whom the Software is furnished to do so,
# subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
# FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
# COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
# IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
"""
Urlquick
--------
A lightweight HTTP client with a requests-like interface, featuring persistent connections and caching support.
This project was originally created for use by Kodi add-ons, but has grown into something more.
I found that while requests has a very nice interface, there is a noticeable lag when importing the library.
The other option is urllib2, but then you lose the benefit of the persistent connections that requests
provides. Hence this project.
All GET, HEAD and POST requests are cached locally for a period of 4 hours. When the cache expires,
conditional headers such as "ETag" and "Last-Modified" are added to the new request. If the server then
returns a 304 Not Modified response, the cached body is reused, saving a re-download of the content.
Inspired by: urlfetch & requests
urlfetch: https://github.com/ifduyue/urlfetch
requests: http://docs.python-requests.org/en/master/
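Example usage (an illustrative sketch; the URLs are only placeholders):
>>> import urlquick
>>> resp = urlquick.get("https://httpbin.org/get", max_age=3600)
>>> resp = urlquick.post("https://httpbin.org/post", json={"name": "kodi"})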
"""
__all__ = ["request", "get", "head", "post", "put", "patch", "delete", "cache_cleanup", "Session"]
__author__ = "William Forde"
__version__ = "0.9.3"
# Standard library imports
from collections import MutableMapping, defaultdict
from codecs import open as _open, getencoder
from base64 import b64encode, b64decode
from datetime import datetime
import json as _json
import logging
import hashlib
import socket
import time
import zlib
import ssl
import sys
import re
import os
# Check the Python version to select the correct imports and define the object used to detect non-unicode strings
py3 = sys.version_info >= (3, 0)
if py3:
# noinspection PyUnresolvedReferences
from http.client import HTTPConnection, HTTPSConnection, HTTPException
# noinspection PyUnresolvedReferences
from urllib.parse import urlsplit, urlunsplit, urljoin, SplitResult, urlencode, parse_qsl, quote, unquote
# noinspection PyUnresolvedReferences
from http.cookies import SimpleCookie
# noinspection PyShadowingBuiltins
unicode = str
CACHE_LOCATION = os.getcwd()
else:
# noinspection PyUnresolvedReferences, PyCompatibility
from httplib import HTTPConnection, HTTPSConnection, HTTPException
# noinspection PyUnresolvedReferences, PyCompatibility
from urlparse import urlsplit, urlunsplit, urljoin, SplitResult, parse_qsl as _parse_qsl
# noinspection PyUnresolvedReferences, PyCompatibility
from urllib import urlencode as _urlencode, quote as _quote, unquote as _unquote
# noinspection PyUnresolvedReferences, PyCompatibility
from Cookie import SimpleCookie
CACHE_LOCATION = os.getcwdu()
def quote(data, safe=b"/", encoding="utf8", errors="strict"):
data = data.encode(encoding, errors)
return _quote(data, safe).decode("ascii")
def unquote(data, encoding="utf-8", errors="replace"):
data = data.encode("ascii", errors)
return _unquote(data).decode(encoding, errors)
def parse_qsl(qs, encoding="utf8", errors="replace", **kwargs):
qs = qs.encode(encoding, errors)
qsl = _parse_qsl(qs, **kwargs)
return [(key.decode(encoding, errors), value.decode(encoding, errors)) for key, value in qsl]
def urlencode(query, doseq=False, encoding="utf8", errors=""):
# Fetch items as a tuple of (key, value)
items = query.items() if hasattr(query, "items") else query
new_query = []
# Process the items and encode unicode strings
for key, value in items:
key = key.encode(encoding, errors)
if isinstance(value, (list, tuple)):
value = [_value.encode(encoding, errors) for _value in value]
else:
value = value.encode(encoding, errors)
new_query.append((key, value))
# Decode the output of urlencode back into unicode and return
return _urlencode(new_query, doseq).decode("ascii")
# Cacheable request types
CACHEABLE_METHODS = (u"GET", u"HEAD", u"POST")
CACHEABLE_CODES = (200, 203, 204, 300, 301, 302, 303, 307, 308, 410, 414)
REDIRECT_CODES = (301, 302, 303, 307, 308)
#: The default max age of the cache in seconds, used when no max age is given in a request.
MAX_AGE = 14400 # 4 Hours
# Unique logger for this module
logger = logging.getLogger("urlquick")
class UrlError(IOError):
"""Base exception. All exceptions and errors will subclass from this."""
class Timeout(UrlError):
"""Request timed out."""
class MaxRedirects(UrlError):
"""Too many redirects."""
class ContentError(UrlError):
"""Failed to decode content."""
class ConnError(UrlError):
"""A Connection error occurred."""
class SSLError(ConnError):
"""An SSL error occurred."""
class HTTPError(UrlError):
"""Raised when HTTP error occurs."""
def __init__(self, url, code, msg, hdrs):
self.code = code
self.msg = msg
self.hdrs = hdrs
self.filename = url
def __str__(self):
error_type = "Client" if self.code < 500 else "Server"
return "HTTP {} Error {}: {}".format(error_type, self.code, self.msg)
class MissingDependency(ImportError):
"""Missing optional Dependency 'HTMLement'"""
class CaseInsensitiveDict(MutableMapping):
"""
A case-insensitive `dict` like object.
Credit goes to requests for this code
http://docs.python-requests.org/en/master/
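Example (an illustrative sketch; lookups ignore case while the original casing is kept):
>>> headers = CaseInsensitiveDict({"Content-Type": "text/html"})
>>> headers["content-type"] == headers["Content-Type"]
True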
"""
def __init__(self, *args):
self._store = {}
for _dict in args:
if _dict:
self.update(_dict)
def __repr__(self):
return str(dict(self.items()))
def __setitem__(self, key, value):
if value is not None:
key = make_unicode(key, "ascii")
value = make_unicode(value, "iso-8859-1")
self._store[key.lower()] = (key, value)
def __getitem__(self, key):
return self._store[key.lower()][1]
def __delitem__(self, key):
del self._store[key.lower()]
def __iter__(self):
return (casedkey for casedkey, _ in self._store.values())
def __len__(self):
return len(self._store)
def copy(self):
"""Return a shallow copy of the case-insensitive dictionary."""
return CaseInsensitiveDict(self._store.values())
class CachedProperty(object):
"""
Cached property.
A property that is only computed once per instance and then replaces
itself with an ordinary attribute. Deleting the attribute resets the
property.
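Example (an illustrative sketch; ``fetch_remote_data`` is a hypothetical helper):
>>> class Resource(object):
...     @CachedProperty
...     def body(self):
...         return fetch_remote_data()  # hypothetical; computed on first access, then stored on the instance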
"""
def __init__(self, fget=None):
self.__get = fget
self.__doc__ = fget.__doc__
self.__name__ = fget.__name__
self.__module__ = fget.__module__
self.allow_setter = False
def __get__(self, instance, owner):
try:
return instance.__dict__[self.__name__]
except KeyError:
value = instance.__dict__[self.__name__] = self.__get(instance)
return value
def __set__(self, instance, value):
if self.allow_setter:
instance.__dict__[self.__name__] = value
else:
raise AttributeError("Can't set attribute")
def __delete__(self, instance):
instance.__dict__.pop(self.__name__, None)
class CacheHandler(object):
def __init__(self, uid, max_age=MAX_AGE):
self.max_age = max_age
self.response = None
# Filepath to cache file
cache_dir = self.cache_dir()
self.cache_file = cache_file = os.path.join(cache_dir, uid)
if os.path.exists(cache_file):
self.response = self._load()
if self.response is None:
self.delete(cache_file)
@classmethod
def cache_dir(cls):
"""Returns the cache directory."""
cache_dir = cls.safe_path(os.path.join(CACHE_LOCATION, u".cache"))
if not os.path.exists(cache_dir):
os.makedirs(cache_dir)
return cache_dir
@staticmethod
def delete(cache_path):
"""Delete cache from disk."""
try:
os.remove(cache_path)
except EnvironmentError:
logger.error("Faild to remove cache: %s", cache_path)
else:
logger.debug("Removed cache: %s", cache_path)
@staticmethod
def isfilefresh(cache_path, max_age):
return (time.time() - os.stat(cache_path).st_mtime) < max_age
def isfresh(self):
"""Return True if cache is fresh else False."""
# Permanent responses (301, 308, 414) never go stale, nor does the cache when max_age is -1
if self.response.status in (301, 308, 414) or self.max_age == -1:
return True
elif self.max_age == 0:
return False
else:
return self.isfilefresh(self.cache_file, self.max_age)
def reset_timestamp(self):
"""Reset the last modified timestamp to current time."""
os.utime(self.cache_file, None)
def add_conditional_headers(self, headers):
"""Return a dict of conditional headers from cache."""
# Fetch cached headers
cached_headers = self.response.headers
# Check for conditional headers
if u"Etag" in cached_headers:
logger.debug("Found conditional header: ETag = %s", cached_headers[u"ETag"])
headers[u"If-none-match"] = cached_headers[u"ETag"]
if u"Last-modified" in cached_headers:
logger.debug("Found conditional header: Last-Modified = %s", cached_headers[u"Last-modified"])
headers[u"If-modified-since"] = cached_headers[u"Last-Modified"]
def update(self, headers, body, status, reason, version=11, strict=True):
# Convert headers into a Case Insensitive Dict
headers = CaseInsensitiveDict(headers)
# Remove Transfer-Encoding from header if exists
if u"Transfer-Encoding" in headers:
del headers[u"Transfer-Encoding"]
# Ensure that reason is unicode
# noinspection PyArgumentList
reason = unicode(reason)
# Create response data structure
self.response = CacheResponse(headers, body, status, reason, version, strict)
# Save response to disk
self._save(headers=dict(headers), body=body, status=status, reason=reason, version=version, strict=strict)
def _load(self):
"""Load the cache response that is stored on disk."""
try:
# Attempt to read the raw cache data
with _open(self.cache_file, "rb", encoding="utf8") as stream:
json_data = _json.load(stream)
except (IOError, OSError):
logger.exception("Cache Error: Failed to read cached response.")
return None
except TypeError:
logger.exception("Cache Error: Failed to deserialize cached response.")
return None
# Decode body content using base64
json_data[u"body"] = b64decode(json_data[u"body"].encode("ascii"))
json_data[u"headers"] = CaseInsensitiveDict(json_data[u"headers"])
return CacheResponse(**json_data)
def _save(self, **response):
# Base64 encode the body to make it json serializable
response[u"body"] = b64encode(response["body"]).decode("ascii")
try:
# Save the response to disk using json Serialization
with _open(self.cache_file, "wb", encoding="utf8") as stream:
_json.dump(response, stream, indent=4, separators=(",", ":"))
except (IOError, OSError):
logger.exception("Cache Error: Failed to write response to cache.")
self.delete(self.cache_file)
except TypeError:
logger.exception("Cache Error: Failed to serialize response.")
self.delete(self.cache_file)
@staticmethod
def safe_path(path):
"""
Convert path into an encoding that best suits the platform OS:
unicode on Windows and utf8-encoded bytes on Linux/BSD.
:type path: str
:param path: The path to convert.
:return: Returns the path as unicode or utf8 encoded str.
"""
# Nothing needs to be done on Windows, as Windows already handles unicode paths well
# We only want to convert to bytes when we are on linux.
if not sys.platform.startswith("win"):
path = path.encode("utf8")
return path
@classmethod
def hash_url(cls, url, data=None):
"""Return url as a sha1 encoded hash."""
# Make sure that url is of type bytes
if isinstance(url, unicode):
url = url.encode("utf8")
if data:
# Make sure that data is of type bytes
if isinstance(data, unicode):
data = data.encode("utf8")
url += data
# Convert hashed url to unicode
urlhash = hashlib.sha1(url).hexdigest()
if isinstance(urlhash, bytes):
urlhash = unicode(urlhash)
# Append urlhash to the filename
return cls.safe_path(u"cache-{}".format(urlhash))
@classmethod
def from_url(cls, url, data=None, max_age=MAX_AGE):
"""Initialize CacheHandler with url instead of uid."""
uid = cls.hash_url(url, data)
return cls(uid, max_age)
def __bool__(self):
return self.response is not None
def __nonzero__(self):
return self.response is not None
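# A minimal sketch (illustrative) of how CacheHandler is driven by CacheAdapter below; the URL and
# the 'headers' dict are placeholders:
#
#   cache = CacheHandler.from_url(u"http://example.com", max_age=3600)
#   if cache and cache.isfresh():
#       response = cache.response
#   elif cache:
#       cache.add_conditional_headers(headers)  # headers of the pending request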
def cache_cleanup(max_age=None):
"""
Remove all stale cache files.
:param int max_age: [opt] The max age the cache can be before removal.
Defaults to :data:`MAX_AGE <urlquick.MAX_AGE>`
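A minimal sketch (illustrative):
>>> cache_cleanup(60 * 60)  # remove cached responses older than one hour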
"""
handler = CacheHandler
max_age = MAX_AGE if max_age is None else max_age
cache_dir = handler.cache_dir()
# Loop over all cache files and remove stale files
filestart = handler.safe_path(u"cache-")
for cachefile in os.listdir(cache_dir):
# Check that we actually have a cache file
if cachefile.startswith(filestart):
cache_path = os.path.join(cache_dir, cachefile)
# Check if the cache is not fresh and delete if so
if not handler.isfilefresh(cache_path, max_age):
handler.delete(cache_path)
class CacheAdapter(object):
def __init__(self):
self.__cache = None
def cache_check(self, method, url, data, headers, max_age=None):
# Fetch max age from request header
max_age = max_age if max_age is not None else int(headers.pop(u"x-max-age", MAX_AGE))
if method == u"OPTIONS":
return None
# Check if cache exists first
self.__cache = cache = CacheHandler.from_url(url, data, max_age)
if cache:
if method in ("PUT", "DELETE"):
logger.debug("Cache purged, %s request invalidates cache", method)
cache.delete(cache.cache_file)
elif cache.isfresh():
logger.debug("Cache is fresh, returning cached response")
return cache.response
else:
logger.debug("Cache is stale, checking for conditional headers")
cache.add_conditional_headers(headers)
def handle_response(self, method, status, callback):
if status == 304:
logger.debug("Server return 304 Not Modified response, using cached response")
callback()
self.__cache.reset_timestamp()
return self.__cache.response
# Cache any cacheable response
elif status in CACHEABLE_CODES and method.upper() in CACHEABLE_METHODS:
response = callback()
logger.debug("Caching %s %s response", status, response[3])
# Save response to cache and return the cached response
self.__cache.update(*response)
return self.__cache.response
class CacheResponse(object):
"""A mock HTTPResponse class"""
def __init__(self, headers, body, status, reason, version=11, strict=True):
self.headers = headers
self.status = status
self.reason = reason
self.version = version
self.strict = strict
self.body = body
def getheaders(self):
"""Return the response headers"""
return self.headers
def read(self):
"""Return the body of the response"""
return self.body
def close(self):
pass
class ConnectionManager(CacheAdapter):
def __init__(self):
self.request_handler = {"http": {}, "https": {}}
super(ConnectionManager, self).__init__()
def make_request(self, req, timeout, verify, max_age):
# Only check cache if max_age set to a valid value
if max_age >= 0:
cached_resp = self.cache_check(req.method, req.url, req.data, req.headers, max_age=max_age)
if cached_resp:
return cached_resp
# Request resource and cache it if possible
resp = self.connect(req, timeout, verify)
callback = lambda: (resp.getheaders(), resp.read(), resp.status, resp.reason)
cached_resp = self.handle_response(req.method, resp.status, callback)
if cached_resp:
return cached_resp
else:
return resp
# Default to un-cached response
return self.connect(req, timeout, verify)
def connect(self, req, timeout, verify):
# Fetch connection from pool and attempt to reuse if available
pool = self.request_handler[req.type]
if req.host in pool:
try:
# noinspection PyTypeChecker
return self.send_request(pool[req.host], req)
except Exception as e:
# Remove the connection from the pool as it's unusable
pool[req.host].close()
del pool[req.host]
# Raise the exception if it's not a subclass of UrlError
if not isinstance(e, UrlError):
raise
# Create a new connection
if req.type == "https":
# noinspection PyProtectedMember
context = ssl._create_unverified_context() if verify is False else None
conn = HTTPSConnection(req.host, timeout=timeout, context=context)
else:
conn = HTTPConnection(req.host, timeout=timeout)
# Make first connection to server
response = self.send_request(conn, req)
# Add connection to the pool if the response is not set to close
if not response.will_close:
pool[req.host] = conn
return response
@staticmethod
def send_request(conn, req):
try:
# Setup request
conn.putrequest(str(req.method), str(req.selector), skip_host=1, skip_accept_encoding=1)
# Add all headers to request
for hdr, value in req.header_items():
conn.putheader(hdr, value)
# Send the body of the request, which will initiate the connection
conn.endheaders(req.data)
return conn.getresponse()
except socket.timeout as e:
raise Timeout(e)
except ssl.SSLError as e:
raise SSLError(e)
except (socket.error, HTTPException) as e:
raise ConnError(e)
def close(self):
"""Close all persistent connections and remove."""
for pool in self.request_handler.values():
for key in list(pool.keys()):
conn = pool.pop(key)
conn.close()
class Request(object):
"""A Request Object"""
def __init__(self, method, url, headers, data=None, json=None, params=None, referer=None):
#: Tuple of (username, password) for basic authentication.
self.auth = None
# Convert url into a fully ascii unicode string using urlencoding
self._referer_url = referer
self._urlparts = urlparts = self._parse_url(url, params)
# Ensure that method is always unicode
if isinstance(method, bytes):
method = method.decode("ascii")
#: The URI scheme.
self.type = urlparts.scheme
#: The HTTP request method to use.
self.method = method.upper()
#: Dictionary of HTTP headers.
self.headers = headers = headers.copy()
#: Urlencoded url of the remote resource.
self.url = urlunsplit((urlparts.scheme, urlparts.netloc, urlparts.path, urlparts.query, urlparts.fragment))
#: The URI authority, typically a host, but may also contain a port separated by a colon.
self.host = urlparts.netloc.lower()
# Add Referer header if not the original request
if referer:
self.headers[u"Referer"] = referer
# Add host header to be compliant with HTTP/1.1
if u"Host" not in headers:
self.headers[u"Host"] = self._urlparts.hostname
# Construct post data from a json object
if json:
self.headers[u"Content-Type"] = u"application/json"
data = _json.dumps(json)
if data:
# Convert data into a urlencode string if data is a dict
if isinstance(data, dict):
self.headers[u"Content-Type"] = u"application/x-www-form-urlencoded"
data = urlencode(data, True).encode("utf8")
elif isinstance(data, unicode):
data = data.encode("utf8")
if u"Content-Length" not in headers:
# noinspection PyArgumentList
self.headers[u"Content-Length"] = unicode(len(data))
#: Request body, to send to the server.
self.data = data
def _parse_url(self, url, params=None, scheme=u"http"):
"""
Parse a URL into its individual components.
:param str url: Url to parse
:param dict params: params to add to url as query
:return: A 5-tuple of URL components
:rtype: urllib.parse.SplitResult
"""
# Make sure we have unicode
if isinstance(url, bytes):
url = url.decode("utf8")
# Check for valid url structure
if not url[:4] == u"http":
if self._referer_url:
url = urljoin(self._referer_url, url, allow_fragments=False)
elif url[:3] == u"://":
url = url[1:]
# Parse the url into each element
scheme, netloc, path, query, _ = urlsplit(url.replace(u" ", u"%20"), scheme=scheme)
if scheme not in ("http", "https"):
raise ValueError("Unsupported scheme: {}".format(scheme))
# Ensure that every element of the url can be encoded to ascii
self.auth, netloc = self._ascii_netloc(netloc)
path = self._ascii_path(path) if path else u"/"
query = self._ascii_query(query, params)
# noinspection PyArgumentList
return SplitResult(scheme, netloc, path, query, u"")
@staticmethod
def _ascii_netloc(netloc):
"""Make sure that host is ascii compatible."""
auth = None
if u"@" in netloc:
# Extract auth
auth, netloc = netloc.rsplit(u"@", 1)
if u":" in auth:
auth = tuple(auth.split(u":", 1))
else:
auth = (auth, u"")
return auth, netloc.encode("idna").decode("ascii")
@staticmethod
def _ascii_path(path):
"""Make sure that path is url encoded and ascii compatible."""
try:
# If this statement passes then path must contain only ascii characters
return path.encode("ascii").decode("ascii")
except UnicodeEncodeError:
# Path must contain non ascii characters
return quote(path)
@staticmethod
def _ascii_query(query, params):
"""Make sure that query is urlencoded and ascii compatible."""
if query:
# Ensure that query contains only valid characters
qsl = parse_qsl(query)
query = urlencode(qsl)
if query and params:
extra_query = urlencode(params, doseq=True)
return u"{}&{}".format(query, extra_query)
elif params:
return urlencode(params, doseq=True)
elif query:
return query
else:
return u""
@property
def selector(self):
"""Resource selector, with the url path and query parts."""
if self._urlparts.query:
return u"{}?{}".format(self._urlparts.path, self._urlparts.query)
else:
return self._urlparts.path
def header_items(self):
"""Return list of tuples (header_name, header_value) of the Request headers, as native type of :class:`str`."""
if py3:
return self.headers.items()
else:
return self._py2_header_items()
def _py2_header_items(self):
"""Return request headers with no unicode value to be compatible with python2"""
# noinspection PyCompatibility
for key, value in self.headers.iteritems():
key = key.encode("ascii")
value = value.encode("iso-8859-1")
yield key, value
class UnicodeDict(dict):
def __init__(self, *mappings):
super(UnicodeDict, self).__init__()
for mapping in mappings:
if mapping:
# noinspection PyUnresolvedReferences
for key, value in mapping.items():
if value is not None:
key = make_unicode(key)
value = make_unicode(value)
self[key] = value
def make_unicode(data, encoding="utf8", errors=""):
"""Ensure that data is a unicode string"""
if isinstance(data, bytes):
return data.decode(encoding, errors)
else:
# noinspection PyArgumentList
return unicode(data)
# ########################## Public API ##########################
class Session(ConnectionManager):
"""
Provides cookie persistence and connection-pooling plus configuration.
:param kwargs: Default configuration for session variables.
:ivar int max_repeats: Max number of repeat redirects. Defaults to `4`
:ivar int max_redirects: Max number of redirects. Defaults to `10`
:ivar bool allow_redirects: Enable/disable redirection. Defaults to ``True``
:ivar bool raise_for_status: Raise HTTPError if status code is >= 400. Defaults to ``False``
:ivar int max_age: Max age the cache can be before it's considered stale. -1 will disable caching.
Defaults to :data:`MAX_AGE <urlquick.MAX_AGE>`
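Example usage (an illustrative sketch; the URL is a placeholder):
>>> session = Session(max_age=3600)
>>> session.headers[u"User-Agent"] = u"my-kodi-addon/1.0"
>>> resp = session.get("https://httpbin.org/get")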
"""
def __init__(self, **kwargs):
super(Session, self).__init__()
self._headers = CaseInsensitiveDict()
# Set Default headers
self._headers[u"Accept"] = u"*/*"
self._headers[u"Accept-Encoding"] = u"gzip, deflate"
self._headers[u"Accept-language"] = u"en-gb,en-us,en"
self._headers[u"Connection"] = u"keep-alive"
# Session Controls
self._cm = ConnectionManager()
self._cookies = dict()
self._params = dict()
self._auth = None
# Set session configuration settings
self.max_age = kwargs.get("max_age", MAX_AGE)
self.max_repeats = kwargs.get("max_repeats", 4)
self.max_redirects = kwargs.get("max_redirects", 10)
self.allow_redirects = kwargs.get("allow_redirects", True)
self.raise_for_status = kwargs.get("raise_for_status", False)
@property
def auth(self):
"""
Default Authentication tuple to attach to Request.
:return: Default authentication tuple.
:rtype: tuple
"""
return self._auth
@auth.setter
def auth(self, value):
"""Set Default Authentication tuple."""
if isinstance(value, (tuple, list)):
self._auth = value
else:
raise ValueError("Invalid type: {}, dict required".format(type(value)))
@property
def cookies(self):
"""
Dictionary of cookies to attach to each request.
:return: Session cookies
:rtype: dict
"""
return self._cookies
@cookies.setter
def cookies(self, _dict):
"""Replace session cookies with new cookies dict"""
if isinstance(_dict, dict):
self._cookies = _dict
else:
raise ValueError("Invalid type: {}, dict required".format(type(_dict)))
@property
def headers(self):
"""
Dictionary of headers to attach to each request.
:return: Session headers
:rtype: dict
"""
return self._headers
@property
def params(self):
"""
Dictionary of query string parameters to attach to each Request. The dictionary values
may be lists for representing multivalued query parameters.
:return: Session params
:rtype: dict
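Example (an illustrative sketch; ``session`` is an existing :class:`Session` and the values are placeholders):
>>> session.params = {u"api_key": u"xxxxxxxx", u"tags": [u"comedy", u"drama"]}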
"""
return self._params
@params.setter
def params(self, _dict):
"""Replace session params with new params dict"""
if isinstance(_dict, dict):
self._params = _dict
else:
raise ValueError("Invalid type: {}, dict required".format(type(_dict)))
def get(self, url, params=None, **kwargs):
"""
Sends a GET request.
Requests data from a specified resource.
:param str url: Url of the remote resource.
:param dict params: [opt] Dictionary of url query key/value pairs.
:param kwargs: Optional arguments that :func:`request <urlquick.request>` takes.
:return: A requests like Response object.
:rtype: urlquick.Response
"""
kwargs["params"] = params
return self.request(u"GET", url, **kwargs)
def head(self, url, **kwargs):
"""
Sends a HEAD request.
Same as GET but returns only HTTP headers and no document body.
:param str url: Url of the remote resource.
:param kwargs: Optional arguments that :func:`request <urlquick.request>` takes.
:return: A requests like Response object.
:rtype: urlquick.Response
"""
return self.request(u"HEAD", url, **kwargs)
def post(self, url, data=None, json=None, **kwargs):
"""
Sends a POST request.
Submits data to be processed to a specified resource.
:param str url: Url of the remote resource.
:param data: [opt] Dictionary (will be form-encoded) or bytes sent in the body of the Request.
:param json: [opt] Json data sent in the body of the Request.
:param kwargs: Optional arguments that :func:`request <urlquick.request>` takes.
:return: A requests like Response object.
:rtype: urlquick.Response
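Example (an illustrative sketch; ``session`` is an existing :class:`Session` and the endpoint is a placeholder):
>>> resp = session.post("https://httpbin.org/post", data={u"name": u"kodi"})
>>> resp = session.post("https://httpbin.org/post", json={u"name": u"kodi"})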
"""
return self.request(u"POST", url, data=data, json=json, **kwargs)
def put(self, url, data=None, **kwargs):
"""
Sends a PUT request.
Uploads a representation of the specified URI.
:param str url: Url of the remote resource.
:param data: [opt] Dictionary (will be form-encoded) or bytes sent in the body of the Request.
:param kwargs: Optional arguments that :func:`request <urlquick.request>` takes.
:return: A requests like Response object.
:rtype: urlquick.Response
"""
return self.request(u"PUT", url, data=data, **kwargs)
def patch(self, url, data=None, **kwargs):
"""
Sends a PATCH request.
:param str url: Url of the remote resource.
:param data: [opt] Dictionary (will be form-encoded) or bytes sent in the body of the Request.
:param kwargs: Optional arguments that :func:`request <urlquick.request>` takes.
:return: A requests like Response object.
:rtype: urlquick.Response
"""
return self.request(u"PATCH", url, data=data, **kwargs)
def delete(self, url, **kwargs):
"""
Sends a DELETE request.
:param str url: Url of the remote resource.
:param kwargs: Optional arguments that :func:`request <urlquick.request>` takes.
:return: A requests like Response object.
:rtype: urlquick.Response
"""
return self.request(u"DELETE", url, **kwargs)
def request(self, method, url, params=None, data=None, headers=None, cookies=None, auth=None,
timeout=10, allow_redirects=None, verify=True, json=None, raise_for_status=None, max_age=None):
"""
Make request for remote resource.
:param str method: HTTP request method, e.g. GET, HEAD or POST.
:param str url: Url of the remote resource.
:param dict params: [opt] Dictionary of url query key/value pairs.
:param data: [opt] Dictionary (will be form-encoded) or bytes sent in the body of the Request.
:param dict headers: [opt] HTTP request headers.
:param dict cookies: [opt] Dictionary of cookies to send with the request.
:param tuple auth: [opt] (username, password) for basic authentication.
:param int timeout: [opt] Connection timeout in seconds.
:param bool allow_redirects: [opt] Enable/disable redirection. Defaults to ``True``.
:param bool verify: [opt] Controls whether to verify the server's TLS certificate. Defaults to ``True``
:param json: [opt] Json data sent in the body of the Request.
:param bool raise_for_status: [opt] Raise an HTTPError if the status code is >= 400. Defaults to ``False``.
:param int max_age: [opt] Max age the cache can be before it's considered stale. -1 will disable caching.
Defaults to :data:`MAX_AGE <urlquick.MAX_AGE>`
:return: A requests like Response object.
:rtype: urlquick.Response
:raises MaxRedirects: If too many redirects were detected.
:raises ConnError: If connection to server failed.
:raises HTTPError: If response status is greater or equal to 400 and raise_for_status is ``True``.
:raises SSLError: If an SSL error occurs while sending the request.
:raises Timeout: If the connection to server timed out.
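A minimal sketch (illustrative; the URL is a placeholder and ``session`` is an existing :class:`Session`):
>>> resp = session.request(u"GET", u"https://httpbin.org/get",
...                        params={u"q": u"kodi"}, timeout=5, max_age=0)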
"""
# Fetch settings from local or session
allow_redirects = self.allow_redirects if allow_redirects is None else allow_redirects
raise_for_status = self.raise_for_status if raise_for_status is None else raise_for_status
# Ensure that all mappings contain unicode data
req_headers = CaseInsensitiveDict(self._headers, headers)
req_cookies = UnicodeDict(self._cookies, cookies)
req_params = UnicodeDict(self._params, params)
# Add cookies to headers
if req_cookies and u"Cookie" not in req_headers:
header = u"; ".join([u"{}={}".format(key, value) for key, value in req_cookies.items()])
req_headers[u"Cookie"] = header