Help needed: the scraped URL seems to be wrong #2

Open
suikascarlet opened this issue Jun 26, 2019 · 0 comments

I tried running the script and got an HTTP error.

python2.7 ./easy_university_selection.py 10010 10035 512 2018 10148

It looks like the URL being scraped no longer exists.
I searched gkcx.eol.cn for a long time and could not work out what to change it to.

http://gkcx.eol.cn/schoolhtm/scores/provinceScores643_10010_10035_10036.xml

python2.7 ./easy_university_selection.py 10010 10035 512 2018 10148
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Year: 2018
Region: Shanxi
Score: 512, science track
Filter: junior college programs
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
University database loaded: 2766 universities in total
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Scraping the [science] admission score lines in [Shanxi] for every university in the database
http://gkcx.eol.cn/schoolhtm/scores/provinceScores643_10010_10035_10036.xml
Traceback (most recent call last):
  File "./easy_university_selection.py", line 679, in <module>
    spider_university_province_score_line('10036', '本一批次')
  File "./easy_university_selection.py", line 384, in spider_university_province_score_line
    'http://gkcx.eol.cn/schoolhtm/scores/provinceScores', 'provinceScores', tier, info)
  File "./easy_university_selection.py", line 406, in spider_score_line
    res_data = urllib2.urlopen(req)
  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 467, in error
    result = self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 654, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 500: Internal Server Error
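
The traceback passes through http_error_302 before failing, so the original URL was redirected and it was the redirect target that answered with 500. To see where the request ends up, something like the following might help. This is only a rough sketch: `probe_url` is a hypothetical helper, not part of easy_university_selection.py, and the `opener` parameter and the User-Agent header are my own assumptions, added so the function can be tried without touching the network.

```python
try:  # Python 3
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError, URLError
except ImportError:  # Python 2.7, as used in the report above
    from urllib2 import Request, urlopen, HTTPError, URLError


def probe_url(url, opener=urlopen, timeout=10):
    """Return (status_code, final_url) instead of raising on an HTTP error.

    Hypothetical diagnostic helper; `opener` defaults to urlopen and can be
    swapped for a stub when testing offline.
    """
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        res = opener(req, timeout=timeout)
    except HTTPError as err:
        # HTTPError must be caught before URLError (it is a subclass).
        # geturl() reports the URL that actually failed, i.e. the
        # redirect target when a 302 was followed first.
        return err.code, err.geturl()
    except URLError as err:
        # No HTTP response at all (DNS failure, refused connection, ...).
        return None, str(err.reason)
    try:
        return res.getcode(), res.geturl()
    finally:
        res.close()
```

Calling `probe_url('http://gkcx.eol.cn/schoolhtm/scores/provinceScores643_10010_10035_10036.xml')` would then return the status code together with the address the server redirected to, which may show whether the old XML path has simply been retired.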
