UnicodeWarning and UnicodeEncodeError issues #136
Now it reads the image list file as unicode and compares it with the output of os.listdir(), which is not unicode. I don't think it is serious, but I can check it tomorrow.
OK. The dump is proceeding; I'll check at the end whether any image is missing. (Update: I forgot to count them, though there is a big dump at https://archive.org/details/wiki-eswikiarquitecturacom.)
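The mismatch can be illustrated in Python 3 terms, where a str never compares equal to bytes. The sketch below uses a hypothetical helper, missing_images (not part of dumpgenerator.py), that passes a str path to os.listdir() so the returned names are str, keeping both sides of the `not in` test the same type. In Python 2 the equivalent move is passing a unicode path, in which case os.listdir() returns unicode names instead of byte strings.

```python
import os


def missing_images(image_names, directory):
    """Return the names in image_names that have no file in directory.

    Hypothetical helper for illustration. os.fsdecode() turns a bytes
    path into str, and os.listdir() on a str path returns str names, so
    every membership test compares text with text, never text with bytes.
    """
    on_disk = set(os.listdir(os.fsdecode(directory)))
    return [name for name in image_names if name not in on_disk]
```

With a bytes path, os.listdir() would return bytes names, and a str filename would never match any of them, so every image would look missing; normalizing the path first avoids that.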
Some more, despite #124, on wikihow.com with the latest master: Downloaded 30 pages
Can you still reproduce this error? The one you mentioned in the last comment has already been fixed. I'm not sure about the original one.
I can't reproduce it now either, though the original report might have been about an image list produced with one version of dumpgenerator and then used with another, incompatible one.
Analysing http://africanspecies.net/api.php
I'm also wondering whether resuming works; it would be terrible if the bug made us "close" incomplete dumps. Analysing http://wiki.megatec.ru/api.php
Sorry if this is bad etiquette (I'm new), but I was wondering whether there has been any update on this. Getting [...]
I'm using the most recent dumpgenerator.py as of this writing.
Hello DrDevice. This bug still needs a fix. A workaround: remove the offending image filename from the -images.txt file in the dump directory, then resume. According to that wiki, it is "Capture d'écran 2015-06-13 11.20.59.png". If you hit more errors, remove those entries too, but I don't see any other unusual characters in the list: http://ark.gamepedia.com/index.php?title=Special%3APrefixIndex&prefix=&namespace=6
emijrp, thank you very much! That seems to have cleared it up! It's been trucking on for a couple of hours now with no errors. Crossing my fingers! :)
This is still an issue. I've tried the patches from #279, but they didn't help.
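The manual workaround can also be scripted. This is a minimal sketch that assumes, without having verified it against every dumpgenerator.py version, that each line of the -images.txt file starts with the image filename followed by a tab; lines that do not match (such as an end-of-list marker) are kept untouched. drop_image_entry is a hypothetical helper, not part of wikiteam. It reads and writes the file explicitly as UTF-8 and uses io.open so it behaves the same on Python 2.7 and Python 3 (on Python 2, pass filename as a unicode string).

```python
import io


def drop_image_entry(images_txt, filename):
    """Remove lines whose first tab-separated field equals filename.

    Hypothetical helper; assumes one image per line with the filename as
    the first tab-separated field. Returns the number of lines removed.
    newline="" preserves the file's original line endings on rewrite.
    """
    with io.open(images_txt, encoding="utf-8", newline="") as f:
        lines = f.readlines()
    kept = [line for line in lines if line.split("\t", 1)[0] != filename]
    with io.open(images_txt, "w", encoding="utf-8", newline="") as f:
        f.writelines(kept)
    return len(lines) - len(kept)
```

After removing the entry, resume the dump as usual; that image will then not be downloaded.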
I recently ran into the same issue, with a similar message, but in another part of the script: the decode statement at https://github.com/WikiTeam/wikiteam/blob/master/dumpgenerator.py#L1999. Anyway, the exception thrown was: [...]
It turned out this was because the Python 2.7 script used 'ascii' as the default encoding of the sys module, as shown by [...]. This was fixed by modifying [...]
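The failure mode is easy to show in isolation. On Python 2.7, sys.getdefaultencoding() returns 'ascii', so any implicit conversion between byte strings and unicode fails on the first non-ASCII byte. This Python 3 sketch reproduces the same error by decoding with 'ascii' explicitly, then shows that naming the codec avoids it. decode_name is a hypothetical helper; it illustrates decoding explicitly at the point of use rather than changing the interpreter-wide default.

```python
def decode_name(raw, encoding="utf-8"):
    """Decode a filename read as bytes, naming the codec explicitly.

    Hypothetical helper: relying on an implicit 'ascii' default fails on
    the first non-ASCII byte, e.g. the 'é' in "Capture d'écran".
    """
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw  # already text, nothing to do


raw = "Capture d'écran.png".encode("utf-8")
try:
    raw.decode("ascii")  # what an implicit ascii conversion amounts to
except UnicodeDecodeError as exc:
    print("ascii failed:", exc.reason)
print(decode_name(raw))
```

Decoding explicitly keeps the fix local to the code that reads filenames, instead of depending on how each user's interpreter is configured.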
@ouaibe Thanks for the tip; I thought it must have been a bug in wikiteam. They should be able to set this somewhere themselves, right?
I'd like to pile on and say that I've also stumbled upon this issue, or a similar one: [...]
When trying to resume, I hit #250, meaning that [...]
But, of course, resuming doesn't do a whole lot, since it will hit the same [...]. The workaround described by @ouaibe worked: editing [...]. Python 2.7.16
Is this a simple incompatibility between an old image list and the current master, or something more?
Resuming download, using directory eswikiarquitecturacom-20140628-wikidump
[...]
You didn't provide a path for index.php, we try this one: http://es.wikiarquitectura.com/index.php
Checking api.php... http://es.wikiarquitectura.com/api.php
api.php is OK
Checking index.php... http://es.wikiarquitectura.com/index.php
index.php is OK
Analysing http://es.wikiarquitectura.com/api.php
Loading config file...
Resuming previous dump process...
Title list was completed in the previous session
XML dump was completed in the previous session
Image list was completed in the previous session
./dumpgenerator.py:1232: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
if filename2 not in listdir: