Can't save HTML file attachments #75

davidswelt · 2017-05-17T14:59:16Z

Could support for arbitrary files, or at least HTML attachments, be added?

The content type is 'contentType': u'text/html', and calling the "file" method on this attachment item produces errors that vary with the Python version, e.g. "UnicodeEncodeError: 'ascii' codec can't encode characters in position 11-12: ordinal not in range(128)", or a ValueError in the latest PyZotero.

Would it make sense to fail gracefully or throw a documented exception?

  File "./zot.py", line 1584, in dumpFiles
    f.write(self.zot.file(item.key))
  File "/usr/local/lib/python2.7/site-packages/pyzotero/zotero.py", line 187, in wrapped_f
    return retrieved.json()
  File "/usr/local/lib/python2.7/site-packages/requests/models.py", line 819, in json
    return json.loads(self.text, **kwargs)
  File "/usr/local/Cellar/python/2.7.10/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/usr/local/Cellar/python/2.7.10/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/Cellar/python/2.7.10/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 384, in raw_decode
    raise ValueError("No JSON object could be decoded")
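
To illustrate what "fail gracefully" could look like from the calling side, here is a minimal sketch. It assumes the failure surfaces as a ValueError (UnicodeEncodeError and JSON decoding errors are both subclasses of it). The names `AttachmentError` and `fetch_attachment` are hypothetical and not part of PyZotero; `fetch` stands in for any callable that returns the attachment contents, such as the file method used in the traceback.

```python
class AttachmentError(Exception):
    """Hypothetical exception for attachments that cannot be retrieved."""


def fetch_attachment(fetch, item_key):
    """Call fetch(item_key) and convert decoding failures into AttachmentError.

    `fetch` is any callable returning the attachment contents
    (assumed here; for example a bound method of a client object).
    """
    try:
        return fetch(item_key)
    except ValueError as exc:
        # UnicodeEncodeError and JSON decoding errors both subclass ValueError.
        raise AttachmentError(
            "could not retrieve attachment %s: %s" % (item_key, exc)
        ) from exc
```

Calling code can then catch one documented exception type instead of guessing which error a given Python version raises.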


urschrei · 2017-05-17T17:32:11Z

I've just pushed v1.2.11 to PyPI, which should allow you to dump HTML snapshots. Note that these are zip archives, so they'll need to be decompressed first. If you use the .dump() method, it'll append .zip to the filename.

Let me know if this works for you – retrieval of arbitrary file attachments is supported for most common types, but I overlooked this use case. Sorry!
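
For code that writes the retrieved bytes itself rather than using .dump(), one way to decide whether the payload needs a .zip suffix is to inspect the bytes. This is only a sketch using the standard library; `looks_like_zip` and `target_name` are hypothetical helpers, not PyZotero functions.

```python
import io
import zipfile


def looks_like_zip(data):
    """Return True if the bytes in `data` form a readable zip archive."""
    return zipfile.is_zipfile(io.BytesIO(data))


def target_name(filename, data):
    """Append .zip to `filename` when the payload is a zip archive."""
    if looks_like_zip(data) and not filename.lower().endswith(".zip"):
        return filename + ".zip"
    return filename
```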

davidswelt · 2017-05-17T22:38:05Z

OK, this works better. But if dump() does not adhere to the file name given, and if it does not actually return the resulting filename, how is the calling code supposed to know what the file is?

Generally, if "filename" is "foo.html", it would make much more sense to unzip it right there and then in file(). Calling code could check the new "snapshot" variable, but is this meant to be public?

I'm not using dump() by the way because it isn't present in older versions of Pyzotero.

urschrei · 2017-05-17T23:50:56Z

dump() will adhere to a file name if it's given, which implies that the user is familiar with the snapshot format, and that it's compressed (which I now also call out in the docs). As for unzipping automatically:

• I could unzip the contents in the specified (or working) dir. That's going to cause difficulties, because some snapshot components are generically-named (item.css), so things will be overwritten if you're dumping several snapshots at once.
• Dump the contents into a newly-created folder under the specified or default path. But what to call it? The file returned by the API is always item.html.

One possibility is dumping the snapshot contents into folders with the same name as their item key, which is predictable and easy to document.

(the snapshot variable is not meant to be public, and is going away again)
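
A rough sketch of the folder-per-item-key idea, assuming the snapshot arrives as zip-compressed bytes. `extract_snapshot` is a hypothetical helper written for illustration, not the library's implementation.

```python
import io
import os
import zipfile


def extract_snapshot(data, item_key, base_dir="."):
    """Unpack zipped snapshot bytes into base_dir/<item_key>/.

    Returns the folder that was written to, so that generically named
    members (item.css, item.html) from different snapshots do not collide.
    """
    target = os.path.join(base_dir, item_key)
    os.makedirs(target, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        archive.extractall(target)
    return target
```

Because each snapshot lands in its own folder, two archives that both contain item.css can be dumped into the same base directory without overwriting each other.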


davidswelt · 2017-05-18T02:03:55Z

In that case, I would suggest that dump() return the file name it has chosen (if chosen automatically), maybe as a fully qualified path. I would also provide a function that gives the file name suggested when calling file(), or a function that returns the MIME type of the file that is output, i.e., application/gzip. The contentType does not seem to have the right type. The behavior right now seems odd: a file of unclear type is written, and one has to check the "snapshot" variable and append .gz to the filename if it is True.

If you were to unpack it, I would unpack into a folder named foo.html if foo.html is the filename. Note that any file name can overwrite existing ones; there is no guarantee it is unique. In Zot_bib_web, I create a folder named after the key, and all attachments go inside that folder. As long as the attachments have distinct names, it's good.


urschrei · 2017-05-18T13:04:25Z

It turns out that in the case of snapshots, the fileName property refers to the primary file to be opened inside the archive.
As such, the file name of a snapshot is undefined by the API, although its MIME type should be application/zip (which will be fixed). This should allow consumers of the file method to know what to do with the attachment they retrieve.
As for dump, I think I'm going to default to writing the file with the attachment item key + zip, and I'll probably add an optional extract_snapshot=False keyword, which will extract into a folder of the same name if set to True (in both cases, this only applies in the absence of a user-supplied filename).
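
The naming rule proposed above can be written down as a small function. This is a sketch of the proposal as described in this comment, not of the released code; `snapshot_dump_name` is a hypothetical name, and reading "item key + zip" as the key followed by ".zip" is an assumption.

```python
def snapshot_dump_name(item_key, filename=None, extract_snapshot=False):
    """Return the file or folder name a snapshot would be dumped to.

    A user-supplied filename always wins. Otherwise the name is derived
    from the item key: a folder named after the key when extracting,
    or "<key>.zip" (assumed spelling) when writing the archive as-is.
    """
    if filename is not None:
        return filename
    if extract_snapshot:
        return item_key
    return item_key + ".zip"
```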

davidswelt · 2017-05-18T20:40:08Z

As for the Zotero API, once the MIME type is fixed, this makes sense. It's good to know this filename.
For dump(), it's good to be able to expand, but for the client it's more important to know the name of the generated file. Consider returning this file name. Right now dump() returns nothing, so adding a return value is a backwards-compatible change.

urschrei · 2017-05-18T20:43:52Z

> It's good to know this filename.
> for the client it's more important to know what the name of the generated file is. Consider returning this file name.

I don't follow. To which file name are you referring?

davidswelt · 2017-05-18T20:49:35Z

dump() decides about the filename for the file it creates (by default). Either with, or without ".zip". The calling code will want to do something with it. In my case, I want to make a URL from it, to link to this file, or maybe I will unzip it. Rather than recreating the logic that is in PyZotero (and which may be updated in the future), I'm suggesting that PyZotero's dump() function returns the file name it has created.
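
To make the use case concrete, here is a sketch of a writer that returns the full path it created, plus a caller that turns that path into a link. Both functions are hypothetical illustrations (not PyZotero code), and the base URL is a placeholder.

```python
import os
from urllib.parse import quote


def write_and_report(data, filename, directory="."):
    """Write bytes to directory/filename and return the full path created."""
    full_path = os.path.abspath(os.path.join(directory, filename))
    with open(full_path, "wb") as handle:
        handle.write(data)
    return full_path


def link_for(full_path, site_root, base_url):
    """Build a URL for a file stored somewhere under site_root."""
    relative = os.path.relpath(full_path, site_root)
    # Use forward slashes in the URL regardless of the local path separator.
    return base_url.rstrip("/") + "/" + quote(relative.replace(os.sep, "/"))
```

With a returned path, the caller never has to re-derive whether ".zip" was appended.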

Am I missing something?

urschrei · 2017-05-18T20:56:29Z

> I'm suggesting that PyZotero's dump() function returns the file name it has created.

Now I get it. Yes, that's a good approach. I'm going to leave this open until the MIME change has happened and I've landed the change.


urschrei added a commit that referenced this issue May 17, 2017

Alter test attachment doc link mode

80eab44

This causes tests related to #75 to pass again

urschrei added a commit that referenced this issue May 18, 2017

Dump HTML snapshots with unique file names based on item key

648371c

This change also alters dump() to return the path and file name, see discussion in #75

urschrei added a commit that referenced this issue Jun 3, 2017

Alter test attachment doc link mode

323704f

This causes tests related to #75 to pass again

urschrei added a commit that referenced this issue Jun 3, 2017

Dump HTML snapshots with unique file names based on item key

598a556

This change also alters dump() to return the path and file name, see discussion in #75

urschrei added a commit that referenced this issue Oct 8, 2017

Alter test attachment doc link mode

c141f59

This causes tests related to #75 to pass again

urschrei added a commit that referenced this issue Oct 8, 2017

Dump HTML snapshots with unique file names based on item key

6f0e4d1

This change also alters dump() to return the path and file name, see discussion in #75

urschrei added a commit that referenced this issue Oct 22, 2017

Alter test attachment doc link mode

be45a6f

This causes tests related to #75 to pass again

urschrei added a commit that referenced this issue Oct 22, 2017

Dump HTML snapshots with unique file names based on item key

81d0364

This change also alters dump() to return the path and file name, see discussion in #75

urschrei added a commit that referenced this issue Nov 2, 2017

Alter test attachment doc link mode

a14be3a

This causes tests related to #75 to pass again

urschrei added a commit that referenced this issue Nov 2, 2017

Dump HTML snapshots with unique file names based on item key

a565070

This change also alters dump() to return the path and file name, see discussion in #75

urschrei added a commit that referenced this issue Nov 2, 2017

Alter test attachment doc link mode

1b13ef6

This causes tests related to #75 to pass again

urschrei added a commit that referenced this issue Nov 2, 2017

Dump HTML snapshots with unique file names based on item key

1ab79be

This change also alters dump() to return the path and file name, see discussion in #75

urschrei added a commit that referenced this issue Nov 20, 2017

Alter test attachment doc link mode

1609c2b

This causes tests related to #75 to pass again

urschrei added a commit that referenced this issue Nov 20, 2017

Dump HTML snapshots with unique file names based on item key

3f97081

This change also alters dump() to return the path and file name, see discussion in #75

urschrei added a commit that referenced this issue Jan 30, 2018

Alter test attachment doc link mode

c8ebe82

This causes tests related to #75 to pass again

urschrei added a commit that referenced this issue Jan 30, 2018

Dump HTML snapshots with unique file names based on item key

b037fcb

This change also alters dump() to return the path and file name, see discussion in #75

urschrei added a commit that referenced this issue Feb 3, 2018

Alter test attachment doc link mode

b74097e

This causes tests related to #75 to pass again

urschrei added a commit that referenced this issue Feb 3, 2018

Dump HTML snapshots with unique file names based on item key

328a958

This change also alters dump() to return the path and file name, see discussion in #75

urschrei added a commit that referenced this issue May 15, 2018

Alter test attachment doc link mode

a055d1d

This causes tests related to #75 to pass again

urschrei added a commit that referenced this issue May 15, 2018

Dump HTML snapshots with unique file names based on item key

1c4746e

This change also alters dump() to return the path and file name, see discussion in #75

urschrei added a commit that referenced this issue Jun 16, 2018

Alter test attachment doc link mode

eada415

This causes tests related to #75 to pass again

urschrei added a commit that referenced this issue Jun 16, 2018

Dump HTML snapshots with unique file names based on item key

0e8188e

This change also alters dump() to return the path and file name, see discussion in #75

urschrei added a commit that referenced this issue Jun 17, 2018

Alter test attachment doc link mode

88b2018

This causes tests related to #75 to pass again

urschrei added a commit that referenced this issue Jun 17, 2018

Dump HTML snapshots with unique file names based on item key

916abac

This change also alters dump() to return the path and file name, see discussion in #75

urschrei added a commit that referenced this issue Aug 14, 2018

Alter test attachment doc link mode

dd04383

This causes tests related to #75 to pass again

urschrei added a commit that referenced this issue Aug 14, 2018

Dump HTML snapshots with unique file names based on item key

b8bef88

This change also alters dump() to return the path and file name, see discussion in #75

urschrei added a commit that referenced this issue Oct 21, 2018

Alter test attachment doc link mode

b029245

This causes tests related to #75 to pass again

urschrei added a commit that referenced this issue Oct 21, 2018

Dump HTML snapshots with unique file names based on item key

96b9ae5

This change also alters dump() to return the path and file name, see discussion in #75

urschrei added a commit that referenced this issue Nov 30, 2018

Alter test attachment doc link mode

7776c60

This causes tests related to #75 to pass again

urschrei added a commit that referenced this issue Nov 30, 2018

Dump HTML snapshots with unique file names based on item key

2713d38

This change also alters dump() to return the path and file name, see discussion in #75

urschrei added a commit that referenced this issue Nov 30, 2018

Alter test attachment doc link mode

1670154

This causes tests related to #75 to pass again

urschrei added a commit that referenced this issue Nov 30, 2018

Dump HTML snapshots with unique file names based on item key

9b33f98

This change also alters dump() to return the path and file name, see discussion in #75

urschrei added a commit that referenced this issue Dec 31, 2018

Alter test attachment doc link mode

046c06c

This causes tests related to #75 to pass again

urschrei added a commit that referenced this issue Dec 31, 2018

Dump HTML snapshots with unique file names based on item key

20048d9

This change also alters dump() to return the path and file name, see discussion in #75

urschrei added a commit that referenced this issue Jan 1, 2019

Alter test attachment doc link mode

388f058

This causes tests related to #75 to pass again

urschrei added a commit that referenced this issue Jan 1, 2019

Dump HTML snapshots with unique file names based on item key

a1549a3

This change also alters dump() to return the path and file name, see discussion in #75

urschrei added a commit that referenced this issue Jan 3, 2019

Alter test attachment doc link mode

3949ccb

This causes tests related to #75 to pass again

urschrei added a commit that referenced this issue Jan 3, 2019

Dump HTML snapshots with unique file names based on item key

5afe194

This change also alters dump() to return the path and file name, see discussion in #75

urschrei added a commit that referenced this issue Apr 27, 2019

Alter test attachment doc link mode

d7ab1cd

This causes tests related to #75 to pass again

urschrei added a commit that referenced this issue Apr 27, 2019

Dump HTML snapshots with unique file names based on item key

c4b285d

This change also alters dump() to return the path and file name, see discussion in #75

urschrei added a commit that referenced this issue Jun 9, 2019

Alter test attachment doc link mode

72b3030

This causes tests related to #75 to pass again

urschrei added a commit that referenced this issue Jun 9, 2019

Dump HTML snapshots with unique file names based on item key

2cb2848

This change also alters dump() to return the path and file name, see discussion in #75
