Make datasets more easily loadable #61

ChrisBeaumont · 2013-07-26T18:40:10Z

This is what I had in mind for #60 -- instead of asking users to download files like the Perseus extinction map, why not bundle it with the code with an easy loader function?

astrofrog · 2013-07-26T19:34:10Z

One reason why I like being explicit is because we want users to be able to take the examples in the docs and apply it to their cases, which they can't really do if we are hard-coding specific examples with special methods. So I would personally prefer not to have this kind of convenience provided (to me, the data shouldn't be part of the package itself, it's an example file for the docs).

On the other hand, I think we could consider allowing a filename to be passed to compute, which would already simplify a number of examples.

ChrisBeaumont · 2013-07-26T21:02:00Z

Ok, you can make the final call here. I like being able to %paste code snippets in IPython, to quickly run examples -- as stupid as it is, an extra step like a data download has prevented me from trying examples before. I like how scikit-learn handles this stuff: http://scikit-learn.org/stable/auto_examples/applications/plot_outlier_detection_housing.html
and
https://github.com/scikit-learn/scikit-learn/tree/master/sklearn/datasets/data

But it's just a preference, and I defer to you. Either way, I agree compute should do something sensible if given a fits path (i.e. load and use the first HDU)

astrofrog · 2013-07-26T21:18:32Z

@ChrisBeaumont - I'll look into it a bit more. In this case, the reason why I think copying and pasting verbatim isn't particularly useful is because the user won't learn much more than what's already in the docs. IMHO they are most likely to copy and paste and modify, which means they want to use it on their own files. But I want to take some time to think about it and have a look over the examples you show.

ChrisBeaumont · 2013-07-26T21:23:28Z

One likely use case where they won't care about using their own data is to get a handle on the data structure, the API, the impact of pruning parameters, etc. People may want to do that before they dive in with their own data. Just something to think about.

astrofrog · 2013-07-28T11:18:40Z

After thinking about this a bit more, I agree that something like this would be handy, though I think we need to ensure we keep some of the initial examples with the full loading, and that we also make sure we provide links from time to time to say 'if you want to load your own dataset, see X'. I can try and work on this this afternoon based on what you have here.

astrofrog · 2013-07-28T13:47:10Z

@ChrisBeaumont - could you rebase this, then I can open a pull request to your branch with suggestions.

ChrisBeaumont · 2013-07-28T15:13:03Z

ok, done

astrofrog · 2013-08-02T06:53:46Z

Oops, it looks like this will need rebasing again. Once you've done that, I'll open a pull request to your branch with suggestions - but in principle I now like this idea. I just think that the first time we show the example we should show the full thing and add a little paragraph explaining where the load functions are coming from.

astrofrog · 2013-08-30T12:15:50Z

@ChrisBeaumont - here are a couple of ideas:

instead of load_perseus(), simply store the path to the example data as a string (e.g. example_perseus) and pass that to compute - and also modify compute so it can read from FITS files as discussed above.
rather than store the path to the whole file, store the path to the directory containing the files, and update the examples to:

from astrodendro import Dendrogram, examples
Dendrogram(examples + 'PerA_Extn2MASS_F_Gal.fits')

Then it shows more explicitly that the argument is a path. What do you think?

ChrisBeaumont · 2013-08-30T13:52:03Z

I don't like examples as a path string -- that isn't obvious from the import statement, and usually you import code, not data. Plus, the PerA_Extn... name is cumbersome.

What do you think about a function example_file('perseus') -> path_to_perseus_file? This, combined with the ability for compute to auto-load a fits file from a path, addresses the original motivation for this issue (make using example data easier and more readable)

astrofrog · 2013-08-30T14:02:49Z

Ok, that sounds like a reasonable compromise - do you want to update the examples to show this?

astrofrog · 2013-09-20T15:15:40Z

Summary of a chat we had at CfA with @ChrisBeaumont and @keflavich:

Pros: don't need to download file to run examples
Cons: users wouldn't necessarily know how to modify the script to use with their data

One option - the first time, show the full thing, then explain that in the rest of the docs, we use a convenience to replace the from astropy.io import fits and fits.getdata lines, then use the convenience in the other examples to avoid always repeating the I/O code.

We decided to delay this until after 0.1 release.

datasets module

408ba4d

ghost assigned ChrisBeaumont Sep 20, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make datasets more easily loadable #61

Make datasets more easily loadable #61

ChrisBeaumont commented Jul 26, 2013

astrofrog commented Jul 26, 2013

ChrisBeaumont commented Jul 26, 2013

astrofrog commented Jul 26, 2013

ChrisBeaumont commented Jul 26, 2013

astrofrog commented Jul 28, 2013

astrofrog commented Jul 28, 2013

ChrisBeaumont commented Jul 28, 2013

astrofrog commented Aug 2, 2013

astrofrog commented Aug 30, 2013

ChrisBeaumont commented Aug 30, 2013

astrofrog commented Aug 30, 2013

astrofrog commented Sep 20, 2013

Make datasets more easily loadable #61

Are you sure you want to change the base?

Make datasets more easily loadable #61

Conversation

ChrisBeaumont commented Jul 26, 2013

astrofrog commented Jul 26, 2013

ChrisBeaumont commented Jul 26, 2013

astrofrog commented Jul 26, 2013

ChrisBeaumont commented Jul 26, 2013

astrofrog commented Jul 28, 2013

astrofrog commented Jul 28, 2013

ChrisBeaumont commented Jul 28, 2013

astrofrog commented Aug 2, 2013

astrofrog commented Aug 30, 2013

ChrisBeaumont commented Aug 30, 2013

astrofrog commented Aug 30, 2013

astrofrog commented Sep 20, 2013