Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make datasets more easily loadable #61

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ChrisBeaumont
Copy link
Contributor

This is what I had in mind for #60 -- instead of asking users to download files like the Perseus extinction map, why not bundle it with the code with an easy loader function?

@astrofrog
Copy link
Contributor

One reason why I like being explicit is because we want users to be able to take the examples in the docs and apply it to their cases, which they can't really do if we are hard-coding specific examples with special methods. So I would personally prefer not to have this kind of convenience provided (to me, the data shouldn't be part of the package itself, it's an example file for the docs).

On the other hand, I think we could consider allowing a filename to be passed to compute, which would already simplify a number of examples.

@ChrisBeaumont
Copy link
Contributor Author

Ok, you can make the final call here. I like being able to %paste code snippets in IPython, to quickly run examples -- as stupid as it is, an extra step like a data download has prevented me from trying examples before. I like how scikit-learn handles this stuff: http://scikit-learn.org/stable/auto_examples/applications/plot_outlier_detection_housing.html
and
https://github.com/scikit-learn/scikit-learn/tree/master/sklearn/datasets/data

But it's just a preference, and I defer to you. Either way, I agree compute should do something sensible if given a fits path (i.e. load and use the first HDU)

@astrofrog
Copy link
Contributor

@ChrisBeaumont - I'll look into it a bit more. In this case, the reason why I think copying and pasting verbatim isn't particularly useful is because the user won't learn much more than what's already in the docs. IMHO they are most likely to copy and paste and modify, which means they want to use it on their own files. But I want to take some time to think about it and have a look over the examples you show.

@ChrisBeaumont
Copy link
Contributor Author

One likely use case where they won't care about using their own data is to get a handle on the data structure, the API, the impact of pruning parameters, etc. People may want to do that before they dive in with their own data. Just something to think about.

@astrofrog
Copy link
Contributor

After thinking about this a bit more, I agree that something like this would be handy, though I think we need to ensure we keep some of the initial examples with the full loading, and that we also make sure we provide links from time to time to say 'if you want to load your own dataset, see X'. I can try and work on this this afternoon based on what you have here.

@astrofrog
Copy link
Contributor

@ChrisBeaumont - could you rebase this, then I can open a pull request to your branch with suggestions.

@ChrisBeaumont
Copy link
Contributor Author

ok, done

@astrofrog
Copy link
Contributor

Oops, it looks like this will need rebasing again. Once you've done that, I'll open a pull request to your branch with suggestions - but in principle I now like this idea. I just think that the first time we show the example we should show the full thing and add a little paragraph explaining where the load functions are coming from.

@astrofrog
Copy link
Contributor

@ChrisBeaumont - here are a couple of ideas:

  • instead of load_perseus(), simply store the path to the example data as a string (e.g. example_perseus) and pass that to compute - and also modify compute so it can read from FITS files as discussed above.
  • rather than store the path to the whole file, store the path to the directory containing the files, and update the examples to:
from astrodendro import Dendrogram, examples
Dendrogram(examples + 'PerA_Extn2MASS_F_Gal.fits')

Then it shows more explicitly that the argument is a path. What do you think?

@ChrisBeaumont
Copy link
Contributor Author

I don't like examples as a path string -- that isn't obvious from the import statement, and usually you import code, not data. Plus, the PerA_Extn... name is cumbersome.

What do you think about a function example_file('perseus') -> path_to_perseus_file? This, combined with the ability for compute to auto-load a fits file from a path, addresses the original motivation for this issue (make using example data easier and more readable)

@astrofrog
Copy link
Contributor

Ok, that sounds like a reasonable compromise - do you want to update the examples to show this?

@ghost ghost assigned ChrisBeaumont Sep 20, 2013
@astrofrog
Copy link
Contributor

Summary of a chat we had at CfA with @ChrisBeaumont and @keflavich:

  • Pros: don't need to download file to run examples
  • Cons: users wouldn't necessarily know how to modify the script to use with their data

One option - the first time, show the full thing, then explain that in the rest of the docs, we use a convenience to replace the from astropy.io import fits and fits.getdata lines, then use the convenience in the other examples to avoid always repeating the I/O code.

We decided to delay this until after 0.1 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants