how to deal with large (many observations) datasets #44

Open

atn38 opened this issue Jun 24, 2019 · 8 comments
Labels: enhancement (New feature or request)

Comments

@atn38
Member

atn38 commented Jun 24, 2019

With a data table of ~400k observations and 60 variables, the full static report takes upwards of 10 minutes to generate. Does the dynamic plotting functionality face the same challenge? Could we do something with large datasets to reduce the load, e.g., randomly sample the dataset and then generate the report from that sample?

atn38 changed the title from "how to deal with large (many observations)" to "how to deal with large (many observations) datasets" on Jun 24, 2019
@CoastalPlainSoils
Collaborator

Hmmm, good question. I have no idea. However, if the report takes that long to complete, there should definitely be something that lets the user know the site is processing the request, and if possible the app could give an estimated time frame for completion.

sheilasaia added the "duplicate (This issue or pull request already exists)" label on Jun 25, 2019
@clnsmth
Member

clnsmth commented Jun 25, 2019

I'm moving the conversation from #11 here.

@clnsmth
Member

clnsmth commented Jun 25, 2019

I suggest this be an optional argument rather than an arbitrary limit on the number of rows that can be read in. If performance and wait times are a concern for users, we could address this by supplying the user with a status bar (which has proven difficult to do) or by informing the user of limitations through an expectation matrix (suggested here) in the package documentation.
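
Roughly what I have in mind, just as a sketch (the function and argument names here are made up, not datapie's actual API): by default everything is read, and a cap is applied only if the caller asks for one.

```r
# Sketch of an optional row cap; `read_data_table` and `max_rows` are hypothetical names.
read_data_table <- function(path, max_rows = NULL) {
  df <- utils::read.csv(path, stringsAsFactors = FALSE)
  if (!is.null(max_rows) && nrow(df) > max_rows) {
    message("Table has ", nrow(df), " rows; keeping the first ", max_rows, ".")
    df <- df[seq_len(max_rows), , drop = FALSE]
  }
  df
}
```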

clnsmth added the "enhancement (New feature or request)" label and removed the "duplicate (This issue or pull request already exists)" label on Jun 25, 2019
@clnsmth
Member

clnsmth commented Jun 25, 2019

@atn38, since the UI team has figured out how to return messages from a function to the GUI, you could add messages to each static report function to inform the user of status.

Alternatively, as @CoastalPlainSoils suggests, you may be able to create a progress bar using the progress package.
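
For example, something along these lines, where the bar ticks once per variable summarized (the loop body is only a stand-in for whatever the report function actually does):

```r
library(progress)

# Simulated per-variable summary work with a progress bar.
vars <- paste0("var_", 1:60)
pb <- progress_bar$new(
  format = "  summarizing [:bar] :percent eta: :eta",
  total = length(vars)
)
for (v in vars) {
  Sys.sleep(0.05)  # placeholder for the real per-variable summary
  pb$tick()
}
```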

@wetlandscapes
Collaborator

I kind of like the idea of being able to randomly sample a large data set. In that context, some useful options would be (a rough sketch follows this list):

  1. Indicate the % of the dataset (rows) to be explored, with an indicator of how many rows the sample returns.
  2. Set a seed, so someone can generate the same report twice.
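
A minimal base-R sketch of both options; the function name, defaults, and message wording are only illustrative:

```r
# Sample a fraction of rows, optionally with a seed for reproducible reports.
sample_rows <- function(df, fraction = 0.1, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)          # option 2: same report twice
  n <- max(1L, floor(nrow(df) * fraction))    # option 1: % of the dataset
  message("Sampling ", n, " of ", nrow(df), " rows (", round(100 * fraction), "%).")
  df[sample.int(nrow(df), n), , drop = FALSE]
}

# e.g. sample_rows(big_table, fraction = 0.05, seed = 42)
```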

@sheilasaia
Collaborator

sheilasaia commented Jun 27, 2019

Add printing to the console for report status on the data summary tab. @wetlandscapes will give this a go!
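
A tiny self-contained example of what that could look like with shiny::withProgress(), assuming the summary is rendered server-side; I don't know exactly how the data summary tab is wired up, so treat this as a sketch:

```r
library(shiny)

ui <- fluidPage(verbatimTextOutput("data_summary"))

server <- function(input, output, session) {
  output$data_summary <- renderPrint({
    withProgress(message = "Building data summary...", value = 0, {
      message("Starting data summary report")  # status printed to the R console
      Sys.sleep(1)                              # stand-in for the real work
      incProgress(1, detail = "done")
      summary(mtcars)                           # stand-in for the real summary
    })
  })
}

# shinyApp(ui, server)
```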

@sheilasaia
Collaborator

Can I also add that we might want to limit the size of downloads to a user's computer too? For example, warn them (and maybe stop the download) if they're about to download a huge .shp file.
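
For example, something like this could check the remote file size before the download starts. The httr dependency and the 100 MB threshold are assumptions on my part, not anything datapie currently uses:

```r
# Hypothetical guard: warn and skip the download if the remote file looks too big.
check_download_size <- function(url, max_mb = 100) {
  cl <- httr::headers(httr::HEAD(url))[["content-length"]]
  if (is.null(cl)) {
    warning("Could not determine file size; proceeding anyway.")
    return(TRUE)
  }
  size_mb <- as.numeric(cl) / 1e6
  if (size_mb > max_mb) {
    warning(sprintf("File is ~%.0f MB (limit %d MB); skipping download.", size_mb, max_mb))
    return(FALSE)
  }
  TRUE
}
```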

@clnsmth
Member

clnsmth commented Jul 12, 2019

I suggest the random sampling and warnings become enhancements to be implemented after the production release. Until then, file size issues can be communicated through GUI messages and the project docs. Note: a user will first have to find a data package to use with datapie in DataONE, where the file size information is clearly presented.
