Third development release

freeman-lab released this 23 Aug 22:04

4783f5a

This release adds new functionality for loading data, changes the loading API, and includes a variety of smaller bug fixes.

API changes

All data loading is now performed through the new Thunder Context, a thin wrapper around a SparkContext. This context, available as tsc, is created automatically when Thunder starts and provides methods for loading data from different input sources.
tsc.loadText behaves identically to load in previous versions.
Example data sets can now be loaded with tsc.makeExample, tsc.loadExample, and tsc.loadExampleEC2.
The output of the pack operation now preserves the xy definition; as a result, outputs are transposed relative to previous versions.
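For readers moving to tsc.loadText, it can help to picture what a text record turns into. Assuming a layout of one record per line, with a few integer key columns (such as x, y, z) followed by the values of a time series, a minimal sketch of parsing one line into a key-value pair looks like this. `parse_line` and its `nkeys` parameter are hypothetical illustrations, not Thunder's actual parser or signature.

```python
from typing import List, Tuple


def parse_line(line: str, nkeys: int = 3) -> Tuple[Tuple[int, ...], List[float]]:
    """Split one text record into an integer key tuple and a list of values.

    Hypothetical helper: assumes `nkeys` leading key columns followed by
    whitespace-separated numeric values.
    """
    fields = line.split()
    if len(fields) < nkeys:
        raise ValueError(f"expected at least {nkeys} fields, got {len(fields)}")
    # Leading fields are coordinates; going through float accepts "1.0" as well as "1".
    key = tuple(int(float(f)) for f in fields[:nkeys])
    # Remaining fields are the record's values (e.g. a time series).
    values = [float(f) for f in fields[nkeys:]]
    return key, values
```

With a plain SparkContext `sc`, the same function could be applied to a whole file with `sc.textFile(path).map(parse_line)`, which yields an RDD of (key, values) records.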
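Because packed outputs are now transposed relative to earlier versions, code written against older outputs may need a transpose. The sketch below is a hypothetical illustration, not Thunder's pack implementation: it packs ((x, y), value) records into a NumPy array indexed as arr[x, y] and shows that the other orientation, arr[y, x], is simply its transpose. Which orientation each Thunder version used is not stated in these notes, so treat the labels as assumptions.

```python
import numpy as np


def pack_xy(records, shape_xy):
    """Pack ((x, y), value) records into a 2-D array indexed as arr[x, y].

    Hypothetical helper for illustration; keys are assumed to be
    zero-based integer coordinates that fit inside `shape_xy`.
    """
    arr = np.zeros(shape_xy)
    for (x, y), value in records:
        arr[x, y] = value
    return arr


records = [((0, 0), 1.0), ((1, 0), 2.0), ((2, 1), 3.0)]
xy = pack_xy(records, (3, 2))  # shape (3, 2): rows follow x, columns follow y
yx = xy.T                      # shape (2, 3): the same data in [y, x] orientation
```

Converting between the two orientations is just `.T`, so adapting downstream code usually means transposing once rather than re-indexing every access.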

New features

Included a design matrix with the example data set on EC2
Faster NMF implementation, achieved by changing the order of the update equations (#15)
Support for loading local MAT files into RDDs through tsc.loadMatLocal
Preliminary support for loading binary files from HDFS using tsc.loadBinary (depends on features currently available only in Spark's master branch)
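These notes do not say which update order Thunder's NMF adopted. For orientation, here is a minimal NumPy sketch of NMF using Lee and Seung's standard multiplicative updates for the squared Frobenius error, with an `order` switch that decides whether H or W is updated first in each iteration. The function name, the `order` parameter, and the random initialization are illustrative assumptions, not Thunder's implementation.

```python
import numpy as np


def nmf(V, k, n_iter=200, order="HW", eps=1e-9, seed=0):
    """Factor a nonnegative matrix V (m x n) as W @ H, with W (m x k) and H (k x n).

    Uses multiplicative updates that do not increase ||V - W H||_F^2.
    `order` chooses which factor is updated first; the second update
    always uses the freshly updated value of the first.
    """
    if order not in ("HW", "WH"):
        raise ValueError("order must be 'HW' or 'WH'")
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        if order == "HW":
            H *= (W.T @ V) / (W.T @ W @ H + eps)
            W *= (V @ H.T) / (W @ H @ H.T + eps)
        else:
            W *= (V @ H.T) / (W @ H @ H.T + eps)
            H *= (W.T @ V) / (W.T @ W @ H + eps)
    return W, H
```

Each update immediately reuses the most recent value of the other factor, so swapping the order changes the sequence of iterates but not the objective being minimized.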
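Loading a local MAT file into an RDD amounts to reading an array on the driver and distributing it as key-value records. Assuming the file holds a single array of shape (x, y, t) with time on the last axis, the conversion step can be sketched in NumPy as below. `array_to_records` is a hypothetical helper; it is not necessarily how tsc.loadMatLocal works internally.

```python
import numpy as np


def array_to_records(arr):
    """Turn an array of shape (x, y, ..., t) into ((x, y, ...), timeseries) records.

    Assumes the last axis is time; every other axis becomes part of the key.
    """
    arr = np.asarray(arr)
    spatial_shape = arr.shape[:-1]
    records = []
    for index in np.ndindex(*spatial_shape):
        # Indexing with a tuple covering all spatial axes leaves a 1-D time series.
        records.append((tuple(int(i) for i in index), arr[index]))
    return records
```

The file itself could be read with `scipy.io.loadmat(path)`, which returns a dict of variable names to arrays; passing `array_to_records(mat["data"])` to `sc.parallelize` would then give an RDD of records (the variable name `"data"` is hypothetical).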
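Binary input is commonly read as fixed-length records. As an illustration only (the record layout that tsc.loadBinary expects is not described in these notes), the sketch below assumes each record stores `nkeys` little-endian int16 keys followed by `nvalues` int16 values, and decodes a byte string of such records with the standard library's `struct` module. `decode_records` is a hypothetical helper. In later Spark releases, `SparkContext.binaryRecords(path, recordLength)` returns each fixed-length record as a byte string that a decoder like this could process; whether that is the feature these notes refer to is not stated.

```python
import struct


def decode_records(buf: bytes, nkeys: int, nvalues: int):
    """Decode fixed-length records of int16 keys followed by int16 values.

    The little-endian int16 layout is an assumed example; adjust the
    format string for real data.
    """
    fmt = f"<{nkeys + nvalues}h"
    size = struct.calcsize(fmt)
    if len(buf) % size != 0:
        raise ValueError("buffer length is not a multiple of the record size")
    records = []
    for fields in struct.iter_unpack(fmt, buf):
        # Split each unpacked tuple into its key part and its value part.
        records.append((fields[:nkeys], list(fields[nkeys:])))
    return records
```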

Bug fixes

Switched from PIL to Pillow (#11)
Fixed an important typo on a documentation page (#18)
Fixed a sorting bug in local correlations
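Local correlation is usually described as the correlation between each pixel's time series and the average time series of its spatial neighborhood. A minimal NumPy sketch on an already-packed (x, y, t) array follows; the neighborhood size, the inclusion of the center pixel in the mean, and the clipping at image edges are assumptions rather than Thunder's exact definition, and `local_correlation` is a hypothetical name.

```python
import numpy as np


def local_correlation(data, size=1):
    """Correlate each pixel's time series with the mean of its neighborhood.

    data: array of shape (x, y, t). The neighborhood is the
    (2*size+1) x (2*size+1) window around each pixel, clipped at the
    image edges, and includes the center pixel itself.
    """
    nx, ny, nt = data.shape
    out = np.zeros((nx, ny))
    for i in range(nx):
        for j in range(ny):
            window = data[max(i - size, 0):i + size + 1, max(j - size, 0):j + size + 1]
            # Average all time series in the window into one neighborhood series.
            neighborhood_mean = window.reshape(-1, nt).mean(axis=0)
            out[i, j] = np.corrcoef(data[i, j], neighborhood_mean)[0, 1]
    return out
```

Constant time series have zero variance, so `np.corrcoef` returns NaN (with a warning) for such pixels.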