Third development release
This update adds new functionality for loading data, alongside changes to the API for loading, and a variety of smaller bug fixes.
API changes
- All data loading is performed through the new Thunder Context, a thin wrapper for a Spark Context. This context is automatically created when starting thunder, and has methods for loading data from different input sources. `tsc.loadText` behaves identically to the `load` function from previous versions.
- Example data sets can now be loaded with `tsc.makeExample`, `tsc.loadExample`, and `tsc.loadExampleEC2`.
- Output of the `pack` operation now preserves the xy definition, but outputs will be transposed relative to previous versions.
New features
- Include a design matrix with the example data set on EC2
- Faster `nmf` implementation, achieved by changing the order of the update equations (#15)
- Support for loading local MAT files into RDDs through `tsc.loadMatLocal`
- Preliminary support for loading binary files from HDFS using `tsc.loadBinary` (depends on features currently only available in Spark's master branch)
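For context on the `nmf` change, here is a generic NumPy sketch of non-negative matrix factorization using Lee-Seung multiplicative updates for the Frobenius-norm objective. It is not thunder's implementation, and these notes do not say which ordering thunder adopted. The hypothetical `h_first` flag only shows that the two factor updates can run in either order. In each iteration the second update uses the factor that was just updated, so the order is a real design choice rather than a cosmetic one.

```python
import numpy as np

def nmf(V, k, n_iter=200, h_first=True, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V as W @ H with W, H >= 0.

    Generic Lee-Seung multiplicative updates (not thunder's code).
    `h_first` (a hypothetical flag) selects which factor is updated
    first in each iteration; the second update sees the new value.
    `eps` guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k))
    H = rng.random((k, m))

    def update_H(W, H):
        return H * (W.T @ V) / (W.T @ W @ H + eps)

    def update_W(W, H):
        return W * (V @ H.T) / (W @ H @ H.T + eps)

    for _ in range(n_iter):
        if h_first:
            H = update_H(W, H)
            W = update_W(W, H)  # uses the freshly updated H
        else:
            W = update_W(W, H)
            H = update_H(W, H)  # uses the freshly updated W
    return W, H

V = np.random.default_rng(1).random((20, 10))
W, H = nmf(V, k=3)
print(np.linalg.norm(V - W @ H))  # reconstruction error
```

Comparing the error curves of `h_first=True` and `h_first=False` on real data is one way to check whether a given ordering converges faster in practice.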