Mikhail Semeniuk edited this page Dec 28, 2016 · 55 revisions

MLeap For Spark

MLeap lets you deploy Spark ML (and some MLlib) transformers and pipelines to production without a SparkContext.

MLeap For Scikit-Learn

MLeap extends scikit-learn so that you can serialize and deploy scikit-learn transformers, pipelines, and feature unions without any runtime dependency on scikit-learn or the libraries beneath it (NumPy, SciPy, C++ libraries). It also serializes transformers and pipelines in a Spark-compatible form, so you can load and deploy your scikit-learn pipelines on Spark infrastructure with a few lines of code.
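To illustrate what deploying without the training library means, here is a minimal sketch in plain Python. It is not MLeap's API or bundle format: the function names (`fit_standard_scaler`, `serialize`, `load_and_transform`) and the JSON layout are hypothetical, chosen only to show the idea. A transformer is fitted once, its learned parameters are written to a portable payload, and the payload is later applied to new data using nothing beyond the standard library.

```python
import json
import math


def fit_standard_scaler(rows):
    """Learn per-column mean and population standard deviation (hypothetical helper)."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / n)
        for col, m in zip(columns, means)
    ]
    # Only plain numbers and strings are stored, so the payload is portable.
    return {"op": "standard_scaler", "mean": means, "std": stds}


def serialize(model):
    """Write the fitted parameters to a JSON string (illustrative format, not MLeap's)."""
    return json.dumps(model)


def load_and_transform(payload, row):
    """Apply a serialized scaler to one row without the training code or library."""
    model = json.loads(payload)
    return [
        (v - m) / s if s else 0.0  # constant columns map to 0.0
        for v, m, s in zip(row, model["mean"], model["std"])
    ]


# "Training" side: fit and serialize.
training_rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
payload = serialize(fit_standard_scaler(training_rows))

# "Serving" side: only the payload and the standard library are needed.
scaled = load_and_transform(payload, [2.0, 30.0])
```

The serving side depends only on the serialized parameters, which is the property the paragraph above describes for scikit-learn pipelines exported through MLeap.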

Tutorials

Demos

Supported Transformers

Features

Transformer Spark Scikit-Learn TensorFlow
Binarizer x x
Bucketizer x
ChiSqSelector x
CountVectorizer
ElementwiseProduct x x
HashingTermFrequency x x
Imputer x x
Interaction x x
LSH
MaxAbsScaler x
MinMaxScaler x x
Ngram x
Normalizer x
OneHotEncoder x x
PCA x x
QuantileDiscretizer x
PolynomialExpansion x x
ReverseStringIndexer x x
StandardScaler x x
StopWordsRemover x
StringIndexer x x
Tokenizer x x
VectorAssembler x x

Classification

Transformer Spark Scikit-Learn TensorFlow
DecisionTreeClassifier x x
GradientBoostedTreeClassifier x
LogisticRegression x x
LogisticRegressionCv x x
NaiveBayesClassifier x
OneVsRest x
RandomForestClassifier x x
SupportVectorMachines x x
MultiLayerPerceptron x

Regression

Transformer Spark Scikit-Learn TensorFlow
AFTSurvivalRegression
DecisionTreeRegression x x
GeneralizedLinearRegression
GradientBoostedTreeRegression x
IsotonicRegression
LinearRegression x x
RandomForestRegression x x

Clustering

Transformer Spark Scikit-Learn TensorFlow
BisectingKMeans x
GaussianMixtureModel x
KMeans x
LDA

Linear Algebra

  • CholeskyDecomposition