Releases
v0.7
This is a stable release of version 0.7.
hcho3 released this 30 Dec 21:52
Changes
This version represents a major change from the last release (v0.6), which was released a year and a half ago.
Updated Sklearn API
Add compatibility layer for scikit-learn v0.18: sklearn.cross_validation is now deprecated
Updated to allow use of all XGBoost parameters via **kwargs.
Updated nthread to n_jobs and seed to random_state (as per the scikit-learn convention); nthread and seed are now marked as deprecated
Updated to allow choice of booster (gbtree, gblinear, or dart)
XGBRegressor now supports instance weights (specify the sample_weight parameter)
Pass the n_jobs parameter to the DMatrix constructor
Add xgb_model parameter to the fit method, to allow continuation of training (see the sketch after this list)
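The following is a minimal sketch (not from the release itself) of the updated scikit-learn wrapper; the synthetic data, the model file name, and the hyperparameter values are illustrative assumptions, and get_booster() is assumed to be available in this version.

```python
# Minimal sketch of the updated scikit-learn API (synthetic data, illustrative names).
import numpy as np
import xgboost as xgb

X = np.random.rand(100, 10)
y = np.random.rand(100)
w = np.ones(100)                      # instance weights

# nthread -> n_jobs, seed -> random_state; any Booster parameter can be
# passed through **kwargs, including the booster choice.
model = xgb.XGBRegressor(n_jobs=4, random_state=0, booster='gbtree',
                         n_estimators=10)
model.fit(X, y, sample_weight=w)      # instance weights for XGBRegressor
model.get_booster().save_model('model_v1.bin')

# Continue training from a previously saved model via the new xgb_model
# parameter (a saved model file is assumed to be accepted here).
model2 = xgb.XGBRegressor(n_jobs=4, random_state=0, n_estimators=10)
model2.fit(X, y, xgb_model='model_v1.bin')
```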
Refactored gbm to allow a more friendly cache strategy
Specialized some prediction routines
Robust DMatrix construction from a sparse matrix
Faster construction of DMatrix from 2D NumPy matrices: elide copies, use multiple threads
Automatically remove nan from input data when it is sparse. This can solve some user-reported problems of istart != hist.size
Fix the single-instance prediction function to obtain correct predictions
Minor fixes
Thread-local variables are upgraded so that they are automatically freed at thread exit.
Fix saving and loading count::poisson models
Fix CalcDCG to use base-2 logarithm
Messages are now written to stderr instead of stdout
Keep built-in evaluations while using customized evaluation functions
Use bst_float consistently to minimize type conversion
Copy the base margin when slicing DMatrix
Evaluation metrics are now saved to the model file
Use int32_t explicitly when serializing the version
In distributed training, synchronize the number of features after loading a data matrix.
Migrate to C++11
The current master version now requires a C++11-enabled compiler (g++ 4.8 or higher)
Predictor interface was factored out (in a manner similar to the updater interface).
Makefile support for Solaris and ARM
Test code coverage using Codecov
Add CPP tests
Add Dockerfile and Jenkinsfile to support continuous integration for GPU code
New functionality
Ability to adjust a tree model's statistics to a new dataset without changing tree structures.
Ability to extract feature contributions from individual predictions, as described here and here.
Faster, histogram-based tree algorithm (tree_method='hist'); see the sketch after this list.
GPU/CUDA accelerated tree algorithms (tree_method='gpu_hist' or 'gpu_exact'), including the GPU-based predictor.
Monotonic constraints: when other features are fixed, force the prediction to be monotonically increasing with respect to a specified feature.
Faster gradient calculation using AVX SIMD
Ability to export models in JSON format
Support for Tweedie regression
Additional dropout options for DART: binomial+1, epsilon
Ability to update an existing model in-place: this is useful for many applications, such as determining feature importance
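A minimal sketch, using the native Python API, of the histogram-based algorithm, monotonic constraints, and per-prediction feature contributions; the data is synthetic, and the GPU variants require a build compiled with CUDA support.

```python
# Sketch of the new training/prediction options (synthetic data).
import numpy as np
import xgboost as xgb

X = np.random.rand(500, 3)
y = X[:, 0] - X[:, 1] + 0.1 * np.random.rand(500)
dtrain = xgb.DMatrix(X, label=y)

params = {
    'tree_method': 'hist',               # histogram-based algorithm;
                                         # 'gpu_hist' or 'gpu_exact' on a CUDA build
    'monotone_constraints': '(1,-1,0)',  # increasing in feature 0, decreasing in feature 1
    'max_depth': 3,
    'eta': 0.1,
}
bst = xgb.train(params, dtrain, num_boost_round=20)

# Per-feature contributions for each prediction (one extra column for the bias term).
contribs = bst.predict(dtrain, pred_contribs=True)
print(contribs.shape)  # (500, 4)
```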
Python package:
New parameters:
learning_rates in cv()
shuffle in mknfold()
max_features and show_values in plot_importance() (see the sketch after this list)
sample_weight in XGBRegressor.fit()
Support binary wheel builds
Fix MultiIndex detection to support Pandas 0.21.0 and higher
Support metrics and evaluation sets whose names contain -
Support feature maps when plotting trees
Compatibility fix for Python 2.6
Call print_evaluation callback at the last iteration
Use appropriate integer types when calling native code, to prevent truncation and memory errors
Fix shared library loading on Mac OS X
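A short sketch of the new plotting parameters, reusing the booster bst from the sketch above; the parameter spellings follow these notes (later releases rename max_features to max_num_features), so treat them as assumptions for this version.

```python
# Sketch of the new plot_importance() options (parameter names as listed above).
import matplotlib.pyplot as plt
import xgboost as xgb

# bst is the Booster trained in the previous sketch.
# max_features caps the number of features shown; show_values toggles the
# numeric labels on the bars.
ax = xgb.plot_importance(bst, max_features=2, show_values=False)
plt.show()
```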
R package:
New parameters:
silent in xgb.DMatrix()
use_int_id in xgb.model.dt.tree()
predcontrib in predict()
monotone_constraints in xgb.train()
Default value of the save_period parameter in xgboost() changed to NULL (consistent with xgb.train()).
It's possible to custom-build the R package with GPU acceleration support.
Enable JVM build for Mac OS X and Windows
Integration with AppVeyor CI
Improved safety for garbage collection
Store numeric attributes with higher precision
Easier installation for devel version
Improved xgb.plot.tree()
Various minor fixes to improve user experience and robustness
Register native code to pass CRAN check
Updated CRAN submission
JVM packages
Add Spark pipeline persistence API
Fix data persistence: loss evaluation on test data had wrongly used caches for training data.
Clean external cache after training
Implement early stopping
Enable training of multiple models by distinguishing stage IDs
Better Spark integration: support RDD / dataframe / dataset, integrate with Spark ML package
XGBoost4j now supports ranking task
Support training with missing data
Refactor JVM package to separate regression and classification models to be consistent with other machine learning libraries
Support XGBoost4j compilation on Windows
Parameter tuning tool
Publish source code for XGBoost4j to maven local repo
Scala implementation of the Rabit tracker (drop-in replacement for the Java implementation)
Better exception handling for the Rabit tracker
Persist num_class, the number of classes (for classification tasks)
XGBoostModel now holds BoosterParams
libxgboost4j is now part of the CMake build
Release DMatrix when no longer needed, to conserve memory
Expose baseMargin, to allow initialization of boosting with predictions from an external model
Support instance weights
Use SparkParallelismTracker to prevent jobs from hanging forever
Expose train-time evaluation metrics via XGBoostModel.summary
Option to specify host-ip explicitly in the Rabit tracker
Documentation
Better math notation for gradient boosting
Updated build instructions for Mac OS X
Template for GitHub issues
Add CITATION file for citing XGBoost in scientific writing
Fix dropdown menu in xgboost.readthedocs.io
Document the updater_seq parameter
Style fixes for Python documentation
Links to additional examples and tutorials
Clarify installation requirements
Changes that break backward compatibility
#1519 XGBoost-spark no longer contains APIs for DMatrix; use the public booster interface instead.
#2476 XGBoostModel.predict() now has a different signature