[Roadmap] Phasing out the support for old binary format. #7547
Comments
How necessary is this? Was the default_left changed to improve performance?
Yes. Most of the improvement comes from the typed array where we can omit the construction of Actually, there is, but it's not quite useful. The representation of boolean is We can continue the support for the current JSON model for a very long time since the additional code is not much (1 condition to check whether it's bool or int), but I think it's also quite easy to move away from it since users can simply replace
Is there a simple way to silence the warning "Found JSON model saved before XGBoost 1.6, please save the model using current version again. The support for old JSON model will be discontinued in XGBoost 2.3." when using the Java interface, i.e. the ml.dmlc.xgboost4j.java.XGBoost class from the ml.dmlc.xgboost-jvm_2.12 Maven artifact? The C code seems to accept a "verbosity" configuration, but so far I have not found a way to set it from the Java code.
@trivialfis @hcho3 I am working on a project that uses the JSON format to save, load, and analyze the serialized model. Will the JSON serialization format continue to be supported going forward? Given that JSON has much broader support across languages and libraries than UBJSON, it would be great to keep it as a serialization option. Thanks!
We will support JSON. It shares the same code path with UBJSON, so rest assured, they will live together.
XGBoost has used a custom binary model format since day 1. Later, in 1.0, we
introduced the JSON format as an alternative, which has a schema and better
extensibility. The JSON format has been the default format for memory snapshot
serialization (pickle, rds, etc.) and has extra features, including categorical data support
and extra data such as feature names and feature types. However, for performance and compatibility
reasons we have continued to support the old binary format. In 1.6 we plan to add
Universal Binary JSON as an extension to the current JSON format and as a replacement for the old
binary format.
Motivation
The old binary format essentially copies internal structures, such as parameters and tree
nodes, into a memory buffer, so it has a fixed memory layout that is difficult to change and
debug. The Learner class is full of conditions that work around issues the binary format
has accumulated over time. These issues stem from the fact
that we cannot change the binary output in any way, which also has an indirect impact on
how we write code. For instance, we cannot change the RegTree structure, which is the very
core of XGBoost, because of how the node is stored in the output. To overcome these issues
and clear some room for future development, we need to phase out its use.
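To make the fixed-layout problem concrete, here is a minimal Python sketch. The node fields, the NODE_V1 and NODE_V2 layouts, and the helper names are hypothetical and are not XGBoost's actual RegTree layout; the sketch only shows that a raw struct dump breaks once a field is added, while a keyed format such as JSON tolerates the extra field.

```python
import json
import struct

# Hypothetical on-disk layout: split feature index (int32) and split value (float32).
NODE_V1 = struct.Struct("<if")
# A later version that adds one more field (int32 default direction).
NODE_V2 = struct.Struct("<ifi")


def dump_binary(nodes, layout):
    """Copy node tuples into a flat buffer, the way a raw memory dump would."""
    return b"".join(layout.pack(*node) for node in nodes)


def load_binary_v1(buffer):
    """A reader that only knows the V1 layout."""
    if len(buffer) % NODE_V1.size != 0:
        raise ValueError("buffer size does not match the V1 node layout")
    return [
        NODE_V1.unpack_from(buffer, offset)
        for offset in range(0, len(buffer), NODE_V1.size)
    ]


def load_json_v1(text):
    """A reader that only knows the V1 keys and ignores any others."""
    return [(n["split_index"], n["split_value"]) for n in json.loads(text)]
```

In this sketch a single V2 node is 12 bytes, so the V1 reader rejects it. Two V2 nodes are 24 bytes, which happens to be a multiple of the 8-byte V1 node, so the V1 reader would accept the buffer and return wrong values without any error. The JSON reader keeps working because it looks fields up by name.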
Roadmap
If the Universal Binary JSON implementation is accepted, I propose the following roadmap
for phasing out the support of the old binary format:
default. Emit a warning when users load the old JSON format. This is necessary since
default_left is changed from boolean to integer.
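The extra reader-side check for default_left (boolean in older JSON models, integer in newer ones) can be sketched as follows. This is a minimal Python illustration, not XGBoost's loader; the function name normalize_default_left and the sample arrays are assumptions made for the example.

```python
import json


def normalize_default_left(values):
    """Return default_left flags as plain integers.

    Accepts either the older boolean encoding or the newer integer one.
    """
    normalized = []
    for value in values:
        # bool is a subclass of int in Python, so it has to be checked first.
        if isinstance(value, bool):
            normalized.append(1 if value else 0)
        elif isinstance(value, int):
            normalized.append(value)
        else:
            raise TypeError(f"unexpected default_left value: {value!r}")
    return normalized


# Hypothetical default_left arrays as they might appear in a saved tree.
old_style = json.loads("[true, false, true]")
new_style = json.loads("[1, 0, 1]")
```

Both arrays normalize to the same list of integers, which is why supporting the older encoding costs a reader only one type check.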