When we run an analysis, what do we want to get back? #3
F1 score
Confusion matrix
Y hat (the predicted labels)
Prediction scores
Feature ranking, a list of selected features. For GLM, F-stat/t-stat and p-values of predictors, and model goodness of fit.
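As a rough sketch of how the outputs listed above could be computed with sklearn (the names `clf`, `X_test`, and `y_test` are placeholders for whatever the fitted pipeline and held-out data end up being called):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

# `clf` is a fitted classifier; `X_test`, `y_test` are held-out features/labels (placeholders)
y_hat = clf.predict(X_test)                 # predicted labels (y hat)
scores = clf.decision_function(X_test)      # prediction scores (or clf.predict_proba for probabilities)
f1 = f1_score(y_test, y_hat)
cm = confusion_matrix(y_test, y_hat)

# For a linear model, coefficient magnitudes give a simple feature ranking;
# nonzero coefficients correspond to selected features under L1 regularization.
ranking = np.argsort(-np.abs(clf.coef_.ravel()))
```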
We should probably save the sklearn estimators representing any transformations and the classifier. The sklearn doc recommends pickle for estimator persistence. Pickle is a binary serialization format in Python. @dcgoss, @awm33, and others -- can we store binary files in our database?
@dcgoss cool. I think the following solution will work for Python object serialization to base64-encoded text:

```python
import base64
import pickle

payload = ['a', 'list', 2, 'encode']

# Serialize to bytes, then encode the bytes as base64 text
byte_pickle = pickle.dumps(payload, protocol=4)
base64_text = base64.b64encode(byte_pickle).decode()

# Save `base64_text` using a text field in the database

# To restore the object, reverse the steps
byte_pickle = base64.b64decode(base64_text.encode())
payload = pickle.loads(byte_pickle)
```
@dhimmel base64 text is usually fine for small sizes. It can also be stored as text in JSON fields. How big are the binaries?
I did the pickle --> base64 --> text conversion. If I add an extra step to compress, the entire encoding becomes:

```python
import base64
import pickle
import zlib

# Serialize the fitted classifier, compress the bytes, then encode as base64 text
byte_pickle = pickle.dumps(best_clf, protocol=4)
byte_pickle = zlib.compress(byte_pickle)
base64_text = base64.b64encode(byte_pickle).decode()
```
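For completeness, the decoding path would reverse those steps; a minimal sketch, reusing the variable names from the snippet above:

```python
import base64
import pickle
import zlib

# base64 text -> compressed bytes -> pickle bytes -> fitted classifier
byte_pickle = zlib.decompress(base64.b64decode(base64_text.encode()))
best_clf = pickle.loads(byte_pickle)
```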
@dhimmel Compressing is a good move. If we think this would go into the tens of megabytes or more, we may want to consider using blob storage such as S3 or GCS. Postgres can handle gigabytes of text, but it's not great for performance.
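If we did go the blob-storage route, a minimal sketch with boto3 might look like the following (the bucket name and key scheme are hypothetical placeholders, not anything we've set up):

```python
import boto3

s3 = boto3.client('s3')

# Upload the compressed pickle bytes (hypothetical bucket and key)
s3.put_object(
    Bucket='example-models-bucket',
    Key='classifiers/analysis-123.pkl.zlib',
    Body=byte_pickle,
)

# Fetch the bytes back later for decoding
response = s3.get_object(Bucket='example-models-bucket', Key='classifiers/analysis-123.pkl.zlib')
byte_pickle = response['Body'].read()
```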
We need to design our results JSON so that we can later visualize the most important results via the results viewer from the UI team.
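As a starting point for that design discussion, a results payload covering the outputs listed above might look something like this (field names are illustrative and reuse the placeholder variables from the earlier sketches, not a settled schema):

```python
import json

# Illustrative only -- field names and structure are up for discussion
results = {
    'f1_score': float(f1),
    'confusion_matrix': cm.tolist(),
    'y_hat': y_hat.tolist(),
    'prediction_scores': scores.tolist(),
    'feature_ranking': ranking.tolist(),
    'model': base64_text,  # zlib-compressed, base64-encoded pickle of the classifier
}
results_json = json.dumps(results)
```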