Model Serialization (pickle)
This notebook shows how to save and reload a trained DeepGBoost model using
Python's standard pickle module.
We cover:
- Training a regressor and a classifier
- Saving models to disk with pickle.dump
- Loading them back with pickle.load
- Verifying that predictions are identical before and after serialization
In [1]:
# If running outside of the repo, install with:
# pip install deepgboost
#
# If running from the repo root:
# pip install -e '.[dev]'
import deepgboost
print("DeepGBoost version:", deepgboost.__version__)
DeepGBoost version: 0.1.0
In [2]:
import pickle
import tempfile
from pathlib import Path
import numpy as np
from sklearn.datasets import load_diabetes, load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, accuracy_score
from deepgboost import DeepGBoostRegressor, DeepGBoostClassifier
1. Regression
In [3]:
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
reg = DeepGBoostRegressor(
    n_trees=10,
    n_layers=15,
    max_depth=4,
    learning_rate=0.1,
    random_state=42,
)
reg.fit(X_train, y_train)
preds_before = reg.predict(X_test)
print(f"R² before saving: {r2_score(y_test, preds_before):.4f}")
R² before saving: 0.4645
Save to disk
In [4]:
model_path = Path(tempfile.gettempdir()) / "deepgboost_regressor.pkl"
with open(model_path, "wb") as f:
    pickle.dump(reg, f)
print(f"Model saved to: {model_path}")
print(f"File size: {model_path.stat().st_size / 1024:.1f} KB")
Model saved to: /tmp/deepgboost_regressor.pkl
File size: 347.3 KB
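If file size matters, one option is to write the pickle through Python's standard gzip module. The sketch below is only an illustration under stated assumptions: it uses a plain dictionary as a stand-in for a fitted model (any picklable object is handled the same way), and the file names are made up. How much a real DeepGBoost pickle shrinks is not something this notebook measures.

```python
import gzip
import pickle
import tempfile
from pathlib import Path

# Stand-in for a fitted model (assumption): any picklable object works the same way.
stand_in_model = {"weights": [0.0] * 10_000, "n_layers": 15}

out_dir = Path(tempfile.mkdtemp())
plain_path = out_dir / "model.pkl"        # hypothetical file name
gz_path = out_dir / "model.pkl.gz"        # hypothetical file name

# Uncompressed pickle, as in the cell above.
with open(plain_path, "wb") as f:
    pickle.dump(stand_in_model, f, protocol=pickle.HIGHEST_PROTOCOL)

# Same object, written through a gzip stream.
with gzip.open(gz_path, "wb") as f:
    pickle.dump(stand_in_model, f, protocol=pickle.HIGHEST_PROTOCOL)

# Loading mirrors saving: open with gzip, then unpickle.
with gzip.open(gz_path, "rb") as f:
    restored = pickle.load(f)

plain_size = plain_path.stat().st_size
gz_size = gz_path.stat().st_size
print(f"plain: {plain_size} bytes, gzip: {gz_size} bytes")
```

Compression trades a little CPU time at save and load for a smaller file; the loaded object is the same either way.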
Load from disk and verify predictions
In [5]:
with open(model_path, "rb") as f:
    reg_loaded = pickle.load(f)
preds_after = reg_loaded.predict(X_test)
print(f"R² after loading: {r2_score(y_test, preds_after):.4f}")
print(f"Predictions identical: {np.allclose(preds_before, preds_after)}")
R² after loading: 0.4645
Predictions identical: True
2. Classification
In [6]:
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
clf = DeepGBoostClassifier(
    n_trees=5,
    n_layers=10,
    max_depth=4,
    learning_rate=0.15,
    random_state=42,
)
clf.fit(X_train, y_train)
preds_before = clf.predict(X_test)
print(f"Accuracy before saving: {accuracy_score(y_test, preds_before):.4f}")
Accuracy before saving: 0.9649
Save to disk
In [7]:
model_path = Path(tempfile.gettempdir()) / "deepgboost_classifier.pkl"
with open(model_path, "wb") as f:
    pickle.dump(clf, f)
print(f"Model saved to: {model_path}")
print(f"File size: {model_path.stat().st_size / 1024:.1f} KB")
Model saved to: /tmp/deepgboost_classifier.pkl
File size: 103.0 KB
Load from disk and verify predictions
In [8]:
with open(model_path, "rb") as f:
    clf_loaded = pickle.load(f)
preds_after = clf_loaded.predict(X_test)
print(f"Accuracy after loading: {accuracy_score(y_test, preds_after):.4f}")
print(f"Predictions identical: {np.array_equal(preds_before, preds_after)}")
Accuracy after loading: 0.9649
Predictions identical: True
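The regression check uses np.allclose while the classification check uses np.array_equal. The sketch below illustrates the difference between the two on made-up arrays (these are not DeepGBoost predictions): np.array_equal requires exact element-wise equality, which suits integer class labels, whereas np.allclose accepts differences within a small tolerance, which is the more forgiving choice for floating-point output.

```python
import numpy as np

# Made-up values for illustration only; these are not model predictions.
float_a = np.array([0.1 + 0.2, 1.0, 2.5])
float_b = np.array([0.3, 1.0, 2.5])

# 0.1 + 0.2 is not exactly 0.3 in binary floating point.
exact_match = np.array_equal(float_a, float_b)   # strict element-wise equality
close_match = np.allclose(float_a, float_b)      # equality within a tolerance

# Integer class labels can be compared exactly.
labels_a = np.array([0, 1, 1, 0])
labels_b = np.array([0, 1, 1, 0])
labels_match = np.array_equal(labels_a, labels_b)

print(exact_match, close_match, labels_match)  # False True True
```

For a stricter regression check, np.array_equal can be used on the float predictions as well; whether it passes depends on the round trip being bit-exact.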
Note: Only load pickle files from sources you trust. Pickle provides no security guarantees: a malicious .pkl file can execute arbitrary code when loaded.
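One way to act on this warning is to record a SHA-256 digest when the file is saved and refuse to unpickle if the file no longer matches. The following is a minimal sketch, not part of DeepGBoost: the helper names sha256_of and load_if_unchanged are hypothetical, and a small dictionary stands in for a fitted model. It only detects changes relative to a digest you already trust; it does not make pickle itself safe.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes (hypothetical helper)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_if_unchanged(path: Path, expected_digest: str):
    """Unpickle only when the file matches the digest recorded at save time."""
    if sha256_of(path) != expected_digest:
        raise ValueError(f"Digest mismatch for {path}: refusing to unpickle")
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in object (assumption); a fitted model would be handled the same way.
model_path = Path(tempfile.mkdtemp()) / "model.pkl"
with open(model_path, "wb") as f:
    pickle.dump({"n_trees": 10}, f)
recorded = sha256_of(model_path)  # store this somewhere you trust

restored = load_if_unchanged(model_path, recorded)

# Simulate tampering by appending a byte; the load is now rejected.
with open(model_path, "ab") as f:
    f.write(b"\x00")
try:
    load_if_unchanged(model_path, recorded)
    rejected = False
except ValueError:
    rejected = True

print(restored, rejected)
```

The digest has to be kept separately from the pickle file; if both live in the same writable location, an attacker who can replace one can replace the other.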