Utilities

deepgboost.common.utils.bootstrap_sampler(n_samples, n_layers, layer_idx, subsample_min_frac=0.3, rng=None)

Dynamic bootstrap sampler (paper sec. 3.1.3).

Sample size grows linearly from subsample_min_frac * n_samples at layer 0 to n_samples at the last layer. This avoids over-fitting in early boosting steps while allowing the final layers to see the whole dataset.

Parameters:

n_samples : int, required
    Total number of training samples.
n_layers : int, required
    Total number of boosting layers.
layer_idx : int, required
    Index of the current layer (0-based).
subsample_min_frac : float, default=0.3
    Minimum fraction of samples used at layer 0.
rng : np.random.Generator or None, default=None
    Random number generator for reproducibility.

Returns:

np.ndarray of shape (size,)
    Row indices to use for training.

Source code in src/deepgboost/common/utils.py
def bootstrap_sampler(
    n_samples: int,
    n_layers: int,
    layer_idx: int,
    subsample_min_frac: float = 0.3,
    rng: np.random.Generator | None = None,
) -> np.ndarray:
    """
    Dynamic bootstrap sampler (paper sec. 3.1.3).

    Sample size grows linearly from ``subsample_min_frac * n_samples`` at
    layer 0 to ``n_samples`` at the last layer.  This avoids over-fitting in
    early boosting steps while allowing the final layers to see the whole
    dataset.

    Parameters
    ----------
    n_samples : int
        Total number of training samples.
    n_layers : int
        Total number of boosting layers.
    layer_idx : int
        Index of the current layer (0-based).
    subsample_min_frac : float
        Minimum fraction of samples used at layer 0.
    rng : np.random.Generator or None
        Random number generator for reproducibility.

    Returns
    -------
    np.ndarray of shape (size,)
        Row indices to use for training.
    """
    if rng is None:
        rng = np.random.default_rng()

    min_size = max(1, int(subsample_min_frac * n_samples))
    if n_layers <= 1:
        size = n_samples
    else:
        size = int(
            min_size + (n_samples - min_size) * layer_idx / (n_layers - 1),
        )
        size = min(size, n_samples)

    return rng.choice(n_samples, size=size, replace=True)
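The size schedule is the interesting part: it interpolates linearly between the minimum subsample and the full dataset. A minimal self-contained sketch of just that schedule (the helper name `layer_sample_size` is illustrative, not part of the library):

```python
import numpy as np


def layer_sample_size(n_samples, n_layers, layer_idx, subsample_min_frac=0.3):
    # Linear schedule: min_size at layer 0, growing to n_samples at the
    # last layer, mirroring the formula in bootstrap_sampler above.
    min_size = max(1, int(subsample_min_frac * n_samples))
    if n_layers <= 1:
        return n_samples
    size = int(min_size + (n_samples - min_size) * layer_idx / (n_layers - 1))
    return min(size, n_samples)


sizes = [layer_sample_size(1000, 5, i) for i in range(5)]
print(sizes)  # [300, 475, 650, 825, 1000]
```

With 1000 samples and 5 layers, layer 0 draws 300 rows (with replacement) and only the final layer sees a bootstrap the size of the whole dataset.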

deepgboost.common.utils.weight_solver(tree_pred, y_real, method='nnls', sample_weight=None)

Compute combination weights for the T bagged trees in a layer.

Parameters:

tree_pred : np.ndarray of shape (n_samples, n_trees), required
    Each column is one bagged tree's prediction on the full dataset.
y_real : np.ndarray of shape (n_samples,), required
    Pseudo-residuals target (used only when method="nnls").
method : {"nnls", "uniform"}, default="nnls"
    "nnls": Non-Negative Least Squares. Solves min_w ||y - tree_pred @ w|| subject to w >= 0, then normalises to sum(w) = 1, giving each tree an optimal, data-driven weight.
    "uniform": Equal weight 1/n_trees for every tree, exactly as in a standard RandomForest. Combined with n_layers=1 and learning_rate=1.0, this makes DeepGBoost mathematically equivalent to RandomForest.
sample_weight : np.ndarray of shape (n_samples,) or None, default=None
    Optional per-sample weights. When provided, both tree_pred and y_real are pre-multiplied by sqrt(sample_weight) before NNLS so the solver minimises the Hessian-weighted residual norm.

Returns:

np.ndarray of shape (n_trees,)
    Non-negative weights summing to 1.

Source code in src/deepgboost/common/utils.py
def weight_solver(
    tree_pred: np.ndarray,
    y_real: np.ndarray,
    method: str = "nnls",
    sample_weight: np.ndarray | None = None,
) -> np.ndarray:
    """
    Compute combination weights for the T bagged trees in a layer.

    Parameters
    ----------
    tree_pred : np.ndarray of shape (n_samples, n_trees)
        Each column is one bagged tree's prediction on the full dataset.
    y_real : np.ndarray of shape (n_samples,)
        Pseudo-residuals target (used only when ``method="nnls"``).
    method : {"nnls", "uniform"}
        ``"nnls"``    — Non-Negative Least Squares: solves
                        ``min_w ||y - tree_pred @ w||`` s.t. ``w >= 0``,
                        then normalises to ``sum(w) = 1``.  Gives each tree
                        an optimal, data-driven weight.
        ``"uniform"`` — Equal weight ``1/n_trees`` for every tree, exactly
                        as in a standard RandomForest.  Combined with
                        ``n_layers=1`` and ``learning_rate=1.0`` this makes
                        DeepGBoost mathematically equivalent to
                        RandomForest.
    sample_weight : np.ndarray of shape (n_samples,) or None
        Optional per-sample weights.  When provided, both ``tree_pred`` and
        ``y_real`` are pre-multiplied by ``sqrt(sample_weight)`` before NNLS
        so the solver minimises the Hessian-weighted residual norm.

    Returns
    -------
    np.ndarray of shape (n_trees,)
        Non-negative weights summing to 1.
    """
    n_outputs = tree_pred.shape[1]

    if method == "uniform":
        return np.full(n_outputs, 1.0 / n_outputs)

    if method == "nnls":
        if sample_weight is not None:
            sw = np.sqrt(np.clip(sample_weight, 1e-6, None))
            A = tree_pred * sw[:, np.newaxis]
            b = y_real * sw
        else:
            A, b = tree_pred, y_real
        weights, _ = nnls(A, b)
    else:
        raise ValueError(
            f"Unknown weight_solver method: '{method}'. "
            "Valid options are 'nnls' and 'uniform'.",
        )

    total = weights.sum()
    if total == 0.0:
        return np.full(n_outputs, 1.0 / n_outputs)
    return weights / total
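A quick sketch of what the NNLS branch does in practice, assuming `nnls` here is `scipy.optimize.nnls` (the import is not shown in this excerpt). We build three fake "trees": one accurate, one noisy, and one anti-correlated with the target, and check that the solver rewards the accurate one:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
y = rng.normal(size=200)

# Three mock tree predictions: accurate, noisy, and anti-correlated.
tree_pred = np.column_stack([
    y + 0.1 * rng.normal(size=200),
    y + 1.0 * rng.normal(size=200),
    -y,
])

# Solve min_w ||y - tree_pred @ w|| subject to w >= 0, then normalise.
w, _ = nnls(tree_pred, y)
w = w / w.sum()
print(w)  # most weight on the first (accurate) tree
```

The non-negativity constraint means a tree that hurts the fit (like the anti-correlated column) is simply zeroed out rather than given a negative weight, which keeps the layer's combined prediction a convex-like blend of its trees.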

deepgboost.common.utils.sigmoid(x)

Numerically stable sigmoid function.

Parameters:

x : np.ndarray, required
    Input array of any shape.

Returns:

np.ndarray
    Element-wise sigmoid values in (0, 1).

Source code in src/deepgboost/common/utils.py
def sigmoid(
    x: np.ndarray,
) -> np.ndarray:
    """
    Numerically stable sigmoid function.

    Parameters
    ----------
    x : np.ndarray
        Input array of any shape.

    Returns
    -------
    np.ndarray
        Element-wise sigmoid values in (0, 1).
    """
    return np.where(
        x >= 0,
        1.0 / (1.0 + np.exp(-x)),
        np.exp(x) / (1.0 + np.exp(x)),
    )

deepgboost.common.utils.softmax(x, axis=-1)

Row-wise softmax with numerical stability via max subtraction.

Parameters:

x : np.ndarray, required
    Input array of any shape.
axis : int, default=-1
    Axis along which softmax is computed.

Returns:

np.ndarray
    Array of the same shape as x with values in (0, 1) summing to 1 along axis.

Source code in src/deepgboost/common/utils.py
def softmax(
    x: np.ndarray,
    axis: int = -1,
) -> np.ndarray:
    """
    Row-wise softmax with numerical stability via max subtraction.

    Parameters
    ----------
    x : np.ndarray
        Input array of any shape.
    axis : int, default=-1
        Axis along which softmax is computed.

    Returns
    -------
    np.ndarray
        Array of the same shape as ``x`` with values in (0, 1) summing
        to 1 along ``axis``.
    """
    shifted = x - x.max(axis=axis, keepdims=True)
    exp_x = np.exp(shifted)
    return exp_x / exp_x.sum(axis=axis, keepdims=True)
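Subtracting the row maximum leaves the result unchanged (softmax is shift-invariant along the reduction axis) but keeps every exponent at or below zero, so large logits cannot overflow to `inf`. A minimal sketch reproducing the function above on logits that would break a naive implementation:

```python
import numpy as np


def softmax(x, axis=-1):
    # Shift by the max so every exponent is <= 0; the shift cancels in
    # the ratio, so the probabilities are identical to the naive formula.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp_x = np.exp(shifted)
    return exp_x / exp_x.sum(axis=axis, keepdims=True)


logits = np.array([[1000.0, 1001.0, 1002.0]])
p = softmax(logits)
print(p)  # finite probabilities summing to 1; naive exp(1000) would overflow
```

After the shift the row becomes `[-2, -1, 0]`, so the output equals `softmax([[0., 1., 2.]])` exactly.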