scnn.models
Non-convex and convex formulations of two-layer neural networks.
- Overview:
This module provides implementations of non-convex and convex formulations for two-layer ReLU and Gated ReLU networks. The difference between ReLU and Gated ReLU networks is the activation function: Gated ReLU networks use fixed "gate" vectors when computing the activation pattern, while standard ReLU networks use the model parameters. Concretely, the prediction function for a two-layer ReLU network is
\[h(X) = \sum_{i=1}^p (X W_{1i}^{\top})_+ \cdot W_{2i}^{\top},\]where \(W_{1} \in \mathbb{R}^{p \times d}\) are the parameters of the first layer, and \(W_{2} \in \mathbb{R}^{c \times p}\) are the parameters of the second layer. In contrast, Gated ReLU networks predict as
\[h(X) = \sum_{i=1}^p \text{diag}(X g_i > 0) X W_{1i}^{\top} \cdot W_{2i}^{\top},\]where the \(g_i\) vectors are fixed (i.e., not learned) gates.
The convex reformulations of the ReLU and Gated ReLU models are obtained by enumerating the possible activation patterns \(D_i = \text{diag}(1(X g_i > 0))\). For a Gated ReLU model, the activation patterns are exactly specified by the set of gate vectors, while for ReLU models the space of activation patterns is much larger. Using a (possibly subsampled) set of activation patterns \(\mathcal{D}\), the prediction function for the convex reformulation of a two-layer ReLU network can be written as
\[g(X) = \sum_{i=1}^m D_i X (v_{i} - w_{i}),\]where \(\mathcal{D} = \{D_1, \dots, D_m\}\) and \(v_{i}, w_{i} \in \mathbb{R}^{d}\) are the model parameters. For Gated ReLU models, the convex reformulation is
\[g(X) = \sum_{i=1}^m \text{diag}(X g_i > 0) X U_{i},\]where the rows \(U_i \in \mathbb{R}^{d}\) of \(U \in \mathbb{R}^{m \times d}\) are the model parameters and the \(g_i\) are the gate vectors from the non-convex model. For both convex reformulations, a one-vs-all strategy is used when the output dimension satisfies \(c > 1\).
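As a concrete check of the equivalence for scalar outputs (\(c = 1\)), here is a minimal NumPy sketch, not part of the library, showing that setting \(U_i = W_{2i} W_{1i}\) makes the convex Gated ReLU prediction match the non-convex one:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 5, 3, 4

X = rng.normal(size=(n, d))
G = rng.normal(size=(d, p))         # fixed gate vectors, one per column
W1 = rng.normal(size=(p, d))        # first-layer weights
W2 = rng.normal(size=(1, p))        # second-layer weights (c = 1)

D = X @ G > 0                       # (n x p) activation patterns

# Non-convex Gated ReLU: h(X) = sum_i diag(X g_i > 0) X W_1i^T * W_2i
h = sum(D[:, i] * (X @ W1[i]) * W2[0, i] for i in range(p))

# Convex reformulation with U_i = W_2i * W_1i gives identical predictions.
U = W2[0][:, None] * W1             # (p x d); row i is W_2i * W_1i
g = sum(D[:, i] * (X @ U[i]) for i in range(p))

assert np.allclose(h, g)
```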
- class scnn.models.ConvexGatedReLU(G: ndarray, c: int = 1, bias: bool = False, G_bias: ndarray | None = None)
Convex reformulation of a two-layer Gated ReLU network.
This model has the prediction function
\[g(X) = \sum_{i=1}^m \text{diag}(X g_i > 0) X U_{i}.\]A one-vs-all strategy is used to extend the model to multi-dimensional targets. A usage sketch follows the method listing below.
- c
the output dimension.
- Type:
int
- d
the input dimension.
- Type:
int
- p
the number of neurons.
- Type:
int
- bias
whether or not the model uses a bias term.
- Type:
bool
- G
the gate vectors for the Gated ReLU activation stored as a (d x p) matrix.
- G_bias
an optional vector of biases for the gates.
- parameters
the parameters of the model stored as a list of tensors.
- Type:
List[numpy.ndarray]
- get_parameters() List[ndarray]
Get the model parameters.
- set_parameters(parameters: List[ndarray])
Set the model parameters.
This method checks that the new parameters have the correct dimensions.
- Parameters:
parameters – the new model parameters.
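A short usage sketch based only on the constructor and methods documented above; the shapes are illustrative:

```python
import numpy as np
from scnn.models import ConvexGatedReLU

rng = np.random.default_rng(0)
d, p = 10, 50
G = rng.normal(size=(d, p))       # gate vectors, stored as a (d x p) matrix

model = ConvexGatedReLU(G, c=1)
params = model.get_parameters()   # list of parameter arrays
model.set_parameters(params)      # round-trip; dimensions are checked
```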
- class scnn.models.ConvexReLU(G: ndarray, c: int = 1, bias: bool = False, G_bias: ndarray | None = None)
Convex reformulation of a two-layer ReLU network.
This model has the prediction function
\[g(X) = \sum_{i=1}^m D_i X (v_{i} - w_{i}).\]A one-vs-all strategy is used to extend the model to multi-dimensional targets.
- c
the output dimension.
- Type:
int
- d
the input dimension.
- Type:
int
- p
the number of neurons.
- Type:
int
- bias
whether or not the model uses a bias term.
- Type:
bool
- G
the gate vectors used to generate the activation patterns \(D_i\), stored as a (d x p) matrix.
- G_bias
an optional vector of biases for the gates.
- parameters
the parameters of the model, stored as a list of two (c x p x d) arrays (see the sketch below).
- Type:
List[numpy.ndarray]
- get_parameters() List[ndarray]
Get the model parameters.
- set_parameters(parameters: List[ndarray])
Set the model parameters.
This method checks that the new parameters have the correct dimensions.
- Parameters:
parameters – the new model parameters.
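Since the parameters are documented above as two (c x p x d) arrays, their shapes can be inspected directly. The sketch below assumes `get_parameters` returns them in the order \((v, w)\), which is an assumption rather than documented behavior:

```python
import numpy as np
from scnn.models import ConvexReLU

rng = np.random.default_rng(0)
d, p, c = 10, 50, 3
G = rng.normal(size=(d, p))       # gates generating the patterns D_i

model = ConvexReLU(G, c=c)
v, w = model.get_parameters()     # assumed order (v, w); not documented here
assert v.shape == w.shape == (c, p, d)
```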
- class scnn.models.GatedModel(G: ndarray, c: int, bias: bool = False, G_bias: ndarray | None = None)
Abstract class for models with fixed gate vectors.
- c
the output dimension.
- Type:
int
- d
the input dimension.
- Type:
int
- p
the number of neurons.
- Type:
int
- bias
whether or not the model uses a bias term.
- Type:
bool
- G
the gate vectors for the Gated ReLU activation stored as a (d x p) matrix.
- G_bias
an optional vector of biases for the gates.
- compute_activations(X: ndarray) ndarray
Compute activations for models with fixed gate vectors.
- Parameters:
X – (n x d) matrix of input examples.
- Returns:
(n x p) matrix of activation patterns.
- Return type:
ndarray
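The documented return value corresponds to the pattern matrix \(D = 1(XG > 0)\), with gate biases added when present. A standalone NumPy sketch of that computation, not the library's implementation:

```python
import numpy as np

def compute_activations_sketch(X, G, G_bias=None):
    """Sketch of D = 1(X G + b > 0): an (n x p) pattern matrix."""
    Z = X @ G                     # (n x p) gate pre-activations
    if G_bias is not None:
        Z = Z + G_bias            # optional gate biases
    return Z > 0
```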
- class scnn.models.LinearModel(d: int, c: int, bias: bool = False)
Basic linear model.
This model has the prediction function \(g(X) = X W^\top\), where \(W \in \mathbb{R}^{c \times d}\) is a matrix of weights.
- c
the output dimension.
- Type:
int
- d
the input dimension.
- Type:
int
- p
the number of neurons. This is always 1 for a linear model.
- Type:
int
- bias
whether or not the model uses a bias term.
- Type:
bool
- parameters
a list of NumPy arrays comprising the model parameters.
- Type:
List[numpy.ndarray]
- get_parameters() List[ndarray]
Get the model parameters.
- set_parameters(parameters: List[ndarray])
Set the model parameters.
This method checks that the new parameters have the correct dimensions.
- Parameters:
parameters – the new model parameters.
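For reference, the prediction function \(g(X) = X W^{\top}\) in plain NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, c = 5, 3, 2
X = rng.normal(size=(n, d))       # input examples
W = rng.normal(size=(c, d))       # weight matrix
preds = X @ W.T                   # g(X) = X W^T, an (n x c) matrix
```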
- class scnn.models.Model
Base class for convex and non-convex models.
- c
the output dimension.
- Type:
int
- d
the input dimension.
- Type:
int
- p
the number of neurons.
- Type:
int
- bias
whether or not the model uses a bias term.
- Type:
bool
- parameters
a list of NumPy arrays comprising the model parameters.
- Type:
List[numpy.ndarray]
- class scnn.models.NonConvexGatedReLU(G: ndarray, c: int = 1, bias: bool = False, G_bias: ndarray | None = None)
Non-convex formulation of a two-layer Gated ReLU network.
This model has the prediction function
\[h(X) = \sum_{i=1}^p \text{diag}(X g_i > 0) X W_{1i}^{\top} \cdot W_{2i}^{\top}.\]
- c
the output dimension.
- Type:
int
- d
the input dimension.
- Type:
int
- p
the number of neurons.
- Type:
int
- bias
whether or not the model uses a bias term.
- Type:
bool
- G
the gate vectors for the Gated ReLU activation stored as a (d x p) matrix.
- G_bias
an optional vector of biases for the gates.
- parameters
the parameters of the model stored as a list of tensors.
- Type:
List[numpy.ndarray]
- get_parameters() List[ndarray]
Get the model parameters.
Returns: A list of model parameters.
- set_parameters(parameters: List[ndarray])
Set the model parameters.
This method checks that the new parameters have the correct dimensions.
- Parameters:
parameters – the new model parameters.
- class scnn.models.NonConvexReLU(d: int, p: int, c: int = 1, bias: bool = False)
Non-convex formulation of a two-layer ReLU network.
This model has the prediction function
\[h(X) = \sum_{i=1}^p (X W_{1i}^{\top})_+ \cdot W_{2i}^{\top},\]where \((z)_+ = \max(z, 0)\) denotes the ReLU activation.
- c
the output dimension.
- Type:
int
- d
the input dimension.
- Type:
int
- p
the number of neurons.
- Type:
int
- bias
whether or not the model uses a bias term.
- Type:
bool
- parameters
the parameters of the model stored as a list of matrices with shapes: [(p x d), (c x p)]
- Type:
List[numpy.ndarray]
- get_parameters() List[ndarray]
Get the model parameters.
Returns: A list of model parameters.
- set_parameters(parameters: List[ndarray])
Set the model parameters.
This method checks that the new parameters have the correct dimensions.
- Parameters:
parameters – the new model parameters.
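Given the documented parameter shapes [(p x d), (c x p)], the prediction function can be vectorized over all neurons. A NumPy sketch, not the library's code:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p, c = 5, 3, 4, 2
X = rng.normal(size=(n, d))
W1 = rng.normal(size=(p, d))        # first-layer weights, shape (p x d)
W2 = rng.normal(size=(c, p))        # second-layer weights, shape (c x p)

# h(X) = sum_i (X W_1i^T)_+ W_2i^T, vectorized over all neurons:
h = np.maximum(X @ W1.T, 0) @ W2.T  # (n x c) predictions
```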