scnn.models

Non-convex and convex formulations of two-layer neural networks.

Overview:

This module provides implementations of non-convex and convex formulations of two-layer ReLU and Gated ReLU networks. The difference between ReLU and Gated ReLU networks is the activation function: Gated ReLU networks use fixed “gate” vectors when computing the activation pattern, while standard ReLU networks use the model parameters. Concretely, the prediction function of a two-layer ReLU network is

\[h(X) = \sum_{i=1}^p (X W_{1i}^{\top})_+ \cdot W_{2i}^{\top},\]

where \(W_{1} \in \mathbb{R}^{p \times d}\) are the parameters of the first layer, and \(W_{2} \in \mathbb{R}^{c \times p}\) are the parameters of the second layer. In contrast, Gated ReLU networks predict as

\[h(X) = \sum_{i=1}^p \text{diag}(X g_i > 0) X W_{1i}^{\top} \cdot W_{2i}^{\top},\]

where the \(g_i\) vectors are fixed (i.e., not learned) gates.

The convex reformulations of the ReLU and Gated ReLU models are obtained by enumerating the possible activation patterns \(D_i = \text{diag}(1(X g_i > 0))\). For a Gated ReLU model, the activation patterns are exactly specified by the set of gate vectors, while for ReLU models the space of activations is much larger. Using a (possibly subsampled) set of activations \(\mathcal{D}\), the prediction function for the convex reformulation of a two-layer ReLU network can be written as

\[g(X) = \sum_{i=1}^m D_i X (v_{i} - w_{i}),\]

where \(m = |\mathcal{D}|\) and \(v_{i}, w_{i} \in \mathbb{R}^{d}\) are the model parameters. For Gated ReLU models, the convex reformulation is

\[g(X) = \sum_{i=1}^m \text{diag}(X g_i > 0) X U_{i},\]

where the rows \(U_i\) of \(U \in \mathbb{R}^{m \times d}\) are the model parameters and the \(g_i\) are the gate vectors from the non-convex model. For both reformulations, a one-vs-all strategy is used when the output dimension satisfies \(c > 1\).
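As an illustration of these formulas (a minimal NumPy sketch, not library code; all names are hypothetical), the following evaluates the non-convex Gated ReLU prediction and its convex reformulation for a scalar output. Since the gates exactly determine the activation patterns here, \(m = p\).

import numpy as np

n, d, p = 10, 5, 4  # examples, input dimension, neurons/gates
rng = np.random.default_rng(0)

X = rng.standard_normal((n, d))
G = rng.standard_normal((d, p))   # columns are the fixed gate vectors g_i
W1 = rng.standard_normal((p, d))  # first-layer weights of the non-convex model
W2 = rng.standard_normal((1, p))  # second-layer weights (c = 1)

# Fixed activation patterns: column i is 1(X g_i > 0).
D = (X @ G > 0).astype(X.dtype)

# Non-convex Gated ReLU: h(X) = sum_i diag(X g_i > 0) X W_{1i}^T W_{2i}^T.
h = (D * (X @ W1.T)) @ W2.T  # (n x 1) predictions

# Convex reformulation: g(X) = sum_i diag(X g_i > 0) X U_i, rows U_i of U.
U = rng.standard_normal((p, d))
g = ((X @ U.T) * D).sum(axis=1)  # (n,) predictions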

class scnn.models.ConvexGatedReLU(G: ndarray, c: int = 1, bias: bool = False, G_bias: ndarray | None = None)

Convex reformulation of a two-layer Gated ReLU network.

This model has the prediction function

\[g(X) = \sum_{i=1}^m \text{diag}(X g_i > 0) X U_{i}.\]

A one-vs-all strategy is used to extend the model to multi-dimensional targets.

c

the output dimension.

Type:

int

d

the input dimension.

Type:

int

p

the number of neurons.

Type:

int

bias

whether or not the model uses a bias term.

Type:

bool

G

the gate vectors for the Gated ReLU activation stored as a (d x p) matrix.

G_bias

an optional vector of biases for the gates.

parameters

the parameters of the model stored as a list of tensors.

Type:

List[numpy.ndarray]

get_parameters() List[ndarray]

Get the model parameters.

set_parameters(parameters: List[ndarray])

Set the model parameters.

This method checks that the new parameters have the correct dimensions before setting them.

Parameters:

parameters – the new model parameters.
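Example usage (a minimal sketch using only the constructor and accessors documented here; the random gate matrix is for illustration only):

import numpy as np
from scnn.models import ConvexGatedReLU

d, p = 5, 4
G = np.random.randn(d, p)        # gate vectors, one per column
model = ConvexGatedReLU(G, c=1)

params = model.get_parameters()  # list of parameter arrays
model.set_parameters(params)     # dimensions are checked before assignment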

class scnn.models.ConvexReLU(G: ndarray, c: int = 1, bias: bool = False, G_bias: ndarray | None = None)

Convex reformulation of a two-layer ReLU network.

This model has the prediction function

\[g(X) = \sum_{i=1}^m D_i X (v_{i} - w_{i}).\]

A one-vs-all strategy is used to extend the model to multi-dimensional targets.

c

the output dimension.

Type:

int

d

the input dimension.

Type:

int

p

the number of neurons.

Type:

int

bias

whether or not the model uses a bias term.

Type:

bool

G

the gate vectors used to generate the activation patterns \(D_i\), stored as a (d x p) matrix.

G_bias

an optional vector of biases for the gates.

parameters

the parameters of the model stored as a list of two (c x p x d) matrices.

Type:

List[numpy.ndarray]

get_parameters() List[ndarray]

Get the model parameters.

set_parameters(parameters: List[ndarray])

Set the model parameters.

This method checks that the new parameters have the correct dimensions before setting them.

Parameters:

parameters – the new model parameters.
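A similar usage sketch for the ReLU reformulation. The unpacking into two (c x p x d) arrays follows the parameters attribute documented above, though the ordering of \(v\) and \(w\) in the list is an assumption:

import numpy as np
from scnn.models import ConvexReLU

d, p, c = 5, 4, 3
G = np.random.randn(d, p)  # gates generating the activation patterns D_i
model = ConvexReLU(G, c=c)

# Assumed ordering: the list holds [v, w], each of shape (c x p x d).
v, w = model.get_parameters()
model.set_parameters([v, w])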

class scnn.models.GatedModel(G: ndarray, c: int, bias: bool = False, G_bias: ndarray | None = None)

Abstract class for models with fixed gate vectors.

c

the output dimension.

Type:

int

d

the input dimension.

Type:

int

p

the number of neurons.

Type:

int

bias

whether or not the model uses a bias term.

Type:

bool

G

the gate vectors for the Gated ReLU activation stored as a (d x p) matrix.

G_bias

an optional vector of biases for the gates.

compute_activations(X: ndarray) ndarray

Compute activations for models with fixed gate vectors.

Parameters:

X – (n x d) matrix of input examples.

Returns:

(n x p) matrix of activation patterns.

Return type:

ndarray
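For reference, the thresholding this method performs can be sketched in NumPy as follows (an illustration of the formula \(D = 1(X G > 0)\), not the library's internal implementation):

import numpy as np

def activations_sketch(X, G, G_bias=None):
    # D_i = 1(X g_i + b_i > 0), one column per gate vector.
    Z = X @ G                       # (n x p) gate pre-activations
    if G_bias is not None:
        Z = Z + G_bias              # broadcast the optional gate biases
    return (Z > 0).astype(X.dtype)  # (n x p) activation patterns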

class scnn.models.LinearModel(d: int, c: int, bias: bool = False)

Basic linear model.

This model has the prediction function \(g(X) = X W^\top\), where \(W \in \mathbb{R}^{c \times d}\) is a matrix of weights.

c

the output dimension.

Type:

int

d

the input dimension.

Type:

int

p

the number of neurons. This is always 1 for a linear model.

Type:

int

bias

whether or not the model uses a bias term.

Type:

bool

parameters

a list of NumPy arrays comprising the model parameters.

Type:

List[numpy.ndarray]

get_parameters() List[ndarray]

Get the model parameters.

set_parameters(parameters: List[ndarray])

Set the model parameters.

This method checks that the new parameters have the correct dimensions before setting them.

Parameters:

parameters – the new model parameters.
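Example usage (a sketch; that the parameter list holds a single (c x d) weight matrix when bias=False is an assumption made for illustration, not stated above):

import numpy as np
from scnn.models import LinearModel

n, d, c = 10, 5, 2
model = LinearModel(d, c)

# Assumption: with bias=False, the parameter list is [W] with W of shape (c x d).
(W,) = model.get_parameters()

X = np.random.randn(n, d)
y_hat = X @ W.T  # the prediction function g(X) = X W^T, shape (n x c)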

class scnn.models.Model

Base class for convex and non-convex models.

c

the output dimension.

Type:

int

d

the input dimension.

Type:

int

p

the number of neurons.

Type:

int

bias

whether or not the model uses a bias term.

Type:

bool

parameters

a list of NumPy arrays comprising the model parameters.

Type:

List[numpy.ndarray]

class scnn.models.NonConvexGatedReLU(G: ndarray, c: int = 1, bias: bool = False, G_bias: ndarray | None = None)

Non-convex formulation of a two-layer Gated ReLU network.

This model has the prediction function

\[h(X) = \sum_{i=1}^p \text{diag}(X g_i > 0) X W_{1i}^{\top} \cdot W_{2i}^{\top}.\]

c

the output dimension.

Type:

int

d

the input dimension.

Type:

int

p

the number of neurons.

Type:

int

bias

whether or not the model uses a bias term.

Type:

bool

G

the gate vectors for the Gated ReLU activation stored as a (d x p) matrix.

G_bias

an optional vector of biases for the gates.

parameters

the parameters of the model stored as a list of tensors.

Type:

List[numpy.ndarray]

get_parameters() List[ndarray]

Get the model parameters.

Returns: A list of model parameters.

set_parameters(parameters: List[ndarray])

Set the model parameters.

This method checks that the new parameters have the correct dimensions before setting them.

Parameters:

parameters – the new model parameters.
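A sketch that reproduces the prediction function from the model's parameters. The assumption that get_parameters returns [W1, W2] with shapes (p x d) and (c x p) mirrors NonConvexReLU below and is not guaranteed by this documentation:

import numpy as np
from scnn.models import NonConvexGatedReLU

n, d, p = 10, 5, 4
G = np.random.randn(d, p)
model = NonConvexGatedReLU(G, c=1)

# Assumed parameter layout: [W1, W2] with shapes (p x d) and (c x p).
W1, W2 = model.get_parameters()

X = np.random.randn(n, d)
D = (X @ G > 0).astype(X.dtype)  # fixed gate activations
h = (D * (X @ W1.T)) @ W2.T      # h(X) as in the formula above, shape (n x 1)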

class scnn.models.NonConvexReLU(d: int, p: int, c: int = 1, bias: bool = False)

Non-convex formulation of a two-layer ReLU network.

This model has the prediction function

\[h(X) = \sum_{i=1}^p (X W_{1i}^{\top})_+ \cdot W_{2i}^{\top}.\]

c

the output dimension.

Type:

int

d

the input dimension.

Type:

int

p

the number of neurons.

Type:

int

bias

whether or not the model uses a bias term.

Type:

bool

parameters

the parameters of the model stored as a list of matrices with shapes: [(p x d), (c x p)]

Type:

List[numpy.ndarray]

get_parameters() List[ndarray]

Get the model parameters.

Returns: A list of model parameters.

set_parameters(parameters: List[ndarray])

Set the model parameters.

This method checks that the new parameters have the correct dimensions before setting them.

Parameters:

parameters – the new model parameters.
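A final sketch evaluating the ReLU prediction function from the documented (p x d) and (c x p) parameter shapes; the ordering [W1, W2] in the returned list is an assumption:

import numpy as np
from scnn.models import NonConvexReLU

n, d, p, c = 10, 5, 4, 2
model = NonConvexReLU(d, p, c=c)

# Assumed ordering of the documented shapes: [W1 (p x d), W2 (c x p)].
W1, W2 = model.get_parameters()

X = np.random.randn(n, d)
h = np.maximum(X @ W1.T, 0) @ W2.T  # h(X) = sum_i (X W1_i^T)_+ W2_i^T, shape (n x c)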