scnn.models

Non-convex and convex formulations of two-layer neural networks.

Overview:

This module provides implementations of non-convex and convex formulations of two-layer ReLU and Gated ReLU networks. The difference between ReLU and Gated ReLU networks is the activation function: Gated ReLU networks use fixed “gate” vectors when computing the activation pattern, while standard ReLU networks use the model parameters. Concretely, the prediction function of a two-layer ReLU network is

\[h(X) = \sum_{i=1}^p (X W_{1i}^{\top})_+ \cdot W_{2i}^{\top},\]

where \(W_{1} \in \mathbb{R}^{p \times d}\) are the parameters of the first layer, and \(W_{2} \in \mathbb{R}^{c \times p}\) are the parameters of the second layer. In contrast, Gated ReLU networks predict as

\[h(X) = \sum_{i=1}^p \text{diag}(X g_i > 0) X W_{1i}^{\top} \cdot W_{2i}^{\top},\]

where the \(g_i\) vectors are fixed (i.e., not learned) gates.

The convex reformulations of the ReLU and Gated ReLU models are obtained by enumerating the possible activation patterns \(D_i = \text{diag}(1(X g_i > 0))\). For a Gated ReLU model, the activation patterns are exactly specified by the set of gate vectors, while for ReLU models the space of activations is much larger. Using a (possibly subsampled) set of activations \(\mathcal{D}\), the prediction function for the convex reformulation of a two-layer ReLU network can be written as

\[g(X) = \sum_{i=1}^m D_i X (v_{i} - w_{i}),\]

where \(m = |\mathcal{D}|\) and \(v_{i}, w_{i} \in \mathbb{R}^{d}\) are the model parameters. For Gated ReLU models, the convex reformulation is

\[g(X) = \sum_{i=1}^m \text{diag}(X g_i > 0) X U_{i},\]

where the rows \(U_i\) of \(U \in \mathbb{R}^{m \times d}\) are the model parameters and the \(g_i\) are the gate vectors from the non-convex model. For both reformulations, a one-vs-all strategy is used when the output dimension satisfies \(c > 1\).
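As an illustration of these formulas (a minimal NumPy sketch, not library code; all names are hypothetical), the following evaluates the non-convex Gated ReLU prediction and its convex reformulation for a scalar output. Since the gates exactly determine the activation patterns here, \(m = p\).

import numpy as np

n, d, p = 10, 5, 4  # examples, input dimension, neurons/gates
rng = np.random.default_rng(0)

X = rng.standard_normal((n, d))
G = rng.standard_normal((d, p))   # columns are the fixed gate vectors g_i
W1 = rng.standard_normal((p, d))  # first-layer weights of the non-convex model
W2 = rng.standard_normal((1, p))  # second-layer weights (c = 1)

# Fixed activation patterns: column i is 1(X g_i > 0).
D = (X @ G > 0).astype(X.dtype)

# Non-convex Gated ReLU: h(X) = sum_i diag(X g_i > 0) X W_{1i}^T W_{2i}^T.
h = (D * (X @ W1.T)) @ W2.T  # (n x 1) predictions

# Convex reformulation: g(X) = sum_i diag(X g_i > 0) X U_i, rows U_i of U.
U = rng.standard_normal((p, d))
g = ((X @ U.T) * D).sum(axis=1)  # (n,) predictions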

class scnn.models.ConvexGatedReLU(G: ndarray, c: int = 1, bias: bool = False, G_bias: ndarray | None = None)

Convex reformulation of a two-layer Gated ReLU network.

This model has the prediction function

\[g(X) = \sum_{i=1}^m \text{diag}(X g_i > 0) X U_{i}.\]

A one-vs-all strategy is used to extend the model to multi-dimensional targets.

c

the output dimension.

Type:

int

d

the input dimension.

Type:

int

p

the number of neurons.

Type:

int

bias

whether or not the model uses a bias term.

Type:

bool

G

the gate vectors for the Gated ReLU activation stored as a (d x p) matrix.

G_bias

an optional vector of biases for the gates.

parameters

the parameters of the model stored as a list of tensors.

Type:

List[numpy.ndarray]

get_parameters() List[ndarray]

Get the model parameters.

set_parameters(parameters: List[ndarray])

Set the model parameters.

This method checks that the new parameters have the correct dimensions before setting them.

Parameters:

parameters – the new model parameters.
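Example usage (a minimal sketch using only the constructor and accessors documented here; the random gate matrix is for illustration only):

import numpy as np
from scnn.models import ConvexGatedReLU

d, p = 5, 4
G = np.random.randn(d, p)        # gate vectors, one per column
model = ConvexGatedReLU(G, c=1)

params = model.get_parameters()  # list of parameter arrays
model.set_parameters(params)     # dimensions are checked before assignment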

class scnn.models.ConvexReLU(G: ndarray, c: int = 1, bias: bool = False, G_bias: ndarray | None = None)

Convex reformulation of a two-layer ReLU network.

This model has the prediction function

\[g(X) = \sum_{i=1}^m D_i X (v_{i} - w_{i}).\]

A one-vs-all strategy is used to extend the model to multi-dimensional targets.

c

the output dimension.

Type:

int

d

the input dimension.

Type:

int

p

the number of neurons.

Type:

int

bias

whether or not the model uses a bias term.

Type:

bool

G

the gate vectors used to generate the activation patterns \(D_i\), stored as a (d x p) matrix.

G_bias

an optional vector of biases for the gates.

parameters

the parameters of the model stored as a list of two (c x p x d) matrices.

Type:

List[numpy.ndarray]

get_parameters() List[ndarray]

Get the model parameters.

set_parameters(parameters: List[ndarray])

Set the model parameters.

This method checks that the new parameters have the correct dimensions before setting them.

Parameters:

parameters – the new model parameters.
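A similar usage sketch for the ReLU reformulation. The unpacking into two (c x p x d) arrays follows the parameters attribute documented above, though the ordering of \(v\) and \(w\) in the list is an assumption:

import numpy as np
from scnn.models import ConvexReLU

d, p, c = 5, 4, 3
G = np.random.randn(d, p)  # gates generating the activation patterns D_i
model = ConvexReLU(G, c=c)

# Assumed ordering: the list holds [v, w], each of shape (c x p x d).
v, w = model.get_parameters()
model.set_parameters([v, w])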

class scnn.models.GatedModel(G: ndarray, c: int, bias: bool = False, G_bias: ndarray | None = None)

Abstract class for models with fixed gate vectors.

c

the output dimension.

Type:

int

d

the input dimension.

Type:

int

p

the number of neurons.

Type:

int

bias

whether or not the model uses a bias term.

Type:

bool

G

the gate vectors for the Gated ReLU activation stored as a (d x p) matrix.

G_bias

an optional vector of biases for the gates.

compute_activations(X: ndarray) ndarray

Compute activations for models with fixed gate vectors.

Parameters:

X – (n x d) matrix of input examples.

Returns:

(n x p) matrix of activation patterns.

Return type:

ndarray
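For reference, the thresholding this method performs can be sketched in NumPy as follows (an illustration of the formula \(D = 1(X G > 0)\), not the library's internal implementation):

import numpy as np

def activations_sketch(X, G, G_bias=None):
    # D_i = 1(X g_i + b_i > 0), one column per gate vector.
    Z = X @ G                       # (n x p) gate pre-activations
    if G_bias is not None:
        Z = Z + G_bias              # broadcast the optional gate biases
    return (Z > 0).astype(X.dtype)  # (n x p) activation patterns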

class scnn.models.LinearModel(d: int, c: int, bias: bool = False)

Basic linear model.

This model has the prediction function \(g(X) = X W^\top\), where \(W \in \mathbb{R}^{c \times d}\) is a matrix of weights.

c

the output dimension.

Type:

int

d

the input dimension.

Type:

int

p

the number of neurons. This is always 1 for a linear model.

Type:

int

bias

whether or not the model uses a bias term.

Type:

bool

parameters

a list of NumPy arrays comprising the model parameters.

Type:

List[numpy.ndarray]

get_parameters() List[ndarray]

Get the model parameters.

set_parameters(parameters: List[ndarray])

Set the model parameters.

This method checks that the new parameters have the correct dimensions before setting them.

Parameters:

parameters – the new model parameters.
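Example usage (a sketch; that the parameter list holds a single (c x d) weight matrix when bias=False is an assumption made for illustration, not stated above):

import numpy as np
from scnn.models import LinearModel

n, d, c = 10, 5, 2
model = LinearModel(d, c)

# Assumption: with bias=False, the parameter list is [W] with W of shape (c x d).
(W,) = model.get_parameters()

X = np.random.randn(n, d)
y_hat = X @ W.T  # the prediction function g(X) = X W^T, shape (n x c)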

class scnn.models.Model

Base class for convex and non-convex models.

c

the output dimension.

Type:

int

d

the input dimension.

Type:

int

p

the number of neurons.

Type:

int

bias

whether or not the model uses a bias term.

Type:

bool

parameters

a list of NumPy arrays comprising the model parameters.

Type:

List[numpy.ndarray]

class scnn.models.NonConvexGatedReLU(G: ndarray, c: int = 1, bias: bool = False, G_bias: ndarray | None = None)

Non-convex formulation of a two-layer Gated ReLU network.

This model has the prediction function

\[h(X) = \sum_{i=1}^p \text{diag}(X g_i > 0) X W_{1i}^{\top} \cdot W_{2i}^{\top}.\]

c

the output dimension.

Type:

int

d

the input dimension.

Type:

int

p

the number of neurons.

Type:

int

bias

whether or not the model uses a bias term.

Type:

bool

G

the gate vectors for the Gated ReLU activation stored as a (d x p) matrix.

G_bias

an optional vector of biases for the gates.

parameters

the parameters of the model stored as a list of tensors.

Type:

List[numpy.ndarray]

get_parameters() List[ndarray]

Get the model parameters.

Returns: A list of model parameters.

set_parameters(parameters: List[ndarray])

Set the model parameters.

This method checks that the new parameters have the correct dimensions before setting them.

Parameters:

parameters – the new model parameters.
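A sketch that reproduces the prediction function from the model's parameters. The assumption that get_parameters returns [W1, W2] with shapes (p x d) and (c x p) mirrors NonConvexReLU below and is not guaranteed by this documentation:

import numpy as np
from scnn.models import NonConvexGatedReLU

n, d, p = 10, 5, 4
G = np.random.randn(d, p)
model = NonConvexGatedReLU(G, c=1)

# Assumed parameter layout: [W1, W2] with shapes (p x d) and (c x p).
W1, W2 = model.get_parameters()

X = np.random.randn(n, d)
D = (X @ G > 0).astype(X.dtype)  # fixed gate activations
h = (D * (X @ W1.T)) @ W2.T      # h(X) as in the formula above, shape (n x 1)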

class scnn.models.NonConvexReLU(d: int, p: int, c: int = 1, bias: bool = False)

Non-convex formulation of a two-layer ReLU network.

This model has the prediction function

\[h(X) = \sum_{i=1}^p (X W_{1i}^{\top})_+ \cdot W_{2i}^{\top}.\]

c

the output dimension.

Type:

int

d

the input dimension.

Type:

int

p

the number of neurons.

Type:

int

bias

whether or not the model uses a bias term.

Type:

bool

parameters

the parameters of the model stored as a list of matrices with shapes: [(p x d), (c x p)]

Type:

List[numpy.ndarray]

get_parameters() List[ndarray]

Get the model parameters.

Returns: A list of model parameters.

set_parameters(parameters: List[ndarray])

Set the model parameters.

This method checks that the new parameters have the correct dimensions before setting them.

Parameters:

parameters – the new model parameters.
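A final sketch evaluating the ReLU prediction function from the documented (p x d) and (c x p) parameter shapes; the ordering [W1, W2] in the returned list is an assumption:

import numpy as np
from scnn.models import NonConvexReLU

n, d, p, c = 10, 5, 4, 2
model = NonConvexReLU(d, p, c=c)

# Assumed ordering of the documented shapes: [W1 (p x d), W2 (c x p)].
W1, W2 = model.get_parameters()

X = np.random.randn(n, d)
h = np.maximum(X @ W1.T, 0) @ W2.T  # h(X) = sum_i (X W1_i^T)_+ W2_i^T, shape (n x c)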