scnn.solvers

Optimization methods for training neural networks by convex reformulation.

Notes

We only support the squared loss at the moment.

class scnn.solvers.AL(model: Model, max_primal_iters: int = 10000, max_dual_iters: int = 10000, tol: float = 1e-06, constraint_tol: float = 1e-06, delta: float = 1000)

Augmented Lagrangian (AL) method for ReLU models.

This optimizer solves the convex ReLU training problem by forming the “augmented Lagrangian”,

\[\mathcal{L}(v, w, \gamma, \xi) = F(v,w) + \delta \sum_{D_i \in \mathcal{D}} \left[\left\|\left(\frac{\gamma_i}{\delta} - (2D_i - I)X v_i\right)_+\right\|_2^2 + \left\|\left(\frac{\xi_i}{\delta} - (2D_i - I)X w_i\right)_+\right\|_2^2 \right],\]

where \(\delta > 0\) is the penalty strength, \((\gamma, \xi)\) are the dual parameters, and

\[F(v,w) = L\left(\sum_{D_i \in \mathcal{D}} D_i X (v_{i} - w_{i}), y\right) + \lambda R(v, w),\]

is the regularized training loss.

The AL method alternates between the “primal” problem of minimizing \(\mathcal{L}(v, w, \gamma, \xi)\) with respect to \(v, w\) and updating the dual parameters \(\gamma, \xi\). This procedure returns a primal-dual optimal pair \((v,w), (\gamma, \xi)\) in the limit of dual iterations.

The solver will terminate early if an approximate KKT point is computed.

model

the model to be optimized.

max_primal_iters

the maximum number of iterations to run the primal optimization method before exiting.

max_dual_iters

the maximum number of dual updates that will be performed before exiting.

tol

the maximum squared \(\ell_2\)-norm of the gradient of the Lagrangian function for the KKT conditions to be considered approximately satisfied.

constraint_tol

the maximum violation of the constraints permitted for the KKT conditions to be considered approximately satisfied.

delta

the initial penalty strength.
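
A minimal usage sketch is given below. The AL constructor follows the signature documented above; the surrounding pieces (ConvexReLU, sample_gate_vectors, Metrics, NeuronGL1, and the optimize_model driver) are assumptions about the rest of the scnn API and may need to be adjusted for your installed version.

    import numpy as np

    from scnn.solvers import AL

    # Assumed helpers from the rest of the scnn API; adjust to your version.
    from scnn.activations import sample_gate_vectors  # assumption
    from scnn.metrics import Metrics                   # assumption
    from scnn.models import ConvexReLU                 # assumption
    from scnn.optimize import optimize_model           # assumption
    from scnn.regularizers import NeuronGL1            # assumption

    # Synthetic regression data; only the squared loss is supported.
    rng = np.random.default_rng(0)
    n, d = 200, 10
    X = rng.standard_normal((n, d))
    y = X @ rng.standard_normal((d, 1))

    # Convex reformulation of a ReLU network built from sampled gate vectors
    # (assumed helper; assumed signature: seed, input dimension, number of gates).
    G = sample_gate_vectors(123, d, 100)
    model = ConvexReLU(G)

    # The AL solver uses the constructor documented above.
    solver = AL(model, max_primal_iters=5000, max_dual_iters=5000, tol=1e-6, delta=100.0)

    # Assumed training driver: fits the model and records metrics.
    model, metrics = optimize_model(
        model, solver, Metrics(), X, y, X, y, regularizer=NeuronGL1(0.01)
    )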

class scnn.solvers.ApproximateConeDecomposition(model: Model, max_iters: int = 10000, tol: float = 1e-06, rho: float = 1e-10, d_max_iters: int = 10000, d_tol: float = 1e-10)

Two-step method for approximately optimizing ReLU models.

ConeDecomposition first solves the Gated ReLU problem,

\[\min_{u} L\left(\sum_{D_i \in \mathcal{D}} D_i X u_{i}, y\right) + \lambda R(u),\]

and then decomposes the solution \(u^*\) onto the Minkowski differences \(K_i - K_i\) to approximate the ReLU training problem. The cone decomposition is approximated by solving

\[\min_{w} \sum_{D_i \in \mathcal{D}}\left[ \left\| \left(\tilde X_i w_i - \min\{\tilde X_i u_i^*, 0\}\right)_+\right\|_2^2 + \rho \|w_i\|_2^2 \right],\]

where \(\tilde X_i = (2D_i - I) X\). The regularization \(\rho\) controls the quality of the approximation; taking \(\rho \rightarrow 0\) returns a feasible solution. A feasible solution is guaranteed to preserve the value of the loss \(L\), but can substantially blow up the model norm. As such, the two-step procedure is only an approximation to the ReLU training problem when \(\lambda > 0\).

model

the model that should be optimized.

max_iters

the maximum number of iterations to run the R-FISTA solver.

tol

the tolerance for terminating R-FISTA early.

rho

the strength of the penalty term in the decomposition program.

d_max_iters

the maximum number of iterations for the decomposition program.

d_tol

the tolerance for terminating the decomposition program.
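
A short construction sketch follows, assuming model is a model object built as in the AL example above; the constructor arguments match the signature documented for this class.

    from scnn.solvers import ApproximateConeDecomposition

    # `model` is assumed to be built as in the AL sketch above.
    solver = ApproximateConeDecomposition(
        model,
        max_iters=10000,    # R-FISTA iterations for the Gated ReLU phase
        tol=1e-6,           # R-FISTA termination tolerance
        rho=1e-10,          # smaller rho gives a closer-to-feasible decomposition
        d_max_iters=10000,  # iteration cap for the decomposition program
        d_tol=1e-10,        # termination tolerance for the decomposition program
    )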

class scnn.solvers.CVXPYSolver(model: Model, solver, solver_kwargs={}, clean_sol=False)

Solve convex reformulations using CVXPY as an interface to different interior-point solvers.

CVXPY provides a framework for specifying and solving convex optimization problems. The framework is compatible with a variety of solvers (mainly interior-point methods); see choosing a solver for details. Note that some solvers may require additional libraries and/or licences to be installed.

model

the model that should be optimized.

solver

the underlying solver to use with CVXPY.

solver_kwargs

a dictionary of keyword arguments to be passed directly to the underlying solver.

clean_sol

whether or not to clean the solution using a proximal-gradient step.

Notes

  • This solver only supports computation on CPU. The user’s choice of device will be overridden if necessary.
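
The sketch below pairs the solver with one CVXPY backend; model is assumed to be built as in the AL example, and the "ecos" identifier and forwarded keyword arguments are assumptions that depend on which CVXPY backends are installed.

    from scnn.solvers import CVXPYSolver

    # `model` is assumed to be built as in the AL sketch above.
    # "ecos" is one commonly available CVXPY backend; see CVXPY's
    # "choosing a solver" documentation for the accepted identifiers.
    solver = CVXPYSolver(
        model,
        solver="ecos",
        solver_kwargs={"max_iters": 1000},  # forwarded directly to the backend
        clean_sol=True,                     # polish the solution with a proximal-gradient step
    )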

class scnn.solvers.ExactConeDecomposition

Two-step method for approximately optimizing ReLU models.

ConeDecomposition first solves the Gated ReLU problem,

\[\min_{u} L\left(\sum_{D_i \in \mathcal{D}} D_i X u_{i}, y\right) + \lambda R(u),\]

and then decomposes the solution \(u^*\) onto the Minkowski differences \(K_i - K_i\) to approximate the ReLU training problem. The cone decomposition is solved exactly using interior-point methods and CVXPY…

class scnn.solvers.LeastSquaresSolver(model: Model, solver: str = 'lsmr', max_iters: int = 1000, tol=1e-06)

Direct solver for the unregularized or \(\ell_2\)-regularized Gated ReLU training problem.

The Gated ReLU problem with no regularization or \(\ell_2\) regularization is

\[F(u) = \frac{1}{2}\|\sum_{D_i \in \mathcal{D}} D_i X u_{i} - y\|_2^2 + \lambda \sum_{D_i \in \mathcal{D}} \|u_i\|_2^2.\]

This is a convex quadratic problem equivalent to ridge regression, which this optimizer solves using conjugate-gradient (CG) type methods. Either LSMR [1] or LSQR [2] can be used; implementations are provided by scipy.sparse.linalg. See SOL for details on these methods.
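
As an illustration of the underlying primitive (not necessarily how this class invokes it internally), the damped least-squares form of the problem maps directly onto scipy.sparse.linalg.lsmr; with the \(\frac{1}{2}\) factor in \(F\), the damping parameter is \(\sqrt{2\lambda}\).

    import numpy as np
    from scipy.sparse.linalg import lsmr

    # Ridge regression min_x 0.5 * ||A x - y||^2 + lam * ||x||^2 via LSMR,
    # which solves min_x ||A x - y||^2 + damp^2 * ||x||^2.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 50))  # stand-in for the expanded feature matrix
    y = A @ rng.standard_normal(50)
    lam = 0.01

    x, istop, itn, *_ = lsmr(A, y, damp=np.sqrt(2 * lam), maxiter=1000, atol=1e-6, btol=1e-6)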

model

the model that should be optimized.

solver

the underlying CG-type solver to use.

max_iters

the maximum number of iterations to run the optimization method.

tol

the tolerance for terminating the optimization procedure early.

Notes

  • This solver only supports computation on CPU. The user’s choice of backend will be overridden if necessary.

  • This solver only supports \(\ell_2\) regularization or no regularization.
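
A construction sketch, assuming model is a Gated ReLU model built analogously to the AL example above and paired with \(\ell_2\) or no regularization:

    from scnn.solvers import LeastSquaresSolver

    # `model` is assumed to be a Gated ReLU model (e.g. built with an assumed
    # ConvexGatedReLU class, analogously to the AL sketch above) and paired
    # with an L2 regularizer or no regularizer at all.
    solver = LeastSquaresSolver(model, solver="lsmr", max_iters=1000, tol=1e-6)

    # solver="lsqr" selects scipy.sparse.linalg.lsqr instead of lsmr.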

References

[1] D. C.-L. Fong and M. A. Saunders, LSMR: An iterative algorithm for sparse least-squares problems, SIAM J. Sci. Comput. 33(5), 2950-2971, 2011.

[2] C. C. Paige and M. A. Saunders, LSQR: An algorithm for sparse linear equations and sparse least squares, ACM TOMS 8(1), 43-71, 1982.

class scnn.solvers.Optimizer(model: Model)

Base class for optimizers.

model

the model that should be optimized.

class scnn.solvers.RFISTA(model: Model, max_iters: int = 10000, tol: float = 1e-06)

Fast proximal-gradient solver for Gated ReLU models.

This optimizer solves the convex Gated ReLU training problem by directly minimizing the convex reformulation,

\[F(u) = L\left(\sum_{D_i \in \mathcal{D}} D_i X u_{i}, y\right) + \lambda R(u),\]

where \(L\) is a convex loss function, \(R\) is a regularizer, and \(\lambda\) is the regularization strength.

model

the model to be optimized.

max_iters

the maximum number of iterations to run the solver.

tol

the tolerance for terminating the optimization procedure early; tol will be checked against the squared norm of the minimum-norm subgradient.
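
A sketch of the Gated ReLU workflow with R-FISTA follows; as in the AL example, ConvexGatedReLU, sample_gate_vectors, Metrics, NeuronGL1, and optimize_model are assumptions about the rest of the scnn API.

    import numpy as np

    from scnn.solvers import RFISTA

    from scnn.activations import sample_gate_vectors  # assumption
    from scnn.metrics import Metrics                   # assumption
    from scnn.models import ConvexGatedReLU            # assumption
    from scnn.optimize import optimize_model           # assumption
    from scnn.regularizers import NeuronGL1            # assumption

    # Synthetic regression data; only the squared loss is supported.
    rng = np.random.default_rng(0)
    n, d = 200, 10
    X = rng.standard_normal((n, d))
    y = X @ rng.standard_normal((d, 1))

    G = sample_gate_vectors(123, d, 100)  # assumed signature: (seed, d, n_gates)
    model = ConvexGatedReLU(G)

    # R-FISTA uses the constructor documented above.
    solver = RFISTA(model, max_iters=10000, tol=1e-6)
    model, metrics = optimize_model(
        model, solver, Metrics(), X, y, X, y, regularizer=NeuronGL1(0.01)
    )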