scnn.solvers

Optimization methods for training neural networks by convex reformulation.

Notes

We only support the squared loss at the moment.

class scnn.solvers.AL(model: Model, max_primal_iters: int = 10000, max_dual_iters: int = 10000, tol: float = 1e-06, constraint_tol: float = 1e-06, delta: float = 1000)

Augmented Lagrangian (AL) method for ReLU models.

This optimizer solves the convex ReLU training problem by forming the “augmented Lagrangian”,

\[\mathcal{L}(v, w, \gamma, \xi) = F(v,w) + \delta \sum_{D_i \in \mathcal{D}} \left[\left\|\left(\frac{\gamma_i}{\delta} - (2D_i - I)X v_i\right)_+\right\|_2^2 + \left\|\left(\frac{\xi_i}{\delta} - (2D_i - I)X w_i\right)_+\right\|_2^2 \right],\]

where \(\delta > 0\) is the penalty strength, \((\gamma, \xi)\) are the dual parameters, and

\[F(v,w) = L\left(\sum_{D_i \in \mathcal{D}} D_i X (v_{i} - w_{i}), y\right) + \lambda R(v, w),\]

is the regularized training loss.

The AL method alternates between the “primal” problem of minimizing \(\mathcal{L}(v, w, \gamma, \xi)\) with respect to \(v, w\) and updating the dual parameters \(\gamma, \xi\). This procedure returns a primal-dual optimal pair \((v,w), (\gamma, \xi)\) in the limit of dual iterations.

The solver will terminate early if an approximate KKT point is computed.

model

the model to be optimized.

max_primal_iters

the maximum number of iterations to run the primal optimization method before exiting.

max_dual_iters

the maximum number of dual updates that will be performed before exiting.

tol

the maximum squared \(\ell_2\)-norm of the gradient of the Lagrangian function for the KKT conditions to be considered approximately satisfied.

constraint_tol

the maximum violation of the constraints permitted for the KKT conditions to be considered approximately satisfied.

delta

the initial penalty strength.
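
A minimal usage sketch is given below. The AL constructor follows the signature documented above; the surrounding pieces (ConvexReLU, sample_gate_vectors, Metrics, NeuronGL1, and the optimize_model driver) are assumptions about the rest of the scnn API and may need to be adjusted for your installed version.

    import numpy as np

    from scnn.solvers import AL

    # Assumed helpers from the rest of the scnn API; adjust to your version.
    from scnn.activations import sample_gate_vectors  # assumption
    from scnn.metrics import Metrics                   # assumption
    from scnn.models import ConvexReLU                 # assumption
    from scnn.optimize import optimize_model           # assumption
    from scnn.regularizers import NeuronGL1            # assumption

    # Synthetic regression data; only the squared loss is supported.
    rng = np.random.default_rng(0)
    n, d = 200, 10
    X = rng.standard_normal((n, d))
    y = X @ rng.standard_normal((d, 1))

    # Convex reformulation of a ReLU network built from sampled gate vectors
    # (assumed helper; assumed signature: seed, input dimension, number of gates).
    G = sample_gate_vectors(123, d, 100)
    model = ConvexReLU(G)

    # The AL solver uses the constructor documented above.
    solver = AL(model, max_primal_iters=5000, max_dual_iters=5000, tol=1e-6, delta=100.0)

    # Assumed training driver: fits the model and records metrics.
    model, metrics = optimize_model(
        model, solver, Metrics(), X, y, X, y, regularizer=NeuronGL1(0.01)
    )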

class scnn.solvers.ApproximateConeDecomposition(model: Model, max_iters: int = 10000, tol: float = 1e-06, rho: float = 1e-10, d_max_iters: int = 10000, d_tol: float = 1e-10)

Two-step method for approximately optimizing ReLU models.

ConeDecomposition first solves the Gated ReLU problem,

\[\min_{u} L\left(\sum_{D_i \in \mathcal{D}} D_i X u_{i}, y\right) + \lambda R(u),\]

and then decomposes the solution \(u^*\) onto the Minkowski differences \(K_i - K_i\) to approximate the ReLU training problem. The cone decomposition is approximated by solving

\[\min_{w} \sum_{D_i \in \mathcal{D}}\left[ \left\| \left(\tilde X_i w_i - \min\{\tilde X_i u_i^*, 0\}\right)_+\right\|_2^2 + \rho \|w_i\|_2^2 \right],\]

where \(\tilde X_i = (2D_i - I) X\). The regularization \(\rho\) controls the quality of the approximation; taking \(\rho \rightarrow 0\) returns a feasible solution. A feasible solution is guaranteed to preserve the value of the loss \(L\), but can substantially blow up the model norm. As such, the two-step procedure is only an approximation to the ReLU training problem when \(\lambda > 0\).

model

the model that should be optimized.

max_iters

the maximum number of iterations to run the R-FISTA solver.

tol

the tolerance for terminating R-FISTA early.

rho

the strength of the penalty term in the decomposition program.

d_max_iters

the maximum number of iterations for the decomposition program.

d_tol

the tolerance for terminating the decomposition program.
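
A short construction sketch follows, assuming model is a model object built as in the AL example above; the constructor arguments match the signature documented for this class.

    from scnn.solvers import ApproximateConeDecomposition

    # `model` is assumed to be built as in the AL sketch above.
    solver = ApproximateConeDecomposition(
        model,
        max_iters=10000,    # R-FISTA iterations for the Gated ReLU phase
        tol=1e-6,           # R-FISTA termination tolerance
        rho=1e-10,          # smaller rho gives a closer-to-feasible decomposition
        d_max_iters=10000,  # iteration cap for the decomposition program
        d_tol=1e-10,        # termination tolerance for the decomposition program
    )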

class scnn.solvers.CVXPYSolver(model: Model, solver, solver_kwargs={}, clean_sol=False)

Solve convex reformulations using CVXPY as an interface to different interior-point solvers.

CVXPY provides a framework for specifying and solving convex optimization problems. The framework is compatible with a variety of solvers (mainly interior-point methods); see choosing a solver for details. Note that some solvers may require additional libraries and/or licences to be installed.

model

the model that should be optimized.

solver

the underlying solver to use with CVXPY.

solver_kwargs

a dictionary of keyword arguments to be passed directly to the underlying solver.

clean_sol

whether or not to clean the solution using a proximal-gradient step.

Notes

  • This solver only supports computation on CPU. The user’s choice of device will be overridden if necessary.
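
The sketch below pairs the solver with one CVXPY backend; model is assumed to be built as in the AL example, and the "ecos" identifier and forwarded keyword arguments are assumptions that depend on which CVXPY backends are installed.

    from scnn.solvers import CVXPYSolver

    # `model` is assumed to be built as in the AL sketch above.
    # "ecos" is one commonly available CVXPY backend; see CVXPY's
    # "choosing a solver" documentation for the accepted identifiers.
    solver = CVXPYSolver(
        model,
        solver="ecos",
        solver_kwargs={"max_iters": 1000},  # forwarded directly to the backend
        clean_sol=True,                     # polish the solution with a proximal-gradient step
    )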

class scnn.solvers.ExactConeDecomposition

Two-step method for approximately optimizing ReLU models.

ConeDecomposition first solves the Gated ReLU problem,

\[\min_{u} L\left(\sum_{D_i \in \mathcal{D}} D_i X u_{i}, y\right) + \lambda R(u),\]

and then decomposes the solution \(u^*\) onto the Minkowski differences \(K_i - K_i\) to approximate the ReLU training problem. The cone decomposition is solved exactly using interior-point methods and CVXPY…

class scnn.solvers.LeastSquaresSolver(model: Model, solver: str = 'lsmr', max_iters: int = 1000, tol=1e-06)

Direct solver for the unregularized or \(\ell_2\)-regularized Gated ReLU training problem.

The Gated ReLU problem with no regularization or \(\ell_2\) regularization is

\[F(u) = \frac{1}{2}\|\sum_{D_i \in \mathcal{D}} D_i X u_{i} - y\|_2^2 + \lambda \sum_{D_i \in \mathcal{D}} \|u_i\|_2^2.\]

This is a convex quadratic problem equivalent to ridge regression, which this optimizer solves using conjugate-gradient (CG) type methods. Either LSMR [1] or LSQR [2] can be used; implementations are provided by scipy.sparse.linalg. See SOL for details on these methods.
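
As an illustration of the underlying primitive (not necessarily how this class invokes it internally), the damped least-squares form of the problem maps directly onto scipy.sparse.linalg.lsmr; with the \(\frac{1}{2}\) factor in \(F\), the damping parameter is \(\sqrt{2\lambda}\).

    import numpy as np
    from scipy.sparse.linalg import lsmr

    # Ridge regression min_x 0.5 * ||A x - y||^2 + lam * ||x||^2 via LSMR,
    # which solves min_x ||A x - y||^2 + damp^2 * ||x||^2.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 50))  # stand-in for the expanded feature matrix
    y = A @ rng.standard_normal(50)
    lam = 0.01

    x, istop, itn, *_ = lsmr(A, y, damp=np.sqrt(2 * lam), maxiter=1000, atol=1e-6, btol=1e-6)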

model

the model that should be optimized.

solver

the underlying CG-type solver to use.

max_iters

the maximum number of iterations to run the optimization method.

tol

the tolerance for terminating the optimization procedure early.

Notes

  • This solver only supports computation on CPU. The user’s choice of backend will be overridden if necessary.

  • This solver only supports \(\ell_2\) regularization or no regularization.
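
A construction sketch, assuming model is a Gated ReLU model built analogously to the AL example above and paired with \(\ell_2\) or no regularization:

    from scnn.solvers import LeastSquaresSolver

    # `model` is assumed to be a Gated ReLU model (e.g. built with an assumed
    # ConvexGatedReLU class, analogously to the AL sketch above) and paired
    # with an L2 regularizer or no regularizer at all.
    solver = LeastSquaresSolver(model, solver="lsmr", max_iters=1000, tol=1e-6)

    # solver="lsqr" selects scipy.sparse.linalg.lsqr instead of lsmr.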

References

[1] D. C.-L. Fong and M. A. Saunders, LSMR: An iterative algorithm for sparse least-squares problems, SIAM J. Sci. Comput. 33(5), 2950-2971, 2011.

[2] C. C. Paige and M. A. Saunders, LSQR: An algorithm for sparse linear equations and sparse least squares, ACM TOMS 8(1), 43-71, 1982.

class scnn.solvers.Optimizer(model: Model)

Base class for optimizers.

model

the model that should be optimized.

class scnn.solvers.RFISTA(model: Model, max_iters: int = 10000, tol: float = 1e-06)

Fast proximal-gradient solver for Gated ReLU models.

This optimizer solves the convex Gated ReLU training problem by directly minimizing the convex reformulation,

\[F(u) = L\left(\sum_{D_i \in \mathcal{D}} D_i X u_{i}, y\right) + \lambda R(u),\]

where \(L\) is a convex loss function, \(R\) is a regularizer, and \(\lambda\) is the regularization strength.

model

the model to be optimized.

max_iters

the maximum number of iterations to run the solver.

tol

the tolerance for terminating the optimization procedure early; tol will be checked against the squared norm of the minimum-norm subgradient.
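
A sketch of the Gated ReLU workflow with R-FISTA follows; as in the AL example, ConvexGatedReLU, sample_gate_vectors, Metrics, NeuronGL1, and optimize_model are assumptions about the rest of the scnn API.

    import numpy as np

    from scnn.solvers import RFISTA

    from scnn.activations import sample_gate_vectors  # assumption
    from scnn.metrics import Metrics                   # assumption
    from scnn.models import ConvexGatedReLU            # assumption
    from scnn.optimize import optimize_model           # assumption
    from scnn.regularizers import NeuronGL1            # assumption

    # Synthetic regression data; only the squared loss is supported.
    rng = np.random.default_rng(0)
    n, d = 200, 10
    X = rng.standard_normal((n, d))
    y = X @ rng.standard_normal((d, 1))

    G = sample_gate_vectors(123, d, 100)  # assumed signature: (seed, d, n_gates)
    model = ConvexGatedReLU(G)

    # R-FISTA uses the constructor documented above.
    solver = RFISTA(model, max_iters=10000, tol=1e-6)
    model, metrics = optimize_model(
        model, solver, Metrics(), X, y, X, y, regularizer=NeuronGL1(0.01)
    )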