scnn.solvers
Optimization methods for training neural networks by convex reformulation.
Notes
We only support the squared loss at the moment.
- class scnn.solvers.AL(model: Model, max_primal_iters: int = 10000, max_dual_iters: int = 10000, tol: float = 1e-06, constraint_tol: float = 1e-06, delta: float = 1000)
Augmented Lagrangian (AL) method for ReLU model.
This optimizer solves the convex ReLU training problem by forming the “augmented Lagrangian”,
\[\mathcal{L}(v, w, \gamma, \xi) = F(v,w) + \delta \sum_{D_i} \left[\|(\frac{\gamma_i}{\delta} - (2D_i - I)X v_i)_+\|_2^2 + \|(\frac{\xi_i}{\delta} - (2D_i - I)X v_i)_+\|_2^2 \right],\]where \(\delta > 0\) is the penalty strength, \((\gamma, \xi)\) are the dual parameters, and
\[F(v,w) = L\left(\sum_{D_i \in \mathcal{D}} D_i X (v_{i} - w_{i}), y\right) + \lambda R(v, w),\]is the regularized training loss.
The AL method alternates between the “primal” problem of minimizing \(\mathcal{L}(v, w, \gamma, \xi)\) in terms of \(v, w\) and the updates to the dual parameters, \(\gamma, \xi\). This procedure will return an primal-dual optimal pair \((v,w), (\gamma, \xi)\) in the (dual) iteration limit.
The solver will terminate early if an approximate KKT point is computed.
- model
the model to be optimized.
- max_primal_iters
the maximum number of iterations to run the primal optimization method before exiting.
- max_dual_iters
the maximum number of dual updates that will be performed before exiting.
- tol
the maximum squared l2-norm of the gradient of the Lagrangian function for the KKT conditions to be considered approximately satisfied.
- constraint_tol
the maximum violation of the constraints permitted for the KKT conditions to be considered approximately satisfied.
- delta
the initial penalty strength.
- class scnn.solvers.ApproximateConeDecomposition(model: Model, max_iters: int = 10000, tol: float = 1e-06, rho: float = 1e-10, d_max_iters: int = 10000, d_tol: float = 1e-10)
Two-step method for approximately optimizing ReLU models.
ConeDecomposition first solves the Gated ReLU problem,
\[\min_{u} L\left(\sum_{D_i \in \mathcal{D}} D_i X u_{i}), y\right) + \lambda R(u),\]and then decomposes the solution \(u^*\) onto the Minkowski differences \(K_i - K_i\) to approximate the ReLU training problem. The cone decomposition is approximated by solving
\[\min_{w} \sum_{D_i \in \mathcal{D}}\left[ \| \left(\tilde X_i w - \min\{\tilde X_i u^*, 0\}\right)_+\|_2^2 + \rho \|w_i\|_2^2 \right],\]where \(\tilde X_i = (2D_i - I) X\). The regularization \(\rho\) controls the quality of the approximation; taking \(\rho \rightarrow 0\) will return a feasible solution. A feasible solution is guaranteed to preserve the value of the loss \(L\), but can substantially blow-up the model norm. As such, it is only an approximation to the ReLU training problem when \(\lambda > 0\).
- model
the model that should be optimized.
- max_iters
the maximum number of iterations to run the R-FISTA sovler.
- tol
the tolerance for terminating R-FISTA early.
- rho
the strength of the penalty term in the decomposition program.
- d_max_iters
the maximum number of iterations for the decomposition program.
- d_tol
the tolerance for terminating the decomposition program.
- class scnn.solvers.CVXPYSolver(model: Model, solver, solver_kwargs={}, clean_sol=False)
Solve convex reformulations using CVXPY as a interface to different interior-point solvers.
CVXPY provides a framework for denoting and solving convex optimization problems. The framework is compatible with a variety of solvers (mainly interior point methods); see choosing a solver for details. Note that some solvers may require additional libraries and/or licences to be installed.
- model
the model that should be optimized.
- solver
the underlying solver to use with CVXPY.
- solver_kwargs
a dictionary of keyword arguments to be passed directly to the underlying solver.
- clean_sol
whether or not to clean the solution using a proximal-gradient step.
- Notes:
- This solver only supports computation on CPU. The user’s choice of
device will be overridden if necessary.
- class scnn.solvers.ExactConeDecomposition
Two-step method for approximately optimizing ReLU models.
ConeDecomposition first solves the Gated ReLU problem,
\[\min_{u} L\left(\sum_{D_i \in \mathcal{D}} D_i X u_{i}), y\right) + \lambda R(u),\]and then decomposes the solution \(u^*\) onto the Minkowski differences \(K_i - K_i\) to approximate the ReLU training problem. The cone decomposition is solved exactly using interior-point methods and CVXPY…
- class scnn.solvers.LeastSquaresSolver(model: Model, solver: str = 'lsmr', max_iters: int = 1000, tol=1e-06)
Direct solver for the unregularized or \(\ell_2\)-regularized Gated ReLU training problem.
The Gated ReLU problem with no regularization or \(\ell_2\) regularization is
\[F(u) = \frac{1}{2}\|\sum_{D_i \in \mathcal{D}} D_i X u_{i} - y\|_2^2 + \lambda \sum_{D_i \in \mathcal{D}} \|u_i\|_2^2.\]This is a convex quadratic problem equivalent to ridge regression which this optimizer solves using conjugate-gradient (CG) type methods. Either LSMR [1] or LSQR [2] can be used; implementations are provided by scipy.sparse.linalg. See SOL for details on these methods.
- model
the model that should be optimized.
- solver
the underlying CG-type solver to use.
- max_iters
the maximum number of iterations to run the optimization method.
- tol
the tolerance for terminating the optimization procedure early.
Notes
- This solver only supports computation on CPU. The user’s choice of
backend will be overridden if necessary.
- This solver only supports
L2
regularization or no regularization.
- This solver only supports
References
- [1] D. C.-L. Fong and M. A. Saunders, LSMR: An iterative algorithm
for sparse least-squares problems, SIAM J. Sci. Comput. 33:5, 2950-2971, published electronically Oct 27, 2011.
[2] C. C. Paige and M. A. Saunders, LSQR: An algorithm for sparse linear equations and sparse least squares, TOMS 8(1), 43-71 (1982)
- class scnn.solvers.Optimizer(model: Model)
Base class for optimizers.
- model
the model that should be optimized.
- class scnn.solvers.RFISTA(model: Model, max_iters: int = 10000, tol: float = 1e-06)
Fast proximal-gradient solver for Gated ReLU models.
This optimizer solves the convex Gated ReLU training problem by directly minimizing the convex reformulation,
\[F(u) = L\left(\sum_{D_i \in \mathcal{D}} D_i X u_{i}), y\right) + \lambda R(u),\]where \(L\) is a convex loss function, \(R\) is a regularizer, and \(\lambda\) is the regularization strength.
- model
the model to be optimized.
- max_iters
the maximum number of iterations to run the solver.
- tol
the tolerance for terminating the optimization procedure early; tol will be checked against the squared norm of the minimum-norm subgradient.