Tensor Train Optimization
ttml.tt_rlinesearch
Linesearch for performing tensor completion with TensorTrains
- ttml.tt_rlinesearch.TTLS
- class ttml.tt_rlinesearch.TensorTrainLineSearch(tt, y, idx, task='regression', sample_weight=None, red_idx=None, cg_method='fr', line_search_method='armijo', line_search_params=None, memory=1, max_stepsize=None, min_initial_stepsize=1e-06, initial_stepsize_method='bb1', last_step_size=None, default_stepsize=None, auto_scale=False, **kwargs)[source]
Implements Riemannian conjugate gradient descent with linesearch for tensor train.
- Parameters
tt (TensorTrain) – TensorTrain to be optimized. During optimization it will be copied, not modified.
y (array<float64>) – Target values. Should be flat array with same backend as tt.
idx (array<int64> shape (len(y),tt.order)) – Indices of dense tensor corresponding to values y. Potential duplicate values in idx are automatically merged.
task (str (default: “regression”)) –
Whether to perform regression or binary classification.
If task=”regression then MSE is minimized.
If task=”classification”. The labels are assumed to be 0 or 1, and cross entropy is minimized. Note that predictions of the classifier will be on the logit scale, only the objective changes.
sample_weight (array<float64> or None (default: None)) – Weights associated to all sample points. If None, use unit weight.
cg_method (str (default: “fr”)) – Which conjugate gradient method to use. Currently supported are ‘fr’ (Fletcher-Reeves), ‘sd’ (steepest descent).
line_search_method (str (default: “armijo”)) – Which line search method to use. Supported are “armijo”, “wolfe” (strong Wolfe conditions) and “tnc” (exact line search using TNC, mainly for debugging.)
line_search_params (None or dict (default None)) – Extra kwargs to pass to the line search method.
memory (int (default: 1)) – With memory > 1, perform nonmonotone line search with memory steps of memory. Using memory > 1 with Wolfe line search is not properly supported.
max_stepsize (int or None (default: None)) – Maximum stepsize to be taken.
initial_stepsize_method (str (default: "bb1")) –
Method to compute initial stepsize for backtracking. Allowed values:
”bb1” : Riemannian Barzilai-Borwein stepsize of first type.
”bb2” : Riemannian Barzilai-Borwein stepsize of second type.
”qopt” : Quasi-optimal Riemannian CG stepsize, as proposed by Steinlechner.
”scalar” : take twice the difference between current and previous loss value, divided by the derivative of the line search function.
last_step_size (int or None) – Last step size taken, to be used for a warm start of the optimizer.
default_stepsize (float (default: 1.0)) – Fall back default stepsize
auto_scale (bool (default False)) – Use the first gradient norms to estimate the scale of the step size. Only affects first step size. This is useful if optimal stepsize is particularly large.
- bb_stepsize(bb_type=1)[source]
Riemannian Barzilai-Borwein stepsize.
There are two variants of the BB stepsize, this is controlled by the argument bb_type. If the stepsize cannot be computed, or would be excessively high, default_stepsize is returned instead.
- cg_fletcher_reeves()[source]
Fletcher-Reeves scale parameter.
This is given by
\[\beta_{k+1}^{FR} = \frac{\langle\nabla f(x_{k+1}),\, \nabla f(x_{k+1})\rangle_{x_{k+1}}} {\langle\nabla f(x_k),\nabla f(x_k)\rangle_{x_k}}\]
- plot_linesearch(alpha0=1.0, alpha_max=None, c1=1.3, c2=0.5, plot_points=20)[source]
Return arrays to plot linesearch objective for debugging.
- Parameters
alpha0 (initial point) –
alpha_max (float or None) – If specified, skip finding a good alpha
c1 (factor to increase alpha by every step if derivative negative) –
c2 (factor to decrease alpha by every step if derivative positive) –
plot_points (number of points returned) –
- Returns
plot_X (X positions of plot points)
phis (values of objective at plot_X)
der_phis (derivative of objective at plot_X)
- quasi_optimal_stepsize(derphi0, default_stepsize=1.0)[source]
Compute the quasi-optimal stepsize based on a linearization.
The formula for this is
\[-\phi'(0) / \|\eta\|^2\]With \(\eta\) the search direction, \(\phi'(0) = \langle\eta,\nabla f\rangle\) and the derivative of the line search objective
- Returns
step_size
- Return type
float
- step(try_armijo=False, try_sd=False)[source]
Perform a step. Replaces self.tt updated tt.
- Parameters
try_armijo (bool, default=False) – Force using armijo linesearch. This is called if wolfe line search fails
try_sd (bool, default=False) – If armijo also fails, it may be because the search direction is bad, so we use this to force steepest descent direction.
- Returns
phi0 (float) – New value of the loss function
derphi0 (float) – Derivative of loss function in search direction (at beginning of step)
step_size (float) – Size of step taken
- ttml.tt_rlinesearch.armijo_backtracking(phi, derphi, phi0=None, derphi0=None, old_phi0=None, c1=0.0001, amin=0, amax=None, old_step_size=1.0, alpha0=None, **kwargs)[source]
Scalar line search method to find step size satisfying Armijo conditions.
- Parameters
c1 (float, optional) – Parameter for Armijo condition rule.
amax (float, optional) – Maxmimum and minimum step size
amin (float, optional) – Maxmimum and minimum step size
- ttml.tt_rlinesearch.strong_wolfe_line_search(phi, derphi, phi0=None, old_phi0=None, derphi0=None, c1=0.0001, c2=0.9, amax=None, **kwargs)[source]
Scalar line search method to find step size satisfying strong Wolfe conditions.
- Parameters
c1 (float, optional) – Parameter for Armijo condition rule.
c2 (float, optional) – Parameter for curvature condition rule.
amax (float, optional) – Maximum step size
- Returns
step_size – The next step size
- Return type
float
ttml.tt_opt
Meta optimizer class for Tensor Trains
- class ttml.tt_opt.TensorTrainOptimizer(tt, y, idx, task='regression', sample_weight=None, red_idx=None, task_kwargs=None, **kwargs)[source]
- Parameters
tt (TensorTrain) – TensorTrain to be optimized. During optimization it will be copied, not modified.
y (array<float64>) – Target values. Should be flat array with same backend as tt.
idx (array<int64> shape (len(y),tt.order)) – Indices of dense tensor corresponding to values y. Potential duplicate values in idx are automatically merged.
task (str (default: “regression”)) –
Whether to perform regression or binary classification.
If task=”regression then MSE is minimized.
If task=”classification”. The labels are assumed to be 0 or 1, and cross entropy is minimized. Note that predictions of the classifier will be on the logit scale, only the objective changes.
sample_weight (array<float64> or None (default: None)) – Weights associated to all sample points. If None, use unit weight.
red_idx (array<int, int> or None) – Unique indices of idx in lexicographic order. If None (default) this is computed.
task_kwargs (dict (optional)) – Dictionary of keyword arguments to always be passed to the loss and egrad functions.
- egrad(tt=None, y=None, idx=None, sample_weight=None, normalize=False, merge=True, **kwargs)[source]
Compute the sparse Euclidean gradient and loss at current point.
Keyword arguments are the same as self.loss, except normalize=False by default.
- Returns
loss (float)
egrad (array<float>) – Array with length the number of unique entries in idx. Corresponds to sparse Euclidean tangent vector, with indices obtained by a lexical sort applied to idx. See also
utils.merge_sum().
- loss(tt=None, y=None, idx=None, sample_weight=None, normalize=True, **kwargs)[source]
Compute the loss at current point.
- Returns
loss
- Return type
float
- Parameters
tt (TensorTrain or None) – Compute loss at tt instead. If None (default) then compute at self.tt.
y (array<float> or None) – Target labels for loss. If None (default), use self.y
idx (array<int, int> or None) – Tensor indices to use for loss. If None (default) use self.idx
sample_weight (array<float> or None) – Array same shape as y giving sample weights. If None, use weight 1 for each entry
normalize (bool (default: True)) – Divide loss function by number of samples (or sum of sample_weight) if True
ttml.tt_radam
Implements Riemannian ADAM stochastic gradient descent algorithm for tensor trains.
This is an implementation of the algorithm in the paper ‘Riemannian Adaptive Optimization Methods’ by Becigneul and Ganea
- class ttml.tt_radam.TensorTrainSGD(tt, y, idx, batch_size, lr=1.0, beta1=0.9, beta2=0.9, task='regression', sample_weight=None, red_idx=None, **kwargs)[source]
Riemannian Adam optimizer for tensor trains.
- Parameters
tt (TensorTrain) – TensorTrain to be optimized. During optimization it will be copied, not modified.
y (array<float64>) – Target values. Should be flat array with same backend as tt.
idx (array<int64> shape (len(y),tt.order)) – Indices of dense tensor corresponding to values y. Potential duplicate values in idx are automatically merged.
batch_size (int) –
lr (float (default: 1.0)) – Learning rate, needs to be tuned for each problem.
beta1 (float (default: 0.9)) – parameter between 0 and 1 determining the contribution of transport of previous search direction to current search direction
beta2 (float (default: 0.9)) – parameter between 0 and 1 determining contribution of previous gradient norms to the stepsize.
task (str (default: “regression”)) –
Whether to perform regression or binary classification.
If task=”regression then MSE is minimized.
If task=”classification”. The labels are assumed to be 0 or 1, and cross entropy is minimized. Note that predictions of the classifier will be on the logit scale, only the objective changes.
sample_weight (array<float64> or None (default: None)) – Weights associated to all sample points. If None, use unit weight.