Tensor Train Optimization

ttml.tt_rlinesearch

Linesearch for performing tensor completion with TensorTrains

ttml.tt_rlinesearch.TTLS: alias of ttml.tt_rlinesearch.TensorTrainLineSearch

class ttml.tt_rlinesearch.TensorTrainLineSearch(tt, y, idx, task='regression', sample_weight=None, red_idx=None, cg_method='fr', line_search_method='armijo', line_search_params=None, memory=1, max_stepsize=None, min_initial_stepsize=1e-06, initial_stepsize_method='bb1', last_step_size=None, default_stepsize=None, auto_scale=False, **kwargs)[source]

Implements Riemannian conjugate gradient descent with linesearch for tensor train.

Parameters

tt (TensorTrain) – TensorTrain to be optimized. During optimization it will be copied, not modified.
y (array<float64>) – Target values. Should be flat array with same backend as tt.
idx (array<int64> shape (len(y),tt.order)) – Indices of dense tensor corresponding to values y. Potential duplicate values in idx are automatically merged.
task (str (default: “regression”)) –
Whether to perform regression or binary classification.
- If task=”regression then MSE is minimized.
- If task=”classification”. The labels are assumed to be 0 or 1, and cross entropy is minimized. Note that predictions of the classifier will be on the logit scale, only the objective changes.
sample_weight (array<float64> or None (default: None)) – Weights associated to all sample points. If None, use unit weight.
cg_method (str (default: “fr”)) – Which conjugate gradient method to use. Currently supported are ‘fr’ (Fletcher-Reeves), ‘sd’ (steepest descent).
line_search_method (str (default: “armijo”)) – Which line search method to use. Supported are “armijo”, “wolfe” (strong Wolfe conditions) and “tnc” (exact line search using TNC, mainly for debugging.)
line_search_params (None or dict (default None)) – Extra kwargs to pass to the line search method.
memory (int (default: 1)) – With memory > 1, perform nonmonotone line search with memory steps of memory. Using memory > 1 with Wolfe line search is not properly supported.
max_stepsize (int or None (default: None)) – Maximum stepsize to be taken.
initial_stepsize_method (str (default: "bb1")) –
Method to compute initial stepsize for backtracking. Allowed values:
- ”bb1” : Riemannian Barzilai-Borwein stepsize of first type.
- ”bb2” : Riemannian Barzilai-Borwein stepsize of second type.
- ”qopt” : Quasi-optimal Riemannian CG stepsize, as proposed by Steinlechner.
- ”scalar” : take twice the difference between current and previous loss value, divided by the derivative of the line search function.
last_step_size (int or None) – Last step size taken, to be used for a warm start of the optimizer.
default_stepsize (float (default: 1.0)) – Fall back default stepsize
auto_scale (bool (default False)) – Use the first gradient norms to estimate the scale of the step size. Only affects first step size. This is useful if optimal stepsize is particularly large.

bb_stepsize(bb_type=1)[source]

Riemannian Barzilai-Borwein stepsize.

There are two variants of the BB stepsize, this is controlled by the argument bb_type. If the stepsize cannot be computed, or would be excessively high, default_stepsize is returned instead.

cg_constant()[source]: Constant scale parameter. If alpha=0, then this is steepest descent.

cg_fletcher_reeves()[source]

Fletcher-Reeves scale parameter.

This is given by

\[\beta_{k+1}^{FR} = \frac{\langle\nabla f(x_{k+1}),\, \nabla f(x_{k+1})\rangle_{x_{k+1}}} {\langle\nabla f(x_k),\nabla f(x_k)\rangle_{x_k}}\]

plot_linesearch(alpha0=1.0, alpha_max=None, c1=1.3, c2=0.5, plot_points=20)[source]

Return arrays to plot linesearch objective for debugging.

Parameters

alpha0 (initial point) –
alpha_max (float or None) – If specified, skip finding a good alpha
c1 (factor to increase alpha by every step if derivative negative) –
c2 (factor to decrease alpha by every step if derivative positive) –
plot_points (number of points returned) –

Returns

plot_X (X positions of plot points)
phis (values of objective at plot_X)
der_phis (derivative of objective at plot_X)

quasi_optimal_stepsize(derphi0, default_stepsize=1.0)[source]

Compute the quasi-optimal stepsize based on a linearization.

The formula for this is

\[-\phi'(0) / \|\eta\|^2\]

With \(\eta\) the search direction, \(\phi'(0) = \langle\eta,\nabla f\rangle\) and the derivative of the line search objective

Returns: step_size
Return type: float

step(try_armijo=False, try_sd=False)[source]

Perform a step. Replaces self.tt updated tt.

Parameters

try_armijo (bool, default=False) – Force using armijo linesearch. This is called if wolfe line search fails
try_sd (bool, default=False) – If armijo also fails, it may be because the search direction is bad, so we use this to force steepest descent direction.

Returns

phi0 (float) – New value of the loss function
derphi0 (float) – Derivative of loss function in search direction (at beginning of step)
step_size (float) – Size of step taken

ttml.tt_rlinesearch.armijo_backtracking(phi, derphi, phi0=None, derphi0=None, old_phi0=None, c1=0.0001, amin=0, amax=None, old_step_size=1.0, alpha0=None, **kwargs)[source]

Scalar line search method to find step size satisfying Armijo conditions.

Parameters

c1 (float, optional) – Parameter for Armijo condition rule.
amax (float, optional) – Maxmimum and minimum step size
amin (float, optional) – Maxmimum and minimum step size

ttml.tt_rlinesearch.strong_wolfe_line_search(phi, derphi, phi0=None, old_phi0=None, derphi0=None, c1=0.0001, c2=0.9, amax=None, **kwargs)[source]

Scalar line search method to find step size satisfying strong Wolfe conditions.

Parameters

c1 (float, optional) – Parameter for Armijo condition rule.
c2 (float, optional) – Parameter for curvature condition rule.
amax (float, optional) – Maximum step size

Returns

step_size – The next step size

Return type

float

ttml.tt_opt

Meta optimizer class for Tensor Trains

class ttml.tt_opt.TensorTrainOptimizer(tt, y, idx, task='regression', sample_weight=None, red_idx=None, task_kwargs=None, **kwargs)[source]

Parameters

tt (TensorTrain) – TensorTrain to be optimized. During optimization it will be copied, not modified.
y (array<float64>) – Target values. Should be flat array with same backend as tt.
idx (array<int64> shape (len(y),tt.order)) – Indices of dense tensor corresponding to values y. Potential duplicate values in idx are automatically merged.
task (str (default: “regression”)) –
Whether to perform regression or binary classification.
- If task=”regression then MSE is minimized.
- If task=”classification”. The labels are assumed to be 0 or 1, and cross entropy is minimized. Note that predictions of the classifier will be on the logit scale, only the objective changes.
sample_weight (array<float64> or None (default: None)) – Weights associated to all sample points. If None, use unit weight.
red_idx (array<int, int> or None) – Unique indices of idx in lexicographic order. If None (default) this is computed.
task_kwargs (dict (optional)) – Dictionary of keyword arguments to always be passed to the loss and egrad functions.

egrad(tt=None, y=None, idx=None, sample_weight=None, normalize=False, merge=True, **kwargs)[source]

Compute the sparse Euclidean gradient and loss at current point.

Keyword arguments are the same as self.loss, except normalize=False by default.

Returns

loss (float)
egrad (array<float>) – Array with length the number of unique entries in idx. Corresponds to sparse Euclidean tangent vector, with indices obtained by a lexical sort applied to idx. See also utils.merge_sum().

loss(tt=None, y=None, idx=None, sample_weight=None, normalize=True, **kwargs)[source]

Compute the loss at current point.

Returns

loss

Return type

float

Parameters

tt (TensorTrain or None) – Compute loss at tt instead. If None (default) then compute at self.tt.
y (array<float> or None) – Target labels for loss. If None (default), use self.y
idx (array<int, int> or None) – Tensor indices to use for loss. If None (default) use self.idx
sample_weight (array<float> or None) – Array same shape as y giving sample weights. If None, use weight 1 for each entry
normalize (bool (default: True)) – Divide loss function by number of samples (or sum of sample_weight) if True

step()[source]

Do a step.

This should return loss at new point, inner product between search-direction and gradient, and step size of new step.

Should be implemented by the inhereting class.

ttml.tt_radam

Implements Riemannian ADAM stochastic gradient descent algorithm for tensor trains.

This is an implementation of the algorithm in the paper ‘Riemannian Adaptive Optimization Methods’ by Becigneul and Ganea

class ttml.tt_radam.TensorTrainSGD(tt, y, idx, batch_size, lr=1.0, beta1=0.9, beta2=0.9, task='regression', sample_weight=None, red_idx=None, **kwargs)[source]

Riemannian Adam optimizer for tensor trains.

Parameters

tt (TensorTrain) – TensorTrain to be optimized. During optimization it will be copied, not modified.
y (array<float64>) – Target values. Should be flat array with same backend as tt.
idx (array<int64> shape (len(y),tt.order)) – Indices of dense tensor corresponding to values y. Potential duplicate values in idx are automatically merged.
batch_size (int) –
lr (float (default: 1.0)) – Learning rate, needs to be tuned for each problem.
beta1 (float (default: 0.9)) – parameter between 0 and 1 determining the contribution of transport of previous search direction to current search direction
beta2 (float (default: 0.9)) – parameter between 0 and 1 determining contribution of previous gradient norms to the stepsize.
task (str (default: “regression”)) –
Whether to perform regression or binary classification.
- If task=”regression then MSE is minimized.
- If task=”classification”. The labels are assumed to be 0 or 1, and cross entropy is minimized. Note that predictions of the classifier will be on the logit scale, only the objective changes.
sample_weight (array<float64> or None (default: None)) – Weights associated to all sample points. If None, use unit weight.

step()[source]: Do one step inplace