Tensor Train Optimization

ttml.tt_rlinesearch

Linesearch for performing tensor completion with TensorTrains

ttml.tt_rlinesearch.TTLS

alias of ttml.tt_rlinesearch.TensorTrainLineSearch

class ttml.tt_rlinesearch.TensorTrainLineSearch(tt, y, idx, task='regression', sample_weight=None, red_idx=None, cg_method='fr', line_search_method='armijo', line_search_params=None, memory=1, max_stepsize=None, min_initial_stepsize=1e-06, initial_stepsize_method='bb1', last_step_size=None, default_stepsize=None, auto_scale=False, **kwargs)[source]

Implements Riemannian conjugate gradient descent with linesearch for tensor train.

Parameters
  • tt (TensorTrain) – TensorTrain to be optimized. During optimization it will be copied, not modified.

  • y (array<float64>) – Target values. Should be flat array with same backend as tt.

  • idx (array<int64> shape (len(y),tt.order)) – Indices of dense tensor corresponding to values y. Potential duplicate values in idx are automatically merged.

  • task (str (default: “regression”)) –

    Whether to perform regression or binary classification.

    • If task=”regression then MSE is minimized.

    • If task=”classification”. The labels are assumed to be 0 or 1, and cross entropy is minimized. Note that predictions of the classifier will be on the logit scale, only the objective changes.

  • sample_weight (array<float64> or None (default: None)) – Weights associated to all sample points. If None, use unit weight.

  • cg_method (str (default: “fr”)) – Which conjugate gradient method to use. Currently supported are ‘fr’ (Fletcher-Reeves), ‘sd’ (steepest descent).

  • line_search_method (str (default: “armijo”)) – Which line search method to use. Supported are “armijo”, “wolfe” (strong Wolfe conditions) and “tnc” (exact line search using TNC, mainly for debugging.)

  • line_search_params (None or dict (default None)) – Extra kwargs to pass to the line search method.

  • memory (int (default: 1)) – With memory > 1, perform nonmonotone line search with memory steps of memory. Using memory > 1 with Wolfe line search is not properly supported.

  • max_stepsize (int or None (default: None)) – Maximum stepsize to be taken.

  • initial_stepsize_method (str (default: "bb1")) –

    Method to compute initial stepsize for backtracking. Allowed values:

    • ”bb1” : Riemannian Barzilai-Borwein stepsize of first type.

    • ”bb2” : Riemannian Barzilai-Borwein stepsize of second type.

    • ”qopt” : Quasi-optimal Riemannian CG stepsize, as proposed by Steinlechner.

    • ”scalar” : take twice the difference between current and previous loss value, divided by the derivative of the line search function.

  • last_step_size (int or None) – Last step size taken, to be used for a warm start of the optimizer.

  • default_stepsize (float (default: 1.0)) – Fall back default stepsize

  • auto_scale (bool (default False)) – Use the first gradient norms to estimate the scale of the step size. Only affects first step size. This is useful if optimal stepsize is particularly large.

bb_stepsize(bb_type=1)[source]

Riemannian Barzilai-Borwein stepsize.

There are two variants of the BB stepsize, this is controlled by the argument bb_type. If the stepsize cannot be computed, or would be excessively high, default_stepsize is returned instead.

cg_constant()[source]

Constant scale parameter. If alpha=0, then this is steepest descent.

cg_fletcher_reeves()[source]

Fletcher-Reeves scale parameter.

This is given by

\[\beta_{k+1}^{FR} = \frac{\langle\nabla f(x_{k+1}),\, \nabla f(x_{k+1})\rangle_{x_{k+1}}} {\langle\nabla f(x_k),\nabla f(x_k)\rangle_{x_k}}\]
plot_linesearch(alpha0=1.0, alpha_max=None, c1=1.3, c2=0.5, plot_points=20)[source]

Return arrays to plot linesearch objective for debugging.

Parameters
  • alpha0 (initial point) –

  • alpha_max (float or None) – If specified, skip finding a good alpha

  • c1 (factor to increase alpha by every step if derivative negative) –

  • c2 (factor to decrease alpha by every step if derivative positive) –

  • plot_points (number of points returned) –

Returns

  • plot_X (X positions of plot points)

  • phis (values of objective at plot_X)

  • der_phis (derivative of objective at plot_X)

quasi_optimal_stepsize(derphi0, default_stepsize=1.0)[source]

Compute the quasi-optimal stepsize based on a linearization.

The formula for this is

\[-\phi'(0) / \|\eta\|^2\]

With \(\eta\) the search direction, \(\phi'(0) = \langle\eta,\nabla f\rangle\) and the derivative of the line search objective

Returns

step_size

Return type

float

step(try_armijo=False, try_sd=False)[source]

Perform a step. Replaces self.tt updated tt.

Parameters
  • try_armijo (bool, default=False) – Force using armijo linesearch. This is called if wolfe line search fails

  • try_sd (bool, default=False) – If armijo also fails, it may be because the search direction is bad, so we use this to force steepest descent direction.

Returns

  • phi0 (float) – New value of the loss function

  • derphi0 (float) – Derivative of loss function in search direction (at beginning of step)

  • step_size (float) – Size of step taken

ttml.tt_rlinesearch.armijo_backtracking(phi, derphi, phi0=None, derphi0=None, old_phi0=None, c1=0.0001, amin=0, amax=None, old_step_size=1.0, alpha0=None, **kwargs)[source]

Scalar line search method to find step size satisfying Armijo conditions.

Parameters
  • c1 (float, optional) – Parameter for Armijo condition rule.

  • amax (float, optional) – Maxmimum and minimum step size

  • amin (float, optional) – Maxmimum and minimum step size

Scalar line search method to find step size satisfying strong Wolfe conditions.

Parameters
  • c1 (float, optional) – Parameter for Armijo condition rule.

  • c2 (float, optional) – Parameter for curvature condition rule.

  • amax (float, optional) – Maximum step size

Returns

step_size – The next step size

Return type

float

ttml.tt_opt

Meta optimizer class for Tensor Trains

class ttml.tt_opt.TensorTrainOptimizer(tt, y, idx, task='regression', sample_weight=None, red_idx=None, task_kwargs=None, **kwargs)[source]
Parameters
  • tt (TensorTrain) – TensorTrain to be optimized. During optimization it will be copied, not modified.

  • y (array<float64>) – Target values. Should be flat array with same backend as tt.

  • idx (array<int64> shape (len(y),tt.order)) – Indices of dense tensor corresponding to values y. Potential duplicate values in idx are automatically merged.

  • task (str (default: “regression”)) –

    Whether to perform regression or binary classification.

    • If task=”regression then MSE is minimized.

    • If task=”classification”. The labels are assumed to be 0 or 1, and cross entropy is minimized. Note that predictions of the classifier will be on the logit scale, only the objective changes.

  • sample_weight (array<float64> or None (default: None)) – Weights associated to all sample points. If None, use unit weight.

  • red_idx (array<int, int> or None) – Unique indices of idx in lexicographic order. If None (default) this is computed.

  • task_kwargs (dict (optional)) – Dictionary of keyword arguments to always be passed to the loss and egrad functions.

egrad(tt=None, y=None, idx=None, sample_weight=None, normalize=False, merge=True, **kwargs)[source]

Compute the sparse Euclidean gradient and loss at current point.

Keyword arguments are the same as self.loss, except normalize=False by default.

Returns

  • loss (float)

  • egrad (array<float>) – Array with length the number of unique entries in idx. Corresponds to sparse Euclidean tangent vector, with indices obtained by a lexical sort applied to idx. See also utils.merge_sum().

loss(tt=None, y=None, idx=None, sample_weight=None, normalize=True, **kwargs)[source]

Compute the loss at current point.

Returns

loss

Return type

float

Parameters
  • tt (TensorTrain or None) – Compute loss at tt instead. If None (default) then compute at self.tt.

  • y (array<float> or None) – Target labels for loss. If None (default), use self.y

  • idx (array<int, int> or None) – Tensor indices to use for loss. If None (default) use self.idx

  • sample_weight (array<float> or None) – Array same shape as y giving sample weights. If None, use weight 1 for each entry

  • normalize (bool (default: True)) – Divide loss function by number of samples (or sum of sample_weight) if True

step()[source]

Do a step.

This should return loss at new point, inner product between search-direction and gradient, and step size of new step.

Should be implemented by the inhereting class.

ttml.tt_radam

Implements Riemannian ADAM stochastic gradient descent algorithm for tensor trains.

This is an implementation of the algorithm in the paper ‘Riemannian Adaptive Optimization Methods’ by Becigneul and Ganea

class ttml.tt_radam.TensorTrainSGD(tt, y, idx, batch_size, lr=1.0, beta1=0.9, beta2=0.9, task='regression', sample_weight=None, red_idx=None, **kwargs)[source]

Riemannian Adam optimizer for tensor trains.

Parameters
  • tt (TensorTrain) – TensorTrain to be optimized. During optimization it will be copied, not modified.

  • y (array<float64>) – Target values. Should be flat array with same backend as tt.

  • idx (array<int64> shape (len(y),tt.order)) – Indices of dense tensor corresponding to values y. Potential duplicate values in idx are automatically merged.

  • batch_size (int) –

  • lr (float (default: 1.0)) – Learning rate, needs to be tuned for each problem.

  • beta1 (float (default: 0.9)) – parameter between 0 and 1 determining the contribution of transport of previous search direction to current search direction

  • beta2 (float (default: 0.9)) – parameter between 0 and 1 determining contribution of previous gradient norms to the stepsize.

  • task (str (default: “regression”)) –

    Whether to perform regression or binary classification.

    • If task=”regression then MSE is minimized.

    • If task=”classification”. The labels are assumed to be 0 or 1, and cross entropy is minimized. Note that predictions of the classifier will be on the logit scale, only the objective changes.

  • sample_weight (array<float64> or None (default: None)) – Weights associated to all sample points. If None, use unit weight.

step()[source]

Do one step inplace