Tensor Trains

Basics

Tensor trains are a versatile tensor decomposition. They consist of a list of order-3 tensors known as cores. A tensor train encoding an order d dense tensor, has d cores. The second dimension of these cores coincides with the dimensions of the dense tensor. The first and third dimensions of the cores are known as the tensor train rank.

For example, below we create a random tensor train encoding a shape (10, 12, 8) tensor, with tensor train ranks (3, 4)

>>> from ttml.tensor_train import TensorTrain
...
... dims = (10, 12, 8)
... tt_rank = (3, 4)
... tt = TensorTrain.random(dims, tt_rank)
... tt
<TensorTrain of order 3 with outer dimensions (10, 12, 8), TT-rank (3, 4),
and orthogonalized at mode 2>

We can access the cores of the tensor train through simple array indexing or looping. Let’s print the shapes of the three cores in this tensor train

>>> for core in tt
...     print(core.shape)
(1, 10, 3)
(3, 12, 4)
(4, 8, 1)

Note that the first dimension of the first core and the last dimension of the last core is always 1.

Orthogonalization

Note in the representation string above, it is mentioned that the tensor train is orthogonalized at mode 2. This means that the left-matricization of the cores to the left of mode 2 (i.e. the first and second cores) are orthogonal. This means that the following contraction is the identity matrix:

>>> import numpy as np
...
... np.einsum("abi,abj->ij", tt[0], tt[0])
array([[ 1.00000000e+00, -8.84708973e-17,  3.46944695e-18],
       [-8.84708973e-17,  1.00000000e+00, -6.93889390e-18],
       [ 3.46944695e-18, -6.93889390e-18,  1.00000000e+00]])

The same contraction is also an identity matrix for tt[1]. Orthogonalization is extremely important for numerical stability, as well as for working with tangent vectors (more on that later). We can specify on which mode we want the tensor train to be orthogonalized at initialization, but we can also change the orthogonalization mode by using the TensorTrain.orthogonalize() method.

For example, we can orthogonalize the tensor train on mode 1 below. The mode argument can be any integer, or either of the strings 'l' and 'r'. Here 'l' corresponds to a left orthogonalization, which is an orthogonalization with respect to the last mode. Conversely 'r' corresponds to a right orthogonalization, that is, with respect to the first mode.

>>> tt.orthogonalize(mode=1)
...
... A = np.einsum("abi,abj->ij", tt[0], tt[0])
... print(np.allclose(A, np.eye(len(A))))
...
... B = np.einsum("iab,jab->ij", tt[2], tt[2])
... print(np.allclose(B, np.eye(len(B))))
True
True

One consequence of orthogonalization is that computing the norm of the the tensor train can be done very efficiently; the (Frobenius) norm of a tensor trained orthogonalized at mode mu is simply the (Frobenius) norm of core mu.

>>> np.isclose(tt.norm(), np.linalg.norm(tt[1]))
True

Tensor train arithmetic

We can also perform many operations involving two or more tensor trains. Let’s first create two new tensor trains with the same outer dimensions, but different tt-ranks. We can then for example contract the tensor trains using the TensorTrain.dot() method, or alternatively the @ operator.

>>> dims = (4, 6, 6, 5)
... tt1 = TensorTrain.random(dims, (3, 4, 3))
... tt2 = TensorTrain.random(dims, (2, 2, 2))
... tt1 @ tt2
0.016860730327956833

We can verify that this is indeed the Frobenius inner product of these two tensors by comparing the result to contracting the two associated dense tensors. We can turn a tensor train into a dense tensor using the TensorTrain.dense() method.

>>> np.einsum("ijkl,ijkl->", tt1.dense(), tt2.dense())
0.016860730327956833

We can also add/subtract tensor trains or multiply them by scalars.

>>> tt3 = tt1 + 0.1 * tt2
... tt3
<TensorTrain of order 4 with outer dimensions (4, 6, 6, 5), TT-rank (4, 6, 5),
and orthogonalized at mode 3>

Truncation

Note that when we add tensor trains, the outer dimensions stay the same but in principle the tt-rank increases, becoming at most the sum of the tt-ranks. In many cases the rank of the sum is not the sum of the ranks. For example in the case above, the first rank of tt3 is 4 and not 5=2+3, since the first rank is always bounded by the first dimension (which is 4 in this case).

If we now add another copy of tt2 to tt3 we would expect the rank to stay the same, yet this doesn’t always happen due to numerical errors. Note that the middle tt-rank below is 8, even though tt4 can be expressed by a rank (4, 6, 5) tensor train.

>>> tt4 = tt3 + tt2
... tt4
<TensorTrain of order 4 with outer dimensions (4, 6, 6, 5), TT-rank (4, 8, 5),
and orthogonalized at mode 3>

We can truncate the rank of a tensor train by using the TensorTrain.round() method. This uses HOSVD to truncate the tensor train, and it has two methods for rounding; it can round to a pre-specified tt-rank, or it can truncate based on singular values. We do the latter below by specifying the eps=1e-16 keyword, meaning we can round each core in a HOSVD sweep with relative error up to 1e-16.

>>> tt5 = tt4.round(eps=1e-16, inplace=False)
... print(tt5)
... (tt4 - tt5).norm()
<TensorTrain of order 4 with outer dimensions (4, 6, 6, 5), TT-rank (4, 6, 5),
and orthogonalized at mode 3>
3.1508721275986887e-15

Note that the rank has decreased to the correct value, while only gathering an error on the order of machine epsilon. This is because the last two singular values of the second unfolding are very small, we can see them using TensorTrain.sing_vals():

>>> tt4.sing_vals()
[array([1.92032396, 1.1634468 , 0.62421987, 0.4174046 ]),
 array([1.77671233e+00, 9.81237260e-01, 7.97148647e-01, 6.93386512e-01,
        5.05833036e-01, 3.36895334e-01, 2.17444526e-31, 1.32445319e-31]),
 array([1.95129345, 0.99099654, 0.80976405, 0.38830473, 0.09492614])]

We can also round to even lower ranks, at the cost of a higher rounding error.

>>> tt6 = tt4.round(max_rank=4, inplace=False)
... (tt4 - tt6).norm()
0.6129466966082833

Here max_rank=4 means all tt-ranks should be at most 4, but we could also supply a tuple of ints here, e.g. tt4.round(max_rank = (4, 5, 5)).

Accessing entries

To access any specific entries of the tensor train we can use the TensorTrain.gather() method. We need to supply it a list of entries we want to access, encoded as an integer array. For example to access entry (0,0,0,0) and (0,1,0,0) we do the following:

>>> tt4.gather(np.array([[0, 0, 0, 0], [0, 1, 0, 0]]))
array([ 0.02875322, -0.02476423])

Tangent vectors

For Riemannian optimization of tensor trains we need to work with tangent vectors on the manifold of tensor trains of specified rank. Tangent vectors are always associated to a particular tensor train (remember: a tensor train is a point on the tensor-train manifold). For efficient manipulation of tangent vectors, we need to have both the left- and right-orthogonalized cores of a tensor train. Since tensor trains are left-orthogonalized by default, we just need to compute the right-orthogonal cores.

>>> tt = TensorTrain.random((3, 4, 4, 3), (2, 3, 2))
... right_cores = tt.orthogonalize(mode="r", inplace=False)
... tv = TensorTrainTangentVector.random(tt, right_cores)
... tv
<TensorTrainTangentVector of order 4, outer dimensions (3, 4, 4, 3),
and TT-rank (2, 3, 2)>

The arguably most important thing we can do with tangent vectors is that using a ‘retract’ we can ‘move’ in the direction of the tangent vector. We can do this using the TensorTrain.apply_grad() method. Below we apply the retract of tv to tt, after first multiplying it by 1e-6 using the alpha keyword argument.

>>> tt2 = tt.apply_grad(tv, alpha=1e-6)
... (tt-tt2).norm()
8.200339164343568e-07

We see that this changes tt on the order of the ‘step size’ alpha and the norm of tt2.

Transporting a tangent vector to a new point is equivalent to projecting the tangent vector to the tangent space of the new point. This can be done using TensorTrain.grad_proj():

>>> tv2 = tt2.grad_proj(tv)
<TensorTrainTangentVector of order 4, outer dimensions (3, 4, 4, 3),
and TT-rank (2, 3, 2)>

We can also perform arithmetic with tangent vectors, like multiplying them by scalars, adding them, or computing inner products (using TensorTrainTangentVector.inner() or the @ operator). Note that this only makes mathematical sense if the tangent vectors are associated to the same tensor train.

>>> tv1 = TensorTrainTangentVector.random(tt, right_cores)
... tv2 = TensorTrainTangentVector.random(tt, right_cores)
... print((tv1 + tv2).norm())
... tv3 = tv1 - 0.1 * tv2
... print(tv1 @ tv3)
1.278782626647999
0.6505614686324931

The way we usually create tangent vectors is as the gradient of some optimization problem. For example we could try to solve a tensor completion problem. We want to approximate an unknown dense tensor using a tensor train based on knowing the values of particular entries of the dense tensor. We can solve this by starting with a random tensor train and applying Riemannian optimization. At each point the Euclidean gradient is just linear residual error between the entries in the tensor train and the true value. We can convert this Euclidean gradient into a tangent vector using TensorTrain.rgrad_sparse(). Below we illustrate one step of gradient descent for the tensor completion problem.

>>> tt = TensorTrain.random((10, 10, 10, 10, 10), (2, 5, 5, 2))
...
... # Generate 100 random values and 100 random indices of `tt`
... N = 100
... y = np.random.normal(N)
... idx = [np.random.choice(r, size=N) for r in tt.tt_rank]
... idx = np.stack(idx)
...
... # Compute the initial error and the gradient
... prediction = tt.gather(idx)
... residual = prediction - y
... print(np.linalg.norm(residual))
... grad = tt.rgrad_sparse(-residual, idx)
...
... # Take a step in gradient direction and compute new error
... tt2 = tt.apply_grad(grad, alpha=10)
... prediction = tt2.gather(idx)
... residual = y - prediction
... print(np.linalg.norm(residual))
initial error: 200.76000727481116
error after step: 191.21863957535254

Alternative backends

So far all the objects we have use were encoded as numpy arrays, but other backends are supported as well. For example to use tensorflow as a backend for a tensor train, we just need to supply the backend keyword:

>>> tt_tf = TensorTrain.random((4, 4, 4), (2, 2), backend="tensorflow")
... print(tt_tf[0])

Here in principle many backends are supported, such as pytorch, dask, jax or cupy. Support for other backends is handled by autoray. However, not all functionality has been thoroughly tested for most backends. Moreover, all the functions used here are not compiled, so usually things end up being fastest for numpy. This may change in the future. In particular TTML has very limited support for backends other than numpy, since the scikit-learn estimators used for initialization only support numpy anyway.

User Documentation

class ttml.tensor_train.TensorTrain(cores, mode='l', is_orth=False)[source]

Implements a tensor train with orthogonalized cores.

Parameters

cores (list<order 3 tensors>) – List of the TT cores. First core should have shape (1,d0,r0), last has shape (r(n-1),dn,1). Last dimension should match first dimension of consecutive tensors in the list. The tensors can be numpy.ndarray, torch.Tensor or tensorflow.EagerTensor, so long as all the tensors are the same type.
mode (str or int (default: ‘l’)) – The orthogonalization mode. Can be an int, or the string ‘l’ or ‘r’, which respectively get converted to the self.order-1 and 0
is_orth (bool (default: False)) – Whether or not the cores are already orthogonalized. If False, the cores are orthogonalized during init.

Variables

~TensorTrain.cores (list<order 3 tensors>) – List of all the tensor train cores.
~TensorTrain.order (int) – The order of the tensor train (number of cores)
~TensorTrain.dims (tuple<int> of length self.order) – The outer dimensions of the tensor train, i.e. the shape of the dense tensor represented by the tensor train, or shape[1] for each core.
~TensorTrain.tt_rank (tuple<int> of length self.order-1) – The tt-rank. This shape[0] or shape[2] for each core. This tuple starts with cores[0].shape[2], and hence does always start/end with 1.
~TensorTrain.backend (str) – String encoding the backend of the tensors. This is inferred from cores[0].

apply_grad(tangent_vector, alpha=1.0, round=True, inplace=False)[source]

Compute retract of tangent vector.

Parameters

tangent_vector (TensorTrainTangentVector) – Tangent vector, assumed to lie in tangent space at current point.
alpha (float (default: 1.0)) – Stepsize of retract. Using this parameter is equivalent but more efficient to supplying alpha*tangent_vector as first argument.
round (bool (default: True)) – After applying tangent vector we end up with a TT of double the tt-rank. This bool controls whether to project back to original tt-rank, which is necessary if we want to compute the retract.
inplace (bool (default: False)) – If false, return a new TensorTrain, otherwise update this TT inplace
TODO (implement efficient reorthogonalization.) –

apply_gradient_gauge_conditions(left_cores, grad_cores)[source]: Apply gauge conditions to gradient cores inplace

convert_backend(backend)[source]: Change the backend to target backend inplace.

dense()[source]

Contract to dense tensor in left-to-right sweep.

Returns: Dense tensor with shape self.dims
Return type: np.ndarray

dot(other)[source]: Compute dot product with other TT with same outer dimensions

fast_gather(idx)[source]: Faster version of standard gather. Only works with numpy backend

gather(idx)[source]

Gather entries of dense tensor according to indices.

For each row of idx this returns one number. This number is obtained by multiplying the slices of each core corresponding to each index (in a left-to-right fashion).

Parameters: idx (np.ndarray<int>) – Array of indices of shape (len(self.dims), -1).
Returns: Result of contraction.
Return type: np.ndarray

grad_proj(tangent_vector, right_cores=None)[source]

Project TensorTrainTangentVector to tangent space.

This implements parallel transport of tangent_vector to this point.

idx_env(alpha, idx, num_cores=1, flatten=True)[source]

Gather the left and right environment of a TT-core.

Parameters

cores (-) –
alpha (-) – left site of (super)core
idx (-) – positions of data to gather
num_cores (-) – number of cores in the supercore
flatten (-) – If True, always flatten result to 2D array. Otherwise result can be 2D or 3D depending on alpha.

increase_rank(inc, i=None)[source]

Increase the rank of the edge between core i and i+1 by inc.

This operation leaves the dense tensor invariant. To instead decrease the rank, use truncate.

orthogonalize(mode='l', inplace=True, force_rank=True)[source]

Orthogonalize the cores with respect to mode.

Parameters

mode (int or str (default: "l")) – Orthogonalization mode. If “l”, defaults to right-most core, if “r” to left-most core.
inplace (bool (default: True)) – If True then cores are changed in place and return None. Otherwise return a TensorTrain object with orthogonalized cores.
force_rank (bool (default: True)) – If True, check after each step that the rank of the TT hasn’t lowered. If it has, artificially increase it back by mutliplying by random isometry.

classmethod random(dims, tt_rank, mode='l', backend='numpy', auto_rank=True)[source]

Create random TensorTrain of specified shape and rank.

Always uses double precision for the tensors.

Parameters

dims (iterable of ints) –
tt_rank (int or iterable of ints) – if int, all tt-ranks will be the same
mode ("l", "r" or int) –
backend (str (default: "numpy")) –
autorank (bool (default: True)) – if True, automatically losslessly reduce the rank. This option should mainly be set to False for debugging and testing purposes.

rgrad_sparse(grad, idx)[source]

Project sparse euclidean gradient to tangent space.

Parameters

grad (array<float64>) – Array containing the values of the sparse gradient.
idx (array<int64> of shape (len(grad),self.order)) – Array containing the indices of the dense tensor corresponding to the values of the sparse euclidean gradient

Return type

TensorTrainTangentVector

round(max_rank=None, eps=None, inplace=True)[source]

Truncate the tensor train.

If max_rank is specified, truncate to this rank.
If eps is specified, truncate with precision eps (relative to largest singular value of each unfolding)

Parameters

max_rank (int or tuple<int> (optional)) –
eps (float (optional)) –
inplace (bool) – If True, round the tensor train in place. Otherwise performI truncation on a copy.

Return type

TensorTrain

sing_vals()[source]

Compute singular values of each unfolding.

As a side effect the TT is left-orthogonalized. One array of size self.tt_rank[i] is returned for each core i.

tt_proj(tt, right_cores=None, proj_U=True)[source]

Project a TT to tangent space of self.

This needs both left- and right-orthogonal cores. If right_cores is not specified, the right cores are computed first.

class ttml.tensor_train.TensorTrainTangentVector(grad_cores, left_cores, right_cores)[source]

Class for storing a tangent vector to TT manifold.

A tangent vector at a point on the TT-manifold is a list of cores with the same shape as the TT, satisfying a certain gauge condition. This class stores both the left-orthogonal and right orthogonal cores of the original TT. This is because both are needed for most computations.

Parameters

grad_cores (list<order 3 tensors>) – list of first-order variation cores
left_cores (list<order 3 tensors>) – list of left-orthogonal cores
right_cores (list<order 3 tensors>) – list of right-orthogonal cores

convert_backend(backend)[source]: Change the backend to target backend inplace.

inner(other)[source]

Compute inner product between two tangent vectors.

Due to orthogonalization this is just inner product of first-order variations

classmethod random(left_cores, right_cores)[source]: Random tangent vector with unit norm gradients

to_eucl(idx)[source]

Return sparse Euclidean gradient with indices idx.

NB: `idx` should not contain any repeated indices.

to_tt(round=False)[source]

Convert to TensorTrain.

If round=True then round to TT of rank same TT-rank as point, if round=False it will have double the TT-rank of the point.

ttml.tensor_train.contract_cores(cores1, cores2, dir='LR', upto=None, store_parts=False, return_float=True)[source]

Compute contraction of one list of TT cores with another.

Cores should be of outer dimensions If dir=LR, then first cores should have first shape 1, if dir=RL then last shape of last core should be 1.

Parameters

cores1 (list of TT cores.) –
cores2 (list of TT cores.) –
dir (str, default='LR') – Direction of contraction; LR (left-to-right), RL (right-to-left)
upto (int, optional) – Contract only up to this mode
store_parts (bool, default=False) – If True, return list of all intermediate contractions.
return_float (bool, default=True) – If True and the result has total dimension 1, then compress result down to a float. Ignored if store_parts=True.