TBTK
Solvers

# Limiting algorithm specific details from spreading

Depending on the Model and the properties that are sought for, different solution methods are best suited to carry out different tasks. Sometimes it is not clear what the best method for a given task is and it is useful to be able to try different approaches for the same problem. However, different algorithms can vary widely in their approach to solving a problem, and specifically vary widely in how they need the data to be represented in memory to carry out their tasks. Without somehow limiting these differences to a restricted part of the code the specific numerical details of a Solver easily spreads throughout the code and makes it impossible to switch one Solver for another without rewriting the whole code.

In TBTK the approach to solving this problem is to provide a clear separation between the algorithm implemented inside the Solvers, and the specification of the Model and the extraction of properties. Solvers are therefore required to all accept a universal Model object and to internally convert it to whatever representation that is best suited for the algorithm. Solvers are then wrapped in PropertyExtractors which further limits the Solver specific details from spreading to other parts of the code. The idea is to limit the application developers exposure to the Solver as much as possible, freeing mental capacity to focus on the physical problem at hands. Nevertheless, a certain amount of method specific configurations are inevitable and the appropriate place to make such manipulations is through the Solver interface itself. This approach both limits the developers exposure to unnecessary details, while also making sure the developer understands when algorithm specific details are configured.

Contrary to what it may sound like, limiting the developers exposure to the Solver does not mean concealing what is going on inside the Solver and to make it an impenetrable black box. In fact, TBTK aims at making such details as transparent as possible and to invite the interested developer to dig as deep as preferred. To limit the exposure as much as possible rather means that once the developer has chosen Solver and configured it, the Solver specific details should not spread further and the developers should be free to not worry about low level details. In order to chose the right Solver for a given task and to configure it efficiently it is useful to have an as good understanding as possible about what the algorithm actually is doing. Therefore we here describe what some of the native Solvers does, what their strengths and weaknesses are, and how to set up and configure them. In the code examples presented here it is assumed that a Model object has already been created.

# Overview of native Solvers

TBTK currently contains four production ready Solvers. These are called Diagonalizer, BlockDiagonalizer, ArnoldiIterator, and ChebyshevExpander. The first two of these are based on diagonalization, allowing for all eigenvalues and eigenvectors to be calculated. Their strength is that once a problem has been diagonalized, complete knowledge about the system is available and arbitrary properties can be calculated. However, diagonalization scales poorly with system size and is therefore not feasible for very large systems. The BlockDiagonalizer provides important improvements in this regard for large systems if they are block diagonal, in which case the BlockDiagonalizer can handle very large systems compared to the Diagonalizer.

Next, the ArnoldiIterator is similar to the Diagonalizers in the sense that it calculates eigenvalues and eigenvectors. However, it is what is know as an iterative Krylov space Solver, which successively builds up a subspace of the Hilbert space and performs diagonalization on this restricted subspace. Therefore the ArnoldiIterator only extracts a few eigenvalues and eigenvectors. Complete information about a system can therefore usually not be obtained with the help of the ArnoldiIterator, but it can often give access to the relevant information for very large systems if only a few eigenvalues or eigenvectors are needed. Arnoldi iteration is closely related to the Lanczos method and is also the underlying method used when extracting a limited number of eigenvalues and eigenvectors using MATLABS eigs-function.

Finally, the ChebyshevExpander is different from the other methods in that it extracts the Green's function rather than eigenvalues and eigenvectors. The ChebyshevExpanders strenght is also that it can be used for relatively large systems. Moreover, while the Diagonalizers requires that the whole system first is diagonalized, and thereby essentially solves the full problem, before any property can be extracted, the ChebyshevExpander allows for individual Green's functions to be calculated which contains only partial information about the system. However, the Chebyshev method is also an iterative method (but not Krylov), and would in fact require an infinite number of steps to obtain an exact solution.

The Solvers all inherit from a common base class called Solver. This Solver class is an abstract class, meaning it is not actually possible to create an object of the Solver class itself. However, the Solver base class forces every other Solver to implement a method for setting the Model that it should work on. The following call is therefore possible (and required) to call to initialize any given Solver called solver

solver.setModel(model);

The Solver class also provides a corresponding getModel() function for retreiving the Model. It is important here to note that although the Model is passed to the Solver and the Solver will remember the Model, it will not assume ownership of the Model. It is therefore important that the Model remains in memory throughout the lifetime of the Solver and that the caller takes responsibility for any possible cleanup work. The user not familiar with memory management should not worry though, as long as standard practices as outlined in this manual are observed, these issues are irrelevant.

The Solvers are also Communicators, which means that they may output information that is possible to mute. This can become important if a Solver for example is created or executed inside of a loop. In this case extensive output can be produced and it can be desired to turn this off. To do so, call

solver.setVerbose(false);

# Solver::Diagonalizer

The Diagonalizer sets up a dense matrix representing the Hamiltonian and then diagonalizes it to obtain eigenvalues and eigenvectors. The Diagonalizer is probably the simplest possible Solver to work with as long as the system sizes are small enough to make it feasible, which means Hilbert spaces with a basis size of up to a few thousands. A simple set up of the Diagonalizer

//Create a Solver::Diagoanlizer.
Solver::Diagonalizer solver;
//Set the Model to work on.
solver.setModel(model);
//Diagonalize the Hamiltonian
solver.run();

That's it. The problem is solved and can be passed to a corresponding PropertyExtractor for further processing.

## Estimating time and space requirements

Since diagonalization is a rather simple problem conceptually, it is easy to estimate the memory and time costs for the Diagonalizer. Memory wise the Hamiltonian is stored as an upper triangular matrix with complex<double> entries each 16 bytes large. The storage space required for the Hamiltonian is therefore roughly $$16N^2/2 = 8N^2$$ bytes, where $$N$$ is the basis size of the Hamiltonian. Another $$16N^2$$ bytes are required to store the resulting eigenvectors, and $$8N$$ bytes for the eigenvalues. Neglecting the storage for the eigenvalues the approximate memory footprint for the Diagonalizer is $$24N^2$$ bytes. This runs into the GB range around a basis size of 7000.

The time it takes to diagonalize a matrix cannot be estimated with the same precision since it depends on the exact specification of the computer that the calculations are done on, but as of 2018 runs into the hour range for basis sizes of a few thousands. However, knowing the execution time for a certain size on a certain computer, the execution can be rather accurately predicted for other system sizes using that the computation time scales as $$N^3$$.

Sometimes the value of one or several parameters that go into the Hamiltonian are not known a priori, but it is instead part of the problem to figure out the correct value. A common approach for solving such problems is to make an initial guess for the parameters, solve the model for the corresponding parameters, and then update the parameters with the so obtained values. If the problem is well behaved enough, such an approach results in the unknown parameters eventually converging to fixed values. Once the calculated parameter value is equal (within some tolerance) to the input parameter in the current iteration, the parameters are said to have reached self-consistency. That is, the calculated parameters are consistent with themselves in the sense that if they are used as input parameters, they are also the result of the calculation.

When using diagonalization the self-consistent procedure is very straight forward: diagonalize the Hamiltonian, calculate and update parameters, and repeat until convergence. The Diagonalizer is therefore prepared to run such a self-consistency loop. However, the the second step requires special knowledge about the system which the Diagonalizer cannot incorporate without essentially becoming a single purpose solver. The solution to this problem is to allow the application developer to supply a callback function that the Diagonalizer can call once the Hamiltonian has been diagonalized. This function is responsible for calculating and updating relevant parameters, as well as informing the Diagonalizer whether the solution has converged. The interface for such a function is

bool selfConsistencyCallback(Solver::Diagonalizer &solver){
//Calculate and update parameters.
//...
//Determine whether the solution has converged or not.
//...
//Return true if the solution has converged, otherwise false.
if(hasConverged)
return true;
else
return false;
}

The specific details of the self-consistency callback is up to the application developer to fill in, but the general structure has to be as above. That is, the callback has to accept the Diagonalizer as an argument, perform the required work, determine whether the solution has converged and return true or false depending on whether it has or not. In addition to the self-consistency callback, the application developer interested in developing such a self-consistent calculation should also make use of the HoppingAmplitude callback described in the Model chapter for passing relevant parameters back to the Model in the next iteration step.

Once a self-consistency callback is implemented, the Diagonalizer can be configured as follows to make use of it

Solver::Diagonalizer solver;
solver.setModel(model);
solver.setSelfConsistencyCallback(selfConsistencyCallback);
solver.setMaxIterations(100);
solver.run();

Here the third line tells the Solver which function to use as a callback, while the fourth line puts an upper limit to the number of self-consistent steps the Solver will take if self-consistency is not reached. For a complete example of a self-consistent calculation the reader is referred to the SelfConsistentSuperconductivity template in the Template folder.

# Solver::BlockDiagonalizer

The BlockDiagonalizer is similar to the Diagonalizer, except that it takes advantage of possible block-diagonal structures in the Hamiltonian. For this to work it is important that the Models index structure is chosen such that TBTK is able to automatically detect the block-diagonal structure of the Hamiltonian, as described in the Model chapter. The BlockDiagonalizer mimics the Diagonalizer almost perfectly, and the code for initializing and running a BlockDiagonalizer including a self-consistency callback is

SOlver::BlockDiagonalizer solver;
solver.setModel(model);
solver.setSelfConsistencyCallback(selfConsistencyCallback);
solver.setMaxIterations(100);
solver.run();

The only difference here is that the selfCOnsistencyCallback has to take a BlockDiagonalizer as argument rather than a Diagonalizer.

# Solver::ArnoldiIterator

The main drawback of diagonalization is that it scales poorly with system size and becomes prohibitively demanding both in terms of memory and computational time if the individual blocks have a basis size of more than a few thousands. Arnoldi iteration instead utilizes the sparse nature of the Hamiltonian both to reduce the memory footprint and computational time and can therefore handle much larger systems.

To understand Arnoldi iteration, consider a Hamiltonian $$H$$ and pick a random vector $$v_{0}$$ in the corresponding Hilbert space. Next define the recursive relation $$v_{n} = Hv_{n-1}$$. While $$v_{0}$$ is a random vector it can be decomposed into an eigenbasis, and it is clear that the recursive relation will result in the components of $$v_{0}$$ with the largest eigenvalues (in magnitude) to become more and more important for increasing $$n$$. Taking the collection of $$v_{n}$$ generated this way for some finite $$n$$, it is clear that they form a (possibly overcomplete) basis for some subspace of the total Hilbert space. In particular, because eigenvectors with large eigenvalues are favored by the procedure, it will start to approximate the part of the Hilbert space that contains the eigenvectors with extreme eigenvalues. Such a procedure is called an iterative Krylov space method, where Krylov space refers to the space spanned by the generated vectors.

Arnoldi iteration improves the idea by continuously performing orthonormalization of the vectors before generating the next set of vectors and is therefore numerically superior to the somewhat simpler method just described. Once a subspace that is deemed large enough has been generated, diagonalization is performed on this much smaller space, resulting in eigevalues and eigenvectors for the extreme part of the Hilbert space to be calculated. We note that since the generated subspace cannot be guaranteed to contain the eigenvectors perfectly, the procedure really generates approximate eigenvalues and eigenvectors known as Ritz-values and Ritz-vectors. However, the subspace often converge rather quickly to the true extreme subspace and therefore generates very good approximations to the most extreme eigenvectors. It is therefore convenient to think of the ArnoldiSolver simply as a solver that can calculate extreme eigenvalues and eigenvectors.

With this background we are ready to understand how to create and configure a basic ArnoldiIterator

Solver::ArnoldiIterator solver;
solver.setModel(model);
solver.setNumLancxosVectors(200);
solver.setMaxIterations(500);
solver.setNumEigenVectors(100);
solver.setCalculateEigenVectors(true);
solver.run();

As seen, line 1, 2, and 7 are similar to the Diagonalizers and require no further explanation. The third line specifies how many Ritz-vectors (or Lanczos vectors) that are going to be generated during the iterative procedure, while the fourth line specifies the maximum number of iterations. It may be surprising that the number of iterations are not the same as the number of generated Ritz-vectors, but this is due to the fact that the ArnoldiIterator is using a further improvement on the procedure called implicitly restarted Arnoldi iteration. For further information on this the interested reader is referred to the documentation for the ARPACK library. Remembering that we successively build up a larger and larger subspace starting from some random initial vector, it is expected that not all of the generated Ritz-vectors are meaningful, but only the most extreme ones. For this reason the fifth line is used to specify the number of vectors that actually is going to be retained at the end of the calculation. To understand the sixth line we finally mention that the eigenvectors used internally by the ArnoldiIterator during the iterative procedure are not in the basis of the full Hamiltonian. The 100 generated eigenvectors therefore has to be converted to the basis that we are interested in if we actually want to use the eigenvectors. The sixth line tells the ArnoldiIterator to do so.

## Shift and invert (extracting non-extremal eigenvalues)

It is often not the extremal eigenvalues and eigenvectors that are of interest, but rather those around some specific eigenvalue. With a simple trick it is possible to access also these using Arnoldi iteration. Namely, if we first shift the Hamiltonian by some number $$\lambda$$ the eigenvalues around $$\lambda$$ in the original matrix are shifted to lie around zero. Further, since the inverse of a matrix has the same eigenvectors as the original matrix, and inverse eigenvalues, the eigenvectors with eigenvalues around $$\lambda$$ in the original Hamiltonian becomes the new extremal eigenvectors. The ArnoldiIterator implements this mode of execution, which can be run by adding the following two lines before the call to solver.run().

solver.setCentralValue(2);
solver.setMode(Solver::ArnoldiIterator::Mode::ShiftAndInvert);

The shift can also be applied without inversion. This can be beneficial if extremal eigenvalues of a particular sign are of interest. Say that the spectrum of a Hamiltonian is known to be between -1 and 1. By setting the central value to -1 (shifting -1 to 0), the spectrum for the new Hamiltonian is between 0 and 2 and the ArnoldiIterator will therefore only extract the eigenvectors with positive eigenvalues. We note that the function is called setCentralValue() rather than setShift(), since to the user of the Solver the final result is not shifted. The shift is first applied to the Hamiltonian internally, but is also added to the resulting eigenvalues to cancel this modification of the problem, meaning that the final eigenvalues are going to have values around 1 and not 2.

# Solver::ChebyshevExpander

The ChebyshevExpander is a Green's function based solver that calculates Green's functions on the form

$$G_{\mathbf{i}\mathbf{j}}(E) = \frac{1}{\sqrt{s^2 - E^2}}\sum_{m=0}^{\infty}\frac{b_{\mathbf{i}\mathbf{j}}^{(m)}}{1 + \delta_{0m}}F(m\textrm{acos}(E/s))$$,

where $$F(x)$$ is one of the functions $$\cos(x)$$, $$\sin(x)$$, $$e^{ix}$$, and $$e^{-ix}$$. We do not go into details about this method here, but rather refer the interested reader to Phys. Rev. Lett. 78, 275 (2006), Phys. Rev. Lett. 105, 1 (2010), and urn:nbn:se:uu:diva-305212. However, we point out that the expression is a Chebyshev expansion of the Green's function, which really is nothing but Fourier expansion if we make the change of variable $$x = \textrm{acos}(E/s)$$. Since $$\textrm{acos}(E/s)$$ only is defined for $$E/s \in (-1, 1)$$, it is important that the energy $$E$$ is not too big. In fact, the number $$s$$ is a scale factor that has to be chosen for each system to ensure that the eigenvalues of the system are no larger in magnitude than $$s$$.

The main benefit of the Chebyshev expansion of the Green's function is that, in contrast to for example a straight forward Fourier expansion, the expansion coefficients $$b_{\mathbf{i}\mathbf{j}}^{(m)}$$ can be calculated recursively using sparse matrix-vector multiplication. In particular, the coefficients are given by

\begin{eqnarray*} b_{\mathbf{i}\mathbf{j}}^{(m)} &=& \langle j_{\mathbf{i}}^{(0)}|j_{\mathbf{j}}^{(m)}\rangle, \end{eqnarray*}

where

\begin{eqnarray*} |j_{\mathbf{j}}^{(1)}\rangle &=& H|j_{\mathbf{j}}^{(0)}\rangle,\\ |j_{\mathbf{j}}^{(m)}\rangle &=& 2H|j_{\mathbf{j}}^{(m-1)}\rangle - |j_{\mathbf{j}}^{(m-2)}\rangle. \end{eqnarray*}

Further, the vectors $$|j_{\mathbf{i}}^{(0)}\rangle$$ and $$j_{\mathbf{j}}^{(0)}\rangle$$ are the vectors that result from from setting every element equal to zero, except for that associated with the Index $$\mathbf{i}$$ and $$\mathbf{j}$$, respectively, which is set to one. Since Hamiltonians usually are very sparse, and multiplication with sparse matrices can be done relatively quickly even for very large matrices, this means that a large number of expansion coefficients can be calculated quickly for relatively large systems. Nevertheless, an infinite number of expansion coefficients can of course not be calculated in a finite time, and therefore the sum in the equation above has to be cut off at some number of coefficients. Once the coefficients have been calculated, the final step needed to generate the Green's function is to evaluate the sum at as many energy points as is needed.

With this background we are ready to understand how to create and initialize a ChebyshevExpander

const double SCALE_FACTOR = 10;
const NUM_COEFFICIENTS = 1000;
Solver::ChebyshevExpander solver;
solver.setModel(model);
solver.setScaleFactor(SCALE_FACTOR);
solver.setNumCoefficients(NUM_COEFFICIENTS);

In contrast to for example the DiagonalizationSolver, there is no solver.run() command for the ChebyshevExpander. This is because, unlike the Diagonalizer which essentially solves the whole system by diagonalizing it before properties can be extracted, the ChebyshevExpander solves the problem as the properties are extracted. We also note here, that in order for the ChebyshevExpander to work, it is required that one additional call is made to the Model. Namely, at the point of Model construction write

model.construct();
model.constructCOO();

rather than just the first line. This makes the Model create an internal sparse matrix representation of the Hamiltonian on a standard matrix format called COO and is required by the ChebyshevExpander. This requirement is slightly in conflict with the general design philosophy expressed in this manual and is intended to be removed in the future.

By default the ChebyshevExpander performs calculations on the CPU, but is most efficient when run on a GPU. In particular, it is possible to independently configure whether the calculation of the coefficients as well as the generation of the Green's function should be performed on CPU or GPU using

solver.setCalculateCoefficientsOnGPU(true);
solver.setGenerateGreensFunctionsOnGPU(true);

It is recommended that the calculation of the coefficients is performed on GPU whenever possible, while the generation of the Green's function often is more conveniently done on CPU.

Finally, whenever the Green's function is generated for more than a single Index pair, it is possible to speed up the calculation significantly by first generating a lookup table. This is even required when the Green's function is generated on GPU, and the solver can be instructed to generate such a lookup table through

solver.setUseLookupTable(true);