Importing and exporting data

External storage

While the classes described in the other chapters allow data to be stored in RAM during execution, it is also important to be able to store data outside of program memory. This allows data to be stored in files between executions, to be exported to other programs, and for external input to be read in. TBTK therefore comes with two methods for writing data structures to file in a format that allows them to be read back into the same data structures later, as well as one method for reading parameter files.

The first method takes the form of a FileWriter and a FileReader class, which allow Properties and Models to be written to and read from HDF5 files. The HDF5 file format (https://support.hdfgroup.org/HDF5/) is specifically designed for scientific data and is widely supported across languages. Data written to file using the FileWriter can therefore easily be imported into, for example, MATLAB or Python code for post-processing. This is particularly true for Properties stored in the Ranges format (see the Properties chapter), since the data sections in the HDF5 files preserve the Ranges format.

Many classes in TBTK can also be serialized, which means that they are converted into strings. These strings can then be written to file or passed as arguments to the constructor of the corresponding class to recreate a copy of the original object. TBTK also contains a class called Resource, which allows for very general input and output of strings, including reading data directly from the web. In combination, these two techniques allow for very flexible export and import of data, making it possible to store large parts of the current state of a program in permanent memory. The goal is to make almost every class serializable. This would essentially allow a program to be serialized in the middle of execution and restarted at a later time, or allow truly distributed applications to communicate their current state across the Internet. However, this is a future vision that has not yet been fully reached.

Finally, TBTK also contains a FileParser that can parse a structured parameter file and create a ParameterSet.

FileReader and FileWriter

The HDF5 file format used by the FileReader and FileWriter essentially implements a UNIX-like file system inside a single file for structured data. It allows arrays of data, together with metadata called attributes, to be stored in datasets that resemble files. When reading and writing data using the FileReader and FileWriter, it is therefore common to write several objects into the same HDF5 file. The first thing to know about the FileReader and FileWriter is therefore that the file currently in use is chosen by typing


FileReader::setFileName("Filename.h5");

and similarly for the FileWriter. It is important to note that the FileReader and FileWriter act as global state machines. This means that any change made to them at runtime is reflected throughout the code. If the command above is executed in one part of the code and some other part of the code later reads a file, it will use the file "Filename.h5" as input. It is possible to check whether a particular file already exists by first setting the filename and then calling

bool fileExists = FileReader::exists();

and similarly for the FileWriter.
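The consequence of global state can be illustrated with a minimal sketch. The class GlobalReader and the function fileSeenElsewhere() below are hypothetical stand-ins written for this illustration and are not the TBTK implementation; they only show how a file name set in one place is seen by unrelated code.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical stand-in for a reader with global state (not the TBTK class).
class GlobalReader{
public:
    // Select the file that all subsequent calls operate on.
    static void setFileName(const std::string &name){ fileName() = name; }

    // Return the currently selected file name.
    static const std::string& getFileName(){ return fileName(); }

    // Check whether the currently selected file can be opened for reading.
    static bool exists(){
        std::ifstream file(fileName());
        return file.good();
    }
private:
    // Function-local static: a single value shared by the whole program.
    static std::string& fileName(){
        static std::string name;
        return name;
    }
};

// An unrelated part of the program that never sets the file name itself.
std::string fileSeenElsewhere(){
    return GlobalReader::getFileName();
}

// Usage: the name set here is what the unrelated code observes.
void demonstrateGlobalState(){
    GlobalReader::setFileName("Filename.h5");
    assert(fileSeenElsewhere() == "Filename.h5");
}
```

Because the state is shared, it is usually safest to set the file name immediately before each group of read or write calls rather than relying on a value set elsewhere.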

A second important thing to know about HDF5 is that, although it can write new datasets to an already existing file, it does not allow existing datasets to be overwritten. If a program is meant to be run repeatedly, overwriting the previous data in the file each time it is rerun, the previously generated file must therefore be deleted first. This can be done after having set the filename by typing


FileWriter::clear();

A similar call also exists for the FileReader, although it is harder to find a logical reason for calling it there.
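The delete-before-write pattern described above can be sketched in plain C++. The helper names fileExists() and removeStaleOutput() are hypothetical and chosen for this example only; the sketch uses the standard library and makes no claim about how TBTK implements the corresponding call.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Return true if the file can be opened for reading.
bool fileExists(const std::string &fileName){
    std::ifstream file(fileName);
    return file.good();
}

// Remove output left over from a previous run so that the current run
// starts from an empty file. Returns true if no file remains afterwards.
bool removeStaleOutput(const std::string &fileName){
    assert(!fileName.empty());
    if(fileExists(fileName))
        std::remove(fileName.c_str());
    return !fileExists(fileName);
}
```

Calling such a helper once at the start of a program, before any data is written, makes repeated runs produce the same file contents regardless of what earlier runs left behind.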

A Model or Property can be written to file as follows


FileWriter::writeDataType(data);

where DataType should be replaced by one of the DataTypes listed below, and data should be an object of that data type.

Supported DataTypes

By default, the FileWriter writes the data to a dataset with the same name as the DataType listed above. However, it is sometimes useful to specify a custom name, especially if multiple data structures of the same type are going to be written to the same file. It is therefore possible to pass a second parameter to the write function that will be used as the name of the dataset

FileWriter::writeDataType(data, "CustomName");

The interface for reading data is completely analogous to that for writing and takes the form

DataType data = FileReader::readDataType();

where DataType once again is a placeholder for one of the actual data type names listed in the table above.

Serializable and Resource

Serialization is a powerful technique whereby an object is able to convert itself into a string. If some classes implement serialization, it is simple to write new serializable classes that consist of such classes, since the new class can serialize itself by stringing together the serializations of its components. TBTK is designed to allow for different serialization modes. Some types of serialization may be simpler or more readable if they are not meant to be imported back into TBTK, while others might be more efficient in terms of execution time and memory requirements. However, currently only serialization into JSON is implemented to any significant extent. We will therefore only describe this mode here.
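The idea of building a serialization from the serializations of the components can be sketched as follows. The classes Coordinate and Segment are hypothetical and exist only for this illustration; the JSON-like strings are assembled by hand and are not the format that TBTK produces.

```cpp
#include <cassert>
#include <string>

// Hypothetical component class that knows how to serialize itself.
struct Coordinate{
    int x, y;

    std::string serialize() const{
        return "{\"x\":" + std::to_string(x)
            + ",\"y\":" + std::to_string(y) + "}";
    }
};

// Hypothetical composite class: its serialization is assembled from the
// serializations of its components.
struct Segment{
    Coordinate start, end;

    std::string serialize() const{
        return "{\"start\":" + start.serialize()
            + ",\"end\":" + end.serialize() + "}";
    }
};

// Usage: the composite string contains the component strings unchanged.
void demonstrateCompositeSerialization(){
    Segment segment{{0, 1}, {2, 3}};
    assert(
        segment.serialize()
        == "{\"start\":{\"x\":0,\"y\":1},\"end\":{\"x\":2,\"y\":3}}"
    );
}
```

The composite class never needs to know how a Coordinate is represented; it only relies on the component providing a serialize() function.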

If a class is serializable, which means it either inherits from the Serializable class, or is pseudo-serializable by implementing the serialize() function, it is possible to create a serialization of a corresponding object as follows

string serialization
    = serializableObject.serialize(Serializable::Mode::JSON);

Currently, the Model and all Properties can be serialized this way. Taking the Model class as a concrete example, a Model can be recreated from a serialization string as follows

Model model(serialization, Serializable::Mode::JSON);

The notation for recreating other types of objects is the same, with Model replaced by the class name of the object of interest.
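The round trip from object to string and back through a constructor can be sketched with a small self-contained class. The class Point and its string format are hypothetical and only mirror the pattern described here; they are not part of TBTK.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical serializable class used for illustration only.
class Point{
public:
    Point(int x, int y) : x(x), y(y){}

    // Recreate a Point from a string produced by serialize().
    explicit Point(const std::string &serialization){
        int count = std::sscanf(
            serialization.c_str(),
            "{\"x\":%d,\"y\":%d}",
            &x,
            &y
        );
        if(count != 2)
            throw std::invalid_argument("Unable to parse Point.");
    }

    // Convert the object into a string.
    std::string serialize() const{
        return "{\"x\":" + std::to_string(x)
            + ",\"y\":" + std::to_string(y) + "}";
    }

    int getX() const{ return x; }
    int getY() const{ return y; }
private:
    int x = 0;
    int y = 0;
};

// Usage: serialize an object and recreate an equivalent copy.
void demonstrateRoundTrip(){
    Point original(3, -4);
    Point copy(original.serialize());
    assert(copy.getX() == 3 && copy.getY() == -4);
}
```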

With a way to create serialization strings and to recreate objects from such strings, it is also useful to be able to write such strings to file and read them back. For this TBTK provides a class called Resource. The interface for writing a string to file using a Resource is

Resource resource;
resource.setData("String to write to file.");
resource.write("Filename");

Similarly a string can be read from file using

resource.read("Filename");
const string &someString = resource.getData();

The Resource is, however, more powerful than demonstrated so far, since it in fact implements an interface to the cURL library (https://curl.haxx.se). This means that it is possible, for example, to read input from a URL instead of from a file. For instance, a simple two-level system is available at http://www.second-quantization.com/ExampleModel.json and can be used to construct a Model as follows

resource.read("http://www.second-quantization.com/ExampleModel.json");
Model model(resource.getData(), Serializable::Mode::JSON);

FileParser and ParameterSet

While the main purpose of the other two methods is to import and export data in a way that faithfully preserves the data structures used internally by TBTK, it is also often useful to read other information from files. In particular, it is useful to be able to pass parameter values to a program through a file rather than typing the parameters explicitly into the code, especially since the latter option requires the program to be recompiled every time a parameter is updated.

For this TBTK provides a FileParser and a ParameterSet. Together they allow files formatted as follows to be read

    int     sizeX       = 50
    int     sizeY       = 50
    double  radius      = 10
    complex phaseFactor = (1, 0)
    bool    useGPU      = true
    string  filename    = Model.json

First the file can be converted into a ParameterSet as follows

ParameterSet parameterSet = FileParser::parseParameterSet("Filename");

Once the ParameterSet is created, the variables can be accessed as follows

int sizeX = parameterSet.getInt("sizeX");
int sizeY = parameterSet.getInt("sizeY");
double radius = parameterSet.getDouble("radius");
complex<double> phaseFactor = parameterSet.getComplex("phaseFactor");
bool useGPU = parameterSet.getBool("useGPU");
string filename = parameterSet.getString("filename");
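To make the file format concrete, the following is a minimal sketch of how lines of the form "type name = value" could be parsed into typed values. The struct Parameters and the function parseParameters() are hypothetical names introduced for this example; the sketch uses only the standard library and is not the FileParser implementation.

```cpp
#include <cassert>
#include <complex>
#include <cstdio>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical container with one map per supported type.
struct Parameters{
    std::map<std::string, int> ints;
    std::map<std::string, double> doubles;
    std::map<std::string, std::complex<double>> complexes;
    std::map<std::string, bool> bools;
    std::map<std::string, std::string> strings;
};

// Parse lines of the form "type name = value". Lines that do not follow
// this pattern are skipped.
Parameters parseParameters(std::istream &input){
    Parameters parameters;
    std::string line;
    while(std::getline(input, line)){
        std::istringstream stream(line);
        std::string type, name, equals, value;
        if(!(stream >> type >> name >> equals) || equals != "=")
            continue;
        // The rest of the line is the value; strip trailing whitespace.
        std::getline(stream >> std::ws, value);
        value.erase(value.find_last_not_of(" \t\r") + 1);
        if(value.empty())
            continue;

        if(type == "int")
            parameters.ints[name] = std::stoi(value);
        else if(type == "double")
            parameters.doubles[name] = std::stod(value);
        else if(type == "complex"){
            // Complex numbers are written as "(real, imaginary)".
            double real = 0, imag = 0;
            if(std::sscanf(value.c_str(), " (%lf ,%lf )", &real, &imag) == 2)
                parameters.complexes[name] = {real, imag};
        }
        else if(type == "bool")
            parameters.bools[name] = (value == "true");
        else if(type == "string")
            parameters.strings[name] = value;
    }
    return parameters;
}

// Usage: parse two lines from an in-memory stream.
void demonstrateParsing(){
    std::istringstream file("int sizeX = 50\nbool useGPU = true\n");
    Parameters parameters = parseParameters(file);
    assert(parameters.ints.at("sizeX") == 50);
    assert(parameters.bools.at("useGPU"));
}
```

Storing each type in its own map keeps the accessors type safe, at the cost of the caller having to know the type of each parameter, which matches the typed getter calls shown above.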