Containers#
Some examples of using containers (matrices and tensors)
Since RLtools is a header-only library, the compiler only needs to know where its include folder is located (cloned or mounted at /usr/local/include/rl_tools in the Docker image). This is a standard location for header files, and the C_INCLUDE_PATH is set to include it in the Dockerfile.
Most operations in RLtools are generic and work on any device that supports a C++ 17 compiler (standard library support is not required). Some functions, however, like random number generation, are device dependent and require specific implementations that often can only be compiled for that particular device (e.g. Intel CPU, CUDA GPU); hence we include the CPU implementations in this example. In this case, the CPU implementations entail a dependency on a few standard library facilities (size_t, random number generation, logging, etc.). Including them also pulls in all the basic, generic functions that operate e.g. on containers.
[1]:
#include <rl_tools/operations/cpu.h>
All objects in RLtools are encapsulated in the rl_tools namespace and there is no global state (not even for logging etc.). In programs using RLtools we usually abbreviate the namespace rl_tools to rlt and define three shorthands for frequently used types. DEVICE is the selected device type, T is the floating point type (usually float or double, where float can be preferable for vastly better performance on accelerators), and TI is the index type, which should usually be the size_t of the device (to match the device's hardware and provide the best performance). All algorithms and data structures in RLtools are agnostic to these types through the template metaprogramming capabilities of C++. Additionally, the DEVICE type is usually used for a static, compile-time form of multiple dispatch that routes certain functions (like e.g. a neural network layer forward pass) to code optimized for a particular device. Through this design, the same higher-level algorithms can be executed on all sorts of devices, from HPC clusters over workstations and laptops to smartphones, smartwatches, and microcontrollers, without sacrificing performance. Because of the template metaprogramming, e.g. all matrix dimensions and the number of for-loop iterations are known a priori at compile time and can be used by the compiler to heavily optimize the code through loop unrolling, inlining, etc.
[2]:
namespace rlt = rl_tools;
using DEVICE = rlt::devices::DefaultCPU;
using T = float;
using TI = typename DEVICE::index_t;
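These aliases are resolved entirely at compile time. For instance, assuming the DefaultCPU device defines its index type as the standard size_t (as suggested above), this can be verified with a static_assert:

#include <cstddef>
#include <type_traits>
// sanity check (assumption): on the default CPU device the index type resolves to size_t
static_assert(std::is_same_v<TI, std::size_t>, "TI is expected to be size_t on the CPU device");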
In the following, we instantiate a device struct. The DEVICE struct can be empty, and hence incur no overhead, while still facilitating tag dispatch. It can also serve as a carrier for additional context that would otherwise be implemented as global state (e.g. logging through a Tensorboard logger). In the first example we will create a matrix and fill it with random numbers (from an isotropic, standard normal distribution), hence we define the initial seed for our random number generator, which is instantiated depending on the device type. This allows us to easily change the DEVICE definition and have all downstream entities be appropriate for the particular device. Finally, we create a matrix, in particular a dynamic (heap-allocated) 3x3 matrix. The static, compile-time configuration of the matrix is defined by a specification type (rlt::matrix::Specification<ELEMENT_TYPE, INDEX_TYPE, ROWS, COLS>) that carries the types and compile-time constants. Bundling these attributes into a separate specification instead of having numerous template parameters on the rlt::Matrix type has the benefit that writing functions taking matrices as input becomes easier: we just add a typename SPEC parameter to the template. We can still constrain a function to matrices with particular attributes through e.g. static_assert and SFINAE. Moreover, we can add attributes without breaking functions that are written this way.
[3]:
DEVICE device;
TI seed = 1;
auto rng = rlt::random::default_engine(DEVICE::SPEC::RANDOM(), seed);
rlt::Matrix<rlt::matrix::Specification<T, TI, 3, 3>> m;
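To illustrate the benefit of the specification pattern, here is a sketch of a user-defined generic function (trace is hypothetical, not part of the RLtools API; it assumes the specification exposes T, TI, ROWS, and COLS, as described above), constrained to square matrices via static_assert:

// hypothetical generic function over any matrix specification SPEC
template <typename DEV, typename SPEC>
typename SPEC::T trace(DEV& device, rlt::Matrix<SPEC>& matrix){
    // compile-time constraint: only square matrices are accepted
    static_assert(SPEC::ROWS == SPEC::COLS, "trace requires a square matrix");
    typename SPEC::T acc = 0;
    for(typename SPEC::TI i = 0; i < SPEC::ROWS; i++){
        acc += rlt::get(matrix, i, i);
    }
    return acc;
}

A call like trace(device, m) would then work for any device and any square matrix specification, and since SPEC::ROWS is a compile-time constant, the compiler can fully unroll the loop.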
Since we created a dynamic matrix (which just consists of a pointer to the beginning of a memory space), we need to allocate it, which is done using rlt::malloc. As with all functions in RLtools, it takes the device as an input, because the device provides the (global) context and can, in this case, be helpful to e.g. align the allocated memory space to certain boundaries for maximum read/write performance on a particular device.
[4]:
rlt::malloc(device, m);
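For completeness: memory obtained through rlt::malloc is released with rlt::free once the matrix is no longer needed. A minimal sketch of the full lifecycle (using a separate matrix tmp so as not to disturb m):

rlt::Matrix<rlt::matrix::Specification<T, TI, 3, 3>> tmp;
rlt::malloc(device, tmp); // allocate the backing memory
rlt::set(tmp, 0, 0, 1);   // use the matrix
rlt::free(device, tmp);   // release the memory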
rlt::Matrix defaults to a dynamic, heap-allocated matrix, but we can override this behavior by setting DYNAMIC_ALLOCATION=false in the specification to get a static, stack-allocated matrix that requires neither rlt::malloc nor rlt::free.
[5]:
constexpr bool DYNAMIC_ALLOCATION = false;
rlt::Matrix<rlt::matrix::Specification<T, TI, 3, 3, DYNAMIC_ALLOCATION>> m_static;
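The stack-allocated matrix can be used right away, without any allocation calls, using the same operations shown next:

rlt::randn(device, m_static, rng); // no rlt::malloc needed
rlt::print(device, m_static);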
The memory space is usually not initialized, hence we fill it with random numbers (from a standard normal distribution):
[6]:
rlt::randn(device, m, rng);
Now we can print the allocated and filled matrix:
[7]:
rlt::print(device, m);
0.849261 -0.102156 -0.256673
0.904277 -0.538617 -0.506808
-0.408192 0.271856 -0.311355
We can access elements using the get and set functions:
[8]:
rlt::get(m, 0, 0)
[8]:
0.849261f
[9]:
rlt::set(m, 0, 0, 1);
rlt::print(device, m);
1.000000 -0.102156 -0.256673
0.904277 -0.538617 -0.506808
-0.408192 0.271856 -0.311355
get returns a reference, so we could technically also set or increment the element through the reference:
[10]:
rlt::get(m, 0, 0) += 10;
rlt::print(device, m);
11.000000 -0.102156 -0.256673
0.904277 -0.538617 -0.506808
-0.408192 0.271856 -0.311355
Writing through the reference is not very intuitive, so we prefer set and increment:
[11]:
rlt::increment(m, 0, 0, -10);
rlt::print(device, m);
1.000000 -0.102156 -0.256673
0.904277 -0.538617 -0.506808
-0.408192 0.271856 -0.311355
Tensors#
Matrices are a simple, 2D data structure, but to allow for more complex algorithms we have since introduced a tensor type that supports arbitrary shapes:
[12]:
using SHAPE = rlt::tensor::Shape<TI, 3, 3, 3>;
using SPEC = rlt::tensor::Specification<T, TI, SHAPE, DYNAMIC_ALLOCATION>;
rlt::Tensor<SPEC> t;
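The shape can have an arbitrary number of dimensions. For illustration, a 4D tensor is declared in exactly the same way, using the same Specification template:

using SHAPE_4D = rlt::tensor::Shape<TI, 2, 3, 4, 5>;
rlt::Tensor<rlt::tensor::Specification<T, TI, SHAPE_4D, DYNAMIC_ALLOCATION>> t4;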
Tensors support most of the operations that matrices support:
[13]:
rlt::randn(device, t, rng);
rlt::print(device, t);
dim[0] = 0:
-5.703804e-01 -3.422589e-01 1.008072e-01
-9.118625e-01 2.108090e+00 9.476308e-02
5.376303e-01 3.618752e-01 -7.995225e-01
dim[0] = 1:
8.660405e-01 1.061986e+00 6.006763e-01
2.661995e+00 -9.388391e-01 -1.549304e-01
9.058360e-02 -1.328507e+00 1.262284e+00
dim[0] = 2:
2.677846e+00 -1.236785e+00 -9.119245e-02
-8.944708e-01 -2.577802e+00 2.305977e+00
5.642641e-01 5.340819e-01 1.266308e+00
The signature of the set operation differs slightly from the matrix version: since tensors can have an arbitrary number of dimensions, the indices are passed as variadic arguments Args... and hence have to come last in the signature:
[14]:
std::cout << rlt::get(device, t, 0, 1, 1) << std::endl;
T new_value = 1337;
rlt::set(device, t, new_value, 0, 1, 1);
std::cout << rlt::get(device, t, 0, 1, 1) << std::endl;
2.108090e+00
1.337000e+03
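Since the indices come last, iterating over all elements works naturally with nested loops. A small sketch, filling a separate tensor u of the same (stack-allocated) specification with a running index, so that t remains untouched:

rlt::Tensor<SPEC> u; // DYNAMIC_ALLOCATION=false, so no rlt::malloc is needed
for(TI i = 0; i < 3; i++){
    for(TI j = 0; j < 3; j++){
        for(TI k = 0; k < 3; k++){
            rlt::set(device, u, (T)(i*9 + j*3 + k), i, j, k);
        }
    }
}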
Tensors can be sliced using view and view_range:
[15]:
auto mid3x3 = rlt::view(device, t, 1);
rlt::print(device, mid3x3);
8.660405e-01 1.061986e+00 6.006763e-01
2.661995e+00 -9.388391e-01 -1.549304e-01
9.058360e-02 -1.328507e+00 1.262284e+00
[16]:
auto first_rows = rlt::view(device, t, 0, rlt::tensor::ViewSpec<1>{});
std::cout << "First rows: " << std::endl;
rlt::print(device, first_rows);
std::cout << "Last rows: " << std::endl;
auto last_rows = rlt::view(device, t, 2, rlt::tensor::ViewSpec<1>{});
rlt::print(device, last_rows);
std::cout << "First cols: " << std::endl;
auto first_cols = rlt::view(device, t, 0, rlt::tensor::ViewSpec<2>{});
rlt::print(device, first_cols);
First rows:
-5.703804e-01 -3.422589e-01 1.008072e-01
8.660405e-01 1.061986e+00 6.006763e-01
2.677846e+00 -1.236785e+00 -9.119245e-02
Last rows:
5.376303e-01 3.618752e-01 -7.995225e-01
9.058360e-02 -1.328507e+00 1.262284e+00
5.642641e-01 5.340819e-01 1.266308e+00
First cols:
-5.703804e-01 -9.118625e-01 5.376303e-01
8.660405e-01 2.661995e+00 9.058360e-02
2.677846e+00 -8.944708e-01 5.642641e-01
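view_range, mentioned above, selects a contiguous range along a dimension instead of a single index. A minimal sketch, assuming view_range takes the device, the tensor, an offset, and a rlt::tensor::ViewSpec<DIM, SIZE> tag:

// assumed signature: a 2x3x3 view of the first two slices along dimension 0
auto first_two = rlt::view_range(device, t, 0, rlt::tensor::ViewSpec<0, 2>{});
rlt::print(device, first_two);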