{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "aba33bb1-8ea7-49a1-a0f7-be1501481371",
   "metadata": {},
   "source": [
    "# Containers\n",
    "\n",
    "\n",
    "  [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/rl-tools/documentation/binder?labpath=01-Containers.ipynb)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "fa73eb2e",
   "metadata": {},
   "source": [
    "Some examples of using containers (only 2D matrices for now)\n",
    "\n",
    "Since `RLtools` is a header-only library the compiler only needs to know where its `include` folder is located (cloned or mounted at `/usr/local/include/rl_tools` in the docker image). This is a standard location for header files and the `C_INCLUDE_PATH` is set to include it in the Dockerfile."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "d50b4e3d-808c-4354-a9fd-6a5bfa8dcc71",
   "metadata": {},
   "source": [
    "Most operations in `RLtools` are generic and work on any device that supports a C++ 17 compiler (standard library support not required). But there are some device-specific functions like random number generation that are device dependent and hence might require specific implementations that are and often can only be included on that particular device (e.g. Intel CPU, CUDA GPU) hence we include the CPU implementations in this example. In this case, the CPU implementations entail a dependency on a few standard library objects (`size_t`, random number generation, logging, etc.). At the same time also all the basic generic functions that operate e.g. over containers are included."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "92f790c3-4526-4d7d-9d6a-c26a3191ee94",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "#include <rl_tools/operations/cpu.h>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "80c74b0a-cdbd-45de-b405-5c9b4cf60927",
   "metadata": {},
   "source": [
    "All objects in `RLtools` are encapsulated in the `rl_tools` namespace and there is no global state (not even for logging etc.). In programs using `RLtools` we usually abbreviate the namespace `rl_tools` to `rlt` and define three shorthands for frequently used types. Firstly, `DEVICE` is the selected device type, `T` is the floating point type used (usually `float` or `double`, where `float` can e.g. be preferable for vastly better performance on accelerators). Moreover, we define `TI` as the index type which usually should be the `size_t` for the device (to match the device's hardware and provide the best performance). All algorithms and data structures in `RLtools` are agnostic to these types by using the template metaprogramming capabilities of C++. Additionally the `DEVICE` type is usually used for a static, compile-time version of [multiple dispatch](https://en.wikipedia.org/wiki/Multiple_dispatch) to dispatch certain functions (like e.g. a neural network layer forward pass) to code that is optimized for a particular device. Through this design, the same higher-level algorithms can be executed on all sorts of devices from HPC clusters over workstations and laptops to smartphones, smartwatches, and microcontrollers without sacrificing performance. Through template metaprogramming e.g. all the matrix dimensions and the number of for-loop iterations are known a priori at compile time and can be used by the compiler to heavily optimize the code through loop unrolling, inlining etc."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "2fc4a281-4014-4283-a04a-b9648bd74a21",
   "metadata": {},
   "outputs": [],
   "source": [
    "namespace rlt = rl_tools;\n",
    "using DEVICE = rlt::devices::DefaultCPU;\n",
    "using T = float;\n",
    "using TI = typename DEVICE::index_t;"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "ba162f13-7340-43fe-97bb-0b35ad0e54b2",
   "metadata": {},
   "source": [
    "In the following we instantiate a device struct. The `DEVICE` struct can be empty and hence have no overhead but facilitate [tag dispatch](https://www.fluentcpp.com/2018/04/27/tag-dispatching/). It can also be used as a carrier for additional context that would otherwise be implemented as global state (e.g. logging through a Tensorboard logger). In the first example we will create a matrix and fill it with random numbers (from an isotropic, standard normal distribution) hence we define the initial seed for our random number generator which is instantiated depending on the device type. This allows us to easily change the `DEVICE` definition and have all downstream entities be appropriate for the particular device. Finally, we are creating a matrix. Particularly a dynamic (heap allocated) `3x3` matrix. The static, compile-time configuration of the matrix is defined by a specification type (`rlt::matrix::Specification<ELEMENT_TYPE, INDEX_TYPE, ROWS, COLS>`) that carries the types and compile-time constants. Compiling these attributes into a separate specification instead of having numerous template parameters on the `rlt::MatrixDynamic` type brings the benefit that writing functions that take matrices as input becomes easier as we just have to add a `typename SPEC` parameter to the template. We can still constrain the usage of a function with only matrices having particular attributes through e.g. `static_assert` and [SFINAE](https://en.wikipedia.org/wiki/Substitution_failure_is_not_an_error). Moreover we can add attributes without breaking functions that are written this way. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "31ca1109-d829-49da-97d8-ecd9d0a6ee60",
   "metadata": {},
   "outputs": [],
   "source": [
    "DEVICE device;\n",
    "TI seed = 1;\n",
    "auto rng = rlt::random::default_engine(DEVICE::SPEC::RANDOM(), seed);\n",
    "rlt::Matrix<rlt::matrix::Specification<T, TI, 3, 3>> m;"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "00d2ce6b-7751-44c9-b5ef-f99e27229b15",
   "metadata": {},
   "source": [
    "Since we created a dynamic matrix (which just consists of a pointer to the beginning of a memory space) we need to allocate it which is done using `rlt::malloc`. As with all functions in `RLtools` it takes the `device` as an input because it provides the (global) context and in this case can be helpful to e.g. align the allocated memory space to certain boundaries to allow for maximum read-write performance for a particular device. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "87d40a7c-6fa5-4506-aa4e-31914f7d17ef",
   "metadata": {},
   "outputs": [],
   "source": [
    "rlt::malloc(device, m);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67f32843-8fa0-446c-ad2f-fb1143b90367",
   "metadata": {},
   "source": [
    "`rlt::Matrix` defaults to a dynamic, heap-allocated matrix but we can override this behavior by defining `DYNAMIC_ALLOCATION=false` in the specification and get a statically, stack-allocated matrix which does not require `rlt::malloc` and `rlt::free`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "f7599bd9-1a08-4e5a-b537-122ae1e8d021",
   "metadata": {},
   "outputs": [],
   "source": [
    "constexpr bool DYNAMIC_ALLOCATION = false;\n",
    "rlt::Matrix<rlt::matrix::Specification<T, TI, 3, 3, DYNAMIC_ALLOCATION>> m_static;"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "80bb5d7f-1234-478a-b4bb-5e312ffa372f",
   "metadata": {},
   "source": [
    "The memory space is usually not initialized hence we fill it with random numbers (from a standard normal distribution):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "eeddd889-48a9-4c5a-b058-8701349d26c6",
   "metadata": {},
   "outputs": [],
   "source": [
    "rlt::randn(device, m, rng);"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "678a5363-50a2-4a89-9923-8357d2dcd249",
   "metadata": {},
   "source": [
    "Now we can print the allocated and filled matrix:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "137d15da-a5d5-4d7b-a295-cf553968575f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    0.849261    -0.102156    -0.256673 \n",
      "    0.904277    -0.538617    -0.506808 \n",
      "   -0.408192     0.271856    -0.311355 \n"
     ]
    }
   ],
   "source": [
    "rlt::print(device, m);"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "120caa3e-3cbd-4ad6-9cee-74315ec9f86e",
   "metadata": {},
   "source": [
    "We can access elements using the `get` and `set` commands:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "6597e322-8f3f-4036-b2d2-167b09fbe042",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.849261f"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rlt::get(m, 0, 0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "e235ec8a-f6dd-4019-aaeb-38cf93e62725",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    1.000000    -0.102156    -0.256673 \n",
      "    0.904277    -0.538617    -0.506808 \n",
      "   -0.408192     0.271856    -0.311355 \n"
     ]
    }
   ],
   "source": [
    "rlt::set(m, 0, 0, 1);\n",
    "rlt::print(device, m);"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "11f01f47-b2c9-40d0-95d4-8442a8a95992",
   "metadata": {},
   "source": [
    "`get` returns a reference so we could technically also set or increment it through the reference:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "32c2df35-49c3-49ad-9bc7-c96adafff9dc",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   11.000000    -0.102156    -0.256673 \n",
      "    0.904277    -0.538617    -0.506808 \n",
      "   -0.408192     0.271856    -0.311355 \n"
     ]
    }
   ],
   "source": [
    "rlt::get(m, 0, 0) += 10;\n",
    "rlt::print(device, m);"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "d7ea254a-7c23-4dea-be26-d4c4c044f4c7",
   "metadata": {},
   "source": [
    "Writing through the reference is not very intuitive so we prefer `set` and `increment`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "1cef5ad7-6adf-457e-9f45-d87cbda0f412",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    1.000000    -0.102156    -0.256673 \n",
      "    0.904277    -0.538617    -0.506808 \n",
      "   -0.408192     0.271856    -0.311355 \n"
     ]
    }
   ],
   "source": [
    "rlt::increment(m, 0, 0, -10);\n",
    "rlt::print(device, m);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "139e2182-36ad-454e-b07f-ff21618ba14f",
   "metadata": {},
   "source": [
    "### Tensors\n",
    "\n",
    "Matrices are a simple, 2D data structure but to allow for more complex algorithms we have since introduce a tensor type that can hold arbitrary shapes:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "6793e9ba-ada7-4be4-a21d-e53e44d0aa2c",
   "metadata": {},
   "outputs": [],
   "source": [
    "using SHAPE = rlt::tensor::Shape<TI, 3, 3, 3>;\n",
    "using SPEC = rlt::tensor::Specification<T, TI, SHAPE, DYNAMIC_ALLOCATION>;\n",
    "rlt::Tensor<SPEC> t;"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0e02cbb9-63d1-489b-ad7e-e5d4fe9f4b0a",
   "metadata": {},
   "source": [
    "Tensors support most of the operations that matrices also support:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "ce6efd8f-4f17-4a8a-81d5-2802a8ba373c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "dim[0] = 0: \n",
      "  -5.703804e-01  -3.422589e-01   1.008072e-01\n",
      "  -9.118625e-01   2.108090e+00   9.476308e-02\n",
      "   5.376303e-01   3.618752e-01  -7.995225e-01\n",
      "\n",
      "dim[0] = 1: \n",
      "   8.660405e-01   1.061986e+00   6.006763e-01\n",
      "   2.661995e+00  -9.388391e-01  -1.549304e-01\n",
      "   9.058360e-02  -1.328507e+00   1.262284e+00\n",
      "\n",
      "dim[0] = 2: \n",
      "   2.677846e+00  -1.236785e+00  -9.119245e-02\n",
      "  -8.944708e-01  -2.577802e+00   2.305977e+00\n",
      "   5.642641e-01   5.340819e-01   1.266308e+00\n",
      "\n"
     ]
    }
   ],
   "source": [
    "rlt::randn(device, t, rng);\n",
    "rlt::print(device, t);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "56d94c4a-eff4-4c5b-a32e-ade3844304e4",
   "metadata": {},
   "source": [
    "The signature of the `set` operations slightly differs from the ones for matrices because tensors can have arbitrary numbers of dimensions and to take advantage of the variadic arguments `Args...` the indices have to be last in the operations signature:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "b90cbee6-11aa-4153-ab0d-ad718324cf75",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2.108090e+00\n",
      "1.337000e+03\n"
     ]
    }
   ],
   "source": [
    "std::cout << rlt::get(device, t, 0, 1, 1) << std::endl;\n",
    "T new_value = 1337;\n",
    "rlt::set(device, t, new_value, 0, 1, 1);\n",
    "std::cout << rlt::get(device, t, 0, 1, 1) << std::endl;"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2dbe5860-ff71-4d64-b90f-444bd98aebdc",
   "metadata": {},
   "source": [
    "Tensors can be sliced by using `view` and `view_range`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "a0bae728-b588-4639-b642-a026b1715dbc",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   8.660405e-01   1.061986e+00   6.006763e-01\n",
      "   2.661995e+00  -9.388391e-01  -1.549304e-01\n",
      "   9.058360e-02  -1.328507e+00   1.262284e+00\n",
      "\n"
     ]
    }
   ],
   "source": [
    "auto mid3x3 = rlt::view(device, t, 1);\n",
    "rlt::print(device, mid3x3);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "64df65bf-331f-4b7a-944b-c34327276fe5",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "First rows: \n",
      "  -5.703804e-01  -3.422589e-01   1.008072e-01\n",
      "   8.660405e-01   1.061986e+00   6.006763e-01\n",
      "   2.677846e+00  -1.236785e+00  -9.119245e-02\n",
      "\n",
      "Last rows: \n",
      "   5.376303e-01   3.618752e-01  -7.995225e-01\n",
      "   9.058360e-02  -1.328507e+00   1.262284e+00\n",
      "   5.642641e-01   5.340819e-01   1.266308e+00\n",
      "\n",
      "First cols: \n",
      "  -5.703804e-01  -9.118625e-01   5.376303e-01\n",
      "   8.660405e-01   2.661995e+00   9.058360e-02\n",
      "   2.677846e+00  -8.944708e-01   5.642641e-01\n",
      "\n"
     ]
    }
   ],
   "source": [
    "auto first_rows = rlt::view(device, t, 0, rlt::tensor::ViewSpec<1>{});\n",
    "std::cout << \"First rows: \" << std::endl;\n",
    "rlt::print(device, first_rows);\n",
    "std::cout << \"Last rows: \" << std::endl;\n",
    "auto last_rows = rlt::view(device, t, 2, rlt::tensor::ViewSpec<1>{});\n",
    "rlt::print(device, last_rows);\n",
    "std::cout << \"First cols: \" << std::endl;\n",
    "auto first_cols = rlt::view(device, t, 0, rlt::tensor::ViewSpec<2>{});\n",
    "rlt::print(device, first_cols);"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "C++17",
   "language": "C++17",
   "name": "xcpp17"
  },
  "language_info": {
   "codemirror_mode": "text/x-c++src",
   "file_extension": ".cpp",
   "mimetype": "text/x-c++src",
   "name": "c++",
   "version": "17"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}