{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "403cfae0-a11f-4844-b6f6-57d0d89cfeaf",
   "metadata": {},
   "source": [
    "# Recurrent Neural Networks (RNNs)\n",
    "\n",
    "RLtools initially only supports the [GRU (Gated Recurrent Unit)](https://en.wikipedia.org/wiki/Gated_recurrent_unit), a widely used and time-tested RNN architecture. \n",
    "\n",
    "In this example, we show the supervised training of a simple sequence model that learns to do the set operation `output = max(inputs)`. As in the previous examples, we import the required datastructures and models first:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "bd3b2a63-014b-4466-9f10-f268a3cc233b",
   "metadata": {},
   "outputs": [],
   "source": [
    "#include <vector>\n",
    "#include <algorithm>\n",
    "#include <numeric>\n",
    "#include <iostream>\n",
    "\n",
    "#define RL_TOOLS_BACKEND_ENABLE_OPENBLAS\n",
    "#include <rl_tools/operations/cpu_mux.h>\n",
    "#include <rl_tools/nn/optimizers/adam/instance/operations_generic.h>\n",
    "#include <rl_tools/nn/operations_cpu_mux.h>\n",
    "#include <rl_tools/nn/layers/gru/operations_generic.h>\n",
    "#include <rl_tools/nn_models/sequential/operations_generic.h>\n",
    "#include <rl_tools/nn/optimizers/adam/operations_generic.h>\n",
    "#include <rl_tools/nn/loss_functions/mse/operations_generic.h>\n",
    "namespace rlt = rl_tools;\n",
    "#pragma cling load(\"openblas\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f35217c1-ddb8-4569-bdb3-f6eaded15aae",
   "metadata": {},
   "source": [
    "Then setup the environment:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "3c335c55-ab11-42d2-9f7b-d565e809f226",
   "metadata": {},
   "outputs": [],
   "source": [
    "using T = float;\n",
    "using TYPE_POLICY = rlt::numeric_types::Policy<T>;\n",
    "using DEVICE = rlt::devices::DEVICE_FACTORY<rlt::devices::DefaultCPUSpecification>;\n",
    "using RNG = DEVICE::SPEC::RANDOM::ENGINE<>;\n",
    "using TI = typename DEVICE::index_t;\n",
    "constexpr bool DYNAMIC_ALLOCATION = true;"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bdb3d706-0baf-42bb-9464-a5ac98e5e9cc",
   "metadata": {},
   "source": [
    "Now we can configure the sequence model. Here we use a GRU that directly takes the input and transforms it into its latent space. This latent space is then decoded by the `OUTPUT_LAYER` to predict the outputs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "5bd081ce-3e1e-4df2-ade5-41ff60717978",
   "metadata": {},
   "outputs": [],
   "source": [
    "constexpr TI SEQUENCE_LENGTH = 10;\n",
    "constexpr TI BATCH_SIZE = 10;\n",
    "constexpr TI INPUT_DIM = 1;\n",
    "constexpr TI HIDDEN_DIM = 8;\n",
    "constexpr TI OUTPUT_DIM = 1;\n",
    "using INPUT_SHAPE = rlt::tensor::Shape<TI, SEQUENCE_LENGTH, BATCH_SIZE, INPUT_DIM>;\n",
    "using GRU_CONFIG = rlt::nn::layers::gru::Configuration<TYPE_POLICY, TI, HIDDEN_DIM, rlt::nn::parameters::groups::Normal>;\n",
    "using GRU = rlt::nn::layers::gru::BindConfiguration<GRU_CONFIG>;\n",
    "using OUTPUT_LAYER_CONFIG = rlt::nn::layers::dense::Configuration<TYPE_POLICY, TI, OUTPUT_DIM, rlt::nn::activation_functions::IDENTITY>;\n",
    "using OUTPUT_LAYER = rlt::nn::layers::dense::BindConfiguration<OUTPUT_LAYER_CONFIG>;"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b981cd6c-da5e-4a15-b5de-ca8f2e9f8acb",
   "metadata": {},
   "source": [
    "As usual, we assemble these layers into a `nn_models::sequential` which is a sequence of layers and implements compile-time autodiff:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "25d8268d-5ce2-4ffd-a574-5dc3a1b60988",
   "metadata": {},
   "outputs": [],
   "source": [
    "template <typename T_CONTENT, typename T_NEXT_MODULE = rlt::nn_models::sequential::OutputModule>\n",
    "using Module = typename rlt::nn_models::sequential::Module<T_CONTENT, T_NEXT_MODULE>;\n",
    "\n",
    "using MODULE_CHAIN = Module<GRU, Module<OUTPUT_LAYER>>;\n",
    "using CAPABILITY = rlt::nn::capability::Gradient<rlt::nn::parameters::Adam>;\n",
    "using MODEL = rlt::nn_models::sequential::Build<CAPABILITY, MODULE_CHAIN, INPUT_SHAPE>;"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "797fac04-28d1-4843-abc9-db4f4e165ac4",
   "metadata": {},
   "source": [
    "We need an optimizer as well, of course:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "6ea581ab-039f-4cfd-8c24-24bfa9add4be",
   "metadata": {},
   "outputs": [],
   "source": [
    "struct ADAM_PARAMS: rlt::nn::optimizers::adam::DEFAULT_PARAMETERS_TENSORFLOW<TYPE_POLICY>{\n",
    "    static constexpr T ALPHA = 0.003;\n",
    "};\n",
    "using ADAM_SPEC = rlt::nn::optimizers::adam::Specification<TYPE_POLICY, TI, ADAM_PARAMS>;\n",
    "using OPTIMIZER = rlt::nn::optimizers::Adam<ADAM_SPEC>;"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5149c6ab-81a6-4fa3-8487-0bf6fb9fd0a8",
   "metadata": {},
   "source": [
    "Now we can instantiate, allocate and initialize the data structures:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "83d3788e-98b6-498e-9315-1a20a2fde77a",
   "metadata": {},
   "outputs": [],
   "source": [
    "constexpr TI DATASET_SIZE = 1000;\n",
    "constexpr TI TESTSET_SIZE = 100;\n",
    "\n",
    "DEVICE device;\n",
    "RNG rng;\n",
    "MODEL model;\n",
    "MODEL::Buffer<> buffer;\n",
    "using TEST_MODEL_TMP = MODEL::template CHANGE_BATCH_SIZE<TI, TESTSET_SIZE>; // inference only model with test set as batch size\n",
    "using TEST_MODEL = TEST_MODEL_TMP::template CHANGE_CAPABILITY<rlt::nn::capability::Forward<>>;\n",
    "TEST_MODEL test_model;\n",
    "TEST_MODEL::Buffer<> test_buffer;\n",
    "MODEL::State<> state;\n",
    "OPTIMIZER optimizer;\n",
    "rlt::Tensor<rlt::tensor::Specification<T, TI, MODEL::INPUT_SHAPE>> input;\n",
    "rlt::Tensor<rlt::tensor::Specification<T, TI, MODEL::OUTPUT_SHAPE>> output_target, d_output;\n",
    "\n",
    "using DATASET_SHAPE = rlt::tensor::Shape<TI, DATASET_SIZE, SEQUENCE_LENGTH, INPUT_DIM>;\n",
    "using DATASET_TARGET_SHAPE = rlt::tensor::Shape<TI, DATASET_SIZE, SEQUENCE_LENGTH, OUTPUT_DIM>;\n",
    "rlt::Tensor<rlt::tensor::Specification<T, TI, DATASET_SHAPE>> dataset_X;\n",
    "rlt::Tensor<rlt::tensor::Specification<T, TI, DATASET_TARGET_SHAPE>> dataset_y;\n",
    "\n",
    "using TESTSET_SHAPE = rlt::tensor::Shape<TI, TESTSET_SIZE, SEQUENCE_LENGTH, INPUT_DIM>;\n",
    "using TESTSET_TARGET_SHAPE = rlt::tensor::Shape<TI, TESTSET_SIZE, SEQUENCE_LENGTH, OUTPUT_DIM>;\n",
    "rlt::Tensor<rlt::tensor::Specification<T, TI, TESTSET_SHAPE>> testset_X;\n",
    "rlt::Tensor<rlt::tensor::Specification<T, TI, TESTSET_TARGET_SHAPE>> testset_y;\n",
    "using TESTSET_SHAPE_PERMUTED = rlt::tensor::Shape<TI, SEQUENCE_LENGTH, TESTSET_SIZE, INPUT_DIM>;\n",
    "using TESTSET_TARGET_SHAPE_PERMUTED = rlt::tensor::Shape<TI, SEQUENCE_LENGTH, TESTSET_SIZE, OUTPUT_DIM>;\n",
    "rlt::Tensor<rlt::tensor::Specification<T, TI, TESTSET_SHAPE_PERMUTED>> testset_X_permuted;\n",
    "rlt::Tensor<rlt::tensor::Specification<T, TI, TESTSET_TARGET_SHAPE_PERMUTED>> testset_y_permuted;\n",
    "rlt::Tensor<rlt::tensor::Specification<T, TI, TESTSET_TARGET_SHAPE_PERMUTED>> testset_output_permuted;\n",
    "\n",
    "rlt::init(device);\n",
    "rlt::malloc(device, rng);\n",
    "constexpr TI SEED = 0;\n",
    "rlt::init(device, rng, SEED);\n",
    "rlt::malloc(device, model);\n",
    "rlt::malloc(device, test_model);\n",
    "rlt::malloc(device, buffer);\n",
    "rlt::malloc(device, test_buffer);\n",
    "rlt::malloc(device, state);\n",
    "rlt::malloc(device, optimizer);\n",
    "rlt::malloc(device, input);\n",
    "rlt::malloc(device, output_target);\n",
    "rlt::malloc(device, d_output);\n",
    "rlt::malloc(device, dataset_X);\n",
    "rlt::malloc(device, dataset_y);\n",
    "rlt::malloc(device, testset_X);\n",
    "rlt::malloc(device, testset_y);\n",
    "rlt::malloc(device, testset_X_permuted);\n",
    "rlt::malloc(device, testset_y_permuted);\n",
    "rlt::malloc(device, testset_output_permuted);\n",
    "rlt::init(device, optimizer);\n",
    "rlt::init_weights(device, model, rng);\n",
    "rlt::reset_optimizer_state(device, optimizer, model);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41b87ec4-bddb-4059-9232-a0850231ddc2",
   "metadata": {},
   "source": [
    "The toy task we are facing here is `output = max(inputs)` hence we sample random numbers from a Gaussian and then calculate the max for the target values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "f2ba5600-693d-4d2d-9a21-ff9c66c681c5",
   "metadata": {},
   "outputs": [],
   "source": [
    "template <typename DATASET_X, typename DATASET_Y>\n",
    "void max_dataset(DATASET_X& dataset_X, DATASET_Y& dataset_y){\n",
    "    static_assert(DATASET_X::SHAPE::FIRST == DATASET_Y::SHAPE::FIRST);\n",
    "    static_assert(DATASET_X::SHAPE::template GET<1> == DATASET_Y::SHAPE::template GET<1>);\n",
    "    rlt::randn(device, dataset_X, rng);\n",
    "    for(TI sample_i = 0; sample_i < DATASET_X::SHAPE::FIRST; sample_i++){\n",
    "        T max;\n",
    "        bool max_set = false;\n",
    "        for(TI step_i = 0; step_i < DATASET_X::SHAPE::template GET<1>; step_i++){\n",
    "            T el = rlt::get(device, dataset_X, sample_i, step_i, 0);\n",
    "            if(!max_set || el > max){\n",
    "                max_set = true;\n",
    "                max = el;\n",
    "            }\n",
    "            rlt::set(device, dataset_y, max, sample_i, step_i, 0);\n",
    "        }\n",
    "    }\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8dce0e0a-514c-4634-80f9-a4ec8824a39a",
   "metadata": {},
   "source": [
    "We want to generate a training and test set. We created a `test_model` that natively operates on `BATCH_SIZE = TESTSET_SIZE` so we can directly feed the testset into it without creating an additional, batched loop. The standard input format in RLtools is `(SEQUENCE_STEPS x BATCH_SAMPLES x FEATURES)` hence we permute the dataset generated with the previous function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "6652e672-fe94-4ac2-8a55-98fd19b25b73",
   "metadata": {},
   "outputs": [],
   "source": [
    "max_dataset(dataset_X, dataset_y);\n",
    "max_dataset(testset_X, testset_y);\n",
    "// models operate on (SEQUENCE_STEP x BATCH_SIZE x FEATURE_DIM) for performance reasons:\n",
    "auto permuted_X = rlt::permute(device, testset_X, rlt::tensor::PermutationSpec<0, 1>{});\n",
    "auto permuted_y = rlt::permute(device, testset_y, rlt::tensor::PermutationSpec<0, 1>{});\n",
    "rlt::copy(device, device, permuted_X, testset_X_permuted);\n",
    "rlt::copy(device, device, permuted_y, testset_y_permuted);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8640cf43-9f56-4612-bf5e-fda5efc39737",
   "metadata": {},
   "source": [
    "Here is an example of an input sequence and the expected output:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "189c5dd9-ef4b-4bdf-85e7-e2446be1afb2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Input: \n",
      "  -1.507621e+00\n",
      "   1.071986e+00\n",
      "   8.269271e-01\n",
      "   1.601774e+00\n",
      "  -1.074195e+00\n",
      "  -5.420533e-01\n",
      "  -6.830205e-01\n",
      "   1.492320e+00\n",
      "   6.583855e-02\n",
      "   9.513746e-01\n",
      "\n",
      "Expected output: \n",
      "  -1.507621e+00\n",
      "   1.071986e+00\n",
      "   1.071986e+00\n",
      "   1.601774e+00\n",
      "   1.601774e+00\n",
      "   1.601774e+00\n",
      "   1.601774e+00\n",
      "   1.601774e+00\n",
      "   1.601774e+00\n",
      "   1.601774e+00\n",
      "\n"
     ]
    }
   ],
   "source": [
    "std::cout << \"Input: \\n\";\n",
    "rlt::print(device, rlt::view(device, dataset_X, 0));\n",
    "std::cout << \"Expected output: \" << std::endl;\n",
    "rlt::print(device, rlt::view(device, dataset_y, 0));"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "32a881a7-455c-4d23-a4ec-a4a1e7e5025c",
   "metadata": {},
   "source": [
    "Now we have everything in place to train the model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "2de66bf0-102b-46db-a897-6b02c6d36c69",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 5 train loss: 4.290424e-02 test loss: 3.762348e-02\n",
      "Epoch 10 train loss: 1.859014e-02 test loss: 1.703962e-02\n",
      "Epoch 15 train loss: 1.132567e-02 test loss: 9.239300e-03\n",
      "Epoch 20 train loss: 8.027696e-03 test loss: 5.971038e-03\n",
      "Epoch 25 train loss: 6.056546e-03 test loss: 4.463931e-03\n",
      "Epoch 30 train loss: 4.469625e-03 test loss: 3.213925e-03\n",
      "Epoch 35 train loss: 3.648721e-03 test loss: 3.196432e-03\n",
      "Epoch 40 train loss: 3.001935e-03 test loss: 1.977837e-03\n",
      "Epoch 45 train loss: 2.315897e-03 test loss: 1.590888e-03\n",
      "Epoch 50 train loss: 1.958636e-03 test loss: 1.422130e-03\n",
      "Epoch 55 train loss: 1.633920e-03 test loss: 1.367668e-03\n",
      "Epoch 60 train loss: 1.455133e-03 test loss: 2.061459e-03\n",
      "Epoch 65 train loss: 1.359705e-03 test loss: 9.891201e-04\n",
      "Epoch 70 train loss: 1.101305e-03 test loss: 1.248149e-03\n",
      "Epoch 75 train loss: 1.034629e-03 test loss: 1.741087e-03\n",
      "Epoch 80 train loss: 9.677236e-04 test loss: 8.029980e-04\n",
      "Epoch 85 train loss: 9.137033e-04 test loss: 7.385706e-04\n",
      "Epoch 90 train loss: 8.515422e-04 test loss: 8.916398e-04\n",
      "Epoch 95 train loss: 7.121030e-04 test loss: 6.450592e-04\n",
      "Epoch 100 train loss: 6.727659e-04 test loss: 9.201005e-04\n"
     ]
    }
   ],
   "source": [
    "std::vector<TI> indices(DATASET_SIZE);\n",
    "std::iota(indices.begin(), indices.end(), 0); // fill with range 0..DATASET_SIZE\n",
    "constexpr TI N_EPOCH = 100;\n",
    "for(TI epoch_i = 0; epoch_i < N_EPOCH; epoch_i++){\n",
    "    T epoch_loss = 0;\n",
    "    std::shuffle(indices.begin(), indices.end(), rng.engine);\n",
    "    for(TI batch_i = 0; batch_i < DATASET_SIZE / BATCH_SIZE; batch_i++){\n",
    "        for(TI sequence_i = 0; sequence_i < BATCH_SIZE; sequence_i++){\n",
    "            TI index = BATCH_SIZE * batch_i + sequence_i;\n",
    "            auto input_sample = rlt::view(device, input, sequence_i, rlt::tensor::ViewSpec<1>{});\n",
    "            auto output_sample = rlt::view(device, output_target, sequence_i, rlt::tensor::ViewSpec<1>{});\n",
    "            auto dataset_input_sample = rlt::view(device, dataset_X, indices[index], rlt::tensor::ViewSpec<0>{});\n",
    "            auto dataset_output_sample = rlt::view(device, dataset_y, indices[index], rlt::tensor::ViewSpec<0>{});\n",
    "            rlt::copy(device, device, dataset_input_sample, input_sample);\n",
    "            rlt::copy(device, device, dataset_output_sample, output_sample);\n",
    "        }\n",
    "        rlt::forward(device, model, input, buffer, rng);\n",
    "        auto output = rlt::output(device, model);\n",
    "        auto output_matrix_view = rlt::matrix_view(device, output);\n",
    "        auto output_target_matrix_view = rlt::matrix_view(device, output_target);\n",
    "        auto d_output_matrix_view = rlt::matrix_view(device, d_output);\n",
    "        rlt::nn::loss_functions::mse::gradient(device, output_matrix_view, output_target_matrix_view, d_output_matrix_view);\n",
    "        T batch_loss = rlt::nn::loss_functions::mse::evaluate(device, output_matrix_view, output_target_matrix_view); \n",
    "        epoch_loss += batch_loss;\n",
    "        rlt::zero_gradient(device, model);\n",
    "        rlt::backward(device, model, input, d_output, buffer);\n",
    "        rlt::step(device, optimizer, model);\n",
    "    }\n",
    "    epoch_loss /= DATASET_SIZE / BATCH_SIZE;\n",
    "    if((epoch_i+1) % 5 == 0){\n",
    "        rlt::copy(device, device, model, test_model);\n",
    "        rlt::evaluate(device, test_model, testset_X_permuted, testset_output_permuted, test_buffer, rng);\n",
    "        T test_loss = rlt::nn::loss_functions::mse::evaluate(device, testset_output_permuted, testset_y_permuted);\n",
    "        std::cout << \"Epoch \" << (epoch_i+1) << \" train loss: \" << epoch_loss << \" test loss: \" << test_loss << std::endl;\n",
    "    }\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "42bfcf13-4c53-4f67-8715-b8de786a5374",
   "metadata": {},
   "source": [
    "Now we check if the predictions of the model are plausible:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "daac2ece-8e9d-4e1f-b19c-2732d22b2741",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Test sequence 0\n",
      "Input => Target ~ Predicted\n",
      "   0.65 =>  0.65 ~  0.66\n",
      "   0.89 =>  0.89 ~  0.88\n",
      "   1.38 =>  1.38 ~  1.37\n",
      "   0.74 =>  1.38 ~  1.37\n",
      "  -1.00 =>  1.38 ~  1.37\n",
      "   0.06 =>  1.38 ~  1.38\n",
      "   0.07 =>  1.38 ~  1.39\n",
      "  -0.98 =>  1.38 ~  1.39\n",
      "   1.20 =>  1.38 ~  1.39\n",
      "   0.12 =>  1.38 ~  1.40\n",
      "Test sequence 1\n",
      "Input => Target ~ Predicted\n",
      "  -0.08 => -0.08 ~ -0.09\n",
      "   0.21 =>  0.21 ~  0.21\n",
      "  -0.19 =>  0.21 ~  0.21\n",
      "  -1.47 =>  0.21 ~  0.21\n",
      "  -0.16 =>  0.21 ~  0.19\n",
      "  -1.06 =>  0.21 ~  0.21\n",
      "  -0.36 =>  0.21 ~  0.20\n",
      "   0.62 =>  0.62 ~  0.59\n",
      "   0.31 =>  0.62 ~  0.60\n",
      "  -0.30 =>  0.62 ~  0.63\n",
      "Test sequence 2\n",
      "Input => Target ~ Predicted\n",
      "   1.29 =>  1.29 ~  1.32\n",
      "   0.08 =>  1.29 ~  1.30\n",
      "   0.76 =>  1.29 ~  1.30\n",
      "   0.14 =>  1.29 ~  1.31\n",
      "   0.43 =>  1.29 ~  1.32\n",
      "  -1.59 =>  1.29 ~  1.31\n",
      "  -0.13 =>  1.29 ~  1.32\n",
      "  -0.30 =>  1.29 ~  1.33\n",
      "  -0.64 =>  1.29 ~  1.32\n",
      "   1.25 =>  1.29 ~  1.35\n",
      "Test sequence 3\n",
      "Input => Target ~ Predicted\n",
      "  -0.44 => -0.44 ~ -0.45\n",
      "   0.62 =>  0.62 ~  0.63\n",
      "  -0.06 =>  0.62 ~  0.64\n",
      "   1.33 =>  1.33 ~  1.34\n",
      "   1.23 =>  1.33 ~  1.37\n",
      "   0.40 =>  1.33 ~  1.37\n",
      "   0.32 =>  1.33 ~  1.38\n",
      "  -0.11 =>  1.33 ~  1.39\n",
      "  -0.75 =>  1.33 ~  1.38\n",
      "   0.25 =>  1.33 ~  1.39\n",
      "Test sequence 4\n",
      "Input => Target ~ Predicted\n",
      "  -0.10 => -0.10 ~ -0.11\n",
      "   0.17 =>  0.17 ~  0.17\n",
      "   0.01 =>  0.17 ~  0.19\n",
      "   0.66 =>  0.66 ~  0.64\n",
      "   0.79 =>  0.79 ~  0.80\n",
      "  -0.81 =>  0.79 ~  0.81\n",
      "  -0.85 =>  0.79 ~  0.82\n",
      "  -0.19 =>  0.79 ~  0.83\n",
      "   0.05 =>  0.79 ~  0.83\n",
      "   0.55 =>  0.79 ~  0.82\n"
     ]
    }
   ],
   "source": [
    "std::cout << std::fixed << std::setprecision(2); // fixed point printing\n",
    "for(TI sequence_i = 0; sequence_i < 5; sequence_i++){\n",
    "    std::cout << \"Test sequence \" << sequence_i << std::endl;\n",
    "    std::cout << \"Input => Target ~ Predicted\" << std::endl;\n",
    "    for(TI step_i = 0; step_i < SEQUENCE_LENGTH; step_i++){\n",
    "        std::cout << \"  \" << std::setw(5) << rlt::get(device, testset_X_permuted, step_i, sequence_i, 0);\n",
    "        std::cout << \" => \" << std::setw(5) << rlt::get(device, testset_y_permuted, step_i, sequence_i, 0);\n",
    "        std::cout << \" ~ \" << std::setw(5) << rlt::get(device, testset_output_permuted, step_i, sequence_i, 0);\n",
    "        std::cout << std::endl;\n",
    "    }\n",
    "}"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "C++17",
   "language": "C++17",
   "name": "xcpp17"
  },
  "language_info": {
   "codemirror_mode": "text/x-c++src",
   "file_extension": ".cpp",
   "mimetype": "text/x-c++src",
   "name": "c++",
   "version": "17"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}