{ "cells": [ { "cell_type": "markdown", "id": "403cfae0-a11f-4844-b6f6-57d0d89cfeaf", "metadata": {}, "source": [ "# Recurrent Neural Networks (RNNs)\n", "\n", "RLtools initially supports only the [GRU (Gated Recurrent Unit)](https://en.wikipedia.org/wiki/Gated_recurrent_unit), a widely used and time-tested RNN architecture.\n", "\n", "In this example, we show the supervised training of a simple sequence model that learns the running maximum of its input sequence, `output = max(inputs)`. As in the previous examples, we import the required data structures and models first:" ] }, { "cell_type": "code", "execution_count": 1, "id": "bd3b2a63-014b-4466-9f10-f268a3cc233b", "metadata": {}, "outputs": [], "source": [ "#include \n", "#include \n", "#include \n", "#include \n", "\n", "#define RL_TOOLS_BACKEND_ENABLE_OPENBLAS\n", "#include \n", "#include \n", "#include \n", "#include \n", "#include \n", "#include \n", "#include \n", "namespace rlt = rl_tools;\n", "#pragma cling load(\"openblas\")" ] }, { "cell_type": "markdown", "id": "f35217c1-ddb8-4569-bdb3-f6eaded15aae", "metadata": {}, "source": [ "Then set up the environment:" ] }, { "cell_type": "code", "execution_count": 2, "id": "3c335c55-ab11-42d2-9f7b-d565e809f226", "metadata": {}, "outputs": [], "source": [ "using T = float;\n", "using TYPE_POLICY = rlt::numeric_types::Policy;\n", "using DEVICE = rlt::devices::DEVICE_FACTORY;\n", "using RNG = DEVICE::SPEC::RANDOM::ENGINE<>;\n", "using TI = typename DEVICE::index_t;\n", "constexpr bool DYNAMIC_ALLOCATION = true;" ] }, { "cell_type": "markdown", "id": "bdb3d706-0baf-42bb-9464-a5ac98e5e9cc", "metadata": {}, "source": [ "Now we can configure the sequence model. Here we use a GRU that directly takes the input and transforms it into its latent space. 
This latent space is then decoded by the `OUTPUT_LAYER` to predict the outputs:" ] }, { "cell_type": "code", "execution_count": 3, "id": "5bd081ce-3e1e-4df2-ade5-41ff60717978", "metadata": {}, "outputs": [], "source": [ "constexpr TI SEQUENCE_LENGTH = 10;\n", "constexpr TI BATCH_SIZE = 10;\n", "constexpr TI INPUT_DIM = 1;\n", "constexpr TI HIDDEN_DIM = 8;\n", "constexpr TI OUTPUT_DIM = 1;\n", "using INPUT_SHAPE = rlt::tensor::Shape;\n", "using GRU_CONFIG = rlt::nn::layers::gru::Configuration;\n", "using GRU = rlt::nn::layers::gru::BindConfiguration;\n", "using OUTPUT_LAYER_CONFIG = rlt::nn::layers::dense::Configuration;\n", "using OUTPUT_LAYER = rlt::nn::layers::dense::BindConfiguration;" ] }, { "cell_type": "markdown", "id": "b981cd6c-da5e-4a15-b5de-ca8f2e9f8acb", "metadata": {}, "source": [ "As usual, we assemble these layers into a `nn_models::sequential` which is a sequence of layers and implements compile-time autodiff:" ] }, { "cell_type": "code", "execution_count": 4, "id": "25d8268d-5ce2-4ffd-a574-5dc3a1b60988", "metadata": {}, "outputs": [], "source": [ "template \n", "using Module = typename rlt::nn_models::sequential::Module;\n", "\n", "using MODULE_CHAIN = Module>;\n", "using CAPABILITY = rlt::nn::capability::Gradient;\n", "using MODEL = rlt::nn_models::sequential::Build;" ] }, { "cell_type": "markdown", "id": "797fac04-28d1-4843-abc9-db4f4e165ac4", "metadata": {}, "source": [ "We need an optimizer as well, of course:" ] }, { "cell_type": "code", "execution_count": 5, "id": "6ea581ab-039f-4cfd-8c24-24bfa9add4be", "metadata": {}, "outputs": [], "source": [ "struct ADAM_PARAMS: rlt::nn::optimizers::adam::DEFAULT_PARAMETERS_TENSORFLOW{\n", " static constexpr T ALPHA = 0.003;\n", "};\n", "using ADAM_SPEC = rlt::nn::optimizers::adam::Specification;\n", "using OPTIMIZER = rlt::nn::optimizers::Adam;" ] }, { "cell_type": "markdown", "id": "5149c6ab-81a6-4fa3-8487-0bf6fb9fd0a8", "metadata": {}, "source": [ "Now we can instantiate, allocate and initialize the 
data structures:" ] }, { "cell_type": "code", "execution_count": 6, "id": "83d3788e-98b6-498e-9315-1a20a2fde77a", "metadata": {}, "outputs": [], "source": [ "constexpr TI DATASET_SIZE = 1000;\n", "constexpr TI TESTSET_SIZE = 100;\n", "\n", "DEVICE device;\n", "RNG rng;\n", "MODEL model;\n", "MODEL::Buffer<> buffer;\n", "using TEST_MODEL_TMP = MODEL::template CHANGE_BATCH_SIZE; // inference only model with test set as batch size\n", "using TEST_MODEL = TEST_MODEL_TMP::template CHANGE_CAPABILITY>;\n", "TEST_MODEL test_model;\n", "TEST_MODEL::Buffer<> test_buffer;\n", "MODEL::State<> state;\n", "OPTIMIZER optimizer;\n", "rlt::Tensor> input;\n", "rlt::Tensor> output_target, d_output;\n", "\n", "using DATASET_SHAPE = rlt::tensor::Shape;\n", "using DATASET_TARGET_SHAPE = rlt::tensor::Shape;\n", "rlt::Tensor> dataset_X;\n", "rlt::Tensor> dataset_y;\n", "\n", "using TESTSET_SHAPE = rlt::tensor::Shape;\n", "using TESTSET_TARGET_SHAPE = rlt::tensor::Shape;\n", "rlt::Tensor> testset_X;\n", "rlt::Tensor> testset_y;\n", "using TESTSET_SHAPE_PERMUTED = rlt::tensor::Shape;\n", "using TESTSET_TARGET_SHAPE_PERMUTED = rlt::tensor::Shape;\n", "rlt::Tensor> testset_X_permuted;\n", "rlt::Tensor> testset_y_permuted;\n", "rlt::Tensor> testset_output_permuted;\n", "\n", "rlt::init(device);\n", "rlt::malloc(device, rng);\n", "constexpr TI SEED = 0;\n", "rlt::init(device, rng, SEED);\n", "rlt::malloc(device, model);\n", "rlt::malloc(device, test_model);\n", "rlt::malloc(device, buffer);\n", "rlt::malloc(device, test_buffer);\n", "rlt::malloc(device, state);\n", "rlt::malloc(device, optimizer);\n", "rlt::malloc(device, input);\n", "rlt::malloc(device, output_target);\n", "rlt::malloc(device, d_output);\n", "rlt::malloc(device, dataset_X);\n", "rlt::malloc(device, dataset_y);\n", "rlt::malloc(device, testset_X);\n", "rlt::malloc(device, testset_y);\n", "rlt::malloc(device, testset_X_permuted);\n", "rlt::malloc(device, testset_y_permuted);\n", "rlt::malloc(device, testset_output_permuted);\n", 
"rlt::init(device, optimizer);\n", "rlt::init_weights(device, model, rng);\n", "rlt::reset_optimizer_state(device, optimizer, model);" ] }, { "cell_type": "markdown", "id": "41b87ec4-bddb-4059-9232-a0850231ddc2", "metadata": {}, "source": [ "The toy task here is `output = max(inputs)`, so we sample random inputs from a Gaussian and use the running maximum at each step as the target value:" ] }, { "cell_type": "code", "execution_count": 7, "id": "f2ba5600-693d-4d2d-9a21-ff9c66c681c5", "metadata": {}, "outputs": [], "source": [ "template <typename DATASET_X, typename DATASET_Y>\n", "void max_dataset(DATASET_X& dataset_X, DATASET_Y& dataset_y){\n", " static_assert(DATASET_X::SHAPE::FIRST == DATASET_Y::SHAPE::FIRST);\n", " static_assert(DATASET_X::SHAPE::template GET<1> == DATASET_Y::SHAPE::template GET<1>);\n", " rlt::randn(device, dataset_X, rng);\n", " for(TI sample_i = 0; sample_i < DATASET_X::SHAPE::FIRST; sample_i++){\n", " T max;\n", " bool max_set = false;\n", " for(TI step_i = 0; step_i < DATASET_X::SHAPE::template GET<1>; step_i++){\n", " T el = rlt::get(device, dataset_X, sample_i, step_i, 0);\n", " if(!max_set || el > max){\n", " max_set = true;\n", " max = el;\n", " }\n", " rlt::set(device, dataset_y, max, sample_i, step_i, 0);\n", " }\n", " }\n", "}" ] }, { "cell_type": "markdown", "id": "8dce0e0a-514c-4634-80f9-a4ec8824a39a", "metadata": {}, "source": [ "We generate a training set and a test set. We created a `test_model` that natively operates on `BATCH_SIZE = TESTSET_SIZE`, so we can feed the test set into it directly without an additional batched loop. 
The standard input format in RLtools is `(SEQUENCE_STEPS x BATCH_SAMPLES x FEATURES)` hence we permute the dataset generated with the previous function:" ] }, { "cell_type": "code", "execution_count": 8, "id": "6652e672-fe94-4ac2-8a55-98fd19b25b73", "metadata": {}, "outputs": [], "source": [ "max_dataset(dataset_X, dataset_y);\n", "max_dataset(testset_X, testset_y);\n", "// models operate on (SEQUENCE_STEP x BATCH_SIZE x FEATURE_DIM) for performance reasons:\n", "auto permuted_X = rlt::permute(device, testset_X, rlt::tensor::PermutationSpec<0, 1>{});\n", "auto permuted_y = rlt::permute(device, testset_y, rlt::tensor::PermutationSpec<0, 1>{});\n", "rlt::copy(device, device, permuted_X, testset_X_permuted);\n", "rlt::copy(device, device, permuted_y, testset_y_permuted);" ] }, { "cell_type": "markdown", "id": "8640cf43-9f56-4612-bf5e-fda5efc39737", "metadata": {}, "source": [ "Here is an example of an input sequence and the expected output:" ] }, { "cell_type": "code", "execution_count": 9, "id": "189c5dd9-ef4b-4bdf-85e7-e2446be1afb2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Input: \n", " -1.507621e+00\n", " 1.071986e+00\n", " 8.269271e-01\n", " 1.601774e+00\n", " -1.074195e+00\n", " -5.420533e-01\n", " -6.830205e-01\n", " 1.492320e+00\n", " 6.583855e-02\n", " 9.513746e-01\n", "\n", "Expected output: \n", " -1.507621e+00\n", " 1.071986e+00\n", " 1.071986e+00\n", " 1.601774e+00\n", " 1.601774e+00\n", " 1.601774e+00\n", " 1.601774e+00\n", " 1.601774e+00\n", " 1.601774e+00\n", " 1.601774e+00\n", "\n" ] } ], "source": [ "std::cout << \"Input: \\n\";\n", "rlt::print(device, rlt::view(device, dataset_X, 0));\n", "std::cout << \"Expected output: \" << std::endl;\n", "rlt::print(device, rlt::view(device, dataset_y, 0));" ] }, { "cell_type": "markdown", "id": "32a881a7-455c-4d23-a4ec-a4a1e7e5025c", "metadata": {}, "source": [ "Now we have everything in place to train the model:" ] }, { "cell_type": "code", "execution_count": 10, 
"id": "2de66bf0-102b-46db-a897-6b02c6d36c69", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch 5 train loss: 4.290424e-02 test loss: 3.762348e-02\n", "Epoch 10 train loss: 1.859014e-02 test loss: 1.703962e-02\n", "Epoch 15 train loss: 1.132567e-02 test loss: 9.239300e-03\n", "Epoch 20 train loss: 8.027696e-03 test loss: 5.971038e-03\n", "Epoch 25 train loss: 6.056546e-03 test loss: 4.463931e-03\n", "Epoch 30 train loss: 4.469625e-03 test loss: 3.213925e-03\n", "Epoch 35 train loss: 3.648721e-03 test loss: 3.196432e-03\n", "Epoch 40 train loss: 3.001935e-03 test loss: 1.977837e-03\n", "Epoch 45 train loss: 2.315897e-03 test loss: 1.590888e-03\n", "Epoch 50 train loss: 1.958636e-03 test loss: 1.422130e-03\n", "Epoch 55 train loss: 1.633920e-03 test loss: 1.367668e-03\n", "Epoch 60 train loss: 1.455133e-03 test loss: 2.061459e-03\n", "Epoch 65 train loss: 1.359705e-03 test loss: 9.891201e-04\n", "Epoch 70 train loss: 1.101305e-03 test loss: 1.248149e-03\n", "Epoch 75 train loss: 1.034629e-03 test loss: 1.741087e-03\n", "Epoch 80 train loss: 9.677236e-04 test loss: 8.029980e-04\n", "Epoch 85 train loss: 9.137033e-04 test loss: 7.385706e-04\n", "Epoch 90 train loss: 8.515422e-04 test loss: 8.916398e-04\n", "Epoch 95 train loss: 7.121030e-04 test loss: 6.450592e-04\n", "Epoch 100 train loss: 6.727659e-04 test loss: 9.201005e-04\n" ] } ], "source": [ "std::vector indices(DATASET_SIZE);\n", "std::iota(indices.begin(), indices.end(), 0); // fill with range 0..DATASET_SIZE\n", "constexpr TI N_EPOCH = 100;\n", "for(TI epoch_i = 0; epoch_i < N_EPOCH; epoch_i++){\n", " T epoch_loss = 0;\n", " std::shuffle(indices.begin(), indices.end(), rng.engine);\n", " for(TI batch_i = 0; batch_i < DATASET_SIZE / BATCH_SIZE; batch_i++){\n", " for(TI sequence_i = 0; sequence_i < BATCH_SIZE; sequence_i++){\n", " TI index = BATCH_SIZE * batch_i + sequence_i;\n", " auto input_sample = rlt::view(device, input, sequence_i, rlt::tensor::ViewSpec<1>{});\n", 
" auto output_sample = rlt::view(device, output_target, sequence_i, rlt::tensor::ViewSpec<1>{});\n", " auto dataset_input_sample = rlt::view(device, dataset_X, indices[index], rlt::tensor::ViewSpec<0>{});\n", " auto dataset_output_sample = rlt::view(device, dataset_y, indices[index], rlt::tensor::ViewSpec<0>{});\n", " rlt::copy(device, device, dataset_input_sample, input_sample);\n", " rlt::copy(device, device, dataset_output_sample, output_sample);\n", " }\n", " rlt::forward(device, model, input, buffer, rng);\n", " auto output = rlt::output(device, model);\n", " auto output_matrix_view = rlt::matrix_view(device, output);\n", " auto output_target_matrix_view = rlt::matrix_view(device, output_target);\n", " auto d_output_matrix_view = rlt::matrix_view(device, d_output);\n", " rlt::nn::loss_functions::mse::gradient(device, output_matrix_view, output_target_matrix_view, d_output_matrix_view);\n", " T batch_loss = rlt::nn::loss_functions::mse::evaluate(device, output_matrix_view, output_target_matrix_view); \n", " epoch_loss += batch_loss;\n", " rlt::zero_gradient(device, model);\n", " rlt::backward(device, model, input, d_output, buffer);\n", " rlt::step(device, optimizer, model);\n", " }\n", " epoch_loss /= DATASET_SIZE / BATCH_SIZE;\n", " if((epoch_i+1) % 5 == 0){\n", " rlt::copy(device, device, model, test_model);\n", " rlt::evaluate(device, test_model, testset_X_permuted, testset_output_permuted, test_buffer, rng);\n", " T test_loss = rlt::nn::loss_functions::mse::evaluate(device, testset_output_permuted, testset_y_permuted);\n", " std::cout << \"Epoch \" << (epoch_i+1) << \" train loss: \" << epoch_loss << \" test loss: \" << test_loss << std::endl;\n", " }\n", "}" ] }, { "cell_type": "markdown", "id": "42bfcf13-4c53-4f67-8715-b8de786a5374", "metadata": {}, "source": [ "Now we check if the predictions of the model are plausible:" ] }, { "cell_type": "code", "execution_count": 11, "id": "daac2ece-8e9d-4e1f-b19c-2732d22b2741", "metadata": {}, "outputs": [ { 
"name": "stdout", "output_type": "stream", "text": [ "Test sequence 0\n", "Input => Target ~ Predicted\n", " 0.65 => 0.65 ~ 0.66\n", " 0.89 => 0.89 ~ 0.88\n", " 1.38 => 1.38 ~ 1.37\n", " 0.74 => 1.38 ~ 1.37\n", " -1.00 => 1.38 ~ 1.37\n", " 0.06 => 1.38 ~ 1.38\n", " 0.07 => 1.38 ~ 1.39\n", " -0.98 => 1.38 ~ 1.39\n", " 1.20 => 1.38 ~ 1.39\n", " 0.12 => 1.38 ~ 1.40\n", "Test sequence 1\n", "Input => Target ~ Predicted\n", " -0.08 => -0.08 ~ -0.09\n", " 0.21 => 0.21 ~ 0.21\n", " -0.19 => 0.21 ~ 0.21\n", " -1.47 => 0.21 ~ 0.21\n", " -0.16 => 0.21 ~ 0.19\n", " -1.06 => 0.21 ~ 0.21\n", " -0.36 => 0.21 ~ 0.20\n", " 0.62 => 0.62 ~ 0.59\n", " 0.31 => 0.62 ~ 0.60\n", " -0.30 => 0.62 ~ 0.63\n", "Test sequence 2\n", "Input => Target ~ Predicted\n", " 1.29 => 1.29 ~ 1.32\n", " 0.08 => 1.29 ~ 1.30\n", " 0.76 => 1.29 ~ 1.30\n", " 0.14 => 1.29 ~ 1.31\n", " 0.43 => 1.29 ~ 1.32\n", " -1.59 => 1.29 ~ 1.31\n", " -0.13 => 1.29 ~ 1.32\n", " -0.30 => 1.29 ~ 1.33\n", " -0.64 => 1.29 ~ 1.32\n", " 1.25 => 1.29 ~ 1.35\n", "Test sequence 3\n", "Input => Target ~ Predicted\n", " -0.44 => -0.44 ~ -0.45\n", " 0.62 => 0.62 ~ 0.63\n", " -0.06 => 0.62 ~ 0.64\n", " 1.33 => 1.33 ~ 1.34\n", " 1.23 => 1.33 ~ 1.37\n", " 0.40 => 1.33 ~ 1.37\n", " 0.32 => 1.33 ~ 1.38\n", " -0.11 => 1.33 ~ 1.39\n", " -0.75 => 1.33 ~ 1.38\n", " 0.25 => 1.33 ~ 1.39\n", "Test sequence 4\n", "Input => Target ~ Predicted\n", " -0.10 => -0.10 ~ -0.11\n", " 0.17 => 0.17 ~ 0.17\n", " 0.01 => 0.17 ~ 0.19\n", " 0.66 => 0.66 ~ 0.64\n", " 0.79 => 0.79 ~ 0.80\n", " -0.81 => 0.79 ~ 0.81\n", " -0.85 => 0.79 ~ 0.82\n", " -0.19 => 0.79 ~ 0.83\n", " 0.05 => 0.79 ~ 0.83\n", " 0.55 => 0.79 ~ 0.82\n" ] } ], "source": [ "std::cout << std::fixed << std::setprecision(2); // fixed point printing\n", "for(TI sequence_i = 0; sequence_i < 5; sequence_i++){\n", " std::cout << \"Test sequence \" << sequence_i << std::endl;\n", " std::cout << \"Input => Target ~ Predicted\" << std::endl;\n", " for(TI step_i = 0; step_i < SEQUENCE_LENGTH; step_i++){\n", " 
std::cout << \" \" << std::setw(5) << rlt::get(device, testset_X_permuted, step_i, sequence_i, 0);\n", " std::cout << \" => \" << std::setw(5) << rlt::get(device, testset_y_permuted, step_i, sequence_i, 0);\n", " std::cout << \" ~ \" << std::setw(5) << rlt::get(device, testset_output_permuted, step_i, sequence_i, 0);\n", " std::cout << std::endl;\n", " }\n", "}" ] } ], "metadata": { "kernelspec": { "display_name": "C++17", "language": "C++17", "name": "xcpp17" }, "language_info": { "codemirror_mode": "text/x-c++src", "file_extension": ".cpp", "mimetype": "text/x-c++src", "name": "c++", "version": "17" } }, "nbformat": 4, "nbformat_minor": 5 }