Estimating Parameters of the Fractional Ornstein-Uhlenbeck (fOU) Process using LSTM in Rust

by Daniel Boros

Aug 15

In the world of quantitative finance and stochastic processes, accurately estimating parameters is crucial for developing robust models. One such process, the fractional Ornstein-Uhlenbeck (fOU) process, has garnered significant attention due to its applications in various fields, including finance and physics. This article delves into parameter estimation for the fOU process using Long Short-Term Memory (LSTM) models implemented in Rust, leveraging the stochastic-rs and candle packages.

Rust, known for its performance and safety, is an excellent choice for implementing computational models. In this tutorial, we will guide you through setting up an LSTM model in Rust to estimate the parameters of an fOU process. We will utilize the stochastic-rs package to generate the stochastic processes and the candle package to build and train the neural network model.

The stochastic-rs library is a high-performance library written in pure Rust, designed to generate large amounts of data for tasks such as simulating stochastic processes. It is optimized for speed and efficiency, which makes it well suited to computationally intensive simulations. Beyond the fOU process, stochastic-rs implements a variety of other processes, catering to a wide range of stochastic modeling needs.
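
To get a feel for the library, the snippet below generates a single fOU path. This is only a minimal sketch based on how the fou helper is used later in this article; the exact signature and module path may differ between stochastic-rs versions.

use stochastic_rs::diffusions::ou::fou;

fn main() {
    // fou(hurst, mu, sigma, theta, n, x0, t) -- argument order assumed from the
    // data-generation code shown further below in this article.
    let hurst = 0.7; // Hurst exponent H in (0, 1)
    let mu = 2.8; // long-run mean
    let sigma = 1.0; // volatility
    let theta = 5.0; // speed of mean reversion
    let n = 1_600; // number of points per path
    let path: Vec<f64> = fou(hurst, mu, sigma, theta, n, Some(0.0), Some(16.0));
    println!("first values: {:?}", &path[..5]);
}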

By the end of this article, you will have a solid understanding of how to approach parameter estimation for complex stochastic processes using state-of-the-art machine-learning techniques in Rust. Follow along as we explore the intersection of stochastic calculus, machine learning, and high-performance computing.

The full implementation can be found on GitHub: https://github.com/dancixx/rust-ai/tree/main/fou-lstm.

Let's dive in!

The Fractional Ornstein-Uhlenbeck Process

The fractional Ornstein-Uhlenbeck (fOU) process is a stationary, mean-reverting stochastic process described by the following stochastic differential equation:

dX_t = θ (μ − X_t) dt + σ dB_t^H,

where B^H is a fractional Brownian motion with Hurst exponent H ∈ (0, 1), θ > 0 is the speed of mean reversion, μ is the long-run mean, and σ > 0 is the volatility. For H = 1/2 this reduces to the classical Ornstein-Uhlenbeck (Vasicek) process, while other values of H introduce memory into the driving noise. Our goal is to estimate θ from simulated paths whose Hurst exponent varies from path to path.
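
Conceptually, a path of this process can be simulated with an Euler-type scheme once the increments of the fractional Brownian motion are available. The sketch below is purely illustrative and is not the implementation used by stochastic-rs; it assumes you already have a slice of fractional Gaussian noise increments fgn for a grid with step dt.

/// Illustrative Euler discretization of the fOU SDE:
/// X_{k+1} = X_k + theta * (mu - X_k) * dt + sigma * dB_H[k]
fn fou_euler(theta: f64, mu: f64, sigma: f64, x0: f64, dt: f64, fgn: &[f64]) -> Vec<f64> {
    let mut path = Vec::with_capacity(fgn.len() + 1);
    let mut x = x0;
    path.push(x);
    for &db in fgn {
        // The drift pulls the process toward mu; the fGN increment adds the rough noise.
        x += theta * (mu - x) * dt + sigma * db;
        path.push(x);
    }
    path
}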

Data Generation for the Neural Network

Here’s the first code snippet for generating paths for a fractional Ornstein-Uhlenbeck process and preparing them for training a neural network:

use anyhow::Result;
use candle_core::{Device, Tensor};
use candle_datasets::{batcher::IterResult2, Batcher};
use indicatif::{ProgressBar, ProgressStyle};
use ndarray::Array1;
use ndarray_rand::RandomExt;
use rand_distr::Uniform;
use stochastic_rs::diffusions::ou::fou;
use std::vec::IntoIter;

/// Generates synthetic data using the fractional Ornstein-Uhlenbeck (fOU) process.
/// This data will be used to train the LSTM model for parameter estimation.
/// 
/// # Arguments
/// 
/// * `epoch_size` - Number of sample paths to generate per epoch.
/// * `batch_size` - Size of each batch.
/// * `n` - Number of data points per path.
/// * `device` - Device to run the computation on (CPU or GPU).
///
/// # Returns
/// A tuple containing:
/// * A batcher that yields the data in batches for training.
/// * A vector of Hurst parameters corresponding to the generated paths.
pub fn test_vasicek_1_d(
    epoch_size: usize,
    batch_size: usize,
    n: usize,
    device: &Device,
) -> Result<(
    Batcher<IterResult2<IntoIter<Result<(Tensor, Tensor), candle_core::Error>>>>,
    Vec<f64>,
)> {
    // Allocate memory for storing paths
    let mut paths = Vec::with_capacity(epoch_size);
    let mu = 2.8;  // Mean reversion level
    let sigma = 1.0;  // Volatility

    // Generate random thetas and Hurst parameters for each path
    let thetas = Array1::random(epoch_size, Uniform::new(0.0, 10.0)).to_vec();
    let hursts = Array1::random(epoch_size, Uniform::new(0.01, 0.99)).to_vec();
    
    // Create a progress bar to track data generation
    let progress_bar = ProgressBar::new(epoch_size as u64);
    progress_bar.set_style(
        ProgressStyle::with_template(
            "{spinner:.green} [{elapsed_precise}] [{wide_bar:.cyan/blue}] ({eta})",
        )?
        .progress_chars("#>-"),
    );
    
    // Generate paths for the fractional Ornstein-Uhlenbeck process
    for idx in 0..epoch_size {
        let hurst = hursts[idx];  // Get Hurst exponent for this path
        let theta = thetas[idx];  // Get theta for this path

        // Generate a path for the fOU process
        let mut path = Array1::from_vec(fou(hurst, mu, sigma, theta, n, Some(0.0), Some(16.0)));
        
        // Standardize the path by subtracting the mean and dividing by the standard deviation
        let mean = path.mean().unwrap();
        let std = path.std(0.0);
        path = (path - mean) / std;

        // Store the generated path and corresponding theta value
        paths.push(Ok((
            Tensor::from_iter(path, device)?,
            Tensor::new(&[thetas[idx]], device)?,
        )));

        progress_bar.inc(1);  // Update the progress bar
    }
    
    progress_bar.finish();  // Finish the progress bar once done

    // Create a batcher that will provide data in batches during training
    let batcher = Batcher::new_r2(paths.into_iter())
        .batch_size(batch_size)
        .return_last_incomplete_batch(false);

    Ok((batcher, hursts))
}
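
As a quick sanity check (a minimal sketch reusing the function defined above), the returned batcher can be consumed like any iterator and yields (path, theta) tensor pairs:

use candle_core::Device;

fn check_batches() -> anyhow::Result<()> {
    let device = Device::Cpu;
    // 128 paths of 1,600 points each, delivered in batches of 64
    let (batcher, hursts) = test_vasicek_1_d(128, 64, 1_600, &device)?;
    for batch in batcher {
        let (x, target) = batch?; // x: [64, 1600] standardized paths, target: [64, 1] theta values
        println!("x: {:?}, target: {:?}", x.dims(), target.dims());
    }
    println!("generated {} Hurst exponents", hursts.len());
    Ok(())
}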

Building the LSTM Model

To build our model, we use the candle package, which provides powerful tools for constructing neural networks in Rust. The model definition is shown below:

use std::{fs::File, time::Instant};
use candle_core::{DType, Device, Module, Result, Tensor};
use candle_nn::{
    layer_norm, linear, loss::mse, lstm, prelu, seq, AdamW, Dropout, LSTMConfig, LayerNorm,
    LayerNormConfig, Linear, Optimizer, PReLU, ParamsAdamW, Sequential, VarBuilder, VarMap, LSTM,
    RNN,
};
use polars::prelude::*;

/// Struct representing the LSTM model for parameter estimation of the fOU process.
pub struct Model {
    is_train: bool,        // Whether the model is in training mode
    use_dropout: bool,     // Whether to use dropout layers
    linear1: Linear,       // First linear layer
    linear2: Linear,       // Second linear layer
    dropout: Dropout,      // Dropout layer
    prelu: PReLU,          // Parametric ReLU activation function
    lstm: Vec<LSTM>,       // LSTM layers
    layer_norm: LayerNorm, // Layer normalization
    mlp: Sequential,       // Multi-layer perceptron (MLP) for output
}

impl Model {
    /// Initializes a new instance of the model with the given parameters.
    /// 
    /// # Arguments
    /// 
    /// * `vs` - Variable builder to manage the model's parameters.
    /// * `lstm_features` - Number of input features for the LSTM.
    /// * `hidden_dim` - Number of hidden units in the LSTM.
    /// * `out_dim` - Output dimension (size of the prediction).
    /// * `num_lstm_layers` - Number of LSTM layers.
    /// * `use_dropout` - Whether to apply dropout.
    /// * `dropout_rate` - Dropout rate (if dropout is enabled).
    /// 
    /// # Returns
    /// A new instance of `Model`.
    pub fn new(
        vs: VarBuilder,
        lstm_features: usize,
        hidden_dim: usize,
        out_dim: usize,
        num_lstm_layers: Option<usize>,
        use_dropout: Option<bool>,
        dropout_rate: Option<f32>,
    ) -> Result<Self> {
        // Initialize linear layers and activation function
        let linear1 = linear(lstm_features, hidden_dim, vs.pp("linear-1"))?;
        let linear2 = linear(hidden_dim, hidden_dim, vs.pp("linear-2"))?;
        let dropout = Dropout::new(dropout_rate.unwrap_or(0.25));  // Default dropout rate: 25%
        let prelu = prelu(None, vs.pp("prelu"))?;

        // Initialize LSTM layers
        let mut lstm_layers = Vec::with_capacity(num_lstm_layers.unwrap_or(2));
        for i in 0..num_lstm_layers.unwrap_or(2) {
            lstm_layers.push(lstm(
                hidden_dim,
                hidden_dim,
                LSTMConfig {
                    layer_idx: i,
                    ..Default::default()
                },
                vs.pp(&format!("lstm-{}", i)),
            )?);
        }

        // Initialize layer normalization and MLP layers
        let layer_n = layer_norm(hidden_dim, LayerNormConfig::default(), vs.pp("layer-norm"))?;
        let mlp = seq()
            .add(linear(hidden_dim, hidden_dim, vs.pp("mpl-linear-1"))?)
            .add_fn(|x| x.relu())
            .add(linear(hidden_dim, hidden_dim / 2, vs.pp("mpl-linear-2"))?)
            .add_fn(|x| x.relu())
            .add(linear(hidden_dim / 2, out_dim, vs.pp("mpl-linear-3"))?);

        // Return the model instance
        Ok(Self {
            is_train: true,
            use_dropout: use_dropout.unwrap_or(true),
            linear1,
            linear2,
            dropout,
            prelu,
            lstm: lstm_layers,
            layer_norm: layer_n,
            mlp,
        })
    }

    /// Forward pass through the model.
    /// 
    /// # Arguments
    /// 
    /// * `x` - Input tensor.
    /// 
    /// # Returns
    /// The model's prediction.
    pub fn forward(&self, x: &Tensor) -> Result<Tensor> {
        let mut x = x.clone().unsqueeze(1)?;  // Unsqueeze to match LSTM input shape
        x = self.prelu.forward(&x)?;          // Apply PReLU activation
        x = self.linear1.forward(&x)?;        // Apply first linear layer
        x = self.prelu.forward(&x)?;          // Apply PReLU activation again
        x = self.linear2.forward(&x)?;        // Apply second linear layer
        x = self.prelu.forward(&x)?;          // Apply PReLU activation again

        // Apply dropout if in training mode
        if self.use_dropout {
            x = self.dropout.forward(&x, self.is_train)?;
        }

        // Forward pass through each LSTM layer
        for (idx, lstm) in self.lstm.iter().enumerate() {
            if idx > 0 {
                x = x.unsqueeze(1)?;  // Re-add the sequence dimension for subsequent LSTM layers
            }
            let states = lstm.seq(&x)?;  // Get the LSTM hidden states
            x = lstm.states_to_tensor(&states)?;  // Convert hidden states back to a tensor
        }

        // Apply layer normalization
        x = self.layer_norm.forward(&x)?;

        // Apply dropout if in training mode
        if self.use_dropout {
            x = self.dropout.forward(&x, self.is_train)?;
        }

        // Forward pass through the MLP layers
        let out = self.mlp.forward(&x)?;

        Ok(out)  // Return the output tensor
    }

    /// Switch the model to evaluation mode (disable dropout).
    pub fn eval(&mut self) {
        self.is_train = false;
    }
}
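
Before training, it is worth sanity-checking the expected input shape: forward takes a batch of standardized paths of shape [batch_size, lstm_features]. The sketch below builds the model with dummy settings and runs a forward pass on a zero tensor just to print the output shape; it is an illustration, not part of the original code.

use candle_core::{DType, Device, Tensor};
use candle_nn::{VarBuilder, VarMap};

fn shape_check() -> candle_core::Result<()> {
    let device = Device::Cpu;
    let varmap = VarMap::new();
    let vs = VarBuilder::from_varmap(&varmap, DType::F64, &device);
    // 1,600 input features, 64 hidden units, scalar output, 2 LSTM layers, no dropout
    let net = Model::new(vs, 1_600, 64, 1, Some(2), Some(false), Some(0.25))?;
    let x = Tensor::zeros((64, 1_600), DType::F64, &device)?; // dummy batch of 64 paths
    let out = net.forward(&x)?;
    println!("output shape: {:?}", out.dims()); // expected to be [batch_size, out_dim]
    Ok(())
}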

Testing the Model

Now that we have built our model, it's time to train it and evaluate how well it generalizes to unseen data. The test function below runs the training loop with the candle package, then switches the network to evaluation mode, generates a fresh dataset, and writes the true and estimated parameters to a CSV file for later analysis. Here's how we approach it:

pub fn test() -> anyhow::Result<()> {
    // Set up the device (GPU if available, otherwise fall back to CPU)
    let device = Device::cuda_if_available(0).unwrap_or(Device::Cpu);

    // Initialize the variable map and builder to manage the model's parameters
    let varmap = VarMap::new();
    let vs = VarBuilder::from_varmap(&varmap, DType::F64, &device);

    // Set the hyperparameters for the model training and testing
    let epochs = 50_usize;
    let epoch_size = 12_800_usize;
    let lstm_features = 1_600_usize;
    let hidden_dim = 64_usize;
    let out_dim = 1_usize;
    let batch_size = 64;

    // Initialize the LSTM model with the specified architecture
    let mut net = Model::new(
        vs,
        lstm_features,
        hidden_dim,
        out_dim,
        Some(3),        // Use 3 LSTM layers
        Some(false),    // Disable dropout
        Some(0.25),     // Dropout rate of 25% (unused because dropout is disabled)
    )
    .unwrap();

    // Set the optimizer parameters (AdamW)
    let adamw_params = ParamsAdamW {
        lr: 1e-3,       // Learning rate
        beta1: 0.9,     // Beta1 for AdamW optimizer
        beta2: 0.999,   // Beta2 for AdamW optimizer
        eps: 1e-8,      // Epsilon for numerical stability
        weight_decay: 0.01,  // Weight decay for regularization
    };
    let mut opt = AdamW::new(varmap.all_vars(), adamw_params)?;

    let n = 1_600_usize;  // Number of data points per path (must match lstm_features)
    let start = Instant::now();  // Start timing the training process

    // Main training loop for the LSTM model
    for epoch in 0..epochs {
        let (batcher, _) = test_vasicek_1_d(epoch_size, batch_size, n, &device)?;  // Generate data for the current epoch

        // Inner loop to process each batch of data
        'inner: for (batch_idx, batch) in batcher.enumerate() {
            match batch {
                // If the batch was successfully fetched
                Ok((x, target)) => {
                    let inp = net.forward(&x)?;  // Perform a forward pass through the model
                    let loss = mse(&inp, &target)?;  // Calculate the mean squared error (MSE) loss
                    opt.backward_step(&loss)?;  // Perform a backward pass and update model parameters
                    
                    // Print the current loss for this batch
                    println!(
                        "Epoch: {}, Batch: {}, Loss: {:?}",
                        epoch + 1,
                        batch_idx + 1,
                        loss.to_scalar::<f64>()?
                    );
                }
                // If an error occurred, exit the inner loop
                Err(_) => break 'inner,
            }
        }

        // Print the cumulative time elapsed since training started
        println!("Epoch {} took {:?}", epoch + 1, start.elapsed());
    }

    net.eval();  // Switch the model to evaluation mode (disable dropout)

    // Generate new data and test the trained model
    let (batcher, hursts) = test_vasicek_1_d(epoch_size, batch_size, n, &device)?;
    let mut theta = Vec::with_capacity(epoch_size);
    let mut est_theta = Vec::with_capacity(epoch_size);

    // Collect predictions and corresponding true values
    for batch in batcher {
        match batch {
            Ok((x, target)) => {
                let inp = net.forward(&x)?;  // Make predictions using the trained model
                let inp_vec = inp
                    .to_vec2::<f64>()?
                    .into_iter()
                    .flatten()
                    .collect::<Vec<_>>();  // Convert the predictions to a vector
                
                let target_vec = target
                    .to_vec2::<f64>()?
                    .into_iter()
                    .flatten()
                    .collect::<Vec<_>>();  // Convert the true values to a vector
                
                theta.push(target_vec);  // Store the true theta values
                est_theta.push(inp_vec);  // Store the predicted theta values
            }
            Err(_) => break,
        }
    }

    // Flatten the nested vectors into simple lists
    let theta = theta.into_iter().flatten().collect::<Vec<_>>();
    let est_theta = est_theta.into_iter().flatten().collect::<Vec<_>>();

    // Create a DataFrame to store the results (true and predicted values)
    let mut dataframe = df!(
        "alpha" => theta,
        "est_alpha" => est_theta,
        "hurst" => hursts
    )?;

    // Write the DataFrame to a CSV file for analysis
    let writer = File::create("vasicek_hurst=0.01..0.99_alpha=-0.5..10.0_init=0.0_slice=300.csv")?;
    let mut csv_writer = CsvWriter::new(writer);
    csv_writer.finish(&mut dataframe)?;

    Ok(())
}
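
To quantify how well the network recovers theta, a simple aggregate error can be computed from the collected vectors before they are written out. Below is a minimal sketch; mean_abs_error is a hypothetical helper and not part of the original code.

/// Mean absolute error between true and estimated theta values (hypothetical helper).
fn mean_abs_error(theta: &[f64], est_theta: &[f64]) -> f64 {
    theta
        .iter()
        .zip(est_theta)
        .map(|(t, e)| (t - e).abs())
        .sum::<f64>()
        / theta.len() as f64
}

// e.g. println!("MAE: {}", mean_abs_error(&theta, &est_theta));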

This section covered the implementation details and problem setup. The actual measurement results and analysis will be presented in the next part. Stay tuned!