Estimating Parameters of the Fractional Ornstein-Uhlenbeck (fOU) Process using LSTM in Rust
by Daniel Boros
Aug 15
9 min read
93 views
In the world of quantitative finance and stochastic processes, accurately estimating parameters is crucial for developing robust models. One such process, the fractional Ornstein-Uhlenbeck (fOU) process has garnered significant attention due to its applications in various fields, including finance and physics. This article delves into the parameter estimation of the fOU process using Long Short-Term Memory (LSTM) models implemented in Rust, leveraging the stochastic-rs
and candle
packages.
Rust, known for its performance and safety, is an excellent choice for implementing computational models. In this tutorial, we will guide you through setting up an LSTM model in Rust to estimate the parameters of an fOU process. We will utilize the stochastic-rs
package to generate the stochastic processes and the candle
package to build and train the neural network model.
The stochastic-rs
library is a high-performance library implemented in pure Rust, designed to generate large amounts of data for tasks such as simulating stochastic processes. It is optimized for speed and efficiency, making it very handy for handling computationally intensive simulations. Additionally, stochastic-rs
offers a variety of other implementations, catering to a wide range of stochastic modeling needs.
By the end of this article, you will have a solid understanding of how to approach parameter estimation for complex stochastic processes using state-of-the-art machine-learning techniques in Rust. Follow along as we explore the intersection of stochastic calculus, machine learning, and high-performance computing.
The full implementation can be found on GitHub: https://github.com/dancixx/rust-ai/tree/main/fou-lstm.
Let's dive in!
The Fractional Ornstein-Uhlenbeck Process
The fractional Ornstein-Uhlenbeck (fOU) process is a special type of stationary stochastic process described by the following differential equation:
Data Generation for the Neural Network
Here’s the first code snippet for generating paths for a fractional Ornstein-Uhlenbeck process and preparing them for training a neural network:
use anyhow::Result;
use candle_core::{Device, Tensor};
use candle_datasets::{batcher::IterResult2, Batcher};
use indicatif::{ProgressBar, ProgressStyle};
use ndarray::{Array1};
use ndarray_rand::RandomExt;
use rand_distr::Uniform;
use stochastic_rs::diffusions::ou::fou;
use std::vec::IntoIter;
/// Generates synthetic data using the fractional Ornstein-Uhlenbeck (fOU) process.
/// This data will be used to train the LSTM model for parameter estimation.
///
/// # Arguments
///
/// * `epoch_size` - Number of epochs to run.
/// * `batch_size` - Size of each batch.
/// * `n` - Number of data points per path.
/// * `device` - Device to run the computation on (CPU or GPU).
///
/// # Returns
/// A tuple containing:
/// * A batcher that yields the data in batches for training.
/// * A vector of Hurst parameters corresponding to the generated paths.
pub fn test_vasicek_1_d(
epoch_size: usize,
batch_size: usize,
n: usize,
device: &Device,
) -> Result<(
Batcher<IterResult2<IntoIter<Result<(Tensor, Tensor), candle_core::Error>>>>,
Vec<f64>,
)> {
// Allocate memory for storing paths
let mut paths = Vec::with_capacity(epoch_size);
let mu = 2.8; // Mean reversion level
let sigma = 1.0; // Volatility
// Generate random thetas and Hurst parameters for each path
let thetas = Array1::random(epoch_size, Uniform::new(0.0, 10.0)).to_vec();
let hursts = Array1::random(epoch_size, Uniform::new(0.01, 0.99)).to_vec();
// Create a progress bar to track data generation
let progress_bar = ProgressBar::new(epoch_size as u64);
progress_bar.set_style(
ProgressStyle::with_template(
"{spinner:.green} [{elapsed_precise}] [{wide_bar:.cyan/blue}] ({eta})",
)?
.progress_chars("#>-"),
);
// Generate paths for the fractional Ornstein-Uhlenbeck process
for idx in 0..epoch_size {
let hurst = hursts[idx]; // Get Hurst exponent for this path
let theta = thetas[idx]; // Get theta for this path
// Generate a path for the fOU process
let mut path = Array1::from_vec(fou(hurst, mu, sigma, theta, n, Some(0.0), Some(16.0)));
// Standardize the path by subtracting the mean and dividing by the standard deviation
let mean = path.mean().unwrap();
let std = path.std(0.0);
path = (path - mean) / std;
// Store the generated path and corresponding theta value
paths.push(Ok((
Tensor::from_iter(path, device)?,
Tensor::new(&[thetas[idx]], device)?,
)));
progress_bar.inc(1); // Update the progress bar
}
progress_bar.finish(); // Finish the progress bar once done
// Create a batcher that will provide data in batches during training
let batcher = Batcher::new_r2(paths.into_iter())
.batch_size(batch_size)
.return_last_incomplete_batch(false);
Ok((batcher, hursts))
}
Building the LSTM Model
To build our model, we use the candle
package. This package provides powerful tools for constructing neural networks in Rust. Below, we describe the model building process using candle
:
use std::{fs::File, time::Instant};
use candle_core::{DType, Device, Module, Result, Tensor};
use candle_nn::{
layer_norm, linear, loss::mse, lstm, prelu, seq, AdamW, Dropout, LSTMConfig, LayerNorm,
LayerNormConfig, Linear, Optimizer, PReLU, ParamsAdamW, Sequential, VarBuilder, VarMap, LSTM,
RNN,
};
use polars::prelude::*;
/// Struct representing the LSTM model for parameter estimation of the fOU process.
pub struct Model {
is_train: bool, // Whether the model is in training mode
use_dropout: bool, // Whether to use dropout layers
linear1: Linear, // First linear layer
linear2: Linear, // Second linear layer
dropout: Dropout, // Dropout layer
prelu: PReLU, // Parametric ReLU activation function
lstm: Vec<LSTM>, // LSTM layers
layer_norm: LayerNorm, // Layer normalization
mlp: Sequential, // Multi-layer perceptron (MLP) for output
}
impl Model {
/// Initializes a new instance of the model with the given parameters.
///
/// # Arguments
///
/// * `vs` - Variable builder to manage the model's parameters.
/// * `lstm_features` - Number of input features for the LSTM.
/// * `hidden_dim` - Number of hidden units in the LSTM.
/// * `out_dim` - Output dimension (size of the prediction).
/// * `num_lstm_layers` - Number of LSTM layers.
/// * `use_dropout` - Whether to apply dropout.
/// * `droput_rate` - Dropout rate (if dropout is enabled).
///
/// # Returns
/// A new instance of `Model`.
pub fn new(
vs: VarBuilder,
lstm_features: usize,
hidden_dim: usize,
out_dim: usize,
num_lstm_layers: Option<usize>,
use_dropout: Option<bool>,
droput_rate: Option<f32>,
) -> Result<Self> {
// Initialize linear layers and activation function
let linear1 = linear(lstm_features, hidden_dim, vs.pp("linear-1"))?;
let linear2 = linear(hidden_dim, hidden_dim, vs.pp("linear-2"))?;
let dropout = Dropout::new(droput_rate.unwrap_or(0.25)); // Default dropout rate: 25%
let prelu = prelu(None, vs.pp("prelu"))?;
// Initialize LSTM layers
let mut lstm_layers = Vec::with_capacity(num_lstm_layers.unwrap_or(2));
for i in 0..num_lstm_layers.unwrap_or(2) {
lstm_layers.push(lstm(
hidden_dim,
hidden_dim,
LSTMConfig {
layer_idx: i,
..Default::default()
},
vs.pp(&format!("lstm-{}", i)),
)?);
}
// Initialize layer normalization and MLP layers
let layer_n = layer_norm(hidden_dim, LayerNormConfig::default(), vs.pp("layer-norm"))?;
let mlp = seq()
.add(linear(hidden_dim, hidden_dim, vs.pp("mpl-linear-1"))?)
.add_fn(|x| x.relu())
.add(linear(hidden_dim, hidden_dim / 2, vs.pp("mpl-linear-2"))?)
.add_fn(|x| x.relu())
.add(linear(hidden_dim / 2, out_dim, vs.pp("mpl-linear-3"))
?);
// Return the model instance
Ok(Self {
is_train: true,
use_dropout: use_dropout.unwrap_or(true),
linear1,
linear2,
dropout,
prelu,
lstm: lstm_layers,
layer_norm: layer_n,
mlp,
})
}
/// Forward pass through the model.
///
/// # Arguments
///
/// * `x` - Input tensor.
///
/// # Returns
/// The model's prediction.
pub fn forward(&self, x: &Tensor) -> Result<Tensor> {
let mut x = x.clone().unsqueeze(1)?; // Unsqueeze to match LSTM input shape
x = self.prelu.forward(&x)?; // Apply PReLU activation
x = self.linear1.forward(&x)?; // Apply first linear layer
x = self.prelu.forward(&x)?; // Apply PReLU activation again
x = self.linear2.forward(&x)?; // Apply second linear layer
x = self.prelu.forward(&x)?; // Apply PReLU activation again
// Apply dropout if in training mode
if self.use_dropout {
x = self.dropout.forward(&x, self.is_train)?;
}
// Forward pass through each LSTM layer
for (idx, lstm) in self.lstm.iter().enumerate() {
if idx > 0 {
x = x.unsqueeze(1)?; // Adjust dimensions for subsequent layers
}
let states = lstm.seq(&x)?; // Get the LSTM hidden states
x = lstm.states_to_tensor(&states)?; // Convert hidden states back to a tensor
}
// Apply layer normalization
x = self.layer_norm.forward(&x)?;
// Apply dropout if in training mode
if self.use_dropout {
x = self.dropout.forward(&x, self.is_train)?;
}
// Forward pass through the MLP layers
let out = self.mlp.forward(&x)?;
Ok(out) // Return the output tensor
}
/// Switch the model to evaluation mode (disable dropout).
pub fn eval(&mut self) {
self.is_train = false;
}
}
Testing the Model
Now that we have built our model, it’s time to evaluate its performance. Testing the model involves running it on a separate dataset to see how well it generalizes to new data. We’ll use the candle-rs
package to facilitate this process. Here's how we approach the testing phase:
pub fn test() -> anyhow::Result<()> {
// Set up the device (GPU if available, otherwise fall back to CPU)
let device = Device::cuda_if_available(0).unwrap_or(Device::Cpu);
// Initialize the variable map and builder to manage the model's parameters
let varmap = VarMap::new();
let vs = VarBuilder::from_varmap(&varmap, DType::F64, &device);
// Set the hyperparameters for the model training and testing
let epochs = 50_usize;
let epoch_size = 12_800_usize;
let lstm_features = 1_600_usize;
let hidden_dim = 64_usize;
let out_dim = 1_usize;
let batch_size = 64;
// Initialize the LSTM model with the specified architecture
let mut net = Model::new(
vs,
lstm_features,
hidden_dim,
out_dim,
Some(3), // Use 3 LSTM layers
Some(false), // Disable dropout during evaluation
Some(0.25), // Dropout rate of 25% (ignored in evaluation mode)
)
.unwrap();
// Set the optimizer parameters (AdamW)
let adamw_params = ParamsAdamW {
lr: 1e-3, // Learning rate
beta1: 0.9, // Beta1 for AdamW optimizer
beta2: 0.999, // Beta2 for AdamW optimizer
eps: 1e-8, // Epsilon for numerical stability
weight_decay: 0.01, // Weight decay for regularization
};
let mut opt = AdamW::new(varmap.all_vars(), adamw_params)?;
let n: usize = 1600_usize; // Number of data points per path
let start = Instant::now(); // Start timing the training process
// Main training loop for the LSTM model
for epoch in 0..epochs {
let (batcher, _) = test_vasicek_1_d(epoch_size, batch_size, n, &device)?; // Generate data for the current epoch
// Inner loop to process each batch of data
'inner: for (batch_idx, batch) in batcher.enumerate() {
match batch {
// If the batch was successfully fetched
Ok((x, target)) => {
let inp = net.forward(&x)?; // Perform a forward pass through the model
let loss = mse(&inp, &target)?; // Calculate the mean squared error (MSE) loss
opt.backward_step(&loss)?; // Perform a backward pass and update model parameters
// Print the current loss for this batch
println!(
"Epoch: {}, Batch: {}, Loss: {:?}",
epoch + 1,
batch_idx + 1,
loss.to_scalar::<f64>()?
);
}
// If an error occurred, exit the inner loop
Err(_) => break 'inner,
}
}
// Print the time taken to complete this epoch
println!("Epoch {} took {:?}", epoch + 1, start.elapsed());
}
net.eval(); // Switch the model to evaluation mode (disable dropout)
// Generate new data and test the trained model
let (batcher, hursts) = test_vasicek_1_d(epoch_size, batch_size, n, &device)?;
let mut theta = Vec::with_capacity(epoch_size);
let mut est_theta = Vec::with_capacity(epoch_size);
// Collect predictions and corresponding true values
for batch in batcher {
match batch {
Ok((x, target)) => {
let inp = net.forward(&x)?; // Make predictions using the trained model
let inp_vec = inp
.to_vec2::<f64>()?
.into_iter()
.flatten()
.collect::<Vec<_>>(); // Convert the predictions to a vector
let target_vec = target
.to_vec2::<f64>()?
.into_iter()
.flatten()
.collect::<Vec<_>>(); // Convert the true values to a vector
theta.push(target_vec); // Store the true theta values
est_theta.push(inp_vec); // Store the predicted theta values
}
Err(_) => break,
}
}
// Flatten the nested vectors into simple lists
let theta = theta.into_iter().flatten().collect::<Vec<_>>();
let est_theta = est_theta.into_iter().flatten().collect::<Vec<_>>();
// Create a DataFrame to store the results (true and predicted values)
let mut dataframe = df!(
"alpha" => theta,
"est_alpha" => est_theta,
"hurst" => hursts
)?;
// Write the DataFrame to a CSV file for analysis
let writer = File::create("vasicek_hurst=0.01..0.99_alpha=-0.5..10.0_init=0.0_slice=300.csv")?;
let mut csv_writer = CsvWriter::new(writer);
csv_writer.finish(&mut dataframe)?;
Ok(())
}
This section covered the implementation details and problem setup. The actual measurement results and analysis will be presented in the next part. Stay tuned!