Switch to a Rust Backend Without Giving Up WebSocket and Subscription Benefits for Heavy SQL Queries
by Daniel Boros
Mar 27
8 min read
In my spare time, I’m building a fitness startup where the core product is a mobile application. This app provides science-based workouts and programs to help users progress and track their results. Of course, this post isn’t about why it’s clearly the best fitness app out there 😂, but rather about how and why we ended up building a custom Rust backend for certain services.
The Backstory
Everything started with Firebase/Firestore as our NoSQL database. It quickly became apparent that, for a project like this, it could turn into a death sentence on the tech side relatively early on. Around that time, I was already eyeing Hasura (https://hasura.io), which basically gives you a GraphQL API with zero backend development required. And seriously: you write practically no backend code and get an excellent cloud service that's production-ready, scalable, and open source. We've run quite a few projects with Hasura as the backend layer, and it handled almost every task without issues.
Naturally, as your project grows and your database becomes more complex, performance inevitably degrades. The only real fixes are reorganizing your database or adding custom code, possibly with hand-written SQL queries.
For over three years, we served our mobile app via Hasura, but the time came when the single WebSocket-driven feature in the app became too complex and started to affect app startup. What exactly caused the problem? As more features migrated into the app, more computed fields were added. These are basically SQL functions tied to certain tables, and they get called on every query. There's a fairly detailed explanation on Hasura's GitHub (https://github.com/hasura/graphql-engine/blob/master/architecture/live-queries.md) about how this works. Long story short, Hasura uses a polling technique, because SQL notifiers can eventually cause performance degradation at the database level.
There are many ways to optimize this, but perhaps the simplest is to switch from WebSockets to plain queries. In our case, though, that would have required such a huge refactor and so many code modifications that we chose a different approach: creating a Rust backend using async-graphql that still serves the WebSocket connection but implements field-level caching under the hood.
Before diving into the details, let’s take a look at the stack we’re using.
Our Rust Stack
Naturally, the language is Rust. We've used async-graphql several times and find it to be a great open source project, so our choice here wasn't difficult. For SQL handling, we're using the bb8 async connection pool and tokio-postgres as our native SQL driver. We could have chosen SQLx or Diesel.rs, but we wanted maximum performance with minimal unsafe Rust code.
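For reference, here's a minimal sketch of what that pool setup looks like with bb8 and tokio-postgres. The connection string and pool size are placeholders; the real service reads its configuration from the environment.
use bb8::Pool;
use bb8_postgres::PostgresConnectionManager;
use tokio_postgres::NoTls;

// Build a bb8 pool over tokio-postgres. The connection string and pool size
// are illustrative values, not what we run in production.
async fn build_pg_pool() -> Result<Pool<PostgresConnectionManager<NoTls>>, tokio_postgres::Error> {
    let manager = PostgresConnectionManager::new_from_stringlike(
        "host=localhost user=postgres dbname=noexapp",
        NoTls,
    )?;
    Pool::builder().max_size(16).build(manager).await
}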
Something else that might pique interest is that we’re using Dragonfly DB instead of Redis for caching. Its API is 100% identical to Redis, so if you’re familiar with Redis, you just have to host a different Docker image.
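Because it speaks the Redis wire protocol, the standard redis crate works against Dragonfly unchanged; only the connection URL points at the Dragonfly container. A minimal sketch (host, key, and TTL are made up):
use redis::AsyncCommands;

// Talk to Dragonfly exactly as you would talk to Redis; only the URL differs.
async fn cache_roundtrip() -> redis::RedisResult<()> {
    let client = redis::Client::open("redis://dragonfly:6379")?;
    let mut conn = client.get_multiplexed_async_connection().await?;

    // SET with a 60-second TTL, then read the value back.
    let _: () = conn.set_ex("user:42:total_workouts", 128, 60).await?;
    let cached: Option<i64> = conn.get("user:42:total_workouts").await?;
    println!("cached: {:?}", cached);
    Ok(())
}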
Performance Tweaks
- Everything is, of course, async and multithreaded.
- Instead of the general-purpose serde/serde_json crates, we use simd_json for JSON operations, which utilizes SIMD instructions. Since the project runs on Google Cloud, the architectural requirements are met, and we can enjoy these performance benefits.
- We also use tikv-jemalloc as the global memory allocator, partly because it's recommended for use with SIMD operations and partly because several benchmarks showed immediate performance gains.
- The last noteworthy tweak is the compiler configuration in the release profile (see the sketch after this list).
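As a rough illustration of those last two points, the allocator swap is a two-liner in Rust, and the release profile lives in Cargo.toml. The crate version and flag values below are assumptions, not a copy of our settings.
// Assumed Cargo.toml additions (illustrative values):
//
//   [dependencies]
//   tikv-jemallocator = "0.6"
//
//   [profile.release]
//   opt-level = 3
//   lto = "fat"
//   codegen-units = 1
//
// Route every heap allocation through jemalloc.
use tikv_jemallocator::Jemalloc;

#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;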
The Open-Source Starter
We strongly believe in building open source communities, and Rust is where it is today thanks to many people who share similar philosophies. We’ve open-sourced the template used by this project (somewhat stripped down from our actual backend): https://github.com/rust-dd/rust-axum-async-graphql-postgres-redis-starter. We won’t dwell too long on this part; instead, we’ll focus on the subscription piece.
The Subscription Part
First, we have a User struct that contains many fields. Importantly, it has 26 computed fields, which must be run on every poll:
#[dynamic_graphql]
#[skip_serializing_none]
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
#[serde(rename_all = "snake_case")]
pub struct User {
pub id: String,
pub email: Option<String>,
pub gender: Option<String>,
pub name: Option<String>,
pub created_at: Option<i64>,
// ...
}
Here, the dynamic_graphql macro is necessary to dynamically handle the selection set in the subscription, because we don't want to query all data from the database. Instead, we build the SQL query based on the incoming GraphQL query's selection set.
// All computed fields defined on the User type.
let computed_fields_all = User::computed_fields();
// Keep only the computed fields the client actually selected.
let selected_computed_fields =
    extract_computed_selection_fields(context.look_ahead(), &computed_fields_all);
// Build the column list for the main user query from the selection set.
let fields = join_selection_fields(
    context.look_ahead(),
    "select_user_by_id",
    &[User::non_table_fields(), selected_computed_fields.clone()].concat(),
)
.unwrap();

// Pair each selected computed field with its SQL function.
let computed_field_fns_all = User::computed_field_fns();
let mut selected_fields = Vec::new();
let mut selected_fns = Vec::new();
for field in &selected_computed_fields {
    if let Some((i, _)) = computed_fields_all
        .iter()
        .enumerate()
        .find(|(_, f)| *f == field)
    {
        selected_fields.push(computed_fields_all[i].clone());
        selected_fns.push(computed_field_fns_all[i].clone());
    }
}

// One query for the physical columns, one for the computed (virtual) columns.
let user_query = format!("SELECT {} FROM noexapp.users WHERE id = $1", fields);
let computed_fields_query = format!(
    "SELECT {} FROM noexapp.users u WHERE u.id = $1",
    selected_fns
        .iter()
        .map(|f| format!("{}(u)", f))
        .collect::<Vec<_>>()
        .join(", ")
);
In this snippet, we create a query for the user table based on the selection set. Because these computed fields aren't physically part of the table (they're effectively "virtual columns"), they must be handled separately at the database level with a separate query (computed_fields_query).
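To make that concrete: if a client selected only id, name, and two computed fields (the field names here are made up for illustration), the two format! calls above would produce statements roughly like these:
// Hypothetical output of the two generated queries for a small selection set.
let user_query = "SELECT id, name FROM noexapp.users WHERE id = $1";
let computed_fields_query =
    "SELECT total_workouts(u), current_streak(u) FROM noexapp.users u WHERE u.id = $1";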
Subscription Resolver Logic
The basic idea behind the subscription resolver is to do field-level caching and run queries on a specific "tick." The resolver returns a Pin<Box<dyn Stream<Item = FieldResult<User>> + Send + '_>>, which is required by async-graphql for a streaming subscription. This makes sense logically.
To avoid sending unnecessary data each time, we use a tokio::mpsc channel to drive the data flow; its receiver can be wrapped into a type that implements the Stream trait (tokio-stream's ReceiverStream) and therefore meets that requirement.
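Putting those pieces together, the resolver skeleton looks roughly like the sketch below. load_user is a placeholder for the database-plus-cache logic shown in the rest of this section, and the one-second tick is an assumption.
use std::time::Duration;

use async_graphql::{Context, FieldResult, Subscription};
use futures_util::Stream;
use tokio_stream::wrappers::ReceiverStream;

pub struct UserSubscription;

#[Subscription]
impl UserSubscription {
    async fn user_by_id(
        &self,
        _ctx: &Context<'_>,
        id: String,
    ) -> impl Stream<Item = FieldResult<User>> {
        let (tx, rx) = tokio::sync::mpsc::channel(8);

        tokio::spawn(async move {
            let mut ticker = tokio::time::interval(Duration::from_secs(1));
            let mut last_user: Option<User> = None;
            loop {
                ticker.tick().await;
                // Placeholder: run the generated SQL and merge in the cached
                // computed fields (see the snippets below).
                let user = load_user(&id).await;
                // Only push when the state actually changed since the last tick.
                if Some(&user) != last_user.as_ref() {
                    if tx.send(Ok(user.clone())).await.is_err() {
                        break; // the client went away
                    }
                    last_user = Some(user);
                }
            }
        });

        // ReceiverStream turns the mpsc receiver into a `Stream`.
        ReceiverStream::new(rx)
    }
}

// Hypothetical helper standing in for the DB query + cache merge.
async fn load_user(_id: &str) -> User {
    unimplemented!()
}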
If the previously cached data is the same as the new data, we skip sending it, so the client doesn’t receive pointless notifications from the server that it would then need to process.
if let Ok(Some(mut cache_val)) = cache.get::<_, Option<String>>(&cache_key).await {
    if let Ok(user) = unsafe { simd_json::serde::from_str::<User>(&mut cache_val) } {
        if Some(&user) != last_user.as_ref() {
            if tx.send(Ok(user.clone())).await.is_err() {
                break;
            }
            last_user = Some(user);
        }
    }
}
Since we’re not returning at this point but using a channel, the code continues and lets us handle additional logic:
User::update_computed_fields::<i32>(&db, &cache_pool, id.clone(), cf_stmt.clone(), &selected_fields)
.await
This ensures that the computed fields are always updated in the cache so that, on the next iteration, the data is already there for comparison. Let's look at what update_computed_fields does:
pub async fn update_computed_fields<T>(
    db: &PgPoolConnection<'_>,
    cache: &DfPool,
    user_id: String,
    stmt: Arc<Statement>,
    selected_fields: &[String],
) -> Result<HashMap<String, T>>
where
    T: Debug
        + Serialize
        + DeserializeOwned
        + FromSqlOwned
        + FromRedisValue
        + ToRedisArgs
        + Clone
        + Send
        + Sync
        + 'static,
{
    let mut map = HashMap::new();
    if let Ok(row) = db.query_one(&*stmt, &[&user_id]).await {
        let mut cache = cache.get().await?;
        for (i, field_name) in selected_fields.iter().enumerate() {
            // One cache entry per computed field: "user:<id>:<field>".
            let cache_key = format!("user:{}:{}", user_id, field_name);
            let value = row.get::<_, T>(i);
            map.insert(field_name.clone(), value.clone());
            // Write-through with a TTL; cache failures are deliberately ignored.
            let _ = cache
                .set_options::<_, T, ()>(
                    &cache_key,
                    value,
                    SetOptions::default().with_expiration(SetExpiry::EX(DEFAULT_CACHE_EXPIRATION)),
                )
                .await;
        }
    }
    Ok(map)
}
Here, we cache the new values pulled from the database and return them in a HashMap so the caller can build the final output for the next state of the stream. We made the method fully generic so we could handle practically any field type. Typically, the result of a computed field is a primitive value (int, float, string); for more complex fields, a VIEW or MATERIALIZED VIEW is often more flexible and a better choice.
When the result comes back, we assemble the final next state. If it matches the previous state, we again skip pushing it into the channel, and the client doesn’t get a redundant notification.
if let Ok(row) = db.query_one(&*u_stmt, &[&id]).await {
    let pg_user = User::from(&row);
    // Convert the row into a JSON object so the computed fields can be merged in.
    let mut user = simd_json::serde::to_string(&pg_user).unwrap();
    let mut user = unsafe { simd_json::serde::from_str::<Value>(&mut user).unwrap() };
    if let Some(user) = user.as_object_mut() {
        for (key, value) in computed_fields.unwrap().iter() {
            user.insert(key.clone().into(), json!(value).into());
        }
    }
    // Deserialize the merged JSON back into a typed User.
    let mut merged = simd_json::serde::to_string(&user).unwrap();
    let user = unsafe { simd_json::serde::from_str::<User>(&mut merged).unwrap() };
    // Cache the assembled state for the next tick.
    let _ = cache
        .set_options::<&str, String, ()>(
            &cache_key,
            simd_json::serde::to_string(&user).unwrap(),
            SetOptions::default().with_expiration(SetExpiry::EX(DEFAULT_CACHE_EXPIRATION)),
        )
        .await;
    // Only notify the client when something actually changed.
    if Some(&user) != last_user.as_ref() {
        if tx.send(Ok(user.clone())).await.is_err() {
            break;
        }
        last_user = Some(user);
    }
}
A Note on simd_json
It's not always worth using simd_json, especially if you deal with very small payloads: the overhead of setting up SIMD operations can actually cost you performance in such cases. Where it was justified, we squeezed out a bit more by moving the cache write into a separate task via tokio::spawn, so it doesn't affect the server-client communication, as sketched below.
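Continuing the merge snippet above, the off-the-hot-path cache write looks roughly like this. The cache pool handle, key, and expiration constant are the ones from the surrounding code; the exact shape is a sketch rather than our production code.
// Serialize once, then hand the write off to a background task so tx.send(...)
// to the client is never blocked by the cache round trip.
let payload = simd_json::serde::to_string(&user).unwrap();
let cache_pool = cache_pool.clone();
let cache_key = cache_key.clone();
tokio::spawn(async move {
    if let Ok(mut cache) = cache_pool.get().await {
        let _ = cache
            .set_options::<&str, String, ()>(
                &cache_key,
                payload,
                SetOptions::default().with_expiration(SetExpiry::EX(DEFAULT_CACHE_EXPIRATION)),
            )
            .await;
    }
});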
Was It Worth It?
One might wonder whether all this work was worth it for a single subscription. For us, the answer is clearly yes. The app's startup time was heavily influenced by establishing the WebSocket connection and retrieving the first data. With Hasura, this took 2-4 seconds in some cases, but with the new approach we're approaching the theoretical limits again: everything runs within the same cloud infrastructure over internal networks, data mostly comes from the cache, and updates happen in the background.
Initial tests show the first response arriving in 200-400 ms. Part of that is because our servers are deployed in a distant region, so we could likely shave off more by choosing a closer one.
Before and After
To wrap things up beyond raw text, numbers, and code, here are two videos showing the before and after states:
Final Thoughts – Worth Every Bit
So yeah… Rust can sometimes hurt.
The compiler nags you, the lifetimes confuse you, and "why is this not Send?" becomes a daily mantra.
But at the end of the day – it's so, so worth it.
Our users immediately felt the performance improvements.
The app feels snappier, the subscriptions are faster, and our servers?
Yeah – we're spending less.
This short Rust journey brought real, measurable benefits.
So to all fellow Rustaceans out there:
Chin up. Keep shipping. Happy coding. 🦀❤️