System design: Scaling WebSocket Applications
Author: Dinh Nguyen Truong
Real-time applications are everywhere nowadays; they play a crucial role wherever we want to interact with people directly: chat apps, online meetings, and gaming.
Building a reliable real-time application is hard, but scaling it is even harder. A real-time application has to maintain persistent connections with its clients, so scaling it is different from scaling a normal web app. In this post, we will explore those challenges, solve them, and build a demo using Docker.
How WebSocket applications normally work
Typically, many clients connect to the server over WebSocket connections. Communication flows both ways: clients send events to the server, and the server can push events back to the clients.
This server is stateful: it has to keep all connection information in memory, so if it goes down, all connections are lost.
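As a minimal sketch of this setup, here is a single stateful Socket.IO server (Socket.IO is what the demo uses later; the `hello`/`welcome` event names are just illustrative):

```ts
import { Server } from "socket.io";

// A standalone Socket.IO server listening on port 3000.
const io = new Server(3000);

io.on("connection", (socket) => {
  // Every connected socket lives in this process's memory;
  // if the process dies, all of these connections die with it.
  console.log(`client connected: ${socket.id}`);

  // client -> server
  socket.on("hello", () => {
    // server -> client
    socket.emit("welcome", { at: Date.now() });
  });
});
```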
The problems when scaling
A WebSocket server can become overloaded faster than a traditional HTTP server because it must maintain all connection information. Both vertical and horizontal scaling can be applied to increase performance. While vertical scaling involves upgrading CPU and adding more RAM, I’ll focus on horizontal scaling.
So, horizontal scaling: just add more servers behind a load balancer, and we're done, right?
This is true for HTTP servers, but WebSocket-based applications face some problems:
Connection consistency: if client A establishes a WebSocket connection to Server 1, all subsequent messages from that client must be handled by the same server. If the load balancer routes a later request to Server 2, that server knows nothing about the connection state, causing errors.
Cross-client communication: if client A and client B connect to different servers, how do these two clients exchange messages in real time? They can't, because each server keeps its own connections and state, isolated from the rest of the cluster.
Design to scale
So, let’s solve the above problems:
Connection consistency
We can solve this problem quite easily with a proper load-balancing algorithm. What we need here is for each client to stick to its server for a whole session, or even better, across all subsequent sessions. IP hash is a great fit: the load balancer chooses the server based on the hash of the client's IP address, so as long as the client's IP stays the same for the whole session (which it should), the client will always connect to the same server.
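As a concrete sketch, here is roughly what the `nginx.conf` used in the demo below could look like, assuming three upstream apps named `app-1`, `app-2`, and `app-3` listening on port 80 (the same names the Compose file uses later):

```nginx
events {}

http {
  upstream websocket_backend {
    ip_hash; # route each client IP to the same upstream server
    server app-1:80;
    server app-2:80;
    server app-3:80;
  }

  server {
    listen 80;

    location / {
      proxy_pass http://websocket_backend;
      # headers required to upgrade the HTTP connection to WebSocket
      proxy_http_version 1.1;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection "upgrade";
      proxy_set_header Host $host;
    }
  }
}
```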
Cross-client communication
We need a way to broadcast messages from one client to the rest. But the real problem is one level up: we need a way to broadcast messages from one server to the rest of the cluster.
The simplest way to solve this is to use a message queue.
So now, if client A wants to send a message to client B, it sends it to its connected server, Server 1.
Server 1 publishes the message to the message queue, which delivers it to Server 2.
Since Server 2 holds the connection to client B, it sends the message to client B.
The finalized design looks like this:
The message queue service could be anything: RabbitMQ, Kafka, or Redis. They all support clustering for scalability.
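Before reaching for a library, it helps to see the pattern itself. Here is a minimal sketch using Redis pub/sub with ioredis (the `chat` channel name and the `deliverToLocalClients` helper are hypothetical, just to show the flow):

```ts
import { Redis } from "ioredis";

// Each server keeps one connection for publishing and one for subscribing.
const pub = new Redis("redis://redis-1:6379");
const sub = pub.duplicate();

// Hypothetical helper: forward a message to the WebSocket
// connections this particular server holds.
function deliverToLocalClients(msg: { room: string; message: string }) {
  // ...emit to this server's own connected clients...
}

// Every server subscribes to a shared channel...
sub.subscribe("chat");
sub.on("message", (_channel, raw) => {
  // ...and forwards each published message to its local clients.
  deliverToLocalClients(JSON.parse(raw));
});

// When a client sends a message, its server publishes it for the whole cluster.
function broadcast(msg: { room: string; message: string }) {
  pub.publish("chat", JSON.stringify(msg));
}
```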
Demo with a chat app
My demo infrastructure contains 3 WebSocket apps, 1 Redis instance, and 1 nginx server.
docker-compose.yml
```yaml
name: socket-io-app
services:
  app-1:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: socket-io-app-1
    environment:
      - REDIS_URL=redis://redis-1:6379
    ports:
      - "3000:80"
  app-2:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: socket-io-app-2
    environment:
      - REDIS_URL=redis://redis-1:6379
    ports:
      - "3001:80"
  app-3:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: socket-io-app-3
    environment:
      - REDIS_URL=redis://redis-1:6379
    ports:
      - "3002:80"
  redis-1:
    image: redis:7.4
    container_name: redis-1
  nginx:
    image: nginx:latest
    container_name: nginx
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - app-1
      - app-2
      - app-3
```
The apps' exposed ports (3000-3002) are only for testing: all requests sent from the local machine share the same IP, so nginx's IP hash would route them all to the same server. Connecting to each app's port directly lets us place clients on different servers.
Setup socket.io
Setting up socket.io with Redis is easy with the official Redis adapter (@socket.io/redis-adapter). This adapter broadcasts events to all clients, or a subset of them, across every server.
```ts
import { Redis } from "ioredis";
import { Server } from "socket.io";
import { createAdapter } from "@socket.io/redis-adapter";

// A Redis connection in subscriber mode can't publish,
// so the adapter needs two separate connections.
const pubClient = new Redis(process.env.REDIS_URL!);
const subClient = pubClient.duplicate();

// The adapter relays events between Socket.IO servers through Redis pub/sub.
const io = new Server({
  adapter: createAdapter(pubClient, subClient),
});

io.listen(80);
```
For example, take the code below. Say client 1 connects to server 1 and joins room 1, then client 2 connects to server 2 and joins room 1. When client 1 sends a message to room 1, both clients will receive it.
And if a client 3 connected to server 3 sends a message to all, all 3 clients will receive it.
socket.on("joinRoom", (data) => {
socket.join(data.room);
console.log(`Socket ${socket.id} joined room ${data.room}`);
});
socket.on("message", (data) => {
console.log(
`User ${data.userId} sent message to room ${data.room}: ${data.message}`
);
io.to(data.room).emit("message", {
userId: data.userId,
message: data.message,
});
});
// send to all client across clusters
socket.on("all", (data) => {
console.log(`Socket ${socket.id} sent message to all: ${data.message}`);
io.emit("all", {
message: data.message,
});
});
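If you'd rather test from code than from a GUI client, here is a minimal client sketch using socket.io-client (the room name and message payload are made up, but the event names match the handlers above):

```ts
import { io } from "socket.io-client";

// Connect to one specific app instance (port 3000, 3001, or 3002)
// to simulate clients landing on different servers.
const socket = io("http://localhost:3000");

socket.on("connect", () => {
  socket.emit("joinRoom", { room: "room-1" });
  socket.emit("message", { userId: "user-1", room: "room-1", message: "hello" });
});

socket.on("message", (data) => {
  console.log(`room message from ${data.userId}: ${data.message}`);
});

socket.on("all", (data) => {
  console.log(`broadcast: ${data.message}`);
});
```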
This is my test in Postman: 3 clients connected to the 3 servers, with the corresponding timeline. Seems to work as expected, huh?
I hope you found something valuable in this post. Cheers!