Let's say we want to send a message to user B. The backend reads from Redis that user B is connected to gateway node 1, but it uses the hash of user B as the partition key to dispatch the message. How is the information from the Redis store used here? Doesn't it seem redundant if Kafka is taking care of all the routing?
I think this is a really good observation, and I had a similar doubt while going through this design. Let's look at the distinction between Kafka routing and Redis routing.
Let's assume we remove Redis and do all the routing from just hash(UserB) = 42.
Kafka sends the message to partition 42, and GW1 consumes it.
Now UserB disconnects from GW1 and reconnects to GW3.
Kafka still sends the message to partition 42 (since hash(UserB) doesn't change), and it will still be consumed by GW1, but GW1 no longer holds the websocket connection to UserB; it now lives on GW3.
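The staleness can be seen in a minimal sketch, using crc32 as a stand-in for Kafka's default partitioner (the partition count and names are illustrative):

```python
from zlib import crc32

NUM_PARTITIONS = 64

def partition_for(user_id: str) -> int:
    # Stand-in for Kafka's partitioner: hash(key) % num_partitions.
    return crc32(user_id.encode()) % NUM_PARTITIONS

# The partition depends only on the user id, so it is identical
# before and after UserB reconnects to a different gateway.
before = partition_for("UserB")
# ... UserB disconnects from GW1 and reconnects to GW3 ...
after = partition_for("UserB")

assert before == after  # Kafka keeps sending to the same partition
```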
So we are using Redis as the source of truth for which gateway actually holds the websocket connection for a particular user.
Now, since UserB is connected to GW3, both GW1 and GW3 will consume from partition 42 (until the TTL expires and removes the stale entry for GW1). Since GW1 won't have the websocket connection to UserB, it will just drop the message.
GW3, on the other hand, will find the websocket connection for UserB and relay the message successfully.
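The drop-vs-relay behaviour on the gateways can be sketched as a small simulation, with plain dicts standing in for each gateway's local websocket registry (all names here are illustrative):

```python
# Per-gateway websocket maps: only the gateway holding the live
# connection can actually deliver the message.
gw1_sockets = {}                      # UserB has disconnected from GW1
gw3_sockets = {"UserB": "ws-conn-7"}  # UserB reconnected to GW3

delivered = []

def handle(gateway_sockets, gateway_name, user_id, message):
    ws = gateway_sockets.get(user_id)
    if ws is None:
        # No local connection: this gateway just drops the message.
        return f"{gateway_name}: no local connection, dropping"
    delivered.append((gateway_name, user_id, message))
    return f"{gateway_name}: relayed over {ws}"

# Both gateways consume the same partition until GW1's stale
# subscription expires (the TTL mentioned above).
handle(gw1_sockets, "GW1", "UserB", "hi")  # drops
handle(gw3_sockets, "GW3", "UserB", "hi")  # relays

assert delivered == [("GW3", "UserB", "hi")]
```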
I have tried to explain the same in this section - https://sahilserver.substack.com/i/180933369/what-happens-after-reconnect
Since this requires flexible routing, my two cents is to use RabbitMQ instead of Kafka. We can read from the Redis store which gateway is responsible for the user; once we have the gateway ID, we can use it as the routing key to enqueue the message to RabbitMQ. The message will then be pushed to the respective gateway over a coalesced connection, with batching.
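As a sketch of that flow (a pure-Python simulation: the `presence` dict stands in for Redis, and `queues` stands in for a direct exchange where each gateway's queue is bound with its own ID as the binding key; with real RabbitMQ this would be a `basic_publish` with `routing_key=gateway_id`):

```python
from collections import defaultdict

presence = {"UserB": "gw3"}  # Redis stand-in: user -> current gateway id
queues = defaultdict(list)   # exchange stand-in: routing key -> queue

def send_to_user(user_id, message):
    gateway_id = presence[user_id]  # 1. look up the gateway in "Redis"
    # 2. publish with routing_key=gateway_id; only that gateway's
    #    queue receives the message.
    queues[gateway_id].append((user_id, message))

send_to_user("UserB", "hello")
assert queues["gw3"] == [("UserB", "hello")]
assert "gw1" not in queues  # nothing enqueued for any other gateway
```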
I agree, we can use RabbitMQ.
Also, I think we can do a similar thing with Kafka: instead of using user_id as the partition key, we use gateway_id. That way we can avoid messages being dropped after disconnects.
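The gateway_id-as-key variant can be sketched the same way (again a simulation with illustrative names; partition assignment details are glossed over): the producer consults the presence store first, so the key, and hence the partition, follows the user across reconnects.

```python
from zlib import crc32

NUM_PARTITIONS = 8
presence = {"UserB": "gw1"}  # Redis stand-in: user -> current gateway

def produce(user_id, message):
    # Key by gateway id, not user id: the target partition now tracks
    # wherever the presence entry says the user is connected.
    key = presence[user_id]
    partition = crc32(key.encode()) % NUM_PARTITIONS
    return key, partition

k1, _ = produce("UserB", "hi")
presence["UserB"] = "gw3"    # UserB reconnects to GW3
k2, _ = produce("UserB", "hi again")

assert (k1, k2) == ("gw1", "gw3")  # later messages target the new gateway
```

The trade-off is an extra Redis read on every produce, but no gateway ever consumes messages for connections it doesn't hold.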