title | description | published | date | tags |
---|---|---|---|---|
Technology | Stuff about Discord's internal systems, with some undocumented situations as well | true | 2020-01-22 09:18:13 UTC | |
Technologies Discord uses to provide the service as we know it, gathered from Discord's engineering articles and some other stuff:
- Cassandra for storage (they used MongoDB in the first 2 months of Discord, source).
- Elixir for the `sessions`, `presence` and `guild` clusters (Gateway API, source).
- Python for the HTTP/REST API.
- Go for the embed servers and one element of their logging (see Punt).
- Loqui for node communication.
- Punt, used instead of Logstash, for logging.
- Elasticsearch, which powers the search feature for users and also powers logging.
  - Sources about logging: Punt and this issue
  - Source about message search
  - There is an article in the works that describes Discord's logging in detail, Soon:tm:
- Google Cloud Platform for their infrastructure, source.
- Cloudflare as a proxy to their nodes.
All the clusters were made to be fault tolerant (read: "not crash"). They handle the cases where one of their nodes goes out or where another cluster goes down, and they wait a bit before requesting from that cluster again. Some incidents where this came into play:
- Description of a full outage where they had to reboot everything
- `sessions` and `presence` clusters get rebooted due to a host error in a `guild` node
- Repeating message sends due to errors in the `push` cluster
- "furiously" spinning an nginx cluster due to an error in GCP's load balancer
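The "wait a bit before requesting from the cluster again" behaviour can be sketched roughly as below. This is only an illustration: the real clusters are written in Elixir, and the names `ClusterUnavailable` and `request_with_backoff`, as well as the choice of exponential delays, are assumptions made for the example, not something Discord has documented.

```python
import time


class ClusterUnavailable(Exception):
    """Hypothetical error raised when the target cluster cannot answer."""


def request_with_backoff(call, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Run call(); after each failure, wait a growing delay before retrying."""
    for attempt in range(attempts):
        try:
            return call()
        except ClusterUnavailable:
            if attempt == attempts - 1:
                raise  # out of attempts, let the caller deal with it
            # Delays of 0.5s, 1s, 2s, ... give the cluster room to recover
            # (the doubling is an assumption; the source only says "a bit").
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without actually waiting.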
- `sessions` cluster
  - Manages multiple sessions per node in the cluster (as Elixir allows it).
  - Relationships:
    - Requests from the `guild` cluster to gather guild data for your `READY` event.
    - Requests from the `presence` cluster for the same reason.
- `presence` cluster
  - Manages your "playing status", smaller than `sessions`.
  - Gets a high throughput since users join in and request presence data, and users go out and write presence data.
- `guild` cluster
  - Manages real time state for guilds and guild data.
- `push` cluster
  - Manages push notifications to users.
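To picture the relationships between the clusters described above, here is a very simplified sketch of a session gathering data for `READY`. Everything in it is hypothetical: `build_ready`, `fetch_guild` and `fetch_presences` are made-up names standing in for requests to the `guild` and `presence` clusters, and the payload is cut down to two fields for illustration.

```python
def build_ready(user_id, guild_ids, fetch_guild, fetch_presences):
    """Assemble a simplified READY payload from the other clusters' answers."""
    # One request per guild to the (hypothetical) guild cluster client.
    guilds = [fetch_guild(guild_id) for guild_id in guild_ids]
    # One request to the (hypothetical) presence cluster client.
    presences = fetch_presences(user_id)
    return {"t": "READY", "d": {"guilds": guilds, "presences": presences}}
```

In this picture the session depends on both other clusters to answer before it can send `READY`, which matches the relationships listed for the `sessions` cluster.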
There is some stuff regarding the resume logic for clients:
- Session nodes have a message deque that gets filled with each event your client receives.
  - If you want to resume from a sequence number that isn't in the deque, your session gets invalidated.
- When Discord goes down, you might get a `PRESENCES_REPLACE` event.
  - That is sent when your session node notices your client is lagging behind new presence updates.
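The deque-based resume check described above can be sketched as follows. The class name `Session`, its methods and the buffer size are assumptions for illustration; the actual implementation lives in Discord's Elixir session nodes and is not public.

```python
from collections import deque


class Session:
    """Toy session that buffers the most recent events for resuming."""

    def __init__(self, max_events=3):
        self.seq = 0
        # A bounded deque drops the oldest event once it is full.
        self.events = deque(maxlen=max_events)

    def dispatch(self, payload):
        """Record an event sent to the client and return its sequence number."""
        self.seq += 1
        self.events.append((self.seq, payload))
        return self.seq

    def resume(self, last_seq):
        """Return the missed payloads, or None if the session is invalidated."""
        if last_seq not in [seq for seq, _ in self.events]:
            return None  # sequence number fell out of the deque
        return [payload for seq, payload in self.events if seq > last_seq]
```

With a buffer of three events and five dispatched, resuming from sequence 4 replays only the fifth event, while resuming from sequence 1 returns `None`, because that number is no longer in the deque.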