Improve scaling documentation for Streaming (#1335)

* Improve scaling documentation for Streaming

This reorganizes the documentation a little bit, adds information about the templated systemd files, and some reference numbers for concurrent streaming connections.

* Update content/en/admin/scaling.md

* Add section on Streaming Server metrics to Scaling.md
Emelia Smith 2023-12-06 00:03:47 +01:00 committed by GitHub
parent 66612d0084
commit 472344a483
1 changed file with 25 additions and 8 deletions


@@ -36,17 +36,16 @@ The streaming API handles long-lived HTTP and WebSockets connections, through wh
- `PORT` controls the port the streaming server will listen on, by default 4000. The `BIND` and `SOCKET` environment variables can also be used.
- Additionally, the shared [database](/admin/config#postgresql) and [redis](/admin/config#redis) environment variables are used.
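
For illustration, the relevant lines of `.env.production` might look like the following (the values are examples only; `BIND=127.0.0.1` assumes a reverse proxy such as nginx runs on the same host):

```
# Port and address the streaming server listens on
PORT=4000
BIND=127.0.0.1
```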

The streaming API can use a different subdomain if you want, by setting `STREAMING_API_BASE_URL`; this allows you to have one load balancer for streaming and one for web/API requests. However, this also requires applications to correctly request the streaming URL from the [instance endpoint](/methods/instance/#v2), instead of assuming that it's hosted on the same host as the Web API.
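
For example, you can verify which streaming URL your server advertises to apps (a sketch; `mastodon.example` is a placeholder, `jq` is assumed to be installed, and the output shown is illustrative):

```
$ curl -s https://mastodon.example/api/v2/instance | jq -r '.configuration.urls.streaming'
wss://streaming.mastodon.example
```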

One process of the streaming server can handle a reasonably high number of connections and throughput, but if you find that a single process isn't handling your instance's load, you can run multiple processes by varying the `PORT` number of each, and then use nginx to load balance traffic across those instances. As a reference point, for a community of about 50,000 accounts with 10,000-20,000 monthly active accounts, you'll typically see an average concurrent load of about 800-1200 streaming connections.
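
As a rough sketch of what this looks like outside of systemd (assuming a source install in `/home/mastodon/live`; the `./streaming` entrypoint path may differ between Mastodon versions):

```
$ cd /home/mastodon/live
# Each process listens on its own port; nginx load balances across them
$ PORT=4000 node ./streaming &
$ PORT=4001 node ./streaming &
```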

The streaming server also exposes a [Prometheus](https://prometheus.io/) endpoint on `/metrics` with a number of metrics to help you understand the current load on your Mastodon streaming server. Some key metrics are:

* `mastodon_streaming_connected_clients`: This is the number of connected clients, tagged by client type (websocket or eventsource)
* `mastodon_streaming_connected_channels`: This is the number of "channels" that are currently subscribed (note that this is much higher than connected clients due to how our internal "system" channels currently work)
* `mastodon_streaming_messages_sent_total`: This is the total number of messages sent to clients since the last restart.
* `mastodon_streaming_redis_messages_received_total`: This is the number of messages received from Redis pub/sub, and is intended to complement [monitoring Redis directly](https://sysdig.com/blog/redis-prometheus/).
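
You can spot-check these values from the command line (a sketch; assumes the streaming server is listening locally on port 4000, and the output shown is illustrative):

```
$ curl -s http://localhost:4000/metrics | grep '^mastodon_streaming_connected_clients'
mastodon_streaming_connected_clients{type="websocket"} 812
```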
{{< hint style="info" >}}
The more streaming server processes that you run, the more database connections will be consumed on PostgreSQL, so you'll likely want to use PgBouncer, as documented below.
@@ -63,6 +62,24 @@ upstream streaming {
}
```

If you're using the templated systemd files distributed with Mastodon, you can start multiple streaming server processes with the following commands:
```
$ sudo systemctl start mastodon-streaming@4000.service
$ sudo systemctl start mastodon-streaming@4001.service
$ sudo systemctl start mastodon-streaming@4002.service
```
By default, `sudo systemctl start mastodon-streaming` starts just one process on port 4000, equivalent to running `sudo systemctl start mastodon-streaming@4000.service`.
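
These are standard systemd template units, so you can also enable them to have the additional processes start on boot:

```
$ sudo systemctl enable mastodon-streaming@4001.service
$ sudo systemctl enable mastodon-streaming@4002.service
```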
{{< hint style="warning" >}}
Previous versions of Mastodon had a `STREAMING_CLUSTER_NUM` environment variable that made the streaming server use clustering, which started multiple worker processes and used node.js to load balance them.

This interacted with the other settings in ways that made capacity planning difficult, especially when it came to database connections and CPU resources. By default, the streaming server would consume resources on all available CPUs, which could cause contention with other software running on that server. Another common issue was that misconfiguring `STREAMING_CLUSTER_NUM` would exhaust your database connections by opening a connection pool per cluster worker process: a `STREAMING_CLUSTER_NUM` of `5` and a `DB_POOL` of `10` could consume up to 50 database connections.

Now a single streaming server process will use at most `DB_POOL` PostgreSQL connections, and scaling is handled by running more instances of the streaming server.
{{< /hint >}}
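
To check how many PostgreSQL connections your processes are actually holding, you can query `pg_stat_activity` (a sketch; assumes you can run `psql` as the `postgres` superuser and that Mastodon connects with its own database role):

```
$ sudo -u postgres psql -c "SELECT usename, count(*) FROM pg_stat_activity GROUP BY usename;"
```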
### Background processing (Sidekiq) {#sidekiq}
Many tasks in Mastodon are delegated to background processing to ensure that HTTP requests return quickly, and to prevent aborted HTTP requests from affecting the execution of those tasks. Sidekiq is a single process, with a configurable number of threads.
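
For instance, the thread count is controlled with Sidekiq's `-c` flag (a sketch, not Mastodon's shipped service definition, which may set this differently):

```
$ bundle exec sidekiq -c 25
```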