Apply suggestions from code review

Co-authored-by: Michael Stanclift <mx@vmstan.com>
Frigyes 2023-12-05 10:46:54 -08:00 committed by GitHub
parent 62c8d6b32c
commit c7cd653ddd
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
9 changed files with 25 additions and 20 deletions

View File

@@ -22,7 +22,7 @@ Things that need to be backed up in order of importance:
There are two failure types that people in general may guard against: the failure of the hardware, such as data corruption on the disk; and human or software error, such as wrongful deletion of a particular piece of data. In this documentation, only the former type is considered.
A lost PostgreSQL database is completely game over. Mastodon stores all the most important data in the PostgreSQL database. If the database disappears, all the accounts, posts and followers on your server will disappear with it.
Mastodon stores all the most important data in the PostgreSQL database. The loss of the PostgreSQL database will result in the complete failure of the server, including all the accounts, their posts and followers.
If you lose application secrets, some functions of Mastodon will stop working for your users, they will be logged out, two-factor authentication will become unavailable, and Web Push API subscriptions will stop working.
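As a minimal sketch of a backup routine covering both, you could dump the database and copy the secrets file somewhere safe; the paths and database name below are assumptions for a typical source installation:

```bash
# Dump the database in PostgreSQL's custom format, which pg_restore can read.
pg_dump -Fc -U mastodon mastodon_production -f /var/backups/mastodon_production.dump

# Keep a copy of the application secrets alongside the dump.
cp /home/mastodon/live/.env.production /var/backups/env.production.bak
```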

View File

@@ -25,7 +25,7 @@ This is the unique identifier of your server in the network. It cannot be safely
As with `LOCAL_DOMAIN`, `WEB_DOMAIN` cannot be safely changed once set, as this will confuse remote servers that know of your previous settings and may break communication with them or make it unreliable. As the issues lie with remote servers' understanding of your accounts, re-installing Mastodon from scratch will not fix the issue. Therefore, please be extremely cautious when setting up `LOCAL_DOMAIN` and `WEB_DOMAIN`.
To install Mastodon on `mastodon.example.com` in such a way it can serve `@alice@example.com`, set `LOCAL_DOMAIN` to `example.com` and `WEB_DOMAIN` to `mastodon.example.com`. This also requires additional configuration on the server hosting `example.com` to redirect or proxy requests from `https://example.com/.well-known/webfinger` to `https://mastodon.example.com/.well-known/webfinger`. For instance, with nginx, the configuration could look like the following:
To install Mastodon on `mastodon.example.com` in such a way it can serve `@alice@example.com`, set `LOCAL_DOMAIN` to `example.com` and `WEB_DOMAIN` to `mastodon.example.com`. This also requires additional configuration on the server hosting `example.com` to redirect requests from `https://example.com/.well-known/webfinger` to `https://mastodon.example.com/.well-known/webfinger`. For instance, with nginx, the configuration could look like the following:
```
location /.well-known/webfinger {
  # permanent redirect keeps the path and query string intact
  return 301 https://mastodon.example.com$request_uri;
}
```
@@ -84,7 +84,7 @@ As of Mastodon v4.0.0, the web app is now used to render all requests, even for
#### `SINGLE_USER_MODE`
If set to `true, the front page of your Mastodon server will always redirect to the first profile in the database and registrations will be disabled.
If set to `true`, the front page of your Mastodon server will always redirect to the first profile in the database and registrations will be disabled.
#### `DEFAULT_LOCALE`
@@ -203,6 +203,7 @@ Determines the amount of logs generated by Mastodon. Defaults to `info`, which g
#### `TRUSTED_PROXY_IP`
Tells the Mastodon web and streaming processes which IPs act as your trusted reverse proxy (e.g. nginx, Cloudflare). It affects how Mastodon determines the source IP of each request, which is used for important rate limits and security functions. If the value is set incorrectly then Mastodon could use the IP of the reverse proxy instead of the actual source.
By default, the loopback and private network address ranges are trusted. Specifically:
- `127.0.0.1/8`
@@ -310,7 +311,7 @@ Defaults to `5432`.
#### `DB_POOL`
Defies how many database connections to pool in the process. This value should cover every thread in the process, for this reason, it defaults to the value of `MAX_THREADS`.
Defines how many database connections to pool in the process. This value should cover every thread in the process, for this reason, it defaults to the value of `MAX_THREADS`.
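For illustration, the two values could be kept aligned in `.env.production` (the numbers are arbitrary):

```bash
MAX_THREADS=5
DB_POOL=5
```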
#### `DB_SSLMODE`
@@ -726,14 +727,14 @@ If set, registrations will not be possible with any e-mails **except** those fro
If set, registrations will not be possible with any e-mails from the specified domains. Pipe-separated values, e.g.: `foo.com|bar.com`
{{< hint style="warning" >}}
This option is deprecated. You can dynamically block e-mail domains from the admin interface or the `tootctl`` command-line interface.
This option is deprecated. You can dynamically block e-mail domains from the admin interface or the `tootctl` command-line interface.
{{</ hint >}}
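For example, blocking a domain dynamically could look like the following sketch, run from the Mastodon directory:

```bash
RAILS_ENV=production bin/tootctl email_domain_blocks add spam.example
```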
### Sessions
#### `MAX_SESSION_ACTIVATIONS`
Defines how many browser sessions are allowed per user. Defaults to `10`. If a new browser session is created, then the oldest session is deleted, e.g. user in that browser is logged out.
Defines the maximum number of browser sessions allowed per user, which defaults to 10. If a new browser session is created and the limit is exceeded, the oldest session is deleted, resulting in the user being logged out of that session.
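As a sketch, the limit could be raised in `.env.production` (the value is arbitrary):

```bash
MAX_SESSION_ACTIVATIONS=20
```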
### Home feeds

View File

@@ -152,7 +152,7 @@ RAILS_ENV=production bin/tootctl search deploy
## Search optimization for other languages
### Chinese search optimization {#chinese-search-optimization}
The default analyzer of Elasticsearch is the standard analyzer, which may not be the best, especially for Chinese. To improve the search experience, you can install a language-specific analyzer. Before creating the indices in Elasticsearch, install the following Elasticsearch extensions:
The standard analyzer is the default for Elasticsearch, but for some languages like Chinese it may not be the optimal choice. To enhance the search experience, consider installing a language-specific analyzer. Before creating indices in Elasticsearch, be sure to install the following extensions:
- [elasticsearch-analysis-ik](https://github.com/medcl/elasticsearch-analysis-ik)
- [elasticsearch-analysis-stconvert](https://github.com/medcl/elasticsearch-analysis-stconvert)
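Both are installed with Elasticsearch's bundled plugin tool. A sketch, where the release URL is a placeholder that must match your exact Elasticsearch version:

```bash
bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.17.0/elasticsearch-analysis-ik-7.17.0.zip
```

Restart Elasticsearch afterwards so the new analyzer is loaded before you create the indices.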

View File

@@ -58,7 +58,7 @@ yarn set version classic
### Installing Ruby {#installing-ruby}
We will be using rbenv to manage Ruby versions because it's easier to get the right versions and to update once a newer release comes out. rbenv must be installed for a single Linux user, therefore, first, we must create the user Mastodon will be running as:
We will use rbenv to manage Ruby versions as it simplifies obtaining the correct versions and updating them when new releases are available. Since rbenv needs to be installed for an individual Linux user, we must first create the user account under which Mastodon will run:
```bash
adduser --disabled-login mastodon
```
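The rest of the rbenv setup then happens inside that account. A sketch, with the Ruby version as a placeholder to be matched to the Mastodon release you are installing:

```bash
# Switch to the mastodon user and install rbenv and ruby-build.
su - mastodon
git clone https://github.com/rbenv/rbenv.git ~/.rbenv
echo 'export PATH="$HOME/.rbenv/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(rbenv init -)"' >> ~/.bashrc
exec bash

git clone https://github.com/rbenv/ruby-build.git ~/.rbenv/plugins/ruby-build
rbenv install 3.2.3   # placeholder version
rbenv global 3.2.3
```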

View File

@@ -8,7 +8,7 @@ When you are using Mastodon with an object storage provider like Amazon S3, Wasa
- Bandwidth is usually metered and very expensive
- URLs will be broken if you decide to switch providers later
You can instead serve the files from your own domain, caching them in the process. Access patterns on Mastodon are such that **new files are usually accessed simultaneously by a lot of clients** as new posts stream in through the streaming API or as they get distributed through federation; older content is accessed comparatively rarely. For that reason, caching alone would not reduce the bandwidth consumed by your proxy from the actual object storage. To mitigate this, we can use a **cache lock** mechanism that ensures that only one proxy request is made at the same time.
You can choose to serve the files from your own domain, incorporating caching in the process. In Mastodon, access patterns show that new files are often simultaneously accessed by many clients as they appear in new posts via the streaming API or are shared through federation; in contrast, older content is accessed less frequently. Therefore, relying solely on caching won't significantly reduce the bandwidth usage of your proxy from the actual object storage. To address this, we can implement a cache lock mechanism, which ensures that only one proxy request is made at a time.
Here is an example nginx configuration that accomplishes this:

View File

@@ -46,7 +46,7 @@ You must serve the files with CORS headers, otherwise some functions of Mastodon
{{</ hint >}}
{{< hint style="danger" >}}
In any case, your S3 bucket must be configured so that -- ACL configuration notwithstanding -- all objects are publicly readable but neither writable nor listable, while Mastodon itself can write to it. The configuration should be similar for all S3 providers, but common ones have been highlighted below.
Regardless of the ACL configuration, your S3 bucket must be set up to ensure that all objects are publicly readable but not writable or listable. At the same time, Mastodon itself should have write access to the bucket. This configuration is generally consistent across all S3 providers, and common ones are highlighted below.
{{</ hint >}}
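As an illustration of that requirement for AWS-style providers, a bucket policy could grant anonymous users `s3:GetObject` and nothing else, while Mastodon writes with its own credentials; the bucket name is a placeholder:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::your-mastodon-bucket/*"
    }
  ]
}
```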
### MinIO

View File

@@ -71,13 +71,13 @@ Many tasks in Mastodon are delegated to background processing to ensure the HTTP
While the number of threads in the web process affects the responsiveness of the Mastodon instance to the end-user, the number of threads allocated to background processing affects how quickly posts can be delivered from the author to anyone else, how soon e-mails are sent out, etc.
The amount of threads is not controlled by an environment variable in this case, but by a command line argument in the invocation of Sidekiq, e.g.:
The number of threads is not regulated by an environment variable, but rather through a command line argument when invoking Sidekiq, as shown in the following example:
```bash
bundle exec sidekiq -c 15
```
Would start the sidekiq process with 15 threads. Please note that each thread needs to be able to connect to the database, which means that the database pool needs to be large enough to support all the threads. The database pool size is controlled with the `DB_POOL` environment variable and must be at least the same as the number of threads.
This would initiate the Sidekiq process with 15 threads. It's important to note that each thread requires a database connection, necessitating a sufficiently large database pool. The size of this pool is managed by the DB_POOL environment variable, which should be set to a value at least equal to the number of threads.
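For example, the two settings could be paired at invocation time (a sketch):

```bash
DB_POOL=15 bundle exec sidekiq -c 15
```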
#### Queues {#sidekiq-queues}
@@ -100,7 +100,7 @@ bundle exec sidekiq -q default
To run just the `default` queue.
The way Sidekiq works with queues, it first checks for tasks from the first queue, and if there are none, checks the next queue. This means, that if the first queue is overfilled, the other queues will lag behind.
Sidekiq processes queues by first checking for tasks in the first queue, and if it finds none, it then checks the subsequent queue. Consequently, if the first queue is overfilled, tasks in the other queues may experience delays.
As a solution, it is possible to start different Sidekiq processes for the queues to ensure truly parallel execution, by e.g. creating multiple systemd services for Sidekiq with different arguments.
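A second service could look roughly like the following sketch; the paths, user, pool size, and choice of queue are assumptions for a typical source installation:

```ini
# /etc/systemd/system/mastodon-sidekiq-push.service (illustrative)
[Unit]
Description=mastodon-sidekiq-push
After=network.target

[Service]
Type=simple
User=mastodon
WorkingDirectory=/home/mastodon/live
Environment="RAILS_ENV=production"
Environment="DB_POOL=10"
ExecStart=/home/mastodon/.rbenv/shims/bundle exec sidekiq -c 10 -q push
Restart=always

[Install]
WantedBy=multi-user.target
```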
@@ -112,7 +112,7 @@ As a solution, it is possible to start different Sidekiq processes for the queue
If you start running out of available Postgres connections (the default is 100) then you may find PgBouncer to be a good solution. This document describes some common gotchas as well as good configuration defaults for Mastodon.
Note that you can check “PgHero” in the administration view to see how many Postgres connections are currently being used. Typically Mastodon uses as many connections as there are threads in Puma, Sidekiq and the streaming API combined.
User roles with `DevOps` permissions in Mastodon can monitor the current usage of Postgres connections through the PgHero link in the Administration view. Generally, the number of connections open is equal to the total threads in Puma, Sidekiq, and the streaming API combined.
### Installing PgBouncer {#pgbouncer-install}
@@ -202,7 +202,7 @@ Make sure the `pgbouncer` user is an admin:
admin_users = pgbouncer
```
**This next part is very important!** The default pooling mode is session-based, but for Mastodon, we want transaction-based. In other words, a Postgres connection is created when a transaction is created and dropped when the transaction is done. So you'll want to change the `pool_mode` from `session` to `transaction`:
Mastodon requires a different pooling mode than the default session-based one. Specifically, it needs a transaction-based pooling mode. This means that a Postgres connection is established at the start of a transaction and terminated upon its completion. Therefore, it's essential to change the `pool_mode` setting from `session` to `transaction`:
```ini
pool_mode = transaction
```
@@ -287,13 +287,17 @@ Then use `\q` to quit.
## Separate Redis for cache {#redis}
Redis is used widely throughout the application, but some uses are more important than others. Home feeds, list feeds, and Sidekiq queues as well as the streaming API are backed by Redis and that's important data you wouldn't want to lose (even though the loss can be survived, unlike the loss of the PostgreSQL database - never lose that!). However, Redis is also used for volatile cache. If you are at a stage of scaling up where you are worried if your Redis can handle everything, you can use a different Redis database for the cache. In the environment, you can specify `CACHE_REDIS_URL` or individual parts like `CACHE_REDIS_HOST`, `CACHE_REDIS_PORT` etc. Unspecified parts fall back to the same values as without the cache prefix.
Redis plays a vital role in Mastodon, but some uses are more critical than others. Key features like home feeds, list feeds, Sidekiq queues, and the streaming API rely on Redis for important data storage, which you should strive to protect, though its loss is less catastrophic compared to losing the PostgreSQL database.
As far as configuring the Redis database goes, you can basically get rid of background saving to disk, since it doesn't matter if the data gets lost on restart and you can save some disk I/O on that. You can also add a maximum memory limit and a key eviction policy, for that, see this guide: [Using Redis as an LRU cache](https://redis.io/topics/lru-cache)
Additionally, Redis is used for volatile caching. If you're scaling up and concerned about Redis's capacity to handle the load, you can allocate a separate Redis database specifically for caching. To do this, set `CACHE_REDIS_URL` in the environment, or define individual components such as `CACHE_REDIS_HOST`, `CACHE_REDIS_PORT`, etc.
Unspecified components will default to their values without the cache prefix.
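A sketch of what that could look like, assuming a second Redis database is available on the same instance:

```bash
CACHE_REDIS_URL=redis://127.0.0.1:6379/1
```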
When configuring the Redis database for caching, it's possible to disable background saving to disk, as data loss on restart is not critical in this context, and this can save some disk I/O. Additionally, consider setting a maximum memory limit and implementing a key eviction policy. For more details on these configurations, refer to this guide: [Using Redis as an LRU cache](https://redis.io/topics/lru-cache)
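In `redis.conf` terms, that could look like the following sketch (the memory limit is arbitrary):

```
save ""
maxmemory 512mb
maxmemory-policy allkeys-lru
```

Here `save ""` disables the periodic snapshots to disk, and `allkeys-lru` evicts the least recently used keys once the memory limit is reached.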
## Read-replicas {#read-replicas}
To reduce the load on your Postgresql server, you may wish to set up hot streaming replication (read replica). [See this guide for an example](https://cloud.google.com/community/tutorials/setting-up-postgres-hot-standby). You can make use of the replica in Mastodon in these ways:
To reduce the load on your PostgreSQL server, you may wish to set up hot streaming replication (read replica). [See this guide for an example](https://cloud.google.com/community/tutorials/setting-up-postgres-hot-standby). You can make use of the replica in Mastodon in these ways:
- The streaming API server does not issue writes at all, so you can connect it straight to the replica. But it's not querying the database very often anyway so the impact of this is little.
- Use the Makara driver in the web processes, so that writes go to the primary database, while reads go to the replica. Let's talk about that.

View File

@@ -25,7 +25,7 @@ If your database is not using `C` or `POSIX` for its collation setting (which yo
your indexes might be inconsistent if you ever ran with a version of glibc prior to 2.28 and did not immediately reindex your databases after updating to glibc 2.28 or newer.
{{< hint style="info" >}}
You may have found this page because of **PgHero** warnings about "Duplicate Indexes". While such warnings can sometimes be indicative of an issue in deploying or updating Mastodon, **they are not related to database index corruption and are not indicative of any functional issue with your database**.
You may have found this page because of PgHero warnings about "Duplicate Indexes". While such warnings can sometimes be indicative of an issue in deploying or updating Mastodon, **they are not related to database index corruption and are not indicative of any functional issue with your database**.
{{< /hint >}}
You can check whether your indexes are consistent using [PostgreSQL's `amcheck` module](https://www.postgresql.org/docs/10/amcheck.html): as the database server's super user, connect to your Mastodon database and issue the following (this may take a while):
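As a sketch of what such a check could look like (the module's `bt_index_check` function only applies to B-tree indexes):

```sql
CREATE EXTENSION IF NOT EXISTS amcheck;

-- Check every valid B-tree index in the current database.
SELECT bt_index_check(c.oid)
FROM pg_index i
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_am am ON c.relam = am.oid
WHERE am.amname = 'btree' AND i.indisvalid;
```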

View File

@@ -11,7 +11,7 @@ Mastodon is a Ruby on Rails application with a React.js front-end. It follows s
The best way of working with Mastodon in a development environment is installing all the dependencies on your system, rather than using Docker or Vagrant. You need Ruby, Node.js, PostgreSQL and Redis, which is a pretty standard set of dependencies for Rails applications.
Tutorials for installing these dependencies can be found on the “Installing from source” page in the Running Mastodon section of the documentation. Please keep in mind that root access to a machine running Ubuntu 18.04 is required. After following the installation guide in the Running Mastodon section, see the “Setting up a dev environment” page for further instructions on how to configure your environment for development.
Tutorials for installing these dependencies can be found on the “Installing from source” page in the Running Mastodon section of the documentation. After following the installation guide in the Running Mastodon section, see the “Setting up a dev environment” page for further instructions on how to configure your environment for development.
### Environments {#environments}