Fully document S3 object storage. (#1393)

This rewrite documents all supported environment variables for the S3
object storage system, and in addition documents the way that Mastodon
constructs URLs that it hands to clients (for them to obtain objects
from the storage provider).

The documentation of the variables lives entirely in the object-storage page
now, instead of being mixed between that page and the main config page. A link
to the object-storage page has been added to the config page.
Kevin P. Fleming 2024-02-20 10:10:11 -05:00 committed by GitHub
parent 1d8f602210
commit bdf33a15f2
2 changed files with 171 additions and 39 deletions


@@ -535,50 +535,39 @@ Example value: `https://assets.example.com`
You must serve the files with CORS headers, otherwise some functions of Mastodon's web UI will not work. For example, `Access-Control-Allow-Origin: *`
{{</ hint >}}
#### `S3_ALIAS_HOST`
Similar to `CDN_HOST`, you may serve _user-uploaded_ files from a separate host. If you are using external storage like Amazon S3, MinIO or Google Cloud, you will by default be serving files from those services' URLs.
It is _strongly recommended_ to use your own host instead, for a few reasons:
1. Bandwidth on external storage providers is metered and expensive
2. You may want to switch to a different provider later without breaking old links
Example value: `files.example.com`
{{< page-ref page="admin/optional/object-storage-proxy" >}}
{{< hint style="info" >}}
You must serve the files with CORS headers, otherwise some functions of Mastodon's web UI will not work. For example, `Access-Control-Allow-Origin: *`
{{</ hint >}}
### Local file storage {#paperclip}
#### `PAPERCLIP_ROOT_PATH`
#### `PAPERCLIP_ROOT_URL`
### AWS S3 and compatible {#s3}
{{< page-ref page="admin/optional/object-storage" >}}
#### `S3_ENABLED`
#### `S3_REGION`
#### `S3_ENDPOINT`
#### `S3_BUCKET`
#### `AWS_ACCESS_KEY_ID`
#### `AWS_SECRET_ACCESS_KEY`
#### `S3_SIGNATURE_VERSION`
#### `S3_OVERRIDE_PATH_STYLE`
#### `S3_PROTOCOL`
#### `S3_HOSTNAME`
#### `S3_ALIAS_HOST`
#### `S3_OPEN_TIMEOUT`
@@ -586,18 +575,18 @@ You must serve the files with CORS headers, otherwise some functions of Mastodon
#### `S3_FORCE_SINGLE_REQUEST`
#### `S3_PERMISSION`
Defines the S3 object ACL when uploading new files. Default is `public-read`. Use caution when using [S3 Block Public Access](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-control-block-public-access.html) and turning on the `BlockPublicAcls` option, as uploading objects with ACL `public-read` will fail (403). In that case, set `S3_PERMISSION` to `private`.
#### `S3_ENABLE_CHECKSUM_MODE`
#### `S3_STORAGE_CLASS`
#### `S3_MULTIPART_THRESHOLD`
#### `S3_BATCH_DELETE_LIMIT`
The official [Amazon S3 API](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html) can handle deleting 1,000 objects in one batch job, but some providers may have issues handling this many in one request, or offer lower limits. Defaults to `1000`.
#### `S3_BATCH_DELETE_RETRY`
During batch delete operations, S3 providers may periodically fail or time out while processing deletion requests. Mastodon will back off and retry the request up to the maximum number of times. Defaults to `3`.
### Swift {#swift}
#### `SWIFT_ENABLED`


@@ -29,15 +29,77 @@ The web server must be configured to serve those files but not allow listing the
Mastodon can use S3-compatible object storage backends. ACL support is recommended as it allows Mastodon to quickly make the content of temporarily suspended users unavailable, or marginally improve the security of private data.
On Mastodon's end, you need to configure the following environment variables:
- `S3_ENABLED=true`
- `S3_BUCKET=mastodata` (replacing `mastodata` with the name of your bucket)
- `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` need to be set to your credentials
- `S3_ALIAS_HOST` is optional but highly recommended in order to set up a caching proxy and avoid locking yourself into a specific provider
- `S3_REGION`
- `S3_HOSTNAME` (optional if you use Amazon AWS)
- `S3_PERMISSION` (optional, if you use a provider that does not support ACLs or want to use custom ACLs)
- `S3_FORCE_SINGLE_REQUEST=true` (optional, if you run into trouble processing large files)
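The checklist above can be sketched as a minimal `.env.production` fragment. All values here are placeholders (bucket name, credentials, and alias host are not real); adjust them for your provider:

```shell
# Minimal S3 configuration sketch -- placeholder values only.
S3_ENABLED=true
S3_BUCKET=mastodata
AWS_ACCESS_KEY_ID=AKIAEXAMPLEKEY
AWS_SECRET_ACCESS_KEY=examplesecretkey
S3_REGION=us-east-1
# Optional but recommended: serve media through your own host.
S3_ALIAS_HOST=files.example.com
# Uncomment if large uploads fail with your provider:
# S3_FORCE_SINGLE_REQUEST=true
```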
Mastodon uses the S3 API (`S3_REGION`, `S3_ENDPOINT`, `S3_BUCKET`,
`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `S3_SIGNATURE_VERSION`,
`S3_OVERRIDE_PATH_STYLE`) for all write, delete, and
permissions-modification operations. This includes media uploads (from
the web interface, from Mastodon API clients, and from ActivityPub
servers), media deletion (when a post is edited or deleted), and
blocking access to media (when an account is suspended).
Mastodon sends URLs to the web interface, Mastodon API clients, and
ActivityPub servers for all 'read' operations. As a result those
operations are anonymous (no authentication or authorization needed)
and use plain HTTP GET methods, which means they can be routed through
reverse proxies and CDNs, and can be cached. It also means that those
URLs can contain host/domain names which are entirely different from
those used by the S3 storage provider itself, if desired. See the
detailed documentation below which describes how those URLs are
constructed and which environment variables are involved.
To enable S3 storage, set the `S3_ENABLED` environment variable to `true`.
### Environment variables for S3 API access
- `S3_REGION` (defaults to `us-east-1`; required if using AWS S3, may
not be required with other storage providers)
- `S3_ENDPOINT` (defaults to `s3.<S3_REGION>.amazonaws.com`; required
if not using AWS S3)
- `S3_BUCKET=mastodata` (replacing `mastodata` with the name of your
bucket)
- `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` need to be set to
your credentials
- `S3_SIGNATURE_VERSION` (defaults to `v4`; should be compatible with
most storage providers)
- `S3_OVERRIDE_PATH_STYLE` (only used if `S3_ENDPOINT` is configured;
set this to `true` if the storage provider requires API operations
to be sent to `<S3_BUCKET>.<S3_ENDPOINT>` (domain-style))
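For a non-AWS, S3-compatible provider, the API-access variables above might look like the following sketch. The endpoint, bucket, and credentials are hypothetical placeholders, not a real provider:

```shell
# S3 API access against a hypothetical non-AWS, S3-compatible provider.
S3_ENDPOINT=https://objects.provider.example
S3_REGION=us-east-1             # many providers accept any region value
S3_BUCKET=mastodata
AWS_ACCESS_KEY_ID=exampleaccesskey
AWS_SECRET_ACCESS_KEY=examplesecret
S3_SIGNATURE_VERSION=v4
# Only if the provider requires domain-style requests:
# S3_OVERRIDE_PATH_STYLE=true
```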
### Environment variables for client access to media objects
- `S3_PROTOCOL` (defaults to `https`)
- `S3_HOSTNAME` (defaults to `s3-<S3_REGION>.amazonaws.com`; required
if not using AWS S3 and `S3_ALIAS_HOST` is not set)
- `S3_ALIAS_HOST` (can be used instead of `S3_HOSTNAME` if you do not
want `S3_BUCKET` to be included in the media URLs, and requires that
you have provisioned a reverse proxy or CDN in front of the storage
provider)
As noted above, Mastodon will send URLs to clients when they need to
access media objects from the storage provider. The URLs are
constructed as follows:
- If `S3_ALIAS_HOST` is not set, the URL will be
`<S3_PROTOCOL>://<S3_HOSTNAME>/<S3_BUCKET>/<object path>`
- If `S3_ALIAS_HOST` is set, the URL will be
`<S3_PROTOCOL>://<S3_ALIAS_HOST>/<object path>`
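The two rules above can be expressed as a small shell sketch. This is illustrative only; the `media_url` function and the sample values are not part of Mastodon:

```shell
# Sketch of how Mastodon assembles media URLs (not Mastodon's actual code).
S3_PROTOCOL="https"
S3_HOSTNAME="s3-us-east-1.amazonaws.com"
S3_BUCKET="mastodata"
S3_ALIAS_HOST=""   # set to e.g. "files.example.com" to exercise the alias branch

media_url() {
  # $1 is the object path within the bucket
  if [ -n "$S3_ALIAS_HOST" ]; then
    # Alias host set: the bucket name does NOT appear in the URL
    printf '%s://%s/%s\n' "$S3_PROTOCOL" "$S3_ALIAS_HOST" "$1"
  else
    # No alias host: the bucket name appears in the URL path
    printf '%s://%s/%s/%s\n' "$S3_PROTOCOL" "$S3_HOSTNAME" "$S3_BUCKET" "$1"
  fi
}

media_url "media_attachments/files/1/original/a.png"
```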
It is important to note that when `S3_ALIAS_HOST` is set, the bucket
name is **not** included in the generated URL; this means the bucket
name must be included in `S3_ALIAS_HOST` (referred to as
'domain-style' object access), or that `S3_ALIAS_HOST` must point to a
reverse proxy or CDN which can include the bucket name in the URL it
uses to send the request onward to the storage provider. This type of
configuration allows you to 'hide' the usage of the storage provider
from the instance's clients, which means you can change storage
providers without changing the resulting URLs.
In addition to hiding the usage of the storage provider, this can also
allow you to cache the media after retrieval from the storage
provider, reducing egress bandwidth costs from the storage
provider. This can be done in your own reverse proxy, or by using a
CDN.
{{< page-ref page="admin/optional/object-storage-proxy" >}}
@@ -45,10 +107,91 @@ On Mastodon's end, you need to configure the following environment variables:
You must serve the files with CORS headers, otherwise some functions of Mastodon's web UI will not work. For example, `Access-Control-Allow-Origin: *`
{{</ hint >}}
### Optional environment variables
#### `S3_OPEN_TIMEOUT`
Default: 5 (seconds)
The number of seconds before the HTTP handler should time out while trying to open a new HTTP session.
#### `S3_READ_TIMEOUT`
Default: 5 (seconds)
The number of seconds before the HTTP handler should time out while waiting for an HTTP response.
#### `S3_FORCE_SINGLE_REQUEST`
Default: false
Set this to `true` if you run into trouble processing large files.
#### `S3_ENABLE_CHECKSUM_MODE`
Default: false
Enables verification of object checksums when Mastodon is retrieving
an object from the storage provider. This feature is available in AWS
S3 but may not be available in other S3-compatible implementations.
#### `S3_STORAGE_CLASS`
Default: none
When using AWS S3, this variable can be set to one of the [storage
class](https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-class-intro.html)
options which influence the storage selected for uploaded objects (and
thus their access times and costs). If no storage class is specified
then AWS S3 will use the `STANDARD` class, but options include
`REDUCED_REDUNDANCY`, `GLACIER`, and others.
#### `S3_MULTIPART_THRESHOLD`
Default: 15 (megabytes)
Objects of this size and smaller will be uploaded in a single
operation, but larger objects will be uploaded using the multipart
chunking mechanism, which can improve transfer speeds and reliability.
#### `S3_PERMISSION`
Default: `public-read`
Defines the S3 object ACL when uploading new files. Use caution when
using [S3 Block Public
Access](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-control-block-public-access.html)
and turning on the `BlockPublicAcls` option, as uploading objects with
ACL `public-read` will fail (403). In that case, set `S3_PERMISSION`
to `private`.
{{< hint style="danger" >}}
Regardless of the ACL configuration, your S3 bucket must be set up to ensure that all objects are publicly readable but not writable or listable. At the same time, Mastodon itself should have write access to the bucket. This configuration is generally consistent across all S3 providers, and common ones are highlighted below.
{{</ hint >}}
#### `S3_BATCH_DELETE_LIMIT`
Default: `1000`
The official [Amazon S3
API](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html)
can handle deleting 1,000 objects in one batch job, but some providers
may have issues handling this many in one request, or offer lower
limits.
#### `S3_BATCH_DELETE_RETRY`
Default: 3
During batch delete operations, S3 providers may periodically fail or
time out while processing deletion requests. Mastodon will back off and
retry the request up to this maximum number of times.
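Taken together, the optional variables might be tuned like this for a slower or more restrictive provider. The values are illustrative, not recommendations; check what limits your provider actually supports:

```shell
# Illustrative tuning for a provider with slower responses, no ACL
# support, and a lower batch-delete limit -- placeholder values only.
S3_OPEN_TIMEOUT=10
S3_READ_TIMEOUT=10
S3_BATCH_DELETE_LIMIT=500
S3_BATCH_DELETE_RETRY=5
S3_PERMISSION=private
```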
### MinIO
MinIO is an open-source implementation of an S3-compatible object storage provider. This section does not cover how to install it, but rather how to configure a bucket for use with Mastodon.