documentation/content/en/admin/optional/elasticsearch.md

155 lines
5.6 KiB
Markdown
Raw Normal View History

---
title: Full-text search
description: Setting up ElasticSearch to search for statuses authored, favourited, or mentioned in.
menu:
docs:
weight: 10
parent: admin-optional
---
Update content for 4.0 (part 1) (#991) * add rules * join date on profiles * deprecate follow scope * deprecate identity proofs * familiar followers * use definition lists instead of tables for defining activitypub properties * reformat notifications page into markdown * fix broken links to publicKey header * Application website is now nullable * update environment variables added and removed * fix typo * fix heading level * min_id and max_id can be used at the same time (3.3) * fix typo * new tootctl options * reformat tootctl page to use definition lists for params * add rules and configuration to Instance * fix typo * refactor instance api page * 3.3.0 duration on mutes * 3.3.0 mute_expires_at * improve section headings * 3.4.0 resend email confirmation api * 3.4.0 policy on push subscriptions * 3.4.0 add details to account registration error * refactor accounts api page and start adding relrefs to entity pages * 3.4.0 accounts/lookup api * add see also to accounts methods * add more see-also links * 3.5.0 appeal mod decisions * 3.5.0 reformat reports and add category/rule_ids params * document report entity and missing responses * fix typos * fix relrefs and url schema, add aliases to old urls * add archetypes for new methods/entities * update archetypes with see-also stubs * clearer presentation of rate limits * announcements api methods * refactor apps methods * refactor bookmarks methods + some anchors * refactor conversations methods * custom_emojis methods refactor * anchors * refactor directory methods * refactor domain_blocks methods * add see also to emails methods * fix page relref shortcodes to specific methods + refactor endorsements methods * min_id max_id * refactor favourites methods * refactor featured_tags methods * refactor filters methods, make path params consistent, i18n required shortcode * follow_requests methods * lists methods * markers methods * forgot to add entity links * media methods, also fix formatting of some json errors * mutes methods, add more see-also links * oembed methods * preferences methods * proofs methods * push methods * suggestions methods * 3.5.0 add new trend types, fix formatting * refactor streaming methods * refactor oauth methods * note that streaming api casts payload to string * refactor search methods * refactor polls methods * remove unnecessary link * reformat scheduled_statuses methods * reformat timelines methods * reformat statuses methods * 3.5.0 editing statuses * consistent use of array brackets in form data parameters * update dev setup guide, add vagrant and clean up text * add admin/accounts methods * 3.6 role entity * admin/accounts methods v2 * minor fix * stub admin/reports methods * document admin reports * add 403 example to methods archetype * cleanup entities for admin reports and add new attrs * 3.6.0 domain allows methods + normalize admin entity namespace * fix search-and-replace error * add aliases for admin entities * 3.6.0 canonical email blocks entity * 3.5.0 admin/retention api * 3.5.0 add admin::ip doc * 3.5.0 admin/reports * 3.6.0 admin/domain_allows * 3.5.0 admin/dimensions * 3.6.0 permissions and roles * minor formatting fix * add anchor link to headings * checkpoint * add update commands to dev env setup guide * change mentions of v3.6 to v4.0 * tootctl now uses custom roles * fix formatting * v2 instance api * update frontmatter, add better titles to pages * minor wording change * consistency * add more aliases * add placeholders and WIP notices * explain link pagination and stub out todos * switch baseURL to https * 422 on reports with rules but category!=violation * document bug fixes * fix typo * remove duplicate API method definition * s/tootsuite/mastodon for github links * remove unnecessary escaping * s/tootsuite/mastodon in Entity archetype * add missing nullable shortcode * clarify oauth scope when requesting a user token * api/v2/media now synchronous for images * DISALLOW_UNAUTHENTICATED_API_ACCESS * add undocumented env variables * add instance domain blocks and extended description api * add SMTP_ENABLE_STARTTLS * add description to SMTP_ENABLE_STARTTLS * take suggestions from open PRs * normalize links and flavour language * Fully document streaming API based on source code * Add mention of MIME types * bump to ruby 3.0.4 * clarify how to check on async media processing * validation of replies_policy * remove TODOs on admin account action * EmailDomainBlocks * IpBlocks * Admin::DomainBlock * remove TODOs * following hashtags * followed_tags * remove reference to unused parameter * add new oauth scopes for admin blocks and allows * fix command signature for i18n-tasks normalize * reformat code structure page * document fixes for following tags (assume 4.0.3) * Add warning about pre-4.0 hardcoded roles * add note about case sensitivity * remove use of 'simply' from docs * remove reference to silencing * add reference to IDN normalization for verified links * add lang parameter
2022-11-20 07:34:38 +01:00
Mastodon supports full-text search when ElasticSearch is available. Mastodons full-text search allows logged in users to find results from their own statuses, their mentions, their favourites, and their bookmarks. It deliberately does not allow searching for arbitrary strings in the entire database.
## Installing ElasticSearch {#install}
ElasticSearch requires a Java runtime. If you dont have Java already installed, do it now. Assuming you are logged in as `root`:
```bash
apt install openjdk-17-jre-headless
```
Add the official ElasticSearch repository to apt:
```bash
wget -O /usr/share/keyrings/elasticsearch.asc https://artifacts.elastic.co/GPG-KEY-elasticsearch
echo "deb [signed-by=/usr/share/keyrings/elasticsearch.asc] https://artifacts.elastic.co/packages/7.x/apt stable main" > /etc/apt/sources.list.d/elastic-7.x.list
```
Now you can install ElasticSearch:
```bash
apt update
apt install elasticsearch
```
{{< hint style="warning" >}}
**Security warning:** By default, ElasticSearch is supposed to bind to localhost only, i.e. be inaccessible from the outside network. You can check which address ElasticSearch binds to by looking at `network.host` within `/etc/elasticsearch/elasticsearch.yml`. Consider that anyone who can access ElasticSearch can access and modify any data within it, as there is no authentication layer. So its really important that the access is secured. Having a firewall that only exposes the 22, 80 and 443 ports is advisable, as outlined in the [main installation instructions](../../prerequisites/#install-a-firewall-and-only-whitelist-ssh-http-and-https-ports). If you have a multi-host setup, you must know how to secure internal traffic.
{{< /hint >}}
{{< hint style="danger" >}}
**Security warning:** ElasticSearch versions between `2.0` and `2.14.1` are affected by an [exploit](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-44228) in the `log4j` library. If affected, please refer to the [temporary mitigation](https://github.com/elastic/elasticsearch/issues/81618#issuecomment-991000240) from the ElasticSearch issue tracker.
{{< /hint >}}
To start ElasticSearch:
```bash
systemctl daemon-reload
systemctl enable --now elasticsearch
```
## Configuring Mastodon {#config}
Edit `.env.production` to add the following variables:
```bash
ES_ENABLED=true
ES_HOST=localhost
ES_PORT=9200
```
2020-07-26 22:55:38 +02:00
If you have multiple Mastodon servers on the same machine, and you are planning to use the same ElasticSearch installation for all of them, make sure that all of them have unique `REDIS_NAMESPACE` in their configurations, to differentiate the indices. If you need to override the prefix of the ElasticSearch indices, you can set `ES_PREFIX` directly.
2020-07-26 22:55:38 +02:00
After saving the new configuration, restart Mastodon processes for it to take effect:
```bash
systemctl restart mastodon-sidekiq
systemctl reload mastodon-web
```
2020-07-26 22:55:38 +02:00
Now it's time to create the ElasticSearch indices and fill them with data:
```bash
su - mastodon
cd live
2020-07-26 22:55:38 +02:00
RAILS_ENV=production bin/tootctl search deploy
```
## Search optimization for other languages
### Chinese search optimization {#chinese-search-optimization}
2020-07-26 22:55:38 +02:00
The default analyzer of the ElasticSearch is the standard analyzer, which may not be the best especially for Chinese. To improve search experience, you can install a language specific analyzer. Before creating the indices in ElasticSearch, install the following ElasticSearch extensions:
2020-07-26 22:55:38 +02:00
- [elasticsearch-analysis-ik](https://github.com/medcl/elasticsearch-analysis-ik)
- [elasticsearch-analysis-stconvert](https://github.com/medcl/elasticsearch-analysis-stconvert)
2020-07-26 22:55:38 +02:00
And then modify Mastodon's index definition as follows:
```diff
diff --git a/app/chewy/accounts_index.rb b/app/chewy/accounts_index.rb
--- a/app/chewy/accounts_index.rb
+++ b/app/chewy/accounts_index.rb
@@ -4,7 +4,7 @@ class AccountsIndex < Chewy::Index
settings index: { refresh_interval: '5m' }, analysis: {
analyzer: {
content: {
- tokenizer: 'whitespace',
+ tokenizer: 'ik_max_word',
filter: %w(lowercase asciifolding cjk_width),
},
2020-07-26 22:55:38 +02:00
diff --git a/app/chewy/statuses_index.rb b/app/chewy/statuses_index.rb
--- a/app/chewy/statuses_index.rb
+++ b/app/chewy/statuses_index.rb
@@ -16,9 +16,17 @@ class StatusesIndex < Chewy::Index
language: 'possessive_english',
},
},
+ char_filter: {
+ tsconvert: {
+ type: 'stconvert',
+ keep_both: false,
+ delimiter: '#',
+ convert_type: 't2s',
+ },
+ },
analyzer: {
content: {
- tokenizer: 'uax_url_email',
+ tokenizer: 'ik_max_word',
filter: %w(
english_possessive_stemmer
lowercase
@@ -27,6 +35,7 @@ class StatusesIndex < Chewy::Index
english_stop
english_stemmer
),
+ char_filter: %w(tsconvert),
},
},
}
diff --git a/app/chewy/tags_index.rb b/app/chewy/tags_index.rb
--- a/app/chewy/tags_index.rb
+++ b/app/chewy/tags_index.rb
@@ -2,10 +2,19 @@
2020-07-26 22:55:38 +02:00
class TagsIndex < Chewy::Index
settings index: { refresh_interval: '15m' }, analysis: {
+ char_filter: {
+ tsconvert: {
+ type: 'stconvert',
+ keep_both: false,
+ delimiter: '#',
+ convert_type: 't2s',
+ },
+ },
analyzer: {
content: {
- tokenizer: 'keyword',
+ tokenizer: 'ik_max_word',
filter: %w(lowercase asciifolding cjk_width),
+ char_filter: %w(tsconvert),
},
2020-07-26 22:55:38 +02:00
edge_ngram: {
```