Commit Graph

66 Commits

Author SHA1 Message Date
Daniel Sockwell daf7d1ae7f
Add tests for polling for multiple messages (#149)
* Add tests for polling for multiple messages

This commit adds a mock Redis interface and adds tests that poll the
mock interface for multiple messages at a time.  These tests test that
Flodgatt is robust against receiving incomplete messages, including if
the message break results in receiving invalid UTF8.

* Remove temporary files
2020-05-07 10:56:11 -04:00
Daniel Sockwell 4a13412f98
Improve handling of large Redis input (#143)
* Implement faster buffered input

This commit implements a modified ring buffer for input from Redis.
Specifically, Flodgatt now limits the amount of data it fetches from
Redis in one syscall to 8 KiB (two pages on most systems). Flodgatt
will process all complete messages it receives from Redis and then
re-use the same buffer for the next time it retrieves data.  If
Flodgatt received a partial message, it will copy the partial message
to the beginning of the buffer before its next read.

This change has little effect on Flodgatt under light load (because it
was rare for Redis to have more than 8 KiB of messages available at
any one time).  However, my hope is that this will significantly
reduce memory use on the largest instances.

* Improve handling of backpresure

This commit alters how Flodgatt behaves if it receives enough messages
for a single client to fill that clients channel. (Because the clients
regularly send their messages, should only occur if a single client
receives a large number of messages nearly simultaneously; this is
rare, but could occur, especially on large instances).

Previously, Flodgatt would drop messages in the rare case when the
client's channel was full.  Now, Flodgatt will pause the current Redis
poll and yield control back to the client streams, allowing the
clients to empty their channels; Flodgatt will then resume polling
Redis/sending the messages it previously received.  With the approach,
Flodgatt will never drop messages.

However, the risk to this approach is that, by never dropping
messages, Flodgatt does not have any way to reduce the amount of work
it needs to do when under heavy load – it delays the work slightly,
but doesn't reduce it.  What this means is that it would be
*theoretically* possible for Flodgatt to fall increasingly behind, if
it is continuously receiving more messages than it can process.  Due
to how quickly Flodgatt can process messages, though, I suspect this
would only come up if an admin were running Flodgatt in a
*significantly* resource constrained environment, but I wanted to
mention it for the sake of completeness.

This commit also adds a new /status/backpressure endpoint that
displays the current length of the Redis input buffer (which should
typically be low or 0).  Like the other /status endpoints, this
endpoint is only enabled when Flodgatt is compiled with the
`stub_status` feature.
2020-04-27 16:03:05 -04:00
Daniel Sockwell d8b07b4b03
Simplify handling of disconnected clients (#142)
* Simplify handling of disconnected clients

* Improve disconnection error logging
2020-04-24 18:05:36 -04:00
Daniel Sockwell b18500b884
Resolve memory-use regression (#140)
* Use monotonically increasing channel_id

Using a monotonically increasing channel_id (instead of a Uuid)
reduces memory use under load by ~3%

* Use replace unbounded channels with bounded

This also slightly reduces memory use

* Heap allocate Event

Wrapping the Event struct in an Arc avoids excessive copying and significantly reduces memory use.

* Implement more efficient unsubscribe strategy

* Fix various Clippy lints; bump version

* Update config defaults
2020-04-24 13:23:59 -04:00
Daniel Sockwell 2725439110
Update concurrency primitive. (#139)
* Initial [WIP] implementation

This initial implementation works to send messages but does not yet
handle unsubscribing properly.

* Implement UnboundedSender

* Implement UnboundedChannels for concurrency
2020-04-23 19:28:26 -04:00
Daniel Sockwell 016f49a2d8
Improve module privacy (#136)
* Adjust module privacy

* Use trait object

* Finish module privacy refactor
2020-04-22 14:38:22 -04:00
Daniel Sockwell 10fa24c5d3
Benchmark & performance tune (#132)
* Add temporary perf metrics

* Add load testing and tune performance
2020-04-17 17:07:10 -04:00
Daniel Sockwell 37b652ad79
Error handling, pt3 (#131)
* Improve handling of Postgres errors

* Finish error handling improvements

* Remove `format!` calls from hot path
2020-04-14 20:37:49 -04:00
Daniel Sockwell 45f9d4b9fb
Code reorganization (#130)
* Reorganize files

* Refactor main()

* Code reorganization [WIP]

* Reorganize code [WIP]

* Refacto RedisConn [WIP]

* Complete code reorganization
2020-04-13 16:03:06 -04:00
Daniel Sockwell 5d2b0b94e2
Error handling pt2 (#129)
This commit improves error handling in Flodgatt's main request-response loop, including the portions of that loop that were revised in #128.

This nearly completes the addition of more explicit error handling, but there will be a smaller part 3 to bring the handling of configuration/Postgres errors into conformity with the style here.
2020-04-10 22:36:03 -04:00
Daniel Sockwell 1657113c58
Stream events via a watch channel (#128)
This squashed commit makes a fairly significant structural change to significantly reduce Flodgatt's CPU usage.

Flodgatt connects to Redis in a single (green) thread, and then creates a new thread to handle each WebSocket/SSE connection. Previously, each thread was responsible for polling the Redis thread to determine whether it had a message relevant to the connected client. I initially selected this structure both because it was simple and because it minimized memory overhead – no messages are sent to a particular thread unless they are relevant to the client connected to the thread. However, I recently ran some load tests that show this approach to have unacceptable CPU costs when 300+ clients are simultaneously connected.

Accordingly, Flodgatt now uses a different structure: the main Redis thread now announces each incoming message via a watch channel connected to every client thread, and each client thread filters out irrelevant messages. In theory, this could lead to slightly higher memory use, but tests I have run so far have not found a measurable increase. On the other hand, Flodgatt's CPU use is now an order of magnitude lower in tests I've run.

This approach does run a (very slight) risk of dropping messages under extremely heavy load: because a watch channel only stores the most recent message transmitted, if Flodgatt adds a second message before the thread can read the first message, the first message will be overwritten and never transmitted. This seems unlikely to happen in practice, and we can avoid the issue entirely by changing to a broadcast channel when we upgrade to the most recent Tokio version (see #75).
2020-04-09 13:32:36 -04:00
Daniel Sockwell d2e0a01baf
Stub status (#124)
* Add /status API endpoints [WIP]

* Finish /status API endpoints

This PR enables compiling Flodgatt with the `stub_status` feature.
When compiled with `stub_status`, Flodgatt has 3 new API endpoints:
/api/v1/streaming/status, /api/v1/streaming/status/per_timeline, and
/api/v1/streaming/status/queue.  The first endpoint lists the total
number of connections, the second lists the number of connections per
timeline, and the third lists the length of the longest queue of
unsent messages (which should be low or zero when Flodgatt is
functioning normally).

Note that the number of _connections_ is not equal to the number of
connected _clients_.  If a user is viewing the local timeline, they
would have at least two connections: one for the local timeline, and
one for their user timeline.  Other users could have even more
connections.

I decided to make the status endpoints an option you enable at compile
time rather than at run time for three reasons:

  * It keeps the API of the default version of Flodgatt 100%
    compatible with the Node server's API;

  * I don't beleive it's an option Flodgatt adminstrators will want to
    toggle on and off frequently.

  * Using a compile time option ensures that there is zero runtime
    cost when the option is disabled.  (The runtime cost should be
    negligible either way, but there is value in being 100% sure that
    the cost can be eliminated.)

However, I'm happy to make it a runtime option instead if other think
that would be helpful.
2020-04-05 10:54:42 -04:00
Daniel Sockwell d5f079a864
Error handling, pt1 (#115)
* Initial work to support structured errors

* WIP error handling and RedisConn refactor

* WIP for error handling refactor

* Finish substantive work for Redis error handling

* Apply clippy lints
2020-04-01 15:35:24 -04:00
Daniel Sockwell 0acbde3eee
Reorganize code, pt1 (#110)
* Prevent Reciever from querying postgres

Before this commit, the Receiver would query Postgres for the name
associated with a hashtag when it encountered one not in its cache.
This ensured that the Receiver never encountered a (valid) hashtag id
that it couldn't handle, but caused a extra DB query and made
independent sections of the code more entangled than they need to be.

Now, we pass the relevant tag name to the Receiver when it first
starts managing a new subscription and it adds the tag name to its
cache then.

* Improve module boundary/privacy

* Reorganize Receiver to cut RedisStream

* Fix tests for code reorganization

Note that this change includes testing some private functionality by
exposing it publicly in tests via conditional compilation.  This
doesn't expose that functionality for the benchmarks, so the benchmark
tests do not currently pass without adding a few `pub use`
statements.  This might be worth changing later, but benchmark tests
aren't part of our CI and it's not hard to change when we want to test
performance.

This change also cuts the benchmark tests that were benchmarking old
ways Flodgatt functioned.  Those were useful for comparison purposes,
but have served their purpose – we've firmly moved away from the
older/slower approach.

* Fix Receiver for tests
2020-03-27 12:00:48 -04:00
Daniel Sockwell a6b4d968cb
Add support for WHITELIST_MODE (#99)
When the `WHITELIST_MODE` environmental variable is set, Flodgatt
requires users to authenticate with a valid access token before
subscribing to any timelines (even those that are typically public).
2020-03-20 14:42:01 -04:00
Daniel Sockwell eda52c20b1
Add additional info logging (#98) 2020-03-19 20:54:23 -04:00
Daniel Sockwell 8843f18f5f
Fix valid language (#93)
* Fix panic on delete events

Previously, the code attempted to check the toot's language regardless
of event types.  That caused a panic for `delete` events, which lack a
language.

* WIP implementation of Message refactor

* Major refactor

* Refactor scope managment to use enum

* Use Timeline type instead of String

* Clean up Receiver's use of Timeline

* Make debug output more readable

* Block statuses from blocking users

This commit fixes an issue where a status from A would be displayed on
B's public timelines even when A had B blocked (i.e., it would treat B
as though they were muted rather than blocked for the purpose of
public timelines).

* Fix bug with incorrect parsing of incomming timeline

* Disable outdated tests

* Bump version
2020-03-18 20:37:10 -04:00
Daniel Sockwell 440d691b0f
Filter toots based on user and domain blocks (#89)
* Read user and domain blocks from Postgres

This commit reads the blocks from pg and stores them in the User
struct; it does not yet actually filter the responses.  It also does
not update the tests.

* Update tests

* Filter out toots involving blocked/muted users

* Add support for domain blocks

* Update test and bump version
2020-03-12 22:44:31 -04:00
Daniel Sockwell 405b5e88e5
Update logging (#85)
* Change "Incoming" log msgs from Warn to Info

* Stop logging err when unix socket closed

* Bump version to 0.4.7
2020-01-10 17:56:19 -05:00
Daniel Sockwell ac75cb54af
Postgres connection pool (#84)
* Upgrade rust-postgres library

* Initial postgres connection pool

* Update tests

* s/pg_conn/pg_pool to match reality
2020-01-10 15:45:16 -05:00
Daniel Sockwell 0462267125
Unix sockets (#81)
* Fix unix socket permission issue

* Add support for Unix sockets

* Update README and bump version
2020-01-09 17:54:57 -05:00
Daniel Sockwell 67c59401fd
Unix sockets WIP (#77)
* Initial WIP Unix socket implementation

* Bump version to v0.4.5

* Update type data
2020-01-08 09:51:25 -05:00
Daniel Sockwell b216a81e26
Add api/v1/streaming/health API endpoint (#74) 2020-01-07 17:27:46 -05:00
Daniel Sockwell 0de3d3c484
Postgres config (#70)
* Add logging for known env variables

* Update postgres config to match other configs

* Update README and bump version to 0.4.2
2020-01-05 21:58:18 -05:00
Daniel Sockwell c281418f25
Enforce type safety in config (#63)
* Add type-safe wrapper types to deployement_cfg

* Before deleting redundnat macros

* Store error messages as data

* Significant progress on type safety

* Add type safety to RedisConfig
2019-10-08 20:35:26 -04:00
Daniel Sockwell 9d96907406
Functional config (#59) 2019-10-03 18:02:23 -04:00
Daniel Sockwell e8145275b5
Config refactor (#57)
* Refactor configuration

* Fix bug with incorrect Host env variable

* Improve logging of REDIS_NAMESPACE

* Update test for Postgres configuration

* Conform Redis config to Postgres changes
2019-10-03 00:34:41 -04:00
Daniel Sockwell 11661d2fdc
Redis config (#56)
* Add most Redis config variables

* Add REDIS_NAMESPACE env var

* Fix Clippy lints
2019-10-02 00:03:18 -04:00
Daniel Sockwell 0dec8c4124
Solve SendErrors (#47)
This commit solves the SendErrors that were triggered by attempting
to use a WebSocket connection after it had been closed by the client
2019-09-11 00:13:45 -04:00
Daniel Sockwell 989c71059e
Remove debug statements (#43) 2019-09-09 14:23:48 -04:00
Daniel Sockwell ecfdda093c
Add tests for websocket routes (#38)
* Refactor organazation of SSE

This commit refactors how SSE requests are handled to bring them into
line with how WS requests are handled and increase consistency.

* Add websocket tests

* Bump version to 0.2.0

Bump version and update name from ragequit to flodgatt.

* Add test for non-existant endpoints

* Update documentation for recent changes``
2019-09-09 13:06:24 -04:00
Daniel Sockwell 90602d17ed
Replace integration tests with unit tests (#37)
* Upgrade postgres dependency to support ssl

* Clean up configuration code

* Add support for SSL with postgres [WIP]

* Add unit tests with mock Postgres
2019-09-04 21:48:29 -04:00
Daniel Sockwell 9ec245ccdb Add additional logging for postgres connection/server status 2019-07-09 22:20:11 -04:00
Daniel Sockwell 866f3ee34d Update documentation and restructure code 2019-07-08 15:21:02 -04:00
Daniel Sockwell d6ae45b292 Code reorganization 2019-07-08 07:31:42 -04:00
Daniel Sockwell 1732008840 Initial cleanup/refactor 2019-07-05 20:08:50 -04:00
Daniel Sockwell f3b86ddac8 Add CORS support
Cross-Origin requests were already implicitly allowed, but this
commit allows them explicitly and prohibits request methods other
than GET.
2019-07-04 14:00:35 -04:00
Daniel Sockwell 1765dc39ee Check oauth scopes and reject unauthorized requests 2019-07-04 13:27:11 -04:00
Daniel Sockwell f8a82caa2d Support passing access tokens via Sec-WebSocket-Protocol header
Previously, the access token needed to be passed via the query string;
with this commit, the token can be passed *either* through the query
string or the Sec-WebSocket-Protocol header.

This was done to correspond to the changes made to the streaming.js
version in [Improve streaming server security](https://github.com/tootsuite/mastodon/pull/10818).
However, I am not sure that it *does* increase security; as explained
at <https://support.ably.io/support/solutions/articles/3000075120-is-it-secure-to-send-the-access-token-as-part-of-the-websocket-url-query-params->,
there is generally no security advantage to passing sensitive information
via websocket headers instead of the query string—the entire connection
is encrypted and is not stored in the browser history, so the typical
reasons to keep sensitive info out of the query string don't apply.

I would welcome any corrections on this/reasons this change improves
security.
2019-07-04 10:57:15 -04:00
Daniel Sockwell 280cc60be9 Add hard-coded "sec-websocket-protocol" response header 2019-07-04 09:33:50 -04:00
Daniel Sockwell a6a7ebeae1 Add dotenv configuration 2019-05-10 06:22:26 -04:00
Daniel Sockwell 8ae9bbfac5 Revised WebSocket implementation 2019-05-10 01:47:29 -04:00
Daniel Sockwell 54ad55e0c0 Basic WebSocket support 2019-05-09 11:52:05 -04:00
Daniel Sockwell 6d037dd5af Working WS implemetation, but not cleaned up 2019-05-08 23:02:01 -04:00
Daniel Sockwell 4649f89442 Add unit tests, (some) integration tests, and documentation 2019-04-30 18:41:13 -04:00
Daniel Sockwell 62db7ae0ff Share a single Redis connection
This commit revises the code structure to share a single connection
to Redis (with multiple subscriptions on that connection) rather than
mutiple connections (each with one subscription).  It also simplifies the code based on that change.
2019-04-30 09:44:51 -04:00
Daniel Sockwell 9e921c1c97 Add ability for multiple clients to connect to the same pub/sub connection 2019-04-28 17:28:57 -04:00
Daniel Sockwell 425a9d0aae Allow seperate SSE responses to share Redis pubsub
This commit implements a shared stream of data from Redis, which
allows all SSE connections that send the same data to the client
to share a single connection to Redis.  (Previously, each client
got their own connection, which would significantly increase the
number of open Redis connections—especially since nearly all clients
will subscribe to `/public`.)
2019-04-26 20:00:11 -04:00
Daniel Sockwell f676e51ce4 Add limit on number of active streams
This commit tracks the number of active Pub/Sub streams and adds code to
keep the total number of streams below 400.  (When additional users
attempt to connect past that point, the server will wait for an slot
to open up).  This prevents "too many open file" panics and makes the
server better behaved in general.  However, we may need to revisit it
based on what capacity we want the server to have.

This commit also includes some general refactoring.
2019-04-23 14:07:49 -04:00
Daniel Sockwell 4832f59f2f Fixup
This code should have been included with the previous PR
2019-04-21 09:31:16 -04:00