flodgatt/src/main.rs

use flodgatt::config;
use flodgatt::err::FatalErr;
use flodgatt::messages::Event;
use flodgatt::request::{PgPool, Subscription, Timeline};
use flodgatt::response::redis;
use flodgatt::response::stream;

use futures::{future::lazy, stream::Stream as _Stream};
use std::fs;
use std::net::SocketAddr;
use std::os::unix::fs::PermissionsExt;
use std::time::Instant;
use tokio::net::UnixListener;
use tokio::sync::{mpsc, watch};
use tokio::timer::Interval;
use warp::http::StatusCode;
use warp::path;
use warp::ws::Ws2;
use warp::{Filter, Rejection};

fn main() -> Result<(), FatalErr> {
    config::merge_dotenv()?;
    pretty_env_logger::try_init()?;
    let (postgres_cfg, redis_cfg, cfg) = config::from_env(dotenv::vars().collect());

    // Create channels to communicate between threads
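    // The watch channel broadcasts each (Timeline, Event) pair from the Redis manager to
    // every client thread, and each thread filters out events irrelevant to its own
    // subscription; the mpsc channel carries commands (e.g., unsubscribes) from client
    // threads back to the manager.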
    let (event_tx, event_rx) = watch::channel((Timeline::empty(), Event::Ping));
    let (cmd_tx, cmd_rx) = mpsc::unbounded_channel();

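    // One Postgres pool and one Redis manager are shared by all request handlers;
    // `into_arc` wraps the manager so each handler can lock it when subscribing.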
    let shared_pg_conn = PgPool::new(postgres_cfg, *cfg.whitelist_mode);
    let poll_freq = *redis_cfg.polling_interval;
    let shared_manager = redis::Manager::try_from(redis_cfg, event_tx, cmd_rx)?.into_arc();

    // Server Sent Events
    let sse_manager = shared_manager.clone();
    let (sse_rx, sse_cmd_tx) = (event_rx.clone(), cmd_tx.clone());
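    // Each SSE request is parsed into a `Subscription` via the Postgres pool, registered
    // with the Redis manager, and then served events from the shared watch channel.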
    let sse = Subscription::from_sse_request(shared_pg_conn.clone())
        .and(warp::sse())
        .map(
            move |subscription: Subscription, client_conn: warp::sse::Sse| {
                log::info!("Incoming SSE request for {:?}", subscription.timeline);
                {
                    let mut manager = sse_manager.lock().unwrap_or_else(redis::Manager::recover);
                    manager.subscribe(&subscription);
                }
                stream::Sse::send_events(
                    client_conn,
                    sse_cmd_tx.clone(),
                    subscription,
                    sse_rx.clone(),
                )
            },
        )
        .with(warp::reply::with::header("Connection", "keep-alive"));

    // WebSocket
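    // WebSocket requests follow the same pattern as SSE; the client's access token is
    // echoed back in the `sec-websocket-protocol` header on upgrade.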
    let ws_manager = shared_manager.clone();
    let ws = Subscription::from_ws_request(shared_pg_conn)
        .and(warp::ws::ws2())
        .map(move |subscription: Subscription, ws: Ws2| {
            log::info!("Incoming websocket request for {:?}", subscription.timeline);
            {
                let mut manager = ws_manager.lock().unwrap_or_else(redis::Manager::recover);
                manager.subscribe(&subscription);
            }
            let token = subscription.access_token.clone().unwrap_or_default(); // token sent for security
            let ws_stream = stream::Ws::new(cmd_tx.clone(), event_rx.clone(), subscription);
            (ws.on_upgrade(move |ws| ws_stream.send_to(ws)), token)
        })
        .map(|(reply, token)| warp::reply::with_header(reply, "sec-websocket-protocol", token));

    let cors = warp::cors()
        .allow_any_origin()
        .allow_methods(cfg.cors.allowed_methods)
        .allow_headers(cfg.cors.allowed_headers);

    // TODO -- extract to separate file
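    // With the `stub_status` feature enabled, the health check is served alongside the
    // manager's overall count and per-timeline list; without it, only the health check.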
    #[cfg(feature = "stub_status")]
    let status = {
        let (r1, r3) = (shared_manager.clone(), shared_manager.clone());
        warp::path!("api" / "v1" / "streaming" / "health")
            .map(|| "OK")
            .or(warp::path!("api" / "v1" / "streaming" / "status")
                .and(warp::path::end())
                .map(move || r1.lock().unwrap_or_else(redis::Manager::recover).count()))
            .or(
                warp::path!("api" / "v1" / "streaming" / "status" / "per_timeline")
                    .map(move || r3.lock().unwrap_or_else(redis::Manager::recover).list()),
            )
    };
    #[cfg(not(feature = "stub_status"))]
    let status = warp::path!("api" / "v1" / "streaming" / "health").map(|| "OK");

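    // Poll the shared Redis manager at `poll_freq` and broadcast any new messages to the
    // client threads, then serve the combined SSE, WebSocket, and status routes.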
    let streaming_server = move || {
        let manager = shared_manager.clone();
        let stream = Interval::new(Instant::now(), poll_freq)
            .map_err(|e| log::error!("{}", e))
            .for_each(move |_| {
                let mut manager = manager.lock().unwrap_or_else(redis::Manager::recover);
                manager.poll_broadcast().unwrap_or_else(FatalErr::exit);
                Ok(())
            });
        warp::spawn(lazy(move || stream));
        warp::serve(ws.or(sse).with(cors).or(status).recover(recover))
    };

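    // Serve on a Unix socket if one is configured (removing any stale socket file and
    // making the new one world-writable); otherwise bind to the configured TCP address.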
    if let Some(socket) = &*cfg.unix_socket {
        log::info!("Using Unix socket {}", socket);
        fs::remove_file(socket).unwrap_or_default();
        let incoming = UnixListener::bind(socket).expect("TODO").incoming();
        fs::set_permissions(socket, PermissionsExt::from_mode(0o666)).expect("TODO");
        tokio::run(lazy(|| streaming_server().serve_incoming(incoming)));
    } else {
        let server_addr = SocketAddr::new(*cfg.address, *cfg.port);
        tokio::run(lazy(move || streaming_server().bind(server_addr)));
    }
    Ok(())
}

// TODO -- extract to separate file
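// Convert rejections into a JSON error body; every rejection is currently returned
// with status `401 Unauthorized`.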
fn recover(r: Rejection) -> Result<impl warp::Reply, warp::Rejection> {
    let json_err = match r.cause() {
        Some(text) if text.to_string() == "Missing request header 'authorization'" => {
            warp::reply::json(&"Error: Missing access token".to_string())
        }
        Some(text) => warp::reply::json(&text.to_string()),
        None => warp::reply::json(&"Error: Nonexistent endpoint".to_string()),
    };
    Ok(warp::reply::with_status(json_err, StatusCode::UNAUTHORIZED))
}