flodgatt/src/messages/event/dynamic_event.rs

use crate::parse_client_request::Blocks;
use hashbrown::HashSet;
use serde::{Deserialize, Serialize};
use serde_json::Value;
#[derive(Deserialize, Serialize, Debug, Clone, PartialEq)]
pub struct DynamicEvent {
    pub event: String,
    pub payload: Value,
    queued_at: Option<i64>,
}

impl DynamicEvent {
    /// Returns `true` if the status is filtered out based on its language.
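    ///
    /// A sketch of the intended use (illustrative only; the allow-list shown here is
    /// hypothetical, not taken from a real client request):
    ///
    /// ```ignore
    /// let allowed_langs: HashSet<String> = ["en".to_string()].iter().cloned().collect();
    /// if event.language_not(&allowed_langs) {
    ///     // The event is filtered out and never sent to this client.
    /// }
    /// ```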
    pub fn language_not(&self, allowed_langs: &HashSet<String>) -> bool {
        const ALLOW: bool = false;
        const REJECT: bool = true;

        if allowed_langs.is_empty() {
            return ALLOW; // Listing no allowed_langs allows all languages
        }

        match self.payload["language"].as_str() {
            Some(toot_language) if allowed_langs.contains(toot_language) => ALLOW,
            None => ALLOW, // If the toot's language is unknown, the toot is always allowed
            Some(empty) if empty.is_empty() => ALLOW, // An empty language tag counts as unknown
            Some(_toot_language) => REJECT,
        }
    }
    /// Returns `true` if the toot contained in this Event originated from a blocked domain,
    /// is from an account that has blocked the current user, or if the user's list of
    /// blocked/muted users includes a user involved in the toot.
    ///
    /// A user is involved in the toot if they:
    /// * Are mentioned in this toot
    /// * Wrote this toot
    /// * Wrote a toot that this toot is replying to (if any)
    /// * Wrote the toot that this toot is boosting (if any)
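    ///
    /// A sketch of the intended call site (illustrative only; how a `Blocks` value is
    /// built depends on `parse_client_request` and is not shown here):
    ///
    /// ```ignore
    /// if event.involves_any(&blocks) {
    ///     // The client thread drops this event instead of forwarding it.
    /// }
    /// ```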
    pub fn involves_any(&self, blocks: &Blocks) -> bool {
        const ALLOW: bool = false;
        const REJECT: bool = true;

        let Blocks {
            blocked_users,
            blocking_users,
            blocked_domains,
        } = blocks;
        let id = self.payload["account"]["id"].as_str().expect("TODO");
        let username = self.payload["account"]["acct"].as_str().expect("TODO");
        if self.involves(blocked_users) || blocking_users.contains(&id.parse().expect("TODO")) {
            REJECT
        } else {
            match username.split('@').nth(1) {
                Some(originating_domain) if blocked_domains.contains(originating_domain) => REJECT,
                Some(_) | None => ALLOW, // None means the local instance, which can't be blocked
            }
        }
    }
    // involved_users = mentioned users + author + replied-to user + boosted user
    fn involves(&self, blocked_users: &HashSet<i64>) -> bool {
        // Mentioned users
        let mentions = self.payload["mentions"].as_array().expect("TODO");
        let mut involved_users: HashSet<i64> = mentions
            .iter()
            .map(|mention| mention["id"].as_str().expect("TODO").parse().expect("TODO"))
            .collect();

        // Author
        let author_id = self.payload["account"]["id"].as_str().expect("TODO");
        involved_users.insert(author_id.parse::<i64>().expect("TODO"));

        // Replied-to user (if any)
        let replied_to_user = self.payload["in_reply_to_account_id"].as_str();
        if let Some(user_id) = replied_to_user {
            involved_users.insert(user_id.parse().expect("TODO"));
        }

        // Boosted user (if any). `payload["reblog"]` is null for toots that are not
        // boosts, so a missing id must not panic here.
        if let Some(id_of_boosted_user) = self.payload["reblog"]["account"]["id"].as_str() {
            involved_users.insert(id_of_boosted_user.parse().expect("TODO"));
        }
        !involved_users.is_disjoint(blocked_users)
    }
}
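
// A minimal sketch of the language filter's expected behavior. The JSON below is
// illustrative: it contains only the fields `language_not` actually reads, not a
// full Mastodon status payload.
#[cfg(test)]
mod test {
    use super::DynamicEvent;
    use hashbrown::HashSet;
    use serde_json::json;

    #[test]
    fn language_not_filters_on_allowed_langs() {
        let event: DynamicEvent = serde_json::from_value(json!({
            "event": "update",
            "payload": { "language": "en" },
            "queued_at": null,
        }))
        .expect("valid test JSON");

        // An empty allow-list lets every language through
        assert!(!event.language_not(&HashSet::new()));

        // A toot in a language outside the allow-list is filtered out
        let mut allowed: HashSet<String> = HashSet::new();
        allowed.insert("de".to_string());
        assert!(event.language_not(&allowed));

        // Once the toot's language is in the allow-list, the toot is allowed again
        allowed.insert("en".to_string());
        assert!(!event.language_not(&allowed));
    }
}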