Decoding Complex Harms: Signal Sets In Content Moderation

In the rapidly evolving landscape of user interactions, online platforms and services face the daunting task of sifting through tens of millions of user-generated posts to identify discrete risk signals indicating malicious or harmful behavior. This task is further complicated by the need to balance comprehensive oversight mechanisms with an environment of privacy and freedom of expression.
With over 20 years of experience providing fully managed online moderation solutions to a diverse range of social platforms and online services, Resolver’s dedicated human intelligence analysts are experts in proactively identifying potentially harmful user behaviors. One powerful tool in their arsenal is the use of sophisticated signal sets: finely-tuned analytical frameworks designed to detect specific complex harmful behaviors.
Signal sets combine multiple data points and behavioral indicators to help trust and safety professionals detect intricate patterns and correlations within user data that might otherwise go unnoticed by moderation systems employing a solely automated or content-based approach. In particular, by analyzing factors including user activity patterns, language usage, and network connections, analysts can pinpoint user communities engaging in violative conduct on the platform and gain a deeper understanding of the nature and extent of this harmful behavior.
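As a rough illustration only (the class and signal names below are hypothetical, not Resolver’s internal schema), a signal set can be thought of as a named collection of weighted indicators evaluated against the signals observed for a user:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a risk signal and a signal set.
# Names and weights are illustrative, not a real moderation schema.

@dataclass
class RiskSignal:
    name: str          # e.g. "requests_to_move_off_platform"
    category: str      # e.g. "behavioral", "content", "network"
    weight: float      # contribution to the overall risk score

@dataclass
class SignalSet:
    harm_type: str                                   # e.g. "grooming"
    signals: list[RiskSignal] = field(default_factory=list)
    threshold: float = 1.0                           # score at which review is triggered

    def score(self, observed: set[str]) -> float:
        """Sum the weights of the signals observed for a user."""
        return sum(s.weight for s in self.signals if s.name in observed)

    def triggered(self, observed: set[str]) -> bool:
        return self.score(observed) >= self.threshold

# Example usage with invented signals:
grooming = SignalSet(
    harm_type="grooming",
    signals=[
        RiskSignal("age_probing_language", "behavioral", 0.5),
        RiskSignal("requests_to_move_off_platform", "behavioral", 0.8),
    ],
)
print(grooming.triggered({"age_probing_language", "requests_to_move_off_platform"}))  # True
```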

Mapping signal sets to complex harmful behaviors

While risk signals derived from content moderation systems continue to play a crucial role in informing a behavioral approach to moderation, this data makes up only a subset of the risk signals necessary to identify bad-actor behavior. In fact, individual indicators of risk rarely equate to a policy violation. Instead, a more comprehensive evaluation of user interactions within platform communities allows platforms to determine, with confidence, when a defined threshold for harmful behavior has been reached before employing enforcement against a user, multi-user actor, or harmful network.
The human-in-the-loop methodology employed by Resolver begins by leveraging a combination of advanced technology, hash lists, public data sets and finely-tuned large language models (LLMs) to detect and classify risks at scale. This dataset is further enriched by network and behavioral intelligence drawn from a diverse team of trust and safety professionals with deep expertise in the gray areas of risk and gaps in policy. Our team detects risks and labels them as ‘signals’ that correspond to specific harmful behaviors. Together, this process allows us to create a unique risk profile for each user account.
Each risk signal is then assigned a score based on our proprietary behavioral intelligence rating system. Using content moderation, behavioral intelligence and network intelligence capabilities, the collective signals from a single user account are developed into signal sets, further training our technical detection tools and human review teams. Each platform has its own distinct features, community guidelines and terminology which must be taken into account when formulating an enforcement strategy; the adaptability of Resolver signal sets allows them to be altered and amended depending on the specific use case or risk type targeted by the platform for enforcement.
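A minimal sketch of this scoring step, under the assumption that each detected signal carries a numeric weight and that accounts are only queued for human review once their cumulative score crosses a threshold (the signal names, weights and threshold below are invented for illustration):

```python
from collections import defaultdict

# Illustrative only: aggregating scored signals into a per-account risk profile.
SIGNAL_SCORES = {
    "rapid_private_messaging_to_minors": 0.6,
    "requests_to_move_off_platform": 0.8,
    "age_probing_language": 0.5,
}
REVIEW_THRESHOLD = 1.5  # hypothetical bar for queuing an account for review

def build_risk_profile(events: list[dict]) -> dict[str, float]:
    """events: [{'account_id': ..., 'signal': ...}, ...] emitted by detection tooling."""
    profile: dict[str, float] = defaultdict(float)
    for event in events:
        profile[event["account_id"]] += SIGNAL_SCORES.get(event["signal"], 0.0)
    return dict(profile)

events = [
    {"account_id": "acct_1", "signal": "age_probing_language"},
    {"account_id": "acct_1", "signal": "requests_to_move_off_platform"},
    {"account_id": "acct_2", "signal": "rapid_private_messaging_to_minors"},
]
flagged = {a: s for a, s in build_risk_profile(events).items() if s >= REVIEW_THRESHOLD}
print(flagged)  # only acct_1 crosses the review threshold
```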
As a result, the signal sets must be fine-tuned to match the exact community safety requirements and may differ based on the type of virtual environment, the age of the user base, the platform functionality being misused and a range of other factors. To meet this need, the signal sets employed by our human intelligence teams are tailored to a diverse range of specific risk areas including disinformation, child grooming and violent extremism, and are continuously refined and updated to stay compliant with online safety regulations and identify the latest threats emerging across the platform landscape.
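To make the idea of per-platform tailoring concrete, a configuration along the following lines could be imagined; the platform names, risk areas and threshold values are hypothetical and simply show how the same signal set might be tuned differently for a younger user base or a different feature set:

```python
# Hypothetical per-platform tuning of signal sets; all values are illustrative.
PLATFORM_PROFILES = {
    "kids_gaming_world": {
        "risk_areas": ["child_grooming", "bullying"],
        "grooming_threshold": 0.8,      # lower bar: younger user base
        "review_sla_minutes": 15,
    },
    "general_social_feed": {
        "risk_areas": ["disinformation", "violent_extremism", "drug_sales"],
        "grooming_threshold": 1.2,
        "review_sla_minutes": 60,
    },
}

def grooming_threshold_for(platform: str, default: float = 1.0) -> float:
    """Look up the platform-specific review threshold, falling back to a default."""
    return PLATFORM_PROFILES.get(platform, {}).get("grooming_threshold", default)
```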

Using signal sets to identify child grooming

Child grooming involves an adult user communicating with another user believed to be a child with the intent to commit a sexual offense or abduction. According to the latest data analyzed by the Internet Watch Foundation, the past year has seen a “shocking” increase in the number of pages on the open internet showing children under 10 being groomed, manipulated or coerced into performing explicit acts by a child predator.
Protecting minors who use online platforms and services from exploitation and abuse is one of the most critical responsibilities for trust and safety teams. Data collected through the CyberTipline operated by the National Center for Missing & Exploited Children found that internet companies were the most frequent reporters of online enticement, accounting for 71% of the total submissions received by the organization over the examined period.
Source: National Center for Missing & Exploited Children
For trust and safety professionals working to safeguard minors on online platforms, the complexity of this task is compounded by the continuous evolution of the tactics employed by child predators and the speed at which such harmful behavior can take place. A Resolver investigation found that online predators are able to lock minors into high-risk grooming conversations as little as 19 seconds after the first message, while 45 minutes is the average time for grooming in an online environment.
The most common approach to enforcing community guidelines through behavioral moderation focuses on user behavior related to flagged violative content: for example, analysing comments made by an account on an explicit video as evidence of grooming.
More complex approaches allow moderators to detect risk signals that can be mapped to multiple violative behaviors. For example, if an initial risk signal indicates self-harm, applying a process of dynamic signal identification may reveal that the broader risk involves grooming by an associated user. In this manner, building a broader risk profile for a harmful user can prompt a shift in the enforcement approach and require broader screening for victim impact.
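One way to picture this dynamic signal identification, assuming access to contact graphs and per-account signal lookups (the function and signal names here are hypothetical):

```python
# Sketch of dynamic signal identification: an initial self-harm signal on one
# account prompts a wider look at the accounts it interacts with.

def expand_investigation(initial_account: str,
                         initial_signal: str,
                         get_contacts,   # callable: account_id -> list of contact account_ids
                         get_signals):   # callable: account_id -> set of signal names
    """Return associated accounts whose own signals suggest a broader grooming risk."""
    grooming_markers = {"age_probing_language", "requests_to_move_off_platform"}
    related_findings = {}
    for contact in get_contacts(initial_account):
        signals = get_signals(contact)
        if grooming_markers & signals:
            related_findings[contact] = signals
    return {
        "victim_candidate": initial_account,
        "initial_signal": initial_signal,        # e.g. "self_harm_disclosure"
        "possible_groomers": related_findings,   # accounts to prioritise for review
    }
```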
Employing a signal set tailored to detect grooming can allow trust and safety teams to automatically surface high-risk incidents for further investigation and intervention. For example, if grooming occurs on a platform that supports text- and image-based chat, analysts working on that platform could use a bespoke signal set that breaks this harmful behavior down into its component parts.
Only by understanding that a user has engaged in one or more of these activities can a determination be made about the harmful behavior taking place. Some grooming conversations may occur very quickly – with high-risk markers appearing almost instantly. Others will take place over hours, days or weeks – and sometimes across thousands of messages. Similarly, many bad actors will conduct conversations with multiple potential victims at once.
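A sketch of how such timing considerations might be encoded, assuming each detected signal carries a timestamp and a recipient; the signal names, time windows and routing outcomes are invented for illustration rather than drawn from any real policy:

```python
from datetime import timedelta

# Illustrative evaluation of grooming signals over time.
HIGH_RISK = {"sexualised_language_to_minor", "request_for_images"}

def evaluate_conversation(events: list[dict]) -> str:
    """events: [{'ts': datetime, 'signal': str, 'recipient': str}, ...]"""
    if not events:
        return "no_risk"
    events = sorted(events, key=lambda e: e["ts"])
    first, last = events[0]["ts"], events[-1]["ts"]
    observed = {e["signal"] for e in events}
    recipients = {e["recipient"] for e in events}

    # Fast escalation: high-risk markers appearing almost instantly.
    if observed & HIGH_RISK and last - first <= timedelta(minutes=1):
        return "escalate_immediately"
    # Slow-burn grooming over days or weeks, or many potential victims at once.
    if observed & HIGH_RISK and (last - first > timedelta(days=1) or len(recipients) > 3):
        return "queue_for_priority_review"
    return "continue_monitoring"
```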

Interpreting account interactions as signals for harmful behavior

An online safety framework that incorporates a behavioral moderation approach can also go beyond “what” is being said, posted, or commented on by a particular user to examine “where” something has been said and with “whom” the user is interacting when making determinations about applying policy enforcement against a problematic account. Behavioral signal sets can also include ‘AND NOT’ or ‘OR’ elements to capture more complex combinations of signals.
As an example, take a user who is selling drugs on a social media platform. Their profile picture may depict cocaine, their username may contain drug references that do not breach community guidelines for content, and they may post videos of a lifestyle funded by drug dealing.
Many platforms state that the sale or promotion of drugs is banned, but they often cannot take action because content policies typically require explicit signals; behavior policies, by contrast, enable action in more implicit cases. A more holistic appraisal of the account could reveal the off-siting of users to another platform, mapped to other risk signals obtained from user behavior, such as uploading videos about drugs and posting location and contact details suggesting the drugs are for sale.
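A minimal sketch of the ‘AND NOT’ / ‘OR’ logic mentioned above, applied to the drug-selling example; the signal names and the exclusion condition are hypothetical and only illustrate how implicit signals might be combined into a single rule:

```python
# Hypothetical boolean combination of implicit signals for a drug-sales signal set.
def drug_sales_rule(signals: set[str]) -> bool:
    """True when implicit signals together suggest sale, rather than mere reference."""
    imagery_or_reference = (
        "drug_imagery_in_profile_photo" in signals
        or "drug_reference_in_username" in signals
    )
    sale_intent = (
        "posts_contact_details" in signals
        or "posts_location_with_price_terms" in signals
        or "off_platform_redirect" in signals
    )
    # AND NOT: exclude accounts whose context appears to be harm-reduction or news.
    benign_context = "harm_reduction_or_news_context" in signals
    return imagery_or_reference and sale_intent and not benign_context

print(drug_sales_rule({"drug_imagery_in_profile_photo", "off_platform_redirect"}))  # True
```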

Conclusion

Detecting complex harmful behaviors such as child grooming and drug trafficking is essential to closing gaps in platform moderation. By combining a range of labels applied to content and user behavior, trust and safety professionals can develop tailored ‘signal sets’ of indicators for particularly complex harmful behaviors, aiding them in detecting malicious activity on the platform at large.
Resolver offers fully-managed and proactive solutions for trust and safety teams that require high-value outcomes to secure and protect their platform against complex harmful behaviors while ensuring compliance with online safety regulations such as the EU Digital Services Act (DSA) and the UK Online Safety Act (OSA).