FAQ / Text Moderation

How does text moderation work?

Our Text Moderation solution is entirely automated: no human moderators are involved in viewing or rating your content. This is key to achieving fast turnaround times, high scalability and strong privacy.

With Text Moderation, you simply submit any type of text (message, comment, description, review, etc.) to our API. The API instantly responds with the moderation details. Any objectionable content found will be flagged and described to help you block, modify or review it.
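As an illustration, a call to the API could look like the minimal Python sketch below. The endpoint URL and the parameter names used here (`text`, `lang`, `mode`, `api_user`, `api_secret`) are assumptions for the purpose of the example; refer to the API reference for the exact interface.

```python
import requests

# Illustrative parameters; replace the credentials with your own and
# check the API reference for the exact parameter names and values.
params = {
    "text": "Sample message to moderate",
    "lang": "en",                 # language of the submitted text
    "mode": "rules",              # assumed value selecting rule-based detection
    "api_user": "YOUR_API_USER",
    "api_secret": "YOUR_API_SECRET",
}

response = requests.post(
    "https://api.sightengine.com/1.0/text/check.json",
    data=params,
    timeout=10,
)
result = response.json()

# The response describes any objectionable content that was found,
# so you can decide whether to block, modify or review the text.
print(result)
```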

Approaches to text moderation

Sightengine offers two different approaches to Text Moderation:

  • Text Classification models based on deep learning, which are well suited to interpreting full sentences and understanding linguistic subtleties, and therefore to moderating text based on its semantic, in-context meaning. The classes returned by these models are: sexual, discriminatory, insulting, violent, toxic. The API returns a score between 0 and 1 for each class, and a given text may match several classes at once. See the Text Classification documentation to learn more.
  • Rule-based pattern matching algorithms, which are well suited to flagging specific words or phrases, even when they are heavily obfuscated. The moderation categories for Rule-based Detection are: profanity (sexual, insulting, discriminatory or otherwise inappropriate words), personal details (such as email addresses or phone numbers), links, misleading usernames, extremist references, weapon names, and medical or recreational drugs. You can also create custom lists to have our API detect any words or content you feel should be flagged. See the Rule-based Detection documentation to learn more. A sketch of how signals from both approaches can be combined follows this list.
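As referenced above, here is a minimal Python sketch of how the two kinds of signals could be combined into a moderation decision. The response field names (`moderation_classes`, `profanity`, `personal`, `matches`) and the 0.8 threshold are illustrative assumptions; check the Text Classification and Rule-based Detection documentation for the actual response structure.

```python
# Illustrative helper that turns an API response into a decision.
# Field names and the threshold are assumptions, not the documented schema.

CLASSES = ("sexual", "discriminatory", "insulting", "violent", "toxic")
THRESHOLD = 0.8  # illustrative cut-off on the 0-1 class scores


def review_decision(result: dict) -> str:
    # Text Classification: each class gets a score between 0 and 1.
    scores = result.get("moderation_classes", {})
    if any(scores.get(cls, 0.0) >= THRESHOLD for cls in CLASSES):
        return "block"

    # Rule-based Detection: matched words or phrases are listed per category.
    profanity = result.get("profanity", {}).get("matches", [])
    personal = result.get("personal", {}).get("matches", [])
    if profanity or personal:
        return "review"

    return "accept"
```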
