FAQ / Text Moderation

How do text classification models work?

Because they take context into account, machine learning models can detect problematic content that rule-based models would otherwise miss or incorrectly flag.

When you submit a text item to the API, you instantly receive a score for each available class. Scores are between 0 and 1 and reflect how likely it is that someone would find the text problematic; higher scores are therefore usually associated with more problematic content. Note that the API may return several high scores for a single text if that text matches multiple classes.
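As a minimal sketch, the request/response cycle might look like the Python below. The endpoint URL, payload field, and response shape are illustrative assumptions, not the actual API contract; see the Text Classification documentation for the real details.

```python
import requests

# Hypothetical endpoint and payload shape, for illustration only.
API_URL = "https://api.example.com/v1/text/classify"

def classify(text: str) -> dict:
    """Submit a text item and return the per-class scores (0 to 1)."""
    response = requests.post(API_URL, json={"text": text}, timeout=10)
    response.raise_for_status()
    # Assumed response shape:
    # {"scores": {"sexual": 0.01, "toxic": 0.87, ...}}
    return response.json()["scores"]

scores = classify("some user-generated text")
# Print classes from most to least problematic.
for cls, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cls}: {score:.2f}")
```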

Class availability depends on the language of the submitted text. The available classes for Text Classification are the following:

Class | Description
sexual | detects references to sexual acts, sexual organs, or any other content typically associated with sexual activity
discriminatory | detects hate speech directed at individuals or groups because of specific characteristics of their identity (origin, religion, sexual orientation, gender, etc.)
insulting | detects insults undermining the dignity or honor of an individual, or signs of disrespect towards someone
violent | detects threatening content, i.e. content expressing an intention to harm or hurt, or expressing violence and brutality
toxic | detects whether a text is unacceptable, harmful, offensive, disrespectful or unpleasant
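
Because a single text can trigger several classes at once, a common pattern is to compare each class score against its own threshold. The thresholds below are made-up values for illustration; in practice you would tune them against your own data and tolerance for false positives.

```python
# Hypothetical per-class thresholds; tune these for your use case.
THRESHOLDS = {
    "sexual": 0.8,
    "discriminatory": 0.7,
    "insulting": 0.75,
    "violent": 0.7,
    "toxic": 0.85,
}

def flagged_classes(scores: dict) -> list[str]:
    """Return every class whose score crosses its threshold.

    A single text may trigger several classes at once, so this
    returns a list rather than a single label.
    """
    return [c for c, t in THRESHOLDS.items() if scores.get(c, 0.0) >= t]
```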

See the Text Classification documentation to learn more.
