Cyberbullying, defined as willful and repeated harm inflicted through digital means, continues to rise in prevalence and severity, particularly among youth populations, who are exposed to social media starting at increasingly young ages. The mental health consequences of cyberbullying, including anxiety, depression, and suicidal ideation, are well documented (NIH, 2023). As a result, researchers have increasingly turned to automated systems powered by machine learning to detect, classify, and mitigate cyberbullying. A central challenge in developing such systems is constructing high-quality annotated datasets with appropriate labeling schemes. This review synthesizes recent scholarship on labeling strategies (binary, multi-class, and role-based), annotation practices, and inter-annotator agreement (IAA), drawing on five key academic sources and an NIH research summary.
Labeling Schemes: Binary, Multi-Class, Participant Roles, and Contextual Dimensions
Binary Classification
Binary classification remains the most common and foundational labeling scheme in cyberbullying detection (Balakrishnan & Kaity, 2023). In this approach, comments are labeled as either cyberbullying or not. Simplicity is its strength, enabling consistent labeling with high inter-rater reliability. However, it falls short in representing the diverse forms and intensities of abuse, often failing to distinguish between subtle and overt attacks.
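As a minimal illustration of this scheme (not drawn from any of the reviewed papers), the sketch below trains a binary cyberbullying classifier with TF-IDF features and logistic regression; the comments and labels are invented placeholders:

```python
# A minimal binary cyberbullying classifier: TF-IDF features + logistic
# regression. The example comments and labels are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical annotated comments: 1 = cyberbullying, 0 = not cyberbullying.
comments = [
    "nobody likes you, just leave",
    "great game last night!",
    "you are so stupid it hurts",
    "thanks for the help, friend",
]
labels = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(comments, labels)

# Expected to print [1], though with toy data this is not guaranteed.
print(model.predict(["everyone thinks you are worthless"]))
```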
Multi-Class and Severity Classification
Expanding upon binary labeling, multi-class schemes offer granularity by incorporating levels of severity (Yi, Zubiaga, & Long, 2024). These often distinguish between types of harm, such as harassment, defamation, and denigration. For example, the HDCyberbullying dataset distinguishes harassment from defamation, and its accompanying approach uses emotion-adaptive training to improve recognition of indirect forms of abuse. While this allows for richer analysis, it introduces ambiguity in classification and often results in lower IAA.
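To make the granularity trade-off concrete, the short sketch below uses invented class names (not the published HDCyberbullying label set) to show that fine-grained labels can always be collapsed to the binary scheme, while the reverse is impossible:

```python
# Hypothetical multi-class labels and their collapse to binary. The class
# names here are illustrative, not a published taxonomy.
MULTI_CLASS_LABELS = {"none", "harassment", "defamation", "denigration"}

def to_binary(label: str) -> int:
    """Collapse a fine-grained label to the binary scheme (1 = cyberbullying)."""
    return 0 if label == "none" else 1

annotations = ["harassment", "none", "defamation"]
print([to_binary(a) for a in annotations])  # -> [1, 0, 1]
```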
Participant Role Classification
Role-based labeling schemes identify users’ social roles in cyberbullying episodes—typically bully, victim, or bystander (Ratnayaka et al., 2020). This scheme enables detection of social dynamics and facilitates more nuanced interventions. For instance, in the ASKfm dataset, annotators identified participant roles within multi-turn conversations, yielding a multi-class F1 score of 0.76 for role detection. Although role classification improves contextual understanding, it requires annotators to infer intent and social relationships, increasing annotation complexity.
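For reference, a multi-class F1 score such as the 0.76 reported above is commonly computed by averaging per-class F1 scores. The sketch below uses invented gold and predicted role labels; macro averaging is an assumption made for this illustration, not a detail taken from the paper:

```python
# Computing a multi-class F1 score over participant roles with scikit-learn.
# The gold and predicted label sequences are invented for illustration.
from sklearn.metrics import f1_score

gold = ["bully", "victim", "bystander", "bystander", "victim", "bully"]
pred = ["bully", "victim", "bystander", "victim", "victim", "bully"]

# Macro-averaging weights each role equally; prints roughly 0.82 here.
print(f1_score(gold, pred, average="macro"))
```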
Context Dependence and Sarcasm Indicators
To address the complexities of implicit abuse and interpretive ambiguity, recent schemas have added contextual and stylistic dimensions. Context dependence flags whether the message can be understood in isolation or requires surrounding discourse (Hamlett et al., 2022). Sarcasm, often used to veil harmful intent, is likewise annotated to distinguish ironic expressions from genuine hostility (Kim et al., 2021). Including these layers enhances model precision and aligns with human-centered annotation practices.
Annotation Guidelines and Practices
Dataset Design and Guidelines
Hamlett et al. (2022) designed a multi-dimensional annotation scheme for an Instagram dataset. Their labels included content type, purpose, directionality, and co-occurrence with other phenomena. Annotators were trained using clear definitions and examples to ensure consistency. Similarly, Gomez et al. (2021) emphasized the importance of curating datasets using consensus filtering and hybrid human-AI workflows to ensure quality and diversity.
Addressing Subjectivity and Ambiguity
Cyberbullying often includes sarcasm, coded language, and ambiguous intent. Kim et al. (2021) highlighted that human-centered approaches are crucial for developing annotation schemes that reflect lived experiences and diverse interpretations. Their review revealed that many systems lacked grounding in real human contexts, leading to reduced model relevance and possible harms.
Annotator Diversity and Bias
Annotation quality is also influenced by the demographic background and bias of annotators. As observed by Hamlett et al. (2022), collecting annotator demographic information can provide insights into how perceptions of cyberbullying vary. Including diverse annotators helps mitigate systemic bias and aligns with the human-centered design principles proposed by Kim et al. (2021).
Inter-Annotator Agreement (IAA)
Inter-annotator agreement is a key measure of annotation reliability. Most studies use Cohen’s kappa or Krippendorff’s alpha to assess agreement. Binary classification tends to yield higher IAA due to its simplicity, while multi-class and role-based annotations face challenges due to subjectivity (Gomez et al., 2021).
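As a concrete sketch with invented annotations, Cohen's kappa for two annotators can be computed with scikit-learn, and Krippendorff's alpha (which also accommodates more annotators and missing labels) with the third-party krippendorff package:

```python
# Two standard IAA measures computed on invented annotations.
import numpy as np
import krippendorff  # pip install krippendorff
from sklearn.metrics import cohen_kappa_score

# Two annotators labeling the same six comments (1 = cyberbullying).
ann_a = [1, 0, 1, 1, 0, 0]
ann_b = [1, 0, 1, 0, 0, 0]

print("Cohen's kappa:", cohen_kappa_score(ann_a, ann_b))

# Krippendorff's alpha takes one row per annotator; np.nan marks missing labels.
reliability = np.array([ann_a, ann_b], dtype=float)
print("Krippendorff's alpha:",
      krippendorff.alpha(reliability_data=reliability,
                         level_of_measurement="nominal"))
```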
To address ambiguity, Gomez et al. (2021) propose consensus filtering, retaining only instances with strong agreement between human and algorithmic labels. This improves dataset quality and, in turn, the performance of models trained on the filtered data.
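A minimal version of such a filter might look like the sketch below; the record structure and field names are our own, not taken from Gomez et al. (2021):

```python
# Consensus filtering sketch: keep only instances where the human label and
# the algorithmic label agree. Record structure and field names are assumed.
records = [
    {"text": "you are pathetic", "human_label": 1, "model_label": 1},
    {"text": "nice photo!",      "human_label": 0, "model_label": 0},
    {"text": "whatever, loser",  "human_label": 1, "model_label": 0},  # dropped
]

consensus = [r for r in records if r["human_label"] == r["model_label"]]
print(len(consensus), "of", len(records), "instances retained")
```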
Recommendations for a Comprehensive Labeling Schema
Drawing from the reviewed literature and existing annotation manuals, we recommend a hierarchical labeling schema composed of the following dimensions (a code sketch encoding the full schema appears at the end of this section):
Dimension 1: Binary Classification
- Cyberbullying: The text contains aggressive, threatening, demeaning, or harmful content directed at a person or group.
- Not Cyberbullying: The content is neutral, supportive, humorous (non-hurtful), or otherwise non-threatening.
Dimension 2: Severity Level
- Low: Mild teasing, sarcasm, or isolated negative comments without threats or prolonged attacks.
- Medium: Repeated negative behavior, name-calling, exclusion, or indirect threats.
- High: Explicit threats, hate speech, targeted harassment, doxxing, or strong language intended to severely harm or intimidate.
Dimension 3: Participant Role
- Bully: The speaker is initiating or perpetuating harmful behavior.
- Victim: The speaker is being targeted or expressing harm.
- Bystander: The speaker observes or comments on the bullying.
- Unrelated: The speaker is not involved in the bullying event.
Dimension 4: Content Type
- Direct Attack: Name-calling, insults, or slurs directed at an individual.
- Harassment: Repeated or ongoing targeting of someone with harmful content.
- Exclusion / Social Rejection: Content encouraging the isolation of an individual.
- Threats / Intimidation: Implicit or explicit threats of harm.
- Racism / Xenophobia: Content targeting a person or group based on race, nationality, or ethnicity.
- Sexism / Misogyny: Gender-based insults, objectification, or discriminatory comments.
- Homophobia / Transphobia: Hostility or slurs directed at LGBTQ+ identities.
- Pedophilia / Grooming: Inappropriate sexual content involving or targeting minors.
- Sexting / Sexual Harassment: Unwanted sexual messages, images, or pressure.
- Body Shaming: Negative comments about someone’s appearance or weight.
- Other: Use this tag when none of the above categories fit, but the content is still harmful.
Dimension 5: Emotional Markers
- Sentiment: Positive / Neutral / Negative
- Emotion (choose all that apply): Anger, Fear, Sadness, Disgust, Shame, Guilt, Empathy, Neutral, Unclear, Other, None
Dimension 6: Context Dependence
- Dependent: The message requires surrounding conversation to interpret its intent or severity.
- Independent: The message can be accurately interpreted in isolation.
Dimension 7: Sarcasm Indicator
- Sarcastic: The text employs sarcasm, irony, or parody in a way that obscures or enhances harmful meaning.
- Genuine: The text does not include sarcasm.
This initial schema supports detailed, interpretable annotations that reflect both social dynamics and linguistic complexity, and it aligns with best practices in cyberbullying research and human-AI collaboration.
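To suggest how the schema might be operationalized in an annotation tool, the sketch below encodes the seven dimensions as Python types; all identifiers are our own naming for illustration, not from any published codebase:

```python
# One possible encoding of the seven-dimension schema as Python types.
# Names and structure are our own; the values mirror the dimensions above.
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

class Role(Enum):
    BULLY = "bully"
    VICTIM = "victim"
    BYSTANDER = "bystander"
    UNRELATED = "unrelated"

class ContentType(Enum):
    DIRECT_ATTACK = "direct_attack"
    HARASSMENT = "harassment"
    EXCLUSION = "exclusion"
    THREATS = "threats"
    RACISM = "racism"
    SEXISM = "sexism"
    HOMOPHOBIA_TRANSPHOBIA = "homophobia_transphobia"
    GROOMING = "grooming"
    SEXUAL_HARASSMENT = "sexual_harassment"
    BODY_SHAMING = "body_shaming"
    OTHER = "other"

@dataclass
class Annotation:
    is_cyberbullying: bool               # Dimension 1
    severity: Optional[Severity]         # Dimension 2 (None if not cyberbullying)
    role: Role                           # Dimension 3
    content_type: Optional[ContentType]  # Dimension 4
    sentiment: str                       # Dimension 5: "positive"/"neutral"/"negative"
    emotions: frozenset[str]             # Dimension 5: e.g. {"anger", "disgust"}
    context_dependent: bool              # Dimension 6
    sarcastic: bool                      # Dimension 7

example = Annotation(
    is_cyberbullying=True, severity=Severity.MEDIUM, role=Role.BULLY,
    content_type=ContentType.DIRECT_ATTACK, sentiment="negative",
    emotions=frozenset({"anger"}), context_dependent=False, sarcastic=False,
)
print(example.severity.value)  # -> "medium"
```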
Conclusion
Labeling schemes in cyberbullying detection have evolved from simple binary tags to rich multi-dimensional frameworks that better reflect the complexity of online abuse. While challenges remain, especially in achieving reliable annotations, human-centered design, consensus filtering, and multi-role annotation are promising directions. Future work should continue refining schema usability, addressing annotation bias, and validating frameworks across diverse platforms and demographics.
References
Balakrishnan, V., & Kaity, M. (2023). Cyberbullying detection and machine learning: A systematic literature review. Artificial Intelligence Review, 56, S1375–S1416. https://doi.org/10.1007/s10462-023-10553
Gomez, C. E., Sztainberg, M. O., & Trana, R. E. (2021). Curating cyberbullying datasets: A human–AI collaborative approach. International Journal of Bullying Prevention, 4, 35–46. https://doi.org/10.1007/s42380-021-00114-6
Hamlett, M., Powell, G., Silva, Y. N., & Hall, D. (2022). A labeled dataset for investigating cyberbullying content patterns in Instagram. Proceedings of the International AAAI Conference on Web and Social Media, 16. https://doi.org/10.1609/icwsm.v16i1.19376
Kim, S., Razi, A., Stringhini, G., Wisniewski, P. J., & De Choudhury, M. (2021). A human-centered systematic literature review of cyberbullying detection algorithms. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW2), 1–34. https://doi.org/10.1145/3476066
National Institutes of Health. (2023). Cyberbullying linked to suicidal thoughts, attempts in young adolescents. https://www.nih.gov/news-events/nih-research-matters/cyberbullying-linked-suicidal-thoughts-attempts-young-adolescents
Ratnayaka, G., Atapattu, T., Herath, M., Zhang, G., & Falkner, K. (2020). Enhancing the identification of cyberbullying through participant roles. arXiv preprint arXiv:2010.06640. https://doi.org/10.18653/v1/2020.alw-1.11
Yi, P., Zubiaga, A., & Long, Y. (2024). Detecting harassment and defamation in cyberbullying with emotion-adaptive training. AAAI 2024. https://doi.org/10.48550/arXiv.2501.16925