Cyberbullying, defined as willful and repeated harm inflicted through digital means, continues to rise in prevalence and severity, particularly among youth populations, who are exposed to social media starting at increasingly young ages. The mental health consequences of cyberbullying, including anxiety, depression, and suicidal ideation, are well documented (NIH, 2023). As a result, researchers have increasingly turned to automated systems powered by machine learning to detect, classify, and mitigate cyberbullying. A central challenge in developing such systems is constructing high-quality annotated datasets with appropriate labeling schemes. This review synthesizes recent scholarship on labeling strategies (binary, multi-class, and role-based), annotation practices, and inter-annotator agreement (IAA), drawing on five key academic sources and an NIH research summary.
Labeling Schemes: Binary, Multi-Class, Participant Roles, and Contextual Dimensions
Binary Classification
Binary classification remains the most common and foundational labeling scheme in cyberbullying detection (Balakrishnan & Kaity, 2023). In this approach, comments are labeled as either cyberbullying or not. Simplicity is its strength, enabling consistent labeling with high inter-rater reliability. However, it falls short in representing the diverse forms and intensities of abuse, often failing to distinguish between subtle and overt attacks.
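As a minimal illustration of this scheme (not drawn from any of the reviewed papers), the sketch below trains a binary cyberbullying classifier with TF-IDF features and logistic regression; the comments and labels are invented placeholders:

```python
# A minimal binary cyberbullying classifier: TF-IDF features + logistic
# regression. The example comments and labels are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical annotated comments: 1 = cyberbullying, 0 = not cyberbullying.
comments = [
    "nobody likes you, just leave",
    "great game last night!",
    "you are so stupid it hurts",
    "thanks for the help, friend",
]
labels = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(comments, labels)

# Expected to print [1], though with toy data this is not guaranteed.
print(model.predict(["everyone thinks you are worthless"]))
```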
Multi-Class and Severity Classification
Expanding upon binary labeling, multi-class schemes offer granularity by incorporating levels of severity (Yi, Zubiaga, & Long, 2024). These often distinguish between types of harm, such as harassment, defamation, and denigration. For example, the HDCyberbullying dataset distinguishes harassment from defamation, and its accompanying approach uses emotion-adaptive training to improve recognition of indirect forms of abuse. While this allows for richer analysis, it introduces ambiguity in classification and often results in lower IAA.
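To make the granularity trade-off concrete, the short sketch below uses invented class names (not the published HDCyberbullying label set) to show that fine-grained labels can always be collapsed to the binary scheme, while the reverse is impossible:

```python
# Hypothetical multi-class labels and their collapse to binary. The class
# names here are illustrative, not a published taxonomy.
MULTI_CLASS_LABELS = {"none", "harassment", "defamation", "denigration"}

def to_binary(label: str) -> int:
    """Collapse a fine-grained label to the binary scheme (1 = cyberbullying)."""
    return 0 if label == "none" else 1

annotations = ["harassment", "none", "defamation"]
print([to_binary(a) for a in annotations])  # -> [1, 0, 1]
```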
Participant Role Classification
Role-based labeling schemes identify users’ social roles in cyberbullying episodes—typically bully, victim, or bystander (Ratnayaka et al., 2020). This scheme enables detection of social dynamics and facilitates more nuanced interventions. For instance, in the ASKfm dataset, annotators identified participant roles within multi-turn conversations, yielding a multi-class F1 score of 0.76 for role detection. Although role classification improves contextual understanding, it requires annotators to infer intent and social relationships, increasing annotation complexity.
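For reference, a multi-class F1 score such as the 0.76 reported above is commonly computed by averaging per-class F1 scores. The sketch below uses invented gold and predicted role labels; macro averaging is an assumption made for this illustration, not a detail taken from the paper:

```python
# Computing a multi-class F1 score over participant roles with scikit-learn.
# The gold and predicted label sequences are invented for illustration.
from sklearn.metrics import f1_score

gold = ["bully", "victim", "bystander", "bystander", "victim", "bully"]
pred = ["bully", "victim", "bystander", "victim", "victim", "bully"]

# Macro-averaging weights each role equally; prints roughly 0.82 here.
print(f1_score(gold, pred, average="macro"))
```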
Context Dependence and Sarcasm Indicators
To address the complexities of implicit abuse and interpretive ambiguity, recent schemas have added contextual and stylistic dimensions. Context dependence flags whether the message can be understood in isolation or requires surrounding discourse (Hamlett et al., 2022). Sarcasm, often used to veil harmful intent, is likewise annotated to distinguish ironic expressions from genuine hostility (Kim et al., 2021). Including these layers enhances model precision and aligns with human-centered annotation practices.
Annotation Guidelines and Practices
Dataset Design and Guidelines
Hamlett et al. (2022) designed a multi-dimensional annotation scheme for an Instagram dataset. Their labels included content type, purpose, directionality, and co-occurrence with other phenomena. Annotators were trained using clear definitions and examples to ensure consistency. Similarly, Gomez et al. (2021) emphasized the importance of curating datasets using consensus filtering and hybrid human-AI workflows to ensure quality and diversity.
Addressing Subjectivity and Ambiguity
Cyberbullying often includes sarcasm, coded language, and ambiguous intent. Kim et al. (2021) highlighted that human-centered approaches are crucial for developing annotation schemes that reflect lived experiences and diverse interpretations. Their review revealed that many systems lacked grounding in real human contexts, leading to reduced model relevance and possible harms.
Annotator Diversity and Bias
Annotation quality is also influenced by the demographic background and bias of annotators. As observed by Hamlett et al. (2022), collecting annotator demographic information can provide insights into how perceptions of cyberbullying vary. Including diverse annotators helps mitigate systemic bias and aligns with the human-centered design principles proposed by Kim et al. (2021).
Inter-Annotator Agreement (IAA)
Inter-annotator agreement is a key measure of annotation reliability. Most studies use Cohen’s kappa or Krippendorff’s alpha to assess agreement. Binary classification tends to yield higher IAA due to its simplicity, while multi-class and role-based annotations face challenges due to subjectivity (Gomez et al., 2021).
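As a concrete sketch with invented annotations, Cohen's kappa for two annotators can be computed with scikit-learn, and Krippendorff's alpha (which also accommodates more annotators and missing labels) with the third-party krippendorff package:

```python
# Two standard IAA measures computed on invented annotations.
import numpy as np
import krippendorff  # pip install krippendorff
from sklearn.metrics import cohen_kappa_score

# Two annotators labeling the same six comments (1 = cyberbullying).
ann_a = [1, 0, 1, 1, 0, 0]
ann_b = [1, 0, 1, 0, 0, 0]

print("Cohen's kappa:", cohen_kappa_score(ann_a, ann_b))

# Krippendorff's alpha takes one row per annotator; np.nan marks missing labels.
reliability = np.array([ann_a, ann_b], dtype=float)
print("Krippendorff's alpha:",
      krippendorff.alpha(reliability_data=reliability,
                         level_of_measurement="nominal"))
```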
To address ambiguity, Gomez et al. (2021) propose consensus filtering, retaining only instances with strong agreement between human and algorithmic labels. This improves dataset quality and, in turn, the performance of models trained on the filtered data.
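A minimal version of such a filter might look like the sketch below; the record structure and field names are our own, not taken from Gomez et al. (2021):

```python
# Consensus filtering sketch: keep only instances where the human label and
# the algorithmic label agree. Record structure and field names are assumed.
records = [
    {"text": "you are pathetic", "human_label": 1, "model_label": 1},
    {"text": "nice photo!",      "human_label": 0, "model_label": 0},
    {"text": "whatever, loser",  "human_label": 1, "model_label": 0},  # dropped
]

consensus = [r for r in records if r["human_label"] == r["model_label"]]
print(len(consensus), "of", len(records), "instances retained")
```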
Recommendations for a Comprehensive Labeling Schema
Drawing from the reviewed literature and existing annotation manuals, we recommend a hierarchical labeling schema composed of the following dimensions (a code sketch encoding the full schema appears at the end of this section):
Dimension 1: Binary Classification
- Cyberbullying: The text contains aggressive, threatening, demeaning, or harmful content directed at a person or group.
- Not Cyberbullying: The content is neutral, supportive, humorous (non-hurtful), or otherwise non-threatening.
Dimension 2: Severity Level
- Low: Mild teasing, sarcasm, or isolated negative comments without threats or prolonged attacks.
- Medium: Repeated negative behavior, name-calling, exclusion, or indirect threats.
- High: Explicit threats, hate speech, targeted harassment, doxxing, or strong language intended to severely harm or intimidate.
Dimension 3: Participant Role
- Bully: The speaker is initiating or perpetuating harmful behavior.
- Victim: The speaker is being targeted or expressing harm.
- Bystander: The speaker observes or comments on the bullying.
- Unrelated: The speaker is not involved in the bullying event.
Dimension 4: Content Type
- Direct Attack: Name-calling, insults, or slurs directed at an individual.
- Harassment: Repeated or ongoing targeting of someone with harmful content.
- Exclusion / Social Rejection: Content encouraging the isolation of an individual.
- Threats / Intimidation: Implicit or explicit threats of harm.
- Racism / Xenophobia: Content targeting a person or group based on race, nationality, or ethnicity.
- Sexism / Misogyny: Gender-based insults, objectification, or discriminatory comments.
- Homophobia / Transphobia: Hostility or slurs directed at LGBTQ+ identities.
- Pedophilia / Grooming: Inappropriate sexual content involving or targeting minors.
- Sexting / Sexual Harassment: Unwanted sexual messages, images, or pressure.
- Body Shaming: Negative comments about someone’s appearance or weight.
- Other: Use this tag when none of the above categories fit, but the content is still harmful.
Dimension 5: Emotional Markers
- Sentiment: Positive / Neutral / Negative
- Emotion (choose all that apply): Anger, Fear, Sadness, Disgust, Shame, Guilt, Empathy, Neutral, Unclear, Other, None
Dimension 6: Context Dependence
- Dependent: The message requires surrounding conversation to interpret its intent or severity.
- Independent: The message can be accurately interpreted in isolation.
Dimension 7: Sarcasm Indicator
- Sarcastic: The text employs sarcasm, irony, or parody in a way that obscures or enhances harmful meaning.
- Genuine: The text does not include sarcasm.
This initial schema supports detailed, interpretable annotations that reflect both social dynamics and linguistic complexity, and it aligns with best practices in cyberbullying research and human-AI collaboration.
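To suggest how the schema might be operationalized in an annotation tool, the sketch below encodes the seven dimensions as Python types; all identifiers are our own naming for illustration, not from any published codebase:

```python
# One possible encoding of the seven-dimension schema as Python types.
# Names and structure are our own; the values mirror the dimensions above.
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

class Role(Enum):
    BULLY = "bully"
    VICTIM = "victim"
    BYSTANDER = "bystander"
    UNRELATED = "unrelated"

class ContentType(Enum):
    DIRECT_ATTACK = "direct_attack"
    HARASSMENT = "harassment"
    EXCLUSION = "exclusion"
    THREATS = "threats"
    RACISM = "racism"
    SEXISM = "sexism"
    HOMOPHOBIA_TRANSPHOBIA = "homophobia_transphobia"
    GROOMING = "grooming"
    SEXUAL_HARASSMENT = "sexual_harassment"
    BODY_SHAMING = "body_shaming"
    OTHER = "other"

@dataclass
class Annotation:
    is_cyberbullying: bool               # Dimension 1
    severity: Optional[Severity]         # Dimension 2 (None if not cyberbullying)
    role: Role                           # Dimension 3
    content_type: Optional[ContentType]  # Dimension 4
    sentiment: str                       # Dimension 5: "positive"/"neutral"/"negative"
    emotions: frozenset[str]             # Dimension 5: e.g. {"anger", "disgust"}
    context_dependent: bool              # Dimension 6
    sarcastic: bool                      # Dimension 7

example = Annotation(
    is_cyberbullying=True, severity=Severity.MEDIUM, role=Role.BULLY,
    content_type=ContentType.DIRECT_ATTACK, sentiment="negative",
    emotions=frozenset({"anger"}), context_dependent=False, sarcastic=False,
)
print(example.severity.value)  # -> "medium"
```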
Conclusion
Labeling schemes in cyberbullying detection have evolved from simple binary tags to rich multi-dimensional frameworks that better reflect the complexity of online abuse. While challenges remain, especially in achieving reliable annotations, human-centered design, consensus filtering, and multi-role annotation are promising directions. Future work should continue refining schema usability, addressing annotation bias, and validating frameworks across diverse platforms and demographics.
References
Balakrishnan, V., & Kaity, M. (2023). Cyberbullying detection and machine learning: A systematic literature review. Artificial Intelligence Review, 56, S1375–S1416. https://doi.org/10.1007/s10462-023-10553
Gomez, C. E., Sztainberg, M. O., & Trana, R. E. (2021). Curating cyberbullying datasets: A human–AI collaborative approach. International Journal of Bullying Prevention, 4, 35–46. https://doi.org/10.1007/s42380-021-00114-6
Hamlett, M., Powell, G., Silva, Y. N., & Hall, D. (2022). A labeled dataset for investigating cyberbullying content patterns in Instagram. Proceedings of the International AAAI Conference on Web and Social Media, 16. https://doi.org/10.1609/icwsm.v16i1.19376
Kim, S., Razi, A., Stringhini, G., Wisniewski, P. J., & De Choudhury, M. (2021). A human-centered systematic literature review of cyberbullying detection algorithms. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW2), 1–34. https://doi.org/10.1145/3476066
National Institutes of Health. (2023). Cyberbullying linked to suicidal thoughts, attempts in young adolescents. https://www.nih.gov/news-events/nih-research-matters/cyberbullying-linked-suicidal-thoughts-attempts-young-adolescents
Ratnayaka, G., Atapattu, T., Herath, M., Zhang, G., & Falkner, K. (2020). Enhancing the identification of cyberbullying through participant roles. arXiv preprint arXiv:2010.06640. https://doi.org/10.18653/v1/2020.alw-1.11
Yi, P., Zubiaga, A., & Long, Y. (2024). Detecting harassment and defamation in cyberbullying with emotion-adaptive training. AAAI 2024. https://doi.org/10.48550/arXiv.2501.16925