Towards a Language Identification Approach for the Detection of Abusive Messages in Tweets of Mixed Wolof-French Codes

This paper presents a contribution on language identification for the detection of abusive messages in Wolof-French code-mixed tweets. We propose a new two-branch architecture. The first branch uses a language identification model that models the representation of the two languages, combined with a new boost attribute on Wolof tokens, in order to calculate an attention score. The second branch is based on a transformer model to capture context, reinforced by an attention layer to better focus on the most relevant parts. The outputs of the two branches are concatenated and projected into a dense layer that classifies tweets as “abusive” or “non-abusive”. We evaluate our configuration on eleven (11) pre-trained models from three linguistic categories. Experimental results show that our approach improves the detection of abusive messages in Wolof-French code-mixed tweets for all transformer models, in all categories, and on all measures.
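The two-branch design described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `wolof_boost` value, the way the boost enters the attention score, and the toy vectors standing in for transformer outputs are all assumptions made for illustration only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def language_branch(lang_tags, wolof_boost=2.0):
    """Hypothetical language-ID branch.

    Each token carries a language tag ("wo" or "fr"). Wolof tokens get a
    larger raw score (the assumed "boost"), and a softmax turns the raw
    scores into attention weights. Returns the attention mass per language.
    """
    raw = [wolof_boost if tag == "wo" else 1.0 for tag in lang_tags]
    weights = softmax(raw)
    wolof = sum(w for w, tag in zip(weights, lang_tags) if tag == "wo")
    french = sum(w for w, tag in zip(weights, lang_tags) if tag == "fr")
    return [wolof, french]


def context_branch(token_vectors, query):
    """Attention pooling over contextual token vectors.

    token_vectors stands in for the output of a pre-trained transformer;
    query is a (here hand-set) attention vector.
    """
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in token_vectors]
    weights = softmax(scores)
    dim = len(token_vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, token_vectors))
            for i in range(dim)]


def classify(lang_features, context_features, dense_weights, bias=0.0):
    """Concatenate both branches and apply a single dense (logistic) layer."""
    fused = lang_features + context_features
    logit = sum(w * x for w, x in zip(dense_weights, fused)) + bias
    prob = 1.0 / (1.0 + math.exp(-logit))
    label = "abusive" if prob >= 0.5 else "non-abusive"
    return label, prob


# Toy usage: three tokens tagged Wolof, French, Wolof.
lang_feats = language_branch(["wo", "fr", "wo"])
ctx_feats = context_branch([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], query=[0.0, 0.0])
label, prob = classify(lang_feats, ctx_feats, dense_weights=[1.0, 0.0, 0.0, 0.0])
```

In a trained model the query vector and dense weights would be learned rather than set by hand; the sketch only shows how the two feature vectors might be fused before classification.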

Ibrahima Ndao
University Assane Seck of Ziguinchor
Senegal

Khadim Dramé
University Assane Seck of Ziguinchor
Senegal

Gorgoumack Sambe
Cheikh Hamidou Kane Digital University
Senegal

Gayo Diallo
University of Bordeaux
France

Youssou Faye
University Assane Seck of Ziguinchor
Senegal