Towards A Language Identification Approach For The Detection of Abusive Messages In Tweets of Mixed Wolof-French Codes
This paper presents a language identification approach for detecting abusive messages in Wolof-French code-mixed tweets. We propose a new two-branch architecture. The first branch uses a language identification model that represents the two languages and combines this representation with a new boost attribute on Wolof tokens to compute an attention score. The second branch is based on a transformer model that captures context and is reinforced by an attention layer that focuses on the most relevant parts of the tweet. The outputs of the two branches are concatenated and projected into a dense layer that classifies tweets as “abusive” or “non-abusive”. We evaluate our configuration on eleven (11) pre-trained models from three linguistic categories. Experimental results show that our approach improves the detection of abusive messages in Wolof-French code-mixed tweets for all transformer models, in all categories, and on all evaluation measures.
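To make the two-branch fusion concrete, the following is a minimal plain-Python sketch, not the authors' implementation. It assumes that the Wolof "boost attribute" is a constant added to the attention logits of tokens tagged as Wolof (`"wo"`), and that the contextual vectors stand in for precomputed transformer outputs. The function names (`language_branch`, `context_branch`, `classify`), the `wolof_boost` parameter, and all vector and weight values are hypothetical toy choices for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]

def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def attention_pool(vectors: Sequence[Vector], query: Vector, bias: Sequence[float]) -> Vector:
    """Dot-product attention pooling: score each token vector against a query,
    add a per-token bias to its logit, softmax the logits, return the weighted sum."""
    weights = softmax([dot(v, query) + b for v, b in zip(vectors, bias)])
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def language_branch(lang_vectors, lang_tags, query, wolof_boost):
    # Hypothetical boost attribute: a constant added to the attention logit of
    # every token tagged "wo" (Wolof), so Wolof tokens receive more weight.
    bias = [wolof_boost if tag == "wo" else 0.0 for tag in lang_tags]
    return attention_pool(lang_vectors, query, bias)

def context_branch(contextual_vectors, query):
    # Stand-in for transformer outputs followed by an attention layer (no boost).
    return attention_pool(contextual_vectors, query, [0.0] * len(contextual_vectors))

def classify(lang_vec, ctx_vec, weights, bias) -> float:
    # Concatenate both branch outputs and project them with one dense unit + sigmoid.
    z = dot(lang_vec + ctx_vec, weights) + bias
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Toy three-token tweet: two Wolof tokens and one French token (made-up vectors).
tags = ["wo", "wo", "fr"]
lang_vecs = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
ctx_vecs = [[0.2, 0.1], [0.9, -0.3], [0.1, 0.4]]
query = [1.0, 1.0]

lang_vec = language_branch(lang_vecs, tags, query, wolof_boost=1.0)
ctx_vec = context_branch(ctx_vecs, query)
p = classify(lang_vec, ctx_vec, weights=[0.5, -0.2, 0.8, 0.3], bias=0.0)
label = "abusive" if p >= 0.5 else "non-abusive"
print(f"P(abusive) = {p:.3f} -> {label}")
```

In a trained model the query vectors, dense weights, and boost value would be learned or tuned, and the contextual vectors would come from one of the pre-trained transformer encoders. The sketch only shows how an additive boost shifts attention toward Wolof tokens before the two branch outputs are concatenated and classified.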
