The Impact of NLP Techniques on Financial Sentiment Analysis: From Lexicons to Transformers

This paper examines how different Natural Language Processing (NLP) techniques affect model performance in financial sentiment analysis, using comments from the StockTwits social network. The study assesses the sensitivity of four sentiment analysis methodologies (lexicon-based methods, traditional machine learning models, neural network-based models, and transformer-based architectures) to various preprocessing, textual representation, and data balancing techniques, and determines which combinations yield the best overall performance. A modified version of the CRISP-DM process model served as the methodological framework, covering data preparation, modeling, and evaluation. The main findings show the expected performance progression, from the lowest results with lexicon-based methods to the highest with transformer architectures. However, the approaches respond differently to these techniques and exhibit varying levels of sensitivity, underscoring that preprocessing, representation, and balancing choices must be made carefully, as no single configuration consistently outperforms the others across all model types.

Rui Ribeiro
ISCAP, Polytechnic of Porto
Portugal

Henrique Lopes Cardoso
LIACC, FEUP, University of Porto
Portugal

Célia Talma Gonçalves
CEOS.PP, ISCAP, Polytechnic of Porto and LIACC, University of Porto
Portugal