Research Works

Title

An Empirical Study on Bengali News Headline Categorization Leveraging Different Machine Learning Techniques

Abstract

This study categorizes Bengali online news headlines into six distinct categories using Natural Language Processing. The accomplishments of machine learning models in Natural Language Processing have recently drawn considerable attention from researchers across application fields. We apply several machine learning algorithms to Bengali news headlines, including Logistic Regression, Random Forest Classifier, Multinomial Naive Bayes, and RBF Support Vector Machine. We also present deep learning models (LSTM, Bi-LSTM, GRU, Bi-GRU, and CNN) and the transformer models Bangla-BERT and XLM-RoBERTa. The primary purpose of this paper is to compare these machine learning, deep learning, and transformer models on Bengali news headline classification. We evaluated the models on a dataset of 136,811 Bengali news headlines, and XLM-RoBERTa achieved an accuracy of 86.50%.

Authors

Aysha Gazi Mouri*, Purnendu Talukder, Tanvir Rahman Anik*, Ifti Sam Ibn Rahman, Sajib Kumar Saha Joy, Md. Tanvir Rouf Shawon, Farzad Ahmed, and Nibir Chandra Mandal

Novelty and research contributions

Conducted a comprehensive statistical analysis to evaluate the feasibility of using word-based features for training supervised learning models.
Employed multiple machine learning algorithms, including Logistic Regression, Random Forest, Multinomial Naive Bayes, and RBF SVM, alongside deep learning and transformer models: LSTM, Bi-LSTM, GRU, Bi-GRU, CNN with Bi-LSTM, Bengali BERT, and XLM-RoBERTa.
To the best of our knowledge, this is the first study conducted on this dataset.
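
To illustrate how word-based features can feed one of the supervised models listed above, here is a minimal sketch of a Multinomial Naive Bayes classifier over whitespace-tokenized headlines. It is not the authors' implementation: the class name, the two labels ("sports", "economy"), and the toy headlines are all invented for illustration, and the paper's actual dataset, six categories, preprocessing, and features may differ.

```python
import math
from collections import Counter, defaultdict


class ToyMultinomialNB:
    """Multinomial Naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed value)

    def fit(self, texts, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(n_docs / total_docs)
            for word in text.split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy headlines with hypothetical labels, not taken from the dataset.
train_texts = [
    "ক্রিকেট দল ম্যাচ জিতল",
    "ফুটবল ম্যাচ আজ শুরু",
    "ক্রিকেট সিরিজ শুরু",
    "ব্যাংক ঋণ সুদ কমল",
    "বাজেট ঘোষণা আজ",
    "বাজার দাম কমল",
]
train_labels = ["sports", "sports", "sports", "economy", "economy", "economy"]

model = ToyMultinomialNB().fit(train_texts, train_labels)
print(model.predict("ক্রিকেট ম্যাচ আজ"))
print(model.predict("ব্যাংক সুদ কমল"))
```

Plain word counts are used here only because they are the simplest word-based feature; a weighting scheme such as TF-IDF could be substituted without changing the overall flow.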

This article was published in the 2022 25th International Conference on Computer and Information Technology (ICCIT), 17-19 December, Cox’s Bazar, Bangladesh.