Research Works
Title
An Empirical Study on Bengali News Headline Categorization Leveraging Different Machine Learning Techniques
Abstract
This work categorizes Bengali online news headlines into six distinct categories using Natural Language Processing. Machine learning models have recently achieved strong results in Natural Language Processing and have drawn considerable attention from researchers across application fields. We apply several machine learning algorithms to Bengali news headlines: Logistic Regression, Random Forest Classifier, Multinomial Naive Bayes, and RBF Support Vector Machine. We also present the deep learning models LSTM, Bi-LSTM, GRU, Bi-GRU, and CNN, and the transformer models Bangla-BERT and XLM-RoBERTa. The primary purpose of this paper is to compare machine learning, deep learning, and transformer methods for Bengali news headline classification. We evaluated the models on 136,811 Bengali news headlines, and XLM-RoBERTa achieved an accuracy of 86.50% on this dataset.
Authors
Aysha Gazi Mouri*, Purnendu Talukder, Tanvir Rahman Anik*, Ifti Sam Ibn Rahman, Sajib Kumar Saha Joy, Md. Tanvir Rouf Shawon, Farzad Ahmed, and Nibir Chandra Mandal
Novelty and research contributions
- Conducted a comprehensive statistical analysis to evaluate the feasibility of using word-based features for training supervised learning models.
- Employed multiple machine learning algorithms (Logistic Regression, Random Forest, Multinomial Naive Bayes, and RBF SVM) alongside deep learning and transformer models (LSTM, Bi-LSTM, GRU, Bi-GRU, CNN with Bi-LSTM, Bangla-BERT, and XLM-RoBERTa).
- To the best of our knowledge, this is the first study conducted on this dataset.
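
As a rough illustration of how word-based features can feed a supervised classifier such as the Multinomial Naive Bayes mentioned above, the following is a minimal sketch in plain Python. It is not the authors' implementation: the whitespace tokenizer, the add-one smoothing, the function names, the four toy headlines, and the two category labels ("sports", "economy") are all assumptions made for the example and are not taken from the dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(headline):
    # Assumption: plain whitespace tokenization; a real pipeline would
    # likely also remove punctuation and stop words.
    return headline.split()


def train_multinomial_nb(headlines, labels, alpha=1.0):
    """Fit a word-count Multinomial Naive Bayes with add-alpha smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for text, label in zip(headlines, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total = len(labels)
    return {
        "alpha": alpha,
        "vocab": vocab,
        "word_counts": word_counts,
        "log_prior": {c: math.log(n / total) for c, n in class_counts.items()},
        "totals": {c: sum(word_counts[c].values()) for c in class_counts},
    }


def predict(model, headline):
    """Return the label with the highest log-posterior score."""
    alpha, vocab = model["alpha"], model["vocab"]
    best_label, best_score = None, -math.inf
    for label, log_prior in model["log_prior"].items():
        score = log_prior
        denom = model["totals"][label] + alpha * len(vocab)
        for tok in tokenize(headline):
            if tok not in vocab:
                continue  # skip words never seen in training
            count = model["word_counts"][label][tok]
            score += math.log((count + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training headlines with hypothetical category labels.
train_headlines = [
    "ক্রিকেট দল জয়",
    "ফুটবল দল খেলা",
    "ব্যাংক সুদ হার",
    "বাজেট অর্থনীতি ব্যাংক",
]
train_labels = ["sports", "sports", "economy", "economy"]

model = train_multinomial_nb(train_headlines, train_labels)
print(predict(model, "ক্রিকেট খেলা"))
print(predict(model, "বাজেট ব্যাংক"))
```

With a corpus of realistic size, the same word-count representation could be passed to the other classical models listed above, while the deep learning and transformer models would typically use learned embeddings or subword tokenizers instead.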