Date of Award

5-1-2024

Thesis Type

PhD

Document Type

Thesis (Restricted Access)

Divisions

fsktm

Department

Department of Information System

Institution

Universiti Malaya

Abstract

Social media platforms contain rich language resources and are a valuable source for analysing people’s sentiment around the globe. Sarcasm detection, a classification problem, is one of many challenges faced in sentiment analysis. Many past studies have employed various techniques and approaches to detect sarcasm. Although hyperbole is a common approach, used individually or combined with other approaches such as lexical or pragmatic ones, not all types of hyperbole have been used. Drawing on past research, the top five hyperboles identified and explored in this research to detect sarcasm are intensifiers, interjections, capital letters, punctuation marks and elongated words. Each hyperbole was identified from 6,600 pre-processed negative-sentiment tweets hashtagged #Chinesevirus, #Kungflu, #COVID19, #Hantavirus or #Coronavirus. The unbiased dataset was analysed using three well-known machine learning algorithms: Support Vector Machine, Random Forest, and Random Forest with Bagging. A total of 81 models were evaluated using single hyperboles, double hyperboles (the top two dominant hyperboles) and all hyperbole features. With hyperbolic words present in the tweets of the unbiased dataset, the proposed model (two-class setup) with interjection words achieved an accuracy of 76.61%, 78% precision, 85% recall, an AUC of 74% and an F-score of 82%. The model with all hyperboles achieved an accuracy of 78.89%, 81% precision, 87% recall, an AUC of 76% and an F-score of 84%. The experiments and analysis conducted in this study show that hyperboles exist in an unbiased dataset and help enhance sarcasm detection. A similar approach was undertaken on an open dataset, with a focus on the lexical approach and an artificial recurrent neural network (RNN).
The proposed model performed well, achieving an accuracy of 89.46% and 90% precision, an increase of 10% in accuracy and more than 40% in precision. This study also sought to determine the most significant hyperbole, and the intensifier was found to be the most significant (p < .0001). This finding agrees with the ablation study, which shows the intensifier to be the predominant hyperbole for detecting sarcasm.
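
As a rough illustration of the five hyperbole features named in the abstract, the sketch below counts each one in a single tweet. It is not the thesis's method: the word lists, regular expressions, the function name `hyperbole_features` and the feature keys are all assumptions made for this example, and the thesis's actual lexicons and pre-processing are not given here.

```python
import re

# Hypothetical, deliberately tiny word lists (assumptions for illustration;
# the lexicons used in the thesis are not stated in the abstract).
INTENSIFIERS = {"so", "very", "really", "totally", "absolutely", "extremely"}
INTERJECTIONS = {"wow", "oh", "yay", "ugh", "aha"}

TOKEN_RE = re.compile(r"[A-Za-z']+")            # alphabetic tokens, apostrophes allowed
ELONGATED_RE = re.compile(r"([A-Za-z])\1{2,}")  # same letter three or more times in a row
PUNCT_RUN_RE = re.compile(r"[!?]{2,}")          # runs such as "!!" or "?!"


def hyperbole_features(tweet: str) -> dict:
    """Count occurrences of five hyperbole cues in one tweet."""
    tokens = TOKEN_RE.findall(tweet)
    lowered = [t.lower() for t in tokens]
    return {
        "intensifier": sum(t in INTENSIFIERS for t in lowered),
        "interjection": sum(t in INTERJECTIONS for t in lowered),
        # Fully upper-case tokens longer than one letter (so "I" is ignored).
        "capital_letters": sum(len(t) > 1 and t.isupper() for t in tokens),
        "punctuation_marks": len(PUNCT_RUN_RE.findall(tweet)),
        "elongated_words": sum(bool(ELONGATED_RE.search(t)) for t in tokens),
    }


if __name__ == "__main__":
    print(hyperbole_features("Wow, I am really sooo happy about this GREAT news!!!"))
```

Counts of this kind could then be passed as feature vectors to a classifier such as a Support Vector Machine or Random Forest; that step is omitted here to keep the sketch dependency-free.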

Note

Thesis (PhD) - Faculty of Computer Science & Information Technology, Universiti Malaya, 2024.
