ADA Library Digital Repository

Large Scale Classification and Clusterization of COVID-19 Related Papers

dc.contributor.author Talibzade, Rustam
dc.date.accessioned 2024-12-19T23:36:58Z
dc.date.available 2024-12-19T23:36:58Z
dc.date.issued 2023-04
dc.identifier.uri http://hdl.handle.net/20.500.12181/928
dc.description.abstract The year 2020 is mostly associated with the outbreak of COVID-19, caused by the SARS-CoV-2 coronavirus, because of its immeasurable effects on our lives. Humanity faced unexpected challenges unlike any in recent history. A great deal of research was carried out to find ways to combat COVID-19 and save as many lives as possible. This produced a huge number of articles and research papers on COVID-19, which became hard to keep up with. Several datasets, such as LitCovid and CORD-19, were created to store collections of COVID-19-related literature. Gaining insights from such datasets requires data analytics and machine learning techniques. This Master's thesis presents a comprehensive analysis of text classification and clustering methodologies, including Support Vector Machines (SVM), Naive Bayes, Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA), Non-negative Matrix Factorization (NMF), BERT, BioBERT, and SciBERT, applied to a large dataset of COVID-19 research articles sourced from the LitCovid database. The primary goal of this research is to devise and assess techniques for organizing, analyzing, and understanding the swiftly expanding collection of scientific literature on COVID-19. The research is structured into multiple phases. First, a thorough literature review establishes a robust understanding of cutting-edge developments in NLP, text classification, clustering, and topic modelling. This review covers traditional machine learning techniques, including supervised and unsupervised clustering algorithms, and discusses their applications to different datasets, including COVID-19-related datasets such as CORD-19 and LitCovid. Next, a description of the LitCovid dataset is provided.
Afterwards, the machine learning techniques mentioned above are applied with different word vectorization techniques, including Bag-of-Words, TF-IDF, and Word2Vec, to identify how each algorithm behaves with these vectorization methods. In the results and analysis section, the author offers a comprehensive comparison of all classification, topic modelling, and clusterization approaches used for COVID-19 research articles. Finally, in the summary and future work section, the author consolidates the key findings and considers potential directions for future research. en_US
dc.language.iso en en_US
dc.publisher ADA University en_US
dc.rights Attribution-NonCommercial-NoDerivs 3.0 United States *
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/us/ *
dc.subject COVID-19 (Disease) -- Research en_US
dc.subject Natural language processing (Computer science) en_US
dc.subject Machine learning -- Medical applications en_US
dc.title Large Scale Classification and Clusterization of COVID-19 Related Papers en_US
dc.type Thesis en_US


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States
