Please use this identifier to cite or link to this item:
http://localhost:8080/xmlui/handle/123456789/4164
| Title: | An integrated clustering and BERT framework for improved topic modeling |
| Authors: | Lijimol, George; Sumathy, P |
| Keywords: | Latent Dirichlet allocation (LDA) · Topic modeling · k-means clustering · Dimensionality reduction · Bidirectional encoder representations from transformers (BERT) |
| Issue Date: | 31-May-2024 |
| Publisher: | Bharathidasan University |
| Abstract: | Topic modelling is a machine learning technique that is extensively used in Natural Language Processing (NLP) applications to infer topics within unstructured textual data. Latent Dirichlet Allocation (LDA) is one of the most widely used topic modeling techniques and can automatically detect topics in a large collection of text documents. However, LDA-based topic models alone do not always provide promising results. Clustering is an effective unsupervised machine learning technique that is extensively used in applications including information extraction from unstructured textual data and topic modeling. A hybrid model of Bidirectional Encoder Representations from Transformers (BERT) and LDA for topic modeling, with clustering based on dimensionality reduction, has been studied in detail. Because clustering algorithms are computationally complex and their complexity increases with the number of features, PCA, t-SNE and UMAP based dimensionality reduction methods are also applied. Finally, a unified clustering-based framework using BERT and LDA is proposed as part of this study for mining a set of meaningful topics from massive text corpora. Experiments are conducted to demonstrate the effectiveness of the cluster-informed topic modeling framework using BERT and LDA by simulating user input on benchmark datasets. The experimental results show that clustering with dimensionality reduction helps infer more coherent topics, and hence this unified clustering and BERT-LDA based approach can be effectively utilized for building topic modeling applications. |
| URI: | http://localhost:8080/xmlui/handle/123456789/4164 |
| Appears in Collections: | Department of Mathematics |
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| s41870-023-01268-w.pdf | | 992.61 kB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.