How to perform topic modeling using machine learning in Python?
In the first example, we are going to perform topic modeling using Latent Dirichlet Allocation (LDA) in Python.
# Step 1: Import necessary libraries
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Step 2: Create a CountVectorizer object and fit_transform the data
# (`data` is assumed to be a list of text documents, one string per document)
vectorizer = CountVectorizer(stop_words='english')
X = vectorizer.fit_transform(data)

# Step 3: Initialize and fit the LDA model
lda = LatentDirichletAllocation(n_components=5, random_state=42)
lda.fit(X)

# Step 4: Print the top words for each topic
feature_names = vectorizer.get_feature_names_out()
for topic_idx, topic in enumerate(lda.components_):
    # argsort() sorts ascending, so this slice takes the last 5 indices in reverse order
    top_words = [feature_names[i] for i in topic.argsort()[:-5 - 1:-1]]
    print(f"Topic {topic_idx}:", top_words)
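The LDA example leaves `data` undefined and only prints topic words. As a minimal sketch of how it might be run end to end, the block below uses a made-up six-sentence corpus (purely illustrative, far too small for meaningful topics) and also calls `transform` to get the topic mixture of each document. The corpus, the choice of three topics, and the name `doc_topics` are assumptions, not part of the original example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical toy corpus standing in for `data`; real input would be much larger.
data = [
    "the cat sat on the mat with another cat",
    "dogs and cats are popular pets",
    "stock markets fell as investors sold shares",
    "investors bought shares when the market rallied",
    "the team won the football match",
    "the football club signed a new player",
]

# LDA works on raw term counts, so CountVectorizer is used here.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(data)

lda = LatentDirichletAllocation(n_components=3, random_state=42)
lda.fit(X)

# transform() returns one row per document; each row is a topic mixture.
doc_topics = lda.transform(X)

# Print the most likely topic, the full mixture, and the document text.
for doc, mix in zip(data, doc_topics):
    print(f"{mix.argmax()}  {mix.round(2)}  {doc}")
```

Each row of `doc_topics` sums to 1, so it can be read as the share of each topic in that document.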
In the second example, we are going to perform topic modeling using Non-negative Matrix Factorization (NMF) in Python.
# Step 1: Import necessary libraries
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# Step 2: Create a TfidfVectorizer object and fit_transform the data
# (`data` is assumed to be a list of text documents, one string per document)
tfidf_vectorizer = TfidfVectorizer(stop_words='english')
tfidf = tfidf_vectorizer.fit_transform(data)

# Step 3: Initialize and fit the NMF model
nmf = NMF(n_components=5, random_state=42)
nmf.fit(tfidf)

# Step 4: Print the top words for each topic
feature_names = tfidf_vectorizer.get_feature_names_out()
for topic_idx, topic in enumerate(nmf.components_):
    # argsort() sorts ascending, so this slice takes the last 5 indices in reverse order
    top_words = [feature_names[i] for i in topic.argsort()[:-5 - 1:-1]]
    print(f"Topic {topic_idx}:", top_words)
Both examples follow the same steps: convert the text into a document-term matrix (raw counts for LDA, TF-IDF weights for NMF), fit the model, and print the five highest-weighted words for each topic.
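Since Step 4 is the same in both examples, it could be pulled into a small helper that accepts either model's `components_` array. This is only a sketch: `top_words`, `demo_components`, and `demo_features` are hypothetical names, and the tiny hand-made matrix stands in for a fitted model.

```python
import numpy as np


def top_words(components, feature_names, n_top_words=5):
    """Return the n highest-weighted words for each topic (one row per topic)."""
    topics = []
    for weights in components:
        # argsort is ascending, so reverse it and keep the first n indices.
        top_idx = np.argsort(weights)[::-1][:n_top_words]
        topics.append([feature_names[i] for i in top_idx])
    return topics


# Hand-made stand-ins for lda.components_ / nmf.components_ and the vocabulary.
demo_components = np.array([[0.1, 2.0, 0.5],
                            [3.0, 0.2, 0.4]])
demo_features = ["apple", "bank", "cat"]

print(top_words(demo_components, demo_features, n_top_words=2))
# → [['bank', 'cat'], ['apple', 'cat']]
```

With a fitted model, the call would look like `top_words(lda.components_, vectorizer.get_feature_names_out())`.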