How to perform topic modeling using machine learning in Python?

Both examples below use scikit-learn and assume that data is a list of strings, one per document. In the first example, we perform topic modeling using Latent Dirichlet Allocation (LDA).
# Step 1: Import necessary libraries
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Step 2: Create a CountVectorizer and convert the documents into a word-count matrix
vectorizer = CountVectorizer(stop_words='english')
X = vectorizer.fit_transform(data)

# Step 3: Initialize and fit the LDA model
lda = LatentDirichletAllocation(n_components=5, random_state=42)
lda.fit(X)

# Step 4: Print the top words for each topic
feature_names = vectorizer.get_feature_names_out()
for topic_idx, topic in enumerate(lda.components_):
    # Sort word indices by weight, highest first, and keep the top 5
    top_words = [feature_names[i] for i in topic.argsort()[::-1][:5]]
    print(f"Topic {topic_idx}:", top_words)
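Besides the top words per topic, it is often useful to see how strongly each document belongs to each topic. The following is a minimal sketch of that idea, not part of the original example: the four-sentence corpus in docs is hypothetical, and n_components=2 is chosen only to suit such a small corpus.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical toy corpus, used only for illustration.
docs = [
    "the cat sat on the mat with another cat",
    "dogs and cats are popular pets",
    "the stock market fell as investors sold shares",
    "investors bought shares when the market rallied",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=42)
# fit_transform returns one row per document and one column per topic.
doc_topics = lda.fit_transform(X)

# Each row is a topic distribution for that document (it sums to 1).
for doc, dist in zip(docs, doc_topics):
    print(f"{dist.round(2)}  dominant topic: {dist.argmax()}  | {doc}")
```

On a corpus this small the topic split may not be meaningful; the point is only the shape and meaning of the returned matrix.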

In the second example, we perform topic modeling using Non-negative Matrix Factorization (NMF), this time on TF-IDF features instead of raw word counts.
# Step 1: Import necessary libraries
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# Step 2: Create a TfidfVectorizer and convert the documents into a TF-IDF matrix
tfidf_vectorizer = TfidfVectorizer(stop_words='english')
tfidf = tfidf_vectorizer.fit_transform(data)

# Step 3: Initialize and fit the NMF model
nmf = NMF(n_components=5, random_state=42)
nmf.fit(tfidf)

# Step 4: Print the top words for each topic
feature_names = tfidf_vectorizer.get_feature_names_out()
for topic_idx, topic in enumerate(nmf.components_):
    # Sort word indices by weight, highest first, and keep the top 5
    top_words = [feature_names[i] for i in topic.argsort()[::-1][:5]]
    print(f"Topic {topic_idx}:", top_words)
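Step 4 is identical in both examples, so it can be factored into a small helper that works with the components_ of either model. This is a sketch under that assumption; the function name top_words_per_topic and the demo weights and vocabulary are hypothetical and exist only to show the behavior.

```python
import numpy as np

def top_words_per_topic(components, feature_names, n_top_words=5):
    """Return the n_top_words highest-weighted words for each topic row."""
    feature_names = np.asarray(feature_names)
    topics = []
    for weights in np.asarray(components):
        # Indices of the largest weights, highest first.
        top_idx = np.argsort(weights)[::-1][:n_top_words]
        topics.append(feature_names[top_idx].tolist())
    return topics

# Hypothetical weights: 2 topics over a 4-word vocabulary.
demo_components = np.array([[0.1, 0.9, 0.3, 0.0],
                            [0.7, 0.0, 0.2, 0.8]])
demo_vocab = ["apple", "bank", "cat", "dog"]
demo_topics = top_words_per_topic(demo_components, demo_vocab, n_top_words=2)
print(demo_topics)  # [['bank', 'cat'], ['dog', 'apple']]
```

With a fitted model, the same call would look like top_words_per_topic(nmf.components_, tfidf_vectorizer.get_feature_names_out()).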

These examples show the same workflow for both LDA and NMF: vectorize the text (word counts for LDA, TF-IDF for NMF), fit the model with a chosen number of topics, and read the top words for each topic from the model's components_ attribute.
