How to perform topic modeling using machine learning in Python?

The first example performs topic modeling with Latent Dirichlet Allocation (LDA) from scikit-learn, fitted on raw word counts.
# Step 1: Import necessary libraries
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Step 2: Create a CountVectorizer object and fit_transform the data
# Note: data is not defined in this snippet; it is assumed to be a list of document strings
vectorizer = CountVectorizer(stop_words='english')
X = vectorizer.fit_transform(data)

# Step 3: Initialize and fit the LDA model
lda = LatentDirichletAllocation(n_components=5, random_state=42)
lda.fit(X)

# Step 4: Print the top words for each topic
feature_names = vectorizer.get_feature_names_out()
for topic_idx, topic in enumerate(lda.components_):
    top_words = [feature_names[i] for i in topic.argsort()[::-1][:5]]  # five highest-weighted words
    print(f"Topic {topic_idx}:", top_words)

The second example performs topic modeling with Non-negative Matrix Factorization (NMF) from scikit-learn, fitted on TF-IDF weights instead of raw counts.
# Step 1: Import necessary libraries
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# Step 2: Create a TfidfVectorizer object and fit_transform the data
# Note: data is not defined in this snippet; it is assumed to be a list of document strings
tfidf_vectorizer = TfidfVectorizer(stop_words='english')
tfidf = tfidf_vectorizer.fit_transform(data)

# Step 3: Initialize and fit the NMF model
nmf = NMF(n_components=5, random_state=42)
nmf.fit(tfidf)

# Step 4: Print the top words for each topic
feature_names = tfidf_vectorizer.get_feature_names_out()
for topic_idx, topic in enumerate(nmf.components_):
    top_words = [feature_names[i] for i in topic.argsort()[::-1][:5]]  # five highest-weighted words
    print(f"Topic {topic_idx}:", top_words)

These examples show how to perform topic modeling with LDA and NMF in Python in three steps: vectorize the text, fit the model, and extract the top words for each topic. LDA is fitted on raw word counts from CountVectorizer, while NMF is fitted on TF-IDF weights from TfidfVectorizer.
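
Both examples extract top words the same way, so that step can be factored into one helper. The function below is a hypothetical convenience, not part of scikit-learn; it accepts any topic-term matrix, such as lda.components_ or nmf.components_, and is shown here on a small hand-made matrix so the result is easy to check:

```python
import numpy as np


def top_words_per_topic(components, feature_names, n_top_words=5):
    """Return the n_top_words highest-weighted words for each topic row."""
    feature_names = np.asarray(feature_names)
    topics = []
    for weights in np.asarray(components):
        # Sort indices by descending weight and keep the first n_top_words.
        top_idx = np.argsort(weights)[::-1][:n_top_words]
        topics.append(feature_names[top_idx].tolist())
    return topics


# Hand-made topic-term matrix: 2 topics over a 3-word vocabulary.
components = np.array([
    [0.1, 0.9, 0.5],
    [0.7, 0.2, 0.3],
])
vocab = ["apple", "banana", "cherry"]

result = top_words_per_topic(components, vocab, n_top_words=2)
print(result)  # [['banana', 'cherry'], ['apple', 'cherry']]
```

With a fitted model, the call would look like top_words_per_topic(lda.components_, vectorizer.get_feature_names_out()).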
