Add words per topic LDA
In Latent Dirichlet Allocation (LDA), each topic is represented as a probability distribution over the words in the vocabulary, and a topic is usually summarized by the words that have the highest probability under it. To list the words for each topic, extract the top N words with the highest probabilities for that topic.
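To make the idea of "top N words" concrete, here is a minimal sketch in plain Python that does not depend on any LDA library. The topic, its vocabulary, and the probabilities are invented for illustration, and top_words is a hypothetical helper name, not part of gensim.

```python
# Hypothetical topic: a probability distribution over a tiny vocabulary.
# The numbers are invented for illustration and sum to 1.
topic_distribution = {
    "apple": 0.40,
    "banana": 0.25,
    "car": 0.05,
    "orange": 0.20,
    "dog": 0.10,
}


def top_words(distribution, n):
    """Return the n words with the highest probability, most probable first."""
    ranked = sorted(distribution.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:n]]


print(top_words(topic_distribution, 3))  # ['apple', 'banana', 'orange']
```

A trained LDA model holds one such distribution per topic, so the same ranking is applied to each topic in turn.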
Assuming you are using the gensim library in Python, here's how you can list the top words per topic in LDA:
```python
import gensim
from gensim import corpora

# Sample documents
documents = [
    ["apple", "banana", "orange", "grape", "cherry"],
    ["car", "bus", "bike", "train", "plane"],
    ["dog", "cat", "elephant", "giraffe", "lion"]
]

# Create dictionary and corpus
dictionary = corpora.Dictionary(documents)
corpus = [dictionary.doc2bow(doc) for doc in documents]

# Build the LDA model
lda_model = gensim.models.LdaModel(corpus=corpus, id2word=dictionary, num_topics=3)

# Get the top N words for each topic
num_words_per_topic = 5  # Change this to the desired number of words per topic
for topic_idx in range(lda_model.num_topics):
    topic = lda_model.show_topic(topic_idx, topn=num_words_per_topic)
    words = [word for word, _ in topic]
    print(f"Topic {topic_idx + 1}: {', '.join(words)}")
```
In this example, we use a small sample of documents to demonstrate. Replace documents with your actual list of documents, where each document is a list of tokens (strings).
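If your documents start out as raw strings, they need to be split into tokens first. The following is a minimal sketch under simple assumptions: raw_documents and tokenize are hypothetical names, the sample sentences are made up, and the regular expression keeps only lowercase ASCII letters. Real preprocessing often also removes stop words or uses a dedicated tokenizer.

```python
import re

# Hypothetical raw input: one string per document.
raw_documents = [
    "Apples, bananas and oranges are fruit.",
    "Cars and buses are vehicles!",
]


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


# Each document becomes a list of tokens, the same shape as the sample above.
documents = [tokenize(text) for text in raw_documents]
print(documents)
```

The resulting documents list can then be passed to corpora.Dictionary in place of the sample data.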
The num_words_per_topic variable specifies how many top words to display for each topic; change it to get more or fewer words. The code prints those words for each topic, most probable first.
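The loop above discards the probabilities that show_topic returns alongside each word. If you want to see them as well, one option is a small formatting helper like the sketch below. format_topic is a hypothetical name, and example_terms is made-up data in the same (word, probability) shape, used here so the snippet runs without a trained model.

```python
def format_topic(topic_terms, digits=3):
    """Join (word, probability) pairs into a 'word (0.123)' style string."""
    return ", ".join(f"{word} ({prob:.{digits}f})" for word, prob in topic_terms)


# Made-up pairs in the same shape that show_topic returns.
example_terms = [("apple", 0.25), ("banana", 0.2), ("orange", 0.125)]
print(format_topic(example_terms))  # apple (0.250), banana (0.200), orange (0.125)
```

Inside the loop, this would replace the join over words, for example print(f"Topic {topic_idx + 1}: {format_topic(topic)}").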