Add words per topic LDA

In Latent Dirichlet Allocation (LDA), each topic is a probability distribution over the words in the vocabulary, and a topic is usually summarized by the words that have the highest probability under it. To list the words for each topic, extract the top N highest-probability words from each topic's distribution.
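To make the idea concrete without any library, here is a minimal sketch in plain Python. The topic_word_probs dictionary holds made-up illustrative probabilities, not output from a trained model, and top_n_words is a hypothetical helper name chosen for this example.

```python
# Illustrative only: a hand-written topic-word distribution, not model output.
topic_word_probs = {
    "apple": 0.30,
    "banana": 0.25,
    "orange": 0.20,
    "car": 0.15,
    "dog": 0.10,
}


def top_n_words(word_probs, n):
    """Return the n (word, probability) pairs with the highest probability."""
    ranked = sorted(word_probs.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


top_words = top_n_words(topic_word_probs, 3)
print(top_words)
```

A library such as gensim performs the same ranking internally, only over the full vocabulary of a fitted model.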

Assuming you are using the gensim library in Python, here's how you can list the top words per topic:

```python
import gensim
from gensim import corpora

# Sample documents
documents = [
    ["apple", "banana", "orange", "grape", "cherry"],
    ["car", "bus", "bike", "train", "plane"],
    ["dog", "cat", "elephant", "giraffe", "lion"]
]

# Create dictionary and corpus
dictionary = corpora.Dictionary(documents)
corpus = [dictionary.doc2bow(doc) for doc in documents]

# Build the LDA model
lda_model = gensim.models.LdaModel(corpus=corpus, id2word=dictionary, num_topics=3)

# Get the top N words for each topic
num_words_per_topic = 5  # Change this to the desired number of words per topic

for topic_idx in range(lda_model.num_topics):
    topic = lda_model.show_topic(topic_idx, topn=num_words_per_topic)
    words = [word for word, _ in topic]
    print(f"Topic {topic_idx + 1}: {', '.join(words)}")
```

This example uses a small sample of already tokenized documents. Replace the documents variable with your own data, where each document is a list of word tokens.
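If your data starts out as raw strings rather than token lists, it needs to be tokenized first. The following is a rough sketch using only the standard library; the raw_documents sample, the tiny STOP_WORDS set, and the tokenize helper are all assumptions made for illustration, and a real project would likely use a fuller stop-word list and more careful preprocessing.

```python
import re

# Hypothetical raw input: plain strings rather than token lists.
raw_documents = [
    "Apples and bananas are fruit.",
    "The bus and the train were late.",
]

# A tiny illustrative stop-word list; real projects usually need a fuller one.
STOP_WORDS = {"and", "are", "the", "were"}


def tokenize(text):
    """Lowercase the text, keep runs of letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


# The result has the same shape as the sample documents: a list of token lists.
documents = [tokenize(text) for text in raw_documents]
print(documents)
```

The resulting list of token lists can be passed to corpora.Dictionary in place of the sample documents.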

The num_words_per_topic variable sets how many top words are printed for each topic; increase or decrease it to show more or fewer words. Because LDA training is randomized, the topics and the order of their words can differ between runs unless you fix the random_state parameter of LdaModel.
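The loop above discards the probabilities, but show_topic returns (word, probability) pairs, so the weights can be printed as well. Here is a small sketch of one way to format them; the topic list below is a made-up stand-in for the return value of show_topic, and format_topic is a hypothetical helper name.

```python
# Illustrative stand-in for the list returned by lda_model.show_topic(...);
# the words and weights below are made up, not real model output.
topic = [("apple", 0.21), ("banana", 0.19), ("orange", 0.18)]


def format_topic(pairs):
    """Render (word, weight) pairs as 'word (0.210)' strings joined by commas."""
    return ", ".join(f"{word} ({weight:.3f})" for word, weight in pairs)


line = format_topic(topic)
print(f"Topic 1: {line}")
```

In the original loop, passing the result of show_topic to such a helper in place of building the words list would print each word next to its weight.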