Latent Dirichlet Allocation (LDA)
Overview
Topic modeling is a technique for identifying latent topics or themes in a large corpus of text. It analyzes how often words and phrases occur and groups them into coherent topics based on their co-occurrence patterns: words that tend to appear in the same documents are assigned to the same topic. The resulting topics summarize the underlying themes in the text without anyone having to read or label every document.
For this project, topic modeling can be applied to a large corpus of customer reviews, industry reports, and other relevant sources to identify the most frequently occurring words and phrases related to Nike and other leading shoe brands. These words and phrases can then be grouped into coherent topics, such as brand recognition, customer appeal, product quality, and pricing. The topics help identify the key factors driving Nike's success in the global footwear market and show how Nike compares to other leading shoe brands.
Data Preparation
Latent Dirichlet Allocation (LDA) is a topic modeling technique used to identify hidden topics or themes in a corpus of text. LDA operates on word counts rather than raw text, so the text must first be converted into a numerical format, for example a document-term matrix produced by a tool such as CountVectorizer. LDA is an unsupervised method, so the text is used without any predefined categories or labels.

Code
Results
Visualization

Plotting the topics as bubbles of varying sizes in a two-dimensional space is a useful visualization technique in topic modeling. The size of each bubble reflects how prevalent that topic is, which makes the most frequent topics in the analyzed documents easy to identify. A model with a small number of topics tends to show large, non-overlapping bubbles, while a model with too many topics tends to show many small, overlapping bubbles clustered together. The distance between bubbles approximates the semantic relationship between topics, with overlapping bubbles indicating a higher degree of similarity in their contents.
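To make the idea of distance between topics concrete, the sketch below computes pairwise Jensen-Shannon divergence between topic-word distributions. Jensen-Shannon divergence is a common choice in topic-visualization tools, but it is an assumption here that the chart described above uses it; the topic-word matrix is hypothetical, and the projection of the distances onto two dimensions is omitted.

```python
import numpy as np


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms where a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical topic-word distributions: each row is one topic and sums to 1.
topic_word = np.array([
    [0.40, 0.30, 0.20, 0.05, 0.05],  # topic 0
    [0.35, 0.30, 0.25, 0.05, 0.05],  # topic 1: vocabulary close to topic 0
    [0.05, 0.05, 0.10, 0.40, 0.40],  # topic 2: mostly different vocabulary
])

n_topics = topic_word.shape[0]
distances = np.zeros((n_topics, n_topics))
for i in range(n_topics):
    for j in range(n_topics):
        distances[i, j] = jensen_shannon_divergence(topic_word[i], topic_word[j])

print(np.round(distances, 3))
```

Topics 0 and 1 put their weight on nearly the same words, so their divergence is small and their bubbles would sit close together or overlap. Topic 2 is far from both and would be drawn well apart.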
In addition to the bubble chart, a horizontal bar graph in the right panel displays the frequency of words in the documents. Blue bars show the total frequency of each word across the corpus, and red bars show the frequency of the word within the selected topic. Selecting a topic bubble displays the top 30 words associated with that topic, with the red-shaded portion of each bar showing how much of the word's frequency belongs to that topic. Hovering over a word in the right panel displays only the topics that contain the word, and the size of each bubble then represents the weight of the word in that topic: a larger bubble indicates a higher weight for the selected word.
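The quantities behind the two bar colors and the hover behavior can be sketched with a small matrix of per-topic word counts. The vocabulary and counts below are hypothetical and chosen only to illustrate the arithmetic; they are not results from this analysis.

```python
import numpy as np

vocab = ["comfort", "price", "style", "quality", "durable"]

# Hypothetical estimated counts of each word within each topic (rows are topics).
topic_term_counts = np.array([
    [30.0, 5.0, 20.0, 4.0, 10.0],   # topic 0
    [6.0, 40.0, 2.0, 25.0, 3.0],    # topic 1
])

# Blue bar: total frequency of each word across all topics.
overall = topic_term_counts.sum(axis=0)

# Red bar: frequency of each word within the selected topic.
selected_topic = 0
within = topic_term_counts[selected_topic]

# List the words of the selected topic from most to least frequent.
order = np.argsort(within)[::-1]
for i in order:
    share = within[i] / overall[i]
    print(f"{vocab[i]:<8} total={overall[i]:5.1f} in_topic={within[i]:5.1f} share={share:.2f}")

# Hovering over a word: how its frequency is split across topics,
# which is what the resized bubbles convey.
word_index = vocab.index("price")
word_by_topic = topic_term_counts[:, word_index] / overall[word_index]
print("price by topic:", np.round(word_by_topic, 2))
```

A word whose red bar nearly fills its blue bar is used almost exclusively in the selected topic, while a short red bar on a long blue bar marks a word that is common across the corpus but not specific to the topic.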
Conclusion
The Latent Dirichlet Allocation (LDA) analysis provides insight into the themes and topics associated with customer reviews of Nike and other leading shoe brands in the global footwear market. Nike is associated with several positive themes, such as comfort, style, and durability, but the analysis does not definitively establish Nike as the top-ranked brand in the market. Other leading shoe brands also receive positive reviews from customers, and factors such as pricing, availability, and brand reputation remain important considerations for consumers when making purchasing decisions.