Decision Trees

Overview

Decision trees are a popular supervised learning method used in machine learning, data mining, and statistics. A decision tree is a graphical representation of the possible solutions to a decision-based problem: each internal node represents a test on a specific feature or attribute, each branch represents an outcome of that test, and each leaf holds a prediction. Decision trees are easy to interpret, and their structure can be visualized and inspected directly.

Decision trees are often used for classification and regression tasks in which the goal is to predict a target variable based on several input variables. In classification tasks, the target variable is a categorical variable, and the decision tree is used to classify data into different classes. In regression tasks, the target variable is a continuous variable, and the decision tree is used to predict the value of the target variable.
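The difference between the two task types shows up at the leaves of the tree. The following is a minimal sketch, not the project's code: it assumes a single hand-picked test (x <= threshold), and the function names and toy values are hypothetical. A classification leaf predicts the majority class of the training targets that reach it, while a regression leaf predicts their mean.

```python
from collections import Counter
from statistics import mean


def leaf_prediction(targets, task):
    """Return the value a leaf predicts from the training targets that reach it."""
    if task == "classification":
        # Majority vote over the class labels in the leaf.
        return Counter(targets).most_common(1)[0][0]
    if task == "regression":
        # Average of the continuous target values in the leaf.
        return mean(targets)
    raise ValueError(f"unknown task: {task}")


def predict_with_stump(x, threshold, left_targets, right_targets, task):
    """Route x through one test (x <= threshold) and return the leaf prediction."""
    targets = left_targets if x <= threshold else right_targets
    return leaf_prediction(targets, task)


# Hypothetical toy values, chosen only to illustrate the two task types.
label = predict_with_stump(3.0, 5.0, ["low", "low", "high"], ["high", "high"], "classification")
value = predict_with_stump(7.0, 5.0, [1.0, 2.0], [10.0, 14.0], "regression")
```

A full decision tree repeats this routing step at every internal node and learns the tests from the data instead of taking them as arguments.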

In this project, decision trees can be used to classify shoe brands based on their popularity, customer appeal, and brand recognition. The input variables could include customer reviews, social media mentions, sales data, and other relevant factors. The target variable could be a categorical variable indicating which brand is top-ranked in the global footwear market.

By using decision trees to classify shoe brands, researchers can gain insight into the factors that drive brand recognition and customer appeal. They can identify which attributes matter most for a shoe brand to succeed in the global market, and they can compare the performance of different brands. This information can help footwear companies make better decisions about marketing strategy, product development, and brand positioning.

Data Preparation

Supervised modeling, including decision trees, requires labeled data: each data point must have a known target variable, or label. For the text mining topic "Is Nike the top-ranked brand in the global footwear market?: An Analysis of brand recognition and customer appeal compared to other leading shoe brands," labeled data could consist of customer reviews or survey responses that are already categorized by the shoe brand they discuss. This labeled data is used to train the decision tree model to predict which shoe brand is top-ranked in the global footwear market from input variables such as customer reviews, social media mentions, and sales data. Without labeled data, a decision tree cannot be trained at all, because the algorithm chooses each split by how well it separates the known labels. The quality and quantity of the labels therefore directly limit the accuracy of the model's predictions.
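To make the idea of labeled text data concrete, here is a small sketch of how reviews paired with brand labels could be turned into a feature matrix and a label vector. It is an illustration under stated assumptions, not the project's preprocessing: the reviews, the brand labels, and the helper names (tokenize, build_vocabulary, vectorize) are all hypothetical, and the features are plain word counts.

```python
from collections import Counter

# Hypothetical toy examples, not taken from the project's dataset.
labeled_reviews = [
    ("great shoes, very comfortable", "Nike"),
    ("good grip and happy with the fit", "Adidas"),
    ("great design, happy customer", "Nike"),
    ("good value for the price", "Puma"),
]


def tokenize(text):
    """Lowercase the text, split on whitespace, and strip basic punctuation."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w]


def build_vocabulary(texts):
    """Return the sorted list of distinct words across all texts."""
    return sorted({word for text in texts for word in tokenize(text)})


def vectorize(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


texts = [text for text, _ in labeled_reviews]
vocabulary = build_vocabulary(texts)

# X holds one row of word counts per review; y holds the matching labels.
X = [vectorize(text, vocabulary) for text in texts]
y = [brand for _, brand in labeled_reviews]
```

Each row of X lines up with one entry of y, which is the pairing a supervised learner needs in order to evaluate candidate splits.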

Code

https://github.com/poonamthakur08/Nike-Footwear-Market-Analysis/blob/main/decisionTrees.ipynb

Results

Visualization

Confusion Matrix

Decision Tree

The decision tree highlights important features, such as the words "good", "great", and "happy", together with the entropy at each node. Entropy generally decreases from the root toward the leaves, because each split produces purer groups of reviews.
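The entropy shown at each node can be computed from the class labels that reach it, using H = -sum(p_i * log2(p_i)) over the class proportions p_i. The sketch below uses hypothetical function names and toy labels; it also computes information gain, the drop from the parent's entropy to the size-weighted average entropy of its children, which is the quantity an entropy-based tree tries to maximize at each split.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    proportions = (count / total for count in Counter(labels).values())
    return sum(-p * log2(p) for p in proportions)


def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


# Hypothetical toy labels: a perfectly mixed parent split into two pure children.
parent = ["Nike", "Nike", "Adidas", "Adidas"]
left, right = ["Nike", "Nike"], ["Adidas", "Adidas"]

parent_entropy = entropy(parent)
gain = information_gain(parent, left, right)
```

A pure node has entropy 0, so a split that separates the classes completely, as in this toy case, recovers all of the parent's entropy as gain. An individual child can still have higher entropy than its parent; it is the weighted average over the children that never increases.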

Conclusion

In conclusion, the decision tree trained on customer reviews achieved a label accuracy of approximately 0.551 (about 55.1%).
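For reference, accuracy is the share of correct predictions, which can be read off a confusion matrix as the sum of the diagonal divided by the sum of all cells. The sketch below assumes rows are true classes and columns are predicted classes; the function name and the matrix values are hypothetical and are not the project's results.

```python
def accuracy_from_confusion_matrix(matrix):
    """Return correct predictions (diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Hypothetical 2x2 matrix: rows are true classes, columns are predicted classes.
example_matrix = [
    [5, 1],
    [2, 4],
]
example_accuracy = accuracy_from_confusion_matrix(example_matrix)
```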