There are many analogies for why Random Forest works well. A common one: you need to decide which restaurant to go to next. To ask a friend for a recommendation, you must answer a variety of yes/no questions, which lead them to a decision about which restaurant you should go to. Would you rather ask only one friend, or ask several friends and then take the mode, the general consensus? Unless you only have one friend, most people would answer the second. The insight this analogy provides is that each tree has some sort of ‘diversity of thought’, because the trees were trained on different data and hence have different ‘experiences’.

This analogy, clean and simple as it is, never really stood out to me. In the real world, the single-friend option has less experience than all the friends combined, but in machine learning, the decision tree and the random forest are trained on the same data, and hence on the same experiences. The ensemble model is not actually receiving any new information. If I could ask one all-knowing friend for a recommendation, I would see no objection to that. How can a model that randomly pulls subsets of the same data to simulate artificial ‘diversity’ perform better than one trained on the data as a whole?

Take a sine wave with heavy, normally distributed noise, and fit your single Decision Tree to it. The tree is naturally a very high-variance model, so it chases the noise point by point. A Random Forest averages many such trees, each grown on a bootstrap sample of the same data, and that averaging is what drives the variance down.

Now, the data is cleanly separable after being projected onto a new dimension, with each data point represented in two dimensions as (x, x²).
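To make that projection concrete, here is a minimal sketch in Python with scikit-learn. The labeling rule (class 1 when |x| > 1) and the data range are illustrative assumptions of mine, not taken from the post; the point is only that a linear classifier fails on the raw 1-D data but separates it cleanly once each point is lifted to (x, x²).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# 1-D points on a line; class 1 when a point lies far from the origin.
# (The |x| > 1 rule is an illustrative assumption, not from the post.)
x = rng.uniform(-3, 3, size=200)
y = (np.abs(x) > 1).astype(int)

# On the raw 1-D data, class 1 sits on BOTH sides of class 0, so no
# single linear threshold can separate the classes.
flat = LogisticRegression().fit(x.reshape(-1, 1), y)
print("accuracy on (x,):    ", flat.score(x.reshape(-1, 1), y))

# Project each point onto a new dimension: represent it as (x, x**2).
# In this space the true boundary x**2 = 1 is a horizontal line, so a
# linear classifier now separates the classes almost perfectly.
lifted = np.column_stack([x, x ** 2])
proj = LogisticRegression().fit(lifted, y)
print("accuracy on (x, x²):", proj.score(lifted, y))
```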
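As for the random-forest question above, here is a minimal sketch of the noisy-sine experiment, again assuming scikit-learn. The noise level and number of trees are my own choices, and regressors are used because fitting a sine wave is a regression task.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# A sine wave with heavy, normally distributed noise.
X = rng.uniform(0.0, 2.0 * np.pi, size=(500, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.5, size=500)

# A clean grid to measure how well each model recovers sin(x).
X_test = np.linspace(0.0, 2.0 * np.pi, 200).reshape(-1, 1)
y_true = np.sin(X_test).ravel()

# A single fully grown tree interpolates the noise: high variance.
tree = DecisionTreeRegressor(random_state=0).fit(X, y)

# A forest grows many such trees, each on a bootstrap sample of the
# SAME data, and averages their predictions.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

print("single tree MSE  :", mean_squared_error(y_true, tree.predict(X_test)))
print("random forest MSE:", mean_squared_error(y_true, forest.predict(X_test)))
```

Typically the forest’s test error comes out several times smaller than the single tree’s, even though both models saw exactly the same data.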