Text Classification - Algorithms in Decision Trees

Re: Text Classification - Algorithms in Decision Trees

by HSU09 Mạc Hoàng Yến
1. How Decision Trees work (classification)

Decision Trees classify data by repeatedly splitting it based on features (e.g., words in text). Each split aims to make the groups more “pure” (mostly one class). The process continues until the data is well separated or stopping rules are met.
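As a minimal sketch of this idea, the function below picks the single word whose presence/absence split makes the two resulting groups purest (lowest weighted Gini impurity). The toy documents, labels, and the helper name `best_word_split` are made up for illustration; a real tree would apply this step recursively to each group.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 = pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_word_split(docs, labels):
    """Try each vocabulary word as a yes/no split and return the word
    that gives the lowest weighted impurity of the two child groups."""
    vocab = {w for d in docs for w in d.split()}
    best = None
    for word in vocab:
        left = [y for d, y in zip(docs, labels) if word in d.split()]
        right = [y for d, y in zip(docs, labels) if word not in d.split()]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, word)
    return best[1]

docs = ["win money now", "cheap money offer", "meeting at noon", "lunch at noon"]
labels = ["spam", "spam", "ham", "ham"]
# Several words ("money", "at", "noon") separate the two classes perfectly here,
# so which one is returned depends on iteration order.
print(best_word_split(docs, labels))
```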

2. ID3, C4.5, and CART differences

ID3
- Uses Information Gain (entropy)
- No pruning → can overfit
- Handles mainly categorical data

C4.5
- Uses Gain Ratio, which normalizes Information Gain to reduce its bias toward features with many values
- Handles continuous data and missing values
- Includes pruning → better generalization

CART
- Uses Gini Impurity
- Produces binary splits only
- Works for both classification and regression
- Widely used in practice
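The two main splitting criteria above can be computed side by side. This sketch evaluates one hypothetical binary split of 8 emails with ID3-style Information Gain (entropy) and CART-style Gini decrease; the data is made up for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of the class distribution (ID3's purity measure)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity (CART's purity measure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

parent = ["spam"] * 4 + ["ham"] * 4          # 50/50 mix: maximally impure
left = ["spam"] * 3 + ["ham"]                # mostly spam after the split
right = ["spam"] + ["ham"] * 3               # mostly ham after the split
n = len(parent)

# ID3: parent entropy minus the weighted entropy of the children
ig = entropy(parent) - (len(left) * entropy(left) + len(right) * entropy(right)) / n
# CART: parent Gini minus the weighted Gini of the children
gd = gini(parent) - (len(left) * gini(left) + len(right) * gini(right)) / n
print(f"information gain = {ig:.3f}, gini decrease = {gd:.3f}")
```

Both criteria reward the same kinds of splits; they just score purity on different scales, which is why ID3 and CART often grow similar trees on the same data.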
3. Gini Impurity (simple meaning)

Gini Impurity measures how mixed a node is:

- 0 = pure (all one class)
- Higher value = more mixed classes

CART chooses splits that reduce Gini impurity the most, creating cleaner groups.
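A one-function sketch makes the definition concrete (the spam/ham labels are made-up examples): Gini impurity is 1 minus the sum of squared class proportions, so a pure node scores 0 and a 50/50 two-class mix scores the two-class maximum of 0.5.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["spam", "spam", "spam"]))        # pure node -> 0.0
print(gini(["spam", "ham", "spam", "ham"]))  # 50/50 mix -> 0.5
```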