CCPruner - a helper class to prune a decision tree using the Cost Complexity method (see Classification and Regression Trees by Leo Breiman et al.).

Some definitions:

T_max - the initial, usually highly overtrained tree that is to be pruned back
R(T) - quality index (Gini, misclassification rate, or other) of a tree T
~T - set of terminal nodes in T
T_t - the branch of T rooted at node t
T' - the pruned subtree of T_max that has the best quality index R(T')
alpha - the prune strength parameter in Cost Complexity pruning (R_alpha(T) = R(T) + alpha * |~T|)

There are two running modes in CCPruner:

(i) one may select a prune strength and prune back the tree T_max until the criterion

    alpha < (R(t) - R(T_t)) / (|~T_t| - 1)

is true for all nodes t in T, or

(ii) the algorithm finds the sequence of critical points alpha_k < alpha_k+1 < ... < alpha_K such that T_K = root(T_max) and then selects the optimally-pruned subtree, defined to be the subtree with the best quality index for the validation sample.
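The criterion in mode (i) can be illustrated with a minimal, self-contained sketch. It does not use TMVA: ToyNode, MakeNode, CountLeaves, BranchQuality, CriticalAlpha and WeakestLink are hypothetical names introduced only for this example. The sketch assumes a full binary tree (each node has either two daughters or none) and a quality index where lower is better, with R(T_t) taken as the sum of R over the terminal nodes of the branch.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <memory>
#include <utility>

// Hypothetical toy node for illustration; this is NOT TMVA::DecisionTreeNode.
struct ToyNode {
    double r = 0.0;                  // R(t): quality index of t if it were a leaf
    std::unique_ptr<ToyNode> left;   // both daughters set, or both null
    std::unique_ptr<ToyNode> right;
    bool IsLeaf() const { return !left && !right; }
};

// Convenience builder (hypothetical helper).
std::unique_ptr<ToyNode> MakeNode(double r,
                                  std::unique_ptr<ToyNode> left = nullptr,
                                  std::unique_ptr<ToyNode> right = nullptr) {
    std::unique_ptr<ToyNode> n(new ToyNode());
    n->r = r;
    n->left = std::move(left);
    n->right = std::move(right);
    return n;
}

// |~T_t|: number of terminal nodes in the branch rooted at t.
std::size_t CountLeaves(const ToyNode& t) {
    if (t.IsLeaf()) return 1;
    return CountLeaves(*t.left) + CountLeaves(*t.right);
}

// R(T_t): sum of R over the terminal nodes of the branch rooted at t.
double BranchQuality(const ToyNode& t) {
    if (t.IsLeaf()) return t.r;
    return BranchQuality(*t.left) + BranchQuality(*t.right);
}

// Critical prune strength of an internal node t:
// (R(t) - R(T_t)) / (|~T_t| - 1). The branch below t is kept
// as long as alpha stays below this value.
double CriticalAlpha(const ToyNode& t) {
    assert(!t.IsLeaf());  // undefined for leaves: |~T_t| - 1 would be 0
    return (t.r - BranchQuality(t)) /
           static_cast<double>(CountLeaves(t) - 1);
}

// Weakest link: the internal node with the smallest critical alpha,
// i.e. the first branch to be cut as alpha grows. Returns nullptr for a leaf.
const ToyNode* WeakestLink(const ToyNode& t) {
    if (t.IsLeaf()) return nullptr;
    const ToyNode* best = &t;
    for (const ToyNode* child : {t.left.get(), t.right.get()}) {
        const ToyNode* cand = WeakestLink(*child);
        if (cand && CriticalAlpha(*cand) < CriticalAlpha(*best)) best = cand;
    }
    return best;
}
```

Pruning at a fixed alpha would then amount to repeatedly collapsing the weakest link while its critical alpha does not exceed alpha. Recording each collapsed node and its critical alpha would give a sequence of the kind described in mode (ii).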
| ~CCPruner() |
TMVA::CCPruner | CCPruner(const TMVA::CCPruner&) |
TMVA::CCPruner | CCPruner(TMVA::DecisionTree* t_max, const TMVA::CCPruner::EventList* validationSample, TMVA::SeparationBase* qualityIndex = __null) |
TMVA::CCPruner | CCPruner(TMVA::DecisionTree* t_max, const TMVA::DataSet* validationSample, TMVA::SeparationBase* qualityIndex = __null) |
vector<TMVA::DecisionTreeNode*> | GetOptimalPruneSequence() const |
Float_t | GetOptimalPruneStrength() const |
Float_t | GetOptimalQualityIndex() const |
TMVA::CCPruner& | operator=(const TMVA::CCPruner&) |
void | Optimize() |
void | SetPruneStrength(Float_t alpha = -1.) |
Float_t | fAlpha | ! regularization parameter in CC pruning |
Bool_t | fDebug | ! debug flag |
Int_t | fOptimalK | ! index of the optimal tree in the pruned tree sequence |
Bool_t | fOwnQIndex | ! flag indicates if fQualityIndex is owned by this |
vector<TMVA::DecisionTreeNode*> | fPruneSequence | ! map of weakest links (i.e., branches to prune) -> pruning index |
vector<Float_t> | fPruneStrengthList | ! map of alpha -> pruning index |
TMVA::SeparationBase* | fQualityIndex | ! the quality index used to calculate R(t), R(T) = sum[t in ~T]{ R(t) } |
vector<Float_t> | fQualityIndexList | ! map of R(T) -> pruning index |
TMVA::DecisionTree* | fTree | ! (pruned) decision tree |
const TMVA::DataSet* | fValidationDataSet | ! the event sample to select the optimally-pruned tree |
const TMVA::CCPruner::EventList* | fValidationSample | ! the event sample to select the optimally-pruned tree |
GetOptimalPruneSequence() const: returns the list of pruning locations that define the optimal subtree T' of T_max
GetOptimalQualityIndex() const: returns the quality index from the validation sample for the optimal subtree T'
GetOptimalPruneStrength() const: returns the prune strength (= alpha) corresponding to the optimal prune sequence
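The selection step of mode (ii) can be sketched without TMVA as a search over two parallel lists indexed by pruning step k, analogous in role to fPruneStrengthList and fQualityIndexList, with the result playing the part of fOptimalK. PruneChoice and SelectOptimal are hypothetical names, and both the "lower is better" convention and the tie-breaking rule (prefer the later, more heavily pruned tree) are assumptions of this sketch, not a statement of what CCPruner::Optimize() does internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of the selection (hypothetical type for illustration).
struct PruneChoice {
    std::size_t k;   // index of the chosen tree in the pruned-tree sequence
    float alpha;     // prune strength at that index
    float quality;   // validation-sample quality index at that index
};

// Pick the pruning step whose validation quality index is smallest.
// Assumption: lower quality index is better; on ties the later
// (smaller) tree wins because of the <= comparison.
PruneChoice SelectOptimal(const std::vector<float>& alphas,
                          const std::vector<float>& qualities) {
    assert(!alphas.empty() && alphas.size() == qualities.size());
    std::size_t best = 0;
    for (std::size_t k = 1; k < qualities.size(); ++k) {
        if (qualities[k] <= qualities[best]) best = k;
    }
    return {best, alphas[best], qualities[best]};
}
```

In this picture, the three getters above would correspond to the prune sequence truncated at index k, the quality field, and the alpha field of the chosen entry.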