class TMVA::CCPruner

CCPruner - a helper class to prune a decision tree using the Cost Complexity method
(see "Classification and Regression Trees" by Leo Breiman et al.)

Some definitions:

T_max - the initial, usually highly overtrained, tree that is to be pruned back
R(T) - quality index (Gini, misclassification rate, or other) of a tree T
~T - set of terminal nodes in T
T' - the pruned subtree of T_max that has the best quality index R(T')
alpha - the prune strength parameter in Cost Complexity pruning (R_alpha(T) = R(T) + alpha * |~T|)
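
As a minimal sketch of the cost-complexity measure R_alpha(T) = R(T) + alpha * |~T| defined above, the following standalone C++ treats a tree as nothing more than the list of quality-index values R(t) of its terminal nodes. The function names treeQuality and costComplexity are hypothetical and are not part of TMVA.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// R(T): sum of the quality index R(t) over the terminal nodes t in ~T.
// (Hypothetical helper, not a TMVA function.)
double treeQuality(const std::vector<double>& leafQuality) {
    double r = 0.0;
    for (double q : leafQuality) r += q;
    return r;
}

// R_alpha(T) = R(T) + alpha * |~T|, where |~T| is the number of terminal nodes.
// (Hypothetical helper, not a TMVA function.)
double costComplexity(const std::vector<double>& leafQuality, double alpha) {
    assert(alpha >= 0.0);  // the prune strength is taken to be non-negative here
    const std::size_t nLeaves = leafQuality.size();
    return treeQuality(leafQuality) + alpha * static_cast<double>(nLeaves);
}
```

For example, a four-leaf tree with leaf values {0, 0.125, 0, 0.125} has R(T) = 0.25, while a two-leaf tree with {0.25, 0.125} has R(T) = 0.375. At alpha = 0 the larger tree scores better, but at alpha = 0.25 the penalty on the number of leaves reverses the order (1.25 versus 0.875).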

There are two running modes in CCPruner: (i) one may select a prune strength and prune back
the tree T_max until the criterion
            R(t) - R(T_t)
alpha <    ---------------
             |~T_t| - 1

is true for all non-terminal nodes t in T (T_t being the branch of T rooted at t), or (ii) the
algorithm finds the sequence of critical points
alpha_k < alpha_{k+1} < ... < alpha_K such that T_K = root(T_max), and then selects the optimally pruned
subtree, defined to be the subtree with the best quality index for the validation sample.
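
The two running modes can be illustrated on a toy tree. The sketch below is not the TMVA implementation: Node, criticalAlpha, weakestLink, pruneWithStrength, criticalPoints, makeNode and makeToyTree are hypothetical names. It assumes the textbook weakest-link quantity g(t) = (R(t) - R(T_t)) / (|~T_t| - 1), where T_t is the branch rooted at t, and assumes that every node has either two daughters or none. Selecting the optimal subtree against a validation sample is not shown.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Toy binary tree node; every node has either two daughters or none.
struct Node {
    double r = 0.0;  // R(t): quality index of t taken as a terminal node
    std::unique_ptr<Node> left, right;
    bool isLeaf() const { return !left && !right; }
};

// |~T_t|: number of terminal nodes in the branch rooted at t.
int leafCount(const Node& t) {
    return t.isLeaf() ? 1 : leafCount(*t.left) + leafCount(*t.right);
}

// R(T_t): sum of R over the terminal nodes of the branch rooted at t.
double branchQuality(const Node& t) {
    return t.isLeaf() ? t.r : branchQuality(*t.left) + branchQuality(*t.right);
}

// Critical prune strength g(t) = (R(t) - R(T_t)) / (|~T_t| - 1).
double criticalAlpha(const Node& t) {
    assert(!t.isLeaf());
    return (t.r - branchQuality(t)) / (leafCount(t) - 1);
}

// Weakest link: the internal node with the smallest g(t), or nullptr if the
// tree is a single leaf. On ties the node visited first (parent, then left) wins.
Node* weakestLink(Node& t) {
    if (t.isLeaf()) return nullptr;
    Node* best = &t;
    Node* candidates[] = {weakestLink(*t.left), weakestLink(*t.right)};
    for (Node* c : candidates)
        if (c && criticalAlpha(*c) < criticalAlpha(*best)) best = c;
    return best;
}

// Turn an internal node into a terminal node by dropping its daughters.
void collapse(Node& t) { t.left.reset(); t.right.reset(); }

// Mode (i): prune until alpha < g(t) holds for every remaining internal node.
void pruneWithStrength(Node& root, double alpha) {
    while (Node* t = weakestLink(root)) {
        if (alpha < criticalAlpha(*t)) break;
        collapse(*t);
    }
}

// Mode (ii), first half: record the critical points while pruning to the root.
std::vector<double> criticalPoints(Node& root) {
    std::vector<double> alphas;
    while (Node* t = weakestLink(root)) {
        alphas.push_back(criticalAlpha(*t));
        collapse(*t);
    }
    return alphas;
}

// Helpers to build the illustrative tree (values chosen arbitrarily).
std::unique_ptr<Node> makeNode(double r, std::unique_ptr<Node> l = nullptr,
                               std::unique_ptr<Node> rgt = nullptr) {
    auto n = std::make_unique<Node>();
    n->r = r;
    n->left = std::move(l);
    n->right = std::move(rgt);
    return n;
}

std::unique_ptr<Node> makeToyTree() {
    return makeNode(0.5,
                    makeNode(0.25, makeNode(0.125), makeNode(0.0625)),
                    makeNode(0.25, makeNode(0.0), makeNode(0.0)));
}
```

On this toy tree the left daughter of the root is the first weakest link (g = 0.0625). Once it is collapsed, the root itself becomes the weakest link (g = 0.125), so criticalPoints returns the increasing sequence 0.0625, 0.125. Calling pruneWithStrength with alpha = 0.1 stops after the first step and leaves three terminal nodes.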

Function Members (Methods)

public:
~CCPruner()
TMVA::CCPruner CCPruner(const TMVA::CCPruner&)
TMVA::CCPruner CCPruner(TMVA::DecisionTree* t_max, const TMVA::CCPruner::EventList* validationSample, TMVA::SeparationBase* qualityIndex = NULL)
TMVA::CCPruner CCPruner(TMVA::DecisionTree* t_max, const TMVA::DataSet* validationSample, TMVA::SeparationBase* qualityIndex = NULL)
vector<TMVA::DecisionTreeNode*> GetOptimalPruneSequence() const
Float_t GetOptimalPruneStrength() const
Float_t GetOptimalQualityIndex() const
TMVA::CCPruner& operator=(const TMVA::CCPruner&)
void Optimize()
void SetPruneStrength(Float_t alpha = -1.)

Data Members

private:
Float_t fAlpha  ! regularization parameter in CC pruning
Bool_t fDebug  ! debug flag
Int_t fOptimalK  ! index of the optimal tree in the pruned tree sequence
Bool_t fOwnQIndex  ! flag indicates if fQualityIndex is owned by this
vector<TMVA::DecisionTreeNode*> fPruneSequence  ! map of weakest links (i.e., branches to prune) -> pruning index
vector<Float_t> fPruneStrengthList  ! map of alpha -> pruning index
TMVA::SeparationBase* fQualityIndex  ! the quality index used to calculate R(t), R(T) = sum[t in ~T]{ R(t) }
vector<Float_t> fQualityIndexList  ! map of R(T) -> pruning index
TMVA::DecisionTree* fTree  ! (pruned) decision tree
const TMVA::DataSet* fValidationDataSet  ! the event sample to select the optimally-pruned tree
const TMVA::CCPruner::EventList* fValidationSample  ! the event sample to select the optimally-pruned tree

Function documentation

void SetPruneStrength(Float_t alpha = -1.)
CCPruner( DecisionTree* t_max, const EventList* validationSample, SeparationBase* qualityIndex = NULL )
~CCPruner()
void Optimize()
std::vector<TMVA::DecisionTreeNode*> GetOptimalPruneSequence() const
 Returns the list of pruning locations that define the optimal subtree T' of T_max.
Float_t GetOptimalQualityIndex() const
 Returns the quality index, evaluated on the validation sample, of the optimal subtree T'.
Float_t GetOptimalPruneStrength() const
 Returns the prune strength (= alpha) corresponding to the optimal prune sequence.
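
The three getters above can be read as lookups into parallel per-step lists such as fPruneSequence, fPruneStrengthList and fQualityIndexList, indexed by fOptimalK. The sketch below is a hedged illustration of that idea, not the TMVA code. PruneScan and the three free functions are hypothetical. Nodes are represented by integer ids. Entry k of each list is assumed to describe the tree obtained after pruning the first k+1 weakest links, and a lower quality index is assumed to be better (as for Gini or misclassification rate).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the result of a pruning scan.
struct PruneScan {
    std::vector<int> pruneSequence;        // ids of the weakest links, in pruning order
    std::vector<float> pruneStrengthList;  // alpha_k for each pruning step
    std::vector<float> qualityIndexList;   // validation-sample R(T_k) for each step
};

// Index k with the lowest validation quality index, or -1 for an empty scan.
// On ties the earliest step (i.e. the largest tree) is kept.
int optimalK(const PruneScan& s) {
    assert(s.pruneSequence.size() == s.qualityIndexList.size());
    assert(s.pruneStrengthList.size() == s.qualityIndexList.size());
    int best = -1;
    for (std::size_t k = 0; k < s.qualityIndexList.size(); ++k) {
        if (best < 0 || s.qualityIndexList[k] < s.qualityIndexList[best])
            best = static_cast<int>(k);
    }
    return best;
}

// Pruning locations that define T': the first k+1 weakest links.
std::vector<int> optimalPruneSequence(const PruneScan& s) {
    const int k = optimalK(s);
    if (k < 0) return {};
    return std::vector<int>(s.pruneSequence.begin(), s.pruneSequence.begin() + k + 1);
}

// Prune strength at the optimal step, or -1 if nothing was scanned.
float optimalPruneStrength(const PruneScan& s) {
    const int k = optimalK(s);
    return k < 0 ? -1.0f : s.pruneStrengthList[k];
}
```

With pruning order {7, 3, 1}, strengths {0.0625, 0.125, 0.5} and validation quality {0.3, 0.2, 0.4}, the best step is k = 1, so the optimal sequence is {7, 3} and the optimal prune strength is 0.125.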