Image categorisation through Boosting using cost-minimising strategies for data labelling
Research output: Thesis › Doctoral Thesis
RIS (suitable for import to EndNote)
TY - BOOK
T1 - Image categorisation through Boosting using cost-minimising strategies for data labelling
AU - Savu-Krohn, Christian
N1 - no embargo
PY - 2009
Y1 - 2009
AB - Previous work has shown that image categorisation using AdaBoost is a powerful method. In that work, AdaBoost was utilised to select discriminative features to learn a classifier against a background class. As proposed in our earlier work, we present recent extensions to that framework by (a) incorporating geometric relations between features into the weak learner and (b) providing a weight optimisation method to combine pairwise classifiers for multiclass classification. We evaluate our framework on the Xerox data set, where we compare our results to the bag-of-keypoints approach. Moreover, we report our results from the PASCAL VOC Challenge 2006. The number of images available through image databases and similar sources is huge, but obtaining the class information needed to learn a classifier is usually costly. One way to deal with the general problem of costly labels is active learning, where the points to be labelled are selected with the aim of creating a classifier that performs better than one trained on an equal number of randomly sampled points. Previous work has shown that active learning can improve performance compared to standard passive learning; however, the basic question of whether new examples should be queried at all is seldom addressed. This work deals with the labelling cost directly, as recently proposed in our earlier work. The learning goal is defined as the minimisation of a cost that is a function of the expected model performance and the total cost of the labels used. This allows the development of general strategies and specific algorithms for (a) optimal stopping, where the expected cost dictates whether label acquisition should be terminated, and (b) empirical evaluation, where the cost is used as a performance metric for a given combination of learning, stopping and sampling methods. Though the main focus is optimal stopping, we also aim to provide the background for further developments and discussion within the field of active learning. Experimental results illustrate the proposed evaluation methodology and demonstrate the use of the introduced stopping method.
KW - active learning
KW - optimal stopping
KW - image categorisation
KW - Boosting
M3 - Doctoral Thesis
ER -
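The cost-based stopping criterion summarised in the abstract can be illustrated with a minimal sketch. All names, the linear cost form, and the one-step look-ahead rule below are assumptions chosen for exposition, not the thesis's actual formulation: the cost is taken as a weighted combination of expected error and the number of labels bought, and label acquisition stops once a further query is no longer expected to reduce that cost.

```python
# Illustrative sketch only: the linear cost form, the parameter names, and the
# one-step look-ahead stopping rule are assumptions for exposition, not the
# formulation used in the thesis.

def total_cost(expected_error, n_labels, error_weight=100.0, label_cost=1.0):
    """Cost as a function of expected model performance and the total cost of labels used."""
    return error_weight * expected_error + label_cost * n_labels


def should_stop(expected_error_now, expected_error_after_query, n_labels,
                error_weight=100.0, label_cost=1.0):
    """Terminate label acquisition once another query is not expected to pay off."""
    cost_now = total_cost(expected_error_now, n_labels, error_weight, label_cost)
    cost_after = total_cost(expected_error_after_query, n_labels + 1,
                            error_weight, label_cost)
    return cost_after >= cost_now


if __name__ == "__main__":
    # Hypothetical example: a further label is expected to cut the error from 12% to 10%.
    print(should_stop(0.12, 0.10, n_labels=200))  # False: the query is still expected to pay off
```

In the same spirit, the expected-cost quantity could also serve as the performance metric for the empirical evaluation mentioned in the abstract, by comparing the realised cost of different combinations of learning, stopping and sampling methods.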