Automatic Scene Interpretation with Totally Occluded Objects
Research output: Thesis › Doctoral Thesis
2017.
TY - THES
T1 - Automatic Scene Interpretation with Totally Occluded Objects
AU - Antenreiter, Martin
N1 - no embargo
PY - 2017
Y1 - 2017
AB - In this thesis we propose and evaluate a customizable computer vision system with cognitive abilities that tracks multiple objects over a long period of time, even when several of the relevant objects are totally occluded by other objects. At any time the system provides plausible locations for all relevant objects, regardless of their visibility, by maintaining possible interpretations of the observed visual input data. Our approach combines bottom-up visual processing with top-down reasoning and inference. Furthermore, the system has learning capabilities, which are used to obtain more robust tracking results when the visual appearance of relevant objects changes gradually over time. The modular design allows the use of several tracking algorithms from the literature, as long as they meet our minimum interface requirements. Template trackers, mean-shift trackers, and interest-point-based trackers are employed to show the adaptability of the system. In the second part of this thesis, we study the effect of combining different types of low-level visual features. The key intuition is that a system using a rich set of low-level visual features should be more robust than a system relying on a single visual feature. The open question is which visual features are well suited to a specific relevant object. We propose a method that uses a support vector machine (SVM) to combine the outputs of detectors based on different visual features. Our feature combination is tested on the standard Visual Object Classes (VOC) challenge datasets. Results on these datasets show that our method improves significantly on the performance of detectors that use only a single visual feature.
KW - scene interpretation
KW - computer vision
KW - video processing
KW - tracking
KW - videos
KW - image classification
KW - machine learning
KW - image understanding
M3 - Doctoral Thesis
ER -