Automatic Scene Interpretation with Totally Occluded Objects

Research output: Thesis (Doctoral Thesis)

BibTeX

@phdthesis{c33d30803021439e9465753050408a27,
title = "Automatic Scene Interpretation with Totally Occluded Objects",
abstract = "In this thesis we propose and evaluate a customizable computer vision system with cognitive abilities that tracks multiple objects over a long period of time, even if several of the relevant objects are totally occluded by other objects. At any time the system gives plausible object locations of all relevant objects, independently of their visibility, by maintaining possible interpretations of the observed visual input data. We use an approach that combines bottom-up visual processing with top-down reasoning and inference. Furthermore, our computer vision system has learning capabilities. These learning capabilities are used to obtain more robust tracking results if the visual appearance of relevant objects changes gradually over time. Our modular vision system allows the use of several tracking algorithms from the literature, as long as they fit our minimum interface requirements. Template trackers, mean-shift trackers, and interest-point-based trackers are employed to show the adaptability of our vision system. Consequently, in the second part of this thesis, we study the effect of combining different types of low-level visual features. The key intuition is that a system using a rich set of low-level visual features should be more robust than a system relying on only a single visual feature. The problem is determining which visual features are well suited to a specific relevant object. A method is proposed which combines the outcome of detectors based on different visual features using a support vector machine (SVM). Our feature combination is tested on the standard Visual Object Classes (VOC) challenge datasets. Results on the VOC datasets show that our method significantly improves over the performance of detectors which use only a single visual feature.",
keywords = "scene interpretation, computer vision, video processing, tracking, videos, image classification, machine learning, Szeneninterpretation, Bildverstehen, Videoverarbeitung, Tracking, Bildklassifikation, Maschinelles Lernen",
author = "Martin Antenreiter",
note = "no embargo",
year = "2017",
language = "English",

}

RIS (suitable for import to EndNote)

TY - THES

T1 - Automatic Scene Interpretation with Totally Occluded Objects

AU - Antenreiter, Martin

N1 - no embargo

PY - 2017

Y1 - 2017

N2 - In this thesis we propose and evaluate a customizable computer vision system with cognitive abilities that tracks multiple objects over a long period of time, even if several of the relevant objects are totally occluded by other objects. At any time the system gives plausible object locations of all relevant objects, independently of their visibility, by maintaining possible interpretations of the observed visual input data. We use an approach that combines bottom-up visual processing with top-down reasoning and inference. Furthermore, our computer vision system has learning capabilities. These learning capabilities are used to obtain more robust tracking results if the visual appearance of relevant objects changes gradually over time. Our modular vision system allows the use of several tracking algorithms from the literature, as long as they fit our minimum interface requirements. Template trackers, mean-shift trackers, and interest-point-based trackers are employed to show the adaptability of our vision system. Consequently, in the second part of this thesis, we study the effect of combining different types of low-level visual features. The key intuition is that a system using a rich set of low-level visual features should be more robust than a system relying on only a single visual feature. The problem is determining which visual features are well suited to a specific relevant object. A method is proposed which combines the outcome of detectors based on different visual features using a support vector machine (SVM). Our feature combination is tested on the standard Visual Object Classes (VOC) challenge datasets. Results on the VOC datasets show that our method significantly improves over the performance of detectors which use only a single visual feature.

AB - In this thesis we propose and evaluate a customizable computer vision system with cognitive abilities that tracks multiple objects over a long period of time, even if several of the relevant objects are totally occluded by other objects. At any time the system gives plausible object locations of all relevant objects, independently of their visibility, by maintaining possible interpretations of the observed visual input data. We use an approach that combines bottom-up visual processing with top-down reasoning and inference. Furthermore, our computer vision system has learning capabilities. These learning capabilities are used to obtain more robust tracking results if the visual appearance of relevant objects changes gradually over time. Our modular vision system allows the use of several tracking algorithms from the literature, as long as they fit our minimum interface requirements. Template trackers, mean-shift trackers, and interest-point-based trackers are employed to show the adaptability of our vision system. Consequently, in the second part of this thesis, we study the effect of combining different types of low-level visual features. The key intuition is that a system using a rich set of low-level visual features should be more robust than a system relying on only a single visual feature. The problem is determining which visual features are well suited to a specific relevant object. A method is proposed which combines the outcome of detectors based on different visual features using a support vector machine (SVM). Our feature combination is tested on the standard Visual Object Classes (VOC) challenge datasets. Results on the VOC datasets show that our method significantly improves over the performance of detectors which use only a single visual feature.

KW - scene interpretation

KW - computer vision

KW - video processing

KW - tracking

KW - videos

KW - image classification

KW - machine learning

KW - Szeneninterpretation

KW - Bildverstehen

KW - Videoverarbeitung

KW - Tracking

KW - Bildklassifikation

KW - Maschinelles Lernen

M3 - Doctoral Thesis

ER -