Abstract
This paper describes how a multi-feature merging approach can be applied to semantic-based visual information retrieval and annotation. The goal is to identify the key visual patterns of specific objects in either static images or video frames. It is shown that the performance of such visual-to-semantic matching schemes can be improved by describing these key visual patterns with particular combinations of multiple visual features. A multi-objective learning mechanism is designed to derive a suitable merging metric for the different features. At the core of this mechanism is a widely used class of optimisation methods: multi-objective optimisation. The proposed technique has been assessed on natural images and videos to validate its performance.