Audio-Based Event Detection in Videos - A Survey

… indexing and retrieval systems for identifying videos in which a few predefined events are shown.

7) Other Applications: Nahijima et al. [46] presented a fast and accurate Moving Picture Experts Group (MPEG) audio classification algorithm that operates in the subband data domain. Audio was classified in 1 s units into four segment types: silence, music, speech, and applause. A Bayesian discrimination method for multivariate Gaussian distributions was then used to perform the classification.

III. SYSTEM DESCRIPTION

A digital video is generated by a camera and represented in the form of pixels. It is a sequence of images, called frames, displayed at a given frame rate to create the illusion of motion. The frame rate is the number of unique consecutive frames produced per second; it varies between standards, and a typical video has a frame rate of 25 fps. A complete video is partitioned into acts, and each act is further partitioned into scenes. A scene is a sequence of actions in which consecutive frames differ only slightly.

The audio is then extracted from the video and analyzed either at the short-term frame level or at the long-term clip level. Data representation addresses how the extracted audio examples to be classified are expressed as feature vectors. Modeling then aims to find a mapping from the feature space to the target labels that minimizes the prediction error. The general components of an audio-based video event detection system are presented in Figure 1. The two major components are audio data representation and the learning methodology. An audio signal can be represented by many different features. Audio feature extraction is an important phase for …

… Zero Crossing Rate (ZCR) is defined as the number of zero crossings in the temporal domain within one second. Kedem [30] described ZCR as a measure of the dominant frequency in a signal.
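A minimal sketch of frame-level versus clip-level analysis and of the ZCR definition, in Python. The survey does not fix frame sizes, so the 8 kHz sample rate, the 20 ms frames with a 10 ms hop, and the synthetic 440 Hz tone are assumptions, and the helper names (`frame_signal`, `zero_crossings`, `zcr_per_second`) are hypothetical. Treating an exact zero sample as positive is one common convention, not something the source specifies.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a 1-D signal into overlapping frames of frame_len samples, hop samples apart."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def zero_crossings(x):
    """Count sign changes between consecutive samples (an exact zero counts as positive)."""
    return sum(1 for prev, cur in zip(x, x[1:]) if (prev >= 0) != (cur >= 0))


def zcr_per_second(x, sample_rate):
    """Zero crossings normalised to a one-second time base."""
    return zero_crossings(x) / (len(x) / sample_rate)


if __name__ == "__main__":
    sr = 8000  # assumed sample rate in Hz
    # One second of a 440 Hz tone stands in for audio extracted from a video.
    clip = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]

    # Short-term frame level: 20 ms frames (160 samples) with a 10 ms hop (80 samples).
    frames = frame_signal(clip, frame_len=160, hop=80)
    frame_zcr = [zcr_per_second(f, sr) for f in frames]

    # Long-term clip level: summarise the frame-level values over the whole clip.
    clip_mean_zcr = sum(frame_zcr) / len(frame_zcr)

    print(f"{len(frames)} frames, clip ZCR = {zcr_per_second(clip, sr):.0f}/s, "
          f"mean frame ZCR = {clip_mean_zcr:.0f}/s")
```

For a pure tone the count comes out at roughly twice the tone's frequency (about 880 crossings per second here), which illustrates why ZCR can serve as a rough measure of the dominant frequency.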
ZCR is commonly used for music/speech discrimination because of its simplicity. It is also used in other audio domains such as highlight detection [7], speech analysis [8], singer-related tasks [68], and environmental sound detection [5]. The linear prediction zero crossing ratio (LP-ZCR) is defined as the ratio between the zero crossing count of the waveform and the zero crossing count of the output of the linear prediction analysis filter [13]. These features help to discriminate between speech and non-speech audio signals. Zero Crossing Peak Amplitudes (ZCPA), presented by Kim et al. [31], [32], is well suited to speech recognition in noisy environments. It is an approximation of the spectrum which …
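The LP-ZCR definition can be sketched as follows. The source does not say how the linear prediction filter is estimated, so this assumes the autocorrelation method solved with the Levinson-Durbin recursion, a prediction order of 10, and synthetic test signals; all function names are hypothetical. The expected trend is that a strongly predictable signal, such as a tone, gives a much smaller ratio than white noise.

```python
import math
import random


def zero_crossings(x):
    """Count sign changes between consecutive samples (an exact zero counts as positive)."""
    return sum(1 for prev, cur in zip(x, x[1:]) if (prev >= 0) != (cur >= 0))


def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag], as used by the autocorrelation method of LPC."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] such that x[n] ~ sum_k a[k] * x[n - k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # signal already perfectly predicted; keep remaining coeffs at 0
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k  # prediction-error energy shrinks with each order
    return a[1:]


def lp_residual(x, coeffs):
    """Output of the LP analysis filter A(z) = 1 - sum_k a_k z^-k (zero initial state)."""
    res = []
    for n in range(len(x)):
        pred = sum(a * x[n - k] for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        res.append(x[n] - pred)
    return res


def lp_zcr(x, order=10):
    """LP-ZCR: zero crossings of the waveform / zero crossings of the LP residual."""
    coeffs = levinson_durbin(autocorr(x, order), order)
    zc_res = zero_crossings(lp_residual(x, coeffs))
    return zero_crossings(x) / zc_res if zc_res else float("inf")


if __name__ == "__main__":
    rng = random.Random(0)
    sr = 8000  # assumed sample rate in Hz
    # Strongly predictable signal: a 200 Hz tone with a little added noise.
    tone = [math.sin(2 * math.pi * 200 * n / sr) + 0.01 * rng.gauss(0, 1) for n in range(4000)]
    # Poorly predictable signal: white Gaussian noise.
    noise = [rng.gauss(0, 1) for _ in range(4000)]
    print(f"LP-ZCR tone:  {lp_zcr(tone):.2f}")
    print(f"LP-ZCR noise: {lp_zcr(noise):.2f}")
```

The residual of a well-predicted signal is close to white, so its zero crossing count rises sharply and pulls LP-ZCR down; for white noise the predictor removes little, so the ratio should stay near 1.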

Open Document