Audio-Based Event Detection in Videos - A Survey

is directly computed from the signal. Pitch Synchronous Zero Crossing Peak Amplitudes (PS-ZCPA) is an extension of ZCPA that also considers pitch information. It has been found to be more robust to noise than ZCPA.
2) Amplitude-Based Features: Features computed directly from the amplitude of the signal are simple and fast to compute. The MPEG-7 audio waveform descriptor describes the shape of a waveform by computing the maximum and minimum samples within non-overlapping frames, which makes it suitable for comparing waveforms. The Amplitude Descriptor (AD) was developed to recognize animal sounds. It separates the signal into low-amplitude and high-amplitude segments using an adaptive threshold, and so characterizes the waveform in both quiet and loud segments.
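The minimum/maximum envelope described above can be sketched as follows. This is a minimal illustration in the spirit of the MPEG-7 audio waveform descriptor, not the normative MPEG-7 procedure; the function name `waveform_envelope`, the frame length, and the choice to drop a trailing partial frame are assumptions made for the example.

```python
import numpy as np

def waveform_envelope(signal, frame_len):
    """Return per-frame (min, max) samples over non-overlapping frames."""
    signal = np.asarray(signal, dtype=float)
    # Assumption: a trailing partial frame is simply dropped.
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    return frames.min(axis=1), frames.max(axis=1)

# Toy signal with two frames of four samples each.
x = np.array([0.1, -0.4, 0.3, 0.0, 0.9, -0.2, 0.5, -0.7])
mins, maxs = waveform_envelope(x, frame_len=4)
# mins -> [-0.4, -0.7], maxs -> [0.3, 0.9]
```

Two waveforms can then be compared through their envelopes, which are far shorter than the raw sample sequences.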
3) Power-Based Features: The energy of a signal is defined as the square of the waveform's amplitude. The power of a sound is the energy transmitted per unit time, that is, the mean square of the signal. Short Time Energy (STE) is mainly used in audio retrieval. Volume is another important feature, used to detect silence and to segment music from speech. Volume is the root mean square of the signal magnitude within a frame.
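As a rough sketch of these definitions, STE and volume can be computed per frame as shown here. The example assumes non-overlapping frames and takes STE as the mean of the squared samples in a frame (some definitions use the sum instead); the helper names and the silence threshold of 0.1 are hypothetical.

```python
import numpy as np

def frame_signal(signal, frame_len):
    """Split a 1-D signal into non-overlapping frames (partial tail dropped)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)

def short_time_energy(signal, frame_len):
    """Mean of squared samples per frame (assumed STE definition)."""
    frames = frame_signal(signal, frame_len)
    return np.mean(frames ** 2, axis=1)

def volume(signal, frame_len):
    """Root mean square of the signal within each frame."""
    return np.sqrt(short_time_energy(signal, frame_len))

# One loud frame followed by one silent frame.
x = np.array([1.0, -1.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0])
ste = short_time_energy(x, frame_len=4)
vol = volume(x, frame_len=4)
is_silent = vol < 0.1  # hypothetical silence threshold
```

In practice the threshold would be tuned to the recording conditions, and overlapping, windowed frames are often used instead of the plain split shown here.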
B. Physical features
The most common methods used to represent audio features in the frequency domain are the Fourier transform and autocorrelation. Other methods such as the cosine transform, the wavelet transform, and the Q-transform are also used. Frequency features can be divided into two sets: physical features and perceptual features.
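To make the two most common representations concrete, the sketch below applies a Fourier transform and an autocorrelation to a synthetic tone. The 8000 Hz sample rate, the 200 Hz test tone, the frame length, and the lag search range are all assumptions chosen for illustration.

```python
import numpy as np

sr = 8000                      # assumed sample rate in Hz
n = 800                        # assumed frame length in samples
t = np.arange(n) / sr
x = np.sin(2 * np.pi * 200.0 * t)   # synthetic 200 Hz test tone

# Fourier transform: magnitude spectrum and its strongest frequency bin.
spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(n, d=1.0 / sr)
peak_freq = freqs[np.argmax(spectrum)]

# Autocorrelation: keep non-negative lags (index n - 1 is lag 0).
acf = np.correlate(x, x, mode="full")[n - 1:]

# Strongest peak within a hypothetical lag range gives the period in samples.
lo, hi = 20, 60
period = lo + int(np.argmax(acf[lo:hi]))
f0_from_acf = sr / period
```

For this clean tone both views agree on the frequency. On real audio they behave differently: the spectrum exposes the distribution of energy across frequencies, while the autocorrelation emphasizes periodicity in the time signal.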
...semantic meaning in the context of human auditory perception are called Perceptual Frequency Features. Brightness, tonality, loudness, pitch, and harmonicity are commonly used perceptual frequency features. A signal is composed of both low and high frequencies. A sound becomes brighter as its high-frequency content becomes more dominant and its low-frequency content less dominant. Tonality is the property of a sound that discriminates tonal sounds from noisy sounds.
Tonality measures can be categorized into bandwidth measures and flatness measures. Loudness features characterize the auditory sensation of how loud a sound is. Loudness measures can be used for audio retrieval. Pitch is a dimension of sound, alongside loudness, duration, and timbre. Pitch is commonly related to chroma and harmonicity. Chroma is divided into 12 pitch classes, where each corresponds to one note of
