• DocumentCode
    569162
  • Title
    MASK: Robust Local Features for Audio Fingerprinting
  • Author
    Anguera, Xavier ; Garzon, Antonio ; Adamek, Tomasz
  • Author_Institution
    Telefonica Res., Barcelona, Spain
  • fYear
    2012
  • fDate
    9-13 July 2012
  • Firstpage
    455
  • Lastpage
    460
  • Abstract
    This paper presents a novel local audio fingerprint called MASK (Masked Audio Spectral Keypoints) that effectively encodes the acoustic information present in audio documents and discriminates between transformed versions of the same acoustic document and other, unrelated documents. The fingerprint has been designed to be resilient to strong transformations of the original signal and to be usable for generic audio, including music and speech. Its main characteristics are its locality, binary encoding, robustness and compactness. The proposed audio fingerprint encodes the local spectral energies around salient points selected among the main spectral peaks in a given signal. The encoding is done by centering on each point a carefully designed mask that defines regions of the spectrogram whose average energies are compared with each other. Each comparison yields a single bit, depending on which region has more energy, and all bits are grouped into a final binary fingerprint. The fingerprint also stores the frequency of each peak, quantized using a Mel filterbank. The length of the fingerprint is defined solely by the number of compared regions and can be adapted to the requirements of any particular application. The number of salient points encoded per second can also be easily modified. In the experimental section we show the suitability of this fingerprint for finding matching segments on the NIST-TRECVID benchmarking evaluation datasets, comparing it with a well-known fingerprint and obtaining up to 26% relative improvement in NDCR score.
  • Keywords
    audio coding; binary codes; channel bank filters; Mel filter bank; NDCR score; NIST-TRECVID benchmarking evaluation datasets; acoustic documents; acoustic information; audio documents; binary encoding; local audio fingerprinting; masked audio spectral keypoints; music; salient points; spectrogram; speech; Acoustics; Databases; Encoding; Fingerprint recognition; Robustness; Spectrogram; Time frequency analysis; Audio fingerprinting; audio indexing; copy detection;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia and Expo (ICME), 2012 IEEE International Conference on
  • Conference_Location
    Melbourne, VIC
  • ISSN
    1945-7871
  • Print_ISBN
    978-1-4673-1659-0
  • Type
    conf
  • DOI
    10.1109/ICME.2012.137
  • Filename
    6298443
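
The encoding step described in the abstract (center a mask on a salient spectral point, compare the average energies of pairs of regions, emit one bit per comparison) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the region layout in `REGION_PAIRS`, the function names, and the use of Hamming distance to compare fingerprints are hypothetical and are not taken from the paper. Peak selection and the Mel-quantized peak frequency are omitted.

```python
import numpy as np

# Hypothetical mask: pairs of rectangular regions, each given as
# (time_offset, freq_offset, width, height) relative to the keypoint.
# The actual MASK layout is defined in the paper and is not reproduced here.
REGION_PAIRS = [
    ((-4, -2, 4, 2), (0, -2, 4, 2)),   # before vs. after, lower band
    ((-4, 0, 4, 2), (0, 0, 4, 2)),     # before vs. after, upper band
    ((-4, -2, 4, 2), (-4, 0, 4, 2)),   # lower vs. upper band, before
    ((0, -2, 4, 2), (0, 0, 4, 2)),     # lower vs. upper band, after
]


def region_mean(spec, t, f, region):
    """Average energy of one rectangular region; spec is indexed [freq, time]."""
    dt, df, w, h = region
    t0, f0 = t + dt, f + df
    if t0 < 0 or f0 < 0 or t0 + w > spec.shape[1] or f0 + h > spec.shape[0]:
        raise ValueError("mask region falls outside the spectrogram")
    return float(spec[f0:f0 + h, t0:t0 + w].mean())


def encode_keypoint(spec, t, f, pairs=REGION_PAIRS):
    """Pack one bit per region comparison into an integer fingerprint."""
    bits = 0
    for region_a, region_b in pairs:
        a = region_mean(spec, t, f, region_a)
        b = region_mean(spec, t, f, region_b)
        # Bit is 1 when the first region has strictly more energy.
        bits = (bits << 1) | (1 if a > b else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


# Toy spectrogram: all energy lies in the first half of the time axis.
spec = np.zeros((16, 16))
spec[:, :8] = 1.0
code = encode_keypoint(spec, t=8, f=8)
```

Because each bit depends only on which of two regions has more energy, multiplying the whole spectrogram by a positive gain leaves the code unchanged. The number of bits equals the number of region pairs, which matches the abstract's remark that the fingerprint length is set by the number of compared regions.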