Searching for similar videos in WeVerify’s stored content based on audio

Posted on November 25th, 2020 in Blog Posts

In this post, we explain the basics behind our method for calculating video similarity based on audio information. This work was carried out in the context of Near-Duplicate Detection, a key component of the WeVerify project. Based on our results, we have published a research paper titled “Audio-based Near-Duplicate Video Retrieval with Audio Similarity Learning,” which has been accepted for publication at this year’s International Conference on Pattern Recognition (ICPR 2020).

To calculate the similarity between two compared videos, we have to extract feature descriptors for the audio signals of the videos and then calculate the similarity between them as the final similarity score of the video pair. In this work, we have designed two processes that implement these functionalities: (i) a feature extraction scheme based on transfer learning from a pre-trained Convolutional Neural Network (CNN), and (ii) a similarity calculation process based on video similarity learning.

Feature extraction

Let’s begin with feature extraction. We employ a pre-trained CNN designed for transfer learning. The network is trained on a large-scale dataset, namely AudioSet, which consists of approximately 2.1 million weakly-labeled videos from YouTube covering 527 audio event classes.

To extract features, we first generate the Mel-filtered spectrogram from the audio of each video. The generated spectrograms are divided into overlapping time frames, which are then fed to the feature extraction CNN. To extract a compact audio representation for each spectrogram frame, we apply Maximum Activation of Convolutions (MAC) on the activations of the intermediate convolutional layers. To improve the discriminative capabilities of the audio descriptors, we then apply PCA whitening and an attention-based scheme for the decorrelation and weighting of the extracted feature vectors, respectively.
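The framing, MAC pooling and PCA whitening steps can be sketched with NumPy as follows. This is a minimal illustration and not the AuSiL implementation: the function names, the frame length, the hop size and the random arrays standing in for the spectrogram and for the CNN activations are all assumptions, and the attention-based weighting is left out.

```python
import numpy as np

def split_into_frames(spectrogram, frame_len, hop):
    """Split a (n_mels, n_steps) spectrogram into overlapping frames along time."""
    n_steps = spectrogram.shape[1]
    starts = range(0, n_steps - frame_len + 1, hop)
    return np.stack([spectrogram[:, s:s + frame_len] for s in starts])

def mac(activations):
    """Maximum Activation of Convolutions: keep the maximum of each channel
    of a (channels, height, width) feature map."""
    return activations.max(axis=(1, 2))

def fit_pca_whitening(X, eps=1e-8):
    """Learn a mean and a whitening matrix from descriptors X of shape (n, d)."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Scale each eigenvector (column) by 1 / sqrt(eigenvalue).
    W = eigvecs / np.sqrt(np.maximum(eigvals, 0.0) + eps)
    return mean, W

def whiten(X, mean, W):
    """Project descriptors so that their covariance is close to the identity."""
    return (X - mean) @ W

# Hypothetical sizes: 64 Mel bands, 1000 time steps, frames of 40 steps, hop of 20.
rng = np.random.default_rng(0)
spectrogram = rng.random((64, 1000))            # stand-in for a Mel spectrogram
frames = split_into_frames(spectrogram, frame_len=40, hop=20)
feature_maps = rng.random((len(frames), 8, 4, 5))  # stand-in for CNN activations
descriptors = np.stack([mac(fm) for fm in feature_maps])
mean, W = fit_pca_whitening(descriptors)
whitened = whiten(descriptors, mean, W)
```

In practice the whitening parameters would be learned once on a large set of descriptors and then reused for every video, rather than fitted per video as in this toy usage.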

Similarity calculation

To measure the similarity between the two compared videos, we employ a video similarity learning scheme for robust and accurate similarity calculation. More precisely, having extracted the audio representations of the two videos, we calculate the similarity between all descriptor pairs of the two videos by applying the dot product to the corresponding feature vectors. In that way, we generate a pairwise similarity matrix that contains the similarities between all vectors of the two videos.
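As a small illustration of the pairwise similarity matrix, the sketch below computes all dot products between the descriptors of two videos with a single matrix multiplication. The function name and the toy descriptors are hypothetical, and the vectors are assumed to be L2-normalized, in which case the dot product equals the cosine similarity.

```python
import numpy as np

def pairwise_similarity(query_desc, candidate_desc):
    """Dot product between every descriptor pair.

    query_desc: (n_query_frames, d), candidate_desc: (n_candidate_frames, d).
    Returns a matrix of shape (n_query_frames, n_candidate_frames).
    """
    return query_desc @ candidate_desc.T

# Toy descriptors (assumed L2-normalized): 2 query frames, 3 candidate frames.
query = np.array([[1.0, 0.0],
                  [0.0, 1.0]])
candidate = np.array([[1.0, 0.0],
                      [0.0, 1.0],
                      [0.6, 0.8]])
similarity_matrix = pairwise_similarity(query, candidate)
```

Entry (i, j) of the result is the similarity between the i-th frame of the query and the j-th frame of the candidate.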

Then, to calculate the similarity between the two videos, we provide the pairwise similarity matrix to a CNN, which we call AuSiL. The network captures the temporal similarity structures existing within the content of the similarity matrix, and it is capable of learning robust patterns of within-video similarities. To calculate the final video similarity, we apply the hard tanh activation function on the values of the network output, and then we apply Chamfer Similarity to derive a single value, which is taken as the final similarity between the two videos.
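The last two steps can be sketched as follows. The AuSiL network itself is not reproduced: the matrix below is a made-up stand-in for its output. The sketch also assumes the usual one-directional form of Chamfer Similarity, i.e. the mean over query positions (rows) of the maximum over candidate positions (columns); the function names are illustrative.

```python
import numpy as np

def hard_tanh(x):
    """Clip values to the range [-1, 1]."""
    return np.clip(x, -1.0, 1.0)

def chamfer_similarity(sim):
    """For each row take the best-matching column, then average over rows."""
    return sim.max(axis=1).mean()

# Stand-in for the network output (2 query positions x 3 candidate positions).
network_output = np.array([[0.2, 1.5, -0.3],
                           [-2.0, 0.4, 0.1]])
video_similarity = chamfer_similarity(hard_tanh(network_output))
```

Because hard tanh bounds every entry to [-1, 1], the resulting video-level score is bounded to the same range.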

Experimental results

For the evaluation of the proposed approach, we employ two datasets compiled for fine-grained incident and near-duplicate video retrieval, i.e., FIVR-200K and SVD. We have manually annotated the videos in these datasets according to whether their audio duplicates that of the query videos. Also, we evaluate the robustness of our approach to audio speed transformations by artificially generating audio duplicates.
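The post does not describe how the speed-transformed duplicates were produced. As one simple possibility, the sketch below changes the speed of a waveform by resampling it with linear interpolation, which alters tempo and pitch together. The function name and the speed factors are assumptions made for illustration only.

```python
import numpy as np

def change_speed(signal, factor):
    """Resample a 1-D signal so that it plays `factor` times faster.

    factor > 1 shortens the signal, factor < 1 lengthens it.
    """
    n_out = int(round(len(signal) / factor))
    # Positions in the original signal that each output sample is read from.
    src_positions = np.arange(n_out) * factor
    return np.interp(src_positions, np.arange(len(signal)), signal)

waveform = np.arange(8, dtype=float)   # toy stand-in for an audio waveform
faster = change_speed(waveform, 2.0)   # half as many samples
slower = change_speed(waveform, 0.5)   # twice as many samples
```

A production pipeline would more likely rely on a dedicated audio tool with proper anti-aliasing filters; this sketch only conveys the idea of the transformation.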

In the following table, we compare the retrieval performance of AuSiL against Dejavu, a publicly available Shazam-like system. The performance is measured in terms of mean Average Precision (mAP) on the two annotated datasets under two different settings, i.e., the original version and the artificially generated videos with speed transformation. AuSiL outperforms Dejavu by a considerable margin on three out of four runs, while Dejavu achieves marginally better results on the original version of FIVR-200K. It is evident that our approach is very robust against speed transformation, unlike the competing method.

mAP comparison of the proposed approach against Dejavu, a publicly available Shazam-like system. Superscript T indicates the runs with audio speed transformations.
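For readers unfamiliar with the metric, here is a minimal sketch of how mAP is commonly computed from ranked retrieval results. It assumes every relevant video appears somewhere in the ranked list; the exact evaluation protocol used for the table may differ, and the function names and toy rankings are illustrative.

```python
def average_precision(ranked_relevance):
    """Average Precision for one query.

    ranked_relevance: booleans ordered by decreasing similarity,
    True where the retrieved video is a true audio duplicate.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank   # precision at this rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(runs):
    """Mean of the per-query Average Precision values."""
    return sum(average_precision(r) for r in runs) / len(runs)

# Two toy queries with their ranked relevance labels.
queries = [[True, False, True], [False, True]]
score = mean_average_precision(queries)
```

For the first toy query, the relevant items are found at ranks 1 and 3, so its Average Precision is (1/1 + 2/3) / 2.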

For more details regarding the architecture and training of the model, as well as for comprehensive experimental results, feel free to have a look at the AuSiL paper. The implementation of AuSiL is publicly available.

Author: Giorgos Kordopatis Zilos (CERTH).

Editor: Olga Papadopoulou (CERTH).

Image credits: respective persons named. Usage rights have been obtained by the authors named above for publication in this article. Copyright / IPR remains with the respective originators.

Note: This post is an adaptation of the “Video similarity based on audio” blog post, which was originally prepared for the CERTH Media Verification team (MeVer) website.
