The Voice Biometry Standardization (VBS) Initiative is formed by a voluntary alliance of researchers, institutes, manufacturers and organizations to adopt a single and common format of voice biometry data.

Motivation and objectives

Human voice is an indispensable part of personal identity. The exchange of voice data in its raw format (waveform) is, however, complicated by legislation that differs from country to country. Multiple organizations have developed and deployed voice biometry technologies, such as speaker verification systems, authentication systems, and caller identification. Despite the similarities of their systems and models, several discrepancies prevent mutual compatibility. This creates a barrier that limits the deployment and adoption of the technology and degrades the user experience.

What is voice biometry

In the context of this initiative, voice biometry is understood as the task of recognizing the speaker in an unknown audio recording. Automated speaker identification systems do this by comparing the voice in the recording with the voice models stored in a database.

Example applications of voice biometry include:

  • personal authentication
  • speaker search
  • link analysis
  • call-center speaker identification

Objectives

The main objective of this initiative is to standardize a common representation of voice biometry data suitable for speaker characterization. This can be understood as a transformation of a digital audio voice signal into a small, fixed-length set of numbers in a predefined format. During this transformation, the statistical information relevant for retrieving speaker identity is preserved, while the structure and content of the voice signal itself are removed. This means that voice data can be transmitted anonymously while still allowing the speaker to be identified. The supporting sites (companies, research labs, etc.) believe that the existence of such a standard would facilitate the integration of their technologies into existing systems and enable seamless communication with other systems worldwide.

The objectives of this initiative can be summarized as follows:

  • unify and encourage research on technology based on i-vectors (speaker comparison, speaker clustering, techniques for adaptation to new acoustic channels and conditions)
  • export/import voice biometry data in a standardized format in both commercial and non-commercial applications
  • exchange/store voice biometry data in an anonymous way
  • provide a reference software implementation for verification purposes

The objective is NOT to freeze or inhibit research or development of new speaker identification algorithms or techniques.

I-Vector Paradigm

Since their introduction in speaker recognition, the so-called i-vectors have been widely used in multiple fields of speech processing, such as language recognition, and even in speech recognition (to help compensate for speaker identity). An i-vector is an information-rich, low-dimensional, fixed-length vector extracted from the feature sequence representing a speech segment; the spoken content of the audio is discarded. For a detailed technical description, please refer to the VBS manual in the Download section.

Due to these properties, i-vectors are occasionally referred to as audio voice-prints (this term should be used with caution, so we avoid it altogether). As such, i-vectors can be used for audio indexing, information exchange (e.g. between forensic or security/defense agencies), speaker search, etc. Such usage, however, assumes that the i-vector extraction method (including the parameters of the method) is kept fixed, so that all i-vectors are compatible and can be compared directly.
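To illustrate what "direct comparison" of two compatible i-vectors can look like, here is a minimal sketch using cosine similarity, one common and simple way to score a pair of i-vectors. It is an assumption made for illustration only: scoring is not part of the standard, the function name cosine_score is hypothetical, and the 4-dimensional vectors are toy values rather than real i-vectors.

```python
import math


def cosine_score(a, b):
    """Return the cosine similarity of two equal-length vectors.

    Values near 1 mean the vectors point in a similar direction;
    values near 0 or below mean they do not.
    """
    if len(a) != len(b):
        # Vectors from different extractors are not comparable.
        raise ValueError("i-vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot score a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (made-up values, far shorter than real i-vectors).
enrolled = [0.5, -1.2, 0.3, 0.8]
same_speaker = [0.6, -1.0, 0.2, 0.9]
other_speaker = [-0.7, 0.4, 1.1, -0.3]

print(cosine_score(enrolled, same_speaker))
print(cosine_score(enrolled, other_speaker))
```

The check on vector length reflects the point made above: a score is only meaningful when both vectors come from the same fixed extraction method and parameters.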

Who Will Benefit

The proposed standard preserves all statistical information needed for speaker identification/recognition without the need for the voice signal itself. Thus, it could be used in the same way as, for example, fingerprints are used in forensic science today. This should help security/defense bodies to increase security at the national and international level.

The Voice Biometry Standard is open-source (reference implementation), and license- and patent-free. Supporters are free to join this initiative at any time. Eventually, the initiative could lead to an official standardization process at selected Standard Development Organizations (SDOs).

Potential beneficiaries include:

  • Police and Investigation Forces
  • Law Enforcement Agencies
  • Border service agencies
  • Intelligence Agencies
  • Financial institutions (incl. banks and insurance)

The Voice Biometry Standard could help in the following areas (application scenarios):

  • Fraud prevention in financial markets
  • Securing banking transactions
  • Access control (Authentication/Verification)
  • Proof of evidence
  • Crime prevention
  • Fight against organized crime and drug trafficking
  • Border and immigration control

Description of the Standard

The standard is based on a conventional speaker verification system specifically tuned for operation on telephone data. The exact specification of the system is given in the VBS Manual in the Download section. The system consists of functional blocks, not all of which are part of the standardization effort.

The following blocks are being standardized:

  • Acoustic feature extraction
  • i-vector extraction algorithm
  • i-vector extraction parameters (Gaussian Mixture Model parameters, i-vector extractor parameters)
  • Data exchange formats (the standard draws mainly on implementations and features that are widespread in the research community)

The following parts of the system are not meant to be standardized, although they are provided in the demo package and are an essential part of a speaker recognition system:

  • Voice activity detection
  • i-vector post-processing
  • i-vector scoring

TBD - schematic diagram of the voice biometry standard in the entire SID chain.

Data format

Two data exchange formats are being defined:

  • Binary: for data storage and exchange with minimal overhead
  • ASCII: for data exchange over standard textual channels such as email, HTTP, and XML. Base64 encoding of the vector will be used.
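The idea behind the ASCII format can be sketched as follows: the vector is serialized to bytes and then Base64-encoded so that it survives text-only channels. This is only an illustration under stated assumptions. The byte layout used here (little-endian 32-bit floats with no header) and the function names encode_vector and decode_vector are hypothetical and are not taken from the VBS specification; the actual layout is defined in the VBS manual.

```python
import base64
import struct


def encode_vector(values):
    """Pack floats as little-endian float32 and return Base64 ASCII text.

    The float32 layout without a header is an assumption for this sketch.
    """
    raw = struct.pack("<%df" % len(values), *values)
    return base64.b64encode(raw).decode("ascii")


def decode_vector(text):
    """Inverse of encode_vector: Base64 text back to a list of floats."""
    raw = base64.b64decode(text)
    if len(raw) % 4 != 0:
        raise ValueError("payload is not a whole number of float32 values")
    count = len(raw) // 4
    return list(struct.unpack("<%df" % count, raw))


# Toy vector; the values are exactly representable in float32,
# so the round trip is lossless.
vector = [0.5, -1.25, 2.0]
text = encode_vector(vector)
print(text)
print(decode_vector(text))
```

Base64 grows the payload by roughly one third compared with the raw bytes, which is the overhead the binary format avoids.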

The reference implementation can be downloaded from the Download section.

Supporting Organizations and Companies

This initiative was launched by the BUT Speech@FIT research group of Brno University of Technology in July 2015. The following organizations support the Voice Biometry Standardization Initiative:

BUT Speech@FIT

Phonexia

To join this initiative, please send an email to info@voicebiometry.org.

News/Events

2015-09 Presentation about the Voice Biometry Standardization Initiative at InterSpeech 2015, Dresden. The Voice Biometry Standardization Initiative will be presented and discussed at InterSpeech 2015 in Dresden (Tuesday, Sep 8th, 15.00-16.00). Please reserve time in your schedule to attend.
2015-07 Launched www.voicebiometry.org and uploaded the reference software implementation (see the Download section).
2014-09 Voice Biometry Standardization Initiative at InterSpeech 2014, Singapore. Voice biometry research groups discussed the steps necessary for standardization. Brno University of Technology agreed to supervise the next steps. The research groups agreed to follow the discussion.
2014-06 Voice Biometry Standardization Initiative discussed at the ISS Prague 2014 conference. Voice biometry vendors and researchers discussed the initial idea of the Voice Biometry Standardization Initiative, as the request came from several end users. The vendors agreed to follow the discussion.

If you wish to subscribe to news, please send an email to info@voicebiometry.org.

Download

Contact

Contact email: info@voicebiometry.org

Acknowledgement: This work was supported by the Ministry of Interior of the Czech Republic (project No. VG20132015129 "ZAOM") and by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 645323 (BISON).