Voice recognition


Strategic research collaboration with Tampere University of Technology Audio Research Group

Robust processing engine designed to handle Big Data

  • Based on 7 years of R&D in voice recognition
  • Capable of processing millions of audio records each day
  • Pre-processing that quickly identifies issues with records and their associated recorders, so users can respond promptly and prevent the build-up of corrupted data
  • Supported audio formats include WAV, MP3 and WMA. The internal audio format is 8 kHz, 16-bit, mono PCM WAV
  • Takes advantage of industry-standard tools and open-source technologies such as MATLAB Audio Toolbox, Weka, the LIUM open-source library, CMU Sphinx and Kaldi
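
As an illustration of the kind of pre-processing check described above, the following is a minimal sketch, not the product's actual implementation. It uses only Python's standard `wave` module to test whether a file matches the stated internal format (8 kHz, 16-bit, mono PCM WAV). The function names `check_wav_format` and `make_silent_wav` are hypothetical and introduced here for illustration only.

```python
import io
import wave

# Target format taken from the list above: 8 kHz, 16-bit, mono PCM WAV.
TARGET_RATE_HZ = 8000
TARGET_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
TARGET_CHANNELS = 1


def check_wav_format(source):
    """Return a list of problems found; an empty list means the file conforms.

    `source` is a file path or a binary file-like object (hypothetical helper).
    """
    try:
        with wave.open(source, "rb") as wav:
            problems = []
            if wav.getframerate() != TARGET_RATE_HZ:
                problems.append(
                    f"sample rate is {wav.getframerate()} Hz, expected {TARGET_RATE_HZ} Hz"
                )
            if wav.getsampwidth() != TARGET_SAMPLE_WIDTH_BYTES:
                problems.append(
                    f"sample width is {8 * wav.getsampwidth()} bits, expected 16 bits"
                )
            if wav.getnchannels() != TARGET_CHANNELS:
                problems.append(
                    f"channel count is {wav.getnchannels()}, expected {TARGET_CHANNELS}"
                )
            if wav.getnframes() == 0:
                problems.append("file contains no audio frames")
            return problems
    except (wave.Error, EOFError) as exc:
        # Raised for truncated files and for anything that is not PCM WAV.
        return [f"not a readable PCM WAV file: {exc}"]


def make_silent_wav(rate_hz, channels, sample_width, n_frames):
    """Build an in-memory WAV of zero-valued samples, used only for the demo below."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00" * (n_frames * channels * sample_width))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(check_wav_format(make_silent_wav(8000, 1, 2, 80)))   # conforming record
    print(check_wav_format(make_silent_wav(44100, 2, 2, 80)))  # wrong rate and channels
    print(check_wav_format(io.BytesIO(b"not a wav")))          # corrupted record
```

A check like this is cheap enough to run on every incoming record, which is one plausible way to flag a misconfigured recorder before corrupted data accumulates.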

Extraction of behavioral features from voice records

  • Machine-learning-based algorithms capable of discovering behavioral features (e.g. anger, laughter, whispering, questions)
  • Behavioral feature extraction is language-agnostic
  • Feature extraction from voice metadata, plus extraction of low-level features such as MFCCs, all-pole group delay (AllPoleGD) features and i-vectors
  • Paralinguistic models using AI classifiers (DNN, GMM, SVM)
  • Clustering using HMMs, Viterbi decoding and a range of distance metrics
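
To make the HMM and Viterbi item above concrete, here is a minimal sketch of textbook Viterbi decoding in pure Python. It is not the engine's implementation. The two-state "silence"/"speech" model, its probabilities and the "quiet"/"loud" observation symbols are all invented for illustration, and the sketch assumes every probability in the model is non-zero so that logarithms are defined.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for `observations`.

    Works in the log domain to avoid numerical underflow on long sequences.
    Assumes all probabilities are strictly positive.
    """
    if not observations:
        return []

    # best[t][s]: log-probability of the best path that ends in state s at time t.
    best = [{
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }]
    # back[t][s]: predecessor of state s on that best path.
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Trace the best path backwards from the most likely final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Toy model (hypothetical values, chosen only for the demo).
STATES = ("silence", "speech")
START_P = {"silence": 0.6, "speech": 0.4}
TRANS_P = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.4, "speech": 0.6},
}
EMIT_P = {
    "silence": {"quiet": 0.9, "loud": 0.1},
    "speech": {"quiet": 0.2, "loud": 0.8},
}

if __name__ == "__main__":
    print(viterbi(["quiet", "loud", "loud"], STATES, START_P, TRANS_P, EMIT_P))
    # prints ['silence', 'speech', 'speech']
```

In a real system the observations would be feature vectors (for example MFCC frames) scored by a GMM or DNN rather than discrete symbols, but the dynamic-programming recursion is the same.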


Recognition of words and phrases using Deep Neural Networks

  • Neural networks have been trained on voice data libraries specific to the financial industry
  • Acoustic Keyword Spotting (KWS) systems significantly outperform Speech-To-Text algorithms in accuracy on low-quality, noisy recordings (typical of financial industry audio records)
  • Ongoing improvement in accuracy through continuous training and optimization
  • Continuous improvement of efficiency and effectiveness through collaboration with Tampere University of Technology Audio Research Group and Aalto University Department of Signal Processing and Acoustics


© 2014-2016 Behavox