The audio and speech processing team at Behavox loves the unique challenges that frenetic, noisy trading rooms, multiple languages, colloquialisms, and expressions of sentiment offer in its day-to-day business. But the team could not resist the temptation to stress-test state-of-the-art voice recognition technology on historic audio from outer space. The parallels between the dialogue of sometimes-stressed astronauts and their operations teams on Earth, and the communication between hectic traders and their clients, are remarkably close: short bursts, lots of static and cutouts, compounded by last-century technology being used to capture the audio. “Come in Apollo 11. We are receiving you, just…”

The intrepid engineers at Behavox described their experiments and findings in a paper titled “This is Houston. Say Again Please.” The work was part of the Fearless Steps Challenge (Phase II), an initiative by the Center for Robust Speech Systems (CRSS) at the University of Texas at Dallas that attracts some of the world’s finest speech processing teams.

The organizers provided a limited amount of annotated data (less than 40 hours) alongside a vast amount of unlabelled naturalistic recordings (19,000 hours), and leveraging and unscrambling that unlabelled data was the biggest challenge. The focus was on single-channel semi-supervised learning strategies. The best submissions will be presented to the wider academic and scientific community at the world’s largest conference on speech technology, Interspeech 2020. The Behavox paper was selected by the organizers through a peer review process from an extremely strong field of applicants.

Challenge Phase II set out four main tasks for the engineers to address with their solutions:

  • Speech Activity Detection (SAD)
  • Speaker Identification (SID)
  • Speaker Diarization (SD) in the wild with up to 60 speakers on the line
  • Automatic Speech Recognition (ASR)

The mission-specific lexicon and speaking style resulted in high error rates when general-purpose systems were applied to this data. Our team explored the data provided by the organizers for semi-supervised training of ASR acoustic and language models, observing more than 17 percent relative word error rate improvement compared to training on the Challenge II data alone.
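To make the “relative” in that figure concrete, here is a minimal sketch of the distinction between relative and absolute word error rate (WER) improvement. The WER values below are hypothetical, chosen only for illustration; the paper’s reported figure is the relative one.

```python
# Hypothetical WER values for illustration only; the source reports
# the relative improvement ("more than 17 percent"), not absolute WERs.
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Reduction in WER expressed as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer

# A drop from 30.0% to 24.9% WER is a 17% relative improvement,
# even though the absolute improvement is only 5.1 points.
print(round(relative_wer_improvement(30.0, 24.9), 3))  # → 0.17
```

The same relative gain thus corresponds to very different absolute gains depending on the baseline, which is why challenge results are usually quoted in relative terms.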

They also compared several SAD and SD systems to tackle the most difficult tracks of the challenge, in which long, 30-minute audio recordings were provided for evaluation without segmentation or speaker information. Across all systems, the team reported substantial performance improvements over the Challenge II baselines, achieving first-place rankings for SD and ASR and fourth place for SAD. Two golds, and one place off the podium, from just three entries!

The competition was a perfect opportunity to validate the strength of the machine learning algorithms in a public setting, and in a totally different environment from the world of finance and enterprise business in which such tools are usually trained and tested.

We proved our capability in interpreting communications from two hundred thousand miles away in outer space, which gives our customers great comfort that the technology in which we have invested so much can do its job effectively down here on planet Earth. We go to the greatest lengths to test the boundaries of our voice recognition capabilities. Space helmets off to the Voice Team at Behavox for another safe landing.

The Behavox Paper in full – Challenge Results