Exclusive: Speech recognition AI learns industry jargon with aiOla's novel approach

Title: Advancements in Speech Recognition Technology: aiOla’s Innovative Approach to Understanding Industry-Specific Jargon

Paragraph 1: Speech recognition is a crucial component of multimodal AI systems, and many enterprises are eager to implement it. Despite significant advances, existing speech recognition models often struggle with specialized jargon and vocabulary, particularly in complex enterprise settings. To address this, aiOla, an Israeli startup specializing in speech recognition, has announced a novel approach that teaches these models to understand industry-specific terminology.

Paragraph 2: This development improves the accuracy and responsiveness of speech recognition systems, making them better suited to complex enterprise settings, even in challenging acoustic environments. As a case study, aiOla applied its technique to OpenAI’s Whisper model, reducing its word error rate and improving overall detection accuracy.
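
For readers unfamiliar with the metric, word error rate (WER) is conventionally the number of word-level substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not aiOla's evaluation code; the function name and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two transcripts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one misheard jargon term out of five words.
print(word_error_rate("check the pitot tube heater",
                      "check the pivot tube heater"))  # 0.2
```

A single misrecognized technical term already costs 20% WER on a five-word utterance, which is why jargon-heavy speech can look much worse on this metric than everyday speech.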

Paragraph 3: The startup claims that its approach can work with any speech recognition model, including Meta’s MMS model and proprietary models, potentially elevating even the highest-performing speech-to-text models.

Paragraph 4: Jargon has been a persistent challenge in speech recognition, even for models like OpenAI’s Whisper, which, despite approaching human-level robustness and accuracy in English speech recognition, can struggle with specialized vocabulary in complex, real-world conditions.

Paragraph 5: To solve this problem, aiOla developed a two-step “contextual biasing” approach. First, its AdaKWS keyword-spotting model identifies domain-specific and personalized jargon in a given speech sample. Then, the identified keywords are used to guide the decoder of the automatic speech recognition (ASR) model, improving its recognition of those terms.
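
The two steps can be illustrated with a toy sketch. It is not aiOla's implementation: AdaKWS is replaced by a dictionary of made-up detection scores, and instead of guiding the decoder directly, the second step uses a simpler, swapped-in technique that rescores a list of candidate transcripts. All names, scores, and thresholds below are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple


def spot_keywords(keyword_scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Step 1 (stand-in for a keyword spotter): keep terms whose score clears the threshold."""
    return [kw for kw, score in keyword_scores.items() if score >= threshold]


def pick_biased_hypothesis(
    hypotheses: List[Tuple[str, float]],
    keywords: List[str],
    bonus: float = 1.0,
) -> str:
    """Step 2 (stand-in for decoder guidance): add a bonus to a candidate's
    log-score for every spotted keyword it contains, then return the best candidate."""

    def biased_score(item: Tuple[str, float]) -> float:
        text, log_score = item
        padded = f" {text.lower()} "
        # Padding with spaces makes the containment check match whole words or phrases.
        hits = sum(1 for kw in keywords if f" {kw.lower()} " in padded)
        return log_score + bonus * hits

    return max(hypotheses, key=biased_score)[0]


# Hypothetical detection scores for a customer's jargon list on one utterance.
scores = {"pitot": 0.91, "aileron": 0.12}
keywords = spot_keywords(scores)

# Hypothetical candidate transcripts with log-scores from an unbiased recognizer.
candidates = [("check the pivot tube", -3.1), ("check the pitot tube", -3.4)]
print(pick_biased_hypothesis(candidates, keywords))  # check the pitot tube
```

Without the bias, the recognizer would prefer the more common word "pivot"; once "pitot" has been spotted in the audio, the candidate containing it wins.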

Paragraph 6: In initial tests, aiOla tried two techniques for improving Whisper’s performance. The first, KG-Whisper (keyword-guided Whisper), fine-tuned the entire set of decoder parameters. The second, KG-Whisper-PT (prompt tuning), trained only about 15K parameters, making it far more efficient. Both adapted models outperformed the original Whisper baselines on various datasets, even in challenging acoustic environments.
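
To give a sense of scale for prompt tuning in general: the pretrained model stays frozen, and the only trainable values are a handful of "soft prompt" vectors placed in front of the decoder's input embeddings. The sketch below counts those values under stated assumptions. The article does not say how aiOla's roughly 15K parameters break down, so the prompt length of 12 is hypothetical, and the width of 1280 is used only because it is the hidden size of Whisper's large checkpoints.

```python
from typing import List


def soft_prompt_param_count(prompt_length: int, embedding_dim: int) -> int:
    """Number of trainable values in a soft prompt: one vector per prompt position."""
    if prompt_length <= 0 or embedding_dim <= 0:
        raise ValueError("prompt_length and embedding_dim must be positive")
    return prompt_length * embedding_dim


def prepend_soft_prompt(
    prompt_vectors: List[List[float]],
    input_embeddings: List[List[float]],
) -> List[List[float]]:
    """Place the trainable prompt vectors in front of the (frozen) input embeddings."""
    return prompt_vectors + input_embeddings


# Hypothetical configuration: 12 prompt vectors at an embedding width of 1280.
print(soft_prompt_param_count(12, 1280))  # 15360
```

Under these assumed numbers the trainable footprint is about 15K values, while every weight of the underlying model is left untouched.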

Paragraph 7: The approach can be used with different models, allowing enterprises to adapt their ASR models to their specific industries without retraining the entire system. Because it is “zero-shot,” the model can make predictions for terms it saw no specific examples of during training.

Paragraph 8: The adaptability of this approach can benefit various industries involving technical jargon, such as aviation, transportation, manufacturing, supply chain, and logistics. aiOla has already started deploying its adaptive model with Fortune 500 enterprises, resulting in significant efficiency improvements in handling jargon-heavy processes.

Paragraph 9: aiOla has published the research for its novel approach, hoping that other AI research teams will build on its work. However, the company is not currently providing API access to the adapted model or releasing the weights. Enterprises can only use it through the company’s product suite, which operates on a subscription-based pricing structure.
