Recent highlights
Azure AI Speech now offers fast transcription, which transcribes audio faster than the actual audio duration. The Azure AI Speech Toolkit extension is available for Visual Studio Code and includes speech quick-starts and scenario samples that can be built and run with a few clicks. High definition (HD) voices are in public preview; they understand the content, detect emotion in the input text, and adjust the speaking tone in real time. Video translation is now available in the Azure AI Speech service. The service also supports OpenAI text to speech voices, which widens the range of voice options. Finally, the custom voice API can be used to create and manage custom neural voice models.
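To make "faster than the actual audio duration" concrete, one common measure is the real-time factor: processing time divided by audio duration, where a value below 1 means the transcription finished faster than the audio would take to play. The sketch below only illustrates that arithmetic. The function names and the sample numbers are hypothetical, and it does not call the Speech service.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    if processing_seconds < 0:
        raise ValueError("processing_seconds must be non-negative")
    return processing_seconds / audio_seconds


def is_faster_than_real_time(processing_seconds: float, audio_seconds: float) -> bool:
    """True when transcription took less time than the audio lasts."""
    return real_time_factor(processing_seconds, audio_seconds) < 1.0


if __name__ == "__main__":
    # Hypothetical numbers: a 10-minute recording transcribed in 30 seconds.
    rtf = real_time_factor(processing_seconds=30.0, audio_seconds=600.0)
    print(f"real-time factor: {rtf:.2f}")
    print("faster than real time:", is_faster_than_real_time(30.0, 600.0))
```

Measuring this on your own recordings is a simple way to check whether fast transcription gives a meaningful speed-up for your workload.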
Release notes
Speech SDK 1.42.0, released in December 2024, brought several notable enhancements. In Java, diagnostics logging APIs were added through the FileLogger, MemoryLogger, EventLogger, and SpxTrace classes. Support was also added for sending the JSON property 'details' of meeting participants to the service. In Go, a new public property ID, SpeechServiceConnection_ProxyHostBypass, specifies hosts for which the proxy is not used. In JavaScript and Go, the public property IDs Speech_SegmentationStrategy and Speech_SegmentationMaximumTimeMs were added; they control how the end of a spoken phrase is determined and when the final recognized result is generated. Bug fixes addressed embedded TTS voice reloading, offset calculation, and deadlocks, and aligned end-of-speech behavior across SDK languages. A Go compilation error related to meeting transcription was also fixed, and logging was improved.
Samples
The October 2024 release of the Speech SDK added support for Amazon Linux 2023 and Azure Linux 3.0. Notable bug fixes include completing support for keyword recognition Advanced models produced after August 2024, resolving memory leaks in C#, and correcting issues with string usage. New samples demonstrate real-time text to speech avatars on Android and iOS and show how to integrate these avatars into mobile applications. Swift projects on iOS must use MicrosoftCognitiveServicesSpeech-EmbeddedXCFramework-1.41.1.zip or the MicrosoftCognitiveServicesSpeechEmbedded-iOS pod to include Advanced model support. Further fixes across languages and platforms address other known issues.