What’s the Benefit of Accessing Raw Media?
In the post-pandemic era, organizations are increasingly focused on enhancing customer experiences through innovative technology. Customers will determine if they’ll do repeat business with your company based on their communications experiences. According to the latest Vonage Global Customer Engagement Report, 75% of customers won’t make a repeat purchase after a poor experience, while 58% will spread the word about good experiences to family and friends.
Increasingly, organizations are turning to artificial intelligence capabilities that promote greater customer engagement with features like live captions, transcriptions, translations, sentiment analysis, post-call summaries, and gaze and posture detection.
To embark on the AI journey, companies will first need to secure access to raw media, such as audio, video, or text. While obtaining this access can pose a challenge and require significant development efforts, it is crucial to unlock the full potential of AI analysis and processing.
Simplifying Access to Media
At Vonage, we streamline media capture. We grant customers access to raw media and provide them with the option to send it to their preferred services for analysis or processing. Here are the three most common types of media you may want to access.
Audio Access: Post-call audio access can enable a variety of features such as transcription, translation, sentiment analysis, summary, Electronic Health Records (EHR), and media intelligence. Real-time audio access enables live captions, live translation, live sentiment analysis, noise suppression, and echo cancellation.
Video Access: Post-call video access can enable video analysis for surveillance, video editing, education and training, and possible legal evidence. Real-time video access enables remote monitoring and live object/action detection.
Text Access: Access to text media can be used for Q&A or for sharing links or documents during a conference call. This information can be stored or processed in real time or after the call to build search and indexing and media intelligence.
Introducing Vonage Audio Connector
Vonage Audio Connector enables customers to natively extract raw audio streams from live video sessions and send them to speech recognition services, such as AWS Transcribe, Google Speech to Text, and Azure Speech to Text. This enables real-time and offline processing of audio streams, allowing customers to build captions, transcriptions, translations, search and indexing, content moderation, media intelligence, Electronic Health Records (EHR), sentiment analysis, and more.
What is Unique About the Audio Connector Native Audio Capture?
Vonage customers can already get access to raw audio from WebRTC clients. Here is a blog post explaining how to build Live Captions using Symbl.ai by capturing the audio from the client side. With Audio Connector, you can now capture audio from the media router itself.
What’s the advantage of accessing raw audio from the server side instead of the client side? We’re glad you asked! Client side audio capture needs development effort on every platform, whereas server side audio capture needs development effort only on the server. With client side capture, the audio stream must also be sent both to the CPaaS servers and to the audio processing service, which uses twice the bandwidth. Server side audio capture reduces the burden on the client by forwarding the audio from the server, so no extra audio bandwidth is used on the client side.
Client side solutions also can’t capture audio from SIP dial-in participants, but server side audio capture can collect that media.
| | Client Side Audio Capture | Server Side Audio Capture |
| --- | --- | --- |
| Development effort | Development effort needed for each device platform (SDK) | Native support for all devices |
| Audio traffic handling | Audio stream is created on the client side and is sent to the CPaaS servers AND the audio processing service; this results in an increased burden on the client | Audio stream is created on the client side and is sent to the CPaaS servers; servers forward to the audio processing service |
| Working with firewalls | Challenging; third-party service might be blocked on the client side | Third-party services are connected from the server side |
| Cost of ownership | Cost of the audio processing service | Cost of the audio processing service and server utilization |
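To make the bandwidth comparison concrete, here is a small back-of-the-envelope sketch. The 40 kbps audio bitrate is an assumed figure chosen only for illustration, and the function name is hypothetical; the only thing taken from the comparison above is that client side capture sends the stream twice while server side capture sends it once.

```python
def client_uplink_kbps(audio_kbps: float, client_side_capture: bool) -> float:
    """Return the uplink audio bandwidth a single client spends, in kbps."""
    # Client side capture uploads the stream twice: once to the CPaaS
    # servers and once to the audio processing service.
    # Server side capture uploads it once; the server forwards the copy.
    copies = 2 if client_side_capture else 1
    return audio_kbps * copies


AUDIO_KBPS = 40.0  # assumed bitrate, for illustration only

print(client_uplink_kbps(AUDIO_KBPS, client_side_capture=True))   # 80.0
print(client_uplink_kbps(AUDIO_KBPS, client_side_capture=False))  # 40.0
```

Whatever the real bitrate is, the ratio stays the same: the client uploads twice as much audio when it has to feed the processing service itself.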
Using Audio Connector
With Audio Connector, Vonage customers simply set a WebSocket URL for the video session and decide if they want to send a single audio stream per WebSocket or multiple audio streams per WebSocket.
The single stream allows the customer application to identify the speaker. This is important in healthcare conversations when it’s necessary to differentiate between doctor and patient audio. If identifying the speaker is not a concern, customers can send multiple streams per WebSocket connection.
Once you have Audio Connector, your application will be able to:
- Set a preferred WS(S) URL to send the audio streams
- Set single or multiple streams per WS
- Identify the speakers (with single stream only)
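The options above could be expressed as a request body along the following lines. This is a minimal sketch, not the official API: the field names (`sessionId`, `token`, `websocket.uri`, `websocket.streams`) and the helper `build_connect_payload` are assumptions made for illustration, so check them against the current Vonage Video API reference before relying on them.

```python
import json
from typing import Optional, Sequence
from urllib.parse import urlparse


def build_connect_payload(
    session_id: str,
    token: str,
    ws_uri: str,
    stream_ids: Optional[Sequence[str]] = None,
) -> dict:
    """Build a hypothetical Audio Connector request body.

    Field names are assumed for illustration, not confirmed by this article.
    """
    # Only WS(S) URLs make sense as a destination for the audio.
    if urlparse(ws_uri).scheme not in ("ws", "wss"):
        raise ValueError("ws_uri must start with ws:// or wss://")

    websocket = {"uri": ws_uri}
    # Listing exactly one stream gives one speaker per WebSocket.
    # Omitting the list is assumed to mean "all streams on one WebSocket".
    if stream_ids:
        websocket["streams"] = list(stream_ids)

    return {"sessionId": session_id, "token": token, "websocket": websocket}


# One WebSocket per speaker, so the receiver knows who is talking.
doctor = build_connect_payload(
    "SESSION", "TOKEN", "wss://example.com/doctor", ["doctor-stream-id"]
)
# One WebSocket carrying every stream, when speaker identity does not matter.
everyone = build_connect_payload("SESSION", "TOKEN", "wss://example.com/all")

print(json.dumps(doctor, indent=2))
```

In the single-stream case your application would build one such payload per participant, which is what makes it possible to tell doctor audio from patient audio.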
Once the stream is captured, your application can then connect to any third-party conversational AI provider. One of the major providers can fulfill most use cases, but smaller providers can also serve specialty use cases.
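On the receiving end, your WebSocket server needs to separate audio from control messages before handing anything to a provider. The sketch below assumes that audio arrives as binary frames of 16-bit, 16 kHz mono linear PCM and that text frames carry JSON events; both the audio format and the event name used in the example are assumptions to verify against the Audio Connector documentation. `AudioSink` is a hypothetical helper, and the transport layer (the WebSocket server itself) is left out so the logic stays self-contained.

```python
import json
from dataclasses import dataclass, field
from typing import Union

SAMPLE_RATE_HZ = 16_000  # assumed sample rate
BYTES_PER_SAMPLE = 2     # assumed 16-bit linear PCM, mono


@dataclass
class AudioSink:
    """Collect raw audio and control events from one WebSocket connection."""

    pcm: bytearray = field(default_factory=bytearray)
    events: list = field(default_factory=list)

    def on_frame(self, frame: Union[bytes, bytearray, str]) -> None:
        # Binary frames are treated as raw audio, text frames as JSON events.
        if isinstance(frame, (bytes, bytearray)):
            self.pcm.extend(frame)
        else:
            self.events.append(json.loads(frame))

    def seconds_buffered(self) -> float:
        """Return how much audio has been collected so far, in seconds."""
        return len(self.pcm) / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


sink = AudioSink()
sink.on_frame('{"event": "connected"}')  # hypothetical control message
sink.on_frame(bytes(640))                # 640 bytes of silence
print(sink.seconds_buffered(), sink.events)
```

From here, the buffered PCM can be forwarded in chunks to whichever speech service you choose, while the events list tells you when a stream starts or stops.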
Audio Connector Use Cases
Audio Connector has a wide variety of use cases. Here are some representative examples.
- Provide automated live captions in a video call
- Create a live/offline record of the conversations
- Electronic Health Records (EHR): Build EHRs based on doctor’s speech
- Live/offline translations for accessibility and comprehension
- Search & index: Save keywords for indexing and searching the content
- Control conversations for obscene / unacceptable content
- Extract important action points or summarize meetings
- Live/offline analysis of the reactions of the speakers
Get started with Vonage Audio Connector
Vonage Audio Connector enables raw audio access from live Vonage video sessions, unlocking numerous possibilities for enhanced customer experiences. With Audio Connector, you can level up your customer interactions while reducing developer workload.