Amazon Polly

Generate human-sounding voices in over 25 languages for speech-enabled applications like RSS feeds, websites, and videos with real-time streaming capabilities and custom voice options.

About Amazon Polly

Introduction

Amazon Polly is a cloud platform that uses artificial intelligence (AI) to generate natural-sounding human voices in over 25 languages. It gives users an easy way to access and customize speech output through Speech Synthesis Markup Language (SSML) tags and pronunciation lexicons. One feature that makes Amazon Polly a strong choice for businesses and developers is its consistently fast response time: high-quality speech comes back in seconds. You can use it to add speech to widely used applications such as RSS feeds, websites, or videos without degrading the user experience or speed. Developers building interactive voice response systems will also find it useful, because the synthesized speech can be stored and replayed. With the AWS Free Tier, the platform offers 5 million characters free per month for the first 12 months, so you can experiment without worrying about cost.

TLDR

Amazon Polly is an AI platform for creating human-sounding voices in over 25 languages, with SSML tags and lexicons for fine-tuning how natural the speech sounds. Businesses and developers benefit from consistently fast response times when deploying speech-enabled applications such as RSS feeds, websites, and videos. Real-time streaming makes it well suited to applications that need immediate, dynamic content, such as weather updates and breaking news. Custom voices let organizations personalize their applications and stand out from competitors. Amazon Polly can be used from every programming language supported by the AWS SDK or the AWS Mobile SDK, and it integrates easily into existing AWS workflows.

Company Overview

Amazon Polly is a powerful artificial intelligence (AI) tool that generates high-quality, natural-sounding human voices in dozens of languages, making it a popular choice for businesses and developers worldwide. Users can customize speech output with lexicons and Speech Synthesis Markup Language (SSML) tags, and can store and redistribute the resulting speech in standard formats such as MP3 and OGG.

One of Amazon Polly's standout features is its consistently fast response times, delivering lifelike voices and conversational user experiences in just seconds. As a result, businesses can add speech to applications with a global audience, such as RSS feeds, websites, or videos, without sacrificing user experience or speed. Furthermore, Amazon Polly's speech output can be easily stored and replayed, making it an excellent choice for interactive or automated voice response systems.

Amazon Polly continues to improve its speech quality. It supports Speech Synthesis Markup Language (SSML), a W3C-standard, XML-based markup language for speech synthesis applications, including common SSML tags for phrasing, emphasis, and intonation. These tags allow more nuanced, human-like speech across a wide range of use cases. The platform's documentation and resources also cover custom lexicons, speech synthesis, the newscaster speaking style, and other features.

Finally, Amazon Polly provides users with 5 million characters free per month for 12 months with the AWS Free Tier, allowing users to test and refine their implementations without worrying about cost. No wonder Amazon Polly is such a popular choice for businesses and developers alike!

Features

Speech Synthesis

Lifelike Voices and Multilingual Support

Amazon Polly offers dozens of Standard and Neural Text-to-Speech (NTTS) voices, including Nicole, Olivia, Vitória, Camila, Amy, Emma, Brian, Arthur, Léa, Céline, Mathieu, Rémi, Raveena, Aditi, Kajal, Joanna, Salli, Kendra, Kimberly, Ivy, Ruth, Matthew, Justin, Joey, Stephen, Penélope, and Lupe. These voices cover numerous languages and dialects, so you can pick the voice that best suits your speech-enabled application and its audience, wherever they are. NTTS voices produce more natural, human-like speech than Standard voices.
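To find out which voices can use the neural engine for a given language, an application can filter the output of Polly's `DescribeVoices` operation. The minimal Python sketch below assumes a response with the shape that boto3's `describe_voices()` returns (a `Voices` list whose entries carry `Id`, `LanguageCode`, and `SupportedEngines`). The sample data is hand-written so no AWS call is made, and the voice `StandardOnly` is a hypothetical placeholder.

```python
from typing import Any


def neural_voices(describe_voices_response: dict[str, Any], language_code: str) -> list[str]:
    """Return the sorted Ids of voices for language_code that support the neural engine.

    Assumes the response shape of boto3's polly.describe_voices():
    {"Voices": [{"Id": ..., "LanguageCode": ..., "SupportedEngines": [...]}, ...]}
    """
    return sorted(
        voice["Id"]
        for voice in describe_voices_response.get("Voices", [])
        if voice.get("LanguageCode") == language_code
        and "neural" in voice.get("SupportedEngines", [])
    )


# Hand-written sample in the assumed shape; "StandardOnly" is a hypothetical voice.
# A real response would come from boto3.client("polly").describe_voices().
SAMPLE_VOICES = {
    "Voices": [
        {"Id": "Joanna", "LanguageCode": "en-US", "SupportedEngines": ["neural", "standard"]},
        {"Id": "Ivy", "LanguageCode": "en-US", "SupportedEngines": ["neural", "standard"]},
        {"Id": "Amy", "LanguageCode": "en-GB", "SupportedEngines": ["neural", "standard"]},
        {"Id": "StandardOnly", "LanguageCode": "en-US", "SupportedEngines": ["standard"]},
    ]
}

print(neural_voices(SAMPLE_VOICES, "en-US"))  # ['Ivy', 'Joanna']
```

Keeping the filter separate from the AWS call makes it easy to cache the voice list once and query it locally.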

Metadata Stream

Amazon Polly can return a metadata stream, which it calls speech marks, that indicates when specific sentences, words, and sounds are pronounced. Because this metadata is synchronized with the synthesized audio stream, developers can build richer visual experiences such as speech-synchronized facial animation or karaoke-style word highlighting, making their applications more immersive.
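Speech marks are requested by calling `synthesize_speech` with `OutputFormat="json"` and, for example, `SpeechMarkTypes=["word"]`. The result is newline-delimited JSON in which each mark has a `time` in milliseconds, a `type`, byte offsets `start` and `end` into the input text, and a `value`. The sketch below shows how an application might drive karaoke-style highlighting from such marks. The timings in `SAMPLE_MARKS` are invented for illustration rather than real Polly output, and the helper names are hypothetical.

```python
import json
from bisect import bisect_right
from typing import Optional

# Hand-written, illustrative speech marks (invented timings) in the format Polly
# returns for OutputFormat="json", SpeechMarkTypes=["sentence", "word"].
TEXT = "Mary had a little lamb."
SAMPLE_MARKS = "\n".join([
    '{"time":0,"type":"sentence","start":0,"end":23,"value":"Mary had a little lamb."}',
    '{"time":6,"type":"word","start":0,"end":4,"value":"Mary"}',
    '{"time":373,"type":"word","start":5,"end":8,"value":"had"}',
    '{"time":604,"type":"word","start":9,"end":10,"value":"a"}',
    '{"time":643,"type":"word","start":11,"end":17,"value":"little"}',
    '{"time":882,"type":"word","start":18,"end":22,"value":"lamb"}',
])


def parse_word_marks(raw: str) -> list[dict]:
    """Parse newline-delimited speech-mark JSON, keeping "word" marks sorted by time (ms)."""
    marks = [json.loads(line) for line in raw.splitlines() if line.strip()]
    return sorted((m for m in marks if m["type"] == "word"), key=lambda m: m["time"])


def word_at(words: list[dict], playback_ms: int) -> Optional[dict]:
    """Return the word mark active at playback_ms (the last one whose time <= playback_ms)."""
    i = bisect_right([m["time"] for m in words], playback_ms) - 1
    return words[i] if i >= 0 else None


def highlight(text: str, mark: dict) -> str:
    """Bracket the marked word; start/end are byte offsets into the UTF-8 input text."""
    data = text.encode("utf-8")
    before = data[:mark["start"]].decode("utf-8")
    word = data[mark["start"]:mark["end"]].decode("utf-8")
    after = data[mark["end"]:].decode("utf-8")
    return f"{before}[{word}]{after}"


words = parse_word_marks(SAMPLE_MARKS)
print(highlight(TEXT, word_at(words, 700)))  # Mary had a [little] lamb.
```

Slicing by bytes rather than characters matters for non-ASCII text, where the two offsets diverge.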

Real-Time Streaming

Amazon Polly provides real-time text-to-speech synthesis, returning the audio stream directly to your application in formats including MP3, Ogg Vorbis, and raw PCM. This lets developers deliver information to users in near-real time, which suits dynamic content such as weather updates, breaking news, and other situational information. You can also choose among several sampling rates to balance bandwidth against audio quality.
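Because the audio arrives as a stream, an application can process it chunk by chunk instead of waiting for the whole response. The sketch below assumes a boto3 response, where `response["AudioStream"]` is a botocore `StreamingBody` with a `read(amt)` method. Here an `io.BytesIO` stands in for it so the example runs offline. The sketch also shows the bandwidth arithmetic for raw PCM, which Amazon Polly returns as signed 16-bit mono samples: at a 16,000 Hz sampling rate that is 16,000 × 16 bits = 256 kbps, and half that at 8,000 Hz. The helper names are hypothetical.

```python
import io
from typing import BinaryIO, Iterator

# Amazon Polly's PCM output is signed 16-bit (2-byte), mono, little-endian.
PCM_BYTES_PER_SAMPLE = 2


def iter_audio_chunks(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield audio in chunks as it is read, so playback or forwarding can start early."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def pcm_bitrate_kbps(sample_rate_hz: int) -> float:
    """Bandwidth of raw mono 16-bit PCM at the given sampling rate."""
    return sample_rate_hz * PCM_BYTES_PER_SAMPLE * 8 / 1000


def pcm_duration_seconds(num_bytes: int, sample_rate_hz: int) -> float:
    """Playback length of num_bytes of raw mono 16-bit PCM."""
    return num_bytes / (sample_rate_hz * PCM_BYTES_PER_SAMPLE)


# Offline stand-in for response["AudioStream"] from
# polly.synthesize_speech(..., OutputFormat="pcm", SampleRate="16000").
fake_stream = io.BytesIO(bytes(10_000))
sizes = [len(chunk) for chunk in iter_audio_chunks(fake_stream)]
print(sizes)                                # [4096, 4096, 1808]
print(pcm_bitrate_kbps(16000))              # 256.0
print(pcm_duration_seconds(10_000, 16000))  # 0.3125
```

Compressed formats such as MP3 need far less bandwidth than raw PCM, at the cost of decoding on the client.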

Speech Quality and Style Options

SSML Support

Amazon Polly supports Speech Synthesis Markup Language (SSML), an XML-based markup language for speech synthesis. SSML offers support for various tags to control intonation, tone, and phrasing that create lifelike speech. Amazon Polly extends the capabilities of SSML through custom SSML tags that enable developers to access unique options. For example, developers can use custom SSML tags to make certain voices sound like a Newscaster when reading news articles. This flexibility helps developers create more engaging and lifelike speech, keeping audiences entertained and focused on the message they want to convey.
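As a small illustration of the standard tags, the sketch below wraps plain sentences in a `<speak>` root, applies a `<prosody>` speaking rate, and inserts a `<break>` pause between sentences, escaping characters such as `&` that would otherwise break the XML. The function name and its defaults are assumptions. The resulting string would be sent with `TextType="ssml"` in a `synthesize_speech` call.

```python
from xml.sax.saxutils import escape


def build_ssml(sentences: list[str], pause_ms: int = 400, rate: str = "medium") -> str:
    """Join plain-text sentences into one SSML document.

    Each sentence is XML-escaped and separated by a <break> of pause_ms milliseconds;
    the whole passage is spoken at the given <prosody> rate
    (e.g. "x-slow", "slow", "medium", "fast", "x-fast", or a percentage such as "90%").
    """
    pause = f'<break time="{pause_ms}ms"/>'
    body = pause.join(escape(s) for s in sentences)
    return f'<speak><prosody rate="{rate}">{body}</prosody></speak>'


ssml = build_ssml(["Hello & welcome.", "Today's forecast is sunny."], pause_ms=300, rate="slow")
print(ssml)
# <speak><prosody rate="slow">Hello &amp; welcome.<break time="300ms"/>Today's forecast is sunny.</prosody></speak>
```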

Newscaster Style

Amazon Polly gives developers the ability to synthesize speech that sounds like it’s spoken by a TV or radio Newscaster. This is useful when reading news articles or delivering flash briefing updates. The Newscaster style is currently supported by the US English (Matthew and Joanna) voices, the British English (Amy) voice, and the US Spanish (Lupe) voice using Neural Text-to-Speech technology. Audio samples in US English, British English, and US Spanish are available for listening.
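The Newscaster style is requested through Amazon Polly's `<amazon:domain name="news">` SSML tag together with the neural engine. The sketch below builds keyword arguments for a boto3 `synthesize_speech` call. Its voice allow-list mirrors only the voices named above and may lag behind what AWS currently supports, and `newscaster_request` is a hypothetical helper.

```python
from xml.sax.saxutils import escape

# Voices named in this section as supporting the Newscaster style (neural engine).
# AWS may add more; verify against the current Amazon Polly documentation.
NEWSCASTER_VOICES = frozenset({"Matthew", "Joanna", "Amy", "Lupe"})


def newscaster_request(text: str, voice_id: str, output_format: str = "mp3") -> dict:
    """Build keyword arguments for polly.synthesize_speech() in the Newscaster style."""
    if voice_id not in NEWSCASTER_VOICES:
        raise ValueError(f"{voice_id!r} is not listed as a Newscaster voice")
    ssml = f'<speak><amazon:domain name="news">{escape(text)}</amazon:domain></speak>'
    return {
        "Text": ssml,
        "TextType": "ssml",
        "VoiceId": voice_id,
        "Engine": "neural",  # the Newscaster style uses the neural engine
        "OutputFormat": output_format,
    }


# With boto3 and AWS credentials (not executed here):
#   boto3.client("polly").synthesize_speech(**newscaster_request("Markets rallied today.", "Joanna"))
```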

Time-Driven Prosody

Time-Driven Prosody lets developers set a maximum amount of time for a piece of speech, and Amazon Polly automatically adjusts the speaking rate so the audio fits. This is especially useful for localization. For instance, if a video is localized from US English to German, with the script translated by Amazon Translate, Amazon Polly can ensure that the German voice-over is no longer than the original US English audio, which simplifies dubbing.
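In SSML, Time-Driven Prosody is expressed with the `amazon:max-duration` attribute on a `<prosody>` tag. The sketch below wraps a translated segment so its speech is capped at the length of the source clip. The helper name and the German sample line are hypothetical, and which engines and duration units are accepted should be checked against the Amazon Polly SSML reference.

```python
from xml.sax.saxutils import escape


def fit_to_duration(translated_text: str, max_duration_ms: int) -> str:
    """Wrap text in SSML that caps the spoken duration at max_duration_ms milliseconds."""
    if max_duration_ms <= 0:
        raise ValueError("max_duration_ms must be positive")
    return (
        f'<speak><prosody amazon:max-duration="{max_duration_ms}ms">'
        f"{escape(translated_text)}</prosody></speak>"
    )


# Hypothetical example: the English source line lasted 1.5 s,
# so the German translation is capped at 1500 ms.
print(fit_to_duration("Guten Morgen, alle zusammen.", 1500))
# <speak><prosody amazon:max-duration="1500ms">Guten Morgen, alle zusammen.</prosody></speak>
```

In a dubbing pipeline, the duration would typically come from the timestamps of the original subtitle or audio segment.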

Easy Integration and Customization

Programming Language and AWS Integration Support

Amazon Polly supports all programming languages covered by the AWS SDK (Java, Node.js, .NET, PHP, Python, Ruby, Go, and C++) and the AWS Mobile SDK (iOS/Android), and it also offers an HTTP API for users who want to build their own access layer. Through the AWS Management Console, the AWS Command Line Interface (CLI), and account-level control over Amazon Polly's capabilities, developers can integrate Amazon Polly seamlessly into their existing AWS workflows.
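The core SDK call looks much the same in every language. The Python sketch below assumes the boto3 SDK, whose Polly client exposes `synthesize_speech(Text=..., VoiceId=..., Engine=..., OutputFormat=...)` and returns the audio under the `AudioStream` key. The client is passed in as a parameter, so the function can be exercised with a stand-in object and without AWS credentials. `synthesize_to` is a hypothetical helper name.

```python
from typing import Any, BinaryIO


def synthesize_to(client: Any, text: str, voice_id: str, out: BinaryIO,
                  engine: str = "neural", output_format: str = "mp3") -> int:
    """Synthesize `text` with Amazon Polly and write the audio bytes to `out`.

    `client` is assumed to behave like boto3.client("polly").
    Returns the number of audio bytes written.
    """
    response = client.synthesize_speech(
        Text=text,
        VoiceId=voice_id,
        Engine=engine,
        OutputFormat=output_format,
    )
    stream = response["AudioStream"]
    try:
        audio = stream.read()
    finally:
        stream.close()  # release the underlying HTTP connection
    out.write(audio)
    return len(audio)


# Example usage with real credentials (not executed here):
#   import boto3
#   with open("hello.mp3", "wb") as f:
#       synthesize_to(boto3.client("polly"), "Hello from Amazon Polly.", "Joanna", f)
```

Injecting the client keeps AWS configuration (region, credentials, retries) in one place and out of the synthesis logic.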

Custom Lexicons/Vocabularies

Amazon Polly's Custom Lexicons let users change how specific words are pronounced, such as acronyms, company names, unfamiliar foreign words, and neologisms. To do so, users upload XML files of lexical entries in the W3C Pronunciation Lexicon Specification (PLS) format. The customization can be as granular as defining the phonemes for a single name, such as Nguyen, for a particular use case. This gives developers more control over pronunciation and makes the speech end users hear more accurate.
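A lexicon is an XML document in the PLS format. The sketch below generates one with an `<alias>` entry (speak an acronym as its expansion) and a `<phoneme>` entry written in IPA. The IPA string for Nguyen is an illustrative anglicized pronunciation, not a recommendation, and the helper names are hypothetical. The name check reflects the `[0-9A-Za-z]{1,20}` pattern that the `PutLexicon` API documents for lexicon names. With boto3, the result would be uploaded with `put_lexicon(Name=..., Content=...)` and applied by passing `LexiconNames=[...]` to `synthesize_speech`.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Lexicon names accepted by PutLexicon: 1 to 20 ASCII letters or digits.
LEXICON_NAME = re.compile(r"[0-9A-Za-z]{1,20}")

PLS_HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<lexicon version="1.0"\n'
    '    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"\n'
    '    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n'
    '    xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon\n'
    '        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"\n'
    '    alphabet="ipa" xml:lang={lang}>\n'
)


def build_lexicon(aliases: dict[str, str], phonemes: dict[str, str], lang: str = "en-US") -> str:
    """Return a PLS lexicon with <alias> entries and IPA <phoneme> entries."""
    lexemes = []
    for grapheme, alias in aliases.items():
        lexemes.append(
            f"  <lexeme><grapheme>{escape(grapheme)}</grapheme>"
            f"<alias>{escape(alias)}</alias></lexeme>\n"
        )
    for grapheme, ipa in phonemes.items():
        lexemes.append(
            f"  <lexeme><grapheme>{escape(grapheme)}</grapheme>"
            f"<phoneme>{escape(ipa)}</phoneme></lexeme>\n"
        )
    # quoteattr() returns the value already wrapped in quotes, e.g. "en-US".
    return PLS_HEADER.format(lang=quoteattr(lang)) + "".join(lexemes) + "</lexicon>\n"


def is_valid_lexicon_name(name: str) -> bool:
    return LEXICON_NAME.fullmatch(name) is not None


lexicon_xml = build_lexicon(
    aliases={"W3C": "World Wide Web Consortium"},
    phonemes={"Nguyen": "wɪn"},  # illustrative IPA only
)
print(lexicon_xml)
```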

Brand Voices

Brand Voices is a custom engagement in which an organization works with the Amazon Polly team to create a Neural Text-to-Speech (NTTS) voice of its own. The team guides the organization through the entire process, from defining the persona to selecting a voice actor and recording speech in partner studios. After processing and model training, AWS makes the unique voice available within the organization's AWS account(s). This lets organizations personalize their products and applications and differentiate themselves from competitors.

Amazon Polly Alternatives


Exceptional speech synthesis powered by Microsoft's advanced technology, offering a variety of voices and styles.

Resemble generates synthetic videos and voices for movies, TV shows, video games, and other creative content, providing new forms of expression with consent and transparency.

SpeechGen is an online tool that converts text into lifelike speech in multiple languages and voices, with customizable parameters and storage options for various user needs.

Synthesys X is an all-in-one, user-friendly platform for creating high-quality, personalized branded videos using cutting-edge artificial intelligence technology.