Options and features vary from one service to another. Nevertheless, we try to keep the parameters as uniform as possible across services.

To help you find what you need among these parameters, use the following table. Required parameters are shown in blue.

Note that some parameters might not work with certain languages. The details can be found on the website of the service you use.

| Parameter | Type | Description | Amazon | Google | Microsoft | Rev.ai | Speechmatics |
|---|---|---|---|---|---|---|---|
| alternative_language_codes | array(string) | Other languages that might be spoken in the audio. Gives the service a hint for automatically determining the languages spoken in your audio. | Yes | No | No | No | No |
| attach_punctuation | boolean | Merges each punctuation item with the previous word (or with the next word if there is no previous one). | Yes | Yes | Yes | Yes | Yes |
| audio_channel_count | integer | The number of channels in the audio data, which might include one channel for each speaker present on the recording. | Yes | No | See channels | Yes | No |
| channels | array(integer) | The audio data might include a channel for each speaker present on the recording. This parameter lists the requested channel numbers. By default, channels 0 and 1 are considered. | See audio_channel_count | No | Yes | See audio_channel_count | No |
| content_redaction | string, enum: redacted, redacted_and_unredacted | Enables automatic content redaction. Accepts redacted and redacted_and_unredacted. Leave this argument empty if you don't want content redaction. The only redaction type available is PII. | Yes | No | No | No | No |
| diarization_speaker_count | integer ≥ 2 | If speaker diarization is enabled, you can provide the number of speakers to improve the diarization. | Yes | No | No | No | No |
| enable_automatic_punctuation | boolean | Enables automatic punctuation. | No | Yes | See punctuation_mode | Yes | No |
| enable_disfluencies | boolean | Adds disfluencies (such as "ums" and "uhs") to the result. Support is guaranteed for English only, but it may work with other languages. | No | No | Yes | Yes | No |
| enable_speaker_diarization | boolean | Identifies and separates the different speakers. | Yes | No | Yes | Yes | Yes |
| enable_word_time_offsets | boolean | Provides start and end timestamps for each word. | No | Yes | Yes | No | No |
| language_code | string | The language of the supplied audio as a BCP-47 language tag (example: en-US). You can find a list of these tags on this website (https://www.techonthenet.com/js/language_tags.php). | Yes | Yes | Yes | Yes, enum: "es", "pt", "fr", "de", "en" | Yes |
| max_alternatives | integer | The number of alternative transcriptions to provide in the response. By default, no alternatives are provided. | Yes | Yes | No | No | No |
| media_format | string, enum: mp3, mp4, wav, flac, ogg, amr, webm | The format of your audio file. Each service restricts the formats it allows. | Yes | No | No | No | No |
| phrases | array(string) | A list of words and phrases that provide hints for the speech recognition. | No | Yes | No | Yes | Yes |
| profanity_filter | boolean | Indicates whether to filter out profane words or phrases. | No | Yes | See profanity_filter_mode | Yes | No |
| profanity_filter_mode | string, enum: None, Masked, Removed, Tags | Indicates how to filter out profane words or phrases. Can be None (deactivate filtering), Masked (replace the word with asterisks), Removed (remove the word) or Tags (put tags around the word). | No | See profanity_filter | Yes | See profanity_filter | No |
| punctuation_mode | string, enum: None, Dictated, Automatic, DictatedAndAutomatic | Indicates the punctuation mode to use. Can be None (deactivate punctuation), Dictated (use explicitly dictated punctuation), Automatic (let the decoder handle punctuation) or DictatedAndAutomatic (use both dictated and automatic punctuation). | No | See enable_automatic_punctuation | Yes | See enable_automatic_punctuation | No |
| sample_rate_hertz | integer | The sample rate (in hertz) of the supplied audio. | Yes | Yes | No | No | No |
| use_enhanced_model | boolean | Enhanced models might give better results but have a higher cost. | No | Yes | No | No | Yes |
| vocabulary | object { phrases } | Contains additional contextual information for processing this audio. This argument contains phrases, a list of words and phrases that provide hints for the speech recognition. | No | Yes | No | Yes | Yes |
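
The support information above can also be checked in code before sending a request. The following is only a minimal sketch and not part of any service's API: it copies a handful of rows from the table into a Python dictionary, and the helper name `check_parameters`, the lowercase service keys, and the `see:` prefix are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Column order used by every tuple below (hypothetical lowercase keys).
SERVICES = ("amazon", "google", "microsoft", "rev.ai", "speechmatics")

# A subset of the table, one tuple per parameter in SERVICES order.
# "see:<name>" means the table points to an equivalent parameter instead.
SUPPORT: Dict[str, Tuple[str, ...]] = {
    "attach_punctuation": ("yes", "yes", "yes", "yes", "yes"),
    "enable_automatic_punctuation": (
        "no", "yes", "see:punctuation_mode", "yes", "no"),
    "punctuation_mode": (
        "no", "see:enable_automatic_punctuation", "yes",
        "see:enable_automatic_punctuation", "no"),
    "profanity_filter": (
        "no", "yes", "see:profanity_filter_mode", "yes", "no"),
    "profanity_filter_mode": (
        "no", "see:profanity_filter", "yes", "see:profanity_filter", "no"),
    "enable_word_time_offsets": ("no", "yes", "yes", "no", "no"),
    "max_alternatives": ("yes", "yes", "no", "no", "no"),
}


def check_parameters(service: str, params: Dict[str, object]) -> List[str]:
    """Return one warning per parameter the chosen service will not accept."""
    column = SERVICES.index(service)  # raises ValueError for an unknown service
    warnings: List[str] = []
    for name in params:
        if name not in SUPPORT:
            warnings.append(f"{name}: not in the transcribed subset of the table")
            continue
        status = SUPPORT[name][column]
        if status == "no":
            warnings.append(f"{name}: not supported by {service}")
        elif status.startswith("see:"):
            warnings.append(f"{name}: use {status[4:]} instead for {service}")
    return warnings


if __name__ == "__main__":
    requested = {"enable_automatic_punctuation": True, "max_alternatives": 2}
    for line in check_parameters("microsoft", requested):
        print(line)
```

Run as a script, this reports that Microsoft expects punctuation_mode rather than enable_automatic_punctuation and that max_alternatives is not supported there. An empty list means every requested parameter is marked "Yes" for that service in the transcribed rows.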