Description of the parameters

Options and features vary from one service to another. Nevertheless, we try to keep the parameters as consistent as possible across services.

To help you find what you need among these parameters, use the following table. Required parameters are shown in blue.

Note that some parameters might not work with certain languages. For details, see the website of the service you intend to use.

| Parameter | Type | Description | Amazon | Google | Microsoft | Rev.ai | Speechmatics | ElevenLabs |
|---|---|---|---|---|---|---|---|---|
| alternative_language_codes | array(string) | Other languages that might be spoken in the audio. These hints help the service automatically determine the spoken languages. | Yes | No | No | No | No | No |
| attach_punctuation | boolean | Merges each punctuation item with the previous word (or with the next word if there is no previous one). | Yes | Yes | Yes | Yes | Yes | Yes |
| audio_channel_count | integer | The number of channels in the audio data. The audio might include one channel for each speaker present on the recording. | Yes | No | See channels | Yes | No | No |
| channels | array(integer) | The channel numbers to process. The audio might include one channel for each speaker present on the recording. By default, channels 0 and 1 are used. | See audio_channel_count | No | Yes | See audio_channel_count | No | No |
| content_redaction | string, enum: redacted, redacted_and_unredacted | Enables automatic content redaction. Leave this parameter empty if you don't want content redaction. The only redaction type available is PII. | Yes | No | No | No | No | No |
| diarization_speaker_count | integer ≥ 2 | If speaker diarization is enabled, the number of speakers. Providing it can improve the diarization. | Yes | No | No | No | No | No |
| enable_automatic_punctuation | boolean | Enables automatic punctuation. | No | Yes | See punctuation_mode | Yes | No | No |
| enable_disfluencies | boolean | Adds disfluencies (such as "ums" and "uhs") to the result. Support is guaranteed for English only, but it may work with other languages. | No | No | Yes | Yes | No | No |
| enable_speaker_diarization | boolean | Identifies and separates the different speakers. | Yes | No | Yes | Yes | Yes | Yes |
| enable_word_time_offsets | boolean | Provides start and end timestamps for each word. | No | Yes | Yes | No | No | Yes |
| language_code | string | The language of the supplied audio as a BCP-47 language tag (example: en-US). A list of these tags is available at https://www.techonthenet.com/js/language_tags.php. | Yes | Yes | Yes | Yes, enum: "es", "pt", "fr", "de", "en" | Yes | Yes |
| max_alternatives | integer | The number of alternative transcriptions to provide in the response. By default, no alternatives are provided. | Yes | Yes | No | No | No | No |
| media_format | string, enum: mp3, mp4, wav, flac, ogg, amr, webm | The format of your audio file. Each service restricts the formats it accepts. | Yes | No | No | No | No | No |
| phrases | array(string) | A list of words and phrases that provide hints for the speech recognition. | No | Yes | No | Yes | Yes | No |
| profanity_filter | boolean | Indicates whether to filter out profane words or phrases. | No | Yes | See profanity_filter_mode | Yes | No | No |
| profanity_filter_mode | string, enum: None, Masked, Removed, Tags | Indicates how to filter profane words or phrases. Can be None (no filtering), Masked (replace the word with asterisks), Removed (remove the word) or Tags (put tags around the word). | No | See profanity_filter | Yes | See profanity_filter | No | No |
| punctuation_mode | string, enum: None, Dictated, Automatic, DictatedAndAutomatic | Indicates the punctuation mode to use. Can be None (no punctuation), Dictated (punctuation is spoken explicitly), Automatic (the decoder inserts punctuation) or DictatedAndAutomatic (both dictated and automatic punctuation). | No | See enable_automatic_punctuation | Yes | See enable_automatic_punctuation | No | No |
| sample_rate_hertz | integer | The sample rate (in Hertz) of the supplied audio. | Yes | Yes | No | No | No | No |
| use_enhanced_model | boolean | Enhanced models might give better results but have a higher cost. | No | Yes | No | No | Yes | No |
| vocabulary | object { phrases } | Contains additional contextual information for processing the audio. This object contains phrases, a list of words and phrases that provide hints for the speech recognition. | No | Yes | No | Yes | Yes | No |
| model | string | The name or ID of the provider's model to use for the request. | No | Yes | No | No | No | Yes |
| tag_audio_events | boolean | Transcribes certain audio events such as music or laughter. | No | No | No | No | No | Yes |
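
Since support differs per service, it can help to check a set of parameters against the table before sending a request. The following is a minimal sketch in Python, not part of any official client: the names `SERVICES`, `SUPPORT` and `check_parameters` are hypothetical, only a few rows of the table are transcribed, and a "See X" cell is interpreted here as "use parameter X instead for this service", which is an assumption.

```python
from typing import Any, Dict, List, Tuple, Union

# Column order follows the table above (service keys are illustrative names).
SERVICES = ("amazon", "google", "microsoft", "revai", "speechmatics", "elevenlabs")

# A few rows of the table. True = "Yes", False = "No",
# a string = "See <that parameter>" for this service.
Cell = Union[bool, str]
SUPPORT: Dict[str, Tuple[Cell, ...]] = {
    "enable_automatic_punctuation": (False, True, "punctuation_mode", True, False, False),
    "punctuation_mode": (False, "enable_automatic_punctuation", True,
                         "enable_automatic_punctuation", False, False),
    "profanity_filter": (False, True, "profanity_filter_mode", True, False, False),
    "profanity_filter_mode": (False, "profanity_filter", True, "profanity_filter", False, False),
    "enable_speaker_diarization": (True, False, True, True, True, True),
    "enable_word_time_offsets": (False, True, True, False, False, True),
    "tag_audio_events": (False, False, False, False, False, True),
}


def check_parameters(service: str, params: Dict[str, Any]) -> Dict[str, List[str]]:
    """Sort the parameter names in `params` by how `service` handles them."""
    column = SERVICES.index(service)  # raises ValueError for an unknown service
    report: Dict[str, List[str]] = {
        "accepted": [], "use_instead": [], "unsupported": [], "unknown": [],
    }
    for name in params:
        if name not in SUPPORT:
            report["unknown"].append(name)  # row not transcribed in this sketch
            continue
        cell = SUPPORT[name][column]
        if cell is True:
            report["accepted"].append(name)
        elif cell is False:
            report["unsupported"].append(name)
        else:
            report["use_instead"].append(f"{name} -> {cell}")
    return report


requested = {
    "enable_automatic_punctuation": True,
    "enable_word_time_offsets": True,
    "tag_audio_events": True,
}
print(check_parameters("microsoft", requested))
```

For Microsoft, this reports `enable_word_time_offsets` as accepted, points `enable_automatic_punctuation` to `punctuation_mode`, and flags `tag_audio_events` as unsupported. Keeping the matrix as plain data makes it easy to extend with the remaining rows of the table.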