Options and features vary from one service to another. Nevertheless, we try to standardize the parameters across services as much as possible.
Use the following table to find the parameters you need. The required parameters are blue.
Note that some parameters might not work with certain languages. For details, see the website of the service you use.
Parameter | Type | Description | Amazon | Microsoft | Rev.ai | Speechmatics | |
---|---|---|---|---|---|---|---|
alternative_language_codes | array(string) | Other languages that might be spoken in the audio. These codes give the service a hint for automatically detecting the spoken languages in your audio. | Yes | No | No | No | No |
attach_punctuation | boolean | Merges each punctuation item with the previous word (or with the next word if there is no previous one). | Yes | Yes | Yes | Yes | Yes |
audio_channel_count | integer | The number of channels in the audio data, which might include one channel for each speaker on the recording. | Yes | No | See channels | Yes | No |
channels | array(integer) | The audio data might include one channel for each speaker on the recording. This parameter lists the requested channel numbers. By default, channels 0 and 1 are considered. | See audio_channel_count | No | Yes | See audio_channel_count | No |
content_redaction | string enum: redacted, redacted_and_unredacted | Enables automatic content redaction. Accepts redacted and redacted_and_unredacted. Leave this argument empty if you don't want content redaction. The only redaction type available is PII. | Yes | No | No | No | No |
diarization_speaker_count | integer ≥ 2 | If speaker diarization is enabled, you can provide the number of speakers to improve the diarization. | Yes | No | No | No | No |
enable_automatic_punctuation | boolean | Enables automatic punctuation. | No | Yes | See punctuation_mode | Yes | No |
enable_disfluencies | boolean | Adds disfluencies (such as "ums" and "uhs") to the result. Support is guaranteed for English only, but it may work with other languages. | No | No | Yes | Yes | No |
enable_speaker_diarization | boolean | Allows you to identify and separate the different speakers. | Yes | No | Yes | Yes | Yes |
enable_word_time_offsets | boolean | Provides start and end timestamps for each word. | No | Yes | Yes | No | No |
language_code | string | The language of the supplied audio as a BCP-47 language tag. (Example: en-US) You can find a list of these tags on this website (https://www.techonthenet.com/js/language_tags.php). | Yes | Yes | Yes | Yes enum: "es", "pt", "fr", "de", "en" | Yes |
max_alternatives | integer | The number of alternative transcriptions to provide in the response. By default, no alternatives are provided. | Yes | Yes | No | No | No |
media_format | string enum: mp3, mp4, wav, flac, ogg, amr, webm | The format of your audio file. Each service restricts the formats it accepts. | Yes | No | No | No | No |
phrases | array(string) | A list of words and phrases that provide hints for the speech recognition. | No | Yes | No | Yes | Yes |
profanity_filter | boolean | Indicates whether to filter out profane words or phrases. | No | Yes | See profanity_filter_mode | Yes | No |
profanity_filter_mode | string enum: None, Masked, Removed, Tags | Indicates how to handle profane words or phrases. Can be None (deactivate filtering), Masked (replace the word with asterisks), Removed (remove the word) or Tags (put tags around the word). | No | See profanity_filter | Yes | See profanity_filter | No |
punctuation_mode | string enum: None, Dictated, Automatic, DictatedAndAutomatic | Indicates the punctuation mode to use. Can be None (deactivate punctuation), Dictated (indicate an explicit punctuation), Automatic (allow decoder to process punctuation) or DictatedAndAutomatic (use dictated and automatic punctuation). | No | See enable_automatic_punctuation | Yes | See enable_automatic_punctuation | No |
sample_rate_hertz | integer | The sample rate (in Hertz) of the supplied audio. | Yes | Yes | No | No | No |
use_enhanced_model | boolean | Enhanced models might give better results but have a higher cost. | No | Yes | No | No | Yes |
vocabulary | object { phrases } | Contains additional contextual information for processing this audio. This argument contains phrases, a list of words and phrases that provide hints for the speech recognition. | No | Yes | No | Yes | Yes |
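
The types and enumerated values in the table can be checked on the client side before a request is sent. The following is a minimal sketch in Python, not part of any official client: the function name `validate_params` and the `TYPES` and `ENUMS` mappings are hypothetical, and the sketch only encodes the constraints listed in the Type column above. It does not check which service supports which parameter.

```python
from typing import Any

# Enumerated values copied from the Type column of the table above.
ENUMS = {
    "content_redaction": ("redacted", "redacted_and_unredacted"),
    "media_format": ("mp3", "mp4", "wav", "flac", "ogg", "amr", "webm"),
    "profanity_filter_mode": ("None", "Masked", "Removed", "Tags"),
    "punctuation_mode": ("None", "Dictated", "Automatic", "DictatedAndAutomatic"),
}

# Python types assumed to correspond to the remaining entries of the Type column.
TYPES = {
    "alternative_language_codes": list,
    "attach_punctuation": bool,
    "audio_channel_count": int,
    "channels": list,
    "diarization_speaker_count": int,
    "enable_automatic_punctuation": bool,
    "enable_disfluencies": bool,
    "enable_speaker_diarization": bool,
    "enable_word_time_offsets": bool,
    "language_code": str,
    "max_alternatives": int,
    "phrases": list,
    "profanity_filter": bool,
    "sample_rate_hertz": int,
    "use_enhanced_model": bool,
    "vocabulary": dict,
}


def validate_params(params: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means no problem was found."""
    errors = []
    for name, value in params.items():
        if name in ENUMS:
            if value not in ENUMS[name]:
                errors.append(f"{name}: {value!r} is not one of {list(ENUMS[name])}")
            continue
        expected = TYPES.get(name)
        if expected is None:
            errors.append(f"{name}: unknown parameter")
            continue
        # bool is a subclass of int in Python, so reject it explicitly for integers.
        is_bool_for_int = expected is int and isinstance(value, bool)
        if is_bool_for_int or not isinstance(value, expected):
            errors.append(f"{name}: expected {expected.__name__}, got {type(value).__name__}")
            continue
        # The table requires diarization_speaker_count to be at least 2.
        if name == "diarization_speaker_count" and value < 2:
            errors.append(f"{name}: must be at least 2")
    return errors


# Example: two problems are reported (speaker count below 2, enum value in the wrong case).
example = {
    "language_code": "en-US",
    "enable_speaker_diarization": True,
    "diarization_speaker_count": 1,
    "punctuation_mode": "automatic",
}
for problem in validate_params(example):
    print(problem)
```

Returning a list of messages instead of raising on the first problem lets a caller report every invalid parameter at once. Whether a given service accepts a parameter still has to be checked against the table.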