General form
Each service may return different information for the same request, so the output of Hooho can vary from one service to another.
The general form of an output is:
[
{
"transcript": string,
"start_time": number,
"end_time": number,
"confidence": number,
"phrases": [
{
"content": string,
"type": string,
"start_time": number,
"end_time": number,
"confidence": number,
"speaker_tags": array[integer],
"items": [
{
"content": string,
"type": string,
"start_time": number,
"end_time": number,
"confidence": number,
"speaker_tag": integer
}
]
}
],
"wer_data": [
{
"reference": string,
"wer": number
}
],
"wer_mean": number
}
]
For example,
[
{
"transcript": "I use who oh. It makes things easier!",
"start_time": 0.0,
"end_time": 5.38,
"phrases": [
{
"content": "I use who oh.",
"type": "phrase",
"start_time": 0.0,
"end_time": 2.51,
"items": [
{
"content": "I",
"type": "word",
"start_time": 0,
"end_time": 0.39,
"confidence": 1
},
{
"word": "use",
"start_time": 0.39,
"end_time": 1.26,
"confidence": 1
},
{
"content": "who",
"type": "word",
"start_time": 1.26,
"end_time": 1.96,
"confidence": 1
},
{
"content": "oh",
"type": "word",
"start_time": 1.96,
"end_time": 2.51,
"confidence": 1
},
{
"content": ".",
"type": "punctuation",
"start_time": 2.51,
"confidence": 1
}
]
},
{
"content": "It makes things easier.",
"type": "phrase",
"start_time": 2.51,
"end_time": 5.38,
"items": [
{
"content": "It",
"type": "word",
"start_time": 2.51,
"end_time": 3.12,
"confidence": 1
},
{
"word": "makes",
"start_time": 3.12,
"end_time": 3.81,
"confidence": 1
},
{
"content": "things",
"type": "word",
"start_time": 3.81,
"end_time": 4.33,
"confidence": 1
},
{
"content": "easier",
"type": "word",
"start_time": 4.33,
"end_time": 5.38,
"confidence": 1
},
{
"content": "!",
"type": "punctuation",
"start_time": 5.38,
"confidence": 1
}
]
}
],
"wer_data": [
{
"reference": "I love Hooho. It makes things easier!",
"wer": 0.2857142857142857
}
],
"wer_mean": 0.2857142857142857
}
]
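As a rough illustration of how this structure can be consumed, the sketch below parses a trimmed copy of the example (first phrase only) and lists every item of type "word" with its timestamps. The helper name `iter_words` is hypothetical and not part of Hooho; fields are read with `.get()` because, depending on the service, some of them may be absent.

```python
import json

# Trimmed copy of the example output above (first phrase only).
raw = """
[
  {
    "transcript": "I use who oh.",
    "start_time": 0.0,
    "end_time": 2.51,
    "phrases": [
      {
        "content": "I use who oh.",
        "type": "phrase",
        "start_time": 0.0,
        "end_time": 2.51,
        "items": [
          {"content": "I", "type": "word", "start_time": 0, "end_time": 0.39, "confidence": 1},
          {"content": "use", "type": "word", "start_time": 0.39, "end_time": 1.26, "confidence": 1},
          {"content": "who", "type": "word", "start_time": 1.26, "end_time": 1.96, "confidence": 1},
          {"content": "oh", "type": "word", "start_time": 1.96, "end_time": 2.51, "confidence": 1},
          {"content": ".", "type": "punctuation", "start_time": 2.51, "confidence": 1}
        ]
      }
    ]
  }
]
"""


def iter_words(results):
    """Yield (content, start_time, end_time) for every item of type "word".

    Hypothetical helper: missing timestamps are returned as None.
    """
    for result in results:
        for phrase in result.get("phrases", []):
            for item in phrase.get("items", []):
                if item.get("type") == "word":
                    yield item["content"], item.get("start_time"), item.get("end_time")


results = json.loads(raw)
words = list(iter_words(results))
for content, start, end in words:
    print(start, end, content)
```

Punctuation items are skipped here on purpose; drop the `type` check to keep them.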
Parameter description
transcript string
The transcription of your audio file. Depending on the service, this transcript may or may not be complete. If it is not complete, it should be an alternative result.
For example,
"transcript": "I use who oh. It makes things easier!"
"transcript": "I use Hooho"
start_time number
The beginning of the transcript, in seconds. The timestamps of the transcript are determined from the start_time and end_time of each item.
For example,
"start_time": 0.37
end_time number
The end of the transcript, in seconds. The timestamps of the transcript are determined from the start_time and end_time of each item.
For example,
"end_time": 49.63
confidence number
A confidence score for the transcription, between 0 and 1. The higher the score, the more reliable the transcription.
For example,
"confidence": 0.96
"confidence": 0.12
phrases array[object]
A list of the recognized phrases in the transcript. Phrases are delimited by punctuation; without punctuation, the whole transcript will likely form a single phrase.
Each object of this array contains
content string: The content of the phrase.
"content": "I use Hooho."
"content": "I use Hooho it makes things easier"
type string: The type of this object ("phrase" in this case).
"type": "phrase"
start_time number: The beginning of the phrase, in seconds.
"start_time": 0.0
end_time number: The end of the phrase, in seconds.
"end_time": 2.51
confidence number: A confidence score for the phrase, between 0 and 1.
"confidence": 1
speaker_tags array[integer]: If you enabled speaker diarization, the list of speaker tags assigned to the phrase's items.
"speaker_tags": [0, 1]
items array[object]: The list of words in the phrase. The details are given below.
items array[object]
The list of words in a phrase. Words are delimited by spaces. Text without spaces results in a single long word, although this should not happen.
Each object of this array contains
content string: The content of the item.
type string: The type of this object ("word" or "punctuation").
"content": "use"
"type": "word"
"content": "."
"type": "punctuation"
start_time number: The beginning of the item, in seconds.
"start_time": 1.12
end_time number: The end of the item, in seconds.
"end_time": 2.33
confidence number: A confidence score for the item, between 0 and 1.
"confidence": 0
speaker_tag integer: If you enabled speaker diarization, an integer representing the speaker associated with the word.
"speaker_tag": 1
wer_data array[object]
Information about the WER (Word Error Rate) calculation. A WER is calculated between our result and each reference transcript.
Each object of this array contains
reference string: The content of the reference transcript.
wer number: The WER obtained with the transcript and the given reference, between 0 and 1. The lower it is, the closer the transcripts are.
"wer_data": [
{
"reference": "I use Hooho. It makes things easier!",
"wer": 0.2857142857142857
}
]
wer_mean number
The mean value of the WERs obtained with all the references.
For example,
"wer_mean": 0.2857142857142857
Occurrence conditions
Some fields must be enabled using the associated parameter in the "config" part of your request.
This table summarizes the possible output fields for each service and the parameter responsible for each field's appearance. (Note that this table was created for the en-US language; results might vary from one language to another.)
| Field | Amazon | Microsoft | Rev.ai | Speechmatics | |
|---|---|---|---|---|---|
| transcript | Always | Always | Always | Always | Always |
| start_time | Always | enable_word_time_offsets OR enable_speaker_diarization | enable_word_time_offsets OR enable_speaker_diarization | Always | Always |
| end_time | Always | enable_word_time_offsets OR enable_speaker_diarization | enable_word_time_offsets OR enable_speaker_diarization | Always | Always |
| confidence | Never | Always | Never | Never | Always |
| phrases | Always | enable_word_time_offsets OR enable_speaker_diarization | enable_word_time_offsets OR enable_speaker_diarization | Always | Always |
| phrases {content} | Always | enable_word_time_offsets OR enable_speaker_diarization | enable_word_time_offsets OR enable_speaker_diarization | Always | Always |
| phrases {type} | Always | enable_word_time_offsets OR enable_speaker_diarization | enable_word_time_offsets OR enable_speaker_diarization | Always | Always |
| phrases {start_time} | Always | enable_word_time_offsets | enable_word_time_offsets | Always | Always |
| phrases {end_time} | Always | enable_word_time_offsets | enable_word_time_offsets | Always | Always |
| phrases {confidence} | Never | Never | Never | Never | Never |
| phrases {speaker_tags} | enable_speaker_diarization | enable_speaker_diarization | enable_speaker_diarization | enable_speaker_diarization | enable_speaker_diarization |
| phrases {items} | Always | enable_word_time_offsets OR enable_speaker_diarization | enable_word_time_offsets OR enable_speaker_diarization | Always | Always |
| phrases {items {content}} | Always | enable_word_time_offsets OR enable_speaker_diarization | enable_word_time_offsets OR enable_speaker_diarization | Always | Always |
| phrases {items {start_time}} | Always | enable_word_time_offsets | enable_word_time_offsets | Always | Always |
| phrases {items {end_time}} | Always | enable_word_time_offsets | enable_word_time_offsets | Always | Always |
| phrases {items {confidence}} | Always | Never | enable_word_time_offsets | Always | Always |
| phrases {items {speaker_tag}} | enable_speaker_diarization | enable_speaker_diarization | enable_speaker_diarization | enable_speaker_diarization | enable_speaker_diarization |
| wer_data | get_wer + transcripts | get_wer + transcripts | get_wer + transcripts | get_wer + transcripts | get_wer + transcripts |
| wer_data {reference} | get_wer + transcripts | get_wer + transcripts | get_wer + transcripts | get_wer + transcripts | get_wer + transcripts |
| wer_data {wer} | get_wer + transcripts | get_wer + transcripts | get_wer + transcripts | get_wer + transcripts | get_wer + transcripts |
| wer_mean | get_wer + transcripts | get_wer + transcripts | get_wer + transcripts | get_wer + transcripts | get_wer + transcripts |
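
Because the set of returned fields depends on the service and on the request configuration, it can help to check which fields a given result actually contains. The sketch below lists them using the same path notation as the table. `present_fields` is a hypothetical helper for illustration; it inspects a result you already have and makes no request to Hooho.

```python
def present_fields(result):
    """Return the field paths found in one result, in the table's notation."""
    found = set()
    for key, value in result.items():
        found.add(key)
        if key in ("phrases", "wer_data"):
            for entry in value:
                for sub_key, sub_value in entry.items():
                    found.add(f"{key} {{{sub_key}}}")
                    if sub_key == "items":
                        for item in sub_value:
                            for item_key in item:
                                found.add(f"{key} {{items {{{item_key}}}}}")
    return found


result = {
    "transcript": "I use Hooho.",
    "phrases": [
        {
            "content": "I use Hooho.",
            "type": "phrase",
            "items": [
                {"content": "I", "type": "word", "confidence": 1},
            ],
        }
    ],
}
fields = present_fields(result)
for name in sorted(fields):
    print(name)
```

In this sample, "phrases {items {confidence}}" is reported while "confidence" and "wer_mean" are not, which is the kind of difference the table describes.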