Hooho output

General form
Parameter description
Occurrence conditions

General form

Each service might give different information for the same request. For this reason, the output of Hooho might vary from one service to another.

The general form of an output is:

[
	{
		"transcript": string,
		"start_time": number,
		"end_time": number,
		"confidence": number,
		"phrases": [
			{
				"content": string,
				"type": string,
				"start_time": number,
				"end_time": number,
				"confidence": number,
				"speaker_tags": array[integer],
				"items": [
					{
						"content": string,
						"type": string,
						"start_time": number,
						"end_time": number,
						"confidence": number,
						"speaker_tag": integer
					}
				]
			}
		],
		"wer_data": [
			{
				"reference": string,
				"wer": number
			}
		],
		"wer_mean": number
	}
]

For example,

[
  {
    "transcript": "I use who oh. It makes things easier!",
    "start_time": 0.0,
    "end_time": 5.38,
    "phrases": [
      {
        "content": "I use who oh.",
        "type": "phrase",
        "start_time": 0.0,
        "end_time": 2.51,
        "items": [
          {
            "content": "I",
            "type": "word",
            "start_time": 0,
            "end_time": 0.39,
            "confidence": 1
          },
          {
            "content": "use",
            "type": "word",
            "start_time": 0.39,
            "end_time": 1.26,
            "confidence": 1
          },
          {
            "content": "who",
            "type": "word",
            "start_time": 1.26,
            "end_time": 1.96,
            "confidence": 1
          },
          {
            "content": "oh",
            "type": "word",
            "start_time": 1.96,
            "end_time": 2.51,
            "confidence": 1
          },
          {
            "content": ".",
            "type": "punctuation",
            "start_time": 2.51,
            "confidence": 1
          }
        ]
      },
      {
        "content": "It makes things easier!",
        "type": "phrase",
        "start_time": 2.51,
        "end_time": 5.38,
        "items": [
          {
            "content": "It",
            "type": "word",
            "start_time": 2.51,
            "end_time": 3.12,
            "confidence": 1
          },
          {
            "content": "makes",
            "type": "word",
            "start_time": 3.12,
            "end_time": 3.81,
            "confidence": 1
          },
          {
            "content": "things",
            "type": "word",
            "start_time": 3.81,
            "end_time": 4.33,
            "confidence": 1
          },
          {
            "content": "easier",
            "type": "word",
            "start_time": 4.33,
            "end_time": 5.38,
            "confidence": 1
          },
          {
            "content": "!",
            "type": "punctuation",
            "start_time": 5.38,
            "confidence": 1
          }
        ]
      }
    ],
    "wer_data": [
      {
        "reference": "I use Hooho. It makes things easier!",
        "wer": 0.2857142857142857
      }
    ],
    "wer_mean": 0.2857142857142857
  }
]
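
Because services return different subsets of these fields, code that reads the output should not assume every key is present. The following is a minimal sketch, not an official client: it assumes the output has already been obtained as a JSON string, and the helper name `describe` is hypothetical.

```python
import json

# A trimmed copy of the example output above (one phrase, two items).
raw = """
[
  {
    "transcript": "I use who oh.",
    "start_time": 0.0,
    "end_time": 2.51,
    "phrases": [
      {
        "content": "I use who oh.",
        "type": "phrase",
        "start_time": 0.0,
        "end_time": 2.51,
        "items": [
          {"content": "I", "type": "word", "start_time": 0, "end_time": 0.39, "confidence": 1},
          {"content": ".", "type": "punctuation", "start_time": 2.51, "confidence": 1}
        ]
      }
    ]
  }
]
"""

def describe(results):
    """Return one summary line per item, tolerating absent optional fields."""
    lines = []
    for result in results:
        for phrase in result.get("phrases", []):
            for item in phrase.get("items", []):
                start = item.get("start_time")
                # Punctuation items carry no end_time in the example above.
                end = item.get("end_time", start)
                kind = item.get("type", "unknown")
                lines.append(f'{kind} {item["content"]!r} [{start}, {end}]')
    return lines

results = json.loads(raw)
summary = describe(results)
print("\n".join(summary))
```

Using `dict.get` with a default keeps the same loop usable across services that omit `phrases`, `end_time`, or `type`.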

Parameter description

`transcript` string

The transcription of your audio file. Depending on the service, this transcript may or may not be complete. If it is not complete, it should be an alternative result.

For example,

"transcript": "I use who oh. It makes things easier!"

"transcript": "I use Hooho"

`start_time` number

The beginning of the transcript, in seconds. The timestamps of the transcript are determined using the start_time and end_time of each item.

For example,

"start_time": 0.37

`end_time` number

The end of the transcript, in seconds. The timestamps of the transcript are determined using the start_time and end_time of each item.
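
One plausible reading of this rule is that the transcript starts at the earliest item and ends at the latest one. The sketch below works under that assumption; the exact rule Hooho applies is not stated here, and `transcript_bounds` is a hypothetical helper.

```python
def transcript_bounds(result):
    """Return (start, end) in seconds derived from the items' timestamps.

    Returns None when no item carries a timestamp.
    """
    starts, ends = [], []
    for phrase in result.get("phrases", []):
        for item in phrase.get("items", []):
            if "start_time" in item:
                starts.append(item["start_time"])
            # Assumption: an item without end_time (e.g. punctuation)
            # ends where it starts.
            end = item.get("end_time", item.get("start_time"))
            if end is not None:
                ends.append(end)
    if not starts or not ends:
        return None
    return min(starts), max(ends)

result = {"phrases": [{"items": [
    {"content": "I", "start_time": 0.37, "end_time": 0.8},
    {"content": "agree", "start_time": 0.8, "end_time": 1.5},
    {"content": ".", "start_time": 1.5},
]}]}
bounds = transcript_bounds(result)
print(bounds)
```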

For example,

"end_time": 49.63

`confidence` number

A confidence score for the transcription, between 0 and 1. The higher it is, the more reliable the transcription is.

For example,

"confidence": 0.96

"confidence": 0.12

`phrases` array[object]

A list of the recognized phrases in the transcript. Phrases are delimited by punctuation. Without punctuation, the whole transcript will likely be returned as a single phrase.

Each object of this array contains

`content` string: The content of the phrase.

"content": "I use Hooho."

"content": "I use Hooho it makes things easier"

`type` string: The type of this object ("phrase" in this case).

"type": "phrase"

`start_time` number: The beginning of the phrase, in seconds.

"start_time": 0.0

`end_time` number: The end of the phrase, in seconds.

"end_time": 2.51

`confidence` number: A confidence score for the phrase, between 0 and 1.

"confidence": 1

`speaker_tags` array[integer]: If speaker diarization is enabled, the list of speaker tags assigned to the phrase's items.

"speaker_tags": [0, 1]

`items` array[object]: The list of words in the phrase. The details are given below.
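
To make the relationship between a phrase and its items concrete, here is a hedged sketch that rebuilds phrases from a flat list of items. It assumes that only ".", "!" and "?" close a phrase and that `speaker_tags` is the sorted set of the items' `speaker_tag` values; neither detail is confirmed by this page, and all function names are hypothetical.

```python
SENTENCE_END = {".", "!", "?"}  # assumption: these marks close a phrase

def group_into_phrases(items):
    """Split a flat list of items into phrases at sentence-ending punctuation."""
    phrases, current = [], []
    for item in items:
        current.append(item)
        if item.get("type") == "punctuation" and item.get("content") in SENTENCE_END:
            phrases.append(current)
            current = []
    if current:  # no closing punctuation: the remainder forms one phrase
        phrases.append(current)
    return phrases

def phrase_text(items):
    """Join words with spaces and attach punctuation to the preceding word."""
    text = ""
    for item in items:
        if item.get("type") == "punctuation" or not text:
            text += item["content"]
        else:
            text += " " + item["content"]
    return text

def phrase_speaker_tags(items):
    """Sorted, de-duplicated speaker tags found on the items."""
    return sorted({item["speaker_tag"] for item in items if "speaker_tag" in item})

def word(content, speaker):
    return {"content": content, "type": "word", "speaker_tag": speaker}

def mark(content):
    return {"content": content, "type": "punctuation"}

items = [word("I", 0), word("use", 0), word("Hooho", 0), mark("."),
         word("It", 1), word("makes", 1), word("things", 0), word("easier", 1), mark("!")]
phrases = group_into_phrases(items)
texts = [phrase_text(p) for p in phrases]
tags = [phrase_speaker_tags(p) for p in phrases]
print(texts, tags)
```

With the punctuation items removed, the same input collapses into one phrase, which mirrors the second `content` example above.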

`items` array[object]

The list of words and punctuation marks in a phrase. Words are delimited by spaces. A text without spaces would result in a single long word (but that should not happen).
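
The splitting rule can be illustrated with a small tokenizer. This is only a sketch under stated assumptions: the regular expression below (words are runs of word characters, every other non-space character is punctuation) is a guess at reasonable behaviour, not the rule any particular service applies, and `to_items` is a hypothetical name.

```python
import re

# Assumption: a word is a run of word characters (apostrophes allowed inside);
# any other non-space character becomes its own punctuation item.
TOKEN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")

def to_items(text):
    """Split text into item-like dicts carrying only content and type."""
    items = []
    for token in TOKEN.findall(text):
        kind = "word" if re.match(r"\w", token) else "punctuation"
        items.append({"content": token, "type": kind})
    return items

items = to_items("I use Hooho.")
print(items)
```

A text without spaces, such as `"IuseHooho"`, comes back as a single word item, which is the degenerate case described above.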

Each object of this array contains

`content` string: The content of the item.
`type` string: The type of this object ("word" or "punctuation").

"content": "use"
"type": "word"

"content": "."
"type": "punctuation"

`start_time` number: The beginning of the item, in seconds.

"start_time": 1.12

`end_time` number: The end of the item, in seconds.

"end_time": 2.33

`confidence` number: A confidence score for the item, between 0 and 1.

"confidence": 0

`speaker_tag` integer: If speaker diarization is enabled, an integer identifying the speaker associated with the word.

"speaker_tag": 1

`wer_data` array[object]

Information about the WER (Word Error Rate) calculation. A WER is computed between the resulting transcript and each reference transcript you provide.

Each object of this array contains

`reference` string: The content of the reference transcript.
`wer` number: The WER obtained with the transcript and the given reference, between 0 and 1. The lower it is, the closer the transcripts are.

"wer_data": [
  {
    "reference": "I use Hooho. It makes things easier!",
    "wer": 0.2857142857142857
  }
]
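
For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions and insertions) divided by the number of words in the reference. The sketch below implements that standard definition; the normalization step (lowercasing and stripping punctuation) is an assumption, and how Hooho normalizes text or bounds the value is not described here. Under these assumptions, the transcript "I use who oh. It makes things easier!" scored against the reference above gives 2/7, the value shown in the example.

```python
import string

def normalize(text):
    """Lowercase, strip punctuation, split on whitespace (assumed normalization)."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()

def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # reference word missing (deletion)
                current[j - 1] + 1,      # extra hypothesis word (insertion)
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)

wer = word_error_rate(
    "I use Hooho. It makes things easier!",
    "I use who oh. It makes things easier!",
)
print(wer)
```

Note that this textbook formula can exceed 1 when the hypothesis is much longer than the reference.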

`wer_mean` number

The mean value of the WERs obtained with all the references.
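
Assuming "mean" refers to the arithmetic mean, the value can be recomputed from `wer_data` as sketched here. The function name is hypothetical and the WER values in the sample are made up purely for illustration.

```python
from statistics import mean

def wer_mean(wer_data):
    """Arithmetic mean of the "wer" values; None when there are no references."""
    values = [entry["wer"] for entry in wer_data]
    return mean(values) if values else None

# Illustrative values only, not real results.
sample = [
    {"reference": "first reference transcript", "wer": 0.25},
    {"reference": "second reference transcript", "wer": 0.75},
]
average = wer_mean(sample)
print(average)
```

With a single reference, the mean equals that reference's WER, as in the example output at the top of this page.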

For example,

"wer_mean": 0.2857142857142857

Occurrence conditions

Some fields must be enabled through the associated parameter in the "config" part of your request.

The table below summarizes, for each service, which output fields can appear and which parameter controls their appearance. (This table was built for the en-US language; the results may vary from one language to another.)

| Field | Amazon | Google | Microsoft | Rev.ai | Speechmatics |
| --- | --- | --- | --- | --- | --- |
| `transcript` | Always | Always | Always | Always | Always |
| `start_time` | Always | enable_word_time_offsets OR enable_speaker_diarization | enable_word_time_offsets OR enable_speaker_diarization | Always | Always |
| `end_time` | Always | enable_word_time_offsets OR enable_speaker_diarization | enable_word_time_offsets OR enable_speaker_diarization | Always | Always |
| `confidence` | Never | Always | Never | Never | Always |
| `phrases` | Always | enable_word_time_offsets OR enable_speaker_diarization | enable_word_time_offsets OR enable_speaker_diarization | Always | Always |
| `phrases {content}` | Always | enable_word_time_offsets OR enable_speaker_diarization | enable_word_time_offsets OR enable_speaker_diarization | Always | Always |
| `phrases {type}` | Always | enable_word_time_offsets OR enable_speaker_diarization | enable_word_time_offsets OR enable_speaker_diarization | Always | Always |
| `phrases {start_time}` | Always | enable_word_time_offsets | enable_word_time_offsets | Always | Always |
| `phrases {end_time}` | Always | enable_word_time_offsets | enable_word_time_offsets | Always | Always |
| `phrases {confidence}` | Never | Never | Never | Never | Never |
| `phrases {speaker_tags}` | enable_speaker_diarization | enable_speaker_diarization | enable_speaker_diarization | enable_speaker_diarization | enable_speaker_diarization |
| `phrases {items}` | Always | enable_word_time_offsets OR enable_speaker_diarization | enable_word_time_offsets OR enable_speaker_diarization | Always | Always |
| `phrases {items {content}}` | Always | enable_word_time_offsets OR enable_speaker_diarization | enable_word_time_offsets OR enable_speaker_diarization | Always | Always |
| `phrases {items {start_time}}` | Always | enable_word_time_offsets | enable_word_time_offsets | Always | Always |
| `phrases {items {end_time}}` | Always | enable_word_time_offsets | enable_word_time_offsets | Always | Always |
| `phrases {items {confidence}}` | Always | Never | enable_word_time_offsets | Always | Always |
| `phrases {items {speaker_tag}}` | enable_speaker_diarization | enable_speaker_diarization | enable_speaker_diarization | enable_speaker_diarization | enable_speaker_diarization |
| `wer_data` | get_wer + transcripts | get_wer + transcripts | get_wer + transcripts | get_wer + transcripts | get_wer + transcripts |
| `wer_data {reference}` | get_wer + transcripts | get_wer + transcripts | get_wer + transcripts | get_wer + transcripts | get_wer + transcripts |
| `wer_data {wer}` | get_wer + transcripts | get_wer + transcripts | get_wer + transcripts | get_wer + transcripts | get_wer + transcripts |
| `wer_mean` | get_wer + transcripts | get_wer + transcripts | get_wer + transcripts | get_wer + transcripts | get_wer + transcripts |
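
The table can also be encoded as data, so that a client can check in advance whether a field should appear for a given service and configuration. The sketch below transcribes only a handful of cells from the table; the `config` dictionary is a stand-in for the "config" part of a request, whose exact shape is not described on this page, and `is_expected` is a hypothetical helper.

```python
# Either flag is sufficient for these fields on some services.
TIME_OR_DIARIZATION = {"enable_word_time_offsets", "enable_speaker_diarization"}

# A few cells copied from the table: "always", "never", or a set of flags
# of which at least one must be enabled.
CONDITIONS = {
    ("transcript", "Google"): "always",
    ("start_time", "Amazon"): "always",
    ("start_time", "Google"): TIME_OR_DIARIZATION,
    ("confidence", "Amazon"): "never",
    ("confidence", "Google"): "always",
    ("phrases {start_time}", "Microsoft"): {"enable_word_time_offsets"},
}

def is_expected(field, service, config):
    """Tell whether `field` should appear for `service` given the config flags."""
    condition = CONDITIONS[(field, service)]
    if condition == "always":
        return True
    if condition == "never":
        return False
    return any(config.get(flag, False) for flag in condition)

config = {"enable_speaker_diarization": True}
print(is_expected("start_time", "Google", config))
print(is_expected("phrases {start_time}", "Microsoft", config))
```

Rows that depend on `get_wer + transcripts` require both settings together, so they would need an "all of" condition rather than the "any of" check used here.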
