Hearing

Turns speech into text.

Keep each request under 5 minutes of audio. To transcribe a longer file, split it into smaller segments and send each one to the API separately. A future version of the API may handle this directly.
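As a rough illustration of the splitting advice, the sketch below computes segment boundaries for a long recording. The function name `segment_bounds` and the 290-second default (a small margin under the 5-minute guideline) are assumptions made for this example, not part of the API. Cutting the audio itself is left to whatever audio tool you already use.

```python
def segment_bounds(total_seconds: float, max_seconds: float = 290.0) -> list[tuple[float, float]]:
    """Return (start, end) offsets in seconds, each span at most max_seconds long.

    The 290-second default is an assumed safety margin below the
    5-minute guideline; it is not a value defined by the API.
    """
    total_seconds = float(total_seconds)
    if total_seconds < 0 or max_seconds <= 0:
        raise ValueError("total_seconds must be >= 0 and max_seconds must be > 0")

    bounds = []
    start = 0.0
    while start < total_seconds:
        # Clamp the final segment to the end of the recording.
        end = min(start + max_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds


# A 12-minute (720-second) recording becomes three segments.
print(segment_bounds(720))
# [(0.0, 290.0), (290.0, 580.0), (580.0, 720.0)]
```

Cutting at fixed offsets can split a word in half, so in practice you may prefer to cut at a nearby silence.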

API Reference

POST https://api.geppetto.app/hear

Request Body

file

Required

The audio to transcribe

response_format

string

Optional

Default: json

The format of the response. Valid options are json, text, verbose_json, srt, and vtt

model

string

Optional

Default: whisper-tiny

The model to use. Currently only whisper-tiny is supported

language

string

Optional

The language of the input audio, as an ISO-639-1 code (for example, en). Supplying it improves accuracy and latency.

prompt

string

Optional

Optional text to guide the model's style

temperature

number

Optional

Default: 0

The sampling temperature, between 0 and 1. Audio models tend to perform better with lower temperatures.
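The sketch below shows one way the request body fields could be assembled into a POST request using only the Python standard library. It assumes the endpoint accepts `multipart/form-data`, which the reference implies by listing a `file` field but does not state. Authentication is not described here, so none is included. The helper name `build_hear_request` is hypothetical.

```python
import urllib.request
import uuid

HEAR_URL = "https://api.geppetto.app/hear"


def build_hear_request(audio: bytes, filename: str, **fields) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for the hear endpoint.

    Assumption: the API accepts multipart/form-data uploads. Any
    authentication headers the API requires must be added by the caller.
    """
    boundary = uuid.uuid4().hex
    parts = []

    # Plain form fields such as response_format, language, or temperature.
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(f"{header}{value}\r\n".encode("utf-8"))

    # The required audio file part.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8"))
    parts.append(audio)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        HEAR_URL,
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


# Placeholder bytes stand in for real audio data in this example.
request = build_hear_request(
    b"placeholder-audio-bytes",
    "clip.wav",
    response_format="verbose_json",
    language="en",
    temperature=0,
)
```

Passing `request` to `urllib.request.urlopen` would send it. The example stops short of that so it can run without network access.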

Returns

`json` Response

If response_format is json, the response will be the following JSON object.

{
  "text": "string"
}

`verbose_json` Response

When response_format is verbose_json, the response will be the following JSON object.

{
  "text": "string",
  "language": "string",
  "task": "string",
  "duration": "number",
  "segments": [
    {
      "text": "string",
      "temperature": "number",
      "id": "number",
      "start": "number",
      "end": "number",
      "tokens": "number[]",
      "words": [
        {
          "word": "string",
          "start": "number",
          "end": "number",
          "t_dtw": "number",
          "probability": "number"
        }
      ],
      "avg_logprob": "number"
    }
  ]
}
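To show how the nested structure above might be consumed, here is a minimal sketch that flattens the per-word timings from a `verbose_json` response. The helper name `word_timings` is hypothetical, and the sample payload is invented for illustration, not real API output.

```python
def word_timings(response: dict) -> list[tuple[str, float, float]]:
    """Collect (word, start, end) tuples from every segment, in order."""
    timings = []
    for segment in response.get("segments", []):
        # Treat a segment without a "words" list as having no words.
        for word in segment.get("words", []):
            timings.append((word["word"], word["start"], word["end"]))
    return timings


# Invented sample shaped like the verbose_json schema above.
sample = {
    "text": "hello world",
    "language": "en",
    "task": "transcribe",
    "duration": 1.2,
    "segments": [
        {
            "id": 0,
            "text": "hello world",
            "start": 0.0,
            "end": 1.2,
            "words": [
                {"word": "hello", "start": 0.0, "end": 0.5},
                {"word": "world", "start": 0.6, "end": 1.2},
            ],
        }
    ],
}

print(word_timings(sample))
# [('hello', 0.0, 0.5), ('world', 0.6, 1.2)]
```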

`text`, `srt`, and `vtt` Response

When the response_format is text, srt or vtt, the response will be a string in the appropriate format.
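Because the response is a JSON object for some formats and a plain string for others, client code has to branch on `response_format`. The sketch below is one way to do that. The helper name `decode_response` is hypothetical, and UTF-8 encoding of the response body is an assumption.

```python
import json

JSON_FORMATS = {"json", "verbose_json"}
TEXT_FORMATS = {"text", "srt", "vtt"}


def decode_response(body: bytes, response_format: str = "json"):
    """Decode a raw response body according to the requested response_format.

    Assumption: the body is UTF-8 encoded.
    """
    if response_format in JSON_FORMATS:
        return json.loads(body)
    if response_format in TEXT_FORMATS:
        return body.decode("utf-8")
    raise ValueError(f"unsupported response_format: {response_format!r}")


# Invented response bodies, for illustration only.
print(decode_response(b'{"text": "hello world"}')["text"])
# hello world
print(decode_response(b"hello world", "text"))
# hello world
```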
