Hearing
Turns speech into text.
Keep each request under 5 minutes of audio. To transcribe a longer file, split it into shorter segments and send each one to the API separately. A future version of the API may handle this directly.
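One way to plan the split is to compute fixed-length time windows before cutting the audio. The sketch below only does the arithmetic; the function name plan_segments and the 270-second default (an arbitrary margin below the 5-minute guidance) are assumptions for illustration, not part of the API. Cutting the audio itself is left to whatever audio tool you already use.

```python
from typing import List, Tuple

# Assumed default: 4.5 minutes, chosen only to stay safely under 5 minutes.
MAX_SEGMENT_SECONDS = 270.0


def plan_segments(duration: float,
                  max_len: float = MAX_SEGMENT_SECONDS) -> List[Tuple[float, float]]:
    """Return (start, end) windows in seconds that cover [0, duration].

    Every window is at most max_len seconds long; the last one may be shorter.
    """
    duration = float(duration)
    max_len = float(max_len)
    if duration < 0:
        raise ValueError("duration must not be negative")
    if max_len <= 0:
        raise ValueError("max_len must be positive")

    segments = []
    start = 0.0
    while start < duration:
        end = min(start + max_len, duration)
        segments.append((start, end))
        start = end
    return segments


if __name__ == "__main__":
    # A 12-minute recording becomes three requests.
    for start, end in plan_segments(12 * 60):
        print(f"{start:.0f}s to {end:.0f}s")
```

Fixed windows can cut a word in half at a boundary, so splitting at a nearby pause may give cleaner transcripts if your tooling can detect one.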
API Reference
POST https://api.geppetto.app/hear
Request Body
file
file
Required
The audio to transcribe
response_format
string
Optional
Default: json
The format of the response. Valid options are json, text, verbose_json, srt, and vtt.
model
string
Optional
Default: whisper-tiny
The model to use. Currently only whisper-tiny is supported.
language
string
Optional
The language of the input audio, as an ISO 639-1 code (for example, en). Supplying it improves accuracy and latency.
prompt
string
Optional
Optional text to guide the model's style.
temperature
number
Optional
Default: 0
The sampling temperature, between 0 and 1. Audio models tend to perform better with lower temperatures.
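The parameters above could be assembled into a request roughly as follows. This is a sketch under stated assumptions: the reference lists a file field but does not say how the body is encoded, so multipart/form-data is assumed here, and no authentication header is sent because none is described. The helper names (build_form_fields, encode_multipart, transcribe) are hypothetical. Only the Python standard library is used.

```python
import os
import urllib.request
import uuid

HEAR_URL = "https://api.geppetto.app/hear"
VALID_FORMATS = {"json", "text", "verbose_json", "srt", "vtt"}


def build_form_fields(response_format="json", model="whisper-tiny",
                      language=None, prompt=None, temperature=0.0):
    """Validate the optional parameters and return them as string form fields."""
    if response_format not in VALID_FORMATS:
        raise ValueError(f"response_format must be one of {sorted(VALID_FORMATS)}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    fields = {
        "response_format": response_format,
        "model": model,
        "temperature": str(temperature),
    }
    if language is not None:
        fields["language"] = language
    if prompt is not None:
        fields["prompt"] = prompt
    return fields


def encode_multipart(fields, filename, file_bytes):
    """Encode text fields plus one file part; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        chunks.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        'Content-Type: application/octet-stream\r\n\r\n'
    )
    chunks.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def transcribe(path, **options):
    """POST an audio file and return the raw response body as a string.

    Assumption: the endpoint accepts multipart/form-data without extra headers.
    """
    fields = build_form_fields(**options)
    with open(path, "rb") as audio:
        body, content_type = encode_multipart(
            fields, os.path.basename(path), audio.read()
        )
    request = urllib.request.Request(
        HEAR_URL, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

A call might look like transcribe("meeting.wav", language="en", response_format="srt"), where meeting.wav is a placeholder file name. Validating response_format and temperature locally catches typos before any audio is uploaded.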
Returns
json
Response
When response_format is json, the response will be the following JSON object.
{
"text": "string"
}
verbose_json
Response
When response_format is verbose_json, the response will be the following JSON object.
{
"text": "string",
"language": "string",
"task": "string",
"duration": "number",
"segments": [
{
"text": "string",
"temperature": "number",
"id": "number",
"start": "number",
"end": "number",
"tokens": "number[]",
"words": [
{
"word": "string",
"start": "number",
"end": "number",
"t_dtw": "number",
"probability": "number"
}
],
"avg_logprob": "number"
}
]
}
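The word-level fields in a verbose_json response make it possible to flag words that may need manual review. The sketch below walks segments and words as laid out in the schema above. It assumes that probability lies between 0 and 1 with higher values meaning more confidence, which the reference does not state; the function name, the 0.5 threshold, and the sample values are made up for illustration.

```python
from typing import Any, Dict, List


def low_confidence_words(result: Dict[str, Any],
                         threshold: float = 0.5) -> List[Dict[str, Any]]:
    """Collect words whose probability falls below threshold, in transcript order."""
    flagged = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            if word["probability"] < threshold:
                flagged.append({
                    "word": word["word"],
                    "start": word["start"],
                    "end": word["end"],
                    "probability": word["probability"],
                })
    return flagged


# Hand-written sample shaped like the schema above; the values are not real output.
sample = {
    "text": "hello world",
    "language": "en",
    "task": "transcribe",
    "duration": 1.2,
    "segments": [
        {
            "id": 0,
            "text": "hello world",
            "start": 0.0,
            "end": 1.2,
            "words": [
                {"word": "hello", "start": 0.0, "end": 0.5, "probability": 0.98},
                {"word": "world", "start": 0.6, "end": 1.2, "probability": 0.31},
            ],
        }
    ],
}

if __name__ == "__main__":
    for item in low_confidence_words(sample):
        print(item["word"], item["start"], item["end"])
```

The start and end values can be used to jump to the matching spot in the audio when checking a flagged word.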
text, srt, and vtt
Response
When the response_format is text, srt, or vtt, the response will be a string in the appropriate format.
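Because the body is JSON for two formats and a plain string for the other three, client code has to branch on the response_format it requested. A minimal sketch of that branch, with a hypothetical function name:

```python
import json
from typing import Any, Dict, Union

JSON_FORMATS = {"json", "verbose_json"}
STRING_FORMATS = {"text", "srt", "vtt"}


def parse_hear_response(body: str,
                        response_format: str = "json") -> Union[Dict[str, Any], str]:
    """Decode a response body according to the response_format that was requested."""
    if response_format in JSON_FORMATS:
        return json.loads(body)
    if response_format in STRING_FORMATS:
        # text, srt, and vtt come back as a string; return it unchanged.
        return body
    raise ValueError(f"unknown response_format: {response_format!r}")
```

For example, parse_hear_response('{"text": "hello"}') yields a dict with a text key, while parse_hear_response(body, "srt") returns the subtitle text as received.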