Seeing

Add vision to your app. Send an image to the API and get back a description of what it shows. You can customize the prompt to get responses tailored to your use case.

The API accepts images only as valid Base64-encoded data URIs.
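As a rough illustration of that requirement, the sketch below turns raw image bytes into a Base64 data URI using only the Python standard library. The helper names `to_data_uri` and `file_to_data_uri` are hypothetical, and the size check assumes the 25MB limit means 25 * 1024 * 1024 bytes measured on the raw file, which this page does not specify.

```python
import base64
import mimetypes
from pathlib import Path

# Assumption: "25MB" is interpreted as 25 * 1024 * 1024 bytes of raw image data.
MAX_IMAGE_BYTES = 25 * 1024 * 1024


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw image bytes as a data URI: data:<mime>;base64,<payload>."""
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 25MB limit")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def file_to_data_uri(path: str) -> str:
    """Guess the MIME type from the file extension, then encode the file."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        # Some formats (for example heic) may be unknown to older Python
        # versions; call to_data_uri() with an explicit MIME type instead.
        raise ValueError(f"cannot determine an image MIME type for {path!r}")
    return to_data_uri(Path(path).read_bytes(), mime_type)
```

The resulting string is what would go in the `image` field of the request body.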

API Reference

POST https://api.geppetto.app/see

Request Body

image

fileURI

Required

The image to see. Must be a valid Base64 Data URI. Max size 25MB.

Accepts: jpg, png, gif, webp, bmp, svg, and heic

prompt

string

Optional

Default: Describe this image in detail

The question to ask, or instruction to follow, about the image

system_prompt

string

Optional

System prompt to give the model context or instructions

stream

boolean

Optional

Default: false

Whether to stream the response

temperature

number

Optional

Default: 0.2

The temperature of the model. Vision models tend to perform better with lower temperatures

max_tokens

number

Optional

Default: 200

The maximum number of tokens to generate

presence_penalty

number

Optional

Default: 0

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

frequency_penalty

number

Optional

Default: 0

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

top_p

number

Optional

Default: 1

The cumulative probability cutoff for sampling. The model samples only from the most likely tokens whose combined probability reaches this value
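The parameters above could be assembled into a request roughly as follows. This is a minimal sketch, not the official client: it assumes the request body is sent as JSON with a `Content-Type: application/json` header, which this page does not state, and it leaves out any authentication the API may require. `build_see_request` is a hypothetical helper name.

```python
import json
import urllib.request
from typing import Optional

SEE_URL = "https://api.geppetto.app/see"


def build_see_request(image_uri: str, prompt: Optional[str] = None,
                      **options) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /see endpoint.

    Extra keyword arguments (system_prompt, stream, temperature, max_tokens,
    presence_penalty, frequency_penalty, top_p) are passed through unchanged.
    """
    if not image_uri.startswith("data:"):
        raise ValueError("image must be a Base64 data URI")
    # Both penalties are documented as numbers between -2.0 and 2.0.
    for name in ("presence_penalty", "frequency_penalty"):
        value = options.get(name)
        if value is not None and not -2.0 <= value <= 2.0:
            raise ValueError(f"{name} must be between -2.0 and 2.0")
    payload = {"image": image_uri, **options}
    if prompt is not None:
        payload["prompt"] = prompt
    # Assumption: the API expects a JSON-encoded body.
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        SEE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending the request (requires network access, so left commented out):
# request = build_see_request(uri, prompt="What color is the car?", max_tokens=100)
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["content"])
```

Omitted optional fields are simply left out of the payload, so the server-side defaults listed above apply.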

Returns

When stream is false, the response is the following JSON object.

{
  "content": string
}

When stream is true, the response is streamed with Transfer-Encoding: chunked. Each chunk is the following JSON object.

{
  "content": string,
  "stop": boolean
}
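One way to consume the streamed form is sketched below. It rests on several assumptions this page does not confirm: that the JSON objects arrive back to back with no other framing, that network reads may split an object across reads, that each `content` value is a fragment to be concatenated, and that `stop: true` marks the final object. The function names are hypothetical.

```python
import codecs
import json
from typing import Iterable, Iterator


def iter_stream_objects(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield each complete JSON object found in a sequence of byte chunks."""
    json_decoder = json.JSONDecoder()
    # An incremental decoder copes with multi-byte characters split across reads.
    text_decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    for chunk in chunks:
        buffer += text_decoder.decode(chunk)
        while True:
            buffer = buffer.lstrip()
            if not buffer:
                break
            try:
                obj, end = json_decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # object is incomplete; wait for more data
            yield obj
            buffer = buffer[end:]


def collect_content(chunks: Iterable[bytes]) -> str:
    """Concatenate content fragments until an object with stop=true arrives."""
    parts = []
    for obj in iter_stream_objects(chunks):
        parts.append(obj.get("content", ""))
        if obj.get("stop"):
            break
    return "".join(parts)
```

With an HTTP client that exposes the body as an iterator of byte chunks, `collect_content` could be fed that iterator directly, or `iter_stream_objects` could be used to display fragments as they arrive.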
