Pricing

kaleidoprompt offers a transparent, usage-based pricing model. You only pay for the tokens you use, with each token standardized to exactly 4 characters.

Unlike subscription-based services, our pricing ensures that you’re deducted solely based on your usage. Below, you'll find detailed pricing information per million tokens or bytes for each available Language Model.

OpenAI

Model	Input Tokens	Output Tokens	Input Bytes (Files only)
GPT-4o	$3.00 / 1M	$12.00 / 1M	$0.63 / 1M
o4-mini-high	$2.20 / 1M	$8.80 / 1M	$0.28 / 1M
o3	$4.00 / 1M	$16.00 / 1M	$0.50 / 1M
GPT-4.5	$85.00 / 1M	$175.00 / 1M	$18.75 / 1M

Google

Model	Input Tokens	Output Tokens	Input Bytes (Files only)
Gemini 2.0 Flash Thinking	$1.25 / 1M	$1.25 / 1M	$0.31 / 1M
Gemini 2.0 Flash	$0.20 / 1M	$0.70 / 1M	$0.03 / 1M
Gemini 2.5 Pro	$5.00 / 1M	$30.00 / 1M	$0.63 / 1M

Anthropic

Model	Input Tokens	Output Tokens	Input Bytes (Files only)
Claude 3.7 Sonnet	$3.75 / 1M	$17.50 / 1M	$0.75 / 1M
Claude Sonnet 4.0	$6.00 / 1M	$45.00 / 1M	$0.75 / 1M

Mistral

Model	Input Tokens	Output Tokens	Input Bytes (Files only)
Mistral Large	$2.50 / 1M	$8.00 / 1M	$0.50 / 1M

Model	Input Tokens	Output Tokens	Input Bytes (Files only)
Meta Llama 3.1 405B	$5.83 / 1M	$18.00 / 1M	$1.33 / 1M
Meta Llama 3.3 70B	$1.21 / 1M	$1.21 / 1M	$0.18 / 1M

xAI

Model	Input Tokens	Output Tokens	Input Bytes (Files only)
Grok 3	$3.50 / 1M	$17.00 / 1M	$0.75 / 1M

DeepSeek

Model	Input Tokens	Output Tokens	Input Bytes (Files only)
DeepSeek R1	$1.38 / 1M	$4.00 / 1M	$0.22 / 1M

How It Works

For example, if you use 10,435 input tokens and 5,493 output tokens with GPT-4o:

Input Tokens: 10435 × $3.00 / 1M = $0.03
Output Tokens: 5493 × $12.00 / 1M = $0.07
Total Cost: $0.10

Understanding Tokens

100 tokens represent exactly 400 characters of text. For example:

Hello, how are you today? is equal to 6 tokens.
The quick brown fox jumps over the lazy dog? is equal to 11 tokens.
So, 100 tokens would be equivalent to 400 characters or around 80 words.

Tips to Reduce Costs

The total cost of using our service is directly related to the number of tokens processed during your interactions with the language models. It's important to understand that the entire conversation history is sent to the language model with each new message. This is not specific to kaleidoprompt but is how language models (LLMs) handle context in chat threads to provide coherent and relevant responses.

Here are some tips to help you reduce costs:

Start new conversations: If the previous context is no longer needed, consider starting a new thread. This prevents unnecessary conversation history from being sent, reducing the number of input tokens.
Keep messages concise: Be specific and to the point in your prompts. This helps reduce the number of tokens used in both your input and the model's output.
Limit large file and image uploads: Uploading large images or files can increase costs since they are processed and included in the token count. If your conversation includes images or files, these are re-sent with each new prompt due to how LLMs maintain context. Consider minimizing the size of files or avoiding unnecessary uploads.
Summarize previous information: Instead of including the entire previous conversation, provide a brief summary of the important points. This reduces the number of tokens needed to convey the necessary context.

Example: Instead of maintaining a lengthy chat history, you might start a new thread and summarize the essential information: "Previously, we discussed strategies for reducing operational costs in manufacturing. Now, I have a question about supply chain optimization." By starting a new conversation and providing a concise summary, you reduce the number of tokens while still giving the model the necessary context.

Remember, by managing the length and content of your messages and conversations, you can effectively control your token usage and minimize costs.