If you’ve gotten into the habit of using AI in your daily life, you’ll know that any savings are worth taking advantage of. Change the model to a less demanding algorithm, avoid long conversations that reduce your consumption, compress your notices… All these tips will allow you to save a few euros on the final bill. But there is something that can really be very profitable for you.
What is fast caching?
Fast caching allows you to get a 90% discount on cached entry tokens. And yet, the functionality offered by the Anthropic API remains largely underutilized. Specifically, each time a message is sent to Claude, the model does not just address the last question in isolation. Fast caching identifies tokens that remain the same across multiple queries: if a query starts with the same sequence of tokens as a previous query, the model can reuse the already computed representation instead of recalculating everything each time.
More specifically, Claude’s great flaw lies in his lack of memory. For each message, the AI rereads everything from the beginning: context, basic message, conversation history… It’s slow, redundant and quickly expensive. Fast caching reminds the AI that it has already read the start of your conversation by keeping your message history in memory for 5 minutes. If a new question is submitted within this period, Claude does not reread everything and only part of the new request..
90% savings on tokens
Tokens read from the cache cost about 10 times less than traditional tokens. In a long conversation with a voluminous message, this can represent a savings of 70 to 90%. Fast caching is accessible to everyone, easy to configure, and above all, it saves money and time on repetitive tasks. The good news is that you hardly have to do anything: direct chats with Claude Chat use message caching by default. For automations through Claude Code or Cowork, which use APIs, simply add a line to the API call.
-
cache_control={"type": "ephemeral"}
Then simply run the request again and Claude will take care of the rest. To check that everything is functional, type:
If cache_read_input_tokens is greater than 0, you have saved tokens and therefore money. It’s that simple.
🟣 To not miss any news from Woozad, follow us on Google and our WhatsApp channel. And if you love us, we have a newsletter every morning.