Cloudflare can basically remember this for you • The Register


Hardware memory is not the only kind in short supply these days: contextual memory, the conversational data exchanged with AI models, can be a problem too.

Cloudflare’s answer to this particular problem is Agent Memory, a managed service for offloading AI conversation data when context space is scarce and reinjecting it on demand.

“This gives AI agents persistent memory, allowing them to remember what matters, forget what doesn’t, and become smarter over time,” Tyson Trautmann, senior director of engineering, and Rob Sutter, engineering manager, said in a blog post.

AI models can accept only a limited amount of input, known as the context window. Measured in tokens, its size varies by model.

Anthropic’s Claude Opus 4.7, for example, has a 1 million token context window, which can accommodate about 555,000 words or roughly 2.5 million Unicode characters. Claude Sonnet 4.6 also has a 1 million token window, but that works out to around 750,000 words or roughly 3.4 million Unicode characters because it relies on a different tokenizer.

Google’s Gemma 4 family of models has context windows of 128,000 tokens for smaller models and 256,000 for larger ones.

This may seem like a lot of space for prompts, but a lot of additional text accompanies each one: the system prompt, system tools, custom agents, memory files, skills, messages, and the autocompaction buffer. The context space actually available could therefore be 10 to 20 percent less.
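The figures above can be sanity-checked with some quick arithmetic. This sketch uses only the numbers quoted in the article (the 1 million token window, the two word counts, and the 10-20 percent overhead estimate):

```javascript
// Quick sanity checks on the figures quoted above.
const windowTokens = 1_000_000;

// Implied tokens-per-word ratios for the two Claude tokenizers.
console.log((windowTokens / 555_000).toFixed(2)); // "1.80" tokens per word
console.log((windowTokens / 750_000).toFixed(2)); // "1.33" tokens per word

// Usable budget once fixed overheads (system prompt, tools, skills,
// autocompaction buffer, etc.) eat 10-20% of the window.
for (const overhead of [0.10, 0.20]) {
  console.log(`${overhead * 100}% overhead leaves ${windowTokens * (1 - overhead)} tokens`);
}
```

In other words, the two tokenizers pack words into tokens at noticeably different densities, and the effective budget for actual conversation is meaningfully smaller than the headline number.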

Storing prompts and responses as “memories” helps maximize available space by providing somewhere to offload useful discussion details that may not be needed on every conversation turn.

At the same time, more context is not always better: AI models may perform better when given less of it. Memory is therefore potentially useful not just as a storage management option but also as a quality improvement, by pulling only the relevant data out of a conversation.

Various software projects and built-in memory tools already exist for persisting AI conversations. Cloudflare’s pitch is AI memory as a managed service.

“Agents running for weeks or months on real production codebases and systems need memory that remains useful as it grows, not just memory that performs well on a clean reference dataset that fits entirely within the context window of a newer model,” Trautmann and Sutter wrote, arguing that this can be done quickly at a reasonable cost per query, in a way that doesn’t stall the conversation.

Basically they are talking about asynchronous CRUD operations. For example, after storing a memory of the user’s preferred package manager (e.g. pnpm), that memory can be recalled like so:

const results = await profile.recall("What package manager does the user prefer?");

console.log(results.result); // "The user prefers pnpm over npm."
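The full Agent Memory API isn’t public yet (the `profile.recall` call above comes from Cloudflare’s post), but the store-then-recall pattern it illustrates can be mimicked with a toy in-memory version. Everything here, including the `remember` method name and the naive keyword-overlap matching, is an illustrative assumption, not Cloudflare’s API, which presumably does proper semantic retrieval:

```javascript
// Toy illustration of the store/recall pattern -- NOT Cloudflare's API.
// Stored memories are scored by naive keyword overlap with the query;
// the real service presumably does semantic retrieval.
class ToyMemory {
  constructor() {
    this.memories = []; // each entry is a plain-text "memory"
  }

  async remember(text) {
    this.memories.push(text);
  }

  async recall(query) {
    const queryWords = new Set(query.toLowerCase().match(/\w+/g) ?? []);
    let best = null;
    let bestScore = 0;
    for (const text of this.memories) {
      const words = text.toLowerCase().match(/\w+/g) ?? [];
      const score = words.filter((w) => queryWords.has(w)).length;
      if (score > bestScore) {
        bestScore = score;
        best = text;
      }
    }
    return { result: best };
  }
}

const profile = new ToyMemory();
await profile.remember("The user prefers pnpm over npm.");
await profile.remember("The project targets Node 20.");

const results = await profile.recall("What package manager does the user prefer?");
console.log(results.result); // "The user prefers pnpm over npm."
```

The asynchronous shape is the point: storing happens off the hot path, and the agent only pays for retrieval when a turn actually needs the memory.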

Agent Memory is accessible via a binding in a Cloudflare Worker, and also via a REST API for those outside the Cloudflare Workers ecosystem. It is currently in private beta.

And in case anyone feels possessive about their AI chat logs, Trautmann and Sutter offer the reassurance that the memory data belongs to the customer.

“Agent Memory is a managed service, but your data is yours,” they wrote. “Every memory is exportable, and we are committed to ensuring that the knowledge your agents accumulate on Cloudflare can travel with you as your needs evolve.”

It’s a touching thought, although some work may be required to recover your text conversations and make your memories functional on another platform. ®