I replaced NotebookLM with a local LLM, and the difference is night and day

NotebookLM is genuinely one of the best AI tools I use, and I’ve worked with it long enough to have a real opinion on it. The way it stays grounded in your sources, the citation behavior, and the interactive mind maps are some of the most useful features in my workflow – nothing else does what it does at the same level, and it’s free, which always matters.

Still, it’s a Google product. Your files go to Google’s servers, get processed by Google’s infrastructure, and stay in your account until you delete them. Google is pretty upfront that your content doesn’t train its models – and as far as I can tell, that holds – but that’s not the same as those files not existing somewhere on Google’s servers at all. For most work, that’s absolutely fine. It stops being fine when the documents get personal.

Why even give up on NotebookLM when it’s so good?

When cloud AI is no longer the right solution

According to Google’s own documentation, NotebookLM doesn’t use your uploaded sources to train its foundation models – unless you submit feedback, in which case that interaction, including your content, can be human-reviewed. Your queries aren’t used for training either. But uploaded documents, generated output, and chat history are all retained for as long as the notebook exists, and usage metadata (how often you access the tool and which features you use) is covered by Google’s product terms. The practical reality is that your documents are processed server-side on Google’s infrastructure. For a personal Google account, that’s simply how the product works.

I recently had a health test done and received a detailed report full of information, some of which I didn’t know how to interpret and some of which was simply too long to read. Naturally, I wanted to go through it and get a clearer picture of what was going on. But I paused just as I was about to upload it to NotebookLM; I’d watched a few Reels about data privacy shortly before, and I couldn’t shake the nagging feeling of handing all my health data out of my hands. So I turned to my local LLM instead. Privacy is one of the main reasons I set one up in the first place.

How my local setup handles documents

And three ways to access information

LM Studio has had built-in document support since version 0.3.0, released in mid-2024, and the way it handles it is pretty sensible. Attach a file to a chat and it first checks whether the document fits in the model’s active context window. If it does, the whole thing is injected directly into the prompt – no retrieval at all, just the full content passed to the model at once. If the document exceeds that threshold, it falls back to RAG: the document is split into chunks, each chunk is embedded, and when you send a query, the most semantically relevant chunks are retrieved and dropped into the prompt. The model answers based on what was retrieved.
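
To make that flow concrete, here’s a minimal Python sketch of the same decision. This is not LM Studio’s actual code – the token counting and chunk scoring are deliberately toy stand-ins for its real tokenizer and embedding model:

```python
# Minimal sketch of the "fits in context, else RAG" decision described above.
# Not LM Studio's actual code: token counting and chunk scoring are toy
# stand-ins for its real tokenizer and embeddings.

def count_tokens(text: str) -> int:
    # Rough heuristic: one token is about 0.75 words.
    return int(len(text.split()) / 0.75)

def chunk(text: str, size: int = 200) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(query: str, chunk_text: str) -> float:
    # Stand-in for embedding similarity: plain word overlap.
    q, c = set(query.lower().split()), set(chunk_text.lower().split())
    return len(q & c) / (len(q) or 1)

def build_prompt(document: str, query: str, budget: int = 4096, top_k: int = 3) -> str:
    if count_tokens(document) <= budget:
        # Small document: inject it whole, no retrieval at all.
        return f"{document}\n\nQuestion: {query}"
    # Too long: chunk, score against the query, keep only the best few.
    best = sorted(chunk(document), key=lambda c: score(query, c), reverse=True)[:top_k]
    return "\n---\n".join(best) + f"\n\nQuestion: {query}"
```

The important property is the first branch: short files skip retrieval entirely, which is why they behave so differently from long ones.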

Of course, where the experience differs from NotebookLM is that the model still has all of its training knowledge. NotebookLM is source-grounded by design, which is a feature, not a flaw. But sometimes you don’t want that limit. When I was working through my genetic report, I didn’t want the model to just repeat what was written. I wanted it to relate the values to clinical context, explain what a marker typically means beyond the document, or draw on reference ranges it already knew. That scope requires a model with its own knowledge. And with my Brave Search MCP server attached, it can pull from the web mid-conversation when something needs to be current. So I get RAG over my document, the model’s training knowledge, and live web access in the same session – without switching tools anywhere.
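
The attachment and MCP wiring live in the LM Studio app itself, but the same local model is also reachable from scripts through LM Studio’s OpenAI-compatible server (enable it in the app; it defaults to port 1234). A minimal sketch – the model identifier here is an assumption, so check what `GET /v1/models` reports on your machine:

```python
# Sketch: querying the local model through LM Studio's OpenAI-compatible
# server. "qwen3.5-9b" is an assumed identifier -- check /v1/models for yours.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

report_text = open("report.txt", encoding="utf-8").read()  # pre-extracted text

response = client.chat.completions.create(
    model="qwen3.5-9b",  # assumed name
    messages=[
        {
            "role": "system",
            "content": "Ground your answers in the attached report, but add "
                       "clinical context from your own knowledge where useful.",
        },
        {
            "role": "user",
            "content": report_text
            + "\n\nReport all values outside typical reference ranges "
              "and explain each one in plain language.",
        },
    ],
)
print(response.choices[0].message.content)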

The model I’m using is Qwen 3.5 9B, released in early March 2026. The reason it runs well on my 8GB GPU is architectural – Qwen 3.5 uses Gated Delta Networks (GDN), which keeps the KV cache footprint significantly smaller than most models of this size, so I can raise the context length in LM Studio past the default without immediately hitting a wall. As for prompting, local models respond much better to explicit instructions than cloud models do. They also don’t infer context, so “analyze the following document and report all values outside typical reference ranges, explaining each one in plain language” will beat a vague question every time.
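
In practice, I keep that kind of instruction as a reusable template rather than retyping it. This one is hypothetical rather than lifted from my setup, but it shows the level of explicitness local models reward:

```python
# A hypothetical prompt template: numbered steps and an explicit escape hatch
# leave a local model very little to infer on its own.
EXPLICIT_PROMPT = """Analyze the following document.
1. List every value that falls outside its typical reference range.
2. Explain each flagged value in plain language.
3. If a value cannot be assessed from the document alone, say so explicitly.

Document:
{document}"""

def make_prompt(document: str) -> str:
    return EXPLICIT_PROMPT.format(document=document)
```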

The honest arguments for keeping NotebookLM

Compromises I can’t pretend don’t exist

NotebookLM runs on Gemini with a context window that can hold up to a million tokens per source – roughly 750,000 words, the equivalent of several very long books loaded at once. For comparison, LM Studio allows only five document uploads at a time, with a combined size of up to 30MB. Even with the context length raised and GDN trimming memory overhead, I’m working with a fraction of NotebookLM’s ceiling, and once documents get long, chunking takes over.

RAG retrieval works by scoring document chunks against your query and surfacing the most relevant ones – which is fine until the answer you need sits in a chunk that didn’t score well against the specific words you used. NotebookLM largely sidesteps this problem: it keeps so much in context at once that retrieval failures are far less common. For very long documents – years of lab results merged into a single file, a lengthy contract, a complete medical history – NotebookLM is the more reliable tool in terms of context, and I won’t pretend otherwise.
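
A toy example makes the failure mode visible. With naive word-overlap scoring (real systems use dense embeddings, which narrow but don’t close this gap), a chunk that merely echoes your wording outranks the chunk that actually holds the answer:

```python
# Toy illustration: the chunk echoing the query's words beats the chunk
# containing the actual answer, because they share no vocabulary.
query = "why am I tired all the time"
chunks = [
    "Ferritin 8 ng/mL, consistent with iron deficiency.",       # the real answer
    "Patient reports being tired all the time at this visit.",  # echoes the query
]

def overlap(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

for c in sorted(chunks, key=lambda c: overlap(query, c), reverse=True):
    print(f"{overlap(query, c):.2f}  {c}")
# The echo chunk wins; the ferritin line scores zero. Embeddings close much
# of this gap, but the same vocabulary mismatch can still bite.
```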

If you’d rather stay local even for longer documents, the most practical move is to split them before you start. Feed in sections rather than the entire file, and ask targeted questions per section. You can use a self-hosted tool like OmniTools for the splitting, so the whole workflow stays local.
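
If you’d rather script the split than use OmniTools, something like this works – the blank-line delimiter is an assumption about how your document is structured, so adjust it to match:

```python
# Pre-split a long report into section files before uploading pieces to
# LM Studio. Splitting on blank lines is an assumption; adjust as needed.
from pathlib import Path

source = Path("full_report.txt")
sections = [s.strip() for s in source.read_text().split("\n\n") if s.strip()]

out_dir = Path("report_sections")
out_dir.mkdir(exist_ok=True)
for i, section in enumerate(sections, start=1):
    (out_dir / f"section_{i:02d}.txt").write_text(section)
print(f"Wrote {len(sections)} sections to {out_dir}/")
```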

Some documents belong on your machine

NotebookLM isn’t going anywhere – I still use it constantly for research, work documents, reading stacks, anything where cloud storage isn’t a concern. But some documents just aren’t things I want to hand to Google, and my local LLM handled them better than I expected. The RAG isn’t perfect and the context ceiling is real, but for me that’s not a dealbreaker – it’s simply a different kind of tool for a different kind of document.