Running local LLMs every day for five months shattered every assumption I had about them

If I’m completely honest, I got into local LLMs because it seemed interesting. Running my own AI on my own hardware felt like something worth trying – not because I had a specific gap to fill or strong feelings about data privacy at the time. The novelty was the whole appeal at first.

The privacy angle came a little later and more gradually, as I learned more about how my data gets handled. There are some documents I’d rather not hand over to Google’s, OpenAI’s, or Anthropic’s servers, and once the local AI setup was working, reaching for it in those cases was a no-brainer. But that never turned it into a full replacement for cloud AI – I still have Claude open most of the day, for example. This is not an argument that local LLMs beat everything, because that is not my experience. It’s more that after four to five months of using both, I have a clearer idea of what I got wrong going in and what ended up mattering more than I expected.


The number of parameters is not the whole story

The spec that really matters

The number I looked at first was the parameter count – 7B, 12B, 20B, and so on – because that’s the number everyone talks about. Bigger means better, basically. That’s why I first opted for a general 20B model (gpt-oss), which I managed to get running thanks to GPU offloading. More parameters being better isn’t completely wrong, but it ended up being the least useful thing I paid attention to. What really bit me was the context window, which I had essentially ignored because I didn’t fully understand what it meant in practice at the time.

I’m using an RTX 3070 with 8GB of VRAM, and I ended up swapping my 20B model for a 9B one (Qwen 3.5) and got noticeably better results. The reason is its architecture: thanks to Gated DeltaNet, its memory footprint stays light as the context grows, while a standard transformer’s KV cache keeps climbing. So I can push the context to 60,000 tokens on 8GB without difficulty, which means longer sessions where the model actually remembers what we were doing an hour ago. Parameter count tells you something about what a model is capable of in theory. Architecture determines how much of that you can actually get out of the hardware you have.
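
To make that concrete, here’s a back-of-envelope sketch of why context length, not parameter count, is what eats VRAM on a standard transformer. The layer count, head count, and head dimension are illustrative assumptions for a 9B-class model, not the real specs of anything named above:

```python
# Rough sketch: the KV cache for softmax attention grows linearly with
# context length, no matter how good the weights are. All shapes here are
# assumptions for illustration, not real model specs.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    # Two cached tensors per layer (K and V), each of shape
    # (context_len, n_kv_heads * head_dim), stored at fp16 (2 bytes/element).
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

for ctx in (8_000, 32_000, 60_000):
    gb = kv_cache_bytes(n_layers=36, n_kv_heads=8, head_dim=128, context_len=ctx) / 1e9
    print(f"{ctx:>6} tokens -> ~{gb:.1f} GB of cache on top of the weights")

# Prints roughly 1.2, 4.7, and 8.8 GB: at 60k tokens the cache alone would
# overflow an 8GB card. Linear-attention designs like Gated DeltaNet keep a
# fixed-size state per layer instead, which is why long context gets cheap.
```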

It’s always there

This is the thing I didn’t know I would care about

Image: gpt-oss 20B in LM Studio on a desktop PC

This one surprised me, but I’m not complaining. Cloud AI may be more powerful, but it has its own failure modes: rate limits, message caps, API issues, server downtime, heavy-handed refusals, and so on. With a local model, none of that is a variable. It’s there when my internet isn’t. It’s there at 2 a.m. when I’ve apparently exhausted my Claude quota, even on the paid plan. And since there’s no quota to burn through, I can use it more freely. That availability was a genuinely nice surprise (although looking back, I don’t know why I expected anything less).

Your settings often do more work than the actual model

This is why your configuration matters

Image: temperature set to 0.7 in LM Studio

When I first opened LM Studio, I treated it like another chat window and ignored virtually everything else. My system prompt was blank, the temperature was at its default, and I was prompting it like a search engine. The responses seemed weak, and I assumed that was just the ceiling of local models. But dropping the temperature to 0.7, increasing the presence penalty, writing a system prompt that tells it who I am and what I want, and prompting iteratively instead of expecting a single response to land gave me much better results. That’s before we even get to context length, which is its own thing. The configuration you load a model with determines what it can do far more than most people expect.

A lot of the content I see is aimed squarely at “getting a better model”. Usually the actual problem lies in your runner’s settings panel.
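
All of those knobs are exposed programmatically too. Here’s a minimal sketch using LM Studio’s OpenAI-compatible local server (it listens on localhost:1234 by default); the model name and the prompts are placeholders, not recommendations:

```python
# Minimal sketch: the settings discussed above (system prompt, temperature,
# presence penalty) passed to a locally loaded model through LM Studio's
# OpenAI-compatible server. Model ID and prompt text are illustrative.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's default local endpoint
    api_key="lm-studio",                  # any non-empty string works locally
)

response = client.chat.completions.create(
    model="qwen3.5-9b",  # placeholder; use whatever model you have loaded
    messages=[
        # A system prompt that says who you are and what you want does a
        # surprising amount of the work.
        {"role": "system", "content": (
            "You are a concise technical assistant. I'm a hobbyist running "
            "you on an 8GB GPU; prefer short, direct answers."
        )},
        {"role": "user", "content": "Summarize my notes on KV cache growth."},
    ],
    temperature=0.7,       # lower than the default for steadier answers
    presence_penalty=0.4,  # nudges the model away from repeating itself
)
print(response.choices[0].message.content)
```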

The model-hopping trap

The benchmarking loop is its own hobby

Image: loading a model in LM Studio

There’s a phase at the beginning where every model that shows up in a Reddit thread or gets mentioned in a YouTube video goes into the queue, you run the same prompts against each of them, and somehow it becomes its own hobby that has nothing to do with actually using a local LLM. The comparisons are interesting if you approach them from a developer or benchmarking perspective – there are real differences worth measuring. But if your reasons for running locally are closer to mine – privacy, and just liking that it exists – the gap between decent models is much smaller than the time you’ll spend chasing it.

Once I landed on Qwen 3.5, I pretty much stopped looking. There are probably still options worth exploring (I’m giving Gemma 4 a spin just to see what the hype is about). But benchmark looping is often just a different hobby from actually using a local LLM.
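
For what it’s worth, the loop itself is only a few lines if you want to scratch that itch quickly and move on. Here’s a sketch against the same LM Studio endpoint as above, with placeholder model IDs standing in for whatever you happen to have downloaded:

```python
# Tiny sketch of the "same prompt, every model" phase, again through LM
# Studio's OpenAI-compatible server. The model IDs are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
prompt = "In two sentences, what does a longer context window cost you?"

for model in ("qwen3.5-9b", "gemma-4", "gpt-oss-20b"):
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    print(f"--- {model} ---\n{reply.choices[0].message.content}\n")
```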

I still use cloud models

It’s not a competition, just a complement

Image: Claude connected to Obsidian on a desktop PC

The framing that local LLM content loves is “replace your cloud AI”, and I can see why that’s appealing as a pitch – I’ve definitely phrased it that way myself. In reality, though, cloud AI isn’t going anywhere for me. I couldn’t get through my daily tasks without Claude, or my study material without Gemini, at this point – they’re extremely capable, and their products have dedicated features for things like search, organizing folders, study materials, automated tasks, and much more. There’s also a level of polish I can’t get locally.

But local AI remains my go-to for everything I’d rather keep on my machine. I don’t have to worry that my personal health or financial data will sit on Google’s or Anthropic’s servers for the next few years. And it’s nice to have, I’m not going to lie – I like the concept of having a chatbot on my terms and owning every conversation.

Useful, not transformative

The novelty got me through the door, and the privacy is what made me stay. That’s probably the most honest endorsement I can give local LLMs. They didn’t completely replace anything or blow past my expectations, and they required more setup than I expected. But now I have a tool that’s always available, has no usage limits, and lets me keep my data to myself – which is pretty cool in my book.