Techs and DIY enthusiasts, including two particular technologists on a certain podcast in Bellingham, have always shared their pursuits and explorations in experimenting with how to get the increasingly centralized web back to being decentralized. They tinker to find new, affordable ways to bring regular technologies or services back “home,” or localize them, so there is less reliance on cloud computing. It doesn’t matter if it is setting up a web server to save on hosting fees or building a NAS to bypass monthly cloud storage costs; the spirit of DIY’ing (doing it yourself) has always been about control, savings, and sharing what you learn. Now that same “roll your own” ethos is making its way into the fast-evolving landscape of AI, and Ollama is one of the best jumping-off points for anyone curious about bringing powerful LLM access to a local computer.
Two big factors go into this thought. The first is that Ollama is a simple, free, open-source app that provides an avenue to run on-device language models on a local computer. The second is that a user can select and install an LLM (or multiple models) without a heavy technical lift or setup. Because of the nature of how most AI processing works, reasonably modern hardware with a strong CPU and GPU is highly advised (at the time of writing I run models on a 2021 MacBook Pro with an M1 Pro), along with a healthy chunk of free hard drive space. Once Ollama and the model(s) are installed, it can be used in a hybrid fashion (mixing local and web AI) or fully offline, requiring no internet. The latter is a strong option for academics, or anyone, concerned about their data going back to a model for training.
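If you want to see the “on-device” part for yourself, here is a minimal sketch (in Python, using only the standard library) that asks the local Ollama service which models are installed. It is only a sketch: it assumes Ollama is already installed and running on its default port, 11434, and that the response fields match Ollama’s documented REST API at the time of writing. Nothing in it leaves your machine.

# Minimal sketch: list the models already installed on this machine.
# Assumes the Ollama background service is running on its default port (11434).
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    data = json.load(resp)

for model in data.get("models", []):
    size_gb = model["size"] / 1_000_000_000  # rough on-disk size in GB
    print(f"{model['name']}  (~{size_gb:.1f} GB on disk)")

Run it with Wi-Fi turned off if you want to convince yourself the inventory really is local.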

Making AI ‘Think Local’
Harnessing the Bellingham’ster feel of “shop local” or “think local,” a localized, or on-device, LLM can address some of the recurring critical concerns that constantly come up in conversations about AI: privacy, cost, and the environmental impact of large data centers. In academia, or for data-privacy-centric folks, one of the biggest benefits may be peace of mind: on-device LLMs keep data processing on the user’s machine rather than sending it up to a cloud LLM to be used to train and retrain models for profit.
Ollama: “It really whips the llama’s…” LLM
I wouldn’t be a true geek if I didn’t mention a formative llama from another piece of software from the early web days, with its own catchphrase about a llama: WinAmp. Just as that single app opened up options for so many people to play and control their (offline) digital music in the early 2000s, Ollama gives me similar vibes as it brings an accessible way to control AI offline. Are there trade-offs? Sure. These models may not be as quick to respond (although in my tests with Gemma3:4B it is snappy), and you have to be a little savvy about which models have the features you are looking for.

Ollama is an open-source AI platform, think of it as a container to run LLMs, made by an independent company based in Canada. This isn’t to be confused with Llama, which is an open-source LLM created by Meta. Ollama does have the option to run Meta’s Llama within it, as well as other models such as Mistral, Gemma, Qwen, and more. I need to emphasize that these LLM payloads are BIG, several gigabytes to even hundreds of gigabytes, so be sure you have piping hot internet, as the download and install will take time. I often say to click download and go off and make a cup of coffee or a sandwich to pass the time. The all-arounder I often use is Gemma 3:4B (made by Google), which is about 3GB, but its bigger Gemma 3:27B sibling jumps to roughly 20GB. The largest model I have tried to run is the 70GB latest version of Llama 4 (with vision), but on my hardware I was unable to get it to compute a response. Bear in mind that the full Llama 4 128x17B model is nearly 250GB. So there is a balance of picking a model that not only fits the needs you are looking for, but also fits the hardware (CPU and GPU) you plan to run it on.
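Once a model like Gemma 3:4B has finished downloading (for example, by running ollama pull gemma3:4b in a terminal), prompting it from a script is just another call to that same local service. The sketch below is a rough example, not gospel: it assumes the default localhost port, that the gemma3:4b tag matches what Ollama’s model library currently publishes, and that your hardware can actually hold the model you picked.

# Rough sketch: prompt a locally installed model through Ollama's REST API.
# Assumes `ollama pull gemma3:4b` has already been run and the service is on
# its default port; the response is generated entirely on this machine.
import json
import urllib.request

payload = json.dumps({
    "model": "gemma3:4b",          # swap in any model your hardware can handle
    "prompt": "In one sentence, what does 'local-first' computing mean?",
    "stream": False,               # return one JSON blob instead of a token stream
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])

If the model is too big for your CPU and GPU, this is where you will feel it: the call will crawl or fail outright, which is the same wall I hit with the 70GB Llama 4.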
Beyond privacy and cost benefits, open-source AI platforms like Ollama open up avenues, and new conversations, on AI usage, and have the potential to empower developers and organizations to look to decentralized AI computing, especially for the everyday or automated AI tasks most users rely on AI for. It also opens up the potential to better fine-tune the models used in those conversations without that data going back up to a web model. Ollama reports that its architecture also supports local server hosting, which would give enterprises the ability to do rapid prototyping for web development, retrieval-augmented generation (RAG) pipelines, and other internal hybrid deployments where local Ollama complements an enterprise cloud deployment. Compared to commercial platforms like OpenAI or Anthropic, Ollama trades instant scale for control and autonomy, making it a compelling choice for those prioritizing security and independence. There is still one debate that has to be addressed, though: the original training of the models being used. Many models are or were trained on the content of the internet, which means there are ongoing debates about the ethics of their use, since copyright may have been breached in their making, along with the complexities of bias in the output they render.
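To give a flavor of that “local server hosting” idea, the same API calls work when Ollama lives on another machine on your network instead of your own laptop; only the address changes. The sketch below is hypothetical: the 192.168.1.50 address is a stand-in for whatever box you run Ollama on, and it assumes that machine was started with the OLLAMA_HOST environment variable set so it accepts connections from the LAN rather than just localhost.

# Hypothetical sketch: talk to an Ollama instance hosted elsewhere on a local
# network (a shared lab or office box) rather than a public cloud service.
# The address below is made up; the remote machine needs Ollama started with
# OLLAMA_HOST set so it listens beyond localhost.
import json
import urllib.request

OLLAMA_SERVER = "http://192.168.1.50:11434"   # stand-in address for your own server

payload = json.dumps({
    "model": "mistral",                        # any model already pulled on that server
    "messages": [
        {"role": "user", "content": "Summarize our onboarding notes in two sentences."}
    ],
    "stream": False,
}).encode("utf-8")

req = urllib.request.Request(
    f"{OLLAMA_SERVER}/api/chat",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["message"]["content"])

The appeal for an organization is that the prompts and documents in that exchange never leave the building, which is exactly the kind of internal hybrid deployment described above.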


