M1 Pro vs M4 Max

New work laptop, so of course I had to benchmark its speed at running local LLMs. These results use the default 4-bit quantization, with ollama version 0.4.1.

Apple MacBook Pro M1 Pro (32GB RAM) (2021 model)

gemma2:9b: eval rate: 24.17 tokens/s
gemma2:27b: eval rate: 10.06 tokens/s
llama3.2:3b: eval rate: 52.10 tokens/s
llama3.1:8b: eval rate: 31.69 tokens/s

Apple MacBook Pro M4 Max (36GB RAM) (2024 model)

gemma2:9b: eval rate: 46.49 tokens/s
gemma2:27b: eval rate: 20.06 tokens/s
llama3.2:3b: eval rate: 99.66 tokens/s
llama3.1:8b: eval rate: 59.98 tokens/s

Conclusions

The 2024 laptop is roughly twice as fast as the 2021 one, and almost exactly the speed of an RTX 3080 (a 3-year-old Nvidia GPU), with more VRAM to play with, so quite nice. Still, cloud providers are an order of magnitude faster. ...
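As a quick sanity check on the "roughly twice as fast" conclusion, the per-model speedup can be computed directly from the eval rates listed above. This is only a minimal sketch: the dictionary and function names are made up for illustration, and the numbers are copied from the results in this post.

```python
# Eval rates (tokens/s) copied from the benchmark results above.
M1_PRO = {
    "gemma2:9b": 24.17,
    "gemma2:27b": 10.06,
    "llama3.2:3b": 52.10,
    "llama3.1:8b": 31.69,
}
M4_MAX = {
    "gemma2:9b": 46.49,
    "gemma2:27b": 20.06,
    "llama3.2:3b": 99.66,
    "llama3.1:8b": 59.98,
}


def speedups(old, new):
    """Return new/old eval-rate ratio for every model present in `old`."""
    return {model: new[model] / old[model] for model in old}


ratios = speedups(M1_PRO, M4_MAX)
for model, ratio in ratios.items():
    print(f"{model}: {ratio:.2f}x")
```

The ratios land between about 1.89x and 1.99x, which is consistent with the "roughly twice as fast" summary.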

14.11.2024 · 1 min · 130 words · Markus Stenberg

Pulumi (and pyinfra) at home

As noted in the previous Pulumi post, I had a bit too much to write about when describing my current home infrastructure. Due to that, here’s a stand-alone post about just that - Pulumi (and pyinfra) at home.

Current hobby architecture

To give a concrete example of how I am using Pulumi in my current hobby infrastructure, this is a simplified version of my hobby IaC architecture. There are a lot of containers both within and without Kubernetes that I am omitting from the diagram for clarity: the fw pyinfra/Pulumi provisioning configures local infrastructure, and the oraakkeli Pulumi stack (and two pyinfra configurations) handle my VPSes in Oracle Cloud. ...

8.11.2024 · 5 min · 892 words · Markus Stenberg

DSL (in DSL), or Pulumi?

I have used Terraform professionally and in hobby things every now and then for a couple of years now (most recently OpenTofu). I have tolerated it due to the ecosystem (as mentioned in an earlier blog post), but I have never particularly liked it. Why? The reasons are pretty much the same as why I am not a fan of Helm charts either.

DSLs are not expressive enough, nor powerful enough

Making something ‘human friendly’ (read: huge pile of YAML for devops people) is overrated. The cost of doing that is that automatically validating and formatting it becomes tricky, and the things expressed are mostly too loosely defined (‘sure, this is a string, but you are supposed to enter a URL here’). The tooling usually does not help much either: while programming languages have widespread support in editors, DSLs most of the time do not. Custom configuration languages are not usually much better - being limited by design is not great, nor is it great for integrating with ‘other’ things which use real programming languages. ...

6.11.2024 · 5 min · 1044 words · Markus Stenberg

iOS app backend language evaluation - Go or Rust?

I have been looking at how to create an iOS app recently, and more particularly, its backend. SwiftUI as a front-end framework these days is quite lovely, but I am not convinced that the Swift ecosystem is really good enough to do backend stuff - either on the device, or especially outside it (although Apple is making baby steps with Embedded Swift). UI, on the other hand, seems to be best done with Swift (and notably SwiftUI now). It seems considerably better than the Interface Builder-based Objective-C that I used last time around. ...

15.10.2024 · 3 min · 607 words · Markus Stenberg

Unifi was a sidegrade at best for our home networking

Now that we have used it for a couple of weeks (Unifi U6 Mesh + Unifi Express + Unifi Flex Mini switch), our experience can be summarized in one sentence: ‘Do not buy Unifi for mesh networking’.

What is wrong with it?

Backhaul, or lack of it

To elaborate, it seems that none of their access points have dedicated backhaul radios, which means that the same congested 5GHz radio band is used for both client-to-AP and AP-to-AP traffic. ...

13.9.2024 · 3 min · 523 words · Markus Stenberg

In the trenches with small LLMs, or, we need a (prompt) hero

TL;DR: The smaller the model, the stupider it is. And this is by a lot. gemma2 is where it is at, even in its 2b version, but at least for me, prompt engineering produced better results than tool calling with it.

I decided to do a write-up about this particular experience as I spent quite a bit of time recently staring at results, and writing things down is usually helpful to advance my own thinking. I did something similar last time, in July, but with less scope and less data. The outcome is still the same though. ...

13.9.2024 · 7 min · 1380 words · Markus Stenberg

Journey from Orbi to Unifi

TL;DR: Home network Wi-Fi upgrade, some observations about it.

Preface

I have enjoyed some home Wi-Fi kit (e.g. I think Apple’s Airport series was a simply brilliant piece of hardware AND software), and some I have tolerated. Most of the OpenWrt-based ones belong to the latter camp; while they work, setting up multi-node things has usually been clunky, or they somehow fail at awkward times, and that isn’t great.

The old setup (2020-2024)

We bought a Netgear Orbi mesh system (750 series) almost exactly four years ago. It replaced a more vanilla OpenWrt-based Turris Omnia, and brought with it an actually working mesh system. Most of the time. ...

26.8.2024 · 7 min · 1443 words · Markus Stenberg

It is 2024 and I could not find IPv6 abroad

Or, ‘NATs continue to be evil’, or ‘the more expensive the hotel, the stupider the captive portal system’.

TL;DR: When not at home, you realise how broken the internet access usually is.

Problem 1: Not enough addresses

Originally IPv4 addressing was designed with 2^32 addresses (some of which are reserved), which was supposed to be enough (and perhaps in the 70s and early 80s, it was a good assumption). The lack of addresses was seen as a problem and the IETF designed a solution for it in the 90s - IPv6 (cf. RFC 2460: Internet Protocol, Version 6 (IPv6) Specification). Unfortunately, due to various technical reasons its availability is still quite low - according to Google, fewer than half of the hosts have it even now (see Google IPv6 access statistics). ...
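To put the 2^32 figure in perspective, here is a small sketch using Python’s standard ipaddress module. The RFC 1918 private ranges are used only as one well-known example of reserved space (my choice for illustration, not an exhaustive list of reserved blocks), and the variable names are hypothetical.

```python
import ipaddress

# Total size of each address space, derived from the all-encompassing prefix.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

# RFC 1918 private ranges: one example of IPv4 space that is reserved
# and not globally routable (other reserved blocks exist too).
PRIVATE_RANGES = ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
private_total = sum(
    ipaddress.ip_network(prefix).num_addresses for prefix in PRIVATE_RANGES
)

print(f"IPv4 addresses:         {ipv4_total}")
print(f"RFC 1918 private space: {private_total}")
print(f"IPv6 is 2^128:          {ipv6_total == 2**128}")
```

Even before subtracting any reserved blocks, the whole IPv4 space is only about 4.3 billion addresses, which is why NAT ended up everywhere.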

2.8.2024 · 4 min · 792 words · Markus Stenberg

Playing with local gemma2

I tinkered a bit with Google’s new gemma2 model on my 32GB RAM M1 Pro. It seems quite useful so far, although I have dabbled with it for only a day or two. Here’s a summary of some of the things I tested with it.

Benchmarking

Using the script from earlier iterations:

    # Run the same prompt against each model and keep only the timing lines
    for MODEL in gemma2:27b-instruct-q5_K_M gemma2:27b \
        gemma2:9b-instruct-fp16 gemma2:9b-instruct-q8_0 gemma2 \
        llama3:8b-instruct-q8_0 llama3
    do
      echo "${MODEL}:"
      ollama run "$MODEL" --verbose 'Why is sky blue?' 2>&1 \
        | grep -E '^(load duration|eval rate)'
      echo
    done

with the following models: ...

2.7.2024 · 3 min · 536 words · Markus Stenberg

Playing with local LLMs (or not so local), part 2

This is a really brief follow-up on the earlier local LLM performance benchmarking post.

Nvidia RTX 3080

Today I decided to also check out the performance of the RTX 3080. Now that the Windows beta of ollama is available, testing it out was straightforward. As it turns out, it was almost exactly double the speed of the Apple Silicon hardware, e.g. the llama2:7B model produced around 80 tokens per second, but model load duration was a bit slower (3-4s -> 4.9s). ...

18.6.2024 · 1 min · 190 words · Markus Stenberg