NVidia L40S - reasonably priced LLM runner in the cloud?

As we are currently doing things in AWS, I wanted to evaluate AWS EC2 g6e.xlarge (32 GB RAM, 4 EPYC cores, with a 48 GB nvidia L40S GPU), as it seems to be the only AWS offering that is even moderately competitive at around 1,8$/hour. The other instance types wind up either with lots of (unneeded) compute compared to GPU, or have a ‘large’ number of GPUs, and in general the pricing seems quite depressing compared to their smaller competitors (e.g. https://datacrunch.io/ provides 2 L40S at 1,8$/hour, and 1 A100 is similarly priced). ...
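To make the price comparison above concrete, here is a minimal sketch that normalizes the two quoted L40S offers to cost per GPU-hour and per GB of VRAM. The prices are the approximate figures from the text; the helper name `cost_per_gpu_hour` and the data layout are only illustrative.

```python
# Offers quoted in the text: (name, number of L40S GPUs, approximate USD per hour).
offers = [
    ("AWS g6e.xlarge", 1, 1.8),
    ("DataCrunch 2x L40S", 2, 1.8),
]

L40S_VRAM_GB = 48  # VRAM per L40S, as stated in the text


def cost_per_gpu_hour(gpus: int, usd_per_hour: float) -> float:
    """Hourly price divided evenly across the GPUs in the offer."""
    return usd_per_hour / gpus


for name, gpus, price in offers:
    per_gpu = cost_per_gpu_hour(gpus, price)
    per_gb = per_gpu / L40S_VRAM_GB
    print(f"{name}: {per_gpu:.2f} $/GPU-hour, {per_gb:.4f} $/GB-VRAM-hour")
```

This ignores CPU, RAM, storage and network pricing, so it is only a rough way to compare GPU-centric offers.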

8.1.2025 · 4 min · 849 words · Markus Stenberg

Finally working modern mesh wireless network at home

TL;DR: Unifi mesh is bad, Orbi is pricey, TP-Link is surprisingly good. Recap (2024 home wifi history): I had Netgear Orbi (75x series) for 4 years (2020-2024). Last summer, I experimented with Unifi (see earlier posts); to put it bluntly, it sucked for mesh use, and I went back to the Orbis. The Orbi still did not support wifi 6E, which modern Macbook Pros need for more than 1200mbps wifi phy (= more than 600mbps data rate). So, I was on the hunt for more hardware. New challenger is found: In early Black Friday deals in mid-November, I spotted a TP-Link Deco BE65 set at a quite reasonable discount. On paper, it seemed quite promising. Why is that? ...

3.1.2025 · 4 min · 644 words · Markus Stenberg

M1 Pro vs M4 Max

New work laptop. So of course I had to benchmark its speed at running local LLMs. These results use the default 4-bit quantization, with ollama version 0.4.1.
Apple Macbook Pro M1 Pro (32GB RAM) (2021 model):
- gemma2:9b: eval rate: 24.17 tokens/s
- gemma2:27b: eval rate: 10.06 tokens/s
- llama3.2:3b: eval rate: 52.10 tokens/s
- llama3.1:8b: eval rate: 31.69 tokens/s
Apple Macbook Pro M4 Max (36GB RAM) (2024 model):
- gemma2:9b: eval rate: 46.49 tokens/s
- gemma2:27b: eval rate: 20.06 tokens/s
- llama3.2:3b: eval rate: 99.66 tokens/s
- llama3.1:8b: eval rate: 59.98 tokens/s
Conclusions: The 2024 laptop is roughly twice as fast as the 2021 one, and almost exactly as fast as an RTX 3080 (a 3-year-old nvidia GPU) with more VRAM to play with, so quite nice. Still, cloud providers are an order of magnitude faster. ...
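As a quick check of the "roughly twice as fast" conclusion, the per-model speedup can be computed directly from the eval rates listed above. This is just a sketch; the dictionary names are illustrative and the numbers are copied from the results.

```python
# Eval rates in tokens/s, copied from the benchmark results above.
m1_pro = {
    "gemma2:9b": 24.17,
    "gemma2:27b": 10.06,
    "llama3.2:3b": 52.10,
    "llama3.1:8b": 31.69,
}
m4_max = {
    "gemma2:9b": 46.49,
    "gemma2:27b": 20.06,
    "llama3.2:3b": 99.66,
    "llama3.1:8b": 59.98,
}

# Speedup of the M4 Max over the M1 Pro for each model.
speedups = {model: m4_max[model] / m1_pro[model] for model in m1_pro}

for model, ratio in speedups.items():
    print(f"{model}: {ratio:.2f}x")

mean_speedup = sum(speedups.values()) / len(speedups)
print(f"mean: {mean_speedup:.2f}x")
```

All four ratios land between roughly 1.9x and 2.0x, which matches the conclusion.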

14.11.2024 · 1 min · 130 words · Markus Stenberg

Pulumi (and pyinfra) at home

As noted in the previous Pulumi post, I had a bit too much to write about when describing my current home infrastructure. Due to that, here’s a stand-alone post about just that - Pulumi (and pyinfra) at home. Current hobby architecture: To give a concrete example of how I am using Pulumi in my current hobby infrastructure, this is a simplified version of my hobby IaC architecture. There are a lot of containers both within and without Kubernetes that I am omitting for clarity from the diagram: fw pyinfra/Pulumi provisioning configures local infrastructure, and the oraakkeli Pulumi stack (and two pyinfra configurations) handle my VPSes in Oracle Cloud. ...

8.11.2024 · 5 min · 892 words · Markus Stenberg

DSL (in DSL), or Pulumi?

I have used Terraform professionally and in hobby things every now and then for a couple of years now (most recently OpenTofu). I have tolerated it due to the ecosystem (as mentioned in an earlier blog post), but I have never particularly liked it. Why? The reasons are pretty much the same as why I am not a fan of Helm charts either. DSLs are not expressive enough, nor powerful enough. Making something ‘human friendly’ (read: huge pile of YAML for devops people) is overrated. The cost of doing that is that automatically validating and formatting it becomes tricky, and the expressed things are mostly too inaccurately defined (‘sure, this is a string, but you are supposed to enter a URL here’). The tooling usually does not help much either: while programming languages have widespread support in editors, DSLs most of the time do not. Custom configuration languages are not usually much better - being limited by design is not great, nor is it great for integrating with ‘other’ things which use real programming languages. ...

6.11.2024 · 5 min · 1044 words · Markus Stenberg

iOS app backend language evaluation - Go or Rust?

I have been looking at how to create an iOS app recently, and more particularly, its backend. SwiftUI as a front-end framework these days is quite lovely, but I am not convinced that the Swift ecosystem is really good enough to do backend stuff - either on the device, or especially outside it (although Apple is making baby steps with Embedded Swift). UI on the other hand seems to be best done with Swift (and notably SwiftUI now). It seems considerably better than the Interface Builder based Objective-C that I used last time around. ...

15.10.2024 · 3 min · 607 words · Markus Stenberg

Unifi was a sidegrade at best for our home networking

Now that we have used it for a couple of weeks (Unifi U6 Mesh + Unifi Express + Unifi Flex Mini switch), our experience can be summarized in one sentence: ‘Do not buy Unifi for mesh networking’. What is wrong with it? Backhaul, or lack of it: To elaborate, it seems that none of their access points have dedicated backhaul radios, which means that the same congested 5GHz radio band is used both for client-to-AP and for AP-to-AP traffic. ...

13.9.2024 · 3 min · 523 words · Markus Stenberg

In the trenches with small LLMs, or, we need a (prompt) hero

TL;DR: The smaller the model, the stupider it is. And this is by a lot. gemma2 is where it is at, even in its 2b version, but at least for me, prompt engineering produced better results than tool calling with it. I decided to do a write-up about this particular experience as I spent quite a bit of time recently staring at results, and writing things down is usually helpful to advance my own thinking. I last did something similar in July, but with less scope and less data. The outcome is still the same though. ...

13.9.2024 · 7 min · 1380 words · Markus Stenberg

Journey from Orbi to Unifi

TL;DR: Home network Wi-Fi upgrade, some observations about it. Preface: I have enjoyed some home wifi kit (e.g. I think Apple’s Airport series was a simply brilliant piece of hardware AND software), and some I have tolerated. Most of the OpenWrt based ones belong to the latter camp; while they work, setting up multi-node things has usually been clunky, or they somehow fail at awkward times, and that isn’t great. The old setup (2020-2024): We bought a Netgear Orbi mesh system (750 series) almost exactly four years ago. It replaced a more vanilla OpenWrt-based Turris Omnia, and brought with it an actually working mesh system. Most of the time. ...

26.8.2024 · 7 min · 1443 words · Markus Stenberg

It is 2024 and I could not find IPv6 abroad

Or, ‘NATs continue to be evil’, or ‘the more expensive the hotel, the stupider the captive portal system’. TL;DR: When not at home, you realise how broken the internet access usually is. Problem 1: Not enough addresses. Originally, IPv4 addressing was designed with 2^32 addresses (some of which are reserved), which was supposed to be enough (and perhaps in the 70s and early 80s, it was a good assumption). The lack of addresses was seen as a problem and the IETF designed a solution for it in the 90s - IPv6 (cf. RFC 2460: Internet Protocol, Version 6 (IPv6) Specification). Unfortunately, due to various technical reasons its availability is still quite low - according to Google, fewer than half of hosts have it even now (see Google IPv6 access statistics). ...
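To put the 2^32 figure in perspective, here is a small sketch using Python's standard `ipaddress` module to compare the IPv4 and IPv6 address spaces. The private (RFC 1918) blocks are included only as one well-known example of reserved IPv4 space; they are not the full reserved set.

```python
import ipaddress

# Total address space sizes, derived from the all-encompassing networks.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses  # 2**32
ipv6_total = ipaddress.ip_network("::/0").num_addresses       # 2**128

# RFC 1918 private ranges: one example of reserved IPv4 space (not exhaustive).
private_blocks = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]
private_total = sum(
    ipaddress.ip_network(block).num_addresses for block in private_blocks
)

print(f"IPv4 addresses: {ipv4_total:,}")
print(f"IPv6 addresses: {ipv6_total:.3e}")
print(f"RFC 1918 private IPv4 addresses: {private_total:,}")
print(f"IPv6 space is 2**{ipv6_total.bit_length() - ipv4_total.bit_length()} times larger")
```

With about 4.3 billion IPv4 addresses in total, there are fewer addresses than there are people, let alone devices, which is why NAT became so widespread.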

2.8.2024 · 4 min · 792 words · Markus Stenberg