Wednesday, May 17, 2023

libvirt surprise

I just noticed that some of my libvirt VMs had on_crash set to destroy instead of restart. It looks like there is an easy fix:

for vm in $( virsh list --name ) ; do virt-xml "$vm" --edit --events on_crash=restart ; done
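To see which VMs are affected before changing anything, a rough sketch like the following could help. Note that `virsh list --name` only returns running VMs, so this version uses `--all` and reads the persistent config with `--inactive`. The function names `on_crash_of` and `report_on_crash` are made up for illustration, and the `sed` pattern assumes the `<on_crash>` element sits on a single line, as it does in typical `virsh dumpxml` output.

```shell
# Print the on_crash value from a libvirt domain XML document read on stdin.
# Prints nothing if the XML has no <on_crash> element.
on_crash_of() {
    sed -n 's:.*<on_crash>\(.*\)</on_crash>.*:\1:p'
}

# List every defined VM (running or shut off) with its configured on_crash action.
report_on_crash() {
    for vm in $(virsh list --all --name); do
        printf '%s\t%s\n' "$vm" "$(virsh dumpxml --inactive "$vm" | on_crash_of)"
    done
}
```

Running `report_on_crash` prints one line per VM, so anything still showing `destroy` is easy to spot.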

I don't know if something changed in virt-manager/virt-install over the years, or if I ran into this a long time ago and forgot about it.

Now I just need to remember to add that --events option to virt-install in the future... 🙂
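One way to avoid having to remember is a small shell wrapper, sketched below. The name `vinstall` and the `VIRT_INSTALL` override variable are hypothetical, not part of virt-install. The override exists only so the wrapper can be pointed at something harmless (like `echo`) to preview the command.

```shell
# Hypothetical wrapper: always pass --events on_crash=restart to virt-install.
# Set VIRT_INSTALL=echo to print the arguments instead of creating a VM.
vinstall() {
    "${VIRT_INSTALL:-virt-install}" --events on_crash=restart "$@"
}
```

Everything after `vinstall` is passed straight through, so it can be used with the same options as virt-install itself.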

Tuesday, May 9, 2023

Harvester HCI

I have been using libvirt on CentOS + ZFS for my home lab for somewhere around a decade now.  For the last several years, I have been trying off and on to switch to some kind of hyperconverged infrastructure, usually oVirt + a clustered storage solution (Ceph, Gluster).  For various reasons, I've never quite managed to get all the pieces to work together correctly.

So, imagine how happy I was to hear about Harvester a while back.

Harvester is a modern Hyperconverged infrastructure (HCI) solution built for bare metal servers using enterprise-grade open source technologies including Kubernetes, Kubevirt and Longhorn.

I love everything about this!  It's a more modern take on hyperconverged infrastructure than what I was trying to assemble.  Plus, all my problems magically disappear when all the pieces work together out of the box, right?

Well...  Not quite.  I installed version 0.3.0.  It made for a cool demo, but, thanks to stability problems and a whole lot of missing features, it wasn't quite ready for anything resembling production use.  (Granted, this is my home lab, but I still run virtualized firewalls and stuff like that on it, so I need it to work, and work reliably.)

I'll note here that I wrote all of the above over a year ago.  I then closed with a list of reasons why Harvester wasn't good enough for me to actually use it at the time.  It (unintentionally!) sounded negative enough that I decided not to publish the post.

So here we are a year or so later.  After a few more failed attempts, I recently tried Harvester again, this time with version 1.1.1.  Everything I need to work seems to, and I'm ready to start migrating some real workloads!

That's not to say that everything is perfect.  There are a few useful features on the roadmap that I could benefit from (like anti-affinity rules, zero-downtime upgrades, ...), and I still have some challenges.  Some examples:

  • Automating node installation is ... let's say difficult?
  • Networking does almost everything I want, but I still haven't figured out how to move storage replication to a network with jumbo frames.
  • I want real certs.  I can see how to manage the certs manually, but it's not obvious how to manage them automatically (for example, with Let's Encrypt).

Thankfully, none of those things are keeping me from using Harvester.  They're just things to look forward to in future upgrades. 😀