Glusterfs is a network filesystem with many features, but the important ones
here are its ability to live on top of another filesystem, and to offer high
availability. If you have used SSHFS, it's quite similar in concept: it gives
you a "fake" filesystem from a remote machine, and as a user you can use it
just like normal without caring about where the files are actually stored,
beyond "over there I guess". Unlike SSHFS, Glusterfs can spread that storage
across multiple machines, similar to network RAID. If one machine goes down,
the data is still all there and available.
A few years ago I decided that I was tired of managing docker services per
machine and wanted them in a swarm. No more thinking! If a machine goes down,
the service is either still up (already replicated across servers, like this
blog), or will come up on another server once the swarm sees the service isn't
alive. This is all well and good until the SAN goes down. Now all of the data
is missing, the servers don't know it, and you basically have to kick the
entire cluster over to get it back alive. Not exactly ideal, to say the least.
While ZFS has kept my data very secure over the ages, it can't always prevent
machine oddity. I have had strange issues such as Ryzen bugs that could lock up
machines at idle, and a still-unexplained random networking hang (despite
changing 80% of the machine, including all disks, the operating system, and the
network cards) that resolves itself 10 seconds later, and so on. As much as I
always want to have a reliable machine, updates will require service restarts,
reboots need to happen, and honestly, I'm tired of having to babysit computers.
Docker swarm and NixOS are in my life because I don't want to babysit; I want
to solve problems once and be done with them. Storage stability was the next
nail to hit. Despite being arguably a small problem, it still reminded me that
computers exist when I wasn't in the mood for them to exist.
Glusterfs sits on top of a filesystem. This is the feature that sold me on it
over anything else. I have trusted my data to ZFS for many years, and have done
countless things that should have cost me data, including "oops, I deleted 2TB
of data on the wrong machine" and having to force power off machines (usually
SystemD reasons), and all of my data is safe. For the very few things it
couldn't save me from, it will happily tell me where the corruption is, and I
can replace that limited data from a backup. With all of that said, Glusterfs
happily lives on top of ZFS, even letting me use datasets just as I have been
for ages, while also letting me expand over several machines. There are a ton
of modes to Glusterfs, much as with any "RAID software", but I'm sticking to
what is effectively a mirror (RAID 1). Let's look at the hardware setup to
explain this a bit better.
planex
- Ryzen 5700
- 32GB RAM
- 2x16TB Seagate Exos
- 2x1TB Crucial MX500
pool
--------------------------
exos
  mirror-0
    wwn-0x5000c500db2f91e8
    wwn-0x5000c500db2f6413
special
  mirror-1
    wwn-0x500a0751e5b141ca
    wwn-0x500a0751e5aff797
--------------------------
morbo
- Ryzen 2700
- 32GB RAM
- 5x3TB Western Digital Red
- 1x10TB Western Digital (replaced a red when it died)
- 2x500GB Crucial MX500
red
  raidz2-0
    ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N3EVYXPT
    ata-WDC_WD100EMAZ-00WJTA0_1EG9UBBN
    ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N6ARC4SV
    ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N6ARCZ43
    ata-WDC_WD30EFRX-68N32N0_WD-WCC7K2KU0FUR
    ata-WDC_WD30EFRX-68N32N0_WD-WCC7K7FD8T6K
special
  mirror-2
    ata-CT500MX500SSD1_1904E1E57733-part2
    ata-CT500MX500SSD1_2005E286AD8B-part2
logs
  mirror-1
    ata-CT500MX500SSD1_1904E1E57733-part1
    ata-CT500MX500SSD1_2005E286AD8B-part1
--------------------------------------------
kif
- Intel i3 4170
- 8GB RAM
- 2x256GB Inland SSD
pool
-------------------------------
inland
  mirror-0
    ata-SATA_SSD_22082224000061
    ata-SATA_SSD_22082224000174
-------------------------------
These machines are a bit different in terms of storage layout. Morbo and Planex
both store decent amounts of data, and kif is there just to help validate
things, so it doesn't get much of anything. We'll see why later. Would giving
Morbo and Planex identical disk layouts increase performance? Yes, but so would
SSDs for all of the data. Tradeoffs.
I decided to make my setup simpler on all of my systems and just keep the mount
points for glusterfs the same. On each system, I created a dataset named
gluster and set its mountpoint to /mnt/gluster. This makes things a ton easier,
since I don't have to remember which machine has data where, and keeps things
streamlined. It may look something like this.
zfs create pool/gluster
zfs set mountpoint=/mnt/gluster pool/gluster
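Concretely, using the pool names from the layouts above, that works out to
something like this on each machine (a sketch; substitute your own pool and
dataset names):
# planex
zfs create exos/gluster
zfs set mountpoint=/mnt/gluster exos/gluster
# morbo
zfs create red/gluster
zfs set mountpoint=/mnt/gluster red/gluster
# kif
zfs create inland/gluster
zfs set mountpoint=/mnt/gluster inland/gluster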
If you have one disk, or just want everything on gluster, you could mount the
entire drive/pool to somewhere you'll remember, but I find it simplest to use
datasets, and I also have to migrate data that currently lives outside of
gluster on the same array into gluster. That's it for ZFS specific things.
gluster volume create media replica 2 arbiter 1 planex:/mnt/gluster/media morbo:/mnt/gluster/media kif:/mnt/gluster/media force
This may look like a blob of text that means nothing, so let's look at what it
does.
# Tells gluster that we want to make a volume named "media"
gluster volume create media
# Replica 2 arbiter 1 tells gluster to use the first 2 servers to store the
# full data in a mirror (replica) and set the last one as an arbiter. The
# arbiter acts as a tie breaker for the case that the replicas ever disagree
# and you need a source of truth. It costs VERY little space to store this.
replica 2 arbiter 1
# The server name, and the path that we are using to store data on them
planex:/mnt/gluster/media
morbo:/mnt/gluster/media
kif:/mnt/gluster/media
# Normally you want gluster to create its own directory. When we use datasets,
# the folder will already exist. Be aware that this can cause issues if you
# point it at the wrong place, so check first
force
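One prerequisite I glossed over: the servers have to already be in a trusted
pool before the create will succeed. Assuming the hostnames resolve on your
network, it's just a couple of commands from any one node, for example:
# Run from planex (or any one node); each probe adds a server to the pool
gluster peer probe morbo
gluster peer probe kif
# Verify that all peers show as connected
gluster peer status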
If all goes well, you can start the volume with
gluster volume start media
You'll want to check the status once it's started, and it should look something
like this.
Status of volume: media
Gluster process                            TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick planex:/mnt/gluster/media            57715     0          Y       1009102
Brick morbo:/mnt/gluster/media             57485     0          Y       1530585
Brick kif:/mnt/gluster/media               54466     0          Y       1015000
Self-heal Daemon on localhost              N/A       N/A        Y       1009134
Self-heal Daemon on kif                    N/A       N/A        Y       1015144
Self-heal Daemon on morbo                  N/A       N/A        Y       1854760

Task Status of Volume media
------------------------------------------------------------------------------
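Two other stock gluster commands worth knowing once the volume is up: volume
info shows the configuration you created, and heal info tells you whether the
replicas are in sync, which is the first thing to check after a node has been
down for a while.
# Show the volume's configuration (bricks, replica count, options)
gluster volume info media
# Show any files that still need to be healed onto a brick
gluster volume heal media info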
With that taken care of, you can now mount your Gluster volume on any machine
that you need! Just follow the normal instructions for your platform to install
Gluster, as they will be different for each. On NixOS at the time of writing,
I'm using this to manage Glusterfs on any machine hosting storage for my docker
swarm.
https://git.kdb424.xyz/kdb424/nixFlake/src/commit/5a1c902d0233af2302f28ba30de4fec23ddaaac9/common/networking/gluster.nix
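On a more traditional distro, installation is usually just the gluster server
package plus enabling the daemon. A rough example for a Debian/Ubuntu style
system (package and unit names may vary by distro):
# Install the gluster server and enable the management daemon
apt install glusterfs-server
systemctl enable --now glusterd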
Once a volume is started, you can mount it pointing at any machine that has data
in the volume. In my case I can mount from planex, morbo, or kif, and even if
one goes down, the data is still served. You can treat this mount identically to
storing files locally or over NFS/SSHFS, and any data stored on it will be
replicated and kept highly available if a server needs to go down for
maintenance or has issues. This provides a bit of a backup (in the same way that
a RAID mirror does; never rely on online machines for a full backup), so not
only can you get higher uptime on data, but if you currently replicate data on a
schedule as a backup to a machine that's always on, this does that in real time,
which is a nice side effect.
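As a sketch of what that mount looks like, using the volume and servers from
above (the mount point is whatever you like): the FUSE client takes a server
and a volume name, and the backup-volfile-servers option, where supported, lets
the client fall back to another node if the named one is down at mount time.
# Mount the "media" volume, pulling the volume layout from planex
mount -t glusterfs planex:/media /mnt/media
# Or, with fallbacks in case planex is down when the mount happens
mount -t glusterfs -o backup-volfile-servers=morbo:kif planex:/media /mnt/media
After the mount, the client talks to all of the bricks directly, not just the
server you named, so the server in the mount command is only the starting point.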
With my docker swarm able to keep serving through odd quirks without
interruption, and with gluster replacing my need for ZFS send/recv backups
between live machines (please have a cold storage backup in a fire box if you
care about your data, along with an off site backup), I can continue to mostly
forget that computers exist and stay focused on the problems that are fun to
solve, like eventually setting up email alerts for ZFS scrubs or S.M.A.R.T.
scans with any drive warnings. Yes, I could host my data elsewhere, but even
ignoring the insane cost that I won't pay, I get to actually own my data and
not have a company creeping on things. Just because I have nothing to hide
doesn't mean I leave my door unlocked.
Some future ideas to take this further:
- Dual network paths. A single network switch or cable can still knock machines offline.
- Dual routers! Router upgrades always take too long. 5 minutes offline isn't
acceptable these days!
- Discover the true power of TempleOS.