btrfs

Readup

The ugly

No space left

Btrfs can fail to create new files despite seemingly still having space. This happened to me after about a year, on two machines simultaneously, so it is something to watch for. The reason is something about metadata chunking and block allocation - I'm not enough of a nerd to understand this fully, but roughly: btrfs hands raw disk space to data and metadata in chunks, and once the whole device is allocated to chunks, metadata can't grow even if the data chunks are half-empty.

The issue: touch /fs/file fails with "no space left on device" while df shows plenty of space available. The sign that the cause is btrfs chunks: btrfs fi show reports the device as (almost) fully allocated - used roughly equals size - while btrfs fi df shows lots of unused space inside the data chunks.

Alternatively you could be running out of inodes (df -i to check), but btrfs specifically doesn't work like that - it allocates inodes dynamically, so there's no fixed pool to exhaust.
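A quick way to check all of this, assuming the filesystem is mounted at / (a sketch):

> df -h /                # classic df: still shows free space
> df -i /                # inode check; btrfs allocates inodes dynamically, so not it
> sudo btrfs fi show /   # "used" ~= "size" on the device: everything is allocated to chunks
> sudo btrfs fi df /     # ...while the data chunks themselves still have room inside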

Steps to fix this:

- Check the state with btrfs fi show and btrfs fi df.
- Balance data chunks in small batches: btrfs balance start / -dlimit=N, raising N until nothing more gets relocated (or use the usage filter, sketched below).
- Do the same for metadata with -mlimit / -musage.
- If you can afford the time, finish with a full balance.
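The usage filter is the commonly recommended alternative to limit: relocate only chunks that are emptier than a threshold. A sketch, with arbitrary thresholds:

> sudo btrfs balance start / -dusage=5
> sudo btrfs balance start / -dusage=25
> sudo btrfs balance start / -dusage=50
# same idea for metadata chunks
> sudo btrfs balance start / -musage=25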

Tips:

- Small batches first: each pass is quick and interruptible; a full balance can run for a long time.
- Treat this as periodic maintenance; don't wait until the device is 100% allocated.

This is how it went for me. I ran the recommended steps on two hosts. Keep in mind that I did this once the actual no-space-left errors had stopped coming, so it was maintenance rather than a rescue.

Host 1: State before balancing:

> sudo btrfs filesystem df /
Data, single: total=223.92GiB, used=172.65GiB
System, DUP: total=32.00MiB, used=48.00KiB
Metadata, DUP: total=7.00GiB, used=6.43GiB
GlobalReserve, single: total=406.55MiB, used=0.00B
> sudo btrfs fi show /
Label: 'ROOT'  uuid: db5cab8c-391d-4fa6-aa1a-c5b9e0c4a892
       Total devices 1 FS bytes used 179.08GiB
       devid    1 size 237.99GiB used 237.98GiB path /dev/nvme0n1p2

Data balancing:

> sudo btrfs balance start / -dlimit=10
Done, had to relocate 0 out of 236 chunks
# took 49s
> sudo btrfs balance start / -dlimit=20
Done, had to relocate 8 out of 236 chunks
> sudo btrfs balance start / -dlimit=30
Done, had to relocate 16 out of 232 chunks
> sudo btrfs balance start / -dlimit=40
Done, had to relocate 22 out of 216 chunks
# took 4m12s
> sudo btrfs fi show /
Label: 'ROOT'  uuid: db5cab8c-391d-4fa6-aa1a-c5b9e0c4a892
        Total devices 1 FS bytes used 179.08GiB
        devid    1 size 237.99GiB used 205.91GiB path /dev/nvme0n1p2
> sudo btrfs fi df /
Data, single: total=188.01GiB, used=172.66GiB
System, DUP: total=32.00MiB, used=48.00KiB
Metadata, DUP: total=8.92GiB, used=6.42GiB
GlobalReserve, single: total=391.67MiB, used=0.00B

Metadata balancing:

> sudo btrfs balance start / -mlimit=10
Done, had to relocate 10 out of 201 chunks
# took 6m49s
> sudo btrfs balance start / -mlimit=199
Done, had to relocate 0 out of 199 chunks
# I don't know why it found two more chunks to rebalance after supposedly
# checking all of them with limit=199. Running with a higher usage filter
# took too long.
> sudo btrfs balance start / -musage=40
Done, had to relocate 2 out of 199 chunks
> sudo btrfs fi df /
Data, single: total=188.01GiB, used=172.66GiB
System, DUP: total=32.00MiB, used=48.00KiB
Metadata, DUP: total=8.00GiB, used=6.41GiB
GlobalReserve, single: total=385.98MiB, used=0.00B
> sudo btrfs fi show /
Label: 'ROOT'  uuid: db5cab8c-391d-4fa6-aa1a-c5b9e0c4a892
        Total devices 1 FS bytes used 179.07GiB
        devid    1 size 237.99GiB used 204.07GiB path /dev/nvme0n1p2

As you can see, the metadata rebalance didn't free up that much space (~2GB of allocated space per show, ~10MB of metadata used per df).

Full rebalance:

> sudo btrfs balance start /
WARNING:
        Full balance without filters requested. This operation is very
        intense and takes potentially very long. It is recommended to
        use the balance filters to narrow down the scope of balance.
        Use 'btrfs balance start --full-balance' option to skip this
        warning. The operation will start in 10 seconds.
        Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting balance without any filters.
Done, had to relocate 148 out of 198 chunks
> sudo btrfs fi show /
Label: 'ROOT'  uuid: db5cab8c-391d-4fa6-aa1a-c5b9e0c4a892
        Total devices 1 FS bytes used 178.98GiB
        devid    1 size 237.99GiB used 192.06GiB path /dev/nvme0n1p2
> sudo btrfs fi df /
Data, single: total=174.00GiB, used=172.61GiB
System, DUP: total=32.00MiB, used=48.00KiB
Metadata, DUP: total=9.00GiB, used=6.37GiB
GlobalReserve, single: total=347.31MiB, used=0.00B

Host 2: State before balancing:

> sudo btrfs fi show /
Label: 'NIXOS'  uuid: 4c012b90-4a7f-4044-b8ed-12e4d6af067a
       Total devices 1 FS bytes used 96.29GiB
       devid    1 size 180.00GiB used 180.00GiB path /dev/sdc2
> sudo btrfs fi df /
Data, single: total=172.72GiB, used=93.86GiB
System, DUP: total=32.00MiB, used=48.00KiB
Metadata, DUP: total=3.61GiB, used=2.43GiB
GlobalReserve, single: total=214.72MiB, used=0.00B

This one is really bad, as you can see: the device is 100% allocated (180 of 180 GiB) while only ~94 GiB is actually used.

Data balancing:

> sudo btrfs balance start / -dlimit=10
Done, had to relocate 10 out of 184 chunks
> sudo btrfs balance start / -dlimit=20
Done, had to relocate 20 out of 176 chunks
> sudo btrfs balance start / -dlimit=30
Done, had to relocate 30 out of 167 chunks
> sudo btrfs balance start / -dlimit=40
Done, had to relocate 40 out of 157 chunks
> sudo btrfs fi show /
Label: 'NIXOS'  uuid: 4c012b90-4a7f-4044-b8ed-12e4d6af067a
       Total devices 1 FS bytes used 96.54GiB
       devid    1 size 180.00GiB used 152.29GiB path /dev/sdc2
> sudo btrfs fi df /
Data, single: total=145.01GiB, used=94.12GiB
System, DUP: total=32.00MiB, used=48.00KiB
Metadata, DUP: total=3.61GiB, used=2.42GiB
GlobalReserve, single: total=239.33MiB, used=0.00B

As you can see, compared to host 1, here btrfs had to move every chunk it checked.

Full rebalance:

> sudo btrfs balance start /
WARNING:
        Full balance without filters requested. This operation is very
        intense and takes potentially very long. It is recommended to
        use the balance filters to narrow down the scope of balance.
        Use 'btrfs balance start --full-balance' option to skip this
        warning. The operation will start in 10 seconds.
        Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting balance without any filters.
Done, had to relocate 158 out of 158 chunks
> sudo btrfs fi show /
Label: 'NIXOS'  uuid: 4c012b90-4a7f-4044-b8ed-12e4d6af067a
        Total devices 1 FS bytes used 95.24GiB
        devid    1 size 180.00GiB used 104.06GiB path /dev/sdc2
> sudo btrfs fi df /
Data, single: total=96.00GiB, used=92.84GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=4.00GiB, used=2.40GiB
GlobalReserve, single: total=219.78MiB, used=0.00B

Here btrfs had to rebalance every single chunk, nearly halving the allocated space. Real usage of allocated data space went from ~54% to ~97%, much more efficient. Allocated space on disk went from 100% (which was the reason for the no-space-left errors, I assume) to ~58%.

What did this teach me? You should really read about the filesystems you use before they fail on you at the worst possible time. Another observation: this operation didn't free up any df space - on host 1 I ended up with ~5GB less, even - it was just btrfs maintenance.

Bonus section. If you have node_exporter, you can monitor these stats with the following PromQL queries:

# allocated space usage rate - used vs allocated data chunks
node_btrfs_used_bytes{block_group_type="data"} /
  node_btrfs_size_bytes{block_group_type="data"}
# fs percentage allocated - i.e. how much is seen as used by btrfs
1 - (node_btrfs_device_unused_bytes / on(device) max by(device)
  (label_replace(node_filesystem_size_bytes{fstype="btrfs"}, "device", "$1",
    "device", ".*/(.*)")))

General

good.

COW

moo.

Subvols

Like directories, but can be manipulated like file systems: mounted, snapshotted and such. This stuff is kinda like bind mounts.

Commands:
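The basics (paths hypothetical):

> sudo btrfs subvolume create /subvolumes/data   # new empty subvol
> sudo btrfs subvolume list /                    # all subvols with their ids
> sudo btrfs subvolume show /subvolumes/data     # details of one subvol
> sudo btrfs subvolume delete /subvolumes/data   # goodbye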

Options
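The ones that matter here are the mount options for picking a subvolume (device path hypothetical):

> sudo mount -o subvol=/subvolumes/home /dev/nvme0n1p2 /home
> sudo mount -o subvolid=257 /dev/nvme0n1p2 /home   # same thing by id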

Nested vs. flat

There are several basic schemes for laying out subvolumes (and snapshots). $ Flat All subvols are children of the root and later mounted into appropriate directories. E.g. you can create all your subvolumes in /subvolumes - a plain directory under the root - and use a /snapshots directory for snapshots.

$ Nested Subvols are created anywhere in the file hierarchy - typically right in their desired locations.

Comparison:

- Flat: subvols stay independent - a snapshot never drags in siblings, and rolling back is just mounting a different subvol - at the cost of an fstab entry per subvol.
- Nested: no extra mounts to manage, but children sit inside the parent's tree, which makes rollbacks and deletions awkward.
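A flat-layout sketch (device and names made up):

# mount the actual fs root (subvolid=5) somewhere out of the way
> sudo mount -o subvolid=5 /dev/nvme0n1p2 /mnt/btrfs-root
> sudo mkdir /mnt/btrfs-root/subvolumes /mnt/btrfs-root/snapshots
> sudo btrfs subvolume create /mnt/btrfs-root/subvolumes/root
> sudo btrfs subvolume create /mnt/btrfs-root/subvolumes/home
# then fstab mounts each subvol into place:
# /dev/nvme0n1p2  /      btrfs  subvol=/subvolumes/root  0 0
# /dev/nvme0n1p2  /home  btrfs  subvol=/subvolumes/home  0 0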

Subvolume ID's

Displayed by doas btrfs subvolume list <path>. IDs start at 256 and increase by 1 for every new subvol. The FS root always has subvol name / and id 5; when a btrfs partition is mounted without an explicit subvolume, subvolid=5 is assumed. list also shows a top level id - the id of the containing subvol - which is most often 5, the root.
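What the output looks like (values made up):

> doas btrfs subvolume list /
ID 256 gen 104293 top level 5 path subvolumes/root
ID 257 gen 104293 top level 5 path subvolumes/home
ID 270 gen 98511 top level 5 path snapshots/root-2024-01-01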

Snapshots

Snapshots are a diffing VCS at the fs level. $ Snapshot is a subvolume with added content: it holds references to current and/or past versions of files (inodes). See COW for how this is possible. A subvol snapshot won't contain child subvols - so with subvols / and /home, snapshotting / won't copy /home. Snapshots are created with btrfs subvolume snapshot <opts> <from> <to>. The benefit of snapshots over simple copying is COW - small changes to files cause small changes to disk usage. This can be verified with compsize.
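For example (paths made up; -r makes the snapshot read-only, which btrfs send requires later):

> sudo btrfs subvolume snapshot / /snapshots/root-before-upgrade
> sudo btrfs subvolume snapshot -r /home /snapshots/home-2024-01-01
# see how little extra space the snapshot actually takes
> sudo compsize /home /snapshots/home-2024-01-01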

Backups

Storing snapshots on a single fs is convenient but not really safe - an fs failure will take the snapshots down with it. Remedying this requires efficiently exchanging information about snapshots between btrfs (or between btrfs and non-btrfs) file systems.

To back up a subvolume: take a read-only snapshot of it and pipe btrfs send of that snapshot into btrfs receive on the target filesystem (see the sketch below).

To utilize shared nodes between sent snapshots: send incrementally with btrfs send -p <parent>, where <parent> is an earlier snapshot already present on both sides; only the difference is transferred.

To restore a subvolume from a snapshot: send it back with btrfs send | btrfs receive, then snapshot the received read-only copy into a writable subvolume in place of the lost one.
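A sketch of the whole cycle, assuming a second btrfs mounted at /mnt/backup (all paths made up):

# back up: send a read-only snapshot to the other fs
> sudo btrfs subvolume snapshot -r /home /snapshots/home-1
> sudo btrfs send /snapshots/home-1 | sudo btrfs receive /mnt/backup
# incremental: -p sends only the delta against a parent both sides have
> sudo btrfs subvolume snapshot -r /home /snapshots/home-2
> sudo btrfs send -p /snapshots/home-1 /snapshots/home-2 | sudo btrfs receive /mnt/backup
# for a non-btrfs target, dump the stream into a file instead: btrfs send -f home-2.stream ...
# restore: send back, then make the read-only copy writable via a snapshot
> sudo btrfs send /mnt/backup/home-2 | sudo btrfs receive /subvolumes
> sudo btrfs subvolume snapshot /subvolumes/home-2 /subvolumes/home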

Compression

Normally applied filesystem-wide via the compress= mount option - you can't enable it per subvolume mount, since all mounts of the same fs share options.
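E.g. with zstd (level and paths are just examples):

# in fstab or via remount; applies to data written from then on
> sudo mount -o remount,compress=zstd:3 /
# rewrite existing files compressed (careful: defragment unshares extents,
# so snapshots start taking real space)
> sudo btrfs filesystem defragment -r -czstd /home
# check the result
> sudo compsize /home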

Tools

compsize - reports actual on-disk usage after compression and extent sharing; mentioned above.