bcachefs

Thanksgiving update

Added 2022-11-28 21:46:16 +0000 UTC

There's so much that goes into developing a real filesystem. Especially one that's intended to be good enough to replace our existing filesystems, codebases that have had decades of refinement by teams of engineers.

Some days it can feel a bit overwhelming.

A filesystem has to be fast. But performance isn't just a matter of taking a codepath and optimizing it until it's fast - though there is certainly a lot of that work: studying a profile to find the hot spots, then looking for ways to reshuffle code, eliminate branches, and move data around to improve locality. The kind of work that old C hands love, and assembly programmers before us.

The more difficult performance work though is always about evaluating tradeoffs:

"Is this optimization worth the complexity it adds to the code?"

"Can we add a data structure to make this operation faster without slowing down the codepaths that'll need to update it?"

"How do we design this data structure to be fast for the common operations, and what do the common operations look like?"

Challenging, but rewarding.

A good filesystem has to perform predictably - predictable behavior in the uncommon codepaths and worst-case scenarios is just as important as fast-path, best-case behavior.

A filesystem has to be dependable - it has to not crash, it can't lose data, it has to handle damage while recovering as much data as possible. When we have to fail, we should fail gracefully.

A good filesystem should be understandable and debuggable. Bugs will happen, and they will happen at the worst times: on user systems with massive filesystems that are impossible to get access to, hitting scenarios the tests never imagined. Log messages must give as much information as they can, in structured formats that are easy to read and parse, and it should be possible to introspect the internal data structures of the filesystem - both in memory and on disk - as much as possible, at runtime, while the filesystem is in use. A good filesystem ought to tell you what it is doing, and why.

A good filesystem should have a well organized, readable codebase. That means devoting time to go back and clean things up when it's noticed that a mess has gotten out of control, or when an idea comes for a new organizational method. Codebases have a way of growing and growing, and continual effort is required to keep the complexity under control.

Thought must always be given to the engineers who will come after, who will be tasked with learning and understanding and maintaining our work.

A good filesystem must have features! Oh, so many features! People can be ever so inventive in coming up with new ideas for things filesystems ought to do - and some of their ideas are even good and worthwhile, and come to be things that people expect :)

Ideas for new features come from within and without; as we who are building the system build the tools to build the tools, we naturally notice new ways these tools could be used. Deciding how to spend our precious time, and which features to implement - so many decisions to make.

Developing a good filesystem often isn't just about writing the filesystem itself, it's all the tools and processes that go into it - learning which ones are worthwhile and make our lives easier, and imagining and creating new tools when the existing ones aren't good enough.

Good tools can be immensely satisfying when they become smooth, trouble-free parts of our workflow - I think woodworkers must get some of the same satisfaction from their tools as I do when I push code to my CI and get results back from hours and hours' worth of tests in about 15 minutes.

Still so much work to do.

But - it's an immensely rewarding feeling when everything clicks and the code for a new feature comes together smoothly, from start to finish.

On that note: bcachefs now has nocow mode. Nocow mode turns bcachefs into a normal filesystem with in-place updates (like ext4 or xfs): you don't get the fancy data path features (data checksumming, compression or encryption), but random writes within a file will be faster and won't cause fragmentation. Snapshots still work; taking a snapshot will cause a normal COW write, and then we'll go back to nocow writes afterwards. Fallocate works as expected - we now have unwritten extents.
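For anyone who wants to poke at this from userspace, here is a minimal sketch of the access pattern nocow mode is aimed at: preallocate a file, then overwrite a region in the middle of it. It uses only generic POSIX calls from Python's standard library and nothing bcachefs specific; the function name and the sizes are made up for illustration, and whether the overwrite really lands in place depends on the filesystem and options in use.

```python
import os
import tempfile


def preallocate_and_overwrite(path, size, offset, payload):
    """Reserve `size` bytes for a file, then overwrite a region inside it.

    Hypothetical helper for illustration only; returns the file size, the
    first four bytes of the file, and the bytes read back from `offset`.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Reserve the space up front. On a filesystem with unwritten extents
        # this can allocate blocks without writing any data to them.
        os.posix_fallocate(fd, 0, size)
        # A "random write" somewhere inside the preallocated range.
        os.pwrite(fd, payload, offset)
        os.fsync(fd)
        # Regions that were reserved but never written read back as zeros.
        head = os.pread(fd, 4, 0)
        body = os.pread(fd, len(payload), offset)
        return os.fstat(fd).st_size, head, body
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        result = preallocate_and_overwrite(
            os.path.join(tmp, "data.bin"), 1 << 20, 4096, b"hello"
        )
        print(result)
```

Run against a file on a nocow bcachefs mount, this is the kind of workload where the in-place write path should matter; on any other filesystem it behaves the same from the caller's point of view, which is rather the point.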

Nocow mode is both a filesystem level option and an inode option, like other data path options.

And the best part is - it was about a two week process, start to finish - complete with new tests in the CI. There are still some failing tests that check -ENOSPC on a completely full filesystem - silly POSIX compliance stuff that most users won't care about. To fix that, we'll need to check for disk reservations in the pagecache in the fallocate path. But for normal users, it should just work.

Nocow mode is a new on-disk format version, and this time it's not a required upgrade - mount with -o version_upgrade to enable it.

Happy Thanksgiving!