bcachefs


New allocator has been merged!


It's a mandatory disk format upgrade; when switching to the new version on an

existing filesystem you'll see it initialize the freespace btree when you mount.


What's changed: we've got some new persistent data structures that replace code

that used to periodically walk all the buckets in the filesystem, which were kept in an

in-memory array - and now that we don't need to do that anymore, the in-memory

bucket array is gone, too. Specifically, we've got:


 - A new hash table for buckets awaiting journal commit before they can be

   reused, using cuckoo hashing (this one was rolled out a while ago)


 - An extents-style freespace btree, to replace the code in the old allocator

   threads that periodically walked the arrays of buckets to build up freelists


 - A btree of buckets that need discarding before being moved to the freespace

   btree


 - A new LRU btree, for buckets containing cached data - replacing code in the

   allocator threads that would scan buckets and build up a heap of buckets to

   be reused.
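
To make the first item concrete, here is a minimal sketch of a cuckoo-hashed table in C. It is not the bcachefs implementation: the struct and function names, the fixed table size, the hash multipliers and the kick limit are all assumptions chosen for illustration. Each bucket has two candidate slots; lookup checks just those two, and insert moves an occupant to its alternate slot when both are taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_BITS  6
#define NR_SLOTS   (1U << SLOT_BITS)   /* hypothetical fixed size */
#define MAX_KICKS  32                  /* hypothetical eviction limit */

struct pending_bucket {
	uint64_t bucket;       /* bucket number (hypothetical key) */
	uint64_t journal_seq;  /* must be flushed before reuse; 0 = empty slot */
};

struct pending_table {
	struct pending_bucket slots[NR_SLOTS];
};

/* Two multiplicative hashes; the top SLOT_BITS bits pick the slot. */
static size_t slot_for(uint64_t bucket, unsigned nr)
{
	static const uint64_t mult[2] = {
		0x9e3779b97f4a7c15ULL, 0xc2b2ae3d27d4eb4fULL,
	};
	return (size_t) ((bucket * mult[nr]) >> (64 - SLOT_BITS));
}

/* Lookup only ever inspects the bucket's two candidate slots. */
static struct pending_bucket *pending_find(struct pending_table *t, uint64_t bucket)
{
	for (unsigned nr = 0; nr < 2; nr++) {
		struct pending_bucket *e = &t->slots[slot_for(bucket, nr)];

		if (e->journal_seq && e->bucket == bucket)
			return e;
	}
	return NULL;
}

/*
 * Returns true on success. On failure, *ent holds whichever entry was left
 * without a slot (not necessarily the one passed in); a real table would
 * grow and reinsert it.
 */
static bool pending_insert(struct pending_table *t, struct pending_bucket *ent)
{
	struct pending_bucket *e;
	size_t idx;

	assert(ent->journal_seq != 0);  /* 0 is reserved for "empty slot" */

	e = pending_find(t, ent->bucket);
	if (e) {
		/* Already present: keep the newest sequence number. */
		if (ent->journal_seq > e->journal_seq)
			e->journal_seq = ent->journal_seq;
		return true;
	}

	/* Use an empty candidate slot if there is one. */
	for (unsigned nr = 0; nr < 2; nr++) {
		e = &t->slots[slot_for(ent->bucket, nr)];
		if (!e->journal_seq) {
			*e = *ent;
			return true;
		}
	}

	/* Both taken: evict, then rehome the victim in its other slot. */
	idx = slot_for(ent->bucket, 0);
	for (unsigned kick = 0; kick < MAX_KICKS; kick++) {
		struct pending_bucket victim = t->slots[idx];
		size_t h0;

		t->slots[idx] = *ent;
		if (!victim.journal_seq)
			return true;
		*ent = victim;
		h0 = slot_for(ent->bucket, 0);
		idx = idx == h0 ? slot_for(ent->bucket, 1) : h0;
	}
	return false;
}

/* A bucket may be reused once its recorded sequence number has been flushed. */
static bool bucket_is_reusable(struct pending_table *t, uint64_t bucket,
			       uint64_t flushed_seq)
{
	struct pending_bucket *e = pending_find(t, bucket);

	return !e || e->journal_seq <= flushed_seq;
}
```

In this sketch a caller would insert a bucket, along with the journal sequence number of the update that emptied it, and then check bucket_is_reusable() against the last flushed sequence number before handing the bucket out again.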


The old allocator threads are completely gone - and the code that replaces them

is all transactional b-tree code, much of it trigger based, that's _way_ easier to

debug and reason about. This fixes weird performance corner cases and

scalability issues - in particular, the allocator threads were prone to using

excessive CPU when the filesystem was nearly full. Also, we've got a new and

much improved discard implementation! Previously, we'd only issue discards

shortly prior to reusing/writing to a bucket again - now, we'll issue discards

right after buckets become empty.
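
As a rough illustration of the trigger-based approach and the new discard ordering, here is a toy model in C. It is only a sketch under loud assumptions: two 64-bit bitmaps stand in for the freespace and need-discard btrees, a counter stands in for issuing a real discard, and every name in it is made up for this example. The point is that a single update function keeps both indexes in sync with the bucket's state, so nothing ever has to scan all buckets to rebuild them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_BUCKETS 64  /* hypothetical: one bit per bucket in each index */

enum bucket_state { BUCKET_FREE, BUCKET_DIRTY, BUCKET_NEED_DISCARD };

struct toy_fs {
	enum bucket_state state[NR_BUCKETS];
	uint64_t freespace;        /* stand-in for the freespace btree */
	uint64_t need_discard;     /* stand-in for the need-discard btree */
	unsigned discards_issued;  /* stand-in for real discard requests */
};

static void toy_fs_init(struct toy_fs *fs)
{
	for (unsigned b = 0; b < NR_BUCKETS; b++)
		fs->state[b] = BUCKET_FREE;
	fs->freespace = ~0ULL;
	fs->need_discard = 0;
	fs->discards_issued = 0;
}

/* "Trigger": every state change updates the indexes in the same step. */
static void bucket_trigger(struct toy_fs *fs, unsigned b,
			   enum bucket_state new_state)
{
	uint64_t bit;

	assert(b < NR_BUCKETS);
	bit = 1ULL << b;

	fs->freespace &= ~bit;
	fs->need_discard &= ~bit;
	if (new_state == BUCKET_FREE)
		fs->freespace |= bit;
	if (new_state == BUCKET_NEED_DISCARD)
		fs->need_discard |= bit;
	fs->state[b] = new_state;
}

/* Allocate the lowest-numbered free bucket, or return -1 if none is free. */
static int bucket_alloc(struct toy_fs *fs)
{
	for (unsigned b = 0; b < NR_BUCKETS; b++)
		if (fs->freespace & (1ULL << b)) {
			bucket_trigger(fs, b, BUCKET_DIRTY);
			return (int) b;
		}
	return -1;
}

/* A bucket that becomes empty is queued for discard, not freed directly. */
static void bucket_emptied(struct toy_fs *fs, unsigned b)
{
	assert(fs->state[b] == BUCKET_DIRTY);
	bucket_trigger(fs, b, BUCKET_NEED_DISCARD);
}

/* Discard everything queued, then move those buckets to freespace. */
static unsigned do_discards(struct toy_fs *fs)
{
	unsigned nr = 0;

	for (unsigned b = 0; b < NR_BUCKETS; b++)
		if (fs->need_discard & (1ULL << b)) {
			fs->discards_issued++;
			bucket_trigger(fs, b, BUCKET_FREE);
			nr++;
		}
	return nr;
}
```

In this model a bucket that has just been emptied cannot be allocated again until do_discards() has run, which mirrors the ordering described above: discard first, then back to freespace.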


Exciting stuff - this was the biggest and most invasive change in quite a while,

and I'm pretty happy with how it turned out.


Next big change is going to be the addition of backpointers to fix copygc

scanning, and a rebalance-work btree to fix rebalance thread scanning, and then

we'll be pretty much set for major scalability work.


Other recent changes/improvements: a lot of assorted debuggability improvements.


 - list_journal improvements: now, when going emergency read only, we finish

   writing everything we have pending to the journal - we just mark them as

   noflush writes, so they'll never be used by recovery, but list_journal can

   still see them. This means when we detect an inconsistency, we can see all

   the updates leading up to it in the journal (along with what transactions

   were doing them), making it much easier to work backwards to what went wrong.


   We've been doing a lot of debugging lately with just list_journal and grep -

   yay for grep debugging!


 - A bunch of printbuf and to_text() method improvements, which make it easy to

   write good log messages when something goes wrong


 - Started moving some internal state used for debugging from sysfs to debugfs,

   where we can be much more verbose (yay for grep debugging!)


 - Fixed some snapshot bugs - figured out a major cause of the transaction path

   overflow bugs we've been seeing.
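
The noflush idea from the list_journal item can be sketched with a toy journal in C. This is not the on-disk format or the real tool: the struct, the field names and the rule used here (recovery replays up to the last flush write and skips any noflush writes after it) are assumptions made for illustration. What it shows is that the listing path still sees the trailing entries that recovery will never use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical in-memory view of one journal write. */
struct toy_journal_entry {
	uint64_t    seq;
	bool        noflush;  /* written, but not a flush write */
	const char *what;     /* e.g. which transaction did the update */
};

/* Recovery (as assumed here) replays entries [0, end): up to the last flush. */
static size_t replay_end(const struct toy_journal_entry *j, size_t nr)
{
	size_t end = 0;

	for (size_t i = 0; i < nr; i++)
		if (!j[i].noflush)
			end = i + 1;
	return end;
}

/* Listing prints everything, marking what recovery would skip. */
static size_t list_all(const struct toy_journal_entry *j, size_t nr, FILE *out)
{
	size_t replayed = replay_end(j, nr);

	for (size_t i = 0; i < nr; i++)
		fprintf(out, "seq %llu: %s%s\n",
			(unsigned long long) j[i].seq, j[i].what,
			i < replayed ? "" : " (not replayed)");
	return nr;
}
```

Because the listing is plain text with one entry per line, it can be piped straight into grep to find every update that touched a given key before the inconsistency was detected.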


And, big thanks to all the people who put up with and test my crappy code and

help with finding all the bugs and beating it into shape :)

