Benchmarks
Added 2016-07-06 06:11:30 +0000 UTC
Did a bunch of benchmarking with fio comparing bcachefs, ext4, xfs and btrfs. The terse results are here:
https://evilpiepirate.org/~kent/benchmark-full-results-2016-04-19/terse
You'll find the full results in that directory, and the fio scripts and benchmark driver scripts are in a git repository: https://evilpiepirate.org/git/bcache-benchmarking.git/
The benchmarks test a variety of workloads on three different devices (high end flash, a regular Intel SATA SSD, and rotating disk).
The main takeaway is that bcachefs is already in pretty good shape, performance wise. bcachefs is already quite a bit faster than btrfs on most of the benchmarks, and it's pretty competitive with ext4 and xfs.
On the benchmarks where bcachefs does the worst - dio appends - there's actually been some significant improvement since I ran those benchmarks (mainly from adding optimistic spinning to six locks).
There will undoubtedly be more performance work to do in the future as we think of new things to benchmark. I particularly want to find benchmarks that tend to trigger worst case latencies in existing filesystems.
Comments
Thanks :) I did some testing, I think I know what the issue is on the USB stick - the issue is that writeback (from the pagecache) has no ratelimiting, so it queues up tons of writes which kills the latency of everything else (e.g. metadata operations). Not sure why bcachefs is worse affected than other filesystems. Writing some throttling code for that is on the todo list. I want to make more progress on snapshots, though...
Kent Overstreet
2016-07-19 09:24:48 +0000 UTC
Bug fixed, thank you! It works fine on my M.2 SSD; in my very simple test, performance is similar to ext4/btrfs. It is still bad on the USB stick, but it looks like bcachefs is now ~10% faster.
Francesco Frassinelli
2016-07-19 08:48:17 +0000 UTC
Oh, there's multiple bugs. I fixed the really stupid one (if blocksize wasn't 512 it'd explode), but there's another more subtle deadlock with linked iterators. I have a fix for that (that's the one I've been working on for the past several days), but I want to hammer on it a bit more before I push it out, it's pretty invasive. Always bugs...
Kent Overstreet
2016-07-15 12:30:15 +0000 UTC
Ok, fixed it. Can you retest, and make sure I fixed the bug you were hitting? I'll see if I can repro the performance issue next.
Kent Overstreet
2016-07-15 12:23:37 +0000 UTC
Reproduced a freeze, hopefully the same one you saw. I'll post when I have it fixed. I would like a good bug tracker, but I prefer to do pull requests over email, Linux kernel style.
Kent Overstreet
2016-07-15 11:31:19 +0000 UTC
Yes, it happened once on 2595528230 (unable to replicate it on 73a5f581ef). I'm going to test it more. Are you thinking of publishing your repository on gitlab/github in order to track issues and pull requests?
Francesco Frassinelli
2016-07-12 18:21:43 +0000 UTC
More concerned about the lockup than the performance bug - did you see that on 2595528230?
Kent Overstreet
2016-07-11 22:30:35 +0000 UTC
Thank you for your reply. Yes, I was using 2595528230. Same performance with 73a5f581ef.
Francesco Frassinelli
2016-07-11 19:01:04 +0000 UTC
Definitely some sort of bug. I assume you were testing the latest bcache-dev (commit 2595528230)? Can you try 73a5f581ef "bcachefs: fix a writepage race"? It would be helpful to know if it's a recent regression. I'll see if I can reproduce the performance bug you're seeing.
Kent Overstreet
2016-07-10 23:31:52 +0000 UTC
Simple benchmark (slow USB pendrive): cp Linux tarball, tar xfJ, sync. bcachefs is three times slower than btrfs or ext4 (~10 minutes vs 3:50). rm -r * takes 1 minute (20 times slower than btrfs or ext4). I did the same test on my SSD, but it froze after decompressing 8 MB (filesystem totally stuck). Bug/regression?
Francesco Frassinelli
2016-07-10 11:30:20 +0000 UTC
F2FS (Flash-Friendly File System) is very different (non-raw flash only, no checksums/snapshots/RAID/deduplication). You can find some tests on Phoronix.
Francesco Frassinelli
2016-07-10 10:07:19 +0000 UTC
We really need a COW filesystem with checksums, RAID and snapshots. Btrfs looked cool, but it has performance issues with databases and hundreds of bugs nobody is taking care of. Thanks for your work. I really hope you will succeed ;-) 🐧
Francesco Frassinelli
2016-07-09 12:46:45 +0000 UTC
Thanks Kent. I see you’ve worked hard on this project. I assume new features can be added on top of Bcachefs if needed in future without a major rewrite? I can’t think of any; a look on the Wikipedia page “comparison of file systems” shows just about all the technical features a file system can have. I hope you can secure funding from Linux supporters e.g. Canonical, Intel and others. One day I hope to replace Btrfs on my Ubuntu computers and use Bcachefs instead. That would be great. Obviously it would need Grub2 support and inclusion in the list of file systems from the Ubuntu Ubiquity installer. Good luck with your project.
Dave494
2016-07-09 10:36:46 +0000 UTC
Yeah it'll have dedup (completely forgot to mention that in the planned features, thanks). Most likely it'll have both online and offline, and you'll be able to use whichever you want.
Kent Overstreet
2016-07-09 00:49:35 +0000 UTC
Yeah it'll definitely have scrubbing. It's got data checksumming now (enabled by default), just have to write the scrubbing code. See my other post regarding 128 bitness - we can make bcachefs block pointers as big as we want in a backwards compatible way.
Kent Overstreet
2016-07-09 00:48:23 +0000 UTC
I'm currently using data deduplication under NTFS (Windows 2012) and I'm very happy with it, because it's offline (it does not waste memory or slow down mounts as ZFS does). Will you add data deduplication?
Giovanni Panozzo
2016-07-08 15:27:37 +0000 UTC
Thanks for your detailed reply. Will bcachefs include a scrub tool like ZFS and Btrfs offer? I believe the scrub tool runs a silent background scan (e.g. once a week) to scan for data degradation and other problems and automatically repair any corruption found? Storage devices are far from perfect for reliability, especially hard drives, so having checksumming on by default should avoid silent data corruption when reading/writing data, which is what ZFS and Btrfs claim to do. According to Wikipedia, ZFS being 128-bit means it can hold 1.84 × 10^19 times more data than 64-bit file systems like Btrfs. AFAIK there is no other 128-bit or higher file system in existence.
Dave494
2016-07-07 10:50:57 +0000 UTC
Nice. Any comparisons against F2FS? Or does that one target handheld devices mostly?
Dmitry Gutov
2016-07-06 23:15:48 +0000 UTC
I'm not sure what exactly in ZFS is 128 bits, that could be in reference to logical address space (inode/file size) or physical address space (the size of the block pointers - how large a device you can format).
bcachefs has 64 bit inode numbers (ZFS claims unlimited here, not sure how they manage that) and 64 bit file sizes (same as ZFS). This is relatively fixed - the inode number and offset are part of the common key format - but bcachefs does have machinery for multiple key formats and for describing the key formats (it's used for compressing the metadata) so we could actually change/expand this in a backwards compatible fashion if we wanted to.
For the block pointers, we currently have 44 bit pointers in units of 512 byte sectors - so the maximum device size we can use is 8 petabytes (and you can also have up to 256 devices in a filesystem). However, pointers (and key values in general, which includes things like inodes) are typed, so at some point we'll just add another wider type of pointer (and the existing pointer type will continue to be used wherever it's big enough, so we're not blowing up the size of everyone's metadata with giant pointers unnecessarily).
We check the integrity of metadata every time we read it in - not just the checksum, e.g. if we're reading in a btree node we check that every key/value is a valid key/value of whatever type it is. Any in kernel filesystem really should be doing this kind of checking anyways, so that if you're reading from a malicious device (e.g. a hacked usb thumbdrive) it can't crash the kernel by returning faulty data.
We can't (yet!) do a full fsck at runtime - when we're reading metadata in it's only practical to check local invariants, not global ones (to check e.g. allocation information, or i_nlinks, you have to walk all the metadata).
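The block pointer figures quoted above can be checked with a little arithmetic. This is only a sketch: the 44 bit width and 512 byte sector unit come from the comment, while the function and constant names are made up for illustration and are not bcachefs identifiers.

```python
# Values stated in the comment above.
POINTER_BITS = 44     # width of the current pointer type
SECTOR_BYTES = 512    # pointers address 512 byte sectors

PIB = 1 << 50         # one pebibyte in bytes


def max_device_bytes(pointer_bits: int = POINTER_BITS,
                     sector_bytes: int = SECTOR_BYTES) -> int:
    """Largest device size addressable by a sector pointer of this width."""
    return (1 << pointer_bits) * sector_bytes


# 2^44 sectors * 2^9 bytes per sector = 2^53 bytes = 8 * 2^50 bytes
print(max_device_bytes() // PIB, "PiB")  # prints "8 PiB"
```

A wider pointer type only changes `pointer_bits`; for example, a hypothetical 64 bit sector pointer would address 2^73 bytes.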
However, we have a really cool story here: bcache started out purely garbage collected, which meant that from the start we had the ability to do a mark and sweep GC (that is, the ability to check/regenerate allocation information) at runtime. For a filesystem, it's not practical to be purely GC based - you need your allocation information to always be up to date for -ENOSPC, so years ago bcache-dev started tracking allocation information on the fly like other filesystems, which means there's no longer any need to run mark and sweep GC at runtime. But I've deliberately retained the mark and sweep GC code (despite it being quite a pain in the ass to keep around at times), precisely because it's a small tweak to have it verify allocation information instead of just regenerating it - then you've got fsck at runtime! The hardest part of fsck, anyways.
The remaining parts of fsck - checking i_nlinks, checking for file data past i_size, etc. - we have code for, but it's currently only able to run at mount time and not while a filesystem is in use. But when we get around to it it'll be relatively straightforward to make all that concurrent too, because it can make use of the same machinery mark and sweep GC uses.
TLDR - all the hard parts of fsck at runtime are done, we will have it eventually.
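The "verify instead of regenerate" idea can be sketched roughly as follows. This is a toy model, not the bcachefs code: the `Extent` type, the bucket numbering and the `tracked` dictionary are all assumptions made for illustration, and it ignores concurrency entirely.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple


class Extent(NamedTuple):
    """Hypothetical stand-in for a pointer found while walking metadata."""
    bucket: int   # allocation bucket the data lives in
    sectors: int  # sectors the extent occupies in that bucket


def mark(extents: Iterable[Extent]) -> Counter:
    """Mark phase: walk every extent and recompute sectors used per bucket."""
    used: Counter = Counter()
    for extent in extents:
        used[extent.bucket] += extent.sectors
    return used


def verify(extents: Iterable[Extent],
           tracked: Dict[int, int]) -> Dict[int, Tuple[int, int]]:
    """Compare on-the-fly counters against a fresh mark pass.

    Returns {bucket: (tracked, recomputed)} for every bucket that disagrees,
    instead of silently overwriting the tracked values.
    """
    marked = mark(extents)
    return {
        bucket: (tracked.get(bucket, 0), marked.get(bucket, 0))
        for bucket in set(tracked) | set(marked)
        if tracked.get(bucket, 0) != marked.get(bucket, 0)
    }


extents = [Extent(0, 8), Extent(0, 8), Extent(1, 16)]
print(verify(extents, {0: 16, 1: 16}))  # prints {}
print(verify(extents, {0: 16, 1: 8}))   # prints {1: (8, 16)}
```

Regenerating would simply replace `tracked` with the result of `mark`; verifying reports the mismatches, which is what makes the same walk usable as a consistency check.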
Kent Overstreet
2016-07-06 22:38:03 +0000 UTC
I’m impressed with what I’ve read on this page. I’m wondering if bcachefs is 128-bit like ZFS or if bcachefs could be 128-bit (or higher e.g. 512-bit) one day? The reason I ask is because 128-bit would future-proof this file system for a very long time and be on par with ZFS and its enormous storage capacity, something bcachefs could also have to make it stand out from the crowd and gain popularity. ;-) PS checking the bcachefs data integrity in RAM memory, in case of RAM errors, would be great for desktops, of which few have ECC RAM. Any errors in RAM would cause undetected file system corruption!
Dave494
2016-07-06 21:25:38 +0000 UTC