Tags: Zygo/bees
Bees v0.10

Mostly maintenance.

Highlights:

* Update kernel bugs list to 6.4.1
* Update docs
* Build fixes for GCC 13 and clang 16

Shortlog:

Zygo Blaxell (17):
      docs: update kernel bugs and workarounds list for 6.2.0
      docs: update the feature interactions page
      docs: simplify the exit-with-SIGTERM description
      docs: various gotcha updates
      docs: minor changes to how-it-works based on past user questions
      docs: update front page
      docs: update GCC versions list and clarify markdown statement
      docs: add "missing" features that have been in development for some time already
      roots: make sure transid_max's computed value isn't max
      docs: fill in missing LTS backports for "1119a72e223f btrfs: tree-checker: do not error out if extent ref hash doesn't match"
      docs: working around `btrfs send` issues isn't really a feature
      btrfs-tree: fix build on clang++16
      test: GCC 13 fix for limits.cc
      context: downgrade toxic extent workaround message
      docs: add IGNORE_OFFSET regression in 6.2..6.3 to kernel bugs list
      context: log when LOGICAL_INO returns 0 refs
      docs: add vmalloc bug to kernel bugs list

Bees v0.9.3

Two bug fixes related to memory utilization:

* Fix the bees checkpoint progress tracker so that it deallocates completed work tracking items immediately. If the first work item was stalled for a while, bees would continue allocating memory to track new work items, but did not delete the completed items as long as the first item was incomplete. bees can go through a _large_ number of work items in a few minutes. In some cases this was resulting in double-digit numbers of GiB allocated for ProgressTracker objects, triggering the OOM-killer.
* Fix repeated allocate/free cycles of 16MiB `LOGICAL_INO` extent reference buffers. This was causing crippling performance losses when bees runs with mimalloc or jemalloc, and moderate losses with tcmalloc too. Apparently glibc malloc is quite happy to not deallocate memory, so it was not affected.

After these fixes, bees memory usage consists of:

* The hash table (user-configurable size)
* About 512 KiB of overhead per 1 GiB of hash table
* 16 MiB per thread for `LOGICAL_INO` extent reference list buffers
* About 0.8-4 MiB of miscellaneous data per thread (read from the filesystem)

These can be reduced further, but they aren't bad enough to warrant a bug fix release now.

Shortlog:

Zygo Blaxell (3):
      ProgressTracker: reduce memory usage with long-running work items
      fs: allow BtrfsIoctlLogicalInoArgs to be reused, remove virtual methods
      context: create a Pool of BtrfsIoctlLogicalInoArgs objects

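The v0.9.3 memory breakdown can be turned into a back-of-the-envelope estimate. The sketch below is not part of bees: `BeesMemoryEstimate` and `estimate_bees_memory` are hypothetical names, the constants are the approximate figures quoted in the release notes, and the miscellaneous per-thread data is assumed to sit at the upper end of the 0.8-4 MiB range.

```cpp
#include <cstdint>

// Hypothetical helper, not part of bees. All constants are the approximate
// figures from the v0.9.3 release notes, so the result is only a ballpark.
constexpr std::uint64_t KiB = 1024;
constexpr std::uint64_t MiB = 1024 * KiB;
constexpr std::uint64_t GiB = 1024 * MiB;

struct BeesMemoryEstimate {
    std::uint64_t hash_table;           // user-configured hash table size
    std::uint64_t hash_overhead;        // about 512 KiB per 1 GiB of hash table
    std::uint64_t logical_ino_buffers;  // 16 MiB per thread
    std::uint64_t misc_per_thread_data; // assumed 4 MiB per thread (upper bound)

    constexpr std::uint64_t total() const {
        return hash_table + hash_overhead + logical_ino_buffers + misc_per_thread_data;
    }
};

constexpr BeesMemoryEstimate estimate_bees_memory(std::uint64_t hash_table_bytes,
                                                  std::uint64_t threads) {
    return BeesMemoryEstimate{
        hash_table_bytes,
        // 512 KiB per GiB is a ratio of 1:2048, scaled proportionally.
        hash_table_bytes / (GiB / (512 * KiB)),
        threads * 16 * MiB,
        // Assumption: use the high end of the quoted 0.8-4 MiB range.
        threads * 4 * MiB,
    };
}
```

For a 1 GiB hash table and 4 threads, `estimate_bees_memory(GiB, 4).total()` adds up 1 GiB + 512 KiB + 64 MiB + 16 MiB, or about 1.08 GiB.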
Bees v0.9

This release rounds up a number of bug fixes and workarounds.

Highlights:

* Work around a kernel bug which can be triggered by running the LOGICAL_INO ioctl and dedupe on the same extent at the same time.
* Prevent worker threads from being blocked by extent and inode locks. Defer the blocked item and find something else for that worker thread to do.
* Fix the labelling of threads so they aren't all "task_consumer".
* Speed up SIGTERM process termination to have a better chance of flushing the hash table and crawl state to $BEESHOME before the process is killed by a service timeout.
* Reduce the hash table writeback rate to 128K/s.
* Reduce the interval between crawl restarts to one transid.
* Add 'recent' scan mode, which dedupes new data in fully scanned subvols instead of waiting for every old subvol to be scanned.
* Better behavior when there are write errors in $BEESHOME.
* Drop the unused and obsolete `fiemap` and `fiewalk` binaries.

Adam Faiz (1):
      docs: fix reference direction

Hilton Chain (1):
      beesd: Honor DESTDIR on installation.

Vladimir Panteleev (2):
      scripts: Update beescrawl.dat file name after UUID removal
      scripts: Remove beescrawl.dat with -f

Zygo Blaxell (80):
      cache: add a method to get estimated cache size
      ntoa: fix type of mask
      fs: export btrfs_compress_type_ntoa
      namedptr: add some doxygen, fix the #endif comment
      fd: add some doxygen
      bees: drop m_parent_ctx
      roots: correctly track crawl dirty state
      bees: drop bees_sync, we will not need it
      multilocker: serialize conflicting parallel operations
      bees: use MultiLocker to serialize dedupe and logical_ino
      readahead: use emulation
      task: get rid of the separate Barrier and BarrierLock
      task: get rid of separate Exclusion and ExclusionState
      task: don't hold the mutex while disposing of pending Tasks
      task: use const for current_consumer
      task: simplify clear_queue
      task: add a pause() method as an alternative to cancel()
      task: delete the queue after deleting all of its children
      task: add more Doxygen comments for PairLock
      task: increase saved thread name length to 64
      task: rescue post-exec queue on Task destruction
      task: export load tracking statistics
      task: use exponential backoff algorithm to set thread count
      context: dump current load tracking stats
      context: speed up orderly process termination
      context: drop long-dead ExtentWalker code
      bees: drop the balance/logical workaround that has been disabled for two years
      bytevector: add ostream output with hexdump
      bytevector: add some fugly mutexes
      bytevector: don't need _all_ of those mutexes
      bytevector: do not deadlock in self-assignment
      bytevector: validate length in get<T>()
      BeesFileRange: coalesce is not used, subtract was never implemented
      seeker: backward searching template function
      btrfs-tree: introduce lightweight classes for btrfs tree search operations
      roots: rework btrfs send workaround using btrfs-tree
      roots: use symbolic names for SCAN_MODEs
      roots: use scan mode 'independent' by default
      roots: organize scan workers by inode instead of extent
      context: don't let multiple worker Tasks get stuck on a single extent or inode
      context: process PREALLOC extents synchronously in extent's Task worker
      roots: improve thread status tracking messages
      roots: emit "crawl finished" at the correct time
      roots: add 'recent' crawl mode for a mix of new and old data
      roots: remove duplicate default scan mode setting
      roots: reimplement scan modes using virtual base and methods
      docs: update documentation for new 'recent' scan mode
      roots: disable recent sorting by max_transid
      docs: remove the line discussing 'max_transid' in recent scan mode
      roots: run insert_new_crawl from within a Task
      context: keep the resolve cache smaller
      roots: replace BEES_TRANSID_FACTOR with BEES_TRANSID_POLL_INTERVAL
      context: don't forget to retry locked extents
      context: don't count MultiLock waiting time in dedup_ms
      Merge github PR #148
      docs: remove duplicate (and wrong) default scan mode
      task: use pthread_setname_np correctly
      trace: use pthread_setname wrapper
      roots: fix extent lock failure handling
      docs: add crawl_again, drop crawl_restart
      readahead: report the original size in BEESTOOLONG
      btrfs-tree: add chunk items: length and type
      btrfs-tree: translate item types for error messages
      btrfs-tree: fix whitespace and const
      fs: get rid of base class btrfs_ioctl_same_extent_info
      main: catch exceptions and exit gracefully
      docs: fix broken link in options.md
      hash: don't spin when writes fail
      context: remove the one call to operator vector<> method in BtrfsIoctlLogicalInoArgs
      fiemap, fiewalk: drop dead example/test code
      fs: remove duplicate BTRFS_COMPRESS_ definitions
      fs: get rid of base class btrfs_ioctl_logical_ino_args
      fd: pwrite returns ssize_t not int
      fd: FS_IOC_SETFLAGS takes an int* argument not a long*
      lib: simplify dependency generation
      lib: drop version.cc entirely
      src: simplify Makefile
      src: bees-version.cc cleanups
      test: simplify Makefile
      hash: flush the table more slowly

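The v0.9 highlight about deferring blocked items instead of waiting on extent and inode locks describes a general try-lock-and-defer pattern. The sketch below illustrates that pattern only; it is not the bees implementation, and `TryLock`, `WorkItem`, `Releaser`, and `run_unblocked` are hypothetical names invented for the example.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical non-blocking lock standing in for an extent or inode lock.
struct TryLock {
    std::atomic<bool> held{false};
    // Returns true only if the lock was free and is now held by the caller.
    bool try_acquire() { return !held.exchange(true, std::memory_order_acquire); }
    void release() { held.store(false, std::memory_order_release); }
};

// Hypothetical work item: the lock it needs and the work to run under it.
struct WorkItem {
    TryLock *lock;
    std::function<void()> run;
};

// Releases the lock when the scope ends, even if the work throws.
struct Releaser {
    TryLock *lock;
    ~Releaser() { lock->release(); }
};

// Runs every item whose lock is free right now and returns the items whose
// lock was busy, so the caller can retry them later instead of blocking.
inline std::deque<WorkItem> run_unblocked(std::deque<WorkItem> queue) {
    std::deque<WorkItem> deferred;
    while (!queue.empty()) {
        WorkItem item = std::move(queue.front());
        queue.pop_front();
        assert(item.lock != nullptr);
        if (item.lock->try_acquire()) {
            Releaser release_on_exit{item.lock};
            item.run();
        } else {
            deferred.push_back(std::move(item));
        }
    }
    return deferred;
}
```

A worker would call `run_unblocked` on its queue, do whatever work is available, and feed the returned items back in on a later pass. How long to wait before retrying is a design choice this sketch leaves open.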
Bees v0.8

This release catches up to a year of compiler and kernel header development. Some CPU performance hotspots have been cooled.

When the btrfs send workaround is enabled, dedupe in read-only subvols is now paused, and can be resumed by making the subvol read-write or disabling the workaround. Previously, the workaround would skip to the end of read-only subvols, permanently excluding their contents from dedupe.

Highlights:

* Improved compatibility with new compilers and headers
* Specialize some generic classes for speed
* Better handling of read-only subvols with send workaround
* Fetch fewer objects at a time from btrfs to avoid stale data
* Minor improvements to concurrency and error handling

Shortlog:

Ayla Ounce (1):
      Fix beesd script arg parsing to respect PREFIX

Javi Vilarroig (1):
      Minimal changes in beesd script to make it functional in my system

Khalil Santana (1):
      Get rid of errors by using grep -E

KhalilSantana (1):
      Fixes a bad grep pattern caused by dffd6e0

Zygo Blaxell (62):
      fs: fix FIEMAP_MAX_OFFSET type silliness in fiemap.h
      beesd: add missing RuntimeDirectory
      roots: ignore subvol when it is read-only and send workaround is enabled
      roots: use const more
      endian: fix uint16_t specialization of le_to_cpu
      roots: reduce number of objects per TREE_SEARCH_V2, drop BEES_MAX_CRAWL_ITEMS and BEES_MAX_CRAWL_BYTES
      error: introduce THROW_CHECK4, the long-awaited sequel to THROW_CHECK3
      lib: introduce ByteVector as a replacement for vector<uint8_t> and Spanner
      fs: drop virtual do_ioctl methods for btrfs_ioctl_search_key
      extentwalker: use default sizing of TREE_SEARCH_V2 buffers
      fs: convert vector<uint8_t> and Spanner to ByteVector and rewrite TREE_SEARCH_V2 wrapper
      fd: start deprecating vector<uint8_t> for p{read,write}_or_die
      bees: deprecate vector<uint8_t> and replace with ByteVector
      fd: finish deprecating vector<uint8_t> in IO wrapper functions
      spanner: drop Spanner, replaced by ByteVector
      string: drop vector_copy_struct, obsoleted by ByteVector
      roots: use default nr_items
      fs: add an item type parameter to next_min
      roots: use the new type argument to next_min
      fs: dump the TREE_SEARCH_V2 parameters on exception
      context: add experimental code for avoiding tiny extents
      context: fix the status message that will never be seen
      context: add a comment explaining why we are not adding bees_unreadahead
      task: optimize for common case of single following Task
      docs: document resolve_overflow
      fd: better error messages for pread/pwrite
      context: stop using deprecated memset_zero template
      lib: deprecate memset_zero template, use C99 compound literals instead
      readahead: update comments to reflect bakeoff results
      namedptr: concurrency and const cleanup
      hash: move the random generator out of bees-hash.cc
      bees: clean up #include list
      task: delete the move constructor for TaskState
      task: concurrency cleanups
      docs: remove some stray whitespace
      lib: add Uname, a constructor for utsname
      hash: add utsname fields to log output
      hash: drop bees_unreadahead
      resolve: reword the too-many-duplicates exception message
      gitignore: clang creates a lot of *.tmp files
      docs: add missing 'adjust_offset_hit' counter
      context: get rid of resolve (LOGICAL_INO) serializer
      bees: style cleanups: const, size_t, symbolic names
      fs: yet another const
      progress: lock down some const methods
      context: use consistent status for dedupe in log and thread note
      hash: initialize m_dirty in BeesHashTable
      bytevector: introduce BEES_VALGRIND to help work around valgrind
      types: member m_fd in BeesFileRange must be protected against data races
      docs: update kernel bugs list for 2022-07-29
      README: update copyright year 2022
      docs: update kernel bugs list for 5.18 ptvf fix
      fs: get rid of silly base class that causes build failures now
      bytevector: fix length check
      extentwalker: drop explicit default constructors
      bees: fix deprecated-copy warnings for clang-14
      fs: get rid of base class btrfs_data_container
      fs: get rid of base class fiemap
      roots: make sure we can never get a uint_max transid
      roots: sprinkle on some more const
      fs: update btrfs compatibility header: add csum types, BTRFS_FS_INFO_FLAG_GENERATION and _METADATA_UUID
      fs: make dedupe work again after a really unfortunate build fix

gin66 (1):
      Remove duplicated //etc for make install

suorcd (1):
      docs: spell "snapshot" correctly

Bees v0.7

This is a long overdue maintenance release collecting some years of bug fixes. There are no bees metadata format changes in this release.

Highlights:

* Remove 8-CPU thread limit
* Add kernel bugs reference table to docs
* Workarounds for btrfs send and balance issues
* Reduce the number of temporary inodes created
* Use posix_fadvise to optimize page cache usage
* Use private namespace for mounts under systemd
* Assorted bug fixes and small performance improvements
* SIGTERM handler to save crawl state, hash table, and exit
* Higher ref limits per extent on kernels with LOGICAL_INO_V2

Build dependency changes:

* Convert docs to Github Flavored Markdown
* Updates for new compilers including clang
* Remove dependencies on libbtrfs-dev and uuid-dev
* Remove unversioned `libcrucible.so` shared library

Shortlog:

Andrey Brusnik (1):
      fs: Change array syntax to pointer syntax

Jiahao XU (7):
      Add new options MOUNT_OPTIONS
      Modify systemd unit and beesd.in to use private mnt namespace
      Further sandbox beesd using systemd.exec options
      Update comment in beesd@.service.in
      Fix typo when setting default val of MOUNT_OPTIONS in beesd.in
      Update default MOUNT_OPTIONS beesd.in
      Rm MOUNT_OPTIONS for it is of no use and dangerous

Kai Krakow (9):
      Update references to Gentoo
      Makefile: Specify version when building from tarball
      Makefile: Use the jobserver properly
      Makefile: mkdir .depends only when needed
      Makefile: Bring back -O3 in a downstream-compatible way
      crucible: Try repairing a build failure around swap macro
      Makefile: Fix git usage for non-git source archive
      bees-context: Remove confusing log message
      bees: Avoid unused result with -Werror=unused-result

SeerLite (1):
      install.md: Update Arch Linux instructions

Zygo Blaxell (168):
      README: split into sections, reformat for github.io
      Merge remote-tracking branch 'nilninull/master'
      docs: add "what to do when something goes wrong" page
      docs: add coredumpctl
      src: add bees-version.new.c to .gitignore
      hash: reduce hash table extent size to 128KB
      scripts: use multiples (not power) of 128K
      hash: remove pointless copy
      roots: do not allow transid_min to be numeric_limits<uint64_t>::max()
      roots: do not accept 18446744073709551615 as max_transid in beescrawl.dat
      roots: fix subvol scan rollover on subvols with empty transid range
      context: serialize LOGICAL_INO calls
      bees: drop unused member m_uuid
      context: cache result of home_fd()
      roots: simplify BeesRoots::transid_max_nocache
      Revert "roots: simplify BeesRoots::transid_max_nocache"
      roots: reimplement transid_max_nocache using extent tree root
      context: better detection for toxic extents
      scripts: put AL16M back to avoid breaking existing scripts
      hash: remove preloaded toxic hash blacklist
      context: remove limit on the number of references to an extent
      fs: support LOGICAL_INO_V2
      docs: toxic extents and btrfs send
      stats: streamline add_count
      fs: remove thread_local storage
      fs: if search fails, return empty result set
      resolver: don't log hash collision incidents
      workarounds: add workaround for btrfs send
      docs: reorganize options, add workaround for btrfs send
      bees: soft-limit computed thread counts to 8
      docs: working with `btrfs send` is kind of a feature
      main: single BeesContext instance per process
      roots: improve "RO root 6094" message
      docs: derive docs/index.md from README.md
      README: reintroduce new btrfs-send-compatibility workaround
      docs: add instructions for Ubuntu 18.10
      docs: use bash "type -p" because dash isn't useful
      docs: dash more useful than previously believed
      roots: quick fix for task scheduling bug leading to loss of crawl_master
      tempfile: drop the fsync()
      task: add cancel method
      process: ntoa function for signals
      time: separate sleep time calculation from sleep_for method
      bees: handle SIGTERM and SIGINT, force immediate flush and exit
      hash: clean up comments, audit for bugs
      build: make libcrucible a static library
      docs: tested with GCC 6.3.0
      docs: bees can stop now
      process: SIGUNUSED is deprecated
      bees: make exceptions less prominent in log output
      docs: describe expected exceptions and impact of exception handling
      docs: add Gotcha for SIGTERM
      docs: add some notes about interactions with balance
      task: queue and run exactly once per instance
      docs: event counter documentation
      status: report number of active worker threads in status output
      docs: update kernel compatibility page, now recommending 5.0.4
      README: highlight DATA CORRUPTION WARNING
      docs: update btrfs feature interaction status for flushoncommit and SSD caching layers
      docs: tested build with btrfs-progs 4.20.2
      bees: don't try to print si_lower and si_upper
      BtrfsExtentWalker: use a buffer at least as large as a btrfs metadata page to avoid EOVERFLOW
      fs: do not emulate extent-same by clone
      lib: fix non-local lambda expression cannot have a capture-default
      lib: add cityhash function
      hash: prepare for user-selectable hash functions
      process: Fix gettid() ambiguity with glibc >= 2.30
      context: workaround to prevent LOGICAL_INO and btrfs balance from running concurrently
      docs: update known kernel bugs list
      bees: initialize context in the correct order
      bees: replace uncaught_exception(), deprecated in C++17
      docs: use Github Flavored Markdown with table extension
      docs: update kernel bug tracking for October 2020
      docs: fix table formatting for kernel bugs list
      docs: improve send workaround text, add references to backref commits, make grammar more good now
      docs: expand the tree mod log issues
      docs: btrfs-kernel: 4.20 adds 32-bit single convert bug, tree mod log issue #4
      stats: remove nonsense dedup_unique_bytes stat
      task: make it build with clang
      process: make it build with clang
      bees: make it build with clang
      bees context: make it build with clang
      extentwalker: make it build with clang
      clang: fix struct/class declaration/definition mismatches
      chatter: make it build with clang
      roots: make it build with clang
      bees: move usage message out of source file and fix a few inaccuracies
      roots: report the search parameters on tree search ioctl error
      roots: separate crawl sizes into bytes and items
      fs: always use container's actual size not requested size
      fs: make operator<() for search ioctl inline
      tempfile: remove old comments about fsync and deadlock bugs
      context: move prealloc dedupe to a separate Task
      fs: don't zero-fill btrfs data containers
      string: second argument to stoull is technically a nullptr
      lib: introduce Pool, a class for storing reusable anonymous objects
      context: move TempFile from TLS to Pool and fix some FdCache issues
      tempfile: remove size limit in realign()
      context: fix shutdown log messages identifying the wrong thread
      include: #undef crc32c
      lib: don't rebuild libcrucible unless there is a version change
      test: rebuild the tests if libcrucible.a changes
      cache: clean up pointer mangling and duplicate code
      cache: remove unused #includes
      fd: move relative path string to library
      lib: namedptr: thread-safe reference counted named object store
      src: use correct flags for compiling .c files, fix missing dependencies
      fd: deprecate Resource in favor of NamedPtr
      fs: add support and workarounds for btrfs fs_info v2
      fs: deprecate vector<char>
      fs: remove buffer overrun check in get_struct_ptr for non-copying containers
      lib: introduce Spanner, a pointer and size delimiting a range
      fs: use Spanner to refer to ioctl arg buffer instead of making vector copies
      resolve: add bees.h constants for balance and logical_ino serialization
      bees: remove si_addr_lsb from siginfo debug message to fix FTBFS
      build: include localconf everywhere
      lib: fs: stop using libbtrfs-dev helper functions to re-enable buffer length checks
      docs: remove libbtrfs-dev as a build-time dependency
      docs: btrfs-kernel: add the 5.10 performance regression, the Ctrl-C on balance kernel crash has been fixed
      docs: btrfs-kernel: update recommended kernels list, slow backrefs bug has been backported
      uuid: drop dependency on uuid.h
      docs: drop incomplete build recipe for ubuntu 14.04
      docs: note that FIEMAP is also affected by backref performance issue
      ntoa: fix bits_ntoa formatting and error handling
      ntoa: fix comment disparaging gcc for not implementing C99 compound literals in C++
      context: get rid of all instances of pthread_cancel
      context: get rid of shared_ptr<BeesContext> in every single cached Fd object
      src: bees depends on libcrucible.a
      process: SIGCLD is not portable
      options: remove default 8 CPU thread limit
      pool: use weak_ptr to run destructor earlier
      fd: make the close method on IOHandle private
      crucible: use '#include "crucible/...' everywhere
      test: fd: note when bad cast exception is expected
      chatter: add option to remove log level prefix
      docs: finally concede that the consensus spelling is "dedupe"
      docs: btrfs-kernel: add the extent ref hash bug
      task: serialize Task execution when Tasks block due to mutex contention
      task: track number of Task objects in program and provide report
      task: replace waiting state with run/exec counter
      task: handle thread lifecycle more strictly
      context: report Task instance count
      roots: clean up crawl_master
      cache: emit log messages when clearing FD cache
      bees: use helper function for readahead
      bees: misc comment updates
      context: track record extent reference counts
      roots: split constructor into separate start method
      context: remove unnecessary copies
      bees: use a reserved symbol name in BEESLOG
      bees: increase StringFile size limit
      bees: trace and log improvements during roots and context startup
      roots: add a TRACE for transid_max search and crawl_transid thread
      docs: update kernel bugs table as of 5.12.3
      tracer: annotate both ends of the stack trace
      extentwalker: fix the hole position logic
      extentwalker: fix missing characters
      extentwalker: fix the binary search and add some debug infrastructure
      trace: move BeesTrace and BeesNote into their own translation unit
      trace: current_exception() is not a replacement for uncaught_exception()
      task: set the name of consumer threads so it is not "load_tracker"
      context: stop creating new refs when there are too many already
      fiemap: don't force flush so we can see the delalloc shenanigans
      context: calculate TOTAL RATES correctly
      fs: avoid unaligned access when copying btrfs search headers
      bees: readahead() in the kernel is posix_fadvise(..., POSIX_FADV_WILLNEED)
      hash: use POSIX_FADV_WILLNEED and POSIX_FADV_DONTNEED
      docs: add `readahead_` event group

nilninull (1):
      FIX: The systemd service file is always installed

rsjaffe (1):
      systemd service replace deprecated parameters

bees v0.6.5

Make clang builds work.

Zygo Blaxell (8):
      extentwalker: make it build with clang
      task: make it build with clang
      bees: make it build with clang
      bees context: make it build with clang
      clang: fix struct/class declaration/definition mismatches
      chatter: make it build with clang
      roots: make it build with clang
      build: include localconf everywhere