Tags: Zygo/bees

v0.10

Bees v0.10

Mostly maintenance.

Highlights:

 * Update kernel bugs list to 6.4.1
 * Update docs
 * Build fixes for GCC 13 and clang 16

Shortlog:

Zygo Blaxell (17):
      docs: update kernel bugs and workarounds list for 6.2.0
      docs: update the feature interactions page
      docs: simplify the exit-with-SIGTERM description
      docs: various gotcha updates
      docs: minor changes to how-it-works based on past user questions
      docs: update front page
      docs: update GCC versions list and clarify markdown statement
      docs: add "missing" features that have been in development for some time already
      roots: make sure transid_max's computed value isn't max
      docs: fill in missing LTS backports for "1119a72e223f btrfs: tree-checker: do not error out if extent ref hash doesn't match"
      docs: working around `btrfs send` issues isn't really a feature
      btrfs-tree: fix build on clang++16
      test: GCC 13 fix for limits.cc
      context: downgrade toxic extent workaround message
      docs: add IGNORE_OFFSET regression in 6.2..6.3 to kernel bugs list
      context: log when LOGICAL_INO returns 0 refs
      docs: add vmalloc bug to kernel bugs list

v0.9.3

Bees v0.9.3

Two bug fixes related to memory utilization:

 * Fix the bees checkpoint progress tracker so that it deallocates
   completed work tracking items immediately.  If the first work item was
   stalled for a while, bees would continue allocating memory to track
   new work items, but did not delete the completed items as long as
   the first item was incomplete.  bees can go through a _large_ number
   of work items in a few minutes.  In some cases this was resulting in
   double-digit numbers of GiB allocated for ProgressTracker objects,
   triggering the OOM-killer.

 * Fix repeated allocate/free cycles of 16MiB `LOGICAL_INO` extent
   reference buffers.  This was causing crippling performance losses when
   bees runs with mimalloc or jemalloc, and moderate losses with tcmalloc
   too.  Apparently glibc malloc is quite happy to not deallocate memory,
   so it was not affected.
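
The second fix above amounts to reusing large buffers instead of allocating and freeing one per call. The following is a minimal sketch of that idea in Python, not the bees implementation (which is C++ and uses its own `Pool` class); the `BufferPool` name and its methods are hypothetical and exist only for illustration.

```python
class BufferPool:
    """Hands out reusable buffers instead of allocating a new one per call."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self._free = []        # buffers waiting to be reused
        self.allocations = 0   # how many buffers were ever created

    def acquire(self):
        # Reuse an idle buffer if there is one; allocate only when none is free.
        if self._free:
            return self._free.pop()
        self.allocations += 1
        return bytearray(self.buffer_size)

    def release(self, buf):
        # Keep the buffer for the next caller. Its old contents are not cleared.
        self._free.append(buf)


# 16 MiB matches the buffer size mentioned in the notes above.
pool = BufferPool(16 * 1024 * 1024)
for _ in range(1000):
    buf = pool.acquire()   # after the first iteration this reuses the same buffer
    buf[0] = 1             # stand-in for filling the buffer with results
    pool.release(buf)

print(pool.allocations)
```

With a single caller, the loop allocates one buffer in total, regardless of how many times it runs. The trade-off is that the memory stays resident between calls, which is why a per-thread buffer cost appears in the memory usage list.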

After these fixes, bees memory usage consists of:

 * The hash table (user-configurable size)
 * About 512 KiB of overhead per 1 GiB of hash table
 * 16 MiB per thread for `LOGICAL_INO` extent reference list buffers
 * About 0.8-4 MiB of miscellaneous data per thread (read from the filesystem)

These can be reduced further, but they aren't bad enough to warrant a
bug fix release now.
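
As a rough worked example, the figures in the list above can be added up for a given configuration. This is only a back-of-the-envelope sketch: the function name `estimate_bees_memory` is hypothetical, and the result is an estimate derived from the numbers quoted in these notes, not a measurement.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
KIB = 1024


def estimate_bees_memory(hash_table_bytes, threads):
    """Return (low, high) estimates in bytes from the figures listed above."""
    overhead = (hash_table_bytes / GIB) * 512 * KIB   # ~512 KiB per GiB of table
    logical_ino = threads * 16 * MIB                  # 16 MiB buffer per thread
    misc_low = threads * 0.8 * MIB                    # miscellaneous data, low end
    misc_high = threads * 4 * MIB                     # miscellaneous data, high end
    base = hash_table_bytes + overhead + logical_ino
    return base + misc_low, base + misc_high


low, high = estimate_bees_memory(1 * GIB, threads=4)
print(f"{low / MIB:.1f} MiB to {high / MIB:.1f} MiB")
```

For a 1 GiB hash table and 4 threads this comes to roughly 1092 to 1105 MiB, so the hash table itself dominates and the per-thread costs are comparatively small.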

Shortlog:

Zygo Blaxell (3):
      ProgressTracker: reduce memory usage with long-running work items
      fs: allow BtrfsIoctlLogicalInoArgs to be reused, remove virtual methods
      context: create a Pool of BtrfsIoctlLogicalInoArgs objects

v0.9.2

Bees v0.9.2

Two bug fixes, one in the unit tests and one at run time.

Zygo Blaxell (2):
      roots: don't share a RootFetcher between threads
      seeker: fix the test for ILP32 platforms

v0.9.1

Bees v0.9.1

Fix the install target.

Shortlog:

Kai Krakow (1):
      Makefile: also drop fiemap and fiewalk from main Makefile

v0.9

Bees v0.9

This release rounds up a number of bug fixes and workarounds.

Highlights:

 * Work around a kernel bug which can be triggered by running the
   LOGICAL_INO ioctl and dedupe on the same extent at the same time.
 * Prevent worker threads from being blocked by extent and inode
   locks.  Defer the blocked item and find something else for that
   worker thread to do.
 * Fix the labelling of threads so they aren't all "task_consumer".
 * Speed up SIGTERM process termination to have a better chance of
   flushing the hash table and crawl state to $BEESHOME before the
   process is killed by a service timeout.
 * Reduce the hash table writeback rate to 128K/s.
 * Reduce the interval between crawl restarts to one transid.
 * Add 'recent' scan mode, which dedupes new data in fully scanned
   subvols instead of waiting for every old subvol to be scanned.
 * Better behavior when there are write errors in $BEESHOME.
 * Drop the unused and obsolete `fiemap` and `fiewalk` binaries.
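
The "defer the blocked item" highlight describes a general pattern: try to take a lock without blocking, and if it is busy, set the item aside and move on. The sketch below illustrates that pattern in Python under simplifying assumptions; `process_queue` and the `extent-a`/`extent-b` keys are hypothetical and do not reflect the actual bees Task scheduler.

```python
import threading
from collections import deque


def process_queue(items, locks, work):
    """Process (key, payload) items, deferring any whose key is locked.

    Returns the deferred items so the caller can retry them later.
    """
    pending = deque(items)
    deferred = []
    while pending:
        key, payload = pending.popleft()
        lock = locks[key]
        if not lock.acquire(blocking=False):
            # Someone else holds this lock: do not wait, try the next item.
            deferred.append((key, payload))
            continue
        try:
            work(key, payload)
        finally:
            lock.release()
    return deferred


locks = {"extent-a": threading.Lock(), "extent-b": threading.Lock()}
done = []

locks["extent-a"].acquire()   # simulate another worker holding extent-a
deferred = process_queue(
    [("extent-a", 1), ("extent-b", 2), ("extent-a", 3)],
    locks,
    lambda key, payload: done.append(payload),
)
locks["extent-a"].release()

# Retry the deferred items now that the lock is free.
still_deferred = process_queue(deferred, locks, lambda k, p: done.append(p))
print(done, still_deferred)
```

In this run the item for `extent-b` is processed first while both `extent-a` items are deferred, then the retry pass completes them. The worker never sleeps on a lock, at the cost of processing items out of their original order.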

Adam Faiz (1):
      docs: fix reference direction

Hilton Chain (1):
      beesd: Honor DESTDIR on installation.

Vladimir Panteleev (2):
      scripts: Update beescrawl.dat file name after UUID removal
      scripts: Remove beescrawl.dat with -f

Zygo Blaxell (80):
      cache: add a method to get estimated cache size
      ntoa: fix type of mask
      fs: export btrfs_compress_type_ntoa
      namedptr: add some doxygen, fix the #endif comment
      fd: add some doxygen
      bees: drop m_parent_ctx
      roots: correctly track crawl dirty state
      bees: drop bees_sync, we will not need it
      multilocker: serialize conflicting parallel operations
      bees: use MultiLocker to serialize dedupe and logical_ino
      readahead: use emulation
      task: get rid of the separate Barrier and BarrierLock
      task: get rid of separate Exclusion and ExclusionState
      task: don't hold the mutex while disposing of pending Tasks
      task: use const for current_consumer
      task: simplify clear_queue
      task: add a pause() method as an alternative to cancel()
      task: delete the queue after deleting all of its children
      task: add more Doxygen comments for PairLock
      task: increase saved thread name length to 64
      task: rescue post-exec queue on Task destruction
      task: export load tracking statistics
      task: use exponential backoff algorithm to set thread count
      context: dump current load tracking stats
      context: speed up orderly process termination
      context: drop long-dead ExtentWalker code
      bees: drop the balance/logical workaround that has been disabled for two years
      bytevector: add ostream output with hexdump
      bytevector: add some fugly mutexes
      bytevector: don't need _all_ of those mutexes
      bytevector: do not deadlock in self-assignment
      bytevector: validate length in get<T>()
      BeesFileRange: coalesce is not used, subtract was never implemented
      seeker: backward searching template function
      btrfs-tree: introduce lightweight classes for btrfs tree search operations
      roots: rework btrfs send workaround using btrfs-tree
      roots: use symbolic names for SCAN_MODEs
      roots: use scan mode 'independent' by default
      roots: organize scan workers by inode instead of extent
      context: don't let multiple worker Tasks get stuck on a single extent or inode
      context: process PREALLOC extents synchronously in extent's Task worker
      roots: improve thread status tracking messages
      roots: emit "crawl finished" at the correct time
      roots: add 'recent' crawl mode for a mix of new and old data
      roots: remove duplicate default scan mode setting
      roots: reimplement scan modes using virtual base and methods
      docs: update documentation for new 'recent' scan mode
      roots: disable recent sorting by max_transid
      docs: remove the line discussing 'max_transid' in recent scan mode
      roots: run insert_new_crawl from within a Task
      context: keep the resolve cache smaller
      roots: replace BEES_TRANSID_FACTOR with BEES_TRANSID_POLL_INTERVAL
      context: don't forget to retry locked extents
      context: don't count MultiLock waiting time in dedup_ms
      Merge github PR #148
      docs: remove duplicate (and wrong) default scan mode
      task: use pthread_setname_np correctly
      trace: use pthread_setname wrapper
      roots: fix extent lock failure handling
      docs: add crawl_again, drop crawl_restart
      readahead: report the original size in BEESTOOLONG
      btrfs-tree: add chunk items: length and type
      btrfs-tree: translate item types for error messages
      btrfs-tree: fix whitespace and const
      fs: get rid of base class btrfs_ioctl_same_extent_info
      main: catch exceptions and exit gracefully
      docs: fix broken link in options.md
      hash: don't spin when writes fail
      context: remove the one call to operator vector<> method in BtrfsIoctlLogicalInoArgs
      fiemap, fiewalk: drop dead example/test code
      fs: remove duplicate BTRFS_COMPRESS_ definitions
      fs: get rid of base class btrfs_ioctl_logical_ino_args
      fd: pwrite returns ssize_t not int
      fd: FS_IOC_SETFLAGS takes an int* argument not a long*
      lib: simplify dependency generation
      lib: drop version.cc entirely
      src: simplify Makefile
      src: bees-version.cc cleanups
      test: simplify Makefile
      hash: flush the table more slowly

v0.8

Bees v0.8

This release catches up to a year of compiler and kernel header
development.  Some CPU performance hotspots have been cooled.

When the btrfs send workaround is enabled, dedupe in read-only subvols
is now paused, and can be resumed by making the subvol read-write or
disabling the workaround.  Previously, the workaround would skip to
the end of read-only subvols, permanently excluding their contents
from dedupe.

Highlights:

	* Improved compatibility with new compilers and headers
	* Specialize some generic classes for speed
	* Better handling of read-only subvols with send workaround
	* Fetch fewer objects at a time from btrfs to avoid stale data
	* Minor improvements to concurrency and error handling

Shortlog:

Ayla Ounce (1):
      Fix beesd script arg parsing to respect PREFIX

Javi Vilarroig (1):
      Minimal changes in beesd script to make it functional in my system

Khalil Santana (1):
      Get rid of errors by using grep -E

KhalilSantana (1):
      Fixes a bad grep pattern caused by dffd6e0

Zygo Blaxell (62):
      fs: fix FIEMAP_MAX_OFFSET type silliness in fiemap.h
      beesd: add missing RuntimeDirectory
      roots: ignore subvol when it is read-only and send workaround is enabled
      roots: use const more
      endian: fix uint16_t specialization of le_to_cpu
      roots: reduce number of objects per TREE_SEARCH_V2, drop BEES_MAX_CRAWL_ITEMS and BEES_MAX_CRAWL_BYTES
      error: introduce THROW_CHECK4, the long-awaited sequel to THROW_CHECK3
      lib: introduce ByteVector as a replacement for vector<uint8_t> and Spanner
      fs: drop virtual do_ioctl methods for btrfs_ioctl_search_key
      extentwalker: use default sizing of TREE_SEARCH_V2 buffers
      fs: convert vector<uint8_t> and Spanner to ByteVector and rewrite TREE_SEARCH_V2 wrapper
      fd: start deprecating vector<uint8_t> for p{read,write}_or_die
      bees: deprecate vector<uint8_t> and replace with ByteVector
      fd: finish deprecating vector<uint8_t> in IO wrapper functions
      spanner: drop Spanner, replaced by ByteVector
      string: drop vector_copy_struct, obsoleted by ByteVector
      roots: use default nr_items
      fs: add an item type parameter to next_min
      roots: use the new type argument to next_min
      fs: dump the TREE_SEARCH_V2 parameters on exception
      context: add experimental code for avoiding tiny extents
      context: fix the status message that will never be seen
      context: add a comment explaining why we are not adding bees_unreadahead
      task: optimize for common case of single following Task
      docs: document resolve_overflow
      fd: better error messages for pread/pwrite
      context: stop using deprecated memset_zero template
      lib: deprecate memset_zero template, use C99 compound literals instead
      readahead: update comments to reflect bakeoff results
      namedptr: concurrency and const cleanup
      hash: move the random generator out of bees-hash.cc
      bees: clean up #include list
      task: delete the move constructor for TaskState
      task: concurrency cleanups
      docs: remove some stray whitespace
      lib: add Uname, a constructor for utsname
      hash: add utsname fields to log output
      hash: drop bees_unreadahead
      resolve: reword the too-many-duplicates exception message
      gitignore: clang creates a lot of *.tmp files
      docs: add missing 'adjust_offset_hit' counter
      context: get rid of resolve (LOGICAL_INO) serializer
      bees: style cleanups: const, size_t, symbolic names
      fs: yet another const
      progress: lock down some const methods
      context: use consistent status for dedupe in log and thread note
      hash: initialize m_dirty in BeesHashTable
      bytevector: introduce BEES_VALGRIND to help work around valgrind
      types: member m_fd in BeesFileRange must be protected against data races
      docs: update kernel bugs list for 2022-07-29
      README: update copyright year 2022
      docs: update kernel bugs list for 5.18 ptvf fix
      fs: get rid of silly base class that causes build failures now
      bytevector: fix length check
      extentwalker: drop explicit default constructors
      bees: fix deprecated-copy warnings for clang-14
      fs: get rid of base class btrfs_data_container
      fs: get rid of base class fiemap
      roots: make sure we can never get a uint_max transid
      roots: sprinkle on some more const
      fs: update btrfs compatibility header: add csum types, BTRFS_FS_INFO_FLAG_GENERATION and _METADATA_UUID
      fs: make dedupe work again after a really unfortunate build fix

gin66 (1):
      Remove duplicated //etc for make install

suorcd (1):
      docs: spell "snapshot" correctly

v0.7.2

Fixes a bad grep pattern caused by dffd6e0

Fixes #233

v0.7.1

KhalilSantana Khalil Santana
Get rid of errors by using grep -E

"egrep: warning: egrep is obsolescent; using grep -E"

v0.7

Bees v0.7

This is a long overdue maintenance release collecting some years of
bug fixes.  There are no bees metadata format changes in this release.

Highlights:

	* Remove 8-CPU thread limit
	* Add kernel bugs reference table to docs
	* Workarounds for btrfs send and balance issues
	* Reduce the number of temporary inodes created
	* Use posix_fadvise to optimize page cache usage
	* Use private namespace for mounts under systemd
	* Assorted bug fixes and small performance improvements
	* SIGTERM handler to save crawl state, hash table, and exit
	* Higher ref limits per extent on kernels with LOGICAL_INO_V2
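
The SIGTERM highlight follows a common shutdown pattern: the signal handler only records that a stop was requested, and the main loop saves state before exiting. The following Python sketch shows that pattern in general terms. It is not how bees implements it; `run`, `save_state`, and `do_work` are hypothetical names, and the demo sends SIGTERM to its own process, so it assumes a POSIX system and must run in the main thread.

```python
import os
import signal

stop_requested = False


def handle_sigterm(signum, frame):
    # Only set a flag here; the main loop does the actual saving.
    global stop_requested
    stop_requested = True


def run(save_state, do_work, max_iterations=1000):
    """Work until SIGTERM arrives (or a safety limit), then save state."""
    signal.signal(signal.SIGTERM, handle_sigterm)
    iterations = 0
    while not stop_requested and iterations < max_iterations:
        do_work()
        iterations += 1
    save_state()   # flush persistent state before the process exits
    return iterations


saved = []
calls = {"count": 0}


def do_work():
    calls["count"] += 1
    if calls["count"] == 3:
        # Simulate a service manager asking the process to stop.
        os.kill(os.getpid(), signal.SIGTERM)


iterations = run(lambda: saved.append("state"), do_work)
print(iterations, saved)
```

Keeping the handler minimal avoids doing I/O in signal context. The loop notices the flag at its next check, so how quickly the process shuts down depends on how long a single unit of work takes.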

Build dependency changes:

	* Convert docs to Github Flavored Markdown
	* Updates for new compilers including clang
	* Remove dependencies on libbtrfs-dev and uuid-dev
	* Remove unversioned `libcrucible.so` shared library

Shortlog:

Andrey Brusnik (1):
      fs: Change array syntax to pointer syntax

Jiahao XU (7):
      Add new options MOUNT_OPTIONS
      Modify systemd unit and beesd.in to use private mnt namespace
      Further sandbox beesd using systemd.exec options
      Update comment in beesd@.service.in
      Fix typo when setting default val of MOUNT_OPTIONS in beesd.in
      Update default MOUNT_OPTIONS beesd.in
      Rm MOUNT_OPTIONS for it is of no use and dangerous

Kai Krakow (9):
      Update references to Gentoo
      Makefile: Specify version when building from tarball
      Makefile: Use the jobserver properly
      Makefile: mkdir .depends only when needed
      Makefile: Bring back -O3 in a downstream-compatible way
      crucible: Try repairing a build failure around swap macro
      Makefile: Fix git usage for non-git source archive
      bees-context: Remove confusing log message
      bees: Avoid unused result with -Werror=unused-result

SeerLite (1):
      install.md: Update Arch Linux instructions

Zygo Blaxell (168):
      README: split into sections, reformat for github.io
      Merge remote-tracking branch 'nilninull/master'
      docs: add "what to do when something goes wrong" page
      docs: add coredumpctl
      src: add bees-version.new.c to .gitignore
      hash: reduce hash table extent size to 128KB
      scripts: use multiples (not power) of 128K
      hash: remove pointless copy
      roots: do not allow transid_min to be numeric_limits<uint64_t>::max()
      roots: do not accept 18446744073709551615 as max_transid in beescrawl.dat
      roots: fix subvol scan rollover on subvols with empty transid range
      context: serialize LOGICAL_INO calls
      bees: drop unused member m_uuid
      context: cache result of home_fd()
      roots: simplify BeesRoots::transid_max_nocache
      Revert "roots: simplify BeesRoots::transid_max_nocache"
      roots: reimplement transid_max_nocache using extent tree root
      context: better detection for toxic extents
      scripts: put AL16M back to avoid breaking existing scripts
      hash: remove preloaded toxic hash blacklist
      context: remove limit on the number of references to an extent
      fs: support LOGICAL_INO_V2
      docs: toxic extents and btrfs send
      stats: streamline add_count
      fs: remove thread_local storage
      fs: if search fails, return empty result set
      resolver: don't log hash collision incidents
      workarounds: add workaround for btrfs send
      docs: reorganize options, add workaround for btrfs send
      bees: soft-limit computed thread counts to 8
      docs: working with `btrfs send` is kind of a feature
      main: single BeesContext instance per process
      roots: improve "RO root 6094" message
      docs: derive docs/index.md from README.md
      README: reintroduce new btrfs-send-compatibility workaround
      docs: add instructions for Ubuntu 18.10
      docs: use bash "type -p" because dash isn't useful
      docs: dash more useful than previously believed
      roots: quick fix for task scheduling bug leading to loss of crawl_master
      tempfile: drop the fsync()
      task: add cancel method
      process: ntoa function for signals
      time: separate sleep time calculation from sleep_for method
      bees: handle SIGTERM and SIGINT, force immediate flush and exit
      hash: clean up comments, audit for bugs
      build: make libcrucible a static library
      docs: tested with GCC 6.3.0
      docs: bees can stop now
      process: SIGUNUSED is deprecated
      bees: make exceptions less prominent in log output
      docs: describe expected exceptions and impact of exception handling
      docs: add Gotcha for SIGTERM
      docs: add some notes about interactions with balance
      task: queue and run exactly once per instance
      docs: event counter documentation
      status: report number of active worker threads in status output
      docs: update kernel compatibility page, now recommending 5.0.4
      README: highlight DATA CORRUPTION WARNING
      docs: update btrfs feature interaction status for flushoncommit and SSD caching layers
      docs: tested build with btrfs-progs 4.20.2
      bees: don't try to print si_lower and si_upper
      BtrfsExtentWalker: use a buffer at least as large as a btrfs metadata page to avoid EOVERFLOW
      fs: do not emulate extent-same by clone
      lib: fix non-local lambda expression cannot have a capture-default
      lib: add cityhash function
      hash: prepare for user-selectable hash functions
      process: Fix gettid() ambiguity with glibc >= 2.30
      context: workaround to prevent LOGICAL_INO and btrfs balance from running concurrently
      docs: update known kernel bugs list
      bees: initialize context in the correct order
      bees: replace uncaught_exception(), deprecated in C++17
      docs: use Github Flavored Markdown with table extension
      docs: update kernel bug tracking for October 2020
      docs: fix table formatting for kernel bugs list
      docs: improve send workaround text, add references to backref commits, make grammar more good now
      docs: expand the tree mod log issues
      docs: btrfs-kernel: 4.20 adds 32-bit single convert bug, tree mod log issue #4
      stats: remove nonsense dedup_unique_bytes stat
      task: make it build with clang
      process: make it build with clang
      bees: make it build with clang
      bees context: make it build with clang
      extentwalker: make it build with clang
      clang: fix struct/class declaration/definition mismatches
      chatter: make it build with clang
      roots: make it build with clang
      bees: move usage message out of source file and fix a few inaccuracies
      roots: report the search parameters on tree search ioctl error
      roots: separate crawl sizes into bytes and items
      fs: always use container's actual size not requested size
      fs: make operator<() for search ioctl inline
      tempfile: remove old comments about fsync and deadlock bugs
      context: move prealloc dedupe to a separate Task
      fs: don't zero-fill btrfs data containers
      string: second argument to stoull is technically a nullptr
      lib: introduce Pool, a class for storing reusable anonymous objects
      context: move TempFile from TLS to Pool and fix some FdCache issues
      tempfile: remove size limit in realign()
      context: fix shutdown log messages identifying the wrong thread
      include: #undef crc32c
      lib: don't rebuild libcrucible unless there is a version change
      test: rebuild the tests if libcrucible.a changes
      cache: clean up pointer mangling and duplicate code
      cache: remove unused #includes
      fd: move relative path string to library
      lib: namedptr: thread-safe reference counted named object store
      src: use correct flags for compiling .c files, fix missing dependencies
      fd: deprecate Resource in favor of NamedPtr
      fs: add support and workarounds for btrfs fs_info v2
      fs: deprecate vector<char>
      fs: remove buffer overrun check in get_struct_ptr for non-copying containers
      lib: introduce Spanner, a pointer and size delimiting a range
      fs: use Spanner to refer to ioctl arg buffer instead of making vector copies
      resolve: add bees.h constants for balance and logical_ino serialization
      bees: remove si_addr_lsb from siginfo debug message to fix FTBFS
      build: include localconf everywhere
      lib: fs: stop using libbtrfs-dev helper functions to re-enable buffer length checks
      docs: remove libbtrfs-dev as a build-time dependency
      docs: btrfs-kernel: add the 5.10 performance regression, the Ctrl-C on balance kernel crash has been fixed
      docs: btrfs-kernel: update recommended kernels list, slow backrefs bug has been backported
      uuid: drop dependency on uuid.h
      docs: drop incomplete build recipe for ubuntu 14.04
      docs: note that FIEMAP is also affected by backref performance issue
      ntoa: fix bits_ntoa formatting and error handling
      ntoa: fix comment disparaging gcc for not implementing C99 compound literals in C++
      context: get rid of all instances of pthread_cancel
      context: get rid of shared_ptr<BeesContext> in every single cached Fd object
      src: bees depends on libcrucible.a
      process: SIGCLD is not portable
      options: remove default 8 CPU thread limit
      pool: use weak_ptr to run destructor earlier
      fd: make the close method on IOHandle private
      crucible: use '#include "crucible/...' everywhere
      test: fd: note when bad cast exception is expected
      chatter: add option to remove log level prefix
      docs: finally concede that the consensus spelling is "dedupe"
      docs: btrfs-kernel: add the extent ref hash bug
      task: serialize Task execution when Tasks block due to mutex contention
      task: track number of Task objects in program and provide report
      task: replace waiting state with run/exec counter
      task: handle thread lifecycle more strictly
      context: report Task instance count
      roots: clean up crawl_master
      cache: emit log messages when clearing FD cache
      bees: use helper function for readahead
      bees: misc comment updates
      context: track record extent reference counts
      roots: split constructor into separate start method
      context: remove unnecessary copies
      bees: use a reserved symbol name in BEESLOG
      bees: increase StringFile size limit
      bees: trace and log improvements during roots and context startup
      roots: add a TRACE for transid_max search and crawl_transid thread
      docs: update kernel bugs table as of 5.12.3
      tracer: annotate both ends of the stack trace
      extentwalker: fix the hole position logic
      extentwalker: fix missing characters
      extentwalker: fix the binary search and add some debug infrastructure
      trace: move BeesTrace and BeesNote into their own translation unit
      trace: current_exception() is not a replacement for uncaught_exception()
      task: set the name of consumer threads so it is not "load_tracker"
      context: stop creating new refs when there are too many already
      fiemap: don't force flush so we can see the delalloc shenanigans
      context: calculate TOTAL RATES correctly
      fs: avoid unaligned access when copying btrfs search headers
      bees: readahead() in the kernel is posix_fadvise(..., POSIX_FADV_WILLNEED)
      hash: use POSIX_FADV_WILLNEED and POSIX_FADV_DONTNEED
      docs: add `readahead_` event group

nilninull (1):
      FIX: The systemd service file is always installed

rsjaffe (1):
      systemd service replace deprecated parameters

v0.6.5

bees v0.6.5

Make clang builds work.

Zygo Blaxell (8):
      extentwalker: make it build with clang
      task: make it build with clang
      bees: make it build with clang
      bees context: make it build with clang
      clang: fix struct/class declaration/definition mismatches
      chatter: make it build with clang
      roots: make it build with clang
      build: include localconf everywhere