From 8d95074b7d034188af8542aaea0306d3670d71be Mon Sep 17 00:00:00 2001
From: Rob Landley
Date: Mon, 7 Mar 2016 16:02:47 -0600
Subject: Cleanup pass on the dirtree infrastructure, in preparation for making rm -r handle infinite depth.

Fix docs, tweak dirtree_handle_callback() semantics, remove dirtree_start()
and don't export dirtree_handle_callback(), instead offer dirtree_flagread().
(dirtree_read() is a wrapper around dirtree_flagread passing 0 for flags.)
---
 www/code.html | 83 ++++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 51 insertions(+), 32 deletions(-)

diff --git a/www/code.html b/www/code.html
index b1c6d3f9..7e15e181 100644
--- a/www/code.html
+++ b/www/code.html
@@ -1198,16 +1198,32 @@ of functions.

 These functions do not call chdir() or rely on PATH_MAX. Instead they
 use openat() and friends, using one filehandle per directory level to
-recurseinto subdirectories. (I.E. they can descend 1000 directories deep
+recurse into subdirectories. (I.E. they can descend 1000 directories deep
 if setrlimit(RLIMIT_NOFILE) allows enough open filehandles, and the default
 in /proc/self/limits is generally 1024.)
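The openat() technique described above can be sketched with plain POSIX calls. This is not the dirtree code: demo_count_entries() is a hypothetical helper invented for illustration. It holds one open filehandle per directory level while it recurses, never calls chdir(), and never builds a full path string.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not part of dirtree): count every entry below the
 * directory "name", looked up relative to dirfd. Returns -1 if the
 * directory can't be opened. */
long demo_count_entries(int dirfd, const char *name)
{
  long count = 0;
  struct dirent *entry;
  DIR *dir;
  int fd;

  assert(name);

  /* Open relative to the parent's filehandle, not the current directory. */
  fd = openat(dirfd, name, O_RDONLY|O_DIRECTORY|O_NOFOLLOW);
  if (fd == -1) return -1;
  if (!(dir = fdopendir(fd))) {
    close(fd);
    return -1;
  }

  while ((entry = readdir(dir))) {
    struct stat st;

    /* Filter out "." and ".." so we never loop or climb upward. */
    if (!strcmp(entry->d_name, ".") || !strcmp(entry->d_name, "..")) continue;
    count++;

    /* Stat relative to this level's fd, without following symlinks. */
    if (fstatat(fd, entry->d_name, &st, AT_SYMLINK_NOFOLLOW)) continue;
    if (S_ISDIR(st.st_mode)) {
      long below = demo_count_entries(fd, entry->d_name);

      if (below > 0) count += below;
    }
  }
  closedir(dir);  /* also closes fd, releasing this level's filehandle */

  return count;
}
```

Calling demo_count_entries(AT_FDCWD, "some/dir") starts the walk relative to the current directory. Each recursion level keeps its fd open until its children are done, which is why the depth limit is the open file limit rather than PATH_MAX.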

+There are two main ways to use dirtree: 1) assemble a tree of nodes
+representing a snapshot of directory state and traverse them using the
+->next and ->child pointers, or 2) traverse the tree calling a callback
+function on each entry, and freeing its node afterwards. (You can also
+combine the two, using the callback as a filter to determine which nodes
+to keep.)

+

The basic dirtree functions are:

-The dirtree_read() function takes two arguments, a starting path for
-the root of the tree, and a callback function. The callback takes a
-struct dirtree * (from lib/lib.h) as its argument. If the callback is
-NULL, the traversal uses a default callback (dirtree_notdotdot()) which
-recursively assembles a tree of struct dirtree nodes for all files under
-this directory and subdirectories (filtering out "." and ".." entries),
-after which dirtree_read() returns the pointer to the root node of this
-snapshot tree.

+The dirtree_read() function is the standard way to start
+directory traversal. It takes two arguments: a starting path for
+the root of the tree, and a callback function. The callback() is called
+on each directory entry; its argument is a fully populated
+struct dirtree * (from lib/lib.h) describing the node, and its
+return value tells the dirtree infrastructure what to do next.

-Otherwise the callback() is called on each entry in the directory,
-with struct dirtree * as its argument. This includes the initial
-node created by dirtree_read() at the top of the tree.

+(There's also a three argument version,
+dirtree_flagread(char *path, int flags, int (*callback)(struct
+dirtree *node)), which lets you apply flags like DIRTREE_SYMFOLLOW and
+DIRTREE_SHUTUP to reading the top node, but this only affects the top node.
+Child nodes use the flags returned by callback().)

struct dirtree

@@ -1237,12 +1253,13 @@ node created by dirtree_read() at the top of the tree.

st entries describing a file, plus a char *symlink which is NULL for non-symlinks.

-During a callback function, the int data field of directory nodes
-contains a dirfd (for use with the openat() family of functions). This is
-generally used by calling dirtree_parentfd() on the callback's node argument.
-For symlinks, data contains the length of the symlink string. On the second
-callback from DIRTREE_COMEAGAIN (depth-first traversal) data = -1 for
-all nodes (that's how you can tell it's the second callback).

+During a callback function, the int dirfd field of directory nodes
+contains a directory file descriptor (for use with the openat() family of
+functions). This isn't usually used directly; instead, call dirtree_parentfd()
+on the callback's node argument. The char again field is 0 for the
+first callback on a node, and 1 on the second callback (triggered by returning
+DIRTREE_COMEAGAIN on a directory, made after all children have been processed).
+
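The two-callback pattern described above can be sketched on an in-memory tree. Everything here is a hypothetical stand-in: struct demo_node, the DEMO_ flags, demo_walk() and demo_record() are invented for illustration and are not the real struct dirtree or the dirtree functions. As a simplification, any node with children is treated as a directory.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for DIRTREE_RECURSE and DIRTREE_COMEAGAIN. */
#define DEMO_RECURSE   1
#define DEMO_COMEAGAIN 2

/* Hypothetical cut-down node holding only the fields this sketch needs. */
struct demo_node {
  struct demo_node *next, *parent, *child;
  char again;        /* 0 on the first callback, 1 on the second */
  const char *name;
};

char demo_log[256];  /* visit order recorded by demo_record() */

/* Example callback: log the node name, marking second visits with '!'. */
int demo_record(struct demo_node *node)
{
  size_t len = strlen(demo_log);

  /* Skip logging rather than overflow the fixed-size buffer. */
  if (len + strlen(node->name) + 3 <= sizeof(demo_log)) {
    strcat(demo_log, node->name);
    strcat(demo_log, node->again ? "! " : " ");
  }

  return DEMO_RECURSE|DEMO_COMEAGAIN;
}

/* Call callback on node, recurse if asked, then come again if asked. */
void demo_walk(struct demo_node *node, int (*callback)(struct demo_node *))
{
  struct demo_node *kid;
  int flags;

  assert(node && callback);
  node->again = 0;
  flags = callback(node);

  /* Simplification: childless nodes count as non-directories, so the
   * recurse and come-again flags are ignored for them. */
  if (!node->child) return;

  if (flags & DEMO_RECURSE)
    for (kid = node->child; kid; kid = kid->next) demo_walk(kid, callback);

  /* Second callback, made after all children have been processed. */
  if (flags & DEMO_COMEAGAIN) {
    node->again = 1;
    callback(node);
  }
}
```

A callback that only acts when again is nonzero sees each directory after its contents, which is the ordering a recursive delete needs.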

 Users of this code may put anything they like into the long extra field.
 For example, "cp" and "mv" use this to store a dirfd for the destination
@@ -1266,15 +1283,17 @@ return DIRTREE_ABORT from parent callbacks too.)

  • DIRTREE_RECURSE - Examine directory contents. Ignored for non-directory entries. The remaining flags only take effect when recursing into the children of a directory.

-  • DIRTREE_COMEAGAIN - Call the callback a second time after
-examining all directory contents, allowing depth-first traversal.
-On the second call, dirtree->data = -1.

+  • DIRTREE_COMEAGAIN - Call the callback on this node a second time
+after examining all directory contents, allowing depth-first traversal.
+On the second call, dirtree->again is nonzero.

   • DIRTREE_SYMFOLLOW - follow symlinks when populating children's
 struct stat st (by feeding a nonzero value to the symfollow argument of
 dirtree_add_node()), which means DIRTREE_RECURSE treats symlinks to
 directories as directories. (Avoiding infinite recursion is the callback's
 problem: the non-NULL dirtree->symlink can still distinguish between
-them.)
+them. The "find" command follows ->parent up the tree to the root node
+each time, checking to make sure that stat's dev and inode pair don't
+match any ancestors.)
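The ancestor check mentioned for DIRTREE_SYMFOLLOW can be sketched as follows. This is an illustration under assumptions, not the code "find" uses: struct loop_node and loop_detected() are hypothetical names, and the node carries only a parent pointer and a struct stat.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical cut-down node: just the fields the check needs. */
struct loop_node {
  struct loop_node *parent;
  struct stat st;
};

/* Return 1 if any ancestor has the same device and inode pair as node,
 * meaning a followed symlink has led back into a directory we are
 * already inside. Return 0 otherwise. */
int loop_detected(const struct loop_node *node)
{
  const struct loop_node *up;

  assert(node);

  /* Follow ->parent up to the root, comparing dev and inode each time. */
  for (up = node->parent; up; up = up->parent)
    if (up->st.st_dev == node->st.st_dev && up->st.st_ino == node->st.st_ino)
      return 1;

  return 0;
}
```

The cost is proportional to the depth of the node, and both fields must match because inode numbers are only unique within one device.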

 Each struct dirtree contains three pointers (next, parent, and child)
@@ -1299,15 +1318,15 @@ single malloc() (even char *symlink points to memory at the end of the node),
 so llist_free() works but its callback must descend into child nodes (freeing
 a tree, not just a linked list), plus whatever the user stored in extra.

-The dirtree_read() function is a simple wrapper, calling dirtree_add_node()
+The dirtree_flagread() function is a simple wrapper, calling dirtree_add_node()
 to create a root node relative to the current directory, then calling
-handle_callback() on that node (which recurses as instructed by the callback
-return flags). Some commands (such as chgrp) bypass this wrapper, for example
-to control whether or not to follow symlinks to the root node; symlinks
+dirtree_handle_callback() on that node (which recurses as instructed by the callback
+return flags). The flags argument primarily lets you
+control whether or not to follow symlinks to the root node; symlinks
 listed on the command line are often treated differently than symlinks
-encountered during recursive directory traversal).
+encountered during recursive directory traversal.
 
-The ls command not only bypasses the wrapper, but never returns
+The ls command not only bypasses this wrapper, but never returns
 DIRTREE_RECURSE from the callback, instead calling dirtree_recurse() manually
 from elsewhere in the program. This gives ls -lR manual control
 of traversal order, which is neither depth first nor breadth first but