From c8d0da5358832ab0e24bafc01b934b68f64ec12d Mon Sep 17 00:00:00 2001 From: Rob Landley Date: Sun, 15 Jul 2012 17:47:08 -0500 Subject: Add documentation for lib/llist.c and lib/dirtree.c. --- www/code.html | 220 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 218 insertions(+), 2 deletions(-) diff --git a/www/code.html b/www/code.html index 0b8ba92b..3a77a3ce 100644 --- a/www/code.html +++ b/www/code.html @@ -66,7 +66,13 @@ to the environment will take precedence.

execution starts), the header file toys.h (included by every command), and other global infrastructure.
  • The lib directory contains common functions shared by -multiple commands.
  • +multiple commands: +
  • The toys directory contains the C files implementating each command.
  • The scripts directory contains the build and @@ -471,7 +477,7 @@ in toys/help.c.

    TODO: document lots more here.

    -

    lib: llist, getmountlist(), error_msg/error_exit, xmalloc(), +

    lib: getmountlist(), error_msg/error_exit, xmalloc(), strlcpy(), xexec(), xopen()/xread(), xgetcwd(), xabspath(), find_in_path(), itoa().

    @@ -493,6 +499,93 @@ In each case, the name of the macro refers to the _external_ representation, and converts to/from whatever your native representation happens to be (which can vary depending on what you're currently compiling for).

    +

    lib/llist.c

    + +

    Some generic single and doubly linked list functions, which take +advantage of a couple properties of C:

    + +
      +
    • Structure elements are laid out in memory in the order listed, and +the first element has no padding. This means you can always treat (typecast) +a pointer to a structure as a pointer to the first element of the structure, +even if you don't know anything about the data following it.

    • + +
    • An array of length zero at the end of a structure adds no space +to the sizeof() the structure, but if you calculate how much extra space +you want when you malloc() the structure it will be available at the end. +Since C has no bounds checking, this means each struct can have one variable +length array.

    • +
    + +

    Toybox's list structures always have their next pointer as +the first entry of each struct, and singly linked lists end with a NULL pointer. +This allows generic code to traverse such lists without knowing anything +else about the specific structs composing them: if your pointer isn't NULL +typecast it to void ** and dereference once to get the next entry.

    + +

    lib/lib.h defines three structure types:

    +
      +
    • struct string_list - stores a single string (char str[0]), +memory for which is allocated as part of the node. (I.E. llist_traverse(list, +free); can clean up after this type of list.)

    • + +
    • struct arg_list - stores a pointer to a single string +(char *arg) which is stored in a separate chunk of memory.

    • + +
    • struct double_list - has a second pointer (struct double_list +*prev along with a char *data for payload.

    • +
    + +List Functions + +
      +
    • void *llist_pop(void **list) - advances through a list ala +node = llist_pop(&list); This doesn't modify the list contents, +but does advance the pointer you feed it (which is why you pass the _address_ +of that pointer, not the pointer itself).

    • + +
    • void llist_traverse(void *list, void (*using)(void *data)) - +iterate through a list calling a function on each node.

    • + +
    • struct double_list *dlist_add(struct double_list **llist, char *data) +- append an entry to a circular linked list. +This function allocates a new struct double_list wrapper and returns the +pointer to the new entry (which you can usually ignore since it's llist->prev, +but if llist was NULL you need it). The argument is the ->data field for the +new node.

    • +
      • void dlist_add_nomalloc(struct double_list **llist, +struct double_list *new) - append existing struct double_list to +list, does not allocate anything.

      +
    + +Trivia questions: + +
      +
    • Why do arg_list and double_list contain a char * payload instead of +a void *? - Because you always have to typecast a void * to use it, and +typecasting a char * does no harm. Thus having it default to the most common +pointer type saves a few typecasts (strings are the most common payload), +and doesn't hurt anything otherwise.

      +
    • + +
    • Why do the names ->str, ->arg, and ->data differ? - To force +you to keep track of which one you're using, calling free(node->str) would +be bad, and _failing_ to free(node->arg) leaks memory.

    • + +
    • Why does llist_pop() take a void * instead of void **? - +because the stupid compiler complains about "type punned pointers" when +you typecast and dereference ont he same line, +due to insane FSF developers hardwiring limitations of their optimizer +into gcc's warning system. Since C automatically typecasts any other +pointer _down_ to a void *, the current code works fine. It's sad that it +won't warn you if you forget the &, but the code crashes pretty quickly in +that case.

    • + +
    • How do I assemble a singly-linked-list in order? - use +a double_list, dlist_add() your entries, and then break the circle with +list->prev->next = NULL; when done.

    • +
    +

    lib/args.c

    Toybox's main.c automatically parses command line options before calling the @@ -724,6 +817,129 @@ and would assign this[1] = 42;

    each "bare longopt" (ala "(one)(two)abc" before any option characters) always sets its own bit (although you can group them with +X).

    +

    lib/dirtree.c

    + +

    The directory tree traversal code should be sufficiently generic +that commands never need to use readdir(), scandir(), or the fts.h family +of functions.

    + +

    These functions do not call chdir() or rely on PATH_MAX. Instead they +use openat() and friends, using one filehandle per directory level to +recurseinto subdirectories. (I.E. they can descend 1000 directories deep +if setrlimit(RLIMIT_NOFILE) allows enough open filehandles, and the default +in /proc/self/limits is generally 1024.)

    + +

    The basic dirtree functions are:

    + +
      +
    • dirtree_read(char *path, int (*callback)(struct dirtree node)) - +recursively read directories, either applying callback() or returning +a tree of struct dirtree if callback is NULL.

    • + +
    • dirtree_path(struct dirtree *node, int *plen) - malloc() a +string containing the path from the root of this tree to this node. If +plen isn't NULL then *plen is how many extra bytes to malloc at the end +of string.

    • + +
    • dirtree_parentfd(struct dirtree *node) - return fd of +containing directory, for use with openat() and such.

    • +
    + +

    The dirtree_read() function takes two arguments, a starting path for +the root of the tree, and a callback function. The callback takes a +struct dirtree * (from lib/lib.h) as its argument. If the callback is +NULL, the traversal uses a default callback (dirtree_notdotdot()) which +recursively assembles a tree of struct dirtree nodes for all files under +this directory and subdirectories (filtering out "." and ".." entries), +after which dirtree_read() returns the pointer to the root node of this +snapshot tree.

    + +

    Otherwise the callback() is called on each entry in the directory, +with struct dirtree * as its argument. This includes the initial +node created by dirtree_read() at the top of the tree.

    + +

    struct dirtree

    + +

    Each struct dirtree node contains char name[] and struct stat +st entries describing a file, plus a char *symlink +which is NULL for non-symlinks.

    + +

    During a callback function, the int data field of directory nodes +contains a dirfd (for use with the openat() family of functions). This is +generally used by calling dirtree_parentfd() on the callback's node argument. +For symlinks, data contains the length of the symlink string. On the second +callback from DIRTREE_COMEAGAIN (depth-first traversal) data = -1 for +all nodes (that's how you can tell it's the second callback).

    + +

    Users of this code may put anything they like into the long extra +field. For example, "cp" and "mv" use this to store a dirfd for the destination +directory (and use DIRTREE_COMEAGAIN to get the second callback so they can +close(node->extra) to avoid running out of filehandles). +This field is not directly used by the dirtree code, and +thanks to LP64 it's large enough to store a typecast pointer to an +arbitrary struct.

    + +

    The return value of the callback combines flags (with boolean or) to tell +the traversal infrastructure how to behave:

    + +
      +
    • DIRTREE_SAVE - Save this node, assembling a tree. (Without +this the struct dirtree is freed after the callback returns. Filtering out +siblings is fine, but discarding a parent while keeping its child leaks +memory.)

    • +
    • DIRTREE_ABORT - Do not examine any more entries in this +directory. (Does not propagate up tree: to abort entire traversal, +return DIRTREE_ABORT from parent callbacks too.)

    • +
    • DIRTREE_RECURSE - Examine directory contents. Ignored for +non-directory entries. The remaining flags only take effect when +recursing into the children of a directory.

    • +
    • DIRTREE_COMEAGAIN - Call the callback a second time after +examining all directory contents, allowing depth-first traversal. +On the second call, dirtree->data = -1.

    • +
    • DIRTREE_SYMFOLLOW - follow symlinks when populating children's +struct stat st (by feeding a nonzero value to the symfollow argument of +dirtree_add_node()), which means DIRTREE_RECURSE treats symlinks to +directories as directories. (Avoiding infinite recursion is the callback's +problem: the non-NULL dirtree->symlink can still distinguish between +them.)

    • +
    + +

    Each struct dirtree contains three pointers (next, parent, and child) +to other struct dirtree.

    + +

    The parent pointer indicates the directory +containing this entry; even when not assembling a persistent tree of +nodes the parent entries remain live up to the root of the tree while +child nodes are active. At the top of the tree the parent pointer is +NULL, meaning the node's name[] is either an absolute path or relative +to cwd. The function dirtree_parentfd() gets the directory file descriptor +for use with openat() and friends, returning AT_FDCWD at the top of tree.

    + +

    The child pointer points to the first node of the list of contents of +this directory. If the directory contains no files, or the entry isn't +a directory, child is NULL.

    + +

    The next pointer indicates sibling nodes in the same directory as this +node, and since it's the first entry in the struct the llist.c traversal +mechanisms work to iterate over sibling nodes. Each dirtree node is a +single malloc() (even char *symlink points to memory at the end of the node), +so llist_free() works but its callback must descend into child nodes (freeing +a tree, not just a linked list), plus whatever the user stored in extra.

    + +

    The dirtree_read() function is a simple wrapper, calling dirtree_add_node() +to create a root node relative to the current directory, then calling +handle_callback() on that node (which recurses as instructed by the callback +return flags). Some commands (such as chgrp) bypass this wrapper, for example +to control whether or not to follow symlinks to the root node; symlinks +listed on the command line are often treated differently than symlinks +encountered during recursive directory traversal). + +

    The ls command not only bypasses the wrapper, but never returns +DIRTREE_RECURSE from the callback, instead calling dirtree_recurse() manually +from elsewhere in the program. This gives ls -lR manual control +of traversal order, which is neither depth first nor breadth first but +instead a sort of FIFO order requried by the ls standard.

    +

    Directory scripts/

    scripts/cfg2files.sh

    -- cgit v1.2.3