Diffstat (limited to 'docs/busybox.net/programming.html')
-rw-r--r-- | docs/busybox.net/programming.html | 115 |
1 files changed, 115 insertions, 0 deletions
diff --git a/docs/busybox.net/programming.html b/docs/busybox.net/programming.html index e44f291b3..f77f3c3a6 100644 --- a/docs/busybox.net/programming.html +++ b/docs/busybox.net/programming.html @@ -12,6 +12,11 @@ </ul> <li><a href="#adding">Adding an applet to busybox</a></li> <li><a href="#standards">What standards does busybox adhere to?</a></li> + <li><a href="#tips">Tips and tricks.</a></li> + <ul> + <li><a href="#tips_encrypted_passwords">Encrypted Passwords</a></li> + <li><a href="#tips_vfork">Fork and vfork</a></li> + </ul> </ul> <h2><b><a name="goals" />What are the goals of busybox?</b></h2> @@ -172,6 +177,116 @@ applet is otherwise finished. When polishing and testing a busybox applet, we ensure we have at least the option of full standards compliance, or else document where we (intentionally) fall short.</p> +<h2><a name="tips">Programming tips and tricks.</a></h2> + +<p>Various things busybox uses that aren't particularly well documented +elsewhere.</p> + +<h2><a name="tips_encrypted_passwords">Encrypted Passwords</a></h2> + +<p>Password fields in /etc/passwd and /etc/shadow are in a special format. +If the first character isn't '$', then it's an old DES-style password. If +the first character is '$' then the password is actually three fields +separated by '$' characters:</p> +<pre> + <b>$type$salt$encrypted_password</b> +</pre> + +<p>The "type" indicates which encryption algorithm to use: 1 for MD5 and 2 for SHA1.</p> + +<p>The "salt" is a bunch of random characters (generally 8) the encryption +algorithm uses to perturb the password in a known and reproducible way (such +as by appending the random data to the unencrypted password, or combining +them with exclusive or). Salt is randomly generated when setting a password, +and then the same salt value is re-used when checking the password. 
(Salt is +thus stored unencrypted.)</p> + +<p>The advantage of using salt is that the same cleartext password encrypted +with a different salt value produces a different encrypted value. +If each encrypted password uses a different salt value, an attacker is forced +to do the cryptographic math all over again for each password they want to +check. Without salt, they could simply produce a big dictionary of commonly +used passwords ahead of time, and look up each password in a stolen password +file to see if it's a known value. (Even if there are billions of possible +passwords in the dictionary, checking each one is just a binary search against +a file only a few gigabytes long.) With salt they can't even tell if two +different users share the same password without guessing what that password +is and encrypting it with each user's salt. They also can't precompute the attack dictionary for +a specific password until they know what the salt value is.</p> + +<p>The third field is the encrypted password (plus the salt). For MD5 this +is 22 bytes.</p> + +<p>The busybox function to handle all this is pw_encrypt(clear, salt) in +"libbb/pw_encrypt.c". The first argument is the clear text password to be +encrypted, and the second is a string in "$type$salt$password" format, from +which the "type" and "salt" fields will be extracted to produce an encrypted +value. (Only the first two fields are needed; the third $ is equivalent to +the end of the string.) The return value is an encrypted password in +/etc/passwd format, with all three $-separated fields. It's stored in +a static buffer, 128 bytes long, so the next call overwrites it.</p> + +<p>So when checking an existing password, if pw_encrypt(text, +old_encrypted_password) returns a string that compares identical to +old_encrypted_password, you've got the right password. 
When setting a new +password, generate a random 8-character salt string, put it in the right +format with sprintf(buffer, "$%c$%s", type, salt), and feed buffer as the +second argument to pw_encrypt(text,buffer).</p> + +<h2><a name="tips_vfork">Fork and vfork</a></h2> + +<p>On systems that haven't got a Memory Management Unit, fork() is unreasonably +expensive to implement, so a less capable function called vfork() is used +instead.</p> + +<p>The reason vfork() exists is that if you haven't got an MMU then you can't +simply set up a second set of page tables and share the physical memory via +copy-on-write, which is what fork() normally does. This means that actually +forking has to copy all the parent's memory (which could easily be tens of +megabytes). And you have to do this even though that memory gets freed again +as soon as the exec happens, so it's probably all a big waste of time.</p> + +<p>This is not only slow and a waste of space, it also causes totally +unnecessary memory usage spikes based on how big the _parent_ process is (not +the child), and these spikes are quite likely to trigger an out-of-memory +condition on small systems (which is where nommu is common anyway). So +although you _can_ emulate a real fork on a nommu system, you really don't +want to.</p> + +<p>In theory, vfork() is just a fork() that writably shares the heap and stack +rather than copying them (so what one process writes the other one sees). In +practice, vfork() has to suspend the parent process until the child does exec (or exits), +at which point the parent wakes up and resumes by returning from the call to +vfork(). All modern kernel/libc combinations implement vfork() to put the +parent to sleep until the child does its exec (or exits). There's just no other way to +make it work: they're sharing the same stack, so if either one returns from its +function it stomps on the call stack so that when the other process returns, +hilarity ensues. 
In fact, without suspending the parent there's no way to even +store separate copies of the return value (the pid) from the vfork() call +itself: both assignments write into the same memory location.</p> + +<p>One way to understand (and in fact implement) vfork() is this: imagine +the parent does a setjmp and then continues on (pretending to be the child) +until the exec() comes around, then the _exec_ does the actual fork, and the +parent does a longjmp back to the original vfork call and continues on from +there. (It thus becomes obvious why the child can't return, or modify +local variables it doesn't want the parent to see changed when it resumes.)</p> + +<p>Note a common mistake: the need for vfork doesn't mean you can't have two +processes running at the same time. It means you can't have two processes +sharing the same memory without stomping all over each other. As soon as +the child calls exec() (or exits), the parent resumes.</p> + +<p>(Now in theory, a nommu system could just copy the _stack_ when it forks +(which presumably is much smaller than the heap), and leave the heap shared. +In practice, you've just wound up in a multi-threaded situation and you can't +do a malloc() or free() on your heap without changing the other process's memory too +(and if you don't have the proper locking for being threaded, corrupting the +heap if both of you try to do it at the same time and wind up stomping on +each other while traversing the free memory lists). The thing about vfork is +that it's a big red flag warning "there be dragons here" rather than +something subtle and thus even more dangerous.)</p> + <br> <br> <br>
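As a rough illustration of the "$type$salt$encrypted_password" layout described in the Encrypted Passwords section of this patch, the C sketch below builds the "$type$salt" string that the text passes as the second argument to pw_encrypt(), and locates the third field of a stored entry. The helper names make_salt_spec() and hash_field() are hypothetical and are not busybox functions. The sketch does not call pw_encrypt() itself, and it assumes the salt has already been generated.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of busybox): write "$<type>$<salt>" into
 * buf, the form the patch text feeds to pw_encrypt() as its second argument.
 * Returns 0 on success, -1 if buf is too small.
 */
int make_salt_spec(char *buf, size_t len, char type, const char *salt)
{
    int n = snprintf(buf, len, "$%c$%s", type, salt);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper (not part of busybox): return a pointer to the third
 * '$'-separated field of a stored password entry, or NULL if the entry is
 * an old DES-style password (no leading '$') or has no third field.
 */
const char *hash_field(const char *field)
{
    const char *p;

    if (field[0] != '$')
        return NULL;                /* old DES-style entry */
    p = strchr(field + 1, '$');     /* '$' that ends the type field */
    if (p == NULL)
        return NULL;
    p = strchr(p + 1, '$');         /* '$' that ends the salt field */
    return p ? p + 1 : NULL;
}
```

With a buffer filled by make_salt_spec(buf, sizeof(buf), '1', salt), the call described in the patch would be pw_encrypt(text, buf). Using snprintf() rather than sprintf() is a choice made for this sketch so that an oversized salt is reported instead of overflowing the buffer.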
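The vfork() constraints described in the Fork and vfork section of this patch suggest a calling pattern in which the child does nothing except exec or _exit(). A minimal sketch follows, assuming a POSIX system that provides vfork(). The function name run_and_wait() is hypothetical and is not a busybox helper.

```c
#define _DEFAULT_SOURCE         /* asks glibc to declare vfork() under strict -std modes */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of busybox): run argv[0] via vfork() + exec
 * and return the child's exit status, or -1 on failure or abnormal exit.
 */
int run_and_wait(char *const argv[])
{
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;              /* vfork() itself failed */

    if (pid == 0) {
        /*
         * Child: it is borrowing the parent's stack and heap, so it must
         * not return from this function or change state the parent relies
         * on. It only execs, or calls _exit() if the exec fails.
         */
        execvp(argv[0], argv);
        _exit(127);
    }

    /* Parent: resumes only after the child has exec'd or exited. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child calls _exit() rather than exit() so that it does not flush stdio buffers or run atexit handlers that live in the memory it shares with the parent. The status 127 for a failed exec is a convention borrowed from the shell, chosen for this sketch.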