Age | Commit message | Author |
|
and address TODO where -z was still splitting patterns on \n
|
|
configurable.
|
|
One reason to use toybox on the host is to get the same behavior across
Android/Linux/macOS. Unfortunately (as we've seen from a few bugs) one
area where that doesn't quite work is that toybox uses the libc regular
expression implementation. That's fine, and mostly what users want, but
those folks trying to get the exact same behavior everywhere might want
to switch in a known regex implementation (bionic's NetBSD regex
implementation, say) for increased consistency.
That actually works pretty well, but portability.h has an #ifndef test
for REG_STARTEND before including <regex.h> that gets in the way. To
make up for that, this patch removes the unnecessary #include <regex.h>
from grep.c itself.
|
|
No current caller except grep needs this, but consistency seems like a
good idea.
Also change the xregcomp error message to be a bit more human-readable,
rather than mention an implementation detail.
|
|
POSIX says there's no such thing as an empty regular expression. The
grammar excludes the possibility:
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
BSD agrees with POSIX, and Android and macOS' BSD-based implementations
reject the empty regular expression.
GNU apparently disagrees.
Luckily, BSD does accept the empty *sub* expression `()`, despite their
error message for REG_EMPTY being "empty (sub)expression". This is
presumably a bug, except there's explicit code to support it that is at
least 26 years old:
https://github.com/freebsd/freebsd/blame/master/lib/libc/regex/regcomp.c#L383
This workaround also works fine with glibc.
If we want GNU behavior, I'm struggling to come up with another way to
fake it. If we want POSIX behavior, we could easily just add a check to
reject "" on glibc.
Also switch to xregcomp().
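
A minimal sketch of the workaround described above, assuming extended regex syntax. The helper names `compile_pattern` and `line_matches` are hypothetical and are not toybox's xregcomp(); the only idea taken from the message is substituting `()` for the empty pattern.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

// Hypothetical helper: compile a pattern, swapping an empty one for "()"
// so a BSD-derived regcomp() does not reject it with REG_EMPTY.
static int compile_pattern(regex_t *re, const char *pattern)
{
  assert(re && pattern);
  if (!*pattern) pattern = "()";

  // Assumption: extended syntax, where "()" is an empty subexpression.
  return regcomp(re, pattern, REG_EXTENDED|REG_NOSUB);
}

// Returns 1 if line matches pattern, 0 if not, -1 if compilation failed.
static int line_matches(const char *pattern, const char *line)
{
  regex_t re;
  int rc;

  if (compile_pattern(&re, pattern)) return -1;
  rc = !regexec(&re, line, 0, NULL, 0);
  regfree(&re);

  return rc;
}
```

With this substitution an empty pattern matches every line, which is the GNU behavior the message refers to.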
|
|
|
|
The first bug appeared as a memory overwrite, but was actually visible
without hwasan: basically any `grep -F` that led to multiple matches on
the same line was broken.
The second bug was another memory overwrite, visible when I ran the
existing grep tests.
Bug: http://b/137573082
|
|
On BSD these are actually the same, and there's a -S that you need in
addition. So strictly this is a behavior change for Android (which is
going from BSD grep to toybox grep), but it's a behavior preserving
change for the AOSP build (which is going from GNU grep to toybox grep),
and the latter actually has a checked-in use of -R where the former
doesn't.
|
|
Used quite a lot, especially with `--exclude-dir=.git`.
|
|
Add test to show failure case.
|
|
|
|
Also add a test, and add a test for timeout now that it's been fixed.
|
|
|
|
--quiet is used 3x more than --silent in my corpus, but they're both
used surprisingly often. (Surprising to someone who thinks -q is part of
the core set of grep options that "everybody knows".)
|
|
into display function, use unsigned length so output lines can be up to 4g each.
|
|
|
|
|
|
|
|
Handling -e by gluing together multiple regexes with | wasn't portable,
so break down and do a linked list with for loops.
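
A rough sketch of the linked-list approach, one compiled regex per pattern, tried in turn with a for loop. The struct and function names are hypothetical and the flags are an assumption; this is not the code from the commit.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

// Hypothetical node type: one compiled regex per -e pattern.
struct pattern {
  struct pattern *next;
  regex_t re;
};

// Compile text and push it on the front of *list. Returns 0 on success.
static int add_pattern(struct pattern **list, const char *text)
{
  struct pattern *p;

  assert(list && text);
  if (!(p = malloc(sizeof(*p)))) return -1;
  if (regcomp(&p->re, text, REG_NOSUB)) {
    free(p);
    return -1;
  }
  p->next = *list;
  *list = p;

  return 0;
}

// Returns 1 if any pattern in the list matches line, else 0.
static int any_pattern_matches(struct pattern *list, const char *line)
{
  struct pattern *p;

  for (p = list; p; p = p->next)
    if (!regexec(&p->re, line, 0, NULL, 0)) return 1;

  return 0;
}

// Release every compiled regex and its node.
static void free_patterns(struct pattern *list)
{
  while (list) {
    struct pattern *next = list->next;

    regfree(&list->re);
    free(list);
    list = next;
  }
}
```

Compiling each pattern separately avoids depending on how a given libc treats alternation in basic regular expressions.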
|
|
|
|
|
|
When necessary, realloc() the line to add 4 aligned bytes of storage at
the end, stick the unsigned offset in there, and then fish it back out for
display (and add 1 because offset is 0 based and display is 1 based).
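
One way the realloc() trick could look, as a sketch only. It assumes `unsigned` is 4 bytes and that the line is a malloc()ed, NUL-terminated string of known length; the names `offset_slot`, `stash_offset` and `fetch_offset` are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Position of the stashed value: first 4-byte boundary after the NUL.
static size_t offset_slot(size_t len)
{
  return (len + 1 + 3) & ~(size_t)3;
}

// Grow line so an unsigned fits after the string, and store offset there.
// Returns the (possibly moved) line, or NULL if realloc() failed.
static char *stash_offset(char *line, size_t len, unsigned offset)
{
  size_t slot = offset_slot(len);
  char *bigger;

  assert(line);
  if (!(bigger = realloc(line, slot + sizeof(offset)))) return NULL;
  memcpy(bigger + slot, &offset, sizeof(offset));

  return bigger;
}

// Fish the stored value back out of a line prepared by stash_offset().
static unsigned fetch_offset(const char *line, size_t len)
{
  unsigned offset;

  assert(line);
  memcpy(&offset, line + offset_slot(len), sizeof(offset));

  return offset;
}
```

The caller adds 1 to the fetched value when displaying it, since the stored offset is 0 based.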
|
|
Be consistent about upper versus lower case. (Upper seems to have the
majority, so I went with that, though I'm happy to provide the opposite
patch as long as we're consistent!)
Be consistent about using \t. (Though saving a few bytes seems like it
might be better done in the code that generates help.h rather than
directly in the source, since tabs make careful ASCII art layout hard
enough that we regularly have things misaligned.)
Remove trailing periods (most of which seem to have been added by me).
Always use the US "human readable" rather than my British
"human-readable", and be more consistent about declaring whether we're
showing multiples of 1000 or 1024.
Just say "verbose" rather than adding a useless "mode" or "output".
|
|
bionic, glibc, macOS, and musl all have strcasestr
(see http://man7.org/linux/man-pages/man3/strstr.3.html).
macOS (via BSD) has a strnstr that does what strnstr sounds like it
should do by analogy with strnlen and strncpy.
So we at least need to rename strnstr, but it probably makes more sense
just to switch to strcasestr instead.
|
|
|
|
|
|
|
|
They don't imply -r because you might do find . -type f | xargs -S blah regex
|
|
|
|
|
|
line aren't filtered out. Audited all the callers and removed redundant
calls, adjusted call sequence, etc. (And let rm _not_ do this, because posix.)
|
|
|
|
add xopenro() that takes one argument and understands "-" means stdin,
and switch over lots of users.
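
For illustration, a one-argument read-only open that treats "-" as stdin might look like the sketch below. `open_ro_or_stdin` is a hypothetical name, and unlike a real wrapper it simply returns -1 on failure and leaves error handling to the caller.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

// Hypothetical stand-in, not toybox's xopenro(): "-" means stdin,
// anything else is opened read-only.
static int open_ro_or_stdin(const char *name)
{
  assert(name);
  if (!strcmp(name, "-")) return STDIN_FILENO;

  return open(name, O_RDONLY);
}
```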
|
|
continues to get worse, and now can't handle INT_MAX/2 either. So our
first workaround _also_ broke.
But posix says "A negative precision is taken as if the precision were
omitted." and that _doesn't_ trigger the glibc bug, so use that instead.
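
A small sketch of the negative-precision rule quoted above, using `%.*s` as the example conversion (an assumption, since the message does not name the format involved). `format_clipped` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

// Hypothetical helper: copy at most limit bytes of s into buf.
// A negative limit is passed straight through as the precision, which
// the standard treats as if no precision had been given at all.
static int format_clipped(char *buf, size_t size, const char *s, int limit)
{
  assert(buf && s);

  return snprintf(buf, size, "%.*s", limit, s);
}
```

Passing -1 therefore means "no limit" without needing a very large positive precision.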
|
|
the deeply sad passwd heuristics that don't even check numbers and punctuation.
|
|
|
|
|
|
|
|
|
|
checking, and fix up format checking complaints.
Added out(type, value) function to stat to avoid a zillion printf typecasts.
|
|
This lets '(x)\1' match, as reported by Isabella Parakiss.
|
|
|
|
|
|
|
|
macros for a disabled command (needed when multiple commands share infrastructure with a common set of flags).
This means the flag space is no longer packed, but leaves gaps where the zeroes
go. (Actual flag bit positions are the same for all configs.) Since the
option parsing needs to know where the holes are, the OPTSTR values are
now generated as part of flags.h with ascii 1 values for the disabled values.
(So generated/oldflags.h went away.)
This also means that the option string argument for OLDTOY() went away, it now
uses the same arguments as the NEWTOY() it references.
|
|
Isaac's roadmap update.
Mercurial's "import" command is still broken, committing local tree changes to files that weren't even touched by the patch because the hg developers insist, when I point out how stupid it is, that they meant to do that. (hg record can do hunks, but import can't even track _files_.)
|
|
Some glibc commands are irrelevant because they're for functionality
that is excluded from musl (mtrace, rpc*, localedef, iconvconfig, nscd).
getconf and catchsegv look like candidates for the development toolchain;
locale and iconv were already triaged.
getent is pretty lame, but it and the timezone stuff (tzselect zic
zdump) are the only new possibly interesting commands.
|
|
|
|
|
|
Grep miscalculates the amount of memory it needs to allocate when "converting
strings to one big regex" when the -e flag is not specified. Since in this case
"\|" is inserted between strings rather than "|", two extra bytes rather than
one need to be provided for each string. I noticed this because it caused grep
to seg-fault on musl when a regex of exactly seven characters is provided.
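
The sizing rule above can be sketched as follows. `join_patterns` is a hypothetical function, not the code that was patched; it just shows reserving two bytes per string for the `\|` separator plus one byte for the terminating NUL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: glue count strings into "a\|b\|c".
// Returns a malloc()ed string, or NULL if allocation failed.
static char *join_patterns(const char **pats, size_t count)
{
  size_t i, len = 1; // 1 byte for the terminating NUL
  char *out, *p;

  assert(pats || !count);

  // Two separator bytes per string ("\|"), not one. This reserves two
  // bytes more than strictly needed, since the first string has none.
  for (i = 0; i < count; i++) len += strlen(pats[i]) + 2;

  if (!(out = p = malloc(len))) return NULL;
  for (i = 0; i < count; i++) {
    size_t n = strlen(pats[i]);

    if (i) {
      *p++ = '\\';
      *p++ = '|';
    }
    memcpy(p, pats[i], n);
    p += n;
  }
  *p = 0;

  return out;
}
```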
|