path: root/toys/posix/grep.c
Age  Commit message  Author
2021-05-15  Convert utf8towc from wchar_t to unsigned (to match wctoutf8).  [Rob Landley]
The maximum Unicode code point is 0x10ffff, which is 21 bits.
2021-03-12  Fix grep bug testing errno before check status  [Robin Hsu]
It's legal for a system call to set non-zero errno and return a good status. The caller (grep) should check return status first.
Test: 2000 loops of grepping 1000+ lines each loop
Signed-off-by: Robin Hsu <robinhsu@google.com>
Change-Id: I55f7cd5d8a6289c5e8a21ed3057e995a75b9b4fa
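The rule in the commit above (check the return status first, then look at errno) can be sketched as follows. This is a minimal illustration, not the toybox fix itself: `read_checked` is a hypothetical helper, and `read()` merely stands in for whichever call grep was making.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

// Hypothetical helper (not toybox code): failure is decided by the return
// value of read() alone. errno is consulted only on that failure path,
// because a successful call is allowed to leave errno non-zero.
ssize_t read_checked(int fd, void *buf, size_t len)
{
  ssize_t got = read(fd, buf, len);

  if (got < 0) {
    fprintf(stderr, "read failed: errno=%d\n", errno);
    return -1;
  }

  return got;
}
```

A caller that tested errno directly after a successful call could report an error that never happened, which is the class of bug the commit describes.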
2021-02-21  Teach -o to print ranges that produce zero length matches.  [Rob Landley]
And fix one test for NUL that should be a length test for -z support
2021-01-23  Fix grep bug where -f /dev/null added "" regex matching everything, and address TODO where -z was still splitting patterns on \n  [Rob Landley]
2020-12-11  The "fall back to C.UTF-8" check was backwards, and make TOYFLAG_LINEBUF configurable.  [Rob Landley]
2020-10-29  Make it easier to switch regex implementations.  [Elliott Hughes]
One reason to use toybox on the host is to get the same behavior across Android/Linux/macOS. Unfortunately (as we've seen from a few bugs) one area where that doesn't quite work is that toybox uses the libc regular expression implementation. That's fine, and mostly what users want, but those folks trying to get the exact same behavior everywhere might want to switch in a known regex implementation (bionic's NetBSD regex implementation, say) for increased consistency. That actually works pretty well, but portability.h has an #ifndef test for REG_STARTEND before including <regex.h> that gets in the way. To make up for that, this patch removes the unnecessary #include <regex.h> from grep.c itself.
2019-07-31  Move the empty regex workaround into xregcomp.  [Elliott Hughes]
No current caller except grep needs this, but consistency seems like a good idea. Also change the xregcomp error message to be a bit more human-readable, rather than mention an implementation detail.
2019-07-29  grep: fake GNU behavior for non-POSIX empty regex.  [Elliott Hughes]
POSIX says there's no such thing as an empty regular expression. The grammar excludes the possibility: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
BSD agrees with POSIX, and Android and macOS' BSD-based implementations reject the empty regular expression. GNU apparently disagrees.
Luckily, BSD does accept the empty *sub* expression `()`, despite their error message for REG_EMPTY being "empty (sub)expression". This is presumably a bug, except there's explicit code to support it that is at least 26 years old: https://github.com/freebsd/freebsd/blame/master/lib/libc/regex/regcomp.c#L383
This workaround also works fine with glibc. If we want GNU behavior, I'm struggling to come up with another way to fake it. If we want POSIX behavior, we could easily just add a check to reject "" on glibc.
Also switch to xregcomp().
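The workaround described above might look roughly like this. It is a sketch under assumptions: `compile_allowing_empty` is a hypothetical name, and forcing REG_EXTENDED for the substituted pattern is my choice so that `()` is parsed as a group rather than as literal parentheses.

```c
#include <regex.h>
#include <stddef.h>

// Hypothetical helper (not the toybox xregcomp): compile a pattern, swapping
// in the empty subexpression "()" when the pattern itself is empty.
// REG_EXTENDED is forced in that case so "()" is a group, not two literals.
int compile_allowing_empty(regex_t *preg, const char *pattern, int cflags)
{
  if (!*pattern) {
    pattern = "()";
    cflags |= REG_EXTENDED;
  }

  return regcomp(preg, pattern, cflags);
}
```

Non-empty patterns pass through unchanged, so callers only see a difference for the empty string.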
2019-07-16  Fix unaligned access, tweak test suite.  [Rob Landley]
2019-07-16  grep: fix two bugs found by hwasan.  [Elliott Hughes]
The first bug appeared as a memory overwrite, but was actually visible without hwasan: basically any `grep -F` that led to multiple matches on the same line was broken. The second bug was another memory overwrite, visible when I ran the existing grep tests.
Bug: http://b/137573082
2019-07-12  grep: add -R as well as -r.  [Elliott Hughes]
On BSD these are actually the same, and there's a -S that you need in addition. So strictly this is a behavior change for Android (which is going from BSD grep to toybox grep), but it's a behavior preserving change for the AOSP build (which is going from GNU grep to toybox grep), and the latter actually has a checked-in use of -R where the former doesn't.
2019-05-25  grep: add --exclude-dir.  [Elliott Hughes]
Used quite a lot, especially with `--exclude-dir=.git`.
2019-05-17  Fix a missing else, and an inverted test hidden by the missing else.  [Rob Landley]
Add test to show failure case.
2019-04-18  Ignore --line-buffered argument for script compatibility (it's the default).  [Rob Landley]
2019-03-14  grep: use TOYFLAG_ARGFAIL for grep too.  [Elliott Hughes]
Also add a test, and add a test for timeout now it's been fixed.
2019-02-24  grep: add missing long synonyms used in AOSP.  [Elliott Hughes]
2019-02-19  grep: add --quiet and --silent synonyms for -q.  [Elliott Hughes]
--quiet is used 3x more than --silent in my corpus, but they're both used surprisingly often. (Surprising to someone who thinks -q is part of the core set of grep options that "everybody knows".)
2019-01-24  grep: "tried" should track arguments (not files) that existed, move -o "" test into display function, use unsigned length so output lines can be up to 4g each.  [Rob Landley]
2018-12-23  Add grep --color  [Rob Landley]
2018-12-17  Use FLAG() macros in grep.  [Rob Landley]
2018-12-17  A couple more grep tests, and slightly use dlist_terminate() for the loops.  [Rob Landley]
2018-12-17  Fix remaining grep_tests.  [Rob Landley]
Handling -e by gluing together multiple regexes with | wasn't portable, break down and do a linked list with for loops.
2018-12-09  Support embedded NUL bytes in grep output, and free memory leaked per-file.  [Rob Landley]
2018-12-09  More grep.tests: make exit code 2 happen when it should.  [Rob Landley]
2018-12-09  Fix first grep.test failure (-B + -b not producing middle field).  [Rob Landley]
When necessary, realloc() the line to add 4 aligned bytes of storage at the end, stick the unsigned offset in there, and then fish it back out for display (and add 1 because offset is 0 based and display is 1 based).
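The trick of hiding an offset in extra bytes after the line can be sketched like this. The helper names (`offset_slot`, `stash_offset`, `fetch_offset`) are hypothetical and the layout is an assumption based only on the description above. The sketch copies the value with memcpy(), so it works even if the slot were not aligned.

```c
#include <stdlib.h>
#include <string.h>

// Round len up to the next multiple of 4, so the slot after the line starts
// on a 4-byte boundary relative to the (malloc-aligned) start of the buffer.
static size_t offset_slot(size_t len)
{
  return (len + 3) & ~(size_t)3;
}

// Hypothetical helper: grow a heap-allocated line holding len bytes and
// store offset just past it. Returns the (possibly moved) line, or NULL
// if realloc() fails, in which case the original line is still valid.
char *stash_offset(char *line, size_t len, unsigned offset)
{
  size_t slot = offset_slot(len);
  char *grown = realloc(line, slot + sizeof(unsigned));

  if (!grown) return NULL;
  memcpy(grown + slot, &offset, sizeof(unsigned));

  return grown;
}

// Fish the stored offset back out. The value is 0 based; a caller that
// displays it 1 based adds 1 at print time.
unsigned fetch_offset(const char *line, size_t len)
{
  unsigned offset;

  memcpy(&offset, line + offset_slot(len), sizeof(unsigned));

  return offset;
}
```

Keeping the offset in the same allocation means each buffered context line carries its own number, with no parallel array to keep in sync.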
2018-12-04  Clean up some --help formatting.  [Elliott Hughes]
Be consistent about upper versus lower case. (Upper seems to have the majority, so I went with that, though I'm happy to provide the opposite patch as long as we're consistent!) Be consistent about using \t. (Though saving a few bytes seems like it might be better done in the code that generates help.h rather than directly in the source, since tabs make careful ASCII art layout hard enough that we regularly have things misaligned.) Remove trailing periods (most of which seem to have been added by me). Always use the US "human readable" rather than my British "human-readable", and be more consistent about declaring whether we're showing multiples of 1000 or 1024. Just say "verbose" rather than adding a useless "mode" or "output".
2018-11-28  macOS: replace local strnstr with strcasestr.  [Elliott Hughes]
bionic, glibc, macOS, and musl all have strcasestr (see http://man7.org/linux/man-pages/man3/strstr.3.html). macOS (via BSD) has a strnstr that does what strnstr sounds like it should do by analogy with strnlen and strncpy. So we at least need to rename strnstr, but it probably makes more sense just to switch to strcasestr instead.
2018-11-02  Convert more option vars to the new (single letter) coding style.  [Rob Landley]
2018-08-26  Add binary file detection to grep.  [Rob Landley]
2017-06-14  Grep exits with 2 for errors, which can happen at any time ( > /dev/full).  [Rob Landley]
2017-06-12  Add grep -M match and -S skip supporting wildcard patterns.  [Rob Landley]
They don't imply -r because you might do find . -type f | xargs -S blah regex
2017-06-11  Provide error messages for files we can open but not read (ala directories).  [Rob Landley]
2017-06-10  Fix bug where grep stopped at first dangling symlink and error_exited().  [Rob Landley]
2016-11-21  Have dirtree_notdotdot() pass through !node->parent so . and .. on the command line aren't filtered out. Audited all the callers and removed redundant calls, adjusted call sequence, etc. (And let rm _not_ do this, because posix.)  [Rob Landley]
2016-09-05  Replace loopfiles' failok with WARN_ONLY open flag.  [Rob Landley]
2016-08-04  Make xopen() skip stdin/stdout/stderr, add xopen_stdio() if you want stdout, add xopenro() that takes one argument and understands "-" means stdin, and switch over lots of users.  [Rob Landley]
2016-06-15  The glibc bug at https://sourceware.org/bugzilla/show_bug.cgi?id=17829 continues to get worse, and now can't handle INT_MAX/2 either. So our first workaround _also_ broke. But posix says "A negative precision is taken as if the precision were omitted." and that _doesn't_ trigger the glibc bug, so use that instead.  [Rob Landley]
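The negative-precision rule quoted above can be illustrated with a small sketch. `format_prefix` is a hypothetical helper written for this example, not a toybox function.

```c
#include <stdio.h>

// Hypothetical helper: format at most max bytes of s into out, or all of s
// when max is negative, since a negative precision passed through '*' is
// treated as if no precision had been given.
int format_prefix(char *out, size_t outlen, const char *s, int max)
{
  return snprintf(out, outlen, "%.*s", max, s);
}
```

Passing -1 therefore means "no limit" without putting a huge value such as INT_MAX in the precision field.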
2016-02-10  Factor out strnstr() since posix hasn't got it, and add a config option for the deeply sad passwd heuristics that don't even check numbers and punctuation.  [Rob Landley]
2016-02-10  Although printf("%.*s", INT_MAX, s) works fine on ubuntu 12.04, it broke since.  [Rob Landley]
2016-02-04  Fix -H and -n with -ABC, and add tests.  [Rob Landley]
2016-01-30  Add grep -B -C  [Rob Landley]
2016-01-30  Add grep -A  [Rob Landley]
2016-01-05  Add error_msg_raw() and friends, replace error_msg("%s", s) uses, enable format checking, and fix up format checking complaints. Added out(type, value) function to stat to avoid a zillion printf typecasts.  [Rob Landley]
2015-11-01  Change grep -w to checking matches after the fact rather than modifying regex.  [Rob Landley]
This lets '(x)\1' match, as reported by Isabella Parakiss.
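Checking word boundaries after the match, instead of rewriting the regex, could be sketched as below. `match_whole_word` and `is_word_char` are hypothetical names, and the retry strategy (restart one byte after a rejected match) is a simplification that does not try shorter matches at the same position. It is not necessarily how toybox does it.

```c
#include <ctype.h>
#include <regex.h>
#include <stddef.h>

// Word characters for this sketch: letters, digits, and underscore.
static int is_word_char(unsigned char c)
{
  return isalnum(c) || c == '_';
}

// Hypothetical helper: return 1 if re matches a whole word somewhere in
// line. The regex is used unmodified; each candidate match is accepted only
// if the bytes on both sides are non-word characters or the line edges.
// A rejected candidate is retried starting one byte later.
int match_whole_word(const regex_t *re, const char *line)
{
  const char *start = line;
  regmatch_t m;

  while (!regexec(re, start, 1, &m, start == line ? 0 : REG_NOTBOL)) {
    const char *mstart = start + m.rm_so, *mend = start + m.rm_eo;
    int left_ok = mstart == line || !is_word_char(mstart[-1]);
    int right_ok = !*mend || !is_word_char(*mend);

    if (left_ok && right_ok) return 1;
    if (!*mstart) break;
    start = mstart + 1;
  }

  return 0;
}
```

Because the pattern is left alone, group numbering is unchanged and a back-reference such as `\1` still refers to the user's own first group.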
2015-06-06  Last grep commit broke non -r use of grep. Oops.  [Rob Landley]
2015-05-20  Make "grep -r regex" work on implicit "." if no files specified.  [Rob Landley]
2015-02-14  Make egrep and fgrep build standalone.  [Rob Landley]
2014-12-31  Redo option parsing infrastructure so #define FORCE_FLAGS can unzero flag macros for a disabled command (needed when multiple commands share infrastructure with a common set of flags). This means the flag space is no longer packed, but leaves gaps where the zeroes go. (Actual flag bit positions are the same for all configs.) Since the option parsing needs to know where the holes are, the OPTSTR values are now generated as part of flags.h with ascii 1 values for the disabled values. (So generated/oldflags.h went away.) This also means that the option string argument for OLDTOY() went away, it now uses the same arguments as the NEWTOY() it references.  [Rob Landley]
2014-04-16  Revert lots of half-finished local debris I didn't mean to check in with Isaac's roadmap update. Mercurial's "import" command is still broken, committing local tree changes to files that weren't even touched by the patch because the hg developers insist, when I point out how stupid it is, that they meant to do that. (hg record can do hunks, but import can't even track _files_.)  [Rob Landley]
2014-04-12  roadmap: describe glibc commands.  [Isaac Dunham]
Some glibc commands are irrelevant because they're for functionality that is excluded from musl (mtrace, rpc*, localedef, iconvconfig, nscd). getconf and catchsegv look like candidates for the development toolchain; locale and iconv were already triaged. getent is pretty lame, but it and the timezone stuff (tzselect zic zdump) are the only new possibly interesting commands.