Grouse Grep comes with a regression test rig and a flog test rig. The two rigs examine the program in complementary ways.
The regression rig is closely based on the regression test rig included in Perl, and tries to:
· Check that the search handles simple cases correctly;
· Explore the boundary conditions around each search component, to ensure that the search behaves correctly when presented with choices right at the limits of each selection;
· Check that searches across a reference set of files (based on the Calgary text compression corpus) produce the correct line counts for a range of patterns and option switch settings; and
· Check that the state tables generated for a set of reference patterns conform exactly to a reference set, to check that no subtle corruption creeps in.
The flog rig:
· Checks the code's performance with much larger quantities of input;
· Checks the program's buffer handling for very large files;
· Checks that non-text files such as object files, executables and compressed archives are handled correctly;
· Checks that the program behaves identically to GNU Grep across a wide variety of search option settings; and
· Checks the interaction of RE elements by composing random REs to test.
Both rigs contain regression tests -- as problems are found, tests may be added that check the problem case and fail if the problem is present.
The test directory is "te", directly under the main directory, and each test script file has the extension .te. This name comes from Benjamin Hoff's (no relation) book, "The Te of Piglet", where Te roughly means virtue. Te is quite close to "test", and the point of the test rig is to help provide some evidence that the program has some virtue.
Tests are organised into a set of subdirectories under te:
To execute the entire suite of tests, simply run the te script in the te directory. This script finds all the .te files in the next directory level, and, for each script, changes to that directory and executes the test. Each test file is an independent Perl script, able to be executed in its own right, generating readable but slightly terse output. The test script assumes that the program to test is ../src/ggrep.
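The discovery step described above can be sketched as follows. This is an illustration in Python, not the te script itself (which is written in Perl). The function names `find_test_scripts` and `run_all` are hypothetical. Running each script with `perl` and using its exit status as the result are assumptions that the text does not confirm.

```python
import subprocess
from pathlib import Path


def find_test_scripts(root):
    """Return the .te files exactly one directory level below root, sorted."""
    return sorted(p for p in Path(root).glob("*/*.te") if p.is_file())


def run_all(root, interpreter="perl"):
    """Run each script from inside its own directory.

    Returns a dict mapping script path to exit status. Treating the exit
    status as the test result is an assumption of this sketch.
    """
    results = {}
    for script in find_test_scripts(root):
        # Change to the script's directory so that relative paths such as
        # ../src/ggrep resolve as the test expects.
        proc = subprocess.run([interpreter, script.name], cwd=script.parent)
        results[str(script)] = proc.returncode
    return results
```

Because only the `*/*.te` level is searched, a stray .te file in the top-level directory or in a deeper subdirectory is ignored.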
The base tests check that basic operations work correctly. Test files in this directory check each RE component, and also include general tests, mainly from Henry Spencer's superb RE package. The files are: 0literal.te, 1class.te, 2notclas.te, 3anchor.te, 4option.te, 5star.te, 6plus.te, 7chclass.te, 8words.te, general.te and dotest.pl.
A feature of the tests is that we check not only that the line is matched (or not matched) correctly, but also that the search engine has identified the correct text within the line, as the match position reported by the search engine is very important in most applications. We do this by marking the match text with a marker character (usually ':') in the test script. When the test executes, it removes the marker from the target text, then uses the -Mc switch to force GGrep to report the match text by adding the marker character. The test passes if the output from GGrep matches the original text before the markers were removed.
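The marker round trip can be illustrated with a small sketch. It uses Python's `re` module as a stand-in for the GGrep search engine, and it assumes that the marker is placed on both sides of the match text; the source only says that the match text is marked with a marker character. The function names are hypothetical.

```python
import re

MARKER = ":"  # the usual marker character mentioned in the text


def strip_markers(marked):
    """Remove the markers to get the text that is presented to the search."""
    return marked.replace(MARKER, "")


def mark_first_match(pattern, text):
    """Stand-in for the search engine: wrap the first match in markers."""
    m = re.search(pattern, text)
    if m is None:
        return text  # no match, so nothing is marked
    return text[:m.start()] + MARKER + m.group(0) + MARKER + text[m.end():]


def check(pattern, marked):
    """Pass if re-marking the stripped text reproduces the marked original."""
    return mark_first_match(pattern, strip_markers(marked)) == marked
```

With this scheme, `check("b+", "a:bb:c")` passes, while `check("b+", "ab:b:c")` fails even though the line matches, because the engine reports a different match position from the one the test expects.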
The files tests are a simple set of tests that check line counts reported by GGrep for various patterns across the option switches -c, -cv, -ci and -civ. The files themselves are based on the Calgary Corpus text compression suite.
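To make the four switch combinations concrete, here is a minimal sketch of how the expected counts relate to one another. It assumes the switches have their conventional grep meanings (-c counts matching lines, -v inverts the selection, -i ignores case) and uses Python's `re` module in place of GGrep. The function names are hypothetical.

```python
import re


def count_lines(pattern, lines, ignore_case=False, invert=False):
    """Count the lines that match (or, if invert is set, do not match)."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    hits = sum(1 for line in lines if regex.search(line))
    return len(lines) - hits if invert else hits


def switch_counts(pattern, lines):
    """Return the count for each of the four switch combinations."""
    return {
        "-c": count_lines(pattern, lines),
        "-cv": count_lines(pattern, lines, invert=True),
        "-ci": count_lines(pattern, lines, ignore_case=True),
        "-civ": count_lines(pattern, lines, ignore_case=True, invert=True),
    }
```

A useful cross-check falls out of this: for any pattern, the -c and -cv counts must sum to the number of lines in the file, as must the -ci and -civ counts.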
The grey tests implement "grey box" tests. The base and files tests are effectively black-box tests: they seek to test the program's operation without using the actual source code as a guide on how and what to test. There are no white-box tests (where we seek to ensure that every line of code in the target program has been thoroughly exercised). The black-box tests are very much a hit-and-miss affair: it is quite easy to introduce exotic bugs or subtle corruption cases that are missed by the tests, as the rig can't possibly afford to test every combination.
Another testing strategy is required to pick up the corruption cases by inspection; hence the grey-box tests in this directory. These tests present a series of "reference" patterns to GGrep, then use the -D switch to obtain a dump of the state tables generated by the search. The test rig compares the generated tables against the description in the test file, and reports a test failure if any discrepancies are found.
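The comparison step can be sketched as a plain line-by-line diff. The actual format of the -D dump is not described here, so this sketch treats both the reference description and the generated dump as opaque text. The function names are hypothetical.

```python
import difflib


def table_discrepancies(reference, generated):
    """Return unified-diff lines between the reference and generated dumps.

    The result is an empty list when the two texts are identical.
    """
    return list(difflib.unified_diff(
        reference.splitlines(), generated.splitlines(),
        fromfile="reference", tofile="generated", lineterm=""))


def tables_match(reference, generated):
    """A grey-box test passes only if there are no discrepancies at all."""
    return not table_discrepancies(reference, generated)
```

Requiring an exact match is deliberately strict: any change to the generated tables, including a harmless one, fails the test and forces someone to look at the difference.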
Any worthwhile test rig should include flog tests -- tests that stress the program and strive to uncover infrequent or inconspicuous problems. "Flog" is a great term -- it implies a healthy but tongue-in-cheek disrespect for the program at hand. To some extent, these rigs try to broaden the test coverage to include cases outside the designer's mindset, simply by creating test cases at random (but not entirely random -- some structure is imposed by the generator to improve the coverage of the test cases).
Because the tests are random, it is essential to carefully record the test input and the program output, as otherwise faults may be missed, or may be irreproducible. The flog rig creates the files flog1.log and flog1.sum that contain details and summary respectively of the tests and results from the rig.
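A structured random generator with reproducible output might look like the sketch below. It is not the generator used by the flog rig: the atom list, the suffix weights and the anchor probabilities are all illustrative assumptions. Seeding the generator explicitly is one simple way to make a failing case reproducible, provided the seed is recorded alongside the results.

```python
import random
import re

# Illustrative building blocks: literals, any-character, class, negated class.
ATOMS = ["a", "b", ".", "[ab]", "[^a]"]
# The empty suffix appears twice so that about half the atoms are unrepeated.
SUFFIXES = ["", "", "*", "+"]


def random_re(rng, max_atoms=4):
    """Compose a random RE from atoms, each optionally followed by * or +."""
    parts = [rng.choice(ATOMS) + rng.choice(SUFFIXES)
             for _ in range(rng.randint(1, max_atoms))]
    pattern = "".join(parts)
    if rng.random() < 0.25:
        pattern = "^" + pattern  # sometimes anchor at start of line
    if rng.random() < 0.25:
        pattern += "$"           # sometimes anchor at end of line
    return pattern


def generate(seed, count):
    """Return count patterns; the same seed always gives the same list."""
    rng = random.Random(seed)
    return [random_re(rng) for _ in range(count)]
```

Building patterns from whole atoms rather than from random characters is the kind of imposed structure the text refers to: every generated pattern is syntactically valid, so no test run is wasted on a parse error.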
The GGrep flog rig compares the output of GNU grep with Grouse Grep across a set of option switch combinations. The input files are the files in the source directory (mainly C source and object files); while this is a reasonable mix of files, there is some variability in the file data which may affect test effectiveness.
In addition to testing both the output and the return status code, the rig also compares the execution time of GNU versus Grouse for the specified pattern and option switch setting, and selects a colour from bright green (if Grouse is more than 30% faster) to bright red (if Grouse is more than 20% slower).
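The two checks in this paragraph can be sketched as follows. The text gives only the two end points of the colour scale, so this sketch collapses everything in between into a single "neutral" band. It also assumes that "30% faster" means Grouse takes more than 30% less time than GNU, and "20% slower" means it takes more than 20% more time. The function names are hypothetical.

```python
def outcomes_agree(gnu, grouse):
    """Each argument is an (output, exit_status) pair; both parts must match."""
    return gnu == grouse


def speed_colour(gnu_seconds, grouse_seconds):
    """Pick a colour from the relative change in execution time."""
    if gnu_seconds <= 0:
        raise ValueError("GNU time must be positive")
    change = (grouse_seconds - gnu_seconds) / gnu_seconds
    if change < -0.30:
        return "bright green"  # Grouse took more than 30% less time
    if change > 0.20:
        return "bright red"    # Grouse took more than 20% more time
    return "neutral"           # intermediate shades are not modelled here
```

Note that the thresholds are asymmetric: a slowdown is flagged in red sooner than a speed-up of the same size earns green.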
The flog script is called flog1, and is in the flog subdirectory under the project root. You need to use a terminal window with 120 columns to support the wide format. As with the regression rig, the test script assumes that the program is ../src/ggrep.
Some patterns are known to fail in this test. Most of these failures come from differences in interpreting "^" as a literal or as a start-of-line character when the character appears anywhere in the RE other than in the first position. GNU's choices here depend on the way the DFA compiles the RE, which is obscure to outsiders (to say the least). Since Grouse Grep needs to be reworked to use a DFA match engine itself, I haven't bothered to fix this discrepancy.
There has been a mild bias during development towards handling large files more efficiently than smaller files, although this has been corrected to some extent by the flog rig. The main file searched for performance testing has been an 8Mb BBS catalogue file from a local supplier, downloaded in 1995. Here is a sample of the lines in the file (grepping for grep):
UWIN10.ZIP 80690 17/11/91 All the joy of UNIX with a Windows user interface! cat, cd, cp, df, fgrep, lpr, ls, od, pr, sort, touch, uniq, and wc are included. Input/output redirection and pipes are supported. A good addition to development environments. Try it with the Quick C for Windows tools menu!
BMGREP1.ZIP 10708 21/03/89 c - source code included in bmgrep.ZIP
BM.ZIP 32750 03/08/86 much faster file search than grep ci86
FGREP181.ZIP 28774 05/10/94 Fgrep V1.81: Fast Unix-like Util That can be Used to Find Strings of Chars in Ascii Text Files, and Arbitrary Sequences of Bytes in Other Files; Recursive Subdir Searches; Search Fields can be Defined; Build Your Own Frequ
FGREP172.ZIP 13024 22/11/89 Ver 1.72 Of Fgrep Searches For Text Within Fls
PICNIX3.ZIP 49459 30/07/86 3 5 unix fgrep grep ls more doc
LOOKFR.ZIP 4347 14/10/90 Search Ascii Zips Without Unzipping! Need Fgrep Als
UTOOLS.ZIP 140674 02/02/88 unix tools for msdos grep pr chmod..etc
GREPDDJ.ZIP 28715 18/05/86 grep from dr dobb's toolbox source too
GREP203.ZIP 29287 21/06/93 (f)grep utility for OS/2 2.0 GA. v2.03 grep text searching u
GREP152A.ZIP 28540 14/12/92 Grep v1.52A: OS/2 2.0 GA level grep utility designed to search from the current path on downwards.
RGREP.ZIP 6380 09/10/91 This is a PERL language script. In addition to showing what a working script looks like, it is also a very good implementation of egrep, with a recursive directory search added. Requires PERL4014.ZIP.
And although it's neither required nor recommended that you look at this file, here is the entire 8Mb file, compressed to 2.4Mb or so with bzip2: bbscat.bz2.
Why is "faults" consistently misspelt as "faultss" in the test rig? The answer is: So that passing all tests doesn't fool you into thinking that the program's perfect. This typo in the test rig is a gentle reminder to be wary.