ScanFile is the main module supervising the search. It accepts the RE specification from GGrep, splits out the easier bit to scan first if available, and then prepares to search each RE (using STBM and/or the Grouse FSA as appropriate). The module also oversees the file search, requesting memory-buffered file input from FastFile, invoking the search engine as required on each buffer, and managing search reporting, including handling output if inverted match sense is selected.
The implementation of this module is a bit of a mess. Most noticeably, while just about everything else in Grouse Grep is reentrant, this module most certainly isn't. (To be fair, perhaps the large search context structure in MatchEng could be reworked as well, sigh.) Currently, buffer offsets are described using 32-bit integers, which limits offset reporting for very large files.
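One way the offset limit could be lifted is to keep a 64-bit file position for the start of each buffer and add the 32-bit in-buffer offset only when reporting. The following is a minimal sketch of that idea, not the module's actual code; the FilePos type and both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical bookkeeping: 64-bit position of the current buffer's first
 * byte within the file.  Not part of ScanFile as it stands. */
typedef struct {
    uint64_t base;
} FilePos;

/* Combine the 64-bit buffer base with a 32-bit offset inside the buffer. */
uint64_t file_offset(const FilePos *pos, uint32_t in_buffer)
{
    return pos->base + in_buffer;
}

/* Call once per buffer, after it has been searched, to move the base on. */
void advance_buffer(FilePos *pos, uint32_t bytes_consumed)
{
    pos->base += bytes_consumed;
}
```

With this arrangement the search engine can keep working in 32-bit buffer offsets, and only the reporting path needs the wider type.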
An interesting aspect of the search is the role of fast scans and slower matches: where feasible, ScanFile, like the [Self-]Tuned Boyer-Moore algorithms and the state table search, splits the search effort into a fast scan component and a slower match component, based on the EasiestFirst analysis by RegExp. This scan/match separation seems to be a universal feature of high-speed search algorithms.
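The scan/match split can be illustrated with a deliberately simple literal search. This is a sketch of the general idea only, not Grouse Grep's code: the fast scan is a memchr for the pattern's first byte, the slow match is a memcmp of the whole pattern, and the names scan_then_match and match_at are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Slow match: exact comparison of the whole pattern at one candidate. */
static int match_at(const char *buf, size_t len, size_t pos,
                    const char *pat, size_t pat_len)
{
    return pos + pat_len <= len && memcmp(buf + pos, pat, pat_len) == 0;
}

/* Return the offset of the first occurrence of pat in buf, or (size_t)-1.
 * Fast scan: memchr skips quickly to bytes that could start a match.
 * Slow match: match_at confirms or rejects each candidate. */
size_t scan_then_match(const char *buf, size_t len,
                       const char *pat, size_t pat_len)
{
    size_t pos = 0;

    assert(pat_len > 0);
    while (pos < len) {
        const char *hit = memchr(buf + pos, pat[0], len - pos);
        size_t cand;

        if (hit == NULL)
            break;                      /* no candidates left in buffer */
        cand = (size_t)(hit - buf);
        if (match_at(buf, len, cand, pat, pat_len))
            return cand;
        pos = cand + 1;                 /* resume the fast scan */
    }
    return (size_t)-1;
}
```

The payoff comes when the scan rejects most of the buffer cheaply, so the expensive matcher runs only on a small number of candidates.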
ScanFile also handles directory recursion, although it doesn't cover the full set of skip and read-as-binary options supported by GNU Grep. [Most ironically, I wasn't going to include recursion, until my private e-mail to the author ended up being published as the letter "Grepping and Globbing" in the September 1999 Dr. Dobb's Journal Letters to the Editor. In this message, I noted how the DOS version of GGrep supported file globbing and directory recursion.]
ScanFile's performance could improve if the search management were a little more sophisticated. For example, the word-edge tests from the -w switch could be handled separately and more directly, instead of being appended to the RE being searched.
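As a rough sketch of what handling the word-edge test separately might look like, the check below runs after the search engine has reported a match span, rather than being part of the RE. It is an illustration under assumptions, not existing ScanFile code: the function names are hypothetical, and a word character is assumed to be a letter, digit, or underscore.

```c
#include <ctype.h>
#include <stddef.h>

/* Assumed definition of a word constituent: letter, digit, or underscore. */
static int is_word_char(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* buf[0..len) is the text being searched; the match occupies [start, end).
 * Returns nonzero if neither neighbour of the match is a word character. */
int match_has_word_edges(const char *buf, size_t len,
                         size_t start, size_t end)
{
    int left_ok  = (start == 0)
                   || !is_word_char((unsigned char)buf[start - 1]);
    int right_ok = (end >= len)
                   || !is_word_char((unsigned char)buf[end]);

    return left_ok && right_ok;
}
```

A caller would reject any match for which this returns zero and resume searching. A complete -w implementation may also need to retry alternative matches at the same position, which this sketch does not attempt.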
Init -- Prepare module for operation
Start -- Begin managing what has to be managed
OutputFunctions -- Specify functions to perform match output
MatchFunction -- Define routine to perform match
Pattern -- Specify RE to be searched
Configure -- Define how the module searches and reports matches
Search -- Perform specified search on a file
MatchedAny -- Report if any files matched search criteria
TraceryLink -- Tell Tracery how to deal with us
NewScanContext -- Prepare blank scan context block
MatchedAbandon -- Halt search if matching line found
Open -- Prepare file for scanning
ExpandNames -- Build a list of all files in a directory
RecurseDir -- Enumerate and search files in directory
DisplayBlock -- Display block of lines (for inverted match)
SearchBuffer -- Search one buffer of file
NoMatchFunction -- Place-holder to warn of incorrect config