

Appendix D Querying using regular expressions

See also Query expressions.

Unfortunately, we do not have room in this manual for a complete exposition on regular expressions. The following is a basic summary of some regular expressions you might wish to use.

NOTE: When you use query expressions containing regular expressions as part of an ordinary query-pr shell command line, you need to enclose them in single quotes ('...'); otherwise the shell tries to interpret the special characters itself, with unpredictable results.

See Regular Expression Syntax in Regex, for details on regular expression syntax. Also see Syntax of Regular Expressions in GNU Emacs Manual, but beware that the syntax for regular expressions in Emacs is slightly different.

All search criteria options to query-pr rely on regular expression syntax to construct their search patterns. For example,

query-pr --expr 'State="open"' --format full

matches all PRs whose State values match the regular expression ‘open’.

We can substitute the expression ‘o’ for ‘open’, according to GNU regular expression syntax. This matches all values of State which begin with the letter ‘o’.

We see that

query-pr --expr 'State="o"' --format full

is equivalent to

query-pr --expr 'State="open"' --format full

in this case, since the only value for State which matches the expression ‘o’ in a standard installation is ‘open’. ‘State="o"’ also matches ‘o’, ‘oswald’, and even ‘oooooo’, but none of those values are valid states for a Problem Report in default GNATS installations.

We can also use the expression operator ‘|’ to signify a logical OR, such that

query-pr --expr 'State="o|a"' --format full

matches all ‘open’ or ‘analyzed’ Problem Reports.
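The prefix matching described above can be tried out without a GNATS database. The following is only a sketch: grep -E stands in for the matcher used by query-pr, the leading ‘^’ imitates a match anchored at the start of the field, the helper name match_states is hypothetical, and the list of State values is assumed to be that of a typical default installation.

```sh
# Sample State values (assumed defaults; adjust to your installation).
states='open
analyzed
suspended
feedback
closed'

# Hypothetical helper: print the states that a pattern would select.
# "^(...)" anchors every alternative at the start of the value.
match_states() {
    printf '%s\n' "$states" | grep -E "^($1)"
}

match_states 'open'   # prints: open
match_states 'o'      # prints: open
match_states 'o|a'    # prints: open, then analyzed
```

The parentheses around the pattern matter in this sketch: without them, ‘^o|a’ would anchor only the first alternative.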

Regular expression syntax considers a regexp token surrounded with parentheses, as in ‘(regexp)’, to be a group. This means that ‘(ab)*’ matches any number (including zero) of contiguous instances of ‘ab’. Matches include ‘’, ‘ab’, and ‘ababab’.
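As a rough illustration of a repeated group, the sketch below tests a few strings against ‘(ab)*’ using grep -E as a stand-in matcher. The helper name whole_match is hypothetical, and the -x option (the match must cover the entire line) is used here only to make the repetition visible; it is not part of how query-pr matches.

```sh
# Hypothetical helper: report whether the whole string matches the pattern.
# grep -x only accepts a match that covers the entire line.
whole_match() {
    if printf '%s\n' "$2" | grep -Eqx "$1"; then
        echo "match"
    else
        echo "no match"
    fi
}

whole_match '(ab)*' ''         # match (zero instances)
whole_match '(ab)*' 'ab'       # match
whole_match '(ab)*' 'ababab'   # match
whole_match '(ab)*' 'aba'      # no match (trailing 'a' is not a full 'ab')
```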

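Regular expression syntax considers characters surrounded with square brackets, as in ‘[abc]’, to be a list, which matches any single one of the listed characters. This means that ‘Char[lb]’ matches the first five characters of ‘Charley’, ‘Charlene’, and ‘Charbroiled’ (case is significant; ‘charbroiled’ is not matched). Parentheses have no special meaning inside a list, so to match one of several whole strings, use a group with ‘|’ instead, as in ‘Char(ley|lene|broiled)’.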

Using groups and lists, we see that

query-pr --expr 'Category="gcc|gdb|gas"' --format full

is equivalent to

query-pr --expr 'Category="g(cc|db|as)"' --format full

and is also very similar to

query-pr --expr 'Category="g[cda]"' --format full

with the exception that this last search matches any values which begin with ‘gc’, ‘gd’, or ‘ga’.
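The difference between the group and the list can be seen on a small sample. This is a sketch only: the category names are hypothetical, grep -E stands in for the query-pr matcher, and the leading ‘^’ imitates a match anchored at the start of the field.

```sh
# Hypothetical category names, chosen only to show the difference.
categories='gcc
gdb
gas
gawk
glibc'

# Hypothetical helper: print the categories that a pattern would select.
match_categories() {
    printf '%s\n' "$categories" | grep -E "^($1)"
}

match_categories 'gcc|gdb|gas'   # prints: gcc, gdb, gas
match_categories 'g(cc|db|as)'   # prints: gcc, gdb, gas
match_categories 'g[cda]'        # prints: gcc, gdb, gas, gawk
```

In this sample, only the list form picks up ‘gawk’, because ‘[cda]’ checks a single character after the ‘g’.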

The ‘.’ character is known as a wildcard. ‘.’ matches any single character (except newlines). ‘*’ matches the previous character, list, or group any number of times, including zero. Therefore, we can understand ‘.*’ to mean “match zero or more instances of any character.”

query-pr --expr 'State=".*a"' --format full

matches all values for State which contain an ‘a’. (These include ‘analyzed’ and ‘feedback’.)

Another way to understand what wildcards do is to follow them on their search for matching text. By our syntax, ‘.*’ matches any character any number of times, including zero. Therefore, ‘.*a’ searches for any group of characters which ends with ‘a’, ignoring the rest of the field. ‘.*a’ matches ‘analyzed’ (stopping at the first ‘a’) as well as ‘feedback’.
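To see why the leading ‘.*’ matters for a match anchored at the start of the field, compare ‘a’ with ‘.*a’ in the sketch below. As before, grep -E is only a stand-in for the query-pr matcher, match_states is a hypothetical helper, and the State values are assumed defaults.

```sh
# Sample State values (assumed defaults; adjust to your installation).
states='open
analyzed
suspended
feedback
closed'

# Hypothetical helper: print the states that a pattern would select,
# with the pattern anchored at the start of the value.
match_states() {
    printf '%s\n' "$states" | grep -E "^($1)"
}

match_states 'a'     # prints: analyzed
match_states '.*a'   # prints: analyzed, then feedback
```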

Note: When using ‘fieldtype:Text’ or ‘fieldtype:Multitext’ (see Query expressions), you do not have to specify the token ‘.*’ at the beginning of your expression to match the entire field. For the technically minded, this is because these queries use ‘re_search’ rather than ‘re_match’. ‘re_match’ anchors the search at the beginning of the field, while ‘re_search’ does not anchor the search.

For example, to search in the >Description: field for the text

The defrobulator component returns a nil value.

we can use

query-pr --expr 'fieldtype:Multitext="defrobulator.*nil"' --format full
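The anchored and unanchored behaviors can be compared side by side on the sample sentence. This sketch does not call ‘re_search’ or ‘re_match’; it only imitates them with grep -E, using a leading ‘^’ for the anchored case. The helper names search_style and match_style are hypothetical.

```sh
text='The defrobulator component returns a nil value.'

# Unanchored, in the style of re_search: the pattern may match anywhere.
search_style() {
    if printf '%s\n' "$2" | grep -Eq "$1"; then echo yes; else echo no; fi
}

# Anchored, in the style of re_match: the pattern must match at the start.
match_style() {
    if printf '%s\n' "$2" | grep -Eq "^($1)"; then echo yes; else echo no; fi
}

search_style 'defrobulator.*nil'   "$text"   # yes
match_style  'defrobulator.*nil'   "$text"   # no
match_style  '.*defrobulator.*nil' "$text"   # yes
```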

To also match newlines, we have to include the expression ‘(.|^M)’ instead of just a dot (‘.’). ‘(.|^M)’ matches “any single character except a newline (‘.’) or (‘|’) any newline (‘^M’).” This means that to search for the text

The defrobulator component enters the bifrabulator routine
and returns a nil value.

we must use

query-pr --expr 'fieldtype:Multitext="defrobulator(.|^M)*nil"' \
         --format full

To generate the newline character ‘^M’, type the following depending on your shell:

csh

control-V control-M

tcsh

control-V control-J

sh (or bash)

Use the RETURN key, as in

(.|
)

Again, see Regular Expression Syntax in Regex, for a much more complete discussion of regular expression syntax.

