module Agrep: sig .. end
String searching with errors
type pattern
The type of compiled search patterns.
val pattern : ?transl:string -> string -> pattern
Compile a search pattern. The syntax for patterns is
similar to that of the Unix shell. The following constructs
are recognized:
?  match any single character
*  match any sequence of characters
[..]  character set: ranges are denoted with -, as in [a-z]; an initial ^, as in [^0-9], complements the set
&  conjunction (e.g. sweet&sour)
|  alternative (e.g. high|low)
(..)  grouping
\  escape special characters; the special characters are \?*[]&|().
The optional argument transl is a character translation table.
This is a string s of length 256 that "translates" a
character c to the character s.[Char.code c]. A character
of the text matches a character of the pattern if they both
translate to the same character according to transl.
If transl is not provided, the identity translation
(two characters match iff they are equal) is assumed.
Useful predefined translation tables are provided in Agrep.Iso8859_15.
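As an illustration of the translation table described above, here is a minimal sketch that builds a case-insensitive table using only the standard library. The names case_insensitive and same_under are hypothetical helpers introduced for this example; they are not part of Agrep. The final comment shows how such a table would presumably be passed to Agrep.pattern, based on the signature given above.

```ocaml
(* Hypothetical helper, not part of Agrep: a 256-character table that maps
   every ASCII uppercase letter to its lowercase form and leaves all other
   characters unchanged. *)
let case_insensitive : string =
  String.init 256 (fun i -> Char.lowercase_ascii (Char.chr i))

(* Two characters match under a table when they translate to the same
   character, which is the matching rule stated for transl. *)
let same_under (table : string) (a : char) (b : char) : bool =
  table.[Char.code a] = table.[Char.code b]

let () =
  assert (String.length case_insensitive = 256);
  assert (same_under case_insensitive 'A' 'a');
  assert (not (same_under case_insensitive 'a' 'b'))

(* With the library installed, the table would be supplied like this
   (left as a comment so the sketch runs without Agrep):
     let pat = Agrep.pattern ~transl:case_insensitive "high|low" *)
```

Only ASCII letters are folded here; accented characters would need a table that knows the character set in use, which is what the predefined tables are for.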
exception Syntax_error of int
Exception thrown by Agrep.pattern when the given pattern
is syntactically incorrect. The integer argument is the
character number where the syntax error occurs.
val pattern_string : ?transl:string -> string -> pattern
Agrep.pattern_string s returns a pattern that matches exactly
the string s and nothing else. The optional parameter transl
is as in Agrep.pattern.
val string_match : pattern -> ?numerrs:int -> ?wholeword:bool -> string -> bool
string_match pat text tests whether the string text
matches the compiled pattern pat. The optional parameter
numerrs is the number of errors permitted. One error
corresponds to a substitution, an insertion or a deletion
of a character. numerrs defaults to 0 (exact match).
The optional parameter wholeword is true if the pattern must
match a whole word, false if it can match inside a word.
wholeword defaults to false (match inside words).
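To make the notion of "one error" concrete, the sketch below counts substitutions, insertions and deletions between two whole strings using the classic dynamic-programming edit distance. It only illustrates how errors are counted. It is not claimed to be the algorithm Agrep uses, and it compares complete strings rather than patterns. The name edit_distance is a hypothetical helper.

```ocaml
(* Hypothetical helper, not part of Agrep: the minimum number of
   single-character substitutions, insertions and deletions needed to
   turn string a into string b. *)
let edit_distance (a : string) (b : string) : int =
  let la = String.length a and lb = String.length b in
  (* prev.(j): distance between the first i-1 chars of a and the first j of b *)
  let prev = Array.init (lb + 1) (fun j -> j) in
  let cur = Array.make (lb + 1) 0 in
  for i = 1 to la do
    cur.(0) <- i;
    for j = 1 to lb do
      let cost = if a.[i - 1] = b.[j - 1] then 0 else 1 in
      cur.(j) <-
        min
          (min (prev.(j) + 1) (cur.(j - 1) + 1)) (* deletion, insertion *)
          (prev.(j - 1) + cost)                  (* substitution or match *)
    done;
    Array.blit cur 0 prev 0 (lb + 1)
  done;
  prev.(lb)

let () =
  assert (edit_distance "sweet" "sweat" = 1); (* one substitution *)
  assert (edit_distance "sour" "our" = 1);    (* one deletion *)
  assert (edit_distance "high" "hight" = 1)   (* one insertion *)

(* With the library installed, a one-error search would presumably read:
     Agrep.string_match (Agrep.pattern "sweet") ~numerrs:1 "sweat" *)
```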
val substring_match : pattern ->
?numerrs:int -> ?wholeword:bool -> string -> pos:int -> len:int -> bool
Same as Agrep.string_match, but restrict the match to the
substring of the given string starting at character number pos
and extending len characters.
val errors_substring_match : pattern ->
?numerrs:int -> ?wholeword:bool -> string -> pos:int -> len:int -> int
Same as Agrep.substring_match, but return the smallest number
of errors such that the substring matches the pattern.
That is, it returns 0 if the substring matches exactly,
1 if the substring matches with one error, etc.
Return max_int if the substring does not match the pattern
with at most numerrs errors.
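Because a failed match is signalled by the sentinel max_int rather than by an exception or an option, callers may want to convert the result before using it. The following is a minimal sketch of one way to do that. errors_to_option and describe are hypothetical helpers, not part of Agrep, and the call shown in the closing comment is inferred from the signature above.

```ocaml
(* Hypothetical wrapper, not part of Agrep: turn the max_int sentinel
   into None so it cannot be mistaken for a real error count. *)
let errors_to_option (n : int) : int option =
  if n = max_int then None else Some n

(* Render the result of an error-counting match as a short message. *)
let describe (n : int) : string =
  match errors_to_option n with
  | None -> "no match within the allowed number of errors"
  | Some 0 -> "exact match"
  | Some k -> Printf.sprintf "match with %d error(s)" k

let () =
  assert (errors_to_option max_int = None);
  assert (errors_to_option 1 = Some 1);
  assert (describe 0 = "exact match")

(* With the library installed, the input would presumably come from:
     Agrep.errors_substring_match pat ~numerrs:2 text
       ~pos:0 ~len:(String.length text) *)
```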
module Iso8859_15: sig .. end
Useful translation tables for the ISO 8859-15 character set
(also known as Latin-9: Latin-1 with the Euro sign).