Next: POS tagging Up: Token-to-word rules Previous: Tables and lists used   Contents

Expansion of the Festival Regex tools

Festival Regex: ``string-matches''
The Scheme function ``string-matches'' is Festival's interface to regular expressions. It takes a string and a regular expression and returns true (t) if the expression matches the string, and nil otherwise.
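As an illustration of how such a test might be used in a token rule, the sketch below wraps ``string-matches'' in a small predicate. The name all-digits-p is hypothetical and not part of Festival. The sketch assumes the standard Festival behaviour that the expression has to cover the whole string, not just a part of it.

```scheme
;; Hypothetical helper, not part of Festival: uses only string-matches.
(define (all-digits-p name)
  "True (t) if NAME consists of digits only, nil otherwise."
  (string-matches name "[0-9]+"))

(all-digits-p "1984")    ; should succeed: every character is a digit
(all-digits-p "1984s")   ; should fail: the trailing letter is not covered
```

Note that such a test only reports whether the string matches. It does not say which part of the string matched which part of the expression.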
Expansion: ``pattern-matches''
The function ``pattern-matches'' extends ``string-matches'': in addition to testing for a match, it binds the matched substrings to variables (as in Perl). Because full pattern matching is rather slow, this is just a simple implementation that only considers the longest match. The pattern syntax differs slightly from that of ``string-matches'':

. matches any character
$ matches the end of a string
^ matches the beginning of a string
X* matches zero or more occurrences of X
X+ matches one or more occurrences of X
X? matches zero or one occurrence of X
[...] matches a range (e.g. [abc], [a-zäöüß]; [^abc] means neither
  a nor b nor c)
\\(...\\) parenthesized expressions are handled like single characters.
  This way '*', '+', '?', ... may be applied to more than
  just single characters.
X\\|Y matches either X or Y
{...} for each bracketed expression a new variable is generated.
  It is bound to the part of the string that matches the regular
  expression within the curly brackets. The variables are named
  #1, #2, ... starting from the left.

Here is a short example:
(pattern-matches "9pfünder" "{[0-9]+}{[a-zäöüß]+}")
returns true (t) and binds ``9'' to variable #1 and ``pfünder'' to variable #2.
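A minimal sketch of how the call might be embedded in a rule follows. It assumes that the extension providing ``pattern-matches'' is loaded, and it relies only on the truth value of the call, since this section does not describe how the variables #1 and #2 are read back. The name number-plus-word-p is hypothetical.

```scheme
;; Sketch only: number-plus-word-p is a hypothetical helper name.
;; It uses nothing but the return value of pattern-matches.
(define (number-plus-word-p name)
  "True if NAME is a number directly followed by lower-case letters."
  (pattern-matches name "{[0-9]+}{[a-zäöüß]+}"))

(if (number-plus-word-p "9pfünder")
    (format t "number part and word part found\n")
    (format t "no match\n"))
```

With the example string from above, the first branch would be taken, and the two curly-bracket groups would correspond to the number part and the word part of the token.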


Gregor Moehler
2001-07-17