Regular expression fundamentals

Regular expressions (regex) are logical formulas that describe string patterns; NAM uses them for pattern matching in a number of configuration tasks. This topic describes their syntax.

Overview

Regular expressions are used in a number of NAM configuration tasks, so a basic understanding of the concept is required before you configure certain features. This section explains the basic concept of regular expressions. For more exhaustive explanations, refer to any of the numerous online or printed publications on the subject.

Note

There are various flavors, or standards, of regular expressions. NAM uses two of them: Basic POSIX and Extended POSIX. The main difference between the two is explained below.

Definition of a "regular expression"

A regular expression is a logical formula that enables you to specify (match) a set of character strings and, optionally, to extract sub-strings from the strings that are found. It is usually applied to a larger set of character strings, of which only some fit (match) the regular expression or contain a substring that matches the expression. For example, a regular expression search in a text file enables you to find all the occurrences of a particular text pattern, or all of the lines containing that pattern.

Example

A single regular expression can match a wide range of very different text strings. For example, the expression “.” matches any single character and the expression “.*” matches any number of occurrences of any character, that is, in effect, it matches anything.
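As a minimal sketch of the two expressions above, the following uses Python's re module. Python is an assumption made here for illustration only: its regex syntax is not POSIX, but it treats “.” and “.*” the same way for these inputs. The sample strings are invented.

```python
import re

# "." matches exactly one character (any character except a newline in Python).
single = re.compile(r".")
# ".*" matches zero or more characters, so it matches any line, even an empty one.
anything = re.compile(r".*")

one_char = single.fullmatch("x") is not None             # one character: match
two_chars = single.fullmatch("xy") is not None           # two characters: no full match
empty = anything.fullmatch("") is not None               # zero characters: match
text = anything.fullmatch("Cookie: Pag=1") is not None   # arbitrary text: match

print(one_char, two_chars, empty, text)  # True False True True
```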

Regular expressions can also be used to find particular text strings and then extract certain parts of them: the parts that match a sub-expression. The sub-expression is enclosed in round brackets (). For example, the expression “a(b.)”, in the extended POSIX syntax, finds all three-character strings in which the first character is “a”, the second is “b”, and the third is any character. It then extracts the second and third characters. Note, however, that the match depends on both the part of the expression outside the round brackets and the part inside them: the character “a” has to be present in the found string, even though it is not extracted.

Note that the same expression in the basic POSIX syntax would be written as “a\(b.\)”, since the syntax requires special characters, such as round brackets, to be escaped using the backslash character.
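The extraction described above can be sketched as follows. Python's re module is used as a stand-in (an assumption, since it is not a POSIX engine); it writes round brackets without backslashes, as the extended syntax does. The helper name extract and the sample inputs are hypothetical.

```python
import re

# Extended-style syntax: round brackets are written without backslashes.
# The Basic POSIX form of the same expression would be a\(b.\)
pattern = re.compile(r"a(b.)")

def extract(text):
    """Return the part matched by the sub-expression, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None

found = extract("abc")     # "a" is required for the match, but only "bc" is extracted
missing = extract("xbc")   # there is no "a" before "b", so nothing matches

print(found, missing)  # bc None
```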

Common symbols

The more common regular expression symbols include:

  • period . Matches any character.
  • asterisk * Matches repetition of the previous character, character class, or sub-expression zero or more times.
  • plus sign + Matches repetition of the previous character one or more times. Note: In basic regular expressions, it needs to be preceded by a backslash, to prevent it from being considered a normal character to match.
  • caret symbol ^ This symbol can have a number of meanings, depending on the context:
    • If it appears at the beginning of the expression, it means the beginning of the line or search string.
    • If it appears as the first character in square brackets (see below), it means a negation. For example, “[^@]” means any character that is not “@”.
    • In other cases this character is considered a normal character and matches itself.
  • dollar sign $ Matches the end of the line or search string.
  • square brackets [...] Group together symbols denoting a class of characters, that is, a set of symbols any one of which may match a single character. For example, [a-z] stands for any lower-case alphabetical character, and [^@] means a character that is not the @ symbol.
  • hyphen - Is used to specify ranges of characters; see [ ] above.
  • round brackets (...) or, in the basic syntax, \(...\) Select the part of the parsed string that we want to extract. Note: In basic regular expressions, round brackets need to be preceded by backslashes, to prevent them from being considered normal characters to match.
  • vertical bar or pipe | Separates alternatives: it tells the regex engine to match either everything to the left of the vertical bar or everything to the right of it. You can use this character to match any one of several possible regular expressions.
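The symbols listed above can be tried out with a short sketch such as the one below. It assumes Python's re module, whose syntax for these particular symbols follows the extended (unescaped) form; it is not the NAM engine, and the sample inputs are invented for illustration.

```python
import re

# Each entry records whether the expression matched the sample input.
checks = {
    "plus": re.search(r"ab+c", "abbbc") is not None,              # one or more "b"
    "plus_zero": re.search(r"ab+c", "ac") is not None,            # "+" needs at least one "b"
    "star_zero": re.search(r"ab*c", "ac") is not None,            # "*" also accepts zero "b"
    "anchors": re.search(r"^GET$", "GET") is not None,            # the whole string is "GET"
    "anchored_miss": re.search(r"^GET$", "GET /index") is not None,  # extra text after "GET"
    "char_class": re.search(r"^[a-z]+$", "login") is not None,    # lower-case letters only
    "negated": re.search(r"^[^@]+$", "user@host") is not None,    # input contains "@"
    "either": re.search(r"^(GET|POST)$", "POST") is not None,     # one of two alternatives
}

for name, result in checks.items():
    print(name, result)
```

Only plus_zero, anchored_miss, and negated are False; the remaining checks are True.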

Escaping special characters

In basic regular expressions, some special characters (also referred to as meta-characters), such as ?, +, {, |, (, and ), lose their special meaning and are matched as normal characters. To achieve the equivalent functionality, you need to "escape" them, that is, precede each of these characters with a backslash character: \?, \+, \{, \|, \(, and \).
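The relationship between the two syntaxes can be illustrated with a small converter. This is a simplified sketch under stated assumptions, not part of NAM: the function name basic_to_extended is hypothetical, the closing brace } is treated like the characters listed above, and POSIX character classes such as [[:alpha:]] and other syntax corner cases are not handled.

```python
# Characters whose escaped and unescaped roles are swapped between the two
# syntaxes. The closing brace is included as an assumption of this sketch.
SWAPPED = set("?+{}|()")

def basic_to_extended(pattern):
    """Rewrite a Basic POSIX expression in Extended POSIX form (simplified)."""
    out = []
    i = 0
    in_brackets = False
    while i < len(pattern):
        ch = pattern[i]
        if in_brackets:
            # Inside [...] every character is copied unchanged.
            out.append(ch)
            if ch == "]":
                in_brackets = False
            i += 1
        elif ch == "[":
            in_brackets = True
            out.append(ch)
            i += 1
            # A leading "^" and a "]" placed first belong to the class itself.
            if i < len(pattern) and pattern[i] == "^":
                out.append("^")
                i += 1
            if i < len(pattern) and pattern[i] == "]":
                out.append("]")
                i += 1
        elif ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # \( in the basic syntax becomes ( in the extended syntax;
            # any other escape sequence is kept as it is.
            out.append(nxt if nxt in SWAPPED else ch + nxt)
            i += 2
        else:
            # A bare ( is a normal character in the basic syntax,
            # so it becomes \( in the extended syntax.
            out.append("\\" + ch if ch in SWAPPED else ch)
            i += 1
    return "".join(out)

print(basic_to_extended(r"a\(b.\)"))  # a(b.)
print(basic_to_extended("a+b"))       # a\+b
```

The second call shows the opposite direction of the rule: an unescaped + is a normal character in the basic syntax, so it has to be escaped to keep that meaning in the extended syntax.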

Example: Basic POSIX regular expression

This is a walk-through example of a Basic POSIX regular expression.

Suppose that an HTTP cookie name is defined as Pag and the cookie header line is:

Cookie: Pag=cf68603b@TXP293@10.17.3.125@D1R1wLLsMrjhw;

the cookie value is:

cf68603b@TXP293@10.17.3.125@D1R1wLLsMrjhw

Assuming that the substring we want to extract lies between the first and second @ characters in the cookie value string, it is:

TXP293

The Basic POSIX regular expression can then be defined as:

^[^@]*@\([^@]\+\)@

In this case, the above regular expression can be understood as follows:

  • ^ means find the beginning of the line.
  • [^@]* means skip zero or more occurrences of any character that is not @.
  • @ means match the first @ character, which precedes the string to extract.
  • \( marks the beginning of the part of the expression that describes the string to extract.
  • [^@] means that the first character after the @ matched above must not be “@”.
  • \+ means repeat the preceding [^@] one or more times, so that we extract this character and any characters that follow it and are also not “@”.
  • \) marks the end of the expression describing the string we want to extract.
  • @ means that the string to be extracted has to be terminated by “@”, but we do not include the terminating “@” character in the extracted string because it is outside of the round brackets.
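The walk-through above can be checked with a short sketch. It assumes Python's re module, which is not a POSIX engine and writes round brackets and the plus sign without backslashes, so the expression appears in its extended-style form; the Basic POSIX form from the text is kept in a comment. The second input value is invented to show a non-matching case.

```python
import re

cookie_value = "cf68603b@TXP293@10.17.3.125@D1R1wLLsMrjhw"

# Basic POSIX form from the text:  ^[^@]*@\([^@]\+\)@
# Equivalent form for Python's re module (brackets and plus sign unescaped):
pattern = re.compile(r"^[^@]*@([^@]+)@")

def extract_between_first_two_ats(value):
    """Return the text between the first and second "@", or None if there is none."""
    match = pattern.search(value)
    return match.group(1) if match else None

extracted = extract_between_first_two_ats(cookie_value)
no_second_at = extract_between_first_two_ats("cf68603b@TXP293")

print(extracted)     # TXP293
print(no_second_at)  # None, because no terminating "@" follows the candidate string
```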