Dynatrace Pattern Language
Dynatrace Pattern Language (DPL) is a pattern language that allows you to describe patterns using matchers, where a matcher is a mini-pattern that matches a certain type of data. For example,
INT) matches integer numbers, and
IPADDR matches IPv4 or IPv6 addresses. There are matchers available to handle all kinds of data types.
A written pattern is interpreted from left to right, ignoring extra whitespaces, line breaks, and comments in between. You can write the pattern describing an integer followed by a single space, IP address, and line break as the following one-liner:
INT ' ' IPADDR EOL
Or you can write the same pattern in a more explanatory way:
/* this pattern expects an integer number and an IP address separated by single space in each line */ INT //an integer ' ' //followed by single space IPADDR //followed by IPv4 or IPv6 address EOL //line is terminated with line feed character
Matching vs Parsing
You don't necessarily need all data elements in the input data for analysis. For instance, field separators or end-of-record markers in a log line are useful only for parsing, but we don't need them when we run the queries. All matchers in a defined pattern must match, but only a subset of them may also extract (parse) data.
A matcher will extract data only when it has been assigned an export name - this is an arbitrary name of your choice, which becomes the name of the field you use in query statements.
In the following example, the pattern has:
- 5 matchers in total
- 3 matchers that are extracting data
Note that we chose to ignore characters between the IP address and timestamp, and also ignore end-of-line characters.
A DPL pattern consists of one or more matcher expressions. They can be separated by whitespace or commas or newlines. For a handy reference guide to all matchers, see the DPL Grammar page.
In general, a matcher expression can be any of the following:
- Built-in matchers for many frequently used data types (numeric, time, network, and so on)
- literal expressions
- character groups, which are arbitrary set of characters to be matched (Regular Expression compatible)
- Reference to another pattern expression, to facilitate building complex patterns in a modular way
Matcher expressions can be grouped:
- sequence group—defines an ordered sequence of matchers
- alternatives group—defines a list of matchers to choose from
- array—to parse repeated data elements
- structure—to capture parsed data as composite type
- enum group—to match strings to numeric values
- JSON—to parse JSON structures
Matcher Expression Syntax
A matcher expression consists of the matcher itself and optional controlling elements (modifiers) arranged in the following order (from right to left, where square brackets indicate optional elements):
[lookaround] MATCHER_EXPR ['(' configuration ')'] [quantifier] [mod_optional] [':'export_name]
Note that whitespace characters and newlines are allowed between the elements. Placing elements in a different order (for instance, by placing
mod_optional after the
export_name) will cause a syntax error.
Matcher expressions have
Some matchers allow configuration specifying their behavior. For instance, a timestamp needs an expected format definition.
Most matchers and groupings can be added with a quantifier to tell the engine how many times it should try to match.
All matchers and groupings can be declared to be optional. If the element in the expected position is missing, the engine outputs NULL to the resultset and continues with the next matcher in the expression.
All matchers and groupings can be assigned an export name, which is the name of the field exposed to the query layer.
The sole purpose of pattern matching is to make data elements available for the query engine. However, not all matched elements are needed for queries (such as field separators in tabulated files), so an export name is a mechanism for the user to declare which data elements are exposed for queries (at the same time providing a name for the query fields). A matcher without an export name still does its job matching the pattern, but it's not visible in queries.
All matchers and groupings can "look around" (backward or forward), mainly to enable decision-making (conditional branching).
The following is an example of step-by-step pattern matching.
Suppose we have a comma-separated record (terminated with the line feed character) with the following fields:
- order number - integer
- username - consisting of upper and lower case letters and numbers (but not a comma)
- ipv4 address of the user
1,alice,192.168.1.1 2,bob,10.6.24.18 3,mallory,192.168.1.3
This structure can be described by the following pattern expression:
INT:seq ',' LD:uname ',' IPADDR:user_ip EOL
- line 1: integer matcher for the order number, visible in queries as 'seq' on line 2,4 constant string matcher for the field separator, not visible in queries
- line 3: line data matcher, visible in queries as 'uname'
- line 5: IP address matcher, visible in queries as 'user_ip'
- line 6: chargroup matcher for the line feed terminating the record
The pattern matching engine tries to apply the pattern by utilizing matchers in the order in which they were defined. The example above starts by trying to match
INT:seq at the first byte of input data. This happens to be
1. As it is suitable for an integer type, it moves on to the next byte.
Next, the engine finds the byte to be a comma. This does not match with an integer, so the
INT:seq matcher is completed by converting
1 to an integer and the next matcher in the pattern is selected:
','. The engine tries it for a current position of data and finds a match.
So the data pointer is moved on to the next byte (pointing to the first letter of
bob). As the constant string matcher contained just one character, the matcher is considered complete and the engine takes the next one in the pattern:
[a-zA-Z0-9]*:uname. The quantifier
[a-zA-Z0-9]*:uname to consume a variable number of bytes (zero or more), so it keeps matching until it finds a byte not matching its defined characters. This happens at the second comma (just after
bob), where the engine considers the
[a-zA-Z0-9]*:uname matcher complete and takes the next one:
','. Again, it tries to match it to the byte at the current position and succeeds.
The data pointer is moved to the next byte, pointing to the beginning of
',' is completed, the engine takes
IPV4ADDR:user_ip. Trying it from the current position, a match is found and the data pointer is moved forward 11 bytes, now pointing to a newline character. The engine finds a match for it using the last matcher in the pattern:
Now the data pointer is advanced to the next byte, the pattern iterator is reset, and the cycle continues with trying the first matcher of the pattern against the currently pointed data. This continues until the end of the input data.
If the engine encounters data for which it is unable to find a match, it resets the pattern iterator, marks this byte as unmatched, and moves on to the next byte. This continues until a match is found or there is no more data. Eventually, the following structured data is the output: