2.5.21. Regular Expression

TODO.

2.5.21.1. Instruction

regexp.compile op1 op2 [regexp::CompileString]
Operator 1:ref<regexp>
Operator 2:string

Compiles the pattern in op2 for subsequent matching. The pattern must contain only ASCII characters and must not contain any back-references. Each regexp instance can be compiled only once; throws ValueError if a second compilation is attempted.

Todo: We should support characters other than ASCII too, but we need the notion of a local character set first.

Todo: We should add compilation flags, such as case-insensitive matching.
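The compile-once rule can be modeled in a few lines of Python. This is only a sketch of the documented behavior, not the HILTI runtime: the class name Regexp and the is_compiled property are hypothetical, raising ValueError for a non-ASCII pattern is an assumption (the text above names ValueError only for the second compilation), and the back-reference restriction is not checked.

```python
import re


class Regexp:
    """Toy stand-in for a regexp instance (hypothetical, not the HILTI type)."""

    def __init__(self):
        self._compiled = None

    def compile(self, pattern: str) -> None:
        if self._compiled is not None:
            # A second compilation attempt on the same instance is an error.
            raise ValueError("regexp already compiled")
        if not pattern.isascii():
            # Assumption: the error type for non-ASCII patterns is not documented.
            raise ValueError("pattern must contain only ASCII characters")
        self._compiled = re.compile(pattern)

    @property
    def is_compiled(self) -> bool:
        return self._compiled is not None
```

A caller that needs a different pattern would create a fresh instance rather than recompile an existing one.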

target = regexp.find op1 op2 [regexp::FindString]
Target:int<32>
Operator 1:ref<regexp>
Operator 2:string

Scans the string in op2 for the regular expression op1. Returns a positive integer if a match is found; if a set of patterns has been compiled, the integer indicates which pattern has matched. If multiple patterns from the set match, the left-most one is taken. If multiple patterns match at the left-most position, it is undefined which of them is returned. Returns -1 if no match was found but adding more input bytes could change that (i.e., a partial match). Returns 0 if no match was found and adding more input would not change that.

Todo: This string variant is not yet implemented.
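The selection rule for pattern sets can be illustrated with a small Python model. It is a sketch under stated assumptions: the function name find and the 1-based numbering of patterns are illustrative, ties at the left-most position are resolved in favor of the lower index here although the description above leaves that undefined, and the partial-match result (-1) is not modeled because Python's re module has no partial matching.

```python
import re


def find(patterns, data):
    """Return the 1-based index of the pattern that matches left-most, or 0."""
    best_index, best_start = 0, None
    for index, pattern in enumerate(patterns, start=1):
        m = re.search(pattern, data)
        # Keep the match that starts earliest in the input; on a tie the
        # earlier pattern wins (an arbitrary choice made by this sketch).
        if m is not None and (best_start is None or m.start() < best_start):
            best_index, best_start = index, m.start()
    return best_index
```

For example, with the patterns "foo" and "bar" and the input "xxbarfoo", both patterns match, but "bar" starts earlier, so the second pattern is reported.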

target = regexp.groups op1 op2 [regexp::GroupsString]
Target:ref<vector<*>>
Operator 1:ref<regexp>
Operator 2:string

Scans the string in op2 for the regular expression op1. If the regular expression is found, returns a vector with one entry for each group defined in the regular expression. Each entry is either the substring matching the corresponding subexpression (for string input) or a range of iterators locating the matching bytes (for bytes input). Index 0 always contains the string/bytes that match the total expression. Returns an empty vector if the expression is not found. This method is not compatible with sets of multiple patterns; throws PatternError if used with a set, or when no pattern has been compiled yet.

Todo: The string variant is not yet implemented.
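The layout of the returned vector can be shown with Python's re module as a stand-in. This is a minimal sketch that models only the string case; the function name groups is illustrative, and the PatternError conditions are not modeled.

```python
import re


def groups(pattern, data):
    """Return [whole match, group 1, group 2, ...], or [] if nothing matches."""
    m = re.search(pattern, data)
    if m is None:
        return []
    # Index 0 is the text matched by the total expression; indices 1..n are
    # the capture groups in the order they appear in the pattern.
    return [m.group(i) for i in range(m.re.groups + 1)]
```

Calling groups(r"(\d+)-(\d+)", "range 10-20 ok") yields the whole match "10-20" at index 0, followed by "10" and "20".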

target = regexp.match_token op1 op2 [regexp::MatchTokenString]
Target:tuple<*>
Operator 1:ref<regexp>
Operator 2:string

Matches the beginning of the string in op2 against the regular expression op1 (if op3 is not given, searches until the end of the bytes object). The regexp must have been compiled with the NoSub attribute. Returns a 2-tuple with (1) an integer match-indicator corresponding to the one returned by regexp.find; and (2) a bytes iterator pointing one beyond the last examined byte (i.e., right after the match if we had one, or right after the input data if not).

Note: As the name implies, this is a specialized version for parsing purposes, enabling optimization for the case where we don’t need any subexpression capturing and must start the match right at the initial position. Internally, the implementation is only slightly optimized at the moment, but it could be improved further at some point.

Todo: The string variant is not yet implemented. The bytes implementation should be further optimized.

target = regexp.match_token_advance op1 op2 [regexp::MatchTokenAdvanceString]
Target:tuple<*>
Operator 1:ref<match_token_state>
Operator 2:string

Performs matching previously initialized with regexp.match_token_init on the string in op2. If op3 is not given, searches until the end of the bytes object. This method can be called multiple times with new data as long as no match has been found, and it will continue matching from the previous state as if all data had been concatenated. Returns a 2-tuple with (1) an integer match-indicator corresponding to the one returned by regexp.find; and (2) a bytes iterator pointing one beyond the last examined byte (i.e., right after the match if we had one, or right after the input data if not). The same match state must not be used again once this instruction has returned a match indicator >= zero.

Note: As their name implies, the regexp.match_token_* family of instructions are specialized versions for parsing purposes, enabling optimization for the case where we don’t need any subexpression capturing and must start the match right at the initial position.

Todo: The string variant is not yet implemented.

target = regexp.match_token_init op1 [regexp::MatchTokenInit]
Target:ref<match_token_state>
Operator 1:ref<regexp>

Initializes incremental matching for the regexp op1. op1 will be considered implicitly anchored to the beginning of the data, and it must have been compiled with the NoSub attribute. This instruction does not perform any matching itself; use regexp.match_token_advance for that.

Note: As their name implies, the regexp.match_token_* family of instructions are specialized versions for parsing purposes, enabling optimization for the case where we don’t need any subexpression capturing and must start the match right at the initial position.
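The init/advance protocol and the three-valued match indicator can be sketched in Python. This is a deliberately simplified model, not the HILTI implementation: it handles only a set of literal tokens instead of regular expressions, it reports a match as soon as any token is complete, it returns an integer offset into the concatenated input in place of a bytes iterator, and the class MatchTokenState and the RuntimeError raised on reuse are assumptions made for illustration.

```python
class MatchTokenState:
    """Toy matching state (hypothetical; literal tokens only)."""

    def __init__(self, tokens):
        self.tokens = list(tokens)  # token at list position i has indicator i + 1
        self.seen = ""              # all data fed so far, as if concatenated
        self.done = False           # set once an indicator >= 0 was returned


def match_token_init(tokens):
    # Performs no matching; it only sets up the state.
    return MatchTokenState(tokens)


def match_token_advance(state, data):
    """Return (indicator, offset) for the input fed so far."""
    if state.done:
        # Assumption: the real behavior on reuse is not specified.
        raise RuntimeError("match state reused after a final result")
    state.seen += data
    for indicator, token in enumerate(state.tokens, start=1):
        if state.seen.startswith(token):
            state.done = True
            return indicator, len(token)   # offset right after the match
    if any(token.startswith(state.seen) for token in state.tokens):
        return -1, len(state.seen)         # partial: more input may still match
    state.done = True
    return 0, len(state.seen)              # no match, and more input cannot help
```

With the tokens "GET " and "POST ", feeding "PO" yields a partial result, and feeding "ST /x" afterwards completes the second token at offset 5, exactly as if "POST /x" had arrived in one piece.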

target = regexp.span op1 op2 [regexp::SpanString]
Target:tuple<*>
Operator 1:ref<regexp>
Operator 2:string

Scans the string in op2 for the regular expression op1. Returns a 2-tuple with (1) an integer match-indicator corresponding to the one returned by regexp.find; and (2) the matching substring (for string input) or a tuple of iterators locating the matching bytes (for bytes input). If there is no match, the second element is an empty string or a tuple with two bytes.end iterators, respectively. Throws PatternError if no pattern has been compiled yet.

Todo: The string variant is not yet implemented.