Regular Expression Callouts [AHK_L 14+]

RegEx callouts provide a means of temporarily passing control to the script in the middle of regular expression pattern matching. For detailed information about the PCRE-standard callout feature, see pcre.txt.

RegEx callouts are currently supported only by RegExMatch() and RegExReplace().

Table of Contents

Syntax

The syntax for a RegEx callout in AutoHotkey is (?CNumber:Function), where both Number and Function are optional. Colon ':' is allowed only if Function is specified, and is optional if Number is omitted. If Function is specified but is not the name of a user-defined function, a compile error occurs and pattern-matching does not begin.

If Function is omitted, the function name must be specified in a variable named pcre_callout. If both a global variable and local variable exist with this name, the local variable takes precedence. If pcre_callout does not contain the name of a user-defined function, RegEx callouts which omit Function are ignored.

RegEx Callout Functions

Function(Match, CalloutNumber, FoundPos, Haystack, NeedleRegEx)
{
    ...
}

RegEx callout functions may define up to 5 parameters:

These names are suggestive only. Actual names may vary.

Warning: Changing the input parameters of RegExReplace() or RegExMatch() during a call is unsupported and may cause unpredictable behaviour.

Pattern-matching may proceed or fail depending on the return value of the RegEx callout function:

For example:

Haystack := "The quick brown fox jumps over the lazy dog."
RegExMatch(Haystack, "i)(The) (\w+)\b(?CCallout)")
Callout(m) {
    MsgBox m=%m%`nm1=%m1%`nm2=%m2%
    return 1
}

In the above example, Callout is called once for each substring which matches the part of the pattern preceding the RegEx callout. \b is used to exclude incomplete words in matches such as The quic, The qui, The qu, etc.

EventInfo

Additional information is available by accessing the pcre_callout_block structure via A_EventInfo.

version           := NumGet(A_EventInfo,  0, "Int")
callout_number    := NumGet(A_EventInfo,  4, "Int")
offset_vector     := NumGet(A_EventInfo,  8)
subject           := NumGet(A_EventInfo,  8 + A_PtrSize)
subject_length    := NumGet(A_EventInfo,  8 + A_PtrSize*2, "Int")
start_match       := NumGet(A_EventInfo, 12 + A_PtrSize*2, "Int")
current_position  := NumGet(A_EventInfo, 16 + A_PtrSize*2, "Int")
capture_top       := NumGet(A_EventInfo, 20 + A_PtrSize*2, "Int")
capture_last      := NumGet(A_EventInfo, 24 + A_PtrSize*2, "Int")
pad := A_PtrSize=8 ? 4 : 0  ; Compensate for 64-bit data alignment.
callout_data      := NumGet(A_EventInfo, 28 + pad + A_PtrSize*2)
pattern_position  := NumGet(A_EventInfo, 28 + pad + A_PtrSize*3, "Int")
next_item_length  := NumGet(A_EventInfo, 32 + pad + A_PtrSize*3, "Int")
if (version >= 2)
    mark   := StrGet(NumGet(A_EventInfo, 36 + pad + A_PtrSize*3, "Int"), "UTF-8")

For more information, see pcre.txt, NumGet() and A_PtrSize.

Auto-Callout

Including C in the options of the pattern enables the auto-callout mode. In this mode, RegEx callouts equivalent to (?C255) are inserted before each item in the pattern. For example, the following template may be used to debug regular expressions:

; Set the default RegEx callout function.
pcre_callout := "DebugRegEx"

; Call RegExMatch with auto-callout option C.
RegExMatch("xxxabc123xyz", "C)abc.*xyz")

DebugRegEx(Match, CalloutNumber, FoundPos, Haystack, NeedleRegEx)
{
    ; See pcre.txt for descriptions of these fields.
    start_match       := NumGet(A_EventInfo, 12 + A_PtrSize*2, "Int")
    current_position  := NumGet(A_EventInfo, 16 + A_PtrSize*2, "Int")
    pad := A_PtrSize=8 ? 4 : 0
    pattern_position  := NumGet(A_EventInfo, 28 + pad + A_PtrSize*3, "Int")
    next_item_length  := NumGet(A_EventInfo, 32 + pad + A_PtrSize*3, "Int")

    ; Point out >>current match<<.
    _HAYSTACK:=SubStr(Haystack, 1, start_match)
        . ">>" SubStr(Haystack, start_match + 1, current_position - start_match)
        . "<<" SubStr(Haystack, current_position + 1)
    
    ; Point out >>next item to be evaluated<<.
    _NEEDLE:=  SubStr(NeedleRegEx, 1, pattern_position)
        . ">>" SubStr(NeedleRegEx, pattern_position + 1, next_item_length)
        . "<<" SubStr(NeedleRegEx, pattern_position + 1 + next_item_length)
    
    ListVars
    ; Press Pause to continue.
    Pause
}

Remarks

RegEx callouts are executed on the current quasi-thread, but the previous value of A_EventInfo will be restored after the RegEx callout function returns. ErrorLevel is not set until immediately before RegExMatch() or RegExReplace() returns.

PCRE is optimized to abort early in some cases if it can determine that a match is not possible. For all RegEx callouts to be called in such cases, it may be necessary to disable these optimizations by specifying (*NO_START_OPT) at the start of the pattern. This requires [v1.1.05] or later.