Documentation

Regex.Backtrack

BoundedBacktracker #

An NFA backed bounded backtracker for executing regex searches with capturing groups (BoundedBacktracker.slots).

This regex engine only implements leftmost-first match semantics and only supports leftmost searches. By design, this backtracking regex engine is bounded. This bound is implemented by not visiting any combination of NFA state ID and position in a haystack more than once.

All searches execute in linear time with respect to the size of the regular expression and search text.

def BoundedBacktracker.Array.Ref.mkRef {α β : Type} [Inhabited β] (arr : Array α) :
ST.Ref β (Array α)

make reference of array

Equations
Instances For

    get array of reference

    Equations
    Instances For
      def BoundedBacktracker.Array.Ref.modifyRefValue {α β : Type} [Inhabited β] [DecidableEq β] (ref : ST.Ref β (Array α)) (index : Nat) (value : α) :
      ST.Ref β (Array α)

      modify array, try to perform the update destructively

      Equations
      • One or more equations did not get rendered due to their size.
      Instances For
        Equations
        Instances For

          Char position in a substring

          • current position

          • curr? : Option Char

            char at current position

          • prev? : Option Char

            char at previous position

          • isPosInRange : posInRange s self.pos

            pos is in range of substring s

          • isValidPos : String.Pos.Valid s.str self.pos

            pos is valid string position in s.str

          • isValidSubstring : s.Valid

            s is a valid substring

          Instances For
            Equations
            • One or more equations did not get rendered due to their size.
            Equations
            • One or more equations did not get rendered due to their size.
            Equations
            • «at».toSlotEntry = «at».pos,
            Instances For
              Equations
              Instances For
                def BoundedBacktracker.CharPos.create (s : Regex.ValidSubstring) («at» : Regex.ValidPos s.val.str) (h : s.val.startPos «at».val «at».val s.val.stopPos) :
                CharPos s.val

                create a CharPos from s and position «at»

                Equations
                • One or more equations did not get rendered due to their size.
                Instances For
                  theorem BoundedBacktracker.CharPos.valid_prev {s : Substring} (cp : CharPos s) :
                  String.Pos.Valid s.str (s.str.prev cp.pos)

                  to prev position of cp

                  Equations
                  • One or more equations did not get rendered due to their size.
                  Instances For
                    Equations
                    Instances For
                      @[irreducible]
                      Equations
                      • One or more equations did not get rendered due to their size.
                      Instances For
                        theorem BoundedBacktracker.CharPos.valid_next {s : Substring} (cp : CharPos s) (h : cp.pos < s.stopPos) :
                        String.Pos.Valid s.str (s.str.next cp.pos)

                        to next position of cp

                        Equations
                        • One or more equations did not get rendered due to their size.
                        Instances For

                          is CharPos at start position

                          Equations
                          • cp.atStart = decide (cp.pos = s.startPos)
                          Instances For

                            is CharPos at stop position

                            Equations
                            Instances For

                              Represents a stack frame on the heap while doing backtracking.

                              Instances For
                                Equations
                                • One or more equations did not get rendered due to their size.

                                A representation of "half" of a match reported by a DFA.

                                Instances For
                                  Equations
                                  • One or more equations did not get rendered due to their size.
                                  @[reducible, inline]

                                  The stack of frames

                                  Equations
                                  Instances For
                                    @[inline]
                                    def BoundedBacktracker.Stack.push {n : Nat} {s : Substring} (stack : Stack n s) (v : Frame n s) :
                                    Stack n s

                                    Push frame to stack

                                    Equations
                                    • stack.push v = v :: stack
                                    Instances For
                                      @[inline]
                                      def BoundedBacktracker.Stack.pop? {n : Nat} {s : Substring} (stack : Stack n s) :
                                      Option (Frame n s × Stack n s)

                                      pop head frame from stack

                                      Equations
                                      Instances For
                                        theorem BoundedBacktracker.Stack.pop?_some_lt {n : Nat} {s : Substring} {a : Frame n s} {s1 : Stack n s} (s✝ : Stack n s) (h : s✝.pop? = some (a, s1)) :
                                        @[inline]
                                        def BoundedBacktracker.Stack.append {n : Nat} {s : Substring} (stack1 stack2 : Stack n s) :
                                        Stack n s

                                        append stacks

                                        Equations
                                        • stack1.append [v1, v2] = v2 :: v1 :: stack1
                                        • stack1.append stack2 = (List.reverse stack2).append stack1
                                        Instances For
                                          @[inline]
                                          def BoundedBacktracker.Stack.toStack {n : Nat} {s : Substring} (arr : Array (Frame n s)) :
                                          Stack n s
                                          Equations
                                          Instances For
                                            @[reducible, inline]

                                            The encoded set of (Fin n, HaystackOffset) pairs that have been visited by the backtracker within a single search. Optimization: encode as bits

                                            Equations
                                            Instances For
                                              @[reducible, inline]

                                              SlotEntry consists of slot, group and char pos in s

                                              Equations
                                              Instances For
                                                Equations
                                                • One or more equations did not get rendered due to their size.
                                                Instances For

                                                  State of the backtracking search

                                                  • stack : Stack n input

                                                    Stack used on the heap for doing backtracking

                                                  • visited : ST.Ref Nat Visited

                                                    The encoded set of (Fin n, HaystackOffset) pairs that have been visited by the backtracker within a single search.

                                                  • countVisited : Nat

                                                    count of pairs that have been visited

                                                  • sid : Fin n

                                                    current state

                                                  • at : CharPos input

                                                    position in input string

                                                  • slots : Array (SlotEntry input)

                                                    slots, positions of captures in input

                                                  • slotsValid : Slots.Valid self.slots = true

                                                    slots are valid

                                                  • recentCaptures : Array (Option (String.Pos × String.Pos))

                                                    recent captures of capture groups

                                                  • logEnabled : Bool

                                                    is logging enabled

                                                  • msgs : Array String

                                                    log msgs

                                                  • halfMatch : Option HalfMatch

                                                    HalfMatch

                                                  • backtracks : Nat

                                                    count of backtracks

                                                  Instances For
                                                    theorem BoundedBacktracker.SearchState.ext {n : Nat} {input : Substring} {x y : SearchState n input} (stack : x.stack = y.stack) (visited : x.visited = y.visited) (countVisited : x.countVisited = y.countVisited) (sid : x.sid = y.sid) («at» : x.at = y.at) (slots : x.slots = y.slots) (recentCaptures : x.recentCaptures = y.recentCaptures) (logEnabled : x.logEnabled = y.logEnabled) (msgs : x.msgs = y.msgs) (halfMatch : x.halfMatch = y.halfMatch) (backtracks : x.backtracks = y.backtracks) :
                                                    x = y
                                                    def BoundedBacktracker.SearchState.fromNfa (nfa : NFA.Checked.NFA) (input : Regex.ValidSubstring) («at» : Regex.ValidPos input.val.str) (logEnabled : Bool) (h : 0 < nfa.n input.val.startPos «at».val «at».val input.val.stopPos) :
                                                    SearchState nfa.n input.val

                                                    create the SearchState from an NFA

                                                    Equations
                                                    • One or more equations did not get rendered due to their size.
                                                    Instances For
                                                      theorem BoundedBacktracker.SearchState.slots_valid_elem_eq_pair {s : Substring} (slots : Array (SlotEntry s)) (h : Slots.Valid slots = true) (i : Fin slots.toList.length) :
                                                      slots.toList[i].fst = i slots.toList[i].snd.fst = (↑i).div 2
                                                      theorem BoundedBacktracker.SearchState.slots_valid_elem_eq {s : Substring} (slots : Array (SlotEntry s)) (h : Slots.Valid slots = true) (i : Fin slots.toList.length) :
                                                      ∃ (v : Option (CharPos.PosInRange s)), slots.toList[i] = (i, (↑i).div 2, v)
                                                      theorem BoundedBacktracker.SearchState.slots_of_map_valid {s : Substring} (slots : Array (SlotEntry s)) (h : Slots.Valid slots = true) (f : SlotEntry sSlotEntry s) (hf : ∀ (x : SlotEntry s), (f x).fst = x.fst (f x).snd.fst = x.snd.fst) :
                                                      def BoundedBacktracker.Visited.index {n : Nat} {s : Substring} (sid : Fin n) (cp : CharPos s) :

                                                      get encoded index of Fin n and CharPos in visited array

                                                      Equations
                                                      Instances For
                                                        theorem BoundedBacktracker.Visited.eq_of_linear_eq {a1 b1 a2 b2 x : Nat} (hb1 : b1 < x) (hb2 : b2 < x) (h : a1 * x + b1 = a2 * x + b2) :
                                                        a1 = a2 b1 = b2
                                                        theorem BoundedBacktracker.Visited.index_injective {n : Nat} {s : Substring} (sid1 sid2 : Fin n) (cp1 cp2 : CharPos s) (heq : index sid1 cp1 = index sid2 cp2) :
                                                        sid1 = sid2 cp1.pos.byteIdx = cp2.pos.byteIdx

                                                        index is injective.

                                                        Check if current Fin n and CharPos in SearchState is already visited. If not visited mark pair as visited.

                                                        Equations
                                                        • One or more equations did not get rendered due to their size.
                                                        Instances For

                                                          Check if current Fin n and CharPos in SearchState is already visited. If not visited mark pair as visited.

                                                          Equations
                                                          • One or more equations did not get rendered due to their size.
                                                          Instances For
                                                            theorem BoundedBacktracker.Visited.checkVisited'_false_lt {n : Nat} {input : Substring} {s1 : SearchState n input} (s : SearchState n input) (h : checkVisited' s = (false, s1)) :
                                                            s.countVisited < s1.countVisited
                                                            theorem BoundedBacktracker.Visited.checkVisited'_true_eq {n : Nat} {input : Substring} {s1 : SearchState n input} (s : SearchState n input) (h : checkVisited' s = (true, s1)) :
                                                            s.countVisited = s1.countVisited List.length s.stack = List.length s1.stack
                                                            theorem BoundedBacktracker.capture_groups_eq_of_consecutive_slots {s : Substring} {i s0 g0 : Nat} {v0 : CharPos.PosInRange s} {s1 g1 : Nat} {v1 : CharPos.PosInRange s} (slots : Array (SlotEntry s)) (valid : SearchState.Slots.Valid slots = true) (h : ¬i % 2 = 0) (h0 : slots.get? (i - 1) = some (s0, g0, some v0)) (h1 : slots.get? i = some (s1, g1, some v1)) :
                                                            g0 = g1

                                                            capture groups of consecutive slots are equal

                                                            theorem BoundedBacktracker.withMsg_eq {input : Substring} {nfa : NFA.Checked.NFA} {s s1 : SearchState nfa.n input} {msg : UnitString} (h : BoundedBacktracker.withMsg✝ msg s = s1) :
                                                            s.countVisited = s1.countVisited s.stack = s1.stack
                                                            theorem BoundedBacktracker.step_match_countVisited_eq {n : Nat} {s : Substring} {p : NFA.PatternID} (s1 s2 : SearchState n s) (h : BoundedBacktracker.step_match✝ p s1 = s2) :
                                                            s1.countVisited = s2.countVisited
                                                            theorem BoundedBacktracker.toNextStep_countVisited_eq {s : Substring} (nfa : NFA.Checked.NFA) (state : NFA.Checked.State nfa.n) (s1 s2 : SearchState nfa.n s) (h : BoundedBacktracker.toNextStep✝ nfa state s1 = s2) :
                                                            s1.countVisited = s2.countVisited
                                                            theorem BoundedBacktracker.toNextStep_eq {input : Substring} {nfa : NFA.Checked.NFA} {state : NFA.Checked.State nfa.n} {s s1 : SearchState nfa.n input} {msg : UnitString} (h : BoundedBacktracker.toNextStep✝ nfa state (BoundedBacktracker.withMsg✝ msg s) = s1) :
                                                            s.countVisited = s1.countVisited
                                                            theorem BoundedBacktracker.toNextStepChecked_true_lt {input : Substring} (nfa : NFA.Checked.NFA) (s s1 : SearchState nfa.n input) (h : BoundedBacktracker.toNextStepChecked✝ nfa s = (true, s1)) :
                                                            s.countVisited < s1.countVisited
                                                            theorem BoundedBacktracker.toNextStepChecked_false_eq {input : Substring} (nfa : NFA.Checked.NFA) (s s1 : SearchState nfa.n input) (h : BoundedBacktracker.toNextStepChecked✝ nfa s = (false, s1)) :
                                                            s.countVisited = s1.countVisited s.stack = s1.stack
                                                            def BoundedBacktracker.steps {s : Substring} (nfa : NFA.Checked.NFA) (state : SearchState nfa.n s) :
                                                            SearchState nfa.n s

                                                            execute next steps in NFA.

                                                            Equations
                                                            Instances For
                                                              @[irreducible]
                                                              Equations
                                                              • One or more equations did not get rendered due to their size.
                                                              Instances For
                                                                @[irreducible]
                                                                theorem BoundedBacktracker.steps_loop_le {input : Substring} (nfa : NFA.Checked.NFA) (s s1 : SearchState nfa.n input) (h : steps.loop nfa s = s1) :
                                                                s.countVisited s1.countVisited
                                                                theorem BoundedBacktracker.steps_lt_or_eq_lt {input : Substring} (nfa : NFA.Checked.NFA) (s s1 : SearchState nfa.n input) (h : steps nfa s = s1) :
                                                                s.countVisited < s1.countVisited s.countVisited = s1.countVisited List.length s.stack = List.length s1.stack
                                                                theorem BoundedBacktracker.toNextFrameStep_true_lt_or_eq_lt {input : Substring} (nfa : NFA.Checked.NFA) (s s1 : SearchState nfa.n input) (h : BoundedBacktracker.toNextFrameStep✝ nfa s = (true, s1)) :
                                                                s.countVisited < s1.countVisited s.countVisited = s1.countVisited List.length s.stack = List.length s1.stack
                                                                theorem BoundedBacktracker.toNextFrameRestoreCapture_true_lt_or_eq_lt {s : Substring} {n : Nat} {s1 : SearchState n s} (slot : Nat) (offset : Option (CharPos.PosInRange s)) (stack : Stack n s) (s✝ : SearchState n s) (h : BoundedBacktracker.toNextFrameRestoreCapture✝ slot offset stack s✝ = (true, s1)) :
                                                                s✝.countVisited = s1.countVisited stack = s1.stack
                                                                theorem BoundedBacktracker.toNextFrame_true_lt {input : Substring} (nfa : NFA.Checked.NFA) (s s1 : SearchState nfa.n input) (h : BoundedBacktracker.toNextFrame✝ nfa s = (true, s1)) :
                                                                s.countVisited < s1.countVisited s.countVisited = s1.countVisited List.length s1.stack < List.length s.stack

                                                                BoundedBacktrack search

                                                                Equations
                                                                • One or more equations did not get rendered due to their size.
                                                                Instances For
                                                                  @[irreducible]
                                                                  Equations
                                                                  • One or more equations did not get rendered due to their size.
                                                                  Instances For
                                                                    def BoundedBacktracker.matches (s : Regex.ValidSubstring) («at» : Regex.ValidPos s.val.str) (nfa : NFA.Checked.NFA) (logEnabled : Bool) :
                                                                    Array String × Array (Option { m : Regex.ValidSubstring // m.val.str = s.val.str })

                                                                    Search for the first match of this regex in the haystack given and return log msgs and the matches of each capture group.

                                                                    Equations
                                                                    • One or more equations did not get rendered due to their size.
                                                                    Instances For