gagabla/GenRegEx

GenRegEx

GenRegEx is a generic regular expression matching engine implemented in C#. It works on sequences of tokens of any kind; you only have to provide a helper object that decides whether two tokens are equal.
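The README does not show the helper's interface. In .NET the natural shape for "decide whether two tokens are equal" is IEqualityComparer<T>, so the sketch below assumes that shape; it is not necessarily what GenRegEx expects. The token type WordToken and the class WordTokenComparer are hypothetical and compare word tokens case-insensitively while ignoring their position.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token type: a word together with its offset in the source text.
public sealed record WordToken(string Text, int Offset);

// Assumed helper shape (IEqualityComparer<T>); GenRegEx's actual helper interface may differ.
// Two tokens count as equal when their text matches, ignoring case and offset.
public sealed class WordTokenComparer : IEqualityComparer<WordToken>
{
    public bool Equals(WordToken? x, WordToken? y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(x.Text, y.Text, StringComparison.OrdinalIgnoreCase);
    }

    // Must agree with Equals: tokens that compare equal get the same hash code.
    public int GetHashCode(WordToken obj) =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Text);
}
```

With a comparer like this, "Hello" at offset 0 and "hello" at offset 42 would match the same pattern token.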

Once a pattern has been created, matching is very fast; in some cases it is one million times as fast as the native regular expressions in C#.

Features

The GenRegEx engine implements only a small subset of the regular expression functionality you might know from string-only implementations such as PCRE. The following features are available:

  • Matching single tokens of any kind
  • Grouping a sequence of tokens
  • Repetition of tokens or groups (zero or more (*), one or more (+))
  • Optional tokens or groups (zero or one occurrence (?))
  • Greedy and non-greedy repetition
  • Anchoring the match at the start (^) and at the end ($)
  • Building patterns in code
  • Parsing patterns from string, rendering patterns as string
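To illustrate what building a pattern in code and rendering it as a string can look like, here is a small sketch of a pattern tree. All type names (Node, Tok, Group, Quantified, Pattern) are hypothetical, and the string syntax is an assumption: space-separated tokens, "( ... )" for groups, and a trailing "?" on a quantifier for non-greedy repetition as in PCRE. GenRegEx's own classes and syntax may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical pattern model; not GenRegEx's actual API.
public abstract record Node
{
    public abstract string Render();
}

// Matches exactly one token; rendered as the token text.
public sealed record Tok(string Text) : Node
{
    public override string Render() => Text;
}

// Ties a sequence of nodes together so it can be quantified as a unit.
public sealed record Group(IReadOnlyList<Node> Items) : Node
{
    public override string Render() =>
        "( " + string.Join(" ", Items.Select(i => i.Render())) + " )";
}

public enum Quant { ZeroOrMore, OneOrMore, Optional }

// Applies *, + or ? to a token or group; Greedy = false appends '?' (assumed PCRE-style).
public sealed record Quantified(Node Inner, Quant Kind, bool Greedy = true) : Node
{
    public override string Render()
    {
        string op = Kind switch
        {
            Quant.ZeroOrMore => "*",
            Quant.OneOrMore => "+",
            _ => "?",
        };
        return Inner.Render() + op + (Greedy ? "" : "?");
    }
}

// A whole pattern with optional ^ and $ anchors.
public sealed record Pattern(IReadOnlyList<Node> Items, bool AnchorStart = false, bool AnchorEnd = false)
{
    public string Render()
    {
        var parts = new List<string>();
        if (AnchorStart) parts.Add("^");
        parts.AddRange(Items.Select(i => i.Render()));
        if (AnchorEnd) parts.Add("$");
        return string.Join(" ", parts);
    }
}
```

For example, an anchored pattern made of the token A, the group (B C) repeated one or more times, and a non-greedy D* renders as "^ A ( B C )+ D*? $". A parser for the same syntax would perform the inverse mapping.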

Background

This implementation is heavily inspired by Russ Cox's article "Regular Expression Matching: the Virtual Machine Approach". The pattern is compiled into a program (a sequence of instructions), which is then executed by a virtual machine that simulates multiple threads, one for each possible way of matching the pattern.
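The following is a minimal sketch of the thread-list simulation (often called a Pike VM) described in that article, made generic over the token type with an IEqualityComparer<T>. It is an illustration of the technique, not GenRegEx's code: the names Op, Inst and PikeVm are hypothetical, and it only answers whether the whole input matches. Threads are deduplicated by program counter at each step, so the work per input token is bounded by the program size. The order in which Split adds its targets does not change a yes/no answer; it only matters once match positions or groups are reported, which is where greedy and non-greedy loops differ.

```csharp
using System;
using System.Collections.Generic;

// Instruction set from the article: Token consumes one matching token,
// Split forks a thread (target X has priority over Y), Jmp jumps, Match accepts.
public enum Op { Token, Split, Jmp, Match }

// Value is only meaningful for Token; X and Y are jump targets.
public sealed record Inst<T>(Op Op, T Value, int X, int Y);

public static class PikeVm
{
    // Returns true if the program matches the entire token sequence.
    public static bool FullMatch<T>(IReadOnlyList<Inst<T>> prog, IReadOnlyList<T> input, IEqualityComparer<T> eq)
    {
        var clist = new List<int>();   // threads at the current input position
        var nlist = new List<int>();   // threads for the next input position
        AddThread(prog, clist, new HashSet<int>(), 0);

        for (int i = 0; ; i++)
        {
            bool atEnd = i == input.Count;
            var seen = new HashSet<int>();
            nlist.Clear();
            foreach (int pc in clist)
            {
                var inst = prog[pc];
                switch (inst.Op)
                {
                    case Op.Match:
                        if (atEnd) return true;   // a thread reached Match after consuming all input
                        break;                    // otherwise this thread dies (full-match semantics)
                    case Op.Token:
                        if (!atEnd && eq.Equals(inst.Value, input[i]))
                            AddThread(prog, nlist, seen, pc + 1);
                        break;
                }
            }
            if (atEnd || nlist.Count == 0) return false;
            (clist, nlist) = (nlist, clist);
        }
    }

    // Follows Jmp and Split immediately, so thread lists only hold Token and Match.
    // 'seen' keeps each program counter at most once per step.
    private static void AddThread<T>(IReadOnlyList<Inst<T>> prog, List<int> list, HashSet<int> seen, int pc)
    {
        if (!seen.Add(pc)) return;
        var inst = prog[pc];
        switch (inst.Op)
        {
            case Op.Jmp:
                AddThread(prog, list, seen, inst.X);
                break;
            case Op.Split:
                AddThread(prog, list, seen, inst.X);
                AddThread(prog, list, seen, inst.Y);
                break;
            default:
                list.Add(pc);
                break;
        }
    }
}
```

For example, the pattern A B* C compiles to: 0 Token A, 1 Split 2 4, 2 Token B, 3 Jmp 1, 4 Token C, 5 Match. That program accepts A C and A B B C but rejects A B.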

Todos

  • Supply a neat interface (using IEnumerator)
  • Collect group-matching information and return it as part of the resulting match
  • Support explicit repetitions ({N}, {,N}, {N,} and {X,Y})
  • Support alternatives (A | B)
  • Support token classes ([A B])
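One possible route to explicit repetitions that needs no new VM instructions is to rewrite them into the operators that already exist: X{2,4} becomes X X ( X X? )?, X{2,} becomes X X X*, and X{N} becomes N plain copies. The sketch below is hypothetical (the helper Repetition.Expand is not part of GenRegEx) and assumes space-separated tokens and "( ... )" groups in the pattern string, since the README does not specify the concrete syntax.

```csharp
using System;
using System.Linq;

// Hypothetical helper: rewrites X{min,max} into *, ? and groups.
// Assumes space-separated tokens and "( ... )" groups in the pattern string.
public static class Repetition
{
    // atom: a single token or an already parenthesized group.
    // max == null means "no upper bound", i.e. {min,}.
    public static string Expand(string atom, int min, int? max)
    {
        if (min < 0 || (max.HasValue && max.Value < min))
            throw new ArgumentOutOfRangeException(nameof(max), "Require 0 <= min <= max.");

        // The mandatory part: min plain copies of the atom.
        var parts = Enumerable.Repeat(atom, min).ToList();

        if (max is null)
        {
            // {min,}: after the mandatory copies, allow any number of further ones.
            parts.Add(atom + "*");
        }
        else
        {
            // {min,max}: nest (max - min) optional copies, e.g. ( X X? )?, so an extra
            // copy is only tried after the previous one matched. This avoids the
            // ambiguity of the flat form X? X?, where either copy could match a token.
            string tail = "";
            for (int k = 0; k < max.Value - min; k++)
                tail = tail.Length == 0 ? atom + "?" : "( " + atom + " " + tail + " )?";
            if (tail.Length > 0) parts.Add(tail);
        }
        return string.Join(" ", parts);
    }
}
```

Under these assumptions {N} maps to Expand(atom, N, N), {,N} to Expand(atom, 0, N), {N,} to Expand(atom, N, null) and {X,Y} to Expand(atom, X, Y).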

About

Generic regular expressions implemented in C#
