Skip to content

Latest commit

 

History

History
41 lines (22 loc) · 1.57 KB

README.md

File metadata and controls

41 lines (22 loc) · 1.57 KB

Regex Compressor

Download

Regex Compressor is a Java library intended to take a list of strings and generate a compact, Java-compatible regex that matches those exact strings.

Usage

import static com.gleenn.regex_compressor.RegexCompressor.*;

This example shortens the regex by two characters, and also improves the regex performance because it won't have to check the "aaaa" prefix 4 times when matching "aaaad", only once. This can have a large impact on performance. compress(asList("aaaab", "aaaac", "aaaad", "aaaae")) => "aaaa[bcde]"

compress(asList("foo", "bar", "baz")) => "foo|ba[rz]"

Use the pattern method instead of compress to generate a Java Pattern directly.

Notes

This should have improved regex performance on large numbers of strings. It uses a trie data structure to compress common prefixes.

Works with Unicode!

Insertion order matters, it can affect the result of matches if you care about more than just does the string match true/false.

Future work

  • Use radix trees to improve performance when compressing long prefixes
  • Compress suffixes (although this is pretty non-trivial)
  • Use vistor pattern to not expose internals

Related work

This project is based on a Clojure implementation called Frak.

Radis tree implemented in Java by thegedge.

Licence

Eclipse Public License - v 1.0