AutomataZoo is an automata processing benchmark suite and is an improved version of ANMLZoo.
If you have any questions or comments, please contact [email protected] or create an Issue Ticket.
High-performance automata-processing engines are traditionally evaluated against a limited set of regular expression rulesets. These serve as valid, real-world example use cases, but they represent only a small proportion of all automata-based applications. With the recent availability of architectures and software frameworks for automata processing, many new applications have been discovered that benefit from automata processing. These applications exhibit a broad variety of characteristics that differ from prior regular expression-based applications, and they warrant their own benchmarks.
AutomataZoo improves upon ANMLZoo in several ways:
- The suite of benchmarks is not standardized to a particular architecture, and does not inherit the same architectural biases as ANMLZoo.
- The benchmarks implement full kernels, which allows for comparisons between automata and non-automata approaches.
- The suite includes open-source tools for generating benchmark automata and inputs of various sizes, allowing for design space explorations.
- Snort: A widely used network intrusion detection system.
- ClamAV: A virus-detection tool that relies on a publicly available database of malware patterns.
- Protomata: An automata-based application that searches for a set of 1309 protein motif patterns from the PROSITE database.
- Brill Tagging: A rule-based approach to part-of-speech tagging.
- Random Forest: A machine learning model based on ensembles of decision trees.
- Hamming Distance: A string-scoring kernel that accepts inputs that are within a set Hamming distance of a configured pattern.
- Levenshtein Distance: A string-scoring kernel that accepts inputs that are within a set edit distance of a configured pattern.
- Sequence Matching: An automata application that counts sorted sequences of item sets to identify frequently-occurring sets.
- Entity Resolution: An automata application that attempts to find duplicate entries in a streaming database.
- CRISPR/Cas9: An automata application that enables gene editing by identifying targeted locations.
- YARA: An automata application that discovers malware described in the YARA malware pattern description language.
- File Carving: An automata application that identifies files in a stream of input bytes.
- Pseudo Random Number Generation (PRNG): An automata application that models Markov Chains with finite automata to generate high-throughput PRNG streams.
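
To make the Hamming Distance entry above concrete, the following is a minimal Python sketch of what such a kernel computes. It is not the suite's generator or any AutomataZoo engine. The function name `hamming_matches` and its parameters are hypothetical, chosen only for illustration, and the pattern is assumed to be non-empty. The sketch simulates a small nondeterministic automaton whose states are (symbols consumed, mismatches so far) pairs, and it reports every input offset at which the preceding window is within the configured distance of the pattern.

```python
def hamming_matches(pattern: str, stream: str, max_dist: int) -> list[int]:
    """Return the end offsets in `stream` where the last len(pattern) symbols
    differ from `pattern` in at most `max_dist` positions.

    Illustrative sketch only; assumes `pattern` is non-empty.
    """
    # Each active state is (symbols consumed so far, mismatches so far).
    active: set[tuple[int, int]] = set()
    reports: list[int] = []

    for offset, symbol in enumerate(stream):
        # A new match attempt may begin at every input offset.
        active.add((0, 0))
        next_active: set[tuple[int, int]] = set()

        for pos, errs in active:
            if symbol != pattern[pos]:
                errs += 1
            if errs > max_dist:
                continue  # too many mismatches: this attempt dies
            if pos + 1 == len(pattern):
                reports.append(offset)  # whole pattern consumed: report
            else:
                next_active.add((pos + 1, errs))

        active = next_active

    return reports


if __name__ == "__main__":
    print(hamming_matches("abc", "abcabd", 1))
```

The number of active states stays bounded by the pattern length, because each start offset contributes at most one state per input symbol. This bounded, data-parallel state set is what makes this kind of kernel a natural fit for automata processing.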
Jack Wadden
Tommy Tracy II
Elaheh Sadredini
Lingzi Wu
Chunkun Bo
Jesse Du
Yizhou Wei
Matthew Wallace
Jeffrey Udall
Vinh Dang
Deyuan Guo
Ke Wang
Nathan Brunelle
Matt Grimm
If you use this benchmark suite in a publication, please use the following citation:
Wadden, J., Tracy II, T., Sadredini, E., Wu, L., Bo, C., Du, J., Wei, Y., Wallace, M., Udall, J., Stan, M., and Skadron, K. "AutomataZoo: A Modern Automata Processing Benchmark Suite." 2018 IEEE International Symposium on Workload Characterization (IISWC'18). IEEE, 2018.
@inproceedings{ANMLZoo,
title={{ AutomataZoo: A Modern Automata Processing Benchmark Suite}},
author={Wadden, Jack and Tracy II, Tom and Sadredini, Elaheh and Wu, Lingzi and Bo, Chunkun and Du, Jesse and Wei, Yizhou and Wallace, Matthew and Udall, Jeffrey and Stan, Mircea and Skadron, Kevin},
booktitle={Proceedings of the IEEE International Symposium on Workload Characterization (IISWC)},
year={2018},
}
Each benchmark and automata processing engine in AutomataZoo is individually licensed. Please refer to the benchmark directories for individual license files.