Estimated time: 1 day
For working with regular expressions, the Rust ecosystem has the regex crate.
It is important to know that, with regex, a regular expression must be compiled before it can be used, and compilation is not cheap. So the following code introduces a performance problem:
```rust
use regex::Regex;

fn is_email(email: &str) -> bool {
    let re = Regex::new(".+@.+").unwrap(); // compiles on every call
    re.is_match(email)
}
```
To avoid this unnecessary performance penalty, we should compile the regular expression once and reuse the compiled result. This is easily achieved with the lazy_static crate, in global and/or local scope:
```rust
use lazy_static::lazy_static;
use regex::Regex;

lazy_static! {
    // Compiled once, on first use.
    static ref REGEX_EMAIL: Regex = Regex::new(".+@.+").unwrap();
}

fn is_email(email: &str) -> bool {
    REGEX_EMAIL.is_match(email)
}
```
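Since Rust 1.70, the standard library's std::sync::OnceLock offers the same compile-once behaviour without an extra crate. The sketch below shows the local-scope variant. To keep it dependency-free, a hypothetical compile_email_pattern function stands in for Regex::new, and a hypothetical COMPILATIONS counter records how many times it actually runs; the email check itself is deliberately simplified and is not equivalent to ".+@.+".

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Hypothetical counter: how many times the "expensive" step actually ran.
static COMPILATIONS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for an expensive step such as `Regex::new(".+@.+")`.
// Here the "compiled" result is just the separator character.
fn compile_email_pattern() -> char {
    COMPILATIONS.fetch_add(1, Ordering::SeqCst);
    '@'
}

fn is_email(email: &str) -> bool {
    // Function-local cache: initialized on the first call, reused afterwards.
    static PATTERN: OnceLock<char> = OnceLock::new();
    let sep = *PATTERN.get_or_init(compile_email_pattern);
    // Simplified check (NOT equivalent to `.+@.+`): non-empty text
    // on both sides of the first separator.
    match email.split_once(sep) {
        Some((user, host)) => !user.is_empty() && !host.is_empty(),
        None => false,
    }
}
```

However many times is_email is called, COMPILATIONS stays at 1, which is exactly the property the lazy_static version provides.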
This may feel different from how regular expressions are used in other programming languages, because some of them implicitly cache compilation results and/or do not expose a compilation API at all (like PHP). But if your background is in a language like Go or Java, this concept should be familiar to you.
If regular expressions are not powerful enough for your parsing problem, you end up writing your own parser. The Rust ecosystem has numerous crates to help with that:
- nom is a parser combinator library. It is one of the most performant among them, and especially good at parsing binary (byte- or bit-oriented) data.
- pest is a general-purpose parser that takes a PEG (parsing expression grammar) as input and derives the parser code from it.
- lalrpop is a parser generator framework that generates LR(1) parser code from custom grammar files.
- combine is another parser combinator library, inspired by the Haskell library Parsec.
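To give a feel for what "parser combinator" means in the list above, here is a small hand-rolled sketch using only std. It is not the API of nom or combine; the names ParseResult, satisfy, many and integer are hypothetical. A parser is modelled as a function from input to an optional (value, remaining input) pair, and combinators build bigger parsers out of smaller ones.

```rust
// Result of running a parser: the parsed value and the remaining input,
// or None if the parser did not match. (Real libraries carry error info.)
type ParseResult<'a, T> = Option<(T, &'a str)>;

// Primitive parser: consumes one char if it satisfies `pred`.
fn satisfy<'a>(pred: impl Fn(char) -> bool) -> impl Fn(&'a str) -> ParseResult<'a, char> {
    move |input: &'a str| match input.chars().next() {
        Some(c) if pred(c) => Some((c, &input[c.len_utf8()..])),
        _ => None,
    }
}

// Combinator: applies `p` zero or more times, collecting the results.
fn many<'a, T>(
    p: impl Fn(&'a str) -> ParseResult<'a, T>,
) -> impl Fn(&'a str) -> ParseResult<'a, Vec<T>> {
    move |mut input: &'a str| {
        let mut out = Vec::new();
        while let Some((value, rest)) = p(input) {
            out.push(value);
            input = rest;
        }
        Some((out, input))
    }
}

// Built from the pieces above: one or more ASCII digits as a usize.
fn integer(input: &str) -> ParseResult<'_, usize> {
    let (digits, rest) = many(satisfy(|c| c.is_ascii_digit()))(input)?;
    if digits.is_empty() {
        return None;
    }
    let n = digits.iter().collect::<String>().parse::<usize>().ok()?;
    Some((n, rest))
}
```

For example, integer("42abc") yields Some((42, "abc")). Note that many only terminates because satisfy always consumes input on success; a parser that succeeds without consuming anything would make it loop forever, which is a pitfall real combinator libraries guard against.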
To better understand parsing problems and approaches, read through the following articles:
Given the following grammar of the Rust fmt syntax:
```
format_spec := [[fill]align][sign]['#']['0'][width]['.' precision][type]
fill := character
align := '<' | '^' | '>'
sign := '+' | '-'
width := count
precision := count | '*'
type := identifier | '?' | ''
count := parameter | integer
parameter := argument '$'
```
Implement a parser to parse sign, width and precision from a given input (assumed to be a format_spec).
Provide implementations in two flavours: regex-based and via building a custom parser.
Prove the correctness of your implementations with tests.
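One possible shape for the custom-parser flavour is sketched below with std only, walking the grammar left to right. Several choices here are assumptions rather than part of the task: argument is taken to be an integer or identifier (as in the std::fmt documentation, since the grammar above leaves it undefined), a '0' directly followed by '$' is read as a width parameter rather than the zero flag, the type part and malformed input are not validated, and the names Sign, Count, Precision, take_while, parse_count and parse are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum Sign {
    Plus,
    Minus,
}

#[derive(Debug, PartialEq)]
enum Count {
    Integer(usize),
    // parameter := argument '$' (assumed: argument is an integer or identifier)
    Parameter(String),
}

#[derive(Debug, PartialEq)]
enum Precision {
    Count(Count),
    Asterisk,
}

// Splits `s` after the longest prefix whose chars all satisfy `pred`.
fn take_while(s: &str, pred: impl Fn(char) -> bool) -> (&str, &str) {
    let end = s.find(|c: char| !pred(c)).unwrap_or(s.len());
    s.split_at(end)
}

// count := parameter | integer
fn parse_count(s: &str) -> Option<(Count, &str)> {
    // Lenient: any run of alphanumerics/underscores before '$' is an argument.
    let (arg, rest) = take_while(s, |c| c.is_ascii_alphanumeric() || c == '_');
    if let Some(rest) = rest.strip_prefix('$') {
        if !arg.is_empty() {
            return Some((Count::Parameter(arg.to_string()), rest));
        }
    }
    // Without '$', only a plain integer is a valid count.
    let (digits, rest) = take_while(s, |c| c.is_ascii_digit());
    let n = digits.parse::<usize>().ok()?;
    Some((Count::Integer(n), rest))
}

// Returns the sign, width and precision of a format_spec (type is ignored).
fn parse(spec: &str) -> (Option<Sign>, Option<Count>, Option<Precision>) {
    let is_align = |c: char| matches!(c, '<' | '^' | '>');
    // [[fill]align]: fill may be any char, so look at the second char first.
    let mut chars = spec.chars();
    let skip = match (chars.next(), chars.next()) {
        (Some(fill), Some(align)) if is_align(align) => fill.len_utf8() + align.len_utf8(),
        (Some(align), _) if is_align(align) => align.len_utf8(),
        _ => 0,
    };
    let mut s = &spec[skip..];

    // [sign]
    let sign = match s.chars().next() {
        Some('+') => Some(Sign::Plus),
        Some('-') => Some(Sign::Minus),
        _ => None,
    };
    if sign.is_some() {
        s = &s[1..];
    }

    // ['#']['0']: a '0' right before '$' belongs to a width parameter instead.
    s = s.strip_prefix('#').unwrap_or(s);
    if s.starts_with('0') && !s[1..].starts_with('$') {
        s = &s[1..];
    }

    // [width]
    let width = match parse_count(s) {
        Some((count, rest)) => {
            s = rest;
            Some(count)
        }
        None => None,
    };

    // ['.' precision], precision := count | '*'
    let precision = s.strip_prefix('.').and_then(|rest| {
        if rest.starts_with('*') {
            Some(Precision::Asterisk)
        } else {
            parse_count(rest).map(|(count, _)| Precision::Count(count))
        }
    });

    (sign, width, precision)
}
```

For instance, parse("*^-#010.*x") gives (Some(Sign::Minus), Some(Count::Integer(10)), Some(Precision::Asterisk)): the fill/align pair, '#' and the zero flag are skipped, and the trailing type x is left unparsed. Handling fill/align by peeking at the second char first is what lets a fill character such as '+' or '0' not be mistaken for a sign or flag.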