Step 3.4: Regular expressions and custom parsers

Estimated time: 1 day

Regular expressions

The Rust ecosystem provides the regex crate for working with regular expressions.

It is important to know that in Rust a regular expression must be compiled before it can be used, and that compilation is not cheap. The following code therefore introduces a performance problem:

use regex::Regex;

fn is_email(email: &str) -> bool {
    let re = Regex::new(".+@.+").unwrap();  // compiles every time the function is called
    re.is_match(email)
}

To avoid this unnecessary performance penalty, we should compile the regular expression once and reuse the compiled result. This is easily achieved with the lazy_static crate, in global scope, local scope, or both:

use lazy_static::lazy_static;
use regex::Regex;

lazy_static! {
    // compiles once on first use
    static ref REGEX_EMAIL: Regex = Regex::new(".+@.+").unwrap();
}

fn is_email(email: &str) -> bool {
    REGEX_EMAIL.is_match(email)
}
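The compile-once behaviour can also be observed with the standard library alone. The following is a minimal sketch, assuming Rust 1.80 or newer, where std::sync::LazyLock is available. To keep it free of external crates, a hypothetical Pattern type stands in for regex::Regex (its compile and is_match only roughly mimic Regex::new(".+@.+") and Regex::is_match), and the COMPILATIONS counter exists purely to make the number of compilations visible. The static is declared in local scope here.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

/// Counts how many times the simulated compilation has run (illustration only).
static COMPILATIONS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for `regex::Regex`, so the sketch needs no external crates.
struct Pattern {
    needle: char,
}

impl Pattern {
    /// Stand-in for `Regex::new`: pretend this is the expensive step.
    fn compile(needle: char) -> Self {
        COMPILATIONS.fetch_add(1, Ordering::SeqCst);
        Pattern { needle }
    }

    /// Rough stand-in for matching `.+@.+`: true if `needle` occurs
    /// with at least one character before it and one after it.
    fn is_match(&self, text: &str) -> bool {
        text.char_indices()
            .any(|(i, c)| c == self.needle && i > 0 && i + c.len_utf8() < text.len())
    }
}

fn is_email(email: &str) -> bool {
    // Local-scope static: initialised on the first call, reused afterwards.
    // With the regex crate this would be a `LazyLock<Regex>` built from
    // `Regex::new(".+@.+").unwrap()`.
    static EMAIL_PATTERN: LazyLock<Pattern> = LazyLock::new(|| Pattern::compile('@'));
    EMAIL_PATTERN.is_match(email)
}
```

However many times is_email is called, the counter should stay at 1 after the first call, which is the same guarantee lazy_static gives for the compiled Regex.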

This may feel different from how regular expressions are used in other programming languages, because some of them implicitly cache compilation results and/or do not expose a compilation API at all (like PHP). If your background is a language like Go or Java, however, this concept should be familiar to you.

Custom parsers

If regular expressions are not powerful enough for your parsing problem, you end up writing your own parser. The Rust ecosystem has numerous crates to help with that:

For a better understanding of the parsing problem and the available approaches, read through the following articles:

Task

Given the following Rust fmt syntax grammar:

format_spec := [[fill]align][sign]['#']['0'][width]['.' precision][type]
fill := character
align := '<' | '^' | '>'
sign := '+' | '-'
width := count
precision := count | '*'
type := identifier | '?' | ''
count := parameter | integer
parameter := argument '$'

Implement a parser to parse sign, width and precision from a given input (assumed to be a format_spec).

Provide implementations in two flavours: regex-based and via building a custom parser.

Prove the correctness of your implementation with tests.
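As a starting point for the custom-parser flavour, a hand-written parser usually handles one grammar production at a time: it consumes a prefix of the input and returns the parsed value together with the unconsumed rest. Below is a minimal sketch for the count production only, not a full solution. The Count enum and parse_count function are hypothetical names, and since the grammar above does not define argument or integer, the sketch assumes that an argument is a run of ASCII alphanumerics or underscores and that an integer is a run of ASCII digits.

```rust
/// Result of parsing the `count` production (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Count<'a> {
    /// A literal integer, e.g. `10`.
    Integer(usize),
    /// A reference to an argument, e.g. `width$` or `1$` (stored without the `$`).
    Parameter(&'a str),
}

/// Tries to parse `count := parameter | integer` at the start of `input`.
/// Returns the parsed value and the unconsumed rest, or `None` if no count starts here.
fn parse_count(input: &str) -> Option<(Count<'_>, &str)> {
    // Assumption: an argument is a run of ASCII alphanumerics or underscores.
    let end = input
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .unwrap_or(input.len());
    let (token, rest) = input.split_at(end);
    if token.is_empty() {
        return None;
    }
    // `parameter := argument '$'`: the token is directly followed by a dollar sign.
    if let Some(rest) = rest.strip_prefix('$') {
        return Some((Count::Parameter(token), rest));
    }
    // Otherwise only the leading run of digits counts as an `integer`;
    // anything after it (for example a `type` such as `x`) is left unconsumed.
    let digits_end = token
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(token.len());
    let value = token[..digits_end].parse().ok()?;
    Some((Count::Integer(value), &input[digits_end..]))
}
```

Returning the rest of the input lets parsers for the other productions (sign, '.' precision, and so on) be chained one after another, and each of them can be covered by its own small unit tests.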