TylerScript

Verbalex - Regex with the reader and writer in mind

July 20, 2019

TL;DR

I created an Elixir library for verbally expressing and composing regular expressions. This is only a short post demonstrating what it can do, but if you’re not in the mood for reading, you can find it here!


Why did I write this library?

For a few years now I’ve maintained a chatbot project which, to my surprise, managed to get a number of regular users. It was my first time writing Ruby, which is partly why the repo is private - it’s best for everybody’s wellbeing. When I check on it every so often because a bug comes up or I want to add some functionality, this happens:

[Image: Careful]

It’s a small project, so it doesn’t take all that long to regain my bearings despite its issues, but servicing the regular expressions it relies on has always been a headache.

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

Regular expressions are notoriously read once, write once. Especially the ones I hacked together and hotfixed a hundred times over early in my programming career.

This problem has been largely addressed with solutions like Simple Regex & Verbal Expressions. While exploring a re-write of Rosterbot, I went searching for the Elixir implementation, only to find it hadn’t been maintained since Elixir v0.10.1. I saw an opportunity, so I took it - thanks to Max Szengel for laying down the groundwork!

How to use Verbalex

Verbalex is essentially a port of Verbal Expressions, but I decided not to implement it function-for-function. It focuses on composing the regular expressions themselves, and leaves Regex to do the heavy lifting for utilising them.

Let’s see how it fares on a classic example.

A Regular Expression for Emails

Matching email addresses is a pretty common regex task, and one you’ll find a number of implementations for… let’s go ahead and add to the pile. We’ll interpret this example from regular-expressions.info:

~r/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/

This regular expression, I claim, matches any email address.

Good enough for me. Let’s do it.
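Before porting it, here is a quick sanity check of the original pattern using nothing but Elixir's built-in Regex module. One assumption on my part: the pattern only lists uppercase ranges, so this sketch adds the `i` (caseless) modifier to let it match lowercase addresses too. The sample strings are made up for illustration.

```elixir
# The pattern quoted above, plus the `i` modifier (an assumption added here
# so that the A-Z ranges also cover lowercase letters).
email = ~r/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/i

# A plausible address matches.
IO.inspect(Regex.match?(email, "jane.doe@example.com"))
# => true

# Text without an "@" does not.
IO.inspect(Regex.match?(email, "not an email"))
# => false

# Regex.run/2 returns the first match found inside a longer string.
IO.inspect(Regex.run(email, "Contact: jane.doe@example.com."))
# => ["jane.doe@example.com"]
```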

First up, we’ll put in the word boundaries bookending the expression:

alias Verbalex, as: Vlx

def email_regex do
  ""
  |> Vlx.word_boundary()
  # loading...
  |> Vlx.word_boundary()
end

Easy enough! Now, we could do the rest of our expression in this function, but to demonstrate Verbalex using composition let’s break it down into sections. An email consists of two main parts:

  • Local-part
  • Domain

We’ll define them both as private functions for email_regex/0:

defp local_part(before) do
  local =
    ""
    |> Vlx.anything_in(class: :alnum, string: "._%+-")
    |> Vlx.one_or_more()

  "#{before}#{local}"
end

defp domain(before) do
  domain =
    ""
    |> Vlx.anything_in(class: :alnum, string: ".-")
    |> Vlx.one_or_more()
    |> Vlx.then(".")
    |> Vlx.anything_in(class: :alpha)
    |> Vlx.occurs_at_least(2)

  "#{before}#{domain}"
end

To include these functions in our email address pipeline, each one accepts the regex string built so far as its argument and appends its own fragment to it. This is much the same way Verbalex is implemented under the hood. Also worth noting: as you can see in my calls to anything_in/2, I’ve included support for, and documented, all the named character classes that Elixir’s Regex module provides.
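To make the "string in, longer string out" idea concrete, here is a minimal sketch of how pipeline-friendly builders like these could be written. The RegexSketch module and its function bodies are hypothetical and written for this post; they borrow the names used above but are not Verbalex's actual source, and they skip niceties such as escaping the characters passed via :string.

```elixir
defmodule RegexSketch do
  # Hypothetical helpers, NOT Verbalex's real implementation. Each function
  # takes the regex string built so far and returns it with a fragment appended.

  # Append a word-boundary assertion. Note the double backslash: "\b" inside a
  # double-quoted Elixir string is a backspace character, not the regex \b.
  def word_boundary(before), do: before <> "\\b"

  # Append a literal, escaped and wrapped in a non-capturing group.
  def then(before, literal), do: before <> "(?:" <> Regex.escape(literal) <> ")"

  # Append a bracket expression built from an optional POSIX class name and an
  # optional string of extra characters (left unescaped in this sketch).
  def anything_in(before, opts) do
    class = if opts[:class], do: "[:#{opts[:class]}:]", else: ""
    before <> "[" <> class <> Keyword.get(opts, :string, "") <> "]"
  end

  # Append quantifiers to whatever came last.
  def one_or_more(before), do: before <> "+"
  def occurs_at_least(before, n), do: before <> "{#{n},}"
end

pattern =
  ""
  |> RegexSketch.word_boundary()
  |> RegexSketch.anything_in(class: :alnum, string: "._%+-")
  |> RegexSketch.one_or_more()
  |> RegexSketch.then("@")

IO.puts(pattern)
# prints: \b[[:alnum:]._%+-]+(?:@)
```

Because every helper takes the string-so-far as its first argument, your own functions (like local_part/1 and domain/1 above) slot into the same pipeline without any special support.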

Hopefully, even without a background in regular expressions, writing them this way is readable enough that you can follow what’s going on with relative ease.

With those in place, we can finish off our main function:

def email_regex do
  ""
  |> Vlx.word_boundary()
  |> local_part()
  |> Vlx.then("@")
  |> domain()
  |> Vlx.word_boundary()
  |> Regex.compile!()
end

email_regex()
# ~r/\b[[:alnum:]._%+-]+(?:@)[[:alnum:].-]+(?:\.)[[:alpha:]]{2,}\b/
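Since the result is an ordinary compiled Regex, everything in Elixir's Regex module works on it. The sketch below uses a regex literal as a stand-in for the return value of email_regex/0 so it runs without Verbalex installed; it assumes the intended pattern, with `\b` word boundaries at both ends. The sample addresses are invented.

```elixir
# Stand-in for email_regex/0, written as a literal (assumes \b word boundaries).
# POSIX classes such as [[:alnum:]] cover both cases, so no `i` modifier is needed.
email = ~r/\b[[:alnum:]._%+-]+(?:@)[[:alnum:].-]+(?:\.)[[:alpha:]]{2,}\b/

# Mixed-case addresses match.
IO.inspect(Regex.match?(email, "Jane.Doe@Example.com"))
# => true

# Regex.scan/2 pulls every address out of a longer string.
IO.inspect(Regex.scan(email, "cc: a@b.io, c.d@e-f.org"))
# => [["a@b.io"], ["c.d@e-f.org"]]

# A domain without a dot-separated suffix is rejected.
IO.inspect(Regex.match?(email, "jane.doe@localhost"))
# => false
```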

I find the implementation of email_regex/0 far easier to reason about than its output. If it’s not your cup of tea - that’s fine, too. When properly understood, standard regex can be read like any other syntax, with the added benefit of being incredibly terse. For me at least, writing it this way is a breath of fresh air.

To wrap up, it’s worth pointing out that Verbalex is the first library I’ve ever written. I welcome all constructive feedback, issues, and pull requests from anybody who might like to contribute - it’s an exciting time to be in the Elixir community with so many tools still to be built. Thanks for reading!


Tyler Barker

Personal blog by Tyler Barker. I'm a Software Engineer from Australia, currently writing Elixir at Amplified AI.