Just enough regex to get going

A beginner's guide to using regex

Regex can be pretty intimidating to learn, but I've discovered that a little knowledge can go a long way. I'm far from an expert, but I frequently get real value from using regex. I've used it both within my code and to manipulate code (or other text).

5 key pieces of syntax

  • The first regex syntax to know is that you can just type stuff. Unless it's a special character with a particular meaning in regex, it will match what you've typed. If you want to match the word apple, your regex should simply be apple.
  • Use a \ to escape special characters. If you want to match arr[0] your regex should be arr\[0\].
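  • Use .* to match any number of characters. The . matches any single character except a line break (spaces included), and the * means "zero or more of the previous thing". If you want to match all text like obj.property regardless of what property is, your regex could be obj\..*. Be aware that .* keeps going to the end of the line if nothing after it in the regex stops it.
    • If you only want to match English alphanumeric characters, you can use [A-Za-z0-9]* instead. This stops at the first character outside that set, such as a space or punctuation.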
  • Use parentheses to capture a group. I often use this in combination with the above syntax - i.e. (.*) or ([A-Za-z0-9]*).
  • Use $1, $2, etc. to refer to capture groups within a replacement. This means you can reuse the text matched by the .* or [A-Za-z0-9]* part (or whatever else you've put in parentheses) in the replacement.
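
To make the five points above concrete, here is a minimal sketch in JavaScript. The sample strings and variable names are made up for illustration, and other regex engines may differ slightly in syntax.

```javascript
// 1. Plain text matches itself.
const hasApple = /apple/.test("I ate an apple today");

// 2. A backslash escapes special characters such as [ and ].
const hasIndex = /arr\[0\]/.test("const first = arr[0];");

// 3. .* matches any run of characters except line breaks, spaces included,
//    so here it carries on to the end of the string.
const dotStar = "call obj.property now".match(/obj\..*/)[0];

// 3b. [A-Za-z0-9]* stops at the first character outside the set.
const alnum = "call obj.property now".match(/obj\.[A-Za-z0-9]*/)[0];

// 4. Parentheses capture part of the match; index 1 is the first group.
const captured = "obj.property".match(/obj\.([A-Za-z0-9]*)/)[1];

// 5. $1 in the replacement string inserts the first capture group.
const replaced = "obj.property".replace(/obj\.([A-Za-z0-9]*)/, "item.$1");

console.log({ hasApple, hasIndex, dotStar, alnum, captured, replaced });
```

Comparing dotStar ("obj.property now") with alnum ("obj.property") shows why the narrower character set is often the safer choice.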

An example using all the above

Let's say you're working with JavaScript code that has a mix of styles for accessing properties. Sometimes it's bracket notation (obj['property']) and in other places it's dot notation (obj.property).

const obj = { foo: 4, bar: 7 };
obj['foo'] = 5; // single quotes
obj["foo"]++; // double quotes
console.log(obj.foo); // dot notation
let baz = obj.bar;
baz += obj['bar'];
console.log(baz + obj.bar);

You want to introduce some consistency, but it would be a big job to go through your whole codebase changing the notation in every place. Let's say you decide you want to use dot notation everywhere.

To start with you would use regex to find all uses of bracket notation:

obj\[['"](.*)['"]\]

This can be broken down as follows:

Syntax    Explanation
obj       matches the letters "obj"
\[        matches a [ character
['"]      matches an apostrophe ' or a double quote "
(.*)      matches any characters (other than line breaks) and stores them in a capture group so we can do something with them later - in our scenario this will be the object's property
['"]      matches an apostrophe ' or a double quote "
\]        matches a ] character
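
As a rough sketch of how this search behaves, the snippet below runs the regex over the sample code in JavaScript. The variable names are made up for illustration, and the g flag is an addition so that every match is found rather than only the first.

```javascript
const source = [
  "const obj = { foo: 4, bar: 7 };",
  "obj['foo'] = 5; // single quotes",
  'obj["foo"]++; // double quotes',
  "console.log(obj.foo); // dot notation",
  "let baz = obj.bar;",
  "baz += obj['bar'];",
  "console.log(baz + obj.bar);",
].join("\n");

// Same regex as above, with the g flag to find all matches.
const bracketAccess = /obj\[['"](.*)['"]\]/g;

// Collect the full match and the captured property name for each hit.
const found = [...source.matchAll(bracketAccess)].map((m) => ({
  match: m[0],
  property: m[1],
}));
console.log(found); // three matches: foo, foo and bar

// Caveat: .* grabs as much as it can, so two bracket accesses on
// one line are swallowed into a single match.
const twoOnOneLine = "obj['a'] + obj['b']";
const greedyCapture = twoOnOneLine.match(/obj\[['"](.*)['"]\]/)[1];
console.log(greedyCapture); // a'] + obj['b

// Swapping in [A-Za-z0-9]* keeps the capture to a single name.
const narrowCapture = twoOnOneLine.match(/obj\[['"]([A-Za-z0-9]*)['"]\]/)[1];
console.log(narrowCapture); // a
```

The sample code never has two bracket accesses on one line, so (.*) works there. On a real codebase the [A-Za-z0-9]* version is likely to be safer, and it also skips property names such as 'foo-bar' that are not valid in dot notation.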

You could then replace each match with dot notation by using this replacement string:

obj.$1

This can be broken down as follows:

Syntax    Explanation
obj       the letters "obj"
.         the full stop/period character . does not need escaping here, because the replacement is plain text rather than a regex
$1        the content of the first capture group from the search regex - in our scenario this will be the name of the object's property
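
In an editor you would paste the regex and the replacement into the find and replace boxes. For comparison, here is a minimal sketch of the same replacement using JavaScript's String.prototype.replace. The variable names are illustrative, and the g flag is an addition so that all matches are replaced.

```javascript
const before = [
  "obj['foo'] = 5; // single quotes",
  'obj["foo"]++; // double quotes',
  "baz += obj['bar'];",
].join("\n");

// $1 in the replacement string is swapped for whatever (.*) captured.
const after = before.replace(/obj\[['"](.*)['"]\]/g, "obj.$1");

console.log(after);
// obj.foo = 5; // single quotes
// obj.foo++; // double quotes
// baz += obj.bar;
```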

Top tips for learning regex

  • Use regexr or similar websites to test your regex. (You can view the above example on regexr and play around with it.)
  • There's loads more to learn, but start simple so that you can get value from using regex in simple use cases. Learn more when you need it.
  • If copying a regex from online, try to understand it. This is a good way to learn new (to you) features/syntax.
  • Syntax is different for different engines. Don't worry about learning all the variations to start with, just find the one which is most useful for you.

Conclusion

Starting simple with these 5 pieces of regex syntax has given me confidence to use regex in my everyday work. Despite only scratching the surface of all that's possible, it's given me sufficient knowledge to understand and use regex well enough that I can get business value from it. It also serves as a foundation that I can build on with more advanced regex knowledge.

If this inspires you to try using regex more, leave a comment and let us know how it goes for you!