## A basic guide to regular expressions

Wikipedia defines a regular expression as “a sequence of characters that specifies a match pattern in text”.

Another way to think of it is simply as a way of describing text.

For example, a current UK number plate consists of 2 letters, followed by 2 digits, followed by 3 letters. As a regular expression, this can be written as:

[A-Z]{2}[0-9]{2}[A-Z]{3}

So this would match the string AB12CDE, but not AB1CDE.
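The plate pattern can be tried out in a few lines of Python using the standard `re` module. This is only a sketch: the helper name `contains_plate` is made up for illustration, and `re.search` is used because it looks for the pattern anywhere in the string, which is the behaviour described here.

```python
import re

# UK number plate: 2 letters, 2 digits, 3 letters (no anchors yet)
PLATE = re.compile(r"[A-Z]{2}[0-9]{2}[A-Z]{3}")

def contains_plate(text: str) -> bool:
    """Hypothetical helper: True if the plate pattern occurs anywhere in text."""
    return PLATE.search(text) is not None

print(contains_plate("AB12CDE"))  # True
print(contains_plate("AB1CDE"))   # False: only one digit after the letters
```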

Perhaps surprisingly, this would also match ABC12DEFG, because that string contains the pattern (BC12DEF) somewhere inside it. If you want to exclude such strings, put a ^ at the start of the expression and a $ at the end:

^[A-Z]{2}[0-9]{2}[A-Z]{3}$

^ is a token for the start of the string and $ is a token for the end of the string, so together they force the whole string to fit the pattern.
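Here is a small Python sketch comparing the unanchored and anchored versions against ABC12DEFG. It uses `re.search`, which scans the whole string, so only the `^`/`$` anchors stop the partial match.

```python
import re

UNANCHORED = re.compile(r"[A-Z]{2}[0-9]{2}[A-Z]{3}")
ANCHORED = re.compile(r"^[A-Z]{2}[0-9]{2}[A-Z]{3}$")

text = "ABC12DEFG"

# search() scans the string, so it finds the plate-shaped substring inside it
m = UNANCHORED.search(text)
print(m.group())  # BC12DEF

# With ^ and $ the pattern must cover the entire string, so there is no match
print(ANCHORED.search(text))  # None
```

One Python-specific detail: `$` also matches just before a single trailing newline, so `re.fullmatch()` is the stricter choice if you need the pattern to cover the string exactly.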

[] is used to specify a set or a range of characters, for example:

[a-zA-Z] means any lower- or upper-case letter from A to Z

[0-5] means any digit in the set 0, 1, 2, 3, 4, 5 and is the same as [012345]

[a6gW] means any character in the set a, 6, g or W
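A quick way to check what a character class accepts is to test it against single characters with `re.fullmatch`, which only succeeds if the pattern covers the whole string. The helper `matches_one` is a hypothetical name used here for illustration.

```python
import re

def matches_one(char_class: str, ch: str) -> bool:
    """Hypothetical helper: True if the single character ch is in char_class."""
    return re.fullmatch(char_class, ch) is not None

# [a-zA-Z]: any letter A to Z, in either case
print(matches_one(r"[a-zA-Z]", "q"), matches_one(r"[a-zA-Z]", "Q"), matches_one(r"[a-zA-Z]", "7"))
# True True False

# [0-5] and [012345] describe exactly the same set of digits
print(all(matches_one(r"[0-5]", d) == matches_one(r"[012345]", d) for d in "0123456789"))
# True

# [a6gW]: only these four characters (matching is case-sensitive)
print([c for c in "a6gWxA" if matches_one(r"[a6gW]", c)])  # ['a', '6', 'g', 'W']
```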

{} is used to specify how many times the previous token should occur: {N} means exactly N times, and {N,M} means between N and M times inclusive (write it without a space after the comma). You can also use the shorthand + for 1 or more times, or * for 0 or more times (a subtle but important difference!).

For example, [A-Z]+[0-9]+[A-Z]+ would match ABC123DEF but not ABC123, because it requires at least one letter after the digits. On the other hand, [A-Z]+[0-9]+[A-Z]* would match both.
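The quantifiers, and the difference between + and *, can be checked directly in Python. The sketch below uses `re.fullmatch` for the {N,M} case, so the whole string must fit, and `re.search` for the plus/star comparison, matching the unanchored patterns above.

```python
import re

# {N,M}: between N and M occurrences, inclusive
print(bool(re.fullmatch(r"[0-9]{2,3}", "12")))    # True
print(bool(re.fullmatch(r"[0-9]{2,3}", "1234")))  # False: four digits is too many

PLUS = re.compile(r"[A-Z]+[0-9]+[A-Z]+")  # trailing letters required (1 or more)
STAR = re.compile(r"[A-Z]+[0-9]+[A-Z]*")  # trailing letters optional (0 or more)

for s in ["ABC123DEF", "ABC123"]:
    print(s, bool(PLUS.search(s)), bool(STAR.search(s)))
# ABC123DEF True True
# ABC123 False True
```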

There are a few shortcuts, such as \d to represent [0-9], but which shortcuts exist and exactly what they match can vary depending on the regex flavour you’re working in, for example Perl or Java.
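As a concrete illustration of how shortcuts differ between flavours, here is Python's behaviour. In Python 3, `\d` in a `str` pattern matches any Unicode decimal digit, not just 0 to 9, unless the `re.ASCII` flag is given. Java's `\d`, by contrast, matches only [0-9] by default. The test character below (Arabic-Indic digit three, U+0663) is just one example of a non-ASCII digit.

```python
import re

ARABIC_INDIC_THREE = "\u0663"  # a decimal digit outside the range 0-9

# In Python 3, \d in a str pattern matches any Unicode decimal digit...
print(bool(re.fullmatch(r"\d", ARABIC_INDIC_THREE)))            # True
# ...whereas [0-9] matches only the ASCII digits
print(bool(re.fullmatch(r"[0-9]", ARABIC_INDIC_THREE)))         # False
# The re.ASCII flag restricts \d to [0-9]
print(bool(re.fullmatch(r"\d", ARABIC_INDIC_THREE, re.ASCII)))  # False
```

If you specifically mean the digits 0 to 9, writing [0-9] avoids depending on the flavour.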

BardecodeFiler uses regular expressions in the Reformat Table, which is a way of changing the information in a barcode into a file name. One extra concept you need to understand to use it is grouping with parentheses ().

In our original number plate example, you could divide the pattern into 3 groups:

^([A-Z]{2})([0-9]{2})([A-Z]{3})$

So for the string AB12CDE, AB would be the first group, 12 the second and CDE the third.
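In Python, the captured groups are available on the match object, numbered from 1 in the order their opening parentheses appear.

```python
import re

PLATE_GROUPS = re.compile(r"^([A-Z]{2})([0-9]{2})([A-Z]{3})$")

m = PLATE_GROUPS.match("AB12CDE")
print(m.group(1), m.group(2), m.group(3))  # AB 12 CDE
print(m.groups())                          # ('AB', '12', 'CDE')
```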

You now have the power to pull a string apart and put it back together in a different order.

Back to BardecodeFiler and the Reformat Table. It has a number of rows, and each row has a left side and a right side. The left side is a regular expression and the right side is the new format, where {1} is a token for group 1, {2} for group 2, and so on.

For example, say we want to rewrite AB12CDE as CDE12AB:

Left side: ^([A-Z]{2})([0-9]{2})([A-Z]{3})$

Right side: {3}{2}{1}
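To see the left/right idea in action outside BardecodeFiler, here is a rough Python emulation of a single row. This is not BardecodeFiler's implementation: `apply_row` is a hypothetical name, and how the real product chooses between several rows or treats a barcode that matches no row is not covered here. It simply applies the left-side regex and swaps each {n} token on the right side for the text captured by group n.

```python
import re
from typing import Optional

def apply_row(left: str, right: str, barcode: str) -> Optional[str]:
    """Hypothetical emulation of one Reformat Table row (not BardecodeFiler's code).

    Returns the rewritten text, or None if the left-side regex does not match.
    """
    m = re.search(left, barcode)
    if m is None:
        return None
    # Replace each {n} token with captured group n (empty if that group did not take part)
    return re.sub(r"\{(\d+)\}", lambda tok: m.group(int(tok.group(1))) or "", right)

LEFT = r"^([A-Z]{2})([0-9]{2})([A-Z]{3})$"
print(apply_row(LEFT, "{3}{2}{1}", "AB12CDE"))  # CDE12AB
print(apply_row(LEFT, "{3}{2}{1}", "AB1CDE"))   # None
```

For comparison, Python's own `re.sub` expresses the same right side with backreferences: `re.sub(LEFT, r"\3\2\1", "AB12CDE")` also gives CDE12AB.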