July 20, 2011

More on regular expressions (another reason why PowerShell beats VBScript)

For the last few weeks I’ve been settling into a new job at a small software company, and that has meant getting used to some new tools. One of these is an issue-tracking system named JIRA. This post isn’t about JIRA, except to say that it generates a lot of e-mails. What this post is about is parsing standardized blocks of text, and JIRA’s subject lines provide a good example. If we create an ‘issue’ to look at how some data gets processed, we might end up with a pile of mails with subject lines like this:

[JIRA] Created: (Foo-164) Test Run – Data type # 2
[JIRA] Commented: (Foo-164) Test Run – Data type # 2
[JIRA] Updated: (Foo-164) Test Run – Data type # 2
[JIRA] Assigned: (Foo-164) Test Run – Data type # 2
[JIRA] Resolved: (Foo-164) Test Run – Data type # 2
[JIRA] Closed: (Foo-164) Test Run – Data type # 2

JIRA accounts for more than half of my messages, so I set up an Outlook rule to move them to their own folder. The format of the subject line is standardized:

“[JIRA]” event-type “(” project-ID “-” counter “)” issue-description
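That format maps naturally onto a regular expression with one named group per field, and once the fields are separated, sorting by project and issue number is easy. The sketch below uses Python’s re module purely for illustration (the rest of this post uses PowerShell). It assumes the event type is a single word followed by a colon, as in the sample subjects, and the “Bar-7” subject is made up.

```python
import re

# One named group per field: "[JIRA]" event-type "(" project-ID "-" counter ")" description
# Assumption: the event type is a single word followed by ":", as in the samples above
SUBJECT_FORMAT = re.compile(
    r"^\[JIRA\]\s*(?P<event>\w+):\s*"
    r"\((?P<project>\w+)-(?P<counter>\d+)\)\s*"
    r"(?P<description>.*)$"
)


def sort_key(subject: str):
    """Sort by project, then by issue number as an integer (assumes the subject matches)."""
    m = SUBJECT_FORMAT.match(subject)
    return (m["project"], int(m["counter"]))


subjects = [
    "[JIRA] Closed: (Foo-164) Test Run – Data type # 2",
    "[JIRA] Created: (Bar-7) A made-up second issue",
    "[JIRA] Created: (Foo-164) Test Run – Data type # 2",
]

print(SUBJECT_FORMAT.match(subjects[0]).groupdict())
for s in sorted(subjects, key=sort_key):
    print(s)
```

Because Python’s sort is stable, the two Foo-164 messages stay in their original relative order after sorting.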

So it is pretty easy to have a rule which looks for “[JIRA]” and moves messages to a new folder, but the format doesn’t help me sort and group messages: Outlook and Exchange can’t tell that the 6 messages above should be a single conversation. Sorting by date muddles all the issues together and doesn’t group by project (“Foo” in my example). Sorting by subject puts all the ‘Created’ messages together, all the ‘Closed’ ones together and so on, which is no better. So I came up with the idea of rewriting the subject line, which naturally I did from PowerShell to begin with. First I had to get the JIRA folder in my inbox. The method for this hasn’t changed since the first version of Outlook, even though the languages come and go.

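$ns = (New-Object -ComObject Outlook.Application).GetNamespace("MAPI")
# ExchangeStoreType 0 is olPrimaryExchangeMailbox, i.e. my own mailbox
$ExchStore = $ns.Stores | Where-Object {$_.ExchangeStoreType -eq 0}
# A Store exposes its folder tree through GetRootFolder()
$jiraFolder = $ExchStore.GetRootFolder().Folders.Item('Inbox').Folders.Item('JIRA')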

The next bit does the work. I wrote it as ONE line of PowerShell, but I’ve spaced it out here for easy reading:

foreach ($item in $jiraFolder.Items) {
    $item.Subject = $item.Subject -replace "^\[JIRA\](.*?)\((\w*-\d*)\)",'$2 $1'
    $item.Save()
}

The replace operation needs what a friend of mine calls a “Paddington hard stare” to understand it. 

  • “Look for the start of the line followed by [JIRA]” is coded as ^\[JIRA\]
    The [] characters have special meaning in a regex, so they need to be escaped with a \ character.
  • “Any sequence of characters followed by (” is coded as .*\(
    Like their square cousins, the () characters also have special meaning (which I’ll come to), so they, too, need to be escaped with a \ character.
  • Regular expressions are naturally “greedy”, so if my subject line had been
    “[JIRA] Closed: (Foo-164) Test Run ( Data type # 2)”
    the term .*\( would match all the way up to the second ( character.
    Putting a ? symbol after the * tells it to match the shortest sequence of characters it can, so using .*?\( reduces the match to “ Closed: (”.
  • Wrapping part of the expression in () saves it for later, so (.*?)\( will capture “ Closed: ” in this example.
  • \w*-\d*\) means any number of “word” characters, a “-” character, any number of “digit” characters and a “)”, which in my example will match on “Foo-164)”.
  • I can capture “Foo-164” by inserting (), giving (\w*-\d*)\)
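To see the greedy and lazy behaviour side by side, here is a small sketch using Python’s re module, chosen only because it is easy to run and check; for the constructs used here (^, escaped brackets, *, *?, \w, \d and capture groups) it behaves the same way as the .NET engine behind PowerShell.

```python
import re

# The subject line with a second "(" later on, as in the greedy example above
subject = "[JIRA] Closed: (Foo-164) Test Run ( Data type # 2)"

# Greedy: .* runs as far as it can, then backs off to the LAST "("
greedy = re.match(r"^\[JIRA\](.*)\(", subject)
print(repr(greedy.group(1)))   # ' Closed: (Foo-164) Test Run '

# Lazy: .*? stops at the FIRST "(" it can
lazy = re.match(r"^\[JIRA\](.*?)\(", subject)
print(repr(lazy.group(1)))     # ' Closed: '

# The full pattern: group 1 is the event text, group 2 the issue key
full = re.match(r"^\[JIRA\](.*?)\((\w*-\d*)\)", subject)
print(full.groups())           # (' Closed: ', 'Foo-164')
```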

Putting the pieces together I get ^\[JIRA\](.*?)\((\w*-\d*)\), which matches “[JIRA] Closed: (Foo-164)” and makes “ Closed: ” and “Foo-164” available in the replacement text as $1 and $2. (Note that PowerShell will process these as variables if the replacement text is wrapped in double quotes, so single quotes are needed to ensure they reach the regular expression engine.) So the -replace operation replaces “[JIRA] Closed: (Foo-164)” with “Foo-164  Closed: ”, which is much better for sorting.
In practice I developed this a bit further by putting information into the message’s UserProperties, but my initial prototype shows the idea.
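The same rewrite can be sketched with Python’s re.sub, again purely as an illustration. Python writes the back-references as \2 and \1 inside a raw string, which plays a similar role to PowerShell’s single quotes: it stops the language itself from interpreting the characters before the regex engine sees them. The rewrite_subject name is made up for this sketch.

```python
import re

PATTERN = r"^\[JIRA\](.*?)\((\w*-\d*)\)"


def rewrite_subject(subject: str) -> str:
    # \2 \1 corresponds to PowerShell's '$2 $1'; the r"" prefix stops Python
    # from treating \1 and \2 as ordinary string escapes
    return re.sub(PATTERN, r"\2 \1", subject)


print(repr(rewrite_subject("[JIRA] Closed: (Foo-164) Test Run – Data type # 2")))
# 'Foo-164  Closed:  Test Run – Data type # 2'

# Like -replace, re.sub hands back the string unchanged when nothing matches
print(rewrite_subject("Lunch on Friday?"))
```

The doubled spaces come from the spaces captured on either side of “Closed:” in group 1.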

I ran this against the messages I had already received and it worked very nicely. But it would only work as an ongoing solution if I were happy to run my PowerShell script for every new mail. I’m not. So I needed something that Outlook can run automatically: macros (procedures written in Outlook’s VBA environment) can be invoked by telling the rule to “run a script”. I found that the move-to-folder rule action needs to be moved into the script (if the rules engine moves the message, the script doesn’t work). The line of PowerShell that was in the body of the foreach loop in my example turns into this:

strSubject = objItem.Subject
If Left(strSubject, 6) = "[JIRA]" Then
    ' Positions of the first "(" and the ")" that follows it
    openParen = InStr(strSubject, "(")
    closeParen = InStr(openParen, strSubject, ")")
    ' Cut out the description, the event type and the issue key
    strJIRASubject = Mid(strSubject, (closeParen + 2))
    strJIRAAction = Mid(strSubject, 8, (openParen - 9))
    strJIRAID = Mid(strSubject, (openParen + 1), (closeParen - openParen - 1))
    objItem.Subject = strJIRAID & " " & strJIRAAction & " " & strJIRASubject
    objItem.Save
End If

You can see that here I have to check that the line begins with [JIRA]. PowerShell’s -replace makes no change if the match isn’t found, so it doesn’t need the check, but the VB script might scramble a non-JIRA subject line.
Then the script finds the positions of the parentheses.
Then it has to isolate three pieces of text: the part after the closing parenthesis, the part before the opening one but after “[JIRA] ”, and the part between the two.
Then it assembles these parts to make the new subject line and saves the message.
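To make the position arithmetic easier to check, here is a Python sketch that mirrors the macro step by step. The helpers instr and mid are hypothetical stand-ins written for this illustration (not part of any library) that imitate VBA’s 1-based InStr and Mid; the negative-length check reflects my understanding that VBA’s Mid rejects a negative length with a run-time error.

```python
from typing import Optional


def instr(start: int, text: str, find: str) -> int:
    """Hypothetical stand-in for VBA InStr(start, text, find): 1-based, 0 if not found."""
    return text.find(find, start - 1) + 1


def mid(text: str, start: int, length: Optional[int] = None) -> str:
    """Hypothetical stand-in for VBA Mid(text, start[, length]), using 1-based positions."""
    if length is not None and length < 0:
        # Assumed to mirror VBA, where a negative length is a run-time error
        raise ValueError("Invalid procedure call or argument")
    begin = start - 1
    return text[begin:] if length is None else text[begin:begin + length]


def rewrite_by_position(subject: str) -> str:
    """Python port of the macro's Left/InStr/Mid logic."""
    if subject[:6] != "[JIRA]":                    # Left(strSubject, 6) = "[JIRA]"
        return subject
    open_paren = instr(1, subject, "(")
    close_paren = instr(open_paren, subject, ")")
    jira_subject = mid(subject, close_paren + 2)   # skip ")" and the space after it
    jira_action = mid(subject, 8, open_paren - 9)  # "[JIRA]" plus a space is 7 characters
    jira_id = mid(subject, open_paren + 1, close_paren - open_paren - 1)
    return jira_id + " " + jira_action + " " + jira_subject


print(rewrite_by_position("[JIRA] Closed: (Foo-164) Test Run – Data type # 2"))
# Foo-164 Closed: Test Run – Data type # 2
```

The action comes back as “Closed:” with no trailing space, because openParen - 9 stops one character short of the space before the “(”. Fed the changed format “[JIRA] (Foo-164): Closed”, open_paren is 8, so the length passed to mid is -1 and the sketch raises ValueError instead of producing a subject.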

I know people find regular expressions tough to follow, but I’d say it is pretty hard to tell what is going on in the script. For example, take the line
strJIRAAction = Mid(strSubject, 8, (openParen - 9))
Why start at character 8 and read openParen - 9 characters?

[Answer: “[JIRA]” is 6 characters and character 7 is a space, so the word “Closed:” begins at character 8. From there up to, but not including, the “(” is openParen - 8 characters, which would include the space after “Closed:”; we want to stop one before that, so we read openParen - 9 characters and store “Closed:” in strJIRAAction.]
It takes time to work this out. If you know a little of the language of regex, it takes less time to see that ^\[JIRA\](.*?)\( will put “ Closed: ” into $1.

But there’s more. I alluded to the need to check for [JIRA] at the start of the subject line in VB script. The -replace operator in PowerShell “fails safe”. Suppose a future version of JIRA changes from “[JIRA] Closed: (Foo-164)” to “[JIRA] (Foo-164): Closed”: the regular expression still matches, but $1 is just a space, so the result is a harmless “Foo-164  : Closed”; and if the format drifts far enough that the pattern stops matching, nothing is changed at all. The VB script continues to run and … well, you can try to work out what my macro will do. The iterative development we tend to do in PowerShell lets you enter
"[JIRA] Closed: (Foo-164) Test (type 2)" -match "^\[JIRA\].*\((.*)\)" ; $matches
at the command line, see the problem (the greedy .* skips past (Foo-164), so $matches[1] is “type 2”) and fix it. That is something you just can’t do so well with VBScript.
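The same experiments can be replayed in Python, purely as an illustration of three cases: the greedy probe from the command line above, the changed subject format, and a subject in which the parentheses have disappeared altogether (that last subject is an invented example).

```python
import re

JIRA_PATTERN = r"^\[JIRA\](.*?)\((\w*-\d*)\)"

# 1. The command-line probe: the greedy .* runs on to the LAST "(",
#    so group 1 ends up holding the trailing "type 2", not the issue key
probe = re.search(r"^\[JIRA\].*\((.*)\)", "[JIRA] Closed: (Foo-164) Test (type 2)")
print(probe.group(1))          # type 2

# 2. The changed format still matches; group 1 is just the space before "("
changed = re.sub(JIRA_PATTERN, r"\2 \1", "[JIRA] (Foo-164): Closed")
print(repr(changed))           # 'Foo-164  : Closed'

# 3. A made-up subject with no parentheses: no match, so no change
original = "[JIRA] Closed: Foo-164 Test Run"
untouched = re.sub(JIRA_PATTERN, r"\2 \1", original)
print(untouched == original)   # True
```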


  1. I completely agree that Powershell is superior to VBscript and is normally easier to use. However, you can still use regular expressions in VBscript. I’ve just knocked up this quick example script to demonstrate:

    strSubject = "[JIRA] Closed: (Foo-164) Test Run – Data type # 2"
    WScript.Echo "Original: " & strSubject

    Set objRE = New RegExp
    With objRE
        .Pattern = "^\[JIRA\](.*?)\((\w*-\d*)\)(.*)"
        .IgnoreCase = False
    End With

    If objRE.Test(strSubject) Then
        Set rxMatches = objRE.Execute(strSubject)
        strSubject = rxMatches(0).SubMatches(1) & rxMatches(0).SubMatches(0) & rxMatches(0).SubMatches(2)
    End If
    WScript.Echo "New : " & strSubject

    Original: [JIRA] Closed: (Foo-164) Test Run – Data type # 2
    New : Foo-164 Closed: Test Run – Data type # 2

    It’s definitely not as pretty as the PowerShell version and it requires a slightly modified RegExp, but it does work 🙂

    Comment by Russ Pitcher — July 21, 2011 @ 9:35 am

