James O'Neill's Blog

July 27, 2011

A “Tail” command in PowerShell

Filed under: Powershell — jamesone111 @ 3:46 pm

I mentioned that I have been working for a small software company, and in this new role I’m having to work with Linux servers and MySQL. MySQL has proved rather better than I expected and Linux itself worse, at least from the perspective of being a place for me to get work done (I’ll leave the reasons why for another time). Our app generates quite a lot of log files, and so do some of the services it uses, so most of the time I’ll have at least one window open running the Unix tail -f command (tail outputs the last few lines from a file, and -f tells it to follow the file: that is, to keep watching it and output anything added to it). There should be an equivalent for Windows, but a quick search didn’t turn one up, so I put one together with PowerShell, pulling in some interesting techniques from more than one place.

Watching a file

The first thing that’s needed to give the follow functionality is the ability to know when the log file has changed. I found that James Brundage’s PowerShell Pack gave me ALMOST what I needed. Here’s my modified version:

Function Register-FileSystemWatcher {
    param(  [Parameter(ValueFromPipelineByPropertyName=$true,Position=0,Mandatory=$true)]
            [Alias('FullName')] [string]$Path,
                                [string]$Filter = "*",
                                [switch]$Recurse,
            [Alias('Do')][ScriptBlock[]]$Process,
                                        $MessageData,
                              [string[]]$On = @("Created", "Deleted", "Changed", "Renamed"))
    begin   { # Names of the events a FileSystemWatcher can raise
              $ValidEvents = [IO.FileSystemWatcher].GetEvents() |
                                 Select-Object -ExpandProperty Name
            }
    process {
        $realItem = Get-Item -Path $Path -ErrorAction SilentlyContinue
        if (-not $realItem) { return }
        # A file is watched by watching its folder, with the file name as the filter
        if ($realItem -is [System.IO.FileInfo]) {
              $Path = Split-Path $realItem.FullName
            $Filter = Split-Path $realItem.FullName -Leaf }
        else { $Path = $realItem.FullName }
        $watcher = New-Object IO.FileSystemWatcher -Property @{
                                     Path = $Path
                                   Filter = $Filter
                    IncludeSubdirectories = $Recurse }
        foreach ($o in $On) {
            if ($ValidEvents -ccontains $o) {   # Note CASE SENSITIVE CONTAINS...
                foreach ($p in $Process) {
                    # Hook each script block up to the event, passing the message data along
                    if ($p) { Register-ObjectEvent $watcher $o -Action $p -MessageData $MessageData }
                }
            }
            else { Write-Warning ("$o is an invalid event name and was ignored." +
                       [Environment]::NewLine + "Names are case sensitive and valid ones are" +
                       [Environment]::NewLine + ($ValidEvents -join ", ") + ".")
            }
        }
    }
}

The function takes a path (usually a folder, but possibly a file, and it can be passed via the pipeline), a filter, and a -Recurse switch, which together determine what to watch. If the path isn’t valid the function drops out, and if the path is a file it is split into name and folder parts: the name becomes the filter and the folder becomes the path.
The path, filter and recurse setting are used to create a FileSystemWatcher object. A FileSystemWatcher raises events, and Register-ObjectEvent hooks PowerShell up to these events: the cmdlet says “when this event happens on that object, run this code block”. Usefully, as I learnt from a post on Ravikanth Chaganti’s blog, you can pass something to the registration for use later on, which I’ll come back to when we see the function in use. I do a quick check that each event name passed is valid before trying to hook up the code to it, and generate a warning if it is not.
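
The case-sensitive check on event names can be tried on its own. This is only a minimal sketch; the variable names ($validEvents, $okExact, $okLower) are mine and are not part of the function above.

```powershell
# Ask .NET which events a FileSystemWatcher exposes
$validEvents = [IO.FileSystemWatcher].GetEvents() |
                   Select-Object -ExpandProperty Name

# -ccontains is the case-sensitive form of -contains
$okExact = $validEvents -ccontains "Changed"   # spelled exactly as .NET spells it
$okLower = $validEvents -ccontains "changed"   # wrong case, so it is rejected

"Changed is valid: $okExact"
"changed is valid: $okLower"
```

Register-ObjectEvent needs the event name spelled the way .NET spells it, which is presumably why the function warns instead of passing a mis-cased name straight through.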

The tail command I created is named “Get-Tail” for two reasons:
(a) What gets returned is the “tail of the file”, so Get- is the right verb to use out of the standard ones.
(b) If PowerShell can’t find a command called Name it tries Get-Name, so Get-Tail can be invoked simply as tail.
It looks like this:

function Get-Tail {
    param ( $Path,
            [int]$Last = 20,
            [int]$CharsPerLine = 500,
            [switch]$Follow
    )
    $item = (Get-Item $Path)
    if (-not $item) { return }
    # Open read-only, but let other processes carry on writing to the file
    $stream = $item.Open([System.IO.FileMode]::Open,
                         [System.IO.FileAccess]::Read,
                         [System.IO.FileShare]::ReadWrite)
    $reader = New-Object System.IO.StreamReader($stream)
    # For big files, skip to roughly the last ($Last * $CharsPerLine) bytes.
    # Seek returns the new position, which is discarded so it is not output.
    if ($CharsPerLine * $Last -lt $item.Length) {
        $null = $reader.BaseStream.Seek((-1 * $Last * $CharsPerLine), [System.IO.SeekOrigin]::End)
    }
    $reader.ReadToEnd() -split "`n" -replace "\s+$","" | Select-Object -Last $Last | Write-Host
    if ($Follow) {
        # The reader travels with the event registration as its message data
        $Global:watcher = Register-FileSystemWatcher -Path $Path -MessageData $reader `
            -On "Changed" -Process {
                $event.MessageData.ReadToEnd() -split "`n" -replace "\s+$","" | Write-Host }
        $oldConsoleSetting = [console]::TreatControlCAsInput
        [console]::TreatControlCAsInput = $true
        while ($true) {
            if ([console]::KeyAvailable) {
                $key = [System.Console]::ReadKey($true)
                if (($key.Modifiers -band [ConsoleModifiers]"Control") -and
                    ($key.Key -eq "C")) {
                    Write-Host -ForegroundColor Red "Terminating..." ; break }
                else { if ([int]$key.KeyChar -eq 13) { [console]::WriteLine() }
                       else { [console]::Write($key.KeyChar) } } }
            else { Start-Sleep -Milliseconds 250 } }
        [console]::TreatControlCAsInput = $oldConsoleSetting
        Unregister-Event $watcher.Name
    }
    $reader.Close()   # closing the reader also closes the underlying stream
    $stream.Close()
}

Opening a sharable file in PowerShell

The function takes four parameters: -Path, -Last, a -Follow switch and -CharsPerLine, which was a bit of an afterthought. It starts by getting the file item: the item’s .Open() method is used to open the file as a read-only FileStream which allows writes by others, and a StreamReader object is created to read from this stream.
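
To see what the sharing mode buys, here is a small self-contained sketch (not part of Get-Tail). It uses a temporary file and a second handle in the same session to stand in for the application writing the log; all the variable names are only for illustration.

```powershell
# Create a small temporary "log" file to play with
$path = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllText($path, "first line`n")

# Reader: read-only access, but other handles may still read and write
$stream = [System.IO.File]::Open($path, [System.IO.FileMode]::Open,
              [System.IO.FileAccess]::Read, [System.IO.FileShare]::ReadWrite)
$reader = New-Object System.IO.StreamReader($stream)
$firstRead = $reader.ReadToEnd()

# Writer: a second handle appends while the reader is still open
$out = [System.IO.File]::Open($path, [System.IO.FileMode]::Append,
           [System.IO.FileAccess]::Write, [System.IO.FileShare]::ReadWrite)
$writer = New-Object System.IO.StreamWriter($out)
$writer.Write("second line`n")
$writer.Close()

# The same reader now returns only what was added since the last read
$secondRead = $reader.ReadToEnd()

$reader.Close()
Remove-Item $path
```

The second ReadToEnd() picks up where the first stopped, which is the behaviour the follow option relies on.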

By using the reader’s .ReadToEnd() method I could be ready to read anything which is added to the end of the file, and output the result, splitting it on new lines and removing end-of-line spaces. All of which is not much more than I could have done with Get-Content.
I added a refinement after realizing that I occasionally deal with log files which are over a gigabyte in length. Not only will they take ages to read, but using .ReadToEnd() will try to read the whole file into memory, which is just horrible. So I added a -CharsPerLine parameter: I multiply this by the number of lines I want to read, and if the file is bigger than that I seek to that many bytes from the end of the file before calling .ReadToEnd(). The default is a generous 500, so if I request 2000 lines I’ll read 1MB of data, which isn’t too terrible even if the average line length is only a few characters. If I’m reading 20,000 lines I might set the parameter lower, or if I know the lines are very long I might set it higher. Then everything is set up for the optional follow part.
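
The seek arithmetic can be checked on a throwaway file. This is a sketch under my own assumptions: the file contents, the deliberately small $charsPerLine and the variable names are made up for the demonstration.

```powershell
# Build a temporary file of 1000 short lines: "line 1" ... "line 1000"
$path = [System.IO.Path]::GetTempFileName()
$text = (1..1000 | ForEach-Object { "line $_" }) -join "`n"
[System.IO.File]::WriteAllText($path, $text)

$last         = 5
$charsPerLine = 20
$budget       = $last * $charsPerLine    # read at most the final 100 bytes

$stream = [System.IO.File]::Open($path, [System.IO.FileMode]::Open,
              [System.IO.FileAccess]::Read, [System.IO.FileShare]::ReadWrite)
$reader = New-Object System.IO.StreamReader($stream)

# Only skip ahead when the file is bigger than the budget
if ($budget -lt $stream.Length) {
    $null = $stream.Seek(-$budget, [System.IO.SeekOrigin]::End)
}

# The first line read may be a partial one, but the last few are whole
$tail = $reader.ReadToEnd() -split "`n" | Select-Object -Last $last

$reader.Close()
Remove-Item $path
$tail
```

If the lines turn out to be longer than the -CharsPerLine guess, fewer than the requested number of whole lines come back, which is the trade-off for not reading the entire file.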

Who reads for the Watchers?

I want to call the .ReadToEnd() method when the file changes and output everything up to the end of the file. The question is how to get access to the StreamReader inside the script block which runs when the FileSystemWatcher fires its Changed event. This is where Ravikanth Chaganti’s tip comes in: by making the reader the MessageData for the event, it can be referenced in the script block as $event.MessageData. By the way, because I don’t know what else might end up happening, I force the output to the console throughout, though my normal custom is to avoid using Write-Host:
{$event.MessageData.readtoend() -split "\s*`n" | write-host }
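
The MessageData mechanism is not specific to file watchers. As a rough sketch, assuming a System.Timers.Timer as a stand-in event source (it is easier to trigger on demand than a file change) and an ArrayList in place of the StreamReader:

```powershell
# The object we want to reach from inside the action block
$seen = New-Object System.Collections.ArrayList

# A one-shot timer stands in for the FileSystemWatcher
$timer = New-Object System.Timers.Timer -Property @{ Interval = 100; AutoReset = $false }

# -MessageData attaches $seen to the registration; the action sees it as $event.MessageData
$job = Register-ObjectEvent -InputObject $timer -EventName Elapsed -MessageData $seen -Action {
    $null = $event.MessageData.Add("fired")
}

$timer.Start()
# Give PowerShell a chance to run the action (up to about 5 seconds)
for ($i = 0; $i -lt 50 -and $seen.Count -eq 0; $i++) { Start-Sleep -Milliseconds 100 }

Unregister-Event -SourceIdentifier $job.Name
$timer.Dispose()
"Entries recorded by the action: $($seen.Count)"
```

The action block does not share the caller's local variables, so passing the object through -MessageData is what makes it reachable.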

Taking Control of Control+C

Finally, to mimic the behaviour of tail on Unix, I trap keyboard input and pass it through to the console until the user presses [Ctrl][C]. This works much better in the “Shell” form of PowerShell than in the ISE. When [Ctrl][C] is pressed the function cleans up and exits.

Job done. 

Update: I’ve put the script here for download

July 20, 2011

More on regular expressions–(another reason why PowerShell beats VBscript)

Filed under: Powershell — jamesone111 @ 8:26 pm

For the last few weeks I’ve been settling into a new job in a small company which writes software, and that has meant getting used to some new tools. One of these is an issue tracking system named JIRA. This post isn’t about JIRA, except to say it generates a lot of e-mails. What this post is about is parsing standardized blocks of text, and JIRA’s subject lines provide a good example. If we create an ‘issue’ to look at how some data gets processed, we might end up with a pile of mails with subject lines like this.

[JIRA] Created: (Foo-164) Test Run – Data type # 2
[JIRA] Commented: (Foo-164) Test Run – Data type # 2
[JIRA] Updated: (Foo-164) Test Run – Data type # 2
[JIRA] Assigned: (Foo-164) Test Run – Data type # 2
[JIRA] Resolved: (Foo-164) Test Run – Data type # 2
[JIRA] Closed: (Foo-164) Test Run – Data type # 2

JIRA accounts for more than half of my messages, so I set up an Outlook rule to move them to their own folder. The format of the message is standardized.

“[JIRA]” Event type “(“ project-ID “-” counter “)” issue-description

So it is pretty easy to have a rule which looks for “[JIRA]” and moves messages to a new folder, but the format doesn’t help me sort and group messages: Outlook and Exchange can’t tell that the 6 messages above should be a single conversation. Sorting by date muddles all the issues and doesn’t group by project (“Foo” in my example). Sorting by subject groups all the created together, all the closed together and so on, which is no better. So I came up with the idea of rewriting the subject line, which naturally I did from PowerShell to begin with. First I had to get the JIRA folder in my inbox. The method for this hasn’t changed since the first version of Outlook, even though the languages come and go.

$ns = (New-Object -ComObject outlook.application).getNameSpace("MAPI")
$ExchStore = $ns.stores | where-object {$_.ExchangeStoreType -eq 0}
$jiraFolder = $ExchStore.folders.item('Inbox').folders.item('JIRA')

The next bit does the work. I wrote it as ONE line of PowerShell but I’ve spaced it out here for easy reading

foreach ($item in $jiraFolder.Items) {
    $item.Subject = $item.Subject -replace "^\[JIRA\](.*?)\((\w*-\d*)\)",'$2 $1'
    $item.Save() }

The replace operation needs what a friend of mine calls a “Paddington hard stare” to understand it. 

  • “Look for the start of a line followed by “[JIRA]” ” is coded as ^\[JIRA\]
    The [] characters have special meaning in a regex so need to be escaped with a \ sign.
  • “Any sequence of characters followed by “(” ” is coded .*\(
    Like their square cousins, the () characters also have special meaning, which I’ll come to, and so they, too, need to be escaped with a \ sign.
  • Regular expressions are naturally “greedy” so if my subject line had been
    “[JIRA] Closed: (Foo-164) Test Run ( Data type # 2)”
    the term .*\( would match all the way up to the second ( character.
    Putting a ? symbol after the * tells it to match using the shortest sequence of characters it can, so using .*?\( reduces the match to “ Closed: (”
  • Wrapping part of the expression in () saves it for later, so (.*?)\( will capture “ Closed: ” (spaces included) in this example
  • \w*-\d*\) means any number of “word” characters, a “-” character, any number of “digit” characters and a “)”, which in my example will match on “Foo-164)”
  • I can capture “Foo-164” by inserting () giving (\w*-\d*)\)
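
Each of those steps can be tried at the prompt with -match, which fills the automatic $matches variable. A minimal sketch, using the awkward subject line from the list above (the variable names are mine):

```powershell
$subject = "[JIRA] Closed: (Foo-164) Test Run ( Data type # 2)"

# Greedy: .* runs on to the LAST "(" in the line
$null = $subject -match '^\[JIRA\](.*)\('
$greedy = $matches[1]

# Lazy: .*? stops at the FIRST "("
$null = $subject -match '^\[JIRA\](.*?)\('
$lazy = $matches[1]

# Add the second capture group to pick out the issue ID as well
$null = $subject -match '^\[JIRA\](.*?)\((\w*-\d*)\)'
$issueId = $matches[2]

"Greedy capture: '$greedy'"
"Lazy capture:   '$lazy'"
"Issue ID:       '$issueId'"
```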

Putting the pieces together I get ^\[JIRA\](.*?)\((\w*-\d*)\) which matches on “[JIRA] Closed: (Foo-164)” and makes “ Closed: ” and “Foo-164” available in the replacement text as $1 and $2 (note that PowerShell will process these as variables if the replacement text is wrapped in double quotes, so single quotes are needed to ensure they reach the regular expression parser). So the -replace operation replaces “[JIRA] Closed: (Foo-164)” with “Foo-164  Closed: ”, which is much better for sorting.
In practice I developed this a bit further by putting information into the message’s userProperties, but my initial prototype shows the idea.
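
The effect on sorting can be seen without Outlook by running the same -replace over an array of strings. In this sketch the Foo-165 subjects are made up, to give a second issue to group by, and a plain hyphen is used in the descriptions.

```powershell
$pattern = '^\[JIRA\](.*?)\((\w*-\d*)\)'

$subjects = @(
    '[JIRA] Created: (Foo-164) Test Run - Data type # 2',
    '[JIRA] Created: (Foo-165) Another issue',              # hypothetical second issue
    '[JIRA] Closed: (Foo-164) Test Run - Data type # 2',
    '[JIRA] Closed: (Foo-165) Another issue'                # hypothetical second issue
)

# -replace works element by element on an array; single quotes keep $1 and $2 for the regex engine
$rewritten = $subjects -replace $pattern, '$2 $1'

# With the issue ID at the front, a plain sort keeps each issue's mails together
$sorted = $rewritten | Sort-Object
$sorted
```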

I ran this against the messages I had already received and it worked very nicely. But it would only work as an on-going solution if I was happy to run my PowerShell script for every new mail. I’m not. So I need something that Outlook can run automatically – Macros (Functions written in Outlook’s VBA environment) can be invoked by telling the rule to Run a Script. I found that the Move-to-folder rule-operation needs to be moved into the script (if the rules engine moves the message, the script doesn’t work).  The line of PowerShell that was in the body of the ForEach loop in my example  turns into this:

strSubject = objItem.Subject  
If Left(strSubject, 6) = "[JIRA]" Then
    openParen = InStr(strSubject, "(")
    closeParen = InStr(openParen, strSubject, ")")
    StrJIRASubject = Mid(strSubject, (closeParen + 2))
    strJIRAAction = Mid(strSubject, 8, (openParen - 9))
    strJIRAID = Mid(strSubject, (openParen + 1), (closeParen - openParen - 1))
    objItem.Subject = strJIRAID + " " + strJIRAAction + " " + StrJIRASubject
    objItem.Save
End If

You can see that here I have to check that the line begins with [JIRA]: using -replace in PowerShell makes no changes if the match isn’t found, so there is nothing to check, but the VB script might scramble a non-JIRA subject line.
Then the script finds the positions of the Parentheses.
Then it has to isolate the pieces of text after the closing one, before the opening one but after “[JIRA] ” and between the two.
Then it assembles these parts to make the new subject line and saves the message.

I know people find regular expressions tough to follow, but I’d say it was pretty hard to tell what is going on in the script. For example, take the line
strJIRAAction = Mid(strSubject, 8, (openParen - 9))
Why select from character 8 for openParen - 9 characters?

[Answer: “[JIRA]” is 6 characters and character number 7 is a space, so the word “Closed:” begins at character 8. From there up to the “(” is openParen - 8 characters; we want to stop one before that to drop the trailing space, so we read openParen - 9 characters and store “Closed:” in strJIRAAction.]
It takes time to work this out. If you know a little of the language of regex it takes less time to see that ^\[JIRA\](.*?)\( will put “ Closed: ” into $1
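
For comparison, here is the same positional arithmetic redone in PowerShell next to the regex version. This is only a sketch; the one adjustment is that InStr and Mid count from 1 whereas .NET’s IndexOf and Substring count from 0.

```powershell
$subject = "[JIRA] Closed: (Foo-164) Test Run"

# Position of "(" counted from 1, as InStr would report it
$openParen = $subject.IndexOf("(") + 1

# Mid(strSubject, 8, openParen - 9) becomes Substring(8 - 1, openParen - 9)
$byPosition = $subject.Substring(8 - 1, $openParen - 9)

# The regex captures everything between "[JIRA]" and "(", spaces included
$null = $subject -match '^\[JIRA\](.*?)\('
$byRegex = $matches[1].Trim()

"openParen   = $openParen"
"By position = '$byPosition'"
"By regex    = '$byRegex'"
```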

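But there’s more. I alluded to the need to check for [JIRA] at the start of the subject line in VB script. The -replace operator in PowerShell “fails safe”: if the pattern doesn’t match, nothing is changed. Suppose a future version of JIRA changes from “[JIRA] Closed: (Foo-164)” to “[JIRA] (Foo-164): Closed”. The regular expression still finds “(Foo-164)” straight after “[JIRA]”, so the subject becomes “Foo-164  : Closed”, which still sorts by issue; had the change been drastic enough to stop the pattern matching, the subject would simply have been left alone. The VB script continues to run regardless and … well, you can try to work out what my Macro will do. The iterative development we tend to do in PowerShell lets you enter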
"[JIRA] Closed: (Foo-164) Test (type 2)" -match "^\[JIRA\].*\((.*)\)" ; $matches
as a command line and see and fix the problem.  Something you just can’t do so well with VBscript.
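
That kind of trial and error might go something like the following sketch. The second pattern is just one possible fix, and “Lunch on Friday?” is a made-up non-JIRA subject.

```powershell
$subject = "[JIRA] Closed: (Foo-164) Test (type 2)"

# First attempt: the greedy .* swallows "(Foo-164)" and the group captures the wrong text
$null = $subject -match "^\[JIRA\].*\((.*)\)"
$firstTry = $matches[1]

# Second attempt: a lazy match plus an explicit pattern for the issue ID
$null = $subject -match "^\[JIRA\].*?\((\w*-\d*)\)"
$secondTry = $matches[1]

# A subject which does not match at all comes back untouched
$other     = "Lunch on Friday?"     # hypothetical non-JIRA subject
$unchanged = $other -replace '^\[JIRA\](.*?)\((\w*-\d*)\)', '$2 $1'

"First try captured:  '$firstTry'"
"Second try captured: '$secondTry'"
"Non-JIRA subject:    '$unchanged'"
```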
