James O'Neill's Blog

April 10, 2011

Ten tips for better PowerShell functions

Filed under: Uncategorized — jamesone111 @ 11:02 pm

Explaining PowerShell often involves telling people it is both an interactive shell – a replacement for the venerable CMD.EXE – and a scripting language used for single-task scripts and libraries of reusable functions. Some good practices are common to both kinds of writing (including comments, being explicit with parameters, using full names instead of aliases and so on), but having written hundreds of “script cmdlets” I have developed some views on what makes a good function, which I wanted to share…

1. Name your function properly
It’s not actually compulsory to use Verb-SingularNoun names with the standard verbs listed by Get-Verb. “Helpers” which you might pop in a profile can be better with a short name. But if your function ends up in a module, Import-Module grumbles when it sees non-standard verbs. Getting the right name can clarify your thinking about what a command should or should not do. I cite IPConfig.exe as an example of a command line tool which didn’t know when to stop – what it does changes dramatically with different switches. PowerShell tends towards multiple smaller functions whose names tell you what they will do – which is a Good Thing.

2. Use standard, consistent and user-friendly parameters.
(a) PowerShell cmdlets give you –WhatIf and –Confirm switches to use before you do something irreversible, and you can get these in your own functions. Put this line of code immediately before the Param() block, ahead of any other code in the function:
[CmdletBinding(SupportsShouldProcess=$True)]
and then, where you do something which is hard to undo:
If ($psCmdlet.ShouldProcess("Target" , "Action")) {
    # dangerous actions go here
}
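Put together, the two fragments above might look like the sketch below. The function name Remove-OldLog and its -Path parameter are made up for illustration; only the CmdletBinding attribute and the ShouldProcess call come from the text above.

```powershell
# Hypothetical example function: the name and parameter are illustrative only.
function Remove-OldLog {
    [CmdletBinding(SupportsShouldProcess=$true)]
    param(
        [Parameter(Mandatory=$true)]
        [string]$Path
    )
    # With -WhatIf, ShouldProcess prints a "What if:" message and returns $false.
    # With -Confirm, it prompts and returns the user's answer.
    if ($PSCmdlet.ShouldProcess($Path, "Delete file")) {
        Remove-Item -Path $Path
    }
}
```

Calling Remove-OldLog -Path .\old.log -WhatIf should report what would be deleted and leave the file alone.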
(b) Look at the names PowerShell uses: “path”, not “filename” , “ComputerName” not “Host”, “Force” “NoClobber” and so on – copy what has been done before unless you have a good reason to do something different; I don’t use “ComputerName” when working with Virtual Machines because it is not clear if it means a Virtual Machine or the Physical Machine which hosts them.
(c) If you are torn between two names, remember that “Computer” is a valid shortening of “ComputerName”; for names which are not shortenings of an alternative you can define aliases, like this:
[Alias("Where","Include")]$Filter
TIP 1. You can discover all the parameter names used by cmdlets, and how popular they are, like this:
Get-Command -CommandType Cmdlet | Get-Help -Full | ForEach-Object {$_.parameters.parameter} |
   ForEach-Object {$_.name} | Group-Object -NoElement | Sort-Object Count

TIP 2. If you think “Filter” is the right name to re-use you can see how other cmdlets use it like this:
Get-Command -CommandType Cmdlet | Where-Object { $_.definition -match "filter"} | Get-Help -Parameter "filter"

3. Support Piping into your functions.
V2 of PowerShell greatly simplified piping. The more you use PowerShell the stronger sense you get that the output of one command should become the input for another. If you are writing functions, aim for the ability to pipe into them and pipe their output into other things. Piped input becomes a parameter; all you need to do is:

  • Make sure the parts of the function which run for each piped object are in a
    process {} block
  • Prefix the parameter declaration with [parameter(ValueFromPipeline=$true)].
  • If you want a property of a piped object instead of the whole object, use ValueFromPipelineByPropertyName
  • If different types of objects get piped, and they use different property names for what you want, give your parameter aliases: PowerShell looks for the “true” name first and, if it doesn’t find it, tries each alias in turn.

If you find code that looks like this
something | foreach {myFunction $_ }
It is a sign that you probably need to look at piping.
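As a rough sketch of those four bullet points working together, assuming a made-up function called Get-FileSizeKB (the name, the aliases and the output shape are all illustrative):

```powershell
# Hypothetical example: reports the size of each file piped in.
function Get-FileSizeKB {
    [CmdletBinding()]
    param(
        # Bound from a piped string, or from a piped object's Path property;
        # if there is no Path property the aliases FullName and FileName are tried.
        [Parameter(Mandatory=$true, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)]
        [Alias("FullName","FileName")]
        [string]$Path
    )
    process {
        # The process block runs once for every object arriving from the pipeline.
        $item = Get-Item -LiteralPath $Path
        New-Object PSObject -Property @{
            Path   = $item.FullName
            SizeKB = [math]::Round($item.Length / 1KB, 1)
        }
    }
}
```

With this in place, both "C:\temp\a.txt" | Get-FileSizeKB and dir *.txt | Get-FileSizeKB should work without a foreach loop at the command line.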

4. Be flexible about arrays and types of parameters
Piping is one way to feed many objects into one command. In addition, many built-in cmdlets and operators will accept arrays as parameters just as happily as they would accept a single object; previously I gave the example  of Get-WmiObject whose –computername parameter can specify a list of machines – it makes for simpler code.
Functions which cope with being passed arrays, and process them sensibly, are easier to use (see that previous post for why simply putting [String] or [FileInfo] in front of a parameter doesn’t work). Actually I see it as good manners: “I handle the loop so you don’t have to do it at the command line”.
Accepting arrays is one case of not being over-prescriptive about types, but it isn’t the only one. If I write something which deals with, say, a Virtual Machine, I ensure that VM names are just as valid as objects which represent VMs. For functions which work with files, it has to be just as acceptable to pass System.IO.FileInfo objects, System.Management.Automation.PathInfo objects, or strings containing the path (unique or wildcard, relative or absolute).
TIP: Resolve-Path will accept any of these and convert them into objects with fully-qualified paths.
It seems rude to make the user use Get-whatever to fetch the object if I can do it for them.
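Here is a minimal sketch of that idea. Get-LineCount is a hypothetical function; the parameter is left untyped so that strings, FileInfo objects, PathInfo objects or arrays of any of them are all acceptable, and Resolve-Path does the work of turning them into full paths. Taking FullName from file objects first is my own precaution rather than something the tip requires.

```powershell
# Hypothetical example: counts the lines in whatever files it is given.
function Get-LineCount {
    [CmdletBinding()]
    param(
        # Untyped on purpose: one item or an array, strings or objects.
        [Parameter(Mandatory=$true, ValueFromPipeline=$true)]
        $Path
    )
    process {
        # The function handles the loop so the caller does not have to.
        foreach ($item in $Path) {
            # File objects carry their full path in FullName.
            if ($item -is [System.IO.FileSystemInfo]) { $item = $item.FullName }
            # Resolve-Path expands wildcards and relative paths.
            foreach ($resolved in (Resolve-Path -Path "$item")) {
                New-Object PSObject -Property @{
                    Path  = $resolved.ProviderPath
                    Lines = @(Get-Content -LiteralPath $resolved.ProviderPath).Count
                }
            }
        }
    }
}
```

Get-LineCount -Path *.txt, Get-LineCount -Path a.txt,b.txt and Get-Item a.txt | Get-LineCount should then all behave the same way.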

5. Support ScriptBlock parameters.
If one parameter can be calculated from another it is good to let the user say how to do the calculation.  Consider this example with Rename-Item. I have photos named IMG_4000.JPG, IMG_4001.JPG, IMG_4002.JPG, up to IMG_4500.JPG. They were taken underwater, so I want them to be named DIVE4000.JPG etc. I can use:
dir IMG_*.JPG | Rename-Item -NewName {$_.Name -replace "IMG_","DIVE"}
In English: “Get the files named IMG_*.JPG and rename them. The new name for each one is the result of replacing IMG_ with DIVE in that one’s current name.” Again you can write a loop to do it, but a script block saves you the trouble.

  • The main candidates for this are functions where one parameter is piped and a second parameter is connected to a property of the Piped one.
  • When you are dealing with multiple items arriving from the pipeline, be careful what variables you set in the process{} block of the function: you can introduce some great bugs by overwriting non-piped parameters. For example, if you had to implement Rename-Item, it would be valid to handle a string that had been piped in as the –Path parameter by converting it into a FileInfo object, because doing so has no effect on the next object to come down the pipe. But if you convert a script block which is passed as –NewName to a string, the next object to arrive will get that string rather than the script block. I’ve had great fun with the bugs which result from this.
  • All you need to do to provide this functionality is
    If ($newname –is [ScriptBlock]) { $TheNewName = $newname.invoke() }
    else                            { $TheNewName = $newname}
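To show the fragment above in context, here is a small sketch of a function which takes either a fixed string or a script block for -NewName. Show-Rename is a made-up name and it only reports what the new name would be. Note that I have swapped the .Invoke() call for ForEach-Object -Process, which is one way to make sure $_ is set to the current item inside the script block; that is my substitution, not the method used above.

```powershell
# Hypothetical example: shows the old and new name without renaming anything.
function Show-Rename {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory=$true, ValueFromPipeline=$true)]
        [string]$Name,
        # Either a fixed string, or a script block which refers to $_
        [Parameter(Mandatory=$true)]
        $NewName
    )
    process {
        if ($NewName -is [ScriptBlock]) {
            # The result goes into a separate variable, so $NewName is still
            # a script block when the next piped item arrives.
            $theNewName = $Name | ForEach-Object -Process $NewName
        }
        else { $theNewName = $NewName }
        "$Name -> $theNewName"
    }
}
```

For example, "IMG_4000.JPG","IMG_4001.JPG" | Show-Rename -NewName {$_ -replace "IMG_","DIVE"} evaluates the script block once per piped name.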

6. Don’t make input mandatory if you can set a sensible default.
Perhaps obvious, but… If I write a function named “Get-VM” which finds virtual machines with a given name, what should I do if the user doesn’t give me a VM name? Return nothing? Throw an error? Or assume they want all possible VMs?
What would you mean if you typed Get-VM on its own ?
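My answer is usually “all of them”, which can be as simple as a default value on the parameter. The sketch below uses a hard-coded list of names as a stand-in for a real query; Get-DemoVM and the machine names are invented for the example.

```powershell
# Hypothetical example: a hard-coded list stands in for a real VM query.
function Get-DemoVM {
    param(
        # No name given? Assume the user wants everything.
        [string[]]$Name = "*"
    )
    $allVMs = "Web01", "Web02", "SQL01"
    foreach ($pattern in $Name) {
        $allVMs | Where-Object { $_ -like $pattern }
    }
}
```

Get-DemoVM on its own lists every machine, while Get-DemoVM -Name SQL* narrows it down.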

7. Don’t require the user to know too much underlying syntax.
Many of my functions query WMI; WMI uses SQL syntax; SQL syntax uses “%” as a wildcard, not “*”.  Logical conclusion: if a user wants to specify a wildcarded filter to my functions they should learn to use % instead of *.  That just seems wrong to me, so my code replaces any instance of * with %.  If the user is specifying filtering or search terms, a few lines to change from the things they will instinctively do, or wish they could do, to what is required for SQL, LDAP or any other syntax can make a huge difference in usability.
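The translation really is only a line or two. This sketch builds a WQL filter string from a name that may contain *; the function name ConvertTo-WqlNameFilter and the choice of the Name property are assumptions made for the example.

```powershell
# Hypothetical helper: turns a user-friendly name into a WQL filter string.
function ConvertTo-WqlNameFilter {
    param(
        [Parameter(Mandatory=$true)]
        [string]$Name
    )
    # * must be escaped in the pattern because -replace takes a regular expression.
    $pattern = $Name -replace '\*', '%'
    if ($pattern -match '%') { "Name LIKE '$pattern'" }
    else                     { "Name = '$pattern'" }
}
```

The result can then be handed to something like Get-WmiObject -Class Win32_Service -Filter (ConvertTo-WqlNameFilter "Win*").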

8. Provide information with Write-Verbose , Write-debug and Write-warning
When you are trying to debug the natural reaction is to put in Write-Host commands, fix the problem and take them out again.  Instead of doing that change $DebugPreference and/or $VerbosePreference and use write-debug / write-verbose to output information. You can leave them in and stop the output by changing the preference variables back. If your function already has
[CmdletBinding(SupportsShouldProcess=$True)]
at the start then you get –debug and –verbose switches for free.
Write-Error is ugly and if you are able to continue, it’s often better to use Write-warning.
And learn to use write-progress when you expect something to spend a long time between screen updates.
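A short sketch of these streams in one place; the function ConvertTo-UpperName is invented purely to have something to report on.

```powershell
# Hypothetical example: upper-cases names, reporting as it goes.
function ConvertTo-UpperName {
    [CmdletBinding()]   # this is what provides the -Verbose and -Debug switches
    param(
        [string[]]$Name = @()
    )
    $done = 0
    foreach ($n in $Name) {
        $done++
        Write-Progress -Activity "Converting names" -Status "Item $done of $($Name.Count)" -PercentComplete (100 * $done / $Name.Count)
        if ($n -eq "") {
            # We can carry on, so a warning is friendlier than an error.
            Write-Warning "Item $done is empty; skipping it."
            continue
        }
        # Only shown when -Verbose is given or $VerbosePreference is changed.
        Write-Verbose "Converting '$n'"
        $n.ToUpper()
    }
    Write-Progress -Activity "Converting names" -Completed
}
```

Run it with -Verbose to see the extra messages; without the switch only the results and any warnings appear.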

9. Remember: your output is someone else’s input.
(a) Point 8 didn’t talk about using Write-Host: only use it to display something you want to prevent going into something else.
(b) Avoid formatting output in the function; try to output objects which can be consumed by something else.  If you must format, turn it on or off with a –formatted or –raw switch.
(c) Think about the properties of the objects you emit. Many commands will understand that something is a file if it has a .Path property, so add one to the objects coming out of your function and they can be piped into copy, invoke-item, resolve-path and so on. Usually that is good – and if it might be dangerous look at what you can do to change it.  Another example: when I get objects that represent components of a virtual machine their properties don’t include the VM name. So I go to a little extra trouble to add it.
Add-Member can add properties, or aliases for properties, to an object. For example:
$obj | Add-Member -MemberType AliasProperty -Name "Height" -Value "VerticalSize"
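Here is a slightly fuller sketch, using an invented object with made-up property values, which adds an alias, a Path property for the benefit of commands further down the pipeline, and a VM name.

```powershell
# A stand-in object; the property names and values are invented for the example.
$photo = New-Object PSObject -Property @{
    VerticalSize = 768
    Location     = "C:\Temp\photo.jpg"
}

# An alias gives an existing property a second name.
$photo | Add-Member -MemberType AliasProperty -Name "Height" -Value "VerticalSize"
$photo | Add-Member -MemberType AliasProperty -Name "Path"   -Value "Location"

# A note property adds a brand new value.
$photo | Add-Member -MemberType NoteProperty -Name "VMName" -Value "Web01"
```

After this, $photo.Height and $photo.VerticalSize return the same value, and $photo.Path is there for anything which binds to a Path property.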

10. Provide help
In-line help is easy – it is just a carefully formatted comment before any of the code in your function. It isn’t just there for some far-off day when you share the function with the wider world. It’s for you when you are trying to figure out what you did months previously – and Murphy’s law says you’ll be trying to do it at 3AM when everything else is against you.
Describe what the Parameters expect and what they can and can’t accept. 
Give examples (plural) to show different ways that the function can be called. And when you change the function in the future, check the examples still work.
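For anyone who hasn’t seen it, comment-based help looks something like this sketch; Get-Greeting is a throwaway function made up to carry the comment.

```powershell
# Hypothetical example function carrying comment-based help.
function Get-Greeting {
    <#
    .SYNOPSIS
        Returns a greeting for one or more names.
    .PARAMETER Name
        One or more names, as strings. Can be piped in. Defaults to "World".
    .EXAMPLE
        Get-Greeting -Name "Ann"
        Greets a single person.
    .EXAMPLE
        "Ann","Bob" | Get-Greeting
        Greets everyone piped in.
    #>
    param(
        [Parameter(ValueFromPipeline=$true)]
        [string[]]$Name = "World"
    )
    process {
        foreach ($n in $Name) { "Hello, $n" }
    }
}
```

Get-Help Get-Greeting -Examples then shows both examples, which is a handy prompt to re-test them whenever the function changes.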


April 9, 2011

Pattern recognition–the human and PowerShell kinds

Filed under: Uncategorized — jamesone111 @ 8:40 pm

Recently BBC’s Top Gear has been promoting the idea that a particular type of obnoxious driver has been replacing the BMWs they traditionally bought with Audis. Chatting to a friend who is a long-term Audi customer, and whose household features ‘his’ and ‘hers’ Audis, we came to the conclusion that once you think there is a pattern, you recognise it and then your awareness increases – even if in reality it is no more prevalent. I think the same thing happens in IT in general and scripting in particular – it has happened to me recently… when my understanding of regular expressions in PowerShell took a big step forward, and now I’m finding all manner of places where it helps.

I use a handful of basic regular expressions  for things like removing a trailing \ character from the end of a string with something like:
$Path = $Path –replace "\\$" , ""
Many people use –replace to swap text without realising it handles regular expressions. In this case “\” is the escape character in regular expressions, so to match “\” itself it has to be escaped as “\\”. The $ character means “end-of-line”, so this fragment just says ‘Replace “\” at the end of $Path, if you find one, with nothing, and store the result back in $Path’.  PowerShell’s –Split operator also uses regular expressions. This can be a trap: if you try to split using “.” it means “any character” and you get a result you didn’t expect:
"This.that" –split "." returns 10 empty strings (the –split operator discards the delimiter); to match a “.” it must be escaped as “\.”. But it’s also a benefit: if you want to split sentences apart you can make “.” and any spaces round it the delimiter, which saves the need to trim afterwards. The –Match operator uses regular expressions too; I worry when I see it used in examples for new users, who may use something which parses unexpectedly as a regular expression.
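The cases described above, side by side, as a quick sketch; the variable names and the sentence text are just for illustration. The SimpleMatch option on the last line is a way of asking –split to treat the delimiter literally.

```powershell
# "." means "any character", so every character is treated as a delimiter.
$everyChar = "This.that" -split "."

# Escaping the dot splits on a real full stop.
$escaped = "This.that" -split "\."

# Make the full stop and any spaces round it the delimiter: no trimming needed.
$sentences = "First sentence. Second one . Third" -split "\s*\.\s*"

# SimpleMatch treats the delimiter as plain text rather than a regular expression.
$literal = "This.that" -split ".", 0, "SimpleMatch"
```

$everyChar holds the 10 empty strings; $escaped and $literal each hold “This” and “that”.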

I thought that I knew regular expressions – until thanks to an article by Tome Tanasovski, I found I had missed a big bit of the picture, which meant my understanding was wrong.  I thought that a match meant the equivalent of running a highlighter pen over part of the text and –replace means “take something out and put something else back” – both are usually true but not always. Tome also did a presentation for the PowerShell user group – there’s a link to the recording on Richard’s blog – I’d recommend watching it and pausing every so often to try things out.
Tome showed look-aheads and look-behinds. These say “It’s a Match if it is followed by something”, or “preceded by something” (or not).  This adds a whole new dimension…
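A few tiny illustrations of the idea, with strings made up for the purpose (they are not taken from Tome’s article):

```powershell
# Look-ahead: digits count as a match only if " USD" follows them.
$null = "100 USD" -match "\d+(?= USD)"
$ahead = $Matches[0]

# Look-behind: digits count as a match only if "USD " comes before them.
$null = "USD 100" -match "(?<=USD )\d+"
$behind = $Matches[0]

# Negative look-behind: replace "cat" unless it is preceded by "con".
$negative = "cat concat" -replace "(?<!con)cat", "dog"
```

In the first two cases the match is just the digits, because the text inside the look-ahead or look-behind is tested but never becomes part of the match. In the third, only the standalone “cat” is replaced.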

A couple of days later I hit a snag with PowerShell’s Split-Path cmdlet. If the path is on a remote machine it might use a drive letter which doesn’t exist on the local machine – and in that situation Split-Path throws an error. But I can use the –Split operator with a regular expression. I want to say “Find a \ followed by some characters that aren’t \ and the end of the string”. Let’s test this:
PS C:\Users\James\Documents\windowsPowershell> $pwd -split "\\[^\\]+$"
C:\Users\James\Documents

As in my first example ‘\\’ is an escaped ‘\’ character, and ‘$’ means “end of line”, ‘[^\\]’ says “Anything which is not the ‘\’ character” and ‘+’ means “at least once”. So this translates as “Look for a ‘\’ followed by at least 1 non-‘\’ followed by end of line”. It’s mostly right but it doesn’t work (yet).
I copied my command prompt so you can see that ‘WindowsPowerShell’ is part of my working directory – but that bit got lost; or to be more precise it was matched in the expression, so –split returned the text on either side of it.
I want to say “Find ONLY a ‘\’. The one you want is followed by some characters that aren’t ‘\’ and the end of the string but they don’t form part of the delimiter.”  The syntax for “is followed by” is “(?=   )” so I can wrap that around the [^\\]+$ part and test that:
PS C:\Users\James\Documents\windowsPowershell> $pwd -split "\\(?=[^\\]+?$)"
C:\Users\James\Documents
windowsPowershell

Regular expressions can turn into a write-only language – easy to build up but pretty hard to pull apart.  At risk of making things worse, not everyone knows that PowerShell has a “multiple = operator”; if you write $a , $b  = 1,2  it will assign 1 to $a and 2 to $b. Since the output of the split operation is 2 items we can try this:
PS C:\Users\James\Documents\windowsPowershell> $Parent,$leaf = $pwd -split "\\(?=[^\\]+?$)"
PS C:\Users\James\Documents\windowsPowershell> $Parent
C:\Users\James\Documents
PS C:\Users\James\Documents\windowsPowershell> $leaf
windowsPowershell

The “cost” of using regular expressions is that the term used to do the split is something akin to a magical incantation. The benefit is that the code is a lot more streamlined than using the string object’s .LastIndexOf() and .Substring() methods, its .Length property and some arithmetic to get to the same result. I’d contend that even allowing for the “incantation” the regex way makes it easier to see that $pwd is being split into 2 parts.
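For comparison, here is roughly what the string-method version looks like next to the regex one. It uses a fixed path string instead of $pwd so that it gives the same result wherever it is run.

```powershell
$path = "C:\Users\James\Documents\windowsPowershell"

# The string-method way: find the last "\" and cut either side of it.
$lastSlash     = $path.LastIndexOf("\")
$parentByIndex = $path.Substring(0, $lastSlash)
$leafByIndex   = $path.Substring($lastSlash + 1)

# The regex way: split on the "\" which is followed only by non-"\" characters.
$parentByRegex, $leafByRegex = $path -split "\\(?=[^\\]+$)"
```

Both routes give C:\Users\James\Documents and windowsPowershell; the second just takes one line to do it.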
Good stuff so far, but Tome had another trick:  the match that selects nothing and the replace that removes nothing.  That made me stop and redefine my understanding.  Here’s the use case:

Ages ago I wrote about using PowerShell to query the Windows [Vista] Desktop Index – it works just as well with Windows 7.  The zillion or so fields used in these queries have names like “System.Title”, “System.Photo.Orientation” and “System.Image.Dimensions” – I’d type the bare field name like “title” by mistake, or waste time discovering whether “HorizontalSize” belonged to System.Photo or System.Image.
It would be better to enable my Get-IndexedFile function to put in the right prefix: but could it be done reasonably efficiently and elegantly?
Here lookarounds come into their own. They let me write “If you can find a spot which is immediately after a space, and immediately before the word ‘Dimensions’ OR the word ‘HorizontalSize’ OR…” and so on for all the Image Fields “AND that word is followed by any spaces and a ‘=’ sign  THEN put ‘System.image.’ at the spot you found”.  With just the first two fieldnames the operation looks like this
-replace "(?<=\s) (?=(Dimensions|HorizontalSize)\s*=)" , "system.image."
                 ^
I have put an extra space in for the spot that will be matched – the ^ is pointing this out, it isn’t part of the code.
“(?<=  )” is the wrapper for the “look behind” operation (replacing the ‘=’ with ‘!’ negates the expression), so “(?<=\s)” says “behind this spot you find a space” and the second half is a “look ahead” which says “in front of this spot you find ‘Dimensions’ or ‘HorizontalSize’ then zero or more spaces (‘\s*’) followed by ‘=’ ”. A match with an expression like this is like an I-beam cursor between characters, rather than a highlight over some of them: so the –replace operator has nothing to remove but it still inserts ‘system.image.’ at that point. So let’s put that to the test. The test string starts with a space because the look-behind needs one before the field name.

PS> " horizontalsize = 1024"  -replace "(?<=\s)(?=(Dimensions|HorizontalSize)\s*=)",
                                       "system.image."
 system.image.horizontalsize = 1024
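Extending this to more than one prefix could look something like the sketch below. This is only an illustration of the idea, not the code from the finished function: the hash table, the variable names and the sample filter are invented, and only the four field names mentioned above are included.

```powershell
# Each prefix maps to the bare field names that should receive it.
$prefixes = @{
    "System.Image." = "Dimensions|HorizontalSize"
    "System.Photo." = "Orientation"
    "System."       = "Title"
}

# The leading space matters: the look-behind needs white space before each field.
$filter = " title = 'Reef' AND horizontalsize = 1024"

foreach ($prefix in $prefixes.Keys) {
    # Insert the prefix at the spot between the white space and the field name.
    $filter = $filter -replace "(?<=\s)(?=($($prefixes[$prefix]))\s*=)", $prefix
}
$filter
```

Because –replace ignores case by default, “title” and “horizontalsize” are found even though the hash table spells them with capitals; the text the user typed is left as it was and only the prefix is inserted.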

It works!  This whole exercise of writing a Get-IndexedFiles function – which I will share in due course – ended up as a worked example in using regex to support good function design. I’ve got another post in draft at the moment about my ideas on good function design, so I’ll post that and then come back to looking at all the different ways I made use of regular expressions in this one function.
