James O'Neill's Blog

June 16, 2017

More on writing clear scripts: Write-output and return … good or bad ?

Filed under: Powershell — jamesone111 @ 11:26 am

My last post talked about writing understandable scripts, and then I read a piece entitled Let’s kill Write-Output by Mark Krauss (actually I found it because Thomas Lee tweeted it with “And sort out return too”).

So let’s start with one practicality. You can’t remove a command which has been in a language for 10 years unless you are prepared for a lot of pain making people re-write scripts. Its alias “echo” was put there for people who come from other scripting languages and start by asking “How do I print to the console?”. But if removing it altogether is impractical, we can advise people to avoid it, write rules to catch it in the script analyser and so on. Should we? And when is it a good idea to use it?

Mark points out he’s not talking about Write-Host, which should be kept for limited scenarios: if you want the user to see something by default, but it isn’t part of the output, then that’s a job for Write-Host. For example, with my Get-SQL command, $result = Get-SQL $sqlQuery writes “42 rows returned” to the console but the output saved into $result is the 42 rows of data. Mark gives an example:
Write-Output "PowerShell Processes:"
Get-Process -Name PowerShell

and says it is better written as  
"PowerShell Processes:"
Get-Process -Name PowerShell

And this is actually a case where Write-Host should be used … why? Let’s turn that into a function.
Function Get-psProc {
  "PowerShell Processes:"
  Get-Process -Name "*PowerShell*"
}

Looks fine, doesn’t it? But it outputs two different types of object into the pipeline. All is well if we run Get-psProc on its own, but if we run
 Get-psProc | ConvertTo-Csv 
It returns
#TYPE System.String
"Length"
"21" 

The next command in the pipeline saw that the first object was a string and that determined its behaviour. “PowerShell Processes:” is decoration you want the user to see but it isn’t part of the output – which is exactly the job Write-Host is for. That earlier post on understandable scripts came from a talk about writing good code, and one of the biggest problems I find in other people’s code is a fixation with printing to the screen. That leads to things like the next example – which is meant to read a file and say how many lines there are and their average length.

$measurement = cat $path | measure -average Length
echo ("Lines read    : {0}"    -f $measurement.Count  )
echo ("Average length: {0:n0}" -f $measurement.Average)

This runs and it does the job the author intended, but I’d suggest they might be new to PowerShell and haven’t yet learnt that output is not the same as “stuff for a user to read” (as in the previous example), and they feel their output must be printed for reading. Someone more experienced with PowerShell might just write:
cat $path| measure -average Length
if they aren’t bothered about the labels; or, if the labels really matter,
cat $path | measure -average Length | select @{n="Lines Read";e={$_.count}}, @{n="Average Length";e={[math]::Round($_.Average,2)}}

If this is something we use a lot, we might change the aliases to cmdlet names, specify parameter names and save it for later use. And it is re-usable – for example if we want to do something when there are more than x lines in the file – where the previous version can only return text with the number of lines embedded in it. Resisting the urge to print everything is beneficial, and it gets rid of a lot of uses of Write-Output (or echo).
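As a rough sketch (the function name is my own invention, not anything from Mark’s post), the tidied-up, reusable version might look something like this:
# Hypothetical tidied-up version: full cmdlet and parameter names, and object output
function Measure-FileLines {
    param ([string]$Path)
    Get-Content -Path $Path |
        Measure-Object -Property Length -Average |
        Select-Object @{n='LinesRead';     e={$_.Count}},
                      @{n='AverageLength'; e={[math]::Round($_.Average,2)}}
}
# Because the output is an object rather than text, it can be reused:
if ((Measure-FileLines -Path $path).LinesRead -gt 100) {Write-Warning "That's a long file"}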

Mark’s post has 3 beefs with Write-Output.

  1. Performance. It is slower but rarely noticeably so, so I’d discount this.
  2. Security / Predictability – Write-Output can be redefined, and that allows for something malign or just buggy. True, but it also allows you to redefine it for logging, debugging and so on (see the sketch after this list). So you could use a proxy Write-Output for testing and the standard one in production. So this is not exclusively bad.
  3. The false sense of security. He says that explicitly returning stuff is held to be better than implicit return, which implies
    Write-Output $result
    is better than just
    $result
    But no-one says you should write cat $path | Write-Output – it’s obviously redundant – but when you don’t, isn’t that implying output?
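Going back to the second point, here is a minimal sketch (purely illustrative, not something from Mark’s post) of shadowing Write-Output with a logging proxy during testing – the real cmdlet is still reachable by its module-qualified name:
# Hypothetical logging proxy: the function shadows the cmdlet of the same name
function Write-Output {
    param ([Parameter(ValueFromPipeline=$true)]$InputObject)
    process {
        Add-Content -Path "$env:TEMP\output-trace.log" -Value ($InputObject | Out-String)
        Microsoft.PowerShell.Utility\Write-Output $InputObject    # call the real cmdlet
    }
}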

My take on the last point is that piping output into Write-Output (or Out-Default) is a tautology: “Here’s some output, take it and output it”. It makes things longer but not clearer. If using Write-Output does make things clearer, that is a sign the script is hard to read and at least needs some comments, and possibly some redesign. Joel Bennett sums up the false sense of security part in a sentence: “while some people like it because it highlights the spots where you intentionally output something — other people argue its presence distracts you from the fact that other lines could output.”  [Thanks Joel, that would have taken me a paragraph!]

This is where Thomas’ comment about return comes in. Return tells PowerShell to bail out of a function, and there are many good reasons for doing that; it also has a two-in-one syntax: return $result is the same as
$result
return
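Bailing out early is the more common reason for using the keyword – a minimal, hypothetical illustration:
function Get-FirstMatch {
    param ($Items, $Pattern)
    foreach ($item in $Items) {
        if ($item -match $Pattern) {return $item}   # output the first hit and stop looking
    }
}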

When I linked to Joel above he also asks the question whether, as the last lines of a function, this
$output = $temp + (Get-Thing $temp)
return $output

is better or worse than
$output = $temp + (Get-Thing $temp)
$output

Not many people would add return to the second example – it’s redundant. But if you store the final output in a variable there is some logic to using return (or Write-Output) to send it back. But is it making things any clearer to store the result in a variable? Or is it just as easy to read the following?
$temp + (Get-Thing $temp)

As with Write-Output, sometimes using return $result makes things clearer and sometimes it’s a habit from other programming languages where functions return results in a single place, so multiple parts must be gathered and then returned. Here’s something which combines the results of 3 queries and returns them

$result =  (Get-SQL $sqlQuery1)
$result += (Get-SQL $sqlQuery2)
$result +  (Get-SQL $sqlQuery3)

So the first line assigns an array of database rows to a variable, the second appends more rows, and the third returns these rows together with the results of a third query. You need to look at the operator in each line to figure out which one sends to the pipeline. Arguably it is clearer to replace the last line with this:

$result += (Get-SQL $sqlQuery3)
return $result

When there are 3 or 4 lines between introducing $result and putting it into the pipeline this is OK. But let’s say there are 50 lines of script between storing the results of the first query and appending the results of the second. Has the script been made clearer by storing a partial result … or would you see something being appended to $result and have to look further up the script for where it was originally set and anywhere it was changed? This example does nothing with the combined segments (like sorting them); we’re just following an old habit of only outputting in one place. Not outputting anything until we have everything can also mean it takes a lot longer to run the script – we could have processed all the results from the first query while waiting for the second to run. I would dispense with the variable entirely and use

Get-SQL $sqlQuery1
Get-SQL $sqlQuery2
Get-SQL $sqlQuery3

If there is a lot of script between each I’d then use a #region around the lines which lead up to each query being run
#region build query and return rows for x
#etc etc
Get-SQL $sqlQuery1
#endregion 

so when I collapse the outlining regions in my editor I see
#region build query and return rows for x
#region build query and return rows for y
#region build query and return rows for z

Which gives me a very good sense of what the script is doing at a high level and then I can drill into the regions if I need to. If I do need to do something to the combined set of rows (like sorting) then my collapsed code might become
#region build query for x and keep rows for sorting later
#region build query for y and keep rows for sorting later
#region build query for z and keep rows for sorting later
#region return sorted and de-duplicated results of x,y and Z

Both outlines give a sense of where there should be output and where any output might be a bug.

In conclusion. 
When you see lots of echo / Write-Output commands that’s usually a bad sign – often an indication of too many formatted strings going into the pipeline – but Write-Output is not automatically bad when used sparingly, and used properly return isn’t bad either. But if you find yourself adding either for clarity it should make you ask “Is there a better way?”



March 13, 2017

Improving PowerShell performance with hash tables.

Filed under: Powershell — jamesone111 @ 1:07 pm

Often the tasks which we do with PowerShell scripts aren’t very sensitive to performance – unless we are sitting drumming our fingers on the desk waiting for one to complete, there isn’t a lot of value in making it faster. When I wrote about Start-Parallel, I showed that some things only become viable if they can be run reasonably quickly; but you might assume that scripts which run as scheduled tasks can take an extra minute if they need to.

That is not always the case. I’ve been working with a client who has over 100,000 Active Directory users enabled for Lync (and obviously they have a lot more AD objects than that). They want to set Lync’s policies based on membership of groups, and users are not just placed directly into the policy groups but nested via other groups. If users have a policy but aren’t in the associated group, the policy needs to be removed. There’s a pretty easy Venn diagram for what we need to do.

If you’ve worked with LDAP queries against AD, you may know how to find the nested members of a group using (memberOf:1.2.840.113556.1.4.1941:=<<Group DN>>).
The Lync/Skype for Business cmdlets won’t combine an LDAP filter for AD group membership and a non-AD property filter for policy – the user objects returned actually contain more than a dozen different policy properties, but for simplicity I’m just going to use ‘policy’ here. As pseudo-code, the natural way to find users and change policy – which someone else had already written – looks like this
Get-CsUser -LdapFilter "(   memberOf <<nested group>> )" | Where-Object {$_.policy -ne $PolicyName} | Grant-Policy $PolicyName
Get-CsUser -LdapFilter "( ! memberOf <<nested group>>) " | Where-Object {$_.policy -eq $PolicyName} | Grant-Policy $null

(Very late in the process I found there was a way to check Lync / Skype policies from AD but it wouldn’t have changed what follows). 
You can see that we are going to get every user: those in the group in the first line and those outside it in the second. These “fan out” queries against AD can be slow – MSDN has a warning: “Some such queries on subtrees may be more processor intensive, such as chasing links with a high fan-out; that is, listing all the groups that a user is a member of.” Getting the two sets of data was taking over an hour. But so what? This is a script which runs once a week, during quiet hours; provided it can run in the window it is given, all will be well. Unfortunately, because I was changing a production script, I had to show that the correct users are selected, the correct changes made and the right information written to a log. While developing a script which will eventually run as a scheduled task, testing requires we run it interactively, step through it, check it, polish it, run it again, and a script with multiple segments which run for over an hour is, effectively, untestable (which is why I came to be changing the script in the first place!).

I found I could unpack the nested groups with a script a lot more quickly than using the “natural” 1.2.840.113556.1.4.1941 method, though it feels wrong to do so. I couldn’t find any ready-made code to do the unpack operation – any search for expanding groups comes back to using the OID method, which reinforces the idea.
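For what it’s worth, here is a much-simplified sketch of the kind of recursive unpacking I mean – it is not the production code, and it doesn’t guard against circular nesting or duplicate members:
function Expand-GroupMember {
    param ([string]$GroupDN)
    $group = [ADSI]"LDAP://$GroupDN"
    foreach ($memberDN in $group.member) {
        $member = [ADSI]"LDAP://$memberDN"
        if ($member.objectClass -contains 'group') {
            Expand-GroupMember -GroupDN $memberDN    # recurse into nested groups
        }
        else {
            $memberDN                                # output the user's distinguished name
        }
    }
}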

I can also get all 100,000 Lync users – it takes a few minutes, but provided it is only done once per session it is workable (I stored the users in a global variable; if it was already present I didn’t re-fetch them).
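The caching pattern was roughly this (the variable name is illustrative):
if (-not $Global:AllLyncUsers) {              # only fetch once per session
    $Global:AllLyncUsers = Get-CsUser
}
$users = $Global:AllLyncUsers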

So: I had a variable $users with the users and a variable $members which contains all the members of the group; I just had to work out who is in each one but not in both. But I had a new problem. Some of the groups contain tens of thousands of users. Let’s assume half the users have the policy and half don’t. If I run
$users.where{$_.policy -ne $PolicyName -and $members -contains $_.DistinguishedName}
and
$users.where{$_.policy -eq $PolicyName -and $members -notcontains $_.DistinguishedName}
the -contains operation is going to have a LOT of work to do: if everybody has been given the right policy, none of the 50,000 users without it are in the group, but we have to look at all 50,000 group-members to be sure – 2,500,000,000 string comparisons. For the 50,000 users who do have the policy, on average [not]contains has to look at half the group members before finding a match, so that’s 1,250,000,000 comparisons. 3.75 billion comparisons for each policy means it is still too slow for testing. Then I had a flash of inspiration, something which might work.

I learnt about hash tables as a computer science undergraduate, and – as the on-line help puts it – they are “very efficient for finding and retrieving data”. This can be read as “they lead to neat code” (which has been their attraction for me in the past) or as “they minimize CPU use”, which is what I need here. Microsoft also call hash tables “associative arrays”, and often the boundary between a set of key-value pairs (a dictionary) and a “true” hash table is blurred – with a “true” hash table the location of data in memory is based on the key value, so an item can be found without scanning the whole list. Some ways to do fast finds make tables slow to build, and things I’d never considered with PowerShell hash tables might turn out to be important at this scale. So I built a hash table to return a user’s policy given their DN:
$users | ForEach-Object -Begin {$hash=@{}} -Process {$hash[$_.distinguishedName] = "" + $_.policy}

About 100,000 users were processed in under 4 seconds, which was a relief; and the right policy came back for $hash[“cn=bob…”] – it looked instant compared with a couple of seconds with $users.where({$_.distinguishedName –eq “cn=bob…”}).policy

This hash table will return one of 3 things. If Bob isn’t set-up for Lync/Skype for Business I will get NULL; if Bob has no policy I will get an empty string (that’s why I add the policy to an empty string – it also forces the policy object to be a string), and if he has a policy I get the policy name. So it was time to see how many users have the right policy (that’s the magenta bit in the middle of the Venn diagram above)
$members.where({$hash[$_] -like $policyname}).count

I’d found ~10,000 members in one of the policy groups, and reckoned if I could get the “find time” down from 20 minutes to 20 seconds that would be OK and … fanfare … it took 0.34 seconds. If we can look up 10,000 items in a 100,000-item table in under a second, these must be proper hash tables. I can have the .where() method evaluate
{$hash[$_] –eq $Null} for the AD users who aren’t Lync users or
{$hash[$_] –notin @($null,$policyName) } for users who need the policy to be set. 
It works just as well the other way around for setting up the hash table to return “True” for all members of the AD group; non-members will return null, so we can use that to quickly find users with the policy set but who are not members of the group. 

$members | ForEach-Object -Begin {$MemberHash=@{}} -Process {$MemberHash[$_] = "" + $true}
$users.where({$_.policy -like $policyname -and -not $memberhash[$_.distinguishedName]}).count
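Putting the pieces together, the selection logic looks something like this sketch – Grant-Policy stands in for the real Grant-Cs*Policy cmdlets, just as it does in the pseudo-code earlier:
# Members of the group who don't yet have the policy
$members.where({$hash[$_] -notin @($null, $PolicyName)}) | Grant-Policy $PolicyName
# Users who have the policy but are no longer in the group
$users.where({$_.policy -like $PolicyName -and -not $MemberHash[$_.distinguishedName]}) | Grant-Policy $null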

Applying this to all the different policies slashed the time to do everything in the script from several hours down to a handful of minutes. So I could test thoroughly ahead of putting the script into production.

January 29, 2017

Sharing GetSQL

Filed under: Databases / SQL,Powershell — jamesone111 @ 8:03 pm

I’ve uploaded a tool named “GetSQL” to the PowerShell Gallery
Three of my last four posts (and one upcoming one) are about sharing stuff which I have been using for a long time and GetSQL goes back several years – every now and then I add something to it, so it seems like it might never be finished, but eventually I decided that it was fit to be shared.

When I talk about good PowerShell habits, one of the themes is avoiding massive “do-everything” functions; it’s also good to minimize the number of dependencies on the wider system state – for example by avoiding global variables. Sometimes it is necessary to put these guidelines to one side – the module exposes a single command, Get-SQL, which uses argument completers (extra script to fill in parameter values). There was a time when you needed Jason Shirk’s TabExpansion Plus Plus to do anything with argument completers but, if you’re running a current version of PowerShell, Register-ArgumentCompleter is now a built-in cmdlet. PowerShell itself helps complete parameter names – and Get-SQL is the only place I’ve made heavy use of parameter sets to strip out parameters which don’t apply (and had to overcome some strange behaviour along the way); having completers to fill in the values for the names of connections, databases, tables and column names dramatically speeds up composing commands – not least by removing some silly typos.
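For anyone who hasn’t used it, the general shape of Register-ArgumentCompleter looks like this – a generic sketch, not GetSQL’s actual completer, with a hard-coded table list standing in for one read from the current connection:
Register-ArgumentCompleter -CommandName Get-SQL -ParameterName Table -ScriptBlock {
    param ($commandName, $parameterName, $wordToComplete, $commandAst, $fakeBoundParameters)
    # The real module builds this list from the live database connection
    $tables = '[races]', '[drivers]', '[results]'
    $tables.Where({$_ -like "$wordToComplete*"}) | ForEach-Object {
        [System.Management.Automation.CompletionResult]::new($_, $_, 'ParameterValue', $_)
    }
}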

One thing about Get- commands in PowerShell: if the parser sees a name which doesn’t match a command where a command-name should be, it tries putting Get- in front of the name, so History runs Get-History, and SQL runs Get-SQL. That’s one of the reasons why the command didn’t begin life as “Invoke-SQL” and instead became a single command. I also wanted to be able to run lots of commands against the same database without having to reconnect each time. So the connection objects that are used are global variables, which survive from one call to the next – I can run
SQL “select id, name from customers where name like ‘%smith’ ”
SQL “Select * from orders where customerID = 1234”

without needing to make and break connections each time – note, these are shortened versions of the command which could be written out in full as: 
Get-SQL –SQL “Select * from orders where customerID = 1234”
The first time Get-SQL runs it needs to make a connection, so as well as –SQL it has a –Connection parameter (and extra switches to simplify connections to SQL Server, Access and Excel). Once I realized that I needed to talk to multiple databases, I started naming the sessions and creating aliases from which the session can be inferred (I described how here), so after running
Get-SQL -Session f1 -Excel  -Connection C:\Users\James\OneDrive\Public\F1\f1Results.xlsx
(which creates a session named F1 with an Excel file containing results of formula one races) I can dump the contents of a table with
f1 –Table “[races]”

Initially, I wanted to either run a chunk of SQL like the first example (and I added a –Paste switch to bring in SQL from the Windows clipboard), or get the whole of a table (like the example above with a –Table parameter), or see what tables had been defined and their structure, so I added –ShowTables and –Describe. The first argument completer I wrote used the “show tables” functionality to get a list of tables and fill in the name for the –Table or –Describe parameters. Sticking to the one-function-per-command rule would mean writing “Connect-SQL”, “Invoke-RawSQL”, “Invoke-SQLSelect”, and “Get-SQLTable” as separate commands, but it just felt right to be able to allow
Get-SQL -Session f1 -Excel  -Connection C:\Users\James\OneDrive\Public\F1\f1Results.xlsx –showtables

It also just felt right to allow the end of a SQL statement to be appended to what the command line had built, giving a command like the following one – the output of a query can, obviously, be passed through a PowerShell Where-Object command, but it makes much more sense to filter before sending the data back.
f1 –Table “[races]” “where season = 1977”

“where season = 1977” is the –SQL parameter and –Table “[races]” builds “Select * from [Races]” and the two get concatenated to
“Select * from [Races] where season = 1977”.
With argument completers for the table name, it makes sense to fill column names in as well, so there is a –Where parameter with a completer which sees the table name and fills in the possible columns. So the same query results from this:
f1 –Table “[races]”   -where “season”   “= 1977”  
I took it a stage further and made the comparison operators work like they do in where-object, allowing
f1 –Table “[races]” -where “season” -eq 1977

The script will sort out wrapping things in quotes, changing the normal * into SQL’s % for wildcards and so on. With select working nicely (I added –Select to choose columns, –OrderBy, –Distinct and –GroupBy ) I moved on to insert, update (set) and Delete functionality. You can see what I meant when I said I keep adding bits and it might never be “finished”.

Sometimes I just use the module as a quick way to add a SQL query to something I’m working on: need to query a Lync/Skype backend database? There’s Get-SQL. Line-of-business database? There’s Get-SQL. Need to poke round in Adobe Lightroom’s SQLite database? I do that with Get-SQL – though in both cases its simple query building is left behind; it just goes back to its roots of running a lump of SQL which was created in some other tool. The easiest way to get data into Excel is usually the export part of Doug Finke’s excellent ImportExcel, and sometimes it is easier to do it with insert or update queries using Get-SQL. The list goes on … when your hammer seems like a good one, you find an awful lot of nails.

December 6, 2016

Do the job 100 times faster with Parallel Processing in PowerShell

Filed under: Powershell — jamesone111 @ 11:12 pm
Tags: ,

It’s a slightly click-baity title, but I explain below where the 100 times number comes from. The module is on the PowerShell Gallery and you can install it with Install-Module -Name Start-Parallel

Some of the tasks we need to do in PowerShell involve firing off many similar requests and waiting for their answers – for example getting status from lots of computers on a network. It might take several seconds to do each one – maybe longer if the machines don’t respond. Doing them one after the other could take ages. If I want to ping all 255 addresses on my home subnet, most machines will be missing and it will take 3 seconds to time out for each of the 200+ inactive addresses. Even if I try only one ping for each address it’s going to take 10-12 minutes, and for >99% of that time my computer will be waiting for a response. Running them in parallel could speed things up massively.
Incidentally the data which comes back from ping isn’t ideal, and I prefer this to the Test-Connection cmdlet.
Function QuickPing {
    param ($LastByte)
    $P = New-Object -TypeName "System.Net.NetworkInformation.Ping"
    $P.Send("192.168.0.$LastByte") | where status -eq success | select address, roundTripTime
}

PowerShell allows you to start multiple processes using Jobs and there are places where Jobs work well. But it only takes a moment to see the flaw in jobs: if you run
Get-Process *powershell*
Start-Job -ScriptBlock {1..5 | foreach {start-sleep -Seconds 1 ; $_ } }
Get-Process *powershell*

You see that the job creates a new instance of PowerShell … doing that for a single ping is horribly inefficient – jobs are better suited to tasks where the run time is much longer than the set-up time AND where we don’t want to run lots concurrently. In fact I’ve found creating large numbers of jobs tends to crash the PowerShell ISE; so my first attempts at parallelism involved tracking the number of jobs running and keeping to a maximum – starting new jobs only as others finished. It worked, but in the process I read this by Boe Prox and this by Ryan Witschger, which led me to a better way: RunSpaces and the RunSpace factory.
MSDN defines a RunSpace as “the operating environment where the command pipeline of the PowerShell object is invoked”, and says that the PowerShell object allows applications that programmatically use Windows PowerShell to create pipelines of commands, invoke them and access the results. The factory can create single RunSpaces, or a pool of RunSpaces. So a program (or script) can get a PowerShell object which says “Run this, with these named parameters and these unnamed arguments. Run it asynchronously (i.e. start it and don’t wait for it to complete, give me some signal when it is done), and in a space from this pool.” If there are more things wanting to run than there are RunSpaces, the pool handles queuing for us.
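To make that concrete, here is a rough sketch of the RunSpace-pool pattern (this is not the Start-Parallel source, and it assumes the QuickPing function above is already defined):
$pool = [RunspaceFactory]::CreateRunspacePool(1, 50)          # allow up to 50 concurrent runspaces
$pool.Open()
$work = foreach ($i in 1..255) {
    $ps = [PowerShell]::Create()
    $ps.RunspacePool = $pool
    [void]$ps.AddScript(${Function:QuickPing}.ToString()).AddArgument($i)
    [pscustomobject]@{PowerShell = $ps; Handle = $ps.BeginInvoke()}   # start asynchronously
}
foreach ($item in $work) {
    $item.PowerShell.EndInvoke($item.Handle)                  # collect the results
    $item.PowerShell.Dispose()
}
$pool.Close()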

Thus the idea for Start-Parallel was born.  I wanted to be able to do this
Get-ListOfComputers | Start-Parallel Get-ComputerSettings.ps1
or this  
1..255 | Start-Parallel -Command QuickPing -MaxThreads 500
or even pipe PS objects or hash tables in to provide multiple parameters to the same command.

-MaxThreads in the second example says create a pool where 500 pings can be in progress, so every QuickPing can be running at the same time (performance monitor shows a spike of threads). So how long does it take to do 255 pings now? 240 inactive addresses taking 3 seconds each gave me ~720 seconds, and the version above runs in a little under 7, so that’s a 100-fold speed increase! This is pretty consistent with what I’ve found polling servers over the couple of years I’ve been playing with Start-Parallel – things that would take a morning or an afternoon run in a couple of minutes.

You can install it from the PowerShell Gallery. Some tips

  • Get-ListOfComputers | Start-Parallel Get-ComputerSettings.ps1 
    works better than
    $x = Get-ListOfComputers ; Start-Parallel -InputObject $x -Command Get-ComputerSettings.ps1
    if Get-ListOfComputers is slow, we will probably have the results for the first computer(s) before we have been told the last one on the list to check.    
  • Don’t hit the same service with many requests in parallel – unless you want to mount a denial of service attack.   
  • Remember that RunSpaces don’t share anything – the parallel RunSpaces won’t load your profile, or inherit anything from the session which launches them. And there is no guarantee that every module out there behaves as expected if run in multiple RunSpaces simultaneously. In particular, if “QuickPing” is defined in the same PS1 file which runs Start-Parallel, then Start-Parallel (defined in the global scope) can’t see QuickPing in the script scope. The work-round for this is to use  
    Start-Parallel –scriptblock ${Function:\QuickPing}
  • Some commands by their nature specify a computer. For others it is easier to define a script block inside another script block (or a function) which takes a computer name as a parameter and runs
    Invoke-Command –ComputerName $computer –scriptblock $InnerScriptBlock
  • I don’t recommend running Start-Parallel inside itself, but based on very limited testing it does appear to work.

You can install it by running Install-Module -Name Start-parallel

 

November 19, 2016

Format-XML on the PowerShell Gallery

Filed under: Powershell — jamesone111 @ 8:08 pm
Tags: , ,

In the last post, I spoke about those bits of PowerShell we carry around and never think to share. Ages ago I wrote a function named “Format-XML” which “pretty prints” XML with nice indents. I’ve passed it on to a few people over the years -  it’s been included as a “helper” in various modules – but I hadn’t published it on its own.

I’ve got that nagging feeling  I should be crediting someone for providing the original bits but I’ve long since lost track of who. In Britain people sometimes talk about “Trigger’s broom” which classicists tend to call the Ship of Theseus – if you change a part it’s still the same thing, right? But after every part has been changed? That’s even more true of the “SU” script which will be the subject of a future post but in that case I’ve kept track of its origins.

Whatever… Format-XML is on the PowerShell Gallery – you can click Show under FileList on that page to look at the code, or use PowerShellGet (see the Gallery homepage for details) to install it, using Install-Script -Name Format-XML. The licence is chosen to allow you to incorporate it into anything you want with no strings attached.

Then you can load it with . format-xml.ps1 – that leading “.” (dot-sourcing) matters … and run it with
Format-XML $MyXML
or $MyXML | Format-XML
where $MyXML is any of

  • An XML object
  • Some text in XML format
  • The name of a file which contains XML, or
  • A file object where the file contains XML

Incidentally, if you have stuff to share on the PowerShell Gallery the sign-up process is quick, and the PowerShellGet module has an Update-ScriptFileInfo command to set the necessary metadata; then Publish-Script puts the script into the gallery – it couldn’t be easier.

November 13, 2016

One of those “everyday” patterns in PowerShell –splitting a list

Filed under: Powershell — jamesone111 @ 9:16 pm

For a PowerShell enthusiast, the gig I’ve been on for the last few weeks is one of those “Glass Half Full/Glass Half Empty” situations: the people I’m working with could do a lot more with PowerShell than they are (half empty) but that’s an opportunity to make things better (half full). A pattern which I take for granted took on practically life-changing powers for a couple of my team this week…. 

We had to move some … let’s just say “things”. My teammates know they run Move-Thing Name Destination, but they had been mailed several lists with maybe 100 things to move in each one. Running 100 command lines is going to be a chore. So I gave them this
@"
PASTE
YOUR
LIST
HERE
"@
   -Split  "\s*[\r\n]+\s*"  | ForEach-Object { Move-thing $_ "Destination"}

Text which is wrapped in @"<newline> and <newline>"@ is technically called a “here string”, but to most people it is just a way to have a multiline string. So pasting a list of items between the quotes is trivial, but the next bit looks like some magic spell …
PowerShell’s –split operator takes regular expressions and splits the text where it finds them (and throws the matching bit away). In regex, \r is carriage return, \n is new line, and [\r\n] is “either return or newline”, so [\r\n]+ means “at least one of the line-break characters, but any number in any order”. I usually use –split this way, but here we found the lists often included spaces and tabs – adding \s* at the beginning and end means “preceded by / followed by any number of white-space characters – even zero”.
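A quick demonstration with a made-up list shows what the split produces:
@"
Server01
   Server02   
   Server03
"@ -split "\s*[\r\n]+\s*" | ForEach-Object {"<$_>"}
# <Server01>
# <Server02>
# <Server03>   - the spaces around each line break are thrown away with the break itself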

So the multiline string is now a bunch of discrete items. The command we want to run doesn’t always need to be in a foreach {} – text piped in to many commands becomes the right parameter like this 
@"
List
"@
   -Split  "\s*[\r\n]+\s*"  | Get-Thing | Format-Table
But a foreach {} will always work, even if it is cumbersome.

I think lots of us have these ready-made patterns – as much as anything this post is a call to think about ones you might share. It was nice to pass this one on and hear the boss’s surprise when one of the junior guys told him everything was done.

June 29, 2016

Just enough admin and constrained endpoints. Part 1 Understanding endpoints.

Filed under: DevOps,Powershell — jamesone111 @ 1:42 pm

Before we can dive into Just Enough Admin and constrained endpoints, I think we need to fill in some of the background on endpoints and where they fit in PowerShell remoting.

When you use PowerShell remoting, the local computer sees a session, which is connected to an endpoint on a remote computer. Originally, PowerShell installations did not enable inbound sessions, but this has changed with newer versions. If the Windows Remote Management service (also known as WSMAN) is enabled, it will listen on port 5985; you can check with
NetStat -a | where {$_ -Match 5985}
If WSMAN is not listening you can use the Enable-PSRemoting cmdlet to enable it.

With PS remoting enabled you can try to connect. If you run
$s = New-PSSession -ComputerName localhost
from a non-elevated PowerShell session, you will get an access denied error, but from an elevated session it should run successfully. The reason for this is explained later. When the command is successful, $s will look something like this:
Id Name     ComputerName State  ConfigurationName    Availability
-- ----     ------------ -----  -----------------    ------------
 2 Session2 localhost    Opened Microsoft.PowerShell    Available

We will see the importance of ConfigurationName later as well. The Enter-PSSession cmdlet switches the shell from talking to the local session to talking to a remote one: running
Enter-PSSession $s
will change the prompt to something like this
[localhost]: PS C:\Users\James\Documents>
showing where the remote session is connected. Exit-PSSession returns to the original (local) session; you can enter and exit the session at will, or create a temporary session on demand by running
Enter-PsSession -ComputerName LocalHost

The Get-PSSession cmdlet shows a list of sessions and will show that there is no session left open after exiting an “on-demand” session. As well as interacting with a session, you can use Invoke-Command to run commands in the session, for example
Invoke-Command -Session $s -ScriptBlock {Get-Process -id $pid}
Handles NPM(K) PM(K) WS(K) VM(M) CPU(s)   Id SI ProcessName PSComputerName
------- ------ ----- ----- ----- ------   -- -- ----------- -------------- 
    547     26 61116 77632 ...45   0.86 5788 0  wsmprovhost      localhost

At first sight this looks like a normal process object, but it has an additional property, "PSComputerName". In fact, a remote process is represented by a different type of object. Commands in remote sessions might return objects which are not recognised on the local computer. So the object is serialized – converted to a textual representation – sent between sessions, and de-serialized back into a custom object. There are two important things to note about this.

  1. De-serialized objects don’t have Methods or Script Properties. Script properties often will need access to something on the originating machine – so PowerShell tries to convert them to Note Properties. A method can only be invoked in the session where the object was created – not one which was sent a copy of the object’s data.
  2. The object type changes. The .getType() method will return PsObject, and the PSTypeNames property says the object is a Deserialized.System.Diagnostics.Process; PowerShell uses PSTypenames to decide how to format an object and will use rules defined for type X to format a Deserialized.X object.
    However, testing the object type with -is [x] will return false, and a function which requires a parameter to be of type X will not accept a Deserialized.X. In practice this works as a safety-net: if you want a function to be able to accept remote objects you should detect that they are remote objects and direct commands to the correct machine. The snippet after this list illustrates the type change.
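For example, using the $s session from earlier:
$local  = Get-Process -Id $pid
$remote = Invoke-Command -Session $s -ScriptBlock {Get-Process -Id $pid}
$local.GetType().Name                        # Process
$remote.GetType().Name                       # PSObject
$remote.PSTypeNames[0]                       # Deserialized.System.Diagnostics.Process
$remote -is [System.Diagnostics.Process]     # False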

Invoke-Command allows commands which don’t support a -ComputerName parameter (or equivalent) to be targeted at a different machine, and also allows commands which aren’t available on the local computer to be used remotely. PowerShell provides two additional commands to make the process of using remote modules easier. Import-PSSession creates a temporary module which contains proxies for all the cmdlets and functions in the remote session that don’t already exist locally; this means that instead of having to write
Invoke-Command -Session $s -ScriptBlock {Get-ADUser}
the Get-ADUser command works much as it would with a locally installed Active Directory module. Using Invoke-Command will return a Deserialized AD user object and the local copy of PowerShell will fall back to default formatting rules to display it; but when the module is created it includes a format XML file describing how to format additional objects.
Import-PSSession adds commands to a single session using a temporary module: its partner Export-PSSession saves a module that can be imported as required – running commands from such a module sets up the remote session and gives the impression that the commands are running locally.
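For example (DC01 is a made-up server name, assumed to have the ActiveDirectory module installed):
$ad = New-PSSession -ComputerName DC01                    # hypothetical remote machine with the AD module
Import-PSSession -Session $ad -Module ActiveDirectory     # builds a temporary module of proxy functions
Get-ADUser -Filter {Name -like "James*"}                  # looks local, actually runs in the remote session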

What about the configuration name and the need to logon as Admin?

WSMAN has multiple end points which sessions can connect to; the command Get-PSSessionConfiguration lists them. By default the commands which work with PS sessions connect to the end point named "microsoft.powershell", but the session can connect to other end points depending on the tasks to be carried out.
Get-PSSessionConfiguration shows that by default the "microsoft.powershell" endpoint has StartUpScript and RunAsUser properties which are blank and a permission property of
NT AUTHORITY\INTERACTIVE        AccessAllowed,
BUILTIN\Administrators          AccessAllowed,
BUILTIN\Remote Management Users AccessAllowed

This explains why we need to be an administrator (or in the group “Remote Management Users”) to connect. It is possible to modify the permissions with
Set-PSSessionConfiguration -Name "microsoft.powershell" -ShowSecurityDescriptorUI

When Start-up script and Run-As User are not set, the session looks like any other PowerShell session and runs as the user who connected – you can see the user name by running whoami or checking the $PSSenderInfo automatic variable in the remote session.

Setting the Run-As user allows the session to run with more privileges than are granted to the connecting user: to prevent this user running amok, the end point is constrained – in simpler terms, we put limits on what can be done in that session. Typically, we don’t want the user to have access to every command available on the remote computer, and we may want to limit the parameters which can be used with those that are allowed. The start-up script does the following to set up the constrained environment:

  • Loads modules
  • Defines proxy functions to wrap commands and modify their functionality
  • Hides cmdlets, aliases and functions from the user.
  • Defines which scripts and external executables may be run
  • Sets the PowerShell language mode, to further limit the commands which can be run in a session, and prevent new ones being defined.

If the endpoint is to work with Active Directory, for example, it might hide Remove-ADGroupMember (or import only selected commands from the AD module); it might use a proxy function for Add-ADGroupMember so that only certain groups can be manipulated. The DNS Server module might be present on the remote computer, but the Import-Module cmdlet is hidden so there is no way to load it.
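A heavily simplified, hypothetical start-up script gives a flavour of how this is done (the group name and the list of visible commands are invented for the example):
Import-Module ActiveDirectory -Cmdlet Get-ADUser, Add-ADGroupMember

# Wrap Add-ADGroupMember so only one, fixed, group can be changed through this endpoint
function Add-HelpDeskMember {
    param ([Parameter(Mandatory=$true)][string]$UserName)
    ActiveDirectory\Add-ADGroupMember -Identity 'HelpDesk Users' -Members $UserName
}

# Hide everything except the commands the connecting user is meant to see
$visible = 'Add-HelpDeskMember', 'Get-ADUser', 'Get-Command', 'Get-Help', 'Exit-PSSession'
Get-Command -CommandType Cmdlet, Function, Alias |
    Where-Object {$_.Name -notin $visible} |
    ForEach-Object {$_.Visibility = 'Private'}

# Restrict the language mode so new commands can't be defined in the session
$ExecutionContext.SessionState.LanguageMode = 'NoLanguage'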

Hiding or restricting commands doesn’t stop people doing the things that their access rights allow. An administrator can use the default endpoint (or start a remote desktop session) and use the unconstrained set of tools. The goal is to give out fewer admin rights and give people Just Enough Admin to carry out a clearly defined set of tasks: the endpoint runs as a privileged account (even a full administrator account), but other, less privileged accounts are allowed to connect and run the constrained commands that it provides.
Register-PSSessionConfiguration sets up a new endpoint and Set-PSSessionConfiguration modifies an existing one; the same parameters work with both – for example

$cred = Get-Credential
Register-PSSessionConfiguration -Name "RemoteAdmin" `
                                -RunAsCredential $cred `
                                -ShowSecurityDescriptorUI `
                                -StartupScript 'C:\Program Files\WindowsPowerShell\EndPoint.ps1'
The -ShowSecurityDescriptorUI switch pops up a permissions dialog box – to set permissions non-interactively it is possible to use -SecurityDescriptorSddl and specify the information using SDDL, but writing SDDL is a skill in itself.

With the end point defined, the next part is to create the endpoint script, and I’ll cover that in part 2.

June 1, 2016

A different pitch for Pester

Filed under: DevOps,Powershell,Testing — jamesone111 @ 2:10 pm

If you work with PowerShell but don’t consider yourself to be a developer, then when people get excited by the new(ish) testing framework named Pester you might think “what has that got to do with me?” …
Pester is included with PowerShell 5 and downloadable for older versions, but most things you find written about it are by software testers for software testers – though that is starting to change. This post is for anyone who thinks programs are like sausages: you don’t want to know how either is made.

Let’s consider how we’d give someone some rules to check something physical:
“Here is a description of an elephant
It is a mammal
It is at least 1.5 M tall
It has grey wrinkly skin
It has a long flexible nose”

Simple stuff. Tell someone what you are describing, and make simple statements about it (that’s important – we don’t say “It is a large grey-skinned mammal with a long nose”). Check those statements and if they are all true you can say “We have one of those”. So let’s do the same, in PowerShell, for something we can test programmatically – this example has been collapsed down in the ISE, which shows a couple of “special” commands from Pester:

$Connections = Get-NetIPConfiguration | Where-Object {$_.netadapter.status -eq "up" }
Describe "A computer with an working internet connection on my home network" {
    It "Has a connected network interface"  {...}
    foreach ($c in $Connections)            {  
        It "Has the expected Default Gateway on the interface named  '$($C.NetAdapter.InterfaceDescription)' "   {...}
        It "Gets a 'ping' response from the default gateway for      '$($C.NetAdapter.InterfaceDescription)' "   {...} 
        It "Has an IPV4 DNS Server configured on the interface named '$($C.NetAdapter.InterfaceDescription)' "   {...}
    }
    It "Can resolve the DNS Name 'www.msftncsi.com' " {...}
    It "Fetches the expected data from the URI 'http://www.msftncsi.com/ncsi.txt' " {...}
}

So Pester can help with testing ANYTHING – it isn’t just for checking that program X gives output Y with input Z. Describe, which says what is being tested,
Describe "A computer with an working internet connection on my home network" {}
has the steps needed to perform the test inside the braces. Normally PowerShell is easier to read with parameter names included but writing this out in full as
Describe -Name "A computer with an working internet connection on my home network" -Fixture  {}
would make it harder to read, so the norm with Pester is to omit the switches.  
We describe a working connection by saying we know that it has a connected network, it has the right default gateway and so on. The It statements read just like that, with a name and a test inside the braces (again switches are omitted for readability). When expanded, the first one in the example looks like this.

     It "Has a connected network interface"  {
        $Connections.count | Should not beNullOrEmpty
    }

Should is also defined in Pester. It is actually a PowerShell function which goes to some trouble to circumvent normal PowerShell syntax (the PowerShell purist in me doesn’t like that, but I have to remember the famous quote about “A foolish consistency is the hobgoblin of little minds”); the idea is to make the test read more like natural language than programming.
This example has a test that says there should be some connections, and then three tests inside a loop use other variations on the Should syntax.

$c.DNSServer.ServerAddresses -join "," | Should match "\d+\.\d+\.\d+\.\d+"
$c.IPv4DefaultGateway.NextHop          | Should  be "192.168.0.1"
{Test-Connection -ComputerName $c.IPv4DefaultGateway.NextHop -Count 1} | Should not throw

You can see Should allows us to check for errors being thrown (or not), empty values (or not), regular expression matches (or not), and values; and depending on what happens in the Should, the It command can decide whether that test succeeded. When one Should test fails the script block being run by the It statement stops, so in my example it would be better to combine “has a default gateway” and “gets a ping response” into a single It, but as it stands the script generates output like this:

Describing A computer with an working internet connection on my home network
[+] Has a connected network interface 315ms
[+] Has the expected Default Gateway on the interface named  'Qualcomm Atheros AR956x Wireless Network Adapter'  56ms
[+] Gets a 'ping' response from the default gateway for      'Qualcomm Atheros AR956x Wireless Network Adapter'  524ms
[+] Has an IPV4 DNS Server configured on the interface named 'Qualcomm Atheros AR956x Wireless Network Adapter'  25ms
[+] Can resolve the DNS Name 'www.msftncsi.com'  196ms
[+] Fetches the expected data from the URI 'http://www.msftncsi.com/ncsi.txt'  603ms

Pester gives this nicely formatted output without having to do any extra work  – it can also output the results as XML so we can gather up the results for automated processing. It doesn’t allow us to test anything that couldn’t be tested before – the benefit is it simplifies turning a description of the test into a script that will perform it and give results which mirror the description.
The first example showed how a folding editor (the PowerShell ISE or any of the third-party ones) can present the script so it looks like the basic specification.
Here’s an outline of a test to confirm that a machine had been built correctly – I haven’t filled in the code to test each part.  
Describe "Server 432" {
   It "Is Registered in Active Directory"                 {}
   It "Is has an A record in DNS"                         {}
   It "Responds to Ping at the address in DNS"            {}
   It "Accepts WMI Connections and has the right name"    {}
   It "Has a drive D: with at least 100 GB of free space" {}
   It "Has Dot Net Framework installed"                   {}
}
 
This doesn’t need any PowerShell knowledge: it’s easy to take a plain text file with suitable indents and add the Describes, Its, braces and quote marks – and hand the result to someone who knows how to check DNS from PowerShell and so on; they can fill in the gaps. Even before that is done, the test script still executes.

Describing Server 432
[?] Is Registered in Active Directory 32ms
[?] Has an A record in DNS 13ms
[?] Responds to Ping at the address in DNS 4ms
[?] Accepts WMI Connections and has the right name 9ms
[?] Has a drive D: with at least 100 GB of free space 7ms
[?] Has Dot Net Framework installed 5ms

The test output uses [+] for a successful test, [-] for a failure, [!] for one it was told to skip, and [?] for one which is “pending”, i.e. we haven’t started writing it. 
I think it is good to start with a relatively simple set of tests and add to them, so for checking the state of a machine: is such-and-such a service present and running, are connections accepted on a particular port, is data returned, and so on. In fact whenever we find something wrong which can be tested, it’s often a good idea to add a test for it to the script.

So if you were in any doubt at the start, hopefully you can see now that Pester is just as valuable as a tool for Operational Validation as it is for software testing.

May 31, 2016

Help = Spec = Test

Filed under: Powershell,Testing — jamesone111 @ 2:55 pm

Going back some years – at least as far as the talks which turned into the PowerShell Deep Dives book – I have told people ”Start Help Early” (especially when you’re writing anything that will be used by anyone else).
In the face of time pressure documentation is the first thing to be cut – but this isn’t a plea to keep your efforts from going out into the world undocumented. 
Help specifies what the command does, and help examples are User Stories – a short plain English description of something someone needs to do.
Recently I wrote something to combine the steps of setting up a Skype for Business user (don’t worry – you don’t need to know S4B to follow the remainder) – the help for one of the commands looked like this

<#
.SYNOPSIS
Sets up a Skype for business user including telephony, conference PIN and Exchange Voice mail
.EXAMPLE
Initialize-CsUser –ID bob@mydomain –PhoneExtension 1234 –pin 2468 –EnterpriseVoice
Enables a pre-existing user, with enterprise voice, determines and grants the correct voice policy,
sets a conferencing PIN, updates the Phone number in AD, and enables voice mail for them in Exchange.
#>

I’ve worked with people who would insist on writing user stories as “Alice wants to provision Bob… …to do this she …”, but the example serves well enough as both help for end users and a specification for one use case: after running the command, user “bob” will

  • Be enabled for Skype-for-Business with Enterprise Voice – including “Phone number to route” and voice policy
  • Have a PIN to allow him to use voice conferencing
  • Have a human readable “phone number to dial”  in AD
  • Have appropriate voice mail on Exchange

The starting point for a Pester test (the Pester testing module ships with PowerShell V5, and is downloadable for earlier versions) is a set of simple statements like this – the thing I love about Pester is that it is so human-readable.

Describe "Adding Skype for business, with enterprise voice, to an existing user"  {
### Code to do it and return the results goes here
    It "Enables enterprise voice with number and voice policy" {    }
    It "Sets a conference PIN"                                 {    }
    It "Sets the correct phone number in the directory"        {    }
    It "Enables voice mail"                                    {    }
}

The “doing” part of the test script is the command line from the example (though probably with different values for the parameters).
Each thing we need to check to confirm proper operation is named in an It statement with the script to test it inside the braces. Once I have my initial working function, user feedback will either add further user stories (help examples), which drive the creation of new tests or it will refine this user story leading either to new It lines in an existing test (for example “It Sets the phone number in AD in the correct format”) or to additional tests (for example “It generates an error if the phone number has been assigned to another user”)

In my example running the test a second time proves nothing, because the second run will find everything has already been configured, so a useful thing to add to the suite of tests would be something to undo what has just been done. Because help and test are both ways of writing the specification, you can start by writing the specification in the test script – a simplistic interpretation of “Test Driven Development”.  So I might write this

Describe "Removing Skype for business from a user"   {
### Code to do it and return the results goes here       
    It "Disables S4B and removes any voice number"   {    } –Skip
    It "Removes voice mail"                          {    } –Skip
}

The –Skip prevents future functionality from being tested. Instead of making each command a top-level Describe section in the Pester script, each can be a second-level Context section.

Describe "My Skype for business add-ons" {
    Context "Adding Skype for business, with enterprise voice, to an existing user"   {...}
    Context "Removing Skype for business from a user"  {...}
}

So… you can start work by declaring the functions with their help and then writing the code to implement what the help specifies, and finally create a test script based on the Help/Spec OR you can start by writing the specification as the outline of a Pester test script, and as functionality is added, the help for it can be populated with little more than a copy and paste from the test script.
Generally, the higher-level items will have a help example, and the lower-level items combine to give the explanation for the example. As the project progresses, each of the It commands has its –Skip removed and the test block is populated; to-do items show up on the test output as skipped.

Describing My Skype for business add-ons
   Context Adding Skype for business, with enterprise voice, to an existing user

    [+] Sets the phone number to call in the directory 151ms
    [+] Enables enterprise voice with the phone number to route and voice policy  33ms
    [+] Sets a conference PIN  18ms
    [+] Enables voice mail  22ms

   Context Removing Skype for business from a user
    [!] Disables S4B and removes any voice number 101ms
    [!] Removes voice mail 9ms
Tests completed in 347ms
Passed: 4 Failed: 0 Skipped: 2 Pending: 0
 

With larger pieces of work it is possible to use –Skip and an empty script block for an It statement to mean different things (Pester treats the empty script block as “pending”), so the test output can show me which parts of the project are done, which are unfinished but being worked on, and which aren’t even being thought about at the moment; it complements other tools to keep the focus on doing the things that are in the specification. But when someone says “Shouldn’t it be possible to pipe users into the remove command?”, we don’t just go and write the code, nor stop at writing and testing it – we bring the example into the help to show that way of working.

May 23, 2016

Good and bad validation in PowerShell

Filed under: Powershell — jamesone111 @ 10:35 am
Tags:

I divide input validation into good and bad.

Bad validation on the web makes you hate being a customer of whichever organization uses it. It’s the kind which says “Names can only contain alphabetic characters”, so O’Neill isn’t a valid name.
Credit card companies think it’s easier to write the number in blocks of 4 digits, but how many web sites demand an unbroken string of 16 digits?

Good validation tolerates spaces and punctuation, spots credit card numbers which are too short or don’t checksum properly, and knows the apostrophe needs special handling – although that requires the same care on the way out as on the way in, as this message from Microsoft shows.
And bad validation can be good validation paired with an unhelpful message – for example telling you the new password you chose isn’t acceptable without saying what is.

In PowerShell, parameter declarations can include validation, but keep in mind validation is not automatically good.
Here’s good validation at work: I can write parameters like this. 
     [ValidateSet("None", "Info", "Warning", "Error")]
     [string]$Icon = "error"

PowerShell’s IntelliSense can complete values for the -Icon parameter, but if I’m determined to put an invalid value in, here’s the error I get.
Cannot validate argument on parameter 'Icon'.
The argument "wibble" does not belong to the set "None,Info,Warning,Error" specified by the ValidateSet attribute.
Supply an argument that is in the set and then try the command again.

It might be a bit verbose, but it’s clear what is wrong and what I have to do to put it right. But PowerShell builds its messages from templates, and sometimes dropping in the text from the validation attribute gives something incomprehensible, like this
Cannot validate argument on parameter 'Path'.
The argument "c:" does not match the "^\\\\\S*\\\S*$" pattern.
Supply an argument that matches "^\\\\\S*\\\S*$" and try the command again.

This is trying to use a regular expression to check for a UNC path to a share (\\Server\Share), but when I used it in a conference talk, none of the 50 or 60 PowerShell experts in the room could work that out quickly. And people without a grounding in regular expressions have no chance.
Moral: what is being checked is valid, but to get a good message, do the test in the body of the function.
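For example, doing the same UNC-path check inside the function lets us throw a message a human can act on (the function here is just a stub for illustration):
function Copy-ToShare {
    param ([string]$Path)
    if ($Path -notmatch '^\\\\\S*\\\S*$') {
        throw "'$Path' doesn't look like a UNC path - expected something like \\Server\Share"
    }
    # ... rest of the function
}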

Recently I saw this – or something like it via a link from twitter.

function Get-info {
  [CmdletBinding()]
  Param (
          [string]$ComputerName
  )
  Get-WmiObject –ComputerName $ComputerName –Class 'Win32_OperatingSystem'
}

Immediately I can see two things wrong with the parameter.
First is “all parameters must have a type” syndrome. ComputerName is a string, right? Wrong! Get-WmiObject allows an array of strings. Most of the time you or I or the person who wrote the example will call it with a single string, but when a comma-separated list is used, the “make sure this is a string” validation concatenates the items into a single string.
Moral. If a parameter is passed straight to something else, either copy the type from there or don’t specify a type at all.

And second, the parameter isn't mandatory and doesn't have a default, so if we run the function with no parameter it calls Get-WmiObject with a null computer name, which causes an error. I encourage people to get into the habit of setting defaults for parameters.
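Here's a sketch of the parameter with both issues addressed; defaulting to the local machine is my assumption about what suits this function, not something from the original article:

  Param (
          [string[]]$ComputerName = $env:COMPUTERNAME
  )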

The author of that article goes on to show that you can use a regular expression to validate the input. As I've shown already, regular expressions give unhelpful error messages, and writing comprehensive ones can be an art in itself. In the example, the author used
  [ValidatePattern('^\w+$')]
But if I try
Get-info MyMachine.mydomain.com
Back comes a message telling me to
Supply an argument that matches "^\w+$" and try the command again
The author specified only “word” characters (letters and digits), no dots, no hyphens and so on. The regular expression can be fixed, but as it becomes more complicated, the error message grows harder to understand.

He moves on to a better form of validation, PowerShell supports a validation script for parameters, like this
[ValidateScript({ Test-Connection -ComputerName $_ -Quiet -Count 1 })]
This is a better test, because it checks whether the target machine is pingable or not. But it is still let down by a bad error message.
The " Test-Connection -ComputerName $_ -Quiet -Count 1 " validation script for the argument with value "wibble" did not return a result of True.
Determine why the validation script failed, and then try the command again.

In various PowerShell talks I've said that a user should not have to understand the code inside a function in order to use the function. In this case the validation code is simple enough that someone with a working knowledge of PowerShell can figure out the problem, but again "to get a good message, do the test in the body" seems good advice. In its simplest form the test would look like this
if (Test-Connection -ComputerName $ComputerName -Quiet -Count 1) {
        Get-WmiObject -ComputerName $ComputerName -Class 'Win32_OperatingSystem'
}
else {Write-Warning "Can't connect to $computername" }

But this doesn’t cope with multiple values in computer name – if any are valid the code runs so it would be better to run.
foreach ($c in $ComputerName) {
    if (Test-Connection -ComputerName $c -Quiet -Count 1 ) {
        Get-WmiObject -ComputerName $c -Class 'Win32_OperatingSystem'
    }
    else {Write-Warning "Can't connect to $c"}
}

This doesn’t support using “.” to mean “LocalHost” in Get-WmiObject – hopefully by now you can see the problem: validation strategies can either end up stopping things working which should work or the validation becomes a significant task. If a bad parameter can result in damage, then a lot validation might be appropriate. But this function changes nothing so there is no direct harm if it fails; and although the validation prevents some failures, it doesn’t guarantee the command will succeed. Firewall rules might allow ping but block an RPC call, or we might fail to logon and so on. In a function which uses the result of Get-WmiObject we need to check that result is valid before using it in something else. In other words, validating the output might be better than validating the input.

Note that I say "might": validating the output isn't always better. Depending on the kind of things you write, validating input might be best most of the time. The point is to think about validation rather than cranking it out on autopilot. And remember you have three duties to your users

  • Write help (very simple, comment-based help is fine) for the parameter saying what is acceptable and what is not. Often the act of writing “The computer name must only contain letters” will show you that you have lapsed into Bad validation
  • Make error messages understandable. One which requires the user to read code or decipher a regular expression isn’t, so be wary of using some of the built in validation options.
  • Don’t break things. Work the way the user expects to work. If commands which do similar things take a list of targets, don’t force a single one.
    If “.” works, support it.
    If your code uses SQL syntax where “%” is a wildcard, think about converting “*” to “%”, and doubling up single apostrophes (testing with my own surname is a huge help to me!)
    And if users want to enter redundant spaces and punctuation, it's your job to remove them – a sketch of that kind of tidying-up follows this list.
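Here is that sketch; the parameter names are hypothetical, but the techniques are the ones just listed:

$CardNumber = $CardNumber -replace '[\s-]' , ''      # let users type spaces or hyphens between blocks of digits
$Surname    = $Surname    -replace "'"    , "''"     # double up apostrophes so O'Neill survives the trip into SQL
$Filter     = $Filter     -replace '\*'   , '%'      # accept the familiar * wildcard and convert it to SQL's %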

June 16, 2014

A trick with Powershell aliases–making them do parameters

Filed under: Powershell — jamesone111 @ 10:37 am

 

The first thing that everyone learns about PowerShell aliases is that they just replace the name of a command: aliases don't do parameters.
DIR is an alias for Get-ChildItem; you can't make an alias RDIR for Get-ChildItem -Recurse. If you want that, you need a function.
To quote Redmond’s most famous resident* I canna change the laws of physics, Captain, but I can find ye a loophole.

I wrote a function which I use a lot – 100+ times some days – named Get-SQL. Given an unknown command "xxx", PowerShell will see if there is a command "Get-xxx" before reporting a "not recognized" error, so I usually just run it as "SQL" without the Get- prefix. The function talks to databases and keeps connections open between calls as named sessions; the connections tend to be named after places in South America, so to open a session I might run
> SQL -Session Peru -Connection DSN=PERU
and then to find out the definition of a table I use
> SQL -Session Peru -Describe Projects
I've previously written about Jason Shirk's TabExpansion++, which gets a list of the tables available to -Describe (or the tables which can be selected from, updated or inserted into) in that session, and provides tab expansion or an intellisense pick list. This is incredibly useful when you can't remember whether the table is named "project" or "Projects", and TabExpansion++ does the same for field names, so I don't have to remember whether the field I want is named "ID_Project", "Project_ID", "ProjectID" or simply "ID".

Get-SQL has a default session: I might use -Session Peru 20 times in a row but still want to leave the default connection alone.
I found myself thinking ‘I want “Peru” to be an alias for Get-SQL -Session Peru. One line – New-Alias -Name $session -Value something – inside Get-SQL could set it all up for me when I make the connection.’
As I said, we all know you can't do that with an alias, and doing it with functions is – bluntly – a nightmare: creating functions on the fly is possible but awkward, and TabExpansion++ wouldn't know it was supposed to work with them (it does figure out aliases). Defining the functions in advance for each data source would give me a maintenance headache…

Then I had a flash of inspiration: if I needed this to work for a built-in cmdlet, I’d need to create a proxy function … but Get-SQL is already a function. So, if I can write something in the function to check how it was invoked it can say “A-ha! I was called as ‘Peru’ and ‘Peru’ is the name of a database session, so I should set $session to ‘Peru’.” Whatever the alias is, provided there is a connection of the same name it will get used. This turns out to be almost laughably easy.

In my Get-SQL function the $session Parameter is declared like this
[ValidateNotNullOrEmpty()]
[string]$Session = "Default"

A function can find out the name that was used to invoke it by looking at $MyInvocation.InvocationName. If Get-SQL is invoked with no value provided for -Session, the value of $Session will be set to 'Default': if that is the case and there is a database session whose name matches the invocation name, then that name should go into $Session, like this:
if ($Session -eq "Default" -and $Global:DbSessions[$MyInvocation.InvocationName]) {
    $Session = $MyInvocation.InvocationName
}

Of course the parameter is not part of the alias definition – but the function can detect the alias and set the parameter internally – the laws stand, but I have my loophole. Although it's split here into two lines I think of the IF statement as one line of code. When Get-SQL creates a connection it finishes by calling New-Alias -Name $session -Value Get-SQL -Force. So two lines give me what I wanted.
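Pulling the pieces together, a minimal sketch of the shape of the thing (the real Get-SQL does much more, and the -Scope Global on the alias is my assumption – the alias needs to outlive the function call):

Function Get-SQL {
    param (
        [ValidateNotNullOrEmpty()]
        [string]$Session = "Default"
    )
    # if we were invoked via an alias which names a known session, switch to that session
    if ($Session -eq "Default" -and $Global:DbSessions[$MyInvocation.InvocationName]) {
        $Session = $MyInvocation.InvocationName
    }
    # ... open the connection / run the query against $Global:DbSessions[$Session] ...
    # and when a new connection named $Session has been opened:
    # New-Alias -Name $Session -Value Get-SQL -Scope Global -Force
}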
TabExpansion++ was too valuable to lose, but stopping here would mean its argument completers don't know what to do: when they need a list of fields or tables they call Get-SQL, and that worked fine when a -Session parameter was passed, but I've gone to all this trouble to get rid of that parameter, so the completers would end up calling the function by its canonical name and getting the default connection. There is a different way to find the invocation name inside an argument completer – by getting it from the parameter which holds the command's Abstract Syntax Tree, like this:

$cmdnameused = $commandAst.toString() -replace "^(.*?)\s.*$",'$1'
if ($Global:DbSessions[$cmdnameused]) {$session = $cmdnameused}
else { <# set $session the same way as before #> }
 

> Peru -Describe

Will pop up the list of tables in that database for me to choose “Projects”. There was a fair amount of thinking about it, but as you can see, only four lines of code. Result


* James Doohan – the actor who played “Scotty” in Star Trek actually lived in Redmond – ‘home’ of Microsoft, though Bill Gates and other famous Microsoft folk lived elsewhere around greater Seattle. So I think it’s OK to call him the most famous resident of the place.

June 9, 2014

Screen scraping for pleasure or profit (with PowerShell’s Invoke-RestMethod)

Filed under: Powershell — jamesone111 @ 3:28 pm

Twenty-odd years ago I wrote some Excel macros to get data into Excel from one of the world's less friendly materials management systems. It was far easier to work with the data in Excel, the macros were masterpieces of send-keys and prayer-based parsing, and it was the first time I heard the term screen scrape. (When I searched for "prayer-based parsing" it took me to another page on this blog, where you might develop the argument that in Unix and Linux the pipe we take for granted is little more than an automated screen scrape.)

The technology changes (and gets easier) but the same things push us to screen scrape: data would be more useful in a form other than the one in which it is made available, and there is a benefit if we can shift it from one form to the other. There are times when taking the data might break usage terms, or there may be other legal or ethical reasons why you shouldn't do it. You need to work out those questions for yourself; here I'm just going to talk about one technique which I've found myself using several times recently.

A lot of the data we want today is delivered in web pages. This, of itself, can be helpful, because well-formed HTML can often be treated as XML data, so with something like PowerShell the parsing is (near enough) free. Instead of having to pull the text completely apart, there might be a small amount of preliminary tidying up, and then putting [XML] in front of a string conjures up a navigable hierarchy of data. The data behind my post on Formula One statistics was gathered mainly using this method (and the odd regular expression for pulling a needle-like scrap of information from a haystack of HTML).

But when you're lucky, you don't even need to parse the data out of the format in which it is displayed. Recently someone sent me a link to a story about smart phone market share. It's presented as a map, but that's not a great way to see how share has gone up or down at different times, or to compare one place against another at the same time. These days when I see something like this I think "how do they get the data for the map?". The easy way to find out is to press [F12] in Internet Explorer, turn on the network monitor (click the router icon and then the "play" button) and reload the page. The hope is to spot a tell-tale sign of data being requested and processed by the browser ready for display; often this data will jump out because of the way it is formatted, and in this network trace some JSON-format data stood out. JSON or XML data is a real gift for PowerShell….

Once upon a time, if you wanted to get data from a web server you had to write a big chunk of code. Then Microsoft provided a System.Net.WebClient object which would do the fetching and carrying but left the parsing to you. In recent versions of PowerShell there are two cmdlets: Invoke-WebRequest is basically a wrapper for this, and it will do some crunching of the HTML page so you can work with the document. PowerShell also has ConvertFrom-Json, so you can send the content into that and don't have to write your own JSON parser. But it's even easier than that: Invoke-RestMethod will get the page and, if it can parse what comes back, it does, so you don't need a separate convert step. So I can get the data back and quickly explore it like this:

> $j = Invoke-RestMethod -Uri http://www.kantarworldpanel.com/php/comtech_data.php
>$j
years                                                                                                        
-----                                                                                                        
{@{year=2012; months=System.Object[]}, @{year=2013; months=System.Object[]}, @{year=2014; months=System.Obj...

> $j.years[0]
year                                                    months                                               
----                                                    ------                                               
2012                                                    {@{month=0; cities=System.Object[]}, @{month=1; cit...

> $j.years[0].months[0]
month                                                   cities                                               
-----                                                   ------                                               
0                                                       {@{name=USA; lat=40.0615504; lng=-98.51893029999997...

> $j.years[0].months[0].cities[0]
name                        lat                         lng                         platforms                
----                        ---                         ---                         ---------                
USA                         40.0615504                  -98.51893029999997          {@{name=Android; share=...

> $j.years[0].months[0].cities[0].platforms[0]
name                                                    share                                                
----                                                    -----                                                
Android                                                 43 

From there it’s a small step to make something which sends the data to the clipboard ready formatted for Excel

$j = Invoke-RestMethod -Uri "http://www.kantarworldpanel.com/php/comtech_data.php"
$(   foreach ($year  in $J.years)  {
        foreach ($month in $year.months) {
            foreach ($city in $month.cities) {
                foreach ($platform in $city.platforms) {
                    ('{0} , {1} , "{2}" , "{3}", {4}' -f $year.year ,
                       ($month.month + 1) , $city.name , $platform.name , $platform.share) }}}}) | clip

Web page to Excel-ready in less time than it takes to describe it. The other big win with these two cmdlets is that they understand how to keep track of sessions (which technically means Invoke-RestMethod can also work with things which aren't really RESTful), and Invoke-WebRequest understands forms on the page. I can logon to a photography site I use like this
if (-not $cred) { $Global:cred  = Get-Credential -Message "Enter logon details For the site"}
$login                          = Invoke-WebRequest "$SiteRoot/login.asp" -SessionVariable SV
$login.Forms[0].Fields.email    = $cred.UserName
$login.Forms[0].Fields.password = $cred.GetNetworkCredential().Password
$homePage                       = Invoke-WebRequest ("$SiteRoot/" + $login.Forms[0].action) -WebSession $SV -Body $login -Method Post
if ($homePage.RawContent        -match "href='/portfolio/(\w+)'")
    {$SiteUser                    = $Matches[1]}

Provided my subsequent calls to Invoke-WebRequest and Invoke-RestMethod specify the same session variable in their -WebSession parameter, the site treats me as logged in and gives me my data (mostly ready-prepared JSON, with the occasional item that needs to be parsed out of the HTML) rather than just the public data. With this I can get straight to the data I want, with only the occasional need to resort to regular expressions. Once you have done one or two of these you start seeing a pattern for how you can either pick up XML or JSON data, or isolate the links from a page of HTML which lead to the data you want, and then isolate the table which holds that data.
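A one-line sketch of such a follow-up call, reusing the variables set up above – the /portfolio/ path is inferred from the regular expression in the logon snippet, so treat it as illustrative rather than the site's real API:

$myImages = Invoke-RestMethod -Uri "$SiteRoot/portfolio/$SiteUser" -WebSession $SV

With the session taken care of, the remaining work is spotting where the data lives.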

For example

$SeasonPage  =  Invoke-WebRequest -Uri  "http://www.formula1.com/results/season/2014/"
$racePages   =  $SeasonPage.links | where  href -Match "/results/season/2014/\d+/$"
Will give me a list in $racePages of all the results pages for the 2014 F1 season, and I can loop through the data, setting $R to the different URLs.

$racePage        = Invoke-WebRequest -Uri $R
$contentmainDiv  = ([xml]($racePage.ParsedHtml.getElementById("contentmain").outerhtml -replace "&nbsp;"," ")).div
$racedate        = [datetime]($contentmainDiv.div[0].div.span -replace "^.*-","")

$raceName        = $contentmainDiv.div[0].div.h2

Sometimes finding the HTML section to convert to XML is easy – here it needed a little tweak to remove non-breaking spaces, because the [XML] type converter doesn't like them – but once I have $contentMainDiv I can start pulling race data out. And 20 years on, Excel is still the tool of choice; how I get data into Excel will have to wait for another day.

July 5, 2013

PowerShell TabCompletion++ or TabCompletion#

Filed under: Databases / SQL,Powershell — jamesone111 @ 9:23 pm

One of the tricks of PowerShell V3 is that it is even cleverer with tab completion / intellisense than previous versions, though it is not immediately obvious how you can take control of it. When I realised what was possible I had to apply it to a Get-SQL command I had written. I wanted the ISE to be able to give me something like this

[screenshot]

I happened to have Jan Egil Ring's article on the subject for PowerShell Magazine (where he credits another article by Tobias Weltner) open in my browser when I watched Jason Shirk give a talk covering the module he has published via GitHub named TabExpansion++. This module includes custom tab completers for some PowerShell commands which don't have them, and some for legacy programs. Think about that for a moment: if you use NET STOP in PowerShell instead of in CMD, tab completion fills in the names of the services. Yes, I know PowerShell has a Stop-Service cmdlet, but if you've been using the NET commands since the 1980s (yes, guilty) why stop using them?

More importantly, Jason has designed a framework where you can easily add your own tab completers – which are the basis for intellisense in the ISE. On loading, his module searches all ps1 and psm1 files in the paths $env:PSModulePath and $env:PSArgumentCompleterPath for functions with the ArgumentCompleter attribute – I'll explain that shortly. When it finds one, it extracts the function body and "imports" it into the TabExpansion++ module. If I write argument completer functions and save them with my modules (which are in the PSModulePath) then when I load TabExpansion++ …. whoosh! my functions get tab completion.

A lot of my work at the moment involves dealing with data in a MySQL database; I have installed the MySQL ODBC driver and I wrote a function named Get-SQL (which I can just invoke as SQL).
When I first wrote it, it was simple enough: leave an ODBC connection open as a global variable and pump SQL queries into it. After a while I found I was sending a lot of "Select * From table_Name" queries, so I gave it a -Table parameter which would be built into a select query, and a -GridView parameter which would send the data to the PowerShell grid viewer. Then I found that I was doing a lot of "Desc table_name" queries, so I added a -Describe parameter. One way and another the databases have ended up with long table names which are prone to mistyping; this seemed like a prime candidate for an argument completer, so I set about extending TabExpansion++ (does that make it TabExpansion#? In case you haven't noticed, with C# the # sign is ++ ++, one pair above the other).

It takes 4 things to make a tab completer function. First: one or more ArgumentCompleter attributes
[ArgumentCompleter(Parameter = 'table',
                         Command = ('SQL','Get-SQL'),
                      Description = 'Complete Table names for Get-SQL , for example: Get-SQL -GridView -Table ')]

This defines the parameter that the completer works with – which must be a single string. If the completer supports multiple parameters, you must use multiple ArgumentCompleter attributes. 
And it defines the command(s) that the completer works with. The definition can be a string, an array of strings, or even a ScriptBlock. 
The Second thing needed is a param block  that understands the parameters passed to a tab completer.
param($commandName, $parameterName, $wordToComplete, $commandAst, $fakeBoundParameter)

The main one here is $wordToComplete – the partially typed word that tab completion is trying to fill in. However as you can see in the screen shot it is possible to look at the parameters already completed and use them to produce the list of possible values.
$wordToComplete is used in the third part: the body that gets those possible parameter values. So in my function I have something a bit like this…
$parameters = Get-TableName | Where-Object { $_ -like "$wordToComplete*" } | Sort-Object

And the final part is to return the right kind of object to the tab completion process, and Jason's module has a helper function for this
$parameters | ForEach-Object {$tooltip = "$_"
                              New-CompletionResult $_ $tooltip}

There is the option to have different text as the tool tip – in some places $tooltip (which is shown in intellisense) would be set to something other than the value being returned. Here I've kept it in place to remind me, rather than just calling New-CompletionResult $_ $_

And that's it. Unload and reload TabExpansion++ and my SQL function now knows how to expand -Table. I added a second attribute to allow the same code to handle -Describe, and then wrote something to get field names so I could have a pick list for -OrderBy and -Select as well. With -Select, intellisense doesn't pop up a second time if you select a name and enter a comma to start a second one, but tab completion still works. Here's the finished item

Function SQLFieldNameCompletion {
   [ArgumentCompleter(Parameter = ('where'),
                        Command = ('SQL','Get-SQL'),
                    Description = 'Complete field names for Get-SQL , for example: Get-SQL -GridView -Table ')]
   [ArgumentCompleter(Parameter = ('select'),
                        Command = ('SQL','Get-SQL'),
                    Description = 'Complete field names for Get-SQL , for example: Get-SQL -GridView -Table ')]
   [ArgumentCompleter(Parameter = ('orderBy'),
                        Command = ('SQL','Get-SQL'),
                    Description = 'Complete field names for Get-SQL , for example: Get-SQL -GridView -Table ')]
   param($commandName, $parameterName, $wordToComplete, $commandAst, $fakeBoundParameter)
   If ($DefaultODBCConnection) {
       $TableName = $fakeBoundParameter['Table']
       Get-SQL -describe $TableName | Where-Object { $_.column_name -like "$wordToComplete*" } | Sort-Object -Property column_name |
       ForEach-Object {$tooltip           = $_.COLUMN_NAME + " : " + $_.TYPE_NAME
                       New-CompletionResult $_.COLUMN_NAME $tooltip
       }
   }
}

Which all saves me a few seconds a few dozen times a day.

June 30, 2013

PowerShell where two falses make a true … and other quirks.

Filed under: Powershell — jamesone111 @ 9:11 pm

I was in a conversation on twitter recently about the way operations in PowerShell aren’t always commutative. We expect addition, multiplication and equality operators to work the same way whether we write 2 x 3 or 3 x 2.
But what happens in a language like PowerShell, which tries to flatten out the differences between types and lets you multiply a text string by a number? Let's explain.

Rule 1. If the operator dictates types, then PowerShell will convert the operands to match.
The classic case is the -and and -or operators: PowerShell will convert the operands to Booleans. Knowing how the conversion is done opens up some useful shortcuts and also avoids some traps:

Any non-zero number is treated as true. Only Zero is false,
for example
> 3.14 -and $true
True

> 0 -and $true
False

In some programming languages 8 -or 16 would calculate a "bitwise OR" (also called a binary OR); in other words it would convert 8 to binary 0000 1000, convert 16 to binary 0001 0000, and OR each column to produce 0001 1000 – 24 in decimal. PowerShell provides this functionality through separate operators: -bor, -band, -bxor and -bnot.
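Side by side, the two behave like this (the bitwise result is simply the column-by-column arithmetic described above):

> 8 -bor 16
24

> 8 -or 16
True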

Any non-empty string is true. Only empty strings are false.
For example.
> "" -and $true
False

> "Hello, world" -and $true
True

The string "FALSE" is not empty, and so is treated as the Boolean value True. If you convert $False to a string and back to a boolean it doesn’t come back to false
> [boolean][string]$false
True

If you need the text "true" to convert to True and "False" to convert to false you can use the [Convert] class
> [convert]::ToBoolean("false")
False

Any non-empty object is treated as true. An empty object, array, or null converts to false.

> (dir *.jpg) -and $true
True

> (dir *.pjg) -and $true
False

> @() -and $true
False

Any array with one element is treated as  that element.
> @(0) -and $true
False

Any array with multiple elements is treated as true even if all those elements are false
> @($false,$false) -and $true
True

Rule 2. Null is less than anything except another null

> "" -gt $null
True

Rule 3. If the operator does not dictate types and the arguments are different types, the first operand’s type determines behaviour

This causes bewilderment in people who don’t understand it, because an operator which is normally commutative (works the same with the operands reversed) is only commutative if the operands are of the same type.
> 5 * "25"
125

> "25" * 5
2525252525

The string “25” is converted to a number, 5×25 = 125, but if the string is placed first, the multiplication operator repeats it.
> 3.14 -eq $true
False

> $true -eq 3.14
True

Converting $true to a number returns 1. Converting a (non-zero) number to a Boolean returns true.
Similarly, converting any non-empty string to a Boolean returns true, but converting $true to a string gives "True", which is not the same text as "false" – hence these two results:
> $true -eq "false"
True

> "false" -eq $true
False

Rule 4. When applied to an array, an operator which returns a Boolean for a single item will instead return the array of items which individually evaluate to true

> @(1,2,3,4) -gt 2
3
4

When you put this together with tests for null, confusion can result: see if you can work out what the following returns.

> @($null,1,2,"",$null) -eq $null

And why the following two return opposite results

> [boolean](@($null,1,2,"",$null) -eq $null)

> [boolean](@($null,1,2,"" ) -eq $null)

Hint: if you can't work it out, try

(@($null,1,2,"",$null) -eq $null).Count

August 9, 2012

Getting to the data in Adobe Lightroom–with or without PowerShell

Filed under: Databases / SQL,Photography,Powershell — jamesone111 @ 7:01 am

Some Adobe software infuriates me (Flash), I don't like their PDF reader (I use Foxit instead), and apps which use Adobe AIR always seem to leak memory. But I love Lightroom. It does things right – like installation – which other Adobe products get wrong. It maintains a "library" of pictures and creates virtual folders of images ("collections"), but it keeps metadata in the image files, so the data stays with the pictures when they are copied somewhere else – something some other programs still get badly wrong. My workflow with Lightroom goes something like this.

  1. If I expect to manipulate the image at all I set the cameras to save in RAW, DNG format not JPG (with my scuba diving camera I use CHDK to get the ability to save in DNG)
  2. Shoot pictures – delete any where the camera was pointing at the floor, lens cap was on, studio flash didn’t fire etc. But otherwise don’t edit in the camera.
  3. Copy everything to the computer – usually I create a folder for a set of pictures and put the DNG files into a "RAW" subfolder. I keep full memory cards in filing sleeves meant for 35mm slides.
  4. Using PowerShell I replace the IMG prefix with something which tells me what the pictures are but keeps the camera-assigned image number (see the sketch after this list).
  5. Import Pictures into Lightroom – manipulate them and export to the parent folder of the “RAW” one. Make any prints from inside Lightroom. Delete “dud” images from the Lightroom catalog.
  6. Move dud images out of the RAW folder to their own folder. Backup everything. Twice. [I’ve only recently learnt to export the Lightroom catalog information to keep the manipulations with the files]
  7. Remove RAW images from my hard disk
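Step 4 is a one-liner; here's a sketch with a made-up prefix (the real prefix changes from shoot to shoot):

Get-ChildItem IMG*.DNG | ForEach-Object { Rename-Item -Path $_.FullName -NewName ($_.Name -replace '^IMG', 'RedSea2012') }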

There is one major pain, though: how do I know which files I have deleted in Lightroom? I don't want to delete them from the hard disk, I want to move them later. It turns out Lightroom uses an SQLite database, and there is a free Windows ODBC driver for SQLite available for download. With this in place one can create an ODBC data source, point it at a Lightroom catalog and poke about with the data. Want a complete listing of your Lightroom data in Excel? ODBC is the answer. But let me issue these warnings:

  • Lightroom locks the database files exclusively – you can’t use the ODBC driver and Lightroom at the same time. If something else is holding the files open, Lightroom won’t start.
  • The ODBC driver can run UPDATE queries to change the data: do I need to say that is dangerous ? Good.
  • There’s no support for this. If it goes wrong, expect Adobe support to say “You did WHAT ?” and start asking about your backups. Don’t come to me either. You can work from a copy of the data if you don’t want to risk having to fall back to one of the backups Lightroom makes automatically

I was interested in 4 sets of data, shown in the following diagrams. Below is image information with the associated metadata and file information. Lightroom stores images in the Adobe_images table; IPTC and EXIF metadata link to images – their "image" field joins to the "id_local" primary key in Adobe_images. Images have a "root file" (in the AgLibraryFile table) which links to a library folder (AgLibraryFolder), which is expressed as a path from a root folder (AgLibraryRootFolder table). The link always goes to the "id_local" field. I could get information about the folders imported into the catalog just by querying these last two tables (outlined in red).

[diagram]

The SQL to fetch this data looks like this for just the folders
SELECT RootFolder.absolutePath || Folder.pathFromRoot as FullName
FROM   AgLibraryFolder     Folder
JOIN   AgLibraryRootFolder RootFolder ON RootFolder.id_local = Folder.rootFolder
ORDER BY FullName

SQLite is one of the dialects of SQL which doesn't accept AS in the FROM part of a SELECT statement. Since I run this in PowerShell, the version I use also has a WHERE clause which inserts a parameter. To get all the metadata the query looks like this
SELECT    rootFolder.absolutePath || folder.pathFromRoot || rootfile.baseName || '.' || rootfile.extension AS fullName,
          LensRef.value AS Lens,     image.id_global,       colorLabels,                Camera.Value       AS cameraModel,
          fileFormat,                fileHeight,            fileWidth,                  orientation,
          captureTime,               dateDay,               dateMonth,                  dateYear,
          hasGPS,                    gpsLatitude,           gpsLongitude,               flashFired,
          focalLength,               isoSpeedRating,        caption,                    copyright
FROM      AgLibraryIPTC              IPTC
JOIN      Adobe_images               image      ON      image.id_local = IPTC.image
JOIN      AgLibraryFile              rootFile   ON   rootfile.id_local = image.rootFile
JOIN      AgLibraryFolder            folder     ON     folder.id_local = rootfile.folder
JOIN      AgLibraryRootFolder        rootFolder ON rootFolder.id_local = folder.rootFolder
JOIN      AgharvestedExifMetadata    metadata   ON      image.id_local = metadata.image
LEFT JOIN AgInternedExifLens         LensRef    ON    LensRef.id_Local = metadata.lensRef
LEFT JOIN AgInternedExifCameraModel  Camera     ON     Camera.id_local = metadata.cameraModelRef
ORDER BY FullName

Note that since some images don't have a camera or lens logged, the joins to those tables need to be LEFT joins rather than inner joins. Again, the version I use in PowerShell has a WHERE clause which inserts a parameter.
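For completeness, here's a sketch of how one of these queries runs from PowerShell – it assumes the SQL above is in $lightroomSQL and that "Lightroom" is the name of the ODBC DSN pointing at (a copy of) the catalog; both names are mine, not a convention:

$connection = New-Object System.Data.Odbc.OdbcConnection "DSN=Lightroom"
$adapter    = New-Object System.Data.Odbc.OdbcDataAdapter $lightroomSQL, $connection
$datatable  = New-Object System.Data.DataTable
[void]$adapter.Fill($datatable)                     # Fill opens and closes the connection for us
$datatable | Select-Object FullName, cameraModel, Lens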

OK, so much for file data – the other data I wanted was about collections. The list of collections is in just one table (AgLibraryCollection), so that is very easy to query, but I also wanted to know the images in each collection.

[diagram]

Since one image can be in many collections, and each collection holds many images, AgLibraryCollectionImage is a table which provides the many-to-many relationship. Different tables might be attached to Adobe_images depending on what information one wants about the images in a collection; I'm interested only in mapping files on disk to collections in Lightroom, so I have linked to the file information, and I have a query like this.

SELECT   Collection.name AS CollectionName ,
         RootFolder.absolutePath || Folder.pathFromRoot || RootFile.baseName || '.' || RootFile.extension AS FullName
FROM     AgLibraryCollection Collection
JOIN     AgLibraryCollectionimage cimage     ON collection.id_local = cimage.Collection
JOIN     Adobe_images             Image      ON      Image.id_local = cimage.image
JOIN     AgLibraryFile            RootFile   ON   Rootfile.id_local = image.rootFile
JOIN     AgLibraryFolder          Folder     ON     folder.id_local = RootFile.folder
JOIN     AgLibraryRootFolder      RootFolder ON RootFolder.id_local = Folder.rootFolder
ORDER BY CollectionName, FullName

Once I have an ODBC driver (or an OLE DB driver) I have a ready-made PowerShell template for getting data from the data source. So I wrote functions to let me do:
Get-LightRoomItem -ListFolders -include $pwd
To List folders, below the current one, which are in the LightRoom Library
Get-LightRoomItem  -include "dive"
To list files in LightRoom Library where the path contains  "dive" in the folder or filename
Get-LightRoomItem | Group-Object -no -Property "Lens" | sort count | ft -a count,name
To produce a summary of lightroom items by lens used. And
$paths = (Get-LightRoomItem -include "$pwd%dng" | select -ExpandProperty path)  ;   dir *.dng |
           where {$paths -notcontains $_.FullName} | move -Destination scrap -whatif

  Stores the paths of Lightroom items in the current folder ending in .DNG in $paths; then gets files in the current folder and moves those which are not in $paths (i.e. not in the Lightroom catalog). Specifying -WhatIf allows the moves to be confirmed before they happen.

Get-LightRoomCollection to list all collections
Get-LightRoomCollectionItem -include musicians | copy -Destination e:\raw\musicians    copies the original files in the "musicians" collection to another disk

I’ve shared the PowerShell code on Skydrive

July 31, 2012

Rotating pictures from my portfolio on the Windows 7 Logon screen

Filed under: Photography,Powershell — jamesone111 @ 12:15 pm

In the last three posts I outlined my Get-IndexedItem function for accessing Windows Search. The more stuff I have on my computers, the harder it is to find a way of classifying it so it fits into hierarchical folders: the internet would be unusable without search, and above a certain number of items local stuff is too. Once I got search I started leaving most mail in my Inbox and letting Outlook's search find what I want; I have one "book" in OneNote with a handful of sections, and if I can't remember where I put something, search comes to the rescue. I take the time to tag photos so that I don't have to worry too much about finding a folder structure to put them in. So I'll tag geographically (I only have a few pictures from India – one three-week trip – so India gets one tag, but UK pictures get divided by county, and in counties with many pictures I put something like Berkshire/Reading; various tools will make a hierarchy with Berkshire then Newbury, Reading etc). People get tagged by name, with Friends and Family being a prefix to group those, and so on. I use Windows' star ratings to tag pictures I like – whether I took them or not – and Windows' "use top rated pictures" option for the desktop background picks those up. I also have a tag of "Portfolio".

Ages ago I wrote about customizing the Windows 7 logon screen, so I had the idea: "Why not find pictures with the Portfolio tag and make them logon backgrounds?" Another old post covers PowerShell tools for manipulating images, so I could write a script to do it, and use Windows scheduled tasks to run that script each time I unlocked the computer, so that the next time I went to the logon screen I would have a different picture. That was the genesis of Get-IndexedItem, and I've added it, together with the New-LogonBackground function, to the image module download on the TechNet Script Center.

If you read that old post you’ll see one of the things we depend on is setting a registry key so the function checks that registry key is set and writes a warning if it isn’t:

if ( (Get-ItemProperty HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Authentication\LogonUI\Background
     ).oembackground -ne 1) {
        Write-Warning "Registry Key OEMBackground under
          HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Authentication\LogonUI\Background needs to be set to 1"
        Write-Warning "Run AS ADMINISTRATOR with -SetRegistry to set the key and try again."
}

So if the registry value isn't set to 1, the function prints a warning which tells the user to run with -SetRegistry. After testing this multiple times (I found changing the Windows theme resets the value) and forgetting to run PowerShell with elevated permissions, I put in a try / catch to pick this up and say "Run Elevated". Just as a side note: I always find that when I write try/catch it doesn't work at first, and it takes me a moment to remember that catch works on terminating errors, so the command you want to catch usually needs -ErrorAction Stop.

if ($SetRegistry ) {
  try{ Set-ItemProperty -Name oembackground -Value 1 -ErrorAction Stop `
               -PATH "HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Authentication\LogonUI\Background" 

   }
  catch [System.Security.SecurityException]{
     Write-Warning "Permission Denied - you need to run as administrator"
  }
   return
}

The function also tests that it can write to the directory where the images are stored, since this doesn’t normally have user access: if it can’t write a file, it tells the user to set the permissions. Instead of using try/catch here I use $? to see if the previous command was successful
Set-Content -ErrorAction "SilentlyContinue" -Path "$env:windir\System32\oobe\Info\Backgrounds\testFile.txt" `
              -Value "This file was created to test for write access. It is safe to remove"
if (-not $?) {Write-Warning "Can't create files in $env:windir\System32\oobe\Info\Backgrounds please set permissions and try again"
              return
}
else         {Remove-Item -Path "$env:windir\System32\oobe\Info\Backgrounds\testFile.txt"}

The next step is to find the size of the monitor. Fortunately there is a WMI object for that, but not all monitor sizes are supported as bitmap sizes, so the function takes -Width and -Height parameters. If these aren't specified it gets the values from WMI and allows for a couple of special cases – my testing has not been exhaustive, so other resolutions may need special handling. The width and height determine the filename for the bitmap, and later the function checks the aspect ratio so it doesn't try to crop a portrait image to fit a landscape monitor.

if (-not($width -and $height)) {
    $mymonitor = Get-WmiObject Win32_DesktopMonitor -Filter "availability = '3'" | select -First 1
    $width, $height = $mymonitor.ScreenWidth, $mymonitor.ScreenHeight
    if ($width -eq 1366) {$width = 1360}
    if (($width -eq 1920) -and ($height -eq 1080)) {$width,$height = 1360,768}
}
if (@("768x1280" ,"900x1440" ,"960x1280" ,"1024x1280" ,"1280x1024" ,"1024x768" , "1280x960" ,"1600x1200",
      "1440x900" ,"1920x1200" ,"1280x768" ,"1360x768") -notcontains "$($width)x$($height)" )
{
    write-warning "Screen resolution is not one of the defaults. You may need to specify width and height"
}
$MonitorAspect = $Width / $height
$SaveName = "$env:windir\System32\oobe\Info\Backgrounds\Background$($width)x$height.jpg"

The next step is to get the image – Get-Image is part of the PowerShell tools for manipulating images .

$myimage = Get-IndexedItem -path $path -recurse -Filter "Kind=Picture","keywords='$keyword'",
                            "store=File","width >= $width ","height >= $height " |
                      where-object {($_.width -gt $_.height) -eq ($width -gt $height)} |
                      get-random | get-image

Get-IndexedItem looks for files in the folder specified by the -Path parameter, which defaults to [system.environment]::GetFolderPath([system.environment+specialFolder]::MyPictures) – the approved way to find the "My Pictures" folder. -Recurse tells it to look in sub-folders, and it looks for files with keywords which match the -Keyword parameter (which defaults to "Portfolio"). It filters out pictures which are smaller than the screen, and then Where-Object filters the list down to those which have the right aspect ratio. Finally one image is selected at random and piped into Get-Image.

If this is successful, the function logs what it is doing to the event log. I set up a new log source "PSLogonBackground" in the application log by running PowerShell as administrator and using the command
New-EventLog -Source PSLogonBackground -LogName application
Then my script can use that as a source – since I don’t want to bother the user if the log isn’t configured I use -ErrorAction silentlycontinue here
write-eventlog -logname Application -source PSLogonBackground -eventID 31365 -ErrorAction silentlycontinue `
                -message "Loaded $($myImage.FullName) [ $($myImage.Width) x $($myImage.Height) ]"

[screenshot]

The next thing the function does is apply cropping and scaling image filters from the original image module, as required, to get the image to the right size. When it has done that it tries to save the file, applying a conversion filter and saving the result. The initial JPEG quality is passed as a parameter; if the file is too big, the function loops round reducing the JPEG quality until the file fits into the 250KB limit, and logs the result to the event log.

Set-ImageFilter -filter (Add-ConversionFilter -typeName "JPG" -quality $JPEGQuality -passThru) -image $myimage -Save $saveName
$item = get-item $saveName
while ($item.length -ge 250kb -and ($JPEGQuality -ge 15) ) {
      $JPEGQuality -= 5
      Write-Warning "File too big - Setting Quality to $JPEGQuality and trying again"
      Set-ImageFilter -filter (Add-ConversionFilter -typeName "JPG" -quality $JPEGQuality -passThru) -image $myimage -Save $saveName
      $item = get-item $saveName
}
if ($item.length -le 250KB) {
         write-eventlog -logname Application -source PSLogonBackground -ErrorAction silentlycontinue `
           -eventID 31366 -message "Saved $($Item.FullName) : size $($Item.length)"
 }

[screenshot]

That's it. If you download the module, remove the "Internet block" on the zip file, expand the files into \users\YourUserName\WindowsPowerShell\Modules, and try running New-LogonBackground (with -Verbose to get extra information if you wish).
If the permissions on the folder have been set and the registry key is set, pressing [Ctrl]+[Alt]+[Del] should reveal a new image. You might want to use a different keyword, a different path, or to start by trying a higher JPEG quality, in which case you can run it with parameters as needed.

Then it is a matter of setting up the scheduled task: here are the settings from my scheduler

[screenshots of the scheduled task settings]

The program here is the full path to Powershell.exe and the parameters box contains
-noprofile -windowstyle hidden -command "Import-Module Image; New-logonBackground"

Lock, unlock and my background changes. Perfect. It’s a nice talking point and a puzzle – sometimes people like the pictures (although someone said one of a graveyard was morbid) – and sometimes they wonder how the background they can see is not only not the standard one but not the one they saw previously.

June 30, 2012

Using the Windows index to search from PowerShell: Part three. Better function output

Filed under: Powershell — jamesone111 @ 10:53 am

Note: this was originally written for the Hey, Scripting Guy! blog, where it appeared as the 27 June 2012 episode. The code is available for download. I have some more index-related posts coming up, so I wanted to make sure everything was in one place.


In part one, I introduced a function which queries the Windows Index using filter parameters like

  • "Contains(*,’Stingray’)"
  • "System.Keywords = ‘Portfolio’ "
  • "System.Photo.CameraManufacturer LIKE ‘CAN%’ "
  • "System.image.horizontalSize > 1024"

In part two, I showed how these parameters could be simplified to

  • Stingray (a word on its own becomes a Contains term)
  • Keyword=Portfolio (Keyword, without the s, is an alias for System.Keywords, and quotes will be added automatically)
  • CameraManufacturer=CAN* (* will become %, = will become LIKE, quotes will be added and CameraManufacturer will be prefixed with System.Photo)
  • Width > 1024 (Width is an alias for System.image.horizontalSize, and quotes are not added around numbers)

There is one remaining issue. PowerShell is designed so that one command's output becomes another's input. This function isn't going to do much with piped input: I can't see another command spitting out search terms for this one, nor can I see multiple paths being piped in. But the majority of items found by a search will be files, and so it should be possible to treat them like files, piping them into Copy-Item or whatever.
The following was my first attempt at transforming the data rows into something more helpful

$Provider= "Provider=Search.CollatorDSO; Extended Properties=’Application=Windows’;"
$adapter = new-object system.data.oledb.oleDBDataadapter -argument $SQL, $Provider
$ds      = new-object system.data.dataset
if ($adapter.Fill($ds))
    
{ foreach ($row in $ds.Tables[0])
            {if ($row."System.ItemUrl" -match "^file:")
                  {$obj = New-Object psobject -Property @{
                                Path = (($row."System.ItemUrl" -replace "^file:","") -replace "\/","\")}}
             Else {$obj = New-Object psobject -Property @{Path = $row."System.ItemUrl"}
}
             Add-Member -force -Input $obj -Name "ToString" -MemberType "scriptmethod" -Value {$this.path}
             foreach ($prop in (Get-Member -InputObject $row -MemberType property |
                                    where-object {$row."$($_.name)" -isnot [system.dbnull] }))
                  { Add-Member -ErrorAction "SilentlyContinue" -InputObject $obj -MemberType NoteProperty `
                               -Name (($prop.name -split "\." )[-1]) -Value $row."$($prop.name)"
                 
}
             foreach ($prop in ($PropertyAliases.Keys |
                                    Where-Object {$row."$($propertyAliases.$_)" -isnot [system.dbnull] }))
                  { Add-Member -ErrorAction "SilentlyContinue" -InputObject $obj ` -MemberType AliasProperty ` 
                               -Name $prop ` -Value ($propertyAliases.$prop -split "\." )[-1]
                  }
             $obj
            }
      }
This is where the function spends most of its time, looping through the data creating a custom object for each row; non-file items are given a path property which holds the System.ItemURL property; for files the ItemUrl is processed into normal format (rather than file:c/users/james format) – in many cases the item can be piped into another command successfully if it just has a Path property.

Then, for each property (database column) in the row a member is added to the custom object with a shortened version of the property name and the value (assuming the column isn’t empty).
Next, alias properties are added using the definitions in $PropertyAliases.
Finally some standard members get added. In this version I’ve pared it down to a single method, because several things expect to be able to get the path for a file by calling its tostring() method.

When I had all of this working I tried to get clever. I added aliases for all the properties which normally appear on a System.IO.FileInfo object and even tried fooling PowerShell’s formatting system into treating my file items as a file object, something that only needs one extra line of code
$Obj.psobject.typenames.insert(0, "SYSTEM.IO.FILEINFO")
Pretending a custom object is actually another type seems dangerous, but everything I tried seemed happy provided the right properties were present. The formatting worked except for the "Mode" column. I found the method which calculates .Mode for FILEINFO objects, but it needs a real FILEINFO object. It was easy enough to get one – I had the path and it only needs a call to Get-Item – but I realized that if I was getting a FILEINFO object anywhere in the process, then it made more sense to add extra properties to that object and dispense with the custom object. I added an extra switch, -NoFiles, to suppress this behaviour.
So the code then became
$Provider ="Provider=Search.CollatorDSO; Extended Properties=’Application=Windows’;"
$adapter  = new-object system.data.oledb.oleDBDataadapter -argument $SQL, $Provider
$ds       = new-object system.data.dataset
if ($adapter.Fill($ds))
     { foreach ($row in $ds.Tables[0])
        
 { if (($row."System.ItemUrl" -match "^file:") -and (-not $NoFiles)) 
                 {$obj = Get-item -Path (($row."System.ItemUrl" -replace "^file:","") -replace "\/","\")}
            Else {$obj = New-Object psobject -Property @{Path = $row."System.ItemUrl"}
                  Add-Member -force -Input $obj -Name "ToString" -MemberType "scriptmethod" -Value {$this.path} 
          }
         
ForEach...
 
The initial code was 36 lines; making the user input more friendly took it to 60 lines, and the output handling added about another 35 lines, bringing the total to 95.
There were 4 other kinds of output I wanted to produce:

  • Help. I added comment-based-help with plenty of examples. It runs to 75 lines making it the biggest constituent in the finished product.
    In addition I have 50 lines that are comments or blank for readability as insurance against trying to understand what those regular expressions do in a few months’ time – but there are only 100 lines of actual code.
  • A –list switch which lists the long and short names for the fields (including aliases)
  • Support for the -Debug switch: because so many things might go wrong, I have write-debug $SQL immediately before I carry out the query, and to enable it I have
    [CmdletBinding()] before I declare the parameters.
  • A –Value switch which uses the GROUP ON… OVER… search syntax so I can see what the possible values are in a column.
    GROUP ON queries are unusual because they fill the dataset with TWO tables.
    GROUP ON System.kind OVER ( SELECT STATEMENT) will produce a something like this as the first table.

SYSTEM.KIND   Chapter
-----------   -------
communication 0
document      1
email         2
folder        3
link          4
music         5
picture       6
program       7
recordedtv    8

The second table is the normal data suitably sorted. In this case it has all the requested fields grouped by kind plus one named "Chapter", which ties into the first table. I’m not really interested in the second table but the first helps me know if I should enter "Kind=image", "Kind=Photo" or "Kind=Picture"

I have a Select-List function which I use in my configurator and Hyper-V library, and with this I can choose which recorded TV program to watch, first selecting by title, and then if there is more than one episode, by episode.
$t=(Get-IndexedItem -Value "title" -filter "kind=recordedtv" -recurse |
            Select-List -Property title).title
start (Get-IndexedItem -filter "kind=recordedtv","title='$t'" -path |
            Select-List -Property ORIGINALBROADCASTDATE,PROGRAMDESCRIPTION)


In a couple of follow up posts I’ll show some of the places I use Get-IndexedItem. But for now feel free to download the code and experiment with it.

December 10, 2011

PowerShell HashTables: Splatting, nesting, driving selections and generally simplifying life

Filed under: Powershell — jamesone111 @ 11:43 pm
Tags:

Having spent so much of my life working in Microsoft environments where you typed a password to get into your computer and never had to type it again until the next time you had to login / unlock it (however many servers you worked with in between)  I find working in an office with a bunch of Linux servers but no shared set of accounts to be alternately comical and stupid. 
Some of our services use Active Directory to provide LDAP authentication so at least I retype the same user name and password over and over: but back as far as the LAN Manager client for DOS, Microsoft network clients tried to transparently negotiate connections using credentials they already had. Internet explorer does the same when connecting to intranet servers. Even in a workgroup if you manually duplicate accounts between machines you’ll get connected automatically. I stopped retyping passwords about the same time as I stopped saving work to floppy disk; I’m baffled that people in the Unix/Linux world tolerate it.
Some of our services use a separate LDAP service instead of AD (just to keep me on my toes I’m JamesOne on one, and Joneill on the other) and others again use their own internal accounts database. I might log on to a given box as root, and sign into MySQL as root but the passwords are different. If I go to another box the root password is different again. And so on.
Recently we hit a roadblock because one of the team had set up a server with a root password he couldn't share (because of the other places he'd used it) and he gave me a key file as a workaround. I've talked before about using the NetCmdlets to connect to and run commands on the Linux boxes, and this sent me back to look at how I was using them. I ended up driving them from a hash table of hash tables – something Ed Wilson has recently written about on the "Hey, Scripting Guy!" blog.

I use the hash tables for splatting – the name, if not the concept, is peculiar to PowerShell. If it's not familiar, let's say I want to invoke the command

connect-ssh -Server "192.168.42.42" -Force -user "root" -AuthMode "publickey" -CertStoreType "pemkeyfile" `
            -CertSubject "*" -certStore "$env:USERPROFILE\documents\secret.priv"

I can create a hash table with members Server, Force, AuthMode and so on. Normally when we refer to PowerShell variables we want to work with their values, so we prefix the name with the $ sign. Prefixing the name with the @ sign instead turns the members of an object into parameters for a command, like this

$master =@{Server="192.168.42.42"; Force=$True;
           user="root"; AuthMode="publickey"; CertStoreType="pemkeyfile"; 
           CertSubject="*"; certStore="$env:USERPROFILE\documents\secret.priv"}
connect-ssh @master

For another server I might logon with conventional user name and password so I might have:

$Spartacus =@{Server="192.168.109.71"; Force=$True; user="root";}

and use that hash table as the basis of logging on.  If I want to use one of the copy commands I might add two hash tables together – one containing logon information and the other containing the parameters relating to what should be copied – or I might specify these in the normal way in addition to my splatted variable.
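As a sketch of that (the copy command's parameter names here are my guesses rather than the real NetCmdlets ones):

$copyParams = @{RemoteFile = "/var/log/mysql.log"; LocalFile = "C:\Temp\mysql.log"}
Copy-RemoteItem @Spartacus @copyParams       # splat the logon details and the job details separately
$combined   = $Spartacus + $copyParams       # or add the hash tables (duplicate keys will throw) ...
Copy-RemoteItem @combined                    # ... and splat the merged table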
In my original piece about using the NetCmdlets I showed Invoke-RemoteCommand, Get-RemoteItem, Copy-Remote-Item, Copy-LocalItem and so on: I have modified all of these to take a hash table as a parameter and use splatting. I also showed a set-up function named connect-remote. The hash tables are set up when I load the module, but they don’t contain credentials: connect-remote now looks at the hash table for the connection I want to make and asks “is this fully specified to use a certificate, and if not, does it have credentials?” If the answer is no to both parts it prompts for credentials and adds them to the hash table. In the snippet below $Server contains the hash table, and $user can be a parameter passed to connect-remote or a member of the hash table; if it can’t be found in either place it is set to the current user name.

if (-not ($server.AuthMode -or $server.credential)) {
    $server["credential"] = $Host.ui.PromptForCredential("Login","Enter Your Details for $ServerName","$user","")
}

A global variable set in connect-remote keeps track of which of the hash tables of SSH settings should be used as the default for Invoke-RemoteCommand, Get-RemoteItem, Copy-Remote-Item, Copy-LocalItem and so on. But it makes sense to have all the hash tables listed somewhere they can be accessed by name, so I have

$hosts = @{ master =@{Server="192.168.42.42"; Force=$True; user="root";
                    AuthMode="publickey"; CertStoreType="pemkeyfile";
                    CertSubject="*"; certStore="$env:USERPROFILE\documents\secret.priv"}
         Spartacus =@{Server="192.168.109.71"; Force=$True; user="root";}
           Maximus =@{Server="192.168.10.10"; Force=$True; user="joneill";}
}

In connect-remote, the –server parameter is declared like this 

  $server = $hosts[(select-item $hosts -message "Which server do you want to connect to ? " -returnkey)]

Select-Item is a function I wrote ages ago which takes a hash table and offers the user a choice based on the keys in the hash table; it returns either the number of the choice (not very helpful for hash tables) or its name.
Function Select-Item
{[CmdletBinding()]
param ([parameter(ParameterSetName="p1",Position=0)][String[]]$TextChoices,
       [Parameter(ParameterSetName="p2",Position=0)][hashTable]$HashChoices,
       [String]$Caption="Please make a selection",
       [String]$Message="Choices are presented below",
       [int]$default=0,
       [Switch]$returnKey
      )
$choicedesc = New-Object System.Collections.ObjectModel.Collection[System.Management.Automation.Host.ChoiceDescription]
switch ($PsCmdlet.ParameterSetName) {
       "p1" {$TextChoices | ForEach-Object { $choicedesc.Add((
                 New-Object "System.Management.Automation.Host.ChoiceDescription" -ArgumentList $_ )) } }
       "p2" {foreach ($key in $HashChoices.Keys) { $choicedesc.Add((
                 New-Object "System.Management.Automation.Host.ChoiceDescription" -ArgumentList $key,$HashChoices[$key] )) } }
}
If ($returnkey) { $choicedesc[$Host.ui.PromptForChoice($caption, $message, $choicedesc, $default)].label }
else            {             $Host.ui.PromptForChoice($caption, $message, $choicedesc, $default) }
}
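
For completeness, the other parameter set takes a simple list of strings – something like this (the choices here are invented):

  Select-Item -TextChoices "Yes","No","Maybe" -Caption "Continue?" -Message "Apply the change?"

That returns 0, 1 or 2; adding -returnKey returns the text of the chosen item instead of its number.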

December 3, 2011

PowerShell–full of stringy goodness.

Filed under: Powershell — jamesone111 @ 2:28 pm

I think almost everyone who works with PowerShell learns two things in their first few minutes.
(a) Assigning some text to a Variable looks like this $Name =  "James"
(b) When you wrap a string in double quotes it expands variables inside it, for example "Hello $Name" will evaluate to Hello James.

A little later we tend to learn that if we want to put a property of a variable in a string, things get marginally more complex.
"The name $Name is $name.length characters long" gives the text The name James is James.length characters long.

To get the length of what is stored in the variable named “name” we need to use "The name $Name is $($name.length) characters long". This gives The name James is 5 characters long.

And most of us also learn that PowerShell has multi-line strings, sometimes called here-strings, which look like this

$HereString=@"
Here is a string
The string $name made
It hasn’t finished yet
There’s more
OK we’re done
"@

Here-strings are really useful: because they don’t close until they reach "@ at the start of a line they can contain quotes and newlines, and if they use double quotes they too will expand variables.

I should say here that I don’t have any detailed knowledge of how PowerShell’s parser actually deals with strings, but it seems pretty clear that when it hits a $ sign it says “OK, I need to work out a value here”. I think using $ to mean “value of” was something PowerShell picked up from another language, but I can’t be sure which one. My inner pedant wants to correct people when they say “PowerShell variable names begin with $”: $foo means the value that is in foo – the name is foo. The way I understand it, when the parser sees $ inside a string it looks to see what comes next: if it is a sequence of letters it assumes they are the name of a variable, which it should insert into the string. If it is a “(“ character then it looks for the matching “)” and evaluates the expression inside, like $name.length. Easy. Only this week I found myself saying “these are nothing special … we can get much better things than that”.
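Before getting to that, two quick examples of the $( ) subexpression at work:

  "2 + 2 = $(2 + 2)"                      # arithmetic inside a string  -> 2 + 2 = 4
  "Today is $((Get-Date).DayOfWeek)"      # a command and a property    -> e.g. Today is Friday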

I had to create over a dozen XML files: all identical except that each one had a different database table name and a subset of the fields in the table – some had just 2 fields, others had more than a dozen. The XML went like this:
<stuff>
   <kind1 name=TableName>
      <field name=field1>
      <field name=field2>
      <field name=field3>
   </kind1>
   <xmlgumf />
   <kind2 name=TableName>
      <field name=field1>
      <field name=field2>
      <field name=field3>
      <field name=field3>
   </kind2>
   <xmlgumf />
</stuff>

Except the real XML was much more complicated. Using the tool which builds the XML file from scratch takes about an hour per file. The XML files then go into another tool, which is where the real magic happens. Building a dozen files would be a couple of days’ work – or, with the project I’m working on, a very late night.
I had a document with the table names and the field lists in them, so could I automate the creation of these XML files?
Outside of a string I’d write something like this:
$fields="Field1","Field2","Field3"
$fields | foreach -begin {$a=""} -process {$a += [environment]::newline + "<Field name=$_ >"} -end {$a}

Would that second line work inside a here-string based on an existing XML file? How well would this work?

$Table = "TableName"
$fields="Field1","Field2","Field3"
@"
<stuff>
   <kind1 name=$Table>$($fields | foreach -begin {$a=""} -process {$a += [environment]::newline + "      <Field name=$_ >"} -end {$a})
   </kind1>
   <xmlgumf />
   <kind2 name=$Table>$($fields | foreach -begin {$a=""} -process {$a += [environment]::newline + "      <Field name=$_ >"} -end {$a})
   </kind2>
   <xmlgumf />
</stuff>
"@ | out-file "$table.xml" -encoding ascii

The simple answer is that it works like a charm.
Building the first XML file took an hour using the normal tool to do it from scratch; it then took 5 minutes to convert that file into a template – which, to my immense delight, worked first time. Producing each additional XML file took 2 minutes: half an hour in all. That’s 10.5 hours I’ve got to do something else with. And when I showed it to colleagues, their reaction was that I had performed serious magic. The kind of magic which lets me disappear early.

October 24, 2011

Maximize the reuse of your PowerShell

Filed under: Powershell — jamesone111 @ 3:18 pm
Last week I was at The Experts Conference in Frankfurt presenting at the PowerShell Deep Dive. My presentation was entitled “Maximize the reuse of your PowerShell”.
My PowerShell library for managing Hyper-V has now gone through a total of 100,000 downloads over all its different versions but whether it’s got wide use because of the way I wrote it or in spite of it, I can’t say.  Some people whose views I value seem to agree with the ideas I put forward in the talk, so I’m setting them out in this (rather long) post.

I have never lost my love of Lego. Like Lego, PowerShell succeeds by being a kit of small, general-purpose blocks to link together. Not everything we do can extend that kit, but we should aim to do so where possible. I rediscovered the Monad Manifesto recently via a piece by Jeffrey Snover about how PowerShell has remained true to the original vision, named Monad. It talks of a model where “Every executable should do a narrow set of functions and complex functions should be composed by pipelining or sequencing executables together”. Your work is easier to reuse if it becomes a building block that can be incorporated into more complex solutions; and this certainly isn’t unique to PowerShell.

Functions for re-use, scripts for a single task. If you use a .PS1 script to carry out a task it is effectively a batch file: it is automating stuff, so that’s good, but it isn’t a building block for other work. Once loaded, functions behave like compiled cmdlets. If you separate parts of a script into functions, it should be done with the aim of making those parts easy to reuse and recycle, not simply to reformat a script into subroutines.
If we’re going to write functions how should they look?

Choose names wisely. One task: one function – there is no “DoStuffWith” verb for a reason.
PowerShell cmdlets use names in the form Verb-Noun and it is best to stick to this model; the Get-Verb command will give you a list of the approved verbs. PowerShell’s enforcement of these standards is limited to raising a warning if you use non-standard names in a module. If your action really isn’t covered by a standard verb, then log the need for another one on Connect; but if you use “Duplicate” when the standard says “Copy”, or “Delete” when the standard says “Remove”, it simply makes things more difficult for people who know PowerShell but don’t know your stuff. The same applies to parameter names: try to use the ones people already know. Getting the function name right sets the scope for your work. For my talk I used an example of generating MD5 hashes for files. My assertion was that the MD5 hash belonged to the file, and so the command should return a file with an added MD5 hash – meaning my command should be Add-MD5.
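A minimal sketch of the shape of such a command – this is not the code from the talk, just an illustration of returning the file object with the hash attached:

Function Add-MD5 {
    param ([parameter(Mandatory=$true, ValueFromPipelineByPropertyName=$true)]
           [Alias("FullName")]
           [string[]]$Path
          )
    process {
        foreach ($file in (Get-Item $Path)) {
            $stream = $file.OpenRead()
            $hash   = ([System.Security.Cryptography.MD5]::Create().ComputeHash($stream) |
                          ForEach-Object {$_.ToString("x2")}) -join ""
            $stream.Close()
            # Return the original file object, enriched with the hash
            Add-Member -InputObject $file -MemberType NoteProperty -Name MD5Hash -Value $hash -PassThru
        }
    }
}
Get-ChildItem *.iso | Add-MD5 | Format-Table Name, Length, MD5Hash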

Output: use the right Out- and Write- cmdlets. Other speakers at the Deep Dive made the point that output created with Write-Host isn’t truly returned (in the sense that it can be piped or redirected); it is written on the console screen, so only use Write-Host if you want to prevent output going anywhere else. There are other “write on the screen without returning” cmdlets which might be better: for example, when debugging we often want messages that say “starting stage 1”, “starting stage 2” and so on. One way to do this would be to add Write-Host statements and remove them after the code is debugged. A better way is to use Write-Verbose, which outputs if you specify a –Verbose switch and doesn’t if you don’t. As a side effect you have a quasi-comment in your code which tells you what is happening. The same is true when you use Write-Progress to indicate something is happening in a long-running function (remember to run it with the -Completed switch when you have finished, otherwise your progress box can remain on screen while something else runs). Write-Debug, Write-Error and Write-Warning are valuable too: where the code can recover I try to prevent errors appearing and instead write a warning, non-fatally, when something didn’t go to plan.
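As a small illustration (Update-Widget is an invented name), with [CmdletBinding()] the verbose messages only appear when the caller asks for them:

Function Update-Widget {
    [CmdletBinding()]
    param ($Name)
    Write-Verbose "Starting stage 1 : validating $Name"
    # ... stage 1 work would go here ...
    Write-Verbose "Starting stage 2 : applying changes"
    # ... stage 2 work would go here ...
    Write-Warning "Nothing was changed - this is only a sketch"
}
Update-Widget -Name "Test"             # quiet, apart from the warning
Update-Widget -Name "Test" -Verbose    # shows the stage messages as well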

Formatting results can be a thorny issue: you can redirect the results of Format-Table or Format-List to a file, but the output is useless if you want to pipe results into another command – which needs the original object.
Some objects have ugly output: it’s not a crime to have an –asTable switch so output goes through Format-Table at the end of the function, or even to produce pretty output by default, provided you ensure that it is possible to get the raw object into the pipe with a –noFormat or –Raw switch. But it’s best to create formatting XML, and that is a lot easier with tools like James Brundage’s E-Z-out.

Output with the pipeline in mind. It’s not enough just to avoid Write-Host and return some sort of object. I argued that a function should return the richest object that is practical – my example used Add-Member to take a file object and give it an additional MD5Hash property. I can do anything with the result that I can do with a file, and the default formatting rules for a file are used (which is usually good). But being the right type need not matter if the object has the right property or properties. For example, this line of PowerShell:
  Select-String -Path *.ps1 -Pattern "Select-String"
looks at all the .PS1 files in the current folder and, where it finds the text “Select-String”, outputs a MatchInfo object with the properties .Context, .Filename, .IgnoreCase, .Line, .LineNumber, .Matches, .Path and .Pattern. A cmdlet like Copy-Item has no idea about MatchInfo objects, but if an object piped in has a .Path property, it knows what it is being asked to work on. If MatchInfo objects named this property “NameOfMatchingFile” it just would not work.
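So, for example, this works without any ForEach – Copy-Item binds the .Path property of each MatchInfo object from the pipeline (C:\backup is just an illustrative destination, which needs to exist):

  Select-String -Path *.ps1 -Pattern "Select-String" | Copy-Item -Destination C:\backup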

Think about pipelined input. All it takes to tell PowerShell that a parameter should come from the pipeline is prefixing its declaration with
  [parameter(ValueFromPipeLine= $true)]
If you find that you are using your new function like this
  Get-thing | ForEach {Verb-Thing -thing $_}
It’s telling you that -thing should take pipeline values.

The pipeline can supply multiple items, so a function may need to be split into Begin {}, Process {} and End {} blocks. (If you don’t specify these, the whole function body is treated as an End block and only the last item passed is processed.) Eventually the realization dawns that if the example above works, it should be possible to have two lines:
  $t = Get-thing
  Verb-Thing -thing $t
 
So parameters need to be able to handle arrays – something I’ll come back to further down.
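A sketch of the pattern – Verb-Thing and its body are placeholders, as above:

Function Verb-Thing {
    param ([parameter(ValueFromPipeLine=$true)]$Thing)
    process {
        foreach ($t in $Thing) {      # copes with an array passed via -Thing
            "Working on $t"           # the real work would go here
        }
    }
}
Get-ChildItem *.txt | Verb-Thing                    # items arrive one at a time from the pipeline
$t = Get-ChildItem *.txt ; Verb-Thing -Thing $t     # the whole array arrives at once; the foreach handles it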

You can do the same thing that Copy-Item was shown doing above: I have another function which is a wrapper for Select-String, and its parameters include:
  [Parameter(ValueFromPipeLine=$true,Mandatory=$true)]
  $Pattern,
  [parameter(ValueFromPipelineByPropertyName = $true)][Alias('Fullname','Path')]
  $Include=@("*.ps1","*.js","*.sql")

If the function gets passed a string via the pipeline, it becomes the value for the -Pattern parameter. If it gets an object containing a property named “Include”, “Fullname” or “Path”, that property becomes the value for the –include parameter.

Sometimes a function needs to output to a destination based on its input: so you can check whether a parameter is a script block and evaluate it if it is.
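A sketch of that idea, with invented names – if -Destination is a script block it is evaluated against each input item (so $_ is available inside it), otherwise it is used as given:

Function Copy-WithTarget {
    param ([parameter(ValueFromPipeline=$true)][System.IO.FileInfo]$File,
           $Destination
          )
    process {
        if ($Destination -is [scriptblock]) {$target = $File | ForEach-Object -Process $Destination}
        else                                {$target = $Destination}
        Copy-Item -Path $File.FullName -Destination $target
    }
}
Get-ChildItem *.ps1 | Copy-WithTarget -Destination {"$($_.FullName).bak"}   # copy each script to a .bak alongside it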

Don’t require users to know the syntax for things inside your function. If you are going to write code to do one job and never reuse it, then you don’t need to be flexible. If the code is to be reused, you need to do a little extra work so users don’t have to. For example: if something you use needs a fully qualified path to a file, then the function should use Resolve-Path to avoid an error when the user supplies the name of a file in the current directory. Resolve-Path is quite content to resolve multiple items, so replacing
  Do_stuff_with $path
with
  Resolve-Path $path | ForEach-Object {Do_stuff_with $_ }
delivers, at a stroke, support for wildcards, multiple items passed in $path, and relative names.
Another example is with WMI, where the syntax is based on SQL, so the wildcard is “%”, not “*”. Users will assume the wildcard is *. In this case do you say:
(a) Users should learn to use “%”
(b) My function should include   $parameter = $parameter -Replace "\*","%"   (the * needs escaping because -replace uses regular expressions)
For my demo I showed a function I have named Get-IndexedItem, which finds things using the Windows index. I wanted to find my best underwater photos – they are tagged “portfolio” and are the only photos I shoot with a Canon camera. My function lets me type
  Get-IndexedItem cameramaker=can*,tag=portfolio
Inside the function the search system needs a where condition of
  "System.Photo.Cameramanufacturer LIKE 'can%' AND System.Keywords = 'portfolio'"
Some tools would require me to type the filtering condition in this form, but I don’t want to remember what prefixes are needed, or whether the names are “camera Maker” or “camera Manufacturer”, and “keyword”, “keywords” or “tag”. Half the time I’ll forget to wrap things in single quotes, or use “*” as a wildcard because I forgot this was SQL. And if I have multiple search terms, why shouldn’t they be a list rather than “A and B and C”? (There is a write-up coming on how I did this processing.)

Set sensible defaults.  The talk before mine highlighted some examples of “bad” coding, and showed a function which accepted a computer name. Every time the user runs the command they must specify the computer. Is it valid to assume that most of the time the command will run against the current computer? If so, the parameter should default to that. If the tool deals with files, is it valid to assume “all files in the current directory”? If the command deletes them, probably not; if it displays some aspect of the files, it probably can.
Constants could be parameter defaults. With the computer-name example, you might write a function which only ran against the current computer. Obviously it is more useful if the computer name is a parameter (with a default) rather than a constant. In a surprising number of cases, something you create to do a specific task can carry out a generic task if you change a fixed value in your code into a parameter which defaults to the former fixed value.
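As a sketch, what would have been constants become parameter defaults, so the common case needs no typing but nothing is hard-coded:

  param ($ComputerName = $env:COMPUTERNAME,    # assume "this computer" unless told otherwise
         $Path         = (Get-Location).Path   # assume "here" unless told otherwise
        )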

Be flexible about parameter types and other validation. This is a variation on not making the user understand the internals of your function, and I have talked about it before – in particular the danger of trained systems programmers applying techniques in PowerShell that they would use in something like C#. In those languages a function declaration might look like:
  single Circumference(single radius) {}
which says Circumference takes a parameter which must have been declared to be a single precision real number and returns a single precision real number. Writing
   c = Circumference("Hello");
or  c = Circumference("42");
will cause the compiler to give a “type mismatch” error – “Hello” and “42” are strings, not single precision real numbers. Similarly
  System.IO.FileInfo f = Circumference(42);
is a type mismatch: f is a FileInfo object and we can’t assign a number to it. The compiler picks these things up before the program is ever run, so users don’t see run-time errors.
PowerShell isn’t compiled and its function declarations don’t include a return type: Get-Item, for example, deals with the file system, certificate stores, the registry and so on, so it can return more than a dozen different types; there is no way to know in advance what type of item Get-Item $y will return. If the result is stored in a variable (with $x = Get-Item $y) the type of the variable isn’t specified, but defined at runtime.
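A quick way to see this for yourself:

  (Get-Item C:\Windows).GetType().Name        # DirectoryInfo
  (Get-Item HKLM:\SOFTWARE).GetType().Name    # RegistryKey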
Trying to translate that declaration into PowerShell gives something like this.
  Function Get-Circumference{
  Param([single]$Radius)
  $radius * 2 * [system.math]::PI
  }

Calling
  Get-Circumference "Hello"
produces an error, but a closer inspection shows it is not a type mismatch: it says
Cannot process argument transformation on parameter 'radius'. Cannot convert value "hello" to type "System.Single".
the [Single] says “always try to cast $Radius to a single”. Which means
  Get-Circumference "42"
will be cast from a string to a single, and the function returns 263.89

So you might expect 
  [System.IO.FileInfo]$f = Get-Circumference 42
to throw an error, but it doesn’t: PowerShell casts 263.893782901543 to an object representing a file with that name in \windows\system32. The file doesn’t exist and is read-only! So it can be better to resolve types in code.

I’d go further than that. Some validation is redundant because the parameter will be passed to a cmdlet which is more flexible than the function writer thought; in other cases parameter validation is there to cover up a programmer’s laziness. When I see a web site which demands that I don’t enter spaces in my credit card number I think “Lazy. It’s easy to strip spaces and dashes”. Having “O’Neill” for a surname means that the work of slovenly developers who don’t check their SQL inputs, or demand only letters, gets drawn to my attention too. If the user is forced to use the equivalent of
  Stop-Process (get-process Calc)
they will think “Lazy. You couldn’t even be bothered to provide a –name parameter”. Stop-Process provides exactly that, coping with process objects, names and IDs (and notice it allows you to pass more than one ID), for example:
  stop-process -id 4472,5200,5224
  $p = get-process calc ; Stop-Process -InputObject $p 
  stop-process -name calc

Other cmdlets are able to resolve types without using different parameters for example
  ren '.\100 Meter Event.txt' "100 Metre Event.txt"
  $f = get-item '.\100 Metre Event.txt' ; ren $f '100 Meter Event.txt'
In one case the first item is a string and in the other it is a file object: from a user’s point of view this is better than Stop-Process, which in turn is much better than having to get an object representing the process you want to stop. In my talk I used Set-VMMemory from my Hyper-V module on CodePlex, which has:
  param(
        [parameter(ValueFromPipeLine = $true)]
        $VM,        
        [Alias("MemoryInBytes")]
        [long]$Memory = 0 ,
        $Server = "."
  )

There isn’t a way for me to work out what a user means if they specify a memory size which isn’t a number (and can’t be cast to one). If the user specifies many Virtual Machines, catching something during parameter validation will produce one error instead of one per VM (so this would be the place to trap negative numbers too).
I have a –server parameter and it makes sense to assume the local machine – but I can’t make an assumption about which VM(s) the user might want to modify. The VM can come from the pipeline, and it might be an object which represents a VM, or a string with a VM name (possibly a wildcard) or an array of VM objects and/or Names. If the user says
  Set-VMMemory –memory 1GB –vm london* –server h1,h2
the function, not the user, translates that into the necessary objects. If this doesn’t match any VMs I don’t want things to break (though I might produce a warning). It takes 4 lines to support all of this:
  if ($VM -is [String]) { $VM = Get-VM -Name $VM -Server $Server }
  if ($VM.count -gt 1 ) { [Void]$PSBoundParameters.Remove("VM")
                          $VM | ForEach-Object {Set-VMMemory -VM $_ @PSBoundParameters} }
  if ($vm.__CLASS -eq 'Msvm_ComputerSystem') {
      # ... the main part of the code goes here ...
  }
 
Incidentally I use the __CLASS property because it works with remoting when other methods for checking a WMI type failed.

Allow parts to be switched off. In the talk I explained that I have a script which applies new builds of software to a remote host. It does a backup and creates a roll-back script. Sometimes I have to roll back, produce another build and then apply that. Since I already have an up-to-date backup I don’t need to run the backup a second time, so that part of the code runs unless I specify a –nobackup switch.
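The shape of that is simply a switch which defaults to off – the names below are invented:

  param ([switch]$NoBackup)
  if (-not $NoBackup) {
      Write-Verbose "Backing up before applying the build"
      # ... the backup code would go here ...
  }
  # ... apply the new build ...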

Support –WhatIf, or restore data. You choose. A function can use $pscmdlet.ShouldProcess, which returns true if the function should proceed into a danger zone and false if it shouldn’t. It only takes one line before declaring the parameters at the start of a function to enable this:
  [CmdletBinding(SupportsShouldProcess=$true, ConfirmImpact='High' )]
There are 4 confirmation levels – “None”, “Low”, “Medium” and “High” – for both the ConfirmImpact value and the $ConfirmPreference preference variable. If either is set to “None” no confirmation appears. If the impact level is higher than or equal to the preference setting, this line
  if ($pscmdlet.shouldProcess($file.name,"Delete file")) {Remove-item $file}
will delete the file or not, depending on the user’s response to a prompt – in this case “Performing operation “Delete file” on Target “filename””. Confirmation can be forced on by specifying –Confirm. Specifying –WhatIf will echo the message but return false. If no confirmation is needed, –Verbose will echo the message but return true.
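Pulling those pieces together into a minimal sketch (Remove-OldLog is a made-up name):

Function Remove-OldLog {
    [CmdletBinding(SupportsShouldProcess=$true, ConfirmImpact='High')]
    param ($Path = "*.log")
    foreach ($file in (Get-Item $Path)) {
        if ($pscmdlet.ShouldProcess($file.Name, "Delete file")) {Remove-Item $file}
    }
}
Remove-OldLog -WhatIf    # echoes the "Performing operation" message and deletes nothing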

Prepare your code for other people, including you.  I’m not the person I was a year ago. You’re not either, and we’ll be different people in 3 or 6 months. So even if you are not publishing your work, you are writing for someone else: you, in 3 months’ time; you, under pressure at 3 in the morning. And that you will curse the you of today if you leave your code in a mess.

There are many ways to format your code: find a style of indenting that works for you – if it does, the chances that anyone else will like it are about the same as the chances they will hate it. Some people like to play “PowerShell golf”, where the fewest [key]strokes wins – that’s fine for the command prompt. Expanding things to make them easier to read is generally good. That’s not an absolute ban on aliases – using Sort and Where instead of Sort-Object and Where-Object may help readability – the key is to break up very long lines, but without creating so many short lines that you are constantly scrolling up and down.

Everyone says “Put in brief comments”, but I’d refine that: explain WHY, not WHAT. For example, I only decided to use Set-VMMemory as an example in my talk at the last moment. It has a line
  if (-not ($memory % 2mb))  {$memory /= 1mb}

I can’t have looked at that code in 8 months. Why would I do that? Fortunately I’ve put in a WHY comment – the WHAT is self-explanatory:
# The API takes the amount of memory in MB, in multiples of 2MB.
# Assume that anything less than 2097152 is already in MB (we aren't assigning 2TB to the VM). If the user enters 1024MB or 0.5GB, divide by 1MB.

Instead of putting comments next to each parameter saying what it does, move the comments into comment-based help; take a couple of places where you have used or tested the command and put them into the comment-based help as examples. Your future self will thank you for it.
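A sketch of what that looks like – Get-Thing and its examples are invented:

Function Get-Thing {
<#
.SYNOPSIS
   Gets things by name.
.PARAMETER Name
   The name of the thing to get; wildcards are allowed.
.EXAMPLE
   Get-Thing -Name widget*
   Lists every thing whose name begins "widget".
#>
    param ($Name = "*")
    Get-ChildItem -Path $Name      # a stand-in for the real work
}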
