James O'Neill's Blog

June 16, 2017

More on writing clear scripts: Write-output and return … good or bad ?

Filed under: Powershell — jamesone111 @ 11:26 am

My last post talked about writing understandable scripts and something I read a piece entitled Let’s kill Write-Output by Mark Krauss (actually I found it because Thomas Lee Tweeted it with “And sort out return too”).

So let’s start with one practicality. You can’t remove a command which has been in a language for 10 years unless you are prepared for a lot of pain making people re-write scripts. It’s alias “echo” was put there for people who come from other scripting languages who start by asking “How do I print to the console”. But if removing it altogether is impractical, we can advise people to avoid it, write rules to catch it in the script analyser and so on. Should we ? And when is a good idea to use it ?

Mark points out he’s not talking about Write-host, which should be kept for limited scenarios: if you want the user to see it by default, but it isn’t part of the output then that’s a job for Write-host, for example with my Get-SQL command   $result = Get-SQL $sqlQuery writes “42 rows returned” to the console but the output saved into $result is the 42 rows of data. Mark gives an example:    
Write-Output "PowerShell Processes:"
Get-Process -Name PowerShell

and says it is better written as  
"PowerShell Processes:"
Get-Process -Name PowerShell

And this is actually a case where Write-host should be used … why ? Let’s turn that into a function.
Function Get-psProc {
  "PowerShell Processes:"
  Get-Process -Name"*PowerShell*"
}

Looks fine doesn’t it ? But it outputs two different types of object into the pipeline. All is well if we run Get-psProc on its own, but if we run
 Get-psProc | ConvertTo-Csv 
It returns
#TYPE System.String
"Length"
"21" 

The next command in the pipeline saw that the first object was a string and that determined its behaviour. “PowerShell processes” is decoration you want the user to see but isn’t part of the output. That earlier post on understandable scripts came from a talk about writing good code and the one of the biggest problems I find in other peoples code is fixation with printing to the screen.  That leads to things like the next example – which is meant to read a file and say how many lines there are and their average length.

$measurement = cat $path | measure -average Length
echo ("Lines read    : {0}"    -f $measurement.Count  )
echo ("Average length: {0:n0}" -f $measurement.Average)

This runs and it does the job the author intended but I’d suggest they might be new to PowerShell and haven’t yet learnt that Output is not the same as “stuff for a user to read” (as in the previous example) , and they feel their output must be printed for reading. Someone more experienced with PowerShell might just write:
cat $path| measure -average Length
If they aren’t bothered about the labels,  or if the labels really matter
cat $path | measure -average Length | select @{n="Lines Read";e={$_.count}}, @{n="Average Length";e={[math]::Round($_.Average,2)}}

If this is something we use a lot, we might change the aliases to cmdlet names, specify parameter names and save it for later use. And it is re-usable, for example if we want to do something when there are more than x lines in the file, where the previous version can only return text with the number of lines embedded in it.  Resisting the urge to print everything is beneficial and that gets rid of a lot of uses of Write-output (or echo).

Mark’s post has 3 beefs with Write-Output.

  1. Performance. It is slower but rarely noticeably so, so I’d discount this.
  2. Security / Predictability – Write-Output can be redefined, and that allows for something malign or just buggy. True, but it also allows you to redefine it for logging debugging and so on. So you could use a proxy Write-output for testing and the standard one in production. So this is not exclusively bad
  3. The false sense of security. He says that explicitly returning stuff is held to be better than implicit return, which implies
    Write-Output $result
          is better than just   $result            
    But no-one says you should write    cat $path | Write-Output it’s obviously redundant, but when you don’t isn’t that implying output ?

My take on the last point is piping output into write-output (or Out-Default) is a tautology “Here’s some output, take it and output it”. It’s making things longer but not clearer. If using write-output does make things clearer then it is a sign the script is hard to read and at least needs some comments, and possibly some redesign. Joel Bennett sums up the false sense of security part in a sentencewhile some people like it because it highlights the spots where you intentionally output something — other people argue it’s presence distracts you from the fact that other lines could output.”  [Thanks Joel, that would have taken me a paragraph!]

This is where Thomas’ comment about return comes in. Return tells PowerShell to bail out of a function, and there are many good reasons for doing that, it also has a two in one syntax :  return $result is the same as
$result
return

When I linked to Joel above he also asks the question whether, as the last lines of a function, this
$output = $temp + (Get-Thing $temp)
return $output

is better or worse than
$output = $temp + (Get-Thing $temp)
$output

Not many people would add return to the second example – it’s redundant.  But if you store the final output in a variable there is some logic to using return (or Write-output) to send it back. But is it making things any clearer to store the result in a variable ? or it just as easy to read the following.  
$temp + (Get-Thing $temp)

As with Write-output, sometimes using return $result makes things clearer and sometimes it’s a habit from other programming languages where functions return results in a single  place so multiple parts must be gathered and then returned. Here’s something which combines the results of 3 queries and returns them

$result =  (Get-SQL $sqlQuery1)
$result += (Get-SQL $sqlQuery2)
$result +  (Get-SQL $sqlQuery3)

So the first line assigns an array of database rows to a variable the second appends more rows and the third returns these rows together with the results of a third query.  You need to look at  the operator in each line to figure out which sends to the pipeline. Arguably this it is clearer to replace the last line with this:

$result += (Get-SQL $sqlQuery3)
return $result

When there are 3 or 4 lines between introducing $result and putting it into the pipeline this is OK. But lets say there are 50 lines of script between storing the results of the first query and appending the results of the second.  Has the script been made clearer by storing a partial result … or would you see something being appended to $result and look further up the script for where it was originally set and anywhere it was changed ? This example does nothing with the combined segments (like sorting them) we’re just following an old habit of only outputting in one place. Not outputting anything until we have everything can mean it takes a lot longer to run the script – we could have processed all the results from the first query while waiting for the second to run. I would dispense with the variable entirely and use

Get-SQL $sqlQuery1
Get-SQL $sqlQuery2
Get-SQL $sqlQuery3

If there is a lot of script between each I’d then use a #region around the lines which lead up to each query being run
#region build query and return rows for x
#etc etc
Get-SQL $sqlQuery1
#endregion 

so when I collapse the outlining regions in my editor I see
#region build query and return rows for x
#region build query and return rows for y
#region build query and return rows for z

Which gives me a very good sense of what the script is doing at a high level and then I can drill into the regions if I need to. If I do need to do something to the combined set of rows (like sorting) then my collapsed code might become
#region build query for x and keep rows for sorting later
#region build query for y and keep rows for sorting later
#region build query for z and keep rows for sorting later
#region return sorted and de-duplicated results of x,y and Z

Both outlines give a sense of where there should be output and where any output might be a bug.

In conclusion. 
When you see Lots of echo / write-output commands that’s usually a bad sign – it’s usually an indication of too many formatted strings going into the pipeline, but Write-Output is not automatically bad when used sparingly – and used properly return isn’t bad either. But if you find yourself adding either for clarity it should make you ask “Is there a better way”.  


Advertisements

June 2, 2017

On writing understandable scripts.

Filed under: Uncategorized — jamesone111 @ 7:20 pm

 

At two conferences recently I gave a talk on “What makes a good PowerShell module”  (revisiting an earlier talk) the psconf.eu guys have posted a video of it and I’ve made the slides available (the version in US used the same slide deck with a different template). .

One of the my points was Prefer the familiar way to the clever way. A lot of us like the brilliant PowerShell one-liner (I belong to the “We don’t need no stinking variables” school and will happily pipe huge chains of commands together). But sometimes breaking it into multiple commands means that when you return it later or someone new picks up what you have done, it is easier to understand what is happening.  There are plenty of other examples, but generally clever might be opaque ; opaque needs comments and somewhere I picked up that what applies to jokes, applies to programming: if you have to explain it, it isn’t that good.

Sometimes, someone doesn’t now the way which is familiar to everyone else, and they throw in something like this example which I used in the talk:
Set-Variable -Scope 1 -Name "variableName" -Value $($variableName +1)
I can’t recall ever using Set-Variable, and why would someone use it to to set a variable to its current value + 1? The key must be in the -scope parameter, –scope 1 means “the parent scope” , most people would write $Script:VariableName ++ or $Global:VariableName ++ When we encounter something like this, unravelling what Set-Variable is doing interrupts the flow of understanding … we have to go back and say “so what was happening when that variable was set …” 

There are lots of cases where there are multiple ways to do something some are easier to understand but aren’t automatically the one we pick: all the following appear to do the same job

"The value is " + $variable
"The value is " + $variable.ToString()
"The value is $variable"
"The value is {0}" -f $variable
"The value is " -replace "$",$variable

You might see .ToString() and say “that’s thinking like a C# programmer” … but if $variable holds a date and the local culture isn’t US the first two examples will produce different results (to string will use local cultural settings) .
If you work a lot with the –f operator , you might use {0:d} to say “insert the first item in ‘short date’ format for the local culture” and naturally write
“File {0} is {1} bytes in size and was changed on {2}” –f $variable.Name,$variable.Length,$variable.LastWriteTime
Because the eye has to jump back and forth along the line to figure out what goes into {0} and then into {1} and so on, this loses on readability compared concatenating the parts with + signs, it also assumes the next person to look at the script has the same familiarity with the –f operator. I can hear old hands saying “Anyone competent with PowerShell should be familiar with –f” but who said the person trying to understand your script meets your definition of competence.   
As someone who does a lot of stuff with regular expressions, I might be tempted by the last one … but replacing the “end of string” marker ($) to as a way of appending excludes people who aren’t happy with regex.  I’m working on something which auto-generates code at the moment and it uses this because the source that it reads doesn’t provide a way of specifying “append”, but has “replace”. I will let it slide in this case but being bothered by it is a sign that I do ask myself “are you showing you’re clever or writing something that can be worked on later”.  Sometimes the only practical way is hard, but if there is a way which takes an extra minute to write and pays back when looking at the code in a few months time.   

Create a free website or blog at WordPress.com.