James O'Neill's Blog

November 24, 2019

Redefining CD in PowerShell

Filed under: Powershell — jamesone111 @ 6:48 pm

For some people my IT career must seem to have begun in Mesolithic times – back then we had a product called Novell Netware (and Yorkshiremen of a certain age will say “Aye, and Rickets and Diphtheria too”). But I was thinking about one of Netware’s features recently; as well as the traditional cd .. for the parent directory, Netware users could refer to two levels up as …, three levels up as …. and so on. And after a PowerShell session going up and down a directory tree I got nostalgic for that. And I thought…

  • I can convert some number of dots into a repetition of “..\” fairly easily with regular expressions.
  • I’ve recently written a blog post about argument transformers and
  • I already change cd in my profile, so why not change it a little more ?

By default, PowerShell defines cd as an alias for Set-Location. For most of the time I have been working with PowerShell I have set cd- as an alias for Pop-Location, deleted the initial cd alias (until PowerShell 6 there was no Remove-Alias cmdlet, so this meant using Remove-Item Alias:\cd –Force) and created a new alias from cd to Push-Location, so I can use cd in the normal way but have cd- to re-trace my steps.
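That profile setup can be sketched as follows (a minimal sketch, not the author’s exact profile; the -Option AllScope flag is my assumption, matching how the built-in cd alias is defined):

```powershell
# Remove the default cd -> Set-Location alias, then re-point cd at Push-Location
if (Test-Path Alias:\cd) { Remove-Item Alias:\cd -Force }
Set-Alias -Name cd  -Value Push-Location -Option AllScope   # cd now pushes onto the location stack
Set-Alias -Name cd- -Value Pop-Location                     # cd- retraces steps
```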
Getting the extra functionality means attaching an argument transformer to the parameter where it is declared, so I would have to make the “new cd” a function instead of an alias. The basic part of it looks like this:

function cd {
    <#
.ForwardHelpTargetName Microsoft.PowerShell.Management\Push-Location
.ForwardHelpCategory Cmdlet
#>

    [CmdletBinding(DefaultParameterSetName='Path')]
    param(
        [Parameter(ParameterSetName='Path', Position=0,
             ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)]
        [PathTransform()]
        [string]$path
    )
    process {
        Push-Location @PSBoundParameters
    }

}

The finished item (posted here) has more parameters – it is built like a proxy function and forwards help to Push-Location’s help. If the path is “-” (or a sequence of “-” signs) then Pop-Location is called once for each “-”, so I can use a bash-style cd - as well as cd-, and Push-Location is only called if a path is specified.
If the path isn’t valid I don’t want the error to say it occurred at a location inside the function, so I added a validation script to the parameter.
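A minimal sketch of the “-” handling described above (hypothetical code, not the author’s posted version):

```powershell
# Inside the function body: if the path is one or more "-" signs,
# pop one location per "-" instead of pushing
if ($Path -match '^-+$') {
    for ($i = 0; $i -lt $Path.Length; $i++) { Pop-Location }
    return
}
```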

The key piece is the [PathTransform()] attribute on the Path parameter – it comes from a class with a name ending in “Attribute” (which can be omitted when writing the attribute in the function). Initially the class was mostly a wrapper around one line of code:

class PathTransformAttribute : System.Management.Automation.ArgumentTransformationAttribute {
    [object] Transform([System.Management.Automation.EngineIntrinsics]$EngineIntrinsics,
                       [object] $InputData)
    {
        return $InputData -replace "(?<=^\.[./\\]*)(?=\.{2,}(/|\\|$))", ".\"
    }
}

The class line defines the name and says it descends from the ArgumentTransformationAttribute class;
the next line says it has a Transform method which returns an object and takes the parameters EngineIntrinsics and InputData;
and the line which does the work is a regular expression. In regex:
(?<=AAA)(?=ZZZ)
says find the part of the text where, looking behind, you see AAA and, looking ahead, you see ZZZ; this doesn’t specify anything to select between the two, so “replacing” it doesn’t remove anything – it is just “insert where…”. In the code above, the look-behind part says ‘the start of the text (“^”), a dot (“\.”), and then dots, forward or back slashes (“[./\\]”) repeated zero or more times (“*”)’; and the look-ahead says ‘a dot (“\.”) repeated at least 2 times (“{2,}”) followed by / or \ or the end of the text (“/|\\|$”)’.
So names like readme…txt won’t match, and neither will …git, but …\.git will become ..\..\.git.
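You can try the transformation at the prompt; these examples assume Windows-style \ separators:

```powershell
$pattern = '(?<=^\.[./\\]*)(?=\.{2,}(/|\\|$))'
'...'          -replace $pattern, '.\'   # '..\..'        (two levels up)
'....\src'     -replace $pattern, '.\'   # '..\..\..\src' (three levels up, then into src)
'readme...txt' -replace $pattern, '.\'   # unchanged - the text does not start with "."
```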

BUT …[tab] doesn’t expand to two levels up – the parameter needs an argument completer for that. Completers take information about the command line – especially the current word to complete – and return CompletionResult objects for tab expansion to suggest.
PowerShell has 5 ready-made completers, for Command, Filename, Operator, Type and Variable. Pass any of these completers a word-to-complete and it returns CompletionResult objects – for example you can try
[System.Management.Automation.CompletionCompleters]::CompleteOperator("-n")

A simple use for one of these is in viewing help in its own window, a feature which is returning in PowerShell 7 (starting in preview 6); I like this enough to have a little function, Show-Help, which calls Get-Help –ShowWindow. Adding an argument completer to my function’s Command parameter means it tab-completes matching commands.

function Show-Help {
  param (
    [parameter(ValueFromPipeline=$true)]
    [ArgumentCompleter({
        param($commandName, $parameterName,$wordToComplete,$commandAst,$fakeBoundParameter)
        [System.Management.Automation.CompletionCompleters]::CompleteCommand($wordToComplete)
    })]
    $Command
  )
  process {foreach ($c in $Command) {Get-Help -ShowWindow $c} }

}

The completer for Path in my new cd needs more work, and there was a complication which took a little while to discover: PSReadLine caches alias parameters and their associated completers, so after the cd alias is replaced in my profile I need to have this:

if (Get-Module PSReadLine) {
    Remove-Module -Force PSReadLine
    Import-Module -Force PSReadLine
    Set-PSReadLineOption -BellStyle None
    Set-PSReadLineOption -EditMode Windows
}

You might have other PSReadLine options to set.
I figured that I might want to use my new completer logic in more than one command, and I also prefer to keep any lengthy script blocks out of the param() block, which led me to use an argument completer class. The outline of my class appears below:

class PathCompleter : System.Management.Automation.IArgumentCompleter {
    [System.Collections.Generic.IEnumerable[ System.Management.Automation.CompletionResult]] CompleteArgument(
                   [string]$CommandName,
                   [string]$ParameterName,
                   [string]$WordToComplete,
                   [System.Management.Automation.Language.CommandAst]$CommandAst,
                   [System.Collections.IDictionary] $FakeBoundParameters
    )
    {
        $CompletionResults = [System.Collections.Generic.List[ System.Management.Automation.CompletionResult]]::new()

        # populate $wtc from $WordToComplete

        foreach ($result in [System.Management.Automation.CompletionCompleters]::CompleteFilename($wtc)) {
            if ($result.ResultType -eq "ProviderContainer") {$CompletionResults.Add($result)}
        }
        return $CompletionResults
    }
}

The class line names the class and says it implements the IArgumentCompleter interface. Everything else defines the class’s CompleteArgument method, which returns a collection of completion results and takes the standard parameters for a completer (seen here). The body of the method creates the collection of results in its first line and returns that collection in its last line; in between it calls the CompleteFilename method I mentioned earlier, filtering the results to containers. The final version uses the CommandName parameter to filter results for some commands and return everything for others. Between initializing $CompletionResults and the foreach loop is something to convert the WordToComplete parameter into the $wtc argument passed to CompleteFilename.
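With the class defined, attaching it to the cd function’s Path parameter is just another attribute. A sketch, re-using the parameter declaration from the earlier listing ([ArgumentCompleter] also accepts a type that implements IArgumentCompleter):

```powershell
param(
    [Parameter(ParameterSetName='Path', Position=0,
         ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)]
    [PathTransform()]                   # the transformer from earlier
    [ArgumentCompleter([PathCompleter])] # the class-based completer
    [string]$Path
)
```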

The initial idea was to expand 3, 4, or more dots. But I found ..[tab], .[tab] and ~[tab] do not expand – they all need a trailing \ or /. “I can fix that,” I thought…
Then I thought “Wouldn’t it be good if I could find a directory somewhere on my current path?”, so that if I’m in a sub-sub-sub-folder of Documents, \*doc[tab] will expand to Documents.
What about getting back to the PowerShell directory? I decided ^[tab] should get me there.
Previously pushed locations on the stack? It would be nice if I could tab-expand “-”, but PowerShell takes that to be the start of a parameter name, not a value, so I use = instead: =[tab] will cycle through locations, ==[tab] gives the 2nd entry on the stack, ===[tab] the third, and so on. There aren’t many characters to choose from; “.” and all the alphanumerics are used in file names; #$@-><;,| and all the quote and bracket characters tell PowerShell about what comes next; \ and / both mean “root directory”; ? and * are wildcards; ~ is the home directory. That leaves !£%^_+ and = as available (on a UK keyboard), and = has the advantage of not needing shift. I’m sure some people use ^ or = at the start of file names – they’d need to change my selections.

All the new things to be handled go into one regular-expression-based switch statement, as seen below; the regexes are not the easiest to read because so many of the characters need to be escaped. “\\\*” translates as \ followed by *, “^\^” means “beginning with ^”, and the result looks like some weird ASCII art.

$dots = [regex]"^\.\.(\.*)(\\|$|/)"
$sep  = [System.IO.Path]::DirectorySeparatorChar
$wtc  = ""
switch -regex ($WordToComplete) {
    $dots       {$newPath = "..$sep" * (1 + $dots.Matches($WordToComplete)[0].Groups[1].Length)
                 $wtc = $dots.Replace($WordToComplete,$newPath) ; continue }
    "^=$"       {foreach ($stackPath in (Get-Location -Stack).ToArray().Path) {
                    if ($stackPath -match "[ ']") {$stackPath = '"' + $stackPath + '"'}
                    $results.Add([System.Management.Automation.CompletionResult]::new($stackPath))
                 }
                 return $results
                }
    "^=+$"      {$wtc = (Get-Location -Stack).ToArray()[$WordToComplete.Length - 1].Path ; continue }
    "^\\\*|/\*" {$wtc = $pwd.Path -replace "^(.*$($WordToComplete.Substring(2)).*?)[/\\].*$",'$1' ; continue }
    "^~$"       {$wtc = $env:USERPROFILE ; continue }
    "^\^$"      {$wtc = $PSScriptRoot    ; continue }
    "^\.$"      {$wtc = ""               ; continue }
    default     {$wtc = $WordToComplete}
}

Working up from the bottom,

  • The default is to use the parameter as passed, in CompleteFilename. Every other branch of the switch uses continue to jump out without looking at the remaining options.
  • If the parameter is “.”, “^” or “~”, CompleteFilename will be told to use an empty string, the script directory, or the user’s home directory respectively. ($env:USERPROFILE is only set on Windows by default. Earlier in my profile I have something to set it to [Environment]::GetFolderPath([Environment+SpecialFolder]::UserProfile) if it is missing, and this returns the home directory regardless of OS.)
  • If the parameter begins with \* or /*, the script takes the current directory path, selects from the beginning through whatever comes after the *, continues selecting up to the next / or \, and discards the rest. The result is passed into CompleteFilename.
  • If the parameter is two or more = signs and nothing else, $wtc is set to an entry from the stack, using the length of the parameter as the index: == is position 1, === is position 2, and so on.
  • If the parameter is a single = sign, the function returns without calling CompleteFilename. It looks at each item on the stack in turn; those which contain either a space or a single quote are wrapped in double quotes before being added to $results, which is returned at the end.
  • And the first branch of the switch uses an existing regex object as the regular expression. The regex captures the dots after the first two, repeats “..\” once for each captured dot plus one, and drops the result into $wtc. PowerShell is quite happy to use / on Windows where \ would be normal, and \ on Linux where / would be normal; instead of hard-coding one I get the “normal” one as $sep and insert that after each pair of dots.
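The first branch can be tried on its own; this assumes Windows, where $sep is “\”:

```powershell
$dots = [regex]'^\.\.(\.*)(\\|$|/)'
$word = '....\src'
# Group 1 holds the dots after the first two ("..") - here its length is 2
$newPath = '..\' * (1 + $dots.Matches($word)[0].Groups[1].Length)
$dots.Replace($word, $newPath)    # '..\..\..\src'
```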

Adding support for = and ^ meant going back to the argument transformer and adding options so that cd ^ [Enter] and cd = [Enter] work.

I’ve put the code here and a summary of what I’ve enabled appears below.

Keys                Before              After
cd ~[Tab]           (needs ~\)          Expands
cd ~[Enter]         Set-Location        Push-Location
cd ..[Tab]          (needs ..\)         Expands <parent>
cd ..[Enter]        Set-Location        Push-Location
cd …[Tab]           –                   Expands, and higher levels with each extra “.”
cd …[Enter]         ERROR               Push-Location, and beyond with each extra “.”
cd /*pow [Tab]      –                   Expands directory/directories above containing “pow”
cd /*pow [Enter]    ERROR               Push-Location to directory containing “pow” (if unique; error if not unique)
cd ^[Tab]           –                   Expands PS profile directory
cd ^[Enter]         ERROR               Push-Location PS profile directory
cd =[Tab]           –                   Cycles through the location stack
cd =[Enter]         ERROR               Push-Location to nth on stack: = is 1st, == 2nd etc. (and allows ‘Pop’ back to current location)
cd -[Enter]         ERROR               Pop-Location (repeats Pop for each extra “-”, except for two “-” which suffers from a bug); does not allow Pop back to current location
cd- [Enter]         ERROR               Pop-Location
cd\ [Enter]         Set-Location \      Push-Location \
cd.. [Enter]        Set-Location ..     Push-Location ..
cd~ [Enter]         ERROR               Push-Location ~

 

November 10, 2019

PowerShell Arrays, performance and [not] sweating the small stuff

Filed under: Powershell — jamesone111 @ 12:22 pm

I’ve read a couple of posts on arrays recently: Anthony (a.k.a. the POSH Wolf) posted one and Tobias had another. To their advice I’d add: avoid creating huge arrays, where practical. I’ve written about the problems of doing “Is X in the set?” with large arrays; hash tables were a better answer in that case. Sometimes we can avoid storing a big lump of data altogether, and I often prefer designs which do.

We often know this technique is “better” than that one, but we also want code which takes a short time-to-write, is clear so that later it has a short time-to-understand, and doesn’t take an excessive time-to-run. As with most of these triangles, you can often get two and rarely get all three. Spending 30 seconds writing something which takes 2 seconds to run might beat something which takes 10 minutes to write and runs in 50 milliseconds. But neither is any good if next week we spend hours figuring out how the data was changed last Tuesday.

Clarity and speed aren’t mutually exclusive, but sometimes there is a clear, familiar way which doesn’t scale up and a less attractive technique which does. Writing something which scales to "bicycle" will hit trouble when the problem reaches "Jumbo Jet" size, and applying "Jumbo" techniques to a "bike" size problem can be an unnecessary burden. And (of course) expertise is knowing both the techniques and where they work (and don’t).

One particular aspect of arrays in PowerShell causes a problem at large scale. Building up large arrays one member at a time is something to try to design out, but sometimes it is the most (or only) practical way. PowerShell arrays are created at a fixed size; adding a member means creating a new array, and copying the existing array plus one more member to a new array gets slower as the array gets bigger. If the time to do each of n operations depends on the number done so far – which is 0 at the start, n at the end, and averages n/2 during the process – the average time per item is some constant × n/2. Let’s define k as half that constant, so the average time per item is kn and the time to do all n items is kn². The time rises with the square of n. People like to say “rises exponentially” for anything fast, but strictly this is quadratic, not exponential, growth. You can try this test; the result from my computer appears below. The numbers don’t perfectly fit a square law, but the orders of magnitude do.

$hash = [ordered]@{}
foreach ($size in 100,1000,10000,100000) {
    $hash["$size"] = (Measure-Command {
        $array = @(); foreach ($x in (1..$size)) {$array += $x}
    }).TotalMilliseconds
}
$hash

Array Size    Total Milliseconds
100           5
1,000         43
10,000        2,800
100,000       310,000

If 43ms sounds a bit abstract: disqualification rules in athletics say you can’t react in less than 100ms, and a “blink of an eye” takes about 300–400ms. It takes ~60ms for PowerShell to generate my prompt, so it’s not worth cutting less than 250ms off time-back-to-prompt. Even then, a minute’s work to save a whole second only pays for itself after 60 runs. (I wrote in April about how very small “costs” for an operation can be multiplied many-fold: saving less than 100ms on an operation still adds up if a script does that operation 100,000 times, and we can also feel the difference when typing between 50ms and 100ms responses, but here I’m thinking of things which only run once in a script.)

At 10K array items the possible saving is a couple of seconds, which is in the band of acceptable times that are slow enough to notice. Slower still and human behaviour changes: we think it has crashed, or we swap to another task and introduce a coffee break before the next command runs. 100K items takes 5 minutes; even that might be acceptable in a script which runs as a scheduled task. Do you want to guess how long a million would take?
$a = 1..999999 will put 999,999 items into an array in 60ms on my machine – $variable = Something_which_outputs_an_array is usually a quick operation.
$a += 1000000 takes 100ms: adding the millionth item takes as long as adding the first few thousand. The first 100K items take a few minutes; the last 100K take a few hours. And that’s too long even for a scheduled task.
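Timings like these are easy to reproduce with Measure-Command (your numbers will differ; $script: scoping keeps $a visible across the two measured blocks):

```powershell
# Creating the array in one operation is fast...
(Measure-Command { $script:a = 1..999999 }).TotalMilliseconds

# ...but a single += copies all 999,999 existing items into a new array
(Measure-Command { $script:a += 1000000 }).TotalMilliseconds
```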

The exponent which makes things scale UP badly means they scale DOWN brilliantly – it is a waste of effort to worry about scale if tens of items is a big set, but when thousands of items is a small set it could be critical. Removing operations where each repetition takes longer than the one before can be a big win, because these are the root of the squared execution time.
The following scrap works, but it is unnecessarily slow; it’s not dreadful on clarity but it is long-winded. (There are also faster methods than piping many items into ForEach-Object.)

$results = @()
Get-Stuff | foreach-object {$results += $_ }
return $results

This pattern can stem from thinking every function must have a return which is called exactly once, which isn’t the case in PowerShell. It’s quicker and simpler to write just Get-Stuff, or, if some processing needs to happen between getting and returning the results, something like the following. If the work is done object-by-object:
Get-Stuff | foreach-object {output_processed $_} 
or, if the work must be done on the whole set:
$results = Get-Stuff    
work_on $results  #returning  final result
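As a concrete (hypothetical) illustration of letting the pipeline do the collecting:

```powershell
function Get-Squares {
    param([int]$Count)
    # Each value is emitted to the pipeline as it is produced -
    # no intermediate array, no +=, no explicit return needed
    foreach ($i in 1..$Count) { $i * $i }
}

$results = Get-Squares -Count 5    # $results is 1, 4, 9, 16, 25
```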

Another pattern looks like this:
put_some_result in $a
some_loop {
  Operations         
  $a += result
}

This works better as:
put_some_result in $a
$a += some_loop {
  Operations         
  Output_result
}

A lot of cases where a better “add to array” looks like the answer are forms of this pattern, and getting down to just one add is a better answer.
When thousands of additions are unavoidable, a lot of advice says use [ArrayList], but as Anthony’s post points out, more recent advice is to use [List[object]] or [List[Type]].
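A sketch of the List approach – Add appends in place, so there is no per-item copy of the whole collection:

```powershell
# A typed list grows in place; Add is cheap even at large sizes
$list = [System.Collections.Generic.List[int]]::new()
foreach ($x in 1..100000) { $list.Add($x) }
$list.Count    # 100000
```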

Postscript

At the same time as I was posting this, Tobias was looking at the same problem with strings. Again, building up a 1,000,000-line string one line at a time is something to be avoided, and again it takes a lot longer to create a new string which is old-string + one-line when the old string is big than when it is small. I found this fitted a square law nicely: 10,000 string-appends took 1.7 seconds; 100,000 took 177 seconds. It takes as long to grow the string from 100,000 lines to 100,010 as it does to add the first 3,000 lines to an empty one. His conclusion – that if you really can’t avoid doing this, using a StringBuilder is much more efficient – is a good one, but I wouldn’t bother with one to join half a dozen strings together.
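For completeness, the StringBuilder version Tobias recommends looks like this (a sketch):

```powershell
$sb = [System.Text.StringBuilder]::new()
foreach ($i in 1..10000) {
    [void]$sb.AppendLine("line $i")   # [void] discards the returned builder reference
}
$text = $sb.ToString()
```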
