James O'Neill's Blog

December 6, 2016

Do the job 100 times faster with Parallel Processing in PowerShell

Filed under: Powershell — jamesone111 @ 11:12 pm

It’s a slightly click-baity title, but I explain below where the 100 times number comes from. The module is on the PowerShell Gallery and you can install it with  Install-Module -Name Start-parallel

Some of the tasks we need to do in PowerShell involve firing off many similar requests and waiting for their answers – for example getting status from lots of computers on a network. It might take several seconds to do each one – maybe longer if the machines don’t respond. Doing them one after the other could take ages. If I want to ping all 255 addresses on my home subnet, most machines will be missing and it will take 3 seconds to time out for each of the 200+ inactive addresses. Even if I try only one ping for each address it’s going to take 10-12 minutes, and for >99% of that time my computer will be waiting for a response. Running them in parallel could speed things up massively.
Incidentally, the data which comes back from a ping isn’t ideal, and I prefer this function to the Test-Connection cmdlet.
Function QuickPing {
    param ($LastByte)
    $P = New-Object -TypeName "System.Net.NetworkInformation.Ping"
    $P.Send("192.168.0.$LastByte") | where status -eq success | select address, roundTripTime
}

PowerShell allows you to start multiple processes using Jobs and there are places where Jobs work well. But it only takes a moment to see the flaw in jobs: if you run
Get-Process *powershell*
Start-Job -ScriptBlock {1..5 | foreach {start-sleep -Seconds 1 ; $_ } }
Get-Process *powershell*

You see that the job creates a new instance of PowerShell … doing that for a single ping is horribly inefficient – jobs are better suited to tasks where the run time is much longer than the set-up time AND where we don’t want to run lots concurrently. In fact I’ve found creating large numbers of jobs tends to crash the PowerShell ISE; so my first attempts at parallelism involved tracking the number of jobs running and keeping to a maximum – starting new jobs only as others finished. It worked, but in the process I read this post by Boe Prox and this one by Ryan Witschger which led me to a better way: RunSpaces and the RunSpace factory.
MSDN defines a RunSpace as “the operating environment where the command pipeline of the PowerShell object is invoked”; and says that the PowerShell object allows applications that programmatically use Windows PowerShell to create pipelines of commands, invoke them and access the results. The factory can create single RunSpaces, or a pool of RunSpaces. So a program (or script) can create a PowerShell object which says “Run this, with these named parameters and these unnamed arguments. Run it asynchronously (i.e. start it and don’t wait for it to complete, give me some signal when it is done), and in a RunSpace from this pool.” If there are more things wanting to run than there are RunSpaces, the pool handles queuing.
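As a rough sketch of that pattern – this is not the Start-Parallel source, and the pool size, the use of QuickPing and the InitialSessionState plumbing are all illustrative – the factory-plus-pool approach looks something like this:

$iss   = [System.Management.Automation.Runspaces.InitialSessionState]::CreateDefault()
# RunSpaces don't inherit local functions, so add QuickPing to the initial session state
$entry = New-Object -TypeName System.Management.Automation.Runspaces.SessionStateFunctionEntry `
                    -ArgumentList "QuickPing", ${Function:QuickPing}.ToString()
$iss.Commands.Add($entry)
$pool  = [System.Management.Automation.Runspaces.RunspaceFactory]::CreateRunspacePool(1, 50, $iss, $Host)
$pool.Open()

$asyncJobs = foreach ($i in 1..255) {
    $ps = [System.Management.Automation.PowerShell]::Create()
    $ps.RunspacePool = $pool
    [void]$ps.AddCommand("QuickPing").AddParameter("LastByte", $i)
    # Start it and don't wait - BeginInvoke returns a handle used to collect the result later
    [pscustomobject]@{PowerShell = $ps; Handle = $ps.BeginInvoke()}
}
foreach ($job in $asyncJobs) {
    $job.PowerShell.EndInvoke($job.Handle)   # emits whatever each QuickPing call returned
    $job.PowerShell.Dispose()
}
$pool.Close()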

Thus the idea for Start-Parallel was born.  I wanted to be able to do this
Get-ListOfComputers | Start-Parallel Get-ComputerSettings.ps1
or this  
1..255 | Start-Parallel -Command QuickPing -MaxThreads 500
or even pipe PS objects or hash tables in to provide multiple parameters to the same command.

-MaxThreads in the second example says create a pool where 500 pings can be in progress, so every QuickPing can be running at the same time (performance monitor shows a spike of threads). So how long does it take to do 255 pings now? 240 inactive addresses taking 3 seconds each gave me ~720 seconds, and the version above runs in a little under 7, so that’s a 100-fold speed increase!  This is pretty consistent with what I’ve found polling servers over the couple of years I’ve been playing with Start-Parallel – things that would take a morning or an afternoon run in a couple of minutes.

You can install it from the PowerShell Gallery. Some tips:

  • Get-ListOfComputers | Start-Parallel Get-ComputerSettings.ps1 
    works better than
    $x = Get-ListOfComputers ; Start-Parallel -InputObject $x -Command Get-ComputerSettings.ps1
    if Get-ListOfComputers is slow, we will probably have the results for the first computer(s) before we have been told the last one on the list to check.    
  • Don’t hit the same service with many requests in parallel – unless you want to mount a denial of service attack.
  • Remember that RunSpaces don’t share anything – the parallel RunSpaces won’t load your profile, or inherit anything from the session which launches them. And there is no guarantee that every module out there behaves as expected if run in multiple RunSpaces simultaneously. In particular, if “QuickPing” is defined in the same PS1 file which runs Start-Parallel, then Start-Parallel is defined in the global scope and can’t see QuickPing in the script scope. The workaround for this is to use
    Start-Parallel -Scriptblock ${Function:\QuickPing}
  • Some commands by their nature specify a computer. For others it is easier to define a script block inside another script block (or a function) which takes a computer name as a parameter and runs
    Invoke-Command -ComputerName $computer -ScriptBlock $InnerScriptBlock
    (there is a sketch of this pattern just after this list).
  • I don’t recommend running Start-Parallel inside itself, but based on very limited testing it does appear to work.
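For the computer-name case mentioned in the tips above, the outer/inner script block arrangement might look like this – a sketch only: Get-ListOfComputers is the placeholder command from earlier, and the inner block's Get-Service call simply stands in for the real work:

$OuterScriptBlock = {
    param ($Computer)
    # The inner block holds the work to run on the target machine
    $InnerScriptBlock = { Get-Service -Name Spooler }
    Invoke-Command -ComputerName $Computer -ScriptBlock $InnerScriptBlock
}
Get-ListOfComputers | Start-Parallel -Scriptblock $OuterScriptBlock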

You can install it by running Install-Module -Name Start-parallel

 

November 30, 2016

Powershell Piped Parameter Peculiarities (and a Palliative pattern!)

Filed under: Uncategorized — jamesone111 @ 7:33 am

Writing some notes before sharing a PowerShell module, I did a quick fact check and rediscovered a hiccup with piped parameters – and (eventually) remembered writing a simplified script to show the problem, 3 years ago as it turns out. The script appears below: it has four parameter sets and all it does is tell us which parameter set was selected. There are four parameters: A is in all 4 sets, B is in sets 2, 3 and 4, C is only in set 3 and D is only in set 4. I’m not really a fan of parameter sets but they help IntelliSense to remove choices which don’t apply.

function test { 
[CmdletBinding(DefaultParameterSetName="PS1")]
param (  [parameter(Position=0, ValueFromPipeLine=$true)]
         $A,
         [parameter(ParameterSetName="PS2")]
         [parameter(ParameterSetName="PS3")]
         [parameter(ParameterSetName="PS4")]
         $B,
         [parameter(ParameterSetName="PS3", Mandatory)]
         $C,
         [parameter(ParameterSetName="PS4", Mandatory)]
         $D
)
$PSCmdlet.ParameterSetName
}

So let’s check out what comes back for different parameter combinations:
> test  1
PS1

No parameters or parameter A only gives the default parameter set. Without parameter C or D it can’t be set 3 or 4, and with no parameter B it isn’t set 2 either.

> test 1 -b 2
PS2
Parameters A & B or parameter B only gives parameter set 2, – having parameter B it must be set 2,3 or 4 and but 3 & 4 can be eliminated because C and D are missing. 

> test 1 -b 2 –c 3 
PS3

Parameter C means it must be set 3 (and D means it must be set 4); so let’s try piping the input for parameter A
> 1 | test 
PS1
> 1 | test  -b 2 -c 3
PS3

So far it’s as we’d expect.  But then something goes wrong.
> 1 | test  -b 2
Parameter set cannot be resolved using the specified named parameters

Eh? If data is being piped in, PowerShell no longer infers a parameter set from the absent mandatory parameters. Which seems like a bug. And I thought about it: why would piping something change what you can infer about a parameter not being on the command line? Could it be uncertainty over whether values could come from properties of the piped object? I thought I’d try this hunch
   [parameter(ParameterSetName="PS3", Mandatory,ValueFromPipelineByPropertyName=$true)]
  $C,
  [parameter(ParameterSetName="PS4", Mandatory,ValueFromPipelineByPropertyName=$true)]
  $D

This does the trick – though I don’t have a convincing reason why two places not providing the values works better than one (in fact that initial hunch doesn’t seem to stand up to logic). This (mostly) solves the problem – there could be some odd results if parameter D was named “length” or “path” or anything else commonly used as a property name. I also found in the “real” function that adding ValueFromPipelineByPropertyName to too many parameters – non-mandatory ones – caused PowerShell to think a set had been selected and then complain that one of the mandatory values was missing from the piped object. So just adding it to every parameter isn’t the answer.
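For completeness: with the attributes added to C and D as shown above, the call which previously failed resolves the way the elimination logic predicts – only B is supplied, so it lands in set 2.
> 1 | test -b 2
PS2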

November 19, 2016

Format-XML on the PowerShell Gallery

Filed under: Powershell — jamesone111 @ 8:08 pm

In the last post, I spoke about those bits of PowerShell we carry around and never think to share. Ages ago I wrote a function named “Format-XML” which “pretty prints” XML with nice indents. I’ve passed it on to a few people over the years -  it’s been included as a “helper” in various modules – but I hadn’t published it on its own.

I’ve got that nagging feeling  I should be crediting someone for providing the original bits but I’ve long since lost track of who. In Britain people sometimes talk about “Trigger’s broom” which classicists tend to call the Ship of Theseus – if you change a part it’s still the same thing, right? But after every part has been changed? That’s even more true of the “SU” script which will be the subject of a future post but in that case I’ve kept track of its origins.

Whatever… Format-XML is on the PowerShell Gallery – you can click Show under FileList on that page to look at the code, or use PowerShellGet (see the Gallery homepage for details) to install it, using Install-Script -Name format-xml. The licence is chosen to allow you to incorporate it into anything you want with no strings attached.

Then you can load it with .  format-xml.ps1 – that leading “.” matters … and run it with
Format-XML $MyXML
or $MyXML | Format-XML
Where $MyXML is any of

  • An XML object
  • Some text in XML format
  • The name of a file which contains XML, or
  • A file object where the file contains XML
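The published script handles all of those inputs; purely as an illustration of the pretty-printing idea (this is not the gallery code, and it only covers the XML-object case), an XmlWriter with Indent turned on does the work:

function Format-XmlSketch {
    param ([xml]$Xml)
    $settings        = New-Object -TypeName System.Xml.XmlWriterSettings
    $settings.Indent = $true
    $sb     = New-Object -TypeName System.Text.StringBuilder
    $writer = [System.Xml.XmlWriter]::Create($sb, $settings)
    $Xml.WriteTo($writer)    # write the document through the indenting writer
    $writer.Flush()
    $writer.Close()
    $sb.ToString()
}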

Incidentally, if you have stuff to share on the PowerShell Gallery the sign-up process is quick, and the PowerShellGet module has an Update-ScriptFileInfo command to set the necessary metadata; then Publish-Script puts the script into the gallery – it couldn’t be easier.
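That publishing flow looks roughly like this – the path, version and API key are placeholders:

Update-ScriptFileInfo -Path .\format-xml.ps1 -Version 1.0.1 -Description "Pretty-prints XML"
Publish-Script        -Path .\format-xml.ps1 -NuGetApiKey $MyGalleryApiKey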

November 13, 2016

One of those “everyday” patterns in PowerShell – splitting a list

Filed under: Powershell — jamesone111 @ 9:16 pm

For a PowerShell enthusiast, the gig I’ve been on for the last few weeks is one of those “Glass Half Full/Glass Half Empty” situations: the people I’m working with could do a lot more with PowerShell than they are (half empty) but that’s an opportunity to make things better (half full). A pattern which I take for granted took on practically life-changing powers for a couple of my team this week…. 

We had to move some … let’s just say “things”. My teammates know they run Move-Thing Name Destination, but they had been mailed several lists with maybe 100 things to move in each one. Running 100 command lines is going to be a chore. So I gave them this
@"
PASTE
YOUR
LIST
HERE
"@
   -Split  "\s*[\r\n]+\s*"  | ForEach-Object { Move-thing $_ "Destination"}

Text which is wrapped in @"<newline> and <newline> "@ is technically called a "here string" but to most people it is just a way to have a multiline string.  So pasting a list of items between the quotes is trivial, but the next bit looks like some magic spell …
PowerShell’s -split operator takes a regular expression and splits the text where it finds a match (and throws the matching bit away). In regex, \r is carriage return and \n is newline, so [\r\n] is “either return or newline”, and [\r\n]+ means “at least one of the line-break characters, but any number in any order”. I usually use -split this way, but here we found the lists often included spaces and tabs – adding \s* at the beginning and end says “preceded by / followed by any number of white-space characters – even zero”.
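A quick way to see the pattern at work (illustrative input only – the `r`n and `t escapes stand in for the line breaks and tabs that come with a pasted list):

"one `r`n two`t`r`n`r`n three" -split "\s*[\r\n]+\s*"
one
two
three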

So the multiline string is now a bunch of discrete items. The command we want to run doesn’t always need to be in a foreach {} – text piped in to many commands becomes the right parameter like this 
@"
List
"@
   -Split  "\s*[\r\n]+\s*"  | Get-Thing | Format-Table
But a foreach {} will always work, even if it is cumbersome.

I think lots of us have these ready-made patterns – as much as anything this post is a call to think about ones you might share. It was nice to pass this one on and hear the boss’s surprise when one of the junior guys told him everything was done.

July 1, 2016

Just enough admin and constrained endpoints. Part 2: Startup scripts

Filed under: Uncategorized — jamesone111 @ 1:36 pm

In part 1 I looked at endpoints and their role in building your own JEA solution, and said applying constraints to endpoints via a startup script does these things:

  • Loads modules
  • Hides cmdlets, aliases and functions from the user.
  • Defines which scripts and external executables may be run
  • Defines proxy functions to wrap commands and modify their functionality
  • Sets the PowerShell language mode, to further limit the commands which can be run in a session, and prevent new ones being defined.

The endpoint is a PowerShell RunSpace running under its own user account (ideally a dedicated account), and applying the constraints means a user connecting to the endpoint can do only a carefully controlled set of things. There are multiple ways to set up an endpoint; I prefer to do it using a start-up script, and below is the script I used in a recent talk on JEA. It covers all the points and works, but being an example its scope is extremely limited:

$Script:AssumedUser  = $PSSenderInfo.UserInfo.Identity.name
if ($Script:AssumedUser) {
    Write-EventLog -LogName Application -Source PSRemoteAdmin -EventId 1 -Message "$Script:AssumedUser, Started a remote Session"
}
# IMPORT THE COMMANDS WE NEED
Import-Module -Name PrintManagement -Function Get-Printer

#HIDE EVERYTHING. Then show the commands we need and add Minimum functions
if (-not $psise) { 
    Get-Command -CommandType Cmdlet,Filter,Function | ForEach-Object  {$_.Visibility = 'Private' }
    Get-Alias                                       | ForEach-Object  {$_.Visibility = 'Private' }
    #To show multiple commands put the name as a comma separated list 
    Get-Command -Name Get-Printer                   | ForEach-Object  {$_.Visibility = 'Public'  } 

    $ExecutionContext.SessionState.Applications.Clear()
    $ExecutionContext.SessionState.Scripts.Clear()

    $RemoteServer = [System.Management.Automation.Runspaces.InitialSessionState]::CreateRestricted(
                        [System.Management.Automation.SessionCapabilities]::RemoteServer)
    $RemoteServer.Commands.Where{($_.Visibility -eq 'public') -and ($_.CommandType -eq 'Function') } |
        ForEach-Object { Set-Item -Path "Function:\$($_.Name)" -Value $_.Definition }
}

#region Add our functions and business logic
function Restart-Spooler {
<#
.Synopsis
    Restarts the Print Spooler service on the current Computer
.Example
    Restart-Spooler
    Restarts the spooler service, and logs who did it  
#>

    Microsoft.PowerShell.Management\Restart-Service -Name "Spooler"
    Write-EventLog -LogName Application -Source PSRemoteAdmin -EventId 123 -Message "$Script:AssumedUser, restarted the spooler"
}
#endregion
#Set the language mode
if (-not $psise) {$ExecutionContext.SessionState.LanguageMode = [System.Management.Automation.PSLanguageMode]::NoLanguage}

Logging
Any action taken from the endpoint will appear to be carried out by the privileged Run As account, so the script needs to log the name of the user who connects and runs commands. So the first few lines of the script get the name of the connected user and log the connection: I set up PSRemoteAdmin as a source in the event log by running
New-EventLog -Source PSRemoteAdmin -LogName application

Then the script moves on to the first bullet point in the list at the start of this post: loading any modules required; for this example, I have loaded PrintManagement. To make doubly sure that I don’t give access to unintended commands, Import-Module is told to load only those that I know I need.

Private functions (and cmdlets and aliases)
The script hides the commands which we don’t want the user to have access to (here we’ll assume that means everything). You can try the following in a fresh PowerShell session (don’t use one with anything you want to keep!)

function jump {param ($path) Set-Location -Path $path }
(Get-Command set-location).Visibility = "Private"
cd \
This defines jump as a function which calls Set-Location – functionally it is the same as the alias CD. Next we hide Set-Location and try to use CD, but this returns an error
cd : The term 'Set-Location' is not recognized
But Jump \ works: making something private stops the user calling it from the command line but allows it to be called in a Function. To stop the user creating their own functions the script sets the language mode as its final step 

To allow me to test parts of the script, it doesn’t hide anything if it is running in the PowerShell ISE, so the blocks which change the available commands are wrapped in  if (-not $psise) {}. Away from the ISE the script hides internal commands first. You might think that Get-Command could return aliases to be hidden, but in practice this causes an error. Once everything has been made private, the script takes a list of commands, separated with commas, and makes them public again (in my case there is only one command in the list). Note that the script can see private commands and make them public, but at the PowerShell prompt you can’t see a private command, so you can’t change it back to being public.

Hiding external commands comes next. If you examine $ExecutionContext.SessionState.Applications and $ExecutionContext.SessionState.Scripts you will see that they are both normally set to “*”; they can contain named scripts or applications, or be empty. You can try the following in an expendable PowerShell session

$ExecutionContext.SessionState.Applications.Clear()
ping localhost
ping : The term 'PING.EXE' is not recognized as the name of a cmdlet function, script file, or operable program.
PowerShell found PING.EXE but decided it wasn’t an operable program.  $ExecutionContext.SessionState.Applications.Add("C:\Windows\System32\PING.EXE") will enable ping, but nothing else.

So now the endpoint is looking pretty bare: it only has one available command – Get-Printer. We can’t get a list of commands, or exit the session, and in fact PowerShell looks for “Out-Default”, which has also been hidden. This is a little too bare; we need to add constrained versions of some essential commands. While the steps to hide commands can be discovered inside PowerShell if you look hard enough, the steps to put back the essential commands need to come from documentation. In the script, $RemoteServer gets definitions and creates proxy functions for:

Clear-Host   
Exit-PSSession
Get-Command  
Get-FormatData
Get-Help     
Measure-Object
Out-Default  
Select-Object

I’ve got a longer explanation of proxy functions here, the key thing is that if PowerShell has two commands with the same name, Aliases beat Functions, Functions beat Cmdlets, Cmdlets beat external scripts and programs. “Full” Proxy functions create a steppable pipeline to run a native cmdlet, and can add code at the begin stage, at each process stage for piped objects and at the end stage, but it’s possible to create much simpler functions to wrap a cmdlet and change the parameters it takes; either adding some which are used by logic inside the proxy function, removing some or applying extra validation rules. The proxy function PowerShell provides for Select-Object only supports two parameters: property and InputObject, and property only allows 11 pre-named properties. If a user-callable function defined for the endpoint needs to use the “real” Select-Object – it must call it with a fully qualified name: Microsoft.PowerShell.Utility\Select-Object (I tend to forget this, and since I didn’t load these proxies when testing in the ISE, I get reminded with a “bad parameter” error the first time I use the command from the endpoint).  In the same way, if the endpoint manages active directory and it creates a Proxy function for Get-ADUser, anything which needs the Get-ADUser cmdlet should specify the ActiveDirectory module as part of the command name.

By the end of the first if … {} block the basic environment is created. The next region defines functions for additional commands; these will fall mainly into two groups: proxy functions, as I’ve just described, and functions which I group under the heading of business logic. The endpoint I was creating had “Initialize-User” which would add a user to AD from a template, give them a mailbox, set their manager and other fields which appear in the directory, give them a phone number, enable them for Skype-for-Business with Enterprise Voice and set up Exchange voice mail, all in one command. How many proxy and business-logic commands there will be, and how complex they are, both depend on the situation; and some commands – like Get-Printer in the example script – might not need to be wrapped in a proxy at all.
For the example I’ve created a Restart-Spooler command. I could have created a proxy to wrap Restart-Service and only allowed a limited set of services to be restarted. Because I might still do that, the function uses the fully qualified name of the hidden Restart-Service cmdlet, and I have also made sure the function writes information to the event log saying what happened. For a larger system I use a 3-digit event ID where the first digit indicates the type of object impacted (1xx for users, 2xx for mailboxes and so on) and the next two what was done (x01 for added, x02 for changed a property).
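If I did wrap Restart-Service, a minimal constrained proxy might look like the sketch below – the allowed-service list and event ID are purely illustrative, and this is not part of the original endpoint script:

function Restart-AllowedService {
    param (
        [ValidateSet("Spooler","W32Time")]   # illustrative allow-list of restartable services
        [Parameter(Mandatory=$true)]
        [string]$Name
    )
    # Call the hidden cmdlet by its module-qualified name, and log who restarted what
    Microsoft.PowerShell.Management\Restart-Service -Name $Name
    Write-EventLog -LogName Application -Source PSRemoteAdmin -EventId 102 -Message "$Script:AssumedUser, restarted the $Name service"
}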

The final step in the script is to set the language mode. There are four possible language modes. Full Language is what we normally see; Constrained Language limits calling methods and changing properties to certain allowed .NET types – the Math type isn’t specifically allowed, so [System.Math]::pi will return the value of pi, but [System.Math]::Pow(2,3) causes an error saying you can’t invoke that method, and the SessionState type isn’t on the allowed list either, so trying to change the language back will say “Property setting is only allowed on core types”. Restricted Language doesn’t allow variables to be set and doesn’t allow access to members of an object (i.e. you can’t look at individual properties, call methods, or access individual members of an array), and certain variables (like $pid) are not accessible. No Language stops us even reading variables.

Once the script is saved it is a question of connecting to the endpoint to test it. In part one I showed setting up the endpoint like this
$cred = Get-Credential
Register-PSSessionConfiguration -Name "RemoteAdmin"       -RunAsCredential $cred `
                                -ShowSecurityDescriptorUI
-StartupScript 'C:\Program Files\WindowsPowerShell\EndPoint.ps1'
The start-up script will be read from the given path for each connection, so there is no need to do anything to the Session configuration when the script changes; as soon as the script is saved to the right place I can then get a new session connecting to the “RemoteAdmin” endpoint, and enter the session. Immediately the prompt suggests something isn’t normal:

$s = New-PSSession -ComputerName localhost -ConfigurationName RemoteAdmin
Enter-PSSession $s
[localhost]: PS>

PowerShell has a prompt function, which has been hidden. If I try some commands, I quickly see that the session has been constrained

[localhost]: PS> whoami
The term 'whoami.exe' is not recognized…

[localhost]: PS> $pid
The syntax is not supported by this runspace. This can occur if the runspace is in no-language mode...

[localhost]: PS> dir
The term 'dir' is not recognized ….

However the commands which should be present are present. Get-Command works and shows the others

[localhost]: PS> get-command
CommandType  Name                    Version    Source
-----------  ----                    -------    ------
Function     Exit-PSSession
Function     Get-Command
Function     Get-FormatData
Function     Get-Help
Function     Get-Printer                 1.1    PrintManagement                                                                                        
Function     Measure-Object
Function     Out-Default
Function     Restart-Spooler
Function     Select-Object

We can try the following to show how the Select-object cmdlet has been replaced with a proxy function with reduced functionality:
[localhost]: PS> get-printer | select-object -first 1
A parameter cannot be found that matches parameter name 'first'.

So it looks like all the things which need to be constrained are constrained. If the functions I want to deliver – Get-Printer and Restart-Spooler – work properly, I can create a module using
Export-PSSession -Session $s -OutputModule 'C:\Program Files\WindowsPowerShell\Modules\remotePrinters' -AllowClobber -force
(I use -force and -allowClobber so that if the module files exist they are overwritten, and if the commands have already been imported they will be recreated.)  
Because PowerShell automatically loads modules (unless $PSModuleAutoloadingPreference tells it not to), saving the module to a folder listed in $psModulePath means a fresh PowerShell session can go straight to using a remote command;  the first command in a new session might look like this

C:\Users\James\Documents\windowsPowershell> restart-spooler
Creating a new session for implicit remoting of "Restart-Spooler" command...
WARNING: Waiting for service 'Print Spooler (Spooler)' to start...

The message about creating a new session comes from code generated by Export-PSSession which ensures there is always a session available to run the remote command. Get-PSSession will show the session and Remove-PSSession will close it. If a fix is made to the endpoint script which doesn’t change the functions which can be called or their parameters, then removing the session and running the command again will get a new session with the new script. The module is a set of proxies for calling the remote commands, so it only needs to change to support modifications to the commands and their parameters. You can edit the module to add enhancements of your own, and I’ve distributed an enhanced module to users rather than making them export their own. 

You might have noticed that the example script includes comment-based help – eventually there will be client-side tests for the script, written in Pester, and following the logic I set out in Help=Spec=Test, the tests will use any examples provided. When Export-PSSession creates the module, it includes help tags to redirect requests, so running Restart-Spooler -? locally requests help from the remote session; unfortunately requesting help relies on an existing session and won’t create a new one.

June 29, 2016

Just enough admin and constrained endpoints. Part 1 Understanding endpoints.

Filed under: DevOps,Powershell — jamesone111 @ 1:42 pm

Before we can dive into Just Enough Admin and constrained endpoints, I think we need to fill in some of the background on endpoints and where they fit in PowerShell remoting.

When you use PowerShell remoting, the local computer sees a session, which is connected to an endpoint on a remote computer. Originally, PowerShell installations did not enable inbound sessions, but this has changed with newer versions. If the Windows Remote Management service (also known as WSMAN) is enabled, it will listen on port 5985; you can check with
NetStat -a | where {$_ -Match 5985}
If WSMAN is not listening you can use the Enable-PSRemoting cmdlet to enable it.

With PS remoting enabled you can try to connect. If you run
$s = New-PSSession -ComputerName localhost
from a non-elevated PowerShell session, you will get an access denied error, but from an elevated session it should run successfully. The reason for this is explained later. When the command is successful, $s will look something like this:
 Id Name     ComputerName State  ConfigurationName    Availability
 -- ----     ------------ -----  -----------------    ------------
  2 Session2 localhost    Opened Microsoft.PowerShell    Available

We will see the importance of ConfigurationName later as well. The Enter-PSSession cmdlet switches the shell from talking to the local session to talking to a remote one: running
Enter-PSSession $s
will change the prompt to something like this
[localhost]: PS C:\Users\James\Documents>
showing where the remote session is connected: Exit-PSSession returns to the original (local) session; you can enter and exit the session at will, or create a temporary session on demand, by running
Enter-PsSession -ComputerName LocalHost

The Get-PsSession cmdlet shows a list of sessions and will show that there is no session left open after exiting an “on-demand” session. As well as interacting with a session you can use Invoke-command to run commands in the session, for example
Invoke-Command -Session $s -ScriptBlock {Get-Process -id $pid}
Handles NPM(K) PM(K) WS(K) VM(M) CPU(s)   Id SI ProcessName PSComputerName
------- ------ ----- ----- ----- ------   -- -- ----------- -------------- 
    547     26 61116 77632 ...45   0.86 5788 0  wsmprovhost      localhost

At first sight this looks like a normal process object, but it has an additional property, "PSComputerName". In fact, a remote process is represented by a different type of object. Commands in remote sessions might return objects which are not recognised on the local computer. So the object is serialized – converted to a textual representation – sent between sessions, and de-serialized back into a custom object. There are two important things to note about this.

  1. De-serialized objects don’t have Methods or Script Properties. Script properties often will need access to something on the originating machine – so PowerShell tries to convert them to Note Properties. A method can only be invoked in the session where the object was created – not one which was sent a copy of the object’s data.
  2. The object type changes. The .getType() method will return PsObject, and the PSTypeNames property says the object is a Deserialized.System.Diagnostics.Process; PowerShell uses PSTypenames to decide how to format an object and will use rules defined for type X to format a Deserialized.X object.
    However, testing the object type with -is [x] will return false, and a function which requires a parameter to be of type X will not accept a Deserialized.X. In practice this works as a safety-net, if you want a function to be able to accept remote objects you should detect that they are remote objects and direct commands to the correct machine.

Invoke-Command allows commands which don’t support a -ComputerName parameter (or equivalent) to be targeted at a different machine, and also allows commands which aren’t available on the local computer to be used remotely. PowerShell provides two additional commands to make the process of using remote modules easier. Import-PSSession creates a temporary module which contains proxies for all the cmdlets and functions in the remote session that don’t already exist locally; this means that instead of having to write
Invoke-Command -Session $s -ScriptBlock {Get-ADUser}
the Get-ADUser command works much as it would with a locally installed Active Directory module. Using Invoke-Command will return a Deserialized AD user object and the local copy of PowerShell will fall back to default formatting rules to display it; but when the module is created it includes a format XML file describing how to format additional objects.
Import-PSSession adds commands to a single session using a temporary module: its partner Export-PSSession saves a module that can be imported as required – running commands from such a module sets up the remote session and gives the impression that the commands are running locally.
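As a rough illustration of that pair in use (the computer name is a placeholder, and this assumes the remote machine has the ActiveDirectory module loaded):

$s = New-PSSession -ComputerName DC01                  # placeholder computer name
Import-PSSession -Session $s -Module ActiveDirectory   # generates local proxy functions and format data
Get-ADUser -Filter {Name -like "James*"}                # runs in the remote session via the proxy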

What about the configuration name and the need to log on as Admin?

WSMAN has multiple end points which sessions can connect to, the command Get-PSSessionConfiguration lists them – by default the commands which work with PS Sessions connect to the end point named "microsoft.powershell", but the session can connect to other end points depending on the tasks to be carried out.
Get-PSSessionConfiguration shows that by default the "microsoft.powershell" endpoint has StartUpScript and RunAsUser properties which are blank, and a Permission property of
NT AUTHORITY\INTERACTIVE        AccessAllowed,
BUILTIN\Administrators          AccessAllowed,
BUILTIN\Remote Management Users AccessAllowed

This explains why we need to be an administrator (or in the group “Remote Management Users”) to connect. It is possible to modify the permissions with
Set-PSSessionConfiguration -Name "microsoft.powershell" -ShowSecurityDescriptorUI

When Start-up script and Run-As User are not set, the session looks like any other PowerShell session and runs as the user who connected – you can see the user name by running whoami or checking the $PSSenderInfo automatic variable in the remote session.

Setting the Run-As User allows the session to run with more privileges than are granted to the connecting user: to prevent this user running amok, the endpoint is Constrained – in simpler terms, we put limits on what can be done in that session. Typically, we don’t want the user to have access to every command available on the remote computer, and we may want to limit the parameters which can be used with those that are allowed. The start-up script does the following to set up the constrained environment:

  • Loads modules
  • Defines proxy functions to wrap commands and modify their functionality
  • Hides cmdlets, aliases and functions from the user.
  • Defines which scripts and external executables may be run
  • Sets the PowerShell language mode, to further limit the commands which can be run in a session, and prevent new ones being defined.

If the endpoint is to work with Active Directory, for example, it might hide Remove-ADGroupMember. (or import only selected commands from the AD module); it might use a proxy function for Add-ADGroupMember so that only certain groups can be manipulated. The DNS Server module might be present on the remote computer but the Import-Module cmdlet is hidden so there is no way to load it.

Hiding or restricting commands doesn’t stop people doing the things that their access rights allow. An administrator can use the default endpoint (or start a remote desktop session) and use the unconstrained set of tools. The goal is to give out fewer admin rights and give people Just Enough Admin to carry out a clearly defined set of tasks: so the endpoint runs as a privileged account (even a full administrator account), but other, less privileged accounts are allowed to connect and run the constrained commands that it provides.
Register-PSSessionConfiguration sets up a new endpoint and Set-PSSessionConfiguration modifies an existing one; the same parameters work with both – for example

$cred = Get-Credential
Register-PSSessionConfiguration -Name "RemoteAdmin" `
                                -RunAsCredential $cred `
                                -ShowSecurityDescriptorUI `
                                -StartupScript 'C:\Program Files\WindowsPowerShell\EndPoint.ps1'
The -ShowSecurityDescriptorUI switch pops up a permissions dialog box – to set permissions non-interactively it is possible to use -SecurityDescriptorSddl and specify the information using SDDL but writing SDDL is a skill in itself.

With the endpoint defined, the next part is to create the endpoint script, and I’ll cover that in part 2.

June 27, 2016

Technical Debt and the four most dangerous words for any project.

Filed under: Uncategorized — jamesone111 @ 9:15 am

I’ve been thinking about technical debt. I might have been trying to avoid the term when I wrote Don’t swallow the cat, or more likely I hadn’t heard it, but I was certainly describing it – to adapt Wikipedia’s definition, it is “the future work that arises when something that is easy to implement in the short run is used in preference to the best overall solution”. However it is not confined to software development as Wikipedia suggests.
“Future work” can come from bugs (either known, or yet to be uncovered because of inadequate testing), design kludges which are carried forward, dependencies on out of date software, documentation that was left unwritten… and much more besides.

The cause of technical debt is simple: People won’t say “I (or we) cannot deliver what you want, properly, when you expect it”.
“When you expect it” might be the end of a Scrum Sprint, a promised date or “right now”. We might be dealing with someone who asks so nicely that you can’t say “No” or the powerful ogre to whom you dare not say “No”. Or perhaps admitting “I thought I could deliver, but I was wrong” is too great a loss of face. There are many variations.

I’ve written before about “What you measure is what you get” (WYMIWIG); it’s also a factor. In IT we measure success by what we can see working. Before you ask “How else do you judge success?”, technical debt is a way to cheat the measurement – things are seen to be working before all the work is done. To stretch the financial parallel, if we collect full payment without delivering in full, our accounts must cover the undelivered part – it is a liability like borrowing or unpaid invoices.

Imagine you have a deadline to deliver a feature. (Feature could be a piece of code, or an infrastructure service however small). Unforeseeable things have got in the way. You know the kind of things: the fires which apparently only you know how to extinguish, people who ask “Can I Borrow You”, but should know they are jeopardizing your ability to meet this deadline, and so on.
Then you find that doing your piece properly means fixing something that’s already in production. But doing that would make you miss the deadline (as it is you’re doing less testing than you’d like and documentation will have to be done after delivery). So you work around the unfixed problem and make the deadline. Well done!
Experience teaches us that making the deadline is rewarded, even if you leave a nasty surprise for whoever comes next – they must make the fix AND unpick your workaround. If they are up against a deadline they will be pushed to increase the debt. You can see how this ends up in a spiral: like all debt, unless it is paid down, it increases in future cycles.

The Quiet Crisis unfolding in Software Development has a warning to beware of high performers, they may excel at the measured things by cutting corners elsewhere. It also says watch out for misleading metrics – only counting “features delivered” means the highest performers may be leaving most problems in their wake. Not a good trait to favour when identifying prospective managers.

Sometimes we can say “We MUST fix this before doing anything else.”, but if that means the whole team (or worse its manager) can’t do the thing that gets rewarded then we learn that trying to complete the task properly can be unpopular, even career limiting. Which isn’t a call to do the wrong thing: some things can be delayed without a bigger cost in the future; and borrowing can open opportunities that refusing to ever take on any debt (technical or otherwise) would deny us. But when the culture doesn’t allow delivery plans to change, even in the face of excessive debt, it’s living beyond its means and debt will become a problem.

We praise delivering on-time and on-budget, but if capacity, deadline and deliverables are all fixed, only quality is variable. Project management methodologies are designed to make sure that all these factors can be varied, and give project teams a route to follow if they need to vary by too great a margin. But a lot of work is undertaken without this kind of governance. Capacity is what can be delivered properly in a given time by the combination of people, skills, equipment and so on, each of which has a cost. Increasing headcount is only one way to add capacity, but if you accept that adding people to a late project makes it later, then it needs to be done early. When we must demonstrate delivery beyond our capacity, it is technical debt that covers the gap.

Forecasting is imprecise, but it is rare to start with a plan we don’t have the capacity to deliver. I think another factor causes deadlines which were reasonable to end up creating technical debt.

The book The Phoenix Project has a gathered a lot of fans in the last couple of years, and one of its messages is that Unplanned work is the enemy of planned work. This time management piece separates Deep work (which gives satisfaction and takes thought, energy, time and concentration) from Shallow work (the little stuff). We can do more of value by eliminating shallow work and the Quiet Crisis article urges managers to limit interruptions and give people private workspaces, but some of each day will always be lost to email, helping colleagues and so on.

But Unplanned work is more than workplace noise. Some comes from Scope Creep, which I usually associate with poor specification, but unearthing technical debt expands the scope, forcing us to choose between more debt and late delivery. But if debt is out in the open then the effort to clear it – even partially – can be in-scope from the start.
Major incidents can’t be planned and leave no choice but to stop work and attend to them. But some diversions are neither noise, nor emergency. “Can I Borrow You?” came top in a list of most annoying office phrases and “CIBY” serves as an acronym for a class of diversions which start innocuously. These are the four dangerous words in the title.

The Phoenix Project begins with the protagonist being made CIO and briefed “Anything which takes focus away from Phoenix is unacceptable – that applies to the whole company”. For most of the rest of the book things are taking that focus. He gets to contrast IT with manufacturing, where a coordinator accepts or declines new work depending on whether it would jeopardize any existing commitments. Near the end he says to the CEO “Are we even allowed to say no? Every time I’ve asked you to prioritize or defer work on a project, you’ve bitten my head off. …[we have become] compliant order takers, blindly marching down a doomed path”. And that resonates. Project steering boards (or similarly named committees) exist to assign capacity to some projects and disappoint others. Without one – or if it is easy to circumvent – we end up trying to deliver everything and please everyone; “No” and “What should I drop?” are answers we don’t want to give, especially to those who’ve achieved their positions by appearing to deliver everything, thanks to technical debt.

Generally, strategic tasks don’t compete to consume all available resources. People recognise these should have documents covering

  • What is it meant to do, and for whom? (the specification / high level design)
  • How does it do it? (Low level design, implementation plan, user and admin guides)
  • How do we know it does what it is meant to? (test plan)

But “CIBY” tasks are smaller, tactical things; they often lack specifications: we steal time for them from planned work assuming we’ll get them right first time, but change requests are inevitable. Without a spec, there can be no test plan: yet we make no allowance for fixing bugs. And the work “isn’t worth documenting”, so questions have to come back to the person who worked on it.  These tasks are bound to create technical debt of their own and they jeopardize existing commitments pushing us into more debt.

Optimistic assumptions aren’t confined to CIBY tasks. We assume strategic tasks will stay within their scope: we set completion dates using assumptions about capacity (the progress for each hour worked) and about the number of hours focused on the project each day. Optimism about capacity isn’t a new idea, but I think planning doesn’t allow for shallow / unplanned work – we work to a formula like this:
TIME = SCOPE / CAPACITY
In project outcomes, debt is a fourth variable and time lost to distracting tasks a fifth. A better formula would look like this
DELIVERABLES = (TIME * CAPACITY) – DISTRACTIONS + DEBT  

Usually it is the successful projects which get a scope that properly reflects the work needed, stick to it, allocate enough time and capacity and hold on to it. It’s simple in theory; projects which go off the rails don’t do it in practice, and fail to adjust. The Phoenix Project told how failing to deliver “Phoenix” put the company at risk. After the outburst I quoted above, the CIO proposes putting everything else on hold, and the CEO, who had demanded 100% focus on Phoenix, initially responds “You must be out of your right mind”. Eventually he agrees, Phoenix is saved and the company with it. The book is trying to illustrate many ideas, but one of them boils down to “the best way to get people to deliver what you want is to stop asking them to deliver other things”.

Businesses seem to struggle to set priorities for IT: I can’t claim to be an expert in solving this problem, but the following may be helpful

Understanding the nature of the work. Jeffrey Snover likes to say “To ship is to choose”. A late project must find an acceptable combination of additional cost, overall delay, feature cuts, and technical debt. If you build websites, technical debt is more acceptable than if you build aircraft. If your project is a New Year’s Eve firework display, delivering without some features is an option, delay is not. Some feature delays incur cost, but others don’t.

Tracking all work: Have a view of what is completed, what is in Progress, what is “up next”, and what is waiting to be assigned time. The next few points all relate to tracking.
Work in progress has already consumed effort but we only get credit when it is complete. An increasing number of tasks in progress may mean people are passing work to other team members faster than their capacity to complete it, or new tasks are interrupting existing ones.
All work should have a specification before it starts. Writing specifications takes time, and “Create specification for X” may be a task in itself.
And yes, I do know that technical people generally hate tracking work and writing specifications. 
Make technical debt visible. It’s OK to split an item and categorize part as completed and the rest as something else. Adding the undelivered part to the backlog keeps it as planned work, and also gives partial credit for partial delivery – rather than credit being all or nothing. It means some credit goes to the work of clearing debt.
And I also know technical folk see “fixing old stuff” as a chore, but not counting it just makes matters worse.
Don’t just track planned work. Treat jobs which jumped the queue, that didn’t have a spec or that displaced others like defects in a manufacturing process – keep the score, and try to drive it down to zero. Incidents and “CIBY” jobs might only be recorded as an afterthought, but you want to see where they are coming from and try to eliminate them at source.

Look for process improvements. If a business is used to lax project management, it will resist attempts to channel all work through a project steering board. Getting stakeholders together in a regular “IT projects meeting” might be easier, but gets the key result (managing the flow of work).

And finally, having grown-up conversations with customers.
Businesses should understand the consequences of pushing for delivery to exceed capacity; which means IT (especially those in management) must be able to deliver messages like these.
“For this work to jump the queue, we must justify delaying something else”
“We are not going to be able to deliver [everything] on time”, perhaps with a follow-up of “We could call it delivered when there is work remaining but … have you heard of technical debt?”

June 1, 2016

A different pitch for Pester

Filed under: DevOps,Powershell,Testing — jamesone111 @ 2:10 pm

If you work with PowerShell but don’t consider yourself to be a developer, then when people get excited by the new(ish) testing framework named Pester you might think “what has that got to do with me?” …
Pester is included with PowerShell 5 and downloadable for older versions, but most things you find written about it are by software testers for software testers – though that is starting to change. This post is for anyone who thinks programs are like sausages: you don’t want to know how either is made.

Let’s consider how we’d give someone some rules to check something physical:
“Here is a description of an elephant
It is a mammal
It is at least 1.5 M tall
It has grey wrinkly skin
It has a long flexible nose”

Simple stuff. Tell someone what you are describing, and make simple statements about it (that’s important – we don’t say “It is a large grey-skinned mammal with a long nose”). Check those statements, and if they are all true you can say “We have one of those”. So let’s do the same in PowerShell, for something we can test programmatically – this example has been collapsed down in the ISE, which shows a couple of “special” commands from Pester

$Connections = Get-NetIPConfiguration | Where-Object {$_.netadapter.status -eq "up" }
Describe "A computer with an working internet connection on my home network" {
    It "Has a connected network interface"  {...}
    foreach ($c in $Connections)            {  
        It "Has the expected Default Gateway on the interface named  '$($C.NetAdapter.InterfaceDescription)' "   {...}
        It "Gets a 'ping' response from the default gateway for      '$($C.NetAdapter.InterfaceDescription)' "   {...} 
        It "Has an IPV4 DNS Server configured on the interface named '$($C.NetAdapter.InterfaceDescription)' "   {...}
    }
    It "Can resolve the DNS Name 'www.msftncsi.com' " {...}
    It "Fetches the expected data from the URI 'http://www.msftncsi.com/ncsi.txt' " {...}
}

So Pester can help with testing ANYTHING; it isn’t just for checking that Program X gives output Y with input Z. Describe, which says what is being tested
Describe "A computer with an working internet connection on my home network" {}
has the steps needed to perform the test inside the braces. Normally PowerShell is easier to read with parameter names included but writing this out in full as
Describe -Name "A computer with an working internet connection on my home network" -Fixture  {}
would make it harder to read, so the norm with Pester is to omit the switches.  
We describe a working connection by saying we know that it has a connected network, it has the right default gateway and so on. The It statements read just like that, with a name and a test inside the braces (again switches are omitted for readability). When expanded, the first one in the example looks like this.

     It "Has a connected network interface"  {
        $Connections.count | Should not beNullOrEmpty
    }

Should is also defined in Pester. It is actually a PowerShell function which goes to some trouble to circumvent normal PowerShell syntax (the PowerShell purist in me doesn’t like that, but I have to remember the famous quote about “A foolish consistency is the hobgoblin of little minds”); the idea is to make the test read more like natural language than programming.
This example has a test that says there should be some connections, and then three tests inside a loop use other variations on the Should syntax.

$c.DNSServer.ServerAddresses -join "," | Should match "\d+\.\d+\.\d+\.\d+"
$c.IPv4DefaultGateway.NextHop          | Should be "192.168.0.1"
{Test-Connection -ComputerName $c.IPv4DefaultGateway.NextHop -Count 1} | Should not throw

You can see Should allows us to check for errors being thrown (or not), empty values (or not), regular expression matches (or not), and values; depending on what happens in the Should, the It command can decide if that test succeeded. When one Should test fails, the script block being run by the It statement stops, so in my example it would be better to combine “has a default gateway” and “gets a ping response” into a single It, but as it stands the script generates output like this:

Describing A computer with an working internet connection on my home network
[+] Has a connected network interface 315ms
[+] Has the expected Default Gateway on the interface named  'Qualcomm Atheros AR956x Wireless Network Adapter'  56ms
[+] Gets a 'ping' response from the default gateway for      'Qualcomm Atheros AR956x Wireless Network Adapter'  524ms
[+] Has an IPV4 DNS Server configured on the interface named 'Qualcomm Atheros AR956x Wireless Network Adapter'  25ms
[+] Can resolve the DNS Name 'www.msftncsi.com'  196ms
[+] Fetches the expected data from the URI 'http://www.msftncsi.com/ncsi.txt'  603ms

Pester gives this nicely formatted output without having to do any extra work  – it can also output the results as XML so we can gather up the results for automated processing. It doesn’t allow us to test anything that couldn’t be tested before – the benefit is it simplifies turning a description of the test into a script that will perform it and give results which mirror the description.
The first example showed how a folding editor (the PowerShell ISE or any of the third-party ones) can present the script so it looks like the basic specification.
Here’s an outline of a test to confirm that a machine had been built correctly – I haven’t filled in the code to test each part.  
Describe "Server 432" {
   It "Is Registered in Active Directory"                 {}
   It "Is has an A record in DNS"                         {}
   It "Responds to Ping at the address in DNS"            {}
   It "Accepts WMI Connections and has the right name"    {}
   It "Has a drive D: with at least 100 GB of free space" {}
   It "Has Dot Net Framework installed"                   {}
}
 
This doesn’t need any PowerShell knowledge: it’s easy to take a plain text file with suitable indents and add the Describes, Its, braces and quote marks – and hand the result to someone who knows how to check DNS from PowerShell and so on; they can fill in the gaps. Even before that is done, the test script still executes.

Describing Server 432
[?] Is Registered in Active Directory 32ms
[?] Has an A record in DNS 13ms
[?] Responds to Ping at the address in DNS 4ms
[?] Accepts WMI Connections and has the right name 9ms
[?] Has a drive D: with at least 100 GB of free space 7ms
[?] Has Dot Net Framework installed 5ms

The test output uses [+] for a successful test, [-] for a failure, [!] for one it was told to skip, and [?] for one which is “pending”, i.e. we haven’t started writing it. 
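When one of those pending gaps is filled in, the result might look like this – a sketch which assumes the DnsClient module’s Resolve-DnsName cmdlet is available and that “Server432” is the name to look up:

   It "Has an A record in DNS" {
       Resolve-DnsName -Name "Server432" -Type A -ErrorAction SilentlyContinue |
           Should not beNullOrEmpty
   }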
I think it is good to start with a relatively simple set of tests and add to them, so for checking the state of a machine: is such-and-such a service present and running, are connections accepted on a particular port, is data returned, and so on.  In fact, whenever we find something wrong which can be tested, it’s often a good idea to add a test for that to the script.

So if you were in any doubt at the start, hopefully you can see now that Pester is just as valuable as a tool for Operational Validation as it is for software testing.

May 31, 2016

Help = Spec = Test

Filed under: Powershell,Testing — jamesone111 @ 2:55 pm

Going back for some years – at least as far as the talks which turned into the PowerShell Deep Dives book – I have told people ”Start Help Early” (especially when you’re writing anything that will be used by anyone else).
In the face of time pressure documentation is the first thing to be cut – but this isn’t a plea to keep your efforts from going out into the world undocumented. 
Help specifies what the command does, and help examples are User Stories – a short plain English description of something someone needs to do.
Recently I wrote something to combine the steps of setting up a Skype for Business user (don’t worry – you don’t need to know S4B to follow the remainder) – the help for one of the commands looked like this

<#
.SYNOPSIS
Sets up a Skype for business user including telephony, conference PIN and Exchange Voice mail
.EXAMPLE
Initialize-CsUser -ID bob@mydomain -PhoneExtension 1234 -pin 2468 -EnterpriseVoice
Enables a pre-existing user, with enterprise voice, determines and grants the correct voice policy,
sets a conferencing PIN, updates the Phone number in AD, and enables voice mail for them in Exchange.
#>

I’ve worked with people who would insist on writing user stories as “Alice wants to provision Bob… …to do this she …”, but the example serves well enough as both help for end users and a specification for one use case: after running the command, user “bob” will

  • Be enabled for Skype-for-Business with Enterprise Voice – including “Phone number to route” and voice policy
  • Have a PIN to allow him to use voice conferencing
  • Have a human readable “phone number to dial”  in AD
  • Have appropriate voice mail on Exchange

The starting point for a Pester test (the Pester testing module ships with PowerShell V5, and is downloadable for earlier versions) is a set of simple statements like this – the thing I love about Pester is that it is so human readable.

Describe "Adding Skype for business, with enterprise voice, to an existing user"  {
### Code to do it and return the results goes here
    It "Enables enterprise voice with number and voice policy" {    }
    It "Sets a conference PIN"                                 {    }
    It "Sets the correct phone number in the directory"        {    }
    It "Enables voice mail"                                    {    }
}

The “doing” part of the test script is the command line from the example (though probably with different values for the parameters).
Each thing we need to check to confirm proper operation is named in an It statement, with the script to test it inside the braces. Once I have my initial working function, user feedback will either add further user stories (help examples), which drive the creation of new tests, or it will refine this user story, leading either to new It lines in an existing test (for example “It Sets the phone number in AD in the correct format”) or to additional tests (for example “It generates an error if the phone number has been assigned to another user”).
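To make that concrete, here is a minimal sketch of how the “doing” part and one of the It blocks might be filled in – assuming the Skype for Business module is loaded and that Get-CsUser returns EnterpriseVoiceEnabled and LineURI properties; the values are illustrative, not the real script.

Describe "Adding Skype for business, with enterprise voice, to an existing user"  {
    # The "doing" part: run the command from the help example, then fetch the user to check
    Initialize-CsUser -ID bob@mydomain -PhoneExtension 1234 -pin 2468 -EnterpriseVoice
    $csUser = Get-CsUser -Identity "bob@mydomain"

    It "Enables enterprise voice with number and voice policy" {
        $csUser.EnterpriseVoiceEnabled | Should Be $true
        $csUser.LineURI                | Should Not BeNullOrEmpty
    }
}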

In my example running the test a second time proves nothing, because the second run will find everything has already been configured, so a useful thing to add to the suite of tests would be something to undo what has just been done. Because help and test are both ways of writing the specification, you can start by writing the specification in the test script – a simplistic interpretation of “Test Driven Development”.  So I might write this

Describe "Removing Skype for business from a user"   {
### Code to do it and return the results goes here       
    It "Disables S4B and removes any voice number"   {    } –Skip
    It "Removes voice mail"                          {    } –Skip
}

The –Skip stops Pester from running tests for functionality which hasn’t been written yet, so they show up as skipped rather than as failures. Instead of making each command a top-level Describe section in the Pester script, each can be a second-level Context section.

Describe "My Skype for business add-ons" {
    Context "Adding Skype for business, with enterprise voice, to an existing user"   {...}
    Context "Removing Skype for business from a user"  {...}
}

So… you can start work by declaring the functions with their help and then writing the code to implement what the help specifies, and finally create a test script based on the Help/Spec OR you can start by writing the specification as the outline of a Pester test script, and as functionality is added, the help for it can be populated with little more than a copy and paste from the test script.
Generally, the higher level items will have a help example, and the lower level items combine to give the explanation for the example. As the project progresses, each of the It commands has its –Skip removed and its test block populated; to-do items show up on the test output as skipped.

Describing My Skype for business add-ons
   Context Adding Skype for business, with enterprise voice, to an existing user

    [+] Sets the phone number to call in the directory 151ms
    [+] Enables enterprise voice with the phone number to route and voice policy  33ms
    [+] Sets a conference PIN  18ms
    [+] Enables voice mail  22ms

   Context Removing Skype for business from a user
    [!] Disables S4B and removes any voice number 101ms
    [!] Removes voice mail 9ms
Tests completed in 347ms
Passed: 4 Failed: 0 Skipped: 2 Pending: 0
 

With larger pieces of work it is possible to use –Skip and an empty script block for an It statement to mean different things (Pester treats an empty script block as “Pending”), so the test output can show me which parts of the project are done, which are unfinished but being worked on, and which aren’t even being thought about at the moment – so it complements other tools for keeping the focus on doing the things that are in the specification. And when someone says “Shouldn’t it be possible to pipe users into the remove command?”, we don’t just go and write the code: we bring the request in as a new help example and a new test, so the specification keeps pace with that way of working.

May 23, 2016

Good and bad validation in PowerShell

Filed under: Powershell — jamesone111 @ 10:35 am
Tags:

I divide input validation into good and bad.

Bad validation on the web makes you hate being a customer of whichever organization put it there. It’s the kind which says “Names can only contain alphabetic characters”, so O’Neill isn’t a valid name.
Credit card companies print numbers in blocks of 4 digits because that’s easier to read and to type – yet how many web sites demand an unbroken string of 16 digits?

Good validation tolerates spaces and punctuation, spots credit card numbers which are too short or don’t checksum properly, and knows the apostrophe needs special handling – although it requires the same care on the way out as on the way in, as this message from Microsoft shows.
And bad validation can be good validation paired with an unhelpful message – for example telling you the new password you chose isn’t acceptable without saying what is.

In PowerShell, parameter declarations can include validation, but keep in mind validation is not automatically good.
Here’s good validation at work: I can write parameters like this. 
     [ValidateSet("None", "Info", "Warning", "Error")]
     [string]$Icon = "error"

PowerShell’s intellisense can complete values for the -Icon parameter, but if I’m determined to put an invalid value in here’s the error I get.
Cannot validate argument on parameter 'Icon'.
The argument "wibble" does not belong to the set "None,Info,Warning,Error" specified by the ValidateSet attribute.
Supply an argument that is in the set and then try the command again.

It might be a bit verbose, but it’s clear what is wrong and what I have to do to put it right. But PowerShell builds its messages from templates, and sometimes dropping in the text from the validation attribute gives something incomprehensible, like this
Cannot validate argument on parameter 'Path'.
The argument "c:" does not match the "^\\\\\S*\\\S*$" pattern.
Supply an argument that matches "^\\\\\S*\\\S*$" and try the command again.

This is trying to use a regular expression to check for a UNC path to a share (\\Server\Share), but when I used it in a conference talk none of the 50 or 60 PowerShell experts in the room could work that out quickly. And people without a grounding in regular expressions have no chance.
Moral: what is being checked is valid, but to get a good message, do the test in the body of the function.
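Here’s a minimal sketch of what I mean by doing the test in the body – the function name is made up for the example, but the regular expression is the one from the message above.

function Set-SharePath {
    param (
        [string]$Path
    )
    # Same check as the ValidatePattern, but with a message the user can act on
    if ($Path -notmatch '^\\\\\S*\\\S*$') {
        throw "'$Path' doesn't look like a UNC path to a share; please use the form \\Server\Share"
    }
    # ... rest of the function goes here
}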

Recently I saw this – or something like it – via a link from Twitter.

function Get-info {
  [CmdletBinding()]
  Param (
          [string]$ComputerName
  )
  Get-WmiObject –ComputerName $ComputerName –Class 'Win32_OperatingSystem'
}

Immediately I can see two things wrong with the parameter.
First is “All parameters must have a type” syndrome. ComputerName is a string, right? Wrong! Get-WmiObject allows an array of strings; most of the time you or I or the person who wrote the example will call it with a single string, but when a comma-separated list is used the “make sure this is a string” validation concatenates the items into a single string.
Moral: if a parameter is passed straight to something else, either copy the type from there or don’t specify a type at all.

And second: the parameter isn’t mandatory and doesn’t have a default, so if we run the function with no parameter it calls Get-WmiObject with a null computer name, which causes an error. I encourage people to get in the habit of setting defaults for parameters.
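Putting those two points together, the parameter block might look something like this – a sketch rather than the original author’s code, with $env:COMPUTERNAME as an assumed sensible default.

function Get-Info {
  [CmdletBinding()]
  Param (
          # Match Get-WmiObject: accept an array of names, and default to the local machine
          [string[]]$ComputerName = $env:COMPUTERNAME
  )
  Get-WmiObject -ComputerName $ComputerName -Class 'Win32_OperatingSystem'
}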

The author of that article goes on to show that you can use a regular expression to validate the input. As I’ve shown already, regular expressions give unhelpful error messages, and writing comprehensive ones can be an art in itself. In the example, the author used
  [ValidatePattern('^\w+$')]
But if I try
Get-info MyMachine.mydomain.com
Back comes a message telling me to
Supply an argument that matches "^\w+$" and try the command again
The author specified only “word” characters (letters and digits), no dots, no hyphens and so on. The regular expression can be fixed, but as it becomes more complicated, the error message grows harder to understand.

He moves on to a better form of validation: PowerShell supports a validation script for parameters, like this
[ValidateScript({ Test-Connection -ComputerName $_ -Quiet -Count 1 })]
This is a better test, because it checks whether the target machine is pingable or not. But it is still let down by a bad error message.
The " Test-Connection -ComputerName $_ -Quiet -Count 1 " validation script for the argument with value "wibble" did not return a result of True.
Determine why the validation script failed, and then try the command again.

In various PowerShell talks I’ve said that a user should not have to understand the code inside a function in order to use the function. In this case the validation code is simple enough that someone with a working knowledge of PowerShell can figure out the problem, but, again, “to get a good message, do the test in the body” seems good advice. In simple form the test would look like this
if (Test-Connection -ComputerName $ComputerName -Quiet -Count 1) {
        Get-WmiObject –ComputerName $ComputerName –Class 'Win32_OperatingSystem'
}
else {Write-Warning "Can't connect to $computername" }

But this doesn’t cope with multiple values in ComputerName – if any of them respond, the code runs for all of them – so it would be better to run
foreach ($c in $ComputerName) {
    if (Test-Connection -ComputerName $c -Quiet -Count 1 ) {
        Get-WmiObject –ComputerName $c –Class 'Win32_OperatingSystem'
    }
    else {Write-Warning "Can't connect to $c"}
}

This doesn’t support using “.” to mean “LocalHost”, which Get-WmiObject accepts – hopefully by now you can see the problem: validation strategies either end up stopping things working which should work, or the validation becomes a significant task in its own right. If a bad parameter can result in damage, then a lot of validation might be appropriate. But this function changes nothing, so there is no direct harm if it fails; and although the validation prevents some failures, it doesn’t guarantee the command will succeed. Firewall rules might allow ping but block an RPC call, or we might fail to logon, and so on. In a function which uses the result of Get-WmiObject we need to check that the result is valid before using it in something else. In other words, validating the output might be better than validating the input.
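As a sketch of what validating the output looks like – not the only way to do it – the loop above could check what comes back from Get-WmiObject instead of pinging first:

foreach ($c in $ComputerName) {
    # Ask for the data and only use the result if we actually got one back
    $os = Get-WmiObject -ComputerName $c -Class 'Win32_OperatingSystem' -ErrorAction SilentlyContinue
    if ($os) { $os }
    else     { Write-Warning "Couldn't get OS information from $c" }
}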

Note that I say “might”: validating the output isn’t always better. Depending on the kind of things you write, validating input might be best most of the time. Think about validation rather than cranking it out while running on autopilot. And remember you have three duties to your users

  • Write help (very simple, comment-based help is fine) for the parameter saying what is acceptable and what is not. Often the act of writing “The computer name must only contain letters” will show you that you have lapsed into Bad validation
  • Make error messages understandable. One which requires the user to read code or decipher a regular expression isn’t, so be wary of using some of the built in validation options.
  • Don’t break things. Work the way the user expects to work. If commands which do similar things take a list of targets, don’t force a single one.
    If “.” works, support it.
    If your code uses SQL syntax where “%” is a wildcard, think about converting “*” to “%”, and doubling up single apostrophes (testing with my own surname is a huge help to me!)
    And if users want to enter redundant spaces and punctuation, it’s your job to remove them (there’s a small sketch of this kind of clean-up below).
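Here’s that small sketch of the kind of clean-up I mean – the card number and surname are made up, and the SQL escaping shown is just the doubling of apostrophes and * to % conversion mentioned above.

$cardNumber = '4929 1234-5678 9012'
$cleanCard  = $cardNumber -replace '[\s-]', ''                 # strip spaces and hyphens before length / checksum checks

$surname    = "O'Neill"
$sqlName    = ($surname -replace "'", "''") -replace '\*', '%' # double apostrophes and turn * wildcards into SQL's %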

September 1, 2014

The start of a new chapter.

Filed under: General musings — jamesone111 @ 7:19 pm

A symbolic moment earlier, I updated my Linked-in profile. From September 1st I am Communications and Collaboration Architect at the MERCEDES AMG PETRONAS Formula One Team.
Excited doesn’t really cover it – even if there are some “new school” nerves too. I’ve spent the last three years working with people I thought the world of, so I’m sad to say goodbye to them; but this is a role I’d accept at almost any company, and it’s at a company where I’d take just about any role – I’ve joked that if they had offered me a job as senior floor sweeper I’d have asked “how senior?” People I’ve told have said “Pretty much your dream job then?”. Yes, in a nutshell.

F1 is a discipline where you can lose a competitive advantage through careless talk: anything to do with the car, or comings and goings at the factory, is obviously off limits. Pat Symonds of Williams said in a recent interview that the intellectual property (IP) in racing “is not the design of our front wing endplate, you can take a photo of that. The IP is the way we think, the way we operate, the way we do things.” The first lesson of induction at Microsoft was ‘Never compromise the IP’ (and I learnt that IP wasn’t just the software, but included the processes used in Redmond). So although it’s been part of my past jobs to talk about what the company was doing, at Mercedes it won’t be. I find F1 exciting – more so this season than the last few – and it’s not really possible to be excited but not have opinions about the sport, although things I’ve said in the past don’t all match my current opinion: the James Hunt / Niki Lauda battle of 1976 is almost my first F1 memory, and if I still thought of Lauda as the enemy I wouldn’t work for a company which has him as chairman. In fact one of the initial attractions of the team for me was the degree to which I found myself agreeing with what their management said in public. Commenting on the F1 issues of the day from inside a team looks like something which needs to take a lot of different sensitivities into account, and it’s something I’m more than happy to leave to those who have it in their job description.

If I have interesting things to blog about which don’t relate to motor sport or the job – or which are about software that anyone working with the same products could find out for themselves (i.e. not specific to one company) – then hopefully the blog posts will continue.

June 16, 2014

A trick with Powershell aliases–making them do parameters

Filed under: Powershell — jamesone111 @ 10:37 am

 

The first thing that everyone learns about PowerShell Aliases is that they just replace the name of a command, aliases don’t do parameters.
DIR is an alias for Get-ChildItem ; you can’t make an alias RDIR for Get-ChildItem –Recurse. If you want that you need a function.
To quote Redmond’s most famous resident* I canna change the laws of physics, Captain, but I can find ye a loophole.

I wrote a function which I use a lot – 100+ times some days – named Get-SQL. Given an unknown command “xxx”, PowerShell will see if there is a command “Get-XXX” before reporting a “not recognized” error, so I usually just run it as “SQL” without the Get-. The function talks to databases and keeps connections open between calls as named sessions: the connections tend to be named after places in South America, so to open a session I might run
> SQL -Session Peru -Connection DSN=PERU
and then to find out the definition of a table I use
> SQL -Session Peru -Describe Projects
I’ve previously written about Jason Shirk’s tab expansion++ which gets a list of tables available to -Describe (or tables which can be Selected from, or updated or Inserted into) in that session, and provides tab expansion or an intellisense pick list: this is incredibly useful when you can’t remember if the table is named “project”, or “Projects”, and tab expansion++ does the same for field names so I don’t have to remember if the field I want is named “ID_Project”, “Project_ID”, “ProjectID” or simply “ID”

Get-SQL has a default session: I might use -Session Peru 20 times in a row but still want to leave the default connection alone.
I found myself thinking ‘I want “Peru” to be an alias for Get-SQL -Session Peru. One line – New-Alias -Name $session -Value something – inside Get-SQL could set it all up for me when I make the connection.’
As I said, we all know you can’t do that with an alias, and doing it with functions is – bluntly – a nightmare: creating functions on the fly is possible but awkward, and Tab expansion++ wouldn’t know it was supposed to work with them (it does figure out aliases). Defining the functions in advance for each data source would give me a maintenance headache…

Then I had a flash of inspiration: if I needed this to work for a built-in cmdlet, I’d need to create a proxy function … but Get-SQL is already a function. So, if I can write something in the function to check how it was invoked it can say “A-ha! I was called as ‘Peru’ and ‘Peru’ is the name of a database session, so I should set $session to ‘Peru’.” Whatever the alias is, provided there is a connection of the same name it will get used. This turns out to be almost laughably easy.

In my Get-SQL function the $session Parameter is declared like this
[ValidateNotNullOrEmpty()]
[string]$Session = "Default"

A function can find out the name that was used to invoke it by looking at $MyInvocation.InvocationName. If Get-SQL is invoked with no value provided for -Session, the value of $Session will be set to ‘Default’: if that is the case and there is a database session whose name matches the invocation name, then that name should go into $Session, like this:
if ($Session -eq "Default" -and  $Global:DbSessions[$MyInvocation.InvocationName])
    {
$Session = $MyInvocation.InvocationName}

Of course the parameter is not part of the alias definition – but the function can detect the alias and set the parameter internally – the laws stand, but I have my loophole. Although it’s split here over more than one line, I think of the if statement as one line of code. When Get-SQL creates a connection it finishes by calling New-Alias -Name $session -Value Get-SQL -Force. So two lines give me what I wanted.
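To show the pattern on its own, here is a stripped-down sketch – Get-Demo is a made-up stand-in for Get-SQL, and $Global:DbSessions is assumed to be a hashtable of session names to connections:

$Global:DbSessions = @{}

function Get-Demo {
    param (
        [ValidateNotNullOrEmpty()]
        [string]$Session = "Default"
    )
    # If no session was named but we were invoked via an alias matching a known session, use that session
    if ($Session -eq "Default" -and $Global:DbSessions[$MyInvocation.InvocationName]) {
        $Session = $MyInvocation.InvocationName
    }
    "Using session '$Session'"
}

# Pretend a connection named Peru has been opened, and create the matching alias
$Global:DbSessions["Peru"] = "DSN=PERU"
New-Alias -Name Peru -Value Get-Demo -Force

Peru       # -> Using session 'Peru'
Get-Demo   # -> Using session 'Default'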
Tab expansion++ was so valuable, but stopping here would mean its argument completers wouldn’t know what to do – when they need a list of fields or tables they call Get-SQL, and this worked fine when a -Session parameter was passed, but I’ve gone to all this trouble to get rid of that parameter, so now the completers would try to get a list by calling the function under its canonical name with the default connection. There is a different way to find the invocation name inside an argument completer – by getting it from the parameter which holds the Command Abstract Syntax Tree, like this:

$cmdnameused = $commandAst.toString() -replace "^(.*?)\s.*$",'$1'
if ($Global:DbSessions[$cmdnameused]) {$session = $cmdnameused}
else { <# set $session the same way as before #> }
 

> Peru -Describe 

will pop up the list of tables in that database for me to choose “Projects”. There was a fair amount of thinking about it, but as you can see, only four lines of code. Result!


* James Doohan – the actor who played “Scotty” in Star Trek actually lived in Redmond – ‘home’ of Microsoft, though Bill Gates and other famous Microsoft folk lived elsewhere around greater Seattle. So I think it’s OK to call him the most famous resident of the place.

June 9, 2014

Screen scraping for pleasure or profit (with PowerShell’s Invoke-RestMethod)

Filed under: Powershell — jamesone111 @ 3:28 pm

Twenty-odd years ago I wrote some Excel macros to get data into Excel from one of the world’s less friendly materials management systems. It was far easier to work with the data in Excel, and the macros were masterpieces of send-keys and prayer-based parsing; it was the first time I heard the term Screen Scrape. (When I searched for “prayer-based parsing” it took me to another page on this blog, where you might develop the argument that in Unix and Linux the pipe we take for granted is little more than an automated screen scrape.)

The technology changes (and gets easier) but the same things push us to screen scrape. Data would often be more useful in a form other than the one in which it is made available, and there is a benefit if we can shift it from one form to the other. There are times when taking the data might break usage terms, or when there are other legal or ethical reasons why you shouldn’t do it. You need to work out those questions for yourself; here I’m just going to talk about one technique which I’ve found myself using several times recently.

A lot of the data we want today is delivered in web pages. This, of itself, can be helpful, because well-formed HTML can often be treated as XML data, so with something like PowerShell the parsing is (near enough) free. Instead of having to pull the text completely apart there might be a small amount of preliminary tidying up, and then putting [XML] in front of something conjures up a navigable hierarchy of data. The data behind my post on Formula One statistics was gathered mainly using this method (and the odd regular expression for pulling a needle-like scrap of information from a haystack of HTML).

But when you’re lucky, you don’t even need to parse the data out of the format in which it is displayed. Recently someone sent me a link to a story about smart phone market share. It’s presented as a map, but that’s not a great way to see how share has gone up or down at different times, or to compare one place against another at the same time. These days when I see something like this I think “how do they get the data for the map?” The easy way to find out is to press [F12] in Internet Explorer, turn on the network monitor (click the router icon and then the “play” button) and reload the page. The hope is to see a tell-tale sign of data being requested and processed by the browser ready for display. Often this data will jump out because of the way it is formatted – and circled in the network trace is some JSON format data. JSON or XML data is a real gift for PowerShell….

Once upon a time if you wanted to get data from a web server you had to write a big chunk of code. Then Microsoft provided a System.Net.WebClient object which would do the fetching and carrying but left the parsing to you. In recent versions of PowerShell there are two cmdlets: Invoke-WebRequest is basically a wrapper for this, and will do some crunching of the HTML page so you can work with the document. PowerShell also has ConvertFrom-JSON, so you can send the content into that and don’t have to write your own JSON parser. But it’s even easier than that: Invoke-RestMethod will get the page and, if it can parse what comes back, it does – so you don’t need a separate convert step. So I can get the data back and quickly explore it like this:

> $j = Invoke-RestMethod -Uri http://www.kantarworldpanel.com/php/comtech_data.php
>$j
years                                                                                                        
-----                                                                                                        
{@{year=2012; months=System.Object[]}, @{year=2013; months=System.Object[]}, @{year=2014; months=System.Obj...

> $j.years[0]
year                                                    months                                               
----                                                    ------                                               
2012                                                    {@{month=0; cities=System.Object[]}, @{month=1; cit...

> $j.years[0].months[0]
month                                                   cities                                               
-----                                                   ------                                               
0                                                       {@{name=USA; lat=40.0615504; lng=-98.51893029999997...

> $j.years[0].months[0].cities[0]
name                        lat                         lng                         platforms                
----                        ---                         ---                         ---------                
USA                         40.0615504                  -98.51893029999997          {@{name=Android; share=...

> $j.years[0].months[0].cities[0].platforms[0]
name                                                    share                                                
----                                                    -----                                                
Android                                                 43 

From there it’s a small step to make something which sends the data to the clipboard ready formatted for Excel

$j = Invoke-RestMethod -Uri "http://www.kantarworldpanel.com/php/comtech_data.php"
$(   foreach ($year  in $J.years)  {
        foreach ($month in $year.months) {
            foreach ($city in $month.cities) {
                foreach ($platform in $city.platforms) {
                    ('{0} , {1} , "{2}" , "{3}", {4}' -f $year.year ,
                     ($month.month + 1) , $city.name , $platform.name , $platform.share) }}}}) | clip

Web page to Excel-ready in less time than it takes to describe it. The other big win with these two cmdlets is that they understand how to keep track of sessions (which technically means Invoke-RestMethod can also work with things which aren’t really RESTful), and Invoke-WebRequest understands forms on the page. I can log on to a photography site I use like this
if (-not $cred) { $Global:cred  = Get-Credential -Message "Enter logon details For the site"}
$login                          = Invoke-WebRequest "$SiteRoot/login.asp" –SessionVariable SV
$login.Forms[0].Fields.email    = $cred.UserName
$login.Forms[0].Fields.password = $cred.GetNetworkCredential().Password
$homePage                       = Invoke-WebRequest ("$SiteRoot/" + $login.Forms[0].action) -WebSession $SV -Body $login -Method Post
if ($homePage.RawContent        -match "href='/portfolio/(\w+)'")
    {$SiteUser                    = $Matches[1]}

Provided my subsequent calls to Invoke-WebRequest and Invoke-RestMethod specify the same session variable in their –WebSession parameter, the site treats me as logged in and gives me my data (which is mostly ready-prepared JSON, with the occasional item that needs to be parsed out of the HTML) rather than just the public data. With this I can get straight to the data I want, with only the occasional need to resort to regular expressions. Once you have done one or two of these you start seeing a pattern for how you can either pick up XML or JSON data, or how you can isolate the links from a page of HTML which contain the data you want, and then isolate the table which has the data.

For example

$SeasonPage  =  Invoke-WebRequest -Uri  "http://www.formula1.com/results/season/2014/"
$racePages   =  $SeasonPage.links | where  href -Match "/results/season/2014/\d+/$"
This will give me a list in $racePages of all the results pages for the 2014 F1 season – I can loop through them, setting $R to the different URLs.

$racePage        = Invoke-WebRequest -Uri $R
$contentmainDiv  = ([xml]($racePage.ParsedHtml.getElementById("contentmain").outerhtml -replace "&nbsp;"," ")).div
$racedate        = [datetime]($contentmainDiv.div[0].div.span -replace "^.*-","")

$raceName        = $contentmainDiv.div[0].div.h2

Sometimes finding the HTML section to convert to XML is easy – here it needed a little tweak to remove non-breaking spaces, because the [XML] type converter doesn’t like them – but once I have $contentMainDiv I can start pulling race data out. And 20 years on, Excel is still the tool of choice; how I get the data into Excel will have to wait for another day.

March 19, 2014

Exploring the Transcend Wifi-SD card

Filed under: Linux / Open Source,Photography — jamesone111 @ 1:37 pm
Tags: , , , , ,

There are a number of variations on a saying: “Never let a programmer have a soldering iron, and never let an engineer have a compiler.”

It’s been my experience over many years that hardware people are responsible for rubbish software. Years upon years of shoddy hardware drivers, dreadful software bundled with cameras (Canon, Pentax – I’m looking at both of you), printers (HP, Epson) and scanners (HP – one day I might forgive you) have provided the evidence. Since leaving Microsoft I’ve spent more time working with Linux, and every so often I get into a rant about the lack of quality control: not going back and fixing bugs, not writing proper documentation (the “Who needs documentation when you’ve got Google” attitude meant that, when working on one client problem, everything we could find told us it could not be solved; only a lucky accident found the solution). Anyone can program: my frustrations arise when they do it without a proper specification, testing regime, documentation and “after care”. The question is… what happens when engineers botch together an embedded Linux system?

Let me introduce you to what I believe to be the smallest commercially available  Linux computer and Web server.

I’ve bought this in its Transcend form – which is available for about £25. It’s a 16GB memory card, an ARM processor and a WiFi chip, all in an SD card package. Of course chip designers will be able to make it smaller, but since it’s already too easy to lose a Micro-SD card I’m not sure there would be any point in squeezing it into a smaller form factor. Transcend aren’t the only firm to use this hardware: there is a page on OpenWrt.org which shows that Trek’s Flu-Card and PQI’s Aircard use the same hardware and core software. The Flu card is of particular interest to me, as Pentax have just released the O-FC1: a custom version of the Flu card with additional functions, including the ability to remotely control their new K3 DSLR. Since I don’t have the K3 (yet) and the Pentax card is fairly expensive, I went for the cheap generic option.

The way these cards work is different from the better known Eye-Fi card. They are SERVERS: they don’t upload pictures to a service by themselves; instead they expect a client to come to them, discover the files it wants and download them. The way we’re expected to do this is using HTTP, either from a web browser or from an app on a mobile device which acts as a wrapper for the same HTTP requests. If you want your pictures to be uploaded to photo sharing sites like Flickr, Photobucket or SmugMug, online storage like Dropbox or OneDrive (née SkyDrive), or social media sites (take your pick), these cards – as shipped – won’t do that. Personally I don’t want that, so that limitation’s fine. The cards default to being an access point on their own network – which is known as “Direct share mode” – it feels odd but can be changed.

Various people have reported that the WiFi functionality doesn’t start if you plug the card into the SD slot of a laptop, and it’s suggested this is a function of the power supplied. Transcend supply a USB card reader in the package, and plugged into it my brand-new card soon popped up as a new wireless network. It’s not instant – there’s an OS to load – but it’s less than a minute. This has another implication for use in a camera: if the camera powers down, the network goes with it, so the camera has to stay on long enough for you to complete whatever operations are needed.

The new card names its network WIFISD, and switching to that – which has a default key of 12345678 – gave me a wireless connection with a nominal speed of 72Mbits/sec and a new IP configuration: a connection-specific DNS suffix of WifiCard, an IP address of 192.168.11.11, and a DNS server, default gateway and DHCP server of 192.168.11.254 – that’s the server. The first thing I did was point my browser at 192.168.11.254, enter the login details (user admin, password admin) and hey presto, up came the home page. This looks like it was designed by someone with the graphic design skills of a hardware engineer. I mean, I know the card is cheap, but effort seems to have gone into making it look cheap AND nasty.

However with the [F12] developer tools toggled on in Internet Explorer I get to my favourite tool: the network monitor. First of all I get a list of what has been fetched, and if I look at the details for one of the requests, the response headers tell me the clock was set to 01 Jan 2012 when the system started and the server is Boa/0.94.14rc21.

The main page has 4 other pages which are arranged as a set of frames. frame1 is the menu on the left, frame2 is the banner (it only contains Banner.jpg) and frame3 initially holds page.html, which just contains page.jpg; there is also a blank.html to help the layout. Everything of interest is in frame1, and what is interesting is that you can navigate to frame1.html without entering a user name and password, and from there you can click Settings and reset the admin password.
The settings page is built by a Perl script (/cgi-bin/kcard_edit_config_insup.pl) and if you view the page source, the administrator password is there in the HTML form, so you don’t even need to reset it. Secure? Not a bit of it. Within 5 minutes of plugging the card in I’d found a security loophole (I was aware of others before I started, thanks to the OpenWrt page and Pablo’s investigation). I love the way that Linux fans tell me you can build secure systems with Linux (true) and that it can be used on tiny bits of embedded hardware where Windows just isn’t an option (obviously true here): but you don’t automatically get both at the same time. A system is only as good as the people who specified, tested, documented and patched it.

While I had the settings page open I set the card to work in “internet mode” by default and gave it the details of my access point. You can specify 3 access points; it seems that if the card can’t find a known access point it drops back to direct share mode so you can get in and change the settings (I haven’t tried this for myself). So now the card is on my home WiFi network with an address from that network (the card does nothing to tell you the address, so you have to discover it for yourself). Since there is just a process of trying to connect to an access point with a pre-shared key, any hotspots which need a browser-based sign-on won’t work.

The next step was to start exploring the File / Photo / Video pages. Using the same IE monitor as before it’s quite easy to see how they work – although Files is a Perl script and pictures & videos are .cgi files, the result is the same: a page which calls /cgi-bin/tslist?PATH=%2Fwww%2Fsd%2FDCIM and processes the results. What’s interesting is that path, /www/sd/DCIM. It looks like an absolute path… so what is returned by changing the path to, for example, / ? A quick test showed that /cgi-bin/tslist?PATH=%2F does return the contents of the root directory. So /cgi-bin/tslist?PATH={whatever} requires no security and shows the contents of any directory.
The pictures page shows additional calls to /cgi-bin/tscmd?CMD=GET_EXIF&PIC={full path} and /cgi-bin/thumbNail?fn={full path}. The files page makes calls to /cgi-bin/tscmd?CMD=GET_FILE_INFO&FILE={full path}. (The picture EXIF is a bit disappointing: it doesn’t show lens, shutter settings, camera model or exposure time, it just shows file size – at least with files we see the modified date. The thumbnail is also a disappointment: there is a copy of DCRAW included on the system which is quite capable of extracting the thumbnail stored in the raw files, but it’s not used.)
And there is a link to download the files: /cgi-bin/wifi_download?fn={name}&fd={directory}. By the way, notice the lack of consistency in parameter naming – the same role is filled by PATH=, PIC=, fn= and fn=&fd= – was there an organised specification for this?

Of course I wanted to use PowerShell to parse some of the data that came back from the server and I hit a snag early on
Invoke-WebRequest "http://192.168.1.110/cgi-bin/tscmd?CMD=GET_EXIF&PIC=%2Fwww%2Fsd%2FHello%20James%2FWP_20131026_007.jpg"
Throws an error: The server committed a protocol violation. Section=ResponseHeader Detail=CR must be followed by LF

Shock horror! More sloppiness in the CGI scripts: the last response header is followed not by [CR][LF] but by [LF][LF]. Fortunately Lee Holmes has already got an answer for this one. I also found that the space in my folder path /www/sd/Hello James caused a problem: when it ran through [System.Web.HttpUtility]::UrlEncode the space became a + sign, not the %20 in the line above, and the CGI only accepts %20, so that needs to be fixed up. Grrr.
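A minimal sketch of that fix-up (the IP address and folder name are just the ones from my example, and it assumes the path contains nothing else the CGI objects to):

Add-Type -AssemblyName System.Web
$folder  = '/www/sd/Hello James'
# UrlEncode turns the space into '+', which this CGI rejects, so swap it for %20
$encoded = [System.Web.HttpUtility]::UrlEncode($folder) -replace '\+', '%20'
"http://192.168.1.110/cgi-bin/tslist?PATH=$encoded"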

Since we can get access to any of the files on the server we can examine all the configuration files, and those which control the start-up are of particular interest. Pablo’s post was the first I saw where someone had spotted that init looks for an autorun.sh script in the root of the SD file system, which can start services that aren’t normally launched. There seems to be only one method quoted for starting an FTP service:
tcpsvd -E 0.0.0.0 21 ftpd -w / &
There are more ways given for starting the telnet service, and it looks for all the world as if this revision of the Transcend card has a non-working version of telnetd (a lot of the utilities are in a single busybox executable), so Pablo resorted to getting a complete busybox, quickly installing it and using
cp /mnt/sd/busybox-armv5l /sbin/busybox
chmod a+x /sbin/busybox
/sbin/busybox telnetd -l /bin/bash &

This was the only one which worked for me. Neither FTP nor telnet needs any credentials: with telnet access it doesn’t take long to find that the Linux kernel is 2.6.32.28, the WiFi is an Atheros AR6003 11n and the package is a KeyASIC WIFI-SD (searching for this turns up pages by people who have already been down this track), or more specifically a KeyASIC Ka2000 EVM with an ARM926EJ-S CPU, which seems to be used in tablets as well.

Poking around inside the system there are various references to “Instant upload” and to G-PLUS, but there doesn’t seem to be anything present to upload to any of the services I talked about before. When shooting gigabytes of photos it doesn’t really make sense to send them up to the cloud before reviewing and editing them; in fact even my one-generation-behind camera creates problems of data volume. File transfer with FTP is faster than HTTP but it is still slow: HTTP manages about 500KBytes/sec and FTP between 750 and 900KBytes/sec. That’s just too slow, much too slow. Looking at some recent studio shoots I’ve used 8GB of storage in 2 hours: averaging a bit more than 1MB/second. With my K5, RAW files are roughly 22MB, so they take about 45 seconds each to transfer using HTTP – but the camera can shoot 7 frames in a second and then spend five minutes transferring the files: it’s quicker to take the memory card out of the camera, plug it into the computer, copy the files and return the card to the camera. It might get away with light use, shooting JPGs, but in those situations – which usually mean wandering round snapping a picture here and a picture there – would your WiFi-connected machine be set up and in range?

The sweet spot seems to be running something on a laptop / tablet / phone to transfer preview JPGs – using lower than maximum resolution, and some compression rather than best quality (the worry here is forgetting to set the camera back to best-quality JPEG and RAW afterwards). In this situation it really is a moot point which end is the client and which end is the server. Having the card upload every file to the cloud is going to run into problems with the volume of data, connecting to access points and so on; so is pulling any great number of RAW files off the card. Writing apps to do this might be fun, and of course there’s a world of possible hacks for the card itself.

February 26, 2014

Depth of field

Filed under: Uncategorized — jamesone111 @ 7:51 pm

Over the years I have seen a lot written about Depth of Field and recently I’ve seen it explained wrongly but with great passion. So I thought I would post the basic formulae, show how they are derived and explain how they work in practical cases.

So first: a definition. When part of an image is out of focus, that’s not an on/off state. There’s massively blurred, a little blurred, and such a small amount of blur that it still looks sharply focused: if we magnify the “slightly out of focus” parts enough we can see that they are not sharply focused. Depth of field is a measure of how far either side of the point of focus the image appears to be properly focused (even though it is very slightly out of focus), given some assumptions about how much the image is magnified.

[Diagram: a lens focusing a subject at distance D to an image at distance d behind the lens, with the aperture width and the circle of confusion marked]

When a lens focuses an image, the lens-to-subject distance, D, and the lens-to-image distance, d, are related to the focal length, f, by the equation
1/D + 1/d = 1/f

We can rearrange this to derive the distance to the subject (D) in terms of focal length (f) and image distance (d)
D = df/(d-f)

Since d is always further than f, we can write the difference as Δ and replace d with f+Δ. Putting that into the previous equation makes it
D = (f²+Δf)/Δ which re-arranges to
D = (f²/Δ)+f

When Δ is zero the lens is focused at infinity, so if you think of that position as the start point for any lens, to focus nearer to the camera we move the lens away from the image by a distance of Δ.

The formula can be rearranged as the “Newtonian form” of the equation
D-f = f²/Δ , therefore
Δ(D-f) = f² and since Δ = (d-f)
(d-f)(D-f) = f²

We can work out a focus scale for a lens using D = (f²/Δ)+f. Assume we have a 60mm lens, and that it moves in or out 1/3mm for each 30 degrees of turn.
When the lens is 60mm from the image we can mark ∞ at the 12 o’clock position: Δ = 0 and D = ∞.
Turn the ∞ mark to the 1 o’clock position (30 degrees): Δ = 1/3 and D = 3600/(1/3) + 60 = 10860mm ≈ 10.9M, so we can write 10.9 at the new 12 o’clock position.
Turn the ∞ mark to the 2 o’clock position (60 degrees): Δ = 2/3 and D = 3600/(2/3) + 60 = 5460mm ≈ 5.5M, so we write 5.5 at the latest 12 o’clock position.
Turn the ∞ mark to the 3 o’clock position (90 degrees): Δ = 1 and D = 3600 + 60 = 3660mm ≈ 3.7M, so this time we write 3.7 at 12 o’clock.
Turn the ∞ mark to the 4 o’clock position (120 degrees): Δ = 4/3 and D = 3600/(4/3) + 60 = 2760mm ≈ 2.8M, so 2.8 goes on the scale at 12 o’clock.
Turn the ∞ mark to the 5 o’clock position (150 degrees): Δ = 5/3 and D = 3600/(5/3) + 60 = 2220mm ≈ 2.2M.
Turn the ∞ mark to the 6 o’clock position (180 degrees): Δ = 2 and D = 3600/2 + 60 = 1860mm ≈ 1.9M, so we can add 2.2 and 1.9 to the scale to finish the job.

And so on. For simplicity of calculation we often consider the extra 60mm insignificant, and D ≈ (f²/Δ) is usually close enough. It’s also worth noting that the roles of D as subject distance and d as image distance can be swapped – the whole arrangement is symmetrical.

In the diagram above, the blue lines show the lens focused at a distance D, which gives a lens-to-image distance (d) of (f+Δ); something further away than D will not come to a sharp focus at the image plane, but at some distance in front of it (something nearer than D will come to a focus behind the image plane). Focused rays of light form a cone: if the point of the cone is not on the image plane, the rays form a disc which is called “a circle of confusion”. The red lines in the diagram illustrate the case for something at infinity and show how a smaller aperture width (bigger f/ number) leads to a smaller circle of confusion.
The only factors which determine the size of the circle that is formed are focal length, aperture, and the distance between the lens and the image (i.e. the distance at which the lens is focused) Two set-ups using the same focal length, and same aperture, focused at the same distance will produce the same size circle regardless of the size of the recording medium which captures the image, the size of the image circle produced by the lens or any other factor.

A point at infinity will form an image at a distance f behind the lens (that’s the definition of focal length), and so we know it forms an image Δ in front of the film/sensor in the set-up in the diagram.
The red lines form two similar triangles between the lens and the image. The “base” of the large one is w (the aperture width) and its "height" is f.
We normally write aperture as a ratio between width and focal length, e.g. f/2 means the aperture’s width is half the focal length.
So f = a*w (where a is the f/ number), which means this triangle has a base of w and a height of w*a.

The base of the smaller triangle is the circle of confusion from the mis-focused point at infinity.
This circle’s diameter is normally written as c, so using similar triangles the height of the smaller triangle must be its base * a, so:
Δ = c * a

As the lens moves further away from the image, the circle for the point at infinity gets bigger: a small enough circle looks like a point, but there comes a size where we can see it is a circle.
If we know that size, we can calculate the value of Δ as c*a, and since we know that D = (f²/Δ) + f, we can define the subject distance at which a point at infinity starts to look out of focus as
(f²/ca) + f.
This distance is known as the hyperfocal distance (H); strictly, H = (f²/ca) + f, but it is usually accurate enough to write H ≈ f²/ca.
Later we’ll use a rearrangement of this: since Δ = ca, the simplified form of the equation can be turned into Δ ≈ f²/H

[Diagram: the circle of confusion is the same size when the image plane sits c*a in front of the point of sharp focus as when it sits c*a behind it]

We can see that we get the same size circle if the image plane is c*a in front of where the image would focus as well as c*a behind it, so we can say:
(1) for a subject at distance D, the lens-to-image distance is approximately (f²/D)+f (more accurately it is (f²/(D-f))+f ), and
(2) the zone of apparent sharp focus runs from anything which would be in focus at (f²/D)+f - ca to anything which would be in focus at (f²/D)+f + ca

This formula is accurate enough for most purposes, but it would be more accurate to say the range runs from ((f²/(D-f))+f)*(f+ca)/f to ((f²/(D-f))+f)*(f-ca)/f, because this accounts for Δ getting slightly bigger as d increases for nearer and nearer subjects. The error is biggest at short distances with wide apertures.
A 35mm frame with c=0.03 and an aperture of f/32 gives c*a ≈ 1. If we focus a 50mm lens at 1m, (f²/D)+f = 52.5mm,
so the simple form of the formula would say an image formed 51.5–53.5mm behind the lens is in the “in focus zone”. The long form gives 51.45–53.55mm.
So instead of the depth of field extending from 1.716M to 0.764M, it actually goes from 1.774M to 0.754M.
Since we only measure distance to 1 or 2 significant figures, aperture to 1 or 2 significant figures (and f/22 is really f/23) and focal length to the nearest whole mm (and stated focal length can be inaccurate by 1 or 2 mm), the simple formula locates, to enough accuracy, the point where most people kind-of feel that the image isn’t really properly focused.

It’s also worth noting that if we have a focus scale like the one outlined above, the same movement either side of a focus position gives the same Δ, so we can calculate Δ for each aperture mark and put depth of field scale marks on the lens.

∞    11.    5.5    3.7    2.8    2.2    1.9M 
^ | ^

If we want to work out D.o.F numbers (e.g. to make our own tables), we know that the lens-to-image distance for the far point (df) is (f²/Df)+f and for the near point (dn) it is (f²/Dn)+f

therefore,  f²/Df + f = f²/D + f – Δ    (or + Δ for the near point)

we can remove +f from each side and get f²/Df = f²/D – Δ ;

since Δ = f²/H, we can rewrite this as f²/Df = f²/D – f²/H ;

the f² terms cancel out so we get 1/Df = 1/D – 1/H for the far point, and for the near point 1/Dn = 1/D + 1/H ;

We can rewrite these as 1/Df = (H-D)/(H*D) for the far point, and for the near point 1/Dn = (H+D)/(H*D), so

Df = HD/(H-D) for the far point, and for the near point Dn = HD/(H+D)

These produce an interesting series

Focus Distance (D)   Near Point (Dn)   Far Point (Df)
H                    H/2               ∞
H/2                  H/3               H
H/3                  H/4               H/2
H/4                  H/5               H/3
H/5                  H/6               H/4

In other words, if the focus distance is H/x the near point is H/(x+1) and the far point is H/(x-1).

[These formulae, Dn = H/((H/D)+1) and Df = H/((H/D)-1), can be re-arranged to Dn = H/((H+D)/D) and Df = H/((H-D)/D), and then to Dn = HD/(H+D) and Df = HD/(H-D) – the original formulae.]

This can be useful for doing a quick mental d.o.f calculation. A 50mm lens @ f/8 on full frame has a hyperfocal distance of roughly 10m (50²/(0.03*8) + 50 ≈ 10.46M). If I focus at 1M (roughly H/10) the near point is H/11 = 0.909M and the far point is H/9 = 1.111M, so I have roughly 9cm in front and 11cm behind.
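If mental arithmetic isn’t your thing, the formulae above drop straight into a few lines of PowerShell – this is my own quick sketch (the function name and defaults are mine, and all distances are in mm):

function Get-DepthOfField {
    param (
        [double]$FocalLength = 50,    # f in mm
        [double]$Aperture    = 8,     # a, the f/ number
        [double]$CircleSize  = 0.03,  # c in mm (0.03 for full frame, 0.02 for APS-C)
        [double]$Distance    = 1000   # D, subject distance in mm
    )
    # H = f²/(c*a) + f ; near = HD/(H+D) ; far = HD/(H-D), or infinity beyond the hyperfocal distance
    $H  = ($FocalLength * $FocalLength) / ($CircleSize * $Aperture) + $FocalLength
    $Dn = ($H * $Distance) / ($H + $Distance)
    $Df = if ($Distance -lt $H) { ($H * $Distance) / ($H - $Distance) } else { [double]::PositiveInfinity }
    [pscustomobject]@{ HyperfocalMM = $H ; NearPointMM = $Dn ; FarPointMM = $Df }
}
Get-DepthOfField    # 50mm @ f/8 focused at 1m: near ≈ 913mm, far ≈ 1106mm – close to the rough 0.91M / 1.11M above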

Earlier I said “As the lens moves further away from the image, the circle gets bigger: a small enough circle looks like a point, but there comes a size where it starts looking like a circle. If we know that size…”

How much of the image the circle occupies determines whether it is judged to be still in focus, or a long way out of focus. So the value for c must be proportional to the size of the image, after any cropping has been done.

By convention 35mm film used c=0.03mm and APS-C crop sensor cameras use c=0.02mm. Changing image (sensor) size changes the allowable circle size c, and so changes Δ; so the depth of field scale on a lens designed for one size of image needs to be adjusted if it is used on a camera where the image is a different size (on an APS-C camera, reading the scale for 1 stop wider aperture than is actually set will give roughly the right reading).

The size of the circle formed does not depend on image size, but the allowable circle size does, and the hyperfocal distance and apparent depth of field change when c changes.

Changing sensor size (keeping same position with the same lens and accepting a change of framing).

If we use two different cameras – i.e. use a different circle size – at the same spot, focused on the same place and we use the same focal length and same aperture on both, then the one with the smaller image has less depth of field. It doesn’t matter how we get to the smaller image, whether it is by cropping a big one or starting with a smaller film/sensor size.

We get less D.o.F because c has become smaller, so f²/ca – the hyperfocal distance – has moved further away. When you look at f²/ca, a smaller value of c needs a larger value of a to compensate.

Changing sensor size and focal length (getting the same framing from same position)

If we use two cameras – with different circle sizes – and use different focal lengths to give the same angle of view, but keep the same aperture, then the larger image will have less depth of field, because f and c have gone up by the same factor but f is squared in the equation. A larger value of a is needed to compensate for f being squared.

So: a 50mm @ f/8 on full frame has approximately the field of view and depth of field of a 35mm @ f/5.6 on APS-C. If that’s what we want, the full frame camera needs to use a slower shutter speed or a higher ISO to compensate, which have their own side effects.

If we want the depth that comes from the 35mm @ f/32 on APS-C , the 50 might not stop down to f/44 to give the same depth on Full Frame.

But if we use the 50 @ f/1.4 to isolate a subject from the background on full frame the 35 probably doesn’t open up to f/1

Changing focal length and camera position

People often think of perspective as a function of the angle of view of the lens. Strictly that isn’t correct: perspective is a function of the ratios of subject-to-camera distances. If you have two items the same size, with one a meter behind the other, and you stand a meter from the nearer one, the far one is 2M away and will appear 1/2 the size. If you stand 10 meters from the first (and therefore 11 meters from the second), the far object will appear 10/11ths of the size. It doesn’t matter what else is in the frame. But if you fit a wider lens the natural response is to move closer to the subject: it is that change of viewpoint which causes the change of perspective. Changing focal length and keeping position constant means the perspective is constant and the framing changes. Changing focal length and keeping framing constant means a change of position, and with it a change of perspective.

If you have two lenses for the same camera and a choice between standing close with a wide angle lens or further away with a telephoto (and accepting the change of perspective for the same framing) we can work out the distances.

Let’s say with the short lens, H is 10 and you stand 5 meters away.

The near point is (10 * 5) / (10+5) = 3.33 : 1.67 meters in front

The far point is (10 * 5) / (10-5) = 10 : 5 meters behind = 6.67 in total

If we double the focal length and stand twice as far away the hyperfocal distance increases 4 fold (if the circle size and aperture don’t change), so we get a d.o.f zone like this

(40*10) / (40+10) = 8 : 2 meters in front

(40*10) / (40-10) = 13.33: 3.33 meters behind =5.33 in total.

Notice the background is more out of focus with the long lens, but there is actually MORE in focus in front of the subject. The wider lens includes more "stuff" in the background and it is sharper – which is why long lenses are thought of as better at isolating a subject from the background.

Changing camera position and sensor size.

If you only have one lens and your choice is either to move further away and crop the image (or use a smaller sensor) or to come close and use a bigger image, we can calculate that too. Keeping the full image / close position as the first case from the previous example, we would keep the near point 1.67 meters in front and the far point 5 meters behind = 6.67 in total.

If we use half the sensor width, we halve c and double H; if we then double the distances, we have doubled every term in the equation.

(20*10) / (20+10) = 6.6667 : 3.33 meters (in front)

(20*10) / (20-10) = 20 : 10 meters (behind) – 13.33 meters in total, so you get twice as much in the zone either side of the subject by dropping back and cropping.

July 5, 2013

PowerShell TabCompletion++ or TabCompletion#

Filed under: Databases / SQL,Powershell — jamesone111 @ 9:23 pm

One of the tricks of PowerShell V3 is that it is even cleverer with tab completion / intellisense than previous versions, though it is not immediately obvious how you can take control of it. When I realised what was possible I had to apply it to a Get-SQL command I had written. I wanted the ISE to be able to give me something like this

[Screenshot: the ISE offering an intellisense pick list of table names for Get-SQL’s -Table parameter]

I happened to have Jan Egil Ring’s article on the subject for PowerShell Magazine (where he credits another article by Tobias Weltner) open in my browser when I watched Jason Shirk give a talk covering the module he has published via GitHub named TabExpansion++. This module includes custom tab completers for some PowerShell commands which don’t have them, and some for legacy programs. Think about that for a moment: if you use NET STOP in PowerShell instead of in CMD, tab completion fills in the names of the services. Yes, I know PowerShell has a Stop-Service cmdlet, but if you’ve been using the NET commands since the 1980s (yes, guilty) why stop using them?

More importantly, Jason has designed a framework where you can easily add your own tab completers – which are the basis for intellisense in the ISE. On loading, his module searches all ps1 and psm1 files in the paths $env:PSModulePath and $env:PSArgumentCompleterPath for functions with the ArgumentCompleter attribute – I’ll explain that shortly. When it finds one, it extracts the function body and “imports” it into the TabExpansion++ module. If I write argument completer functions and save them with my modules (which are in the PSModulePath) then when I load TabExpansion++ …. whoosh! my functions get tab completion.

A lot of my work at the moment involves dealing with data in a MySQL database; I have installed the MySQL ODBC driver and written a function named Get-SQL (which I can just invoke as SQL).
When I first wrote it, it was simple enough: leave an ODBC connection open as a global variable and pump SQL queries into it. After a while I found I was sending a lot of “Select * From table_Name” queries, so I gave it a –Table parameter which would be built into a select query, and a –GridView parameter which would send the data to the PowerShell grid viewer. Then I found that I was doing a lot of “Desc table_name” queries, so I added a –Describe parameter. One way and another the databases have ended up with long table names which are prone to mistyping, so this seemed like a prime candidate for an argument completer, and I set about extending TabExpansion++ (does that make it TabExpansion#? If you haven’t noticed, with C# the # sign is ++ ++, one pair above the other).

It takes 4 things to make a tab completer function. First: one or more ArgumentCompleter attributes
[ArgumentCompleter(Parameter = 'table',
                         Command = ('SQL','Get-SQL'),
                      Description = 'Complete Table names for Get-SQL , for example: Get-SQL -GridView -Table ')]

This defines the parameter that the completer works with – which must be a single string. If the completer supports multiple parameters, you must use multiple ArgumentCompleter attributes. 
And it defines the command(s) that the completer works with. The definition can be a string, an array of strings, or even a ScriptBlock. 
The second thing needed is a param block that understands the parameters passed to a tab completer.
param($commandName, $parameterName, $wordToComplete, $commandAst, $fakeBoundParameter)

The main one here is $wordToComplete – the partially typed word that tab completion is trying to fill in. However as you can see in the screen shot it is possible to look at the parameters already completed and use them to produce the list of possible values.
The third part is the body that gets the possible parameter values – this is where $wordToComplete is used. So in my function I have something a bit like this…
$parameters = Get-TableName | Where-Object { $_ -like "$wordToComplete*" } | Sort-Object

And the final part is to return the right kind of object to the tab-completion process; Jason’s module has a helper function for this
$parameters | ForEach-Object {$tooltip = "$_"
                              New-CompletionResult $_ $tooltip}

There is the option to have different text as the tool tip – in some places $tooltip – which is shown in intellisense – would be set to something other than the value being returned. Here I’ve kept the variable in place to remind me of that, rather than just calling New-CompletionResult $_ $_
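Putting the four pieces together, the completer for –Table ends up as compact as this (a sketch – Get-TableName stands for whatever helper returns the table names from the open connection):

Function SQLTableNameCompletion {
   [ArgumentCompleter(Parameter = 'table',
                        Command = ('SQL','Get-SQL'),
                    Description = 'Complete Table names for Get-SQL , for example: Get-SQL -GridView -Table ')]
   param($commandName, $parameterName, $wordToComplete, $commandAst, $fakeBoundParameter)
   Get-TableName | Where-Object { $_ -like "$wordToComplete*" } | Sort-Object |
       ForEach-Object { New-CompletionResult $_ $_ }
}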

And that’s it. Unload and reload TabExpansion++ and my SQL function now knows how to expand -table. I added a second attribute to allow the same code to handle -describe, and then wrote something to get field names so I could have a picklist for –orderby and –select as well. With -select, intellisense doesn’t pop up a second time when you select a name and enter a comma to start a second one, but tab completion still works. Here’s the finished item:

Function SQLFieldNameCompletion {
   [ArgumentCompleter(Parameter = ('where'),
                        Command = ('SQL','Get-SQL'),
                    Description = 'Complete field names for Get-SQL , for example: Get-SQL -GridView -Table ')]
   [ArgumentCompleter(Parameter = ('select'),
                        Command = ('SQL','Get-SQL'),
                    Description = 'Complete field names for Get-SQL , for example: Get-SQL -GridView -Table ')]
   [ArgumentCompleter(Parameter = ('orderBy'),
                        Command = ('SQL','Get-SQL'),
                    Description = 'Complete field names for Get-SQL , for example: Get-SQL -GridView -Table ')]
   param($commandName, $parameterName, $wordToComplete, $commandAst, $fakeBoundParameter)
   If ($DefaultODBCConnection) {
       $TableName = $fakeBoundParameter['Table']
       Get-SQL -describe $TableName | Where-Object { $_.column_name -like "$wordToComplete*" } | Sort-Object -Property column_name |
       ForEach-Object {$tooltip           = $_.COLUMN_NAME + " : " + $_.TYPE_NAME
                       New-CompletionResult $_.COLUMN_NAME $tooltip
       }
   }
}

Which all saves me a few seconds a few dozen times a day.

June 30, 2013

PowerShell where two falses make a true … and other quirks.

Filed under: Powershell — jamesone111 @ 9:11 pm

I was in a conversation on twitter recently about the way operations in PowerShell aren’t always commutative. We expect addition, multiplication and equality operators to work the same way whether we write 2 x 3 or 3 x 2.
But what happens in a language like PowerShell, which tries to flatten out the differences between types and lets you multiply a text string by a number?  Let’s explain.

Rule 1. If the operator dictates types, then PowerShell will convert the operands to match.
The classic case is for the –and and –or operators: PowerShell will convert the operands to Booleans – knowing how the conversion is done opens up some useful shortcuts and also avoids some traps:

Any non-zero number is treated as true. Only zero is false.
For example
> 3.14 -and $true
True

> 0 -and $true
False

In some programming languages 8 –or 16 would calculate a "bitwise OR" (also called a binary OR): in other words it would convert 8 to binary 0000 1000 and 16 to binary 0001 0000 and OR each column to produce 0001 1000 – 24 in decimal. PowerShell provides this functionality through separate operators: -bOr, -bAnd, -bXor and -bNot.
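For example, compare the Boolean and bitwise operators:

> 8 -or 16
True

> 8 -bor 16
24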

Any non-empty string is true. Only empty strings are false.
For example.
> "" -and $true
False

> "Hello, world" -and $true
True

The string "FALSE" is not empty, and so is treated as the Boolean value True. If you convert $False to a string and back to a boolean it doesn’t come back to false
> [boolean][string]$false
True

If you need the text "true" to convert to True and "False" to convert to false you can use the [Convert] class
> [convert]::ToBoolean("false")
False

Any non-empty object is treated as true. An empty object, array, or null converts to false.

> (dir *.jpg) -and $true
True

> (dir *.pjg) -and $true
False

> @() -and $true
False

Any array with one element is treated as  that element.
> @(0) -and $true
False

Any array with multiple elements is treated as true even if all those elements are false
> @($false,$false) -and $true
True

Rule 2. Null is less than anything except another null

> "" -gt $null
True

Rule 3. If the operator does not dictate types and the arguments are different types, the first operand’s type determines behaviour

This causes bewilderment in people who don’t understand it, because an operator which is normally commutative (works the same with the operands reversed) is only commutative if the operands are of the same type.
> 5 * "25"
125

> "25" * 5
2525252525

The string “25” is converted to a number, 5×25 = 125, but if the string is placed first, the multiplication operator repeats it.
> 3.14 -eq $true
False

> $true -eq 3.14
True

Converting $true to a number returns 1. Converting a (non-zero) number to a Boolean returns true.
Similarly, converting any non-empty string to a Boolean returns true, but converting false to a string returns “false”
> $true -eq "false"
True

> "false" -eq $true
False

Rule 4. When applied to an array, an operator which returns a Boolean when applied to single items will instead return the array elements which individually return true

> @(1,2,3,4) -gt 2
3
4

When you put this together with tests for null, confusion can result: see if you can work out what the following returns.

> @($null,1,2,"",$null) -eq $null

And why do the following return opposite results:

> [boolean](@($null,1,2,"",$null) -eq $null)

> [boolean](@($null,1,2,"") -eq $null)

Hint: if you can’t work it out, try

(@($null,1,2,"",$null) -eq $null).Count

February 8, 2013

Getting SkyDrive to sync the way I want (like Mesh)

Filed under: Windows 7 — jamesone111 @ 10:37 pm

A few days ago Geekwire ran a story entitled “Microsoft, Let’s be friends”. It began:

Dear Microsoft,
Can we just be friends again? Please?
It’s been exactly five years now since I left you. During our time together, I poured all the emotion and energy I had into the products I helped build for you.

Whilst it is only slightly more than two years since I left Microsoft, that grabbed my interest.  The author goes through some of the products in which he invested emotional capital but saw Microsoft kill off.

  • For 14 years, I used Microsoft Money fanatically … And then you killed it
  • [I] fell in love with FrontPage, … But you threw it under the bus for no apparent reason.
  • I bought my wife a Zune for her birthday … you gave up and now Zune is in the graveyard, may it rest in peace. Meanwhile, I feel like a sucker … again.
  • Microsoft Digital Image Suite was the best image editing package ever to have existed for consumers. Yes, better than Photoshop Elements. … I want you to know that you truly broke my heart when you buried this product.

I never used MS-Money. FrontPage passed me by. Being in a small area which Redmond calls “rest of the world”,  I never got to buy a Zune. But I felt the heartbreak at the loss of Digital Image suite. It was one of the few non Xbox things I ever bought on staff purchase and I still use it. But my present pain is for Windows Live Mesh. Since I left Microsoft this product has quietly kept half a dozen key folders backed up to the cloud and replicated to all the computers I use.
Microsoft have mailed me twice about their producticidal plans for Mesh. Worryingly, they say 40% of Mesh users are using SkyDrive meaning 60% are not. I never used Mesh’s remote access. But it can sync folders from all over my hard disk and SkyDrive needs folders to share a common root in order to sync them.  I could move some of the folders I sync out of “My Documents” into “SkyDrive” but PowerShell (for example) insists on having its profile folder in a specific location. In short “out of the box” SkyDrive can’t do what Mesh can.

With the death of Mesh now only a week away, I decided to try something I’ve been meaning to do for ages: create a symbolic link from the SkyDrive folder to the “proper” folders where my files reside.


It’s pretty easy. Creating Links requires an Administrative command prompt (unless you change the system policy on the machine), and you need a /d to tell MkLink it is a directory not a file – then it is a question of the name of the link and the place it links to and – cue drum roll – you have a link.
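For example, to link my PowerShell profile folder into SkyDrive I’d run something like this from an elevated command prompt (the paths are just an illustration – and since mklink is a cmd built-in, from an elevated PowerShell prompt prefix it with cmd /c):

mklink /d "C:\Users\James\SkyDrive\WindowsPowerShell" "C:\Users\James\Documents\WindowsPowerShell"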

The folder appears under SkyDrive, and right away the SkyDrive client starts syncing in the background. Maybe it is designed not to hog bandwidth or maybe it’s plain slow but on my computer it took a fair while to copy everything.


Repeat as necessary for the other folders which need to be synced.

August 9, 2012

Getting to the data in Adobe Lightroom–with or without PowerShell

Filed under: Databases / SQL,Photography,Powershell — jamesone111 @ 7:01 am

Some Adobe software infuriates me (Flash); I don’t like their PDF reader and use Foxit instead; apps which use Adobe AIR always seem to leak memory. But I love Lightroom.  It does things right – like installations – which other Adobe products get wrong. It maintains a “library” of pictures and creates virtual folders of images (“collections”), but it maintains metadata in the image files so data stays with pictures when they are copied somewhere else – something some other programs still get badly wrong. My workflow with Lightroom goes something like this.

  1. If I expect to manipulate the image at all I set the cameras to save in RAW, DNG format not JPG (with my scuba diving camera I use CHDK to get the ability to save in DNG)
  2. Shoot pictures – delete any where the camera was pointing at the floor, lens cap was on, studio flash didn’t fire etc. But otherwise don’t edit in the camera.
  3. Copy everything to the computer – usually I create a folder for a set of pictures and put DNG files into a “RAW” subfolder. I keep full memory cards in filing sleeves meant for 35mm slides..
  4. Using PowerShell I replace the IMG prefix with something which tells me what the pictures are but keeps the camera-assigned image number (see the sketch after this list).
  5. Import Pictures into Lightroom – manipulate them and export to the parent folder of the “RAW” one. Make any prints from inside Lightroom. Delete “dud” images from the Lightroom catalog.
  6. Move dud images out of the RAW folder to their own folder. Backup everything. Twice. [I’ve only recently learnt to export the Lightroom catalog information to keep the manipulations with the files]
  7. Remove RAW images from my hard disk
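
Step 4, by the way, is essentially a one-liner – something like this, where the "RedSea2012" prefix is just an illustration:

Get-ChildItem IMG_*.DNG | Rename-Item -NewName { $_.Name -replace '^IMG', 'RedSea2012' }    # IMG_0042.DNG becomes RedSea2012_0042.DNG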

There is one major pain. How do I know which files I have deleted in Lightroom? I don’t want to delete them from the hard disk, I want to move them later. It turns out Lightroom uses a SQLite database and there is a free Windows ODBC driver for SQLite available for download.  With this in place one can create an ODBC data source, point it at a Lightroom catalog and poke about with the data. Want a complete listing of your Lightroom data in Excel? ODBC is the answer. But let me issue these warnings:

  • Lightroom locks the database files exclusively – you can’t use the ODBC driver and Lightroom at the same time. If something else is holding the files open, Lightroom won’t start.
  • The ODBC driver can run UPDATE queries to change the data: do I need to say that is dangerous ? Good.
  • There’s no support for this. If it goes wrong, expect Adobe support to say “You did WHAT ?” and start asking about your backups. Don’t come to me either. You can work from a copy of the data if you don’t want to risk having to fall back to one of the backups Lightroom makes automatically

I was interested in 4 sets of data, shown in the following diagrams. Below is image information with the associated metadata and file information. Lightroom stores images in the Adobe_Images table; IPTC and EXIF metadata link to images – their “image” field joins to the “id_local” primary key in images. Images have a “root file” (in the AgLibraryFile table) which links to a library folder (AgLibraryFolder), which is expressed as a path from a root folder (AgLibraryRootFolder table). The link always goes to the “id_local” field. I could get information about the folders imported into the catalog just by querying these last two tables (outlined in red in the diagram).

[diagram of the image, metadata, file and folder tables]

The SQL to fetch this data looks like this for just the folders
SELECT RootFolder.absolutePath || Folder.pathFromRoot as FullName
FROM   AgLibraryFolder     Folder
JOIN   AgLibraryRootFolder RootFolder ON RootFolder.id_local = Folder.rootFolder
ORDER BY FullName

SQLite is one of the dialects of SQL which doesn’t accept AS in the FROM part of a SELECT statement. Since I run this in PowerShell I also put in a WHERE clause which inserts a parameter. To get all the metadata the query looks like this
SELECT    rootFolder.absolutePath || folder.pathFromRoot || rootfile.baseName || '.' || rootfile.extension AS fullName,
          LensRef.value AS Lens,     image.id_global,       colorLabels,                Camera.Value       AS cameraModel,
          fileFormat,                fileHeight,            fileWidth,                  orientation,
          captureTime,               dateDay,               dateMonth,                  dateYear,
          hasGPS,                    gpsLatitude,           gpsLongitude,               flashFired,
          focalLength,               isoSpeedRating,        caption,                    copyright
FROM      AgLibraryIPTC              IPTC
JOIN      Adobe_images               image      ON      image.id_local = IPTC.image
JOIN      AgLibraryFile              rootFile   ON   rootfile.id_local = image.rootFile
JOIN      AgLibraryFolder            folder     ON     folder.id_local = rootfile.folder
JOIN      AgLibraryRootFolder        rootFolder ON rootFolder.id_local = folder.rootFolder
JOIN      AgharvestedExifMetadata    metadata   ON      image.id_local = metadata.image
LEFT JOIN AgInternedExifLens         LensRef    ON    LensRef.id_Local = metadata.lensRef
LEFT JOIN AgInternedExifCameraModel  Camera     ON     Camera.id_local = metadata.cameraModelRef
ORDER BY FullName

Note that since some images don’t have a camera or lens logged, the joins to those tables need to be LEFT joins, not inner joins. Again, the version I use in PowerShell has a WHERE clause which inserts a parameter.

OK, so much for file data – the other data I wanted was about collections. The list of collections is in just one table (AgLibraryCollection), so it is very easy to query, but I also wanted to know the images in each collection.

[diagram of the collection tables]

Since one image can be in many collections, and each collection holds many images, AgLibraryCollectionImage is a table which provides the many-to-many relationship. Different tables might be attached to Adobe_images depending on what information one wants about the images in a collection; I’m interested only in mapping files on disk to collections in Lightroom, so I have linked to the file information, and I have a query like this.

SELECT   Collection.name AS CollectionName,
         RootFolder.absolutePath || Folder.pathFromRoot || RootFile.baseName || '.' || RootFile.extension AS FullName
FROM     AgLibraryCollection      Collection
JOIN     AgLibraryCollectionimage cimage     ON collection.id_local = cimage.Collection
JOIN     Adobe_images             Image      ON      Image.id_local = cimage.image
JOIN     AgLibraryFile            RootFile   ON   Rootfile.id_local = image.rootFile
JOIN     AgLibraryFolder          Folder     ON     folder.id_local = RootFile.folder
JOIN     AgLibraryRootFolder      RootFolder ON RootFolder.id_local = Folder.rootFolder
ORDER BY CollectionName, FullName

Once I have an ODBC driver (or an OLE DB driver) I have a ready-made PowerShell template for getting data from the data source (there’s a sketch of the plumbing after the examples below). So I wrote functions to let me do:
Get-LightRoomItem -ListFolders -include $pwd
To list folders below the current one which are in the Lightroom library.
Get-LightRoomItem  -include "dive"
To list files in the Lightroom library where the path contains "dive" in the folder or filename.
Get-LightRoomItem | Group-Object -no -Property "Lens" | sort count | ft -a count,name
To produce a summary of lightroom items by lens used. And
$paths = (Get-LightRoomItem -include "$pwd%dng" | select -ExpandProperty path)  ;   dir *.dng |
           where {$paths -notcontains $_.FullName} | move -Destination scrap -whatif

Stores the paths of Lightroom items in the current folder ending in .DNG in $paths; then gets files in the current folder and moves those which are not in $paths (i.e. not in Lightroom). Specifying -WhatIf allows the moves to be confirmed before they happen.

Get-LightRoomCollection to list all collections
Get-LightRoomCollectionItem -include musicians | copy -Destination e:\raw\musicians    copies the original files in the “musicians” collection to another disk
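
For anyone wanting to build their own version, the plumbing behind those functions is only a few lines of ADO.NET – a minimal sketch (the driver name and catalog path are illustrative, and remember Lightroom must be closed while you query):

$connection = New-Object System.Data.Odbc.OdbcConnection("Driver={SQLite3 ODBC Driver};Database=C:\Users\James\Pictures\Lightroom Catalog.lrcat")
$connection.Open()
$command = $connection.CreateCommand()
$command.CommandText = "SELECT RootFolder.absolutePath || Folder.pathFromRoot AS FullName " +
                       "FROM   AgLibraryFolder Folder " +
                       "JOIN   AgLibraryRootFolder RootFolder ON RootFolder.id_local = Folder.rootFolder " +
                       "ORDER BY FullName"
$adapter = New-Object System.Data.Odbc.OdbcDataAdapter($command)
$table   = New-Object System.Data.DataTable
[void]$adapter.Fill($table)          # one row per folder known to the catalog
$connection.Close()
$table | Select-Object -First 10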

I’ve shared the PowerShell code on Skydrive

August 7, 2012

The cloud, passwords, and problems of trust and reliance

Filed under: Privacy,Security and Malware — jamesone111 @ 9:02 pm

In recent days a story has been emerging of a guy called Mat Honan. Mat got hacked; the hackers wanted his Twitter account simply because he had a three-letter Twitter name. Along the way they wiped his Google mail account and (via Apple’s iCloud) his iPhone, iPad and his MacBook. Since he relied on stuff being backed up in the cloud he lost irreplaceable family photos, and lord only knows what else. There are a few possible reactions: Schadenfreude – “Ha, ha, I don’t rely on Google or Apple, look what happens to people who do” – “What an idiot, not having a backup”, or “There but for the grace of God goes any of us”.

Only people who’ve never lost data can feel unsympathetic to Mat, and I’ve lost data. I’ve known tapes which couldn’t be read on a new unit after the old one was destroyed in a fire. I’ve learnt by way of a disk crash that a server wasn’t running its backups correctly. I’ve gone back to optical media which couldn’t be read. My backup drive failed a while back – though fortunately everything on it existed somewhere else, and making a new backup showed me in just how many places. I’ve had memory cards fail in the camera before I had copied the data off them, and I had some photos which existed only on a laptop and a memory card which were in the same bag that got stolen (the laptop had been backed up the day before the photos were taken). The spare memory card I carry on my key-ring failed recently, and I carry that because I’ve turned up to shoot photos with no memory card in the camera – never close the door on the camera with the battery or memory card out. I treat memory cards like film and just buy more and keep the old cards as a backstop copy. So my data practices look like a mixture of paranoia and superstition and I know, deep down, that nothing is infallible.

For many of us everything we have in the cloud comes down to one password. I don’t mean that we logon everywhere with “Secret1066!”  (no, not my password). But most of us have one or perhaps two email addresses which we use when we register.  I have one password which I use on many, many sites which require me to create an identity, but that identity doesn’t secure anything meaningful to me. It doesn’t meet the rules of some sites (and I get increasingly cross with sites which define their own standards for passwords) and on those sites I will set a one-off password like “2dayisTuesday!” – when I come to use the site again I’ll just ask them to reset my password. Anything I have in the cloud is only as secure as my email password.
There are some hints here. First: any site which can mail you your current password isn’t storing it properly – the proper way to store passwords is as something computed from the password, so it is only possible to tell whether the right password was entered, not what the password is. Second: these computations are case sensitive and set no maximum password length, so any site which is case insensitive or limits password length probably doesn’t have your details properly secured.  Such sites are out there – Tesco for example – and if we want to use them we have to put up with their security. However if they get hacked (and you do have to ask, if they can’t keep passwords securely, what other weaknesses are there?) your user name, email and password are in the hands of the hackers, so you had better use different credentials anywhere security matters – which of course means on your mailbox.
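
To illustrate “something computed from the password”, here is a minimal PowerShell sketch using .NET’s PBKDF2 class – purely to show the idea of storing a salted hash rather than the password itself (the iteration count is illustrative, not a recommendation):

$salt = New-Object byte[] 16
(New-Object System.Security.Cryptography.RNGCryptoServiceProvider).GetBytes($salt)            # random salt, stored alongside the hash
$hash = (New-Object System.Security.Cryptography.Rfc2898DeriveBytes("Secret1066!", $salt, 10000)).GetBytes(32)
$stored = [Convert]::ToBase64String($hash)                                                     # this, not the password, goes in the database

# To check a login attempt, repeat the computation with the stored salt and compare the results
$attempt = (New-Object System.Security.Cryptography.Rfc2898DeriveBytes("Secret1066!", $salt, 10000)).GetBytes(32)
[Convert]::ToBase64String($attempt) -eq $stored                                                # True only for the right password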

So your email password is the one password to rule them all, and obviously needs to be secure. But there is a weak link, and that seems to be where the people who hacked Mat found a scary loophole. The easiest way into someone’s mailbox might be to get an administrator to reset the password over the phone – not to guess or brute-force it. The only time I had my password reset at Microsoft the new one was left on my voicemail – so I had to be able to log in to that. If the provider texts the password to a mobile phone, or resets it (say) to the town where you were born (without saying what it is), that offers a level of protection; but – be honest – do you know what it takes to get someone at your provider to reset your password, or what the protocol is?  In Mat’s case the provider was Apple – for whom the hacker knew an exploitable weakness – but it would be naive to think that Apple was uniquely vulnerable.

Mat’s pain may show the risk in having only a mailbox provider’s password-reset policy to keep a hacker out of your computer and/or your (only) backup. One can build up a fear of other things that stop you having access to either computer or backup, without knowing how realistic they are.  I like knowing that my last few phones could be wiped easily, but would I want remote wipe of a laptop? When my laptop was stolen there wasn’t any need to wipe it remotely as it had full volume encryption with Microsoft’s BitLocker (saving me a difficult conversation with corporate security), and after this story I’ll stick to that. Cloud storage does give me off-site backup and that’s valuable – it won’t be affected if I have a fire or flood at home – but I will continue to put my faith in traditional off-line backup, and I’ve just ordered more disk capacity for that.
