James O'Neill's Blog

May 31, 2018

More tricks with PowerShell and Excel

Filed under: Office,Powershell — jamesone111 @ 6:25 am

I’ve already written about Doug Finke’s ImportExcel module – for example, this post from last year covers

  • Basic exporting (use Where-Object to reduce the number of rows, Select-Object to remove columns that aren’t needed)
  • Using -ClearSheet to remove old data, -AutoSize to get the column widths right, setting titles, freezing panes and applying filters, creating tables
  • Setting formats and conditional formats

    In this post I want to round up a few other things I commonly use.

    Any custom work after the export means asking Export-Excel to pass through the unsaved Excel Package object like this

    $xl = Get-WmiObject -Class win32_logicaldisk | select -Property DeviceId, VolumeName, Size, Freespace |
               Export-Excel -Path "$env:computerName.xlsx" -WorkSheetname Volumes -PassThru

    Then we can set about making modifications to the sheet. I can keep referring to it via the Excel package object, but it’s easier to use a variable. 
    $Sheet = $xl.Workbook.Worksheets["Volumes"]

    Then I can start applying formatting, or adding extra information to the file
    Set-Format -WorkSheet $sheet -Range "C:D" -NumberFormat "0,000"
    Set-Column -Worksheet $sheet -Column 5 -Heading "PercentageFree" -Value {"=D$row/C$row"} -NumberFormat "0%"

    I talked about Set-Column in another post. Sometimes, though, the data isn’t a natural row or column and the only way to do things is by “poking” individual cells, like this

        
    $sheet.Cells["G2"].value = "Collected on"
    $sheet.Cells["G3"].value = [datetime]::Today
    $sheet.Cells["G3"].Style.Numberformat.Format =
     "mm-dd-yy"
    $sheet.Cells.AutoFitColumns()
    Close-ExcelPackage $xl –Show

    Sharp-eyed readers will see that the date format appears to be “least-significant-in-the-middle”, which is only used by one country – and not the one where I live. It turns out Excel tokenizes some formats: as this MSDN page explains, there are “number formats whose formatCode value is implied rather than explicitly saved in the file… [some] can be interpreted differently, depending on the UI language”. In other words, if you write “mm-dd-yy” or “m/d/yy h:mm” it will be translated into the local date or date-time format. When Export-Excel encounters a date/time value it uses the second of these; and yes, the first one does use hyphens and the second does use slashes. My to-do list includes adding an argument completer for Set-Format so that it proposes these formats.
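
    If you want a date to display the same way regardless of the viewer’s UI language, one option is to pick a format string that doesn’t match any of the built-in codes, so it is written into the file explicitly. A minimal sketch, reusing $sheet from above and assuming an ISO-style date is acceptable:

    # "yyyy-mm-dd" is not one of Excel's implied built-in formats, so it is saved
    # as-is and is not re-interpreted for the viewer's locale.
    $sheet.Cells["G3"].Style.Numberformat.Format = "yyyy-mm-dd"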

    Since the columns change their widths during these steps, I only auto-size them when I’ve finished setting their data and formats. So now I have the first page in the audit workbook for my computer.


    Of course there are times when we don’t want a book per computer with each aspect on its own sheet, but a book for each aspect with a page per computer.
    If we want to copy a sheet from one workbook to another, we could read the data and write it back out like this

    Import-Excel -Path "$env:COMPUTERNAME.xlsx" -WorksheetName "volumes" |
         Export-Excel -Path "volumes.xlsx" -WorkSheetname $env:COMPUTERNAME

    but this strips off all the formatting and loses the formulas. However, the Workbook object offers a better way: we can get the Excel package for an existing file with
    $xl1 = Open-ExcelPackage -path "$env:COMPUTERNAME.xlsx"

    and create a new file and get the Package object for it with 
    $xl2 = Export-Excel -Path "volumes.xlsx" -PassThru

    (if the file exists we can use Open-ExcelPackage). The worksheets collection has an Add method which allows you to specify an existing sheet as the basis of the new one, so we can call that, remove the default sheet that Export-Excel created, and close the files (saving and loading in Excel, or not, as required).

    $newSheet = $xl2.Workbook.Worksheets.Add($env:COMPUTERNAME, ($xl1.Workbook.Worksheets["Volumes"]))
    $xl2.Workbook.Worksheets.Delete("Sheet1")
    Close-ExcelPackage $xl2 -Show
    Close-ExcelPackage $xl1 -NoSave

    The new workbook looks the same (formatting has been preserved, although I have found it doesn’t like conditional formatting) but the file name and sheet name have switched places.


    Recently I’ve found that I want the equivalent of selecting “Transpose” in Excel’s paste-special dialog: take an object with many properties and, instead of exporting it so it runs over many columns, make a two-column list of property name and value.
    For example
    $x = Get-WmiObject win32_computersystem  | Select-Object -Property Caption,Domain,Manufacturer,
                                Model, TotalPhysicalMemory, NumberOfProcessors, NumberOfLogicalProcessors

    $x.psobject.Properties | Select-Object -Property name,value |
        Export-Excel -Path "$env:COMPUTERNAME.xlsx" -WorkSheetname General -NoHeader -AutoSize -Show


    When I do this in a real script I use the -PassThru switch, capture the package object in $excel, and apply some formatting

    $ws    = $excel.Workbook.Worksheets["General"]
    $ws.Column(1).Width                     =  64
    $ws.Column(1).Style.VerticalAlignment   = "Center"
    $ws.Column(2).Width                     =  128
    $ws.Column(2).Style.HorizontalAlignment = "Left"
    $ws.Column(2).Style.WrapText            = $true

    Of course I could use Set-Format instead, but sometimes the natural way is to use .Cells[], .Row() or .Column().
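
    For completeness, here is a rough sketch of what the Set-Format version might look like. The -Width, -WrapText and alignment parameters are assumptions on my part, so check Get-Help Set-Format in your version of the module before relying on them:

    # Assumed parameter names – verify them against your ImportExcel version.
    Set-Format -WorkSheet $ws -Range "A:A" -Width 64  -VerticalAlignment Center
    Set-Format -WorkSheet $ws -Range "B:B" -Width 128 -HorizontalAlignment Left -WrapText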


    May 14, 2018

    A couple of easy boosts for PowerShell performance.

    Filed under: Powershell — jamesone111 @ 10:55 am

    At the recent PowerShell and DevOps Summit I met Joshua King and went to his session – Whip Your Scripts into Shape: Optimizing PowerShell for Speed (an area where I overestimated my knowledge) – and it’s made me think about some other issues. If you find this post interesting, it’s a fair bet you’ll enjoy watching Joshua’s talk. There are a few things to say before looking at a performance optimization which I added to my knowledge this week.

  • Because scripts can take longer to write than to run, we need to know when it is worth optimizing for speed. After all, if we cut the time from pressing return to the reappearance of the prompt from 1/2 second to 1/4, or even to 1/1000th of a second, our reaction time is such that we don’t do the next thing any sooner. On the other hand, if something takes 5 minutes to run (which might be the same command being called many times inside a script), giving minutes back is usable time.
  • Execution time varies with input – it often goes up with the square of the number of items being processed (typically when the operation is of the form “for every item, look at [some subset of] all items”). So you might process 1,000 rows of data in half a second … but then someone takes your code and complains that their data takes 5 minutes to process, because they’re working with many more rows. Knowing if you should optimize here isn’t straightforward – most of the time it doesn’t matter, but when it matters at all, it matters a lot. You can discover whether performance tails off badly at 10,000 or 1,000,000 rows, but it isn’t easy to predict how much data of each size you will meet, or whether optimizing performance is time well spent. If the problem happens at scale, then you might run sub-tasks in parallel (especially if each runs on a different computer), or change the way of working – for example this piece on hash tables is about avoiding the “look at every item” problem.
  • No one writes code to be slow. But the fast way might require something which is longer and/or harder to understand. If we want to write scripts which are reusable we might prefer tidy-but-slower over fast-but-incomprehensible. (All other things being equal we’d love the elegance of something tidy and fast, but a lot of us aren’t going to let the pursuit of that prevent us going home). 
    Something like $SetA | where {$_ -notIn $setB} is easy to understand, but if the sets are big enough it might need billions of comparisons. The work which gave rise to the hash tables piece cut the number from billions to under a million (and meant that we could run the script multiple times per hour instead of once or twice in a day, so we could test it properly for the first time), but it takes a lot more effort to understand how it works; a minimal sketch of the idea appears after this list.
  • One area from Joshua’s talk where the performance could be improved without adding complexity was reducing or eliminating the hit from using Pipelines; usually this doesn’t matter – in fact the convenience of being able to construct a bespoke command by piping cmdlets together was compelling before it was named “PowerShell”.  Consider these two scripts which time how long it takes to increment a counter a million times.
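
    Here is that hash-table sketch, with hypothetical $setA and $setB arrays: build the lookup once, and each membership test becomes a single hash probe instead of a scan of the whole of $setB. The two timing scripts described in the last bullet follow it.

    # Hypothetical data: find the members of $setA that are not in $setB.
    $setA = 1..100000
    $setB = 1..90000

    # Build the lookup table once ...
    $inB = @{}
    foreach ($item in $setB) { $inB[$item] = $true }

    # ... then test membership against the table instead of scanning $setB for every item.
    $missing = foreach ($item in $setA) { if (-not $inB.ContainsKey($item)) { $item } }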

    $i  = 0 ; $j = 1..1000000 ;
    $sw = [System.Diagnostics.Stopwatch]::StartNew() ;
    $J | foreach {$i++ }  ;
    $sw.Stop() ; $sw.Elapsed.TotalMilliseconds

    $i  = 0 ; $j = 1..1000000 ;
    $sw = [System.Diagnostics.Stopwatch]::StartNew() ;
    foreach ($a in $j) {$i++ }  ;
    $sw.Stop() ; $sw.Elapsed.TotalMilliseconds

    The only thing which is different is the foreach – is it the alias for ForEach-Object, or is it a foreach statement? The logic hasn’t changed, and readability is pretty much the same; you might expect them to take roughly the same time to run … but they don’t: on my machine, using the statement is about 6 times faster than piping to the cmdlet.
    This is doing unrealistically simple work; replacing the two “ForEach” lines with

    $j | where {$_ % 486331 -eq 0}
    and
    $j.where(  {$_ % 486331 -eq 0} )

    does something more significant for each item, and I find the pipeline version takes 3 times as long! The performance improvement remains if the output of the .where() goes into a pipeline. I’ve written in the past that sometimes very long pipelines can be made easier to read by breaking them up (even though I have a dislike of storing intermediate results), and it turns out we can also boost performance by doing that.
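
    As a sketch of that idea (the log name and event ID here are purely illustrative): store an intermediate result, filter it with the .Where() method, and let the remaining steps stay as a pipeline.

    # Store the intermediate result rather than building one long pipeline ...
    $events   = Get-WinEvent -LogName System -MaxEvents 10000
    # ... filter with the .Where() method, which avoids piping every item to Where-Object ...
    $filtered = $events.Where( {$_.Id -eq 7036} )
    # ... and the filtered output can still feed a pipeline for the rest of the work.
    $filtered | Group-Object -Property ProviderName | Sort-Object -Property Count -Descending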

    Recently I found another factor: if I define a function

    function CanDivide {
        param ($Dividend)
        $Dividend % 486331 -eq 0
    }
    and repeat the previous test with the command as
    $j.where( {CanDivide $_ } )

    People will separate roughly 50:50 into those who find the new version easier to understand and those who say “I have to look somewhere else to see what ‘CanDivide’ does”. But is it faster or slower, and by how much? It’s worth verifying this for yourself, but my test said the function call makes the command slower by a factor of 6 or 7. If a function is small, and/or is only called from one place, and/or is called many times to complete a piece of work, then it may be better to ‘flatten’ the script. I’m in the “I don’t want to look somewhere else” camp, so my bias is towards flattening code, but – like reducing the amount of piping – it might feel wrong for other people. It can make the difference between “fast enough” and “not fast enough” without major changes to the logic.
