James O'Neill's Blog

June 30, 2012

Using the Windows index to search from PowerShell: Part three. Better function output

Filed under: Powershell — jamesone111 @ 10:53 am

Note: this was originally written for the Hey,Scripting guy blog where it appeared as the 27 June 2012 episode. The code is available for download . I have some more index related posts coming up so I wanted to make sure everything was in one place


In part one, I introduced a function which queries the Windows Index using filter parameters like

  • "Contains(*,’Stingray’)"
  • "System.Keywords = ‘Portfolio’ "
  • "System.Photo.CameraManufacturer LIKE ‘CAN%’ "
  • "System.image.horizontalSize > 1024"

In part two, I showed how these parameters could be simplified to

  • Stingray (A word on its own becomes a contains term)
  • Keyword=Portfolio (Keyword, without the S is an alias for System.Keywords and quotes will be added automatically))
  • CameraManufacturer=CAN* (* will become %, and = will become LIKE, quotes will be added and CameraManufacturer will be prefixed with System.Photo)
  • Width > 1024 (Width is an alias or System.image.horizontalsize, and quotes are not added around numbers).

There is one remaining issue. PowerShell is designed so that one command’s output becomes another’s input. This function isn’t going to do much with Piped input: I can’t see another command spitting out search terms for this one, nor can I multiple paths being piped in. But the majority of items found by a search will be files: and so it should be possible to treat them like files, piping them into copy-item or whatever.
The following was my first attempt at transforming the data rows into something more helpful

$Provider= "Provider=Search.CollatorDSO; Extended Properties=’Application=Windows’;"
$adapter = new-object system.data.oledb.oleDBDataadapter -argument $SQL, $Provider
$ds      = new-object system.data.dataset
if ($adapter.Fill($ds))
    
{ foreach ($row in $ds.Tables[0])
            {if ($row."System.ItemUrl" -match "^file:")
                  {$obj = New-Object psobject -Property @{
                                Path = (($row."System.ItemUrl" -replace "^file:","") -replace "\/","\")}}
             Else {$obj = New-Object psobject -Property @{Path = $row."System.ItemUrl"}
}
             Add-Member -force -Input $obj -Name "ToString" -MemberType "scriptmethod" -Value {$this.path}
             foreach ($prop in (Get-Member -InputObject $row -MemberType property |
                                    where-object {$row."$($_.name)" -isnot [system.dbnull] }))
                  { Add-Member -ErrorAction "SilentlyContinue" -InputObject $obj -MemberType NoteProperty `
                               -Name (($prop.name -split "\." )[-1]) -Value $row."$($prop.name)"
                 
}
             foreach ($prop in ($PropertyAliases.Keys |
                                    Where-Object {$row."$($propertyAliases.$_)" -isnot [system.dbnull] }))
                  { Add-Member -ErrorAction "SilentlyContinue" -InputObject $obj ` -MemberType AliasProperty ` 
                               -Name $prop ` -Value ($propertyAliases.$prop -split "\." )[-1]
                  }
             $obj
            }
      }
This is where the function spends most of its time, looping through the data creating a custom object for each row; non-file items are given a path property which holds the System.ItemURL property; for files the ItemUrl is processed into normal format (rather than file:c/users/james format) – in many cases the item can be piped into another command successfully if it just has a Path property.

Then, for each property (database column) in the row a member is added to the custom object with a shortened version of the property name and the value (assuming the column isn’t empty).
Next, alias properties are added using the definitions in $PropertyAliases.
Finally some standard members get added. In this version I’ve pared it down to a single method, because several things expect to be able to get the path for a file by calling its tostring() method.

When I had all of this working I tried to get clever. I added aliases for all the properties which normally appear on a System.IO.FileInfo object and even tried fooling PowerShell’s formatting system into treating my file items as a file object, something that only needs one extra line of code
$Obj.psobject.typenames.insert(0, "SYSTEM.IO.FILEINFO")
Pretending a custom object is actually another type seems dangerous, but everything I tried seemed happy provided the right properties were present. The formatting worked except for the "Mode" column. I found the method which that calculates .Mode for FILEINFO objects, but it needs a real FILEINFO object. It was easy enough to get one – I had the path and it only needs a call to Get‑Item but I realized that if I was getting a FILEINFO object anywhere in the process, then it made more sense to add extra properties to that object and dispense with the custom object. I added an extra switch -NoFiles to supress this behaviour
So the code then became
$Provider ="Provider=Search.CollatorDSO; Extended Properties=’Application=Windows’;"
$adapter  = new-object system.data.oledb.oleDBDataadapter -argument $SQL, $Provider
$ds       = new-object system.data.dataset
if ($adapter.Fill($ds))
     { foreach ($row in $ds.Tables[0])
        
 { if (($row."System.ItemUrl" -match "^file:") -and (-not $NoFiles)) 
                 {$obj = Get-item -Path (($row."System.ItemUrl" -replace "^file:","") -replace "\/","\")}
            Else {$obj = New-Object psobject -Property @{Path = $row."System.ItemUrl"}
                  Add-Member -force -Input $obj -Name "ToString" -MemberType "scriptmethod" -Value {$this.path} 
          }
         
ForEach...
 
The initial code was 36 lines, making the user input more friendly took it to 60 lines, and the output added about another 35 lines, bring the total to 95.
There were 4 other kinds of output I wanted to produce:

  • Help. I added comment-based-help with plenty of examples. It runs to 75 lines making it the biggest constituent in the finished product.
    In addition I have 50 lines that are comments or blank for readability as insurance against trying to understand what those regular expressions do in a few months’ time – but there are only 100 lines of actual code.
  • A –list switch which lists the long and short names for the fields (including aliases)
  • Support for the –Debug switch – because so many things might go wrong, I have write‑debug $SQL immediately before I carry out the query, and to enable it that I have
    [CmdletBinding()] before I declare the parameters.
  • A –Value switch which uses the GROUP ON… OVER… search syntax so I can see what the possible values are in a column.
    GROUP ON queries are unusual because they fill the dataset with TWO tables.
    GROUP ON System.kind OVER ( SELECT STATEMENT) will produce a something like this as the first table.

SYSTEM.KIND   Chapter
-----------   -------
communication 0
document      1
email         2
folder        3
link          4
music         5
picture       6
program       7
recordedtv    8

The second table is the normal data suitably sorted. In this case it has all the requested fields grouped by kind plus one named "Chapter", which ties into the first table. I’m not really interested in the second table but the first helps me know if I should enter "Kind=image", "Kind=Photo" or "Kind=Picture"

I have a Select-List function which I use in my configurator and Hyper-V library, and with this I can choose which recorded TV program to watch, first selecting by title, and then if there is more than one episode, by episode.
$t=(Get-IndexedItem -Value "title" -filter "kind=recordedtv" -recurse |
            Select-List -Property title).title
start (Get-IndexedItem -filter "kind=recordedtv","title='$t'" -path |
           
 Select-List -Property ORIGINALBROADCASTDATE,PROGRAMDESCRIPTION)


In a couple of follow up posts I’ll show some of the places I use Get-IndexedItem. But for now feel free to download the code and experiment with it.

Advertisements

Using the Windows index to search from PowerShell:Part Two – Helping with user input

Filed under: Uncategorized — jamesone111 @ 10:46 am

Note: this was originally written for the Hey,Scripting guy blog where it appeared as the 26 June 2012 episode. The code is available for download . I have some more index related posts coming up so I wanted to make sure everything was in one place


In part one I developed a working PowerShell function to query the Windows index. It outputs data rows which isn’t the ideal behaviour and I’ll address that in part three; in this part I’ll address another drawback: search terms passed as parameters to the function must be "SQL-Ready". I think that makes for a bad user experience so I’m going to look at the half dozen bits of logic I added to allow my function to process input which is a little more human. Regular expressions are the way to recognize text which must be changed, and I’ll pay particular attention to those as I know I lot of people find them daunting.

Replace * with %

SQL statements use % for wildcard, but selecting files at the command prompt traditionally uses *. It’s a simple matter to replace – but for the need to "escape" the* character, replacing * with % would be as simple as a –replace statement gets:
$Filter = $Filter -replace "\*","%"
For some reason I’m never sure if the camera maker is Canon or Cannon so I’d rather search for Can*… or rather Can%, and that replace operation will turn "CameraManufacturer=Can*" into "CameraManufacturer=Can%". It’s worth noting that –replace is just as happy to process an array of strings in $filter as it is to process one.

Searching for a term across all fields uses "CONTAINS (*,’Stingray’)", and if the -replace operation changes* to % inside a CONTAINS() the result is no longer a valid SQL statement. So the regular expression needs to be a little more sophisticated, using a "negative look behind"
$Filter = $Filter -replace " "(?<!\(\s*)\*","%"

In order to filter out cases like CONTAINS(*… , the new regular expression qualifies "Match on *",with a look behind – "(?<!\(\s*)" – which says "if it isn’t immediately preceded by an opening bracket and any spaces". In regular expression syntax (?= x) says "look ahead for x" and (?<= x) says "Look behind for x" (?!= x) is “look ahead for anything EXCEPT x” and (?<!x) is “look behind for anything EXCEPT x” these will see a lot of use in this function. Here (?<! ) is being used, open bracket needs to be escaped so is written as \( and \s* means 0 or more spaces.

Convert "orphan" search terms into ‘contains’ conditions.

A term that needs to be wrapped as a "CONTAINS" search can be identified by the absence of quote marks, = , < or > signs or the LIKE, CONTAINS or FREETEXT search predicates. When these are present the search term is left alone, otherwise it goes into CONTAINS, like this.
$filter = ($filter | ForEach-Object {
    if  ($_ -match "'|=|<|>|like|contains|freetext") 
          
{$_}
    else   {"Contains(*,'$_')"}
})

Put quotes in if the user omits them.

The next thing I check for is omitted quote marks. I said I wanted to be able to use Can*, and we’ve seen it changed to Can% but the search term needs to be transformed into "CameraManufacturer=’Can%’ ". Here is a –replace operation to do that.
$Filter = $Filter -replace "\s*(=|<|>|like)\s*([^'\d][^\s']*)$",' $1 ''$2'' '
This is a more complex regular expression which takes a few moments to understand

Regular expression

Meaning

Application

\s*(=|<|>|like)\s*
([^’\d][^\s’]*)$

Any spaces (or none)

 

\s*(=|<|>|like)\s*
([^’\d][^s’]*)$

= or < or > or "Like"

CameraManufacturer=Can%

\s*(=|<|>|like)\s*
([^’\d][^\s’]*)$

Anything which is NOT a ‘ character
or a digit

CameraManufacturer=Can%

\s*(=|<|>|like)\s*
([^’\d][^\s’]*)$

Any number of non-quote,
non-space characters (or none)

CameraManufacturer=Can%

\s*(=|<|>|like)\s*
([^’\d][^\s’]*)$

End of line

\s*(=|<|>|like)\s*
([^’\d][^\s’]*)$

Capture the enclosed sections
as matches

$Matches[0]= "=Can%"
$Matches[1]= "="
$Matches[2]= "Can%"

‘ $1 ”$2” ‘0

Replace Matches[0] ("=Can%")
with an expression which uses the
two submatches "=" and "can%".

= ‘Can%’

Note that the expression which is being inserted uses $1 and $2 to mean matches[1] and[2] – if this is wrapped in double quote marks PowerShell will try to evaluate these terms before they get to the regex handler, so the replacement string must be wrapped in single quotes. But the desired replacement text contains single quote marks, so they need to be doubled up.

Replace ‘=’ with ‘like’ for Wildcards

So far, =Can* has become =’Can%’, which is good, but SQL needs "LIKE" instead of "=" to evaluate a wildcard. So the next operation converts "CameraManufacturer = ‘Can%’ "into "CameraManufacturer LIKE ‘Can%’ ":
$Filter = $Filter -replace "\s*=\s*(?='.+%'\s*$)" ," LIKE "

Regular expression

Meaning

Application

\s*=\s*(?=’.+%’\s*$)

= sign surrounded by any spaces

CameraManufacturer = ‘Can%’

\s*=\s*(?=.+%’\s*$)

A quote character

CameraManufacturer = Can%’

\s*=\s*(?=’.+%’\s*$)

Any characters (at least one)

CameraManufacturer = ‘Can%’

\s*=\s*(?=’.+%’\s*$)

% character followed by ‘

CameraManufacturer = ‘Can%’

\s*=\s*(?=’.+%’\s*$)

Any spaces (or none)
followed by end of line

\s*=\s*(?=‘.+%’\s*$)

Look ahead for the enclosed expression but don’t include it in the match

$Matches[0] = "="
(but only if ‘Can%’ is present)

Provide Aliases

The steps above reconstruct "WHERE" terms to build syntactically correct SQL, but what if I get confused and enter “CameraMaker” instead of “CameraManufacturer” or “Keyword” instead of “Keywords” ? I need Aliases – and they should work anywhere in the SQL statement – not just in the "WHERE" clause but in "ORDER BY" as well.
I defined a hash table (a.k.a. a "dictionary", or an "associative array") near the top of the script to act as a single place to store the aliases with their associated full canonical names, like this:
$PropertyAliases = @{
    Width       = "System.Image.HorizontalSize";
    Height      = "System.Image.VerticalSize";
    Name        = "System.FileName";
    Extension   = "System.FileExtension";
    Keyword     = "System.Keywords";
    CameraMaker = "System.Photo.CameraManufacturer"
}
Later in the script, once the SQL statement is built, a loop runs through the aliases replacing each with its canonical name:
$PropertyAliases.Keys | ForEach-Object {
    $SQL= $SQL -replace "(?<=\s)$($_)(?=\s*(=|>|<|,|Like))",$PropertyAliases[$_]
}
A hash table has .Keys and .Values properties which return what is on the left and right of the equals signs respectively. $hashTable.keyName or $hashtable[keyName] will return the value, so $_ will start by taking the value "width", and its replacement will be $PropertyAliases["width"] which is "System.Image.HorizontalSize", on the next pass through the loop, "height" is replaced and so on. To ensure it matches on a field name and not text being searched for, the regular expression stipulates the name must be preceded by a space and followed by "="or "like" and so on.

Regular expression

Meaning

Application

(?<=\s)Width(?=\s*(=|>|<|,|Like))

The literal text "Width"

Width > 1024

(?<=\s)Width(?=\s*(=|>|<|,|Like))

A Space

(?<=\s)Width(?=\s*(=|>|<|,|Like))

Look behind for the enclosed expression
but don’t include it in the match

$Matches[0] = "Width"
(but only if a leading space is present)

(?<=\s)Width(?=\s*(=|>|<|,|Like))

any spaces (or none)

(?<=\s)Width(?=\s*(=|>|<|,|Like))

The literal text "Like", or any of the characters comma, equals, greater than or less than

Width > 1024

(?<=\s)Width(?=\s*(=|>|<|,|Like))

Look ahead for the enclosed expression
but don’t include it in the match

$Matches[0] = "Width"
(but only if " >" is present)

If the prefix is omitted put the correct one in.

This builds on the ideas we’ve seen already. I want the list of fields and prefixes to be easy to maintain, so just after I define my aliases I define a list of field types
$FieldTypes = "System","Photo","Image","Music","Media","RecordedTv","Search"
For each type I define two variables, a prefix and a fieldslist : the names must be FieldtypePREFIX and FieldTypeFIELDS – the reason for this will become clear shortly but here is what they look like
$SystemPrefix = "System."
$SystemFields = "ItemName|ItemUrl"
$PhotoPrefix  = "System.Photo."
$PhotoFields  = "cameramodel|cameramanufacturer|orientation"
In practice the field lists are much longer – system contains 25 fieldnames not just the two shown here. The lists are written with "|" between the names so they become a regular expression meaning "ItemName or ItemUrl Or …". The following code runs after aliases have been processed
foreach ($type in $FieldTypes) {
   $fields = (get-variable "$($type)Fields").value
   $prefix = (get-variable "$($type)Prefix").value 
   $sql    = $sql -replace "(?<=\s)(?=($Fields)\s*(=|>|<|,|Like))" , $Prefix
}
I can save repeating code by using Get-Variable in a loop to get $systemFields, $photoFields and so on, and if I want to add one more field, or a whole type I only need to change the variable declarations at the start of the script. The regular expression in the replace works like this:

Regular expression

Meaning

Application

(?<=\s)(?=(cameramanufacturer|
orientation)\s*(=|>|<|,|Like))"

Look behind for a space
but don’t include it in the match

 

(?<=\s)(?=(cameramanufacturer|
orientation
)\s*(=|>|<|,|Like))"

The literal text "orientation" or "cameramanufacturer"

CameraManufacturer LIKE ‘Can%’

(?<=\s)(?=(cameramanufacturer|
orientation)\s*(=|>|<|,|Like))"

any spaces (or none)

 

(?<=\s)(?=(cameramanufacturer|
orientation)\s*(=|>|<|,|Like))"

The literal text "Like", or any of the characters comma, equals, greater than or less than

CameraManufacturer LIKE ‘Can%’

(?<=\s)(?=(cameramanufacturer|
orientation)\s*(=|>|<|,|Like))"

Look ahead for the enclosed expression
but don’t include it in the match

$match[0] is the point between the leading space and "CameraManufacturer LIKE" but doesn’t include either.

We get the effect of an "insert" operator by using ‑replace with a regular expression that finds a place in the text but doesn’t select any of it.
This part of the function allows "CameraManufacturer LIKE ‘Can%’" to become "System.Photo CameraManufacturer LIKE ‘Can%’ " in a WHERE clause.
I also wanted "CameraManufacturer" in an ORDER BY clause to become "System.Photo CameraManufacturer". Very sharp-eyed readers may have noticed that I look for a Comma after the fieldname as well as <,>,=, and LIKE. I modified the code which appeared in part one so that when an ORDER BY clause is inserted it is followed by a trailing comma like this:
if ($orderby) { $sql += " ORDER BY " + ($OrderBy -join " , " ) + ","}

the new version will work with this regular expression but the extra comma will cause a SQL error and so it must be removed later.
When I introduced the SQL I said the SELECT statement looks like this:

SELECT System.ItemName, System.ItemUrl,      System.FileExtension, System.FileName, System.FileAttributes, System.FileOwner, 
       System.ItemType, System.ItemTypeText , System.KindText,     System.Kind,     System.MIMEType,       System.Size

Building this clause from the field lists simplifies code maintenance, and as a bonus anything declared in the field lists will be retrieved by the query as well as accepted as input by its short name. The SELECT clause is prepared like this:
if ($First) 
     {$SQL = "SELECT TOP $First "}
else {$SQL = "SELECT "}
foreach ($type in $FieldTypes)
     {$SQL +=((get-variable "$($type)Fields").value -replace "\|",", " ) + ", "}

This replaces the "|" with a comma and puts a comma after each set of fields. This means there is a comma between the last field and the FROM – which allows the regular expression to recognise field names, but it will break the SQL , so it is removed after the prefixes have been inserted (just like the one for ORDER BY).
This might seem inefficient, but when I checked the time it took to run the function and get the results but not output them it was typically about 0.05 seconds (50ms) on my laptop – it takes more time to output the results.
Combining all the bits in this part with the bits in part one turns my 36 line function into about a 60 line one as follows

Function Get-IndexedItem{
Param ( [Alias("Where","Include")][String[]]$Filter ,
        [Alias("Sort")][String[]]$OrderBy,
        [Alias("Top")][String[]]$First,
        [String]$Path,
        [Switch]$Recurse 
      )
$PropertyAliases = @{Width ="System.Image.HorizontalSize"; 
                    Height = "System.Image.VerticalSize"}
$FieldTypes      = "System","Photo"
$PhotoPrefix     = "System.Photo."
$PhotoFields     = "cameramodel|cameramanufacturer|orientation"
$SystemPrefix    = "System."
$SystemFields    = "ItemName|ItemUrl|FileExtension|FileName"
if ($First) 
     {$SQL = "SELECT TOP $First "}
else {$SQL = "SELECT "}
foreach ($type in $FieldTypes)
     {$SQL +=((get-variable "$($type)Fields").value -replace "\|",", ")+", " }
if ($Path -match "\\\\([^\\]+)\\.")
     {$SQL += " FROM $($matches[1]).SYSTEMINDEX WHERE "}
else {$SQL += " FROM SYSTEMINDEX WHERE "}
if ($Filter)
     {$Filter = $Filter -replace "\*","%"
      $Filter = $Filter -replace"\s*(=|<|>|like)\s*([^'\d][^\s']*)$",' $1 ''$2'' '
      $Filter = $Filter -replace "\s*=\s*(?='.+%'\s*$)" ," LIKE "
      $Filter = ($Filter | ForEach-Object {
          if ($_ -match "'|=|<|>|like|contains|freetext")
               {$_}
          else {"Contains(*,'$_')"}
      })
      $SQL += $Filter -join " AND "
    }
if ($Path)
    {if ($Path -notmatch "\w{4}:") {$Path = "file:" + $Path}
     $Path = $Path -replace "\\","/"
     if ($SQL -notmatch "WHERE\s$") {$SQL += " AND " }
     if ($Recurse) 
          {$SQL += " SCOPE = '$Path' "}
     else {$SQL += " DIRECTORY = '$Path' "}
}
if ($SQL -match "WHERE\s*$")
     { Write-warning "You need to specify either a path , or a filter." ; return }
if ($OrderBy) { $SQL += " ORDER BY " + ($OrderBy -join " , " ) + ","}
$PropertyAliases.Keys | ForEach-Object 
     { $SQL= $SQL -replace"(?<=\s)$($_)(?=\s*(=|>|<|,|Like))", $PropertyAliases.$_ }
foreach ($type in $FieldTypes)
   
{$fields = (get-variable "$($type)Fields").value
     $prefix = (get-variable "$($type)Prefix").value
     $SQL    = $SQL -replace "(?<=\s)(?=($Fields)\s*(=|>|<|,|Like))" , $Prefix
    }
$SQL = $SQL -replace "\s*,\s*FROM\s+" , " FROM "
$SQL = $SQL -replace "\s*,\s*$" , ""
$Provider="Provider=Search.CollatorDSO;"+ "Extended Properties=’Application=Windows’;"
$Adapter = new-object system.data.oledb.oleDBDataadapter -argument $SQL, $Provider
$DS     = new-object system.data.dataset
if ($Adapter.Fill($DS)) { $DS.Tables[0] }
}

In part 3 I’ll finish the function by turning my attention to output

Using the Windows index to search from PowerShell: Part one: Building a query from user input.

Filed under: Uncategorized — jamesone111 @ 10:43 am

Note: this was originally written for the Hey,Scripting guy blog where it appeared as the 25 June 2012 episode. The code is available for download . I have some more index related posts coming up so I wanted to make sure everything was in one place


I’ve spent some time developing and honing a PowerShell function that gets information from the Windows Index– the technology behind the search that is integrated into explorer in Windows 7 and Vista. The Index can be queried using SQL and my function builds the SQL query from user input, executes it and receives rows of data for all the matching items. In Part three, I’ll look at why rows of data aren’t the best thing for the function to return and what the alternatives might be. Part two will look at making user input easier – I don’t want to make an understanding SQL a prerequisite for using the function. In this part I’m going to explore the query process.

We’ll look at how at how the query is built in a moment, for now please accept that a ready-to-run query stored in the variable $SQL. Then it only takes a few lines of PowerShell to prepare and run the query

$Provider="Provider=Search.CollatorDSO;Extended Properties=’Application=Windows’;"
$adapter = new-object system.data.oledb.oleDBDataadapter -argument $sql, $Provider
$ds      = new-object system.data.dataset
if ($adapter.Fill($ds)) { $ds.Tables[0] }

The data is fetched using oleDBDataAdapter and DataSet objects; the adapter is created specifying a "provider" which says where the data will come from and a SQL statement which says what is being requested. The query is run when the adapter is told to fill the dataset. The .fill() method returns a number, indicating how many data rows were returned by the query – if this is non-zero, my function returns the first table in the dataset. PowerShell sees each data row in the table as a separate object; and these objects have a property for each of the table’s columns, so a search might return something like this:

SYSTEM.ITEMNAME : DIVE_1771+.JPG
SYSTEM.ITEMURL : file:C:/Users/James/pictures/DIVE_1771+.JPG
SYSTEM.FILEEXTENSION : .JPG
SYSTEM.FILENAME : DIVE_1771+.JPG
SYSTEM.FILEATTRIBUTES : 32
SYSTEM.FILEOWNER : Inspiron\James
SYSTEM.ITEMTYPE : .JPG
SYSTEM.ITEMTYPETEXT : JPEG Image
SYSTEM.KINDTEXT : Picture
SYSTEM.KIND : {picture}
SYSTEM.MIMETYPE : image/jpeg
SYSTEM.SIZE : 971413

There are lots of fields to choose from, so the list might be longer. The SQL query to produce it looks something like this.

SELECT System.ItemName, System.ItemUrl,        System.FileExtension,
       System.FileName, System.FileAttributes, System.FileOwner, 
       System.ItemType, System.ItemTypeText ,  System.KindText, 
       System.Kind,     System.MIMEType,       System.Size
FROM   SYSTEMINDEX
WHERE  System.Keywords = 'portfolio' AND Contains(*,'stingray')

In the finished version of the function, the SELECT clause has 60 or so fields; the FROM and WHERE clauses might be more complicated than in the example and an ORDER BY clause might be used to sort the data.
The clauses are built using parameters which are declared in my function like this:

Param ( [Alias("Where","Include")][String[]]$Filter ,
        [Alias("Sort")][String[]]$orderby,
        [Alias("Top")][String[]]$First,
        [String]$Path,
        [Switch]$Recurse
)

In my functions I try to use names already used in PowerShell, so here I use -Filter and -First but I also define aliases for SQL terms like WHERE and TOP. These parameters build into the complete SQL statement, starting with the SELECT clause which uses -First

if ($First) {$SQL = "SELECT TOP $First "}
else        {$SQL = "SELECT "}
$SQL += " System.ItemName, System.ItemUrl " # and the other 58 fields

If the user specifies –First 1 then $SQL will be "SELECT TOP 1 fields"; otherwise it’s just "SELECT fields". After the fields are added to $SQL, the function adds a FROM clause. Windows Search can interrogate remote computers, so if the -path parameter is a UNC name in the form \\computerName\shareName the SQL FROM clause becomes FROM computerName.SYSTEMINDEX otherwise it is FROM SYSTEMINDEX to search the local machine.
A regular expression can recognise a UNC name and pick out the computer name, like this:

if ($Path -match "\\\\([^\\]+)\\.") {
$sql += " FROM $($matches[1]).SYSTEMINDEX WHERE "
}
else {$sql += " FROM SYSTEMINDEX WHERE "}

The regular expression in the first line of the example breaks down like this

Regular expression

Meaning

Application

\\\\([^\\]+)\\.

2 \ characters: "\" is the escape character, so each one needs to be written as \\

\\computerName\shareName

\\\\([^\\]+)\\.

Any non-\ character, repeated at least once

\\computerName\shareName

"\\\\([^\\]+)\\."

A \,followed by any character

\\computerName\shareName

"\\\\([^\\]+)\\."

Capture the section which is enclosed by the brackets as a match

$matches[0] =\\computerName\s
$matches[1] =computerName

I allow the function to take different parts of the WHERE clause as a comma separated list, so that
-filter "System.Keywords = 'portfolio'","Contains(*,'stingray')"
is equivalent to
-filter "System.Keywords = 'portfolio' AND Contains(*,'stingray')"

Adding the filter just needs this:

if ($Filter) { $SQL += $Filter -join " AND "}

The folders searched can be restricted. A "SCOPE" term limits the query to a folder and all of its subfolders, and a "DIRECTORY" term limits it to the folder without subfolders. If the request is going to a remote server the index is smart enough to recognise a UNC path and return just the files which are accessible via that path. If a -Path parameter is specified, the function extends the WHERE clause, and the –Recurse switch determines whether to use SCOPE or DIRECTORY, like this:

if ($Path){
     if ($Path -notmatch "\w{4}:") {
           $Path = "file:" + (resolve-path -path $Path).providerPath
     }
     if ($sql -notmatch "WHERE\s*$") {$sql += " AND " }
     if ($Recurse)                   {$sql += " SCOPE = '$Path' " }
      else                           {$sql += " DIRECTORY = '$Path' "}
}

In these SQL statements, paths are specified in the form file:c:/users/james which isn’t how we normally write them (and the way I recognise UNC names won’t work if they are written as file://ComputerName/shareName). This is rectified by the first line inside the If ($Path) {} block, which checks for 4 "word" characters followed by a colon. Doing this will prevent ‘File:’ being inserted if any protocol has been specified –the same search syntax works against HTTP:// (though not usually when searching on your workstation), MAPI:// (for Outlook items) and OneIndex14:// (for OneNote items). If a file path has been given I ensure it is an absolute one – the need to support UNC paths forces the use of .ProviderPath here. It turns out there is no need to convert \ characters in the path to /, provided the file: is included.
After taking care of that, the operation -notmatch "WHERE\s*$" sees to it that an "AND" is added if there is anything other than spaces between WHERE and the end of the line (in other words if any conditions specified by –filter have been inserted). If neither -Path nor -filter was specified there will be a dangling WHERE at the end of the SQL statement .Initially I removed this with a ‑Replace but then I decided that I didn’t want the function to respond to a lack of input by returning the whole index so I changed it to write a warning and exit. With the WHERE clause completed, final clause in the SQL statement is ORDER BY, which – like WHERE – joins up a multi-part condition.

if ($sql -match "WHERE\s*$") {
     Write-warning "You need to specify either a path, or a filter."
     Return
}
if ($orderby) { $sql += " ORDER BY " + ($OrderBy -join " , ") }

When the whole function is put together it takes 3 dozen lines of PowerShell to handle the parameters, build and run the query and return the result. Put together they look like this:

Function Get-IndexedItem{
Param ( [Alias("Where","Include")][String[]]$Filter ,
         [Alias("Sort")][String[]]$OrderBy,
         [Alias("Top")][String[]]$First,
         [String]$Path,
         [Switch]$Recurse
)
if ($First) {$SQL = "SELECT TOP $First "}
else        {$SQL = "SELECT "}
$SQL += " System.ItemName, System.ItemUrl " # and the other 58 fields
if ($Path -match "\\\\([^\\]+)\\.") {
              $SQL += "FROM $($matches[1]).SYSTEMINDEX WHERE "
}
else         {$SQL += " FROM SYSTEMINDEX WHERE "}
if ($Filter) {$SQL += $Filter -join " AND "}
if ($Path) {
    if ($Path -notmatch "\w{4}:") {$Path = "file:" + $Path}
    $Path = $Path -replace "\\","/"
    if ($SQL -notmatch "WHERE\s*$") {$SQL += " AND " }
    if ($Recurse)                   {$SQL += " SCOPE = '$Path' " }
    else                            {$SQL += " DIRECTORY = '$Path' "}
}
if ($SQL -match "WHERE\s*$") {
    Write-Warning "You need to specify either a path or a filter."
    Return
}
if ($OrderBy) { $SQL += " ORDER BY " + ($OrderBy -join " , " ) }
$Provider = "Provider=Search.CollatorDSO;Extended Properties=’Application=Windows’;"
$Adapter  = New-Object system.data.oledb.oleDBDataadapter -argument $SQL, $Provider
$DS       = New-Object system.data.dataset
if ($Adapter.Fill($DS)) { $DS.Tables[0] }
}

The -Path parameter is more user-friendly as a result of the way I handle it, but I’ve made it a general rule that you shouldn’t expect the user to know too much of the underlying syntax ,and at the moment the function requires too much knowledge of SQL: I don’t want to type

Get-Indexed-Item –Filter "Contains(*,'Stingray')", "System.Photo.CameraManufacturer Like 'Can%'"

and it seems unreasonable to expect anyone else to do so. I came up with this list of things the function should do for me.

  • Don’t require the user to know whether a search term is prefixed with SYSTEM. SYSTEM.DOCUMENT, SYSTEM.IMAGE or SYSTEM.PHOTO. If the prefix is omitted put the correct one in.
  • Even without the prefixes some fieldnames are awkward for example "HorizontalSize" and "VerticalSize" instead of width and height. Provide aliases
  • Literal text in searches needs to be enclosed in single quotes, insert quotes if the user omits them.
  • A free text search over all fields is written as Contains(*,’searchTerm’) , convert "orphan" search terms into contains conditions.
  • SQL uses % (not *) for a wild card – replace* with % in filters to cope with users putting the familiar *
  • SQL requires the LIKE predicate(not =) for wildcards : replace = with like for Wildcards

In Part two, I’ll look at how I do those things.

June 25, 2012

Lonesome George

Filed under: Uncategorized — jamesone111 @ 3:15 pm

At Easter I was in the Galapagos Islands; work had taken me to Ecuador diving in the Galapagos was too good an opportunity to miss. Mainland Ecuador was a country I knew little about and two weeks working in the capital (Quito, just South of the Equator, and 10,000 feet up in the Andes) doesn’t qualify me as an expert. The client there was a good one to work with, and what I saw of the city (a bunch of Taxi rides and a bus tour on the one day we weren’t working) mean I’d go back if asked. Travel wasn’t good and the return flights so bad that I’ve vowed never to fly with Iberia again. Flying to the islands the plane had a problem which meant if it landed it couldn’t take off again so having got within sight of the islands we had to go all the way back to Quito and get another plane. Down at sea level the heat was ferocious, the transportation scary and the insect bites the worst I’ve had. But the diving… different kinds of Sharks (including a close encounter with a group of Hammerheads), Seal Lions, Turtles, Rays (including a Manta encounter on the very first dive which set the tone) – I’d put up with a lot for that. And if some search engine has steered you here, I dived with Scuba Iguana, and if I manage to go back I’ll dive with them again.

The Scuba place was pretty much next door to the Darwin station: home of giant tortoises and a tortoise breading programme. Galapagos comes from the Spanish word for saddle because the shape of the giant tortoise’s shell looked like a traditional saddle. I also learnt that some languages – including Spanish – don’t have distinct words for Tortoise and (Marine) Turtles.  The sex of tortoises is determined by the temperature at which the eggs incubate and the breeding programme gathers eggs, incubates them to get extra females, and looks after the baby tortoises keeping them safe human introduced species (like rats) which feed on eggs and baby tortoises. Each island’s tortoises are different so the eggs and hatchlings are marked up so they go back to right island. But there is no breeding programme for Pinta island (also named Abingdon island by the Royal Navy. According to a story told by Stephen Fry on QI, sailors ate giant tortoises and found them very good.)  A giant Tortoise was found on Pinta; but a search over several years failed to find a second. So he – Lonesome George, was brought to the Darwin station in the 1970s. No one knows for sure how he was then. All efforts to find a mate for him failed: so George lived out his final decades as the only known example of  Geochelone nigra abingdoni  the Pinta Galapagos tortoise.
On Easter Sunday I walked up see George and the giants from other islands who live at the station. George was keeping out of the sun; he shared an enclosure and I wondered what he made of the other species – if somewhere in that ancient reptile brain lurked a memory of others just like him, a template into which the other tortoises didn’t quite fi.

Later in trip I was asked to help with some work on survey being prepared about Hammerhead sharks. I was told they estimated as having a 20% chance of becoming extinct in the next 100 years. This statistic is quite hard to digest: my chances of being extinct in 100 years are close to 100%: so my contribution to the survey was suggest that telling people if things continue as they are the chances of seeing Hammerheads on a dive in 5 years will be X amount less than today. It’s not fair, but we care more about some species than others and I hope there will still be Hammerheads for my children to see in a few years. Sadly they won’t get the chance to see the Pinta Tortoise.

June 22, 2012

Windows Phone 8. Some Known Knowns, and Known Unknowns

Filed under: Uncategorized — jamesone111 @ 10:23 am

Earlier this week Microsoft held its Windows Phone Summit where it made a set of announcements about the next generation of Windows Phone – Windows phone 8 in summary these were

  • Hardware Support for Multi-core processors , Additional Screen resolutions, Removable memory cards, NFC
  • Software The Windows 8 core is now the base OS, Support for native code written in C or C++ (meaning better games), IE 10, new mapping (from Nokia), a speech API which apps can use. Better business-oriented features, VOIP support, a new “Wallet” experience and in-app payments, and a new start screen.

This summit came hot on the heels of the Surface tablet launch, which seemed to be a decision by Microsoft that making hardware for Windows 8 was too important to be left to the hardware makers. The first thing I noted about phone announcement was the lack of a Microsoft branded phone. I’ve said that Microsoft should make phones itself since before the first Windows Mobile devices appeared – when Microsoft was talking about “Stinger” internally; I never really bought any of the reasons given for not doing so. But I’d be astounded if Nokia didn’t demand that Microsoft promise not to make a phone (whether there’s a binding agreement or just an understanding between Messrs Elop and Ballmer we don’t know). Besides Nokia Microsoft has 3 other device makers on-board: HTC have devices for every Microsoft mobile OS since 2000, but also have a slew of devices for Android, Samsung were a launch partner for Phone 7 but since then have done more with Android ; LG were in the line up for the Windows Phone 7 launch and are replaced by Huawei.  What these 3  feel about Nokia mapping technology is a matter of guesswork but depends on branding and licensing terms.

There are some things we think we know, but actually they are things we know that we don’t know.

  • Existing 7.x phones will not run Windows Phone 8 but will get an upgrade to Windows 7.8. I have an HTC Trophy which I bought in November 2010 and it has gone from 7.0 to 7.5 and I’ll be quite happy to get 7.8. on a 2 year old phone. Someone who just bought a Nokia Lumia might not feel quite so pleased.  What will be in 7.8 ? The new start screen has been shown. But will it have IE10 ? Will it have the new mapping and Speech capabilities. The Wallet, In-app-payments ?  This matters because….
  • Programs specifically targeting Windows Phone 8 won’t run on 7. Well doh! Programs targeting Windows 7 don’t run XP. But what programs will need to target the new OS ? Phone 7 programs are .NET programs and since .NET compiles to an intermediate language not to CPU instructions, a program which runs on Windows 8-RT (previous called Windows on ARM) should go straight onto a Windows 8-intel machine (but not vice versa), and Phone 7 programs will run on Phone 8. An intriguing comment at the launch says the Phone 8 emulator runs on Hyper-V; unless Hyper-V has started translating between different CPU instruction sets this means the Emulated phone has an Intel CPU but it doesn’t matter because it is running .NET intermediate language not binary machine code. So how many new programs will be 8-specific ? – Say Skype uses the VOIP support and in-app payments for calling credit. Will users with old phones be stuck with the current version of Skype? Do they get a new version where those features don’t light up. Or do they get the same experience as someone with a new phone. If the only things which are Phone 8 specific are apps which need Multiple cores (or other newly supported hardware) there would never have been any benefit upgrading a single core phone from 7 to 8.  
  • Business support. This covers many possibilities, but what is in and what is out ? Will the encryption send keys back to base as bit-locker does ? Will there be support for Direct-Access ? Corporate wireless security ? Will adding Exchange server details auto-configure the phone to corporate settings (like the corporate app-store) . Will it be possible to block updates ? This list goes on and on.

It’s going to interesting to see what this does for Microsoft and for Nokia’s turn-round.

June 19, 2012

Microsoft’s new surface. All it needs is love.

Filed under: General musings — jamesone111 @ 4:44 pm

People who buy Apple stuff worry me. Their attachment to one brand of equipment is somewhere between addiction and religious fervour. It is Love.

For over 20 years I’ve used Microsoft stuff, because it is simply a better way to get a job done. Some of the people in my office use Macs and can’t do their job without creating a virtual PC running Windows. It doesn’t reduce their love for their Macs. As the saying goes: The opposite of love is not hate, it’s indifference. And its hard to feel anything but indifferent to the products of any of the major PC makers. The Dell in front of me it is well made, well specified and does everything I ask and more. But Love it ? Most PC users will tell you loving a computer is crazy (which is why Apple folk are so disturbing). 

I think Microsoft’s new surface tablets are trying to create a Windows machine which people – if they can’t actually Love – feel more than indifferent about.  Surface is two Machines: one uses an ARM processor and runs the new Windows RT, so won’t run all your existing software. The other uses an Intel chip – and is thicker and heavier to give it 1/3 more battery life (roughly 130cc and 225g more to get 40 Watt Hours instead of 30.), but that means it should work with existing software and USB devices. I can plug in a mouse and keyboard, and attach two monitors via display port and have a system just like the one I have today. Unplug it and I can use it iPad style or take the “touch Cover” keyboard * and write documents or use the Pen to annotate documents if that’s what I choose. The ARM version has office built in, but no pen and a different video connector (so probably only 1 screen). Even with its smaller battery it will probably run for longer (though like the shipment date and price, battery life is yet to be confirmed).

Mary-Jo Foley wrote of the launch of Surface “It’s the end of an era. Or maybe the start of a new one.” Indeed. Microsoft began by providing OEMs (including Apple) with BASIC, then with Operating Systems. OEMs where very much the Geese that laid golden eggs for Microsoft – during the 10 years I worked at there (not in the OEM part of the business) there were times when I felt that the company forgot the problem with Geese is they produce a lot of … guano. The OEMs have been poor on design and innovation for a long time: Bloomberg business week no-less talks about how the PC industry should be shamed by Surface, and talks about recent years of PC Development as “the great stagnation”. The Bloomberg piece puts some blame on Microsoft and Intel for taking too much from the OEMs, I doubt that if the chips and OS had cost less the difference would have gone into innovations that add value. That lack of added value means margins on PCs aren’t great and that’s led manufacturers to take money to install all manner of junk on the machines they ship. The whole DoJ saga – which grew out of Microsoft trying to prevent OEMs installing software it didn’t like – left a situation where the company was required to sell Windows to anyone who wanted it and could not do anything to prevent an explosion of crapware. Lots of people are asking WHY has Microsoft chosen to get into making computers? There answer is either (a) It can make much more profit by selling computers and operating systems together. or (b) it has an idea of what a PC should be in the second decade of the 21st century and it doesn’t trust PC makers to deliver that by themselves.

If I were fielding calls from angry OEMs upset Microsoft arriving in their market I’d make the case that no OEM would have made a product like this: their lack of a similar product both lost them right to complain and forced Microsoft to do something: if they do have something , Microsoft is saying they won’t undercut on price, something we won’t know for sure until the units go on sale. Some people wonder if Microsoft will aim to make the same from selling the hardware as anyone else and make the price of Windows on top of that; or if they will think $x of margin per unit sold is the same whether they sell a computer/OS combination or they sell a licence to an OEM. The latter would make it very hard for OEMs to compete; but Microsoft trying to make desirable hardware profitably ? That’s a lot less of a threat to OEMs. Apple doesn’t sell as many units as Samsung, but the profit to Apple per unit is more than the retail price of the Samsung. When the iPhone was launched I questioned whether there was sufficient market for a phone at that price point: it has actually sold more units than Apple envisaged at the start: which proves one thing – people will pay handsomely for something they love. 

 

image* There are two keyboards. The thin pressure sensitive  “Touch cover” and a thicker moving key “Type cover” You can see the difference in this picture  (click for a bigger version)

Blog at WordPress.com.