James O'Neill's Blog

June 30, 2012

Using the Windows index to search from PowerShell: Part one: Building a query from user input.

Filed under: Uncategorized — jamesone111 @ 10:43 am

Note: this was originally written for the Hey,Scripting guy blog where it appeared as the 25 June 2012 episode. The code is available for download . I have some more index related posts coming up so I wanted to make sure everything was in one place

I’ve spent some time developing and honing a PowerShell function that gets information from the Windows Index– the technology behind the search that is integrated into explorer in Windows 7 and Vista. The Index can be queried using SQL and my function builds the SQL query from user input, executes it and receives rows of data for all the matching items. In Part three, I’ll look at why rows of data aren’t the best thing for the function to return and what the alternatives might be. Part two will look at making user input easier – I don’t want to make an understanding SQL a prerequisite for using the function. In this part I’m going to explore the query process.

We’ll look at how at how the query is built in a moment, for now please accept that a ready-to-run query stored in the variable $SQL. Then it only takes a few lines of PowerShell to prepare and run the query

$Provider="Provider=Search.CollatorDSO;Extended Properties=’Application=Windows’;"
$adapter = new-object system.data.oledb.oleDBDataadapter -argument $sql, $Provider
$ds      = new-object system.data.dataset
if ($adapter.Fill($ds)) { $ds.Tables[0] }

The data is fetched using oleDBDataAdapter and DataSet objects; the adapter is created specifying a "provider" which says where the data will come from and a SQL statement which says what is being requested. The query is run when the adapter is told to fill the dataset. The .fill() method returns a number, indicating how many data rows were returned by the query – if this is non-zero, my function returns the first table in the dataset. PowerShell sees each data row in the table as a separate object; and these objects have a property for each of the table’s columns, so a search might return something like this:

SYSTEM.ITEMURL : file:C:/Users/James/pictures/DIVE_1771+.JPG
SYSTEM.KIND : {picture}
SYSTEM.MIMETYPE : image/jpeg
SYSTEM.SIZE : 971413

There are lots of fields to choose from, so the list might be longer. The SQL query to produce it looks something like this.

SELECT System.ItemName, System.ItemUrl,        System.FileExtension,
       System.FileName, System.FileAttributes, System.FileOwner, 
       System.ItemType, System.ItemTypeText ,  System.KindText, 
       System.Kind,     System.MIMEType,       System.Size
WHERE  System.Keywords = 'portfolio' AND Contains(*,'stingray')

In the finished version of the function, the SELECT clause has 60 or so fields; the FROM and WHERE clauses might be more complicated than in the example and an ORDER BY clause might be used to sort the data.
The clauses are built using parameters which are declared in my function like this:

Param ( [Alias("Where","Include")][String[]]$Filter ,

In my functions I try to use names already used in PowerShell, so here I use -Filter and -First but I also define aliases for SQL terms like WHERE and TOP. These parameters build into the complete SQL statement, starting with the SELECT clause which uses -First

if ($First) {$SQL = "SELECT TOP $First "}
else        {$SQL = "SELECT "}
$SQL += " System.ItemName, System.ItemUrl " # and the other 58 fields

If the user specifies –First 1 then $SQL will be "SELECT TOP 1 fields"; otherwise it’s just "SELECT fields". After the fields are added to $SQL, the function adds a FROM clause. Windows Search can interrogate remote computers, so if the -path parameter is a UNC name in the form \\computerName\shareName the SQL FROM clause becomes FROM computerName.SYSTEMINDEX otherwise it is FROM SYSTEMINDEX to search the local machine.
A regular expression can recognise a UNC name and pick out the computer name, like this:

if ($Path -match "\\\\([^\\]+)\\.") {
$sql += " FROM $($matches[1]).SYSTEMINDEX WHERE "
else {$sql += " FROM SYSTEMINDEX WHERE "}

The regular expression in the first line of the example breaks down like this

Regular expression




2 \ characters: "\" is the escape character, so each one needs to be written as \\



Any non-\ character, repeated at least once



A \,followed by any character



Capture the section which is enclosed by the brackets as a match

$matches[0] =\\computerName\s
$matches[1] =computerName

I allow the function to take different parts of the WHERE clause as a comma separated list, so that
-filter "System.Keywords = 'portfolio'","Contains(*,'stingray')"
is equivalent to
-filter "System.Keywords = 'portfolio' AND Contains(*,'stingray')"

Adding the filter just needs this:

if ($Filter) { $SQL += $Filter -join " AND "}

The folders searched can be restricted. A "SCOPE" term limits the query to a folder and all of its subfolders, and a "DIRECTORY" term limits it to the folder without subfolders. If the request is going to a remote server the index is smart enough to recognise a UNC path and return just the files which are accessible via that path. If a -Path parameter is specified, the function extends the WHERE clause, and the –Recurse switch determines whether to use SCOPE or DIRECTORY, like this:

if ($Path){
     if ($Path -notmatch "\w{4}:") {
           $Path = "file:" + (resolve-path -path $Path).providerPath
     if ($sql -notmatch "WHERE\s*$") {$sql += " AND " }
     if ($Recurse)                   {$sql += " SCOPE = '$Path' " }
      else                           {$sql += " DIRECTORY = '$Path' "}

In these SQL statements, paths are specified in the form file:c:/users/james which isn’t how we normally write them (and the way I recognise UNC names won’t work if they are written as file://ComputerName/shareName). This is rectified by the first line inside the If ($Path) {} block, which checks for 4 "word" characters followed by a colon. Doing this will prevent ‘File:’ being inserted if any protocol has been specified –the same search syntax works against HTTP:// (though not usually when searching on your workstation), MAPI:// (for Outlook items) and OneIndex14:// (for OneNote items). If a file path has been given I ensure it is an absolute one – the need to support UNC paths forces the use of .ProviderPath here. It turns out there is no need to convert \ characters in the path to /, provided the file: is included.
After taking care of that, the operation -notmatch "WHERE\s*$" sees to it that an "AND" is added if there is anything other than spaces between WHERE and the end of the line (in other words if any conditions specified by –filter have been inserted). If neither -Path nor -filter was specified there will be a dangling WHERE at the end of the SQL statement .Initially I removed this with a ‑Replace but then I decided that I didn’t want the function to respond to a lack of input by returning the whole index so I changed it to write a warning and exit. With the WHERE clause completed, final clause in the SQL statement is ORDER BY, which – like WHERE – joins up a multi-part condition.

if ($sql -match "WHERE\s*$") {
     Write-warning "You need to specify either a path, or a filter."
if ($orderby) { $sql += " ORDER BY " + ($OrderBy -join " , ") }

When the whole function is put together it takes 3 dozen lines of PowerShell to handle the parameters, build and run the query and return the result. Put together they look like this:

Function Get-IndexedItem{
Param ( [Alias("Where","Include")][String[]]$Filter ,
if ($First) {$SQL = "SELECT TOP $First "}
else        {$SQL = "SELECT "}
$SQL += " System.ItemName, System.ItemUrl " # and the other 58 fields
if ($Path -match "\\\\([^\\]+)\\.") {
              $SQL += "FROM $($matches[1]).SYSTEMINDEX WHERE "
else         {$SQL += " FROM SYSTEMINDEX WHERE "}
if ($Filter) {$SQL += $Filter -join " AND "}
if ($Path) {
    if ($Path -notmatch "\w{4}:") {$Path = "file:" + $Path}
    $Path = $Path -replace "\\","/"
    if ($SQL -notmatch "WHERE\s*$") {$SQL += " AND " }
    if ($Recurse)                   {$SQL += " SCOPE = '$Path' " }
    else                            {$SQL += " DIRECTORY = '$Path' "}
if ($SQL -match "WHERE\s*$") {
    Write-Warning "You need to specify either a path or a filter."
if ($OrderBy) { $SQL += " ORDER BY " + ($OrderBy -join " , " ) }
$Provider = "Provider=Search.CollatorDSO;Extended Properties=’Application=Windows’;"
$Adapter  = New-Object system.data.oledb.oleDBDataadapter -argument $SQL, $Provider
$DS       = New-Object system.data.dataset
if ($Adapter.Fill($DS)) { $DS.Tables[0] }

The -Path parameter is more user-friendly as a result of the way I handle it, but I’ve made it a general rule that you shouldn’t expect the user to know too much of the underlying syntax ,and at the moment the function requires too much knowledge of SQL: I don’t want to type

Get-Indexed-Item –Filter "Contains(*,'Stingray')", "System.Photo.CameraManufacturer Like 'Can%'"

and it seems unreasonable to expect anyone else to do so. I came up with this list of things the function should do for me.

  • Don’t require the user to know whether a search term is prefixed with SYSTEM. SYSTEM.DOCUMENT, SYSTEM.IMAGE or SYSTEM.PHOTO. If the prefix is omitted put the correct one in.
  • Even without the prefixes some fieldnames are awkward for example "HorizontalSize" and "VerticalSize" instead of width and height. Provide aliases
  • Literal text in searches needs to be enclosed in single quotes, insert quotes if the user omits them.
  • A free text search over all fields is written as Contains(*,’searchTerm’) , convert "orphan" search terms into contains conditions.
  • SQL uses % (not *) for a wild card – replace* with % in filters to cope with users putting the familiar *
  • SQL requires the LIKE predicate(not =) for wildcards : replace = with like for Wildcards

In Part two, I’ll look at how I do those things.

Create a free website or blog at WordPress.com.

%d bloggers like this: