James O'Neill's Blog

February 16, 2010

Desktop Virtualization Hour

I had a mail earlier telling me about Desktop Virtualization Hour, planned for 4PM (GMT) on March 18th. (That’s 9AM Seattle time, 5PM CET … you can work out the others I’m sure.) More information and a downloadable meeting request are here.

Some effort seems to be going into this one, which makes me think it is more than the average webcast.

This post originally appeared on my technet blog.

November 17, 2009

Making Word clouds (Part 1: how it works).

Filed under: Desktop Productivity,Music and Media,Office,Powershell — jamesone111 @ 9:41 am

I’ve been playing with word clouds on and off for the last couple of months, and finally I’ve decided the time has come to share what I have been doing. 


Word clouds turn up in all sorts of places, and I wanted to produce something which could take any text, be customized, and let me edit the final version. The last requirement was key, because anything which produces a bitmap graphic at the end is not going to be easy to edit. I’ve seen it done with HTML tables but they are hard to redesign (you can’t move words round easily). So it needed to be something like Visio, PowerPoint or WMF, which can produce a drawing containing text. Eventually I settled on PowerPoint. Although I’m using the beta of Office 2010, the script relies on an object model for PowerPoint which hasn’t changed for several versions. And since I only seem to program in PowerShell these days, I wrote it in PowerShell. This gives me an easy way of taking any text – like Tweets from Twitter – and pushing it into a cloud. So I wrote my longest single PowerShell function yet to do the job.


wordCloud



  1. If not already connected to PowerPoint, get connected. Start a new, blank slide.

  2. Get a list of “Noise words” from a file (I used a copy of the Noise.dat, which is part of Windows Search, as a starting point) and merge that list with any passed via the –ExtraNoiseWords parameter.

  3. Take text from a file (specified by the –Filename parameter), a PowerShell variable or expression (specified by the –text parameter) or from the pipeline in PowerShell, and produce a “clean” set of words by:

    1. Removing anything which is not a space, letter, digit or apostrophe from the text.

    2. Removing ’s at the end of words, and converting “_” to a space.

    3. Splitting the text at spaces.

    4. Removing “words” which are either URLs or numbers.

  4. Count the occurrences of the words, and determine the “cut-off” frequency which words must meet to get into the final cloud. A –HowMany parameter sets the number of words: if this is the default value of 150 and the 150th non-noise word occurs 10 times, accept all words with 10 occurrences, even if that gives 160 non-noise words.

  5. If the –phrases switch is specified:

    1. Find phrases which contain any of the words which meet the cut-off frequency.

    2. Ignore those phrases which don’t make the cut-off frequency.

    3. Repeat the process looking for longer phrases which contain the phrases which were just found. Keep repeating until no phrases are found which meet the cut-off frequency.

    4. Add the phrases to the list of found words and reduce the count of their constituent words.

  6. Remove noise words, two-word phrases where one word is a noise word, and words which do not reach the cut-off frequency. Sort the list of words by frequency and then by number of letters.

  7. Store the words in a global variable ($words) so that the function can be re-run with the ‑useExisting switch. $words can be reviewed or exported and re-imported later.

  8. If the –noPlot switch is specified, stop here, leaving the words and phrases found and their counts in $words.

  9. Set additional properties on each word:
    Set the font size for the word, scaled between the values set by the -minFont and –maxFont parameters (these default to 16 and 80 point respectively).
    Set the margins to the value specified in the –Margin parameter – PowerPoint uses quite generous margins by default, but the script defaults to 0.
    If –RandomVertical and/or -RandomBold, and/or -RandomItalic values are specified, generate a random number for each and, if it is less than the specified number, set the corresponding text attribute to true.
    If -Randomtwist is specified, set the twistAngle attribute to a random amount up to the value of randomtwist.
    If multiple RGB colours have been provided using the -RgbSet parameter, select one at random. If not, the default PowerPoint colour will be used – normally black.
    If the -fontname parameter has been provided and is a single name, set the word to use it; if multiple fonts have been specified, select one at random. If no font is specified, the default PowerPoint font will be used.

  10. Place the first (most common) word in a PowerPoint shape (rectangle) at the centre of the slide, and store the positions of its corners as properties of the word.

  11. Place each remaining word in its own shape at the top left corner of the slide, setting its properties as already defined. Get its size from PowerPoint, then try to place it around the boundaries of each existing shape, stopping when the placement won’t overlap with any of the other placed shapes. (The starting point for this method was something I read by Chris Done; it was here, but his pages on word clouds only show up in search engine caches now.) Note that the more shapes which have been placed, the longer each new shape will take to place. Store the positions of the newly-placed shape’s corners as properties for use when placing future shapes.

  12. Stop when either the number of words which cannot be placed exceeds the value in –maxFailsToPlace (3 by default) or all words have been placed successfully.
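The text-cleaning and cut-off logic in steps 3, 4 and 6 can be sketched roughly as follows. This is a minimal illustration in Python, not the PowerShell function itself; the function names are made up for the sketch, and lower-casing the words, keeping "_" through the first pass, spotting URLs by an "http" or "www" prefix, and sorting longer words first are all assumptions that the steps above do not spell out.

```python
import re
from collections import Counter


def clean_words(text):
    """Steps 3.1 to 3.4: turn raw text into a list of candidate words."""
    # 3.1: drop everything except letters, digits, whitespace and apostrophes.
    # \w also keeps "_" (an assumption) so that 3.2 still has something to convert.
    text = re.sub(r"[^\w\s']", "", text)
    # 3.2: strip a trailing 's from words, then turn "_" into a space.
    text = re.sub(r"'s\b", "", text).replace("_", " ")
    # 3.3: split at whitespace; lower-casing is an assumption so "Cloud" == "cloud".
    words = text.lower().split()
    # 3.4: drop numbers and what is left of URLs once 3.1 has removed ":", "/" and ".".
    return [w for w in words if not w.isdigit() and not w.startswith(("http", "www"))]


def cutoff_frequency(counts, noise_words, how_many=150):
    """Step 4: frequency of the how_many-th most common non-noise word."""
    ranked = sorted((n for w, n in counts.items() if w not in noise_words), reverse=True)
    if not ranked:
        return 0
    return ranked[min(how_many, len(ranked)) - 1]


def select_words(counts, noise_words, how_many=150):
    """Steps 4 and 6: keep non-noise words at or above the cut-off, ties included."""
    cutoff = cutoff_frequency(counts, noise_words, how_many)
    kept = [(w, n) for w, n in counts.items() if w not in noise_words and n >= cutoff]
    # Most frequent first, longer words first among equals (sort direction is assumed).
    return sorted(kept, key=lambda item: (-item[1], -len(item[0])))


if __name__ == "__main__":
    words = clean_words("Clouds clouds CLOUDS word's word_cloud 42 http://example.com")
    print(select_words(Counter(words), noise_words={"the"}, how_many=2))
```

Because the cut-off is a frequency rather than a position in the list, every word tied with the last qualifying word gets in, which is how a request for 150 words can return 160.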

In part 2 I’ll include the PowerShell code. The word cloud pictured above was made from Tweets about TechEd, and I’ll show some more examples with the command lines which were used. As you can see from the steps above, there are 20 or so parameters to explain.
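Until then, the font scaling in step 9 and the placement loop in steps 10 to 12 can be sketched without PowerPoint at all, treating each shape as a plain rectangle. This is a hedged Python illustration rather than the PowerShell code: linear scaling, the four candidate positions tried around each placed shape, the 720 by 540 point slide and the rule that shapes must stay on the slide are all assumptions, and the function names are hypothetical.

```python
def scale_font(count, lowest, highest, min_font=16, max_font=80):
    """Step 9: map a word count onto the min_font..max_font range (linear is assumed)."""
    if highest == lowest:
        return max_font
    return min_font + (count - lowest) * (max_font - min_font) / (highest - lowest)


def overlaps(a, b):
    """True if rectangles (left, top, right, bottom) overlap; touching edges do not count."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def candidates(anchor, w, h):
    """Positions for a w-by-h shape hugging each side of an already placed shape."""
    left, top, right, bottom = anchor
    yield (right, top, right + w, top + h)      # to the right, tops aligned
    yield (left - w, top, left, top + h)        # to the left, tops aligned
    yield (left, bottom, left + w, bottom + h)  # below, left edges aligned
    yield (left, top - h, left + w, top)        # above, left edges aligned


def place_all(sizes, slide_w=720, slide_h=540, max_fails=3):
    """Steps 10 to 12: sizes is a list of (width, height), most common word first."""
    placed, fails = [], 0
    for i, (w, h) in enumerate(sizes):
        if i == 0:
            # Step 10: the first word goes in the centre of the slide.
            left, top = (slide_w - w) / 2, (slide_h - h) / 2
            placed.append((left, top, left + w, top + h))
            continue
        spot = None
        # Step 11: walk round every placed shape until a free position turns up.
        for anchor in placed:
            for c in candidates(anchor, w, h):
                on_slide = c[0] >= 0 and c[1] >= 0 and c[2] <= slide_w and c[3] <= slide_h
                if on_slide and not any(overlaps(c, p) for p in placed):
                    spot = c
                    break
            if spot is not None:
                break
        if spot is None:
            # Step 12: give up once more than max_fails words could not be placed.
            fails += 1
            if fails > max_fails:
                break
        else:
            placed.append(spot)
    return placed
```

Each new shape is tested against every shape already on the slide, so the work per word grows with the number of words placed, which matches the slowdown noted in step 11.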


Update: Thanks to Ian for letting me know that Chris’s page is missing in action; the italicized part of point 11 has been changed accordingly.



This post originally appeared on my technet blog.

February 23, 2009

How to use Advanced Queries in Windows search.

Filed under: Beta Products,Desktop Productivity,How to,Windows 7 — jamesone111 @ 4:57 pm

If there was one single feature about Windows Vista which made me say “I’m never ever going back to Windows XP” it was search, and the way search was integrated everywhere. True, you can download Microsoft Search for Windows XP (and, as they say, other kinds of desktop search are available) but it doesn’t permeate everywhere the way it does in Vista. In Windows 7 the search has got better still, with one important exception which I will come to in a moment.

Click for full size version

On the left you can see the result of typing in the search box, and as you can see the search results are grouped by type. If you click on one of the titles it shows you just the matches of that type. However, if you click “See more results” you get everything.

It so happens I was looking for copies of my invoices from Virgin Media, which I know are in my inbox. The problem I have is that I automatically go to “See more results”, and in any event you can see that there are a lot of other things in Outlook – mostly from my news feed – about what Virgin Group are doing. Click through to More Results and, if you’re used to Vista’s search, you’ll see we’ve lost something. In Vista this box had buttons to select different kinds of content. In Windows 7 it has gone …

 

However, you can use the Advanced Query Syntax (AQS), and boy is there a lot of it. Type Kind: and you get a list to choose from. Type size: and you get some classifications; type date: and you get a calendar and bands of dates; isAttachment and HasAttachment let you pick yes or no. And a quick read of the AQS page shows there is a whole lot more you can enter. Helpfully, when you enter a valid field name with the colon (:) after it, it turns blue, and an invalid one stays black.

Now I doubt if anyone is going to remember every single option for AQS – and since all it does is narrow the search down, it is sometimes going to be quicker to scroll through the results than to find out the way to narrow them down. Still, I’m a great believer that we all use our own subsets of the available functionality, so have a look at what you can do, make use of the bits that help you and forget the rest.

This post originally appeared on my technet blog.

July 28, 2006

On powerpoint …

Filed under: Desktop Productivity,Office — jamesone111 @ 7:27 pm

Inspired by Darren’s recent post


Tagged as Microsoft Office Powerpoint

This post originally appeared on my technet blog.

March 28, 2006

Civilisation will come to an end because no one will understand what anybody else is saying

Filed under: Desktop Productivity,General musings — jamesone111 @ 1:22 pm


In yesterday’s post I said “I’ve got an interest in how we communicate – and how sometimes we say a lot without getting a message across”. Recently I was asked to review someone else’s document: it was full of jargon, which I didn’t think the target audience would understand: but worse than that I couldn’t find the idea the author wanted to convey in among the long stream of buzzwords he wanted to use.


It reminded me of something I had read in Sir Ernest Gowers’ book “Plain Words” some years ago, and I wrote my own version. Having circulated it to the amusement of others, I’ll share it here.


It is alright to use long words where they are needed. But it is inconsiderate to use language which is inappropriate to the reader, and doing so may mean the message isn’t read or is misunderstood.


Or should I say:


There is no imperative to condemn the utilization of polysyllabic constructions where their necessity demands it. However: one should be cognizant of the fact that if, in the course of authorship, language is selected which repeatedly falls outside the experience of the reader – albeit within the context of constructions which in and of themselves constitute a valid syntactic framework – this has the potential to render communications sub optimal in a number of dimensions. The reader may feel that the author lacks empathy with their situation, or that unreasonable demands are being made of their ability to remain in sync with the sequence of ideas being expressed. Either as a result of the feelings this engenders, or independently, the reader may determine that the investment of time required to discern the author’s true meaning shows sufficiently little return that such time would be better deployed in the pursuit of alternative activities. Furthermore, the use of language and grammatical constructions of undue complexity means that, even with due diligence on the part of the reader, the possibility is created that what has been said, despite having a clear meaning in the mind of the author, has the ability to deliver more than a single semantic outcome, and in such a situation many possibilities exist for unforeseen repercussions resulting from the reader’s making an erroneous selection from the divergent interpretations available to them at the point of reading.


The thing is – how often are we expected to read things like the latter, which say no more than the former? Too often!


It’s a fair guess that I’ll come back to Gowers again.

This post originally appeared on my technet blog.
