Tuesday, June 14, 2011

IID

I woke up this morning thinking about keyword extraction systems.

Now, I should mention that I know nothing about keyword extraction systems, which are tools that take a body of data and develop tags or phrases to categorize its content. About ten years ago I knew a teeny tiny bit, but now, nada. I'm doing well to remember the word 'corpus', which is (or at least was) the elegant way of referring to 'a bunch of stuff that we're trying to analyze'. Usually, you would see things like 'a corpus consisting of the collected works of Shakespeare' from someone who was trying to find a way to analyze Shakespeare's work for underlying themes. Since 'corpus' also gives us 'corpse', I suppose it's still a good way of thinking about my level of knowledge.

According to The Hitchhiker's Guide to the Galaxy, the Infinite Improbability Drive was created by a student after his betters had collectively said that such a feat was impossible. His insight was that while such a drive was, by definition, infinitely impossible, if you assumed that it was possible, then the degree of impossibility had to be a finite number. He solved for that, built the drive, got the Galactic Institute's Prize for Extreme Cleverness, and was then lynched by his peeved betters.

I don't anticipate anything like that happening to me, because I'm not nearly bright enough. I know all the smart guys have all the approaches to this mountain range mapped out fifteen ways from Sunday, including two that involve creation of a hyperspatial bypass.

I just like thinking about it.

(Update: On a walk this morning, it occurred to me that perhaps the WordCloud software might be helpful here.)
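As far as I can tell, a word cloud is mostly a picture of word counts, so here is a rough sketch of that idea in Python. To be clear, this is my own guess at the simplest possible version, not how WordCloud or any real keyword extraction system works. The function name `top_keywords` and the tiny stopword list are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be far longer.
STOPWORDS = {
    "the", "a", "an", "and", "of", "to", "in", "is", "it",
    "that", "for", "on", "with", "as", "was", "be",
}

def top_keywords(text, n=5):
    """Return the n most frequent non-stopword words in text, with their counts."""
    # Lowercase the text and pull out runs of letters (apostrophes allowed).
    words = re.findall(r"[a-z']+", text.lower())
    # Skip stopwords and very short tokens, then count what is left.
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(n)

corpus = "The drive is the drive and the prize is a prize for the drive"
print(top_keywords(corpus, 2))
```

Counting words is about as naive as it gets, since it knows nothing about phrases or meaning, but it is enough to see which words a corpus leans on.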
