So I kinda stopped posting here a long time ago, but I am contributing to a Dutch group blog: geencommentaar, so everybody head over there!

Most Dangerous idea

Every year, The Edge asks a number of top scientists and other thinkers a single question. This year the question was: "WHAT IS YOUR DANGEROUS IDEA?"

Some very cool people contributed (Rodney Brooks, Daniel Dennett, Freeman Dyson, Richard Dawkins, V.S. Ramachandran, Robert Shapiro, to name a few). One of my favorites is Daniel Dennett's most dangerous idea: there aren't enough minds to house the population explosion of memes. I love memes...

Sint Game

Our first annual Sinterklaas game is coming up, and these are the rules:
Each one of us bought 2 or more pakjes (wrapped presents) with a combined value of 5 Euros. These pakjes are put on a big pile. Then everyone gathers around in a circle and, on his or her turn, throws two regular dice:

2 - Give one pakje away
3 - Put one pakje back on the pile
4 - Choose someone to take a pakje from the pile or from someone else
5 - Everyone gives their pakjes to their left neighbour
6 - Choose someone with a pakje to unwrap a pakje
7 - Nothing happens
8 - Take one pakje from the pile
9 - Unwrap one pakje of your own
10 - Swap a pakje with someone or the pile (if you have nothing, just take one)
11 - Take two pakjes from the pile or from someone
12 - Take all pakjes from one person

Continue throwing the dice until all pakjes have been distributed. After that, three more rounds are played. Then everyone keeps their pakjes and unwraps any unopened ones.

Blog du Olivier + Bullettime

A journalist friend of mine started a blog, so I thought I would help him skyrocket his Google ranking by adding a link from my little blog. Here you go:

Olivier van Beemen's blog

Ok, while I'm typing away: I made a Matrix-esque 'bullet-time' video at a friend's party. I guess it is pretty much an in-crowd thing, but why hide my art from the world:

Biktie Bullettime (.wmv, not too big)

New Grouplog

Mattijs (the one from the tijsepijs log) came up with a great idea today. We should create a group blog for anyone at the HCS Lab.
Creating blogs being so easy, we did just that and now we have the group blog: Human-Computer Blog (for current lack of a better name). In the following weeks we will have to see if this will be as big a success as the Biktorrr blog :)

SSSW 2005

By the way, I attended the Semantic Web Summerschool 2005 in Cercedilla, Spain. It was really cool and I met a lot of nice people working on similar topics. And it was really nice to hear the SW stuff from the 'Awesome Superstars of the Semantic Web' themselves.

Most of the students are now members of the sssw05 yahoo group, and a lot of people put their pictures online. I stole a couple of them and put them up on my own webspace.

One of the cool things about the Summerschool was that we were to do a lot of practical stuff. This was either in the form of the hands-on sessions or a mini-project. I was in a mini project with three other students (Rinke, Tom and Jan). Tom came up with a cool idea to envision a semantic framework for lonely hearts ads. This actually landed us the third prize! (An alarm clock)

Google API

Ok, so it has been over three months since my last post, but hey, who said I would post regularly?

I am actually making this new post for two reasons. First, a Greek Ph.D. student read my blog on Ontology Learning and sent me a link to his homepage. It seems he is working on this kind of stuff. That encouraged me to make a new blog post. The second reason is that I did some experiments with the Google API: content!

So I read this paper by Rudy Cilibrasi and Paul Vitanyi on the Normalized Google Distance (NGD), which deals with using Google to extract meaning from the web by exploiting the redundancy of knowledge. It is a really nice paper, very intuitive and firmly grounded in complexity theory as well. In short: the NGD between two concepts is determined by taking the number of hits each concept (term) has on a Google query (NrHits1 and NrHits2) and comparing this to the number of hits for the Google query composed of the boolean combination of the two concepts (NrHits1+2). This is normalized by the total number of Google-indexed pages (M). The complete formula is:

(MAX(LOG(NrHits1), LOG(NrHits2)) - LOG(NrHits1+2))/(LOG(M) - MIN(LOG(NrHits1),LOG(NrHits2)))
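
As a minimal sketch, the formula above can be written as a small Python function. The function name `ngd`, its parameter names, the handling of zero hits, and the counts in the usage line are all made up for illustration; real values would come from whatever hit counts you manage to obtain.

```python
import math


def ngd(hits_x, hits_y, hits_xy, total_pages):
    """Normalized Google Distance computed from raw hit counts.

    hits_x, hits_y -- hits for each term on its own (NrHits1, NrHits2)
    hits_xy        -- hits for the combined query (NrHits1+2)
    total_pages    -- number of indexed pages (M)
    """
    if min(hits_x, hits_y, hits_xy) <= 0:
        # log(0) is undefined; treating "no hits" as infinitely far apart
        # is an assumption of this sketch, not something from the paper.
        return float("inf")
    log_x, log_y, log_xy = math.log(hits_x), math.log(hits_y), math.log(hits_xy)
    return (max(log_x, log_y) - log_xy) / (math.log(total_pages) - min(log_x, log_y))


# Hypothetical counts, chosen only to show the call.
print(ngd(100, 100, 10, 10_000))
```

The base of the logarithm does not matter here, since it cancels between the numerator and the denominator.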

Anyway, I decided to play around a little bit with this Google distance in my chosen domain: Artists and Art styles. I ran a couple of tests. One consisted of calculating the NGD between an Art Style concept ('Impressionism') and Artist Names ('Vincent van Gogh','Manet',...). I found these results (Table shows NGD and name of Artist):

0.171111166168 Monet, Claude
0.41242976086 Hassam, Childe
0.42292288857 Frieseke, Frederick Carl
0.425858890051 Gogh, Vincent van
0.438108128715 Pissarro, Camille
0.456466856759 Morisot, Berthe
0.479547692307 Caravaggio, Michelangelo Merisi da
0.488549764746 Nolde, Emil
0.488922434217 Manet, Edouard
0.496153921128 Rembrandt Harmensz. van Rijn
0.497940393553 Degas, Edgar
0.505376793035 Warhol, Andy
0.519882855212 Goya y Lucientes, Francisco Jose de
0.525532648974 Picasso, Pablo
0.55359075495 Munch, Edvard
0.565804822367 Dali, Salvador

Especially the big difference between number 1 and 2 is pretty weird. Anyway, I decided to check the actual number of hits the Google API returned to me. This is what I found (Table shows name of Artist; number of hits according to the Google API; number of hits according to a manual search on the Google web page; and the ratio between the two):

Artist                                 API      Google Web Page   Web Page/API
Monet, Claude                          67200    368000            5.476190476
Gogh, Vincent van                      7520     40200             5.345744681
Manet, Edouard                         34500    184000            5.333333333
Warhol, Andy                           44500    237000            5.325842697
Goya y Lucientes, Francisco Jose de    318      681               2.141509434
Degas, Edgar                           44400    237000            5.337837838

As you can see, the Google API count is normally lower than the web page count by a factor of about 5.3, but sometimes (in Goya's case) by a completely different factor. Either the Google API is wrong or the normal web-based search estimate of the number of hits is off.
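
For anyone who wants to redo the arithmetic, here is a small sketch that recomputes the factor (web page hits divided by API hits) from the numbers in the table and flags the odd one out. The variable names and the 20% tolerance are my own arbitrary choices, not part of any Google tooling.

```python
from statistics import median

# (API hits, web page hits) as listed in the table above.
hits = {
    "Monet, Claude": (67200, 368000),
    "Gogh, Vincent van": (7520, 40200),
    "Manet, Edouard": (34500, 184000),
    "Warhol, Andy": (44500, 237000),
    "Goya y Lucientes, Francisco Jose de": (318, 681),
    "Degas, Edgar": (44400, 237000),
}

# Factor by which the web page estimate exceeds the API estimate.
ratios = {name: web / api for name, (api, web) in hits.items()}
typical = median(ratios.values())

# Flag artists whose factor differs from the median by more than 20%.
# The 20% threshold is an arbitrary assumption for this sketch.
outliers = [name for name, r in ratios.items() if abs(r - typical) / typical > 0.20]

for name, r in ratios.items():
    print(f"{name}: {r:.3f}")
print("typical factor:", round(typical, 3))
print("outliers:", outliers)
```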

The Google API newsgroup actually reported the same problem with the number of hits the Google API returns. Apparently Google is aware of this problem but doesn't really fix it. And since automatic invocation of Google web pages through http modules is not in compliance with Google's Terms of Use, I wonder if there is a legal way to obtain good values to use in calculating NGD values. (I hope Cilibrasi and Vitanyi didn't use the values the API gave them in their experiments)

Films I have seen and you should too

(A highly subjective, yet alphabetical, list)

