Feb 17, 2008

Blogs, Tagging and Summary Problem

(Originally mentioned by Ghai uncle; this is an extended problem of the same kind.)

A blogger just wrote a review of the movie "Shivaji", starring Rajnikant, and tagged it "Rajnikant movie, Shivaji, movie review". To give the blogger a better browsing experience, the blogging portal (your firm) wants to develop the following features:

1. The moment the blogger posts his review, he is shown a list of other reviews of this movie. (To keep it simple, the list contains links only to blogs from his friends.)

2. There is an option to list all such blogs on one page. For each link, the listing page shows the shortest summary of the blog text that contains all the tags, with each tag highlighted in its own color. For example, "Rajnikant movie" is colored red, "Shivaji" blue, and "movie review" green in the summary; however, "Rajnikant" on its own merits no coloring, "shivaji movie" merits coloring of "shivaji" only, and so on.

How would you do it?

Hint: There are essentially three problems:
1. Finding blog links with similar tags.
2. Finding the shortest summary of text from a document that contains all the tags.
3. Finding the occurrences of a set of keywords/tags in a document.
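
One possible way to approach hints 2 and 3 is sketched below in Python. It is only an illustration under stated assumptions, not a definitive answer to the puzzle: the function names (find_tag_occurrences, shortest_summary) are made up for this example, tags are matched as whole phrases and case-insensitively, and punctuation is ignored when splitting the text into words. The sketch first records every position where a complete tag phrase occurs, then scans those occurrences in order of where they end, remembering the latest start seen for each tag, and keeps the narrowest window that covers all tags.

```python
import re


def find_tag_occurrences(words, tags):
    """Return (start, end, tag) triples (end inclusive) for whole-phrase matches.

    `words` is a list of lowercase words; each tag is a phrase such as
    "Rajnikant movie". A tag only counts when all of its words appear in order.
    """
    tag_words = {tag: tag.lower().split() for tag in tags}
    hits = []
    for i in range(len(words)):
        for tag, phrase in tag_words.items():
            if words[i:i + len(phrase)] == phrase:
                hits.append((i, i + len(phrase) - 1, tag))
    return hits


def shortest_summary(text, tags):
    """Return the shortest run of words containing every tag, or None."""
    tokens = re.findall(r"\w+", text)          # assumption: punctuation is dropped
    words = [t.lower() for t in tokens]
    required = set(tags)

    # Process occurrences by end position. For one tag, a later end also
    # means a later start, so last_start always holds the closest start.
    hits = sorted(find_tag_occurrences(words, required), key=lambda h: h[1])
    last_start = {}
    best = None
    for start, end, tag in hits:
        last_start[tag] = start
        if len(last_start) == len(required):
            left = min(last_start.values())
            if best is None or end - left < best[1] - best[0]:
                best = (left, end)

    if best is None:
        return None                             # some tag never occurs
    return " ".join(tokens[best[0]:best[1] + 1])


tags = ["Rajnikant movie", "Shivaji", "movie review"]
text = "Shivaji is a Rajnikant movie. This movie review covers Shivaji and its songs."
print(shortest_summary(text, tags))
```

Because tags are matched as whole phrases, "Rajnikant" alone produces no hit, and "shivaji movie" produces a hit for "Shivaji" only, which matches the coloring rule in the problem. The (start, end, tag) triples could also be used to decide which words to color in the summary. Hint 1 (finding blogs with similar tags) is not covered here.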
