Personalization and Diversification Dataset
This dataset contains evaluation information from 35 users over 180 search topics.
The main characteristics of the dataset are the following:
- 180 search topics
- Over 3800 relevance judgments, covering at least the top 5 results for each query topic and the top 5 results of 8 state-of-the-art personalization and diversification techniques
- ~1K additional topics, with 180 evaluated, fully annotated up to the first 300 results
- Relevance judgments include user relevance and topic relevance, and a manual assignment of a subtopic interpretation for each evaluated result
- Includes the Delicious profile of 35 users: all bookmarks, top tags, tag assignments, etc.
- Diversification: results are categorized using the Textwise service into ODP topics
- User ids are hashed to preserve anonymity
The dataset is available in both database (.sql) and CSV format. The full dataset comprises three databases:
- User representation database. Contains user profiles, as extracted from Delicious
- Topic representation database. Contains the topic definition, Delicious annotation and Textwise ODP categorization of the top 300 results of each topic.
- User relevance judgments database. Contains relevance assessments for 180 search topics, made by 35 different users.
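As an illustration of how the relevance judgments might be used once exported to CSV, the sketch below computes the mean user relevance per topic. The column names (user_id, topic_id, result_rank, user_relevance, topic_relevance) and the sample rows are hypothetical placeholders and are not taken from the dataset; the actual headers and value scales should be checked in the distributed files.

```python
import csv
import io
from collections import defaultdict

# Hypothetical layout: these column names and rows are made up for the
# example and do not come from the actual dataset files.
SAMPLE = """user_id,topic_id,result_rank,user_relevance,topic_relevance
u1hash,12,1,1,1
u1hash,12,2,0,1
u2hash,12,1,1,0
u2hash,40,1,0,0
"""


def mean_user_relevance_by_topic(csv_file):
    """Return {topic_id: mean user_relevance} from a judgments CSV file object."""
    totals = defaultdict(lambda: [0.0, 0])  # topic_id -> [sum, count]
    for row in csv.DictReader(csv_file):
        acc = totals[row["topic_id"]]
        acc[0] += float(row["user_relevance"])
        acc[1] += 1
    return {topic: total / count for topic, (total, count) in totals.items()}


# With the real data, pass an open file handle instead of the in-memory sample.
means = mean_user_relevance_by_topic(io.StringIO(SAMPLE))
```

The function accepts any file-like object, so the same code would work on a file opened with open(path, newline="") once the real column names are substituted.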
How to obtain
To obtain the dataset you can send an email to attaching the agreement form.
When using the dataset, please refer to the following publication:
D. Vallet, P. Castells. Personalized Diversification of Search Results.
In Proceedings of the 35th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012).
Portland, OR, USA, August 2012.