Relevant Fragments for Question Classification

This page provides the most relevant tree fragments (structured features) identified for the coarse-grained Question Classification (QC) task. The fragments were selected using the tree kernel reverse engineering technique described in this paper.

The coarse-grained QC task is described in:

  • Dell Zhang and Wee Sun Lee. 2003. Question classification using support vector machines. In Proceedings of SIGIR’03, pages 26–32, and
  • Xin Li and Dan Roth. 2006. Learning question classifiers: the role of semantic information. Natural Language Engineering, 12(3):229–249.

We used the data released for the TREC 10 QA evaluation campaign, described in:

  • Ellen M. Voorhees. 2001. Overview of the TREC 2001 question answering track. In Proceedings of the Tenth Text REtrieval Conference (TREC), pages 42–51.

As part of our ongoing effort to extract knowledge from the models learnt by SVMs with high-dimensional kernel functions, we are publishing these fragments so that experts can study them and possibly discover interesting clues about the QC task.

To generate this ranking of fragments, we first applied our linearization framework with the following parameters: maxexp=4, L=10000, S=1. For a detailed description of the process, please refer to this paper.

After learning the linear models, we calculated the weight of each component of the linearized fragment space and sorted fragments based on their relevance.
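The ranking step can be illustrated with a minimal sketch. It assumes that the linear model is given as a list of support vectors with their coefficients, that a fragment's weight is the sum of alpha * label * value over the support vectors containing it, and that relevance is the absolute value of that weight. The function names and the input layout are hypothetical and chosen only for illustration; they are not the format used by our tools.

```python
from collections import defaultdict


def fragment_weights(support_vectors):
    """Accumulate one weight per fragment from a linear SVM model.

    support_vectors: iterable of (alpha, label, features) tuples, where
    features maps a fragment string to its value in that example.
    This input layout is an assumption made for the sketch.
    """
    weights = defaultdict(float)
    for alpha, label, features in support_vectors:
        for fragment, value in features.items():
            # Standard linear SVM weight: w_j = sum_i alpha_i * y_i * x_ij
            weights[fragment] += alpha * label * value
    return dict(weights)


def rank_fragments(weights):
    """Sort fragments by decreasing absolute weight (assumed relevance)."""
    return sorted(weights.items(), key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Toy model with made-up fragments and coefficients.
    toy_model = [
        (0.5, +1, {"(WP who)": 1.0, "(WHNP (WP what))": 4.0}),
        (1.0, -1, {"(WHNP (WP what))": 0.5, "(WRB where)": 3.0}),
    ]
    for fragment, weight in rank_fragments(fragment_weights(toy_model)):
        print(f"{abs(weight):.2f}\t{fragment}")
```

Taking the absolute value treats fragments that argue against a class as being as informative as those that argue for it; a different relevance measure would only change the sort key.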

License

The data are provided for research purposes only. Published works based on these data should cite the following papers:

@inproceedings { PighinMoschitti:2009:EMNLP,
    title = {Reverse Engineering of Tree Kernel Feature Spaces},
    booktitle = {EMNLP'09},
    year = {2009},
    month = {08/2009},
    address = {Singapore},
    author = {Daniele Pighin and Alessandro Moschitti}
}

and

@inproceedings { PighinMoschitti:2009:CoNLL,
    title = {Efficient Linearization of Tree Kernel Functions},
    booktitle = {CoNLL'09},
    year = {2009},
    month = {06/2009},
    address = {Boulder, CO, USA},
    author = {Daniele Pighin and Alessandro Moschitti}
}

Download

The package contains one file for each of the six coarse-grained classes. Each line lists the cumulative relevance of a fragment and the fragment itself. You can download the package by clicking here.
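As a starting point for working with the files, here is a minimal sketch for reading one of them. It assumes that each line begins with the relevance as a number, followed by whitespace and then the fragment in bracketed form; check this against the actual files before relying on it. The function names and the sample lines are hypothetical.

```python
def parse_fragment_line(line):
    """Split one line into (relevance, fragment).

    Assumes the layout "<relevance><whitespace><fragment>"; this is a
    guess at the file format, not a documented specification.
    """
    relevance_text, fragment = line.strip().split(None, 1)
    return float(relevance_text), fragment


def load_fragments(lines):
    """Parse an iterable of lines, skipping blank ones."""
    return [parse_fragment_line(line) for line in lines if line.strip()]


if __name__ == "__main__":
    # Made-up sample lines standing in for the contents of one class file.
    sample = [
        "3.5 (WP who)\n",
        "\n",
        "1.25 (NP (DT the) (NN city))\n",
    ]
    for relevance, fragment in load_fragments(sample):
        print(relevance, fragment)
```

Because load_fragments accepts any iterable of lines, an open file object can be passed to it directly.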