CoNLL 2010 - Relevant Fragments for QC, RE and SRL

This package contains the most relevant syntactic tree kernel (STK) fragments identified for each class of three linguistic benchmarks:

  • question classification (QC)
  • relation extraction (RE)
  • semantic role labeling (SRL)

The fragments were isolated by reverse engineering SVM models, as described in:

@InProceedings{pighin-moschitti:2010:CoNLL,
   author    = {Pighin, Daniele  and  Moschitti, Alessandro},
   title     = {{On Reverse Engineering of Syntactic Tree Kernels}},
   booktitle = {Proceedings of the Fourteenth Conference on Computational Natural Language Learning (CoNLL-2010)},
   month     = {July},
   year      = {2010},
   address   = {Uppsala, Sweden},
   publisher = {Association for Computational Linguistics}
}

More details on how the three tasks were modeled can be found in the paper, and more specific per-task information can be found in the papers referenced therein.

Each file is generated as follows:

  • A one-vs-all SVM model is learned in the STK space
  • The most relevant fragments (according to the greedy algorithm detailed in the paper) are collected in a dictionary
  • The training data is linearized by counting how many times each relevant fragment appears in each input tree
  • A new model is learned in the resulting lower-dimensional linear space
  • The fragments are ranked according to the components of the normal vector to the separating hyperplane (the gradient) in the linear space
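The linearization and ranking steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation, and the SVM training and the greedy fragment mining are omitted. Assumptions: trees and fragments use a hypothetical `(label, children)` tuple encoding; a fragment is counted at a node when labels and productions match in the subset-tree (STK) sense, with childless fragment nodes acting as frontier nodes; the weights passed to the hypothetical `rank_fragments` stand in for the components of a linear model trained elsewhere on the linearized data. Ranking by decreasing weight is one reading of the last step; ranking by absolute value would be another.

```python
from typing import Iterator, List, Sequence, Tuple

# Hypothetical tree encoding: (label, children); words are nodes with no children.
Tree = Tuple[str, tuple]


def node(label: str, *children: Tree) -> Tree:
    """Build a tree (or fragment) node from a label and its children."""
    return (label, tuple(children))


def matches_at(frag: Tree, t: Tree) -> bool:
    """True if fragment `frag` occurs rooted at node `t` (subset-tree matching).

    A fragment node without children is a frontier node and only needs a
    matching label. Otherwise the full production (ordered child labels)
    must be identical, and matching continues recursively on each child.
    """
    f_label, f_kids = frag
    t_label, t_kids = t
    if f_label != t_label:
        return False
    if not f_kids:
        return True
    if [k[0] for k in f_kids] != [k[0] for k in t_kids]:
        return False
    return all(matches_at(fk, tk) for fk, tk in zip(f_kids, t_kids))


def iter_nodes(t: Tree) -> Iterator[Tree]:
    """Yield every node of `t` in pre-order."""
    yield t
    for kid in t[1]:
        yield from iter_nodes(kid)


def count_occurrences(frag: Tree, t: Tree) -> int:
    """Number of nodes of `t` at which `frag` is rooted."""
    return sum(1 for n in iter_nodes(t) if matches_at(frag, n))


def linearize(trees: Sequence[Tree], dictionary: Sequence[Tree]) -> List[List[int]]:
    """One row per tree, one column per dictionary fragment (occurrence counts)."""
    return [[count_occurrences(f, t) for f in dictionary] for t in trees]


def rank_fragments(dictionary: Sequence[Tree],
                   weights: Sequence[float]) -> List[Tuple[float, Tree]]:
    """Pair each fragment with its weight and sort by decreasing weight."""
    return sorted(zip(weights, dictionary), key=lambda p: p[0], reverse=True)


if __name__ == "__main__":
    t = node("S",
             node("NP", node("DT", node("the")), node("NN", node("cat"))),
             node("VP", node("VBZ", node("sleeps"))))
    frags = [node("NP", node("DT"), node("NN")),
             node("DT", node("the")),
             node("VP", node("VBZ"), node("NP"))]
    print(linearize([t], frags))  # [[1, 1, 0]]
    # Hypothetical weights standing in for a trained linear model.
    for w, f in rank_fragments(frags, [0.3, -0.5, 1.2]):
        print(w, f)
```

Each row produced by `linearize` is the low-dimensional feature vector of one input tree, so any linear learner can be trained on these rows before the ranking step.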

Each line of each file reports the weight of one component of the gradient, along with the fragment associated with that component.
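The files can be inspected with a few lines of code. The sketch below assumes, purely for illustration, that each line holds the weight, then whitespace, then the fragment; the actual layout of the distributed files may differ, so check a file before relying on it. The function names `parse_fragment_lines` and `top_k` are hypothetical.

```python
from typing import Iterable, List, Tuple


def parse_fragment_lines(lines: Iterable[str]) -> List[Tuple[float, str]]:
    """Parse lines ASSUMED to look like '<weight> <fragment>'.

    Blank lines are skipped; a line without a fragment raises ValueError.
    """
    entries: List[Tuple[float, str]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        parts = line.split(None, 1)  # split only at the first run of whitespace
        if len(parts) != 2:
            raise ValueError(f"expected '<weight> <fragment>', got: {line!r}")
        entries.append((float(parts[0]), parts[1]))
    return entries


def top_k(entries: List[Tuple[float, str]], k: int) -> List[Tuple[float, str]]:
    """Return the k entries with the largest weights."""
    return sorted(entries, key=lambda e: e[0], reverse=True)[:k]
```

For example, `top_k(parse_fragment_lines(open(path)), 10)` would return the ten highest-weighted fragments of whichever file `path` points to, under the format assumption above.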

License

This resource is released under a dual licensing scheme.

For personal, teaching or research uses, the resource is available under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license.

The full text of the license is available at http://creativecommons.org/licenses/by-nc-sa/3.0/

If you use this resource in your research, please cite the following paper:

@InProceedings{pighin-moschitti:2010:CoNLL,
   author    = {Pighin, Daniele  and  Moschitti, Alessandro},
   title     = {{On Reverse Engineering of Syntactic Tree Kernels}},
   booktitle = {Proceedings of the Fourteenth Conference on Computational Natural Language Learning (CoNLL-2010)},
   month     = {July},
   year      = {2010},
   address   = {Uppsala, Sweden},
   publisher = {Association for Computational Linguistics}
}

Please note that research uses do NOT include those involving the development of technology to be employed for commercial or any other revenue-generating purposes. Such uses include selling, releasing, or providing commercial services based on this resource.

For any other use, the resource is released under a commercial license whose terms are defined on a per-request basis.
