Structured Features for Semantic Role Labeling
From this page you can download the structured features that were used in our experiments on Semantic Role Labeling. The features are extracted from Charniak automatic parses as provided for the CoNLL 2005 shared task on SRL. The task and the extraction process are detailed in this paper.
Format
Each record in the feature files has the following format:
<sentence id> <rel offset> <node offset> <node upsteps> <role label> <feature>
+-- annotation identifier --+
+--- identifier of a node in a sentence annotation ---+
where:
- <sentence id> is the unique identifier of each sentence within the section;
- <rel offset> is the offset of the predicate (relation) word in the sentence, starting with 0. Each pair (<sentence id>, <rel offset>) is a unique identifier for a predicate/proposition within a section;
- <node offset> and <node upsteps> identify a node within a parse tree. Each node is a candidate argument of a proposition. The notation is the same as that used for PropBank annotations: the offset is the offset of the first word (leaf) dominated by the node, and the upsteps value is the number of nodes (starting from the POS node) that must be climbed in the hierarchy to reach the node. The tuple (<sentence id>, <rel offset>, <node offset>, <node upsteps>) is a unique identifier for a tree node/candidate argument within an annotation;
- <role label> is the role label of the candidate argument. __NARG means that the corresponding node is not an argument of the proposition;
- <feature> is the structured feature describing the predicate/candidate argument pair.
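The field layout above can be read with a few lines of code. The following is a minimal sketch, not part of the distributed data or tools: it assumes that the first five fields are separated by whitespace, that the four identifier fields are integers (as in the examples on this page), and that everything after the role label is the structured feature. The names Record and parse_record are hypothetical.

```python
from typing import NamedTuple, Tuple


class Record(NamedTuple):
    """One line of a feature file (hypothetical container, six-field layout)."""
    sentence_id: int
    rel_offset: int
    node_offset: int
    node_upsteps: int
    role_label: str
    feature: str

    @property
    def proposition_id(self) -> Tuple[int, int]:
        # (<sentence id>, <rel offset>) identifies a predicate/proposition.
        return (self.sentence_id, self.rel_offset)

    @property
    def node_id(self) -> Tuple[int, int, int, int]:
        # The first four fields identify a candidate argument node.
        return (self.sentence_id, self.rel_offset,
                self.node_offset, self.node_upsteps)

    @property
    def is_argument(self) -> bool:
        # __NARG marks nodes that are not arguments of the proposition.
        return self.role_label != "__NARG"


def parse_record(line: str) -> Record:
    """Split a record into its six fields.

    The feature itself contains spaces, so only the first five
    separators are used and the remainder is kept whole.
    """
    parts = line.strip().split(None, 5)
    if len(parts) != 6:
        raise ValueError("expected 6 fields, got %d: %r" % (len(parts), line))
    sid, rel, off, up, label, feature = parts
    return Record(int(sid), int(rel), int(off), int(up), label, feature)
```

Records that share the same proposition_id are candidate arguments of the same predicate, so grouping on that key collects all candidates for one proposition.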
Here are a few examples:
0 6 11 1 AM-TMP (VP (VBN__REL taken)(NP__ARG (DT__ARG this)(NN__ARG week)))
0 6 11 0 __NARG (VP (VBN__REL taken)(NP (DT__ARG this)))
0 6 12 0 __NARG (VP (VBN__REL taken)(NP (NN__ARG week)))
0 6 13 0 __NARG (VP (VBN__REL taken)(,__ARG ,))
Three classes of features are available:
- AST1: the minimal tree that covers all and only the words dominated by the predicate node and a candidate argument node;
- AST1m: like an AST1, but the label of the predicate node is marked with "__REL" and the candidate argument node is marked with "__ARG". Marking means that a node label is extended, not replaced;
- AST1cm: like an AST1m, but all the descendants of the candidate argument node (except the leaves) are marked with "__ARG" as well.
For a more in-depth description of these structured features, please refer to this paper.
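To make the difference between the three classes concrete, the sketch below derives AST1m-style and AST1cm-style strings from an unmarked AST1-style tree. It is an illustration under stated assumptions, not the extraction code used for the experiments: the bracketed syntax is taken from the examples on this page, and the names Node, parse_tree, to_string and mark_tree, as well as the convention of addressing nodes by a path of child indices, are hypothetical.

```python
class Node:
    """A parse tree node; `word` is set only on POS (pre-terminal) nodes."""
    def __init__(self, label):
        self.label = label
        self.children = []
        self.word = None


def parse_tree(text):
    """Parse a bracketed tree such as '(VP (VBN taken)(NP (DT this)))'."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        node = Node(tokens[pos + 1])
        pos += 2
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(read())
            else:
                node.word = tokens[pos]  # the leaf under a POS node
                pos += 1
        pos += 1  # consume ')'
        return node

    return read()


def to_string(node):
    """Serialize a tree in the same style as the examples on this page."""
    if node.word is not None:
        return "(%s %s)" % (node.label, node.word)
    return "(%s %s)" % (node.label, "".join(to_string(c) for c in node.children))


def follow(root, path):
    """Return the node reached from `root` via a list of child indices."""
    node = root
    for index in path:
        node = node.children[index]
    return node


def mark_tree(ast1, pred_path, arg_path, mark_descendants=False):
    """Extend (never replace) labels with __REL / __ARG.

    mark_descendants=False gives AST1m-style marking; True also marks
    every node below the argument node (AST1cm style). Words are stored
    as attributes rather than nodes, so the leaves are never marked.
    """
    root = parse_tree(ast1)
    follow(root, pred_path).label += "__REL"
    argument = follow(root, arg_path)
    argument.label += "__ARG"
    if mark_descendants:
        stack = list(argument.children)
        while stack:
            node = stack.pop()
            node.label += "__ARG"
            stack.extend(node.children)
    return to_string(root)


ast1 = "(VP (VBN taken)(NP (DT this)(NN week)))"
print(mark_tree(ast1, [0], [1]))        # AST1m-style marking
print(mark_tree(ast1, [0], [1], True))  # AST1cm-style marking
```

With mark_descendants=True, the second call reproduces the first example record shown above.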
Licensing
The data is provided for research purposes only. Published work based on this data should cite the following paper:
@article{MoschittiEtAl08,
author = {Moschitti, Alessandro and Pighin, Daniele and Basili, Roberto},
title = {Tree Kernels for Semantic Role Labeling},
journal = {Computational Linguistics},
volume = {34},
number = {2},
year = {2008},
pages = {193--224},
}
Two versions of the data are available:
- a version without alignment information, i.e. without the first four fields described above. This version can be used to train and test all the classifiers required to recognize argument boundaries and roles;
- the full version, which contains the alignment information. This version is also freely available, but anyone interested must first show proof of a valid Penn TreeBank license, issued by the LDC, before downloading the data. The alignment information, together with the Charniak parses of the input sentences, is sufficient to carry out the complete SRL task.
Related work
@article{MarcusEtAl94,
author = {Marcus, Mitchell P. and Santorini, Beatrice and Marcinkiewicz, Mary A.},
journal = {Computational Linguistics},
number = {2},
pages = {313--330},
title = {Building a Large Annotated Corpus of English: The Penn Treebank},
volume = {19},
year = {1993}
}
@inproceedings{Charniak00,
author = {Charniak, Eugene},
title = {A maximum-entropy-inspired parser},
booktitle = {Proceedings of NAACL'00},
year = {2000},
pages = {132--139},
}
@inproceedings{CarrerasEtAl05,
author = {Carreras, Xavier and Marquez, Lluis},
booktitle = {Proceedings of CoNLL '05},
title = {Introduction to the CoNLL-2005 Shared Task: Semantic Role Labeling},
year = {2005}
}
Downloads
Structured features without alignment information:
- AST1 features
- AST1m features
- AST1cm features
Structured features with alignment information:
- Please contact daniele dot pighin at gmail dot com.
