The 11th Linguistic Annotation Workshop
April 3rd, 2017, Valencia, Spain
TED Multilingual Discourse Bank (TED-MDB): A parallel corpus annotated in the PDTB style
TED-MDB is an initiative to develop a multilingual resource of TED talks manually annotated in the PDTB (Penn Discourse Treebank) style. Currently, the corpus comprises the transcripts of six TED talks in the original language, English (~7000 words), and their time-stamped subtitles in five languages (Turkish, European Portuguese, Polish, German and Russian). In this talk, I will introduce the initiative, describe our annotation procedure and discuss the benefits and challenges of implementing the PDTB style for the TED talk genre.
Cross-lingual Semantic Annotation
Abstract to be added.