A Procedure to Build Multiword Expression Data Set

Metin, Senem Kumova; Taze, Mehmet

doi:10.81043/aperta.49863

Published January 1, 2017 | Version v1

Conference paper Open

1. Izmir Univ Econ, Fac Engn, Dept Software Engn, Izmir, Turkey
2. Izmir Univ Econ, Fac Engn, Dept Comp Engn, Izmir, Turkey

In this paper, we propose a procedure that employs natural language processing methods to build a gold standard multiword expression (MWE) data set, and we present a Turkish MWE data set of 3946 positive and 4230 negative candidates built by following this procedure. The procedure covers three main tasks. The first task is collecting a variety of MWE data resources from which MWE candidates can be extracted. We suggest using corpora together with idiom and term dictionaries. The second task is extracting different types of MWE candidates from these resources. Here, we suggest aggregating four methods. First, statistical methods are applied to extract MWE candidates that have high occurrence frequencies. Second, linguistic properties such as part-of-speech patterns are used to select MWE candidates. Third, candidates that mimic the properties of idioms, or that are already true idioms, are chosen. Lastly, term-like candidates with domain-specific properties are extracted. The final task in building a gold standard MWE data set is labeling: each candidate is labeled as either MWE or non-MWE by multiple judges.
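As a rough illustration of two of the steps named in the abstract (frequency-based candidate extraction and labeling by multiple judges), the sketch below may help. It is not the authors' implementation: the restriction to two-word candidates, the `min_count` threshold, the majority-vote rule, and the function names `bigram_candidates` and `majority_label` are all assumptions made for this example, and the toy sentence is English rather than Turkish.

```python
from collections import Counter


def bigram_candidates(tokens, min_count=2):
    """Return adjacent word pairs that occur at least min_count times.

    Assumption: candidates are two-word sequences and raw frequency is
    the only statistic; the abstract does not specify either choice.
    """
    counts = Counter(zip(tokens, tokens[1:]))
    return {pair: n for pair, n in counts.items() if n >= min_count}


def majority_label(votes):
    """Label a candidate "MWE" when more than half of the judges vote True.

    Assumption: a simple majority rule; the abstract only states that
    multiple judges label each candidate.
    """
    return "MWE" if sum(votes) * 2 > len(votes) else "non-MWE"


# Toy corpus used only to exercise the functions.
tokens = "kick the bucket and kick the ball then kick the bucket".split()
candidates = bigram_candidates(tokens, min_count=2)

# Hypothetical judge votes for each surviving candidate.
labels = {
    ("kick", "the"): majority_label([False, False, True]),
    ("the", "bucket"): majority_label([True, True, False]),
}
```

In practice the frequency filter would be one of several extractors whose outputs are merged before labeling, as the abstract describes.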

Files

bib-83e9ec27-bbe4-4ff7-bae1-f2ddd45b7ab5.txt

Files (164 Bytes)

Name	Size
bib-83e9ec27-bbe4-4ff7-bae1-f2ddd45b7ab5.txt md5:b127c330e2a25fcfb96f532a9d01e6d2	164 Bytes
