Publication Type : Conference Paper
Publisher : ACM International Conference Proceeding Series
Source : ACM International Conference Proceeding Series, 10-11-October-2014, art. no. a22
Campus : Coimbatore
School : School of Engineering
Center : Computational Engineering and Networking
Department : Center for Computational Engineering and Networking (CEN)
Verified : Yes
Year : 2014
Abstract : This paper explores a fully supervised machine learning approach to the automatic extraction of lexical chunks, commonly called Multi-Word Expressions (MWEs). MWEs cover a variety of constructions in everyday language, including idioms, phrasal verbs and noun compounds. Because MWEs are pervasive, NLP tasks that deal with real text, such as Machine Translation and Information Retrieval, need adequate MWE treatment; without it, a system will fail to generate high-quality natural output. Here, we extract phrasal verbs from an English movie subtitle corpus based on their linguistic patterns and standard association scores. The extracted phrasal verbs are used to train various machine learning algorithms to discriminate MWEs. Two methods of linguistic pattern extraction are implemented, of which one proves effective. We report two major findings: 1) MWE extraction based on dependency information together with POS tags gives better accuracy than extraction based on the POS tag pattern alone; 2) of the three machine learning classifiers trained on the extraction results, the Random Forest classifier is verified to be the suitable one for this application.
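The POS-pattern extraction step described in the abstract might be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which association scores were used, so pointwise mutual information (PMI) is assumed here; the Penn Treebank-style tags, the toy sentence data, and the function names are all hypothetical; and the dependency-based variant and the classifier training are not shown.

```python
import math
from collections import Counter

# Hypothetical toy input: (token, POS tag) pairs with Penn Treebank-style tags.
# The paper's real input is a movie subtitle corpus, which is not reproduced here.
TAGGED = [
    ("she", "PRP"), ("gave", "VBD"), ("up", "RP"), ("and", "CC"),
    ("he", "PRP"), ("gave", "VBD"), ("up", "RP"), ("too", "RB"),
    ("they", "PRP"), ("looked", "VBD"), ("up", "RP"), ("then", "RB"),
    ("she", "PRP"), ("gave", "VBD"), ("it", "PRP"), ("away", "RB"),
]

def verb_particle_candidates(tagged):
    """Return adjacent (verb, particle) pairs matching the pattern VB* + RP."""
    return [
        (w1.lower(), w2.lower())
        for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
        if t1.startswith("VB") and t2 == "RP"
    ]

def pmi_scores(tagged):
    """Score each candidate pair with PMI (log base 2); PMI is an assumed measure."""
    words = [w.lower() for w, _ in tagged]
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    n_uni = len(words)
    n_bi = max(len(words) - 1, 1)
    scores = {}
    for pair in set(verb_particle_candidates(tagged)):
        p_xy = bigrams[pair] / n_bi
        p_x = unigrams[pair[0]] / n_uni
        p_y = unigrams[pair[1]] / n_uni
        scores[pair] = math.log2(p_xy / (p_x * p_y))
    return scores

SCORES = pmi_scores(TAGGED)
```

In a pipeline like the one the abstract outlines, scores of this kind could serve as features for a downstream classifier; how the authors actually built their feature set is not stated in this record.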
Cite this Research Publication : Sanjanaashree, P., Anand Kumar, M., Soman, K.P. Dependency based multiword expression extraction towards NLP applications (2014) ACM International Conference Proceeding Series, 10-11-October-2014, art. no. a22