<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>LMS Computational Linguistics Lab (凡土研)</title>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
<meta http-equiv="Content-Style-Type" content="text/css">
<meta name="keywords" content="Lexical Semantics">
<meta name="description" content="This page provides information about
the Computational Linguistics Lab at Nanyang Technological University.">
<!-- <base href="http://lingo.stanford.edu/"> -->
<link href="static/clg.css" rel="stylesheet" type="text/css">
<script src="static/clg.js" language="javascript"></script>
</head>
<body>
<table cellpadding=5 border=0>
<tr>
<td>
<div id="menu">
<center>
<table cellspacing=5 cellpadding=5 border=0>
<tr><td align=center valign=middle><a href="index.html">Home</a></td></tr>
<tr><td align=center valign=middle><a href="events.html">News and Events</a></td></tr>
<tr><td align=center valign=middle><a href="members.html">Current Members</a></td></tr>
<tr><td align=center valign=middle><a href="alumni.html">Alumni</a></td></tr>
<tr><td align=center valign=middle><a href='pubs.html'>Publications</a></td></tr>
<tr><td align=center valign=middle bgcolor="#ffffff"><def>Theses</def></td></tr>
<tr><td align=center valign=middle><a href="projects.html">Projects</a></td></tr>
<tr><td align=center valign=middle><a href="courses.html">Courses</a></td></tr>
<tr><td align=center valign=middle><a href="links.html">Related Sites</a></td></tr>
<tr><td align=center valign=middle><hr size=1></td></tr>
<tr><td align=center valign=middle><b><em>Current Research</em></b></td></tr>
<tr><td align=center valign=middle><a href="http://compling.hss.ntu.edu.sg/omw/">Open Multilingual Wordnet</a></td></tr>
<tr><td align=center valign=middle><a href="http://compling.hss.ntu.edu.sg/wnja/index.en.html">Japanese Wordnet</a></td></tr>
<tr><td align=center valign=middle><a href="http://wn-msa.sourceforge.net/index.eng.html">Wordnet Bahasa</a></td></tr>
<tr><td align=center valign=middle><a href="https://bond-lab.github.io/cow/">Chinese Open Wordnet</a></td></tr>
<tr><td align=center valign=middle><a href="http://www.delph-in.net/jacy/">Jacy (Japanese HPSG)</a></td></tr>
<tr><td align=center valign=middle><a href="http://moin.delph-in.net/ZhongTop">Zhong (Chinese Meta Grammar)</a></td></tr>
<tr><td align=center valign=middle><a href="http://moin.delph-in.net/IndraTop">INDRA (Indonesian Resource Grammar)</a></td></tr>
<tr><td align=center valign=middle><a href="http://compling.hss.ntu.edu.sg/ntumc/">NTU Multilingual Corpus</a></td></tr>
<tr><td align=center valign=middle><a href="https://github.com/delph-in/JaEn/">Jaen MT system</a></td></tr>
<tr><td align=center valign=middle><hr size=1></td></tr>
<tr>
<td align=center valign=middle><font size='-1'>
<a href="http://www.soh.ntu.edu.sg/Programmes/linguistics/Pages/Home.aspx">Linguistics
and Multilingual Studies</a>
</font></td></tr>
<tr><td align=center valign=middle><font size='-1'>
<a href="http://www.ntu.edu.sg/">Nanyang
Technological University</a>
</font></td></tr>
<tr><td align=center valign=middle><hr size=1></td></tr>
<tr><td align=center valign=middle><a href="contact.html">Contact</a></td></tr>
</table>
</center>
</div>
</td>
<td>
<div id="maintext">
<h1>NTU Computational Linguistics Lab Theses and URECA Projects</h1>
<div align="center">
<a href="#2019">2019</a> *
<a href="#2018">2018</a> *
<a href="#2017">2017</a> *
<a href="#2016">2016</a> *
<a href="#2015">2015</a> *
<a href="#2014">2014</a> *
<a href="#2013">2013</a> *
<a href="#2012">2012</a> *
<a href="#2011">2011</a>
</div>
<br>
<p>This page lists a selection of Final Year Projects (Undergraduate
Theses), Masters and PhD Theses from the computational linguistics group
at the <a href="http://www.soh.ntu.edu.sg/Programmes/linguistics/Pages/Home.aspx">Division of
Linguistics and Multilingual
Studies</a>, <a href="http://www.ntu.edu.sg/">Nanyang Technological
University</a>, Singapore. It also includes theses done outside NTU but co-supervised by us.</p>
<h3><a name="2019">2019</a></h3>
<dl>
<dt>
<a name="Bonansinga:2019">Giulia <b>Bonansinga</b> (2019)</a>
<dd><a class='title' href="pdf/2019-bonansinga-CLWSD.pdf"><i>Cross-lingual word sense annotation with multilingual wordnets</i></a>
Masters Thesis, University of Pisa, Italy.
<br>Code: <a href="https://github.com/jusing-es/clwsd">https://github.com/jusing-es/clwsd</a>
<dt> <a name="Choi:2019">Hannah <b>Choi</b> (2019)</a>
<dd><a class='title' href="https://dr.ntu.edu.sg/handle/10356/136955"><i>A corpus based analysis of -kan and -i in Indonesian</i></a>
Masters Thesis, Nanyang Technological University, Singapore.
<dt>
<a name="Fan:2019"><b>Fan</b> Zhenzhen (2019)</a>
<dd><a class='title' href="https://dr.ntu.edu.sg//handle/10220/48021"><i>Building an HPSG Chinese grammar
(Zhong)</i></a>
Doctoral Dissertation, Nanyang Technological University, Singapore.
<br>Grammar: <a href="http://moin.delph-in.net/ZhongTop">http://moin.delph-in.net/ZhongTop</a>
<dt> <a name="Le:2019">Tuan Anh <b>Le</b> (2019)</a>
<dd><a class='title' href="https://dr.ntu.edu.sg/handle/10356/89208"><i>Developing and applying an integrated semantic framework for natural language understanding</i></a>
Doctoral Dissertation, Nanyang Technological University, Singapore.
<br>Code: <a href='https://github.com/letuananh/intsem.fx'>https://github.com/letuananh/intsem.fx</a>
</dl>
<h3><a name="2018">2018</a></h3>
<dl>
<dt><a name="Goodman:2018">Michael Wayne <b>Goodman</b> (2018)</a>
<dd><a class='title' href="https://alliance-primo.hosted.exlibrisgroup.com/permalink/f/kjtuig/CP71291839820001451"><i>Semantic operations for transfer-based machine translation</i></a>
Doctoral Dissertation, University of Washington, U.S.A.
<dt><a name="Moeljadi:2018">David <b>Moeljadi</b> (2018)</a>
<dd><a class='title' href="https://dr.ntu.edu.sg/handle/10356/82540"><i>An Indonesian resource grammar (INDRA) : and its application to a treebank (JATI)</i></a>
Doctoral Dissertation, Nanyang Technological University, Singapore.
<br>Grammar: <a href="http://moin.delph-in.net/IndraTop">http://moin.delph-in.net/IndraTop</a>
<dt>
<a name="Elvis:2018">Elvis Albertus Toni (2018)</a>
<dd><a class='title' href="https://dr.ntu.edu.sg/handle/10356/88960"><i>Emotions in Adonara-Lamaholot</i></a>
Doctoral Dissertation, Nanyang Technological University, Singapore.
</dd>
</dl>
<h3><a name="2017">2017</a></h3>
<dl>
<dt><a name="Breen:2017">James <b>Breen</b> (2017)</a>
<dd><a class='title' href="https://alliance-primo.hosted.exlibrisgroup.com/permalink/f/kjtuig/CP71291839820001451"><i>Extraction of neologisms from Japanese corpora</i></a>
Doctoral Dissertation, University of Melbourne, Australia.
<br>Lexicon: <a href='http://edrdg.org/jmdict/edict_doc.html'>http://edrdg.org/jmdict/edict_doc.html</a>
</dl>
<h3><a name="2016">2016</a></h3>
<dl>
<dt>
<a name="Lee:2016"><b>Lee</b> Naomi Elizabeth (2016)</a>
<dd><a class='title' href="pdf/2016-fyp-lee-naomi-elizabeth.pdf">An Etymological Study of Singapore Sign Language: The Influence of American Sign Language on Singapore Sign Language</a>, Final Year Project, Linguistics and Multilingual Studies, Nanyang Technological University, Singapore,
<span class='toggle' title ='click to show/hide abstract' onClick="toggle('hide:Lee:2016')">
<b>Abstract</b></span> (click to toggle)
<div class='abstract' style='display:none;' id='hide:Lee:2016'>
Though there is no officially recognized national sign language in Singapore,
Singapore Sign Language (SgSL) is recognized by the local deaf community. It has
influences from Shanghainese Sign Language (SSL) and American Sign Language
(ASL), and is continually developing with locally generated signs. This study aims to
give an insight into the influence of ASL on SgSL, which serves as a first look into
the etymology of SgSL. 14 participants were recruited in this study; 3 participants
were given a Swadesh list for sign languages consisting of 100 words, which they
were asked to sign in SgSL. The videos of ASL signs for the same words were
obtained online and presented alongside the SgSL signs to the other 11 participants,
who gave judgments about the similarities of each pair of signs. The signs were also
transcribed using the Hamburg Sign Language Notation System, or HamNoSys, and
further analyzed based on handedness and four traditional phonological parameters –
handshape, location, movement and orientation. This was done by calculating the
Levenshtein distances between each pair of transcriptions. The similarity of the signs
was then determined after consideration of the participants’ judgments and the
analysis of phonological parameters, and it was found that the signs were similar to a
great extent, which suggests that SgSL is heavily influenced by ASL.
</div>
<dt>
<a name="Lim:2016"><b>Lim</b> Jia Ying (2016)</a>
<dd><a class='title' href="pdf/2016-fyp-lim-jia-ying.pdf">Investigation of Classifiers in Singapore Sign Language through Narratives: A Pilot Study</a>, Final Year Project, Linguistics and Multilingual Studies, Nanyang Technological University, Singapore,
<span class='toggle' title ='click to show/hide abstract' onClick="toggle('hide:Lim:2016')">
<b>Abstract</b></span> (click to toggle)
<div class='abstract' style='display:none;' id='hide:Lim:2016'>
Several sign language corpora such as the British Sign Language corpus have been set up
over the years, providing a platform for extensive research to be done on classifiers. Yet little
is known about the Singaporean variety as the Singapore Sign Language (SgSL) Corpus
Project only kick-started recently. Focusing on video recordings of 12 SgSL users from
varied sociolinguistic backgrounds, this study describes in detail the usage of prototypical
classifiers in SgSL in terms of handshape-orientation combinations and structural patterns of
occurrences. In addition, it explains why variance in classifier handshapes and morphemic
functions occur. Entity classifiers were the most commonly produced classifier type, and
together with Instrumental classifiers, the handshapes of the most commonly used classifiers
in these two categories were found to be similar to those used in American Sign Language
and Hong Kong Sign Language. Handling and Shape and Size Specifier classifiers were also
found to be multi-morphemic and grammaticized in SgSL, suggesting that the most
commonly used classifiers from each classifier type are highly lexicalized in SgSL, allowing
them to remain stable and highly productive. This pilot study on classifiers thus serves to
provide an avenue for future research, while highlighting issues of variation in SgSL.
</div>
<dt>
<a name="Yeow:2016"><b>Yeow</b> Jun Jie (2016)</a>
<dd><a class='title' href="pdf/2016-fyp-yeow-jun-jie.pdf">Investigating the effectiveness of watching videos with L1 or L2 subtitles on second language acquisition</a>, Final Year Project, Linguistics and Multilingual Studies, Nanyang Technological University, Singapore,
<span class='toggle' title ='click to show/hide abstract' onClick="toggle('hide:Yeow:2016')">
<b>Abstract</b></span> (click to toggle)
<div class='abstract' style='display:none;' id='hide:Yeow:2016'>
This study investigates the effectiveness of watching videos for L2 learning and the
effectiveness of L1 and L2 subtitles on L2 learning. Specifically it explores the
potential for L2 videos in learning vocabulary and grammar. 60 Japanese L2
learners of various proficiencies were asked to watch clips from a native Japanese
television drama with either English subtitles, Japanese subtitles or no subtitles.
They were then asked to do a test which includes a comprehension section,
vocabulary section and grammar section. A post-study interview was also
conducted. Results were tabulated and overall test scores found that English
subtitles were more beneficial for beginner learners while Japanese subtitles were
more useful for intermediate to advanced learners (referred to as simply advanced
learners in this study). Vocabulary scores showed a similar result. English subtitles
impeded Japanese grammar learning for all levels of learners. Overall, Japanese
subtitles are beneficial for Japanese learning for advanced learners for both
vocabulary and grammar learning, while English subtitles are beneficial for
beginner learners for vocabulary learning. Results from the study can serve as a
pedagogical tool giving language learners a better idea as to how videos can
benefit their learning, and as to which subtitles, L1 or L2, are more effective for
their current proficiency level. Suggestions for future research have also been
included.
</div>
</dl>
<h3><a name="2015">2015</a></h3>
<dl>
<dt>
<a name="Ho:2015">Jia Qian <b>Ho</b> (2015)</a>
<dd><a class='title' href="pdf/2015-fyp-ho-jia-qian.pdf">Losing One’s Mind Over Meaning: Analysing the Behaviour of English Possessive Idioms</a>, Final Year Project, Linguistics and Multilingual Studies, Nanyang Technological University, Singapore,
<span class='toggle' title ='click to show/hide abstract' onClick="toggle('hide:Ho:2015')">
<b>Abstract</b></span> (click to toggle)
<div class='abstract' style='display:none;' id='hide:Ho:2015'>
Idioms are commonly found in everyday language and reflect the
conventionalisations in speech communities. Regarding English
idioms, past research has examined the syntactic and semantic
analysis of idioms (Villavicencio & Copestake, 2002), along with their
decomposability (Nunberg, 1978; Gibbs, 1989a,b) and comprehension
(Titone & Connine, 1994; Cacciari & Tabossi, 2014). However, there has
been little research on English possessive idioms despite their
uniqueness and interesting properties. This thesis thus seeks to
analyze the syntax and semantics of possessive idioms and describe
their behaviour in terms of their decomposability and plausibility. A
total of 514 idioms were categorized into co-indexed and separate
possessive idioms and then grouped syntactically in order to be
incorporated into new templates in the English Resource Grammar
(Flickinger, 2011). Subsequently, the meaning of either each idiom
component or paraphrase component was linked to WordNet (Fellbaum,
1998a) by choosing the most appropriate sense. The resulting
comprehensive syntactic and semantic idiom descriptions allowed for
analyses of their syntax, semantics, decomposability and
plausibility. Results demonstrated the interplay between syntax and
semantics and revealed novel aspects of possessive idioms, such as
alternation and transformation in idioms. Furthermore, results
confirmed that a degree of decomposability exists and suggested that
possessive idioms could be categorized into four groups according to
their projectability. The comprehensive idiom database will be
released under an open license where it can be used as a dictionary
and to further improve natural language processing applications.
</div>
<dt>
<a name="Wang:2015">Wenjie <b>Wang</b> (2015)</a>
<dd><a class='title' href="pdf/2015-fyp-wang-wenjie.pdf">A-not-A Questions
in Mandarin Chinese: An HPSG Account</a>, Final Year Project,
Linguistics and Multilingual Studies, Nanyang Technological University, Singapore,
<span class='toggle' title ='click to show/hide abstract' onClick="toggle('hide:Wang:2015')">
<b>Abstract</b></span> (click to toggle)
<div class='abstract' style='display:none;' id='hide:Wang:2015'>
In this paper, I look at a-not-a questions in Mandarin Chinese, and create an account
based on the framework of Head-driven Phrase Structure Grammar (HPSG) and Minimal
Recursion Semantics (MRS). While the a-not-a structure has seen extensive research in the
past decades, they have largely been movement-based. Thus, this paper attempts to provide
a non-movement but constraint-based HPSG/MRS account, which has thus far not been
performed. Secondly, I have also begun initial implementation of said account into the
HPSG-based Zhong [|] computational grammar for Chinese. While the basic forms of the a-not-a
structure were accounted for and implemented, it was found that limitations in the system
and formalisation prevented vp-not-vp questions from being successfully implemented.
</div>
</dl>
<h3><a name="2014">2014</a></h3>
<dl>
<dt>
<a name="Soh:2014">Sandra <b>Soh</b> (2014)</a>
<dd><a class='title' href="pdf/2014-fyp-soh-sandra.pdf">Crosslingual corpus-based analysis of English and Japanese Verbs</a>, Final Year Project, Linguistics and Multilingual Studies, Nanyang Technological University, Singapore,
<span class='toggle' title ='click to show/hide abstract' onClick="toggle('hide:Soh:2014')">
<b>Abstract</b></span> (click to toggle)
<div class='abstract' style='display:none;' id='hide:Soh:2014'>
This study looked into divergences and correspondences that occurred in the translation of English verbs to Japanese verbs. A quantitative and qualitative approach was adopted in this study to analyze the distribution of different types of verbs in English and Japanese, based on the parallel NTU Multilingual Corpus (NTU-MC). Verb types examined include English phrasal verbs, Japanese compound verbs, as well as idiomatic verbal expressions and single verbs in English and Japanese. English phrasal verbs and Japanese compound verbs were hypothesized to be correlated. Using a Chi-square test, they were proven to be translation correspondences of each other. Other patterns of translation divergences were also detected, such as lexical gaps and lexical deviations. These divergences were attributed to translation style, but more so on differences in the language systems, such as the system of honorifics in Japanese, of which a similar system is not found in English. Findings from this study can shed some light concerning translation issues of English phrasal verbs into Japanese, and also for learners of either language to understand the semantic behaviours of English phrasal verbs and Japanese compound verbs.
</div>
<dt>
<a name="Lim:2014">Aiden <b>Lim</b> (2014)</a>
<dd><a class='title' href="pdf/2014-fyp-aiden-lim.pdf">Acquiring Predominant Word Senses in Multiple
Languages</a>, Final Year Project, Linguistics and
Multilingual Studies, Nanyang Technological University, Singapore,
<span class='toggle' title ='click to show/hide abstract' onClick="toggle('hide:Lim:2014')">
<b>Abstract</b></span> (click to toggle)
<div class='abstract' style='display:none;' id='hide:Lim:2014'>
The accurate retrieval of the most appropriate sense of a
word, in other words Word Sense Disambiguation, is a key
process in machine learning and machine translation. Yet
oftentimes, the reliability and accuracy of a system is a
product of the input data which is itself dependent on
financial and temporal restrictions. The resulting scarcity
also affects the scope of WSD systems by limiting available
data to specific languages. This paper proposes a Word Sense
Disambiguation system that ignores these restrictions by being
able to accept raw text input from any language and extract
predominant word senses from it. This is done through a
two-pronged approach of mathematically mapping the
distribution of a target word and its neighboring words before
calculating the semantic similarity of both. By using both
the Distributional Similarity Score and Semantic Similarity
Score, it becomes possible to mathematically calculate and
propose a Predominant Word Sense without any language-specific
input. This paper reports how the system was used on English
and Mandarin Chinese Wikipedia raw text to extract Predominant
Word Senses with a fair degree of accuracy for a successful
Word Sense Disambiguation System.
</div>
</dl>
<h3><a name="2013">2013</a></h3>
<dl>
<dt><a name="Gao:2013">Eshley <b>Gao</b> (2013)</a>
<dd><a class='title' href="pdf/2013-fyp-gao-eshley.pdf">Crosslingual comparison of linguistic phenomenon in English, Japanese and Chinese</a>, Final Year Project, Linguistics and
Multilingual Studies, Nanyang Technological University, Singapore,
<span class='toggle' title ='click to show/hide abstract' onClick="toggle('hide:Gao:2013')"><b>Abstract</b></span> (click to toggle)
<div class='abstract' style='display:none;' id='hide:Gao:2013'>Recent trends in computational linguistics tend to focus on how to represent meaning. The availability of parallel corpus has allowed researchers to study how languages convey the same information in different ways. This study adopts a quantitative and qualitative method to study translation shifts in the short novel – The Adventure of the Dancing Men. The parallel tri-text corpus in English-Japanese-Chinese is a sub-corpus of the NTU Multilingual Corpus. We tagged the concepts according to the senses in the WordNets, and annotated relationships between translation correspondents. The results show that 49.60% and 50.87% of distinct synsets in the English source text were linked in the English-Japanese and English-Chinese corpus respectively. Of the total linked concepts, 51.58% and 60.07% of them are exact correspondents of the source language in the English-Japanese and English-Chinese corpus correspondingly. The remaining contribute to evidence for translation shifts, which includes direct differentiation like hyponymy relationship to less straightforward variation like translation equivalents. The study also attempts to describe some of the translation shifts observed in the corpus. We estimate that more than half of the translation shifts were due to language differences, although translating style also played a part in the shifts. Data from this study can be used to train machine translation systems to produce more human-like translations. Second language learners of Japanese and Chinese can also take advantage of the data to learn how the same idea can be transmitted in different ways.</div>
<dt><a name="Le,Sun:2013">Le Tuan Anh and Sun Ying (2013)</a>
<dd><a class='title' href="pdf/2013-masters-le-sun-le-tuan-anh-sunying.pdf">Question-Answering Machine based on Deep Linguistic Parsing</a>, Master's Thesis, Institute of Systems Science, National University of Singapore, Singapore<!--,
<span class='toggle' title ='click to show/hide abstract' onClick="toggle('hide:Le,Sun:2013')"><b>Abstract</b></span> (click to toggle)
<div class='abstract' style='display:none;' id='hide:Le,Sun:2013'></div>-->
<dt><a name="Tan:2013">Liling Tan (2013)</a>
<dd><a class='title' href="pdf/2013-masters-tan-liling.pdf">Examining Crosslingual Word Sense Disambiguation</a>, Master's Thesis, Linguistics and
Multilingual Studies, Nanyang Technological University, Singapore,
<span class='toggle' title ='click to show/hide abstract' onClick="toggle('hide:Tan:2013')"><b>Abstract</b></span> (click to toggle)
<div class='abstract' style='display:none;' id='hide:Tan:2013'>Understanding human language computationally remains a challenge at different levels, phonologically, syntactically and semantically. This thesis attempts to understand human language's ambiguity through the Word Sense Disambiguation (WSD) task. Word Sense Disambiguation (WSD) is the task of determining the correct sense of a word given a context sentence and topic models are statistical models of human language that can discover abstract topics given a collection of documents. This thesis examines the WSD task in a crosslingual manner with the usage of topic models and parallel corpus. The thesis defines a topical crosslingual WSD (Topical CLWSD) task as two subtasks (i) Match and Translate: finding a match of the query sentence in a parallel corpus using topic models that provides the appropriate translation of the target polysemous word (ii) Map: mapping the word-translation pair to disambiguate the concept respectively of the Open Multilingual WordNet. The XLING WSD system has been built to attempt the topical WSD task. Although the XLING system underperforms in the topical WSD task, it serves as a pilot approach to crosslingual WSD in a knowledge-lean manner. Other than the WSD task, the thesis briefly presents updates on the ongoing work to compile multilingual data for the Nanyang Technological University-Multilingual Corpus (NTU-MC). Both the NTU-MC project and the XLING system are related in their attempts to build crosslingual language technologies.</div>
<dt><a name="Zulhelmy:2013">Muhammad Zulhelmy Mohd Rosman (2013)</a>
<dd><a class='title' href="pdf/2013-fyp-zulhelmy.pdf">Creating derivational morphology links in Wordnet Bahasa</a>, Final Year Project, Linguistics and Multilingual Studies, Nanyang Technological University, Singapore,
<span class='toggle' title ='click to show/hide abstract' onClick="toggle('hide:Zulhelmy:2013')"><b>Abstract</b></span> (click to toggle)
<div class='abstract' style='display:none;' id='hide:Zulhelmy:2013'>Derivational morphology links are created for the Wordnet Bahasa, a combined Indonesian and Malay online lexical dictionary (Nurril Hirfana, Suerya, & Bond, 2011). The focus was to link root words to affixed words as affixation is one of the more apparent word formation processes in Bahasa Melayu. MorphInd, an Indonesian morphological analyser (Larasati, Kubon, & Zeman, 2011), is used to break down affixed words into their root form and affixes. Using Python 2.7 with NLTK, a raw mapping is done by matching the analysed words to the root forms. The derivational links in the Princeton Wordnet (PWN) are used to verify if the same links exist in Wordnet Bahasa. Redundant links are removed by the Part-of-Speech (POS) filter and Semantic Super Type filter. The links are then disambiguated using the Lesk algorithm, where the definitions and other components of the sense (e.g. hypernyms, hyponyms and examples) are compared for their similarity. However, the disambiguation process is rendered ineffective because of the high number of errors still existing in
Wordnet Bahasa. The derivational links are released as a separate file and only those with similar derivational links to PWN are added into Wordnet Bahasa. Erroneous entries that were identified using MorphInd are removed from Wordnet Bahasa.</div>
<dt><a name="Seah:2013">Yu Jie Seah (2013)</a>
<dd><a class='title' href="pdf/2013-fyp-seah-yujie.pdf">Contrastive Analysis of Pronouns across English, Mandarin Chinese and Japanese</a>, Final Year Project, Linguistics and Multilingual Studies, Nanyang Technological University, Singapore,
<span class='toggle' title ='click to show/hide abstract' onClick="toggle('hide:Seah:2013')"><b>Abstract</b></span> (click to toggle)
<div class='abstract' style='display:none;' id='hide:Seah:2013'>A qualitative and quantitative approach was used in this study to examine the distribution of pronouns in three languages, namely English, Mandarin Chinese and Japanese, based on the parallel NTU Multilingual Corpus (NTU-MC), with English being the source language while Mandarin Chinese and Japanese translations are aligned to it at the sentence level. The pronouns are extracted from four subcorpora: two short stories, one essay and one online article about Singapore’s tourism. However, due to time and space constraints, only pronouns from one subcorpus, The Adventure of the Speckled Band (a short story from the Sherlock Holmes series), are tagged, annotated and linked in the corpus. The results show that although English has the largest number of pronouns, Mandarin Chinese has the highest percentage of referential pronouns. Also, English has more translated counterparts in Mandarin Chinese as compared to Japanese. We attributed this to the difference in usage of pronouns in the languages. Depronominalisation, surprisingly, was even for both corpora. We believed this to be due to influence from the English text. Findings from this study can shed some light concerning translation issues on pronoun usage for learners of the languages and also contribute to pronoun translation across languages.</div>
<dt><a name="Pozen:2013">Zinaida Pozen (2013)</a>
<dd><a class='title' href="https://digital.lib.washington.edu/dspace/handle/1773/23469">Using Lexical and Compositional Semantics to Improve HPSG Parse Selection</a>, Master's Thesis, University of Washington, United States,
<span class='toggle' title ='click to show/hide abstract' onClick="toggle('hide:Pozen:2013')"><b>Abstract</b></span> (click to toggle)
<div class='abstract' style='display:none;' id='hide:Pozen:2013'>Accurate parse ranking is essential for deep linguistic processing applications and is one of the classic problems for academic research in NLP. Despite significant advances, there remains a big need for improvement, especially for domains where gold-standard training data is scarce or unavailable. An overwhelming majority of parse ranking methods today rely on modelling syntactic derivation trees. At the same time, parsers that output semantic representations in addition to syntactic derivations (like the monostratal DELPH-IN HPSG parsers) offer an alternative structure for training the ranking model, which could be further combined with the baseline syntactic model score for re-ranking. This thesis proposes a method for ranking the semantic sentence representations, taking advantage of compositional and lexical semantics. The methodology does not require sense-disambiguated data, and therefore can be adopted without requiring a solution for word sense disambiguation. The approach was evaluated in the context of HPSG parse disambiguation for two different domains, as well as in a cross-domain setting, yielding relative error rate reduction of 11.36% for top-10 parse selection compared to the baseline syntactic derivation-based parse ranking model, and a standalone ranking accuracy approaching the accuracy of the baseline syntactic model in the best setup.</div>
</dl>
<h3><a name="2012">2012</a></h3>
<dl>
<dt><a name="Tan:2012">Jeanette Tan Yi Wen (2012)</a>
<dd><a class='title' href="pdf/2012-fyp-tan-jeanette">Automatic Generation of Multilingual Crossword Puzzles with WordNet</a>, Final Year Project, Linguistics and
Multilingual Studies, Nanyang Technological University, Singapore,
<span class='toggle' title ='click to show/hide abstract' onClick="toggle('hide:Tan:2012')"><b>Abstract</b></span> (click to toggle)
<div class='abstract' style='display:none;' id='hide:Tan:2012'>This project implements an automatic crossword puzzle generator
in Python and a JavaScript solving interface that allows users to tailor the puzzle to suit their preferences. The crosswords are multilingual with the hints and solutions in different languages as it makes use of the Open Multilingual Wordnet which has linked the WordNets of many different languages together. It aims to provide a fun and effective way for language learners to acquire vocabulary.</div>
<br><a href='https://github.com/zenador/multi-xwords'>Randomly generated multilingual crosswords</a>: <a href='http://multi-xwords.a3c1.starter-us-west-1.openshiftapps.com/'>solve one</a>
<dt><a name="Ko:2012">Ko Tabitha (2012)</a>
<dd><a class='title' href="pdf/2012-fyp-ko-tabitha.pdf">Chinese-English translations for passive constructions</a>, Final Year Project, Linguistics and
Multilingual Studies, Nanyang Technological University, Singapore,
<span class='toggle' title ='click to show/hide abstract' onClick="toggle('hide:Ko:2012')"><b>Abstract</b></span> (click to toggle)
<div class='abstract' style='display:none;' id='hide:Ko:2012'>Retrieval of accurate translations is crucial in today’s technologically advanced world where intercultural communications are frequent and necessary. Past research surrounding passives and translations has largely focused on English-Chinese translations. Therefore, this paper seeks to provide new insight by concentrating on Chinese-English passive translations. In view of past observations, five hypotheses are proposed: (i) BEI++ hypothesis; (ii) RANG+ hypothesis; (iii) BE+ hypothesis; (iv) GET-control hypothesis; (v) BY-ACTOR hypothesis. For the purpose of this study, a Chinese-English multilingual corpus from Korea Advanced Institute of Science and Technology (KAIST) was used. Three of the proposed hypotheses, namely BEI++ hypothesis, BE+ hypothesis and GET-control hypothesis, were supported. However, the RANG+ hypothesis and BY-ACTOR hypothesis were not supported. Additionally, the reduction of Chinese passives to PAST PARTICIPLE PHRASES and two new types of Chinese passive constructions were noticed. Furthermore, analysis of English translations exhibited other types of English passives previously overlooked. Results also illustrated the influence of both source language (SL) and target language (TL) norms in translations. An examination assessing current machine translations indicated a lack of appropriate translations. Thus, two sets of actions for Chinese-English passive translation have been proposed. Future research exploring the application of the proposed actions is recommended.</div>
<dt><a name="Sheefa:2012">Sheefa Samara Sameha (2012)</a>
<dd><a class='title' href="pdf/2012-fyp-sheefa-samara-sameha.pdf">Make Up Your Mind: An Analysis of Idiomatic Possessive Verb Phrase Constructions in English</a>, Final Year Project, Linguistics and
Multilingual Studies, Nanyang Technological University, Singapore,
<span class='toggle' title ='click to show/hide abstract' onClick="toggle('hide:Sheefa:2012')"><b>Abstract</b></span> (click to toggle)
<div class='abstract' style='display:none;' id='hide:Sheefa:2012'>Idiomatic constructions, particularly possessive ones, are inadequately described in English grammar. A total of 307 idioms are clustered by structure, and their syntactic and semantic aspects are discussed. The Minimal Recursion Semantics of the idioms indicates the possessive relationships within each expression. Compositionality is found to have little effect on idiomaticity. Conceptual metaphors and image schemas are suggested as possible means of understanding when literal expressions become non-literal. The findings point to greater shortcomings in the available literature than first assumed. A novel means of idiom implementation with a focus on easy access and visual representation is proposed.</div>
</dl>
<h3><a name="2011">2011</a></h3>
<dl>
<dt><a name="Kong:2011">Yun Rui Kong (2011)</a>
<dd><a class='title' href="pdf/2011-fyp-kong-yunrui.pdf">Shape or Substance?: A Cross-linguistic Study of Bilinguals in English and Chinese</a>, Final Year Project, Linguistics and
Multilingual Studies, Nanyang Technological University, Singapore,
<span class='toggle' title ='click to show/hide abstract' onClick="toggle('hide:Kong:2011')"><b>Abstract</b></span> (click to toggle)
<div class='abstract' style='display:none;' id='hide:Kong:2011'>According to the Sapir-Whorf Hypothesis, language shapes how we think. Testing the hypothesis on monolinguals is more straightforward and direct than testing it on bilinguals. For bilinguals, many more factors need to be considered, such as language dominance, language proficiency, and the similarities or differences between the first and second language. Yet the most important question is whether bilingual thought is the result of two separate language systems or of one system that integrates the two languages. In this study, we examine if and how the presence and acquisition of English as a second language affects the concept of individuation in speakers of Chinese. This is investigated by looking at how Chinese monolingual, Chinese-English sequential bilingual and English-Chinese simultaneous bilingual undergraduates perform categorization in an online triad-matching task. In the task, participants were required to decide whether the shape or the substance (material) alternative was more similar to the test image. It was hypothesized that the simultaneous bilinguals would have the highest percent shape response, followed by the sequential bilinguals and then the Chinese monolinguals. However, we found that the sequential bilinguals had a lower percent shape response than the Chinese monolinguals. This is suggested to be a result of increased cue sensitivity towards English when acquiring English as a second language. In addition, the results of the Chinese monolinguals were compared to those of Japanese and English monolinguals in the study by Imai (2000) to see how differences in the linguistic features of languages can affect thought. From the findings of this study, we gather that bilingual thought is the result of the integration of two language systems into one, since the acquisition of a second language creates a shift in cognition.
This is illustrated by how bilinguals categorize compared to monolinguals.</div>
<!-- Issue Date: 2011. 12. 31 -->
<dt><a name="Tan:2011">Liling Tan (2011)</a>
<dd><a class='title' href="pdf/2011-fyp-tan-liling.pdf">Building the Foundation Text for Nanyang Technological University — Multilingual Corpus (NTU-MC)</a>, Final Year Project, Linguistics and
Multilingual Studies, Nanyang Technological University, Singapore,
<span class='toggle' title ='click to show/hide abstract' onClick="toggle('hide:Tan:2011')"><b>Abstract</b></span> (click to toggle)
<div class='abstract' style='display:none;' id='hide:Tan:2011'>The NTU-MC is a multilingual corpus that draws on the multilingual text available in Singapore. The current version of the NTU-MC contains a total of ~375,000 words (15,096 sentences) in 6 languages (English, Chinese, Japanese, Korean, Indonesian and Vietnamese) from 6 language families (Indo-European, Japonic, Austro-Asiatic, Sino-Tibetan, Austronesian and Korean as a language isolate); all text in English, Chinese, Japanese, Korean and Vietnamese was part-of-speech (POS) tagged. This project focuses on compiling the foundation text for the NTU-MC, and this dissertation describes the motivations, the corpus compilation process, and the internal and cross-corpora evaluation of the corpus output. The corpus will be made available to the public under the Creative Commons Attribution 3.0 Unported license in Summer 2011.</div>
(First FYP from LMS!)
</dl>
<div id="bottom">
Last modified: 2020-01-02
</div>
</div>
</td>
</tr>
</table>
</body>
</html>