Résumés
Abstract
Chinese, unlike some Western languages, is written without spaces between words, which poses a distinctive problem for machine translation: how should Chinese text be segmented into words? Current word-segmentation systems used in machine translation are either linguistically oriented or statistically oriented. Both types, however, have inherent defects that cannot be overcome, because the Chinese language is pragmatically oriented. This research addresses the problem of Chinese word segmentation in machine translation in light of a language investigation consisting of two surveys and eight interviews.
Keywords/Mots-clés:
- Chinese word segmentation,
- machine translation,
- language investigation,
- contextual information,
- semantic plausibility
Résumé
La langue chinoise, à la différence des langues occidentales, ne laisse pas d’espace entre deux mots à l’écrit, ce qui pose un problème à la traduction par ordinateur du chinois à l’anglais : comment segmenter les mots en chinois ? Le système de segmentation de mots utilisé actuellement dans la traduction par machine est doté soit d’une orientation linguistique, soit d’une orientation statistique. Cependant, compte tenu du caractère pragmatique de la langue chinoise, les deux genres de système ont des défauts inhérents que l’on n’arrivera pas à effacer. La présente étude propose des solutions pour résoudre le problème de segmentation de mots dans la traduction par machine par une étude langagière composée de deux enquêtes et de huit interviews.