Sistem Deteksi Bahasa pada Dokumen menggunakan N-Gram (Language Detection System for Documents Using N-Grams)

Badrus Zaman; Eva Hariyanti; Endah Purwanti

doi:10.32722/multinetics.v1i2.1027

Published Nov 20, 2015

https://doi.org/10.32722/multinetics.v1i2.1027

Full text: PDF (in Indonesian)

Vol. 1 No. 2 (2015): MULTINETICS, November 2015

Badrus Zaman

Faculty of Science and Technology, Universitas Airlangga

Eva Hariyanti

Faculty of Science and Technology, Universitas Airlangga

Endah Purwanti

Faculty of Science and Technology, Universitas Airlangga

Abstract

Language detection on very large document collections can improve the performance of information retrieval systems. One popular language detection method uses N-grams: substrings of n characters taken from a string. This research develops an N-gram-based language detection system that identifies whether a document is written in Indonesian or English. The work proceeded in three phases: creating a profile for each language, testing the system, and evaluating it. Fifty documents were used to create the language profiles (25 Indonesian and 25 English), and sixty documents were used to test the system. System performance was evaluated using the F-measure. The tests yielded F-measures of 0.933, 0.917, and 0.933 for unigrams, bigrams, and trigrams, respectively.
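The abstract does not state how a test document is compared against the language profiles. As a minimal sketch, assuming the rank-order "out-of-place" distance of Cavnar and Trenkle (a common choice for N-gram profiles, swapped in here and not confirmed as the authors' method), the three phases could look like this in Python. The word padding with `_`, the profile size `top_k=300`, the toy sentences, and all function names are hypothetical; `f_measure` computes the F1 score for one language treated as the positive class.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Split text into words and return overlapping character n-grams.

    Each word is padded with '_' on both sides (an assumption) so that
    word-initial and word-final n-grams are distinguishable.
    """
    grams = []
    for word in text.lower().split():
        padded = f"_{word}_"
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


def build_profile(docs: list[str], n: int, top_k: int = 300) -> list[str]:
    """Phase 1: rank n-grams by frequency and keep the top_k most frequent."""
    counts = Counter()
    for doc in docs:
        counts.update(char_ngrams(doc, n))
    return [gram for gram, _ in counts.most_common(top_k)]


def out_of_place(doc_profile: list[str], lang_profile: list[str], max_penalty: int) -> int:
    """Cavnar-Trenkle style distance: sum of rank differences between profiles.

    N-grams missing from the language profile receive max_penalty.
    """
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_profile)
    )


def detect(text: str, profiles: dict[str, list[str]], n: int, top_k: int = 300) -> str:
    """Phase 2: return the language whose profile is closest to the text."""
    doc_profile = build_profile([text], n, top_k)
    return min(profiles, key=lambda lang: out_of_place(doc_profile, profiles[lang], top_k))


def f_measure(y_true: list[str], y_pred: list[str], positive: str) -> float:
    """Phase 3: F1 score for one language treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Tiny toy data for illustration only (not the paper's corpus).
training = {
    "id": ["saya sedang membaca buku di perpustakaan",
           "mereka pergi ke pasar untuk membeli sayuran"],
    "en": ["i am reading a book in the library",
           "they went to the market to buy vegetables"],
}
profiles = {lang: build_profile(docs, n=3) for lang, docs in training.items()}
print(detect("kami membaca buku", profiles, n=3))  # → id
```

The reported results compare unigram, bigram, and trigram profiles; in this sketch that corresponds to building profiles with `n=1`, `n=2`, and `n=3`. A fixed penalty for missing N-grams keeps distances comparable across language profiles of different sizes.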

How to Cite

Zaman, B., Hariyanti, E., & Purwanti, E. (2015). Sistem Deteksi Bahasa pada Dokumen menggunakan N-Gram. MULTINETICS, 1(2), 21–26. https://doi.org/10.32722/multinetics.v1i2.1027

References

Hamzah, A. (2010). Deteksi bahasa untuk dokumen teks berbahasa Indonesia. In Proceedings of Dukungan ICT dalam Bidang Industry dan Manajemen ESDM, pp. A-5–A-13.
Ahmed, B., Cha, S.-H., and Tappert, C. (2004). Language Identification from Text Using N-Gram Based Cumulative Frequency Addition. In Proceedings of Student/Faculty Research Day, CSIS, Pace University, May 7, 2004.
Grothe, L., De Luca, E. W., and Nürnberger, A. (2008). A Comparative Study on Language Identification Methods. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), pp. 980–985.
Padró, M., and Padró, L. (2004). Comparing Methods for Language Identification. In Procesamiento del Lenguaje Natural, pp. 155–162.
Lui, M., Lau, J. H., and Baldwin, T. (2014). Automatic Detection and Language Identification of Multilingual Documents. Transactions of the Association for Computational Linguistics, 2, 27–40.
Ramisch, C. (2008). N-Gram Models for Language Detection. M2R Informatique, Double diplôme ENSIMAG–UJF/UFRIMA.
