
Article Information

  • Title: Investigating the Distribution of Arabic and English Keywords and Their Progress Over Different Text File Formats
  • Author: Boumedyen Shannaq
  • Journal: American Journal of Computing Research Repository
  • Print ISSN: 2377-4606
  • Electronic ISSN: 2377-4266
  • Publication year: 2013
  • Volume: 1
  • Issue: 1
  • Pages: 1-5
  • DOI: 10.12691/ajcrr-1-1-1
  • Language: English
  • Publisher: Science and Education Publishing
  • Abstract: This paper presents a systematic approach to text format categorization. It draws on corpus linguistics to show how text files in Html, Pdf, Doc, and Txt formats can be analyzed. The work compares Arabic and English text across these formats by calculating a distributing factor for the keyword distribution in Arabic and English documents. All texts are drawn from the Computer Technology domain. The categorization process is applied to a text collection consisting of two main corpora, Arabic and English. The results show that Arabic text is well distributed in Doc files, whereas English text is well distributed in Xml files. These results should contribute to building an effective Electronic Learning System for Arabic and English texts. The results and conclusions are presented with graphical outputs for clarity.
  • Keywords: information retrieval; text categorization; distributing factor; natural language processing; future trends