The paper discusses the current state of Sindhi corpus construction in detail, including corpus development issues such as corpus acquisition, preprocessing, and tokenization. Preliminary results and observations...
      Script, Corpus Construction, Unigram, bigram
Bangla blogs are growing rapidly in the information era, and as a result they vary widely in layout and categorization. In this setting, automated blog post classification is a comparatively more efficient solution in order...
      Tf-Idf, Supervised machine learning, Unigram, bigram
N-grams are consecutive overlapping N-character sequences formed from an input stream. N-grams are used as alternatives to word-based retrieval in a number of systems. In this paper we propose a model applicable to categorization of...
      Text Categorization, Key words, Conjuncts, bigram
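
The definition in the last abstract above (N-grams as consecutive overlapping N-character sequences formed from an input) can be illustrated with a minimal Python sketch. The function name `char_ngrams` and the sample string are illustrative assumptions and are not taken from the paper; the paper's own categorization model is not reproduced here.

```python
def char_ngrams(text: str, n: int) -> list[str]:
    """Return the consecutive overlapping n-character sequences of `text`.

    Hypothetical helper for illustration only.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # Slide a window of width n across the text, one character at a time.
    # If the text is shorter than n, the range is empty and so is the result.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Character bigrams (n = 2) of a sample word.
bigrams = char_ngrams("corpus", 2)
print(bigrams)  # ['co', 'or', 'rp', 'pu', 'us']
```

Each window overlaps the previous one by n - 1 characters, so a string of length m yields m - n + 1 N-grams when m is at least n.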