Porter’s Algorithm in C

The Porter stemming algorithm (or ‘Porter stemmer’) is a process for removing the more common morphological and inflectional endings from English words. It is mainly used as part of the term normalization step that is usually performed when setting up information retrieval systems. The rules of the algorithm are grouped into five phases, numbered 1 to 5. Each word is passed through the phases in order, from phase 1 to phase 5, so the output of one phase becomes the input of the next, much like sequential commands in a program.
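To make the idea of a rule group concrete, here is a minimal sketch of step 1a of the published algorithm, which deals with plural endings (sses → ss, ies → i, ss → ss, s → removed). It is not the author's code: the helper names `ends_with` and `step1a` are chosen for illustration, the word is assumed to be lowercase, and the remaining steps are left out.

```c
#include <string.h>

/* Returns 1 if the word (of length len) ends with the given suffix. */
static int ends_with(const char *word, size_t len, const char *suffix)
{
    size_t slen = strlen(suffix);
    return len >= slen && memcmp(word + len - slen, suffix, slen) == 0;
}

/* Illustrative sketch of step 1a: the rules are tried in order and
   only the first one that matches is applied. The word is modified
   in place and is assumed to be lowercase. */
static void step1a(char *word)
{
    size_t len = strlen(word);

    if (ends_with(word, len, "sses")) {
        word[len - 2] = '\0';          /* sses -> ss */
    } else if (ends_with(word, len, "ies")) {
        word[len - 2] = '\0';          /* ies -> i */
    } else if (ends_with(word, len, "ss")) {
        /* ss -> ss: nothing to do, but the rule blocks the next one */
    } else if (ends_with(word, len, "s")) {
        word[len - 1] = '\0';          /* s -> (removed) */
    }
}
```

Calling `step1a` on a writable buffer holding "caresses" leaves "caress" in it, and "ponies" becomes "poni". In a full stemmer the result would then be handed on to the later steps.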

The algorithm was originally written in 1979 at the Computer Laboratory, Cambridge (England), and was reprinted in 1997 in the book “Readings in Information Retrieval”. The first implementation was written in BCPL. Implementations now exist in many other programming languages, including the C, Java and Perl versions written by the algorithm's author, Martin Porter.

The Porter stemming algorithm makes extensive use of conditional and looping constructs, such as C if-else statements and while loops.
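As an illustration of that kind of control flow, the sketch below computes the "measure" of a word, a quantity the published algorithm uses in many rule conditions. A word is viewed as [C](VC)^m[V], where C is a run of consonants and V a run of vowels, and the measure is m. The function names `is_consonant` and `measure` are assumptions made for this example, the input is assumed to be lowercase, and the code is not taken from the original implementation.

```c
#include <string.h>

/* Returns 1 if w[i] is a consonant in Porter's sense: any letter other
   than a, e, i, o, u, where 'y' counts as a consonant only when it is
   not preceded by a consonant. */
static int is_consonant(const char *w, size_t i)
{
    switch (w[i]) {
    case 'a': case 'e': case 'i': case 'o': case 'u':
        return 0;
    case 'y':
        return i == 0 ? 1 : !is_consonant(w, i - 1);
    default:
        return 1;
    }
}

/* Counts the number of vowel-run / consonant-run pairs (VC) in w. */
static int measure(const char *w)
{
    size_t n = strlen(w);
    size_t i = 0;
    int m = 0;

    /* Skip the optional leading consonant run [C]. */
    while (i < n && is_consonant(w, i))
        i++;

    while (i < n) {
        /* Consume a vowel run. */
        while (i < n && !is_consonant(w, i))
            i++;
        if (i == n)
            break;                     /* trailing vowels [V] */

        /* A consonant run after vowels completes one VC pair. */
        m++;
        while (i < n && is_consonant(w, i))
            i++;
    }
    return m;
}
```

With this sketch, `measure("tree")` is 0, `measure("trouble")` is 1 and `measure("private")` is 2, which matches the examples given in Porter's description of the algorithm.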

This is the Porter stemming algorithm, coded in ANSI C by Martin Porter himself. You can compile it on Unix with ‘gcc -O3 -o stem stem.c’, after which ‘stem’ takes a list of inputs and sends the stemmed equivalents to stdout.

About The Author

M. Saqib

Saqib is a Master-level Senior Software Engineer with over 14 years of experience in designing and developing large-scale software and web applications. He has more than eight years of experience leading software development teams. Saqib provides consultancy on developing software systems and web services for Fortune 500 companies. He has hands-on experience in C/C++, Java, JavaScript, PHP and .NET technologies. Saqib has owned and written content for mycplus.com since 2004.