What is stemming?

Stemming usually refers to a crude heuristic process that chops off the ends of words in the hope of achieving this goal (reducing the inflected forms of a word to a common base form) correctly most of the time, and often includes the removal of derivational affixes. See the following figure (definition and figure from Introduction to Information Retrieval).

Solving this problem is fairly easy for Latin-script languages, especially English, and there is an adequate number of working programs and algorithms that do it quickly and efficiently. A ubiquitous one is the Snowball family of stemming algorithms, which supports a longish list of languages [1]; Arabic is not one of them. The principles of stemming are also well suited to Latin-script words; for example, see the following rules for removing endings from words:

When it comes to Arabic stemming, however, the job is harder and takes more work. Just removing endings (or even prefixes, as some languages require) will never work in Arabic, because Arabic word forms are not built simply by adding or removing affixes. Take التاريخ, which fully stems to أرخ; another example is الاتصال -> وصل. You can never find static rules like the ones the English algorithms use.

What do we need it for?

Stemmers are needed for natural language processing, information retrieval, and text mining applications such as search engines, sentiment analysis, text classification, and machine translation. There is no 100% accurate Arabic stemmer out there so far, so all these fields lack an open-source one (I reckon that corporations like Google do have proprietary stemming software).

In our case, a reliable stemmer is critical for making Mu’jam convenient to use. I guess the lack of one is probably why existing Arabic dictionaries are not very precise. Let’s take for example Almaany and Al Mou’aser with the word المتلاعبان, whose gerund is تلاعب. The stemming levels should go as follows:

A: User’s input.
The user will be interested in C and sometimes F. Let’s try it with Almaany. Al Mou’aser, for its part, does not accept it at all. Both of these dictionaries are incapable of relating a word in a different form to its original definition.

Options?
[1] Armenian, Basque, Catalan, Danish, Dutch, English, Finnish, French, German,
[2] https://web.archive.org/web/20210925233743/https://www.worldcat.org/title/mujam-al-awzan-al-sarfiyah/oclc/301487025&referer=brief_results
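
To make the contrast above concrete, here is a minimal Python sketch of rule-based ending removal. The four suffix rules are the ones from step 1a of the Porter stemmer, used here only as an illustration (they are not necessarily the rules the post originally showed). The function names and the article-stripping step are hypothetical, not taken from any existing stemmer.

```python
# Illustrative suffix rules, checked in order (longest suffix first).
# They mirror step 1a of the Porter stemmer; this is an assumption made
# for the example, not the rule set of any particular library.
SUFFIX_RULES = [
    ("sses", "ss"),
    ("ies", "i"),
    ("ss", "ss"),
    ("s", ""),
]


def strip_suffix(word: str) -> str:
    """Apply the first matching suffix rule, or return the word unchanged."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


def strip_definite_article(word: str) -> str:
    """Hypothetical affix-only step for Arabic: drop a leading definite article."""
    article = "ال"
    # Keep very short words intact so that we do not strip part of the word itself.
    if word.startswith(article) and len(word) > len(article) + 2:
        return word[len(article):]
    return word


print(strip_suffix("caresses"))  # caress
print(strip_suffix("ponies"))    # poni
print(strip_suffix("cats"))      # cat

# Affix removal alone does not reach the Arabic roots mentioned in the post:
# the article is dropped, but the result is still not أرخ or وصل.
print(strip_definite_article("التاريخ"))
print(strip_definite_article("الاتصال"))
```

A static rule table is enough for the English examples, while the Arabic words keep their internal pattern after the article is removed. Getting from there to the root needs some knowledge of the word's morphological pattern, which an affix list alone does not provide.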



We should address the following points: