This repository was archived by the owner on Jun 20, 2023. It is now read-only.
Balaa (Arabic: بلاء) is a tester for Arabic-language stemmers. It is intended
to test jzr and, in the future, other stemmers.
First, we need to make sure we test every possible word form in Arabic. We
cannot afford to include every single word in our tests, but we can afford a
set of وزنs (morphological patterns), each paired with a small list of words
that stands in for all similar words. For example, the words
مضروب، ملعوب، مهيوب، مكسور all have the وزن مفعول. If we can stem one of them
to its root, then the same should hold for every other word with that وزن.
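To illustrate why one وزن can stand in for a whole family of words, here is a minimal sketch of root extraction by pattern alignment. It is not how jzr works; the function name `extract_root` and the convention that ف, ع, ل mark the root slots while every other pattern letter must match literally are assumptions made for this example. It also ignores diacritics and doubled letters.

```python
from typing import Optional

# Assumption: ف, ع, ل are placeholders for the three root letters;
# any other letter in the pattern is a literal that the word must contain.
ROOT_SLOTS = "فعل"


def extract_root(word: str, wazn: str) -> Optional[str]:
    """Align `word` with `wazn` letter by letter and collect the root letters.

    Returns None when the word does not fit the pattern.
    """
    if len(word) != len(wazn):
        return None
    root = []
    for word_letter, pattern_letter in zip(word, wazn):
        if pattern_letter in ROOT_SLOTS:
            root.append(word_letter)      # slot: take the letter from the word
        elif word_letter != pattern_letter:
            return None                   # literal letter does not match
    return "".join(root)


if __name__ == "__main__":
    for word in ["مضروب", "ملعوب", "مهيوب", "مكسور"]:
        print(word, extract_root(word, "مفعول"))
```

Every word that fits مفعول goes through exactly the same alignment, which is why a handful of words per وزن is enough to exercise that code path in a stemmer.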
We also need to collect every possible وَزن. There are 600-700 of them already
collected in the مُعجم book, which is attached in the repo files. We need to
scrape all of them. This will be a somewhat arduous job, but it is the only way
to guarantee high accuracy for our projects. We should start developing the
test methodology in parallel with collecting the test cases and actually using
the tester.
Here is my proposal for how the project should go:
1. Simple data set.
   We may use SQLite or a CSV file. The schema should be very simple: a وزن
   that corresponds to a list of words. We should start collecting the words
   and وزنs collaboratively.
2. Once we have enough data, we can start testing it on existing stemmers.
   Here we will need to design a simple template for stemmers to follow so
   that we can analyze their results using a unified method.
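The proposal above could be sketched roughly as follows. This is only one possible shape, not a decided design: the table names `wazn` and `word`, the `root` column (the expected answer for each word), the `Stemmer` callable type, and `toy_stemmer` are all assumptions introduced for this example.

```python
import sqlite3
from typing import Callable, Dict

# Hypothetical stemmer template: a stemmer is any function that takes a word
# and returns the root it found.
Stemmer = Callable[[str], str]


def build_dataset() -> sqlite3.Connection:
    """Create an in-memory data set: one وزن mapped to a list of words."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE wazn (
            id      INTEGER PRIMARY KEY,
            pattern TEXT UNIQUE NOT NULL
        );
        CREATE TABLE word (
            id      INTEGER PRIMARY KEY,
            wazn_id INTEGER NOT NULL REFERENCES wazn(id),
            surface TEXT NOT NULL,   -- the word as it appears in text
            root    TEXT NOT NULL    -- assumed column: the expected root
        );
        """
    )
    conn.execute("INSERT INTO wazn (id, pattern) VALUES (1, 'مفعول')")
    conn.executemany(
        "INSERT INTO word (wazn_id, surface, root) VALUES (1, ?, ?)",
        [("مضروب", "ضرب"), ("ملعوب", "لعب"), ("مكسور", "كسر")],
    )
    return conn


def accuracy_by_wazn(conn: sqlite3.Connection, stem: Stemmer) -> Dict[str, float]:
    """Run `stem` on every word and report the fraction of correct roots per وزن."""
    totals: Dict[str, tuple] = {}
    rows = conn.execute(
        "SELECT wazn.pattern, word.surface, word.root "
        "FROM word JOIN wazn ON wazn.id = word.wazn_id"
    )
    for pattern, surface, root in rows:
        hits, count = totals.get(pattern, (0, 0))
        totals[pattern] = (hits + (stem(surface) == root), count + 1)
    return {pattern: hits / count for pattern, (hits, count) in totals.items()}


def toy_stemmer(word: str) -> str:
    """Stand-in stemmer that only understands the مفعول shape."""
    if len(word) == 5 and word[0] == "م" and word[3] == "و":
        return word[1] + word[2] + word[4]
    return word


if __name__ == "__main__":
    print(accuracy_by_wazn(build_dataset(), toy_stemmer))
```

Reporting accuracy per وزن rather than as a single number would show which patterns a given stemmer fails on, which fits the idea of analyzing all stemmers with one unified method.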