NLP-Domain-Term-Extraction

This is an NLP domain term extraction project for the Hindi language, written in Python. It needs a POS-tagged corpus from 3 different domains; we used an online POS tagger for Hindi to produce it.

Please keep the files below (part of the zipped file) at some local path -

Term_Extraction_Project.py - the module version of our project code.
corpusPath.txt - contains the paths to the corpora of all 3 domains. Content of this file:
\Domains\Banking\Corpus
\Domains\Jyotish\Corpus
\Domains\Vamaniki\Corpus
Replace these with the appropriate paths in corpusPath.txt before running the program.
hindi_stop_words.txt - contains the Hindi stop words used by our program.
Domains.txt - contains the domain names, in the same order as the corpus paths stored in corpusPath.txt.
Keep the "Domains" folder (you get it after unzipping the main file) so that the folder structure matches the paths in corpusPath.txt:

\Domains\Banking\Corpus
\Domains\Jyotish\Corpus
\Domains\Vamaniki\Corpus

Each path above is to be prefixed with the local path where you keep all the files listed above.
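As a convenience, the corpus paths can be generated from a base folder instead of being typed by hand. The sketch below is not part of the project: the function names and the example base folder are hypothetical, and it assumes Windows-style paths and one path per line in corpusPath.txt.

```python
from pathlib import PureWindowsPath

# Domain folder names as they appear in this README.
DOMAINS = ["Banking", "Jyotish", "Vamaniki"]


def build_corpus_paths(base_dir, domains=DOMAINS):
    """Return one corpus folder path per domain under base_dir."""
    base = PureWindowsPath(base_dir)
    return [str(base / "Domains" / name / "Corpus") for name in domains]


def write_corpus_path_file(base_dir, out_file="corpusPath.txt"):
    """Write the generated paths to out_file (assumed: one path per line)."""
    with open(out_file, "w", encoding="utf-8") as handle:
        handle.write("\n".join(build_corpus_paths(base_dir)) + "\n")


# Hypothetical base folder, shown only to illustrate the output.
for path in build_corpus_paths(r"C:\projects\term_extraction"):
    print(path)
```

Calling write_corpus_path_file with your own base folder would overwrite corpusPath.txt in the current directory, so check the printed paths first.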

Command to run the module - In a command prompt, go to the src folder where you have kept the above files and run the command below -

python Term_Extraction_Project.py ./corpusPath.txt ./hindi_stop_words.txt ./Domains.txt
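The command passes three positional arguments in a fixed order: the corpus path file, the stop word file, and the domain name file. How Term_Extraction_Project.py reads them is not shown in this README; the sketch below is one possible way to do it with argparse, and the function and argument names are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the three positional file arguments in the order used above."""
    parser = argparse.ArgumentParser(
        description="Domain term extraction for Hindi (illustrative sketch)."
    )
    parser.add_argument("corpus_path_file", help="file listing the corpus paths")
    parser.add_argument("stop_words_file", help="file listing Hindi stop words")
    parser.add_argument("domains_file", help="file listing the domain names")
    return parser.parse_args(argv)


# An explicit argument list stands in for the real command line here.
args = parse_args(["./corpusPath.txt", "./hindi_stop_words.txt", "./Domains.txt"])
print(args.corpus_path_file, args.stop_words_file, args.domains_file)
```

Swapping two of the file names on the command line would not be caught by a parser like this, so the order in the command matters.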

Output Format - The output file is generated at the same path. Output file name - Codeoutput.txt

Implementation Limitation -

It needs a POS-tagged corpus for each of the 3 domains.
The code currently runs for exactly 3 domains; it can be generalized later to run for as many domains as are passed to the program.
The domain names in "Domains.txt" and the corpus paths in "corpusPath.txt" must be listed in the same order.
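Because the two files are matched purely by position, a mismatch in count or order is easy to miss. The following sketch shows one way such a check could look before running the program. It is not taken from the project code: the helper names are hypothetical, and it assumes both files hold one entry per line.

```python
from pathlib import Path


def read_lines(file_path):
    """Return the non-empty, stripped lines of a UTF-8 text file."""
    text = Path(file_path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def pair_domains(domain_names, corpus_paths, expected_count=3):
    """Pair each domain name with the corpus path at the same position."""
    if len(domain_names) != len(corpus_paths):
        raise ValueError(
            f"{len(domain_names)} domain names but {len(corpus_paths)} corpus paths"
        )
    if len(domain_names) != expected_count:
        raise ValueError(f"expected {expected_count} domains, got {len(domain_names)}")
    return list(zip(domain_names, corpus_paths))


# In-memory example; with real files the lists would come from
# read_lines("Domains.txt") and read_lines("corpusPath.txt").
pairs = pair_domains(
    ["Banking", "Jyotish", "Vamaniki"],
    [r"\Domains\Banking\Corpus", r"\Domains\Jyotish\Corpus", r"\Domains\Vamaniki\Corpus"],
)
for name, path in pairs:
    print(name, path)
```

A check like this catches a missing or extra line, but not two lines that were swapped; the order still has to be verified by eye.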

Data Collection:

We collected the data from Hindi Wikipedia and tagged it with an online POS tagger. Selenium was the tool used for this.
