Sustainable_documents_scoring

This repository contains an application for training an NLP model and using it to evaluate a document against predefined benchmarks.

Recomendation_system__training_v26.py

This module takes an input directory of zip files as the source of documents (PDF files) and trains a GloVe model to create a Room for a specific industry or sector. Training is initialized from a pretrained 300-dimensional GloVe model (glove_6B_300d.txt), which was collected from:
https://nlp.stanford.edu/projects/glove
The module also uses a benchmark list so that the benchmarks are kept in the dictionary:
file_bench = 'rooted_bench.csv'
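
The pretrained GloVe files are plain text, with one word per line followed by its vector components separated by spaces. The sketch below shows one way such a file could be read into a dictionary. The function name `load_glove_text` and the tiny in-memory sample are illustrative assumptions, not code taken from the training module.

```python
import io

def load_glove_text(handle, dim=300):
    """Parse GloVe text lines ("word v1 v2 ... v_dim") into a dict of float lists."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

# Tiny in-memory stand-in for glove_6B_300d.txt (3 dimensions for brevity).
sample = io.StringIO("energy 0.1 0.2 0.3\nwater 0.4 0.5 0.6\n")
glove = load_glove_text(sample, dim=3)
```

For the real file, the same function could be called on `open('glove_6B_300d.txt', encoding='utf-8')` with the default `dim=300`.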
The module has several configuration settings, all of which are set at the beginning of the module. The trained model is saved periodically (after every 20 files processed) to:
trained_model='glove_dictionary_test.pkl'
Larger files are split because of memory limitations. If the process stops for any reason, rerun the module; the next run automatically processes only the input files that were not already processed during earlier training. After all input files have been read, a message reports that the process is complete. The final trained model is:
glove_dictionary_test.pkl
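
The save-and-resume behaviour described above can be sketched as follows. This is a minimal illustration, not the module's actual code: it assumes the set of processed file names is stored in the same pickle as the model, and `train`, `load_state`, `save_state`, and the `record` callback are hypothetical names.

```python
import pickle
import tempfile
from pathlib import Path

CHECKPOINT_EVERY = 20  # the README states the model is saved after every 20 files

def load_state(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    return {"model": {}, "processed": set()}

def save_state(path, state):
    """Write to a temporary file first so an interruption cannot corrupt the checkpoint."""
    tmp = path.with_suffix(".tmp")
    with tmp.open("wb") as fh:
        pickle.dump(state, fh)
    tmp.replace(path)

def train(files, checkpoint, update_model):
    """Process only files not seen in earlier runs, checkpointing periodically."""
    state = load_state(checkpoint)
    done_this_run = 0
    for name in files:
        if name in state["processed"]:
            continue  # already handled during an earlier run
        update_model(state["model"], name)
        state["processed"].add(name)
        done_this_run += 1
        if done_this_run % CHECKPOINT_EVERY == 0:
            save_state(checkpoint, state)
    save_state(checkpoint, state)
    print("Training complete:", len(state["processed"]), "files processed")
    return state

# Demo: a first run covers 30 files, a second run is given all 45.
with tempfile.TemporaryDirectory() as workdir:
    checkpoint = Path(workdir) / "glove_dictionary_test.pkl"
    files = [f"batch_{i:02d}.zip" for i in range(45)]
    handled = []

    def record(model, name):
        # Placeholder for the real training step (assumption).
        handled.append(name)
        model[name] = True

    train(files[:30], checkpoint, record)
    final_state = train(files, checkpoint, record)
```

In the demo, the second call skips the 30 files recorded in the checkpoint and handles only the remaining 15, so no file is processed twice.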

Recomendation_system_NLP_v26.py

This module uses a trained model as the Room, a benchmark list, and a directory of documents. Each input document is evaluated against the Room and the benchmark list, and the module creates an output file containing the level of similarity of the input document's words to the benchmarks.
The benchmarks were categorized and weighted with the help of SMEs. The categories are used for grouping the required similarity measures. The output is saved as a CSV file:
sim_to_bench_file='sim_to_bench.csv'
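
As a rough illustration of scoring document words against benchmarks, the sketch below uses cosine similarity between word vectors and writes the rows in CSV form. The README does not state which similarity measure the module uses, so cosine similarity, the best-match rule, the column names, and the toy two-dimensional vectors are all assumptions made for this example.

```python
import csv
import io
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def score_document(words, room, benchmarks):
    """For each benchmark, keep the best similarity to any document word known to the room."""
    rows = []
    for bench, category in benchmarks:
        if bench not in room:
            continue  # benchmark has no vector in the trained model
        best = max(
            (cosine(room[w], room[bench]) for w in words if w in room),
            default=0.0,
        )
        rows.append({"benchmark": bench, "category": category, "similarity": round(best, 4)})
    return rows

# Toy room with 2-dimensional vectors (hypothetical data).
room = {
    "solar": [1.0, 0.0],
    "panel": [0.8, 0.6],
    "renewable": [0.6, 0.8],
    "waste": [0.0, 1.0],
}
benchmarks = [("renewable", "energy"), ("waste", "circularity")]
rows = score_document(["solar", "panel", "unknown"], room, benchmarks)

# Write the rows in CSV form; a real run would open 'sim_to_bench.csv' instead.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["benchmark", "category", "similarity"])
writer.writeheader()
writer.writerows(rows)
```

Words missing from the room, such as `unknown` here, are simply ignored. Category weights could be applied to the `similarity` column in a later grouping step.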

The required functions are imported through: from Carlo_ngrams_tool.utilities_recommendation import
which imports from: utilities_recommendation.py
This file must be located where the main modules can access it. A library of text processing functions is also imported by utilities_recommendation.py:
chunking_bforce_plus_space_add.py
It must be accessible in the same way. This library contains all text processing functions.
The directory structure is as follows:

├── working_directory
│ ├── Recomendation_system__training_v26.py
│ ├── Recomendation_system_NLP_v26.py
│ ├── glove_dictionary_test.pkl
│ ├── rooted_benchmarks_11_07_2022.csv
│ ├── bench-to-crawled-duck_unique.csv
│ ├── Carlo_ngrams_tool
│ │ ├── utilities_recommendation.py
│ │ ├── chunking_bforce_plus_space_add.py
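
Before running either module, it may help to confirm that the code files sit where the import statement expects them. The helper below is a hypothetical sketch, not part of the repository. The utilities file name is taken from the import path `Carlo_ngrams_tool.utilities_recommendation`.

```python
import tempfile
from pathlib import Path

# Code files expected relative to the working directory.
REQUIRED = [
    "Recomendation_system__training_v26.py",
    "Recomendation_system_NLP_v26.py",
    "Carlo_ngrams_tool/utilities_recommendation.py",
    "Carlo_ngrams_tool/chunking_bforce_plus_space_add.py",
]

def missing_files(root):
    """Return the required relative paths that are not present under root."""
    root = Path(root)
    return [rel for rel in REQUIRED if not (root / rel).is_file()]

# Demo on a temporary directory that contains only two of the four files.
with tempfile.TemporaryDirectory() as workdir:
    root = Path(workdir)
    (root / "Carlo_ngrams_tool").mkdir()
    (root / "Recomendation_system_NLP_v26.py").touch()
    (root / "Carlo_ngrams_tool" / "utilities_recommendation.py").touch()
    missing = missing_files(root)
```

Calling `missing_files('.')` from the working directory returns an empty list when the layout matches the tree above.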

HojatBehrooz/Sustainable_documents_scoring

Folders and files

Latest commit

History

Repository files navigation

Sustainable_docuemnts_scoring

Recomendation_system__training_v26.py

Recomendation_system_NLP_v26.py

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages