This repo provides a pipelined tutorial that builds up a knowledge base from web-crawled text using Protege and Apache Jena.
Check this link for installing Apache Jena & Fuseki (in Korean).
- Install Python dependencies.
pip install cython
pip install -r requirements.txt
- Scrape the web. (Results in ./data/raw)
python scrap.py
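The internals of scrap.py are not shown here; as a minimal sketch of the kind of extraction step it might perform, the following standard-library-only example pulls paragraph text out of an HTML page (the sample HTML is illustrative, not from the repo):

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collects the text content of <p> elements from an HTML document."""
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.in_p = False
            self.paragraphs.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self.in_p:
            self._buf.append(data)

page = "<html><body><p>An apple is a fruit.</p><p>It grows on trees.</p></body></html>"
extractor = ParagraphExtractor()
extractor.feed(page)
print(extractor.paragraphs)
```

A real crawler would fetch `page` over HTTP and write each document under ./data/raw; this sketch covers only the parsing step.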
- Explore the scraped data. (Results in ./data/stat)
python stat.py
- Translate into Korean. (Optional. Results in ./data/translated)
python translate.py
- Annotate the dataset. Insert any class label of your interest as a text span wrapped in double square brackets, e.g. [[Fruit]]. Also correct the chunking of paragraphs, incorrect newline characters, etc.
mkdir ./data/anno
cp -r ./data/raw/* ./data/anno
* Annotate the files under ./data/anno
- Create a neat Excel file for annotation. (Optional)
python anno2xls.py
- Create OWL object classes & properties (T-Box) with Protege and place the file under ./data/ontology. You can use WebProtege to collaborate with your team.
- ./data/ontology/root-ontology.owl
- Populate individuals. (Results in ./data/ontology/basic.txt. Copy and paste the generated text into root-ontology.owl.)
python populate.py
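populate.py emits A-Box text that gets pasted into root-ontology.owl. A hypothetical sketch of that kind of generation step, turning (individual, class) pairs into RDF/XML snippets; the namespace and the example pairs are illustrative, not taken from the repo:

```python
# Hypothetical namespace; the real ontology IRI comes from root-ontology.owl.
NS = "http://example.org/ontology#"

def individual_axiom(name, cls):
    """Emit an OWL NamedIndividual declaration in RDF/XML syntax."""
    return (
        f'<owl:NamedIndividual rdf:about="{NS}{name}">\n'
        f'    <rdf:type rdf:resource="{NS}{cls}"/>\n'
        f'</owl:NamedIndividual>'
    )

# Illustrative (individual, class) pairs, e.g. derived from [[...]] annotations.
pairs = [("apple", "Fruit"), ("oak", "Tree")]
snippet = "\n".join(individual_axiom(n, c) for n, c in pairs)
print(snippet)
```

The generated text can then be pasted inside the ontology's root RDF element, which is what the step above describes for basic.txt.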
- Save root-ontology.owl in Turtle syntax.
- ./data/ontology/root-ontology.ttl
- Run Apache Jena Fuseki and write the SPARQL queries you need. (Results in sparql/competency_questions)
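Once Fuseki is serving the dataset (by default at http://localhost:3030), competency questions are written as SPARQL queries against its endpoint. A minimal example; the `ex:` namespace and the `Fruit` class are placeholders for whatever your ontology defines:

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex:  <http://example.org/ontology#>

# List every individual typed as ex:Fruit (class name is illustrative).
SELECT ?individual
WHERE {
  ?individual rdf:type ex:Fruit .
}
```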