
semantic-web

This repo serves as a pipelined tutorial that builds a knowledge base from web-crawled text using Protege and Apache Jena.

Prerequisites

  1. Protege
  2. Apache Jena
  3. Apache Jena Fuseki

Check this link for installing Apache Jena & Fuseki (in Korean).

Installation & Usage

  1. Install Python dependencies.
pip install cython
pip install -r requirements.txt
  2. Scrape the web. (Results in ./data/raw)
python scrap.py
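As a rough idea of what this step involves, a minimal sketch using requests and BeautifulSoup is shown below; the seed URL and file layout are placeholders, and the actual scrap.py may work differently.

# Hypothetical sketch of the scraping step; the actual scrap.py may differ.
import os
import requests
from bs4 import BeautifulSoup

SEED_URLS = ["https://example.com/article-1"]  # placeholder URLs, not from the repo

os.makedirs("./data/raw", exist_ok=True)
for i, url in enumerate(SEED_URLS):
    html = requests.get(url, timeout=10).text
    text = BeautifulSoup(html, "html.parser").get_text(separator="\n")
    with open(f"./data/raw/{i:04d}.txt", "w", encoding="utf-8") as f:
        f.write(text)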
  3. Explore the scraped data. (Results in ./data/stat)
python stat.py
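The exploration step could, for example, report simple corpus statistics such as document and token counts; the sketch below is only a guess at what stat.py computes.

# Hypothetical sketch of the exploration step; stat.py may compute different statistics.
import glob

lengths = []
for path in glob.glob("./data/raw/*.txt"):
    with open(path, encoding="utf-8") as f:
        lengths.append(len(f.read().split()))
print(f"documents: {len(lengths)}, avg tokens: {sum(lengths) / max(len(lengths), 1):.1f}")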
  4. Translate into Korean. (Optional. Results in ./data/translated)
python translate.py
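One possible way to implement this step is the googletrans package, sketched below; the actual translate.py may rely on a different translation service.

# Hypothetical sketch of the translation step; translate.py may use another service.
import glob
import os
from googletrans import Translator

os.makedirs("./data/translated", exist_ok=True)
translator = Translator()
for path in glob.glob("./data/raw/*.txt"):
    with open(path, encoding="utf-8") as f:
        text = f.read()
    korean = translator.translate(text, dest="ko").text
    with open(path.replace("/raw/", "/translated/"), "w", encoding="utf-8") as f:
        f.write(korean)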
  5. Annotate the dataset. Insert any class label of your interest as a text span wrapped in double square brackets, e.g. [[Fruit]]. Also correct the chunking of paragraphs, incorrect newline characters, etc. (a short annotated example follows this step).
mkdir ./data/anno
cp -r ./data/raw/* ./data/anno
    • Then annotate the copied files under ./data/anno.
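For instance, a hypothetical annotated sentence (the class label Fruit is purely illustrative) could look like this:

An apple is a sweet, edible [[Fruit]] produced by the apple tree.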
  6. Create a neat Excel file for annotation. (Optional)
python anno2xls.py
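A minimal sketch of such an export with pandas (which needs openpyxl for .xlsx output) is shown below; the actual anno2xls.py and its output path may differ.

# Hypothetical sketch of the Excel export; anno2xls.py may structure the sheet differently.
import glob
import pandas as pd

rows = []
for path in glob.glob("./data/anno/*.txt"):
    with open(path, encoding="utf-8") as f:
        for paragraph in f.read().split("\n\n"):
            rows.append({"file": path, "paragraph": paragraph})
pd.DataFrame(rows).to_excel("./data/anno.xlsx", index=False)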
  7. Create the OWL classes and properties (T-Box) with Protege and place the file under ./data/ontology. You can use WebProtege to collaborate with your team.

    • ./data/ontology/root-ontology.owl
  8. Populate individuals (A-Box). (Results in ./data/ontology/basic.txt. Copy and paste the generated text into root-ontology.owl.)

python populate.py
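As a rough sketch of what population can look like, the snippet below turns every [[Label]] annotation into an individual typed by that class using rdflib; the namespace is a placeholder and the real populate.py may emit a different syntax into basic.txt.

# Hypothetical sketch of A-Box population; populate.py may emit a different syntax.
import glob
import re
from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.org/ontology#")  # placeholder namespace, not from the repo
g = Graph()
for path in glob.glob("./data/anno/*.txt"):
    with open(path, encoding="utf-8") as f:
        text = f.read()
    # every [[Label]] annotation becomes an individual typed by that class
    for i, label in enumerate(re.findall(r"\[\[(\w+)\]\]", text)):
        g.add((EX[f"{label}_{i}"], RDF.type, EX[label]))
g.serialize(destination="./data/ontology/basic.txt", format="turtle")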
  9. Save root-ontology.owl in Turtle syntax.

    • ./data/ontology/root-ontology.ttl
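Protege can export Turtle directly via File > Save as..., or the conversion can be scripted; the rdflib sketch below assumes root-ontology.owl is stored in RDF/XML, which is Protege's default serialization.

# Hypothetical conversion script; assumes the .owl file is RDF/XML (Protege's default).
from rdflib import Graph

g = Graph()
g.parse("./data/ontology/root-ontology.owl", format="xml")
g.serialize(destination="./data/ontology/root-ontology.ttl", format="turtle")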
  10. Run Apache Jena Fuseki and write the SPARQL queries you need. (Results in sparql/competency_questions)
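Once the Turtle file is loaded into a Fuseki dataset, a competency question can also be issued from Python with SPARQLWrapper, as in the hypothetical example below; the dataset name semantic-web and the query are placeholders, and SPARQLWrapper must be installed separately if it is not in requirements.txt.

# Hypothetical query against a local Fuseki endpoint; dataset name and query are placeholders.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://localhost:3030/semantic-web/sparql")
sparql.setQuery("""
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    SELECT ?s ?type WHERE { ?s rdf:type ?type } LIMIT 10
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["s"]["value"], row["type"]["value"])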
