Named entity tagging and disambiguation using a Knowledge Graph

xaviermathew/Pythia

Pythia

Named entity tagging and disambiguation using a Knowledge Graph

How does this work

  1. Initializes a Knowledge Graph into a NetworkX Graph object
  2. Given an input query, uses spaCy to identify named entities and noun chunks. Also identifies "question" tokens like "Who" and "Where"
  3. For each identified entity, queries the Knowledge Graph to find candidate nodes that have a similar name (using Jaccard similarity)
  4. Generates all combinations of candidates and finds the best combination using a weighted sum of graph score (80%) and Jaccard similarity score (20%)
  5. The graph score represents how closely a given set of nodes/concepts is related. It is calculated as the sum of the shortest-path lengths between every pair of nodes in a given combination
  6. Out-of-vocabulary entities (and "question" tokens) are resolved to the centroid of all the other resolved entities (filtering candidates based on the entity label identified by spaCy)
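
Steps 3 to 5 can be sketched roughly as follows. This is a minimal, standalone illustration in Python 3 and not Pythia's actual code. The toy graph, the helper names (`jaccard`, `candidates`, `path_length`, `best_combination`) and the token-level Jaccard are all assumptions. Turning the summed path length into a score with `1 / (1 + total)` is also an assumption, since the README does not say how the sum is normalized. A plain breadth-first search stands in for the NetworkX shortest-path lookup so the sketch has no dependencies.

```python
from collections import deque
from itertools import combinations, product

# Hypothetical toy graph (not Pythia's example_data), stored as an adjacency dict.
EDGES = [
    ("xavier mathew", "india"),
    ("saint francis xavier", "spain"),
    ("india", "asia"),
    ("spain", "europe"),
    ("asia", "europe"),
]
GRAPH = {}
for a, b in EDGES:
    GRAPH.setdefault(a, set()).add(b)
    GRAPH.setdefault(b, set()).add(a)


def jaccard(a, b):
    """Token-set Jaccard similarity between two names (assumed token-level)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def candidates(mention):
    """Step 3: nodes whose name shares at least one token with the mention."""
    scored = ((node, jaccard(mention, node)) for node in GRAPH)
    return [(node, s) for node, s in scored if s > 0]


def path_length(src, dst):
    """Unweighted shortest-path length via breadth-first search."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in GRAPH[node] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return float("inf")  # no path between the two nodes


def best_combination(mentions, graph_weight=0.8, name_weight=0.2):
    """Steps 4-5: score every combination of candidates and keep the best."""
    if not mentions:
        return None, 0.0
    best, best_score = None, float("-inf")
    for combo in product(*(candidates(m) for m in mentions)):
        nodes = [node for node, _ in combo]
        # Graph score input: summed shortest-path length over all node pairs.
        total = sum(path_length(a, b) for a, b in combinations(nodes, 2))
        closeness = 1.0 / (1.0 + total)  # assumption: shorter paths score higher
        name_score = sum(s for _, s in combo) / len(combo)
        score = graph_weight * closeness + name_weight * name_score
        if score > best_score:
            best, best_score = dict(zip(mentions, nodes)), score
    return best, best_score
```

With this toy graph, `best_combination(["xavier", "spain"])` resolves "xavier" to "saint francis xavier": "xavier mathew" has the higher name similarity (0.5 against 1/3), but it is four hops from "spain" rather than one, and the graph term carries 80% of the weight.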

Known issues

  • The text similarity scoring (Jaccard) is naive right now: it does no stemming/lemmatization and does not handle typos or spelling variations
  • On massive graphs, steps (3) and (5) above will take a long time
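
The first issue is easy to reproduce with a token-level Jaccard. The function below is an assumed stand-in, not Pythia's implementation, though it is consistent with the scores in the example output further down (for instance 0.5 for "the priest" against "priest").

```python
def jaccard(a, b):
    """Token-set Jaccard similarity (hypothetical stand-in for Pythia's scorer)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


# A spelling variation shares no whole token, so the score drops to zero.
spelling = jaccard("tamil nadu", "tamilnadu")
# A plural form does not match its singular without stemming/lemmatization.
plural = jaccard("priests", "priest")
# An extra token such as an article halves the score.
article = jaccard("the priest", "priest")

print(spelling, plural, article)  # 0.0 0.0 0.5
```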

Instructions

  1. Clone this repo
  2. Load example data into the Knowledge Graph (or load your own data using pythia.graph_utils.init_graph)
>>> from pythia.data import example_data
>>> from pythia.graph_utils import init_graph
>>> from pythia.tagger import tag
>>> G = init_graph(example_data)
  3. Start tagging queries
>>> tag(G, "Xavier's father is from tamilnadu")
{u'Xavier': (<Person:xavier mathew>, 0.5),
 u'tamilnadu': (<Place:tamilnadu>, 1.0)}
>>> tag(G, 'Xavier from spain')
{u'Xavier': (<Person:saint francis xavier>, 0.3333333333333333),
 u'spain': (<Place:spain>, 1.0)}
>>> tag(G, 'Who is the priest that worked in india')
{u'Who': (<Person:saint francis xavier>, 8),
 u'india': (<Place:india>, 1.0),
 u'the priest': (<Occupation:priest>, 0.5)}
