This project builds an extension of Open English Wordnet (OEWN) by adding names from Wikidata. The resulting resource, Open English Namenet, contains millions of extra synsets. This repository includes the manual entries from OEWN as well as code for building the extended resource.
This project is developed using Python and Poetry for dependency management. To set up the environment, follow these steps:
- Install Poetry: If you haven't already, install Poetry by following the instructions at https://python-poetry.org/docs/#installation.
- Clone the Repository: Clone this repository to your local machine.
- Install Dependencies: Navigate to the project directory and run the following command to install the required dependencies:
poetry install
- Activate the Virtual Environment: To activate the virtual environment created by Poetry, run:
poetry shell
The Wikidata database is required to generate the Open English Namenet. Building it requires Cargo (Rust's package manager), sqlite3, and wget to be installed on your system.
To install Cargo, follow the instructions at https://doc.rust-lang.org/cargo/getting-started/installation.html.
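Before starting the long build, it may help to confirm that the three tools listed above are available. The following is a minimal sketch, not part of this repository; the helper name `missing_tools` is hypothetical, and it only checks that each executable is on the PATH, not its version.

```python
import shutil

# Tools named in the text as prerequisites for building the Wikidata database.
REQUIRED_TOOLS = ("cargo", "sqlite3", "wget")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH.

    Hypothetical helper for illustration; not part of the project.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing required tools: " + ", ".join(missing))
    else:
        print("All required tools found.")
```

If any tool is reported missing, install it before running the build script.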
Once you have Cargo installed, run the following command to build the Wikidata database:
cd wikidata_db
bash build_db.sh
This downloads the most recent Wikidata dump and takes 6-8 hours and a lot of disk space to process.
You can delete wikidata_db/latest-all.json.bz2 after the database has been built to save space or to restart with a newer dump.
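Since rebuilding takes hours, it can be worth guarding the cleanup so the dump is removed only once a database file actually exists. The sketch below is one way to do that; the function `remove_dump` is hypothetical, and the location of the built database is an assumption (the text only gives the name `wikidata.db`), so both paths are passed in as parameters.

```python
from pathlib import Path


def remove_dump(dump_path, db_path):
    """Delete the downloaded dump only if the built database exists and is non-empty.

    Hypothetical helper for illustration. Returns True if the dump was
    deleted, False otherwise.
    """
    dump = Path(dump_path)
    db = Path(db_path)
    if not db.is_file() or db.stat().st_size == 0:
        return False  # database not built yet, so keep the dump
    if not dump.is_file():
        return False  # nothing to delete
    dump.unlink()
    return True
```

For example, `remove_dump("wikidata_db/latest-all.json.bz2", "wikidata.db")` would apply it to the dump named above, assuming the database was written to `wikidata.db` in the current directory. A non-empty file is only a weak signal that the build finished, so check the build output as well.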
OEWN can be cloned from its GitHub repository. Run the following command to clone OEWN:
git clone https://github.com/globalwordnet/english-wordnet.git
To generate the Open English Namenet, run the following command from the project root:
python open_english_namenet/generate.py --oewn /path/to/english-wordnet --wd wikidata.db
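If you prefer to launch the generation step from a script, the command above can be wrapped so that missing inputs are caught before anything runs. This is a minimal sketch under stated assumptions: `build_generate_command` is a hypothetical helper, and only the script path and the `--oewn` and `--wd` flags are taken from the command shown above.

```python
import sys
from pathlib import Path


def build_generate_command(oewn_dir, wikidata_db):
    """Return the generate.py invocation as an argument list.

    Hypothetical helper for illustration. Raises FileNotFoundError if the
    OEWN checkout or the Wikidata database file is missing.
    """
    oewn = Path(oewn_dir)
    db = Path(wikidata_db)
    if not oewn.is_dir():
        raise FileNotFoundError(f"OEWN checkout not found: {oewn}")
    if not db.is_file():
        raise FileNotFoundError(f"Wikidata database not found: {db}")
    return [
        sys.executable,  # the interpreter of the active environment
        "open_english_namenet/generate.py",
        "--oewn", str(oewn),
        "--wd", str(db),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the project root, inside the Poetry environment.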