Active DOP treebank annotation tool

A treebank annotation tool based on a statistical parser that is re-trained during annotation. Paper: http://www.aclweb.org/anthology/C18-2009

screenshot of annotation tool

Installation instructions (MacOS and Linux)

Requires Python 3.10 or 3.11.

  1. (Recommended) Create and activate a venv virtual Python environment:
python3.11 -m venv .venv
. .venv/bin/activate
  2. Install submodule requirements:
pip install setuptools
pip install cython==3.0.12
  3. Install submodules:
git submodule update --init --recursive
cd roaringbitmap
python setup.py install
cd ..
cd disco-dop
pip3 install -r requirements.txt
env CC=gcc sudo python setup.py install
cd ..
  4. Install activedop:
pip3 install -r requirements.txt
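
Before running the installation steps, it can help to confirm that the active interpreter is one of the supported versions. The following is a minimal sketch of such a check; the function name `is_supported_python` is illustrative and not part of activedop.

```python
import sys

# Minor versions accepted by the requirement stated above (Python 3.10 or 3.11).
SUPPORTED_MINORS = (10, 11)


def is_supported_python(version_info=sys.version_info):
    """Return True if the given version tuple is Python 3.10 or 3.11."""
    major, minor = version_info[0], version_info[1]
    return major == 3 and minor in SUPPORTED_MINORS
```

Calling `is_supported_python()` with no arguments inside the activated venv checks the interpreter that pip will install into.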

Installation instructions (Windows PowerShell)

NOTE: Requires a C++ compiler, e.g., from Visual Studio Build Tools.

Make sure to also install a Windows 10/11 SDK.

You will also need a standard GCC distribution to compile discodop. These instructions assume you've installed GCC from https://www.msys2.org/.

  1. (Recommended) Create and activate a venv virtual Python environment:
python3 -m venv .venv
.\.venv\Scripts\activate
  2. Install submodule requirements:
pip install setuptools
pip install cython
  3. Apply patches to dependencies:

Installing discodop on Windows requires a few patches.

First, in disco-dop/setup.py, add '-DMS_WIN64' to the list of extra_compile_args at line 111. Then, in the same file, redefine extra_link_args at line 128 so that the line reads:

extra_link_args = ['-DNDEBUG', '-static-libgcc', '-static-libstdc++', '-Wl,-Bstatic,--whole-archive', '-lwinpthread', '-Wl,--no-whole-archive']
  4. Install submodules:
git submodule update --init --recursive
cd .\roaringbitmap\
python setup.py install
cd ..
cd .\disco-dop\
pip3 install -r requirements.txt
python setup.py build --compiler=mingw32
python setup.py install
cd ..
  5. Install activedop:
pip3 install -r requirements.txt

Running the demo on a toy treebank and annotation task:

  • Extract the example grammar with "discodop runexp example.prm". The grammar is extracted from "treebankExample.mrg", and the annotation task consists of the sentences in "newsentsExample.txt".
  • Run "FLASK_APP=app.py flask initdb".
  • Run "FLASK_APP=app.py flask initpriorities".
  • Start the web server with "FLASK_APP=app.py flask run --with-threads", then open a browser at http://localhost:5000/ and log in with username "JoeAnnotator" and password "example".
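
The demo steps can also be scripted. The sketch below assumes it is run from the repository root with the virtual environment active; the names `DEMO_COMMANDS`, `demo_env`, and `run_demo` are illustrative and not part of activedop. The commands themselves are the ones listed above.

```python
import os
import subprocess

# Commands taken from the demo steps above, in order.
DEMO_COMMANDS = [
    ["discodop", "runexp", "example.prm"],
    ["flask", "initdb"],
    ["flask", "initpriorities"],
    ["flask", "run", "--with-threads"],
]


def demo_env(base_env=None):
    """Copy the environment and set FLASK_APP=app.py, as in the steps above."""
    env = dict(os.environ if base_env is None else base_env)
    env["FLASK_APP"] = "app.py"
    return env


def run_demo(dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    env = demo_env()
    for cmd in DEMO_COMMANDS:
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, env=env, check=True)
```

With `run_demo(dry_run=False)`, the last command keeps running for as long as the web server is up.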

Configuration

Edit "settings.cfg" to use a different grammar and set of sentences to annotate, and to configure usernames and passwords. Note that the treebank on which the grammar is based needs to be available at the paths specified in the grammar parameter file.

Annotators have the option to export the annotated trees as LaTeX/PDF files. To use a local LaTeX installation for PDF generation, set the LATEX_SERVICE configuration option to "local". To use a remote LaTeX service, set LATEX_SERVICE to "remote", and set the LATEX_SERVICE_URL and LATEX_CREDS environment variables accordingly.

The remote LaTeX service should accept POST requests with JSON payloads of the form

and return the generated PDF as binary content.

The LATEX_CREDS value should be a string of the form "username:password", to be used for HTTP Basic Authentication with the remote LaTeX service.
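
To illustrate how a "username:password" string maps onto HTTP Basic Authentication, here is a minimal sketch using only the Python standard library. The function names are illustrative and this is not necessarily how activedop issues the request. The payload is passed through unchanged; its fields are defined by the service and are not assumed here.

```python
import base64
import json
import os
import urllib.request


def basic_auth_header(creds):
    """Turn "username:password" into an HTTP Basic Authorization header value."""
    if ":" not in creds:
        raise ValueError('expected a string of the form "username:password"')
    token = base64.b64encode(creds.encode("utf-8")).decode("ascii")
    return "Basic " + token


def build_latex_request(payload, url=None, creds=None):
    """Build (but do not send) a JSON POST request for the remote service.

    url and creds default to the LATEX_SERVICE_URL and LATEX_CREDS
    environment variables described above.
    """
    url = url or os.environ["LATEX_SERVICE_URL"]
    creds = creds or os.environ["LATEX_CREDS"]
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": basic_auth_header(creds),
        },
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(request).read()` would then return the response body, which for this service is the PDF as bytes.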

The local LaTeX installation needs to have the pdflatex command available in the system PATH.
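
Whether pdflatex is visible on the PATH can be checked from Python before switching LATEX_SERVICE to "local". This is a small sketch; the helper name `find_latex` is illustrative and not part of activedop.

```python
import shutil


def find_latex(command="pdflatex"):
    """Return the full path of the command if it is on PATH, otherwise None."""
    return shutil.which(command)
```

A result of `None` means the web server process would not be able to find the command either, provided it runs with the same PATH.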

Input sentences

Sentences need to be segmented, one sentence per line. For best results, tokenize the sentences to annotate according to treebank conventions.
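
As a rough illustration of the expected input, the sketch below reads one sentence per line and flags lines that look untokenized. Both helper names are illustrative and not part of activedop. The check is only a heuristic: it assumes punctuation should be a separate token, so it will also flag abbreviations such as "Mr.", and actual treebank conventions may differ.

```python
def read_sentences(lines):
    """One sentence per line: return a list of token lists, skipping blank lines."""
    return [line.split() for line in lines if line.strip()]


def looks_untokenized(tokens):
    """Heuristic: True if some token is a word with punctuation still attached."""
    return any(
        len(tok) > 1 and tok[-1] in ",.;:!?" and tok[:-1].isalnum()
        for tok in tokens
    )
```

For example, "The cat sat ." passes the check, while "Hello, world." is flagged because the comma and full stop are attached to the words.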

Reference

bibtex:

@InProceedings{vancranenburgh2018active,
    author={van Cranenburgh, Andreas},
    title={Active DOP: A constituency treebank annotation tool with online learning},
    year={2018},
    booktitle={Proceedings of COLING system demonstrations},
    pages={38--42},
    url={http://www.aclweb.org/anthology/C18-2009}
}
