This repository contains all code, data preparation scripts, and training configurations for our Legal Named Entity Recognition experiments on the InLegalNER dataset. It builds on and extends the original EkStep/Legal_NER repository: https://github.com/Legal-NLP-EkStep/legal_NER
.
├── dataset/ # spaCy files (train.spacy, dev.spacy, test.spacy)
├── backtranslate.py # Back-translation data augmentation script
├── training/
│ ├── config_a.cfg # A: Baseline (RoBERTa)
│ ├── config_b.cfg # B: LegalBERT
│ ├── config_c.cfg # C: Back-Translation
│ └── config_d.cfg # D: Combo (LegalBERT + Back-Translation)
├── output/ # Model outputs for experiments A–D
├── evaluate/ # Evaluation JSON metrics
└── README.md # This file
- Clone the repository:
  git clone https://github.com/YueranCao2001/ENLP_Project_Group3
- Create and activate a virtual environment:
  python3 -m venv venv
  source venv/bin/activate
- Install dependencies:
  pip install -r requirements.txt
Paraphrase 20% of the training data via English↔Chinese translation:
python backtranslate.py \
--input dataset/train.spacy \
--output dataset/train_bt.spacy \
--ratio 0.2 \
--max_length 400

- --ratio: fraction of docs to augment (0.2 = 20%)
- --max_length: maximum token length per translation batch
This produces dataset/train_bt.spacy alongside the original train.spacy.
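For reference, the round trip at the heart of back-translation looks roughly like the sketch below. This is a minimal illustration assuming Hugging Face MarianMT checkpoints (Helsinki-NLP/opus-mt-en-zh and Helsinki-NLP/opus-mt-zh-en); the actual backtranslate.py additionally re-aligns entity spans onto the paraphrased text and serializes the result to a .spacy DocBin, which this sketch omits.

# Minimal English->Chinese->English round-trip sketch (assumed checkpoints, not the exact script).
from transformers import MarianMTModel, MarianTokenizer

def load_pair(name):
    # Load a MarianMT tokenizer/model pair for one translation direction.
    return MarianTokenizer.from_pretrained(name), MarianMTModel.from_pretrained(name)

en_zh = load_pair("Helsinki-NLP/opus-mt-en-zh")
zh_en = load_pair("Helsinki-NLP/opus-mt-zh-en")

def translate(texts, tok, model, max_length=400):
    # Batch-translate a list of strings, truncating to max_length tokens.
    batch = tok(texts, return_tensors="pt", padding=True, truncation=True, max_length=max_length)
    out = model.generate(**batch, max_length=max_length)
    return [tok.decode(t, skip_special_tokens=True) for t in out]

def back_translate(texts, max_length=400):
    # English -> Chinese -> English produces a paraphrase of the input.
    zh = translate(texts, *en_zh, max_length=max_length)
    return translate(zh, *zh_en, max_length=max_length)

print(back_translate(["The appellant filed a petition before the High Court."]))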
Train four experiments using spaCy:
# A: Baseline (RoBERTa)
python -m spacy train training/config_a.cfg \
--output ./output/output_a \
--paths.train dataset/train.spacy \
--paths.dev dataset/dev.spacy
# B: LegalBERT (domain-adaptive pretraining)
python -m spacy train training/config_b.cfg \
--output ./output/output_b \
--paths.train dataset/train.spacy \
--paths.dev dataset/dev.spacy
# C: Back-Translation Augmentation
python -m spacy train training/config_c.cfg \
--output ./output/output_c \
--paths.train dataset/train_bt.spacy \
--paths.dev dataset/dev.spacy
# D: Combo (LegalBERT + Back-Translation)
python -m spacy train training/config_d.cfg \
--output ./output/output_d \
--paths.train dataset/train_bt.spacy \
--paths.dev dataset/dev.spacy

Each command trains a transformer→ner pipeline and saves the best model to output/output_*.
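The four configs differ mainly in which pretrained encoder the transformer component loads and which training set they consume. As an illustrative fragment (not a verbatim copy of the shipped configs, and the exact LegalBERT checkpoint name here is an assumption), switching encoders in a spaCy v3 config looks like:

# Illustrative spaCy v3 config fragment; see the shipped config_*.cfg for the real settings.
[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v3"
# config_a.cfg (baseline) would point at "roberta-base";
# config_b.cfg / config_d.cfg swap in a LegalBERT checkpoint such as:
name = "nlpaueb/legal-bert-base-uncased"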
Evaluate each best model on the held-out test set:
python -m spacy evaluate output/output_a/model-best \
dataset/test.spacy \
--output evaluate/metrics_test_a.json
python -m spacy evaluate output/output_b/model-best \
dataset/test.spacy \
--output evaluate/metrics_test_b.json
python -m spacy evaluate output/output_c/model-best \
dataset/test.spacy \
--output evaluate/metrics_test_c.json
python -m spacy evaluate output/output_d/model-best \
dataset/test.spacy \
--output evaluate/metrics_test_d.json

Each metrics_test_*.json includes overall ents_p/ents_r/ents_f and per-type P/R/F.
Use the provided JSON files under evaluate/ to plot precision, recall, and F1 comparisons.
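For example, a minimal comparison plot can be built directly from those files (assuming matplotlib is installed; the output file name comparison.png is arbitrary):

# Compare overall P/R/F1 across experiments A-D from the spaCy evaluate JSON output.
# Each metrics file contains top-level "ents_p", "ents_r", "ents_f" keys,
# plus an "ents_per_type" dict with per-label p/r/f.
import json
import matplotlib.pyplot as plt

experiments = ["a", "b", "c", "d"]
metrics = {}
for exp in experiments:
    with open(f"evaluate/metrics_test_{exp}.json") as f:
        metrics[exp] = json.load(f)

x = range(len(experiments))
width = 0.25
for i, key in enumerate(["ents_p", "ents_r", "ents_f"]):
    # One bar group per experiment, one bar per metric.
    plt.bar([xi + i * width for xi in x],
            [metrics[e][key] for e in experiments],
            width=width, label=key)
plt.xticks([xi + width for xi in x], [e.upper() for e in experiments])
plt.ylabel("score")
plt.legend()
plt.savefig("evaluate/comparison.png", dpi=150)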
We have published our trained model-best checkpoints as assets on GitHub Releases so you don’t need to retrain locally if you only want to run inference.
- Download the Release asset: go to Releases and download the model-best_*.zip file for the experiment you want (A, B, C, or D).
- Extract it and place the model-best directory under the matching output/output_*/ folder.
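After extraction, a quick smoke test looks like this (the path is for experiment A, and the example sentence is illustrative):

# Load a downloaded checkpoint and print the entities it predicts.
import spacy

nlp = spacy.load("output/output_a/model-best")
doc = nlp("The appeal was heard by the Supreme Court of India on 5 May 2010.")
for ent in doc.ents:
    print(ent.text, ent.label_)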