[CS 598DLH] Add BiLM + BiLSTM NER biomedical example to PyHealth #708
BiLM + BiLSTM NER Example (Biomedical NER)
This example demonstrates a simple end-to-end pipeline for biomedical named entity recognition (NER) using:
The goal is to show how a research-style reproduction (BiLM + NER) can be packaged as a reusable PyHealth example, improving the reproducibility of AI4H models.
This example is adapted from a course project that reproduces a published biomedical NER architecture and evaluates the effect of BiLM pretraining on NER performance.
Files
bilm_ner.py
Main script which:
test_bilm_ner.py
unittest test suite which checks that decode() returns sequences whose lengths match the unpadded token lengths.
These tests are lightweight and designed to run quickly on CPU.
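The length property that the test suite checks can be illustrated with a minimal sketch. This is not the decode() implementation from bilm_ner.py: the helper name trim_to_lengths, the "<PAD>" symbol, and the tag names are assumptions made up for illustration. It only shows the invariant that each decoded sequence is as long as its unpadded input.

```python
from typing import List, Sequence


def trim_to_lengths(
    padded_tags: Sequence[Sequence[str]], lengths: Sequence[int]
) -> List[List[str]]:
    """Drop padding positions so sequence i keeps exactly lengths[i] tags.

    Hypothetical helper for illustration; not part of PyHealth or the PR.
    """
    if len(padded_tags) != len(lengths):
        raise ValueError("need exactly one length per sequence")
    trimmed = []
    for tags, n in zip(padded_tags, lengths):
        if n < 0 or n > len(tags):
            raise ValueError("length must be between 0 and the padded width")
        trimmed.append(list(tags[:n]))
    return trimmed


# Assumed toy batch: two sentences padded to width 4 with "<PAD>".
batch = [
    ["B-Disease", "I-Disease", "O", "<PAD>"],
    ["O", "B-Chemical", "<PAD>", "<PAD>"],
]
lengths = [3, 2]

decoded = trim_to_lengths(batch, lengths)

# The invariant: decoded lengths equal the unpadded token lengths.
assert [len(seq) for seq in decoded] == lengths
```

A unittest case for the real model would presumably assert the same equality on the output of decode(), using the true token counts of each input sentence.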
Dataset Format
By default, the example can run entirely on a synthetic toy dataset (no external files required).
To use a real dataset, provide files in a simple CoNLL-style TSV format:
TOKEN<TAB>TAG
Example: