
Anshiboy/OneHealth

Diagnosis Navigator

Interactive symptom-to-diagnosis chat built with a FastAPI backend (TF-IDF + SVD encoder + Torch MLP classifier) and a Vite/React frontend. It uses the larger Hugging Face dataset fhai50032/SymptomsDisease246k (246k symptom→disease pairs) when data/symptomsDisease246k.json is present. If that file is missing, it falls back to the smaller GretelAI dataset (data/train.jsonl, data/test.jsonl). During a conversation it keeps asking the most relevant follow-up questions until its confidence reaches the 85% target.
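The dataset fallback rule can be sketched in Python as below. This is a minimal illustration, not the repository's loader. The helper name `dataset_files` and its error handling are assumptions, and only the file names come from this README.

```python
from pathlib import Path
from typing import List

# Hypothetical helper: the real backend may locate its data differently.
PREFERRED = "symptomsDisease246k.json"          # 246k-pair Hugging Face export
FALLBACK = ("train.jsonl", "test.jsonl")         # smaller GretelAI files


def dataset_files(data_dir: str = "data") -> List[Path]:
    """Return the file(s) to train on, preferring the 246k dataset."""
    root = Path(data_dir)
    preferred = root / PREFERRED
    if preferred.is_file():
        return [preferred]
    # Preferred file missing: use the GretelAI train/test split instead.
    fallback = [root / name for name in FALLBACK]
    missing = [str(p) for p in fallback if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"No dataset found; missing {missing}")
    return fallback
```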

Backend

  • Preferred dataset: download to data/symptomsDisease246k.json
    mkdir -p data
    curl -L 'https://huggingface.co/datasets/fhai50032/SymptomsDisease246k/resolve/main/symptomsDisease246k.json' -o data/symptomsDisease246k.json
  • Fallback (smaller): data/train.jsonl and data/test.jsonl from the GretelAI dataset.
  • Pipeline: TF-IDF (1–2 grams) → TruncatedSVD (256 dims) → Normalizer → Torch MLP classifier. Per-class TF-IDF keywords drive follow-up questions; replies keep probing until ≥85% confidence.
  • API: POST /api/chat with { "sessionId": null | "<uuid>", "message": "<symptoms>" } returns predictions, a follow-up question (if under 85% confidence), and accuracy metrics.
  • To keep local training fast, set MAX_TRAIN_SAMPLES=5000 (or any limit) before starting the API; by default it trains on the full dataset.
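The "per-class TF-IDF keywords" idea can be illustrated with the pure-Python sketch below. It treats all symptom text for one disease as a single document, uses unigrams and bigrams, and applies a smoothed IDF, log((1 + n) / (1 + df)) + 1, which matches scikit-learn's default. The function names `ngrams` and `class_keywords` and the `top_k` cutoff are hypothetical. The backend most likely uses a library vectorizer rather than this code.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def ngrams(text: str) -> List[str]:
    """Lowercased word unigrams plus bigrams, i.e. an n-gram range of (1, 2)."""
    words = re.findall(r"[a-z]+", text.lower())
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]


def class_keywords(pairs: List[Tuple[str, str]], top_k: int = 3) -> Dict[str, List[str]]:
    """Hypothetical helper: rank each disease's terms by TF-IDF across diseases."""
    # One "document" per disease: the concatenated symptom n-grams.
    docs: Dict[str, Counter] = defaultdict(Counter)
    for symptoms, disease in pairs:
        docs[disease].update(ngrams(symptoms))

    n_docs = len(docs)
    df = Counter(term for counts in docs.values() for term in counts)

    keywords: Dict[str, List[str]] = {}
    for disease, counts in docs.items():
        total = sum(counts.values())
        scores = {
            # Term frequency times smoothed inverse document frequency.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        keywords[disease] = sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
    return keywords
```

Terms that appear under every disease, such as "fever" in the toy data `[("fever rash", "dengue"), ("fever cough", "flu")]`, get the lowest IDF. Distinctive terms like "rash" therefore rise to the top and make better follow-up questions.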

Run the API

python3 -m venv .venv
. .venv/bin/activate
pip install -r backend/requirements.txt

uvicorn backend.app:app --reload --host 0.0.0.0 --port 8000
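The sketch below shows one way a MAX_TRAIN_SAMPLES limit could be read and applied. It assumes that an unset or empty variable means "use the full dataset" and that the cap keeps the first N rows. The real backend might sample randomly instead, and the helper names `max_train_samples` and `cap` are hypothetical.

```python
import os
from typing import List, Mapping, Optional, Sequence, TypeVar

T = TypeVar("T")


def max_train_samples(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Hypothetical helper: parse MAX_TRAIN_SAMPLES; None means no limit."""
    raw = env.get("MAX_TRAIN_SAMPLES", "").strip()
    if not raw:
        return None  # default: train on the full dataset
    limit = int(raw)
    if limit <= 0:
        raise ValueError("MAX_TRAIN_SAMPLES must be a positive integer")
    return limit


def cap(rows: Sequence[T], limit: Optional[int]) -> List[T]:
    """Keep the first `limit` rows (an assumption; random sampling is also common)."""
    return list(rows) if limit is None else list(rows[:limit])
```

To start the API with a cap, set the variable on the same command line: `MAX_TRAIN_SAMPLES=5000 uvicorn backend.app:app --reload --host 0.0.0.0 --port 8000`.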

Smoke test:

curl -s -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message":"high fever, chills, rash, pain behind my eyes"}'
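The same request can be sent from Python using only the standard library. The helper names below are hypothetical. The payload shape follows the API description above, with `sessionId` set to `null` on the first turn.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

API_URL = "http://localhost:8000/api/chat"  # default host/port from "Run the API"


def build_chat_request(message: str, session_id: Optional[str] = None,
                       url: str = API_URL) -> urllib.request.Request:
    """Hypothetical helper: build the POST /api/chat request body and headers."""
    payload = {"sessionId": session_id, "message": message}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(message: str, session_id: Optional[str] = None) -> Dict[str, Any]:
    """Send one chat turn and decode the JSON reply (requires the API to be running)."""
    with urllib.request.urlopen(build_chat_request(message, session_id)) as resp:
        return json.load(resp)
```

With the API running, `reply = chat("high fever, chills, rash, pain behind my eyes")` starts a session. To keep the conversation going, pass the session id back on the next call. This assumes the reply carries it in a `sessionId` field, which this README does not confirm.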

Frontend

cd frontend
npm install      # safe to re-run
# optional, before starting Vite: export VITE_API_URL=http://localhost:8000 (adjust if you change the API host or port)
npm run dev -- --host --port 5173

Features: a chat UI that keeps a running session, the top-3 diagnoses with confidence bars, the next follow-up question (shown until confidence reaches ≥85%), and live model metrics.
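The top-3 display and the 85% stopping rule can be modelled with the small sketch below. It assumes the model returns a `{disease: probability}` mapping and that follow-ups are drawn from per-disease keyword lists. The function names, the text bar, and the question wording are all hypothetical and are not taken from the frontend or backend.

```python
from typing import Dict, List, Optional, Set, Tuple

THRESHOLD = 0.85  # confidence target stated in this README


def top_k(probs: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Highest-probability diagnoses first."""
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]


def bar(p: float, width: int = 20) -> str:
    """Text stand-in for the UI's confidence bar."""
    filled = round(p * width)
    return "#" * filled + "." * (width - filled)


def next_turn(probs: Dict[str, float], keywords: Dict[str, List[str]],
              mentioned: Set[str], threshold: float = THRESHOLD) -> Optional[str]:
    """Hypothetical rule: stop once the top diagnosis is confident enough,
    otherwise ask about the first unmentioned keyword of the top-3 candidates."""
    _, best_p = top_k(probs, 1)[0]
    if best_p >= threshold:
        return None
    for disease, _ in top_k(probs):
        for term in keywords.get(disease, []):
            if term not in mentioned:
                return f"Do you also have {term}?"
    return None  # nothing left to ask
```

For a console view similar to the UI, loop over `top_k(probs)` and print `f"{name:<10} {p:4.0%} {bar(p)}"` for each entry.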

Notes & limitations

  • This is a prototype; outputs are not medical advice. Always involve a clinician, especially for urgent/ambiguous cases.
  • The training data is synthetic and limited in scope; expect biases and gaps. Consider retraining with curated clinical data and adding guardrails before any real use.
  • Conversations are in-memory; restart clears sessions. Scale-out will need persistence plus authentication and audit logging.
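The in-memory limitation can be seen in the minimal sketch of a process-local session store below, keyed by UUID strings like the `sessionId` field of `/api/chat`. The class names are hypothetical. A second behaviour is also an assumption: an unknown id, such as one issued before a restart, simply starts a fresh session.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Session:
    messages: List[str] = field(default_factory=list)


class InMemorySessions:
    """Hypothetical store: lives in one process, so a restart loses every session."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Session] = {}

    def get_or_create(self, session_id: Optional[str]) -> Tuple[str, Session]:
        # A null or unknown id starts a new conversation under a fresh UUID.
        if session_id is None or session_id not in self._sessions:
            session_id = str(uuid.uuid4())
            self._sessions[session_id] = Session()
        return session_id, self._sessions[session_id]
```

Persisting sessions would mean swapping the dictionary for an external store, which is where the authentication and audit logging mentioned above would also attach.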
