Diagnosis Navigator

Interactive symptom-to-diagnosis chat built with a FastAPI backend (TF-IDF + SVD encoder feeding a Torch MLP classifier) and a Vite/React frontend. It is wired to the larger Hugging Face dataset fhai50032/SymptomsDisease246k (246k symptom→disease pairs) and keeps asking the most relevant follow-up question until it reaches the 85% confidence target. If the dataset file (data/symptomsDisease246k.json) is missing, it falls back to the smaller Gretel dataset.

Backend

Preferred dataset: download to data/symptomsDisease246k.json

mkdir -p data
curl -L 'https://huggingface.co/datasets/fhai50032/SymptomsDisease246k/resolve/main/symptomsDisease246k.json' -o data/symptomsDisease246k.json
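
If the download is skipped, the backend is described as falling back to the smaller Gretel files. The actual loader is not shown in this README; the following is a minimal sketch of such a check, assuming the file names listed in this section. The function name `pick_dataset_files` is hypothetical.

```python
from pathlib import Path
from typing import List

# File names come from this README; the selection logic itself is an assumption.
PREFERRED = "symptomsDisease246k.json"
FALLBACK = ["train.jsonl", "test.jsonl"]


def pick_dataset_files(data_dir: Path) -> List[Path]:
    """Return the preferred file if present, else whichever fallback files exist."""
    preferred = data_dir / PREFERRED
    if preferred.is_file():
        return [preferred]
    fallback = [data_dir / name for name in FALLBACK if (data_dir / name).is_file()]
    if not fallback:
        raise FileNotFoundError(f"no dataset files found in {data_dir}")
    return fallback
```

Calling `pick_dataset_files(Path("data"))` from the repository root would then report which files a training run would read.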

Fallback (already small): data/train.jsonl and data/test.jsonl from GretelAI.
Pipeline: TF-IDF (1–2 grams) → TruncatedSVD (256 dims) → Normalizer → Torch MLP classifier. Per-class TF-IDF keywords drive follow-up questions; replies keep probing until ≥85% confidence.
API: POST /api/chat with { "sessionId": null | "<uuid>", "message": "<symptoms>" } returns predictions, a follow-up question (if under 85% confidence), and accuracy metrics.
To keep local training fast, set MAX_TRAIN_SAMPLES=5000 (or any limit) before starting the API; by default it trains on the full dataset.
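
The encoder half of the pipeline above can be sketched with scikit-learn as follows. This is an illustration, not the code in backend/: the toy corpus is made up, the SVD is cut to 2 components so that it fits four short documents (the README states 256), and the Torch MLP that consumes the features is left out.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Toy corpus for illustration only; not taken from either dataset.
texts = [
    "high fever chills rash pain behind eyes",
    "fever chills headache muscle pain",
    "cough sore throat runny nose",
    "cough fever sore throat",
]

encoder = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),           # unigrams and bigrams
    TruncatedSVD(n_components=2, random_state=0),  # the README uses 256 dims
    Normalizer(),                                  # L2-normalise each row
)

features = encoder.fit_transform(texts)
print(features.shape)  # (4, 2)
```

Each row of `features` is a unit-length dense vector, which is the kind of input a small MLP classifier would be trained on.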

Run the API

python3 -m venv .venv
. .venv/bin/activate
pip install -r backend/requirements.txt

uvicorn backend.app:app --reload --host 0.0.0.0 --port 8000
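
MAX_TRAIN_SAMPLES has to be in the environment of the shell that launches uvicorn, for example by prefixing the command with `MAX_TRAIN_SAMPLES=5000`. How the backend parses it is not shown here; a minimal sketch, assuming an unset or empty value means "train on everything" (the helper names are hypothetical):

```python
import os
from typing import List, Mapping, Optional, Sequence, TypeVar

T = TypeVar("T")


def read_sample_limit(env: Optional[Mapping[str, str]] = None) -> Optional[int]:
    """Return MAX_TRAIN_SAMPLES as a positive int, or None when it is unset."""
    source = os.environ if env is None else env
    raw = source.get("MAX_TRAIN_SAMPLES", "").strip()
    if not raw:
        return None
    limit = int(raw)  # raises ValueError on non-numeric input
    if limit <= 0:
        raise ValueError("MAX_TRAIN_SAMPLES must be positive")
    return limit


def cap_samples(samples: Sequence[T], limit: Optional[int]) -> List[T]:
    """Keep the first `limit` samples; keep all of them when limit is None."""
    return list(samples) if limit is None else list(samples[:limit])
```

Taking the first N rows is the simplest choice; a real loader might shuffle first so that the subset is not biased by file order.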

Smoke test:

curl -s -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message":"high fever, chills, rash, pain behind my eyes"}'
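
The same request can be made from Python with only the standard library. This sketch assumes the host and port from the uvicorn command and the request shape given under Backend; the response field names are not documented here, so the reply is returned as a plain dict. The function names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

API_URL = "http://localhost:8000/api/chat"  # host/port from the uvicorn command


def build_chat_request(message: str, session_id: Optional[str] = None) -> urllib.request.Request:
    """Build the POST request; sessionId is null on the first turn."""
    body = json.dumps({"sessionId": session_id, "message": message}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(message: str, session_id: Optional[str] = None) -> dict:
    """Send one turn and return the decoded JSON reply (needs the API running)."""
    request = build_chat_request(message, session_id)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the API running, `send_chat("high fever, chills, rash, pain behind my eyes")` should return the same JSON as the curl call.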

Frontend

cd frontend
npm install      # already done once, safe to re-run
# optionally, before starting the dev server: export VITE_API_URL=http://localhost:8000 (if you change ports/hosts)
npm run dev -- --host --port 5173

Features: chat UI with running session, top-3 diagnosis confidences with bars, the next follow-up question (until ≥85% confidence), and live model metrics.
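
The follow-up behaviour can be pictured as a small selection step: stop once the top prediction reaches 85%, otherwise ask about a keyword of a likely class that has not come up yet. The sketch below is one plausible reading, not the backend's code; the function name, the sample probabilities, and the keyword lists are all hypothetical.

```python
from typing import Dict, List, Optional

CONFIDENCE_TARGET = 0.85  # the 85% target stated in this README


def next_follow_up(
    probabilities: Dict[str, float],
    class_keywords: Dict[str, List[str]],
    asked: List[str],
    message_history: str,
) -> Optional[str]:
    """Return a keyword to ask about, or None once the target is reached."""
    top_label = max(probabilities, key=probabilities.get)
    if probabilities[top_label] >= CONFIDENCE_TARGET:
        return None
    history = message_history.lower()
    # Walk classes from most to least likely and take the first unused keyword.
    for label in sorted(probabilities, key=probabilities.get, reverse=True):
        for keyword in class_keywords.get(label, []):
            if keyword not in asked and keyword not in history:
                return keyword
    return None  # nothing left to ask


# Made-up values for illustration only.
probs = {"dengue": 0.55, "influenza": 0.30, "measles": 0.15}
keywords = {"dengue": ["rash", "joint pain"], "influenza": ["cough"]}
print(next_follow_up(probs, keywords, asked=[], message_history="high fever, chills, rash"))
# prints "joint pain": "rash" is already in the history, so the next dengue keyword is used
```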

Notes & limitations

This is a prototype; outputs are not medical advice. Always involve a clinician, especially for urgent/ambiguous cases.
Dataset is synthetic and small; expect biases and gaps. Consider retraining with curated clinical data and adding guardrails before any real use.
Conversations are in-memory; restart clears sessions. Scale-out will need persistence plus authentication and audit logging.
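
Why a restart clears sessions is easy to see from a sketch of an in-memory store. The class below is hypothetical and only assumes that sessions are keyed by a UUID, as the sessionId field in the API suggests; treating an unknown id as a new session is also an assumption.

```python
import uuid
from typing import Dict, List, Optional, Tuple


class InMemorySessions:
    """Hypothetical session store: a dict that lives only as long as the process."""

    def __init__(self) -> None:
        self._messages: Dict[str, List[str]] = {}

    def append(self, session_id: Optional[str], message: str) -> Tuple[str, List[str]]:
        """Record a message, creating a new session when the id is None or unknown."""
        if session_id is None or session_id not in self._messages:
            session_id = str(uuid.uuid4())
            self._messages[session_id] = []
        self._messages[session_id].append(message)
        return session_id, list(self._messages[session_id])
```

A fresh `InMemorySessions()` stands in for a restarted process: ids issued earlier are no longer known, which is the gap a persistent store would close.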

Anshiboy/OneHealth
