FailLite is a failure-resilient model serving system for resource-constrained edge environments. This repository contains the source code of the system implementation, built on NVIDIA Triton Inference Server, and the source code of a simulator for large-scale evaluation.
FailLite/
├── src/ # System implementation
│ ├── controller/ # FailLite controller (failure detection + two-step failover approach)
│ ├── model_manager/ # FailLite agent that coordinates model loading/unloading on worker nodes
│ ├── monitoring_daemon/ # Collects heartbeats and system metrics from worker nodes
│ ├── model_profiler/ # Model profiling
│ └── inference_client/ # Model inference client that receives failover notifications
├── simulator/ # Simulator for large-scale evaluation
├── scripts/ # Scripts for running failover experiments on edge testbeds
├── analysis/ # Scripts for results analysis and visualization
└── doc/ # Documentation files (e.g., FailLite's architecture)
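
The layout above pairs a monitoring daemon that collects heartbeats with a controller that performs failure detection. One common way to connect the two is timeout-based detection: a worker is treated as failed once no heartbeat has arrived within a fixed window. The sketch below illustrates that idea only. The `HeartbeatMonitor` class, its method names, the worker names, and the 3-second timeout are all hypothetical and are not taken from the FailLite source.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatMonitor:
    """Tracks the most recent heartbeat per worker (hypothetical helper)."""

    timeout_s: float = 3.0  # assumed threshold, not a FailLite default
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, worker: str, now: float) -> None:
        """Store the timestamp of the latest heartbeat from `worker`."""
        self.last_seen[worker] = now

    def failed_workers(self, now: float) -> List[str]:
        """Return workers whose last heartbeat is older than the timeout."""
        return sorted(
            worker
            for worker, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


# Illustrative usage with made-up worker names and timestamps (seconds).
monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.record("edge-1", now=0.0)
monitor.record("edge-2", now=2.5)
suspected = monitor.failed_workers(now=4.0)  # only "edge-1" has timed out
```

Passing `now` explicitly instead of reading the system clock keeps the sketch deterministic and easy to test. A real deployment would more likely use a monotonic clock and trigger failover for each worker the monitor reports.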
This project is licensed under the MIT License - see the LICENSE file for details.