MOE
Sérgio Ildefonso edited this page Jan 7, 2025
A Mixture-of-Experts (MOE) model is a machine learning technique that works similarly to having a team of specialists, each with their own expertise. It consists of multiple specialized models, called "experts," and a "gating network" that decides which expert is best suited for a particular task.
The key components are:
- Experts: individual neural networks, each specialized in a specific subject.
- Gating Network: a "dispatcher" network that determines which experts should handle a given input and routes the input to them.
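To make the gating idea concrete, here is a minimal sketch in plain Python of how a gating network turns raw per-expert scores into routing weights via a softmax (the scores here are hypothetical values, standing in for the output of a learned network):

```python
import math

def gate(scores):
    """Softmax over raw expert scores: converts them to routing weights
    that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores the gating network assigned to 4 experts
weights = gate([2.0, 0.5, 0.5, -1.0])
print([round(w, 3) for w in weights])  # the highest score gets the largest weight
```

The weights can then be used either to mix all experts' outputs or to select only the top-scoring ones.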
Here is a diagram explaining how Mixture of Experts works:
```mermaid
stateDiagram-v2
    [*] --> Input
    Input --> Gating_Network
    state Experts {
        Expert_1
        Expert_2
        Expert_3
        Expert_N
    }
    Gating_Network --> Routing
    Gating_Network --> X2
    Gating_Network --> X3
    Input --> Routing
    Routing --> Expert_2
    Routing --> Expert_3
    Expert_2 --> X2
    Expert_3 --> X3
    X2 --> Output_Parser
    X3 --> Output_Parser
    Output_Parser --> Output
    Output --> [*]
```
The flow works as follows:
- The model receives an input.
- The Gating Network analyzes the input and assigns a weight to each expert, indicating how relevant that expert is to the task.
- The input is dispatched to the chosen experts based on those weights.
- The outputs of the chosen experts are gathered and combined, weighted by the gating scores, to produce the final result.
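The steps above can be sketched end to end. This is a toy illustration, not a production implementation: the experts are simple scalar functions standing in for neural networks, and the gate scores are assumed to be given rather than learned:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_scores, k=2):
    """Route input x to the top-k experts and combine their outputs,
    weighted by renormalized gating weights."""
    weights = softmax(gate_scores)
    # indices of the k largest weights (the "chosen" experts)
    top = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]
    norm = sum(weights[i] for i in top)  # renormalize over the chosen experts
    # weighted combination of the chosen experts' outputs
    return sum(weights[i] / norm * experts[i](x) for i in top)

# Toy experts: scalar functions standing in for specialized networks
experts = [lambda x: 2 * x, lambda x: x + 1, lambda x: -x, lambda x: x * x]
y = moe_forward(3.0, experts, gate_scores=[1.0, 1.0, -5.0, -5.0], k=2)
print(y)  # experts 0 and 1 get equal weight: 0.5 * 6.0 + 0.5 * 4.0 = 5.0
```

Because only the top-k experts run, the other experts cost nothing for this input, which is the source of MOE's efficiency.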
Here are some benefits of using MOE models:
- Scalability: model capacity can grow by adding experts, with each expert handling the subtasks it specializes in.
- Accuracy: combining the outputs of multiple specialized experts can outperform a single generalist model.
- Efficiency: only the experts selected by the gating network run for a given input, so compute per input stays low even as the model grows.