Sérgio Ildefonso edited this page Jan 7, 2025 · 1 revision

A Mixture-of-Experts (MoE) model is a machine learning technique that works like a team of specialists, each with their own area of expertise. It consists of multiple specialized models, called "experts," and a "gating network" that decides which experts are best suited for a particular input.

Here are the key components:

  • Experts: Individual neural networks, each specialized in a specific subject.
  • Gating Network: A "dispatcher network" that determines which experts should handle a given input and routes that input to them.
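As a minimal sketch of the gating idea (sizes and the single linear layer here are illustrative assumptions, not a reference implementation): the gating network scores every expert for an input and normalizes the scores with a softmax, yielding one weight per expert.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
INPUT_DIM = 4
NUM_EXPERTS = 3

rng = np.random.default_rng(0)

def gating_network(x, W_gate):
    """Score each expert for input x and normalize with a softmax."""
    logits = x @ W_gate                  # one raw score per expert
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

W_gate = rng.normal(size=(INPUT_DIM, NUM_EXPERTS))
x = rng.normal(size=INPUT_DIM)
weights = gating_network(x, W_gate)
print(weights)  # non-negative weights, one per expert, summing to 1
```

In practice the gate is trained jointly with the experts, but the output shape is the same: a probability-like weight per expert that drives routing.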

Here is a diagram explaining how Mixture of Experts works:

```mermaid
stateDiagram-v2
    [*] --> Input
    Input --> Gating_Network
    state Experts {
    Expert_1
    Expert_2
    Expert_3
    Expert_N
    }
    Gating_Network --> Routing
    Gating_Network --> X2
    Gating_Network --> X3
    Input --> Routing
    Routing --> Expert_2
    Routing --> Expert_3
    Expert_2 --> X2
    Expert_3 --> X3
    X2 --> Output_Parser
    X3 --> Output_Parser
    Output_Parser --> Output
    Output --> [*]
```

Here is the explanation of the flow:

  1. The model receives an input.
  2. The Gating Network analyzes the input and, based on the input, assigns a weight to each expert, indicating how relevant they are to the task.
  3. The input is dispatched to the chosen experts based on the weights.
  4. The outputs of the experts are gathered and combined to produce the result.
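The four steps above can be sketched end to end. This is a toy forward pass under stated assumptions: each "expert" is just a small linear layer, the sizes are made up, and top-2 routing is one common choice rather than the only one.

```python
import numpy as np

rng = np.random.default_rng(1)
INPUT_DIM, NUM_EXPERTS, TOP_K = 4, 4, 2  # hypothetical sizes

# Each "expert" is a tiny linear layer here, purely for illustration.
experts = [rng.normal(size=(INPUT_DIM, INPUT_DIM)) for _ in range(NUM_EXPERTS)]
W_gate = rng.normal(size=(INPUT_DIM, NUM_EXPERTS))

def moe_forward(x):
    # Steps 1-2: the gating network assigns a weight to each expert.
    logits = x @ W_gate
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Step 3: dispatch the input only to the top-k experts.
    top = np.argsort(probs)[-TOP_K:]
    gate = probs[top] / probs[top].sum()  # renormalize over chosen experts
    # Step 4: combine the chosen experts' outputs, weighted by the gate.
    return sum(w * (x @ experts[i]) for w, i in zip(gate, top))

y = moe_forward(rng.normal(size=INPUT_DIM))
print(y.shape)  # (4,)
```

The key design choice is in step 3: routing each input to only k of N experts is what keeps per-input compute roughly constant as more experts are added.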

Here are some benefits of using MoE models:

  • Scalability: For large-scale tasks, MoE splits the work into subtasks handled by specific "experts," so model capacity can grow by adding experts.
  • Accuracy: Combining the outputs of multiple specialized "experts" can outperform a single general-purpose model of comparable cost.
  • Efficiency: Only the "experts" selected by the gating network run for a given input, so compute per input stays low even as the total parameter count grows.
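The efficiency claim can be made concrete with simple arithmetic (the counts below are hypothetical, chosen only to illustrate the ratio): with top-2 routing over 8 experts, each input triggers 2 expert evaluations instead of 8.

```python
# Hypothetical configuration for illustration.
NUM_EXPERTS, TOP_K, NUM_TOKENS = 8, 2, 1000

dense_calls = NUM_EXPERTS * NUM_TOKENS   # every expert runs on every token
sparse_calls = TOP_K * NUM_TOKENS        # only the routed experts run

print(dense_calls, sparse_calls)  # 8000 2000
```

So the model carries 8 experts' worth of parameters while paying roughly 2 experts' worth of compute per token.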
