Thompson sampling returns identical scores during cold start #9

@jjroelofs

Problem

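When an experiment has no data (cold start), the Thompson sampling algorithm returns identical scores for all arms. With 0 turns and 0 rewards, every arm is sampled from the same Beta(1, 1) distribution, and randBeta() produces the same value for each arm within a single sampling run. Because Beta(1, 1) is the uniform distribution on [0, 1], independent draws would be expected to differ, so identical values indicate that the draws are not behaving independently in this case. The result is a deterministic ordering instead of random exploration.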

Impact

  • During cold start, content is always displayed in the same order
  • Prevents proper exploration of all content options
  • Reduces effectiveness of the reinforcement learning algorithm
  • Creates bias toward content that appears first in the deterministic order

Root Cause

ThompsonCalculator::calculateThompsonScores() samples each arm's score from a beta distribution:

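// Beta parameters: observed successes and failures, each plus 1.
$alpha = $data->rewards + 1;
$beta = $data->turns - $data->rewards + 1;
// One random draw per arm; the highest score ranks first.
$scores[$id] = $this->randBeta($alpha, $beta);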

When all arms have 0 turns and 0 rewards:

  • All arms get alpha=1, beta=1
  • randBeta() returns similar or identical values for every arm
  • No differentiation between arms during initial exploration
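
For comparison, here is a minimal sketch of what independent Beta(1, 1) draws look like at cold start. The function sketchRandBeta() and the array-based arm data are hypothetical stand-ins, not the project's randBeta() or its data objects. The sampler only handles positive integer parameters, which is what the alpha/beta formulas above produce.

```php
<?php
// Hypothetical stand-in for randBeta(); NOT the project's implementation.
// Works only for positive integer parameters.
function sketchRandBeta(int $alpha, int $beta): float
{
    // Gamma(k, 1) for integer k is the sum of k independent Exp(1) draws.
    $gamma = static function (int $shape): float {
        $sum = 0.0;
        for ($i = 0; $i < $shape; $i++) {
            // Shift the draw into the open interval (0, 1) so log() is finite
            // and strictly negative.
            $u = (mt_rand() + 1) / (mt_getrandmax() + 2);
            $sum -= log($u);
        }
        return $sum;
    };

    // If X ~ Gamma(alpha, 1) and Y ~ Gamma(beta, 1), then X / (X + Y) ~ Beta(alpha, beta).
    $x = $gamma($alpha);
    $y = $gamma($beta);
    return $x / ($x + $y);
}

// Cold start: every arm has 0 turns and 0 rewards (hypothetical arm ids).
$arms = [
    'arm_a' => ['turns' => 0, 'rewards' => 0],
    'arm_b' => ['turns' => 0, 'rewards' => 0],
    'arm_c' => ['turns' => 0, 'rewards' => 0],
];

$scores = [];
foreach ($arms as $id => $data) {
    $alpha = $data['rewards'] + 1;                  // 1 at cold start
    $beta = $data['turns'] - $data['rewards'] + 1;  // 1 at cold start
    $scores[$id] = sketchRandBeta($alpha, $beta);   // one fresh draw per arm
}

print_r($scores);
```

With a sampler like this, each arm gets its own draw, so equal parameters alone should not produce equal scores. If the real randBeta() does return matching values, the cause is more likely in how it generates or seeds its random numbers.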

Solution Implemented

Add a small random tie-breaker to each score so that arms with identical statistics are very unlikely to tie:

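// Sampled Thompson score for this arm.
$base_score = $this->randBeta($alpha, $beta);
// Random offset in [0.000001, 0.000999], drawn independently per arm.
$tie_breaker = mt_rand(1, 999) / 1000000;
$scores[$id] = $base_score + $tie_breaker;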

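With this change:

  • Arms with identical statistics almost always get distinct scores (the tie-breaker takes one of 999 values, so a collision is possible but rare)
  • Ordering is randomized during cold start
  • The statistical properties of Thompson sampling are approximately preserved, although a score can now slightly exceed 1
  • The tie-breaker is small (0.000001 to 0.000999), so it can only change the ordering of arms whose sampled scores differ by less than 0.001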
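
The tie-breaker arithmetic can be sketched in isolation as follows. Both function names are hypothetical helpers written for illustration, not part of the project. The second function estimates how often two or more arms would draw the same tie-breaker value, which only matters when their base scores are also identical.

```php
<?php
// Hypothetical helper mirroring the tie-breaker arithmetic from the fix.
function addTieBreaker(float $baseScore): float
{
    // mt_rand(1, 999) is inclusive at both ends, so the offset lies in
    // [0.000001, 0.000999].
    $tieBreaker = mt_rand(1, 999) / 1000000;
    return $baseScore + $tieBreaker;
}

// Probability that at least two of $arms arms draw the same tie-breaker,
// assuming each draw is uniform over $values distinct values.
function tieBreakerCollisionProbability(int $arms, int $values = 999): float
{
    $allDistinct = 1.0;
    for ($i = 0; $i < $arms; $i++) {
        // The i-th arm must avoid the i values already taken.
        $allDistinct *= max(0, $values - $i) / $values;
    }
    return 1.0 - $allDistinct;
}

// Example: an identical base score of 0.5 for two arms.
$first = addTieBreaker(0.5);
$second = addTieBreaker(0.5);
printf("%.6f %.6f\n", $first, $second);

// Chance of a repeated tie-breaker among 10 arms.
printf("%.4f\n", tieBreakerCollisionProbability(10));
```

The collision probability grows with the number of arms, so experiments with many arms may want a wider tie-breaker range or a random shuffle of tied arms instead.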

Testing

  • Verified unique scores generated during cold start
  • Confirmed random exploration with empty experiment data
  • Validated tie-breaker doesn't interfere with learned preferences
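
One way such a check could look is sketched below. It is not the project's test suite; countFirstPlaces() and the arm ids are hypothetical. Since Beta(1, 1) is uniform on [0, 1], a plain uniform draw stands in for randBeta(1, 1) here. The check counts how often each arm ranks first over many cold-start runs, which should come out roughly even if exploration is random.

```php
<?php
// Hypothetical check: simulate cold-start scoring and count first places.
function countFirstPlaces(array $armIds, int $runs): array
{
    $wins = array_fill_keys($armIds, 0);

    for ($run = 0; $run < $runs; $run++) {
        $scores = [];
        foreach ($armIds as $id) {
            // Assumption: a uniform draw stands in for randBeta(1, 1).
            $baseScore = mt_rand() / mt_getrandmax();
            // Same tie-breaker arithmetic as in the fix.
            $tieBreaker = mt_rand(1, 999) / 1000000;
            $scores[$id] = $baseScore + $tieBreaker;
        }

        // Sort descending by score, keeping arm ids as keys.
        arsort($scores);
        $wins[array_key_first($scores)]++;
    }

    return $wins;
}

$wins = countFirstPlaces(['arm_a', 'arm_b', 'arm_c'], 3000);
print_r($wins);
```

If one arm wins nearly every run, the ordering is still effectively deterministic and the sampler deserves a closer look.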

Fixed in PR #8

🤖 Generated with Claude Code
