Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
145 changes: 140 additions & 5 deletions common/web/lm-worker/src/correction/distance-modeler.ts
Original file line number Diff line number Diff line change
Expand Up @@ -526,8 +526,6 @@ namespace correction {
let searchSpace = this;
let currentReturns: {[mapKey: string]: SearchNode} = {};

// JS measures time by the number of milliseconds since Jan 1, 1970.
let timeStart = Date.now();
let maxTime: number;
if(waitMillis == 0) {
maxTime = Infinity;
Expand All @@ -537,6 +535,136 @@ namespace correction {
maxTime = waitMillis;
}

/**
* This inner class is designed to help the algorithm detect its active execution time.
* While there's no official JS way to do this, we can approximate it by polling the
* current system time (in ms) after each iteration of a short-duration loop. Unusual
* spikes in system time for a single iteration is likely to indicate that an OS
* context switch occurred at some point during the iteration's execution.
*/
class ExecutionTimer {
/**
* The system time when this instance was created.
*/
private start: number;

/**
* Marks the system time at the start of the currently-running loop, as noted
* by a call to the `startLoop` function.
*/
private loopStart: number;

private maxExecutionTime: number;
private maxTrueTime: number;

private executionTime: number;

/**
* Used to track intervals in which potential context swaps by the OS may
* have occurred. Context switches generally seem to pause threads for
* at least 16 ms, while we expect each loop iteration to complete
* within just 1 ms. So, any possible context switch should have the
* longest observed change in system time.
*
* See `updateOutliers` for more details.
*/
private largestIntervals: number[] = [0];

constructor(maxExecutionTime: number, maxTrueTime: number) {
// JS measures time by the number of milliseconds since Jan 1, 1970.
this.loopStart = this.start = Date.now();
this.maxExecutionTime = maxExecutionTime;
this.maxTrueTime = maxTrueTime;
}

startLoop() {
this.loopStart = Date.now();
}

markIteration() {
const now = Date.now();
const delta = now - this.loopStart;
this.executionTime += delta;

/**
* Update the list of the three longest system-time intervals observed
* for execution of a single loop iteration.
*
* Ignore any zero-ms length intervals; they'd make the logic much
* messier than necessary otherwise.
*/
if(delta) {
// If the currently-observed interval is longer than the shortest of the 3
// previously-observed longest intervals, replace it.
if(this.largestIntervals.length > 2 && delta > this.largestIntervals[0]) {
this.largestIntervals[0] = delta;
} else {
this.largestIntervals.push(delta);
}
// Puts the list in ascending order. Shortest of the list becomes the head,
// longest one the tail.
this.largestIntervals.sort();

// Then, determine if we need to update our outlier-based tweaks.
this.updateOutliers();
}
}

updateOutliers() {
/* Base assumption: since each loop of the search should evaluate within ~1ms,
* notably longer execution times are probably context switches.
*
* Base assumption: OS context switches generally last at least 16ms. (Based on
* a window.setTimeout() usually not evaluating for at least
* that long, even if set to 1ms.)
Comment on lines +617 to +619
Copy link
Contributor Author

@jahorton jahorton Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basis: try running this code in your browser's developer mode a few times:

var now = Date.now();  var timer = window.setTimeout(() => {console.log(Date.now() - now)}, 1);

Of course, keep in mind that this code will be triggered via user-input interrupt, so it can easily be triggered while the relevant thread would otherwise be asleep, possibly in the middle of whatever timer mechanism the OS uses to periodically context-switch, "off the beat" in a sense. As a result, you will see notably lower values than 16ms... but you'll likely also see some higher values.

Example runs on Win 10 / Firefox:

image

If a 23ms context-switch pause occurred during a prediction search, with no other interruptions... that'd cut execution time from 33ms to just 10ms without the special handling added here. If it were paused even longer, well... good luck operating with literally 0ms allotted execution time. As noted by #5467 / #5479, it can happen with some regularity.

While we don't know the exact conditions of the BrowserStack servers, devices, etc used for unit testing... I wouldn't be surprised if they have at least a few VMs set up per server; if so, there's a pretty solid chance of significant context-switching happening on those systems.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://developer.mozilla.org/en-US/docs/Web/API/setTimeout#reasons_for_delays_longer_than_specified may also be helpful background.

On Windows, this may also be relevant:

The resolution of the GetTickCount function is limited to the resolution of the system timer, which is typically in the range of 10 milliseconds to 16 milliseconds. The resolution of the GetTickCount function is not affected by adjustments made by the GetSystemTimeAdjustment function.

I've got no idea what sort of behaviour we are going to see on mobile devices for these scenarios. Sounds like some testing on low-end devices is seriously needed.

*
* To mitigate these assumptions: we'll track the execution time of every loop
* iteration. If the longest observation somehow matches or exceeds the length of
* the next two almost-longest observations twice over... we have a very strong
* 'context switch' candidate.
*
* Or, in near-formal math/stats: we expect a very low variance in execution
* time among the iterations of the search's loops. With a very low variance,
* ANY significant proportional spikes in execution time are outliers - outliers
* likely caused by an OS context switch.
*
* Rather than do intensive math, we use a somewhat lazy approach below that
* achieves the same net results given our assumptions, even when relaxed somewhat.
*
* The logic below relaxes the base assumptions a bit to be safe:
* - [2ms, 2ms, 8ms] will cause 8ms to be seen as an outlier.
* - [2ms, 3ms, 10ms] will cause 10ms to be seen as an outlier.
*
* Ideally:
* - [1ms, 1ms, 4ms] will view 4ms as an outlier.
*
* So we can safely handle slightly longer average intervals and slightly shorter
* OS context-switch time intervals.
*/
if(this.largestIntervals.length > 2) {
// Precondition: the `largestIntervals` array is sorted in ascending order.
// Shortest entry is at the head, longest at the tail.
if(this.largestIntervals[2] >= 2 * (this.largestIntervals[0] + this.largestIntervals[1])) {
this.executionTime -= this.largestIntervals[2];
this.largestIntervals.pop();
}
}
}

shouldTimeout(): boolean {
const now = Date.now();
if(this.start - now > this.maxTrueTime) {
return true;
}

return this.executionTime > this.maxExecutionTime;
}

resetOutlierCheck() {
this.largestIntervals = [];
}
}

class BatchingAssistant {
currentCost = Number.MIN_SAFE_INTEGER;
entries: SearchResult[] = [];
Expand Down Expand Up @@ -582,17 +710,22 @@ namespace correction {

let batcher = new BatchingAssistant();

const timer = new ExecutionTimer(maxTime*3, maxTime);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why maxTime*3 vs maxTime? Is that likely to be a constant multiplier? If so, we should build that into the class I should think.

Copy link
Contributor Author

@jahorton jahorton Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... because I needed to set the max true-system-time threshold to something larger than the max execution time, and something based on the max execution time seemed to make the most sense. I just picked '3' for the coefficient because there'd be nearly no chance of that being too short for the execution threshold to be exceeded.

Could probably be maxTime*2 pretty safely, and I could entertain arguments for other values. Maybe even 1.5; we don't expect heavy "thrashing" behavior from constant OS context switches.

The key interpretation: we're willing to wait up to 0.1 sec for predictions if execution gets delayed due to context switches, but for the common case we should only need to wait 0.033 sec (when no context switches occur). That 0.033 sec value has long been set via constant.

Using the * [coefficient] approach will dynamically adjust that maximum wait up if we're given more execution time, but for the common case, there should be no observable effect.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. The numbers are fairly off-the-cuff, but we need to start somewhere. Measuring and testing on real low-end Android devices may be helpful.


// Stage 1 - if we already have extracted results, build a queue just for them and iterate over it first.
let returnedValues = Object.values(this.returnedValues);
if(returnedValues.length > 0) {
let preprocessedQueue = new models.PriorityQueue<SearchNode>(QUEUE_NODE_COMPARATOR, returnedValues);

// Build batches of same-cost entries.
timer.startLoop();
while(preprocessedQueue.count > 0) {
let entry = preprocessedQueue.dequeue();
let batch = batcher.checkAndAdd(entry);
timer.markIteration();

if(batch) {
// Do not track yielded time.
yield batch;
}
}
Expand All @@ -601,22 +734,24 @@ namespace correction {
// finalize the last preprocessed group without issue.
let batch = batcher.tryFinalize();
if(batch) {
// Do not track yielded time.
yield batch;
}
}

// Stage 2: the fun part; actually searching!
timer.resetOutlierCheck();
timer.startLoop();
let timedOut = false;
do {
let newResult: PathResult;

// Search for a 'complete' path, skipping all partial paths as long as time remains.
do {
newResult = this.handleNextNode();
timer.markIteration();

// (Naive) timeout check!
let now = Date.now();
if(now - timeStart > maxTime) {
if(timer.shouldTimeout()) {
timedOut = true;
}
} while(!timedOut && newResult.type == 'intermediate')
Expand Down