Skip to content

Conversation

@jahorton
Copy link
Contributor

@jahorton jahorton commented Aug 4, 2022

Cleans up some code that's proving problematic for some of our unit tests; there's some cross-UI-module pollution happening on occasion that turns into a race condition of sorts.

@keymanapp-test-bot skip

@jahorton jahorton added this to the A16S7 milestone Aug 4, 2022
@keymanapp-test-bot keymanapp-test-bot bot added the user-test-missing User tests have not yet been defined for the PR label Aug 4, 2022
@keymanapp-test-bot
Copy link

keymanapp-test-bot bot commented Aug 4, 2022

@keymanapp-test-bot keymanapp-test-bot bot removed the user-test-missing User tests have not yet been defined for the PR label Aug 4, 2022
@jahorton jahorton marked this pull request as ready for review August 4, 2022 08:25
Copy link
Member

@mcdurdin mcdurdin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any real problems, but I am not sure why the loaduserinterface callback was removed?

{
window.setTimeout(ui.Initialize,250); return;
if(!keymanweb['initialized']) {
ui.initTimer = window.setTimeout(ui.Initialize,250);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this has a timeout of 250 whereas the other init has a timeout of 50. I can't see any good reason to keep this at 250. I reckon 50 is better -- polling every 50msec until init won't be very expensive.

Copy link
Contributor Author

@jahorton jahorton Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just re-read this; it looks like you're thinking of setInterval, rather than setTimeout. setInterval does polling; setTimeout is a one-time check.

I could go the setInterval route, but then I'd need to ignore your comments below about where to place clearTimeout - we should only clear it once we're confirmed to be initializing.

All of that said, I think we can probably just remove the timeout stuff altogether and be done with it; the Toggle and Toolbar UI modules don't set one and seem to load just as well. I just... was going for the less drastic change here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imagine this is called twice before keymanweb.initialized. This would setup two timeouts -- we'd lose the id for the first timeout. (e.g. on the third call, keymanweb.initialized==true. First timeout would not be cancelled, so we'd get a second init call. It may not matter, but it isn't tidy.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so that's a gap in the code as it sits, but that doesn't exactly answer the "polling" / setInterval vs setTimeout question. Unless you were thinking of the 'second timeout' as something akin to a 'polling' call - but that feels like a bit much of an inference to make without clearer confirmation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking of setTimeout in the circumstance where this init function is called multiple times. Not setInterval. If you do this, you'll get two calls to bar():

  function bar() {
    console.log('bar');
  }
  function foo() {
    window.setTimeout(bar, 100);
  }
  foo();
  foo();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

polling every 50msec until init won't be very expensive.

Consider that this currently sets up a timeout every 250msec. So, it's effectively 'polling'.

"} ";

// Initialize after KMW is fully initialized, if UI already loaded
keymanweb['addEventListener']('loaduserinterface',ui.Initialize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for removing this?

Copy link
Contributor Author

@jahorton jahorton Aug 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd tried a project-wide search for the event and turned up nothing. Probably shouldn't have tried the search with quotes, though - the related events are defined at the bottom of kmwuimanager.ts. Interestingly, it doesn't look like eliminating 'em caused any issues - and the functions that trigger them are even marked with an interesting tidbit:

/**
* Function doLoad
* Scope Private
* @return {boolean}
* Description Execute UI initialization code after loading the UI
* // Appears to be unused; could be eliminated? Though, doUnload IS used.
*/
doLoad() {
var p={};
return this.keyman.util.callEvent('kmw.loaduserinterface',p);
}

Note line 49 there. That comment does seem to ring true - a project-wide search for doLoad( turns up... nothing else aside from a function in Developer's Pascal code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, if that's the case, can we remove doLoad() as well so we don't have this vestigial code hanging around?

@jahorton
Copy link
Contributor Author

jahorton commented Aug 4, 2022

... and the child PR (#7037) had a test failure that I thought this one would've fixed. Looks like there may be another aspect or two to investigate.

@jahorton jahorton marked this pull request as draft August 4, 2022 08:32
@jahorton
Copy link
Contributor Author

jahorton commented Aug 5, 2022

With the latest commit, I've re-run the test suite multiple times and no longer see the test failure that sent this back to draft. 4 successes on one build agent, 2 on another.

I'm trying to verify this for the build agent (pp-602) that actually produced that error as well, but it's having trouble actually getting the full test suite to run at this time.

@mcdurdin mcdurdin modified the milestones: A16S7, A16S8 Aug 7, 2022
@jahorton
Copy link
Contributor Author

jahorton commented Aug 8, 2022

... Huh. Build agent pp-602 is almost-consistently failing to support a BrowserStack connection to a Safari test environment target. The other build agents seem fine; it's just pp-602 that's having this problem. It just... won't "capture" in time, according to the logs. And mostly for that one test-environment at that.

@jahorton jahorton marked this pull request as ready for review August 8, 2022 02:51
@mcdurdin
Copy link
Member

mcdurdin commented Aug 8, 2022

... Huh. Build agent pp-602 is almost-consistently failing to support a BrowserStack connection to a Safari test environment target. The other build agents seem fine; it's just pp-602 that's having this problem. It just... won't "capture" in time, according to the logs. And mostly for that one test-environment at that.

I've just checked the browserstack-local executable version on both Windows agents and they appear to be identical. That said, the version available for download from browserstack.com differs (it's a different file size even though the file version is the same?!)

I can back up the existing executables on the agents and replace with the new version from the downloads site. Don't know if that will make a difference or not, of course; suspect not.

UPDATE: we have version 8.6 (metadata on the executable is wrong -- reported to BrowserStack). The latest version is 8.7. Given that, I will update the version on the build agents once my test builds finish running.


Here's what I am seeing for the web test build history. I am not seeing the same correlation that you report. It's possible that there are odd real failing tests in this log but most of these builds have no web changes.

image

And the same log for LM layer. Again, I don't see a real correlation between agent and failing build.

image

@mcdurdin
Copy link
Member

mcdurdin commented Aug 8, 2022

I'm going to re-run the build for the latest commit on this PR on all agents and see what emerges. Results:

LMLayer:

Copy link
Member

@mcdurdin mcdurdin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking pretty good. Just a bit more polish on the timer cleanup needed.

Curious that the toolbar ui doesn't have the same initialization pattern.

Looking at the UI code, I think it needs some serious TLC one day!

Comment on lines 293 to 297
// While the `ui.init` check above will prevent the timeout's call from
// re-evaluating _this instance_'s method, it can have nasty cross-effects
// in unit testing if not cleared, calling this modules' `Initialize`
// _on a different module_!
window.clearTimeout(ui.initTimer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be at the top of the function and we should not call clearTimeout if ui.initTimer === null:

  if(ui.initTimer !== null) {
    window.clearTimeout(ui.initTimer);
    ui.initTimer = null;
  }

"} ";

// Initialize after KMW is fully initialized, if UI already loaded
keymanweb['addEventListener']('loaduserinterface',ui.Initialize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, if that's the case, can we remove doLoad() as well so we don't have this vestigial code hanging around?

}

if(ui.initialized || util['isTouchDevice']()) return;
window.clearTimeout(ui.initTimer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be at the top of the function, like this:

  if(ui.initTimer !== null) {
    window.clearTimeout(ui.initTimer);
    ui.initTimer = null;
  }

@mcdurdin
Copy link
Member

mcdurdin commented Aug 8, 2022

UPDATE: we have version 8.6 (metadata on the executable is wrong -- reported to BrowserStack). The latest version is 8.7. Given that, I will update the version on the build agents once my test builds finish running.

Version 8.7 made no difference. So ... pp_602 has a slightly different performance profile than s1_601, so perhaps there's still another race somewhere?

@jahorton
Copy link
Contributor Author

jahorton commented Aug 9, 2022

Here's what I am seeing for the web test build history. I am not seeing the same correlation that you report. It's possible that there are odd real failing tests in this log but most of these builds have no web changes.

[omitted image]

And the same log for LM layer. Again, I don't see a real correlation between agent and failing build.

[omitted image]

Unfortunately, those images are irrelevant. Please remember that this PR is making other changes that fix issues underlying what you've observed here.

In my test runs so far, this PR fully stabilizes KMW test builds for all agents but pp-602 - when you include its changes. The other branches in the images above don't have these "other changes", so of course there are still problems on other agents there - this PR's fixes haven't landed yet!

I'm going to re-run the build for the latest commit on this PR on all agents and see what emerges. Results:

LMLayer:

That looks like the "same correlation" to me. Note the lower "tests passed" number (and in the case of the first set, the lower 'ignored' number) and the fact that no tests were marked as failing. Those are the hallmark symptoms of that BrowserStack "failure to capture" problem.

@jahorton
Copy link
Contributor Author

jahorton commented Aug 9, 2022

UPDATE: we have version 8.6 (metadata on the executable is wrong -- reported to BrowserStack). The latest version is 8.7. Given that, I will update the version on the build agents once my test builds finish running.

Version 8.7 made no difference. So ... pp_602 has a slightly different performance profile than s1_601, so perhaps there's still another race somewhere?

Even if "we" do, it's wholly external to our code. Our unit tests aren't even "spun up" at that level.

@mcdurdin
Copy link
Member

mcdurdin commented Aug 9, 2022

Even if "we" do, it's wholly external to our code. Our unit tests aren't even "spun up" at that level.

Righto. Where we are at now, we still have unstable tests. That's not... great. Any suggestions on where to go now? pp-602 and s1-601 are essentially identical VMs in the same datacenter. So it's hard to see what could be causing this. I mean, I can temporarily disable KeymanWeb and Common-LMLayer building on pp-602 but that's going to slow down builds and obviously isn't really a 'solution' so much as a bandaid.

Copy link
Member

@mcdurdin mcdurdin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just being tidy now

Co-authored-by: Marc Durdin <marc@durdin.net>
@jahorton jahorton merged commit be77d55 into master Aug 16, 2022
@jahorton jahorton deleted the fix/web/ui-module-test-stability branch August 16, 2022 02:56
@keyman-server
Copy link
Collaborator

Changes in this pull request will be available for download in Keyman version 16.0.48-alpha

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants