@YoruStar YoruStar commented Dec 2, 2025

…l handling


When I run the ksh sigchld.sh test, I occasionally hit this error:

sigchld.sh[77]: FAIL: SIGCHLD trap queueing failed -- expected 'running=0 maxrunning=4', got 'running=1 maxrunning=4'

as described in #344 (comment).

I reduced the reproduction to the following script:

#!/bin/ksh

integer count=1
while true
do
    jobmax=4
    got=$(ksh -c '
        JOBMAX='$jobmax' JOBCOUNT=$(('$jobmax'*2))
        integer running=0 maxrunning=0
        trap "((running--))" CHLD
        for ((i=0; i<JOBCOUNT; i++))
        do	sleep 1 &
            if	((++running > maxrunning))
            then	((maxrunning=running))
            fi
        done
        wait
        sleep 1
        print running=$running maxrunning=$maxrunning
    ')
    exp='running=0 maxrunning='$jobmax
    echo "when $count, expected '$exp', got '$got'"
    ((count++))
    [[ $got == $exp ]] || exit 1
done

Without the sleep 1 after wait, the script hits the same error after a few hundred iterations. With the sleep 1 in place, I have run it tens of thousands of times in my local environment without seeing the error.

When wait returns, the CHLD trap has not necessarily run yet for every child that exited, so running can still be nonzero at that point. This error is therefore not a design flaw in ksh itself, but an issue with the test case design. Adding an appropriate delay after wait prevents the test case from failing this way.

@McDutchie

Thanks, YoruStar, that's really helpful. That intermittent failure has been a long-term annoyance.

Can we test whether a shorter delay, like sleep .1 (one-tenth of a second), does the trick?

I want to avoid slowing down the regression tests as much as possible. The slower they get, the less likely I am to run them regularly.

@YoruStar
Author

YoruStar commented Dec 4, 2025

I changed the delay logic to poll: it now waits 0.1 seconds at a time and checks whether running is 0, up to a maximum total delay of 1 second.
At least in my local environment, a 1-second upper bound is enough for this test case to pass.
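
The changed test code is not quoted in this conversation, so the following is only a minimal sketch of the polling delay described above, not the actual patch. The function name wait_for_traps and its tries argument are hypothetical. It assumes the counter is the global variable running from the reproduction script, and that sleep accepts fractional seconds (as the ksh93 built-in and common external sleep commands do).

```shell
# Hypothetical helper, not taken from the patch: poll until the CHLD trap
# has brought the global counter 'running' down to 0.
# $1 = maximum number of 0.1 s polls (default 10, about 1 second in total).
wait_for_traps()
{
    tries=${1:-10}
    while [ "$running" -ne 0 ] && [ "$tries" -gt 0 ]
    do
        sleep 0.1               # give pending CHLD traps time to run
        tries=$((tries - 1))
    done
    # Succeed only if every job has been accounted for.
    [ "$running" -eq 0 ]
}
```

In the reproduction script, a call such as wait_for_traps would take the place of the fixed sleep 1 after wait. In the common case it returns after one or two polls, and the full 1-second cost is only paid when the traps are unusually late.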
