Skip to content

Conversation

@afalabel
Copy link

@afalabel afalabel commented Jun 13, 2025

BEGINRELEASENOTES
*Resources
FIX: Condor command line call compatible with v24
ENDRELEASENOTES

@afalabel afalabel requested review from atsareg and fstagni as code owners June 13, 2025 11:51
@fstagni fstagni requested a review from aldbr June 13, 2025 12:10
Copy link
Contributor

@aldbr aldbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments:

  • https://github.com/DIRACGrid/DIRAC/actions/runs/15633872299/job/44045237809?pr=8225 is failing because the commit message should have the following form: "<type>(<scope>): <subject>". So something like: fix(resources): command line call compatible with v24 would work.

  • https://github.com/DIRACGrid/DIRAC/actions/runs/15633872302/job/44045237775?pr=8225 is failing because you did not install pre-commit locally. In your mamba environment, if you just run pre-commit install, then the files you change should be automatically formatted correctly and the CI pipeline should pass.

  • You should probably target the v8r0 branch: once merged, the fix will be automatically applied to integration (v9).

  • Is getCEStatus() really working?
    I have the feeling that there is no I or R to parse in the output.
    Shouldn't we just try to parse the Total for query line?

Please let me know if I can help! 🙂

@afalabel afalabel changed the title condor commandline call compatible with v24 fix (resources): condor commandline call compatible with v24 Jun 13, 2025
@afalabel afalabel force-pushed the condor_v24_compatibility branch from 9523177 to 1d94303 Compare June 13, 2025 16:01
@aldbr aldbr changed the base branch from integration to rel-v8r0 June 13, 2025 16:07
@aldbr aldbr changed the title fix (resources): condor commandline call compatible with v24 [8.0] fix (resources): condor commandline call compatible with v24 Jun 13, 2025
@fstagni
Copy link
Contributor

fstagni commented Jun 17, 2025

Can the -json option be used here? https://htcondor.readthedocs.io/en/latest/man-pages/condor_q.html#output

It looks like in theory it would be less error-prone than parsing the output.

@afalabel
Copy link
Author

Yes I have the code ready for that, I am waiting from @aldbr to test verify it and I'll push it.

@aldbr aldbr force-pushed the condor_v24_compatibility branch from 3f491bd to 69f1e7e Compare June 17, 2025 14:37
@aldbr aldbr changed the base branch from rel-v8r0 to integration June 17, 2025 14:37
@DIRACGridBot DIRACGridBot added the alsoTargeting:integration Cherry pick this PR to integration after merge label Jun 17, 2025
@aldbr aldbr force-pushed the condor_v24_compatibility branch from 69f1e7e to c5ca107 Compare June 17, 2025 14:38
@aldbr
Copy link
Contributor

aldbr commented Jun 17, 2025

Thanks for your work @afalabel
I've adjusted a few things based on what you have done:

  • get both the ClusterId and the ProcId to retrieve the job ID (<ClusterId>.<ProcId>)
  • get HoldReason/Code/SubCode: they only appear in the json if the JobStatus == 5, this is why you did not see the fields (I manually held one of the job to check).
  • increment waiting jobs if we find both JobStatus == 5 and HoldReasonCode == 16 (held jobs are not all waiting jobs)

And then because we changed parseCondorStatus (now named getCondorStatus), I needed to adapt HTCondorComputingElement too.
Because of that I changed the target to v9, just in case I am breaking something.

Now let me see if I can add a unit test or adapt the existing ones, and test the code with a few instances, just to make sure it works correctly.

Note: I also check if I could get the CE status for HTCondorComputingElement too, but this is still not possible (condor_q -totals returns the jobs of everyone when we use remote schedds, we would need to come with an owner name I think).

@aldbr aldbr force-pushed the condor_v24_compatibility branch 4 times, most recently from e10391c to 6089b2c Compare June 20, 2025 14:52
@aldbr aldbr force-pushed the condor_v24_compatibility branch from 6089b2c to ee50ef6 Compare June 20, 2025 15:37
@aldbr aldbr marked this pull request as draft June 23, 2025 08:10
@aldbr aldbr force-pushed the condor_v24_compatibility branch from ee50ef6 to a8c093a Compare June 24, 2025 14:19
@aldbr
Copy link
Contributor

aldbr commented Jun 24, 2025

I've tried to use the -spool option and conda_transfer_data command to avoid getting all the pilot outputs in the BatchOutput directory but it does not work as I expect (I don't manage to get any output at all).

As we have to clean up the files (not only the outputs but also the pilot wrappers) from time to time anyway, I think I will just add a cleanup method in the SSHComputingElement.
Probably a good opportunity to finish #7703

@aldbr aldbr marked this pull request as ready for review June 24, 2025 14:26
Copy link
Contributor

@andresailer andresailer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments.
Looks good to me otherwise

@aldbr
Copy link
Contributor

aldbr commented Jun 30, 2025

Thanks @andresailer, I've applied your suggestions.
I also added a commit to prevent pilots from being rescheduled (when there is no constraint from the admin side).
We tested it locally, it seems to work.

@fstagni fstagni merged commit b1686ff into DIRACGrid:integration Jun 30, 2025
44 of 48 checks passed
@DIRACGridBot DIRACGridBot added sweep:done All sweeping actions have been done for this PR sweep:failed Sweeping failed and needs manual intervention labels Jun 30, 2025
@DIRACGridBot
Copy link

Sweep summary

Sweep ran in https://github.com/DIRACGrid/DIRAC/actions/runs/15977484464

Failed:

  • integration
    cherry-pick b1686ff into integration failed
    check merge conflicts on a local copy of this repository
    git fetch upstream
    git checkout upstream/integration -b cherry-pick-2-b1686ff54-integration
    git cherry-pick -x -m 1 b1686ff54
    # Fix the conflicts
    git cherry-pick --continue
    git commit --amend -m 'sweep: #8225 fix (resources): condor commandline call compatible with v24' --author=''
    git push -u origin cherry-pick-2-b1686ff54-integration
    
    # If you have the GitHub CLI installed the PR can be made with
    gh pr create \
         --label 'sweep:from integration' \
         --base integration \
         --repo DIRACGrid/DIRAC \
         --title '[sweep:integration] fix (resources): condor commandline call compatible with v24' \
         --body 'Sweep #8225 `fix (resources): condor commandline call compatible with v24` to `integration`.
    
    Adding original author @afalabel as watcher.
    
    BEGINRELEASENOTES
    *Resources
    FIX: Condor command line call compatible with v24
    ENDRELEASENOTES
    Closes #8238'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

alsoTargeting:integration Cherry pick this PR to integration after merge sweep:done All sweeping actions have been done for this PR sweep:failed Sweeping failed and needs manual intervention

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants