Skip to content

Make ensureConfiguration give up if cache is not synced#2

Open
MikeSpreitzer wants to merge 1 commit intostevekuznetsov:skuznets/generic-storagefrom
MikeSpreitzer:add-apf-sync
Open

Make ensureConfiguration give up if cache is not synced#2
MikeSpreitzer wants to merge 1 commit intostevekuznetsov:skuznets/generic-storagefrom
MikeSpreitzer:add-apf-sync

Conversation

@MikeSpreitzer
Copy link

What this PR does / why we need it:
When WATCH notification latency is high, this change replaces rapid
repeates of the create with going around the outer loop (which has a
delay in it).

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:

Release note:

NONE

When WATCH notification latency is high, this change replaces rapid
repeates of the create with going around the outer loop (which has a
delay in it).
@MikeSpreitzer
Copy link
Author

Tested with -v=3 on the kube-apiserver. Looks good. Saw

I0517 01:16:15.005924      11 strategy.go:239] "Successfully created PriorityLevelConfiguration" type="suggested" name="system"

...

E0517 01:16:15.735774      11 crdb.go:154] "Exec" err="ERROR: duplicate key value violates unique constraint \"primary\" (SQLSTATE 23505)" sql="INSERT INTO k8s (key, value) VALUES ($1, $2);" args=[/registry/prioritylevelconfigurations/system 6b3873000a420a24666c6f77636f6e74726f6c2e6170697365727665722e6b38732e696f2f76316265746131121a5072696f726974794c6576656c436f6e6669 (truncated 559 bytes)] PGX_LOG_LEVEL="error"

I0517 01:16:15.759861      11 httplog.go:131] "HTTP" verb="POST" URI="/apis/flowcontrol.apiserver.k8s.io/v1beta2/prioritylevelconfigurations?fieldManager=api-priority-and-fairness-config-producer-v1" latency="28.665104ms" userAgent="kube-apiserver/v1.24.0 (linux/amd64) kubernetes/37fdf6a" audit-ID="51f6c29b-5bd6-4a7d-b5d9-c8e33cee6371" srcIP="127.0.0.1:41638" apf_pl="exempt" apf_fs="exempt" apf_execution_time="28.346844ms" resp=409

E0517 01:16:15.760633      11 storage_flowcontrol.go:167] "APF bootstrap ensurer ran into error, will retry later" err="failed ensuring suggested settings - aborting ensure() due to lack of cache sync for PriorityLevelConfiguration type=suggested name=\"system\""

... and no repeats of the sql error in the following 5 minutes.

@MikeSpreitzer
Copy link
Author

See also kubernetes#107696 and kubernetes#107704 .

@tkashem

This is probably the first of many code improvements to come from lengthening the WATCH notification latency. IMHO, correct code is not sensitive to that latency --- but I bet that we have some incorrect code. Good code will not be very noisy when that latency gets long. I am sure we have more code that is not yet up to that standard of goodness.

stevekuznetsov pushed a commit that referenced this pull request Nov 6, 2023
Signed-off-by: Nilekh Chaudhari <1626598+nilekhc@users.noreply.github.com>

fix (#2)

Signed-off-by: Monis Khan <mok@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant