Commit 58d27ae
gRPC: Allow retries of up to MAX_MSG_SIZE (#347)
## Problem
gRPC has a built-in retry mechanism[1], which we configure to
automatically retry on `UNAVAILABLE` status responses from Pinecone.
However, we have observed that the `VectorService/Upsert` method is _not_
retried automatically, which causes an exception to be thrown to the
application:
```
Traceback (most recent call last):
  File ".venv/lib/python3.11/site-packages/pinecone/grpc/base.py", line 150, in wrapped
    return func(
           ^^^^^
  File ".venv/lib64/python3.11/site-packages/grpc/_channel.py", line 1181, in __call__
    return _end_unary_response_blocking(state, call, False, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib64/python3.11/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "unavailable"
	debug_error_string = "UNKNOWN:Error received from peer ipv4:34.223.120.220:443 {created_time:"2024-05-10T11:54:43.047741403+00:00", grpc_status:14, grpc_message:"unavailable"}"
```
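For context, a retry-on-`UNAVAILABLE` policy of the kind described above is usually expressed as a gRPC service config passed as a channel option. The sketch below is illustrative only: the helper name `make_retry_service_config` and the backoff values are assumptions, not the Pinecone client's actual settings, which this commit does not show.

```python
import json


def make_retry_service_config(max_attempts: int = 5) -> str:
    """Build a hypothetical gRPC service config that retries UNAVAILABLE."""
    config = {
        "methodConfig": [
            {
                # An empty name entry makes this the default for every method.
                "name": [{}],
                "retryPolicy": {
                    # gRPC treats values above 5 as 5.
                    "maxAttempts": max_attempts,
                    "initialBackoff": "0.1s",
                    "maxBackoff": "1s",
                    "backoffMultiplier": 2,
                    "retryableStatusCodes": ["UNAVAILABLE"],
                },
            }
        ]
    }
    return json.dumps(config)


# Channel options as (name, value) pairs, ready for grpc.secure_channel(..., options=...).
options = [
    ("grpc.enable_retries", 1),
    ("grpc.service_config", make_retry_service_config()),
]
```

Note that this policy alone is not enough: as the rest of this description shows, gRPC also has to be able to buffer the request in order to replay it.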
Enabling gRPC's tracing[2] by setting the environment variables
`GRPC_VERBOSITY=debug GRPC_TRACE=all` (warning: this is _very_ verbose!)
showed that when we do get a `StatusCode.UNAVAILABLE`, a retry is not
considered because the request is too large ("committing" in this context
means it effectively disables retry attempts):
```
0514 14:00:43.870499051 4093173 retry_filter_legacy_call_data.cc:1855] chand=0x7ff708006080 calld=0x56377b0b11e0: exceeded retry buffer size, committing
```
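A minimal way to reproduce that trace from Python, assuming the variables are set before gRPC initializes (exporting them in the shell works equally well). The helper `enable_grpc_tracing` is hypothetical and not part of grpcio or the Pinecone client.

```python
import os
import sys
import warnings


def enable_grpc_tracing(tracers: str = "all") -> None:
    """Turn on gRPC core debug logging (very verbose)."""
    if "grpc" in sys.modules:
        # gRPC core reads these variables when it initializes, so setting
        # them after `import grpc` may have no effect.
        warnings.warn("grpc already imported; tracing settings may be ignored")
    os.environ["GRPC_VERBOSITY"] = "debug"
    os.environ["GRPC_TRACE"] = tracers


# Call this before `import grpc` (or before importing pinecone.grpc).
enable_grpc_tracing()
```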
Per gRPC's channel argument definitions[3], the retry buffer size is
controlled by:
```c
/** Per-RPC retry buffer size, in bytes. Default is 256 KiB. */
#define GRPC_ARG_PER_RPC_RETRY_BUFFER_SIZE "grpc.per_rpc_retry_buffer_size"
```
Because Upsert messages are frequently larger than 256 KiB (it is common
to batch up to the 2 MB limit), any batch larger than 256 KiB will not be
retried.
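The size check can be illustrated with a simplified model: once the bytes gRPC has buffered for replay exceed the retry buffer size, the call is committed and no further attempts are made. The function below is a hypothetical sketch that ignores metadata overhead and treats the 2 MB batch limit as 2 MiB, both of which are assumptions.

```python
KIB = 1024
MIB = 1024 * KIB

# gRPC's default per-RPC retry buffer size (256 KiB).
DEFAULT_RETRY_BUFFER = 256 * KIB


def is_retryable_by_size(request_bytes: int, buffer_size: int = DEFAULT_RETRY_BUFFER) -> bool:
    """Simplified model: a request can only be replayed if it fits in the buffer."""
    return request_bytes <= buffer_size


print(is_retryable_by_size(2 * MIB))                         # False: 8x the default buffer
print(is_retryable_by_size(2 * MIB, buffer_size=128 * MIB))  # True once the buffer is raised
```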
## Solution
Address this by raising the retry buffer size to the maximum message size
we support (currently 128 MB, which is more than sufficient to retry any
`UpsertRequest`).
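A sketch of the resulting channel options, assuming a constant named `MAX_MSG_SIZE` (taken from the commit title) equal to 128 MiB. The helper `channel_options` is hypothetical, and the actual one-line change in this commit is not reproduced here.

```python
from typing import List, Tuple

# Hypothetical constant mirroring the commit title; 128 MiB assumed.
MAX_MSG_SIZE = 128 * 1024 * 1024


def channel_options(max_msg_size: int = MAX_MSG_SIZE) -> List[Tuple[str, int]]:
    """Channel options that keep the retry buffer in step with the message limit."""
    return [
        ("grpc.max_send_message_length", max_msg_size),
        ("grpc.max_receive_message_length", max_msg_size),
        # The fix: let gRPC buffer (and therefore retry) any request up to
        # the largest message we allow, instead of the 256 KiB default.
        ("grpc.per_rpc_retry_buffer_size", max_msg_size),
    ]
```

These options would be passed as `grpc.secure_channel(target, grpc.ssl_channel_credentials(), options=channel_options())`. Tying the buffer to the message limit means any request the channel accepts can also be replayed, at the cost of holding up to that many bytes per in-flight RPC.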
[1]: https://grpc.io/docs/guides/retry/
[2]: https://github.com/grpc/grpc/blob/master/doc/environment_variables.md
[3]: https://github.com/grpc/grpc/blob/befeeba0f57c6ed3608935d8317fd26289e7e080/include/grpc/impl/channel_arg_names.h#L321
## Type of Change
- [x] Bug fix (non-breaking change which fixes an issue)
## Test Plan
There is no existing test infrastructure to automate testing of this (no
way to inject errors). Manually verified that the previously seen
(intermittent) `UNAVAILABLE` responses are now correctly retried.

Parent commit: a4a136a
1 file changed (+1, -0).