Description
Regarding branch: https://github.com/apache/trafficserver/tree/cb7eda60dc8c2068afd458fd20a8dd6232887f0d
I notice that the nexthop failure count never increments beyond 2 after a
network timeout/event.
With a failure threshold of 10, every request that lands on this nexthop
incurs a 2s timeout before moving on to the next nexthop in the list,
because the threshold is never reached in that nexthop's failure tracking.
Failure counts using parent_select work as expected, so the failing
parent is taken out of rotation per the parent retry timer/window (300).
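To make the expected behavior concrete, here is a minimal sketch of threshold-based failure tracking. It is not the ATS implementation: `HostHealth`, `mark_down`, and `is_available` are hypothetical names, and the counting rules (reset the count once the retry window has passed, take the host out of rotation once the count reaches the threshold) are assumptions drawn from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostHealth:
    """Hypothetical per-host failure record (not the ATS data structure)."""
    fail_count: int = 0
    window_start: Optional[float] = None  # time of first failure in the current window
    down_since: Optional[float] = None    # set once the threshold is reached


def mark_down(host: HostHealth, now: float,
              fail_threshold: int = 10, retry_window: float = 300.0) -> None:
    """Record one failure; assumed rules, see the lead-in."""
    if host.window_start is None or now - host.window_start > retry_window:
        # First failure, or the previous window expired: start counting again.
        host.window_start = now
        host.fail_count = 1
    else:
        host.fail_count += 1
    if host.fail_count >= fail_threshold:
        # Threshold reached: the host leaves rotation from this moment.
        host.down_since = now


def is_available(host: HostHealth, now: float, retry_window: float = 300.0) -> bool:
    """A host is usable if it was never marked down or the retry window has elapsed."""
    return host.down_since is None or now - host.down_since >= retry_window


# Ten failures roughly 2.5s apart, similar to the spacing in the logs below.
host = HostHealth()
for i in range(10):
    mark_down(host, now=i * 2.5)
print(host.fail_count, is_available(host, now=25.0))  # prints "10 False"
```

With a count that stays at 2, as reported for next_hop, the `fail_count >= fail_threshold` branch is never taken and the host stays in rotation.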
The test case and other testing parameters are in the comments below.
The debug output below compares next_hop and parent_select:
parent_select increments the failure count and next_hop does not.
next_hop
[Apr 23 01:49:27.883] [ET_NET 10] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [75] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 01:49:30.308] [ET_NET 11] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [76] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 01:49:32.366] [ET_NET 11] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [76] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 01:49:34.481] [ET_NET 12] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [77] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 01:49:37.474] [ET_NET 12] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [77] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 01:49:39.596] [ET_NET 13] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [78] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 01:49:41.599] [ET_NET 13] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [78] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 01:49:44.439] [ET_NET 14] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [79] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 01:49:47.434] [ET_NET 14] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [79] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 01:49:49.633] [ET_NET 15] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [80] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 01:49:52.639] [ET_NET 15] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [80] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 01:49:55.474] [ET_NET 16] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [81] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 01:49:58.468] [ET_NET 16] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [81] Parent fail count increased to 2 for
192.168.72.208
parent_select
[Apr 23 02:11:12.507] [ET_NET 15] DEBUG:
<ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select)
Parent fail count increased to 2 for 192.168.72.208:80
[Apr 23 02:11:14.595] [ET_NET 16] DEBUG:
<ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select)
Parent fail count increased to 3 for 192.168.72.208:80
[Apr 23 02:11:17.589] [ET_NET 17] DEBUG:
<ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select)
Parent fail count increased to 4 for 192.168.72.208:80
[Apr 23 02:11:20.435] [ET_NET 18] DEBUG:
<ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select)
Parent fail count increased to 5 for 192.168.72.208:80
[Apr 23 02:11:22.602] [ET_NET 19] DEBUG:
<ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select)
Parent fail count increased to 6 for 192.168.72.208:80
[Apr 23 02:11:25.587] [ET_NET 0] DEBUG: <ParentSelectionStrategy.cc:86
(markParentDown)> (parent_select) Parent fail count increased to 7 for
192.168.72.208:80
[Apr 23 02:11:28.353] [ET_NET 1] DEBUG: <ParentSelectionStrategy.cc:86
(markParentDown)> (parent_select) Parent fail count increased to 8 for
192.168.72.208:80
[Apr 23 02:11:30.795] [ET_NET 2] DEBUG: <ParentSelectionStrategy.cc:86
(markParentDown)> (parent_select) Parent fail count increased to 9 for
192.168.72.208:80
[Apr 23 02:11:33.758] [ET_NET 3] DEBUG: <ParentSelectionStrategy.cc:86
(markParentDown)> (parent_select) Parent fail count increased to 10
for 192.168.72.208:80
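
For anyone reproducing the comparison, the following is a small sketch that pulls the fail-count sequence per debug tag and host out of a debug log. The regular expression is derived only from the log lines shown in this report, so treat it as an assumption rather than a guaranteed log format; `fail_counts` is a hypothetical helper name. It tolerates the line wrapping seen above.

```python
import re
from collections import defaultdict

# Matches "(next_hop) [75] Parent fail count increased to 2 for <host>" and the
# parent_select variant without the "[id]" field; \s+ absorbs wrapped lines.
PATTERN = re.compile(
    r"\((next_hop|parent_select)\)\s+(?:\[\d+\]\s+)?"
    r"Parent\s+fail\s+count\s+increased\s+to\s+(\d+)\s+for\s+(\S+)"
)


def fail_counts(log_text):
    """Return {(debug_tag, host): [count, count, ...]} in log order."""
    counts = defaultdict(list)
    for tag, count, host in PATTERN.findall(log_text):
        counts[(tag, host)].append(int(count))
    return dict(counts)


# Sample copied from the output above (two entries per tag).
sample = """\
[Apr 23 01:49:27.883] [ET_NET 10] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [75] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 01:49:30.308] [ET_NET 11] DEBUG: <NextHopHealthStatus.cc:139
(markNextHop)> (next_hop) [76] Parent fail count increased to 2 for
192.168.72.208
[Apr 23 02:11:12.507] [ET_NET 15] DEBUG:
<ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select)
Parent fail count increased to 2 for 192.168.72.208:80
[Apr 23 02:11:14.595] [ET_NET 16] DEBUG:
<ParentSelectionStrategy.cc:86 (markParentDown)> (parent_select)
Parent fail count increased to 3 for 192.168.72.208:80
"""

for key, seq in fail_counts(sample).items():
    print(key, seq)
```

A sequence that stays flat (for example `[2, 2]`) indicates the counter is not advancing, while a rising sequence (`[2, 3]`) is the expected behavior.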