-
Notifications
You must be signed in to change notification settings - Fork 845
Description
Recently in our CDN project, we found an issue when client do a continuous access from Nginx to ATS(both 5.3.2 and 6.x), for example like
'for i in {1..10}; do curl -v -H"host:test.nginx.com" "http://test.nginx.com" -x127.1:80 -o /dev/null; done'
"127.1:80" is Nginx interface which will be proxied to ATS, and then be proxied to final origin server. While origin server reply with different response and one of them is empty(content-length=0), then after this empty response replace the previous non-empty cache in ATS, client get stuck until download timeout.
We try to simulate the problem with the following simple website
from BaseHTTPServer import BaseHTTPRequestHandler
import random
a = 1
class TodoHandler(BaseHTTPRequestHandler):
protocol_version = 'HTTP/1.1'
def do_GET(self):
global a
#r = random.randint(0,1)
if a % 2 == 0:
r = 0
else:
r = 1
self.send_response(200)
##r = 0
if r == 0:
uu = "Not Found"
self.send_header("Connection", "keep-alive")
self.send_header("Content-type", "text/html")
self.send_header("Content-Length", str(len(uu)))
self.send_header("Cache-Control", "max-age=10")
self.send_header("Expires", "Thu, 01 Jan 1970 00:00:00 GMT")
self.send_header("Accept-Ranges", "bytes")
self.end_headers()
print(a, ": length", str(len(uu), r)
self.wfile.write(uu)
else:
#self.send_header("Content-type", "video/mp4")
self.send_header("Connection", "keep-alive")
self.send_header("Cache-Control", "max-age=10")
self.send_header("Content-Length", 0)
self.send_header("Accept-Ranges", "bytes")
self.send_header("Connection", "keep-alive")
self.end_headers()
print(a, ": length 0", r)
a += 1
if name == 'main':
# Start a simple server, and loop forever
from BaseHTTPServer import HTTPServer
server = HTTPServer(('0.0.0.0', 8081), TodoHandler)
print("Starting server, use to stop")
server.serve_forever()
After checking code and log, we found this zero length response will lead to a resident alternate rewritten
int CacheVC::updateVector(int /* event ATS_UNUSED /, Event * / e ATS_UNUSED */)
if (small_doc && have_res_alt && (fragment || (f.update && !total_len))) {
...
f.rewrite_resident_alt = 1;
write_len = doc->data_len();
Debug("cache_update_alt", "rewriting resident alt size: %d key: %X, first_key: %X", write_len, doc->key.slice32(0),
first_key.slice32(0));
}
but only replace response header in CacheHTTPInfoVector and still keep the previous content which is not as desired.
static int agg_copy(char *p, CacheVC *vc)
#ifdef HTTP_CACHE
if (vc->f.rewrite_resident_alt) {
ink_assert(vc->f.use_first_key);
Doc *res_doc = (Doc *)vc->first_buf->data();
res_alt_blk = new_IOBufferBlock(vc->first_buf, res_doc->data_len(), sizeofDoc + res_doc->hlen);
doc->key = res_doc->key;
doc->total_len = res_doc->data_len();
}
#endif
#ifdef HTTP_CACHE
if (vc->f.rewrite_resident_alt)
iobufferblock_memcpy(doc->data(), vc->write_len, res_alt_blk, 0);
else
#endif
As a result when found content-length(0) in header doesn't match the object length in cache , ATS make it a chunked reponse with previous content.
void HttpTransact::handle_content_length_header(State *s, HTTPHdr *header, HTTPHdr *base)
if ((int64_t)s->cache_info.object_read->object_size_get() == cl) {
s->hdr_info.trust_response_cl = true;
} else {
DebugTxn("http_trans", "Content Length header and cache object size mismatch."
"Disabling keep-alive");
s->hdr_info.trust_response_cl = false;
}
Then this transform stuck at the user agent [VC_EVENT_WRITE_READY] until timeout finally, and couldn't serve next request.
Jun 22 14:08:09.805] Server {0x7f8895a28800} DEBUG: (http_chunk) creating a chunk of size 9 bytes
[Jun 22 14:08:09.805] Server {0x7f8895a28800} DEBUG: (http_redirect) [HttpTunnel::producer_handler] enable_redirection: [1 0 0] event: 102
[Jun 22 14:08:09.805] Server {0x7f8895a28800} DEBUG: (http) [3] [&HttpSM::tunnel_handler_cache_read, VC_EVENT_READ_COMPLETE]
[Jun 22 14:08:09.805] Server {0x7f8895a28800} DEBUG: (cache_close) do_io_close 0x7f887f9c0000 -1 1
[Jun 22 14:08:09.805] Server {0x7f8895a28800} DEBUG: (cache_free) free 0x7f887f9c0000
[Jun 22 14:08:09.805] Server {0x7f8895a28800} DEBUG: (http_tunnel) [3] consumer_handler [user agent VC_EVENT_WRITE_READY]
Now we simply modify agg_copy
from
if (vc->f.update && vc->total_len) {
CacheHTTPInfo *http_info = vc->write_vector->get(vc->alternate_index);
http_info->object_size_set(vc->total_len);
}
to
if (vc->f.update) {
CacheHTTPInfo *http_info = vc->write_vector->get(vc->alternate_index);
http_info->object_size_set(vc->total_len);
}
which keep object size as received one to make it work but don't know if already patched and why this stuck happen.
pls refer to attached log for more info.