Error while getting statistics logfile index from webserver - only on one monitored host

22 August 2017, 17:52
Every three days, most often at around 18:00 local time (with a few exceptions), Monitor fails to retrieve the logfile index from the same one of my eight Hiawatha hosts. I've grepped through all the webserver logs on both the Hiawatha target (v1.6) and the Monitor (v1.5) server, but can't find anything that correlates with those timestamps, and there aren't any system cron jobs running at those times. It's a WordPress site, so there are wp-cron.php jobs running, but they don't consistently line up with the Monitor error timestamps.

It only started happening a few weeks ago, but it took me a while to recognize a pattern, so I can't isolate what changed in the environment. As far as I can tell, there's no impact on performance, but it's a recurring annoyance that's bugging me. Can you give me a nudge in the right direction?
Hugo Leisink
25 August 2017, 16:02
If it happens around the same time, try a manual connection from the Monitor server to that specific webserver. It could be a network issue.
25 August 2017, 18:13
Network congestion causing timeouts, you mean? Both Monitor and the target are virtual containers, and live on the same physical and virtual host, so network connections between the two are entirely virtual. I've configured Monitor to connect directly to the target IP, rather than going out and connecting to the DNS-resolved IP (which is a web-application firewall/reverse proxy). Latency is essentially non-existent. I guess there could be some job on another host that's burping out a lot of data, though. I'll see if I can rig up some kind of diagnostic for the next expected failure window.
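One possible diagnostic for that failure window, as a minimal sketch assuming Python 3 is available on the Monitor container: it times a bare TCP connect and then a full HTTP GET against the target and prints one timestamped line per attempt, so a failure can be lined up against the Monitor error. TARGET_HOST, TARGET_PORT, PROBE_URL and the interval/timeout values are hypothetical placeholders, not Hiawatha or Monitor settings; point PROBE_URL at any page the target actually serves.

```python
import http.client
import socket
import time
import urllib.request
from datetime import datetime

# Hypothetical placeholders: substitute the target container's real IP/port and a
# URL it actually serves. These are NOT Hiawatha Monitor configuration values.
TARGET_HOST = "192.0.2.10"
TARGET_PORT = 80
PROBE_URL = f"http://{TARGET_HOST}:{TARGET_PORT}/"
INTERVAL_SECONDS = 30
TIMEOUT_SECONDS = 5

# Empty ProxyHandler: ignore any http_proxy in the environment and connect directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe_once(host, port, url, timeout):
    """Time a bare TCP connect, then a full HTTP GET; return one log line."""
    stamp = datetime.now().isoformat(timespec="seconds")
    try:
        start = time.monotonic()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connected; only the handshake time is measured here
        connect_ms = (time.monotonic() - start) * 1000

        start = time.monotonic()
        with _OPENER.open(url, timeout=timeout) as resp:
            resp.read()  # drain the body so a slow transfer is timed too
            status = resp.status
        total_ms = (time.monotonic() - start) * 1000
        return f"{stamp} ok connect={connect_ms:.1f}ms http={status} total={total_ms:.1f}ms"
    except (OSError, http.client.HTTPException) as exc:
        # OSError covers refused/reset connections, timeouts and HTTP 4xx/5xx
        # (urllib's HTTPError subclasses it); HTTPException covers bad responses.
        return f"{stamp} FAIL {type(exc).__name__}: {exc}"


def run(host=TARGET_HOST, port=TARGET_PORT, url=PROBE_URL,
        interval=INTERVAL_SECONDS, timeout=TIMEOUT_SECONDS, iterations=None):
    """Probe repeatedly; iterations=None means run until interrupted."""
    count = 0
    while iterations is None or count < iterations:
        print(probe_once(host, port, url, timeout), flush=True)
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval)
```

Calling `run()` on the Monitor container a little before 18:00 on the expected day, with its output redirected to a file, should show whether the TCP connect, the HTTP response, or neither misbehaves when Monitor logs the error. The proxy handler is deliberately empty so the probe, like Monitor in this setup, goes straight to the container IP rather than through any proxy configured in the environment.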

