All services fail and restart every five minutes, and how I fixed it.




Posted by Johnny Cache, 03-05-2010, 08:15 AM
I don't have a single clue as to why this happened, but at precisely 2:00 AM Pacific, and every five minutes after that, all of the daemons on my cPanel failover cluster would stop and restart in under a second. I've attached a screenshot of all the failure messages I received during that time: http://nwtechgroup.com/WHT/failed-services.png

The fact that the services stopped every five minutes made me instantly think of cron. During this flood of emails, I received an interesting email from cron that I'd never received until today, indicating that this job had been run:

When I ran crontab -l, it was evident that the job wasn't being invoked every five minutes, and comparing the crontab against one of my standalone cPanel machines turned up no anomalies. Anyhow, /scripts/upcp --force took care of the issue; all of my services now run without interruption.

Has anyone else noticed this? I'm wondering if a cPanel auto update went awry along the way. There was nothing indicative of a catastrophic failure in /var/log/messages, only that most of the daemons were being restarted every five minutes. My /scripts/upcp output looked odd today as well: http://nwtechgroup.com/WHT/upcp--force.txt

Following /scripts/upcp --force, I checked /var/cpanel/updatelogs/ and found another log that was generated one minute after upcp finished, which I exported to a text file: http://nwtechgroup.com/WHT/update.12...ostinstall.txt

It looks (to me) like several key components were missing and had to be rebuilt. When you scroll through and read it, it almost seems like cPanel was trying to install itself, as if this were a vanilla whitebox server with no GUI. Can anyone else think of anything?
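For anyone wanting to repeat the crontab comparison described above, here is a minimal Python sketch, assuming the output of crontab -l from each machine has already been saved as text. The function name crontab_diff and both sample crontabs are made up for illustration; they are not the actual entries from these servers.

```python
import difflib


def crontab_diff(suspect, reference):
    """Return unified-diff lines between two crontab listings.

    Blank lines and comment lines are ignored so that only real
    job entries are compared.
    """
    def normalize(text):
        return [
            line.strip()
            for line in text.splitlines()
            if line.strip() and not line.strip().startswith("#")
        ]

    return list(
        difflib.unified_diff(
            normalize(reference),
            normalize(suspect),
            fromfile="reference",
            tofile="suspect",
            lineterm="",
        )
    )


# Hypothetical sample listings, invented for this example.
reference = """\
# nightly update
28 0 * * * /scripts/upcp
"""
suspect = """\
28 0 * * * /scripts/upcp
*/5 * * * * /usr/local/bin/example-job
"""

for line in crontab_diff(suspect, reference):
    print(line)
```

An empty result means the two listings contain the same job entries in the same order, which matches the "no anomalies" outcome reported in this post. Any line starting with "+" exists only on the suspect machine.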

Posted by UNIXy, 03-06-2010, 12:23 PM
It's most likely chkservd. Can you post the file /var/log/chkservd.log somewhere? Regards Joe / UNIXY

Posted by Johnny Cache, 03-06-2010, 03:10 PM
chkservd.log as requested. The damn file was huge so I just exported March 05 to the text file. The problem began at exactly 1:59am Pacific Time. http://nwtechgroup.com/WHT/chkservd.log.txt
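One way to confirm the "every five minutes" pattern from a log like this is to measure the gaps between restart events. The sketch below is only an illustration: it assumes the restart timestamps have already been pulled out of the log by some other means (it does not parse chkservd.log itself, whose exact format is not shown in this thread), and the function names, sample timestamps, and 30-second tolerance are all hypothetical.

```python
from datetime import datetime, timedelta


def restart_intervals(timestamps):
    """Return the gaps between consecutive events, in time order."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def looks_periodic(timestamps,
                   period=timedelta(minutes=5),
                   tolerance=timedelta(seconds=30)):
    """True if every gap between events is within tolerance of period."""
    gaps = restart_intervals(timestamps)
    return bool(gaps) and all(abs(gap - period) <= tolerance for gap in gaps)


# Hypothetical restart times starting at 1:59 AM, with a few seconds of jitter.
start = datetime(2010, 3, 5, 1, 59, 0)
events = [
    start + timedelta(minutes=5 * i, seconds=jitter)
    for i, jitter in enumerate([0, 4, 2, 7])
]

for gap in restart_intervals(events):
    print(gap)
print("periodic:", looks_periodic(events))
```

If the gaps cluster tightly around one value, a scheduler or monitoring loop is a more likely trigger than random crashes, which is the same reasoning that pointed toward cron and chkservd in this thread.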

Posted by UNIXy, 03-06-2010, 03:18 PM
What's worrying is that right before 1:59am, at 1:54am, sshd went down. It's usually a sign of intrusion but I hope I'm wrong. What does your /var/log/secure file have? Anything odd there? Hopefully this is just a bad cPanel release. /scripts/upcp is set in cron to run daily around 12:28. Regards

Posted by Johnny Cache, 03-06-2010, 03:26 PM
Hiya Joe, I was about to respond about the 01:59am entry, as I originally wondered the same thing. /var/log/secure is completely normal. I personally think that something happened during a /scripts/upcp run, maybe a network hiccup or something. I'm thinking that there was an interruption during upcp and it replicated to the other servers. Logical theory, I think, considering we're both running failover cPanel clusters? I've gone through all the logs; actually, I spent three hours going over the logs from the past week. chkrootkit, rkhunter, and Lynis all come back reporting no anomalies. Note that /scripts/upcp --force seemed to correct it all. The contents of the postinstall log are what I find most interesting. Discuss? :-)

Posted by UNIXy, 03-06-2010, 04:05 PM
It looks like you already did due diligence and ruled out intrusion, which is good. I would keep this incident in mind and move on for now. --force will always produce nasty logs if that's what you're referring to. Cheers, Joe / UNIXY

Posted by Johnny Cache, 03-06-2010, 04:13 PM
If the worst actually had happened, all I'd have to do is a bare metal restore from CDP, I suppose, but I do believe that would take one of my clusters down for a while. :-/ I wasn't worried so much about the /scripts/upcp --force output; it's the postinstall log that I found most interesting.


