My site down again: Need help troubleshooting
Posted by LincolnAdams, 11-25-2010, 01:25 AM Well my site's down again, and no response yet from MDDHosting. In the meantime I'm trying to figure this out so I can get it up again:
The server is accessible, but I can't log into cPanel using my domain, I have to use the server's domain instead. I tried disabling all my plugins by temporarily deleting my plugins folder via cPanel, still nothing. My site refuses to load.
When the problem first started, it was giving timeout errors, sometimes resulting in a blank page, and now I'm getting a "connection refused" error. This is based on Pingdom analysis BTW, not simply me trying to load the page on my own:
Here's a traceroute from my PC:
1,192.168.1.1,0ms,None,----
2,10.240.164.5,7ms,None,----
3,167.206.32.34,6ms,dstswr2-vl2.rh.hcvl,----
4,----,Timeout,n/a,----
5,64.15.5.130,34ms,None,----
6,64.15.1.5,78ms,None,----
7,----,Timeout,n/a,----
8,206.222.119.58,59ms,den1-ar3-ge-1-0-0-0,----
9,173.248.191.30,58ms,cypress.mddhosting.,----
I have a feeling the server gagged on one of my site's scripts and is now blocking everything as a security precaution, but I have no idea how I can reset it, assuming that's the case.
Any suggestions? My site is http://www.habitationofjustice.com
Posted by trustedurl.com, 11-25-2010, 01:28 AM DNS points to a server called cypress. It looks like port 80 isn't responding (telnet 173.248.191.30 80). I think that's a Handy Networks IP. Oddly enough, cypress.mddhosting.com loads, so I'm guessing port 80 got firewalled somehow on that specific IP.
Posted by FractionHost, 11-25-2010, 01:31 AM Contact Mike from MDD, he's active here and he is usually prompt on things. I'm sure he can help.
Posted by LincolnAdams, 11-25-2010, 01:31 AM Yeah, Cypress is the server the site is on. What would cause port 80 to stop responding, though?
Posted by LincolnAdams, 11-25-2010, 01:33 AM
Quote: Originally Posted by JDAnswer: Contact Mike from MDD, he's active here and he is usually prompt on things. I'm sure he can help.
I already opened a ticket, but Mike is AWOL again. I'm on my own here until somebody wakes up over there.
Posted by trustedurl.com, 11-25-2010, 01:34 AM
Quote: Originally Posted by LincolnAdams: Yeah, Cypress is the server the site is on. What would cause port 80 to stop responding, though?
MDD should be able to tell you. I'm just guessing, but it looks like port 80 on that IP is firewalled.
Posted by LincolnAdams, 11-25-2010, 01:36 AM
Quote: Originally Posted by trustedurl.com: MDD should be able to tell you. I'm just guessing, but it looks like port 80 on that IP is firewalled.
What would trigger that, a bad script?
Posted by stablehost, 11-25-2010, 01:39 AM They don't have phone support?
Posted by LincolnAdams, 11-25-2010, 01:41 AM
Quote: Originally Posted by nerdie: They don't have phone support?
I'm hearing impaired, not very good with the phones at all.
Posted by trustedurl.com, 11-25-2010, 01:41 AM
Quote: Originally Posted by LincolnAdams: What would trigger that, a bad script?
Note that was a guess, and I don't want to speculate about what could trigger something like that, but I'm sure if you ask MDD they'll clarify.
Posted by LincolnAdams, 11-25-2010, 01:43 AM Finally got a response from support, they're looking into it.
Posted by Hostify Networks, 11-25-2010, 02:06 AM Your site appears to be working now.
Posted by LincolnAdams, 11-25-2010, 02:08 AM Yes, it was some sort of network issue, not related to my site thankfully. I'm like a black cat when it comes to servers though, yeesh.
Posted by MikeDVB, 11-25-2010, 02:12 AM The issue was very odd - our monitoring from Dallas didn't pick it up but our Pingdom monitoring did pick it up. I was actually sleeping so the staff on hand was trying to diagnose the cause/resolve the issue before waking me up.
They ultimately did wake me up, as they were not able to determine exactly what was going on or how to fix it; however, the outage resolved itself a few minutes after I became available.
I'm gathering traceroutes from those affected (only some were, some were not) to see if we can find anything in common among those who were not able to reach the server for a short period.
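Comparing traceroutes by hand gets tedious once several clients send one in. As a rough sketch of the idea (not the tooling MDDHosting used), the Python below parses hop rows in the comma-separated layout of the OP's traceroute and reports the hops shared by every trace. The names `parse_trace`, `common_hops` and `OP_TRACE` are made up for illustration, and the column meaning (hop, IP, latency, hostname) is an assumption based on that single sample.

```python
# The OP's traceroute, copied from the first post of this thread.
OP_TRACE = """\
1,192.168.1.1,0ms,None,----
2,10.240.164.5,7ms,None,----
3,167.206.32.34,6ms,dstswr2-vl2.rh.hcvl,----
4,----,Timeout,n/a,----
5,64.15.5.130,34ms,None,----
6,64.15.1.5,78ms,None,----
7,----,Timeout,n/a,----
8,206.222.119.58,59ms,den1-ar3-ge-1-0-0-0,----
9,173.248.191.30,58ms,cypress.mddhosting.,----
"""


def parse_trace(text):
    """Parse rows shaped like 'hop,ip,latency,hostname,----' into a list of dicts."""
    hops = []
    for line in text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # skip anything that is not a hop row
        hop, ip, latency, name = fields[:4]
        hops.append({
            "hop": int(hop),
            "ip": None if ip == "----" else ip,  # '----' marks a hop that timed out
            "latency": latency,
            "name": None if name in ("None", "n/a") else name,
        })
    return hops


def common_hops(traces):
    """Return the set of IPs present in every parsed trace (timed-out hops ignored)."""
    ip_sets = [{h["ip"] for h in trace if h["ip"]} for trace in traces]
    return set.intersection(*ip_sets) if ip_sets else set()
```

Running `common_hops` once over the traces from affected clients and once over those from unaffected clients would show whether the affected group shares a hop that the other group lacks.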
Posted by trustedurl.com, 11-25-2010, 02:15 AM
Quote: Originally Posted by MikeDVB: I'm gathering traceroutes from those affected (only some were, some were not) to see if we can find anything common among those that were not able to reach the server for a short period.
The OP's IP was definitely filtered on ports 80 and 443; you probably monitor the main server IP, which wasn't affected.
If you're running CSF or something similar, make sure it isn't blocking your own IP aliases.
Quote:
telnet www.habitationofjustice.com 80
Trying 173.248.191.30...
telnet: connect to address 173.248.191.30: Connection refused
telnet: Unable to connect to remote host: Connection refused
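The telnet check above can be scripted. The sketch below is a minimal Python stand-in for `telnet <ip> 80`, not anything MDDHosting or Pingdom actually runs; the function name `probe_port` and the result labels are made up for illustration. It separates a refused connection (the handshake was actively rejected) from a timeout (no answer at all), the two symptoms the OP reported.

```python
import socket


def probe_port(host, port, timeout=3.0):
    """Attempt one TCP connection and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <reason>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"   # a reset came back: no listener, or a rejecting firewall rule
    except socket.timeout:
        return "timeout"   # nothing came back before the deadline
    except OSError as exc:
        return f"error: {exc}"
```

Calling `probe_port("173.248.191.30", 80)` and `probe_port("173.248.191.30", 25)` side by side would reproduce the comparison made later in the thread, where SMTP answered on the aliased IP while HTTP did not.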
Posted by MikeDVB, 11-25-2010, 02:19 AM
Quote: Originally Posted by trustedurl.com: The OP's IP was definitely filtered on ports 80 and 443; you probably monitor the main server IP, which wasn't affected.
Yeah, it looks like at this point, for whatever reason, the routing for secondary IPs failed. I'll be honest that I'm not exactly a networking guru, so I'm waiting on my key networking guy to wake up, investigate, and get back to me.
The first thing I did once I was alerted and awake enough to know what I was doing was push out the IP routes from the server to the upstream routers, and within a few minutes everything was back to normal.
Posted by Yujin, 11-25-2010, 02:20 AM Is this the same problem that you've been experiencing since September:
(http://www.webhostingtalk.com/showthread.php?t=979317)
I recall you having 40 plugins.
Posted by trustedurl.com, 11-25-2010, 02:22 AM
Quote: Originally Posted by MikeDVB: Yeah, it looks like at this point, for whatever reason, the routing for secondary IPs failed. I'll be honest that I'm not exactly a networking guru, so I'm waiting on my key networking guy to wake up, investigate, and get back to me.
I suggest you look at the firewall. SMTP was working for the OP's IP, so I'd say it's not a routing problem, but your guy should be able to tell you for sure.
Posted by MikeDVB, 11-25-2010, 02:26 AM
Quote: Originally Posted by trustedurl.com: I suggest you look at the firewall. SMTP was working for the OP's IP, so I'd say it's not a routing problem, but your guy should be able to tell you for sure.
Yeah, it was odd that some monitoring picked up issues and some didn't. Some clients had issues accessing their sites and some did not.
When looking at the traffic graphs you can't even tell that there was any sort of outage (no dips of any sort). Definitely odd - we're updating all affected clients via tickets.
In speaking with a few other providers that I personally network with - they've experienced issues where cPanel 11.28 has killed off their aliased/secondary IP addresses during an update (which just happened to have run on this server before the outage). If this is a cPanel bug I'm definitely not going to be happy.
Posted by trustedurl.com, 11-25-2010, 02:36 AM
Quote: Originally Posted by MikeDVB: In speaking with a few other providers that I personally network with - they've experienced issues where cPanel 11.28 has killed off their aliased/secondary IP addresses during an update (which just happened to have run on this server before the outage). If this is a cPanel bug I'm definitely not going to be happy.
I seriously doubt that the alias was gone; SMTP worked fine on the OP's IP. You should look at your firewall. That said, the update could have triggered something, but you'll know best as to how you've set it up.
cPanel does sometimes drop the aliases, but personally I haven't seen that in years; they did fix a lot of their outstanding bugs.
Posted by MikeDVB, 11-25-2010, 02:37 AM I've nailed it down to the cPanel update process restarting networking but failing to restart ipaliases, I'm going to open a ticket with cPanel about this. On a side-note, I'm going to add monitoring to every server for a secondary (aliased) IP address as well as adding a monitor for the ipaliases service itself so should this happen again we'll get notifications within a minute or so.
This has only happened since upgrading to 11.28, uggh.
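The extra monitoring described above could look something like the following. This is only a sketch of the idea in Python, not MDDHosting's actual monitoring; the function names `port_is_open` and `sweep` are hypothetical, and the default ports (80 and 443) are simply the two that were unreachable in this outage. The point is to probe every secondary IP, not just the main server IP, and to report each IP together with the ports that failed.

```python
import socket


def port_is_open(ip, port, timeout=3.0):
    """Return True if a TCP connection to ip:port succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, timed out, or unreachable


def sweep(targets, ports=(80, 443), timeout=3.0):
    """Probe every port on every IP.

    Returns {ip: [ports that failed]} containing only the IPs with at least one failure.
    """
    failures = {}
    for ip in targets:
        down = [port for port in ports if not port_is_open(ip, port, timeout)]
        if down:
            failures[ip] = down
    return failures
```

A scheduler could call `sweep` once a minute with the server's full list of aliased IPs and raise an alert whenever the returned dictionary is non-empty. Including port 25 in `ports` would also show the split debated later in this thread, where mail answered on an IP while the web ports did not.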
Posted by TonyB, 11-25-2010, 04:06 AM
Quote: Originally Posted by MikeDVB: In speaking with a few other providers that I personally network with - they've experienced issues where cPanel 11.28 has killed off their aliased/secondary IP addresses during an update (which just happened to have run on this server before the outage). If this is a cPanel bug I'm definitely not going to be happy.
It's happened to us on a machine, and it sure came as a surprise when we had secondary IPs stop routing after updating cPanel. This also started in 11.28; it had never happened previously, which is why it was a huge surprise.
Posted by MikeDVB, 11-25-2010, 04:07 AM
Quote: Originally Posted by TonyB: It's happened to us on a machine, and it sure came as a surprise when we had secondary IPs stop routing after updating cPanel. This also started in 11.28; it had never happened previously, which is why it was a huge surprise.
Indeed - after 3 years of using cPanel I've never seen this happen. I verified via the log files that this is indeed what happened, and I've got a ticket open with cPanel.
I really hope I don't have to go through the trouble of making a huge fuss about this to get them to do whatever needs to be done to get this fixed.
Posted by MikeDVB, 11-25-2010, 05:19 AM Just in case anybody is interested in following the thread on cPanel about the issue:
http://forums.cpanel.net/f5/cpanel-1...ng-175271.html
I do also have an internal ticket with them.
Posted by LincolnAdams, 11-25-2010, 11:03 AM That's unsettling, especially when it's the kind of bug that could bring down an entire range of sites, and even more so when the change that caused it seems rather pointless to begin with.
Reminds me of what WordPress developers sometimes do. "Let's take a function that works perfectly, there's no reason to change it whatsoever, but we'll change it anyway and completely $%^& up what was perfectly good code just because we have nothing better to do."
Posted by stablehost, 11-25-2010, 11:07 AM What "release" of cPanel are you using Mike? Current? Release? Stable?
Posted by trustedurl.com, 11-25-2010, 12:48 PM
Quote: Originally Posted by LincolnAdams: That's unsettling, especially when it's the kind of bug that could bring down an entire range of sites, and even more so when the change that caused it seems rather pointless to begin with.
Personally I don't think that's what was the issue. I could telnet to port 25 on the IP, but port 80 was blocked/filtered. If the alias was dropped then the IP wouldn't respond to *anything*.
Though an error like that did exist in cPanel, I haven't seen it for years.
Posted by TonyB, 11-25-2010, 03:01 PM
Quote: Originally Posted by trustedurl.com: Personally I don't think that's what was the issue. I could telnet to port 25 on the IP, but port 80 was blocked/filtered. If the alias was dropped then the IP wouldn't respond to *anything*. Though an error like that did exist in cPanel, I haven't seen it for years.
What happens is that the web server attempts to create a listener for the IP again, and the IP is unavailable at that moment. It could happen to other services as well; it takes a particular set of circumstances for it to happen.
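The general socket behaviour behind that explanation is easy to demonstrate: a process can only bind a listener to an address that is configured on the host at that moment. The sketch below illustrates the operating-system behaviour only and is not a diagnosis of what Apache did on cypress. `can_bind` is a made-up helper name, and 192.0.2.1 is a documentation address assumed not to be assigned to the machine running the code.

```python
import errno
import socket


def can_bind(ip, port=0):
    """Return True if a TCP socket can bind to ip on this host.

    Returns False when the address is not assigned to any local interface;
    any other bind error is re-raised.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((ip, port))  # port 0 lets the OS pick any free port
            return True
        except OSError as exc:
            if exc.errno == errno.EADDRNOTAVAIL:
                return False       # the IP is not configured here right now
            raise


print(can_bind("127.0.0.1"))   # loopback is configured on the host
print(can_bind("192.0.2.1"))   # assumed to be unassigned on this machine
```

If a service tries to bind while an alias is momentarily down, the attempt fails in this way, and the service keeps no listener on that IP after the alias returns unless it retries or is restarted. A service that starts later, once the alias is back, binds normally, which is one way mail could answer on an IP while the web ports do not.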
Posted by MikeDVB, 11-25-2010, 04:43 PM
Quote: Originally Posted by trustedurl.com: Personally I don't think that's what was the issue. I could telnet to port 25 on the IP, but port 80 was blocked/filtered. If the alias was dropped then the IP wouldn't respond to *anything*. Though an error like that did exist in cPanel, I haven't seen it for years.
Not all secondary IP addresses were dropped, only some of them. You're welcome to continue believing whatever you wish, but /scripts/upcp ran at 11:23. The outage occurred at 11:23, the log verifies that ipaliases failed on the restart, and the only thing I did to bring everything back online was "service ipaliases restart".
I didn't turn off the firewall or do anything else. Again, I've verified what happened, as have another level 3 administrator and cPanel. You're welcome to continue posting your opinions but I'm telling you flat out that you are wrong.
Have a wonderful Thanksgiving.
Posted by trustedurl.com, 11-25-2010, 04:44 PM
Quote: Originally Posted by TonyB: What happens is that the web server attempts to create a listener for the IP again, and the IP is unavailable at that moment. It could happen to other services as well; it takes a particular set of circumstances for it to happen.
Sure, but if the IP isn't available/bound, then no service will listen on that same IP. Besides, I can't reproduce this upcp error for the life of me.
Posted by trustedurl.com, 11-25-2010, 04:48 PM
Quote: Originally Posted by MikeDVB: Not all secondary IP addresses were dropped, only some of them.
I never said they did, did I? I believe I specifically mentioned the main server IP was working fine (i.e. a non-aliased IP).
Quote: Originally Posted by MikeDVB: You're welcome to continue believing whatever you wish
Well, I'm just going by what I saw: on the aliased IP exim was responding, but ports 80 and 443 were closed off, while 80 and 443 were responding on the main IP and a few other aliases.
Quote: Originally Posted by MikeDVB: The outage occurred at 11:23, the log verifies that ipaliases failed on the restart, and the only thing I did to bring everything back online was "service ipaliases restart".
No need to get defensive; if that's what you see in the logs then it's reasonable to look at that. It still doesn't explain why exim was responding, though. Who knows, maybe apache restarted before the alias was available and exim started after the IP was available.
Quote: Originally Posted by MikeDVB: You're welcome to continue posting your opinions but I'm telling you flat out that you are wrong.
Not opinions, just observations. Obviously I don't have access to your logs, but the behavior shown isn't completely consistent with the alias not being bound at all. One simply wouldn't be able to connect to the IP at all if it was dropped.
Quote: Originally Posted by MikeDVB: Have a wonderful Thanksgiving.
A little late for me, but have a good one!
Posted by MikeDVB, 11-25-2010, 04:50 PM
Quote: Originally Posted by trustedurl.com: Sure, but if the IP isn't available/bound, then no service will listen on that same IP. Besides, I can't reproduce this upcp error for the life of me.
It doesn't happen every time, otherwise everybody would be in an uproar. A cPanel technician acknowledged that there were other reports of this happening.
The thing is, you can sit at the console and run "service ipaliases restart" for an hour, and periodically it won't restart properly and you'll lose secondary networking.
We had the same thing happen to another server last week. We went to add an account that needed a dedicated IP, and our IP ranges were sequential. When we looked at the list of available IPs a bunch of them were missing... A "service ipaliases restart" fixed the issue, and this was on another set of hardware.
Again, just because you don't think it's what happened doesn't mean that you're right. Just because you haven't been able to reproduce the issue doesn't mean that it didn't occur.
You are correct that if an IP goes offline no services should respond on that IP, but I didn't test to see if some services were working and some were not. Restarting ipaliases without just cause can cause some flaky networking, and I wouldn't rule out the possibility of it only partially disabling/enabling the IP.
I don't deal with kernel development so I can't tell you what happened, but I can tell you what triggered it to happen, and that is it.
At this point I'm going to add you to my "ignore" list here on WHT as I'm not going to endlessly debate this because you think that I'm wrong. If I don't ignore you, I'd be too tempted to make further responses.
Have a wonderful Thanksgiving.
Posted by trustedurl.com, 11-25-2010, 04:59 PM
Quote: Originally Posted by MikeDVB: We had the same thing happen to another server last week, we went to add an account to the server that needed a dedicated IP and our IP ranges were sequential. When we looked at the list of available IPs a bunch of them were missing... A "service ipaliases restart" fixed the issue and this was on another set of hardware.
The missing IPs can happen when you remove an IP from the list. The IP aliases aren't sequentially numbered then; on an ipaliases restart it will not realize that this is the case and will fub up on bringing them back up. Doing a full ipaliases stop and start does fix that.
Quote: Originally Posted by MikeDVB: Again, just because you don't think it's what happened doesn't mean that you're right. Just because you haven't been able to reproduce the issue, doesn't mean that it didn't occur.
I don't think you understand; I'm saying this is fact: "if an IP alias isn't bound, then NO service will be able to listen on that IP alias". You surely agree with that?
Quote: Originally Posted by MikeDVB: You are correct that if an IP goes offline that no services should respond on that IP but I didn't test to see if some services were working and some were not.
So you do agree. OK, I checked the OP's IP before you came on, and I guarantee you 100% that all services were open except for ports 80 and 443 (http and https).
Quote: Originally Posted by MikeDVB: Being that restarting ipaliases without just cause can cause some flaky networking and I wouldn't put it past the possibility of it only partially disabling/enabling the IP.
You can't partially disable an IP; it is either bound or not. What is possible is what I alluded to earlier: apache was restarted before the IP was bound, but the other services were started after it was bound.
Quote: Originally Posted by MikeDVB: At this point I'm going to add you to my "ignore" list here on WHT as I'm not going to endlessly debate this because you think that I'm wrong. If I don't ignore you, I'd be too tempted to make further responses.
I never even said you were wrong; I was merely trying to point out what the situation was when the OP opened this thread. That is not an opinion, just an observation. You can choose to take it into account or just ignore it; either way, it makes no difference to me.
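The numbering gap described in this post (aliases no longer sequential after an IP is removed) is something a quick script could flag before a restart. The sketch below takes trustedurl.com's description at face value and assumes the traditional Linux alias naming of `eth0:1`, `eth0:2` and so on. The function name `alias_gaps` is hypothetical, and the code only inspects a list of names it is handed; it does not read the system's interface configuration.

```python
def alias_gaps(alias_names):
    """Given names such as 'eth0:1', return {interface: [missing alias numbers]}.

    Names without a numeric ':N' suffix (for example the base 'eth0') are ignored.
    """
    seen = {}
    for name in alias_names:
        iface, sep, index = name.partition(":")
        if sep and index.isdigit():
            seen.setdefault(iface, set()).add(int(index))

    gaps = {}
    for iface, numbers in seen.items():
        expected = set(range(min(numbers), max(numbers) + 1))
        missing = sorted(expected - numbers)
        if missing:
            gaps[iface] = missing
    return gaps
```

For a list such as `["eth0", "eth0:1", "eth0:2", "eth0:4"]` the function reports alias number 3 as missing on `eth0`, which per the post above is the situation where a full stop and start would be preferred over a restart.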