SecureDragon VPS down?




Posted by ttgt, 07-20-2012, 01:54 AM
Hi,

Both my VPS and the SecureDragon client area are unreachable.

Is your VPS working right now?


Thanks

Posted by ZKuJoe, 07-20-2012, 03:06 AM
https://my.securedragon.net/announcements.php?id=190

Unfortunately the easiest part of this upgrade is giving us the most trouble. We have a single switch that is giving us trouble and attempts to troubleshoot are causing more issues. We are able to failover successfully but cannot troubleshoot the problem when the network is failed over. We are finishing up our work shortly and will schedule more maintenance later if we cannot resolve this soon.

Posted by ttgt, 07-20-2012, 03:57 AM
How about swapping in a new switch first so the network works, and troubleshooting afterwards?

Posted by ZKuJoe, 07-20-2012, 04:20 AM
We replaced both switches and routers. One of the switches would not accept the new IP and was causing a duplicate IP issue. Everything is back online, but we have not tested all of the failover yet. We will reschedule that for another day.

Posted by ZKuJoe, 07-20-2012, 05:40 AM
The network is back online after our DC staff dug around and found an old laptop with a serial port on it so we could reset one of the new switches to factory defaults (we brought the console cable but forgot the adapter, since neither of our netbooks has a serial port). The primary router and switch are working. The failover router and switch were working earlier, but we did not test the redundancy again after getting the network back online.

Here's how our night went:
7/19 @ 4PM-9PM (EST): Finalized testing the new hardware (it was in our lab for 2 weeks so we were fairly confident before we even drove to the DC). We made sure everything was working flawlessly and everything worked better than expected.
7/19 @ 10PM: Failed over to our backup router (router3).
7/19 @ 10PM-11PM: New hardware installed, started configuring new hardware to match our production network.
7/19 @ 11PM: Switched over to new network hardware. Cut-over was flawless with less than 15 seconds of downtime.
7/19 @ 11:05PM: Began testing failover. Removed uplink fiber to primary router, failover resulted in 3 ping timeouts.
7/19 @ 11:10PM: Restored uplink and primary router returned to master state, again 3 ping timeouts.
7/19 @ 11:15PM: Removed cable from primary router to primary switch, DANGER DANGER! Network does not return.
7/19 @ 11:15PM - 7/20 @ 4:00AM: Troubleshooting efforts of pain. Stuff that worked shouldn't have and stuff that should have worked didn't. In the end, while trying to log in to the web GUI for the switch, I noticed that the switch's management IP was set to the wrong address. I changed it in the web GUI and it wouldn't take. After resetting the primary switch to factory defaults and multiple reboots of both switches and routers, the network finally came back online.
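
For anyone who wants to measure a failover the same way as the 11:05PM and 11:10PM tests above (counting ping timeouts), here is a minimal sketch. It assumes you have already recorded one True/False result per ping. The function name and the sample data are made up for illustration; this is not the tooling that was used that night.

```python
from itertools import groupby


def longest_timeout_run(replies):
    """Return the longest run of consecutive lost pings.

    `replies` is a sequence of booleans: True for a reply, False for a timeout.
    """
    return max(
        (sum(1 for _ in group) for ok, group in groupby(replies) if not ok),
        default=0,
    )


# Hypothetical sample: replies recorded while a link is pulled and restored.
sample = [True, True, False, False, False, True, True]
print(longest_timeout_run(sample))  # 3
```

With a fixed ping interval, the run length multiplied by the interval gives a rough upper bound on how long the failover took.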

The next 3 days will be spent reviewing logs and each piece of hardware to try to figure out the problem. We are very unhappy that after weeks of testing and executing the upgrade with very little downtime, the problem we ran into was a semi-dumb switch.

Posted by ZKuJoe, 07-20-2012, 02:38 PM
The network issues have been resolved and we are back online with full redundancy at this time. The problem was caused by multiple issues.

1) Duplicate IP on the switch (even after changing the IP address the duplicate still remained).
2) Incorrect IPs on our WAN interfaces.
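
A duplicate IP like the one in item 1 usually shows up as the same address being claimed by more than one MAC. The sketch below shows one way to spot that from a list of (IP, MAC) observations, for example copied out of an ARP table. The function name, the addresses (documentation range 192.0.2.0/24) and the MACs are all invented for illustration. This is not how the problem was actually found here.

```python
from collections import defaultdict


def find_duplicate_ips(arp_entries):
    """Return {ip: sorted MACs} for every IP claimed by more than one MAC.

    `arp_entries` is an iterable of (ip, mac) string pairs.
    """
    seen = defaultdict(set)
    for ip, mac in arp_entries:
        seen[ip].add(mac.lower())  # normalise case so one MAC is not counted twice
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}


# Hypothetical observations: 192.0.2.2 is answered by two different devices.
observations = [
    ("192.0.2.1", "00:00:5E:00:53:01"),
    ("192.0.2.2", "00:00:5E:00:53:02"),
    ("192.0.2.2", "00:00:5E:00:53:03"),
]
print(find_duplicate_ips(observations))
```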

At this time we are able to withstand up to 2 hardware failures (1 switch and 1 router, or any 2 network cables). To confirm the network redundancy I was able to restart our routers (separately) without any downtime.
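
To illustrate what "1 switch and 1 router" means in practice, here is a rough sketch that models a redundant network as a graph and checks that traffic still gets through with any one router plus any one switch removed. The topology, the device names and the function names are assumptions for the example. The real wiring is not described in this thread, and the cable-failure case is not covered.

```python
from itertools import product

# Hypothetical topology: two routers, two switches, fully cross-connected.
LINKS = [
    ("uplink", "router-a"), ("uplink", "router-b"),
    ("router-a", "switch-a"), ("router-a", "switch-b"),
    ("router-b", "switch-a"), ("router-b", "switch-b"),
    ("switch-a", "node"), ("switch-b", "node"),
]


def reachable(links, failed, src="uplink", dst="node"):
    """Depth-first search from src to dst, ignoring links that touch a failed device."""
    neighbours = {}
    for a, b in links:
        if a in failed or b in failed:
            continue
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    stack, visited = [src], {src}
    while stack:
        current = stack.pop()
        if current == dst:
            return True
        for nxt in neighbours.get(current, ()):
            if nxt not in visited:
                visited.add(nxt)
                stack.append(nxt)
    return False


def survives_router_plus_switch(links):
    """True if the node stays reachable for every (one router, one switch) failure pair."""
    routers = ["router-a", "router-b"]
    switches = ["switch-a", "switch-b"]
    return all(reachable(links, {r, s}) for r, s in product(routers, switches))


print(survives_router_plus_switch(LINKS))
```

A model like this only checks the cabling on paper. It would not have caught the duplicate IP, which is why pulling cables and restarting routers on the live network still matters.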

The only outstanding issue is that IPv6 is not routing correctly. This is due to a VRRP issue, and we are working on a solution.


