PacificRack / QuadraNet down?


Posted by jackpx, 04-23-2010, 10:42 PM
I have servers at QuadraNet and they are down. Is anyone else having the same problem?
Posted by Katie, 04-23-2010, 10:46 PM
Yes, this is accurate. Our LAX9-R3 router froze. We are currently bringing it back up and determining whether an IOS upgrade or something else will be necessary to resolve this in the long term.
Posted by jackpx, 04-23-2010, 10:52 PM
Another problem in the datacenter within one week.
Posted by jackpx, 04-23-2010, 11:40 PM
one hour down...
Posted by Katie, 04-23-2010, 11:45 PM
Our techs are working non-stop on this. It was diagnosed as a hardware failure, and they are currently configuring the replacement router.
Posted by oc-colo, 04-24-2010, 01:21 AM
It is working now.
Posted by BurakUeda, 04-24-2010, 01:41 AM
My servers are still down. :/
Posted by cubigworm, 04-24-2010, 01:42 AM
My servers are also still down.
Posted by riddla, 04-24-2010, 01:47 AM
still down here too.

last ticket response i got was 'very soon'.

ugh!
Posted by Katie, 04-24-2010, 01:48 AM
Some of the servers on the router are still down. While attempting to fix this issue, we narrowed it down to a backplane problem in the affected router, which forces us to replace the chassis.
Posted by guangming84, 04-24-2010, 01:53 AM
3 hours later, my server is still not working. My client is very anxious.
Posted by guangming84, 04-24-2010, 01:56 AM

3 hours later, and I am very worried.
Posted by oc-colo, 04-24-2010, 01:57 AM
It worked for around 40 minutes and now it is down again. I hope they fix it soon.
Posted by riddla, 04-24-2010, 01:59 AM
I'm getting grilled here.

Katie, how much longer do you guys expect this to take?
Posted by xnpu, 04-24-2010, 02:02 AM
Ours still down too.

Please keep those updates with technical tidbits coming, Kate. They seem to calm our people down here.
Posted by guangming84, 04-24-2010, 02:10 AM
There has been an issue with the network router on the 9th floor, 3rd row. We believe the chassis itself to be the root cause of the issue, so the entire router is now being replaced (as opposed to replacing the specific blades that were previously thought to be the problem). ETA is 30 minutes.

---------------------------

My god!!!!
Posted by guangming84, 04-24-2010, 02:13 AM
The router that your section of this datacenter floor is on had a hardware failure in the management module. We've been working on replacing the module and also re-loading all the configs onto it. The network engineering team has just completed this, and you should see access to the machines again at this point.

We deeply apologize for the service interruption you experienced. We've been moving as fast as we could to replace the failed network gear and reload configs from backups.
-----------------

Posted by riddla, 04-24-2010, 02:19 AM
negative.

3 servers over 2 accounts still down.
Posted by xnpu, 04-24-2010, 02:25 AM
No access yet for me either.
Posted by xnpu, 04-24-2010, 02:31 AM
@guangming84 Their webserver is on a different router.

I have two servers with them: one is on a different router and has no problems, the other is on the broken router. They didn't fix their own website first; it simply is not affected by the broken router.
Posted by guangming84, 04-24-2010, 02:41 AM

Quote:

Originally Posted by xnpu

@guangming84 their webserver is on a different router.

I have two servers with them: one is on a different router and has no problems, the other is on the broken router. They didn't fix their own website first; it simply is not affected by the broken router.

Thank you. I'm just very anxious.

I lost the connection to my server at 10:30 AM China time. It is now 14:40 (2:40 PM) and I still cannot connect to my server.
Posted by xnpu, 04-24-2010, 02:43 AM
@guangming84

Dude - totally understand. Very frustrating for everyone here I think.

BTW, what city are you in? I'm in Beijing myself.
Posted by jackpx, 04-24-2010, 02:46 AM
four hours down...
Posted by BurakUeda, 04-24-2010, 02:53 AM
Not to hop on the bandwagon, but this is taking a bit too long, seriously.
Any ETA would be very helpful...
Posted by riddla, 04-24-2010, 02:56 AM
Yeah, no doubt.

2 hours ago I was told 'very soon'.

That was right after I said "just let me know, so I can start moving some of my critical services elsewhere."
Posted by guangming84, 04-24-2010, 02:58 AM
@xnpu

I'm in Shanghai.

Welcome to the World Expo 2010
Posted by BurakUeda, 04-24-2010, 03:03 AM
Guys!
There is a "Private Message" feature for the chit-chat.
Posted by guangming84, 04-24-2010, 03:07 AM
4 hours 30 minutes now...
Posted by cubigworm, 04-24-2010, 03:20 AM
it's toooooooooo long to wait.
Posted by guangming84, 04-24-2010, 03:30 AM
5 hours now.
Posted by oc-colo, 04-24-2010, 03:31 AM
It is working for me now.

I'm keeping my fingers crossed that there will be no further downtime.
Posted by riddla, 04-24-2010, 03:40 AM
not working for me.........

still 3 servers down.
Posted by xnpu, 04-24-2010, 03:43 AM
No love here either. (1 server.)
Posted by Nugie Lim, 04-24-2010, 03:44 AM
Looks like they managed to fix part of the network. Let's hope they can bring up the rest of it soon.
Posted by guangming84, 04-24-2010, 03:45 AM
Do I need to wait a long time for my server to be repaired?

5 hours later....
Posted by jackpx, 04-24-2010, 03:51 AM
Yes, goodbye QuadraNet.
Posted by xnpu, 04-24-2010, 03:54 AM
If I bailed out every time one of my providers had a big problem, I would never stop moving. 5 hours is quite a record, though. Slightly more frequent updates would be appreciated.
Posted by guangming84, 04-24-2010, 04:02 AM
5 hours 30 minutes have passed....
Posted by jackpx, 04-24-2010, 04:19 AM

Quote:

Originally Posted by xnpu

If I bailed out every time one of my providers had a big problem, I would never stop moving. 5 hours is quite a record, though. Slightly more frequent updates would be appreciated.

5 hours today, 10 hours last week.
Posted by xnpu, 04-24-2010, 04:21 AM

Quote:

Originally Posted by jackpx

5 hours today, 10 hours last week.

Wow. Fortunately I was neither affected by nor aware of last week's outage. Hm...
Posted by jackpx, 04-24-2010, 04:27 AM

Quote:

Originally Posted by xnpu

Wow. Fortunately I was neither affected by nor aware of last week's outage. Hm...

http://www.webhostingtalk.com/showthread.php?t=941775
Posted by guangming84, 04-24-2010, 04:27 AM
We are now on the tail end of the situation, which began about 5 hours ago.

The problem was twofold and took some time to diagnose, then time to replace blades which looked to be faulty, only to find out that the chassis was also at fault. In the end, we have had to replace the entire chassis and the SUP Management module, and we are in the process of reloading the config from backups. We are reloading the VLAN configs right now, and once that is complete your machines should again be accessible. Please hang tight; we've had the entire network engineering team working on this and getting everything diagnosed, replaced, and configured as quickly as possible.

We initially believed the SUP management blade had failed; however, after replacing it we found there was still an issue. Diagnosing further, we deduced that the problem must be related to the chassis, which failed tonight and caused issues with multiple blades. We de-racked the bad chassis, racked a new one, re-installed the modules, re-cabled the ports, reloaded the config for the SUP Management module, and are now reloading all the VLAN configs from backups. As soon as your VLAN has been input, you will have access to your machine(s) again.

We deeply apologize for the issue you experienced tonight and for this unforeseen failure in the distribution router for your section. However, because we have on-site spares and an on-site network engineering team, we have been able to keep this situation from escalating further.
Compensation for the outage will be available per our Network SLA. Please give us a few hours to finish getting the situation under control, then submit a new ticket to the Billing department to claim SLA credits.

We hope for your patience and understanding as we work through this major issue. We understand that any downtime has a large effect on your business, and we will continue to work to provide as reliable a service as you have come to expect from us in the past. This is of course not acceptable for us as a service provider either, and we will thoroughly research this situation to determine what we can do to restore services even more quickly or prevent this issue from happening altogether in the future.

Thank You
--
Posted by xnpu, 04-24-2010, 04:38 AM
And I'm back in business! Finally :-)
Posted by BurakUeda, 04-24-2010, 04:47 AM
One up, another one still down...
Posted by jackpx, 04-24-2010, 04:52 AM
My servers are online...
Posted by cubigworm, 04-24-2010, 05:03 AM
still down
Posted by guangming84, 04-24-2010, 05:05 AM
still down

6 hours 30 minutes later...
Posted by cubigworm, 04-24-2010, 05:52 AM
online now
Posted by Katie, 04-24-2010, 11:50 AM
Sorry for my lack of updates. I was attempting to update from home as much as possible but that became impossible.

As you can tell, the issue has been resolved at this time. Typically, if there is an issue, we're on top of it fairly quickly. guangming84 has already posted the synopsis of the issue above.

Posted by CGotzmann, 04-24-2010, 01:59 PM
5 hours is not that bad of a time frame for what occurred.
Yes, downtime sucks, but no provider can guarantee zero downtime. 1 router out of 4 on this floor bit the dust and had to be replaced in the most extreme of ways. In 8 years, this is the first time I have seen a bad chassis. I had heard it was technically possible but had never heard of it actually happening to anyone.
We have full spare parts for entire router failures, so we were able to replace LAX9-R3 last night. The length of the repair can be attributed to diagnosis and labor.

If a colocation customer wants redundancy, they are responsible for 50% of the equation. That customer would need dual NICs in each of their machines, with dual switches on their own side. They could then get 2 uplinks from us from 2 separate routers. A dedicated server, as is standard in the industry, has a single public uplink. You could always get a custom dedicated server solution with 2 separate failover public uplinks, but this is not what the majority of customers look for.

To aid in the visual perspective and give you a bit more of a feel for what happened, here is a picture of the routers in the LAX9 datacenter.
http://pacificrack.com/04.23/badrouter.jpg

The very first router, LAX9-R3, was the one that experienced the issues.
I would also like to quote a sentence from the previously posted summary:

We hope for your patience and understanding as we work through this major issue. We understand that any downtime has a large effect on your business, and we will continue to work to provide as reliable a service as you have come to expect from us in the past. This is of course not acceptable for us as a service provider either, and we will thoroughly research this situation to determine what we can do to restore services even more quickly or prevent this issue from happening altogether in the future.

Prior to the past few weeks, our network and power uptime was impeccable. Over the last 3 weeks we have had a couple of totally separate and unrelated issues, as covered in the threads here on WHT. Each issue was independent of the others, on different floors or sections, and I would put it down to bad luck that they all happened so close together.
Let's return to business as usual and get back on track to the uptime you can expect from QuadraNet!
