Rapidswitch DC issues

Posted by CretaForce, 02-20-2010, 11:38 PM
Does anybody else have servers down at Rapidswitch?

Quote:

We have had a reported AC failure in sections of RSH North. While the site has redundant cooling capacity, this cover has limits. Unfortunately at this time we do not have further information to give, but will update you as soon as we do know more.

This may cause a rise in ambient temperature which may as a result affect servers running in RSH North.

Posted by brightstation, 02-20-2010, 11:47 PM
yep, one server down since 2:00am.
Posted by ValueVPS-Dave, 02-21-2010, 12:03 AM
3 servers down - no response to support tickets.
Posted by gordonrp, 02-21-2010, 12:16 AM
Supposedly an AC failure. It's blooming England at 2AM! Open a window, turn on a fan! (Currently 30F, about 0 Celsius outside!)
Posted by xeno007, 02-21-2010, 01:39 AM
One of the servers has been down for 3 and a half hours. This sucks.
Posted by gordonrp, 02-21-2010, 02:10 AM
Back up, after 4 hours.
Posted by xeno007, 02-21-2010, 02:12 AM
Yup, my server is back too now. Still no official information if the issue is completely resolved.
Posted by CretaForce, 02-21-2010, 02:29 AM
One server UP, one server DOWN, 2 servers UNAFFECTED.
Posted by CretaForce, 02-21-2010, 03:07 AM
Does anyone still have a server down?
Posted by robputt796, 02-21-2010, 03:37 AM
All of our servers hosted at RapidSwitch were down until 6:30 am approx. About 3.5 hours downtime I think. I'll need to check our monitoring to be sure.
Posted by ValueVPS-Dave, 02-21-2010, 06:32 AM
4 hours 20 minutes on each server :-( What's the betting that '100% SLA' doesn't kick in (it never has since they implemented it)
Posted by colinjmx5, 02-21-2010, 10:32 AM
Hi,
I reported the issue to RapidSwitch directly at 4am after being alerted by servermojo.
With lots of help from vmhosts, everything was working for us by 10:20am.

It would be really useful to know the cause, since it was a cold night. I would expect this with 30C in summer, but not in winter.

Colin Johnston
MX5 Owners
Posted by aeris, 02-21-2010, 03:59 PM
None of 11 servers affected. Which is a bit strange, since they're all at "north".

Also, I kinda doubt they have a window in their server room.
Posted by brightstation, 02-22-2010, 06:42 AM
"We will be carrying out maintenance which will affect some of your services with us.

Maintenance Type: Infrastructure (AC)
Expected effect on your service: No Effect"

Guess what, servers are down again.

Time to move on, who does maintenance in the middle of the day, these guys are a bunch of amateurs.
Posted by ValueVPS-Dave, 02-22-2010, 06:50 AM
I have a server down - anyone else affected?
Posted by erectvps, 02-22-2010, 06:51 AM
Got a private server there and it's not responding
Posted by rishikesh, 02-22-2010, 06:55 AM
Hello,
I am still facing the same problem. Is there any problem at RapidSwitch?
Posted by jarimh1984, 02-22-2010, 07:04 AM
3 servers up 2 down. Two down servers are in B4 RSH North Lwr.
Posted by ValueVPS-Dave, 02-22-2010, 07:09 AM
All servers down :-(
Posted by ValueVPS-Dave, 02-22-2010, 07:12 AM
I think it's time for ValueVPS and RapidSwitch to finally part company (while ValueVPS still has some clients). Never known such incompetence - I got an email at 10:12 telling me they were carrying out maintenance between 10:00 and 11:00 with 'Expected downtime duration: 0 minutes' - servers went down at 10:28 and are still down.

How the hell was I meant to alert my clients?
Posted by robputt796, 02-22-2010, 07:14 AM
Hi,

Once again Putt Hosting has been affected by this outage; all of our UK servers are down. I have to admit it seems a little silly that they are replacing the defective part during the day in their time zone. Is this sort of thing covered by their 100% uptime SLA? I wasn't too concerned after Friday's (or was it Saturday night's?) outage, but this brings my uptime below acceptable levels.

I have to admit I feel the same as Dave on this matter, they should have definitely given more than -12 mins notice.
Posted by erectvps, 02-22-2010, 07:18 AM
Timing isn't great but sometimes things have to be done NOW rather than later. If it is a major issue then better to have a bit of downtime now when people/parts are available rather than late at night, when there is a high chance of failure between now and then.
Posted by robputt796, 02-22-2010, 07:33 AM
Servers appear to be back up!

Putt Hosting VPSs are still down as they need to re-quota themselves due to an unsafe shutdown.

I have read through the SLA terms on the RapidSwitch website, they pretty much cover themselves for things like this :-(. It's about time people make SLAs that actually mean something.
Posted by allstarone, 02-22-2010, 07:54 AM
Has anyone got any info regarding why their servers are going down? Our servers are recording high temps, but when we check the disk temps they are relatively cool, or at least cool enough that they shouldn't have major issues.

We're wondering if there is some sort of electrical issue occurring when the AC is switched on/off.
Posted by ValueVPS-Dave, 02-22-2010, 07:56 AM

Quote:

Originally Posted by erectvps

Timing isn't great but sometimes things have to be done NOW rather than later. If it is a major issue then better to have a bit of downtime now when people/parts are available rather than late at night, when there is a high chance of failure between now and then.

Once or twice this may be acceptable - what isn't acceptable is that these problems are becoming more and more frequent and we are expected to just sit back and let it happen. I have lost count of the number of clients we have lost due to the shoddy service that RapidSwitch now provide. Did anyone notice after that last major outage that they announced we all now have a '100% SLA'? So far I have not been able to claim anything back under the SLA; let's see what happens this time.

"100% SLA for all services
==============================
We are launching a 100% service level agreement (SLA) for all services. This becomes effective from 1st December 2009. To view details of it, please go to https://myservers.rapidswitch.com/Terms.aspx As it is now part of our standard contract for all clients, next time you log into our portal you will be asked to accept the changes to our Terms & Conditions, the changes in this case being the addition of the 100% SLA. "

Reading the SLA:

"Should the Client not have access to the Services as defined above, RapidSwitch shall credit the Client 0.5 days service credit for each hour when the service is not available, subject to a maximum credit in any one month of 50% of the monthly fee for the contracted service. The credit applies to the contracted service. The Client shall not be entitled to any credits under this SLA if any payment of the price for the Services is overdue under the terms of this Agreement. The credit shall be made for the element of the Services that were not available, it will not be made for the whole service. (E.G. If a dedicated server and backup service are ordered, but the backup service is not available for a period of time, the credit will be calculated based on the price of the backup service, not the combined price of the dedicated server and backup service.) Any credit is subject to the Client notifying RapidSwitch within 7 days in writing. This Client agrees the service credits due under this SLA are its sole remedy against RapidSwitch for any non-availability of the Services."

How does that make 100% SLA?
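
As a rough illustration of how the quoted clause works out, here is a minimal sketch of the credit arithmetic. It assumes a 30-day billing month and that part-hours are credited pro rata; neither is stated in the quoted terms, and the function name `sla_credit` is made up for this example.

```python
def sla_credit(monthly_fee, hours_down, days_in_month=30):
    """Estimate the service credit under the clause quoted above.

    Assumptions (not in the quoted SLA): a 30-day month and
    pro-rata credit for part-hours.
    """
    credit_days = 0.5 * hours_down                      # 0.5 days' credit per hour down
    credit = monthly_fee * credit_days / days_in_month  # value of those days
    return min(credit, 0.5 * monthly_fee)               # capped at 50% of the monthly fee


# 4 hours down on a 100.00/month service: 2 days' credit
print(round(sla_credit(100.0, 4), 2))   # 6.67
# 40 hours down would be 20 days' credit, so the 50% cap applies
print(sla_credit(100.0, 40))            # 50.0
```

Under these assumptions the cap is reached after 30 hours of downtime in a month, so the "100%" seems to describe the availability target rather than the size of the refund.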
Posted by ValueVPS-Dave, 02-22-2010, 07:57 AM
servers have just come back up for me
Posted by dazmanultra, 02-22-2010, 08:00 AM
One of our monitoring nodes with Rapidswitch has remained available, however this is possibly because it isn't heavily loaded.

I think the failure of their AC/cooling systems has meant crashing for heavily loaded servers...
Posted by freewhosting, 02-22-2010, 08:06 AM
Looks as though they've disconnected their phone systems also; I received a standard response from them on a ticket which said automatically closing due to live incident. Anyone else get the same?

Had all 6 servers go down for 4.5 hours on Sunday morning and RapidSwitch have agreed to pay out as per the 100% SLA agreement.

Our VPS instance with them has also had a lot of problems, with the node being offline for 80 minutes whilst they migrated clients to a new node.

We all need an assurance from RapidSwitch that they can meet the needs of our businesses.
Posted by CraigMesser, 02-22-2010, 08:47 PM
Sounds like you've all had a right game these past 2 days :|
Posted by moozaad, 02-23-2010, 07:53 AM
We lost a hard drive on Sunday morning due to heat. Not happy at all. RS aren't willing to cover it either.
Personally I consider it asset damage by RS, caused by not providing a suitable environment for IT equipment...

For those that don't know... their backup AC didn't stop the ambient temp from rising above 48C!! I'm not sure what it topped out at, but it also hard locked the NICs on both our servers when they did it again on Monday.

Least happy customer in the world right now!!!
Posted by PCS-Chris, 02-23-2010, 08:44 AM
Our racks in RSH North were affected, although we didn't have any downtime or hardware failures. Just some slightly hot RAID cards.
Posted by XFactorServers, 02-23-2010, 08:46 AM
Time to move to Pound Host... o wait we already did.
Posted by aeris, 02-23-2010, 10:28 AM
I got a readout of the SMART data on the disks of a few select servers; their lifetime max temperature varied from around 40 to 51 degrees C. Most manufacturers specify a maximum recommended operating temperature of 60C, and this isn't hotter than the disks in one of my backup NAS boxes have been running for years, so assuming the readings are correct, that really shouldn't have caused any problems.

There might of course have been differences in the temperatures in other parts of the datacenter, but I'm pretty sure it couldn't have reached 48C around these servers.
Posted by moozaad, 02-23-2010, 10:37 AM
48°C was the case ambient sensor reading off both servers. That's a trigger threshold, not real-time data, so the temperature probably went above that.
Both servers' planar probes reported 58°C triggers too - and they were idle at the time (also low TDP/voltage variants).
Unfortunately I can't get the SMART data from the drive until it is recovered from RS.
I presume I was in a dead air spot as the temp shot up when the main AC went.

FYI the normal ambient is 22c and planar 27c.
Posted by moozaad, 02-23-2010, 12:49 PM
These quotes make my day... from head of operations at RapidSwitch.

Quote:

Jon,

Can you send through logs of the periods that your servers were unavailable during the issue period?

Regards,

Paul
RapidSwitch

My reply was.. no, crashed servers don't make logs. By some twist of fate caused by heat it was still responding to pings but httpd and other services were down. I think the storage controller had given up but the kernel or at least network drivers were still up. He knows this because we've spent the last 48 hours discussing RS cooking my HDD and he's had the temp logs since early Monday.
There's no thermal shutdown on Dell R300s by default, that's been fixed now! I had hoped I'd never need it but apparently when hosted with RapidSwitch... you do.

The most I can offer is thermal logs from the RAC and eye witness statements from clients trying to use the site but apparently they're not enough.

Quote:

No problem, we can go from the records held in MyServers in the absence of your own if you prefer?

21/02/2010 14:04 20:16 Reply received in 0ms with TTL=63
21/02/2010 14:01 0:02 Failed to receive a reply: Timed Out
0.5 hours

Obviously the question in that quote is rhetorical or he'd ask for the statements or contact details; logs were already provided.
Really that quote translates to 'here take what we offer, bend over and stfu coz we can ping you that means we've not broken the SLA!'. The times shown don't even relate to the incident, that's me trying to get the thing back online (it took a few reboots and eventually RS had to pull the power to reset some components that wouldn't soft reset even with a power off - that's heat for you, really messes things up).

So to them they neither broke their SLA by heat bathing the hardware at more than 48C nor admit accidental damage to assets caused by the previously mentioned SLA breach that didn't happen.
Win/Win scenario for RapidSwitch then.

*sigh*
Posted by rotame, 02-23-2010, 01:12 PM
I faced a similar problem; it seems to be OK now. According to them:

First incident:
Date: 21/02/2010
Time: 02:00
Duration: Approximately 3.5 hours
Impact: Approximately 10% of servers had thermal cut out for a small portion of when there was reduced cooling.

Second incident:
Date: 22/02/2010
Time: 10:00
Duration: Approximately 1.5 hours
Impact: Approximately 8% of servers had thermal cut out for a small portion of when there was reduced cooling.

The cause of the first incident was diagnosed as a single phase of the polyphase power distribution not supplying voltage. Our onsite engineers followed the process and escalated to our senior staff, who diagnosed the fault with our mechanical and electrical contractors. A 630 amp polyphase breaker was taken out and reseated, this resolved the issue.

The second incident occurred when the mechanical and electrical engineers were on site reviewing the low voltage switch panel. The breaker that had been reseated the previous day overheated and suffered a critical failure. We believe the cause of this was a manufacturing fault, and the issue has been escalated to the manufacturer. We engineered an amount of redundancy into our LV distribution and panels, which is why we were able to restore full cooling duty so quickly.

During both outages we implemented our disaster recovery plan to supply additional emergency cooling. Although the data centre temperature rose, the increase was tolerated by the vast majority of servers and so there was no impact on most clients.

The matter is now resolved and full cooling duty has been operational for over 24 hours.
Posted by aeris, 02-23-2010, 02:34 PM

Quote:

Originally Posted by moozaad

There's no thermal shutdown on Dell R300s by default, that's been fixed now! I had hoped I'd never need it but apparently when hosted with RapidSwitch... you do.

It's a nice safety, and it's not like they're the first company who have suffered an AC failure. I remember LeaseWeb had a similar event a few months back.

Quote:

Originally Posted by moozaad

So to them they neither broke their SLA by heat bathing the hardware at more than 48C nor admit accidental damage to assets caused by the previously mentioned SLA breach that didn't happen.
Win/Win scenario for RapidSwitch then.

Granted, they really should fully honor the SLA. But why don't you have some form of external monitoring on these servers? Most monitoring services like Pingdom/Hyperspin/ServiceUptime provide full logging of all downtime, and they're nice leverage in cases such as that.
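
A minimal sketch of the kind of external check being suggested, for anyone who wants their own downtime log. It fetches a page over HTTP instead of pinging, since a host can answer pings while its web server is down. The names `http_ok` and `outages` are made up for this example, the sample timestamps are invented, and this is not how Pingdom, HyperSpin or ServiceUptime work internally.

```python
import http.client
import urllib.request
from datetime import datetime, timedelta


def http_ok(url, timeout=10.0):
    """Return True only if the URL answers with a 2xx/3xx status.

    urlopen raises HTTPError (an OSError subclass) for 4xx/5xx replies,
    so an HTTP 500 counts as down even though the host is reachable.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        return False


def outages(samples):
    """Collapse (timestamp, ok) samples into (start, end) outage windows.

    start is the first failed check, end is the first successful check
    after it; an outage still in progress has end set to None.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            windows.append((start, ts))
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


# Invented samples at 5-minute intervals, purely to show the output shape.
t0 = datetime(2010, 2, 21, 2, 0)
demo = [(t0 + timedelta(minutes=5 * i), ok)
        for i, ok in enumerate([True, False, False, True])]
for start, end in outages(demo):
    print(start, "->", end)
```

In practice `http_ok` would be called on a schedule from a machine outside the data centre, with each `(timestamp, result)` pair appended to a log that `outages` can summarise later.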
Posted by robputt796, 02-23-2010, 02:46 PM

Quote:

Originally Posted by aeris

Granted, they really should fully honor the SLA. But why don't you have some form of external monitoring on these servers? Most monitoring services like Pingdom/Hyperspin/ServiceUptime provide full logging of all downtime, and they're nice leverage in cases such as that.

Agreed, RapidSwitch took a look at my HyperSpin log dump which I copied and pasted out of the emails from HyperSpin and fully honoured their SLA as per their Terms of Service. They also as a good will gesture rounded the downtime to 5 hours even though it was only 4.1 hours. Good service if I am honest, let's accept it downtime does occur and they have handled it well considering.
Posted by moozaad, 02-23-2010, 03:21 PM

Quote:

Originally Posted by aeris

Granted, they really should fully honor the SLA. But why don't you have some form of external monitoring on these servers? Most monitoring services like Pingdom/Hyperspin/ServiceUptime provide full logging of all downtime, and they're nice leverage in cases such as that.

RapidSwitch monitoring is usually pretty sufficient, but in this case it would have had to ask for a resource to get the 500 error over http to show they'd knocked out the storage controller.
If it was a plain crash then it would have all been a lot simpler. Throw in the HDD death from the temp, and proving it all becomes complicated but not impossible. The question that needs answering from my side is why the mirror didn't carry on working even with a dropped drive. The answer is very likely heat, but that is unprovable. Same for the NICs that wouldn't come back up after being power cycled.

If it had been IOmarts SLA (parent company) I would have been sorted as they have an environment section in their SLA but RS lacks that.
I did hold RS in high regard, but Paul (hi!) hasn't been very helpful these last 2 days.
We're mostly an R&D company with a few friends and colleagues running off a Xen VM on one of the servers, so we didn't lose any business this time. I just wish RS had held up the good faith/good will/best endeavours part of the business relationship, which I feel is now lacking.

Well got to put it behind me now and source a replacement HDD that's no longer on the market >.<
Posted by ValueVPS-Dave, 02-23-2010, 03:23 PM
I have to say that RS have honored their SLA with no fuss whatsoever. Even though I hate downtime as it upsets my clients, I feel it's only fair for me to say that RS have handled the entire situation properly and promptly. Customer service is improved; let's see the service get back to what we were enjoying 2-3 years ago.
Posted by rotame, 02-24-2010, 07:40 AM

Quote:

Originally Posted by ValueVPS-Dave

I have to say that RS have honored their SLA with no fuss whatsoever. Even though I hate downtime as it upsets my clients, I feel it's only fair for me to say that RS have handled the entire situation properly and promptly. Customer service is improved; let's see the service get back to what we were enjoying 2-3 years ago.

I agree with you. I have been a client for over 2 years and they have always kept me informed about any situation, good or bad.
I believe they are very professional and honest, and they provide good service.
Posted by Interix, 02-24-2010, 07:47 AM
Sometimes things just have to be replaced during the day before they cause any damage. I would rather have them fix the problem ASAP than just wait until it decides to happen again, even if it means they have to do maintenance through the day. That’s me though.

Lisa
Posted by gordonrp, 02-24-2010, 08:22 PM
I had a pretty good run at Rapidswitch, having been there for almost two years.

I've just canceled my dedicated server and colocated server, but am pretty shocked to see a 30 day cancellation policy. I'm sure this was not in place back when I signed up: http://www.uploadscreenshot.com/image/50593/6549132

I guess this will be the last test of their customer service for me. All in all they have been very friendly and helpful over the period, but I have a sneaking suspicion I am going to get stung here for another month's worth of fees. Hopefully the original owners are still around; I forget if they stuck around after the buyout or not.

Gordon
Posted by aeris, 02-25-2010, 08:28 AM
They have had the 30-day cancellation clause for as long as I've been with them, but my first server with them had an ident in the 4000s, so you seem to be pre-dating that. And no, as a rule they won't waive it.
Posted by gordonrp, 02-25-2010, 01:01 PM

Quote:

Originally Posted by aeris

They have had the 30-day cancellation clause for as long as I've been with them, but my first server with them had an ident in the 4000s, so you seem to be pre-dating that. And no, as a rule they won't waive it.

Actually I've been with them for about 945 days. Over two years, signed up on 25/07/2007.

It's a shame to leave, but my customers already left me! Only hoping they don't leave a sour taste in my mouth as their first ticket response indicates they will.

Quote:

Hello Gordon,

I can see your servers have been cancelled, and will be disconnected: 27/03/2010.

May I ask why you would like to cancel your services with RapidSwitch, and if there is anything I can do to help?
Alternatively, we offer courier shipment of your equipment by a specialised computer equipment relocation company, Comtec. The fully-insured service provided by Comtec is:

1. Come to data centre to collect server
2. Wrap server in anti-static bubble wrap
3. Place in IT relocation crate
4. Cover with anti static blanket and seal crate
5. Deliver to destination in Comtec van
6. Unpack and hand to client
7. The cost for this service is £100+VAT plus £2+VAT per mile that the server is delivered to from our office. The mileage is calculated as driving distance for the fastest route, not as the crow flies.

Kindest regards,

Francesca
RapidSwitch

£117 plus mileage? Please... That was in response to me asking if they would box it up for me and I'd arrange for courier to collect.
