kernel: aacraid: Host adapter abort request (0,0,0,0)
Posted by bloodyman, 12-31-2011, 04:49 AM
Hello
Recently I have been getting SCSI errors on 2 of my servers (I have 12 servers with the same configuration).
The servers run Adaptec 5405 RAID cards with 2x 300 GB SAS disks.
Every time, I get messages about a SCSI hang in /var/log/messages:
kernel: aacraid: Host adapter abort request (0,0,0,0)
kernel: aacraid: Host adapter reset request. SCSI hang ?
kernel: aacraid: SCSI bus appears hung
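Since each hang shows up as the three kernel messages above, a small script can count how often each one occurs, which makes it easier to compare the 2 affected servers against the other 10. This is a minimal sketch, assuming Python 3 (CentOS 5 ships Python 2.4, so you may need to run it against a copied log on another machine) and assuming the message text matches the excerpt above exactly; `count_aacraid_events` and `AACRAID_MESSAGES` are hypothetical names.

```python
import sys
from collections import Counter

# Message fragments copied from the log excerpt above. Assumption: other
# aacraid driver versions may word these messages differently.
AACRAID_MESSAGES = {
    "abort": "aacraid: Host adapter abort request",
    "reset": "aacraid: Host adapter reset request",
    "bus_hung": "aacraid: SCSI bus appears hung",
}


def count_aacraid_events(lines):
    """Count how many times each aacraid hang message appears in the log lines."""
    counts = Counter()
    for line in lines:
        for kind, fragment in AACRAID_MESSAGES.items():
            if fragment in line:
                counts[kind] += 1
    return counts


def main(path):
    try:
        # errors="replace" keeps going if the log contains stray non-UTF-8 bytes.
        with open(path, errors="replace") as log:
            counts = count_aacraid_events(log)
    except OSError as exc:  # missing file, or not running as root
        print("cannot read %s: %s" % (path, exc))
        return
    for kind in AACRAID_MESSAGES:
        print("%s: %d" % (kind, counts[kind]))


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages")
```

Run it as root, optionally passing a rotated log such as /var/log/messages.1 as the only argument.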
I reviewed the controller logs and saw that each time, the controller tried several times to recover, without success. Then I saw a controller reboot, which finally sorted out the problem. This all happened on a running server, without a kernel panic; there was only very high load, because the server was unable to write to the HDDs and then started to use swap.
The last thing I've observed is that after such a controller reset, the [aacraid] process changed into an [AAC] process. I have not installed any additional Adaptec drivers on my server, yet the process name changes!
Does anyone else have similar problems with Adaptec RAID cards?
Btw, I'm using CentOS 5.7 32-bit with the stock aacraid driver.
Posted by PCS-Chris, 12-31-2011, 08:27 AM
I've had the same issue before. It affected several servers (Adaptec 2405 and 2805) and only happened at times of peak load, so, ironically, we had a few machines go down one after the other. The system will either hang and recover or, if you are hosting VPSes, processes may time out and crash.
To work around this we made the following changes:
1. Kernel upgrade. At the time, the latest kernel was 2.6.18-194 IIRC, and we built a 2.6.18-238-based kernel from a testing SRPM that included a newer aacraid driver. You are now on 2.6.18-274, but check the changelog/errata to see whether there were any aacraid driver changes.
2. Force your CPU C-state to highest performance instead of scaling.
3. Watch your CPU usage. In our case we dedicated additional CPU to the host OS, as these were Xen systems.
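For item 2, one way to see whether CPUs are being scaled is to read the cpufreq governor from sysfs. This is a related but separate knob: the governor controls frequency scaling, while how deep the idle C-states go is usually limited with a kernel boot parameter such as `processor.max_cstate=`. A hedged sketch using the standard Linux cpufreq sysfs path; the helper names are hypothetical, and on CentOS 5 the governor is typically managed by the `cpuspeed` service rather than set by hand.

```python
import glob
import os

# Standard Linux cpufreq sysfs files; they are absent when no cpufreq
# driver is active, which is common inside virtual machines.
GOVERNOR_GLOB = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"


def read_governors(pattern=GOVERNOR_GLOB):
    """Map each CPU directory name (e.g. 'cpu0') to its cpufreq governor."""
    governors = {}
    for path in sorted(glob.glob(pattern)):
        cpu = path.split(os.sep)[-3]  # .../cpu0/cpufreq/scaling_governor -> cpu0
        with open(path) as f:
            governors[cpu] = f.read().strip()
    return governors


def cpus_not_at_performance(governors):
    """Return the CPUs whose governor is anything other than 'performance'."""
    return sorted(cpu for cpu, gov in governors.items() if gov != "performance")


if __name__ == "__main__":
    govs = read_governors()
    if not govs:
        print("no cpufreq governors found (cpufreq may not be active)")
    else:
        for cpu, gov in sorted(govs.items()):
            print("%s: %s" % (cpu, gov))
        print("scaling CPUs: %s" % (", ".join(cpus_not_at_performance(govs)) or "none"))
```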
I can't say these tricks will work for you, but they give you something to try. We've not seen that error since making the changes, and most of our Adaptec-based machines have 250+ days of uptime.
If you are not running any type of virtualization or CloudLinux, I would be tempted to build a 3.1.x kernel from kernel.org, just so you've got the very latest drivers and can rule that out.
Chris
Posted by bloodyman, 12-31-2011, 09:56 AM
Hi
Thanks for reply.
I must say that this mostly happens when the servers are idle, with a load of 0.20 to 0.70 and no I/O-intensive operations (no significant I/O wait).
I read the changelog of the 2.6.18 kernel shipped with CentOS 5 and did not find any aacraid updates in 2011. What aacraid driver version are you using?
This issue is very strange to me. I googled it but did not find any clue: some people wrote about problems with Dell or IBM servers using Adaptec controllers, but they did not find an answer either. Before yesterday's hang, the server had been running for 200+ days without issue since the first hang. That is what worries me and makes this even more mysterious.
Bloody
Posted by nibb, 05-07-2013, 02:35 PM
I know this post is old, but I'm also having this on several servers running Adaptec 2405 cards and XenServer.
Did you guys manage to resolve this? Are these broken controller cards that need to be replaced, or is this a software bug?
Posted by PCS-Chris, 05-07-2013, 03:11 PM
What is your kernel and driver version?
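One way to collect both values in one go is sketched below. It assumes the aacraid module declares a module version, so that sysfs exposes it at /sys/module/aacraid/version once the module is loaded; `modinfo aacraid` should report the same version for the module on disk. The helper names are hypothetical.

```python
import platform

# Assumption: the loaded aacraid module exports its version here.
AACRAID_VERSION_PATH = "/sys/module/aacraid/version"


def read_first_line(path):
    """Return the stripped first line of a file, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.readline().strip()
    except OSError:
        return None


def version_report(version_path=AACRAID_VERSION_PATH):
    """Collect the running kernel release and the aacraid driver version."""
    return {
        "kernel": platform.release(),  # same string as `uname -r`
        "aacraid": read_first_line(version_path)
        or "not loaded, or version not exported",
    }


if __name__ == "__main__":
    for key, value in version_report().items():
        print("%s: %s" % (key, value))
```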
Posted by Steven, 05-07-2013, 03:48 PM
I have also had this issue occur multiple times with failing hard drives.
There have been cases where a disk passes SMART and shows as OK in the RAID BIOS, but it is so slow that it brings the entire array to a crawl, and the card eventually aborts the request and spits out that error.
Posted by nibb, 05-07-2013, 03:48 PM
This seems to be a hardware problem: since I enabled logging in the BIOS, I already get a PCI Express error in the IPMI card and in the BIOS.
I'm running XenServer 6.0.2.
But this is a hardware problem. I just saw some logs in the OS where the controller is disconnected, and in the controller BIOS the array was missing. I rebooted and it's back online.
I don't know whether it's the port or the card that is failing.