Forum Topic - Fsys.aha8scsi issues with Adaptec 29160LP Ultra 160 Controller : (12 Items)
   
Fsys.aha8scsi issues with Adaptec 29160LP Ultra 160 Controller  
Pardon me if this gets a bit long-winded.

We have been using the Fsys.aha8scsi driver on our production systems for years (more than a decade, actually), and are
in the process of hardware upgrades.  After much evaluation and testing we decided on the Adaptec 29160P PCI-based SCSI
controller.  During all the evaluation and testing we really had no issues or problems with the driver.  Now that our
new systems have arrived and are being assembled, we have run into a rather strange set of issues...

Running show_pci reports that the Adaptec controller has vendor ID 9005 and device ID x80.  So, we've been starting the
driver as 'Fsys.aha8scsi -L aha8scsi -D80 -V9005'.  This has been working just fine, until we tried it on one of our
new systems.  At some point, usually during boot up, the driver prints this out on the console a couple of times, and
the machine locks up...

OSMEvent(0x0000a318, 0x0004, 00000000, ... )

Now here's where it gets weird.  In the past we had used an older model Adaptec card and it had a device ID of xCF.  In
playing around with things to try to figure out what would work, on a whim we started the driver with -DCF.  It
complains of errors parsing the command line, BUT IT WORKS!  And, even though we occasionally get an OSMEvent() lockup
during boot up, if it makes it past the bootup, it seems to be stable from there on.  With -D80 set, it will always
fail at some point.

Anybody have any idea what could possibly be going on here?

BTW, the Adaptec controller is talking to an external ARC-6060 Series RAID controller (from Areca Technology
Corporation).  One change in the mix since we were evaluating and testing: the firmware in the RAID controller is a
newer version.  Could that be causing this?

Any help or thoughts would be appreciated.

TIA//
-Rob
RE: Fsys.aha8scsi issues with Adaptec 29160LP Ultra 160 Controller  
Looks like the Adaptec CHIM code is reporting an auto configuration
required event.  The driver doesn't seem to do anything with it.  Maybe
the RAID box is resetting the bus?  Do you see the problem with a
standard SCSI drive?


-----Original Message-----
From: Robert Hem [mailto:community-noreply@qnx.com] 
Sent: Wednesday, August 06, 2008 12:44 PM
To: qnx4-community
Subject: Fsys.aha8scsi issues with Adaptec 29160LP Ultra 160 Controller 


_______________________________________________
QNX4 Community Support
http://community.qnx.com/sf/go/post11490
Re: RE: Fsys.aha8scsi issues with Adaptec 29160LP Ultra 160 Controller  
> Looks like the Adaptec CHIM code is reporting an auto configuration
> required event.  The driver doesn't seem to do anything with it.  Maybe
> the RAID box is resetting the bus?  Do you see the problem with a
> standard SCSI drive?

Yeah, that's probably what's happening, but I'm going to have to get back to you on that.  I have to dig up a SCSI
drive to try it with.  The RAID controller has a SCSI host interface, but a SATA drive interface.

Are there any command line options that could be set to get the driver to maybe ignore auto configuration requirements?

We're really between a rock and a hard place here.  Twenty-four new systems, that we can't really roll to production 
until we're sure they're stable.
RE: RE: Fsys.aha8scsi issues with Adaptec 29160LP Ultra 160 Controller  
The driver performs an auto configuration when it starts.  We have never
seen a device cause the CHIM to generate an OSMEvent after it is up.
Unfortunately there aren't any command line options that could change
the behavior.


-----Original Message-----
From: Robert Hem [mailto:community-noreply@qnx.com] 
Sent: Monday, August 11, 2008 1:06 PM
To: qnx4-community
Subject: Re: RE: Fsys.aha8scsi issues with Adaptec 29160LP Ultra 160
Controller 


_______________________________________________
QNX4 Community Support
http://community.qnx.com/sf/go/post11640
Re: RE: RE: Fsys.aha8scsi issues with Adaptec 29160LP Ultra 160 Controller  
I've discussed this a bit with my cohorts.  We've never seen any OSMEvent()s before either.  Like I said in my original
post, we've been using the Fsys.aha8scsi driver for years, with a number of different RAID controllers (Adaptec,
Chaparral, Infortrend and the new ARC-6060 w/ the previous firmware version).  They have all worked just fine.  No
errors whatsoever.  In fact we were pretty excited about the performance of the ARC-6060 in the evaluations we did (and
hopefully can be again).  Around 60 MB/sec sustained read performance, ~50-55 MB/sec write, and ~45 MB/sec random
access/multithreaded access, sustained.

As far as talking to raw SCSI drives goes, frankly, we gave up on that years ago.  Whenever we'd try hooking one up,
we'd always run into seemingly phantom disk block errors, especially when performance/throughput was pushed to the
limit.  Consequently, whenever the need to talk to single disks has arisen (removable backups, etc.), we've turned LUN
support on in the driver and had the RAID controller act as the middle guy talking to the individual drives, with each
drive configured as a separate LUN on the controller.  But, I digress...

Pardon my ignorance, but what is an OSMEvent?  Perhaps if we had some idea what OSMEvent()s are being logged, we could 
figure out a way around it??  Maybe disable something in the RAID controller configuration, or something along those 
lines.

Another bit of evidence that something changed in the ARC-6060 with the newer firmware: the BIOS would never recognize
the ARC-6060 w/ the older firmware as a synchronous U160 device; it always showed up as a generic async SCSI
drive/device.  We had to turn off int 11 BIOS support to get it into U160 mode on the QNX side with the correct drive
geometry.  Needless to say, booting off it was a lost cause.  With the new firmware the BIOS recognizes it correctly, no
tweaking necessary to get the geometry, etc. correct on the QNX side, and we've even been able to boot off it.  And it
runs successfully when the OSMEvent()s don't show up.

We're so close, but yet so far :-(

Any idea why using -D80 NEVER works, yet -DCF (the wrong device ID) seems to work most of the time?  There's got to be
some funky timing going on here.

-Rob

> The driver performs an auto configuration when it starts.  We have never
> seen a device cause the CHIM to generate an OSMEvent after it is up.
> Unfortunately there aren't any command line options that could change
> the behavior.


RE: RE: RE: Fsys.aha8scsi issues with Adaptec 29160LP Ultra 160 Controller  
An OSMEvent is something that is returned to our driver by the Adaptec
code. We haven't seen this before, so are not too sure how to handle it.
I am still waiting for the traceinfo output.


-----Original Message-----
From: Robert Hem [mailto:community-noreply@qnx.com] 
Sent: Monday, August 11, 2008 3:43 PM
To: qnx4-community
Subject: Re: RE: RE: Fsys.aha8scsi issues with Adaptec 29160LP Ultra 160
Controller 

Re: RE: RE: RE: Fsys.aha8scsi issues with Adaptec 29160LP Ultra 160 Controller  
traceinfo output?  Do you mean the exact OSMEvent() messages?  I'll record those, and get back to you.

If you're looking for something else, what?  Start the driver in verbose mode?

If you're looking for output from the traceinfo facilities, I'm at a bit of a loss as to how to get that.  The machine
locks up.

-Rob

> An OSMEvent is something that is returned to our driver by the Adaptec
> code. We haven't seen this before, so are not too sure how to handle it.
> I am still waiting for the traceinfo output.
> 
> 



RE: RE: RE: RE: Fsys.aha8scsi issues with Adaptec 29160LP Ultra 160 Controller  
OK, I missed the bit about the machine locking up. I meant the traceinfo
utility output, but you can't get that.

There is not much else that we can do without the hardware, so if you
want to send the hardware, we can take a look at it.


-----Original Message-----
From: Robert Hem [mailto:community-noreply@qnx.com] 
Sent: Tuesday, August 12, 2008 11:44 AM
To: qnx4-community
Subject: Re: RE: RE: RE: Fsys.aha8scsi issues with Adaptec 29160LP Ultra
160 Controller 


_______________________________________________
QNX4 Community Support
http://community.qnx.com/sf/go/post11692
Re: RE: RE: RE: RE: Fsys.aha8scsi issues with Adaptec 29160LP Ultra 160 Controller  
Not that this is going to be of much help, but here are the two OSMEvent messages...

OSMEvent(0x0000a258, 0x0004, 00000000, ...)
OSMEvent(0x0000a258, 0x0001, 00000000, ...)

That's with the -D80 parameter.

With the -DCF parameter the first value is 0x0000a318

Just a WAG, but the 0x0004 & 0x0001 are possibly partition numbers??
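
For anyone comparing console captures between runs (say, one started with -D80 and one with -DCF), here is a minimal Python sketch that pulls the first two OSMEvent() arguments out of lines like the ones above and counts each pair. It is not part of the driver or of any QNX tool; the function names and the regular expression are assumptions made for illustration, and what the fields actually mean is not established in this thread.

```python
import re
from collections import Counter

# Matches console lines such as: OSMEvent(0x0000a258, 0x0004, 00000000, ...)
# The field meanings are unknown here, so they are only labelled arg1/arg2.
OSM_EVENT_RE = re.compile(
    r"OSMEvent\(\s*(0x[0-9a-fA-F]+)\s*,\s*(0x[0-9a-fA-F]+)\s*,"
)


def parse_osm_events(text):
    """Return a list of (arg1, arg2) integer pairs, one per OSMEvent line."""
    return [(int(a, 16), int(b, 16)) for a, b in OSM_EVENT_RE.findall(text)]


def summarize(text):
    """Count how often each (arg1, arg2) pair appears in a capture."""
    return Counter(parse_osm_events(text))


if __name__ == "__main__":
    # Hand-copied console output; replace with your own capture.
    capture = """\
OSMEvent(0x0000a258, 0x0004, 00000000, ...)
OSMEvent(0x0000a258, 0x0001, 00000000, ...)
"""
    for (arg1, arg2), count in summarize(capture).items():
        print(f"arg1=0x{arg1:08x} arg2=0x{arg2:04x} seen {count}x")
```

Since the machine locks up, the capture would have to be copied by hand or taken from a serial console, if one is available.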

====

It's been a while since we sent you guys any hardware.  It's possible.  We can certainly spare one of the new machines.  What do I have to do?

Do you have my real e-mail address? (I'm hesitant to post it in an open forum).

-Rob

> OK, I missed the bit about the machine locking up. I meant the traceinfo
> utility output, but you can't get that.
> 
> There is not much else that we can do without the hardware, so if you
> want to send the hardware, we can take a look at it.



RE: RE: RE: RE: RE: Fsys.aha8scsi issues with Adaptec 29160LP Ultra 160 Controller  
Given the nature of this issue, it may be best to go through your QNX
Technical Support so that this issue can be dealt with officially.
Assuming you have a support plan with QNX, either enter the information
in the QNX Technical Support Portal (look under the "Support and
Services" tab on the page for the "Support Portal" entry) or send an
email to support@qnx.com .


-----Original Message-----
From: Robert Hem [mailto:community-noreply@qnx.com] 
Sent: Tuesday, August 12, 2008 3:55 PM
To: qnx4-community
Subject: Re: RE: RE: RE: RE: Fsys.aha8scsi issues with Adaptec 29160LP
Ultra 160 Controller 


_______________________________________________
QNX4 Community Support
http://community.qnx.com/sf/go/post11707
Re: RE: RE: RE: RE: RE: Fsys.aha8scsi issues with Adaptec 29160LP Ultra 160 Controller  
The saga continued... 
We were beginning to think we'd have to prove the existence of dark energy, bring self-healing tin wires into
existence, or do something equally bizarre to resolve this.  But then, a light suddenly shone on what looks to be a
road home...

As it turns out, the very first new system we were attempting to get running was the (only) one that has persistently
exhibited OSMEvent() issues.  The next 3 or 4 new systems we attempted to bring up initially had OSMEvent() related
failures, but eventually did boot up successfully.  Once successfully booted, we just left them up, moving on to run
disk performance and capacity tests on them.  There are several variations of RAID configurations and total capacities
we needed to get verified.

Then we turned back to the first system and wasted a lot of time trying to get a handle on the OSMEvent() issues.
Things just deteriorated.  The system eventually wouldn't even boot at all.

So, we started working on getting some more new systems up and running.  It was then that we stumbled into what is
probably the root cause of all this.  As it happened, the battery for the next RAID controller (backup sub-system) was
completely dead.  The RAID couldn't even be configured, even with line power on.  The controller just flashed "low
battery" on the front panel.  So, it was left connected in the machine and the whole works left plugged into line
power.  The next day the battery was recharged.  We configured the RAID, booted the machine and installed QNX.
Absolutely no problems with any of that.  We even tried warm and cold re-booting the machine several times.  No
failures.

From then on we have made it a matter of procedure to leave each newly assembled machine powered on overnight before
attempting RAID configuration and OS boot install.  No problems ever since.

As for that first machine... Ray (one of my cohorts, and the guy who did most of the work) systematically swapped out
hardware in it.  It was a bad CPU.

In conclusion, it seems all the backup batteries we received for our new RAID systems were nearly or completely dead.
It was the nearly dead ones that created all the mystery.  The RAID controller just doesn't perform at 100% if the
backup battery is too low.  To verify, we (Ray) removed the battery backup sub-system from a RAID controller (it's on
an optional daughter board).  Everything worked swimmingly running off just line power.

We're still a bit concerned over what could happen when the RAID backup batteries start getting old.  Forgo battery 
backup??  Anyway, it looks like we can move forward for now.

Just thought I'd let you-all know there is hope for a light at the end of the tunnel 8-)

-----
And Hugh, thanks for the help and the offer, but I don't think we'll be needing to send you any hardware... this time
;-)

-Rob
RE: RE: RE: RE: RE: RE: Fsys.aha8scsi issues with Adaptec 29160LP Ultra 160 Controller  
Good news! Thanks for letting us know.

Hugh.

-----Original Message-----
From: Robert Hem [mailto:community-noreply@qnx.com] 
Sent: Wednesday, August 27, 2008 11:35 AM
To: qnx4-community
Subject: Re: RE: RE: RE: RE: RE: Fsys.aha8scsi issues with Adaptec
29160LP Ultra 160 Controller 


_______________________________________________
QNX4 Community Support
http://community.qnx.com/sf/go/post12490