Forum Topic - SH BSPs and NMIs: (7 Items)

Douglas Bailey

06/11/2009 4:33 PM

post31559

The InterruptEnable/InterruptDisable functions on SH are implemented by
setting the SR.IMASK field to mask out interrupts.  Which works fine for
everything except NMIs -- they are not masked.  If an NMI happens and
gets processed while we think we have interrupts disabled, we're
pooched.
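
To make the gap concrete, here is a minimal C model of the masking rule described above. It is a sketch, not the actual InterruptDisable/InterruptEnable code: the function names are made up, the real routines may save and restore the previous mask rather than forcing it to 15 or 0, and SR.BL is ignored entirely. The only hardware fact it relies on is that IMASK occupies SR bits 7..4 on SH-4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SH-4 status register: IMASK lives in SR bits 7..4. */
#define SR_IMASK_SHIFT 4u
#define SR_IMASK_MASK  (0xFu << SR_IMASK_SHIFT)

/* Hypothetical model of InterruptDisable(): raise IMASK to 15. */
static uint32_t model_interrupt_disable(uint32_t sr)
{
    return sr | SR_IMASK_MASK;
}

/* Hypothetical model of InterruptEnable(): drop IMASK to 0. */
static uint32_t model_interrupt_enable(uint32_t sr)
{
    return sr & ~SR_IMASK_MASK;
}

/* A maskable request at `level` (1..15) is accepted only above IMASK. */
static bool maskable_accepted(uint32_t sr, unsigned level)
{
    assert(level >= 1 && level <= 15);
    unsigned imask = (sr & SR_IMASK_MASK) >> SR_IMASK_SHIFT;
    return level > imask;
}

/* An NMI never consults IMASK, so this model always accepts it. */
static bool nmi_accepted(uint32_t sr)
{
    (void)sr;   /* deliberately unused: IMASK has no say here */
    return true;
}
```

With IMASK at 15, `maskable_accepted()` is false for every level while `nmi_accepted()` is still true, which is the window the post is worried about.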

Easy enough to say we don't support the use of NMIs but we provide NMI
callouts in our startup lib and many of our board support packages make
use of these callouts.  I suppose it might be valid for a customer to
use an NMI as a reset mechanism, but it cannot be valid to continue to
run after an NMI has been processed, because an NMI could have
interrupted the kernel when it thought interrupts were disabled.

The fact that we provide NMI support in our BSPs makes me wonder if
customers have been using NMIs when they shouldn't.  This is tempered by
the fact that the NMI masking/unmasking done in the NMI callouts looks
to be quite busted, so I'm hoping that a customer who used NMIs
incorrectly would have run into other problems that would already have
forced us to notice this issue.

I guess I'm wondering where to go from here.  I don't like the fact that
the BSPs we provide suggest that using NMIs is safe when in fact
it isn't, but I don't know for certain if any customers are using
NMIs.

Any thoughts?

Chris Hobbs

06/11/2009 4:51 PM

post31565

Re: SH BSPs and NMIs

This has nothing to do with the bug, but in the assumptions we make
about how applications use our Safe Kernel I will make clear
that NMIs must not be used (even on x86). Is that reasonable,
particularly in the x86 case? I heard that even the x86 support for
NMIs might be flaky.

Chris

Colin Burgess(deleted)

06/11/2009 4:52 PM

post31566

Re: SH BSPs and NMIs

Looking at the callout code, it would appear that there are cases where we
can get spurious NMIs, since the callout can leave immediately with a -1
return value.

So I would assume that we NEED to have the callout to handle said spurious
interrupts, otherwise we'll get an unexpected interrupt exception (since NMIs
by definition can't be masked).
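
For anyone who hasn't looked at the callout, the shape of the spurious case is roughly the following. This is only a C model under assumptions: the real callouts are assembly fragments in the startup library, and `struct nmi_status`, `NMI_VECTOR_ID` and `nmi_callout_id()` are names invented for the sketch. The one detail taken from the thread is the -1 return for a spurious NMI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INTR_SPURIOUS (-1)   /* the "-1 return value" mentioned above  */
#define NMI_VECTOR_ID 0      /* hypothetical interrupt ID for the NMI  */

/* Hypothetical snapshot of whatever board register latches NMI sources. */
struct nmi_status {
    uint32_t pending;        /* nonzero: some real source raised the NMI */
};

/* Identify the NMI: an ID for a real source, -1 for a spurious one. */
static int nmi_callout_id(const struct nmi_status *st)
{
    assert(st != NULL);
    if (st->pending == 0)
        return INTR_SPURIOUS;   /* nothing latched: leave immediately */
    return NMI_VECTOR_ID;
}
```

Note that even the -1 path only runs after the interrupt entry sequence has already been taken, so the callout cannot undo the fact that the NMI arrived.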

The callout appears to do something to mask further NMIs (Alanis would love
this chip) but that doesn't help user code that calls InterruptDisable().

-- 
cburgess@qnx.com

Brian Stecher

06/12/2009 8:41 AM

post31603

Re: SH BSPs and NMIs

More importantly, it doesn't help kernel code that disables interrupts.
Even with the -1 return, the intr entry/exit sequence is still going to
potentially muck with data structures that are in an inconsistent
state. If the hardware can throw spurious NMIs, we'll have to do a
special handler inside kernel.s itself that just does a return from
interrupt as quickly as possible, and do a code inspection pass over all
the exception entry/exit sequences to make sure that nothing that's
damaged in the NMI handler is needed by them.
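
A toy illustration of the "inconsistent state" problem, in C. Everything here is hypothetical (the queue, the hook, the names); it is not kernel code. It only shows that a two-step update which is safe against maskable interrupts is momentarily inconsistent, and that anything running in that gap, as an NMI path would, sees the half-done state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical kernel structure: `count` must always equal the list length. */
struct node  { struct node *next; };
struct queue { struct node *head; size_t count; };

/* Stand-in for an NMI arriving mid-update; NULL means no NMI fires. */
typedef void (*nmi_hook)(const struct queue *q);

static size_t list_length(const struct queue *q)
{
    size_t n = 0;
    for (const struct node *p = q->head; p != NULL; p = p->next)
        n++;
    return n;
}

static bool queue_consistent(const struct queue *q)
{
    return list_length(q) == q->count;
}

/* Two-step update that would normally run with "interrupts disabled". */
static void queue_push(struct queue *q, struct node *n, nmi_hook nmi)
{
    assert(q != NULL && n != NULL);
    n->next = q->head;
    q->head = n;        /* step 1: the list is now one longer        */
    if (nmi != NULL)
        nmi(q);         /* an NMI lands here, ignoring SR.IMASK      */
    q->count++;         /* step 2: only now does count agree again   */
}

/* Records what code running in the NMI path would have observed. */
static bool nmi_saw_consistent_state = true;

static void record_nmi_view(const struct queue *q)
{
    nmi_saw_consistent_state = queue_consistent(q);
}
```

Pushing with `record_nmi_view` as the hook leaves `nmi_saw_consistent_state` false even though the queue is consistent again once `queue_push()` returns.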

	Brian

-- 
Brian Stecher (bstecher@qnx.com)        QNX Software Systems
phone: +1 (613) 591-0931 (voice)        175 Terence Matthews Cr.
       +1 (613) 591-3579 (fax)          Kanata, Ontario, Canada K2M 1W8

Douglas Bailey

06/12/2009 10:38 AM

post31615

Re: SH BSPs and NMIs

Can you explain in more detail how intr entry/exit can muck with kernel
data structures?  

The generated code is read-only as far as kernel data structures are
concerned, with the exception of saving the register context.  However,
the generated code exits through intr_done -- I guess that could be
problematic with an inconsistent kernel state...

Btw, NMIs are held pending while SR.BL is set, so that should make the
exception entry/exit fairly straightforward.
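
A small C model of the SR.BL behaviour, for reference. It is a sketch under assumptions: `struct cpu_model` and the three functions are invented names, and the model covers only the case described here, where an NMI raised while BL is set stays latched until BL is cleared. SR.BL is bit 28 on SH-4, and exception entry sets it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SR_BL (1u << 28)   /* SH-4 SR block bit */

/* Hypothetical CPU model: status register plus one latched NMI request. */
struct cpu_model {
    uint32_t sr;
    bool     nmi_pending;    /* NMI latched but not yet delivered      */
    unsigned nmi_delivered;  /* times the NMI handler has been entered */
};

/* Deliver a latched NMI if SR.BL allows it. */
static void cpu_poll_nmi(struct cpu_model *cpu)
{
    assert(cpu != NULL);
    if (cpu->nmi_pending && !(cpu->sr & SR_BL)) {
        cpu->nmi_pending = false;
        cpu->nmi_delivered++;
        cpu->sr |= SR_BL;    /* exception entry sets BL again */
    }
}

/* The NMI pin asserts: latch the request, then see if it can be taken. */
static void cpu_raise_nmi(struct cpu_model *cpu)
{
    assert(cpu != NULL);
    cpu->nmi_pending = true;
    cpu_poll_nmi(cpu);
}

/* Software clears BL, for example at the end of an entry sequence. */
static void cpu_clear_bl(struct cpu_model *cpu)
{
    assert(cpu != NULL);
    cpu->sr &= ~SR_BL;
    cpu_poll_nmi(cpu);
}
```

So the entry/exit sequences themselves are protected for as long as they keep BL set; the exposure is in everything that runs with BL clear and only IMASK raised.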




Adam Mallory

06/11/2009 5:23 PM

post31571

Re: SH BSPs and NMIs

This isn't SH4, but some customers do use NMIs on x86 (IGT does, I
believe).

Some customers abuse it as an "important interrupt" and they are
usually corrected, but in other cases the NMI is used as a trigger for a
failover mechanism (hardware gone wrong/bad state) and there has to be
some software control.

Are you saying the NMI handling is busted everywhere or just SH4?
-Adam

--
Cheers,
    Adam

   QNX Software Systems
   [ amallory@qnx.com ]
   ---------------------------------------------------
   With a PC, I always felt limited by the software available.
   On Unix, I am limited only by my knowledge.
       --Peter J. Schoenster

Colin Burgess(deleted)

06/11/2009 6:09 PM

post31577

RE: SH BSPs and NMIs

NMIs aren't handled on anything except x86, and even there the handling is deferred until kernel exit.
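
Roughly, "deferred until kernel exit" means something like the C model below. This is a sketch of the idea only, not the x86 kernel code: `struct kstate` and the three functions are made-up names, and the real entry/exit paths are in assembly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state for the deferral described above. */
struct kstate {
    bool     in_kernel;     /* true between kernel entry and kernel exit */
    bool     nmi_deferred;  /* an NMI arrived while in_kernel was true   */
    unsigned nmi_handled;   /* times the real NMI work has run           */
};

static void kernel_enter(struct kstate *k)
{
    assert(k != NULL);
    k->in_kernel = true;
}

/* NMI entry stub: do the work now only if the kernel was not interrupted. */
static void nmi_entry(struct kstate *k)
{
    assert(k != NULL);
    if (k->in_kernel)
        k->nmi_deferred = true;   /* just remember it and return */
    else
        k->nmi_handled++;
}

/* Kernel exit: state is consistent again, so run any deferred NMI work. */
static void kernel_exit(struct kstate *k)
{
    assert(k != NULL);
    k->in_kernel = false;
    if (k->nmi_deferred) {
        k->nmi_deferred = false;
        k->nmi_handled++;
    }
}
```

The point of the deferral is that the NMI work never runs while kernel data structures may be half-updated; the cost is that the NMI is no longer handled immediately.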
