Forum Topic - -Wc,-mno-fp-moves option: (11 Items)

Matt Ferraro(deleted)

03/04/2009 12:22 PM

post23496

Hello,

I'm trying to understand what this option (-Wc,-mno-fp-moves) does, but I can't for the life of me find any
documentation anywhere, either on qnx.com (http://www.qnx.com/developers/docs/6.3.2/neutrino/utilities/q/qcc.html)
or on gnu.org.

I understand that -m is an emulation option.  

I'm trying to ensure that floating point emulation does NOT occur.  Here is my complete command string:

C:/QNX632/host/win32/x86/usr/bin/qcc 
-Vgcc_ntoppc
-c 
-O
-Wc,-mno-fp-moves
-Wc,-Wall
-Wc,-Wno-parentheses
-DNDEBUG   
-I.
-Ic:/data/qnx_ppc_bsp1/src/apps/fpu_test/ppc/be 
-Ic:/data/qnx_ppc_bsp1/src/apps/fpu_test/ppc 
-Ic:/data/qnx_ppc_bsp1/src/apps/fpu_test 
-IC:/QNX632/target/qnx6/usr/include/xilinx 
-IC:/QNX632/target/qnx6/usr/include    
-EB
-DVARIANT_be  
c:/data/qnx_ppc_bsp1/src/apps/fpu_test/main.c 

Also, what is the purpose of the options -DNDEBUG and -DVARIANT_be?

Neil Schellenberger(deleted)

03/04/2009 2:49 PM

post23524

Re: -Wc,-mno-fp-moves option

One of the compiler guys might be able to give more background, but I'll
take a stab at answering this.

The flag is a QNX-specific addition to gcc.  It prevents the code
generator from using floating point registers for integer moves.  (I
think that the code generator does this to reduce register pressure.  We
try to avoid it because it may force an FP context switch, which is even
more expensive than the original problem :-)

So, long story short, the -mno-fp-moves flag has no real bearing on
floating point code, per se.

Colin Burgess(deleted)

03/04/2009 2:57 PM

post23528

Re: -Wc,-mno-fp-moves option

Back at the time -mno-fp-moves was implemented, the compiler was deciding to place things like PhRect_t in floating
point registers - it could then use a single 64-bit floating point move to copy them around.

However as you can imagine, this was causing serious slowdowns on powerpc machines that used fpu emulation, so we
made the compiler an offer it couldn't refuse.
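
To make the case concrete, here is a minimal C sketch of the kind of copy being described. The type rect8_t and the function rect_copy are hypothetical stand-ins, not Photon definitions; the only assumption carried over from the post is that the object is 8 bytes, so that it fits in one 64-bit FP register. On 32-bit PowerPC a compiler may implement such an assignment either with integer loads and stores (lwz/stw) or with a single FP load/store pair (lfd/stfd); which one you get depends on the compiler version and flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-byte rectangle, standing in for a small type such as
 * PhRect_t: four 16-bit coordinates. */
typedef struct {
    int16_t ulx, uly;   /* upper-left corner  */
    int16_t lrx, lry;   /* lower-right corner */
} rect8_t;

/* The whole object is exactly 64 bits wide, i.e. the size of one
 * PowerPC floating point register. */
static_assert(sizeof(rect8_t) == 8, "rect8_t is expected to be 8 bytes");

/* A plain struct assignment; no floating point arithmetic is involved.
 * The compiler is free to move the 8 bytes through integer registers or
 * through an FP register. With -mno-fp-moves the integer form is requested. */
void rect_copy(rect8_t *dst, const rect8_t *src)
{
    *dst = *src;
}
```

Nothing in this source mentions float or double, which is why the resulting FPU usage is easy to miss. Inspecting the generated assembly for rect_copy is the way to see which form the compiler actually picked.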

-- 
cburgess@qnx.com

Steve Reid

03/04/2009 3:02 PM

post23530

RE: -Wc,-mno-fp-moves option

Sounds like something to add to the docs. I'll create a PR.

Steve Reid (stever@qnx.com)
Technical Editor
QNX Software Systems

Malte Mundt(deleted)

04/23/2009 6:33 AM

post27849

Re: RE: -Wc,-mno-fp-moves option

Interesting! Also sounds like something that could go away in 1-2 years, when non-FPU PPCs disappear. AFAIK the only
PPC without FPU is the 405.

At least I'm a small step further now in my ever-lasting quest to find out why QNX 4.25 Photon seems a bit faster than 
QNX Neutrino's. :-)

- Malte

Neil Schellenberger(deleted)

04/23/2009 1:52 PM

post27922

Re: RE: -Wc,-mno-fp-moves option

For Neutrino there are actually potentially negative performance
implications for doing this on a hardware FPU.  Since we spill the FPU
context in a lazy fashion, it is entirely possible that executing an FPU
instruction will require the context to be reloaded.

Malte Mundt(deleted)

04/24/2009 5:23 AM

post27964

Re: RE: -Wc,-mno-fp-moves option

Hey Neil,

can you elaborate on this? What does it mean - spill FPU context in a lazy fashion, and reload context?

This is very interesting because I'm trying to find out what's going on regarding PR 67633.

Are you saying that using FPU opcodes causes... what? Exceptions?


- Malte

Neil Schellenberger(deleted)

04/24/2009 1:29 PM

post28038

Re: RE: -Wc,-mno-fp-moves option

Hi Malte,

The FPU context is only saved (spilled) when an FPU instruction is
executed from a thread which is different from the last thread to use
the FPU on that CPU.  (This is called "lazy" since it defers the spill
until the very last possible moment.)  This can potentially greatly slow
down this particular optimization (using FP registers for 64 bit integer
moves).

Regards,
Neil

Malte Mundt(deleted)

04/28/2009 5:57 AM

post28220

Re: RE: -Wc,-mno-fp-moves option

> The FPU context is only saved (spilled) when an FPU instruction is
> executed from a thread which is different from the last thread to use
> the FPU on that CPU. 

Ah, this means it gets saved at the moment a different thread uses the FPU, which in turn means that this
particular optimization would be much slower. Correct?

But how does the kernel (I assume it's the kernel that does the saving of the FPU context) know that a different thread 
is executing FPU instructions? Is the kernel invoked every time an FPU instruction is used? Or does the kernel somehow 
program the CPU to generate an exception the first time a different thread is using the FPU?


- Malte

Sunil Kittur(deleted)

04/28/2009 7:32 AM

post28223

Re: -Wc,-mno-fp-moves option

The kernel keeps a state variable (per-cpu) in actives_fpu[] that records
which thread currently "owns" the FPU.

In essence the following actions are performed for the lazy context switch:

Initially the FPU is disabled so the first FPU instruction causes a trap.
The kernel allocates an FPU save area and initialises it with default values.
Because actives_fpu[RUNCPU] is NULL and the thread has an allocated FPU save
area we load the FPU registers, set actives_fpu[RUNCPU] to this thread
and restart the FPU operation.

If a thread context switch occurs, actives_fpu[RUNCPU] remains unchanged,
but the kernel disables the FPU because actives_fpu[RUNCPU] is different to
the currently active thread:

- if the new thread does no FPU operations, everything is fine.
   If we subsequently switch back to the original thread, the FPU is still
   disabled so we trap, but since actives_fpu[RUNCPU] indicates the thread
   owns the FPU, we can simply re-enable the FPU and restart the instruction.

- if the new thread does an FPU operation, we trap and allocate an FPU save
   area for it and initialise it with default values.
   Since actives_fpu[RUNCPU] is non-NULL, we save the FPU register context in
   that save area, then load the FPU registers from the new thread's save area.
   We set actives_fpu[RUNCPU] to the new thread, re-enable the FPU and restart
   the instruction.

This means we'll incur at least the cost of an FPU-disabled exception whenever
we context switch back to a thread that is actively using the FPU, and will
incur the additional cost of an FPU register save and re-load if we repeatedly
switch between threads that are actively using the FPU.

This state machine is implemented in the cpu-specific code, involving the
relevant trap handlers in kernel.s to detect the various FPU exceptions based
on the FPU enable/disable state and actives_fpu[RUNCPU].
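
The steps above can be sketched as a small user-space C model. This is only an illustration under stated assumptions, not the kernel code: the real logic lives in the trap handlers in kernel.s, and every name here (cpu_t, thread_t, fpu_trap, context_switch, fpu_write, fpu_read) is hypothetical. The owner field plays the role of actives_fpu[RUNCPU], the FPU is reduced to a single register, and counters record how many traps, saves and loads occur.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread state: a lazily allocated FPU save area. */
typedef struct {
    bool   has_save_area;
    double save_area;      /* one register stands in for the whole FPU context */
} thread_t;

/* Hypothetical per-CPU state. */
typedef struct {
    thread_t *owner;       /* models actives_fpu[RUNCPU]; NULL means no owner */
    thread_t *running;     /* currently active thread */
    bool      fpu_enabled; /* FPU starts disabled */
    double    fpu_reg;     /* the single modelled FPU register */
    int       traps, saves, loads;
} cpu_t;

/* Models the FPU-disabled exception handler. */
static void fpu_trap(cpu_t *cpu)
{
    thread_t *t = cpu->running;
    cpu->traps++;
    if (!t->has_save_area) {           /* first FPU use by this thread */
        t->save_area = 0.0;            /* initialise with default values */
        t->has_save_area = true;
    }
    if (cpu->owner != t) {
        if (cpu->owner != NULL) {      /* spill the previous owner's context */
            cpu->owner->save_area = cpu->fpu_reg;
            cpu->saves++;
        }
        cpu->fpu_reg = t->save_area;   /* load this thread's context */
        cpu->loads++;
        cpu->owner = t;
    }
    cpu->fpu_enabled = true;           /* the instruction is then restarted */
}

/* Thread switch: the owner is left alone; the FPU is disabled if the
 * incoming thread does not own it. */
void context_switch(cpu_t *cpu, thread_t *next)
{
    cpu->running = next;
    if (cpu->owner != next)
        cpu->fpu_enabled = false;
}

/* An FPU instruction executed by the running thread that writes the register. */
void fpu_write(cpu_t *cpu, double value)
{
    if (!cpu->fpu_enabled)
        fpu_trap(cpu);
    cpu->fpu_reg = value;
}

/* An FPU instruction executed by the running thread that reads the register. */
double fpu_read(cpu_t *cpu)
{
    if (!cpu->fpu_enabled)
        fpu_trap(cpu);
    return cpu->fpu_reg;
}
```

In this model, switching away from an FPU-using thread to one that never touches the FPU and back again costs one extra trap but no save or load, while alternating between two FPU-using threads costs a trap, a save and a load on each switch. That matches the two cost cases described above.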

	Sunil.

Malte Mundt(deleted)

05/07/2009 8:07 AM

post28878

Re: -Wc,-mno-fp-moves option

Sunil, this is an excellent explanation! Thanks a lot man!!


- Malte
