Forum Topic - NEON assembly issue: (21 Items)
   
NEON assembly issue  
NEON Intrinsic

void NEON_Intrinsic (uint8_t * __restrict dest, uint8_t * __restrict src, int n)
{
  int i;
  uint8x8_t rfac = vdup_n_u8 (77);
  uint8x8_t gfac = vdup_n_u8 (151);
  uint8x8_t bfac = vdup_n_u8 (28);
  uint16x8_t  	temp;
  uint8x8x3_t 	rgb;
  uint8x8_t 	result;

  n/=8;

  for (i=0; i<n; i++)
  {
    rgb  = vld3_u8 (src);

    temp = vmull_u8 (rgb.val[0],      rfac);
    temp = vmlal_u8 (temp,rgb.val[1], gfac);
    temp = vmlal_u8 (temp,rgb.val[2], bfac);

    result = vshrn_n_u16 (temp, 8);
    vst1_u8 (dest, result);
    src  += 8*3;

    //printf("result %d\n", result);

    dest += 8;
  }
}

NEON gasm assembly code:
int NEON_Assembly (uint8_t * __restrict dest, uint8_t * __restrict src, int n)
{
	__asm __volatile (
			"lsr         r4, %[n], #3\n\t" 

			"vmov.u8 d3, #77\n\t"
			"vmov.u8 d4, #151\n\t"
			"vmov.u8 d5, #28\n\t"
			"mov     r3, #0\n\t"
			"b 		 .comp\n\t"

			".lp:\n\t"
			  "# load 8 pixels:\n\t"
			  "vld3.8      {d16-d18}, %[src]!\n\t"

			// do the weight average:
			  "vmull.u8    q3, d16, d3\n\t"
			  "vmlal.u8    q3, d17, d4\n\t"
			  "vmlal.u8    q3, d18, d5\n\t"

			  // shift and store:
			  "vshrn.i16   d8, q3, #8\n\t"
			  "vst1.8      {d8}, %[dest]!\n\t"

			  "add r3, r3, #1\n\t"

			".comp:\n\t"
			  "cmp        r3, r4\n\t"
			  "blt        .lp\n\t"


                /* Output */:[dest] "=r"/*"=&r"*/ (dest), [n]/* Symbolic name */ "+r"/* register constraint */ (n)/* C variable name*/
				/* Input */ :[src] "r" (src)/*(&src)*/
				/* Clobber */ :"r0", "r1", "r2","r4", "r3", "d16", "d17", "d18", "d3", "d4", "d5", "d8", "q3", "cc","memory"/* "memory"*/
		        );
	return 1;
}

This compiles, but it crashes with SIGSEGV at the assembly line "vst1.8      {d8}, %[dest]!\n\t". I am not able to find the cause, since the syntax looks right to me.

One more thing: the compiler seems to be replacing the code inside the volatile section with something else. Please suggest what steps I should take.

I have verified this code against the code generated by RVDS and found it correct. Can anyone help me understand why it is not working?

This is really IMPORTANT for me. Thanks in advance!
Re: NEON assembly issue  
Complete assembly generated by GCC & GASM for the NEON_Assembly function:

NEON_Assembly:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	stmfd	sp!, {r4, r5, r6, r7, r8, lr}
	fstmfdd	sp!, {d8}
	push {lr}
	bl	mcount
	movw	r0, #:lower16:NEON_Assembly
	mov	r6, r1
	movt	r0, #:upper16:NEON_Assembly
	mov	r1, lr
	mov	r5, lr
	mov	r7, r2
	bl	__cyg_profile_func_enter
@ 86 "C:/ide-4.7-workspace/TestNeonOptimization/TestNeonOptimization.c" 1
	lsr         r4, r7, #3
	vmov.u8 d3, #77
	vmov.u8 d4, #151
	vmov.u8 d5, #28
	mov     r3, #0
	b 		 .comp
	.lp:
	# load 8 pixels:
	vld3.8      {d16-d18}, r6!
	vmull.u8    q3, d16, d3
	vmlal.u8    q3, d17, d4
	vmlal.u8    q3, d18, d5
	vshrn.i16   d8, q3, #8
	vst1.8      {d8}, r6!
	add r3, r3, #1
	.comp:
	cmp        r3, r4
	blt        .lp
	
@ 0 "" 2
	movw	r0, #:lower16:NEON_Assembly
	mov	r1, r5
	movt	r0, #:upper16:NEON_Assembly
	bl	__cyg_profile_func_exit
	mov	r0, #1
	fldmfdd	sp!, {d8}
	ldmfd	sp!, {r4, r5, r6, r7, r8, pc}
	.size	NEON_Assembly, .-NEON_Assembly
	.align	2
	.global	Reference_Code
	.type	Reference_Code, %function
Re: NEON assembly issue  
While I haven't looked at your assembly, I did a quick check if all is
ok with gdb shipped with 6.5.0:
(gdb) show version
GNU gdb 6.8 qnx-nto (rev. 506)

Registers do appear as they should. Note, however, that gdb cannot
really separate VFP and NEON presentations since the registers are
shared between the two instruction sets. This is why you will see
multiple sets of registers showing basically the same contents, only
represented differently, e.g. you will see a set of d? registers and
later q? registers. The first 64 bits will be the same in both, but q
represents 128 bits.

So you should be able to do something like this:

(gdb) print $d0
$1 = {u8 = {0, 0, 0, 0, 0, 0, 0, 0}, u16 = {0, 0, 0, 0}, u32 = {0, 0}, 
  u64 = 0, f32 = {0, 0}, f64 = 0}
(gdb) set $d0.u8={1,2,3,4,5,6,7,8}
(gdb) print $d0
$2 = {u8 = {1, 2, 3, 4, 5, 6, 7, 8}, u16 = {513, 1027, 1541, 2055}, u32 = {
    67305985, 134678021}, u64 = 578437695752307201, f32 = {1.53998961e-36, 
    4.06321607e-34}, f64 = 5.447603722011605e-270}
(gdb) print $q0
$3 = {u8 = {1, 2, 3, 4, 5, 6, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0}, u16 = {513, 1027, 
    1541, 2055, 0, 0, 0, 0}, u32 = {67305985, 134678021, 0, 0}, u64 = {
    578437695752307201, 0}, f32 = {1.53998961e-36, 4.06321607e-34, 0, 0}, 
  f64 = {5.447603722011605e-270, 0}}
(gdb) 


Note, however, that you first must execute at least one NEON instruction
(so the context gets initialized in the kernel) before you can really
manipulate the registers from gdb (or see their real values).
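
The aliasing described above can be reproduced on any host without gdb. The following is only a minimal sketch in C, assuming a little-endian machine (as the ARM target in this thread is). The type name dreg_view and the function dreg_from_bytes are made up for this illustration and are not part of gdb or QNX. It copies the same 8 bytes into u16, u32 and u64 layouts, which is roughly what gdb does when it prints the different fields of $d0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of one 64-bit D register, mirroring the integer
 * fields that gdb prints for $d0. */
typedef struct {
    uint8_t  u8[8];
    uint16_t u16[4];
    uint32_t u32[2];
    uint64_t u64;
} dreg_view;

/* Copy the same 8 bytes into every lane layout.
 * Assumption: the host is little-endian, like the ARM target. */
dreg_view dreg_from_bytes(const uint8_t *bytes)
{
    dreg_view v;
    assert(bytes != NULL);
    memcpy(v.u8,   bytes, 8);
    memcpy(v.u16,  bytes, 8);
    memcpy(v.u32,  bytes, 8);
    memcpy(&v.u64, bytes, 8);
    return v;
}
```

With the bytes {1,2,3,4,5,6,7,8}, u16[0] comes out as 513 (0x0201) and u32[0] as 67305985 (0x04030201), which matches the $d0 output shown above.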


Thanks,

Aleksandar

On Fri, 2010-08-20 at 07:44 -0400, Girisha SG wrote:
> Completely assembly generated by GCC & GASM for NEON_Assembly function.

Re: NEON assembly issue  
I can see that both the load and the store use the same register, r6!
The SIGSEGV comes from the store instruction. GASM also seems to be modifying the instructions inside the __asm __volatile section (as I remember, I read somewhere that nothing inside a section with this qualifier is changed by GASM). I think the GASM shipped with Momentics 6.5.0 is not behaving as expected.
Re: NEON assembly issue  
When I try debugging using the following procedure:

1. Enter ntoarm-gdb -b 115200 TesNeonOptimization_g in the command prompt
2. run target qnx 10.90.73.53:8000
3. run upload TestNeonOptimization_g /tmp/TestNeonOptimization
4. run sym TestNeonOptimization_g
5. run b NEON_Assembly
6. run /tmp/TestNeonOptimization_g

I get the SIGSEGV shown in the attached JPG.
Attachment: Image Error.JPG 116.47 KB
Re: NEON assembly issue  
1) Compile your program with 

make CCOPTS="-g3 -O0"

2) You don't need 'sym' step from your example

3) you don't need -b115200 in your command line

4) You ABSOLUTELY MUST have a matching libc library on your host and your
target. If you cannot update your target, you should use your matching
installation (but do use gdb from 6.5.0), e.g.

set QNX_TARGET=C:\qnx640\target\qnx6

ntoarm-gdb ....

---
Aleksandar


On Tue, 2010-08-24 at 05:19 -0400, Girisha SG wrote:
> When I try debugging using following procedure,
> 
> 1. Enter ntoarm-gdb -b 115200 TesNeonOptimization_g in the command prompt
> 2. run target qnx 10.90.73.53:8000
> 3. run upload TestNeonOptimization_g /tmp/TestNeonOptimization
> 4. run sym TestNeonOptimization_g
> 5. run b NEON_Assembly
> 6. run /tmp/TestNeonOptimization_g
> 
> getting the SIGSEGV as shown in the attached JPG

Re: NEON assembly issue  
Sorry, you cannot use pre-6.5.0 since you are compiling for -v7. 

So disregard my suggestion about using qnx640. Don't.

If you can not update your target to 6.5.0 but only want to test your
neon code, you can do the following:

pass -Wl,-I/tmp/libc.so.3 to the linker, e.g. by specifying: 

LDOPTS="-Wl,-I/tmp/libc.so.3"

when doing make:

C:...>make LDOPTS="-Wl,-I/tmp/libc.so.3"

You need to put libc from 6.5.0 on your target in /tmp directory.

Debug. Your executable will use /tmp/libc.so.3 as interpreter and you
will have matching libraries.

Hope this helps,

Aleksandar

On Tue, 2010-08-24 at 05:19 -0400, Girisha SG wrote:
> When I try debugging using following procedure,
> 
> 1. Enter ntoarm-gdb -b 115200 TesNeonOptimization_g in the command prompt
> 2. run target qnx 10.90.73.53:8000
> 3. run upload TestNeonOptimization_g /tmp/TestNeonOptimization
> 4. run sym TestNeonOptimization_g
> 5. run b NEON_Assembly
> 6. run /tmp/TestNeonOptimization_g
> 
> getting the SIGSEGV as shown in the attached JPG

Re: NEON assembly issue  
Thanks for the reply.

I have tried the method you suggested and I no longer get the version mismatch message. However, I am getting the following warnings/errors:

1. attach 81936
2. Cannot access memory at address 0xe7c3c99c
3. The program crashes with the error below (according to the output, the dest variable is optimized out, and the program crashes when trying to store the values in dest).

9957*stopped,reason="signal-received",signal-name="SIGSEGV",signal-meaning="Segmentation fault",thread-id="1",frame=
{addr="0x0010215c",func="NEON_Assembly",args=[{name="dest",value="<value optimized out>"},{name="src",value="0x1 <
Address 0x1 out of bounds>"},{name="n",value="400"}],file="C:/ide-4.7-workspace/TestNeonOptimization/
TestNeonOptimization.c",fullname="C:/ide-4.7-workspace/TestNeonOptimization/TestNeonOptimization.c",line="86"}
Re: NEON assembly issue  
Does it now run without the debugger?

Can you attach your binary so I can take a quick look?

On Tue, 2010-08-24 at 09:26 -0400, Girisha SG wrote:
> Thanks for the reply.
> 
> I have tried the method suggested by you and I am not getting the version mismatch message, however I am getting the 
following warning/errors
> 
> 1. attach 81936
> 2. Cannot access memory at address 0xe7c3c99c
> 3. Program crashes with the below error(as per the comment dest variable is optimized out and program crashes when 
trying to store the values in dest).
> 
> 9957*stopped,reason="signal-received",signal-name="SIGSEGV",signal-meaning="Segmentation fault",thread-id="1",frame=
{addr="0x0010215c",func="NEON_Assembly",args=[{name="dest",value="<value optimized out>"},{name="src",value="0x1 <
Address 0x1 out of bounds>"},{name="n",value="400"}],file="C:/ide-4.7-workspace/TestNeonOptimization/
TestNeonOptimization.c",fullname="C:/ide-4.7-workspace/TestNeonOptimization/TestNeonOptimization.c",line="86"}

Re: NEON assembly issue  
Attached the binary.

In the previous posts I have attached the reference code and the inline assembly code, and I strongly believe there is some problem with GASM :(

For the program crash I strongly suspect the GASM shipped with Momentics 6.5.0.

Working assembly is very important for the future of our project :(
Attachment: Text TestNeonOptimization_g 74.96 KB
Re: NEON assembly issue  
Your assembly is not good. Assembler gives this warning:

Assembler messages:
...83: Warning: missing operand; zero assumed
...:88: Warning: missing operand; zero assumed


The lines in question correspond to your assembly instructions:

vld3.8      {d16-d18}, r5!
and (not surprisingly):
vst1.8      {d8}, r5!

The latter store causes SEGV.


Try this:
...
"vld3.8      {d16-d18}, [%[src]]!\n\t"
...
"vst1.8      {d8}, [%[dest]]!\n\t"

Observe the square brackets. I'm not sure if this is what you wanted
though, but that is the syntax for loading/storing from/to an address
stored in a register.


When compiling, do not ignore warnings - they often give some clue.


HTH,

Aleksandar



On Tue, 2010-08-24 at 09:57 -0400, Girisha SG wrote:
> Attachd the binary.
> 
> In the previous posts I have attached the Reference code and inline assembly code and I strongly believe there is some
 problem with the GASM :( and it is doing crap work .
> 
> For the program crash I am strongly suspecting GASM shipped with momentics 6.5.0.
> 
> Working assembly is very important for our project future :(

Re: NEON assembly issue  
Thanks for your suggestion.

I have tried that, and now there is no warning during compilation, but instead of crashing on the store it now crashes on the load itself :(.

Following is the error,
505*stopped,reason="signal-received",signal-name="SIGSEGV",signal-meaning="Segmentation fault",thread-id="1",frame={addr
="0x00102148",func="NEON_Assembly",args=[{name="dest",value="<value optimized out>"},{name="src",value="0x1 <Address 0x1
 out of bounds>"},{name="n",value="400"}],file="C:/ide-4.7-workspace/TestNeonOptimization/TestNeonOptimization.c",
fullname="C:/ide-4.7-workspace/TestNeonOptimization/TestNeonOptimization.c",line="86"}

At runtime the 'src' register should hold a proper address, but the value it had before the crash was 0x1!

Why is all this happening? Is the assembler not intelligent enough to take care of these things?
Re: NEON assembly issue  
Also why assembler is using the same register for src and dest arrays ???
Re: NEON assembly issue  
On Wed, 2010-08-25 at 00:55 -0400, Girisha SG wrote:
> > Why all these are happening ? Is assembler not intelligent enough to take care
> >  of these things ?

Probably not. Assembler is a pretty literal translator from mnemonics to
machine code.

Compile your application with "-g -O0" this will leave your 'src' and
'dest' arguments visible to gdb, so you can check what is going on.

> 
> Also why assembler is using the same register for src and dest arrays ???

That I don't know.

I took a second look and it looks like your original %[src] without
extra square brackets was correct, but then I believe both (src) and
(dest) should be (*src) and (*dest), and the constraint should be "m":

...
     "# load 8 pixels:\n\t"
     "vld3.8      {d16-d18}, %[src]!\n\t"
...
     "vst1.8      {d8}, %[dest]!\n\t"
...
/* Output */:[dest] "=m"/*"=&r"*/ (*dest), [n]/* Symbolic name */ "+r"/* register constraint */ (n)/* C variable name*/
 /* Input */ :[src] "m" (*src)/*(&src)*/


But I can not guarantee this is all that is needed.


---
Aleksandar

Re: NEON assembly issue  
Hi Aleksandar,
Thanks for the valuable input, it worked perfectly for a loop count of 4000!

It also executes without SIGSEGV for a loop count > 24000, but the final output of the assembly does not match the reference code.

Can you please share documents and examples that help in writing robust GASM assembly? (Our project will be fairly big and requires a complete understanding of assembly techniques.)

The future of the project, using QNX as target OS and Cortex-A8 as target hardware, is looking bright!
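
When the assembly output does not match the reference code, it helps to find the first pixel that differs. The following is a minimal sketch in C, assuming the same weights as the intrinsic version at the top of this thread (77, 151, 28, then a shift by 8) and interleaved R, G, B bytes. The names gray_scalar and first_mismatch are made up for this illustration; they are not the reference code from the attached project.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar version of the same weighting: gray = (77*R + 151*G + 28*B) >> 8.
 * The weights sum to 256, so the intermediate fits in 16 bits. */
void gray_scalar(uint8_t *dest, const uint8_t *src, int n)
{
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++) {
        unsigned r = src[3 * i + 0];
        unsigned g = src[3 * i + 1];
        unsigned b = src[3 * i + 2];
        dest[i] = (uint8_t)((77u * r + 151u * g + 28u * b) >> 8);
    }
}

/* Return the index of the first differing byte, or -1 if the
 * first n bytes of a and b are identical. */
int first_mismatch(const uint8_t *a, const uint8_t *b, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        if (a[i] != b[i])
            return i;
    }
    return -1;
}
```

Note that the NEON versions in this thread only process (n / 8) * 8 pixels, so only that many bytes should be compared. An index reported by first_mismatch that is a multiple of 8 would suggest that whole 8-pixel blocks go wrong (for example a pointer that is not advanced) rather than the arithmetic.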
Re: NEON assembly issue  
Most likely because you haven't told it not to.  The constraint would
need to include "&".

        Unless an output operand has the `&' constraint modifier, GCC may
        allocate it in the same register as an unrelated input operand,
        on the assumption the inputs are consumed before the outputs are
        produced.  This assumption may be false if the assembler code
        actually consists of more than one instruction.  In such a case,
        use `&' for each output operand that may not overlap an input.
        *Note Modifiers::.

I'd suggest closely consulting

        http://gcc.gnu.org/onlinedocs/gcc-4.4.4/gcc/Constraints.html

Beware, though, that constraints are quite tricky and the documentation
is quite poor.  The only way to be absolutely certain of the meaning of
any particular constraint -- particularly any machine specific ones --
is to read the compiler source code.
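
As an illustration of the constraint issue, here is a minimal sketch of how the loop from this thread could be written so that GCC knows the pointers are modified. It is an assumption-laden example, not a verified fix for the original project: the function name gray_inline_asm is made up, it assumes 32-bit ARM with NEON enabled, and on any other target it falls back to plain C so that it still compiles. Both pointers and the counter are declared as read-write operands ("+r") because the post-increment addressing changes them, and the address registers are wrapped in square brackets. With "+r" there are no separate input operands left that an output could be merged with. The result goes into d19 rather than d8, since d8 is callee-saved.

```c
#include <assert.h>
#include <stdint.h>

void gray_inline_asm(uint8_t *dest, const uint8_t *src, int n)
{
    int blocks = n / 8;   /* 8 pixels per iteration, as in the original */
    assert(n >= 0);
    if (blocks == 0)
        return;           /* the asm loop below always runs at least once */

#if defined(__arm__) && (defined(__ARM_NEON__) || defined(__ARM_NEON))
    __asm__ __volatile__ (
        "vmov.i8    d3, #77                 \n\t"
        "vmov.i8    d4, #151                \n\t"
        "vmov.i8    d5, #28                 \n\t"
        "1:                                 \n\t"
        "vld3.8     {d16-d18}, [%[src]]!    \n\t"  /* load 8 RGB pixels, src += 24 */
        "vmull.u8   q3, d16, d3             \n\t"  /* q3  = R * 77  */
        "vmlal.u8   q3, d17, d4             \n\t"  /* q3 += G * 151 */
        "vmlal.u8   q3, d18, d5             \n\t"  /* q3 += B * 28  */
        "vshrn.i16  d19, q3, #8             \n\t"  /* narrow with >> 8 */
        "vst1.8     {d19}, [%[dest]]!       \n\t"  /* store 8 bytes, dest += 8 */
        "subs       %[cnt], %[cnt], #1      \n\t"
        "bne        1b                      \n\t"
        : [dest] "+r" (dest), [src] "+r" (src), [cnt] "+r" (blocks)
        : /* no pure inputs */
        : "d3", "d4", "d5", "d6", "d7", "d16", "d17", "d18", "d19",
          "cc", "memory"
    );
#else
    /* Fallback for non-NEON targets: same arithmetic in plain C. */
    {
        int i;
        int count = blocks * 8;
        for (i = 0; i < count; i++) {
            dest[i] = (uint8_t)((77u * src[3 * i] + 151u * src[3 * i + 1]
                                 + 28u * src[3 * i + 2]) >> 8);
        }
    }
#endif
}
```

The "memory" clobber is there because the asm reads and writes memory through the pointers without GCC being told which objects are touched. As suggested elsewhere in this thread, it is worth checking the generated assembly to confirm that src and dest end up in different registers.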

Regards,
Neil

On Wed, 2010-08-25 at 00:55 -0400, Girisha SG wrote:
> Also why assembler is using the same register for src and dest arrays ???
Re: NEON assembly issue  
Thanks for the input.

I had tried this earlier as well as now, and with this there is no improvement (it still crashes with SIGSEGV).

Reading the compiler source in order to write the assembly may be a really tedious task!
Can you please suggest any documentation and examples that will help us in writing large GASM assembly perfectly?
Re: NEON assembly issue  
If you have the option, it can be simpler to write pure assembly
functions rather than to try to write inline assembly with complex
constraints.  (I realize that this may not be possible.)

Unfortunately, constraints are one area of gcc which is quite poorly
documented.  Other than experimentation, and looking at least at the
machine definitions (where the machine specific constraints are
defined), I can't think of any other helpful hints.  Ryan probably has
better insights, though.

On Thu, 2010-08-26 at 02:25 -0400, Girisha SG wrote:
> Thanks for the input.
> 
> I had tried this earlier as well as now and with this there is no improvement(Still it crashes with SIGSEGV).
> 
> Reading the compiler source and writing the assembly may be really tedious task !!!
> Can you please suggest any documentation & examples that will help us in writing large GASM assembly perfectly ?
Re: NEON assembly issue  
Please guide me through writing pure GASM assembly. As I understand it, such a file has different sections and follows some basic template. Also, let me know how we can integrate this assembly into the project and call these functions from C functions.

I am very new to assembly programming. Can you please suggest any links where I can find really useful tricks for writing assembly?
Re: NEON assembly issue  
Besides Neil's suggestion and looking at gcc source, I would suggest
always looking at two things:

a) generated assembly by cc
b) generated machine code by disassembling generated function and making
sure it makes sense.

Maybe for this sort of thing the right way really is to write your
function in pure asm until you get what you want. Then you can play with
inline assembly and constraints until you get correct output.


On Thu, 2010-08-26 at 02:25 -0400, Girisha SG wrote:
> Thanks for the input.
> 
> I had tried this earlier as well as now and with this there is no improvement(Still it crashes with SIGSEGV).
> 
> Reading the compiler source and writing the assembly may be really tedious task !!!
> Can you please suggest any documentation & examples that will help us in writing large GASM assembly perfectly ?

"writing large GASM assembly perfectly" - I'm wary of the "large" part -
are you sure that is a good (and necessary) idea? In any case, good
luck.


---
Aleksandar