Andrew Pierson
|
M7 + gcc4.2.1: devb-ram SEGV
|
Andrew Pierson
06/19/2008 1:04 PM
post9440
|
M7 + gcc4.2.1: devb-ram SEGV
When using the devb-ram driver, it will SEGV a short time after it is started. The memory addresses it is crashing on
are within the shared memory it allocates. We verified this by inspecting the core for the offending instruction and
register contents it used. You can replicate this problem by adding the following patch to sim.c and running devb-ram
by itself.
$ svn diff sim.c
Index: sim.c
===================================================================
--- sim.c (revision 10453)
+++ sim.c (working copy)
@@ -521,6 +521,19 @@
return EXIT_FAILURE;
}
+ // *** This will crash at a random point. ***
+ unsigned int i;
+ for( i=0 ; i<mapSize; i+=8 )
+ {
+ fprintf( stderr, "%p: %08x %08x %08x %08x %08x %08x %08x %08x\n",
+ addr + i,
+ addr[i ],addr[i+1],addr[i+2],addr[i+3],
+ addr[i+4],addr[i+5],addr[i+6],addr[i+7]
+ );
+ }
+ // Never get here.
+ fprintf( stderr, "\n---END OF MEMORY---\n" );
+
if( ram_ctrl.cflags & RAM_CFLAG_SCAN ) {
if( ( hba = ram_alloc_hba( ) ) == NULL ) {
return( CAM_FAILURE );
===================================================================
Here is how we start it:
devb-ram cam quiet ram capacity=57344 dos exe=all blk cache=128k
Since we are accessing memory that we allocated, why are we losing permission to access this memory? How can we work
around this problem?
Thanks,
Andy
|
|
|
dave carlson(deleted)
|
Re: devb-ram SEGV with ARM FCSE shmctl
|
dave carlson(deleted)
08/20/2008 7:58 AM
post12066
|
Re: devb-ram SEGV with ARM FCSE shmctl
Using the 640M5, we are continuing to see the problem Andy reported here several weeks ago. Since our devb-ram was very
old and heavily hacked, I have compiled the trunk version of devb-ram so that it is "compatible" with all the 6.4
runtime pieces (libcam, etc.). Thus, the hacking is limited to the tiny enclosed diff that adds the ARM shmctl so that a
ramdisk can be larger than tiny. We need a minimum of 28MB, which is not happening without SHMCTL_GLOBAL.
Findings:
1. The unmodified driver straight from the trunk *will not crash*, except that I am on ARM with FCSE, so my ramdisk
needed to be derated from 28MB to 12MB.
When I replace the cam_calloc version with the canonical shm_open/shmctl/mmap, I find:
2. If I run the shmctl version with FreeMem:88Mb/128Mb, it will SEGV in the shm memory array nearly immediately
(within a few thousand accesses, most often in memcpy). It is as if the mapping is "lost".
3. If I kill some of my running apps (so that pidin info shows 98+MB free rather than 60MB free), devb-ram *will
not crash*. Core files in /dev/shmem appear to cause the same shm failure, i.e., my idle apps are not actively
causing the failure; it appears to depend simply on how much memory is in use.
Enclosed is a diff -u for the trivial change to devb-ram/sim.c required to demonstrate the problem with ARM FCSE.
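The diff itself is not reproduced in this thread. As a rough illustration only, the canonical shm_open/shm_ctl/mmap sequence referred to above might look like the sketch below. The helper name ramdisk_map, the object name passed to it, and the choice to unlink the name right after mapping are assumptions and are not taken from the devb-ram source. The shm_ctl() call with SHMCTL_ANON | SHMCTL_GLOBAL is the QNX-specific step; the #else branch exists only so the sketch builds on other POSIX systems.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not from devb-ram): create a shared memory object of
 * `size` bytes under `name`, map it read/write, and return the mapping.
 * Returns NULL on any failure. */
static void *ramdisk_map(const char *name, size_t size)
{
    void *addr = MAP_FAILED;
    int rc;
    int fd;

    assert(name != NULL && size > 0);

    fd = shm_open(name, O_RDWR | O_CREAT, 0600);
    if (fd == -1)
        return NULL;

#ifdef __QNXNTO__
    /* QNX only: allocate anonymous memory for the object and, on ARM, ask
     * for it to be placed in the global region rather than inside the
     * per-process address range. */
    rc = shm_ctl(fd, SHMCTL_ANON | SHMCTL_GLOBAL, 0, size);
#else
    /* Portable stand-in so the sketch compiles elsewhere. */
    rc = ftruncate(fd, (off_t)size);
#endif

    if (rc != -1)
        addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    close(fd);          /* the mapping remains valid after close */
    shm_unlink(name);   /* assumption: the name is not needed afterwards */

    return addr == MAP_FAILED ? NULL : addr;
}
```

If the capacity=57344 option counts 512-byte sectors (an assumption), the size passed in would be 57344 * 512 = 29,360,128 bytes, which matches the 28MB figure above.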
This is a show stopper for us.
|
|
|
Sunil Kittur(deleted)
|
Re: devb-ram SEGV with ARM FCSE shmctl
|
Sunil Kittur(deleted)
08/20/2008 8:41 AM
post12069
|
Re: devb-ram SEGV with ARM FCSE shmctl
When did you initially report this problem?
Did you get a TicketID or PR number?
Sunil.
|
|
|
dave carlson(deleted)
|
Re: devb-ram SEGV with ARM FCSE shmctl
|
dave carlson(deleted)
08/20/2008 9:42 AM
post12080
|
Re: devb-ram SEGV with ARM FCSE shmctl
No, we have been (trying) not to use our priority support ($$$) for 6.4 pre-release integration issues. The response on
the forums has been our main support. So, we reported this as topic 3140 back in June.
You guys fixed the unlink panic (topic 3141) in 24 hours. But 3140 has languished.
We have not been pursuing it due to the libc.3 ABI change and the fact that the bsp libraries (libcam, cam-disk,
io-char, etc.) had not been released yet. Once we obtained the 6.4/libc.3 compilation of the bsp libs, I started to
characterize the problem further.
It was at this point that simply saying devb-ram was broken was not going to help you track the problem.
I thought reducing the problem to a 20 line diff to trunk would help. :-)
BTW, I have thought about our apps scribbling on the page tables -- but I reject this. 1) Our apps are idle. Idle =
99%. The failure is nearly immediate. 2) Our apps are rock solid. A random scribbler would (should) cause random apps
to fail. Even kernel panics. Devb-ram is the only death. Note also, we have ~10 other shmctl global memory segments
in use that "never fail". Whatever the interaction, it seems unique to the "large" devb-ram chunk.
Thanks (as always) for your help.
dave
|
|
|
Sunil Kittur(deleted)
|
Re: devb-ram SEGV with ARM FCSE shmctl
|
Sunil Kittur(deleted)
08/20/2008 12:36 PM
post12108
|
Re: devb-ram SEGV with ARM FCSE shmctl
Does this happen with a 6.3.2 procnto?
The memory manager went through a major overhaul in 6.3.2 and a few
ARM shm_ctl() things fell through the cracks so I was curious if it
was something new in 6.4.0.
Also, am I correct in assuming all I need to do to reproduce it is
build a devb-ram with your sim.c modifications then run:
devb-ram cam quiet ram capacity=57344 dos exe=all blk cache=128k
Simply doing this will cause devb-ram to sigsegv?
Sunil.
|
|
|
dave carlson(deleted)
|
RE: devb-ram SEGV with ARM FCSE shmctl
|
dave carlson(deleted)
08/20/2008 12:51 PM
post12110
|
RE: devb-ram SEGV with ARM FCSE shmctl
Sunil,
We have that odd-ball patched OS from last year + the MsgCurrent
prio-inversion fix.
Our shipping OS is:
2008/02/08-12:42:22EST
6.3.2
The devb-ram has never failed with this kernel. NB: I have not tested
the 6.4trunk devb-ram with the old kernel/libcam/etc. But the old
devb-ram and the trunk devb-ram+shmctl fail identically.
Note also, this kernel was recently patched with the MsgCurrent
prio-inv fix.
The command line below (as Andy supplied) will fail as described.
Our filesystem init code (mkdosfs, cp small files to /dos, etc.) will
fail 50% of the time during the small file copies.
To force a failure for the other 50%, I do:
while true ; do
cp /usr/bin/someBigFile /dos
echo done
rm /dos/someBigFile
done
That loop fails immediately or runs forever -- I use 20minutes as "proof
of life".
dave
-----Original Message-----
From: Sunil Kittur [mailto:community-noreply@qnx.com]
Sent: Wednesday, August 20, 2008 12:37 PM
To: ostech-core_os
Subject: Re: devb-ram SEGV with ARM FCSE shmctl
Does this happen with a 6.3.2 procnto?
The memory manager went through a major overhaul in 6.3.2 and a few
ARM shm_ctl() things fell through the cracks so I was curious if it
was something new in 6.4.0.
Also, am I correct in assuming all I need to do to reproduce it is
build a devb-ram with your sim.c modifications then run:
devb-ram cam quiet ram capacity=57344 dos exe=all blk cache=128k
Simply doing this will cause devb-ram to sigsegv?
Sunil.
|
|
|
Sunil Kittur(deleted)
|
Re: RE: devb-ram SEGV with ARM FCSE shmctl
|
Sunil Kittur(deleted)
08/21/2008 4:38 PM
post12225
|
Re: RE: devb-ram SEGV with ARM FCSE shmctl
OK, I think I've found the bug...
It's due to the variable page size support.
If you need a quick workaround, use the -m~v option
to procnto to disable the variable page size support.
Basically what is happening is that the mappings can
be coalesced from 4K small pages to 1M section
mappings. However, the cpu_pte_split/merge code
was unconditionally setting the L1 descriptors with
the process current domain id instead of taking into
account that these PRIV/LOWERPROT mappings have
no domain id. What this means is that if the domain
happens to get stolen, the L1 entries' domain field
is no longer valid, so we will get a domain fault when
we next access it.
To fix the problem we need to preserve the L1
domain field that has been set when we split/merge
L1 entries.
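A minimal sketch of that idea, assuming the classic ARM first-level descriptor layout (type in bits [1:0], domain in bits [8:5], section base address in bits [31:20]). The function and macro names and the way attributes are passed in are hypothetical; this is not the procnto cpu_pte_split/merge code, only an illustration of stamping the current domain versus preserving the one already in the L1 entry.

```c
#include <assert.h>
#include <stdint.h>

#define L1_TYPE_MASK      0x00000003u   /* descriptor type, bits [1:0]  */
#define L1_TYPE_SECTION   0x00000002u   /* 1M section mapping           */
#define L1_DOMAIN_SHIFT   5
#define L1_DOMAIN_MASK    0x000001E0u   /* domain field, bits [8:5]     */
#define L1_SECTION_BASE   0xFFF00000u   /* section base, bits [31:20]   */
/* Everything that is neither base, domain, nor type (AP, C, B, ...). */
#define L1_ATTR_MASK      (~(L1_SECTION_BASE | L1_DOMAIN_MASK | L1_TYPE_MASK))

/* Extract the domain number from an L1 descriptor. */
static unsigned l1_domain(uint32_t desc)
{
    return (desc & L1_DOMAIN_MASK) >> L1_DOMAIN_SHIFT;
}

/* The behaviour described as the bug: the merged section descriptor is
 * unconditionally stamped with the current process's domain. */
static uint32_t l1_merge_current_domain(uint32_t paddr, uint32_t attrs,
                                        unsigned cur_domain)
{
    assert(cur_domain < 16);
    return (paddr & L1_SECTION_BASE)
         | (attrs & L1_ATTR_MASK)
         | ((uint32_t)cur_domain << L1_DOMAIN_SHIFT)
         | L1_TYPE_SECTION;
}

/* The behaviour described as the fix: keep whatever domain the existing
 * L1 entry already carries when it is rewritten as a section. */
static uint32_t l1_merge_preserve_domain(uint32_t old_l1, uint32_t paddr,
                                         uint32_t attrs)
{
    return (paddr & L1_SECTION_BASE)
         | (attrs & L1_ATTR_MASK)
         | (old_l1 & L1_DOMAIN_MASK)
         | L1_TYPE_SECTION;
}
```

In the first variant, an entry for a mapping that should not belong to any process's domain ends up tagged with, say, domain 3; if domain 3 is later reassigned, the next access through that entry faults. The second variant leaves the domain field as it was originally set.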
I'll be posting a diff for review in the OSrev forum
(this was assigned PR60487).
Sunil.
|
|
|
dave carlson(deleted)
|
RE: RE: devb-ram SEGV with ARM FCSE shmctl
|
dave carlson(deleted)
08/21/2008 5:04 PM
post12228
|
RE: RE: devb-ram SEGV with ARM FCSE shmctl
Sunil,
That sounds pretty good -- Occam is satisfied. I will try your
workaround. But I am on vaca next week so it may take a bit.
Thanks for your attention on this matter.
A pleasure doing business with you. :-)
dave
|
|
|
dave carlson(deleted)
|
RE: TicketID85015 - devb-ram SEGV with ARM FCSE shmctl
|
dave carlson(deleted)
08/21/2008 6:09 PM
post12230
|
RE: TicketID85015 - devb-ram SEGV with ARM FCSE shmctl
Adrian/Sunil,
The -m~v option appears to fix the problem. (ya!)
I will test the procnto patch on the other side of my holiday.
Nice job popping our showstopper.
dave
|
|
|
|