Tim Gessner
|
debugging heap corruption
|
Tim Gessner
09/03/2008 11:29 AM
post12771
|
debugging heap corruption
I have a resource mgr which does not have a hardware device, rather it is an error logging component. It uses a
circular buffer to store error events and then writes those events to a MySQL database.
The resource mgr is multithreaded and I have an additional thread at a lower priority to handle writing the events to
the MySQL database.
Whenever the component is accessed by more than 1 thread at a time, it crashes. I have a mutex around the circular
buffer and of course the resource mgr has a mutex built into the io_write mechanism (at least as I understand it).
Regardless, I think I have more than sufficient protection.
When I run in the debugger it runs fine and never crashes. I have tried to use the memory analysis tools, but the remote
QNX box freezes (my development host runs Windows). I have also tried linking with libmalloc_g, but then it runs without
crashing.
The crashes are always on a heap access, either an allocation, deletion, or read. I have rewritten the code a number of
times now, moving things around, changing techniques, etc.
I am out of ideas and nearing the end of the project with a component which is unstable. Please help!!!
I am running 6.3.2 with a libc patch provided by QNX.
Thanx
Tim
|
|
|
Shiv Nagarajan(deleted)
|
Re: debugging heap corruption
|
Shiv Nagarajan(deleted)
09/03/2008 11:34 AM
post12772
|
Re: debugging heap corruption
Is the crash always in the same place? Is the crash in the
memory allocator (i.e. inside libc) or in the application?
Can you post some of the code around which the crash is
occurring, i.e. the calls to the allocator/free near which it is
crashing, and also the code that locks around this?
This is running on a multi-core machine, I suppose? There are locks inside
the allocator code in libc, so the libc memory allocation code
itself is thread-safe.
thanks
shiv
--
****
Shiv Nagarajan,
Kernel Developer, QNX Software Systems,
Ottawa, Canada
****
|
|
|
Tim Gessner
|
Re: debugging heap corruption
|
Tim Gessner
09/04/2008 9:51 AM
post12817
|
Re: debugging heap corruption
Thanx for responding. The crash is always in libc, in the heap management functions. The location in my code which triggers
it is different each time. That is, it is always around a new or delete, but there are several locations in my code where
I allocate memory.
Perhaps the most common location is in my database update thread. I'm using the helper functions to create a resource
mgr so I don't directly create those threads. I do create one thread directly to update the MySQL database. All
threads allocate and de-allocate from the heap. Crashes usually come from calls into libc from the thread that I
created directly. Also, they seem to come on deletes most often. (Again though, it is not the same every time.)
I can post all the code if it helps, but here is the most common location. This code makes a copy of the events in the
buffer and then inserts them into the database. It makes a copy in order to limit the time the buffer is locked.
for ( ; s_pData != NULL && s_pData->pMySql != NULL; )
{
    try
    {
        // if we've processed all the events
        // then block
        if ( getLastEvent() == s_tLastEventID )
        {
            ::pthread_mutex_lock(&s_mxEvent);
            ::pthread_cond_wait(&s_condEvent,&s_mxEvent);
            ::pthread_mutex_unlock(&s_mxEvent);
        }
        if ( ! s_bDBConnected )
        {
            __DBReconnect();
            while ( ! s_bDBConnected )
            {
                ::sleep(2);
                ::pthread_testcancel();
                __DBReconnect();
            }
        }
        // first make a copy of the events so that the
        // log itself is available while we're updating
        // the database
        ::pthread_mutex_lock(&g_mxEvents);
        bLocked = true;
        UINT uQueTail = 0;
        UINT uQueCount = 0;
        if ( g_uQueCount == g_uQueSize )
        {
            uQueTail = g_uQueHead;
            uQueCount = g_uQueSize;
        }
        else
        {
            uQueTail = g_uQueHead - g_uQueCount;
            uQueCount = g_uQueCount;
        }
        // look for the previous event id and start there if found
        if ( s_tLastEventID > 0 )
        {
            UINT u = 0;
            for ( ; u < g_uQueSize; u++ )
            {
                if ( g_Events[u].tEventID == s_tLastEventID )
                    break;
            }
            if ( u < g_uQueSize )
            {
                uQueTail = u;
                if ( uQueTail > g_uQueHead )
                    uQueCount = (g_uQueSize - uQueTail) + g_uQueHead;
                else
                    uQueCount = g_uQueHead - uQueTail;
            }
        }
        // increment then read
        uQueTail++;
        uQueTail &= (g_uQueSize - 1);
        EVENTS aEvents;
        for ( UINT u = 0; u < uQueCount; u++ )
        {
            if ( g_Events[uQueTail].tEventID != 0 )
            {
                char* pszBuf = new char[::strlen(g_Events[uQueTail].pszEventText) + 1];
                ::strcpy(pszBuf,g_Events[uQueTail].pszEventText);
                std::pair<size_t,char*>e(g_Events[uQueTail].tEventID,pszBuf);
                aEvents.push_back(e);
            }
            uQueTail++;
            uQueTail &= (g_uQueSize - 1);
        }
        ::pthread_mutex_unlock(&g_mxEvents);
        bLocked = false;
        // now update the database with the copied log items
        for ( EVENTS::iterator i = aEvents.begin(); i != aEvents.end(); ++i )
        {
            if ( i->second == NULL )
            {
                s_tLastEventID = i->first;
                continue;
            }
            // the log is one long string - so break
            // it out into parts consistent with the
            // database schema
            tagPARSEDEVENT Event;
            __ParseEvent(i->second,Event);
            // if the event did not have a valid time
            // we'll ignore it - this is a safety measure
            // against a bogus SQL stmt. we could be more
            // granular and still update just with a diff
            // datetime - but for now ...
            if (( ::strlen(Event.szTime) > 0 )
             && ( Event.pszDesc != NULL ))
            {
                ::memset(s_pData->pszSQLStmt,0,4096);
...
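A side note on the blocking step at the top of the loop: pthread_cond_wait may return without a matching signal, and a signal sent before the wait begins is lost, so the usual pattern tests the condition while holding the mutex and re-tests it after every wakeup. The sketch below shows that pattern only. The names g_mx, g_cond, g_newestEventId, publish_event and wait_for_event_after are hypothetical stand-ins, not the poster's code, and nothing here is claimed to be the cause of the crash.

```cpp
#include <pthread.h>

// Hypothetical stand-ins for the poster's s_mxEvent / s_condEvent / event ids.
static pthread_mutex_t g_mx   = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  g_cond = PTHREAD_COND_INITIALIZER;
static unsigned long   g_newestEventId = 0;   // only touched while holding g_mx

// Producer side: publish a new event id and wake the database writer thread.
void publish_event(unsigned long id)
{
    pthread_mutex_lock(&g_mx);
    g_newestEventId = id;
    pthread_cond_signal(&g_cond);
    pthread_mutex_unlock(&g_mx);
}

// Consumer side: block until an event other than lastSeen has been published.
// The predicate is checked under the mutex and re-checked after each wakeup,
// so spurious wakeups and signals sent before the wait are both handled.
unsigned long wait_for_event_after(unsigned long lastSeen)
{
    pthread_mutex_lock(&g_mx);
    while (g_newestEventId == lastSeen)
    {
        pthread_cond_wait(&g_cond, &g_mx);
    }
    unsigned long newest = g_newestEventId;
    pthread_mutex_unlock(&g_mx);
    return newest;
}
```

If an event was already published before the writer thread gets around to waiting, wait_for_event_after returns at once instead of sleeping until the next signal.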
|
|
|
Shiv Nagarajan(deleted)
|
Re: debugging heap corruption
|
Shiv Nagarajan(deleted)
09/04/2008 11:03 AM
post12823
|
Re: debugging heap corruption
Do you know where in the lib it is crashing? If you know where it
is crashing, it may be possible to see whether the crash is
occurring because of a specific error in userland (maybe the
malloc structures have been overwritten).
Have you tried catching this in the debugger or getting a core
file? Could the crash be caused by a double free? I
notice that you are setting the freed element to NULL after the delete
in this snippet, so it should be easy to see whether we are
ever calling delete with a NULL element.
I believe delete just
turns around and calls free (just as new calls malloc
internally). So if you want, you could implement some stubs
that cover for malloc and free and in turn call
__malloc and __free instead. The library (libc) defines
malloc and free as stub calls which in turn call
__malloc and __free, so if you define your own malloc/free to do
some checks before calling __malloc and __free, we may be able
to learn more.
shiv
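The checking stubs suggested above could look roughly like the following. This is a minimal sketch under stated assumptions: checked_malloc and checked_free are hypothetical names, the header layout and magic values are made up for illustration, and the wrappers sit on top of std::malloc so the example runs anywhere. On Neutrino the post suggests forwarding to __malloc and __free instead, which is not verified here. Freed blocks are deliberately kept (never handed back to the allocator) so that the "dead" marker stays readable for the double-free check; a real tool would bound that quarantine.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

namespace {
const unsigned long kLive = 0xA110CA7EUL;   // block is currently allocated
const unsigned long kDead = 0xDEADBEEFUL;   // block has already been freed

// The union pads the header to a generous alignment so the user data that
// follows it stays suitably aligned.
union Header {
    struct { unsigned long magic; std::size_t size; } info;
    long double align_ld;
    void*       align_p;
};
}

// Allocate n bytes with a small bookkeeping header in front of them.
void* checked_malloc(std::size_t n)
{
    Header* h = static_cast<Header*>(std::malloc(sizeof(Header) + n));
    if (h == NULL)
        return NULL;
    h->info.magic = kLive;
    h->info.size  = n;
    return h + 1;
}

// Returns 0 on success, -1 if the pointer fails the sanity checks.
int checked_free(void* p)
{
    if (p == NULL)
        return 0;                       // freeing NULL is a legal no-op
    Header* h = static_cast<Header*>(p) - 1;
    if (h->info.magic == kDead)
    {
        std::fprintf(stderr, "double free of %p\n", p);
        return -1;
    }
    if (h->info.magic != kLive)
    {
        std::fprintf(stderr, "free of unknown or overwritten block %p\n", p);
        return -1;
    }
    h->info.magic = kDead;              // quarantined: intentionally not released
    return 0;
}
```

A header check like this only catches double frees and underruns that hit the header; overruns past the end of a block would need a trailing guard word as well.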
|
|
|
Tim Gessner
|
RE: debugging heap corruption
|
Tim Gessner
09/04/2008 11:08 AM
post12825
|
RE: debugging heap corruption
It doesn't crash when running under the debugger or when linked with
libmalloc_g. I have been using the core dumps which are always in the
same place, but always in the heap management functions. I will try
overwriting free and malloc and see what I can find.
I discovered that if I run my code as a standard app (as opposed to a
daemon) it doesn't crash. It only crashes if run as a daemon. Here is
the code I use to start it as a daemon. Does anything look suspicious
here?
pid_t pid = 0;
if (( eRunMode != RUNMODE_CONSOLE )
 && ( eRunMode != RUNMODE_DEBUG ))
{
    // starting as a daemon
    pid = fork();
    if ( pid == 0 )
    {
        // setup the daemon process
        ::setsid();
        ::umask(0);
        ::chdir("/");
        struct rlimit rl;
        if ( ::getrlimit(RLIMIT_NOFILE, &rl) == -1 )
        {
            // <<ERROR>>
            ::slogf(_SLOG_SETCODE(0,0),_SLOG_WARNING,"Failed to get limits.\n");
        }
        if ( rl.rlim_max == RLIM_INFINITY )
            rl.rlim_max = 1024;
        for ( unsigned int i = 0; i < rl.rlim_max; i++ )
            ::close(i);
        fd0 = ::open("/dev/null",O_RDWR);
        fd1 = ::dup(0);
        fd2 = ::dup(0);
        // any single mismatch means stdio was not redirected as intended
        if (( fd0 != 0 )
         || ( fd1 != 1 )
         || ( fd2 != 2 ))
        {
            // <<ERROR>>
            ::slogf(_SLOG_SETCODE(0,0),_SLOG_WARNING,"Invalid file descriptors.\n");
        }
    }
}
if ( pid == 0 )
{
    ::pthread_setname_np(::pthread_self(),"main");
    struct sigaction sa;
    ::memset(&sa,0,sizeof(struct sigaction));
    sa.sa_handler = SIG_IGN;
    ::sigaction(SIGHUP,&sa,&s_PrevHUP);
    try
    {
        if ( Initialize() )
        {
            Run();
            Shutdown();
        }
        else
            ::slogf(_SLOG_SETCODE(0,0),_SLOG_WARNING,"EventLog failed to initialize, closing application\n");
    }
    catch( ... )
    {
        // <<ERROR>>
        ::slogf(_SLOG_SETCODE(0,0),_SLOG_WARNING,"An unknown exception was caught. Shutting down.\n");
    }
    ::sigaction(SIGHUP,&s_PrevHUP,NULL);
    ::slogf(_SLOG_SETCODE(0,0),_SLOG_WARNING,"EventLog successfully shutdown.\n");
}
|
|
|
Shiv Nagarajan(deleted)
|
Re: debugging heap corruption
|
Shiv Nagarajan(deleted)
09/04/2008 11:10 AM
post12826
|
Re: debugging heap corruption
Are you multi-threaded?
You cannot mix multithreading with fork. You are probably better off
calling procmgr_daemon instead to detach from the terminal.
shiv
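For reference, the procmgr_daemon route might be sketched as below. Treat it as an assumption-laden outline rather than a drop-in fix: become_daemon is a hypothetical helper name, the call is compiled only when building for Neutrino (the __QNXNTO__ guard), and the description of the default behaviour for flags = 0 is from memory of the QNX library reference, so check it against the documentation for your release. The point of the pattern is that it is called once, early in main, before any threads exist.

```cpp
#include <cstdlib>
#ifdef __QNXNTO__
#include <sys/procmgr.h>
#endif

// Detach from the controlling terminal. Call this before creating threads.
// Returns 0 on success, -1 on failure.
int become_daemon()
{
#ifdef __QNXNTO__
    // flags = 0 asks for the defaults. As recalled from the QNX docs, that
    // means: chdir to "/", clear the umask, close inherited descriptors and
    // point stdin/stdout/stderr at /dev/null. The PROCMGR_DAEMON_* flags
    // (NOCHDIR, NOCLOSE, NODEVNULL, KEEPUMASK) switch those steps off.
    if (procmgr_daemon(EXIT_SUCCESS, 0) == -1)
        return -1;
    return 0;
#else
    // Not building for Neutrino: nothing to do in this sketch.
    return 0;
#endif
}
```

With this in place the hand-written fork/setsid/close loop would no longer be needed, and the resource manager threads would be started only after become_daemon returns.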
|
|
|
Tim Gessner
|
RE: debugging heap corruption
|
Tim Gessner
09/04/2008 1:29 PM
post12848
|
RE: debugging heap corruption
I am multithreaded, all threads are created after I fork. Is that still
a problem? I will look into procmgr_daemon.
Thanx
Tim
|
|
|
Shiv Nagarajan(deleted)
|
Re: debugging heap corruption
|
Shiv Nagarajan(deleted)
09/04/2008 1:33 PM
post12851
|
Re: debugging heap corruption
Yeah, using fork with a multi-threaded process doesn't work,
whether the threads are created before or after the fork.
The only condition under which it would work is if you did a
fork->exec and the exec'ed process was multi-threaded.
procmgr_daemon would get you most of the features you would need,
including detaching from the terminal, closing the relevant
file descriptors, changing the working directory, etc.
The primary difference is that the fork call changes your pid,
while procmgr_daemon would leave the pid the same.
shiv
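The fork->exec case mentioned above can be sketched as follows. This is an illustration of the general POSIX pattern, not QNX-specific code: run_helper is a hypothetical name, and the child calls nothing except execv and _exit, so it never touches the heap or any libc lock copied from the parent.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Start a helper program with fork + exec and wait for it to finish.
// Returns the helper's exit status, or -1 if it could not be run or
// did not exit normally.
int run_helper(const char* path, char* const argv[])
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
    {
        // Child: replace the process image immediately.
        execv(path, argv);
        _exit(127);                 // only reached if execv failed
    }
    int status = 0;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The exec'ed program gets a fresh address space and freshly initialized library state, which is why it can create threads freely even though the forking parent could not safely continue in the child.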
|
|
|
Tim Gessner
|
RE: debugging heap corruption
|
Tim Gessner
09/04/2008 2:28 PM
post12858
|
RE: debugging heap corruption
Thanx, I wasn't aware of that. I will make the changes right away.
Tim
|
|
|
Neil Schellenberger(deleted)
|
Re: debugging heap corruption
|
Neil Schellenberger(deleted)
09/04/2008 3:06 PM
post12862
|
Re: debugging heap corruption
Do I want/need to know what horrible voodoo we do that makes this
illegal? ;-) I thought that so long as you were single threaded at the
point of fork or made appropriate use of a fork handler you were POSIXly
golden. (Or are all Neutrino programs inherently multi-threaded with
some kind of housekeeping thread(s)?)
|
|
|
Shiv Nagarajan(deleted)
|
Re: debugging heap corruption
|
Shiv Nagarajan(deleted)
09/04/2008 3:11 PM
post12863
|
Re: debugging heap corruption
It's basically that we don't set up atfork handlers for things
like statically initialized mutexes (e.g. in the malloc code in
libc). So a process that performs a fork would end up with
mutexes in the libc allocator code in the child process that are
no longer "valid".
If we set up an atfork handler for the malloc mutex (and possibly
anything else that would need re-initialisation), we should be
OK for a process that forks before creating additional threads.
And no, programs are not inherently multi-threaded :)
shiv
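To make the atfork idea concrete, here is a small sketch using the standard pthread_atfork call. It is not the libc change being discussed: g_lock is a hypothetical mutex standing in for an allocator-internal lock, and the handler and function names are made up. The prepare handler takes the lock so no other thread can hold it across the fork, and both the parent and the child release it afterwards. Some implementations re-initialise the mutex in the child instead of unlocking it.

```cpp
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical lock standing in for a library-internal mutex.
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;

static void before_fork()     { pthread_mutex_lock(&g_lock); }    // prepare
static void after_in_parent() { pthread_mutex_unlock(&g_lock); }  // parent
static void after_in_child()  { pthread_mutex_unlock(&g_lock); }  // child

// Register the three handlers once, e.g. during library initialisation.
int install_fork_handlers()
{
    return pthread_atfork(before_fork, after_in_parent, after_in_child);
}

// Fork and let the child prove that the lock is usable in its copy of
// the process. Returns 0 if the child could take the lock.
int fork_and_check_lock()
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
    {
        int rc = pthread_mutex_trylock(&g_lock);
        _exit(rc == 0 ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Without the handlers, a fork that happens while another thread holds the lock leaves the child with a mutex that is locked by a thread that does not exist there, which matches the "no longer valid" state described above.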
|
|
|
Sean Boudreau(deleted)
|
Re: debugging heap corruption
|
Sean Boudreau(deleted)
09/04/2008 3:22 PM
post12870
|
Re: debugging heap corruption
Ooooh this sounds like a fine project for the new guy :)
If you check POSIX you'll see that what we do now (nothing)
is fine, as the child is only supposed to call async-signal-safe
functions between fork and exec, but the last consensus was that
it's probably too dangerous to give people a loaded gun and
not expect them to shoot themselves.
See also PR 59947 if you go looking at the fork code.
-seanb
On Thu, Sep 04, 2008 at 03:11:44PM -0400, Shiv Nagarajan wrote:
> It's basically that we don't set up atfork handlers for things
> like statically initialized mutexes (e.g. in the malloc code in
> libc). So a process that performs a fork would end up with
> mutexes in the libc allocator code in the child process that are
> no longer "valid".
>
> If we set up an atfork handler for the malloc mutex (and possibly
> anything else that would need re-initialisation), we should be
> OK for a process that forks before creating additional threads.
>
> And, no... programs are not inherently multi-threaded :)
>
> shiv
> Thu Sep 4 15:10:51 EDT 2008
>
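A minimal sketch of the atfork idea Shiv describes, written for application code rather than libc. The names pool_lock, pool_install_fork_handlers, pool_lock_usable and pool_fork_check are made up for illustration; pool_lock only stands in for a statically initialized mutex such as the one in the allocator. It assumes the process is single-threaded at the point of fork, and that re-initialising the mutex in the child is acceptable on the target. This is not what libc does.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock guarding some shared pool; it stands in for the
 * statically initialized mutex inside an allocator. */
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in the parent just before fork(): take the lock so that no
 * other thread is halfway through the critical section. */
static void pool_prepare(void) { pthread_mutex_lock(&pool_lock); }

/* Runs in the parent after fork(): release the lock again. */
static void pool_parent(void) { pthread_mutex_unlock(&pool_lock); }

/* Runs in the child after fork(): give the child a fresh mutex.
 * Assumption: the child has exactly one thread, so nothing else can
 * hold or wait on the lock while it is re-initialised. */
static void pool_child(void) { pthread_mutex_init(&pool_lock, NULL); }

/* Registers the three handlers; returns 0 on success. */
int pool_install_fork_handlers(void)
{
    return pthread_atfork(pool_prepare, pool_parent, pool_child);
}

/* Returns 1 if the lock can be taken and released, 0 otherwise. */
int pool_lock_usable(void)
{
    if (pthread_mutex_lock(&pool_lock) != 0)
        return 0;
    return pthread_mutex_unlock(&pool_lock) == 0;
}

/* Forks once and reports whether the child could use the lock:
 * 0 if it could, -1 otherwise. */
int pool_fork_check(void)
{
    int status = 0;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(pool_lock_usable() ? 0 : 1);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Call pool_install_fork_handlers() once at startup, before the first fork. Checking the return values of the lock calls, as pool_lock_usable() does, is also a cheap way to notice a mutex that is no longer valid in the child.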
|
|
|
Xiaodan Tang(deleted)
|
RE: debugging heap corruption
|
Xiaodan Tang(deleted)
09/04/2008 4:08 PM
post12875
|
RE: debugging heap corruption
If this fork() is the root cause, I am interested to know why a
debug-compiled build wouldn't fail.
-xtang
|
|
|
Shiv Nagarajan(deleted)
|
Re: debugging heap corruption
|
Shiv Nagarajan(deleted)
09/04/2008 4:11 PM
post12876
|
Re: debugging heap corruption
I think in this case, all that is happening is that the
mutex protection is failing (locking/unlocking is not
really protecting the critical section inside the allocator).
In a debug build, things run slower or the timing is different,
which makes the "race condition" disappear. The problem is
still there; it's just not being seen.
shiv
Thu Sep 4 16:10:56 EDT 2008
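To make the timing argument concrete, here is a small sketch (not the allocator code) of several threads bumping a shared counter under a mutex. counter_lock, worker and run_counter are hypothetical names. With a working lock the total is always exact. If the lock calls were failing silently, increments would be lost only when two threads happened to interleave, which is why a slower build can hide the problem. The sketch checks the return value of pthread_mutex_lock() so that a lock that protects nothing is reported instead of ignored.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

/* Each worker adds 1 to the shared counter *arg times, holding the
 * lock around every update. Returns non-NULL if locking failed. */
static void *worker(void *arg)
{
    long n = *(const long *)arg;

    for (long i = 0; i < n; i++) {
        if (pthread_mutex_lock(&counter_lock) != 0)
            return (void *)1;   /* the lock is not protecting anything */
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs nthreads workers and returns the final counter value, or -1
 * if a thread could not be started or a lock call failed. */
long run_counter(int nthreads, long iterations)
{
    pthread_t tid[MAX_WORKERS];
    int started = 0;
    int failed = 0;

    if (nthreads < 1 || nthreads > MAX_WORKERS)
        return -1;

    counter = 0;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[i], NULL, worker, &iterations) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        void *res = NULL;
        pthread_join(tid[i], &res);
        if (res != NULL)
            failed = 1;
    }
    return (started == nthreads && !failed) ? counter : -1;
}
```

With the lock in place, run_counter(4, 50000) returns 200000 on every run. Take the lock calls out and the result will usually come up short, but not on every run and not by the same amount, which is the same kind of behaviour as a crash that only shows up in the release build.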
|
|
|
Tim Gessner
|
Re: debugging heap corruption
|
Tim Gessner
09/05/2008 10:29 AM
post12927
|
Re: debugging heap corruption
Well, switching to procmgr_daemon seems to have solved the problem. Under heavy load the process stays up, with no crashes.
Thanx for all the help. This one saved my .... (butt)
Tim
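For anyone landing here later, a rough sketch of what the switch might look like. This is not Tim's code, and become_daemon is a made-up wrapper name. procmgr_daemon() comes from <sys/procmgr.h> on QNX Neutrino. The stub in the #else branch is only there so the sketch compiles on other systems and does not daemonize anything. As I read the docs, flags of 0 asks for the default behaviour (change to the root directory, close file descriptors, point stdio at /dev/null), but check the library reference for your release.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef __QNXNTO__
#include <sys/procmgr.h>
#else
/* Stand-in so the sketch compiles off-target. It does NOT detach
 * the process; it only mimics a successful return. */
static int procmgr_daemon(int status, unsigned flags)
{
    (void)status;
    (void)flags;
    return 0;
}
#endif

/* Detaches from the controlling terminal without calling fork() in
 * the application. Returns 0 on success, -1 on failure. */
int become_daemon(void)
{
    /* EXIT_SUCCESS is the status reported to the parent (usually
     * the shell) once the process has gone into the background. */
    if (procmgr_daemon(EXIT_SUCCESS, 0) == -1) {
        perror("procmgr_daemon");
        return -1;
    }
    return 0;
}
```

Call become_daemon() early in main(), in place of the old fork()-based daemonizing code.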
|
|
|
Shiv Nagarajan(deleted)
|
Re: debugging heap corruption
|
Shiv Nagarajan(deleted)
09/05/2008 10:31 AM
post12928
|
Re: debugging heap corruption
Cool..
shiv
Fri Sep 5 10:30:57 EDT 2008
|
|
|
Neil Schellenberger(deleted)
|
Re: debugging heap corruption
|
Neil Schellenberger(deleted)
09/04/2008 5:06 PM
post12883
|
Re: debugging heap corruption
Although SUSv3 does stipulate the minimum required behaviour for fork(),
the rationale for pthread_atfork() does go on to explain why it is
insufficient for most real systems ;-) Not to mention the violation of
the Principle of Least Surprise for engineers who are coming from a Unix
or Linux environment....
Joking aside, I will delve into the PR and the source code, though.
Thanks for the pointer!
Regards,
Neil
P.S. The docs for 6.3.2 fork() say that it will fail with ENOSYS if
threads already exist. Is that really the case?
|
|
|
David Sarrazin
|
RE: debugging heap corruption
|
David Sarrazin
09/04/2008 5:09 PM
post12884
|
RE: debugging heap corruption
> P.S. The docs for 6.3.2 fork() say that it will fail with
> ENOSYS if threads already exist. Is that really the case?
Yes. Take a look for "_Multi_threaded".
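Given that, it is worth checking for ENOSYS explicitly when fork() fails rather than treating every failure the same. A small sketch follows; fork_probe is a hypothetical helper name, and the child exits straight away so that it calls nothing that is unsafe after a fork.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Tries one fork(). Returns 0 if the fork worked and the child
 * exited cleanly, the errno value if fork() or waitpid() failed,
 * or -1 if the child ended abnormally. */
int fork_probe(void)
{
    int status = 0;
    pid_t pid = fork();

    if (pid == -1) {
        int err = errno;
        if (err == ENOSYS)
            /* Documented for QNX 6.3.2 when threads already exist. */
            fprintf(stderr, "fork: ENOSYS, has this process already created threads?\n");
        else
            fprintf(stderr, "fork: %s\n", strerror(err));
        return err;
    }
    if (pid == 0)
        _exit(0);               /* child: leave immediately */

    if (waitpid(pid, &status, 0) == -1)
        return errno;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

In a single-threaded process fork_probe() should return 0. The ENOSYS branch is the case the 6.3.2 docs describe for a process that already has threads.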
|
|
|
|