Forum Topic - debugging heap corruption
debugging heap corruption  
I have a resource mgr which does not have a hardware device; rather, it is an error logging component.  It uses a 
circular buffer to store error events and then writes those events to a MySQL database.

The resource mgr is multithreaded and I have an additional thread at a lower priority to handle writing the events to 
the MySQL database.

Whenever the component is accessed by more than 1 thread at a time, it crashes.  I have a mutex around the circular 
buffer and of course the resource mgr has a mutex built into the io_write mechanism (at least as I understand it).  
Regardless, I think I have more than sufficient protection.

When I run in the debugger it runs fine and never crashes.  I have tried to use the memory analysis tools, but the remote 
QNX box freezes (I am Windows-hosted on my dev machine).  I have also tried linking with libmalloc_g, but then it runs 
without crashing.

The crashes are always on a heap access, either an allocation, deletion, or read.  I have rewritten the code a number of
 times now, moving things around, changing techniques, etc.  

I am out of ideas and nearing the end of the project with a component which is unstable.  Please help!!!

I am running 6.3.2 with a libc patch provided by QNX.

Thanx
Tim
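
For readers who want a concrete picture of the producer side of such a design, here is a minimal sketch of a mutex/condvar-protected circular buffer.  All names and sizes below are hypothetical, chosen for illustration only; they are not taken from Tim's code.

#include <pthread.h>
#include <string.h>

#define QUEUE_SIZE 256               // power of two, so the wrap can use a mask

struct event { unsigned id; char text[128]; };

static struct event     queue[QUEUE_SIZE];
static unsigned         head;        // next slot to write
static pthread_mutex_t  queue_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t   queue_cond  = PTHREAD_COND_INITIALIZER;

// Called from the resmgr write path: append an event and wake the
// lower-priority database thread that is waiting on the condvar.
void push_event(unsigned id, const char *text)
{
    pthread_mutex_lock(&queue_mutex);
    queue[head].id = id;
    strncpy(queue[head].text, text, sizeof(queue[head].text) - 1);
    queue[head].text[sizeof(queue[head].text) - 1] = '\0';
    head = (head + 1) & (QUEUE_SIZE - 1);
    pthread_cond_signal(&queue_cond);
    pthread_mutex_unlock(&queue_mutex);
}
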
Re: debugging heap corruption  
Is the crash always in the same place?  Is the crash in the
memory allocator (i.e. inside libc) or in the application?

Can you post some of the code around where the crash is
occurring, i.e. the calls to the allocator/free near which it is
crashing, and also the code that locks around this?

This is running on a multi-core system, I suppose?  There are locks inside
the allocator code in libc, so the libc memory allocation code
itself is thread-safe.

thanks
shiv
Wed Sep  3 11:34:24 EDT 2008


-- 
****
Shiv Nagarajan,
Kernel Developer, QNX Software Systems,
Ottawa, Canada
****
Re: debugging heap corruption  
Thanx for responding.  The crash is always in libc, in the heap management functions.  The location in my code which 
triggers it is different each time.  That is, it is always around a new or delete, but there are several locations in my 
code where I allocate memory.

Perhaps the most common location is in my database update thread.  I'm using the helper functions to create a resource 
mgr, so I don't directly create those threads.  I do create one thread directly to update the MySQL database.  All 
threads allocate and de-allocate from the heap.  Crashes usually come from calls into libc from the thread that I 
created directly.  Also, they seem to come on deletes most often.  (Again, though, it is not the same every time.)

I can post all the code if it helps, but here is the most common location.  This code makes a copy of the events in the 
buffer and then inserts them into the database.  It makes a copy in order to limit the time the buffer is locked.

for ( ; s_pData != NULL && s_pData->pMySql != NULL; )
{
  try
  {
    // if we've processed all the events
    // then block
    if ( getLastEvent() == s_tLastEventID )			
    {		
      ::pthread_mutex_lock(&s_mxEvent);
      ::pthread_cond_wait(&s_condEvent,&s_mxEvent);
      ::pthread_mutex_unlock(&s_mxEvent);
    }

    if ( ! s_bDBConnected )
    {
      __DBReconnect();				

      while ( ! s_bDBConnected )
      {
        ::sleep(2);
        
        ::pthread_testcancel();
        
        __DBReconnect();
      }
    }
      
    // first make a copy of the events so that the 
    // log itself is available while we're updating 
    // the database
    ::pthread_mutex_lock(&g_mxEvents);
    bLocked = true;
    
    UINT uQueTail = 0;
    UINT uQueCount = 0;
    
    if ( g_uQueCount == g_uQueSize )
    {		
      uQueTail = g_uQueHead;
      uQueCount = g_uQueSize;
    }
    else
    {
      uQueTail = g_uQueHead - g_uQueCount;
      uQueCount = g_uQueCount;
    }
    
    // look for the previous event id and start there if found
    if ( s_tLastEventID > 0 )
    {
      UINT u = 0;
      for ( ; u < g_uQueSize; u++ )
      {
        if ( g_Events[u].tEventID == s_tLastEventID )
          break;
      }

      if ( u < g_uQueSize )
      {
        uQueTail = u;
        if ( uQueTail > g_uQueHead )
          uQueCount = (g_uQueSize - uQueTail) + g_uQueHead;
        else
          uQueCount = g_uQueHead - uQueTail;
      }
    }
    
    // increment then read			
    uQueTail++;
    uQueTail &= (g_uQueSize - 1);
    
    EVENTS aEvents;
    for ( UINT u = 0; u < uQueCount; u++ )
    {
      if ( g_Events[uQueTail].tEventID != 0 )
      {
        char* pszBuf = new char[::strlen(g_Events[uQueTail].pszEventText) + 1];
        ::strcpy(pszBuf,g_Events[uQueTail].pszEventText);

        std::pair<size_t,char*>e(g_Events[uQueTail].tEventID,pszBuf);
        
        aEvents.push_back(e);
      }
      
      uQueTail++;
      uQueTail &= (g_uQueSize - 1);				
    }
    
    ::pthread_mutex_unlock(&g_mxEvents);
    bLocked = false;

    // now update the database with the copied log items
    for ( EVENTS::iterator i = aEvents.begin(); i != aEvents.end(); ++i )
    {
      if ( i->second == NULL )
      {
        s_tLastEventID = i->first;
        
        continue;
      }

      // the log is one long string - so break
      // it out into parts consistent with the 
      // database schema
      tagPARSEDEVENT Event;
      __ParseEvent(i->second,Event);
      
      // if the event did not have a valid time 
      // we'll ignore it - this is a safe measure
      // against a bogus SQL stmt.  we could be more
      // granular and still update just with a diff
      // datetime - but for now ...
      if (( ::strlen(Event.szTime) > 0 )
      &&	( Event.pszDesc != NULL ))
      {
        ::memset(s_pData->pszSQLStmt,0,4096);
...
Re: debugging heap corruption  
Do you know where in the lib it is crashing?  If you know where,
it may be possible to see whether the crash is occurring because
of a specific error in userland (maybe the malloc structures have
been overwritten).

Have you tried catching this in the debugger or getting a core
file?  Could the crash be caused by a double free?  I notice that
you are setting the freed element to NULL after delete in this
snippet, so it should be easy to see whether we are ever calling
delete on a NULL element.

I believe delete just turns around and calls free (just as new
calls malloc internally).  So, if you want, you could implement
some stubs that cover malloc and free and in turn call
__malloc and __free.  The library (libc) defines
malloc and free as stubs which in turn call
__malloc and __free, so if you define your own malloc/free to do
some checks before calling __malloc and __free, we may be able
to learn more.

shiv
Thu Sep  4 10:59:47 EDT 2008
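
A minimal sketch of the malloc/free stubs described above, assuming the internal __malloc/__free entry points behave exactly as Shiv describes; the prototypes are assumptions and should be checked against the 6.3.2 libc headers.  Built as C, or wrapped in extern "C" if compiled into the C++ resource manager:

#include <stddef.h>

/* Assumed prototypes for the internal libc entry points mentioned above. */
extern void *__malloc(size_t size);
extern void  __free(void *ptr);

/* Rough bookkeeping only; the counters are not atomic, but they can be
   inspected from the debugger or a core file while hunting the bug. */
static unsigned long alloc_count;
static unsigned long free_count;
static unsigned long null_free_count;

/* Replacement stubs: every allocation/free in the process funnels
   through here, so checks or guard bands can be added in one place. */
void *malloc(size_t size)
{
    alloc_count++;
    return __malloc(size);
}

void free(void *ptr)
{
    free_count++;
    if (ptr == NULL) {
        null_free_count++;   /* legal, but worth counting if unexpected */
        return;
    }
    __free(ptr);
}
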

RE: debugging heap corruption  
It doesn't crash when running under the debugger or when linked with
libmalloc_g.  I have been using the core dumps, which always point to the
same place, but that place is always in the heap management functions.  I will
try overriding free and malloc and see what I can find.

I discovered that if I run my code as a standard app (as opposed to a
daemon) it doesn't crash.  It only crashes if run as a daemon.  Here is
the code I use to start it as a daemon.  Does anything look suspicious
here?

	pid_t pid = 0;
	if (( eRunMode != RUNMODE_CONSOLE )
	&&	( eRunMode != RUNMODE_DEBUG ))
	{
		// starting as a daemon
		pid = fork();
		if ( pid == 0 )
		{
			// setup the daemon process
			::setsid();
			
			::umask(0);
			::chdir("/");
			
			struct rlimit rl;
			if ( ::getrlimit(RLIMIT_NOFILE, &rl) == -1 )
			{
				// <<ERROR>>
				::slogf(_SLOG_SETCODE(0,0),_SLOG_WARNING,"Failed to get limits.\n");
			}
			
			if ( rl.rlim_max == RLIM_INFINITY )
				rl.rlim_max = 1024;
				
			for ( unsigned int i = 0; i < rl.rlim_max; i++ )
				::close(i);	
			
			fd0 = ::open("/dev/null",O_RDWR);
			fd1 = ::dup(0);
			fd2 = ::dup(0);
			
			if (( fd0 != 0 )
			&&	( fd1 != 1 )
			&&	( fd2 != 2 ))
			{
				// <<ERROR>>
				::slogf(_SLOG_SETCODE(0,0),_SLOG_WARNING,"Invalid file descriptors.\n");
			}
		}
	}
	
	if ( pid == 0 )
	{
		::pthread_setname_np(::pthread_self(),"main");

		struct sigaction sa;
		::memset(&sa,0,sizeof(struct sigaction));
		
		sa.sa_handler = SIG_IGN;
		::sigaction(SIGHUP,&sa,&s_PrevHUP);
		
		try
		{
			if ( Initialize() )
			{
				Run();
			
				Shutdown();
			}
			else
				::slogf(_SLOG_SETCODE(0,0),_SLOG_WARNING,"EventLog failed to initialize, closing application\n");
		}
		catch( ... )
		{
			// <<ERROR>>
			::slogf(_SLOG_SETCODE(0,0),_SLOG_WARNING,"An unknown exception was caught. Shutting down.\n");
		}
			
		::sigaction(SIGHUP,&s_PrevHUP,NULL);		

		::slogf(_SLOG_SETCODE(0,0),_SLOG_WARNING,"EventLog successfully shut down.\n");
	}

Re: debugging heap corruption  
Are you multi-threaded?

You cannot mix multithreading with fork().  You are probably better off
calling procmgr_daemon() instead to detach from the terminal.

shiv
Thu Sep  4 11:09:57 EDT 2008

RE: debugging heap corruption  
I am multithreaded, all threads are created after I fork.  Is that still
a problem?  I will look into procmgr_daemon.

Thanx
Tim

Re: debugging heap corruption  
Yeah, using fork with a multi-threaded process doesn't work
(whether the threads are created before or after the fork; the only
condition under which it would work is if you did a fork->exec and
the exec'd process was multi-threaded).

procmgr_daemon would get you most of the features you would need,
including detaching from the terminal, closing the relevant
file descriptors, changing the working directory, etc.

The primary difference is that the fork call changes your pid,
while procmgr_daemon leaves the pid the same.

shiv
Thu Sep  4 13:33:42 EDT 2008
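
A minimal sketch of the procmgr_daemon() route suggested above.  Passing 0 for the flags is assumed to take all of the defaults (chdir to "/", stdio redirected to /dev/null, and so on); the exact flag behaviour should be checked against the <sys/procmgr.h> shipped with 6.3.2:

#include <sys/procmgr.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    // Detach from the controlling terminal in place: unlike fork(),
    // no child process is created and the pid stays the same, so the
    // libc mutexes are never duplicated into an inconsistent copy.
    if ( procmgr_daemon(EXIT_SUCCESS, 0) == -1 )
    {
        perror("procmgr_daemon");
        return EXIT_FAILURE;
    }

    // ... now create the database thread, start the resmgr loop, etc.
    return EXIT_SUCCESS;
}
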

RE: debugging heap corruption  
Thanx, I wasn't aware of that.  I will make the changes right away.

Tim

Re: debugging heap corruption  
Do I want/need to know what horrible voodoo we do that makes this
illegal? ;-)  I thought that so long as you were single threaded at the
point of fork or made appropriate use of a fork handler you were POSIXly
golden.  (Or are all Neutrino programs inherently multi-threaded with
some kind of housekeeping thread(s)?)

Re: debugging heap corruption  
It's basically that we don't set up atfork handlers for things
like statically initialized mutexes (e.g. in the malloc code in
libc).  So a process that performs a fork would end up with
mutexes in the libc allocator code in the child process that are
no longer "valid".

If we set up an atfork handler for the malloc mutex (and possibly
anything else that would need re-initialization), we should be
OK for a process that forks before creating additional threads.

And no, programs are not inherently multi-threaded :)

shiv
Thu Sep  4 15:10:51 EDT 2008
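
For reference, this is roughly what such a handler looks like for an application-level mutex (a sketch only; the libc allocator would need the equivalent for its own internal lock, which is exactly what is missing here):

#include <pthread.h>

static pthread_mutex_t log_mutex = PTHREAD_MUTEX_INITIALIZER;

// Take the lock before fork() so the child inherits it in a known
// state, release it again in the parent, and re-initialize it in the
// child, where the thread that owned it no longer exists.
static void lock_before_fork(void)  { pthread_mutex_lock(&log_mutex); }
static void release_in_parent(void) { pthread_mutex_unlock(&log_mutex); }
static void reinit_in_child(void)   { pthread_mutex_init(&log_mutex, NULL); }

void install_fork_handlers(void)
{
    pthread_atfork(lock_before_fork, release_in_parent, reinit_in_child);
}
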

Re: debugging heap corruption  

Ooooh this sounds like a fine project for the new guy :)

If you check POSIX you'll see that what we do now (nothing)
is fine, since the child is only supposed to call async-signal-safe
functions between fork and exec; but the last consensus was that
it's probably too dangerous to give people a loaded gun and
not expect them to shoot themselves.

See also PR 59947 if you go looking at the fork code.

-seanb

RE: debugging heap corruption  
If this fork() is the root cause, I am interested to know why a
debug-compiled one wouldn't fail.

-xtang 

Re: debugging heap corruption  
I think in this case, all that is happening is that the
mutex protection is failing (locking/unlocking are not
really protecting the critical section inside the allocator).

In a debug build, things run slower or the timing is different,
which makes the "race condition" disappear.  The problem is
still there; it's just not being seen.

shiv
Thu Sep  4 16:10:56 EDT 2008

Re: debugging heap corruption  
Well, switching to procmgr_daemon seems to have solved the problem.  Under heavy loads the process stays up with no crashes.


Thanx for all the help.  This one saved my .... (butt)
Tim
Re: debugging heap corruption  
Cool..

shiv
Fri Sep  5 10:30:57 EDT 2008
Re: debugging heap corruption  
Although SUSv3 does stipulate the minimum required behaviour for fork(),
the rationale for pthread_atfork() does go on to explain why it is
insufficient for most real systems ;-)  Not to mention the violation of
the Principle of Least Surprise for engineers who are coming from a Unix
or Linux environment....

Joking aside, I will delve into the PR and the source code, though.
Thanks for the pointer!

Regards,
Neil

P.S. The docs for 6.3.2 fork() say that it will fail with ENOSYS if
threads already exist.  Is that really the case?

RE: debugging heap corruption  
 

> P.S. The docs for 6.3.2 fork() say that it will fail with 
> ENOSYS if threads already exist.  Is that really the case?

Yes.  Take a look for "_Multi_threaded".
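
In other words (an illustrative sketch only, based on the documented behaviour quoted above), a process that has already created additional threads should see the failure like this:

#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

// Attempt to fork; on 6.3.2 this is expected to fail with ENOSYS
// once additional threads already exist in the process.
pid_t try_fork(void)
{
    pid_t pid = fork();
    if ( pid == -1 && errno == ENOSYS )
        fprintf(stderr, "fork(): ENOSYS -- process is already multi-threaded\n");
    return pid;
}
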