Multiple issues  
Hi All,

Let me start by telling my story... So, today my process hung on exit. I attached to it with gdb and found that a thread is looping forever in thread_pool_destroy(). This is 6.3.2, so it is the older thread pool implementation (prior to commit #167031) that uses sleepons. The thread is looping here:

	/*
	 Wait for everyone to die.  We should probably not use the
	 default sleepon here, but it is such a low runner case.
	*/
	pthread_sleepon_lock();
	while(pool->created > 0) {
		pthread_sleepon_wait(pool);
	}
	pthread_sleepon_unlock();

It repeatedly calls _sleepon_wait(), which allocates a _sleepon_entry from the heap and tries to initialize a new condvar for it; pthread_cond_init() fails with EBUSY, the entry is released back to the heap, pthread_sleepon_wait() returns, and the loop starts over, ad infinitum. All the other pool threads are blocked because the looping thread holds the sleepon mutex.
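
To make the failure mode concrete, here is a simplified, self-contained model of the allocate/init/free cycle I believe is happening inside pthread_sleepon_wait(). The struct and function names below are mine, not the libc ones; it only illustrates how a persistent EBUSY from pthread_cond_init() turns the while(pool->created > 0) loop above into a busy spin:

	/* Simplified model of the cycle described above -- NOT the real libc source. */
	#include <errno.h>
	#include <pthread.h>
	#include <stdlib.h>

	struct fake_sleepon_entry {
		pthread_cond_t cond;	/* condvar the waiter blocks on */
		const void *addr;	/* address being slept on */
	};

	/* One pass of the cycle: returns EOK, or the pthread_cond_init() error. */
	static int fake_sleepon_wait_once(const void *addr)
	{
		struct fake_sleepon_entry *entry;
		int err;

		if((entry = malloc(sizeof *entry)) == NULL) {
			return ENOMEM;
		}
		entry->addr = addr;

		/* If the kernel keeps answering EBUSY here, the entry is freed
		   again and the caller just loops forever. */
		if((err = pthread_cond_init(&entry->cond, NULL)) != EOK) {
			free(entry);
			return err;
		}

		/* ...normally: link the entry into the sleepon table, wait on the
		   condvar, then tear everything down on wakeup... */
		pthread_cond_destroy(&entry->cond);
		free(entry);
		return EOK;
	}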

So, that code is now gone and the new implementation does not use sleepons. But how could a _sleepon_entry be released without its condvar being destroyed? I checked the sleepon.c implementation but could not find any way for that to happen. Were any bugs like that fixed in commits after 6.3.2?
The only way I can see this situation arising is if SyncTypeCreate_r(), called from pthread_cond_init(), could return an error status yet leave the condvar actually allocated (or partially allocated).
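
Either way, the observable symptom is that pthread_cond_init() keeps returning EBUSY for a freshly malloc'd entry, which suggests the kernel still has a sync object registered at that address. A quick standalone test along these lines (my own sketch, not library code) should show the same EBUSY if that reading is right:

	/* Sketch: re-initialize a condvar that was never destroyed. If the
	   kernel tracks sync objects by address, the second pthread_cond_init()
	   on the same storage should come back with EBUSY. */
	#include <errno.h>
	#include <pthread.h>
	#include <stdio.h>

	int main(void)
	{
		pthread_cond_t cv;
		int rc;

		rc = pthread_cond_init(&cv, NULL);
		printf("first  init: %d\n", rc);

		/* Deliberately skip pthread_cond_destroy(), modelling a sleepon
		   entry that was freed without its condvar being destroyed. */
		rc = pthread_cond_init(&cv, NULL);
		printf("second init: %d (EBUSY is %d)\n", rc, EBUSY);

		pthread_cond_destroy(&cv);
		return 0;
	}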

OK, I checked the kernel. I have actually never peeked into the kernel code before and do not know it, but at first glance everything looks fine for condvar allocation. One thing, though, in nano_sync.c at line 47:

	// Allocate a sync entry.
	if((syp = object_alloc(prp, &sync_souls)) == NULL) {
		kererr(act, EAGAIN);
		return(NULL);
	}

	// Map the virtual address of the users sync to a physical addr.
	// (flags & PTHREAD_PROCESSSHARED_MASK) may be useful to memmgr
	// Note that we tell vaddr_to_memobj to mark this page as containing a
	// sync object.
	if((obp = memmgr.vaddr_to_memobj(prp, sync, &addr, 1)) == (OBJECT *)-1) {
		<------------------------------------ HERE
		kererr(act, EINVAL);
		return NULL;
	}

	// Add the entry to the sync vector.
	if(synchash_add(obp, addr, syp) == -1) {
		object_free(prp, &sync_souls, syp);
		kererr(act, EBUSY);
		return(NULL);
	}

wouldn't it be good to call object_free() here, just as is done in the next conditional statement? Anyway, even if there is a memory leak, it happens before the object is bound to a physical address, and I doubt it could result in EBUSY on subsequent calls. So the question "How could the sleepon get into a state where it has lost the condvar for a released block?" remains open.
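
To be concrete, the object_free() call I have in mind would just mirror the synchash_add() failure path (a sketch against the snippet above, not a tested patch):

	if((obp = memmgr.vaddr_to_memobj(prp, sync, &addr, 1)) == (OBJECT *)-1) {
		object_free(prp, &sync_souls, syp);	// free the entry allocated above
		kererr(act, EINVAL);
		return(NULL);
	}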

Also, while examining the thread pool code I've found one more cosmetic issue.

In thread_pool_destroy()...

    /* destroy pool cond */
    pthread_mutex_destroy(&(props->pool_cond));
    /* destroy pool mutex */
    pthread_mutex_destroy(&(props->inline_lock));

It's fortunate that calls to pthread_cond_destroy() and pthread_mutex_destroy() both result in the same kernel call, but still, it would be better to call pthread_cond_destroy() for props->pool_cond, as it is a condvar, not a mutex.
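
i.e., something like this (assuming props->pool_cond really is a pthread_cond_t, as the name suggests):

    /* destroy pool cond -- with the matching destroy call */
    pthread_cond_destroy(&(props->pool_cond));
    /* destroy pool mutex */
    pthread_mutex_destroy(&(props->inline_lock));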