Multiple issues  
Hi All,

Let me start by telling my story... So, today my process hung on exit. I attached to it with gdb and found that a thread is looping forever in thread_pool_destroy(). This is 6.3.2, so it is the older thread pool implementation (prior to commit #167031) that uses sleepons. The thread is looping here:

	/*
	 Wait for everyone to die.  We should probably not use the
	 default sleepon here, but it is such a low runner case.
	*/
	pthread_sleepon_lock();
	while(pool->created > 0) {
		pthread_sleepon_wait(pool);
	}
	pthread_sleepon_unlock();

It repeatedly calls _sleepon_wait(), which allocates a _sleepon_entry from the heap and tries to initialize a new condvar for it; pthread_cond_init() fails with EBUSY, the entry is released back to the heap, pthread_sleepon_wait() returns, and the loop starts over, ad infinitum. All the other pool threads are blocked because the looping thread holds the sleepon mutex.
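
To make the failure mode concrete, here is a simplified, self-contained model of the allocate/init/free cycle I believe is happening inside pthread_sleepon_wait(). The struct and function names below are mine, not the libc ones; it only illustrates how a persistent EBUSY from pthread_cond_init() turns the while(pool->created > 0) loop above into a busy spin:

	/* Simplified model of the cycle described above -- NOT the real libc source. */
	#include <errno.h>
	#include <pthread.h>
	#include <stdlib.h>

	struct fake_sleepon_entry {
		pthread_cond_t cond;	/* condvar the waiter blocks on */
		const void *addr;	/* address being slept on */
	};

	/* One pass of the cycle: returns EOK, or the pthread_cond_init() error. */
	static int fake_sleepon_wait_once(const void *addr)
	{
		struct fake_sleepon_entry *entry;
		int err;

		if((entry = malloc(sizeof *entry)) == NULL) {
			return ENOMEM;
		}
		entry->addr = addr;

		/* If the kernel keeps answering EBUSY here, the entry is freed
		   again and the caller just loops forever. */
		if((err = pthread_cond_init(&entry->cond, NULL)) != EOK) {
			free(entry);
			return err;
		}

		/* ...normally: link the entry into the sleepon table, wait on the
		   condvar, then tear everything down on wakeup... */
		pthread_cond_destroy(&entry->cond);
		free(entry);
		return EOK;
	}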

So, that code is now gone and the new implementation does not use sleepons. But how could a _sleepon_entry be released without its condvar being destroyed? I checked the sleepon.c implementation but could not find any way for that to happen. Were any bugs like that fixed in commits after 6.3.2?
The only way I can see this situation arising is if SyncTypeCreate_r(), called from pthread_cond_init(), could return an error status yet leave the condvar actually allocated (or partially allocated).
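
Either way, the observable symptom is that pthread_cond_init() keeps returning EBUSY for a freshly malloc'd entry, which suggests the kernel still has a sync object registered at that address. A quick standalone test along these lines (my own sketch, not library code) should show the same EBUSY if that reading is right:

	/* Sketch: re-initialize a condvar that was never destroyed. If the
	   kernel tracks sync objects by address, the second pthread_cond_init()
	   on the same storage should come back with EBUSY. */
	#include <errno.h>
	#include <pthread.h>
	#include <stdio.h>

	int main(void)
	{
		pthread_cond_t cv;
		int rc;

		rc = pthread_cond_init(&cv, NULL);
		printf("first  init: %d\n", rc);

		/* Deliberately skip pthread_cond_destroy(), modelling a sleepon
		   entry that was freed without its condvar being destroyed. */
		rc = pthread_cond_init(&cv, NULL);
		printf("second init: %d (EBUSY is %d)\n", rc, EBUSY);

		pthread_cond_destroy(&cv);
		return 0;
	}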

OK, I checked the kernel. I have actually never peeked into the kernel code before and do not know it, but at first glance everything looks fine for condvar allocation. One thing, though, in nano_sync.c at line 47:

	// Allocate a sync entry.
	if((syp = object_alloc(prp, &sync_souls)) == NULL) {
		kererr(act, EAGAIN);
		return(NULL);
	}

	// Map the virtual address of the users sync to a physical addr.
	// (flags & PTHREAD_PROCESSSHARED_MASK) may be useful to memmgr
	// Note that we tell vaddr_to_memobj to mark this page as containing a
	// sync object.
	if((obp = memmgr.vaddr_to_memobj(prp, sync, &addr, 1)) == (OBJECT *)-1) {
		<------------------------------------ HERE
		kererr(act, EINVAL);
		return NULL;
	}

	// Add the entry to the sync vector.
	if(synchash_add(obp, addr, syp) == -1) {
		object_free(prp, &sync_souls, syp);
		kererr(act, EBUSY);
		return(NULL);
	}

wouldn't it be good to call object_free() here, just as is done in the next conditional statement? Anyway, even if there is a memory leak, it happens before the object is bound to a physical address, and I doubt it could result in EBUSY on subsequent calls. So the question "How could the sleepon get into a state where it has lost the condvar for a released block?" remains open.
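
To be concrete, the object_free() call I have in mind would just mirror the synchash_add() failure path (a sketch against the snippet above, not a tested patch):

	if((obp = memmgr.vaddr_to_memobj(prp, sync, &addr, 1)) == (OBJECT *)-1) {
		object_free(prp, &sync_souls, syp);	// free the entry allocated above
		kererr(act, EINVAL);
		return(NULL);
	}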

Also, while examining the thread pool code I've found one more cosmetic issue.

In thread_pool_destroy()...

    /* destroy pool cond */
    pthread_mutex_destroy(&(props->pool_cond));
    /* destroy pool mutex */
    pthread_mutex_destroy(&(props->inline_lock));

It's fortunate that calls to pthread_cond_destroy() and pthread_mutex_destroy() both result in the same kernel call, but still, it would be better to call pthread_cond_destroy() for props->pool_cond, as it is a condvar, not a mutex.
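
i.e., something like this (assuming props->pool_cond really is a pthread_cond_t, as the name suggests):

    /* destroy pool cond -- with the matching destroy call */
    pthread_cond_destroy(&(props->pool_cond));
    /* destroy pool mutex */
    pthread_mutex_destroy(&(props->inline_lock));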