Forum Topic - spawn() fails on QNX 6.5.0: (18 Items)
   
spawn() fails on QNX 6.5.0  
I have a multi-threaded program that is failing on the spawn() call.
I get EBADF, but when I log the parameters to spawn() none of them look wrong.

This is happening after my program goes through a process of:
1. Shut down a process that talks to the CAN driver.
2. Spawn a test mode program, which interfaces to the CAN driver in a different way.
3. Shut down the test mode program (with kill process).
4. Restart the program that talks to the CAN_ResMgr driver.

After several cycles of the above, the device manager that's doing the spawn() gets EBADF from the spawn() call
(either one), and after that spawn() will not work for the device manager until the device manager is killed and restarted.

Again, all the parameters to the spawn() call look fine both before and after the call.
Re: spawn() fails on QNX 6.5.0  
Additional information:

Rhp_char_t ** const ExecutablePath = argv;
struct inheritance inherit;
memset(&inherit, 0, sizeof(inherit));
inherit.flags = SPAWN_NOZOMBIE;

/* Map the parent's fds 0, 1 and 2 onto the child's fds 0, 1 and 2. */
Rhp_int32_t fd_map[] = { STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO };

childProcessID = spawnp(ExecutablePath[0], 3, fd_map, &inherit, ExecutablePath, NULL);

This is the code, without the logging.
Re: spawn() fails on QNX 6.5.0  
There's a fairly well understood race condition when doing things like fork/spawn in a multithreaded process. During the 
process creation phase the process manager has to iterate over the file descriptors in the parent process that need to 
be duplicated in the child, and then send an _IO_DUP message to the resource manager.

The process of opening a file descriptor involves:
1. Open a connection to the resource manager
2. Send an _IO_CONNECT message with information about the file you're interested in
3. The resource manager creates an OCB for the client process

And the process of closing a file descriptor is:
1. Send an _IO_CLOSE message on the coid
2. The resource manager cleans up the OCB associated with the client's file
3. Detach the connection from the resource manager.

If the process manager tries to duplicate a file descriptor that is partially opened or partially closed in the
parent (the connection exists but the resource manager doesn't have information about it), the spawn/fork will fail with
EBADF.

It sounds like you may be hitting this race condition, or may be ending up with a file descriptor that is partially open.
Re: spawn() fails on QNX 6.5.0  
I think I must be ending up with a partially opened FD.  How do I fix it without restarting my device manager program?

Restarting that program isn't a good option, because to do that it would have to shut down all of our other applications
(we have about 20-30 running) and then restart everything again.

I'm passing in STDIN, STDOUT and STDERR, not other files the parent has opened.

I did try spawnv() as well as spawnp() and spawn(). 

spawnv() fails badly because apparently the FDs aren't able to be duplicated.
Re: spawn() fails on QNX 6.5.0  
So, would putting multi-second sleep() calls between the spawn() tries help?  That might be an option.

Thanks for replying quickly, BTW.
Re: spawn() fails on QNX 6.5.0  
Are you opening/closing fds 0,1,2 in any of the other threads? If those files are open and valid it may be some other 
issue.
Re: spawn() fails on QNX 6.5.0  
I found code that was calling printf().  This code is useless as the program is making itself a daemon with
procmgr_daemon().

Are you saying we shouldn't output data to fds 0, 1, or 2?
Re: spawn() fails on QNX 6.5.0  
BTW, I did try putting delays in my code, and that didn't help, so I think it's not a timing issue.
Re: spawn() fails on QNX 6.5.0  
One of the things `procmgr_daemon` does is attempt to close all file descriptors, then open /dev/null and dup it to
0, 1 and 2. This may or may not be relevant.
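
To make that /dev/null step concrete, here is a rough sketch in portable POSIX C. It is not the actual `procmgr_daemon` implementation; the function name redirect_to_devnull is made up for this example, and it only covers the "open /dev/null and dup it onto the given fds" part. After this kind of redirection, printf() output simply goes to /dev/null.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: point every descriptor in fds[] at /dev/null.
 * Returns 0 on success, -1 if /dev/null cannot be opened or a dup2() fails. */
int redirect_to_devnull(const int *fds, size_t count)
{
    int nullfd = open("/dev/null", O_RDWR);
    int keep = 0;   /* set if the temporary fd is itself one of the targets */
    int rc = 0;
    size_t i;

    if (nullfd == -1)
        return -1;

    for (i = 0; i < count; i++) {
        if (fds[i] == nullfd)
            keep = 1;                       /* already refers to /dev/null */
        else if (dup2(nullfd, fds[i]) == -1)
            rc = -1;
    }

    /* Drop the temporary descriptor unless it landed on a target slot. */
    if (!keep)
        close(nullfd);
    return rc;
}
```

For the standard descriptors the call would be redirect_to_devnull() with an array holding STDIN_FILENO, STDOUT_FILENO and STDERR_FILENO.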

One thing you could try is: once the system is in the state where spawnp fails with EBADF, you could try using 
ConnectServerInfo(0, ...) to scan the process's fds and check the flags for `_NTO_COF_DEAD`, and/or iofdinfo to see if 
the connections are still valid.

ConnectServerInfo can be used in a loop to scan for file descriptors. e.g.

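#include <sys/neutrino.h>

struct _server_info sinfo;
int next_fd = 5;   /* example starting coid; use 0 to scan from the first fd */
for (;;) {
    int fd = ConnectServerInfo(0, next_fd, &sinfo);
    /* Stop when no connection is left or the side-channel coids are reached. */
    if (fd == -1 || (fd & _NTO_SIDE_CHANNEL)) break;
    print_info_about_connection();   /* placeholder: log fd and sinfo.flags */
    next_fd = fd + 1;
}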

If there is a connection with coid 5, ConnectServerInfo will return 5 (and sinfo will have information about 5). If 5 is
not valid, it will return the next valid coid (6, 7, 8...).
You can stop looping when `(fd & _NTO_SIDE_CHANNEL) == _NTO_SIDE_CHANNEL`.
 
Re: spawn() fails on QNX 6.5.0  
I added your suggested code into my program, but it's not finding any FDs with _NTO_COF_DEAD
in the flags.

The flags are all either 0 or 1 (_NTO_COF_CLOEXEC).

Anything else I should look for?
Re: spawn() fails on QNX 6.5.0  
Once spawnp starts returning EBADF do you know that FDs 0,1,2 are all still valid? If you have traces, do you see the 
resource manager receiving the _IO_DUP messages during the spawn process?
Re: spawn() fails on QNX 6.5.0  
I'm not running my program in the debugger, if that's what you're asking about.
It's rather difficult for me to do so, as the lab test stand has its own PC that connects directly to our QNX device.
This lab PC doesn't have Momentics on it.

Is there another way to get this information you talk about?  
Also, how can I tell if fds 0, 1 and 2 are valid or not?
What are their coids?  0, 1 and 2?

Re: spawn() fails on QNX 6.5.0  
pidin fds

should indicate if the fds are sane.
Re: spawn() fails on QNX 6.5.0  
Yes, for fds there is no translation, so they'll just be 0, 1, 2. I mentioned in a previous comment that you could try
calling `iofdinfo`; maybe `fstat` would be easier.

If the spawnp is failing, I would expect that one of them is no longer associated with an open file in the server, but I
don't know what would cause it to become invalid.
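
A rough sketch of the `fstat` check, not tested on QNX 6.5.0. The helper names fd_is_open and report_std_fds are made up for this example, and it assumes fstat() fails with EBADF on a descriptor that is no longer open, as POSIX specifies.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd refers to an open file,
 * 0 if fstat() reports EBADF, and -1 on any other fstat() failure. */
int fd_is_open(int fd)
{
    struct stat st;

    if (fstat(fd, &st) == 0)
        return 1;
    return (errno == EBADF) ? 0 : -1;
}

/* Hypothetical helper: logs the state of fds 0, 1 and 2 to the given
 * stream and returns how many of them are not confirmed open. */
int report_std_fds(FILE *log)
{
    static const int fds[] = { STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO };
    int bad = 0;
    size_t i;

    for (i = 0; i < sizeof(fds) / sizeof(fds[0]); i++) {
        int state = fd_is_open(fds[i]);

        fprintf(log, "fd %d: %s\n", fds[i],
                state == 1 ? "open" :
                state == 0 ? "not open (EBADF)" : "fstat error");
        if (state != 1)
            bad++;
    }
    return bad;
}
```

Calling report_std_fds() with your own log file right before each spawnp(), and again after it fails, should show which of the three descriptors went away. Don't log to stderr here, since fd 2 may be one of the descriptors in question.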
Re: spawn() fails on QNX 6.5.0  
I checked pidin fds for my process.

Before the issue happens, I see fds as:
0          1 rw     0 /dev/devh-usb.so
1          1
2          1

After the spawn() issue happens:
1          1
2          1

It looks like FD 0 gets closed somehow, if I'm reading this right.
Re: spawn() fails on QNX 6.5.0  
Yes, it could be closed. The other possibility would be that something has gone wrong on the resmgr side. The way
'pidin fds' works is that pidin dups the process's fds and calls iofdinfo on the result to get the path. So if something is bad
on the resmgr side it might also make it vanish. Though it getting closed is perhaps more likely.
Re: spawn() fails on QNX 6.5.0  
Well, good news I think.
I was able to dup() the STDIN descriptor before it went bad and then restore it so that spawn() could function after the
 EBADF error.  So, I restore it with dup2() and call spawn() again and that works.
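
The backup-and-restore workaround looks roughly like this. It is a minimal sketch using only POSIX calls; the names backup_fd and restore_fd are made up for this example, and using F_DUPFD with a minimum of 3 (so the copy never lands on 0, 1 or 2) is an assumption of the sketch rather than something the original code necessarily does.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: take a backup duplicate of fd while it is still valid.
 * F_DUPFD with a minimum of 3 keeps the copy off fds 0, 1 and 2.
 * Returns the backup descriptor, or -1 on error. */
int backup_fd(int fd)
{
    return fcntl(fd, F_DUPFD, 3);
}

/* Hypothetical helper: point target back at the file that backup refers to.
 * dup2() closes target first if it is still open.
 * Returns 0 on success, -1 on error. */
int restore_fd(int backup, int target)
{
    return (dup2(backup, target) == target) ? 0 : -1;
}
```

The idea is to call backup_fd(STDIN_FILENO) once while STDIN is known to be good, and then, when spawn() fails with errno set to EBADF, call restore_fd() with the saved descriptor and STDIN_FILENO before retrying the spawn().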

However, I'm trying to determine how STDIN is getting closed or messed up in the first place.

Is there any way to tell if it's getting closed or if the resmgr is messed up?
If the resmgr is messed up, can it be restarted?
Re: spawn() fails on QNX 6.5.0  
That sounds like you're closing it. If the resmgr was screwing up then you'd not be likely to be able to recover it.