Edward Llewellyn
|
spawn() fails on QNX 6.5.0
|
Edward Llewellyn
01/25/2022 4:36 PM
post121687
|
spawn() fails on QNX 6.5.0
I have a multi-threaded program that is failing on the spawn() call.
I get EBADF, but when I log the parameters to spawn() none of them look wrong.
This is happening after my program goes through a process of:
1. Shut down a process that talks to the CAN driver.
2. Spawn a test mode program, which interfaces to the CAN driver in a different way.
3. Shut down the test mode program (with kill process).
4. Restart the program that talks to the CAN_ResMgr driver.
After several cycles of the above, the device manager that's doing the spawn() gets EBADF from the spawn() call (either
one) and after that spawn() will not work for the device manager until the device manager is killed and restarted.
Again, all the parameters to the spawn() call look fine both before and after the call.
|
Edward Llewellyn
01/25/2022 4:44 PM
post121688
|
Re: spawn() fails on QNX 6.5.0
Additional information:
Rhp_char_t ** const ExecutablePath = argv;
struct inheritance inherit;
memset(&inherit, 0, sizeof(inherit));
inherit.flags = SPAWN_NOZOMBIE;   /* do not leave a zombie when the child exits */
/* The child gets only stdin, stdout and stderr from the parent. */
Rhp_int32_t fd_map[] = { STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO };
childProcessID = spawnp(ExecutablePath[0], 3, fd_map, &inherit, ExecutablePath, NULL);
This is the code, without the logging.
|
Ian Larson
01/25/2022 5:33 PM
post121689
|
Re: spawn() fails on QNX 6.5.0
There's a fairly well-understood race condition when doing things like fork/spawn in a multithreaded process. During the
process creation phase, the process manager has to iterate over the file descriptors in the parent process that need to
be duplicated in the child, and then send an _IO_DUP message to the resource manager for each one.
The process of opening a file descriptor involves:
1. Open a connection to the resource manager
2. Send a _IO_CONNECT message with information about the file you're interested in
3. The resource manager creates an OCB for the client process
And the process of closing a file descriptor is:
1. Send an _IO_CLOSE message on the coid
2. The resource manager cleans up the OCB associated with the client's file
3. Detach the connection from the resource manager.
If the process manager tries to duplicate a file descriptor that is partially opened or partially closed in the
parent (the connection exists but the resource manager doesn't have information about it), the spawn/fork will fail with
EBADF.
It sounds like you may be hitting this race condition, or may be ending up with a file descriptor that is partially
open.
|
Edward Llewellyn
01/25/2022 5:39 PM
post121690
|
Re: spawn() fails on QNX 6.5.0
I think I must be ending up with a partially opened FD. How do I fix it without restarting my device manager program?
Restarting that program isn't a good option, because to do that it would have to shut down all of our other applications
(we have about 20-30 running) and then restart everything again.
I'm passing in STDIN, STDOUT and STDERR, not other files the parent has opened.
I did try spawnv() as well as spawnp() and spawn().
spawnv() fails badly because apparently the FDs aren't able to be duplicated.
|
Edward Llewellyn
01/25/2022 5:41 PM
post121691
|
Re: spawn() fails on QNX 6.5.0
So, would putting multi-second sleep() calls between the spawn() tries help? That might be an option.
Thanks for replying quickly, BTW.
|
Ian Larson
01/25/2022 5:52 PM
post121692
|
Re: spawn() fails on QNX 6.5.0
Are you opening/closing fds 0,1,2 in any of the other threads? If those files are open and valid it may be some other
issue.
|
Edward Llewellyn
01/27/2022 11:49 AM
post121693
|
Re: spawn() fails on QNX 6.5.0
I found code that was calling printf(). This code is useless as the program is making itself a daemon with
procmgr_daemon().
Are you saying we shouldn't output data to fds 0, 1, or 2?
|
Edward Llewellyn
01/27/2022 12:08 PM
post121694
|
Re: spawn() fails on QNX 6.5.0
BTW, I did try putting delays in my code, and that didn't help, so I think it's not a timing issue.
|
Ian Larson
01/27/2022 12:29 PM
post121695
|
Re: spawn() fails on QNX 6.5.0
One of the things `procmgr_daemon` does is attempt to close all file descriptors, open /dev/null, and dup it to 0, 1,
and 2. This may or may not be relevant.
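As a rough illustration of the redirect described above (this is not the actual `procmgr_daemon` implementation), pointing a descriptor at /dev/null with plain POSIX calls might look like the sketch below. `point_fd_at_devnull()` is a hypothetical helper name.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: make `fd` refer to /dev/null.
 * Returns 0 on success, -1 on failure (errno is set by open() or dup2()). */
int point_fd_at_devnull(int fd)
{
    int null_fd = open("/dev/null", O_RDWR);
    int rc;

    if (null_fd == -1) {
        return -1;
    }
    if (null_fd == fd) {
        /* fd was closed and open() reused its number, so nothing more to do. */
        return 0;
    }
    /* dup2() silently closes whatever `fd` referred to before. */
    rc = dup2(null_fd, fd);
    close(null_fd);
    return (rc == fd) ? 0 : -1;
}
```

If stdout has been redirected this way, a later printf() does not fail; its output is simply discarded.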
One thing you could try is: once the system is in the state where spawnp fails with EBADF, you could try using
ConnectServerInfo(0, ...) to scan the process's fds and check the flags for `_NTO_COF_DEAD`, and/or iofdinfo to see if
the connections are still valid.
ConnectServerInfo can be used in a loop to scan for file descriptors, e.g.:
struct _server_info sinfo;
int next_fd = 5;
for (;;) {
    /* Returns next_fd if it is valid, otherwise the next valid coid. */
    int fd = ConnectServerInfo(0, next_fd, &sinfo);
    /* Stop on error or once the scan reaches side-channel connections. */
    if (fd == -1 || (fd & _NTO_SIDE_CHANNEL)) break;
    print_info_about_connection();   /* placeholder, e.g. log fd and sinfo.flags */
    next_fd = fd + 1;
}
If there is a connection with coid 5, ConnectServerInfo will return 5 (and sinfo will have information about 5). If 5 is
not valid, it will return the next valid coid (6, 7, 8, ...).
You can stop looping when `(fd & _NTO_SIDE_CHANNEL) == _NTO_SIDE_CHANNEL`.
|
Edward Llewellyn
01/27/2022 2:42 PM
post121696
|
Re: spawn() fails on QNX 6.5.0
I added your suggested code into my program, but it's not finding any FDs with _NTO_COF_DEAD in the flags.
The flags are all either 0 or 1 (_NTO_COF_CLOEXEC).
Anything else I should look for?
|
Ian Larson
01/27/2022 4:08 PM
post121697
|
Re: spawn() fails on QNX 6.5.0
Once spawnp starts returning EBADF do you know that FDs 0,1,2 are all still valid? If you have traces, do you see the
resource manager receiving the _IO_DUP messages during the spawn process?
|
Edward Llewellyn
01/27/2022 4:24 PM
post121698
|
Re: spawn() fails on QNX 6.5.0
I'm not running my program in the debugger, if that's what you're asking about.
It's rather difficult for me to do so, as the lab test stand has its own PC that connects directly to our QNX device.
This lab PC doesn't have Momentics on it.
Is there another way to get this information you talk about?
Also, how can I tell if fds 0, 1 and 2 are valid or not?
What are their coids? 0, 1 and 2?
|
Roger Maclean
01/27/2022 4:28 PM
post121699
|
Re: spawn() fails on QNX 6.5.0
pidin fds
should indicate if the fds are sane.
|
Ian Larson
01/27/2022 4:28 PM
post121700
|
Re: spawn() fails on QNX 6.5.0
Yes, for fds there is no translation, so they'll just be 0, 1, 2. I mentioned in a previous comment that you could try
calling `iofdinfo`, though `fstat` may be easier.
If the spawnp is failing, I would expect that one of them is no longer associated with an open file in the server, but I
don't know what would cause it to become invalid.
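A minimal sketch of the `fstat` check suggested above, assuming only POSIX calls. `fd_status()` and `count_bad_std_fds()` are hypothetical helper names, and the log stream is assumed to be a file the daemon opened itself, since stderr may be one of the broken descriptors.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if fstat() accepts `fd`, otherwise the
 * errno value that fstat() reported (EBADF for a closed descriptor). */
int fd_status(int fd)
{
    struct stat st;

    if (fstat(fd, &st) == -1) {
        return errno;
    }
    return 0;
}

/* Hypothetical helper: checks stdin, stdout and stderr, writes one line to
 * `log` for each descriptor that fstat() rejects, and returns how many
 * were rejected (0 to 3). */
int count_bad_std_fds(FILE *log)
{
    int bad = 0;
    int fd;

    for (fd = STDIN_FILENO; fd <= STDERR_FILENO; fd++) {
        int err = fd_status(fd);
        if (err != 0) {
            fprintf(log, "fd %d is not usable: %s\n", fd, strerror(err));
            bad++;
        }
    }
    return bad;
}
```

Calling `count_bad_std_fds()` right before each spawnp() and again after an EBADF failure would show which of the three descriptors went bad, and roughly when.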
|
Edward Llewellyn
01/27/2022 4:52 PM
post121701
|
Re: spawn() fails on QNX 6.5.0
I checked pidin fds for my process.
Before the issue happens, I see fds as:
0 1 rw 0 /dev/devh-usb.so
1 1
2 1
After the spawn() issue happens:
1 1
2 1
It looks like FD 0 gets closed somehow, if I'm reading this right.
|
Roger Maclean
01/27/2022 5:12 PM
post121702
|
Re: spawn() fails on QNX 6.5.0
Yes, it could be closed. The other possibility would be that something has gone wrong on the resmgr side. The way 'pidin
fds' works is that pidin dups the process's fds and calls iofdinfo on the result to get the path. So if something is bad
on the resmgr side, it might also make it vanish. Though it getting closed is perhaps more likely.
|
Edward Llewellyn
01/28/2022 4:13 PM
post121703
|
Re: spawn() fails on QNX 6.5.0
Well, good news I think.
I was able to dup() the STDIN descriptor before it went bad and then restore it so that spawn() could function after the
EBADF error. So, I restore it with dup2() and call spawn() again and that works.
However, I'm trying to determine how STDIN is getting closed or messed up in the first place.
Is there any way to tell if it's getting closed or if the resmgr is messed up?
If the resmgr is messed up, can it be restarted?
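The dup()/dup2() workaround described above can be sketched roughly as follows, using only POSIX calls. `save_fd()` and `restore_fd_if_bad()` are hypothetical helper names, and this is an illustration rather than the code from the device manager.

```c
#include <errno.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: keep a spare copy of `fd` while it is still good.
 * Returns the backup descriptor, or -1 on failure. */
int save_fd(int fd)
{
    return dup(fd);
}

/* Hypothetical helper: if `fd` has become invalid, re-create it from
 * `backup_fd` with dup2(). Returns 0 if `fd` is usable on return,
 * -1 otherwise (errno is set by fstat() or dup2()). */
int restore_fd_if_bad(int fd, int backup_fd)
{
    struct stat st;

    if (fstat(fd, &st) == 0) {
        return 0;               /* still valid, leave it alone */
    }
    if (errno != EBADF) {
        return -1;              /* some other failure, do not guess */
    }
    return (dup2(backup_fd, fd) == fd) ? 0 : -1;
}
```

The idea would be to call `save_fd(STDIN_FILENO)` once at startup, then call `restore_fd_if_bad(STDIN_FILENO, backup)` when spawnp() fails with EBADF and retry the spawn. This only recovers from the symptom; it does not explain what closed the descriptor.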
|
Roger Maclean
01/31/2022 8:09 AM
post121704
|
Re: spawn() fails on QNX 6.5.0
That sounds like you're closing it. If the resmgr was screwing up, then you likely wouldn't be able to recover it.
|