foundry27 : Post

Forum Topic - pipe() call returning errno==EBADF, undocumented: (13 Items)


Jason Fordham(deleted)

01/18/2018 4:56 PM

post118404

pipe() call returning errno==EBADF, undocumented

Hi,

I work for a company that has a Developer Partner seat, but I'm not the nominated user. We're not sure what kind of support we can get, but we do have an issue.

We are porting one of our products to QNX, and in the implementation of a function which forks and execs a child process, there is this little block of code:

    if ((fds[0] == -1 && pipe(in)  < 0) ||
        (fds[1] == -1 && pipe(out) < 0) ||
        (fds[2] == -1 && pipe(err) < 0) ||
        (pipe(childenv) < 0) ||
        (pipe(fail) < 0)) {
        throwIOException(env, errno, "Bad file descriptor");
        goto Catch;
    }

Now, sometimes this gets an errno of 9, or EBADF. This is not documented at http://www.qnx.com/developers/docs/7.0.0/index.html#com.qnx.doc.neutrino.lib_ref/topic/p/pipe.html, or, for that matter, at https://www.unix.com/man-page/posix/3P/pipe/, so we are puzzled and would like to know why it's happening.

Some more context: the demonstration uses two threads, each of which spawns "cat /etc/profile". Both threads succeed for a variable number of iterations, but eventually one fails with this EBADF error from pipe().
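A standalone repro along the lines described above might look like the following. This is only a sketch, not the OpenJDK code: it uses plain fork() and execv() rather than vfork(), assumes cat lives at /bin/cat, and `run_stress`, `stress_thread` and `struct stress_arg` are hypothetical names. Each of the two threads loops over pipe(), fork() and an exec of `cat <file>`, drains the output, and counts any pipe() or fork() failure, tallying separately those with errno == EBADF.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Per-thread settings and counters (hypothetical names). */
struct stress_arg {
    const char *file;   /* file passed to cat, e.g. "/etc/profile" */
    int iterations;     /* fork/exec rounds this thread runs */
    int failures;       /* pipe() or fork() calls that failed */
    int ebadf;          /* subset of failures whose errno was EBADF */
};

static void record_failure(struct stress_arg *a, int err)
{
    a->failures++;
    if (err == EBADF)
        a->ebadf++;
}

static void *stress_thread(void *p)
{
    struct stress_arg *a = p;
    /* argv is built before fork() so the child only has to exec. */
    char *const argv[] = { "cat", (char *)a->file, NULL };
    for (int i = 0; i < a->iterations; i++) {
        int out[2];
        if (pipe(out) < 0) {
            record_failure(a, errno);
            continue;
        }
        pid_t pid = fork();
        if (pid < 0) {
            record_failure(a, errno);
            close(out[0]);
            close(out[1]);
            continue;
        }
        if (pid == 0) {
            /* Child: route stdout into the pipe and become cat. */
            dup2(out[1], STDOUT_FILENO);
            close(out[0]);
            close(out[1]);
            execv("/bin/cat", argv);
            _exit(127);
        }
        /* Parent: drain the child's output, then reap it. */
        close(out[1]);
        char buf[4096];
        while (read(out[0], buf, sizeof buf) > 0)
            ;
        close(out[0]);
        waitpid(pid, NULL, 0);
    }
    return NULL;
}

/* Runs two such threads at once. Returns the total failure count (or -1 if
 * a thread could not be started) and stores the EBADF count in *ebadf_out. */
int run_stress(const char *file, int iterations, int *ebadf_out)
{
    pthread_t t[2];
    struct stress_arg args[2];
    int started = 0;
    for (int i = 0; i < 2; i++) {
        args[i] = (struct stress_arg){ file, iterations, 0, 0 };
        if (pthread_create(&t[i], NULL, stress_thread, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    if (started < 2)
        return -1;
    if (ebadf_out)
        *ebadf_out = args[0].ebadf + args[1].ebadf;
    return args[0].failures + args[1].failures;
}
```

For example, `run_stress("/etc/profile", 100000, &n)` would run 100,000 rounds per thread and leave in `n` how many of the failures were EBADF.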

Albrecht Uhlmann

01/19/2018 8:47 AM

post118407

Re: pipe() call returning errno==EBADF, undocumented

One possible cause could be this:
QNX uses a dedicated resource manager, also named "pipe", to implement POSIX pipes. The C function pipe() communicates with that resource manager using an internal file descriptor. If that descriptor gets closed, for whatever reason, you would see this error code.
Possible reasons for this internal fd being closed are some issue with fork/exec, or the (unlikely) case of the pipe process being terminated. When you see this error, could you check whether the "pipe" process is still running?

If it is still running, I would try to break the complex condition into individual calls and see which one is failing.
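Breaking the condition apart could look something like this minimal sketch. `make_pipes` is a hypothetical helper, not part of OpenJDK or QNX: it creates the pipes one at a time, reports which one failed together with its errno, and closes any pipes it had already created. A NULL slot stands in for the `fds[k] == -1` guards in the original snippet.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create each requested pipe in turn so that a failure
 * names the call that failed, rather than one combined condition.
 * A NULL entry in pipes[] is skipped. Returns 0 on success, or -1 with errno
 * set by the failing pipe() call; pipes created before the failure are closed. */
int make_pipes(int *pipes[], const char *const names[], int count)
{
    for (int i = 0; i < count; i++) {
        if (pipes[i] == NULL)
            continue;
        if (pipe(pipes[i]) < 0) {
            int err = errno;
            fprintf(stderr, "pipe(%s) failed: errno=%d (%s)\n",
                    names[i], err, strerror(err));
            /* Undo the pipes that were created before this one. */
            for (int j = 0; j < i; j++) {
                if (pipes[j] != NULL) {
                    close(pipes[j][0]);
                    close(pipes[j][1]);
                }
            }
            errno = err;        /* close() may have clobbered errno */
            return -1;
        }
    }
    return 0;
}
```

In the original function, the five pipes (`in`, `out`, `err`, `childenv`, `fail`) would be passed as one array, with NULL in place of `in`, `out` or `err` whenever the corresponding `fds[k]` is not -1.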

Regards,
Albrecht

Jason Fordham(deleted)

01/19/2018 1:07 PM

post118411

Re: pipe() call returning errno==EBADF, undocumented

Hi Albrecht,

We'll give the breakout a try. Are you suggesting that the pipe manager might have crashed? 

Kind regards,
Jason

Albrecht Uhlmann

01/24/2018 5:02 PM

post118456

Re: pipe() call returning errno==EBADF, undocumented

Hi Jason,
I'm only suggesting you test it, to confirm it or rule it out; nothing more. Having read the other comments, though, I believe this is not the case and that the problem lies more in the client C library, where something goes wrong.

Personally, I stay away from combining multithreaded programs with fork() if I can help it.

You should also verify whether the issue really goes away when using the QNX spawn() call.

Regards,
Albrecht

Elad Lahav

01/19/2018 12:27 PM

post118410

Re: pipe() call returning errno==EBADF, undocumented

A multi-threaded fork() is open to all kinds of issues, and until
recently was not permitted at all. That said, I don't see why it would
be a pipe() call that ends up returning EBADF.
In which context does the code you provided run? The parent or the
child?

--Elad


Jason Fordham(deleted)

01/19/2018 1:19 PM

post118412

Re: pipe() call returning errno==EBADF, undocumented

Hi Elad,

> A multi-threaded fork() is open to all kinds of issues, and until
> recently was not permitted at all. That said, I don't see why it would
> be a pipe() call that ends up returning EBADF.

Do you happen to know how recently this changed? Is it possible that it hasn't been fully exercised in the field, and this is a bug?

> In which context does the code you provided run? The parent or the
> child?

Each thread runs the exec in a loop, with the same code in each thread. The child process is just cat, so it's not doing much beyond writing the content of the file passed as its argument to stdout.

Kind regards,
Jason

Roger Maclean

01/19/2018 1:56 PM

post118413

Re: pipe() call returning errno==EBADF, undocumented

My guess is that this represents a race condition resulting from
simultaneous forking and opening of new pipes. A pipe/open call is not
atomic and involves creating a connection (i.e. calling ConnectAttach) and
sending messages to the remote side to complete the operation. I suspect
that as soon as the connection is made, any fork will attempt to dup the
connection, but until the rest of the operation completes, the dup will fail.

If this is the case (and even if not), you would do better to use
posix_spawn instead of fork/exec. Besides being faster (due to fewer
processes being created), it will bypass the problem, since fds are marked
close-on-exec until the connection is complete.

Alternatively, you may want a mutex around some of this, if only to allow
some control over which fds are given out to other processes.
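One way to read the mutex suggestion is to serialize pipe creation and fork() on a single process-wide lock, so that no thread can fork while another is part-way through creating a pipe. The sketch below assumes every pipe() and fork() in the process goes through this hypothetical `pipe_and_fork` helper; other calls that open descriptors (open(), socket(), and so on) would need the same lock to close the window completely.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* Process-wide lock (hypothetical name) taken by every thread before it
 * creates a pipe or forks, so fork() never runs while another thread is
 * still in the middle of pipe(). */
static pthread_mutex_t fd_fork_lock = PTHREAD_MUTEX_INITIALIZER;

/* Creates a pipe and forks under fd_fork_lock. Returns 0 in the child, the
 * child's pid in the parent, or -1 with errno set (no pipe is left open).
 * The parent releases the lock before returning; the child inherits a
 * locked copy and must therefore only exec() or _exit(). */
pid_t pipe_and_fork(int fds[2])
{
    pthread_mutex_lock(&fd_fork_lock);
    if (pipe(fds) < 0) {
        int err = errno;
        pthread_mutex_unlock(&fd_fork_lock);
        errno = err;
        return -1;
    }
    pid_t pid = fork();
    if (pid == 0)
        return 0;               /* child: leave the lock alone */
    int err = errno;
    pthread_mutex_unlock(&fd_fork_lock);
    if (pid < 0) {              /* fork failed: don't leak the pipe */
        close(fds[0]);
        close(fds[1]);
    }
    errno = err;
    return pid;
}
```

This trades some concurrency for predictability: only one thread at a time can be creating a pipe or forking.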


Elad Lahav

01/19/2018 2:25 PM

post118414

Re: pipe() call returning errno==EBADF, undocumented

posix_spawn() will have the same problem. Unfortunately, we have to
follow the POSIX semantics, which say that all file descriptors must
first be duplicated by the child and only then are those marked as
close-on-exec closed.
The QNX spawn() call does not follow these semantics and is therefore
not prone to this issue.

--Elad


Albrecht Uhlmann

01/24/2018 5:06 PM

post118458

Re: pipe() call returning errno==EBADF, undocumented

Hi Elad,

"Unfortunately, we have to
follow the POSIX semantics which say that all file descriptors must
first be duplicated by the child and only then those marked as
close-on-exec are closed."

--> does POSIX give a reason for requiring this? I can't see an obvious reason...
Regards,
-Albrecht

Elad Lahav

01/24/2018 5:21 PM

post118460

Re: pipe() call returning errno==EBADF, undocumented

It's so you can run the file actions on the child's view of file
descriptors. For example, you can have a file action that duplicates
fd 2 to fd 7, and then another that duplicates fd 7 to fd 19. The
latter needs to use the child's view of fd 7, not the parent's.

Please don't shoot the messenger... Personally, I don't think any of
this is very useful; it's just how the standard defines posix_spawn().
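The 2 -> 7 -> 19 example can be written out with the standard posix_spawn file-action API. In this sketch (the function name is hypothetical, and it assumes a `true` utility on the PATH and an open fd 2), the second action duplicates fd 7 to fd 19 even though the parent never opened fd 7: it is resolved against the child's descriptor table, where the first action has just created it. Per the explanation above, that is why the child must hold its copies of the descriptors before the file actions run.

```c
#define _POSIX_C_SOURCE 200809L
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Spawns "true" with two chained dup2 file actions, 2 -> 7 and then 7 -> 19.
 * The second action only makes sense against the child's descriptor table,
 * where fd 7 was created by the first. Returns the child's exit status, or
 * -1 on error. */
int spawn_with_chained_dups(void)
{
    posix_spawn_file_actions_t fa;
    if (posix_spawn_file_actions_init(&fa) != 0)
        return -1;
    posix_spawn_file_actions_adddup2(&fa, 2, 7);   /* child fd 7 := stderr */
    posix_spawn_file_actions_adddup2(&fa, 7, 19);  /* uses the child's fd 7 */

    char *argv[] = { "true", NULL };
    pid_t pid;
    int rc = posix_spawnp(&pid, "true", &fa, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&fa);
    if (rc != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```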

--Elad


Jason Fordham(deleted)

01/24/2018 6:23 PM

post118462

Re: pipe() call returning errno==EBADF, undocumented

Hi Roger,

> My guess is that this represents a race condition resulting from
> simultaneous forking and opening new pipes.  A pipe/open call is not
> atomic and involves creating a connection (i.e. calling ConnectAttach) and
> sending messages to the remote side to complete the operation.  I suspect
> that as soon as the connection is made, any fork will attempt to dup the
> connection but until the remaining process completes the dup will fail.

This is an interesting theory: I hope what I am about to say doesn't invalidate it. We certainly think there is a race condition. We now have a native repro which uses vfork, and we'd like to share it with you.

What I had thought was an issue with pipe() turns out to be an issue with vfork failing: there was a remapping of the exception detail in the throwIOException function, which I didn't expect and didn't check. So that was misleading to everyone; I'm sorry about that. We've now correctly located the errno=9 as coming from the vfork.

> If this is the case (and even if not), you would do better to use
> posix_spawn instead of fork/exec.  Besides being faster (due to fewer
> processes being created), it will bypass the problem since fds are marked
> close on exec until the connection is complete.

What we are working on is a port of OpenJDK, and the code we are concerned with is the native implementation of java.lang.UNIXProcess.forkAndExec(). The source comments explain why Linux uses vfork while other platforms use posix_spawn: vfork is clearly preferred, because posix_spawn needs a helper process, which adds to the number of processes.

The complexity of switching to the posix_spawn implementation means we do not have a standalone repro for it; however, we have shown (in a crude way, using a hardcoded child path) that posix_spawn has the same issues as vfork. We should have a cleaner build tomorrow.

One of the remarks in the commentary is that on Linux, many of the paths through posix_spawn use vfork, because that's how glibc implements it. Is the QNX implementation of posix_spawn based on the glibc implementation? If so, that might be an alternative explanation of why we see the same behavior when using posix_spawn.

> Or perhaps you want to have a mutex around some of this if only to allow
> some control over what fds are given out to other processes.

Would you like to look at the standalone repro? 

Kind regards,
Jason

Roger Maclean

01/25/2018 9:47 AM

post118466

Re: pipe() call returning errno==EBADF, undocumented

Everything you say is consistent with the problem being as I suggested.

I'm not actually sure what vfork is underneath on QNX, though my guess is
it'll be very close to a regular fork and will still attempt a dup of all
fds, which will exhibit this issue. vfork is a bit of a hack that was
introduced in early UNIXes to avoid the high cost of replicating the
address space of a process when 99% of the time the child exists just
long enough to execute the exec call. We support it to allow programs
ported from elsewhere to work, but it is not the best way to do things on
QNX.

posix_spawn is specified such that it can be implemented in a library as a
traditional fork/exec or vfork/exec, though on some systems, including QNX,
it's not. On QNX, it is more efficient to use posix_spawn or one of the
spawn* functions, as these result in fewer processes being created than
the fork/exec route. We still support fork and exec, both because
sometimes that is just what you want and to support programs that are
ported to QNX, but they're not optimal if what you're doing can be handled
by the various spawn functions.

If I've interpreted all the code correctly, you won't have this issue if
you use the various spawn calls (not posix_spawn), since fds are always
marked close-on-exec until they are fully open, and spawn doesn't try to
dup anything marked close-on-exec.

As Elad says, posix_spawn, at least currently, has to dup all fds
initially, irrespective of the close-on-exec flag, so it will be subject
to this race condition. You can get around it by retrying if you get an
EBADF, which will most likely be due to this issue (though you obviously
don't want to keep retrying forever).

Similarly, if you're using fork/exec, you could retry it, though that
won't be as good as the other options.
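A bounded retry on EBADF, as suggested above, might be sketched as follows. `spawn_retry_ebadf` is a hypothetical wrapper, and the 1 ms pause between attempts is an arbitrary choice rather than a tuned value. Note that posix_spawn() and posix_spawnp() report failure through their return value, not through errno.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <time.h>

extern char **environ;

/* Hypothetical wrapper: call posix_spawnp() up to max_attempts times,
 * retrying only when it fails with EBADF. Returns 0 or the last error
 * number (EINVAL if max_attempts < 1); *attempts receives the try count. */
int spawn_retry_ebadf(pid_t *pid, const char *file,
                      const posix_spawn_file_actions_t *fa,
                      char *const argv[], int max_attempts, int *attempts)
{
    int rc = EINVAL;
    int n;
    for (n = 1; n <= max_attempts; n++) {
        rc = posix_spawnp(pid, file, fa, NULL, argv, environ);
        /* Stop on success, on any other error, or when out of tries. */
        if (rc != EBADF || n == max_attempts)
            break;
        /* Short pause so the thread opening the fd can finish (1 ms). */
        struct timespec pause = { 0, 1000000L };
        nanosleep(&pause, NULL);
    }
    if (attempts)
        *attempts = (max_attempts < 1) ? 0 : n;
    return rc;
}
```

A caller might use `spawn_retry_ebadf(&pid, "cat", NULL, argv, 3, &tries)` and treat an EBADF that survives all attempts as a genuine failure.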


Elad Lahav

01/25/2018 9:56 AM

post118467

Re: pipe() call returning errno==EBADF, undocumented

I recommend that you use spawn(), which doesn't have to follow POSIX
semantics and thus will not duplicate any fds marked with O_CLOEXEC.
vfork() on QNX is considered deprecated (and really should not have
been used in the last 20 years or so).
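The spawn() route could look roughly like this sketch, which runs a program with its stdout connected to a pipe. The QNX branch assumes the spawn() prototype and fd_map behavior described in the QNX C library reference (fd_map[i] becomes fd i in the child, and with a non-zero fd_count only the mapped descriptors are inherited) and has not been tested here; on other systems the sketch falls back to posix_spawn() so it still builds and runs. `spawn_capture_stdout` is a hypothetical name.

```c
#include <fcntl.h>
#include <spawn.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run `path` with its stdout connected to a pipe, read
 * up to len - 1 bytes of its output into buf (NUL-terminated), and reap it.
 * Returns the number of bytes read, or -1 on error. */
ssize_t spawn_capture_stdout(const char *path, char *const argv[],
                             char *buf, size_t len)
{
    int p[2];
    if (len == 0 || pipe(p) < 0)
        return -1;
    /* The child never needs the read end, so mark it close-on-exec. */
    fcntl(p[0], F_SETFD, FD_CLOEXEC);

    pid_t pid;
    int ok;
#ifdef __QNXNTO__
    /* QNX spawn() (assumed prototype): child fds 0, 1, 2 come from fd_map. */
    int fd_map[] = { STDIN_FILENO, p[1], STDERR_FILENO };
    struct inheritance inherit;
    memset(&inherit, 0, sizeof inherit);
    pid = spawn(path, 3, fd_map, &inherit, argv, environ);
    ok = (pid != -1);
#else
    /* Elsewhere: fall back to posix_spawn() with a dup2 file action. */
    posix_spawn_file_actions_t fa;
    posix_spawn_file_actions_init(&fa);
    posix_spawn_file_actions_adddup2(&fa, p[1], STDOUT_FILENO);
    ok = (posix_spawn(&pid, path, &fa, NULL, argv, environ) == 0);
    posix_spawn_file_actions_destroy(&fa);
#endif
    close(p[1]);                /* parent keeps only the read end */
    if (!ok) {
        close(p[0]);
        return -1;
    }
    size_t total = 0;
    ssize_t n;
    while (total < len - 1 &&
           (n = read(p[0], buf + total, len - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';
    close(p[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

Unlike posix_spawn(), spawn() takes a path rather than searching PATH, so the program is given by its full path (for example "/bin/cat").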

--Elad

