Mario Charest
11/18/2008 12:03 AM
post16714
|
6.3.2 Custom image.
10 computers, all with different hostnames but the same domain name.
The power sequence powers them on one by one with about half a second delay ( in case that matters ). Once started,
the main machine launches software onto the other machines via on -f hostname ...
Normally, if I do ls /net on a machine it shows all the machines, but only the hostname part, since the domain part is the
same. Once in a while that doesn't work: ls /net shows the complete FQN, and obviously on -f hostname won't work
because it doesn't seem to know about /net/hostname alone but needs /net/hostname.domain.
Why would that be?
|
|
|
Andrew Boyd(deleted)
11/18/2008 10:29 AM
post16778
|
Hiya Mario. The only time that you will see hostname.domain
in /net instead of simply hostname is when the domain is
different than yours.
I might suggest setting your domain as early as possible
in your boot sequence, eg:
# setconf _CS_DOMAIN <my_domain>
to override the default (I think it's net.intra) domain.
If you are seeing different domains (which are they?)
I might surmise that qnet is going online faster than
tcp/ip can set the hostname and domain, which might
explain the weird stuff. After a few seconds, qnet
should notice that the hostname and domain have changed,
and will re-start the discovery of its own name and
domain (to avoid duplicates).
The earlier you can get the hostname and domain, the
less likely you will encounter these kinds of initialization
flaps.
--
aboyd
|
|
|
Mario Charest
11/18/2008 11:02 AM
post16789
|
> Hiya Mario. The only time that you will see hostname.domain
> in /net instead of simply hostname is when the domain is
> different than yours.
Well, they are all the same.
> I might suggest setting your domain as early as possible
> in your boot sequence, eg:
> # setconf _CS_DOMAIN <my_domain>
> to override the default (I think it's net.intra) domain.
The boot image sets it to our own default ( company name ), then later on in the sysinit it's changed to the customer
name.
> If you are seeing different domains (which are they?)
Currently grade_expert.dg_stcom
> I might surmise that qnet is going online faster than
> tcp/ip can set the hostname and domain, which might
> explain the weird stuff. After a few seconds, qnet
> should notice that the hostname and domain have changed,
> and will re-start the discovery of its own name and
> domain (to avoid duplicates).
> The earlier you can get the hostname and domain, the
> less likely you will encounter these kinds of initialization
> flaps.
Will see what I can do about that.
> --
> aboyd
|
|
|
Andrew Boyd(deleted)
11/18/2008 11:13 AM
post16796
|
> The boot image sets it to our own default
> ( company name ), then later on in the sysinit
> it's changed to the customer name.
Ah ha. Not sure of the sequence of your sysinit
but if qnet goes online with the default (company
name) domain, it may start to populate /net with
other nodes which have the customer domain, which
is different from its own at the time.
And later, when the hostname/domain are changed
(eg TCP/IP initialization) this DOES NOT cause qnet
to delete all of the entries in /net - merely its
own.
> > The earlier you can get the hostname and domain,
> > the less likely you will encounter these kinds of
> > initialization flaps.
>
> Will see what I can do about that.
Good. Ideally, the hostname and domain should be
correctly set to their (final) values before qnet
goes online. Qnet doesn't actually care if you
continually change hostname/domain - it's got the
code in it for those corner cases - but I suspect
it would make things easier for your application to
not be changing them after qnet goes online.
--
aboyd
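The leftover entries described above can be spotted mechanically. The following is only a sketch: the function name stale_fqn_entries and its two arguments are made up for illustration and are not part of qnet. It prints every entry of a directory whose name ends in the given domain, i.e. names that would normally be listed without the domain part.

```shell
# Hypothetical helper (not a qnet tool): list entries that still carry
# the local domain as a suffix.
#   $1 = directory to scan (on a target this would be /net)
#   $2 = domain to look for (e.g. the output of getconf _CS_DOMAIN)
stale_fqn_entries() {
    net_dir=$1
    domain=$2
    for path in "$net_dir"/*; do
        name=${path##*/}            # strip the leading directory
        case $name in
            *".$domain") printf '%s\n' "$name" ;;   # hostname.domain form
        esac
    done
}
```

On a target this might be called as stale_fqn_entries /net "$(getconf _CS_DOMAIN)". An empty result would suggest that /net agrees with the local domain; entries from a genuinely different domain are not reported.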
|
|
|
Mario Charest
11/18/2008 11:20 AM
post16800
|
> > The boot image sets it to our own default
> > ( company name ), then later on in the sysinit
> > it's changed to the customer name.
> Ah ha. Not sure of the sequence of your sysinit
> but if qnet goes online with the default (company
> name) domain, it may start to populate /net with
> other nodes which have the customer domain, which
> is different from its own at the time.
> And later, when the hostname/domain are changed
> (eg TCP/IP initialization) this DOES NOT cause qnet
> to delete all of the entries in /net - merely its
> own.
What is odd is that when I do ls /net, ALL the nodes, even the one I'm on, show the complete FQN.
> > > The earlier you can get the hostname and domain,
> > > the less likely you will encounter these kinds of
> > > initialization flaps.
> >
> > Will see what I can do about that.
> Good. Ideally, the hostname and domain should be
> correctly set to their (final) values before qnet
> goes online.
That is currently not possible: we start qnet in the .boot, so that if the file system is corrupted we still get access
to the system and can attempt recovery. Only later on, in the sysinit, is the domain set to what it needs to be.
> Qnet doesn't actually care if you
> continually change hostname/domain - it's got the
> code in it for those corner cases - but I suspect
> it would make things easier for your application to
> not be changing them after qnet goes online.
> --
> aboyd
|
|
|
Andrew Boyd(deleted)
11/18/2008 11:28 AM
post16803
|
> > Ideally, the hostname and domain should be
> > correctly set to their (final) values before qnet
> > goes online.
>
> That is currently not possible, we start qnet in the .boot,
> that way if the file system is corrupted we still get access
> to the system and can attempt recovery. Only later on in
> the sysinit is the domain set to what it needs to be.
Ok, when you change the domain, also shell this out:
# rmdir /net/*
to try to clean things up. Note that this will have
the effect of tearing down ANY AND ALL node connections
that have been previously established via qnet.
--
aboyd
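A minimal sketch of that sysinit step, assuming a POSIX-style shell. The function name set_domain_and_flush and the SETCONF and NET_DIR variables are hypothetical; the variables exist only so the sketch can be tried away from a target. It uses rmdir, not rm, and it discards the error that rmdir reports for entries it cannot remove.

```shell
# Hypothetical sysinit helper. SETCONF and NET_DIR are assumptions made
# for this sketch; on a target they default to setconf and /net.
SETCONF=${SETCONF:-setconf}
NET_DIR=${NET_DIR:-/net}

set_domain_and_flush() {
    new_domain=$1
    # Change the domain first; give up if that fails.
    "$SETCONF" _CS_DOMAIN "$new_domain" || return 1
    # Ask for every entry under NET_DIR to be removed. Errors for entries
    # that cannot be removed are deliberately ignored.
    # This is rmdir, never rm -rf.
    rmdir "$NET_DIR"/* 2>/dev/null
    return 0
}
```

A call such as set_domain_and_flush customername would go wherever the sysinit currently sets the domain. As noted above, this tears down every established qnet connection, so it belongs before the application starts.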
|
|
|
Mario Charest
11/18/2008 4:49 PM
post16838
|
rmdir /net/* doesn't do any good. I get a resource busy on the name of the local machine.
Now I'm not 100% sure about this, but it seems the problem shows up almost an hour after booting.
The only way to fix the problem once it happens was to change the domain name to something else and then restore it to
what it was originally.
|
|
|
Andrew Boyd(deleted)
11/18/2008 5:02 PM
post16840
|
> rmdir /net/* doesn't do any good. I get a resource
> busy on the name of the local machine.
That's to be expected - try it at the command line! You
can ignore the error message. It will still have the
effect of completely cleaning out all the entries in the
NDB and tearing down all existing qnet connections (pretty
drastic).
> Now I'm not 100% sure about this but it seems the problem
> shows up almost an hour after booting.
That's really weird. I thought it was a boot-time problem,
when you were changing the domain.
> The only way to fix the problem once it happens was to
> change the domain name to something else and then restore
> it to what it was originally.
ok, like I said, qnet is coded to handle the hostname
and domain changing continually, but the next time this
happens, can you do the following on both machines
involved:
# hostname
# getconf _CS_DOMAIN
# ls /net
Thanks,
--
aboyd
|
|
|
Mario Charest
11/18/2008 5:11 PM
post16841
|
> > rmdir /net/* doesn't do any good. I get a resource
> > busy on the name of the local machine.
> That's to be expected - try it at the command line!
Command line? That's all I use ;-)
> You
> can ignore the error message. It will still have the
> effect of completely cleaning out all the entries in the
> NDB and tearing down all existing qnet connections (pretty
> drastic).
Yeah, I do an ls /net right after the rmdir and I see it gets emptied.
> > Now I'm not 100% sure about this but it seems the problem
> > shows up almost an hour after booting.
> That's really weird. I thought it was a boot-time problem,
> when you were changing the domain.
So did I, but an hour after a boot I tried to restart our software and they wouldn't start, although I had restarted them
many times. Maybe I wrongly assumed it was a boot-time issue.
> > The only way to fix the problem once it happens was to
> > change the domain name to something else and then restore
> > it to what it was originally.
> ok, like I said, qnet is coded to handle the hostname
> and domain changing continually, but the next time this
> happens, can you do the following on both machines
> involved:
> # hostname
> # getconf _CS_DOMAIN
> # ls /net
Will do!
> Thanks,
Thanks to you!
|
|
|
Mario Charest
11/18/2008 9:18 PM
post16843
|
> ok, like I said, qnet is coded to handle the hostname
> and domain changing continually, but the next time this
> happens, can you do the following on both machines
> involved:
> # hostname
> # getconf _CS_DOMAIN
> # ls /net
Right after a reboot. I changed the sysinit to have the new
This is from machine controller where everything is looking normal.
----------------------------------------------------------------------------
#controller:/> ls /net
cam1_bas cam3_haut controller
cam1_haut cam4_bas grade_analyseur
cam2_bas cam4_haut hpatenaude.comact.domain
cam2_haut cam5_bas simulator
cam3_bas cam5_haut
#controller:/> getconf _CS_DOMAIN
grade_expert.dg_stcome
#controller:/> hostname
controller
#controller:/>
This is from machine cam3_haut:
-------------------------------------------------------
#cam3_haut:/> ls /net
cam1_bas.grade_expert.dg_stcome cam5_bas.grade_expert.dg_stcome
cam1_haut.grade_expert.dg_stcome cam5_haut.grade_expert.dg_stcome
cam2_bas.grade_expert.dg_stcome controller.grade_expert.dg_stcome
cam2_haut.grade_expert.dg_stcome grade_analyseur.grade_expert.dg_stcome
cam3_bas.grade_expert.dg_stcome hpatenaude.comact.domain
cam3_haut.grade_expert.dg_stcome simulator.grade_expert.dg_stcome
cam4_bas.grade_expert.dg_stcome
cam4_haut.grade_expert.dg_stcome
#cam3_haut:/> hostname
cam3_haut
#cam3_haut:/> getconf _CS_DOMAIN
grade_expert.dg_stcome
-----------------------------------------------------------
I did setconf _CS_DOMAIN grade_expert.dg_stcom on cam3_haut and everything came back to normal.
All the machines now have a rmdir /net/* after their setconf.
|
|
|
Mario Charest
11/18/2008 10:40 PM
post16844
|
> ok, like I said, qnet is coded to handle the hostname
> and domain changing continually, but the next time this
> happens, can you do the following on both machines
> involved:
> # hostname
> # getconf _CS_DOMAIN
> # ls /net
After recovering from a disaster ( I had put rm -rf /net/* instead of rmdir /net/* in one of the machines, which,
after it rebooted, deleted everything in its path... ), I managed to reproduce the problem and then typed
setconf _CS_DOMAIN ... (with the right domain name)
It didn't fix the problem. Then I set a dummy name and set the proper domain name again ( by using the up arrow to
reuse the same command I first tried ) and it worked!
|
|
|
Mario Charest
11/19/2008 8:51 AM
post16873
|
I've worked around the problem by doing the following close to the end of the sysinit ( before our software starts ).
setconf _CS_DOMAIN dummy
setconf _CS_DOMAIN customername
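The two commands above can be wrapped so that the placeholder is never left in place by accident. This is only a sketch: bounce_domain and the SETCONF variable are hypothetical names, and the thread does not establish why setting the final domain directly is not enough.

```shell
# Hypothetical wrapper around the workaround. SETCONF is an assumption
# made for this sketch; on a target it defaults to setconf.
SETCONF=${SETCONF:-setconf}

bounce_domain() {
    final_domain=$1
    placeholder=${2:-dummy}
    # The placeholder must differ from the final domain, otherwise the
    # domain never actually changes.
    [ "$placeholder" != "$final_domain" ] || return 1
    # Switch to the placeholder, then to the real domain.
    "$SETCONF" _CS_DOMAIN "$placeholder" || return 1
    "$SETCONF" _CS_DOMAIN "$final_domain"
}
```

In the sysinit this would be called as bounce_domain customername, at the same point as the two setconf lines.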
|
|
|
Andrew Boyd(deleted)
11/19/2008 9:54 AM
post16887
|
> I've worked around the problem by doing the following
> close to the end of the sysinit
>
> setconf _CS_DOMAIN dummy
> setconf _CS_DOMAIN customername
Congratulations on the workaround! I was wondering if
you might have had a blank or unprintable character in
your domain name - which would make it look the same, but
fail the strcmp() - but that's a moot point.
--
aboyd
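The blank-or-unprintable-character theory can be checked with a few lines of shell. The sketch below assumes a tr that understands POSIX character classes; has_hidden_chars is a made-up name. It succeeds when the string contains anything other than visible characters, such as a trailing space or a control character.

```shell
# Hypothetical check: succeed (exit status 0) if the argument contains a
# blank, control, or other non-visible character.
has_hidden_chars() {
    # Delete every visible character and count the bytes that remain.
    count=$(printf '%s' "$1" | LC_ALL=C tr -d '[:graph:]' | wc -c)
    [ $((count)) -gt 0 ]
}
```

It could be run as has_hidden_chars "$(getconf _CS_DOMAIN)" && echo "domain has hidden characters". Note that the shell strips trailing newlines from the command substitution, so a trailing newline would not be seen this way.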
|
|
|
|