24 Sep 07


	Writing Native Network Drivers For io-pkt
	-----------------------------------------

This guide is designed to help you understand and
write "native" network drivers for io-pkt.

Before we get too deep, here are some essential
definitions that you need to understand:

io-pkt: 
	aka io-net3.  Successor to io-net (aka io-net2).
	Based much more closely on BSD4 than io-net, which
	should make porting protocols and drivers much easier.
	Mbufs, not npkts, are the underlying packet buffers.
	The threading model is entirely different, which
	hopefully should show some performance increases
	during rx.

native driver: 
	A network driver which is specifically written for
	io-pkt and takes full advantage of the performance
	increase with respect to avoiding a thread switch
	during rx, and permitting the stack to be multi-threaded.

shim driver:
	An unmodified io-net network driver binary which
	operates under io-pkt with the help of the devnp-shim.so
	"shim" driver layer.  The advantage of this is that
	any pre-existing io-net driver will work under io-pkt.
	The disadvantage is that maximum throughput performance
	will not be obtained (compared to a native driver) because
	of the thread switch during packet rx performed by the shim.

BSD driver:
	A (nearly) unmodified BSD network driver which operates
	under io-pkt.  Examples of these would be devnp-fxp.so
	(BSD speedo driver) and devnp-bge.so (BSD tigon3 BCM 57xx
	driver).  See the "Porting BSD Driver" document for more
	information.  What you need to know is that by default,
	BSD drivers operate, with the aid of an emulation
	library (see io-pkt/sys/dev/lib/*), in single-threaded
	stack mode.  This is because the interrupt and threading
	model of io-pkt differs considerably from BSD.  So, unless
	the BSD driver source has been carefully reviewed and
	modified as required to deal with all possible
	multi-threading issues, whenever there is a BSD driver
	running, the entire stack is single-threaded.


100,000 foot overview
---------------------

Any network driver can be viewed as the "glue" between the
underlying network hardware, and the software infrastructure
of the operating system protocol stack above it.

So, the "bottom half" of the driver is coded specifically
for the particular hardware it supports.  And the "top half"
of the driver is coded specifically for the software 
infrastructure.  

This document deals specifically with the "top half" of the 
io-pkt "native" driver, which deals with the io-pkt software
infrastructure.


	What Does The Driver API To io-pkt Look Like?
	---------------------------------------------

If you look at an existing io-pkt driver, the problem is that
it's going to be cluttered up with all sorts of hardware-specific
junk (eg "bottom half" of driver) which is going to distract you
from understanding the API to io-pkt.

With this in mind, Seanb wrote the completely hardware-independent
"sample" driver, which can be found in the source tree at:

	io-pkt/sys/dev_qnx/sample/sam.c

Any driver can be considered to have the following functional
areas:

	Initialization
	Interrupt Handling
	Receive packet
	Transmit packet
	Periodic Timers
	Out of Band (control)
	Shutdown

Bring up the source to sam.c (above) in another window, and let's
take a look at each functional area.


	Initialization
	--------------

Initialization is probably the trickiest part of an io-pkt driver,
because unlike an io-net driver, part of the initialization code
will be called over and over again by the stack.  So, it must be
coded accordingly.  It is very easy to have a driver bug where
it works at first, but stops working after the stack re-initializes
it.

But I'm getting ahead of myself.  Initialization begins with this:

	struct nw_dll_syms sam_syms[] = {
	        {"iopkt_drvr_entry", &IOPKT_DRVR_ENTRY_SYM(sam)},
	        {NULL, NULL}
	};

This tells the stack to execute the sam_entry() function, which
in turn calls the dev_attach() function for every instance of
the hardware, of which there may be none, one or several.

The dev_attach() function, through pre-processor trickery, gets
a pointer to the following, via the "&sam_ca" parameter:

	CFATTACH_DECL(sam,
	    sizeof(struct sam_dev),
	    NULL,
	    sam_attach,
	    sam_detach,
	    NULL);

So for each instance of the hardware, the sam_attach() function
will be called ONCE and only ONCE.  The sam_attach() function 
basically does two things:  allocate resources (eg required for
the hardware) and hook itself up to the stack.

Looking at sam_attach() we can see it hooking itself up to the
stack, in two main ways.  

One is by setting the callout functions in its ifp struct.  For 
example, when the stack wants to transmit a packet, it calls the 
ifp->if_start function pointer, which has NOTHING to do with 
initialization, btw.  In sam_attach() we see that the ifp->if_start 
function pointer is set to the address of the sam_start() packet 
transmit function.

Second is by setting up for the hardware interrupt by calling
the interrupt_entry_init() stack function, which is passed as
a parameter, a pointer to the sc_inter struct in the per-instance
device structure.

The sc_inter struct contains pointers to the sam_process_interrupt()  
and sam_enable_interrupt() functions, and also the per-instance
device structure pointer (sam).

Note that pthread_create() is NOT called.  This is an important
detail about the threading model of io-pkt drivers: whenever the
driver wishes to execute, it must do so under control of (ie called
by) the stack.  This quite specifically includes asynchronous
events such as hardware interrupts (as discussed above) and also
periodic timers via the callout_msec() stack function.

This completes the part of the driver initialization that is
called ONCE.  Note that the network hardware will NOT function
at this point - no packets will be received (or transmitted)
until someone executes the ifconfig utility, eg:

	ifconfig sam0 10.42.107.238

Now, the stack will call the ifp->if_init function pointer
for the sample driver, which in the attach function was
set to be sam_init().  This is where the hardware would
be enabled, and if the interrupt was not already attached,
the InterruptAttach_r() function is called.

Remember, the ifp->if_init function can and will be called
over and over again by the stack.  This is very different
from an io-net driver, whose initialization executes only
once.

For example, if someone does this:

	ifconfig sam0 mtu 8100

The ifp->if_init function in the driver will be called again
by the stack.  So, it is up to the driver to initialize the 
hardware as specified.

We can clearly see from this example that it would be an
error of the driver to set the MTU in the attach function.
Generally the first thing the init function does is disable
the hardware, because it's going to initialize it all over
again from scratch.

Summary: the attach function is called once, to allocate
resources and to hook up to the stack.  The init function
is called over and over again, to configure and enable the
hardware.
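
To make the attach-once / init-many split concrete, here is a
small stand-alone sketch.  It does NOT use the real io-pkt
headers: struct mock_ifnet, struct lc_dev and the lc_*()
functions are hypothetical stand-ins invented for illustration,
and only the shape (attach sets the function pointers, init
stops the hardware and reprograms it from scratch) is taken
from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the stack's per-interface struct. */
struct mock_ifnet {
	int    if_mtu;                               /* set by the stack (ifconfig) */
	void  *if_softc;                             /* per-instance device struct */
	int  (*if_init)(struct mock_ifnet *);        /* may be called many times */
	void (*if_stop)(struct mock_ifnet *, int);   /* may be called many times */
};

/* Hypothetical per-instance device structure. */
struct lc_dev {
	struct mock_ifnet ifp;
	int attached;      /* resources allocated, hooked up to the stack */
	int hw_enabled;    /* 1 while the pretend hardware is running */
	int hw_mtu;        /* MTU currently programmed into the hardware */
	int init_calls;    /* how many times if_init has run */
};

static void lc_stop(struct mock_ifnet *ifp, int disable)
{
	struct lc_dev *dev = ifp->if_softc;

	(void)disable;
	dev->hw_enabled = 0;
}

/* Safe to call repeatedly: always rebuilds hardware state from scratch. */
static int lc_init(struct mock_ifnet *ifp)
{
	struct lc_dev *dev = ifp->if_softc;

	lc_stop(ifp, 0);              /* disable first, then reconfigure */
	dev->hw_mtu = ifp->if_mtu;    /* pick up whatever the stack asked for */
	dev->hw_enabled = 1;
	dev->init_calls++;
	return 0;
}

/* Called once per hardware instance: hook up, but do not enable. */
static void lc_attach(struct lc_dev *dev)
{
	assert(!dev->attached);
	dev->ifp.if_softc = dev;
	dev->ifp.if_init = lc_init;
	dev->ifp.if_stop = lc_stop;
	dev->ifp.if_mtu = 1500;       /* default only; init reads the live value */
	dev->attached = 1;
}
```

Note that the MTU is read in lc_init(), not latched in
lc_attach(), so a later "ifconfig ... mtu" takes effect on
the next init call.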

It is worth mentioning that if you wish to write a driver
for a PCI nic, there is a little dance you need to go 
through, for vendor and device ID tables and scanning.  
Of course, since sam.c was written to be a
hardware-independent example, it does not have any of
that code in it.  See native PCI driver sources for code
that you can copy.


	Interrupt Handling & Receive Packet
	-----------------------------------

You will note that there are two different sam_isr() functions
provided.  The easiest way is to simply use the kernel    
InterruptMask() function.  A slightly more complicated way 
to handle the interrupt is to write to a hardware register to 
mask the interrupt, which works better if the interrupt is being 
shared with another device, and might be just a little bit faster.

Either way, after the sam_isr() function executes, the stack 
wakes up, and calls the driver's sam_process_interrupt() function 
via the sam->sc_inter.func function pointer.

The sam_process_interrupt() function will do whatever the
hardware requires - perhaps reading count registers, error
handling, etc.  It might or might not service the transmit
side of the hardware (generally not recommended because of 
negative performance impact of enabling the transmit complete 
interrupt).

It will however service the receive side of the hardware - any 
filled received packets are drained from the hardware, new empty 
packets are passed down to the hardware, and the filled received 
packets are passed up to the protocol stack using the ifp->if_input 
function pointer.
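
The flow described in this section can be sketched without any
real hardware or io-pkt headers.  Everything below (struct
rx_nic, the rx_*() functions, the ring size) is a hypothetical
stand-in; the assumption is that the stack calls the process
function after the ISR has masked the interrupt, and calls the
enable function once processing is finished.

```c
#include <assert.h>
#include <stddef.h>

#define RX_RING 4   /* hypothetical ring size */

struct rx_nic {
	int irq_masked;          /* 1 while the interrupt is masked */
	int rx_full[RX_RING];    /* 1 if the slot holds a received packet */
	int rx_next;             /* next slot the driver will examine */
	int delivered;           /* packets handed up to the stack */
};

/* ISR: do the minimum, mask the interrupt and let the stack take over. */
static void rx_isr(struct rx_nic *nic)
{
	nic->irq_masked = 1;
}

/* Called by the stack after the ISR: drain filled slots and refill them. */
static int rx_process_interrupt(struct rx_nic *nic)
{
	assert(nic->irq_masked);
	while (nic->rx_full[nic->rx_next]) {
		nic->delivered++;                  /* stands in for ifp->if_input */
		nic->rx_full[nic->rx_next] = 0;    /* hand an empty buffer back */
		nic->rx_next = (nic->rx_next + 1) % RX_RING;
	}
	return 1;
}

/* Called by the stack when processing is done: unmask the interrupt. */
static int rx_enable_interrupt(struct rx_nic *nic)
{
	nic->irq_masked = 0;
	return 1;
}
```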


	Transmit Packet
	---------------

As noted above, when the stack wishes to transmit a packet, it
will call the driver's ifp->if_start function pointer, which 
was set to sam_start() in the attach function.

There are a couple of handy macros that you can use here. 
Generally the first thing you do here is see if you have
the hardware resources (descriptors, buffers, whatever)
available to transmit a packet.  If not, there isn't much  
you can do.

What most drivers do is loop in this function, passing packets
down to the hardware until there aren't any more packets to
be transmitted, or the hardware resources aren't available
to permit packet loading for transmission - whichever comes
first.

So you can use the IFQ_POLL() macro to peek at the transmit
queue, and see if there are any more packets from the stack 
ready for transmit - if there are none, you're done.

You use the IFQ_DEQUEUE() macro to unlink the first queued
packet from the transmit queue.  Some drivers just use this
macro, and don't bother with the IFQ_POLL() macro.  See
native driver sources.

This really isn't very complicated.  Main gotcha to remember
is that before you return from this function, you must release
the transmit mutex as follows:

	NW_SIGUNLOCK_P(&ifp->if_snd_ex, iopkt_selfp, wtp);
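
The loop described above can be sketched in plain C.  The
queue, the txq_poll()/txq_dequeue() helpers and the "locked"
flag are hypothetical stand-ins for ifp->if_snd, IFQ_POLL(),
IFQ_DEQUEUE() and the transmit mutex; they are not the real
macros.  The sketch assumes the function is entered with the
transmit mutex held.

```c
#include <assert.h>
#include <stddef.h>

struct tx_pkt { struct tx_pkt *next; };

struct tx_queue { struct tx_pkt *head; };   /* stands in for ifp->if_snd */

struct tx_dev {
	struct tx_queue q;
	int free_desc;    /* descriptors the hardware can still accept */
	int sent;         /* packets loaded into the hardware */
	int locked;       /* 1 while the transmit mutex is held */
};

/* Peek at the first queued packet without unlinking it. */
static struct tx_pkt *txq_poll(struct tx_queue *q)
{
	return q->head;
}

/* Unlink and return the first queued packet, or NULL if empty. */
static struct tx_pkt *txq_dequeue(struct tx_queue *q)
{
	struct tx_pkt *p = q->head;

	if (p != NULL) {
		q->head = p->next;
		p->next = NULL;
	}
	return p;
}

/* Entered with the mutex held; releases it before returning. */
static void tx_start(struct tx_dev *dev)
{
	assert(dev->locked);
	while (dev->free_desc > 0 && txq_poll(&dev->q) != NULL) {
		struct tx_pkt *p = txq_dequeue(&dev->q);

		(void)p;             /* a real driver loads p into a descriptor */
		dev->free_desc--;
		dev->sent++;
	}
	dev->locked = 0;         /* stands in for NW_SIGUNLOCK_P(...) */
}
```

Packets left on the queue when resources run out simply stay
there until the start function is called again.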

Note that the sample driver, in the start function, calls
m_free(m) to release the transmitted packet.  It does this
to avoid a memory leak, but you probably don't want to do
that if you have a descriptor-based nic.

If you have a nic which unfortunately requires that you
copy the transmit packet into a buffer, then you should
immediately call m_free(m), which tells the stack that the 
buffer is available for re-use and may be overwritten.

However, if you have a descriptor-based nic, you do NOT
copy the transmitted packet - the hardware does the DMA - 
and you only want to release the packet buffer after
the DMA has completed sometime later, to avoid this packet
being over-written.

If you look at most native driver source, any descriptor-
based nic will have a "harvest" or "reap" function which
will check for transmitted descriptors, and will at that
point release the transmit packet buffer.

This requires that you squirrel away a pointer to the
transmit packet (mbuf) somewhere.  Often hardware will
have a few bytes free in the descriptor for this purpose,
or if not, you must maintain a corresponding array of
mbufs which you index into while harvesting descriptors.

Again, see native driver sources for several different
ways of doing this, depending upon hardware features
and driver author preference.
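
As one illustration of the "corresponding array" approach,
here is a stand-alone sketch.  The ring layout, the done[]
flags and the reap_*() names are all hypothetical; a real nic
reports completion through its own descriptor status bits,
and the "freed" flag below merely stands in for releasing the
mbuf.

```c
#include <assert.h>
#include <stddef.h>

#define TX_RING 4   /* hypothetical ring size */

struct reap_mbuf { int freed; };

struct reap_ring {
	int               done[TX_RING];   /* 1 once the hardware finished DMA */
	struct reap_mbuf *pkt[TX_RING];    /* packet owned by each descriptor */
	int head;      /* next descriptor to fill */
	int tail;      /* oldest descriptor not yet harvested */
	int in_use;    /* descriptors currently owned by the hardware */
};

/* Queue a packet for DMA and remember it; do NOT free it yet. */
static int reap_load(struct reap_ring *r, struct reap_mbuf *m)
{
	assert(m != NULL);
	if (r->in_use == TX_RING)
		return -1;                    /* no descriptor available */
	r->pkt[r->head] = m;
	r->done[r->head] = 0;
	r->head = (r->head + 1) % TX_RING;
	r->in_use++;
	return 0;
}

/* Release packets whose descriptors have completed, oldest first. */
static int reap_harvest(struct reap_ring *r)
{
	int n = 0;

	while (r->in_use > 0 && r->done[r->tail]) {
		r->pkt[r->tail]->freed = 1;   /* stands in for freeing the mbuf */
		r->pkt[r->tail] = NULL;
		r->tail = (r->tail + 1) % TX_RING;
		r->in_use--;
		n++;
	}
	return n;
}
```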


	Periodic Timers
	---------------

Network drivers frequently need periodic timers to perform
such housekeeping functions as link maintenance and transmit
descriptor harvesting.  An io-pkt driver CANNOT create it's
own thread or asynchronous timer (via OS function) as you
might under io-net.  The way you set up a periodic timer
is as follows in the ifp->if_init function:

	callout_msec(&dev->mii_callout, 2 * 1000, dev_monitor, dev);

This will cause the dev_monitor() function to be called 
BY THE STACK after two seconds has elapsed.

The gotcha is that at the end of the dev_monitor() function,
it must re-arm its periodic timer call by making the
above call again.  It's a one-shot - not a repetitive timer.
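
The one-shot behaviour is easy to get wrong, so here is a
stand-alone sketch of the re-arm pattern.  mock_callout_msec()
and mock_fire() are hypothetical stand-ins that only imitate
the shape of the callout_msec() call shown above; mock_fire()
plays the part of the stack's timer expiring.

```c
#include <assert.h>
#include <stddef.h>

struct mock_callout {
	int    armed;            /* 1 while a timeout is pending */
	int    delay_ms;
	void (*func)(void *);
	void  *arg;
};

/* Arm a ONE-SHOT timer (imitates the shape of callout_msec()). */
static void mock_callout_msec(struct mock_callout *c, int ms,
    void (*func)(void *), void *arg)
{
	assert(func != NULL);
	c->armed = 1;
	c->delay_ms = ms;
	c->func = func;
	c->arg = arg;
}

/* Pretend the timer expired: disarm first, then call the handler. */
static void mock_fire(struct mock_callout *c)
{
	if (!c->armed)
		return;
	c->armed = 0;
	c->func(c->arg);
}

struct timer_dev {
	struct mock_callout mii_callout;
	int ticks;               /* how many times the monitor has run */
};

static void timer_monitor(void *arg)
{
	struct timer_dev *dev = arg;

	dev->ticks++;            /* link maintenance etc. would go here */

	/* Re-arm, otherwise this function never runs again. */
	mock_callout_msec(&dev->mii_callout, 2 * 1000, timer_monitor, dev);
}
```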

Note that if you call into the transmit code to harvest
descriptors, you should lock the transmit mutex to avoid
corrupting your data and registers, by using the NW_SIGLOCK()
macro.  See native driver source for examples of this.


	Link Status Events
	------------------

Userland should be notified about link layer state changes.
This is done via the if_link_state_change() function:

	if_link_state_change(ifp, LINK_STATE_UP);
	if_link_state_change(ifp, LINK_STATE_DOWN);

Nice and easy :)


	Out of Band (control)
	---------------------

Out of band (non-data) control of the driver is accomplished
by the ifp->if_ioctl function pointer which is set to sam_ioctl()
in the attach function.

The ioctl function can be very simple (empty) or quite complex,
depending upon the features supported.  For backwards compatibility
with the nicinfo utility, eg:

	nicinfo sam0

you might wish to add support for the SIOCGDRVCOM DRVCOM_CONFIG/
DRVCOM_STATS commands.  See native driver sources for examples
of this.

If your driver supports hardware checksumming, you probably 
want to support the SIOCSIFCAP command (see examples).

If you want your driver to display its media link speed 
and duplex via the ifconfig utility: 

	ifconfig -v

you want to add support for the SIOCGIFMEDIA / SIOCSIFMEDIA
commands, which actually allow the media speed and duplex
to be set via the ifconfig utility.  This is a significant
change from io-net, where the driver media link parameters
had to be set once with the speed and duplex command line
parameters.  Run this:

	ifconfig -m

Native drivers that support the setting of media link
speed and duplex via ifconfig will have a source file
called:

	bsd_media.c

If you compare this file for different native drivers,
you will see that they are very similar - they all
interface to the stack quite similarly, and only minor
hardware-specific differences exist.

Finally, the ioctl interface is how the multicast receive
addresses are enabled.  See native driver sources for
examples on how these addresses are obtained from the
stack - the ETHER_FIRST_MULTI() and ETHER_NEXT_MULTI()
macros are used for this.
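
The overall shape of an ioctl function is a switch on the
command.  The sketch below is a stand-alone illustration only:
the MOCK_* command codes, struct ioctl_dev and the int payload
are hypothetical, since the real SIOC* values and argument
types come from the system headers.  Returning EINVAL for
unknown commands is also an assumption; check the native
driver sources for how unhandled commands are passed on.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical command codes standing in for the SIOC* values. */
enum mock_cmd {
	MOCK_SIFCAP = 1,    /* set capabilities, eg hardware checksum */
	MOCK_GIFMEDIA,      /* report media speed/duplex */
	MOCK_SIFMEDIA,      /* change media speed/duplex */
	MOCK_ADDMULTI,      /* multicast address added */
	MOCK_DELMULTI,      /* multicast address removed */
	MOCK_OTHER          /* anything this driver does not handle */
};

struct ioctl_dev {
	int csum_enabled;
	int media;            /* hypothetical encoding of speed/duplex */
	int mcast_reloads;    /* times the receive filter was reprogrammed */
};

static int mock_ioctl(struct ioctl_dev *dev, enum mock_cmd cmd, int *data)
{
	assert(data != NULL);
	switch (cmd) {
	case MOCK_SIFCAP:
		dev->csum_enabled = *data;
		return 0;
	case MOCK_GIFMEDIA:
		*data = dev->media;
		return 0;
	case MOCK_SIFMEDIA:
		dev->media = *data;
		return 0;
	case MOCK_ADDMULTI:
	case MOCK_DELMULTI:
		dev->mcast_reloads++;   /* rebuild the hardware filter */
		return 0;
	default:
		return EINVAL;          /* assumption, see note above */
	}
}
```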


	Shutdown
	--------

Driver shutdown is slightly obfuscated.  You may have
noticed a detach function above:

	CFATTACH_DECL(sam,
	    sizeof(struct sam_dev),
	    NULL,
	    sam_attach,
	    sam_detach,
	    NULL);

which is kind of a red herring - it actually isn't as
important as it might appear at first.  BSD drivers don't
even have detach functions.

However, it doesn't hurt to have a detach function.  In 
the sample driver, we can see sam_detach() calling sam_stop(), 
which mostly turns off the hardware (necessary for DMA nics 
to avoid corrupting memory).

Note that sam_stop() can also be called directly by
the stack - remember that we set the ifp->if_stop
function pointer to sam_stop() in the attach function.

However, the most important shutdown code is in the
attach function:

	sam->sc_sdhook = shutdownhook_establish(sam_shutdown, sam);

This is what will get called when io-pkt is slayed, for
example, and is how the orderly shutdown of the hardware
is accomplished.

If you look at the sam_shutdown() function, you will find
that it simply calls sam_stop() to shut down the hardware.

Driver shutdown mostly consists of a bunch of different
ways to call the stop function to shut down the hardware.
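
Since the stop function can be reached by several routes, it
helps if it is harmless to call more than once.  The sketch
below is stand-alone and hypothetical (struct sd_dev and the
sd_*() names are invented here); it only shows the detach
path, the shutdown hook path and a direct call all funnelling
into one stop function.

```c
#include <assert.h>
#include <stddef.h>

struct sd_dev {
	int hw_running;    /* 1 while the pretend hardware is doing DMA */
	int stop_calls;    /* how many times stop has been invoked */
};

/* Halts the hardware; safe to call when it is already stopped. */
static void sd_stop(struct sd_dev *dev)
{
	assert(dev != NULL);
	dev->stop_calls++;
	dev->hw_running = 0;
}

/* Shutdown hook path: the hook receives the device as a void pointer. */
static void sd_shutdown(void *arg)
{
	sd_stop(arg);
}

/* Detach path. */
static int sd_detach(struct sd_dev *dev)
{
	sd_stop(dev);
	return 0;
}
```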


	What Now?
	---------

Ok, you've read all the above, and it's time to get your
hands dirty.  If you want to write a native io-pkt network
driver, generally what you want to do is sift through the
existing driver source, and try to find one that has the
hardware which most resembles the hardware you wish to
write a driver for.  It will have similar data structures
to what you want, and have similar function layouts.

Good luck!

--	
aboyd