24 Sep 07

Writing Native Network Drivers For io-pkt
-----------------------------------------

This guide is designed to help you understand and write "native" network
drivers for io-pkt.

Before we get too deep, here are some essential definitions that you
need to understand:

io-pkt: aka io-net3.  Successor to io-net (aka io-net2).  Based much
    more closely on BSD4 than io-net, which should make porting
    protocols and drivers much easier.  Mbufs, not npkts, are the
    underlying packet buffer.  The threading model is entirely
    different, which hopefully should improve performance during rx.

native driver: A network driver which is written specifically for
    io-pkt and takes full advantage of the performance gained by
    avoiding a thread switch during rx and by permitting the stack to
    be multi-threaded.

shim driver: An unmodified io-net network driver binary which operates
    under io-pkt with the help of the devnp-shim.so "shim" driver
    layer.  The advantage of this is that any pre-existing io-net
    driver will work under io-pkt.  The disadvantage is that maximum
    throughput will not be obtained (compared to a native driver)
    because of the thread switch during packet rx performed by the
    shim.

BSD driver: A (nearly) unmodified BSD network driver which operates
    under io-pkt.  Examples of these would be devnp-fxp.so (BSD speedo
    driver) and devnp-bge.so (BSD tigon3 BCM 57xx driver).  See the
    "Porting BSD Driver" document for more information.  What you need
    to know is that by default, BSD drivers operate, with the aid of
    the emulation library (see io-pkt/sys/dev/lib/*), in
    single-threaded stack mode.  This is because the interrupt and
    threading model of io-pkt differs considerably from BSD.  So,
    unless the BSD driver source has been carefully reviewed and
    modified as required to deal with all possible multi-threading
    issues, whenever there is a BSD driver running, the entire stack
    is single-threaded.

100,000 foot overview
---------------------

Any network driver can be viewed as the "glue" between the underlying
network hardware and the software infrastructure of the operating
system protocol stack above it.

So, the "bottom half" of the driver is coded specifically for the
particular hardware it supports, and the "top half" of the driver is
coded specifically for the software infrastructure.

This document deals specifically with the "top half" of the io-pkt
"native" driver, which deals with the io-pkt software infrastructure.

What Does The Driver API To io-pkt Look Like?
---------------------------------------------

If you look at an existing io-pkt driver, the problem is that it is
cluttered with all sorts of hardware-specific code (ie the "bottom
half" of the driver) which will distract you from understanding the
API to io-pkt.

With this in mind, Seanb wrote the completely hardware-independent
"sample" driver, which can be found in the source tree at:

    io-pkt/sys/dev_qnx/sample/sam.c

Any driver can be considered to have the following functional areas:

    Initialization
    Interrupt Handling
    Receive packet
    Transmit packet
    Periodic Timers
    Out of Band (control)
    Shutdown

Bring up the source to sam.c (above) in another window, and let's take
a look at each functional area.

Initialization
--------------

Initialization is probably the trickiest part of an io-pkt driver,
because unlike an io-net driver, part of the initialization code will
be called over and over again by the stack.  So, it must be coded
accordingly.  It is very easy to have a driver bug where it works at
first, but stops working after the stack re-initializes it.  But I'm
getting ahead of myself.

Initialization begins with this:

    struct nw_dll_syms sam_syms[] = {
        {"iopkt_drvr_entry", &IOPKT_DRVR_ENTRY_SYM(sam)},
        {NULL, NULL}
    };

This tells the stack to execute the sam_entry() function, which in
turn calls the dev_attach() function for every instance of the
hardware, of which there may be none, one or several.

The dev_attach() function, through pre-processor trickery, gets a
pointer to the following, via the "&sam_ca" parameter:

    CFATTACH_DECL(sam,
        sizeof(struct sam_dev),
        NULL,
        sam_attach,
        sam_detach,
        NULL);

So for each instance of the hardware, the sam_attach() function will
be called ONCE and only ONCE.

The sam_attach() function basically does two things: it allocates
resources (eg those required for the hardware) and hooks itself up to
the stack.

Looking at sam_attach() we can see it hooking itself up to the stack
in two main ways.  One is by setting the callout functions in its ifp
struct.  For example, when the stack wants to transmit a packet, it
calls the ifp->if_start function pointer, which has NOTHING to do with
initialization, btw.  In sam_attach() we see that the ifp->if_start
function pointer is set to the address of the sam_start() packet
transmit function.

Second is by setting up for the hardware interrupt by calling the
interrupt_entry_init() stack function, which is passed a pointer to
the sc_inter struct in the per-instance device structure.  The
sc_inter struct contains pointers to the sam_process_interrupt() and
sam_enable_interrupt() functions, and also the per-instance device
structure pointer (sam).

Note that pthread_create() is NOT called.  This is an important detail
about the threading model of io-pkt drivers: whenever the driver
wishes to execute, it must do so under control of (ie called by) the
stack.  This quite specifically includes asynchronous events such as
hardware interrupts (as discussed above) and also periodic timers via
the callout_msec() stack function.

This completes the part of the driver initialization that is called
ONCE.  Note that the network hardware will NOT function at this
point - no packets will be received (or transmitted) until someone
executes the ifconfig utility, eg:

    ifconfig sam0 10.42.107.238

Now, the stack will call the ifp->if_init function pointer for the
sample driver, which in the attach function was set to be sam_init().
This is where the hardware would be enabled, and if the interrupt was
not already attached, the InterruptAttach_r() function is called.

Remember, the ifp->if_init function can and will be called over and
over again by the stack.  This is very different from an io-net
driver, whose initialization executes only once.  For example, if
someone does this:

    ifconfig sam0 mtu 8100

the ifp->if_init function in the driver will be called again by the
stack.  So, it is up to the driver to initialize the hardware as
specified.  We can clearly see from this example that it would be an
error for the driver to set the MTU in the attach function.

Generally the first thing the init function does is disable the
hardware, because it's going to initialize it all over again from
scratch.

Summary: the attach function is called once, to allocate resources and
to hook up to the stack.  The init function is called over and over
again, to configure and enable the hardware.

It is worth mentioning that if you wish to write a driver for a PCI
nic, there is a little dance you need to go through, for vendor and
device ID tables and scanning.  Of course, since sam.c was written to
be a hardware-independent example, it does not have any of that code
in it.  See native PCI driver sources for code that you can copy.

Interrupt Handling & Receive Packet
-----------------------------------

You will note that there are two different sam_isr() functions
provided.  The easiest way is to simply use the kernel InterruptMask()
function.
A slightly more complicated way to handle the interrupt is to write to
a hardware register to mask the interrupt, which works better if the
interrupt is being shared with another device, and might be just a
little bit faster.

Either way, after the sam_isr() function executes, the stack wakes up
and calls the driver's sam_process_interrupt() function via the
sam->sc_inter.func function pointer.

The sam_process_interrupt() function will do whatever the hardware
requires - perhaps reading count registers, error handling, etc.  It
might or might not service the transmit side of the hardware
(generally not recommended because of the negative performance impact
of enabling the transmit complete interrupt).  It will however service
the receive side of the hardware - any filled received packets are
drained from the hardware, new empty packets are passed down to the
hardware, and the filled received packets are passed up to the
protocol stack using the ifp->if_input function pointer.

Transmit Packet
---------------

As noted above, when the stack wishes to transmit a packet, it will
call the driver's ifp->if_start function pointer, which was set to
sam_start() in the attach function.

There are a couple of handy macros that you can use here.  Generally
the first thing you do here is see if you have the hardware resources
(descriptors, buffers, whatever) available to transmit a packet.  If
not, there isn't much you can do.

What most drivers do is loop in this function, passing packets down to
the hardware until there aren't any more packets to be transmitted, or
the hardware resources aren't available to permit packet loading for
transmission - whichever comes first.

So you can use the IFQ_POLL() macro to peek at the transmit queue, and
see if there are any more packets from the stack ready for transmit -
if there are none, you're done.

You use the IFQ_DEQUEUE() macro to unlink the first queued packet from
the transmit queue.
Some drivers just use IFQ_DEQUEUE() and don't bother with the
IFQ_POLL() macro.  See native driver sources.  This really isn't very
complicated.

The main gotcha to remember is that before you return from this
function, you must release the transmit mutex as follows:

    NW_SIGUNLOCK_P(&ifp->if_snd_ex, iopkt_selfp, wtp);

Note that the sample driver, in the start function, calls m_free(m) to
release the transmitted packet.  It does this to avoid a memory leak,
but you probably don't want to do that if you have a descriptor-based
nic.

If you have a nic which unfortunately requires that you copy the
transmit packet into a buffer, then you should immediately call
m_free(m), which tells the stack that the buffer is available for
re-use, and it will be written to.

However, if you have a descriptor-based nic, you do NOT copy the
transmitted packet - the hardware does the DMA - and you only want to
release the packet buffer after the DMA has completed sometime later,
to avoid this packet being over-written.

If you look at most native driver source, any descriptor-based nic
will have a "harvest" or "reap" function which will check for
transmitted descriptors, and will at that point release the transmit
packet buffer.  This requires that you squirrel away a pointer to the
transmit packet (mbuf) somewhere.  Often hardware will have a few
bytes free in the descriptor for this purpose, or if not, you must
maintain a corresponding array of mbufs which you index into while
harvesting descriptors.  Again, see native driver sources for several
different ways of doing this, depending upon hardware features and
driver author preference.

Periodic Timers
---------------

Network drivers frequently need periodic timers to perform such
housekeeping functions as link maintenance and transmit descriptor
harvesting.

An io-pkt driver CANNOT create its own thread or asynchronous timer
(via OS function) as you might under io-net.
The way you set up a periodic timer is as follows, in the ifp->if_init
function:

    callout_msec(&dev->mii_callout, 2 * 1000, dev_monitor, dev);

This will cause the dev_monitor() function to be called BY THE STACK
after two seconds have elapsed.  The gotcha is that at the end of the
dev_monitor() function, it must re-arm its periodic timer call by
making the above call again.  It's a one-shot - not a repetitive
timer.

Note that if you call into the transmit code to harvest descriptors,
you should lock the transmit mutex to avoid corrupting your data and
registers, by using the NW_SIGLOCK() macro.  See native driver source
for examples of this.

Link status events
------------------

Userland should be notified about link layer state changes.  This is
done via the if_link_state_change() function:

    if_link_state_change(ifp, LINK_STATE_UP);
    if_link_state_change(ifp, LINK_STATE_DOWN);

Nice and easy :)

Out of Band (control)
---------------------

Out of band (non-data) control of the driver is accomplished by the
ifp->if_ioctl function pointer, which is set to sam_ioctl() in the
attach function.

The ioctl function can be very simple (empty) or quite complex,
depending upon the features supported.

For backwards compatibility with the nicinfo utility, eg:

    nicinfo sam0

you might wish to add support for the SIOCGDRVCOM DRVCOM_CONFIG /
DRVCOM_STATS commands.  See native driver sources for examples of
this.

If your driver supports hardware checksumming, you probably want to
support the SIOCSIFCAP command (see examples).

If you want your driver to display its media link speed and duplex via
the ifconfig utility:

    ifconfig -v

you want to add support for the SIOCGIFMEDIA / SIOCSIFMEDIA commands,
which actually allow the media speed and duplex to be set via the
ifconfig utility.  This is a significant change from io-net, where the
driver media link parameters had to be set once with the speed and
duplex command line parameters.
Run this:

    ifconfig -m

Native drivers that support the setting of media link speed and duplex
via ifconfig will have a source file called:

    bsd_media.c

If you compare this file for different native drivers, you will see
that they are very similar - they all interface to the stack quite
similarly, and only minor hardware-specific differences exist.

Finally, the ioctl interface is how the multicast receive addresses
are enabled.  See native driver sources for examples on how these
addresses are obtained from the stack - the ETHER_FIRST_MULTI() and
ETHER_NEXT_MULTI() macros are used for this.

Shutdown
--------

Driver shutdown is slightly obfuscated.  You may have noticed a detach
function above:

    CFATTACH_DECL(sam,
        sizeof(struct sam_dev),
        NULL,
        sam_attach,
        sam_detach,
        NULL);

which is kind of a red herring - it actually isn't as important as it
might appear at first.  BSD drivers don't even have detach functions.

However, it doesn't hurt to have a detach function.  In the sample
driver, we can see sam_detach() calling sam_stop(), which mostly turns
off the hardware (necessary for DMA nics to avoid corrupting memory).

Note that sam_stop() can also be called directly by the stack -
remember that we set the ifp->if_stop function pointer to sam_stop()
in the attach function.

However, the most important shutdown code is in the attach function:

    sam->sc_sdhook = shutdownhook_establish(sam_shutdown, sam);

This is what will get called when io-pkt is slayed, for example, and
is how the orderly shutdown of the hardware is accomplished.  If you
look at the sam_shutdown() function, you will find that it simply
calls sam_stop() to shut down the hardware.

Driver shutdown mostly consists of a bunch of different ways to call
the stop function to shut down the hardware.

What Now?
---------

Ok, you've read all the above, and it's time to get your hands dirty.
If you want to write a native io-pkt network driver, generally what
you want to do is sift through the existing driver source, and try to
find one that has the hardware which most resembles the hardware you
wish to write a driver for.  It will have similar data structures to
what you want, and have similar function layouts.

Good luck!

--
aboyd