* Networking

Linux Socket Kernel Buffers (SKBs, struct sk_buff): called "mbufs" in BSD-based OSs.

* Device drivers (part 1: receiving packets)

[last discussed what happens if not enough RAM]

If there's enough RAM inside the NIC, it'll receive the packet and store it in the NIC's RAM. Next step: give the packet to the OS. The NIC interrupts the CPU ("network interrupt"). The CPU is interrupted (if it can be), preserves the state of the running task/process (CPU registers, etc., in the task struct), and executes a network interrupt handler.

Note: for every interrupt number, there's an interrupt handler function in a global interrupt handler dispatch table. When the CPU accepts an interrupt, it DISABLES other interrupts at the same level. Therefore interrupt handlers should run fast and never block (you can't really interrupt one handler with another -- it causes recursion).

OS interrupt handler:
1. Just got invoked for "networking" for a given NIC.
2. Handler needs to get the NIC's data into an skb.
3. Handler asks the SKB subsystem for a fast, pre-allocated SKB of a suitable size (faster than asking to allocate, or calling kmalloc, which can block).
4. Handler then transfers data from NIC to skb, either:
   (a) set up Direct Memory Access (DMA): tell the DMA processor to copy the data asynchronously from the NIC to kernel memory, and give it a callback fxn to execute when the copy is done; or
   (b) issue processor I/O copy instructions to copy from the NIC's address space to the kernel's address space. Slow instructions: copy one word/byte at a time.
5. Set up the packet data so it can be further processed by an async queue in the kernel after it's been fully copied into an skb -- but don't do that processing right now!
6. Handler is effectively done; return; all interrupts are re-enabled.
7. Signal the NIC that you're done (e.g., DMA response or other electronics).

Back inside the NIC: when it gets the "done" signal from the OS's network interrupt handler, it can free up the buffer used for that received packet.
* OS sending a packet via NIC

The bottom-most layer of the OS, e.g., the net driver, has a packet to transmit, in an skb. It needs to give it to the NIC.

Simple case: the NIC has memory free. Either set up DMA with a callback fxn, or copy bytes/words one at a time to the NIC. When the copy's done and the NIC has the packet, the OS can free up the SKB.

What if the NIC is busy or its memory is full, so it can't receive the packet from the OS? Use a single bit (wire) called "tx on/off" (tx: transmit). If the NIC is busy, it'll set TX to "off" (telling the OS "don't transmit to me"). The CPU checks the status of the TX bit: if off, the OS will NOT send the packet and will just wait. In other words, the OS (a "heavy writer" to the NIC) throttles itself.

When the NIC has a packet to transmit:
1. Sample the wire (or network).
2. Make sure it's quiet.
3. Then start transmitting.
4a. If all bits were transmitted correctly on the wire, the NIC can free up the buffer.
4b. If someone else transmits at the same time, signals on the wire can get mixed up: corrupted bits. If that happens, stop transmitting and back off a bit (possibly exponential back-off). Then try again until success.

* What happens inside the OS?

After the OS receives a packet into an skb from the NIC (upon receiving a new packet), it puts the skb into a queue for further processing. Linux sets up a system of async interrupts to process many queues of different types, called "Soft IRQs" (soft interrupts).

Soft IRQ system:
1. Defines different types of processing. If the softirq is NET_RX, it means there's work to do for network receiving of packets; NET_TX is for transmitting packets; there are other softirqs, and you can even define your own. NET_RX, NET_TX, etc. are bits in a global bitmap of the softirq subsystem.
2. The kernel starts N kthreads for processing softirqs, usually N == #cores, e.g., ksoftirqd/cpu0, ksoftirqd/cpu1, etc. If you run ps -ef on a Linux system, every "process" in [brackets] is a kernel thread.
3.
The scheduler checks the global softirq bitmap, and if any of the bits are on, it wakes up one or more of the ksoftirqd/cpuN kthreads. These kthreads will then get scheduled, eventually...

ksoftirqd/cpuN kthreads: when they run, they check the global softirq bitmap, then execute specific "soft interrupt handlers" for each type of softirq: NET_RX will invoke code to receive packets in some queue, NET_TX will invoke code to send packets in some queue, etc.

In networking, there are many layers. Each layer has its own queues with data in skbs, and handlers for dealing with that data. A dev. drv. ethernet queue will process a packet, remove the ethernet headers, then add the pkt to an IP queue; when the IP queue's consumer runs, it'll again check headers, then move the pkt to another queue (UDP, TCP, etc.). This processing from queue to queue continues upward until we have data to give a waiting user process. At the final stage, the kernel will copy_to_user the packet payload, and change the process state from waiting to ready (the scheduler will let it run some time later).

On writing to a socket, the kernel copies data from the user, then returns from the syscall (unless the syscall asked to block). The data is then put into a VFS/socket-layer queue, then processed and moved down the queues: TCP or UDP, IP, ethernet, and eventually the dev. driver. Note: there are even more layers (firewalling, etc. TBD).

When softirqs wake up, they have to decide WHICH queues to process first.
Upon NET_TX: process packets from the lowest queues first, then go up the chain. This allows upper queues to drain asynchronously.
Upon NET_RX: process packets from the topmost queues first, then down the chain. This allows bottom queues and the NIC to move newly received data up the chain.

Next time: some more data structures and details of network architecture ... and a cautionary tale about locking and networking...