* Freeing memory (part 2): modern Linux Backing-Device Information (BDI) system

The BDI system adjusts writeback rates based on the measured speed of backing
devices (e.g., an SSD can be much faster than a traditional hard disk).  It
measures the rate of writing dirty items to each device and adjusts rates
accordingly: dynamic load balancing.  It even measures collections of devices
(e.g., RAID) and virtual devices, as well as file systems -- to find the speeds
of entire paths through the I/O stack.  Each layer can now set its own
thresholds for sync/async writeback, and how many units of dirty data should be
committed in each "round".

BDI has both a local and a global view: who the heavy writers are (and how much
they write), and what the current capabilities/speeds of the different
devices/layers are.  It performs global load balancing to ensure a fair
distribution of writing activity.  Using Control Groups (cgroups), a process
can set the min/max I/O rates allowed for that process.  Cgroups can control
per-process limits for CPU, memory, and more.

* Networking

Networking is more complex, and has more layers, than (say) file systems.
Layering makes for good software-engineering design: it is easy to maintain and
develop components at a given layer -- you just need to know the APIs and
calling conventions.  But layering has deficiencies:

1. more layers calling other layers (functions calling functions); and
2. for perfect layering isolation, each layer has to have its own data objects,
   meaning each layer has to copy and pass data.

Solution: share a "repository" of data objects between layers.  Pass a small
reference or handle to the object; each layer can then access the actual data
object from the shared location as needed.  E.g., in the storage stack, all
layers can access a shared page/buffer cache (struct page's).  However, there
must be some coordination to access objects in the shared repository, using
locking and other API conventions.
* Networking layers

Device drivers: know how to talk to Network Interface Cards (NICs) -- how to
send and/or receive packets on the network (e.g., Ethernet).

Queueing discipline: traffic shaping and quality of service (QoS): guaranteed
min/max rates and/or latencies for certain users.

Protocol drivers: manage various protocols such as TCP/IP, Ethernet, NetBIOS,
AppleTalk, ATM, and MANY more.  These correspond to multiple layers in the OSI
model (details TBD).

Socket layer: an abstraction of a network communication endpoint, regardless of
protocol (similar to struct file/dentry/inode).  The VFS can talk to the socket
layer because a file descriptor can point to an open socket: you can use
read(2), write(2), and certain ioctl(2)s on it.  The socket API is logically at
the same level as the VFS, and handles the system calls that don't go via the
VFS: socket(2), bind(2), accept(2), setsockopt(2), listen(2), select(2),
poll(2), send(2), recv(2), etc.

SKB services: a shared repository of socket buffers (struct sk_buff, or "skb"
for short) for all network layers to use.  An skb has space for headroom and
tailroom, so headers can be added/removed as the skb goes up/down the layers
without costly reallocation and copying of bytes.  SKB services are yet another
custom memory allocator (for networking).  The SKB header files define APIs to:

1. allocate/free an skb
2. un/lock an skb
3. add/sub a refcount on an skb
4. add/remove headers/trailers from an skb
5. split an skb and merge skbs (for merging and/or splitting of packets)

* Device drivers (part 1: receiving packets)

So far we considered two layers: user and kernel.  Now the full picture:

1. user processes
2. kernel (OS)
3. hardware (NIC)
(4. network wiring)

What is a NIC?
- a hardware unit
- has some RAM
- has a small processor
- runs a program (in "firmware")
- input/output channels:
  (a) an Ethernet wire (or WiFi antenna)
  (b) a PCI bus (can be any bus: USB, PCMCIA, etc.)

NIC receiving a packet: it looks for packets that appear to match its MAC
address (i.e., its hardware/Ethernet address).
Note: NICs have at least one fixed MAC address (e.g., 48 bits for Ethernet),
but can also be assigned temporary or virtual MAC addresses (useful for load
balancing and failover).

The NIC will sample the wire and "copy" the bits into its own RAM.  But what if
there's not enough RAM on the NIC to receive the packet?!

1. Option 1: throw away something the NIC already has, to make room for the
   new packet.  Not done.
2. Option 2: "drop" the packet, meaning do not accept/copy the bits into your
   own memory.  Recall that networking is "best effort".  UDP makes no
   delivery guarantees (there are no ACKs), so no harm in dropping a UDP
   packet.  For TCP, if you drop a packet, the sender will eventually realize
   it didn't get an ACK for that packet, and will retransmit it.  But the
   sender will also back off (exponentially for TCP/IP) before the next
   retransmission: this is "throttling the heavy writers".  This indirectly
   allows busy NICs with full memory to drain their own queues (e.g., send
   pending packets).

If there's enough RAM inside the NIC, it'll receive the packet and store it in
the NIC's RAM: next step is ... (give the packet to the OS).