// debugging

int test_debug(void)
{
    // In userland, you have symbolic debuggers and tracers, thanks to a
    // system call: ptrace(2).

    // In the kernel, no other program can run "under" the OS.

    // Sun Microsystems' SPARCstations had a mini kernel debugger in the
    // firmware (for the Solaris OS).

    // There are examples of "user mode linux" (UML), a full kernel running
    // as a user process. Not helpful for debugging "real problems".

    // Linux uses the kgdb extension: redirect the console to a serial port,
    // and connect via that serial port to ANOTHER machine running plain gdb,
    // which attaches to the serial port. But kgdb only works for some
    // subsystems of the linux kernel.

    // problem 1: the most difficult bugs are timing related, so anything
    // that changes the timing of a running kernel isn't as helpful. Kernel
    // developers assume a higher level of expertise...

    // problem 2: temporal distance b/t a bug's cause and its effect

    // "printf is your friend"

    // printk in the kernel takes a special first arg WITHOUT a comma after
    // it, which determines the type/destination of the log message.
#define UDBG printk(KERN_DEFAULT "DBG:%s:%s:%d\n", __FILE__, __func__, __LINE__)
    // helps trace the code path, esp. what's the last print before a crash
    UDBG;
    if (somecond) {
        UDBG;
        // code 1
    } else {
        UDBG;
        // code 2
    }
    UDBG;

    // problem: printk itself takes some cpu cycles, memory, and time.
    // Kernel print messages get passed to a slow /dev/console device.

    // The kernel doesn't print messages right away. They get added to a
    // fixed-size "ring buffer". A separate kthread picks msgs from the ring
    // buffer, displays them on the console, and sends them to the user-level
    // system logger daemon (aka "syslogd"), which often writes them to
    // /var/log/messages or some other log file configured in
    // /etc/syslog.conf. Note "syslogd" has various names on different
    // systems.

    // even with the async ring buffer, printk can still affect the timing
    // of kernel operations (making races harder to debug).
    // also, if you printk too many messages, the ring buffer will be
    // overwritten with the latest messages and you'd miss some of the
    // earlier ones.
    // solution: use a few print statements, test the code, then remove
    // some old messages and add new ones.

    // FASTER WAYS to debug: assertions
    // An assertion checks whether a condition is T or F, and if the check
    // fires, takes some action (e.g., print a msg, dump a stack trace,
    // crash/panic the kernel).
    // assertions are faster b/c they boil down to a single condition
    // comparison.

    // dump "oops" trace, panic kernel, iff cond is true
    // "panic kernel" means it STOPS running the current code (e.g.,
    // syscall, kthread)
    BUG_ON(cond);
    BUG_ON(ptr == NULL);
    // useful for conditions you think should never happen

    // unconditional "bug" assertion, also stops processing
    BUG();
    if (c1) {
        if (c2) {
            // code
            BUG();
        }
    }

    // like BUG_ON: dumps the stack when cond is true, but continues
    // running (unlike BUG_ON)
    WARN_ON(cond);

    // same as WARN_ON, but only prints one time at that location (LoC).
    // The "once-counter" is per instance of the macro in the code of a
    // running OS; rebooting the OS resets the counters.
    WARN_ON_ONCE(cond);

    // print a stack trace; the same trace gets printed as part of any
    // "oops" trace in a BUG/WARN message.
    dump_stack();
    // helpful to know the code path that led to THIS function.
    // caveat: the fxn tries its best to display an accurate stack, but
    // there's no easy way to tell what on a kstack is a function addr ptr
    // vs. a variable. You'll sometimes see functions listed with a '?' in
    // front. Functions marked with '?' may not be actual callers: inspect
    // the actual code to see who calls whom.
    // 1. ext4_read(...)
    // 2. vfs_read(...)
    // 3. ? do_read(...)
    // 4. sys_read(...)

    // sometimes you'll see a stack trace that makes NO SENSE.
    // this often indicates stack memory corruption.

    // what's in an oops trace?
    // 1. a message like "null ptr dereference at 0x00000000"
    // 2. cpu register dumps
    // 3. stack trace
    // 4. the hex addr and (hopefully also the) name of the function where
    //    the problem occurred.
    // 5. the hex position, inside the function, of the instruction that
    //    triggered the oops, relative to the entire size of that function
    //    e.g., BUG in foo(...) at 0x12A/0x7F9. (roughly at start)
    //    e.g., BUG in foo(...) at 0x71B/0x7F9. (roughly at end)
    // caveat: compiled code includes optimizations, inlined functions, and
    // expanded CPP macros, so the offset may not map cleanly back to a
    // source line.

    // Sometimes you get a "null ptr dereference at 0x00000008"
    printk("%s", ptr->field); // assume "ptr" points to a struct.
    // meaning ptr was null, and the code tried to deref a field that is 0x8
    // bytes into the struct that ptr points to.
    // Sometimes you get a "null ptr dereference at 0xFFFFFFF0"
    // same as above, but trying to deref something 0x10 (16) bytes BEFORE
    // the ptr (0xFFFFFFF0 is -16 as a 32-bit value).

    // TEMPORAL DISTANCE: time passes b/t the cause and effect of a bug.
    // A NULL ptr deref will trigger an immediate oops.
    // A small mem corruption may take many runs, even days, before it
    // manifests. Same for "small" things like leaking a few bytes of mem,
    // forgetting to close a file here and there, etc.
    // You may only notice it much later. You have to decide if the problem
    // you're seeing is from the latest code change, or something older
    // (that you may or may not have already fixed). Often, it's good to
    // reboot first, to see if the problem can be reproduced consistently.
    // Note: your latest code may not be at fault, it could be old code, so
    // don't just assume recent code is bad and revert it unnecessarily.

    // sometimes you may corrupt your ON DISK kernel/module state. So it's
    // a good idea to do a make clean, rebuild the kernel from scratch, and
    // reinstall.

    // Your VMs have 2 partitions:
    // 1. the "/" (root) partition where the system is installed, with your
    //    /root home dir, other /home/ dirs, and /usr/src. Check out your
    //    git repo and compile it here (common to use /usr/src).
    // 2. a /test partition for testing code. Do all testing for hw1/etc.
    //    in /test; you can even reformat it (man mkfs). This way any
    //    corruption of disk state doesn't prevent the actual system from
    //    booting.
}

# issues for hw1

1. there are two branches in each git repo: master and wrapfs
- master is the unchanged kernel 4.20.5 or .6.
- wrapfs includes some additional code for later in the semester, but also
  exports vfs_read() and vfs_write() to modules.

In the linux kernel, a loadable module cannot access a symbol merely because
it is 'extern'ed; the symbol must also be exported:

EXPORT_SYMBOL(foo);     // exports a function/symbol 'foo' to any module
EXPORT_SYMBOL_GPL(foo); // exports a function/symbol 'foo' to any module
                        // that declares itself to be GPL compliant.
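
To make the export rule concrete, here is a minimal userland sketch of the idea. It is a toy model only, not the kernel's actual implementation or data layout: every name in it (toy_export, toy_table, toy_resolve, toy_vfs_read, toy_gpl_helper, toy_internal) is hypothetical and made up for illustration. It assumes a "module loader" that resolves symbols by name from an export table, and that refuses GPL-only entries to modules that do not declare themselves GPL compliant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of symbol export; NOT the kernel's real tables. */
typedef int (*toy_fn)(void);

struct toy_export {
    const char *name;  /* symbol name a module asks for */
    toy_fn fn;         /* address handed back on success */
    bool gpl_only;     /* true models EXPORT_SYMBOL_GPL */
};

/* All three functions have external linkage ("extern" visibility)... */
int toy_vfs_read(void)   { return 1; }  /* models EXPORT_SYMBOL(foo) */
int toy_gpl_helper(void) { return 2; }  /* models EXPORT_SYMBOL_GPL(foo) */
int toy_internal(void)   { return 3; }  /* never exported */

/* ...but only the ones listed here are reachable from a "module". */
static const struct toy_export toy_table[] = {
    { "toy_vfs_read",   toy_vfs_read,   false },
    { "toy_gpl_helper", toy_gpl_helper, true  },
    /* toy_internal is deliberately absent from the table */
};

/*
 * Resolve a symbol on behalf of a module. Returns NULL if the symbol
 * was never exported, or if it is GPL-only and the module did not
 * declare itself GPL compliant.
 */
toy_fn toy_resolve(const char *name, bool module_is_gpl)
{
    size_t n = sizeof(toy_table) / sizeof(toy_table[0]);

    assert(name != NULL);  /* a condition we think should never happen */
    for (size_t i = 0; i < n; i++) {
        if (strcmp(toy_table[i].name, name) != 0)
            continue;
        if (toy_table[i].gpl_only && !module_is_gpl)
            return NULL;
        return toy_table[i].fn;
    }
    return NULL;
}
```

In this model, toy_resolve("toy_internal", true) fails even though toy_internal is an ordinary non-static function. That is the situation vfs_read() and vfs_write() are in on the master branch, and the one the wrapfs branch fixes by exporting them.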