* RCU: Read-Copy-Update lock
1. Grab a spinlock
2. Read a shared d-s (data structure)
3. Make a copy of the shared d-s
4. Release the spinlock
5. Work on your private copy of the shared d-s (no time limit, NO LOCK)
6. Update the original d-s with your changes, under a spinlock.

Ex. Suppose the d-s is a linked list of names/strings. List L = {A, C, D}
1-4: lock(); copy L to L' (be sure not to block on kmalloc); unlock()
5: manipulate L'... after a time, L' = {A, C, B, D}
6: update: lock(); compare L and L', and find that you need to add element 'B' into L.
   Optimization: swap the L and L' pointers, and free the older list. unlock()

What if L was changed by someone else in the meantime, and now looks like {A, E, D} (C was removed, E was added)? If my list is {A, C, B, D}, I have to do a 3-way merge.

While there is no time limit on how long you hold your copy, the longer you wait, the greater the chances that the main d-s has changed, and the harder you will have to work to merge your changes -- or you may even have to give up. The incentive in RCU: don't sit too long on a copy of your d-s.
Note: RCUs are most suitable for d-s where merging changes can be done quickly.

To track whether the original L has changed, use a monotonic version number. Every time someone exits the update phase, increment version++. When you copy L to L', also copy V(ersion) into V'. Then, in the update phase, compare your V' against the current value of V:
- If V == V': no changes were made to the original L; quick swap of the d-s, fast.
- If V == V' + 1: you have to merge, but hopefully it won't be too hard.
- If V == V' + K (K >> 1): you have to merge, and it could be very hard.

Note: users who need only read access to the shared d-s can do the COPY phase and skip the UPDATE phase altogether. Also useful for those who want to make temporary changes to their copy L' and don't care to merge the changes back.

* Memory management
kmalloc()/kfree(): general purpose, for any size item; returns contiguous memory. But if you try to allocate a large length, you may get NULL.
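The copy/version/update scheme from the RCU section above can be sketched in userspace C. This is a minimal sketch, not kernel code: a pthread mutex stands in for the spinlock, a fixed char array stands in for the linked list of names, and all names (rcu_copy, rcu_update, struct shared) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define MAX 16

struct shared {
    pthread_mutex_t lock;   /* stand-in for the kernel spinlock */
    char items[MAX];        /* e.g. {'A','C','D'} */
    int n;
    unsigned version;       /* monotonic V: bumped on every update */
};

struct private_copy {
    char items[MAX];
    int n;
    unsigned version;       /* V': snapshot of V at copy time */
};

/* Steps 1-4: lock, copy the shared d-s and its version, unlock. */
static void rcu_copy(struct shared *s, struct private_copy *p)
{
    pthread_mutex_lock(&s->lock);
    memcpy(p->items, s->items, sizeof(s->items));
    p->n = s->n;
    p->version = s->version;
    pthread_mutex_unlock(&s->lock);
}

/* Step 6: under the lock, compare V' against the current V.
 * Returns 0 on a fast swap (no concurrent change); returns -1 if
 * V moved on, meaning the caller must merge (possibly hard). */
static int rcu_update(struct shared *s, struct private_copy *p)
{
    int ret = 0;
    pthread_mutex_lock(&s->lock);
    if (s->version == p->version) {
        memcpy(s->items, p->items, sizeof(p->items));  /* the "swap" */
        s->n = p->n;
        s->version++;
    } else {
        ret = -1;  /* someone else updated in between: merge needed */
    }
    pthread_mutex_unlock(&s->lock);
    return ret;
}
```

Step 5 (working on the private copy) happens between the two calls, with no lock held and no time limit, exactly as in the notes above.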
vmalloc()/vfree(): also (virtually) contiguous memory allocation, and can be large; but accessing any vmalloc'd byte could cause I/O (swap). Caveats: access to such memory can be slowed by I/O, and it cannot be used under a spinlock or in the bottom half of drivers (where blocking is not allowed).

kmalloc and vmalloc both suffer from eventual fragmentation. We don't want to waste any bytes of RAM.

In the kernel, the native memory unit is a page, often 4KB. kmalloc asks another memory subsystem (the SLAB allocator) for one or more physical pages, then manages them to hand out small chunks as requested by callers of kmalloc. The same thing happens in userland: malloc(3) asks the OS for N more pages of virtual memory, using the brk(2)/sbrk(2) system calls. In Linux, several page-based allocators exist(ed): SLAB, SLOB, SLUB, ...

Page allocator variants:
1. Give me one page.
2. Give me "order N" pages: you get 2^N pages back. If N=3, you get 2^3=8 pages, returned as an array of page pointers ("struct page **"). The user has to break their data across multiple 4KB chunks and track where each piece is.

"Buddy allocator": in a regular allocator, waste/fragmentation could be nearly 100% in the worst case. The buddy allocator breaks memory into units of 1/2, 1/4, 1/8, etc. Each time a user asks to allocate X bytes, round X up to the next power of 2, then search for a free unit of that size: waste is at most 50%. The cost to find an available unit is O(log N) (vs. O(N) with a general-purpose allocator). Because of this buddy system, it is easier to allocate pages in "orders" of powers of 2 in the kernel.

Next time: custom allocators.
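The buddy-style rounding above (round a request up to the next power of 2, so waste stays under 50%) and the "order N" page request can be sketched as two small helpers. This is an illustrative sketch, not the kernel's implementation; the names next_pow2 and order_for are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096u  /* common native page size, per the notes */

/* Round x up to the next power of 2 (for x >= 1). A request for X
 * bytes is served from a unit of next_pow2(X) bytes, so the waste
 * (unit - X) is always less than half the unit. */
static size_t next_pow2(size_t x)
{
    size_t p = 1;
    while (p < x)
        p <<= 1;
    return p;
}

/* "Order N" request: which order covers len bytes?
 * An order-N allocation returns 2^N pages. */
static unsigned order_for(size_t len)
{
    size_t pages = (len + PAGE_SIZE - 1) / PAGE_SIZE;  /* round up to whole pages */
    size_t unit = next_pow2(pages);                    /* buddy unit: 2^N pages */
    unsigned order = 0;

    while (((size_t)1 << order) < unit)
        order++;
    return order;
}
```

For example, a 5-page request rounds up to an order-3 (8-page) allocation, wasting 3 pages; that waste is the at-most-50% cost the buddy scheme pays for its O(log N) search.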