* symbolic links

symlinks are indirect pointers to other files/dirs/etc. (even other
symlinks)

Use the symlink(2) syscall, or ln -s on the CLI (ln(1)), to create symlinks

ls -l shows an 'l' as the leftmost char of the file's mode string

# create symlink bar -> foo (bar's content is the 3-char string "foo")
$ ln -s foo bar

# show status of file foo
$ stat foo

# show status of symlink bar
$ stat bar
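The same steps from Python, as a minimal sketch: os.symlink(target, link_name) wraps symlink(2) (like `ln -s foo bar`) and os.lstat() wraps lstat(2). It assumes a Unix-like system; the scratch directory from tempfile is only there so the example doesn't touch real files.

```python
import os
import stat
import tempfile

d = tempfile.mkdtemp()            # scratch dir so we don't touch real files
foo = os.path.join(d, "foo")
bar = os.path.join(d, "bar")
with open(foo, "w") as fh:
    fh.write("hello\n")

# Same as `ln -s foo bar`. The relative target "foo" is resolved
# relative to the directory that contains bar.
os.symlink("foo", bar)

link_st = os.lstat(bar)           # look at the link itself, not at foo
mode = stat.filemode(link_st.st_mode)
print(mode[0])                    # 'l': the leftmost char `ls -l` shows
print(link_st.st_size)            # 3: the link stores the string "foo"
```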

Symlinks are sometimes called "delayed name resolution" files, b/c at
creation time, the target you ask the symlink to point to is NOT checked or
verified at all.  The target may not exist now, or ever; or it may exist now
and be removed later.  A symlink whose target doesn't exist is called a
"broken" or "dangling" symlink.

Symlinks can point to directories, other files, other symlinks, anything.
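Because the target isn't checked, creating a dangling symlink succeeds. A small sketch (Unix-like system assumed): os.path.lexists() uses lstat and sees the link itself, while os.path.exists() follows it and reports False until the target shows up.

```python
import os
import tempfile

d = tempfile.mkdtemp()
link = os.path.join(d, "dangling")
target = os.path.join(d, "no-such-file")

# symlink(2) doesn't check the target, so this succeeds.
os.symlink("no-such-file", link)

link_exists = os.path.lexists(link)       # True: lstat() on the link works
target_reachable = os.path.exists(link)   # False: following it hits ENOENT

# Creating the target later "fixes" the link without touching the link.
with open(target, "w") as fh:
    fh.write("now I exist\n")
reachable_later = os.path.exists(link)    # True
```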

Symlinks can point to any path, absolute or relative, including paths with
components like '.', '..', etc.

Symlinks can create a cycle or loop (e.g., a -> b, b -> a).  Ideally the OS
would prevent such loops, but it can't (targets aren't checked at creation
time), so instead it detects them during pathname lookup.
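A quick way to see this on a Unix-like system, as a sketch: both symlink(2) calls succeed, and the cycle only surfaces as an ELOOP error when open(2) tries to resolve the path.

```python
import errno
import os
import tempfile

d = tempfile.mkdtemp()
a = os.path.join(d, "a")
b = os.path.join(d, "b")

# Both calls succeed: symlink(2) never checks its target.
os.symlink("b", a)    # a -> b
os.symlink("a", b)    # b -> a

# The cycle is only noticed when the kernel resolves the pathname.
try:
    with open(a):
        pass
    loop_errno = None
except OSError as e:
    loop_errno = e.errno

print(loop_errno == errno.ELOOP)   # True
print(os.strerror(errno.ELOOP))    # e.g. "Too many levels of symbolic links"
```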

Sidebar:

- OSs are complex and hard to dev/debug
- OS designers like to use simple data structures and algs
- fancier ones can be harder to debug, have larger mem footprint
- data structures: lists (un/sorted), tables, very few tree structures
	(e.g., red-black trees, B-trees)
- algorithms: simple traversal, some log(n) binary-like algs


Detecting loops:
1. When the OS starts resolving a single pathname (e.g., in open(2)),
2. it starts a counter of how many symlinks were traversed.
3. Each time the lookup routine crosses a symlink: counter++
4. If counter > MAX: stop and fail with errno ELOOP
5. MAX is a small constant (e.g., 40 on Linux; POSIX only requires at least
   8); some OSs let you configure it (within limits)

So this catches even short cycles (a->b, b->a), but it will also reject long
non-cyclic chains of symlinks (a false positive).
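The counter scheme above can be sketched as a toy resolver. This is a hypothetical model, not kernel code: the namespace is a flat dict (no directories), `resolve` and `MAX_SYMLINKS` are illustrative names, and the limit of 8 is just an assumed value. It shows both outcomes: a real cycle is caught, and a long acyclic chain is rejected too.

```python
import errno
import os

MAX_SYMLINKS = 8   # hypothetical limit (POSIX's minimum); Linux uses 40

def resolve(namespace, name, max_links=MAX_SYMLINKS):
    """Follow symlinks in a flat, hypothetical namespace (no directories).

    namespace maps a name to ("file", data) or ("link", target_name).
    Returns the name of the non-link entry the chain ends at.
    """
    count = 0                           # symlinks crossed during this lookup
    while True:
        kind, value = namespace[name]   # KeyError plays the role of ENOENT
        if kind != "link":
            return name
        count += 1                      # crossed another symlink
        if count > max_links:
            raise OSError(errno.ELOOP, os.strerror(errno.ELOOP), name)
        name = value

ns = {
    "foo": ("file", b"data"),
    "bar": ("link", "foo"),             # bar -> foo
    "a": ("link", "b"),                 # a -> b
    "b": ("link", "a"),                 # b -> a (cycle)
}
# A long but acyclic chain: c0 -> c1 -> ... -> c9 -> foo
for i in range(10):
    ns[f"c{i}"] = ("link", f"c{i + 1}" if i < 9 else "foo")

print(resolve(ns, "bar"))   # foo
print(resolve(ns, "c5"))    # foo (5 links, under the limit)
```

Both `resolve(ns, "a")` (a real cycle) and `resolve(ns, "c0")` (10 links, no cycle) raise ELOOP, since the counter can't tell the two apart.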

Symlinks have their own inode object + inode# on disk

# shows 2 diff inodes
$ stat foo; stat bar

Symlink can be thought of as a "regular" file whose content may (eventually)
be translated as if it were a pathname.
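The `stat foo; stat bar` comparison, sketched in Python on a Unix-like system: lstat(2) on the link reports the link's own inode number, which differs from foo's, while stat(2) follows the link and lands on foo's inode.

```python
import os
import tempfile

d = tempfile.mkdtemp()
foo = os.path.join(d, "foo")
bar = os.path.join(d, "bar")
with open(foo, "w") as fh:
    fh.write("some file data\n")
os.symlink("foo", bar)

foo_ino = os.lstat(foo).st_ino
bar_ino = os.lstat(bar).st_ino         # the symlink's own inode
followed_ino = os.stat(bar).st_ino     # stat() follows bar to foo's inode

print(foo_ino != bar_ino)       # True: two different inodes
print(followed_ino == foo_ino)  # True
```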

To create a symlink: use symlink(2), or ln -s on the CLI (ln(1))
To read the contents of a symlink (what it points to): use readlink(2)
- commands like ls will use readlink so they can show you what the symlink
  points to.

Regular files are read using read(2); symlinks are read using readlink(2).
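A sketch of the difference (Unix-like system assumed): os.readlink() wraps readlink(2) and returns the link's own content, the pathname "foo", while opening and reading bar follows the link and returns foo's data.

```python
import os
import tempfile

d = tempfile.mkdtemp()
with open(os.path.join(d, "foo"), "w") as fh:
    fh.write("contents of foo\n")
bar = os.path.join(d, "bar")
os.symlink("foo", bar)

target = os.readlink(bar)      # readlink(2): the link's own content
with open(bar) as fh:          # open(2) follows the link...
    data = fh.read()           # ...so read(2) returns foo's data

print(target)   # foo
print(data)     # contents of foo
```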

The stat(2) syscall will follow symlinks.  If you want to get the stat data
on the actual symlink, use the lstat(2) syscall.

# CLI: by default uses lstat
$ stat bar
# CLI: tell it to follow links
$ stat -L bar
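The same contrast at the syscall level, as a sketch: os.stat() (stat(2)) describes the 10-byte regular file foo, while os.lstat() (lstat(2)) describes the symlink itself, whose size is the length of the pathname it stores.

```python
import os
import stat
import tempfile

d = tempfile.mkdtemp()
foo = os.path.join(d, "foo")
bar = os.path.join(d, "bar")
with open(foo, "w") as fh:
    fh.write("0123456789")          # 10 bytes
os.symlink("foo", bar)

followed = os.stat(bar)   # stat(2), like `stat -L bar`: describes foo
own = os.lstat(bar)       # lstat(2), like plain `stat bar`: describes the link

print(stat.S_ISREG(followed.st_mode), followed.st_size)  # True 10
print(stat.S_ISLNK(own.st_mode), own.st_size)            # True 3
```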

* file/symlink content

Where is the content of the symlink?

1. A regular file has an inode (metadata).
2. A regular file has pointers to the blocks/sectors where the actual file
   data resides on the disk.

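Older file systems like MS-DOS (FAT) stored each file as a linked list of
LBAs: each block has a pointer to the next, and a special LBA value (e.g.,
all 1s) marks EOF.  FAT keeps these "next" pointers in a separate table (the
File Allocation Table) rather than reserving bytes at the end of each data
block, but the structure is the same linked list.  This was fragile: any
problem in the disk could "cut off" a file.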

This was also inefficient: seeking to the middle of a file, or appending to
it, meant walking the chain, so such ops had O(n) complexity (n = #blocks in
file).
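A minimal sketch of why this is O(n), using a hypothetical `next_table` dict that maps each LBA to the next one (whether that pointer lives at the end of each block or in a FAT-style table doesn't change the cost). `END_OF_CHAIN`, `nth_block`, and `last_block` are illustrative names, not a real on-disk format.

```python
END_OF_CHAIN = 0xFFFF   # hypothetical EOF marker ("all 1s")

def nth_block(next_table, first_block, n):
    """Return the LBA of the n-th block of a file stored as a chain.

    Reaching block n means following n pointers: O(n).
    """
    lba = first_block
    for _ in range(n):
        lba = next_table[lba]
        if lba == END_OF_CHAIN:
            raise IndexError("block past end of file")
    return lba

def last_block(next_table, first_block):
    """Appending needs the last block: another walk of the whole chain."""
    lba = first_block
    while next_table[lba] != END_OF_CHAIN:
        lba = next_table[lba]
    return lba

# A 4-block file scattered on disk: 7 -> 3 -> 9 -> 4 -> EOF
table = {7: 3, 3: 9, 9: 4, 4: END_OF_CHAIN}
print(nth_block(table, 7, 2))   # 9
print(last_block(table, 7))     # 4
```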

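Unix file systems such as BSD FFS (Fast File System) instead keep, in the
inode, pointers to direct blocks plus 1st (single), 2nd (double), and, in
FFS, 3rd (triple) indirect blocks: an indirect block is an LBA full of
pointers to data blocks or to further indirect blocks.  This makes seeking
and appending much more efficient: O(log n), i.e., only a few pointer hops
per lookup.  It also lets files grow very large while the inode stays small.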
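A sketch of how a file-relative block number maps onto this scheme. The parameters are assumptions chosen to resemble classic FFS/ext2 with 4KB blocks: 12 direct pointers, 4-byte LBAs (so 1024 pointers per indirect block), and single/double/triple indirect levels. `locate` is a hypothetical helper; real file systems differ in details.

```python
N_DIRECT = 12                     # assumed number of direct pointers
BLOCK_SIZE = 4096                 # assumed block size in bytes
PTRS_PER_BLOCK = BLOCK_SIZE // 4  # 4-byte LBAs -> 1024 pointers per block

def locate(file_block):
    """Map a file-relative block number to (level, indices).

    level 0 = direct pointer, 1 = single, 2 = double, 3 = triple indirect;
    indices are the pointer slots to follow at each level.
    """
    b = file_block
    if b < N_DIRECT:
        return 0, (b,)
    b -= N_DIRECT
    if b < PTRS_PER_BLOCK:
        return 1, (b,)
    b -= PTRS_PER_BLOCK
    if b < PTRS_PER_BLOCK ** 2:
        return 2, divmod(b, PTRS_PER_BLOCK)
    b -= PTRS_PER_BLOCK ** 2
    if b < PTRS_PER_BLOCK ** 3:
        top, rest = divmod(b, PTRS_PER_BLOCK ** 2)
        mid, low = divmod(rest, PTRS_PER_BLOCK)
        return 3, (top, mid, low)
    raise ValueError("block beyond maximum file size")

max_blocks = N_DIRECT + PTRS_PER_BLOCK + PTRS_PER_BLOCK**2 + PTRS_PER_BLOCK**3
print(locate(5))            # (0, (5,))
print(locate(12))           # (1, (0,))
print(locate(12 + 1024))    # (2, (0, 0))
print(max_blocks * BLOCK_SIZE / 2**40)   # a bit over 4.0 (TiB)
```

Each extra level multiplies the reachable size by 1024 but adds only one pointer hop, which is where the logarithmic cost comes from.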

"you can solve any problem in computer science by adding another level of indirection"

Note that every inode has some room for LBA pointers.  Typical on-disk inode
sizes are 32B, 64B, 128B, even 256B (depending on the file system); 128B
(e.g., classic ext2, UFS1) and 256B (e.g., ext4's default, UFS2) are common
today.  Not all bytes are used; some are reserved for future expansion.

What happens if I have a really small file of just a few bytes?
1. I can alloc a sector (512B), point to it from the inode, and then put the
   bytes in that sector.
2. But if that file never grows beyond a few bytes, most of the sector is
   wasted, and every read costs an extra disk access.
3. Optimization: store small files directly in the inode, in the space
   normally used for LBA pointers
	more efficient: don't need to alloc an LBA
	faster: file content is right there inside the inode
- yes, I can do this for symlinks as well.
4. This is called a "short file" or "short symlink" (Linux ext2/ext4 call
   the latter a "fast symlink"; ext4's "inline data" feature does this for
   small files).
5. If the file grows, you'll need to alloc actual LBAs and move the
   in-inode data over.
6. How would you tell a short inode from one that has valid LBA pointers in
   the same location in the inode?
- reserve 1 bit for a flag to distinguish "embedded" content vs. not
- or assume that if the file size is < X, it's embedded, else it's stored in
  external LBAs
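A toy model of the flag-bit approach above. Everything here is hypothetical: the `Inode` class, `FLAG_INLINE`, and a `disk` list standing in for LBAs. The 60-byte pointer area is an assumption borrowed from ext2's `i_block` (15 four-byte pointers). For brevity the non-inline case uses a single block. The size-threshold alternative would test `inode.size <= PTR_AREA_BYTES` instead of the flag.

```python
from dataclasses import dataclass, field

PTR_AREA_BYTES = 60   # assumed: 15 four-byte block pointers, like ext2's i_block
FLAG_INLINE = 0x1     # hypothetical flag bit: content is embedded in the inode

@dataclass
class Inode:
    flags: int = 0
    size: int = 0
    area: bytearray = field(default_factory=lambda: bytearray(PTR_AREA_BYTES))

def write_content(inode, data, disk):
    """Embed data in the pointer area if it fits, else put it in a block."""
    inode.size = len(data)
    if len(data) <= PTR_AREA_BYTES:
        inode.flags |= FLAG_INLINE
        inode.area[:len(data)] = data             # no LBA allocated at all
    else:
        inode.flags &= ~FLAG_INLINE
        disk.append(bytes(data))                  # "allocate" one block
        lba = len(disk) - 1
        inode.area[0:4] = lba.to_bytes(4, "little")  # first pointer slot

def read_content(inode, disk):
    if inode.flags & FLAG_INLINE:
        return bytes(inode.area[:inode.size])     # content is in the inode
    lba = int.from_bytes(inode.area[0:4], "little")
    return disk[lba][:inode.size]                 # follow the pointer

disk = []
link = Inode()
write_content(link, b"../lib/libfoo.so.1", disk)  # a short symlink target
big = Inode()
write_content(big, b"x" * 100, disk)              # too big: needs a block
print(read_content(link, disk), len(disk))        # b'../lib/libfoo.so.1' 1
```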

* hard links

A "hard" link is another name for the same inode.  Note that a symlink has a
different (new) inode.

# create hard link
$ ln foo xyz # no -s flag
or use link(2) syscall

A hard link is just a directory entry (with a different name) that points to
an existing inode.

Since we now have 2+ names pointing to the same inode, we start to track the
number of pointers to that inode: this is called a "reference count" and
shows up in the num_links field of the inode (st_nlink in stat(2) output).

e.g., foo and xyz will both point to the same inode, now with num_links==2.
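A sketch of `ln foo xyz` via os.link(), which wraps link(2), on a Unix-like system: both names report the same inode number, and the link count is 2.

```python
import os
import tempfile

d = tempfile.mkdtemp()
foo = os.path.join(d, "foo")
xyz = os.path.join(d, "xyz")
with open(foo, "w") as fh:
    fh.write("shared data\n")

os.link(foo, xyz)            # like `ln foo xyz` (no -s): a second name

st_foo = os.stat(foo)
st_xyz = os.stat(xyz)
same_inode = st_foo.st_ino == st_xyz.st_ino
nlink = st_foo.st_nlink
print(same_inode, nlink)     # True 2
```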

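Hard links keep pointing to the same inode even if they're moved or created
anywhere within the file system.  They can't cross file systems, though
(inode numbers are only unique per file system), and most OSs don't allow
hard links to directories.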

The OS tracks num_links b/c it frees the inode + data blocks only after the
LAST name for the file is removed (and no process still has it open).  That's
why in UNIX, the syscall to "delete" a file is called unlink(2): you're just
removing one name from a dir, not immediately deleting the file's content.
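A sketch of unlink semantics on a Unix-like system. Removing one of two names drops the link count to 1, and the data stays reachable via the other name. It also shows a related detail: after the last name is unlinked, a still-open descriptor keeps the file alive with st_nlink == 0, and the OS frees the inode and blocks only once that descriptor is closed.

```python
import os
import tempfile

d = tempfile.mkdtemp()
foo = os.path.join(d, "foo")
xyz = os.path.join(d, "xyz")
with open(foo, "w") as fh:
    fh.write("still here\n")
os.link(foo, xyz)                     # nlink == 2

os.unlink(foo)                        # remove one name: nlink 2 -> 1
nlink_after_unlink = os.stat(xyz).st_nlink
with open(xyz) as fh:
    via_other_name = fh.read()        # data still reachable via xyz

# Keep the file open, then remove its last name.
fh = open(xyz)
os.unlink(xyz)                        # nlink 1 -> 0, but an fd still refers to it
nlink_while_open = os.fstat(fh.fileno()).st_nlink
data_while_open = fh.read()
fh.close()                            # last reference gone: inode + blocks freed
name_gone = not os.path.exists(xyz)
```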