* symbolic links

Symlinks are indirect pointers to other files, directories, etc. (even to other symlinks).
Use symlink(2) and ln(1) to create symlinks.
ls -l shows an 'l' as the leftmost character of a symlink's mode.

# create symlink bar -> foo (the link's content is the 3 chars "foo")
$ ln -s foo bar
# show status of file foo
$ stat foo
# show status of symlink bar
$ stat bar

Symlinks are sometimes called "delayed name resolution" files, because the target you give the symlink at creation time is NOT checked or verified at all. The target may not exist now, or ever. It may also exist now and be removed later. A symlink whose target does not exist is called a "broken" or "dangling" symlink.

Symlinks can point to directories, other files, other symlinks, anything. They can point to any path, including paths with components such as '.' and '..'.

Symlinks can therefore form a cycle or loop. Ideally the OS would ensure that no such loops exist, but it can't: targets are not checked at creation time, and a loop can appear later as files are renamed or replaced. So instead the OS tries to detect loops at lookup time.

Sidebar:
- OSs are complex and hard to develop and debug
- OS designers like to use simple data structures and algorithms
  - fancier ones can be harder to debug and have a larger memory footprint
- data structures: lists (unsorted/sorted), tables, and very few tree structures (e.g., red-black trees, B-trees)
- algorithms: simple traversal, some O(log n) binary-search-like algorithms

Detecting loops:
1. When the OS starts resolving a single pathname (e.g., upon open(2)),
2. it starts a counter of how many symlinks have been traversed.
3. Each time the lookup routine crosses into another symlink, counter++.
4. If counter > MAX: stop and fail with errno ELOOP.
5. MAX is usually a small constant (POSIX guarantees at least 8; Linux allows 40); some OSs let you configure it (within limits).

So the scheme above catches even small cycles (a->b, b->a), but it also rejects long non-cyclic chains of symlinks: the counter cannot tell a real loop from a merely long chain.

Symlinks have their own inode object and inode number on disk:

# shows 2 different inodes
$ stat foo; stat bar

A symlink can be thought of as a "regular" file whose content may (eventually) be interpreted as a pathname.
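A minimal sketch of the behavior described above, using Python's os module, whose os.symlink, os.readlink, os.stat and os.lstat wrap symlink(2), readlink(2), stat(2) and lstat(2). It assumes a POSIX system (Linux, a BSD, or macOS) and works in a throwaway temporary directory; foo and bar mirror the notes, while dangling, a and b are made-up names for illustration.

```python
import errno
import os
import tempfile


def symlink_demo():
    """Create foo, bar -> foo, a dangling link, and an a <-> b cycle in a
    scratch directory, and report what the OS says about each."""
    with tempfile.TemporaryDirectory() as d:
        foo = os.path.join(d, "foo")
        bar = os.path.join(d, "bar")
        with open(foo, "w") as f:
            f.write("hello\n")
        os.symlink("foo", bar)  # like: ln -s foo bar (relative target)

        foo_ino = os.stat(foo).st_ino
        bar_own_ino = os.lstat(bar).st_ino       # lstat: the link itself
        bar_followed_ino = os.stat(bar).st_ino   # stat: follows the link to foo

        # The target is never checked at creation time, so this succeeds.
        dangling = os.path.join(d, "dangling")
        os.symlink("no-such-file", dangling)

        # a -> b, b -> a: the cycle is only noticed when a lookup runs.
        a = os.path.join(d, "a")
        b = os.path.join(d, "b")
        os.symlink("b", a)
        os.symlink("a", b)
        try:
            open(a).close()
            cycle_errno = None
        except OSError as e:
            cycle_errno = e.errno

        return {
            "target": os.readlink(bar),                     # link content: "foo"
            "own_inode_differs": bar_own_ino != foo_ino,    # separate inode
            "stat_follows": bar_followed_ino == foo_ino,    # stat lands on foo
            "is_link": os.path.islink(bar),
            "dangling_exists": os.path.exists(dangling),    # follows the link
            "dangling_lexists": os.path.lexists(dangling),  # the link itself
            "cycle_is_eloop": cycle_errno == errno.ELOOP,
        }


if __name__ == "__main__":
    print(symlink_demo())
```

Creating the dangling link succeeds, lstat and stat report different inodes for bar, and opening the a/b cycle fails with ELOOP.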
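The loop-detection steps can also be simulated in user space. This is a toy model, not kernel code. The namespace is a hypothetical flat dict mapping a name to ("file", data) or ("link", target). resolve and MAX_SYMLINKS are made-up names, and the limit of 8 is an assumed value (the POSIX minimum).

```python
import errno

MAX_SYMLINKS = 8  # assumed limit for this toy; real kernels pick their own


def resolve(namespace, name, max_links=MAX_SYMLINKS):
    """Follow symlinks in a flat toy namespace until reaching a non-link.

    Raises OSError(ELOOP) once more than max_links symlinks have been
    crossed during this one lookup, and OSError(ENOENT) for a missing
    name (including the target of a dangling link).
    """
    count = 0  # symlinks traversed so far in this lookup
    while True:
        entry = namespace.get(name)
        if entry is None:
            raise OSError(errno.ENOENT, "No such file or directory", name)
        kind, value = entry
        if kind != "link":
            return name  # reached a real file
        count += 1  # crossed into another symlink
        if count > max_links:
            raise OSError(errno.ELOOP, "Too many levels of symbolic links", name)
        name = value
```

With the limit at 8, the a->b->a cycle fails after 9 crossings. An acyclic chain of 9 links fails the same way, which is the false positive noted above.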
To create a symlink: use the symlink(2) syscall, or the ln(1) CLI (ln -s).
To read the contents of a symlink (what it points to): use readlink(2).
- commands like ls use readlink so they can show you what the symlink points to.
Regular files are read using read(2); symlinks are read using readlink(2).

The stat(2) syscall follows symlinks. If you want the stat data of the symlink itself, use the lstat(2) syscall.

# CLI: by default uses lstat
$ stat bar
# CLI: tell it to follow links
$ stat -L bar

* file/symlink content

Where is the content of the symlink stored?
1. A regular file has an inode (metadata).
2. A regular file has pointers to the blocks/sectors on disk where the actual file data resides.

Older file systems used a long linked list of LBAs, where each block leads to the next. One way is to reserve some bytes at the end of each block to point to the next one. MS-DOS's FAT instead keeps these "next" pointers in a separate table, the File Allocation Table. A special LBA number (e.g., 0 or all 1s) indicates EOF.
This was fragile: any problem on the disk could "cut off" a file.
This was inefficient: seeking to the middle of a file, or appending to it, had O(n) complexity (n = # blocks in the file).

Modern OSs (e.g., BSD FFS -- Fast File System) use direct blocks plus single-, double- and triple-indirect blocks. The inode points to LBAs that themselves hold pointers to data blocks or to other blocks of LBA pointers. This makes seeking and appending much more efficient, O(log n), and also allows files to grow very large.

"You can solve any problem in computer science by adding another level of indirection."

Note that every inode has some room for LBA pointers. Typical on-disk inode sizes are 32B, 64B, 128B, even 256B (depending on the OS and file system); 128B and 256B are the most common today (e.g., ext4 defaults to 256B). Not all bytes are used; some are reserved for future expansion.

What happens if I have a really small file of just a few bytes?
1. I can allocate a sector (512B), point to it from the inode, and put the bytes in that sector.
2. But what if that file never grows beyond a few bytes? Most of the sector is wasted.
3. Optimization: store small files directly in the inode
   - more space-efficient: no need to allocate an LBA
   - faster: the file content is right there inside the inode
   - yes, I can do this for symlinks as well
4. This is called a "short file" or "short symlink" (Linux ext2/3/4 call the latter a "fast symlink").
5. If the file grows, you'll need to allocate actual LBAs and move the in-inode data over.
6. How would you tell a short inode from one that has valid LBA pointers in the same location in the inode?
   - reserve 1 bit for a flag to distinguish "embedded" content from external content, or
   - assume that if the file size is < X, the content is embedded, else it is stored in external LBAs

* hard links

A "hard" link is another name for the same inode. Note that, by contrast, a symlink has a different (new) inode.

# create hard link (no -s flag), or use the link(2) syscall
$ ln foo xyz

A hard link is a directory entry with a different name that points to an existing inode. Since we now have 2+ names pointing to the same inode, we track the number of names pointing to it. This is called a "reference count" and shows up in the inode's num_links field (st_nlink in stat(2) output). E.g., foo and xyz both point to the same inode, which now has num_links == 2.

Hard links keep pointing to the same inode even if they are moved or created anywhere else in the same file system (a hard link cannot cross into a different file system).

The OS tracks num_links because it frees the inode + data blocks only after the LAST reference to the file is deleted (and no process still has the file open). That's why in UNIX the syscall to "delete" a file is called unlink(2): you're just removing one name from a directory, not immediately deleting the file's content.
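A small sketch of hard links and reference counting, again via Python's os wrappers (os.link for link(2), os.unlink for unlink(2)) on an assumed POSIX system. It creates foo and xyz in a temporary directory; hardlink_demo is a hypothetical helper name.

```python
import os
import tempfile


def hardlink_demo():
    """Show st_nlink rising with link(2) and falling with unlink(2)."""
    with tempfile.TemporaryDirectory() as d:
        foo = os.path.join(d, "foo")
        xyz = os.path.join(d, "xyz")
        with open(foo, "w") as f:
            f.write("data\n")
        n_before = os.stat(foo).st_nlink        # 1: one name for the inode
        os.link(foo, xyz)                       # like: ln foo xyz
        same_inode = os.stat(foo).st_ino == os.stat(xyz).st_ino
        n_after = os.stat(foo).st_nlink         # 2: two names, one inode
        os.unlink(foo)                          # removes only the name "foo"
        n_final = os.stat(xyz).st_nlink         # back to 1
        with open(xyz) as f:
            content = f.read()                  # data still reachable via xyz
        return n_before, same_inode, n_after, n_final, content


if __name__ == "__main__":
    print(hardlink_demo())
```

Unlinking foo leaves the inode and its data intact, because xyz still refers to it.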
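Returning to the direct/indirect block pointers described earlier, here is a sketch of how a logical block number maps to a pointer path. The parameters are assumptions for illustration: 12 direct pointers (the count ext2 and classic UFS/FFS inodes use), then single-, double- and triple-indirect blocks, each a 4 KiB block holding 1024 4-byte pointers. block_path is a hypothetical helper.

```python
NDIRECT = 12           # assumed number of direct pointers in the inode
PTRS_PER_BLOCK = 1024  # assumed: 4 KiB indirect blocks of 4-byte LBA pointers


def block_path(n, ndirect=NDIRECT, p=PTRS_PER_BLOCK):
    """Return which inode pointer, and which slots inside indirect blocks,
    lead to logical block n of a file."""
    if n < 0:
        raise ValueError("negative block number")
    if n < ndirect:
        return ("direct", n)                    # pointer stored in the inode
    n -= ndirect
    if n < p:
        return ("single", n)                    # one indirect block
    n -= p
    if n < p * p:
        return ("double", n // p, n % p)        # two levels of indirection
    n -= p * p
    if n < p ** 3:
        return ("triple", n // (p * p), (n // p) % p, n % p)
    raise ValueError("block number beyond maximum file size")
```

Each level of indirection multiplies the reach by 1024, so any block is at most three pointer reads away. That tree depth is the logarithmic cost mentioned above, compared with walking a linked list block by block.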