* struct dentry (include/linux/dcache.h)

struct hlist_bl_node d_hash;    /* lookup hash list */
- used to look up a dentry by (parent, name): the parent is part of the key b/c multiple files of the same name can exist in different directories.

struct list_head d_lru;         /* LRU list */
- as with inodes: an LRU list so we can quickly "purge" older dentries to make room.

struct list_head d_child;       /* child of parent list */
- links this dentry with all its sibling dentries (that are cached).

struct list_head d_subdirs;     /* our children */
- if this dentry is a directory, then all of its child dentries that are cached (files and subdirectories alike).
- d_child, d_subdirs, and d_parent are useful for recursive user-level programs (find, rm -r, chmod -R, ls -R, etc.).

struct qstr d_name;
unsigned char d_iname[DNAME_INLINE_LEN];    /* small names */
- most file names are short, so they can be embedded in d_iname: better CPU cache locality.
- longer file names live in a separate buffer reached through d_name.name (slower mem/cpu access, crossing a pointer).
- for short names, VFS assigns "dentry->d_name.name = dentry->d_iname", so d_name.name is always the way to read the name.

* dentry_operations

int (*d_revalidate)(struct dentry *, unsigned int);
- VFS calls the f/s's ->d_revalidate after finding a cached entry in the dcache, asking the f/s to certify whether the dentry is still valid.
- ->d_revalidate returns: 0 for invalid, 1 for valid, and -error for any error. If VFS gets back "invalid dentry", it discards the cached entry (e.g., calling ->d_release and ->d_delete), and then issues a new inode->lookup.
- optional method: only some f/s need it, mainly network/distributed f/s.
- most net/dist f/s won't send a net msg for each d_revalidate, to reduce network traffic: instead, they send a message every few seconds, or according to some cache retention policy.
int (*d_weak_revalidate)(struct dentry *, unsigned int);

int (*d_hash)(const struct dentry *, struct qstr *);
- custom hashing of names into the dcache, to avoid hash table (HT) collisions.

int (*d_compare)(const struct dentry *, unsigned int, const char *, const struct qstr *);
- custom name comparison fxn for the f/s (e.g., for case-insensitive file names).

int (*d_delete)(const struct dentry *);
- called when the last reference to the dentry is dropped (refcount reaches 0); the return value tells the dcache whether to keep the dentry cached (0) or discard it right away (1).

int (*d_init)(struct dentry *);

void (*d_release)(struct dentry *);
- called right before the dentry is deallocated: e.g., to kfree the dentry's void* private ptr (d_fsdata), in case you put something in.

void (*d_prune)(struct dentry *);

void (*d_iput)(struct dentry *, struct inode *);
- called when a dentry releases its inode, so the inode's refcount is decremented by 1; a f/s that defines this method must call iput() itself.

struct vfsmount *(*d_automount)(struct path *);
int (*d_manage)(const struct path *, bool);
struct dentry *(*d_real)(struct dentry *, const struct inode *);

* struct super_block

Many of the same principles as with other data structures.

struct dentry *s_root;
- the "/" root dentry where lookups begin in THIS file system.

inode->rename op takes (I1, D1, I2, D2, flags), where I1/I2 are the old and new parent directory inodes and D1/D2 the old and new dentries. In VFS code:
1. lock I1
2. lock I2
3. call ->rename
4. unlock I2
5. unlock I1

Problem: if I1==I2, self deadlock.
- need to distinguish b/t the cases I1==I2 and I1!=I2.

If I1!=I2, two threads can deadlock: one grabbed the lock on I1 and waits for the I2 lock; the other locked I2 and waits on I1.
- need to avoid this using a "deadlock avoidance" technique.
- impose a fixed order on all locked objects, e.g., always lock the inode with the LOWER pointer address first.

The above works well for renaming FILES. But renaming dirs is more complex: one rename op can cause a large change to the namespace tree. Locking the whole namespace is too expensive (prone to DoS attacks). Locking just the parts being moved, or the common ancestors of the src+dst dirs, is also expensive. Linux implements a much simpler policy, only when renaming directories ACROSS two different directories.
Policy is: serialize all such directory renames, so only one can take place at a time in a given f/s. Such renames must grab the per-superblock mutex called s_vfs_rename_mutex.

Misc questions:

1. unlink("/a/b/c/d.txt"): user has perm to dir c, but not to dir b.
   Without search (execute) permission on dirs a and b, the user won't be able to get into dir 'c' to unlink the file. This is the POSIX standard behavior.

2. stat("../foo/abc.txt"), or $ ls -l ./a.out: how do relative pathnames work?
   Relative pathnames always start their lookup from the "current working directory" (CWD). Change it with cd(1) or chdir(2). The kernel finds the cwd through struct task_struct *current (stored as current->fs->pwd). Note that the cwd holds a reference on its dentry (refcount increased), so the dentry cannot be freed while a task is using it as its cwd.

3. Extended attributes: these are KV pairs attached at the f/s level to the inode.
   Different from apps that store metadata inside the files: databases store tables/cells and indices; mp3 files contain extra metadata inside the file (performer, album, year, etc.). These are not extended attributes at the f/s level. Unix model: all files are just arbitrary byte sequences (the OS doesn't look at or care to interpret the bytes).