* NFS intro, cont.

NFSd on server calls the lower f/s (e.g., ext4):
- NFSd treats the disk-based f/s like a stackable f/s

NFS clients/servers are often in-kernel
- but there are a number of user-level implementations (e.g., NFS Ganesha)

NFSv2:
- No "seek" message: b/c it'd be state
- no open/close

NFS_READ can return updated attributes, to let the client know that some
stat(2) data of the file has changed:
- o/w, clients have to set cache expirations for file/dir meta-data, and no
  one value works well: either you send too many network updates, or you
  read stale data.
- this is "piggy-backing" some more info on top of an already-needed reply

* How do applications' syscalls translate into client/server protocol msgs

Application     NFS Client      NFS Server
---------------+---------------+-------------------
read(2)         NFS_READ        nfsd_read -> ext4_read
---------------+---------------+-------------------
open            NFS_LOOKUP      nfsd_lookup -> ext4_lookup
                NFS_GETATTR     nfsd_getattr -> ext4_getattr
                ...             ->permission
read            NFS_READ        nfsd_read -> ext4_read
                                (will update inode atime)
close           NFS_SETATTR     nothing...
                (if attrs changed)

* Implications

Client has to perform permission checking.  Anyone can hack a client OS to
bypass permission checking entirely.

When server sees the first ->lookup:
- lookup and cache dentry+inode on server
- should server hold a refcnt on D+I?!  If so, when will it be released?

When server sees NFS_READ:
- needs a struct file object, so has to filp_open, so it can issue fop->read
- problem: the file then remains open on the server, and no client msg will
  ever cause it to be closed
- Solutions:
  1. on every NFS_READ: do filp_open, ->read, filp_close: too much burden
     on the server for repeated reads
  2. keep file structs open for some time T, then close them.  May have to
     reopen, if the file is being re-read.
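Solution 2 above can be sketched as a small cache of open files with a
time-to-live.  This is an illustrative user-level sketch, not real nfsd
code; the names (OpenFileCache, TTL) and the use of Python file objects in
place of kernel struct file are assumptions.

```python
import time

TTL = 30  # seconds to keep a server-side open file around (assumed value)

class OpenFileCache:
    def __init__(self):
        self.files = {}  # path -> (open file object, last-use timestamp)

    def read(self, path, offset, length):
        entry = self.files.get(path)
        if entry is None:
            # first read, or a prior entry expired: reopen the file,
            # like nfsd calling filp_open()
            entry = (open(path, "rb"), time.monotonic())
            self.files[path] = entry
        f, _ = entry
        self.files[path] = (f, time.monotonic())  # refresh last-use time
        f.seek(offset)          # NFS_READ carries an explicit offset, so
        return f.read(length)   # the server keeps no per-client seek state

    def expire(self):
        # called periodically: close files unused for more than TTL
        now = time.monotonic()
        for path, (f, last) in list(self.files.items()):
            if now - last > TTL:
                f.close()
                del self.files[path]
```

Note how the explicit offset in each read is what makes the "no seek
message" design workable: reopening a file loses no protocol-visible state.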
* Filehandles (32 bytes long)

- FH is created by the server only, then sent to clients
- clients can use a FH to refer to the "same" object, but are not allowed
  to modify the FH
- FH is opaque to NFS clients

FH construction (typical):
- fsid: small number, starting at 1
- inode number: typically starts at 1; a small number, but can go up to
  millions on a large f/s
- inode generation number (igen): typically 1
- sometimes other stuff is added to the FH

Meaning: it is easy to fake valid fhandles and read data you have no right
to.

When server receives a FH:
- it interprets the FH (inum, fsid, igen, etc.)
- if ANY of them don't match, return ESTALE error

Proposed "solution": fill the FH w/ random bits:
- harder for bad clients to guess
- but now the server has to keep state, mapping those random numbers to the
  actual objects

* "root squash"

1. In NFSv2, clients send the effective UID+GID as part of the RPC header
   - this is called the "UNIX permissions model"
   - problem: anyone can fake UIDs and GIDs (e.g., sudo, su)
     - can fake being any user on the client
     - worse: can you fake being UID 0 (root)?
2. Solution:
   - on the server, you configure a file called /etc/exports
     format: /path IPaddr(s) flags
       /home 130.245.0.0/16 ro,root_squash
   - flag: root_squash
     - if the server gets an op w/ UID 0, change the effective uid to -2
       (user "nobody")
     - so root on a client has very limited file access
     - unless the server turns off root_squash, by setting the
       "no_root_squash" option

More modern OSs (Linux too) support "UID mapping" that can "squash" any
UID, not just UID 0.
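The typical FH construction and the server-side ESTALE check above can be
sketched as packing and validating a few fields.  The field widths, byte
order, and helper names here are assumptions for illustration, not the
actual NFSv2 wire format.

```python
import struct

FH_SIZE = 32  # NFSv2 filehandles are 32 bytes

def make_fh(fsid, inum, igen):
    # server-side: pack the identifying fields, zero-pad to 32 bytes;
    # the client must treat the result as opaque
    return struct.pack("<III", fsid, inum, igen).ljust(FH_SIZE, b"\0")

def decode_fh(fh, known_fsids, inode_table):
    # server-side: interpret an incoming FH; if ANY field fails to
    # match live state, the reply is ESTALE
    fsid, inum, igen = struct.unpack_from("<III", fh)
    if fsid not in known_fsids:
        return "ESTALE"
    inode = inode_table.get(inum)
    if inode is None or inode["igen"] != igen:
        return "ESTALE"  # inode was freed/reused: generation differs
    return inode
```

Because fsid, inum, and igen are all small, predictable integers, guessing
a valid handle by brute force is easy, which is exactly the weakness the
notes point out.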
* Unlink over NFS

Application     NFS Client      NFS Server
---------------+---------------+-------------------
unlink          NFS_REMOVE      ->unlink
---------------+---------------+-------------------

>> PROBLEM SUPPORTING POSIX (read/write an open-unlinked file)

open            NFS_LOOKUP      ->lookup
                NFS_GETATTR     ->getattr
                                ->permission
unlink          NFS_REMOVE      ->unlink
read            NFS_READ        -ESTALE (violates POSIX)
close           nothing
---------------+---------------+-------------------

>> Solution

open            NFS_LOOKUP      ->lookup
                NFS_GETATTR     ->getattr
                                ->permission
unlink          NFS_RENAME("file", ".nfsXXXXXX")
                                ->rename("file", ".nfsXXXXXX")
stat("file")    NFS_LOOKUP("file")
                                -ENOENT
read            NFS_READ        ->read() successful
close           NFS_UNLINK(".nfsXXXXXX")
                                ->unlink(".nfsXXXXXX")

What if the client program or OS dies before close(2)?
- lots of .nfs* leftovers
- users/admins may delete those files too early (breaks access to the
  still-open file)
- better: admins set up scripts to delete .nfs* files older than N days

* One more piece of state, at the RPC level

1. client issues an RPC msg, waits N seconds for a response
2. if no response, resend the message and wait N seconds
3. after M tries, abort and return an error
- every RPC msg has its own unique ID (RPC ID): can be a number 1, 2, ...

Scenario:
1. client sends RPC to server
2. server receives the RPC message, unpacks it
3. server performs the op on the disk f/s: op succeeds
   (whether the op is NFS_GETATTR or NFS_UNLINK)
4. server sends reply "NFS_OK" to client
5. reply never gets to client!
6. after N seconds, client reissues the RPC
7. server gets the resent RPC, unpacks it
8. server performs the op on the disk f/s: NOW WHAT HAPPENS?
   - if op is NFS_GETATTR: it succeeds again
   - if op is NFS_UNLINK: server gets ENOENT
9. server replies to client based on step 8:
   - NFS_GETATTR: reply is OK to client
   - NFS_UNLINK: reply is ENOENT, even though the file was successfully
     unlinked the first time

The problem has to do with two classes of operations:
1. Idempotent: op can be repeated and gets the same result:
   NFS_GETATTR/READ
2. Non-idempotent: op cannot be repeated: NFS_UNLINK/RENAME/MKDIR/etc.
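The client-side retry policy in steps 1-3 above can be sketched as a loop:
send, wait N seconds, resend, give up after M tries.  The message framing
and parameter values here are hypothetical; real NFSv2 clients use Sun RPC,
typically over UDP.

```python
import socket

N_SECONDS = 2   # per-try timeout (assumed value)
M_TRIES = 3     # give up after this many attempts (assumed value)

def call_rpc(sock, request, rpc_id):
    # prefix the payload with the RPC ID so replies can be matched
    msg = rpc_id.to_bytes(4, "big") + request
    sock.settimeout(N_SECONDS)
    for attempt in range(M_TRIES):
        sock.send(msg)              # (re)send the SAME RPC ID each time
        try:
            reply = sock.recv(4096)
        except socket.timeout:
            continue                # no response: resend and wait again
        if reply[:4] == msg[:4]:    # match reply to request by RPC ID
            return reply[4:]
    raise TimeoutError("RPC %d: no reply after %d tries" % (rpc_id, M_TRIES))
```

The key detail for the scenario above is that a retransmission reuses the
same RPC ID, which is what lets the server recognize a duplicate.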
* Solution to non-idempotent ops

- keep state on the server, in memory
- every RPC has an "ID" that the client assigns
  - that way the client can match replies with requests
- server caches the result of every reply to a non-idempotent op
- when the server sees a request for a cached RPC ID that was performed
  before, it responds with the CACHED result (not performing the op on
  disk again)
- have to keep cache entries for some time, to ensure that it won't be
  shorter than the client's RPC timeout
- but if the server crashes, all that RPC state is lost, and clients get
  errors upon server reboot
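The reply cache described above can be sketched as a small in-memory map
keyed by RPC ID.  The names (DupReqCache, CACHE_TTL) and structure are
illustrative, not real nfsd code.

```python
import time

CACHE_TTL = 120  # must comfortably exceed the client's total retry window

class DupReqCache:
    def __init__(self, perform_op):
        self.perform_op = perform_op  # actually runs the op on the disk f/s
        self.cache = {}               # rpc_id -> (result, timestamp)

    def handle(self, rpc_id, op, args):
        hit = self.cache.get(rpc_id)
        if hit is not None:
            return hit[0]          # retransmission: replay the cached reply
        result = self.perform_op(op, args)
        self.cache[rpc_id] = (result, time.monotonic())
        return result

    def expire(self):
        # drop entries old enough that no client can still be retrying
        now = time.monotonic()
        for rpc_id, (_, ts) in list(self.cache.items()):
            if now - ts > CACHE_TTL:
                del self.cache[rpc_id]
```

With this cache, a retransmitted NFS_UNLINK gets the original NFS_OK
replayed instead of a spurious ENOENT.  (A real server caches only
non-idempotent ops; this sketch caches everything for brevity.)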