[Unionfs] Anybody looking at NFS exporting a unionfs ?

Jesse I Pollard pollard at ccs.nrl.navy.mil
Thu Sep 6 10:06:32 EDT 2007


Josef Sipek wrote:
> On Wed, Sep 05, 2007 at 01:03:43PM -0400, Jesse I Pollard wrote:
>   
>> David P. Quigley wrote:
>>     
>>> It is worth noting that this might not be a trivial implementation. In
>>> the past to ensure that this functionality was correct we needed some
>>> sort of persistent inode store. This may not be true anymore but if it
>>> is then it isn't as simple as implementing 3 functions.
>>>   
>>>       
>> That was why I was trying the 2.6.20-rc6-odf1 release. It uses an
>> auxiliary filesystem to generate and track only the inode numbers, but
>> the export capability had already disappeared. There ARE comments
>> embedded in the unionfs 2.1.2 release that refer to the ODF regarding
>> NFS support.
>>
>> I would definitely be interested.
>>     
>
> ODF was an experimental branch to see if the concept makes sense. ODF will
> come back in the next few months :)
>
>   
>> One alternative to the odf would be to add the largest possible inode
>> number from the top level FS + 1 to an inode number from the next level
>> down. This would introduce an "inode offset base" value to the table of
>> branches, used to add to/subtract from the inode number, but would
>> guarantee unique inodes as far as unionfs was concerned. Might add other
>> numbers (maximum number of inodes in the branch fs) to make it easier to
>> recalculate offsets during remount.  It also wouldn't easily work for
>> writable branches that can dynamically add inode space.
>>     
>
> I'm afraid that will not work. In the kernel, all the inode numbers are
> 64 bits long, and there's no way for anyone to know the range of valid inode
> numbers from a fs (except the fs itself). Given this fact, each lower fs can
> give us an inode number in the range { 0 .. (2^64)-1 }. To uniquely identify
> a file, unionfs needs a <branch index, lower inum> tuple (~70 bits).
> Ideally, we would be able to take this info and shove it into our own inode
> number, but we'd have to somehow map ~70 bits into 64 bits of our own inum.
>
>   
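To make the bit-count argument above concrete, here is a minimal Python sketch of the naive approach: packing the <branch index, lower inum> tuple into a single 64-bit union inode number. The 6-bit branch field (`BRANCH_BITS`) is a hypothetical width chosen to match the "~70 bits" estimate, and `pack_naive` is an illustrative name, not a unionfs function.

```python
INUM_BITS = 64    # width of an inode number in the kernel
BRANCH_BITS = 6   # hypothetical branch-index width (64 + 6 = ~70 bits total)


def pack_naive(branch: int, lower_inum: int) -> int:
    """Pack (branch, lower_inum) into one 64-bit number, or fail.

    The branch index takes the top BRANCH_BITS bits, so only the
    remaining 58 bits are left for the lower inode number.
    """
    low_bits = INUM_BITS - BRANCH_BITS
    if not 0 <= branch < (1 << BRANCH_BITS):
        raise ValueError("branch index out of range")
    if not 0 <= lower_inum < (1 << low_bits):
        # Any lower fs may legitimately hand out inums up to 2^64 - 1.
        raise ValueError(f"lower inum needs more than {low_bits} bits")
    return (branch << low_bits) | lower_inum
```

`pack_naive(1, 5)` succeeds, but `pack_naive(0, 2**63)` raises `ValueError`, even though 2^63 is a valid inode number for a lower filesystem.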
True, the numerical limit is 64 bits. I was thinking of the actual
usage by the fs. I believe this information is available (as in a
kernel-mode statfs, field f_files), giving the total number of file
nodes in the fs. This is a physical limit (except for filesystems that
can dynamically expand their inode space), and so it could be mapped.
One way to handle the dynamic case would be to increase the total by
10 or 20% (a mount option?). This inflated total would then be added
to the base of the higher-level branch, and the sum becomes the base
for the next lower level.
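The base calculation proposed above can be sketched in Python as follows. It assumes, as the proposal does, that every inode number a branch hands out is smaller than its reported `f_files` total plus the slack. The function name `branch_bases` and the integer `slack_pct` parameter (standing in for the suggested 10-20% mount option) are hypothetical.

```python
def branch_bases(f_files: list[int], slack_pct: int = 20) -> tuple[list[int], list[int]]:
    """Compute each branch's inode offset base and reserved span.

    f_files[0] is the top (highest-priority) branch. Each span is the
    branch's f_files total inflated by slack_pct percent (rounded up);
    each base is the previous base plus the previous branch's span.
    """
    bases: list[int] = []
    spans: list[int] = []
    base = 0
    for files in f_files:
        # Integer ceiling of files * (100 + slack_pct) / 100, avoiding floats.
        span = (files * (100 + slack_pct) + 99) // 100
        bases.append(base)
        spans.append(span)
        base += span
    return bases, spans
```

For example, `branch_bases([1000, 500])` returns `([0, 1200], [1200, 600])`, so inode 7 on branch 1 would become union inode 1207.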

Mapping from a union inode number back to a branch/inode pair would
require a walk of the table, since the offsets are not uniform and the
branch cannot be computed directly (as it could be with a modulo
operation). For any reasonable union (fewer than 10 or 20 branches),
this shouldn't be costly.
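Here is a Python sketch of the forward mapping and of the table walk for the reverse mapping. The `bases` and `spans` tables are assumed to be precomputed per branch (top branch first) as described above; the function names are hypothetical.

```python
def to_union_inum(bases: list[int], spans: list[int],
                  branch: int, lower_inum: int) -> int:
    """Forward mapping: add the branch's offset base to the lower inum."""
    if not 0 <= lower_inum < spans[branch]:
        raise ValueError("lower inum outside the branch's reserved span")
    return bases[branch] + lower_inum


def from_union_inum(bases: list[int], spans: list[int],
                    inum: int) -> tuple[int, int]:
    """Reverse mapping: walk the branch table to find the owning range.

    The spans differ per branch, so the branch cannot be found with a
    single modulo; a linear scan is cheap for 10-20 branches.
    """
    for branch, (base, span) in enumerate(zip(bases, spans)):
        if base <= inum < base + span:
            return branch, inum - base
    raise LookupError("inode number is not in any branch's range")
```

With `bases = [0, 1200]` and `spans = [1200, 600]`, `to_union_inum(bases, spans, 1, 7)` gives 1207, and `from_union_inum(bases, spans, 1207)` gives `(1, 7)`. Because the bases are sorted, a binary search (Python's `bisect`) would also work if the table ever grew large.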

This does become wasteful of inode space when two (or more) of the
directories in the union are on the same filesystem, since that
filesystem's inode range is reserved once per branch. But this would
also be less of a problem, because the number of branches would tend
to be smaller. Here, I'm remembering the attempt to union mount a
standardized root system and a host-specific directory to provide a
unique NFS root for a specific client.

Also, if an overflow of the 64-bit inode space CAN/DOES occur, it would
be reasonable to disallow NFS exports for that specific mount. If it
occurred during a remount to add a branch, it would be reasonable to
disallow the remount. Of course, a suitable error/warning message
describing the problem would have to be provided in both situations.
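The overflow policy above could look like this Python sketch. It checks whether the total reserved inode space fits in 64 bits before allowing an NFS export or accepting a branch added by remount. `spans` is assumed to hold the per-branch reserved sizes. The function names and message texts are hypothetical and illustrative only.

```python
INUM_SPACE = 1 << 64  # number of distinct 64-bit inode numbers


def fits_in_inum_space(spans: list[int]) -> bool:
    """Union inums run from 0 to sum(spans) - 1, so the sum must be <= 2^64."""
    return sum(spans) <= INUM_SPACE


def nfs_export_allowed(spans: list[int]) -> bool:
    """Refuse NFS export for a mount whose mapping would overflow."""
    if not fits_in_inum_space(spans):
        print("inode space exceeds 64 bits; NFS export disabled")
        return False
    return True


def remount_add_branch(spans: list[int], new_span: int) -> list[int]:
    """Refuse a remount that adds a branch if the total would overflow."""
    candidate = spans + [new_span]
    if not fits_in_inum_space(candidate):
        raise OverflowError(
            "adding this branch would exceed the 64-bit inode space")
    return candidate
```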

The only advantage this technique has over the ODF is possibly speed,
since the ODF method requires a cache/disk lookup of the inode to
identify the branch/inode pair. The ODF also requires a filesystem
with the same inode-space considerations as the mapping above. The
ODF is a more compact mapping, though.

I'm not currently aware of any single filesystem that has even 4
billion (2^32) inodes anyway. The largest I have encountered had
around 20-25 million in use (a Solaris SAMFS site some years ago),
and I have access to a RAID with 91 million allocated (but only
778,000 used).
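Here is a quick back-of-the-envelope check of that headroom in Python. It uses deliberately generous assumptions, not figures from any real site: 20 branches, each with 2^32 (about 4.3 billion) inodes, plus the 20% slack.

```python
BRANCHES = 20                # upper end of a "reasonable" union
INODES_PER_BRANCH = 2 ** 32  # more than any single filesystem mentioned above
SLACK_PCT = 20               # the 20% growth allowance

needed = BRANCHES * INODES_PER_BRANCH * (100 + SLACK_PCT) // 100
print(needed)                # 103079215104, about 1.03e11
print(2 ** 64 // needed)     # 178956970: 64 bits covers this ~1.8e8 times over
```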

> Beware, I've spent long enough pondering this that I might be fixated
> on the <branch index, lower inum> concept and not see something more
> creative.
>   
My thoughts are aimed more at operations than theory. A theoretical
design would focus on guaranteeing inode availability; operationally,
though, that doesn't seem necessary given the 64-bit range available.
> Jeff.
>
>   