[Unionfs] unmounting order during shutdown [was: Re: unionfs 2.2.3 OOPS]

Dave Miller justdave at mozilla.com
Wed Mar 26 03:44:37 EDT 2008


So after applying this patch to 2.3 and deploying it (might as well have 
called it 2.3.1 ;) I got the following about a half hour after booting 
into it:

Mar 26 00:20:03 dm-stage02 kernel: unionfs: new lower inode mtime (bindex=0, name=2008-03-25-15-trunk)
Mar 26 00:20:04 dm-stage02 kernel: unionfs: unionfs: new generation number 57
Mar 26 00:20:04 dm-stage02 kernel: alidatalidation
Mar 26 00:20:05 dm-stage02 kernel: alidation
Mar 26 00:20:05 dm-stage02 kernel: alidaalidaaaliaalialidaalialialidalidaalidatioalidatialidalidation
Mar 26 00:20:05 dm-stage02 kernel: <alidalalidalidationalidation
Mar 26 00:20:05 dm-stage02 kernel: <7alidalidationalidation
Mar 26 00:20:05 dm-stage02 kernel: alalidationalialidation
Mar 26 00:20:05 dm-stage02 kernel: <alidatioalidation
Mar 26 00:20:05 dm-stage02 kernel: alidation
Mar 26 00:20:05 dm-stage02 kernel: <7alidation
Mar 26 00:20:05 dm-stage02 kernel: alalidation
Mar 26 00:20:06 dm-stage02 kernel: <7alidation
Mar 26 00:20:06 dm-stage02 kernel: <alidalialidalidaalidation
Mar 26 00:20:06 dm-stage02 kernel: alidation
Mar 26 00:20:06 dm-stage02 kernel: alidatioalialidatalidation
Mar 26 00:20:06 dm-stage02 kernel: alidation
Mar 26 00:20:06 dm-stage02 kernel: alidation

And those last 16 lines or some variation of them repeated for about 
another 20K lines while file operations that touched the unionfs mount 
would hang, until I power cycled the machine.  vewy stwange....

The kernel it was upgraded from had unionfs 2.2.4, which is what I
rebooted back into after that.  Dunno if I should try again... :)

Erez Zadok wrote on 3/25/08 5:50 PM:
> In message <47A00871.4040608 at mozilla.com>, Dave Miller writes:
>> Erez Zadok wrote on 1/29/08 11:32 PM:
>>> I was able to reproduce the bug in question: just umount -f an nfs partition
>>> or umount -l any partition that's used as a lower branch, then try to umount
>>> unionfs's mount; you get the exact oops above.  Turns out that grabbing a
>>> vfsmount ref isn't enough: it prevents a casual umount on a lower branch
>>> from succeeding, returning an EBUSY.  But we also needed to grab an s_active
>>> reference on all lower superblocks, to prevent a forced/detached unmount
>>> from destroying the lower super too early.  With the patch below, the lower
>>> super will be detached from the namespace, but it won't be destroyed until
>>> unionfs is unmounted: unionfs_put_super will decrement the (possibly last)
>>> reference on the lower super, which'd then be properly destroyed.
>>>
>>> Try this patch.  I quickly tried it w/ branch management, umount -l, and my
>>> basic regression suite.  It seems to work, but I'd like to hear from both of
>>> you first before considering this bug fixed.
>> Poking around at my logs, I see that the OOPS I've been getting under
>> heavy usage (the one I've been meaning to send you our config for, so
>> you could reproduce it) actually matches the one we get when trying to
>> shut down with unionfs still mounted (which is the one you're trying to
>> fix here).  If this patch fixes this particular OOPS, it may well solve
>> our whole problem.  I've got it compiling now, I'll throw the load test 
>> script at it again and let you know. :)
>>
>> -- 
>> Dave Miller                                   http://www.justdave.net/
>> System Administrator, Mozilla Corporation      http://www.mozilla.com/
>> Project Leader, Bugzilla Bug Tracking System  http://www.bugzilla.org/
> 
> Dave, please apply this very important patch below on top of unionfs-2.3,
> and let me know.  The oopses you've seen and this fix seem to be a good
> match (fingers crossed :-)
> 
> You've seen oopses that look like this
> 
> Mar 10 15:53:02 dm-stage02 kernel: BUG: Dentry f652e6e8{i=63,n=archive.mozilla.org} still in use (1) [unmount of nfs 0:15]
> 
> This was 'strange' b/c you weren't really unmounting anything, just doing
> some branch management commands.  The bug in question was an oops from the
> VFS b/c it seems that a superblock's reference count reached zero while it
> still had active dentries (should never happen).
> 
> The fix below seems fitting because:
> 
> - it fixes stuff in unionfs_remount_fs as per your stack trace
> 
> - if you added a branch, we had a bug which incorrectly decremented the
>   refcnt of some branches (very clear from the patch itself).
> 
> - over-decrementing the sb refcnt will result in behavior as you've seen:
>   the sb refcnt will reach zero too early.
> 
> - all of the other recent fixes were more related to races, whereas this bug
>   was a buffer overflow that went beyond the valid array range of
>   UNIONFS_SB(sb)->data[i].sb, and tried to decrement stuff there (i.e., the
>   bug didn't tickle all the time for all users b/c it depended on the memory
>   contents beyond that data[i] array, which varies from system to system).
> 
> Cheers,
> Erez.
> 
> 
> diff --git a/fs/unionfs/super.c b/fs/unionfs/super.c
> index e5cb235..4cddc83 100644
> --- a/fs/unionfs/super.c
> +++ b/fs/unionfs/super.c
> @@ -755,7 +755,7 @@ out_no_change:
>  	/* grab new lower super references; release old ones */
>  	for (i = 0; i < new_branches; i++)
>  		atomic_inc(&new_data[i].sb->s_active);
> -	for (i = 0; i < new_branches; i++)
> +	for (i = 0; i < sbmax(sb); i++)
>  		atomic_dec(&UNIONFS_SB(sb)->data[i].sb->s_active);
>  
>  	/* copy new vectors into their correct place */
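
The loop-bound mistake that the diff corrects can be sketched in user space.
This is only an illustration, not the unionfs code: fake_sb, fake_branch, the
"bystander" superblock and every function name below are hypothetical, and
plain ints stand in for the kernel's atomic s_active counter.  The old branch
array is padded with one extra slot so that the stray decrement is observable
without actually reading past the end of an array.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct super_block: only the active reference count. */
struct fake_sb {
	int s_active;
};

/* Stand-in for one entry of UNIONFS_SB(sb)->data[]. */
struct fake_branch {
	struct fake_sb *sb;
};

/* Take one reference on every lower superblock in the new branch list. */
static void grab_new(struct fake_branch *new_data, size_t new_branches)
{
	for (size_t i = 0; i < new_branches; i++)
		new_data[i].sb->s_active++;
}

/* Pre-patch release: walks the OLD array but is bounded by the NEW count. */
static void release_old_buggy(struct fake_branch *old_data, size_t new_branches)
{
	for (size_t i = 0; i < new_branches; i++)
		old_data[i].sb->s_active--;
}

/* Patched release: bounded by the OLD count (sbmax(sb) in the diff). */
static void release_old_fixed(struct fake_branch *old_data, size_t old_branches)
{
	for (size_t i = 0; i < old_branches; i++)
		old_data[i].sb->s_active--;
}

/*
 * Add a third branch to a two-branch union.  Slot 2 of the old array
 * models "whatever lies past the end": it points at an unrelated
 * superblock holding one reference.  Returns that superblock's final count.
 */
int bystander_after_add(int use_fix)
{
	struct fake_sb a = {1}, b = {1}, c = {0}, bystander = {1};
	struct fake_branch old_data[3] = {{&a}, {&b}, {&bystander}};
	struct fake_branch new_data[3] = {{&a}, {&b}, {&c}};

	grab_new(new_data, 3);
	if (use_fix)
		release_old_fixed(old_data, 2);
	else
		release_old_buggy(old_data, 3);

	/* Either way, each branch in the new list ends with one reference. */
	assert(a.s_active == 1 && b.s_active == 1 && c.s_active == 1);
	return bystander.s_active;
}

/*
 * Remove the last of three branches.  Returns the removed branch's
 * final count: 0 means it was released, 1 means the reference leaked.
 */
int removed_after_delete(int use_fix)
{
	struct fake_sb a = {1}, b = {1}, c = {1};
	struct fake_branch old_data[3] = {{&a}, {&b}, {&c}};
	struct fake_branch new_data[2] = {{&a}, {&b}};

	grab_new(new_data, 2);
	if (use_fix)
		release_old_fixed(old_data, 3);
	else
		release_old_buggy(old_data, 2);
	return c.s_active;
}
```

In this model, calling bystander_after_add(0) drives the unrelated count to
zero while something still holds it, which is the shape of the "refcnt reaches
zero too early" symptom described above; with the fix it stays at one.  The
removal case shows the opposite effect of the same bound: the pre-patch loop
never releases the dropped branch.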


-- 
Dave Miller                                   http://www.justdave.net/
System Administrator, Mozilla Corporation      http://www.mozilla.com/
Project Leader, Bugzilla Bug Tracking System  http://www.bugzilla.org/

