[Unionfs] unmounting order during shutdown [was: Re: unionfs 2.2.3 OOPS]
Dave Miller
justdave at mozilla.com
Wed Mar 26 03:44:37 EDT 2008
So after applying this patch to 2.3 and deploying it (might as well have
called it 2.3.1 ;) I got the following about a half hour after booting
into it:
Mar 26 00:20:03 dm-stage02 kernel: unionfs: new lower inode mtime
(bindex=0, name=2008-03-25-15-trunk)
Mar 26 00:20:04 dm-stage02 kernel: unionfs: unionfs: new generation
number 57
Mar 26 00:20:04 dm-stage02 kernel: alidatalidation
Mar 26 00:20:05 dm-stage02 kernel: alidation
Mar 26 00:20:05 dm-stage02 kernel:
alidaalidaaaliaalialidaalialialidalidaalidatioalidatialidalidation
Mar 26 00:20:05 dm-stage02 kernel: <alidalalidalidationalidation
Mar 26 00:20:05 dm-stage02 kernel: <7alidalidationalidation
Mar 26 00:20:05 dm-stage02 kernel: alalidationalialidation
Mar 26 00:20:05 dm-stage02 kernel: <alidatioalidation
Mar 26 00:20:05 dm-stage02 kernel: alidation
Mar 26 00:20:05 dm-stage02 kernel: <7alidation
Mar 26 00:20:05 dm-stage02 kernel: alalidation
Mar 26 00:20:06 dm-stage02 kernel: <7alidation
Mar 26 00:20:06 dm-stage02 kernel: <alidalialidalidaalidation
Mar 26 00:20:06 dm-stage02 kernel: alidation
Mar 26 00:20:06 dm-stage02 kernel: alidatioalialidatalidation
Mar 26 00:20:06 dm-stage02 kernel: alidation
Mar 26 00:20:06 dm-stage02 kernel: alidation
And those last 16 lines or some variation of them repeated for about
another 20K lines while file operations that touched the unionfs mount
would hang, until I power cycled the machine. vewy stwange....
The kernel it upgraded from was 2.2.4, which is what I rebooted back
into after that. Dunno if I should try again... :)
Erez Zadok wrote on 3/25/08 5:50 PM:
> In message <47A00871.4040608 at mozilla.com>, Dave Miller writes:
>> Erez Zadok wrote on 1/29/08 11:32 PM:
>>> I was able to reproduce the bug in question: just umount -f an nfs partition
>>> or umount -l any partition that's used as a lower branch, then try to umount
>>> unionfs's mount; you get the exact oops above. Turns out that grabbing a
>>> vfsmount ref isn't enough: it prevents a casual umount on a lower branch
>>> from succeeding, returning an EBUSY. But we also needed to grab an s_active
>>> reference on all lower superblocks, to prevent a forced/detached unmount
>>> from destroying the lower super too early. With the patch below, the lower
>>> super will be detached from the namespace, but it won't be destroyed until
>>> unionfs is unmounted: unionfs_put_super will decrement the (possibly last)
>>> reference on the lower super, which'd then be properly destroyed.
>>>
>>> Try this patch. I quickly tried it w/ branch management, umount -l, and my
>>> basic regression suite. It seems to work, but I'd like to hear from both of
>>> you first before considering this bug fixed.
>> Poking around in my logs, I see that the OOPS I've been getting under
>> heavy usage (the one I've been meaning to send you our full config for,
>> so you could reproduce it) actually matches the one we get when trying
>> to shut down with the unionfs still mounted (which is the one you're
>> trying to fix here). If this patch fixes this particular OOPS, it may
>> well solve our whole problem. I've got it compiling now; I'll throw the
>> load test script at it again and let you know. :)
>>
>> --
>> Dave Miller http://www.justdave.net/
>> System Administrator, Mozilla Corporation http://www.mozilla.com/
>> Project Leader, Bugzilla Bug Tracking System http://www.bugzilla.org/
>
> Dave, please apply this very important patch below on top of unionfs-2.3,
> and let me know. The oopses you've seen and this fix seem to be a good
> match (fingers crossed :-)
>
> You've seen oopses that look like this
>
> Mar 10 15:53:02 dm-stage02 kernel: BUG: Dentry
> f652e6e8{i=63,n=archive.mozilla.org} still in use (1) [unmount of nfs 0:15]
>
> This was 'strange' b/c you weren't really unmounting anything, just doing
> some branch management commands. The bug in question was an oops from the
> VFS b/c it seems that a superblock's reference count reached zero while it
> still had active dentries (should never happen).
>
> The fix below seems fitting because:
>
> - it fixes stuff in unionfs_remount_fs as per your stack trace
>
> - if you added a branch, we had a bug which incorrectly decremented the
> refcnt of some branches (very clear from the patch itself).
>
> - over-decrementing the sb refcnt will result in behavior as you've seen:
> the sb refcnt will reach zero too early.
>
> - all of the other recent fixes were more related to races, whereas this bug
> was an out-of-bounds access that ran past the valid range of the
> UNIONFS_SB(sb)->data[] array and decremented whatever it found there
> (i.e., the bug didn't trigger all the time for all users b/c it depended
> on the memory contents beyond that array, which vary from system to system).
>
> Cheers,
> Erez.
>
>
> diff --git a/fs/unionfs/super.c b/fs/unionfs/super.c
> index e5cb235..4cddc83 100644
> --- a/fs/unionfs/super.c
> +++ b/fs/unionfs/super.c
> @@ -755,7 +755,7 @@ out_no_change:
>  	/* grab new lower super references; release old ones */
>  	for (i = 0; i < new_branches; i++)
>  		atomic_inc(&new_data[i].sb->s_active);
> -	for (i = 0; i < new_branches; i++)
> +	for (i = 0; i < sbmax(sb); i++)
>  		atomic_dec(&UNIONFS_SB(sb)->data[i].sb->s_active);
>
>  	/* copy new vectors into their correct place */
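
For anyone following along, here is a rough userspace sketch of what the one-line change above appears to fix. It is not unionfs code: fake_sb, fake_branch, swap_refs and simulate_add_branch are hypothetical names made up for illustration, plain ints stand in for the atomic s_active counter, and the "victim" slot is an assumed stand-in for whatever happens to sit in memory past the end of the old branch array. The only thing taken from the patch is the loop bound: release the old references using the old branch count, not the new one.

```c
#include <assert.h>

/* Hypothetical stand-in for a lower superblock; only the active count. */
struct fake_sb {
	int s_active;
};

/* Hypothetical stand-in for one per-branch slot holding a superblock. */
struct fake_branch {
	struct fake_sb *sb;
};

/*
 * Grab a reference on every superblock in the new branch set, then drop
 * one reference on the first release_bound slots of the old set.
 */
static void swap_refs(struct fake_branch *new_data, int new_branches,
		      struct fake_branch *old_data, int release_bound)
{
	int i;

	for (i = 0; i < new_branches; i++)
		new_data[i].sb->s_active++;
	for (i = 0; i < release_bound; i++)
		old_data[i].sb->s_active--;
}

/*
 * Simulate adding a third branch to a two-branch union. Slot 2 of the
 * old array is not a valid branch; it points at an unrelated "victim"
 * superblock (assumption: models stale memory beyond the valid range).
 * Returns the victim's s_active after the remount.
 */
static int simulate_add_branch(int fixed)
{
	struct fake_sb a = { 2 }, b = { 2 }, c = { 1 }, victim = { 1 };
	struct fake_branch old_data[3] = { { &a }, { &b }, { &victim } };
	struct fake_branch new_data[3] = { { &a }, { &b }, { &c } };
	int old_branches = 2, new_branches = 3;

	/* The old code bounded the release loop by new_branches. */
	swap_refs(new_data, new_branches, old_data,
		  fixed ? old_branches : new_branches);

	/* The real branches end up balanced either way. */
	assert(a.s_active == 2 && b.s_active == 2 && c.s_active == 2);
	return victim.s_active;
}
```

With the old bound, simulate_add_branch(0) returns 0: the victim's count hits zero while something is still using it, which is the same shape as the "still in use" BUG quoted above. With the old branch count as the bound, simulate_add_branch(1) returns 1 and the victim is left alone.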
--
Dave Miller http://www.justdave.net/
System Administrator, Mozilla Corporation http://www.mozilla.com/
Project Leader, Bugzilla Bug Tracking System http://www.bugzilla.org/