13 October 2010, 22:54

 

A look back at a problem with VxVM after adding SAN disks. By analyzing the core file and the daemon's logs, I found (and understood) the origin of the problem.

 

It's a classic error, but the root cause is a little less common. The vxconfigd daemon was no longer running just after the disks were added. Why?

 

# vxdisk list
VxVM vxdisk ERROR V-5-1-684 IPC failure: Configuration daemon is not accessible

 

Nothing serious, Doctor? I decided to restart the daemon:

 

# vxconfigd -k
VxVM vxconfigd ERROR V-5-1-0 Segmentation violation - core dumped

 

That's a bad start. Hmm, let's look at this core:

 

# ls -l /core
-rw------- 1 root root 13221414 Sep 20 18:50 /core

# file /core
/core: ELF 32-bit MSB core file SPARC Version 1, from 'vxconfigd'

# mdb /core
Loading modules: [ libc.so.1 libnvpair.so.1 libavl.so.1 libuutil.so.1 ld.so.1 ]
> $C
ffbfea58 ddl_migration_devlist_removed+0x198(4820c0, 131b8, 2cf928, 2bc770, 50000, 482dd8)
ffbfeab8 ddl_check_migration_of_devices+0x9c(4b7598, 4ccdf8, ffbfeb7c, 4820c0, 5c00, 0)
ffbfeb18 ddl_reconfigure_all+0x26c(2c254c, 4820c0, 2bc770, 50000, 3400, 2cf8dc)
ffbfeb80 ddl_find_devices_in_system+0x3f0(11400, 0, 5de8, 2bc770, 0, 2c0ad0)
ffbff110 find_devices_in_system+0x28(2, 0, 276664, 2cb214, 11, 276800)
ffbff170 mode_set+0x184(2, ffbff954, 2cb214, 2cb250, 0, 2c0000)
ffbff8f0 setup_mode+0x24(2, 274400, 0, 2c0800, a39, 274400)
ffbff958 startup+0x284(8fa656a0, 2dcc00, 2c0800, 2c0800, 274400, 657a)
ffbff9c8 main+0xcac(2, ffbffb1c, ffffffff, 0, ffbffbf0, 0)
ffbffab8 _start+0x108(0, 0, 0, 0, 0, 0)

> ddl_migration_devlist_removed+0x198::dis
ddl_migration_devlist_removed+0x170: ld [%l1], %o4
ddl_migration_devlist_removed+0x174: ba+0xb8 <ddl_migration_devlist_removed+0x22c>
ddl_migration_devlist_removed+0x178: ld [%l1 + 0xc], %l1
ddl_migration_devlist_removed+0x17c: ld [%l1 + 0x4], %o5
ddl_migration_devlist_removed+0x180: ld [%l1 + 0x8], %l2
ddl_migration_devlist_removed+0x184: btst 0x2, %o5
ddl_migration_devlist_removed+0x188: bne,pn %icc, +0x48 <ddl_migration_devlist_removed+0x1d0>
ddl_migration_devlist_removed+0x18c: btst 0x8, %o5
ddl_migration_devlist_removed+0x190: bne,pn %icc, +0x40 <ddl_migration_devlist_removed+0x1d0>
ddl_migration_devlist_removed+0x194: add %i4, 0x9, %o0
ddl_migration_devlist_removed+0x198: ld [%l2 + 0x8], %g5
ddl_migration_devlist_removed+0x19c: sethi %hi(0x2d400), %g1
ddl_migration_devlist_removed+0x1a0: sethi %hi(0x5400), %o1
ddl_migration_devlist_removed+0x1a4: ld [%l2 + 0xc], %g3
ddl_migration_devlist_removed+0x1a8: xor %g1, -0x2e4, %o7
ddl_migration_devlist_removed+0x1ac: add %o1, 0x148, %o1
ddl_migration_devlist_removed+0x1b0: add %i3, %o7, %o2
ddl_migration_devlist_removed+0x1b4: sub %g5, 0x1, %g4
ddl_migration_devlist_removed+0x1b8: mov %i2, %o3
ddl_migration_devlist_removed+0x1bc: st %g4, [%l2 + 0x8]
ddl_migration_devlist_removed+0x1c0: add %g3, 0x1, %g2

> ::regs
%g0 = 0x00000000 %l0 = 0xfffffffe
%g1 = 0x0015e160 ddl_check_if_exists_in_prop_list+0x58 %l1 = 0x00485340
%g2 = 0x00000000 %l2 = 0x00000000
%g3 = 0x002dcc00 vxconfigd`progname %l3 = 0x00000000
%g4 = 0x002c0800 vxconfigd`ioctls+0x318 %l4 = 0x00000001
%g5 = 0x00000088 %l5 = 0x004820e8
%g6 = 0x00000000 %l6 = 0x00482112
%g7 = 0xff0d2a00 %l7 = 0x004b7598
%o0 = 0x00050009 vold_change_common+0x1751 %i0 = 0x004820c0
%o1 = 0x40000025 %i1 = 0x000131b8
%o2 = 0x002bc770 %i2 = 0x002cf928
%o3 = 0x00000000 %i3 = 0x002bc770
%o4 = 0x00485340 %i4 = 0x00050000 vold_change_common+0x1748
%o5 = 0x40000025 %i5 = 0x00482dd8
%o6 = 0xffbfea58 %i6 = 0xffbfeab8
%o7 = 0x00157c7c ddl_migration_devlist_removed+0x11c %i7 = 0x00157a7c ddl_check_migration_of_devices+0x9c

%psr = 0xfe401002 impl=0xf ver=0xe icc=nZvc
                  ec=0 ef=4096 pil=0 s=0 ps=0 et=0 cwp=0x2

%y   = 0x00000000
%pc  = 0x00157cf8 ddl_migration_devlist_removed+0x198
%npc = 0x00157cfc ddl_migration_devlist_removed+0x19c
%sp  = 0xffbfea58
%fp  = 0xffbfeab8

%wim = 0x00000000
%tbr = 0x00000000

 

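The problem is in the function 'ddl_migration_devlist_removed'. The faulting instruction at +0x198 (the value of %pc) loads from [%l2 + 0x8] while %l2 is 0, so it looks like a NULL pointer dereference. Without the source code, it is hard to go much further, so I changed method and restarted the daemon with debug logging enabled: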

 

# vxconfigd -x 9 -k -x log -x logfile=/tmp/vxconfigd.out
...
.. vxconfigd DEBUG V-5-1-14387 ddl_check_migration_of_device: oldnode is NULL
.. vxconfigd DEBUG V-5-1-14387 ddl_check_migration_of_device: oldnode is NULL
.. vxconfigd DEBUG V-5-1-14387 ddl_check_migration_of_device: oldnode is NULL
.. vxconfigd DEBUG V-5-1-14387 ddl_check_migration_of_device: oldnode is NULL
.. vxconfigd DEBUG V-5-1-14387 ddl_check_migration_of_device: oldnode is NULL
.. vxconfigd DEBUG V-5-1-14387 ddl_check_migration_of_device: oldnode is NULL
.. vxconfigd DEBUG V-5-1-14387 ddl_check_migration_of_device: oldnode is NULL
...
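
A level 9 trace gets large quickly, so counting identical messages helps the repeated ones stand out. Here is a minimal sketch, assuming the log lines look like the extract above (some prefix, then `vxconfigd DEBUG V-5-1-NNNNN message`). The function name `summarize_debug_messages` is my own invention, and on Solaris you may need `nawk` or `/usr/xpg4/bin/awk` instead of the default `awk`.

```shell
# Count identical DEBUG messages in a vxconfigd debug log, most frequent first.
# Assumption: each message line contains " DEBUG V-..." as in the extract above.
summarize_debug_messages() {
    # $1: path to the debug log, e.g. /tmp/vxconfigd.out
    awk '
        / DEBUG V-/ {
            sub(/.* DEBUG /, "")    # keep only "V-5-1-NNNNN message"
            count[$0]++
        }
        END {
            for (msg in count)
                print count[msg], msg
        }
    ' "$1" | sort -rn
}
```

Called as `summarize_debug_messages /tmp/vxconfigd.out | head`, it lists the noisiest messages first, which is a quick way to see which part of the daemon is complaining.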

 

Strange, isn't it? These messages come from the same DDL migration code that appears in the core's stack trace. But what is DDL? DDL (Device Discovery Layer) is the process of locating and identifying the disks attached to a host. So which disk enclosures are connected to my server?

 

# egrep Enclosure /tmp/vxconfigd.out
.. vxconfigd DEBUG V-5-1-14475 Enclosure is CK200072700055:0:DGC:CLR-A/PF:EMC_CLARiiON:832
.. vxconfigd DEBUG V-5-1-14475 Enclosure is CK200072700055:0:DGC:A/A:EMC_CLARiiON:832
.. vxconfigd DEBUG V-5-1-14475 Enclosure is 000290300471:0:EMC:A/A:EMC:834
.. vxconfigd DEBUG V-5-1-14475 Enclosure is 000290300446:0:EMC:A/A:EMC:834
.. vxconfigd DEBUG V-5-1-14475 Enclosure is DISKS:0:SEAGATE:Disk:Disk:514
.. vxconfigd DEBUG V-5-1-14475 Enclosure is 000290300446:50:EMC:A/A:PP_EMC:194
.. vxconfigd DEBUG V-5-1-14475 Enclosure is DISKS:5:SEAGATE:Disk:Disk:2
.. vxconfigd DEBUG V-5-1-14475 Enclosure is CK200072700055:52:DGC:A/A:PP_EMC_CLARiiON:194
.. vxconfigd DEBUG V-5-1-14475 Enclosure is 000290300471:50:EMC:A/A:PP_EMC:194

 

Looking at the vxconfigd log, I noticed the following:

 

.. vxconfigd DEBUG V-5-1-14475 Enclosure is CK200072700055:0:DGC:CLR A/PF:EMC_CLARIION:832
.. vxconfigd DEBUG V-5-1-14475 Enclosure is CK200072700055:0:DGC:A/A:EMC_CLARIION:832

 

The same enclosure is being seen as both ALUA and A/PF. Google is my friend: after a little research on the web, I found the following Symantec technote: TECH76277. The root cause is simple: this can occur if the array had existing LUNs configured as A/PF and some ALUA LUNs were added afterwards. This mixed configuration can cause problems with DMP. And the solution is just as simple: once all the LUNs are changed to either A/PF or ALUA, the problem is resolved.

 

Published by gloumps in lvm
Comments

Gaëtan 24/01/2012 11:10

Small typo: the DEBUG level of "vxconfigd" goes up to 9, not 10.

gloumps 24/01/2012 22:48

Indeed, I made a small mistake when copying the command. It has been corrected.