28 August 2016, 12:27

Last article on this blog; yes, everything comes to an end... Although, I have just launched myself (again) on a new blog, BlogUnix.

See you soon, I hope.

31 July 2013, 21:11

 

I already wrote a similar article about Solaris 11 and Zones (link). Today I will describe how to configure several LDom guests, with an emphasis on network configuration (several VLANs).

 

 

In this example, there are 3 LDoms running on a dedicated system that is exposed to external networks.

  • The control domain runs in 4 VLANs (front, admin, backup, interconnect) - OS Solaris 11.1
  • LDom Guest 1 runs in 4 VLANs (front, admin, backup, interconnect) - OS Solaris 10u11
  • LDom Guest 2 runs in 3 VLANs (front, admin, backup) - OS Solaris 10u10

 

VLAN information:

  • Vlan id 1 : address 192.168.1.0/24 - front
  • Vlan id 2 : address 192.168.2.0/24 - admin
  • Vlan id 3 : address 192.168.3.0/24 - backup
  • Vlan id 4 : address 192.168.4.0/24 - interconnect


Addresses for the control domain

  • Vlan id 1 : 192.168.1.10 - defaultrouter 192.168.1.1
  • Vlan id 2 : 192.168.2.10
  • Vlan id 3 : 192.168.3.10
  • Vlan id 4 : 192.168.4.10

 

Let's go... One prerequisite though: the switch-side network configuration must already be in place (contact your network team !?)

 

 

Step 1: Create the link aggregation and VLAN configuration on the control domain

 

My system (a SPARC T4-2) includes 2 NICs (10GbE). There is no network configuration yet (I connect through the ILOM).

 

# dladm show-phys
LINK       MEDIA         STATE      SPEED  DUPLEX    DEVICE
[...]
net8       Ethernet      unknown    0      unknown   ixgbe1
net9       Ethernet      unknown    0      unknown   ixgbe0
[...] 

 

I create a basic link aggregation (I use LACP) with 2 NICs.

 

# dladm create-aggr -P L2,L3 -L active -l net8 -l net9 aggr0

 

I quickly check the status of the aggregation.

 

# dladm show-link
LINK       CLASS     MTU    STATE    OVER
[...]
net8       phys      1500   up       --
net9       phys      1500   up       --
[...]
aggr0      aggr      1500   up       net8 net9

 

# dladm show-aggr -x
LINK   PORT  SPEED    DUPLEX  STATE  ADDRESS            PORTSTATE
aggr0    --  10000Mb  full    up     90:xx:xx:xx:xx:x8  --
       net8  10000Mb  full    up     90:xx:xx:xx:xx:x8  attached
       net9  10000Mb  full    up     90:xx:xx:xx:xx:x9  attached

 

Next, I create one VLAN link for each VLAN ID.

 

# dladm create-vlan -l aggr0 -v 1 front0
# dladm create-vlan -l aggr0 -v 2 admin0
# dladm create-vlan -l aggr0 -v 3 backup0
# dladm create-vlan -l aggr0 -v 4 interco0 

 

# dladm show-vlan
LINK          VID   OVER      FLAGS
front0        1     aggr0     -----
admin0        2     aggr0     -----
backup0       3     aggr0     -----
interco0      4     aggr0     -----

 

# ipadm create-ip front0
# ipadm create-addr -T static -a local=192.168.1.10/24 front0/v4
# ipadm create-ip admin0
# ipadm create-addr -T static -a local=192.168.2.10/24 admin0/v4
# ipadm create-ip backup0
# ipadm create-addr -T static -a local=192.168.3.10/24 backup0/v4
# ipadm create-ip interco0
# ipadm create-addr -T static -a local=192.168.4.10/24 interco0/v4 

 

# ipadm
NAME           CLASS/TYPE STATE  UNDER  ADDR
admin0         ip         ok     --     --
   admin0/v4   static     ok     --     192.168.2.10/24
backup0        ip         ok     --     --
   backup0/v4  static     ok     --     192.168.3.10/24
front0         ip         ok     --     --
   front0/v4   static     ok     --     192.168.1.10/24
interco0       ip         ok     --     --
   interco0/v4 static     ok     --     192.168.4.10/24
lo0            loopback   ok     --     --
   lo0/v4      static     ok     --     127.0.0.1/8
   lo0/v6      static     ok     --     ::1/128
[...]

 

Don't forget the configuration of the default router.

 

# route add -p default 192.168.1.1 -ifp front0
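
A quick sanity check never hurts (nothing is assumed here beyond the names created above): the routing table should show the default route through front0, and all four addresses should be listed.

# netstat -rn
# ipadm show-addr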

 

 

Step 2: Create the virtual switch and configure the vnets for each LDom guest

 

I create one virtual switch for all VLANs.

 

# ldm add-vswitch net-dev=aggr0 vid=1,2,3,4 primary-vsw0 primary
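
Before adding the vnets, it can be worth verifying that the virtual switch is registered (a quick check with ldm list-services; only the names created above are assumed).

# ldm list-services primary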

 

For LDom Guest 1, I create 4 vnets (see definition).

 

# ldm add-vnet pvid=1 id=0 vnet0 primary-vsw0 ldom1
# ldm add-vnet pvid=2 id=1 vnet1 primary-vsw0 ldom1
# ldm add-vnet pvid=3 id=2 vnet2 primary-vsw0 ldom1
# ldm add-vnet pvid=4 id=3 vnet3 primary-vsw0 ldom1

 

For LDom Guest 2, I create 3 vnets (see definition).

# ldm add-vnet pvid=1 id=0 vnet0 primary-vsw0 ldom2
# ldm add-vnet pvid=2 id=1 vnet1 primary-vsw0 ldom2
# ldm add-vnet pvid=3 id=2 vnet2 primary-vsw0 ldom2
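
To double-check the result, the network resources of each guest can be listed (only the domain names used above are assumed); each guest should show its vnets with the expected PVIDs over primary-vsw0.

# ldm list -o network ldom1
# ldm list -o network ldom2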

 

 

Conclusion: We hope this step-by-step guide will give you some ideas for future consolidations with Oracle VM Server for SPARC. With Oracle Solaris 11 networking capabilities (aka Crossbow), you can easily set up fairly complex environments with a simple network configuration.

 

 


 
22 May 2013, 20:37

 

So, what happened last Wednesday? You don't follow the news of the Guses association!? That's really a shame for you...

 

In collaboration with Oracle, the Guses took an active part in the Solaris TechDay 2013. I won't give you a summary of the evening, since Eric Bezille and Axel Paratre have already written about it. I will just take advantage of this article to publish the two Guses presentations (as well as those from previous editions, already available on my blog).

 

Special thanks to René Garcia (Unix engineer at PSA Peugeot Citroen) for his ZFS presentation. Thanks also to all of you for attending; I hope to see you at one of the next evenings we organize.

 

The TechDay 2013 presentations:

 

In case you missed it, the Guses also took part in the previous Solaris TechDays... A quick reminder of those presentations...

 

The TechDay 2011 presentations:

 

The TechDay 2012 presentation:

 

 

 

Now, no more excuses: follow the Guses...

22 May 2013, 14:20

 

Did you know? Solaris 11 can perform a fast reboot, skipping the power-on self tests (POST) that have traditionally accompanied a reboot. The end of the coffee break!?

 


On x86 machines, this will automatically happen if you use the reboot command (or init 6). To force a full test cycle, and/or to get access to the boot order menu from the BIOS, you can use halt, followed by pressing a key.



On SPARC, the default configuration requires that you use reboot -f for a fast reboot. If you want fast reboot to be the default, you must change the SMF property config/fastreboot_default, as follows:



# svccfg -s system/boot-config:default 'setprop config/fastreboot_default=true'
# svcadm refresh svc:/system/boot-config:default
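
You can confirm that the property has been applied with svcprop (a minimal check against the same FMRI):

# svcprop -p config/fastreboot_default svc:/system/boot-config:default
true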



To temporarily override the setting and reboot the slow way, you can use reboot -p, aka "reboot to PROM".

 

 

You will have to find a new excuse for your coffee break!!

20 May 2013, 21:10

 

This article only describes the P2V migration of a physical server into an LDom; the installation and configuration of Oracle VM Server for SPARC are not covered here.

 

Some details:

  • The name of the physical server is ldom-guest (Solaris 10u3 - kernel 118833-33)
  • The name of the control domain is ldom-crtl (Solaris 11.1 SRU 5.5)

 

There are 3 phases to migrate a physical system to a virtual system:

  • Collection Phase: a file system image of the source system is created, along with the configuration information collected about it.
  • Preparation Phase: a logical domain is created.
  • Conversion Phase: the file system image is converted into a logical domain (e.g. conversion from sun4u to sun4v).

 

To run this procedure, you must use the ldmp2v tool (download patch p15880570 to obtain it; on Solaris 11, the tool is available directly).
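
Before you begin, a quick way to verify that the tool is present on the systems involved (nothing assumed beyond the path already used later in this article):

# ls -l /usr/sbin/ldmp2v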

 

Before starting, let's look at the configuration available on the control domain:

 

ldom-crtl # ldm -V

Logical Domains Manager (v 3.0.0.2)
        Hypervisor control protocol v 1.7
        Using Hypervisor MD v 1.3

System PROM:
        Hypervisor v. 1.10.0. @(#)Hypervisor 1.10.0.a 2011/07/15 11:51\015
        OpenBoot   v. 4.33.0. @(#)OpenBoot 4.33.0.b 2011/05/16 16:28

 

ldom-crtl # ldm ls -o console,network,disk primary
[…]

VCC
    NAME           PORT-RANGE
    primary-vcc0   5000-5100

VSW
    NAME           MAC          […]
    primary-vsw0   x:x:x:x:x:x  […]

VDS
    NAME           VOLUME       […]
    primary-vds0

[…]

 

A fairly classic configuration, no?

 

First step: Collection phase (runs on the physical source system)

 

To create a consistent file system image, I suggest booting the server in single-user mode. To store the file system image, I often use an NFS share.
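
Booting into single-user mode can be done from the OpenBoot prompt (a minimal sketch, assuming console access to the source system):

ok boot -s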

 

ldom-guest # mount -F nfs myshare:/tempo /mnt

 

By default, the ldmp2v command creates a flar image.

 

ldom-guest # /usr/sbin/ldmp2v collect -d /mnt/ldom-guest
Collecting system configuration ...
Archiving file systems ...
Full Flash
Checking integrity...
Integrity OK.
Running precreation scripts...
Precreation scripts done.
Creating the archive...
136740734 blocks
Archive creation complete.

ldom-guest # init 0

 

Second step: Preparation phase (runs on the control domain)

 

I start by creating a ZFS pool which will contain the data of the logical domain.

 

ldom-crtl # zpool create -m none ldom-guest cXtYdZ

 

I prefer to use manual mode to create the logical domain (so I edit the ldmp2v.conf file shown below).

 

ldom-crtl # cat /etc/ldmp2v.conf
# Virtual switch to use
VSW="primary-vsw0"
# Virtual disk service to use
VDS="primary-vds0"
# Virtual console concentrator to use
VCC="primary-vcc0"
# Location where vdisk backend devices are stored
BACKEND_PREFIX=""
# Default backend type: "zvol" or "file".
BACKEND_TYPE="zvol"
# Create sparse backend devices: "yes" or "no"
BACKEND_SPARSE="yes"
# Timeout for Solaris boot in seconds
BOOT_TIMEOUT=60

 

Right after mounting the NFS share, I create the logical domain, specifying the following information: cpu, memory, and prefix (here, the name of the ZFS pool).

 

ldom-crtl # mount -F nfs myshare:/tempo /mnt
ldom-crtl # ldmp2v prepare -c 16 -M 16g -p ldom-guest -d /mnt/ldom-guest ldom-guest
Creating vdisks ...
Creating file systems ...
Populating file systems ...
136740734 blocks
Modifying guest OS image ...
Modifying SVM configuration ...
Unmounting file systems ...
Creating domain ...
Attaching vdisks to domain ldom-guest ...

 

For this example, the guest domain is configured with 16 vcpus and 16 GB of memory (options -c and -M).
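
If you want to confirm the resources actually assigned to the freshly prepared domain, you can simply list it (only the domain name used above is assumed):

ldom-crtl # ldm list ldom-guest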

 

Final step: Conversion phase (runs on the control domain)

 

In the conversion phase, the logical domain uses the Oracle Solaris upgrade process to upgrade to the Oracle Solaris 10 OS. The upgrade operation removes all existing packages and installs the Oracle Solaris 10 sun4v packages, which automatically performs a sun4u-to-sun4v conversion. The convert phase can use an Oracle Solaris DVD ISO image or a network installation image. On Oracle Solaris 10 systems, you can also use the Oracle Solaris JumpStart feature to perform a fully automated upgrade operation.

 

On the JumpStart server (do you know JET?), I edit the JumpStart profile to add the following lines:

 

install_type    upgrade
root_device c0d0s0

 

Ready for conversion!! One last command converts the SPARC architecture and starts the guest domain.

 

ldom-crtl # ldmp2v convert -j -n vnet0 -d /mnt/ldom-guest ldom-guest
Testing original system status ...
LDom ldom-guest started
Waiting for Solaris to come up ...
Using Custom JumpStart
Trying 0.0.0.0...
Connected to 0.
Escape character is '^]'.

 

Connecting to console "server" in group "server" ....
Press ~? for control options ..
Configuring devices.
Using RPC Bootparams for network configuration information.
Attempting to configure interface vnet0...
Configured interface vnet0
Setting up Java. Please wait...
Extracting windowing system. Please wait...
Beginning system identification...
Searching for configuration file(s)...
Using sysid configuration file 10.x.x.x:/opt/SUNWjet/Clients/ldom-guest/sysidcfg
Search complete.
Discovering additional network configuration...
Completing system identification...
Starting remote procedure call (RPC) services: done.
System identification complete.
Starting Solaris installation program...
Searching for JumpStart directory...
Using rules.ok from 10.x.x.x:/opt/SUNWjet.
Checking rules.ok file...
Using begin script: Utils/begin
Using derived profile: Utils/begin
Using finish script: Utils/finish
Executing JumpStart preinstall phase...
Executing begin script "Utils/begin"...
Installation of ldom-guest at 00:41 on 10-May-2013
Loading JumpStart Server variables
Loading JumpStart Server variables
Loading Client configuration file
Loading Client configuration file
Running base_config begin script....
Running base_config begin script....
Begin script Utils/begin execution completed.
Searching for SolStart directory...
Checking rules.ok file...
Using begin script: install_begin
Using finish script: patch_finish
Executing SolStart preinstall phase...
Executing begin script "install_begin"...
Begin script install_begin execution completed.

WARNING: Backup media not specified.  A backup media (backup_media) keyword must be specified if an upgrade with disk space reallocation is required

Processing default locales
       - Specifying default locale (en_US.ISO8859-1)

Processing profile

Loading local environment and services

Generating upgrade actions
       - Selecting locale (en_US.ISO8859-1)

Checking file system space: 100% completed
Space check complete.

Building upgrade script

Preparing system for Solaris upgrade

Upgrading Solaris: 101% completed
       - Environment variables (/etc/default/init)

Installation log location
       - /a/var/sadm/system/logs/upgrade_log (before reboot)
       - /var/sadm/system/logs/upgrade_log (after reboot)

Please examine the file:
       - /a/var/sadm/system/data/upgrade_cleanup

It contains a list of actions that may need to be performed to complete the upgrade. After this system is rebooted, this file can be found at:
       - /var/sadm/system/data/upgrade_cleanup

Upgrade complete
Executing SolStart postinstall phase...
Executing finish script "patch_finish"...

Finish script patch_finish execution completed.
Executing JumpStart postinstall phase...
Executing finish script "Utils/finish"...
[…]
Terminated

Finish script Utils/finish execution completed.
The begin script log 'begin.log'
is located in /var/sadm/system/logs after reboot.

The finish script log 'finish.log'
is located in /var/sadm/system/logs after reboot.

syncing file systems... done
rebooting...
Resetting...

 

T5240, No Keyboard
Copyright (c) 1998, 2011, Oracle and/or its affiliates. All rights reserved.
OpenBoot 4.33.0.b, 16384 MB memory available, Serial #83470255.
Ethernet address 0:x:x:x:x:x, Host ID: 84f9a7af.

Boot device: disk0:a  File and args:
SunOS Release 5.10 Version Generic_118833-33 64-bit
Copyright 1983-2006 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hostname: ldom-guest
Loading smf(5) service descriptions: 1/1
checking ufs filesystems
/dev/rdsk/c0d1s0: is logging.

ldom-guest console login:

 

And it's already finished. Simple, isn't it!? You no longer have any excuse not to use LDoms.

 


 

6 April 2013, 10:44

 

Oracle has just announced new servers based on the SPARC T5 & M5 processors. Solaris 11, Oracle's operating system, is at the heart of the strategy for these systems. It lets you get the very best out of them: virtualization capabilities, scaling beyond 1000 threads, Oracle database optimization, a foundation for a Cloud infrastructure...

 

In association with the GUSES (Groupe d'Utilisateurs du Système d'Exploitation Solaris), we invite you to take part in our "Oracle SPARC/Solaris TechDay" seminar on April 25 at 5:00 pm at Caves Legrand, in the 2nd arrondissement of Paris.

 

During this seminar, we will share the latest SPARC/Solaris news through customer feedback and practical use cases.

 

We hope to have the pleasure of welcoming you on April 25. More details here

 


30 March 2013, 22:20

 

Everyone knows that one of the major problems when consolidating Solaris 10 is the network. If each Solaris Zone uses a different network (VLAN), the configuration of the Global Zone becomes a real headache.

 

In Solaris 11, Crossbow effectively addresses this problem. This article explains how to create several Solaris Zones, with an emphasis on network configuration (several VLANs).

 

In this example, there are 3 Solaris Zones running on a dedicated system that is exposed to external networks. Each Solaris Zone runs in a different VLAN.

  • The Global Zone runs in VLAN ID 1 (Address: 192.168.1.10/24 - Router: 192.168.1.1)
  • The Solaris Zone zone1 runs in VLAN ID 1 (Address: 192.168.1.11/24 - Router: 192.168.1.1)
  • The Solaris Zone zone2 runs in VLAN ID 2 (Address: 192.168.2.10/24 - Router: 192.168.2.1)
  • The Solaris Zone zone3 runs in VLAN ID 3 (Address: 192.168.3.10/24 - Router: 192.168.3.1)
  • Each NIC port used by the aggregation is configured for the different VLANs (VLAN IDs 1, 2 and 3)

Let's go... One prerequisite though: the switch-side network configuration must already be in place (contact your network team !?)

 

 

Step 1: Create link aggregation

 

My system (a SPARC M5000) includes 4 NICs. There is no network configuration yet (I connect through the XSCF).

 

# dladm show-phys
LINK       MEDIA         STATE      SPEED  DUPLEX    DEVICE
net1       Ethernet      unknown    0      unknown   bge1
net0       Ethernet      unknown    0      unknown   bge0
net3       Ethernet      unknown    0      unknown   bge3
net2       Ethernet      unknown    0      unknown   bge2

 

I create a basic link aggregation (I don't use LACP) with 4 NICs.

 

# dladm create-aggr -P L2,L3 -l net0 -l net1 -l net2 -l net3 default0

 

I quickly check the status of the aggregation.

 

# dladm show-link
LINK          CLASS     MTU    STATE    OVER
net1          phys      1500   up       --
net0          phys      1500   up       --
net3          phys      1500   up       --
net2          phys      1500   up       --
default0      aggr      1500   up       net0 net1 net2 net3

 

Next, I configure an address on this aggregation.

 

# ipadm create-ip default0
# ipadm create-addr -T static -a local=192.168.1.10/24 default0/v4

 

Don't forget the configuration of the default router.

 

# route add -p default 192.168.1.1 -ifp default0
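
A quick verification never hurts (only the names defined above are assumed): the address object and the default route should both be visible.

# ipadm show-addr default0/v4
# netstat -rn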

 

 

Step 2: Create Solaris Zone for Cloning

 

It is much faster to clone a Solaris Zone than to create one from scratch, because building an image from packages takes longer than, in essence, copying an existing zone. In this example I use the cloning technique to first create one Solaris Zone and then clone it three times.

 

# zfs create -o mountpoint=/zones -o dedup=on rpool/zones
# zfs create -o mountpoint=/zones/zclone rpool/zones/zclone
# chmod 700 /zones/zclone

 

# zonecfg -z zclone
Use 'create' to begin configuring a new zone.
zonecfg:zclone> create
create: Using system default template 'SYSdefault'
zonecfg:zclone> set zonepath=/zones/zclone
zonecfg:zclone> set ip-type=exclusive
zonecfg:zclone> exit

 

# zoneadm -z zclone install
Progress being logged to /var/log/zones/zoneadm.20130329T161207Z.zclone.install
       Image: Preparing at /zones/zclone/root. 
[...] 
  Next Steps: Boot the zone, then log into the zone console (zlogin -C)
              to complete the configuration process.
Log saved in non-global zone as /zones/zclone/root/var/log/zones/zoneadm.20130329T161207Z.zclone.install

 

# zoneadm -z zclone boot ; zlogin -C zclone
[Connected to zone 'zclone' console]
Loading smf(5) service descriptions: 115/115

 

When the interactive configuration screen of this Solaris Zone appears, I halt the zone.

 

# zoneadm -z zclone halt

 

 

Step 3: Create Solaris Zone zone1

 

Remember, Solaris Zone zone1 uses the same VLAN as the Global Zone. First, I create a VNIC with VLAN ID 1 over the datalink (default0).

 

# dladm create-vnic -v 1 -l default0 vnic1

 

Next, I create zone1 from the zclone zone (don't forget to create a configuration profile, the Solaris 11 equivalent of sysidcfg).
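
On Solaris 11, such a profile can be generated interactively with the sysconfig tool (a minimal sketch; the output path is only an example and the exact -o semantics may vary between Solaris 11 releases):

# sysconfig create-profile -o /tmp/sc_profile1.xml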

 

# zonecfg -z zone1 "create -t zclone"
# zonecfg -z zone1
zonecfg:zone1> set zonepath=/zones/zone1
zonecfg:zone1> select anet linkname=net0
zonecfg:zone1:anet> set linkname=vnic1
zonecfg:zone1:anet> set lower-link=default0
zonecfg:zone1:anet> end
zonecfg:zone1> commit
zonecfg:zone1> exit

 

# zoneadm -z zone1 clone -c /tmp/sc_profile1.xml zclone
The following ZFS file system(s) have been created:
    rpool/zones/zone1
Progress being logged to /var/log/zones/zoneadm.20130329T172124Z.zone1.clone
Log saved in non-global zone as /zones/zone1/root/var/log/zones/zoneadm.20130329T172124Z.zone1.clone

 

 

Step 4: Create Solaris Zone zone2

 

Solaris Zone zone2 uses VLAN ID 2. First, I create a VNIC over the datalink (default0).

 

# dladm create-vnic -v 2 -l default0 vnic2

 

Next, I create zone2 from the zclone zone (don't forget to create a configuration profile, the new sysidcfg). Note that I use the vlan-id parameter to configure the VLAN ID.

 

# zonecfg -z zone2 "create -t zclone"
# zonecfg -z zone2
zonecfg:zone2> set zonepath=/zones/zone2
zonecfg:zone2> select anet linkname=net0
zonecfg:zone2:anet> set linkname=vnic2
zonecfg:zone2:anet> set lower-link=default0
zonecfg:zone2:anet> set vlan-id=2
zonecfg:zone2:anet> end
zonecfg:zone2> commit
zonecfg:zone2> exit

 

# zoneadm -z zone2 clone -c /tmp/sc_profile2.xml zclone
The following ZFS file system(s) have been created:
    rpool/zones/zone2
Progress being logged to /var/log/zones/zoneadm.20130329T174913Z.zone2.clone
Log saved in non-global zone as /zones/zone2/root/var/log/zones/zoneadm.20130329T174913Z.zone2.clone

 

 

Step 5: Create Solaris Zone zone3

 

It's the same configuration as zone2; the only change is the VLAN ID. This zone uses VLAN ID 3.

 

# dladm create-vnic -v 3 -l default0 vnic3

 

# zonecfg -z zone3 "create -t zclone"
# zonecfg -z zone3
zonecfg:zone3> set zonepath=/zones/zone3
zonecfg:zone3> select anet linkname=net0
zonecfg:zone3:anet> set linkname=vnic3
zonecfg:zone3:anet> set lower-link=default0
zonecfg:zone3:anet> set vlan-id=3
zonecfg:zone3:anet> end
zonecfg:zone3> commit
zonecfg:zone3> exit

 

# zoneadm -z zone3 clone -c /tmp/sc_profile3.xml zclone
The following ZFS file system(s) have been created:
    rpool/zones/zone3
Progress being logged to /var/log/zones/zoneadm.20130329T175707Z.zone3.clone
Log saved in non-global zone as /zones/zone3/root/var/log/zones/zoneadm.20130329T175707Z.zone3.clone

 

 

Step 6: Start all Solaris Zones

 

My configuration is finished. I just start all the zones.

 

# zoneadm list -cv
  ID NAME      STATUS     PATH               BRAND    IP   
   0 global    running    /                  solaris  shared
   - zclone    installed  /zones/zclone      solaris  excl 
   - zone1     installed  /zones/zone1       solaris  excl 
   - zone2     installed  /zones/zone2       solaris  excl 
   - zone3     installed  /zones/zone3       solaris  excl 

 

# zoneadm -z zone1 boot ; zoneadm -z zone2 boot ; zoneadm -z zone3 boot
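
Once the zones are up, a quick check from the Global Zone confirms that each zone got its address in the expected VLAN (only the zone names defined above are assumed):

# zoneadm list -cv
# zlogin zone1 ipadm show-addr
# zlogin zone2 ipadm show-addr
# zlogin zone3 ipadm show-addr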

 

 

Conclusion: We hope this step-by-step guide will give you some ideas for future consolidation. With Oracle Solaris 11 capabilities, you can easily set up fairly complex environments.

 

 


 

27 March 2013, 08:42

 

Last weekend, I found the origin of an SVM bug using the mdb tool. Happy reading!!

 

After restarting the server, I wanted to mount a filesystem (metaset object), but the following command was not responding...

 

# metaset
^Cmetaset: Interrupt

 

Very strange... I tested another SVM command.

 

# metastat
^Cmetastat: Interrupt

 

Hmm "bizarre"... All SVM commands were blocked. Is there a blocked SVM process ?

 

# pgrep -fl meta
12140 /usr/sbin/metasync -r

 

Yes, of course!? But this process is only used to resync a metadevice. Was there a problem with a device? My first action: I checked the system logs for disk errors. No errors were reported.
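
For that first check, a simple scan of the messages file is usually enough (an illustrative command only, not necessarily the exact one I used):

# egrep -i "scsi|retry|offline" /var/adm/messages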

 

What does the metasync process do? I check the activity of this process.

 

# truss -aefl -p 12140
12140/1:        psargs: /usr/sbin/metasync -r
^C

 

Nothing... A more thorough analysis is essential to understand this problem. I use mdb to check the stack of this thread.

 

# mdb -k
Loading modules: [ unix genunix dtrace specfs ufs sd mpt pcisch sgsbbc ssd fcp fctl md ip hook neti qlc sgenv sctp arp usba lofs zfs cpc fcip random crypto logindmux ptm nfs ipc ]

> 0t12140::pid2proc
60053be24f0

> 60053be24f0::walk thread |::findstack -v
stack pointer for thread 30160427920: 2a100e92d21
[ 000002a100e92d21 sema_p+0x138() ]
  000002a100e92dd1 biowait+0x6c(600520ca500, 0, 18b7c00, 300b17ae000, 8, 600520ca500)
  000002a100e92e81 default_physio+0x388(12f650c, 200, 0, 600520ca540, 12e6c38, 600520ca538)
  000002a100e92fb1 scsi_uscsi_handle_cmd+0x1b8(20000000b0, 1, 60053d18238, 12f650c, 600520ca500, 60053fed7c0)
  000002a100e930a1 sd_send_scsi_cmd+0x114(20000000b0, 1964c00, 60053fed7c0, 1, 3009f2ca6c0, 2a100e93a30)
  000002a100e93161 sd_send_scsi_RDWR+0x2bc(600500ccf80, 10000, 14, 2a100e93a1c, 2a100e93a70, 1)
  000002a100e93281 sd_use_efi+0x90(3009f2ca6c0, 1, 0, 6a945a3b, 130a3fc, 3041df72000)
  000002a100e93351 sd_validate_geometry+0xc8(3009f2ca6c0, 1, 60, 1, 7, ea000050)
  000002a100e93411 sd_ready_and_valid+0x2d4(3009f2ca6c0, ea, ea000050, 600500ccf80, 30, 0)
  000002a100e93521 sdopen+0x248(8, 3009f2ca6c0, 3, 196c6d8, 3009f2ca7a0, 0)
  000002a100e935d1 PxOpenNativeDev+0x148(600515ab510, 3, 701ae328, 4, 60050003e00, 3011bb3a7a0)
  000002a100e936e1 PxSolUpdateSize+0x78(600515ab510, 60053e523d0, 60053e525f0, 20, ffffffff, 3)
  000002a100e937b1 PowerPlatformBottomDispatch+0x150(600515ab510, 60053e523d0, 0, 0, 60050555558, 20000000b0)
  000002a100e939d1 PowerBottomDispatch+0xa0(600515ab510, 60053e523d0, 0, 0, 600547cd74c, 600515ab250)
  000002a100e93aa1 PowerBottomDispatchPirp+0x88(600515ab250, 60053e523d0, 9, 60053e523d0, 600515a5718, 600515ab510)
  000002a100e93b71 PowerDispatch+0x3a4(600515a9470, 60053e523d0, 60053e52630, 0, 600547cd718, 600515ab250)
  000002a100e93c81 GpxDispatch+0x68(600515a9378, 60053e523d0, 600547cd728, 60053e525b0, 600515a57e8, 0)
  000002a100e93d61 PowerDispatch+0x264(600515a9378, 60053e523d0, 60053e52610, 0, 0, 600515ab250)
  000002a100e93e71 GpxDispatch+0x68(600515a9280, 60053e523d0, ffffffffffffffff, 7bf12e987bf22408, 600515a5780, 0)
  000002a100e93f51 PowerDispatch+0x264(600515a9280, 60053e523d0, 60053e525f0, 0, 2a100e94531, 600515ab250)
  000002a100e94061 GpxDispatch+0x68(600515a9188, 60053e523d0, 20f, 0, 60050555558, 0)
  000002a100e94141 PowerDispatch+0x264(600515a9188, 60053e523d0, 60053e525d0, 0, 600547cd74c, 600515ab250)
  000002a100e94251 GpxDispatch+0x68(600515a9090, 60053e523d0, 9, 60053e523d0, 600515a5718, 0)
  000002a100e94331 PowerDispatch+0x264(600515a9090, 60053e523d0, 60053e525b0, 0, 600547cd718, 600515ab250)
  000002a100e94441 PxUpdateSize+0x80(600515a9090, 0, 600547cd728, 60053e525b0, 600547cd728, 60053e523d0)
  000002a100e94531 power_open+0x5c8(2a100e94f48, 40000003, 2, 60050003e00, 600515ab250, 600515ab510)
  000002a100e94691 spec_open+0x4f8(2a100e950b8, 224, 60050003e00, a01, 60053b3f2d0, 0)
  000002a100e94751 fop_open+0x78(2a100e950b8, 2, 60050003e00, 40000003, 6005483d900, 6005483d900)
  000002a100e94801 dev_lopen+0x34(2a100e95170, 3, 4, 60050003e00, ffffffff, ffffffffffffffff)
  000002a100e948c1 md_layered_open+0x120(13, 2a100e95258, 1, 300ad44b084, 20000000b0, 60050003e00)
  000002a100e94981 stripe_open_all_devs+0x188(1a, 1, 0, 0, 0, 65)
  000002a100e94a61 stripe_open+0xa0(65, 3, 4, 60050135f68, 600500ede00, 1)
  000002a100e94b11 md_layered_open+0xb8(0, 2a100e954b0, 1, 60050135f68, 65, 60050003e00)
  000002a100e94bd1 mirror_open_all_devs+0x240(1, 60050135770, 2a100e958c0, 300ad585070, 64, 1)
  000002a100e94cc1 mirror_internal_open+0xf0(600501357a0, 3, 4, 0, 2a100e958c0, 0)
  000002a100e94d71 mirror_resync_unit+0x74(60053d0d018, 60053d0d000, 60050135770, 2a100e958c0, 0, 1)
  000002a100e94e31 mirror_admin_ioctl+0x1f8(e, ffbffb00, 102003, 2a100e958c0, 54, 102003)
  000002a100e94f41 md_admin_ioctl+0x130(19c4c00, 19, ffbffb00, 102003, 2a100e958c0, 5615)
  000002a100e95011 mdioctl+0xf4(550003ffff, 5615, ffbffb00, 102003, 60053b97100, 1f)
  000002a100e950e1 fop_ioctl+0x20(600548dd840, 5615, ffbffb00, 102003, 60053b97100, 1288f38)
  000002a100e95191 ioctl+0x184(0, 600501e7778, ffbffb00, fffffff8, 0, 5615)
  000002a100e952e1 syscall_trap32+0xcc(0, 5615, ffbffb00, fffffffffffffff8, 0, ffbffb51)

 

Really interesting: the kthread is suspended by a SCSI operation (the biowait function suspends processes pending completion of block I/O operations). Which device is being used?

 

> 600520ca500::print -t buf_t
{
    int b_flags = 0x2200061
    struct buf *b_forw = 0
    struct buf *b_back = 0
    struct buf *av_forw = 0
    struct buf *av_back = 0
    o_dev_t b_dev = 0
    size_t b_bcount = 0x200
    union  b_un = {
        caddr_t b_addr = 0x303666e1480
        struct fs *b_fs = 0x303666e1480
        struct cg *b_cg = 0x303666e1480
        struct dinode *b_dino = 0x303666e1480
        daddr32_t *b_daddr = 0x303666e1480
    }
    lldaddr_t _b_blkno = {
        longlong_t _f = 0
        struct  _p = {
            int32_t _u = 0
            int32_t _l = 0
        }
    }
    char b_obs1 = '\0'
    size_t b_resid = 0
    clock_t b_start = 0
    struct proc *b_proc = 0
    struct page *b_pages = 0
    clock_t b_obs2 = 0
    size_t b_bufsize = 0
    int (*)() b_iodone = 0
    struct vnode *b_vp = 0
    struct buf *b_chain = 0
    int b_obs3 = 0
    int b_error = 0                  
    void *b_private = 0x300edf4b000
    dev_t b_edev = 0x20000000b0
    ksema_t b_sem = {
        void *[2] _opaque = [ 0, 0 ]
    }
    ksema_t b_io = {
        void *[2] _opaque = [ 0x30160427920, 0 ]
    }
    struct buf *b_list = 0
    struct page **b_shadow = 0x60053df97c0
    void *b_dip = 0x6005013d6d0
    struct vnode *b_file = 0
    offset_t b_offset = 0xffffffffffffffff
}

 

The ::devt dcmd enables us to obtain its major and minor numbers.

 

> 0x20000000b0::devt
     MAJOR       MINOR
        32         176

 

With these numbers, it's easy to find the device name (there are many different ways to obtain it). For example:

 

# cd /devices
# ls -Rl > /tmp/bruno
# view /tmp/bruno

[...]
/ssm@0,0/pci@18,600000/pci@1/scsi@2
[...]
brw-r-----   1 root     sys       32, 176 Mar 23 18:46 sd@9,0:a
[...] 

# grep "/ssm@0,0/pci@18,600000/pci@1/scsi@2/sd@9" /etc/path_to_inst
"/ssm@0,0/pci@18,600000/pci@1/scsi@2/sd@9,0" 22 "sd"

# ls -l /dev/dsk | grep "/ssm@0,0/pci@18,600000/pci@1/scsi@2/sd@9"
lrwxrwxrwx  1 root  root  57 Aug 22 2007 c2t9d0s0 -> ../../devices/ssm@0,0/pci@18,600000/pci@1/scsi@2/sd@9,0:a
lrwxrwxrwx  1 root  root  57 Aug 22 2007 c2t9d0s1 -> ../../devices/ssm@0,0/pci@18,600000/pci@1/scsi@2/sd@9,0:b
lrwxrwxrwx  1 root  root  57 Aug 22 2007 c2t9d0s2 -> ../../devices/ssm@0,0/pci@18,600000/pci@1/scsi@2/sd@9,0:c
lrwxrwxrwx  1 root  root  57 Aug 22 2007 c2t9d0s3 -> ../../devices/ssm@0,0/pci@18,600000/pci@1/scsi@2/sd@9,0:d
lrwxrwxrwx  1 root  root  57 Aug 22 2007 c2t9d0s4 -> ../../devices/ssm@0,0/pci@18,600000/pci@1/scsi@2/sd@9,0:e
lrwxrwxrwx  1 root  root  57 Aug 22 2007 c2t9d0s5 -> ../../devices/ssm@0,0/pci@18,600000/pci@1/scsi@2/sd@9,0:f
lrwxrwxrwx  1 root  root  57 Aug 22 2007 c2t9d0s6 -> ../../devices/ssm@0,0/pci@18,600000/pci@1/scsi@2/sd@9,0:g
lrwxrwxrwx  1 root  root  57 Aug 22 2007 c2t9d0s7 -> ../../devices/ssm@0,0/pci@18,600000/pci@1/scsi@2/sd@9,0:h

 

The Unix device is c2t9d0. It is impossible to obtain any information on this device (format, prtvtoc, ... do not respond). If your production system is correctly configured, Explorer runs every week. This is the moment to use it.

 

# cd /opt/SUNWexplo/output/explorer.84549349.server-2013.03.16.23.05/disks
# grep c2t9d0 diskinfo
c2t9d0      SEAGATE    ST314655LSUN146G     0491 0719S16ZB2 primary

 

With the type of device determined, all we have to do is open a ticket with support and change it.

 

In conclusion, since the device did not fail outright, an ioctl remained suspended, and as a result all SVM commands hung. Using mdb, it was easy to diagnose the problem and fix it quickly.

 

23 February 2013, 21:37

 

A small observation made during the acceptance testing of an Oracle RAC 11gR2 cluster on Solaris 10 SPARC. During the "loss of a SAN array" test, we observed a small problem when resynchronizing the ASM diskgroups that use ACFS volumes. We expected the fast resync feature to be used, and yet...

 

Let's start with the initial context: "loss of a SAN array on both nodes of the Oracle RAC cluster".

 

Loss of the array on both RAC cluster nodes (system view).

 

Feb 12 18:08:55 rac01 fctl: [ID 517869 kern.warning] WARNING: fp(6)::GPN_ID for D_ID=150300 failed
Feb 12 18:08:55 rac01 fctl: [ID 517869 kern.warning] WARNING: fp(6)::N_x Port with D_ID=150300, PWWN=5000097408231998 disappeared from fabric
Feb 12 18:09:52 rac01 fctl: [ID 517869 kern.warning] WARNING: fp(5)::GPN_ID for D_ID=b0300 failed
Feb 12 18:09:52 rac01 fctl: [ID 517869 kern.warning] WARNING: fp(5)::N_x Port with D_ID=b0300, PWWN=50000974082319a4 disappeared from fabric
[…]

 

Feb 12 18:08:55 rac02 fctl: [ID 517869 kern.warning] WARNING: fp(5)::GPN_ID for D_ID=150300 failed
Feb 12 18:08:55 rac02 fctl: [ID 517869 kern.warning] WARNING: fp(5)::N_x Port with D_ID=150300, PWWN=5000097408231998 disappeared from fabric
Feb 12 18:09:52 rac02 fctl: [ID 517869 kern.warning] WARNING: fp(6)::GPN_ID for D_ID=b0300 failed
Feb 12 18:09:52 rac02 fctl: [ID 517869 kern.warning] WARNING: fp(6)::N_x Port with D_ID=b0300, PWWN=50000974082319a4 disappeared from fabric
[…]

 

 

Dead paths confirmed by PowerPath (multipath view).

 

rac01 # /etc/powermt display
Symmetrix logical device count=30
CLARiiON logical device count=0
Hitachi logical device count=0
Invista logical device count=0
HP xp logical device count=0
Ess logical device count=0
HP HSx logical device count=0

==============================================================================

----- Host Bus Adapters ---------  ------ I/O Paths -----  ------ Stats ------
###  HW Path                       Summary   Total   Dead  IO/Sec Q-IOs Errors

==============================================================================
3077 pci@11,700000/SUNW,qlc@0/fp@0,0 degraded    30     14      -     0     16
3078 pci@1,700000/SUNW,qlc@0/fp@0,0  degraded    30     14      -     0     20

 

rac02 # /etc/powermt display
Symmetrix logical device count=30
CLARiiON logical device count=0
Hitachi logical device count=0
Invista logical device count=0
HP xp logical device count=0
Ess logical device count=0
HP HSx logical device count=0

==============================================================================

----- Host Bus Adapters ---------  ------ I/O Paths -----  ------ Stats ------
###  HW Path                       Summary   Total   Dead  IO/Sec Q-IOs Errors

==============================================================================
3077 pci@1,700000/SUNW,qlc@0/fp@0,0  degraded    30     14      -     0     17
3078 pci@11,700000/SUNW,qlc@0/fp@0,0 degraded    30     14      -     0     17

 

 

State of the ASM diskgroups (ASM layer view).

 

$ asmcmd lsdsk -pk
 Name           Failgroup     … Mount_Stat … Mode_Stat State  Path
 DATA_DG1_0000  DATA_DG1_FG1  … MISSING    …   OFFLINE NORMAL 
 DATA_DG1_0001  DATA_DG1_FG1  … MISSING    …   OFFLINE NORMAL 
 DUMP_DG_0002   DUMP_DG_FG1   … MISSING    …   OFFLINE NORMAL 
 DUMP_DG_0009   DUMP_DG_FG1   … MISSING    …   OFFLINE NORMAL 
 SYSTEMDG_0000  SYSTEMDG_0000 … MISSING    …   OFFLINE NORMAL 
 DUMP_DG_0000   DUMP_DG_FG1   … MISSING    …   OFFLINE NORMAL 
 DUMP_DG_0001   DUMP_DG_FG1   … MISSING    …   OFFLINE NORMAL 
 DUMP_DG_0008   DUMP_DG_FG1   … MISSING    …   OFFLINE NORMAL 
 DUMP_DG_0003   DUMP_DG_FG1   … MISSING    …   OFFLINE NORMAL 
 REDO_DG1_0000  REDO_DG1_FG1  … MISSING    …   OFFLINE NORMAL 
 FLASH_DG1_0000 FLASH_DG1_FG1 … MISSING    …   OFFLINE NORMAL 
 DATA_DG1_0002  DATA_DG1_FG2  … CACHED     …   ONLINE  NORMAL  …
 DATA_DG1_0003  DATA_DG1_FG2  … CACHED     …   ONLINE  NORMAL  …
 DUMP_DG_0004   DUMP_DG_FG2   … CACHED     …   ONLINE  NORMAL  …
 DUMP_DG_0005   DUMP_DG_FG2   … CACHED     …   ONLINE  NORMAL  …
 DUMP_DG_0006   DUMP_DG_FG2   … CACHED     …   ONLINE  NORMAL  …
 DUMP_DG_0007   DUMP_DG_FG2   … CACHED     …   ONLINE  NORMAL  …
 DUMP_DG_0010   DUMP_DG_FG2   … CACHED     …   ONLINE  NORMAL  …
 DUMP_DG_0011   DUMP_DG_FG2   … CACHED     …   ONLINE  NORMAL  …
 FLASH_DG1_0001 FLASH_DG1_FG2 … CACHED     …   ONLINE  NORMAL  …
 REDO_DG1_0001  REDO_DG1_FG2  … CACHED     …   ONLINE  NORMAL  …
 SYSTEMDG_0004  SYSTEMDG_0004 … CACHED     …   ONLINE  NORMAL  …
 SYSTEMDG_0001  SYSTEMDG_000  … CACHED     …   ONLINE  NORMAL  …

 

 

The loss of the array is confirmed by all the layers (server, multipath, LVM). The RAC cluster itself keeps running correctly (as expected). The test is conclusive; let's now move on to the "rediscovery" of the array and the resynchronization (fast resync) of the ASM diskgroups.

 

Array back online on both RAC cluster nodes (system view).

 

Feb 12 18:23:16 rac01 fctl: [ID 517869 kern.warning] WARNING: fp(6)::N_x Port with D_ID=150300, PWWN=5000097408231998 reappeared in fabric
Feb 12 18:23:34 rac01 fctl: [ID 517869 kern.warning] WARNING: fp(5)::N_x Port with D_ID=b0300, PWWN=50000974082319a4 reappeared in fabric
[…]

 

Feb 12 18:23:16 rac02 fctl: [ID 517869 kern.warning] WARNING: fp(5)::N_x Port with D_ID=150300, PWWN=5000097408231998 reappeared in fabric
Feb 12 18:23:34 rac02 fctl: [ID 517869 kern.warning] WARNING: fp(6)::N_x Port with D_ID=b0300, PWWN=50000974082319a4 reappeared in fabric
[…]

 

 

Paths back online, confirmed by PowerPath (multipath view).

 

rac01 # /etc/powermt display    
Symmetrix logical device count=30
CLARiiON logical device count=0
Hitachi logical device count=0
Invista logical device count=0
HP xp logical device count=0
Ess logical device count=0
HP HSx logical device count=0

==============================================================================

----- Host Bus Adapters ---------  ------ I/O Paths -----  ------ Stats ------
###  HW Path                       Summary   Total   Dead  IO/Sec Q-IOs Errors

==============================================================================
3077 pci@11,700000/SUNW,qlc@0/fp@0,0 optimal     30     0      -     0     16
3078 pci@1,700000/SUNW,qlc@0/fp@0,0  optimal     30     0      -     0     20

 

rac02 # /etc/powermt display    
Symmetrix logical device count=30
CLARiiON logical device count=0
Hitachi logical device count=0
Invista logical device count=0
HP xp logical device count=0
Ess logical device count=0
HP HSx logical device count=0

==============================================================================

----- Host Bus Adapters ---------  ------ I/O Paths -----  ------ Stats ------
###  HW Path                       Summary   Total   Dead  IO/Sec Q-IOs Errors

==============================================================================
3077 pci@1,700000/SUNW,qlc@0/fp@0,0  optimal     30     0      -     0     17
3078 pci@11,700000/SUNW,qlc@0/fp@0,0 optimal     30     0      -     0     17

 

 

The array is visible again from the server. It is therefore possible to restart the resynchronization of the ASM diskgroups.

 

$  sqlplus / as sysasm

SQL*Plus: Release 11.2.0.3.0 Production on Tue Feb 12 18:23:36 2013
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> alter diskgroup data_dg1 online all;
Diskgroup altered.
SQL> alter diskgroup systemdg online all;
Diskgroup altered.
SQL> alter diskgroup redo_dg1 online all;
Diskgroup altered.
SQL> alter diskgroup flash_dg1 online all;
Diskgroup altered.
SQL> alter diskgroup dump_dg online all;
Diskgroup altered.

SQL> set lines 120
SQL> select * from v$asm_operation;

GROUP_NUMBER OPERA STAT POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES
------------ ----- ---- ----- ------ ----- -------- -------- -----------
ERROR_CODE
--------------------------------------------
           2 ONLIN RUN      1      0     0        0        0           0

 

 

The resynchronization seems to be finished, and yet...

 

$ asmcmd lsdsk -pk
 Name           Failgroup     … Mount_Stat … Mode_Stat State  Path
 DATA_DG1_0000  DATA_DG1_FG1  … CACHED     …   ONLINE  NORMAL 
 DATA_DG1_0001  DATA_DG1_FG1  … CACHED     …   ONLINE  NORMAL 
 DUMP_DG_0002   DUMP_DG_FG1   … CACHED     …   SYNCING NORMAL 
 DUMP_DG_0009   DUMP_DG_FG1   … CACHED     …   SYNCING NORMAL 
 SYSTEMDG_0000  SYSTEMDG_0000 … CACHED     …   ONLINE  NORMAL 
 DUMP_DG_0000   DUMP_DG_FG1   … CACHED     …   SYNCING NORMAL 
 DUMP_DG_0001   DUMP_DG_FG1   … CACHED     …   SYNCING NORMAL 
 DUMP_DG_0008   DUMP_DG_FG1   … CACHED     …   SYNCING NORMAL 
 DUMP_DG_0003   DUMP_DG_FG1   … CACHED     …   SYNCING NORMAL 
 REDO_DG1_0000  REDO_DG1_FG1  … CACHED     …   ONLINE  NORMAL 
 FLASH_DG1_0000 FLASH_DG1_FG1 … CACHED     …   ONLINE  NORMAL 
 DATA_DG1_0002  DATA_DG1_FG2  … CACHED     …   ONLINE  NORMAL  …
 DATA_DG1_0003  DATA_DG1_FG2  … CACHED     …   ONLINE  NORMAL  …
 DUMP_DG_0004   DUMP_DG_FG2   … CACHED     …   ONLINE  NORMAL  …
 DUMP_DG_0005   DUMP_DG_FG2   … CACHED     …   ONLINE  NORMAL  …
 DUMP_DG_0006   DUMP_DG_FG2   … CACHED     …   ONLINE  NORMAL  …
 DUMP_DG_0007   DUMP_DG_FG2   … CACHED     …   ONLINE  NORMAL  …
 DUMP_DG_0010   DUMP_DG_FG2   … CACHED     …   ONLINE  NORMAL  …
 DUMP_DG_0011   DUMP_DG_FG2   … CACHED     …   ONLINE  NORMAL  …
 FLASH_DG1_0001 FLASH_DG1_FG2 … CACHED     …   ONLINE  NORMAL  …
 REDO_DG1_0001  REDO_DG1_FG2  … CACHED     …   ONLINE  NORMAL  …
 SYSTEMDG_0004  SYSTEMDG_0004 … CACHED     …   ONLINE  NORMAL  …
 SYSTEMDG_0001  SYSTEMDG_000  … CACHED     …   ONLINE  NORMAL  … 

 

 

Strangely, the ASM diskgroup dump_dg is still synchronizing!? What is going on? Since this is acceptance testing, there has been no user activity, which implies no data modification. With fast resync, the resynchronization between the two arrays should be immediate. Is it working correctly?

 

Let's look at what is happening in the ASM logs.

 

$ cd /oracle/base/diag/asm/+asm/+ASM1/trace
$ tail -f alert_+ASM1.log
Tue Feb 12 18:38:55 2013
NOTE: successfully read ACD block gn=2 blk=10752 via retry read
Errors in file /oracle/base/diag/asm/+asm/+ASM1/trace/+ASM1_lgwr_3524.trc:
ORA-15062: ASM disk is globally closed
NOTE: successfully read ACD block gn=2 blk=10752 via retry read
Errors in file /oracle/base/diag/asm/+asm/+ASM1/trace/+ASM1_lgwr_3524.trc:
ORA-15062: ASM disk is globally closed
NOTE: successfully read ACD block gn=2 blk=10752 via retry read
Errors in file /oracle/base/diag/asm/+asm/+ASM1/trace/+ASM1_lgwr_3524.trc:
ORA-15062: ASM disk is globally closed
NOTE: successfully read ACD block gn=2 blk=10752 via retry read
Errors in file /oracle/base/diag/asm/+asm/+ASM1/trace/+ASM1_lgwr_3524.trc:
ORA-15062: ASM disk is globally closed
[…]

 

$ cat /oracle/base/diag/asm/+asm/+ASM1/trace/+ASM1_lgwr_3524.trc
Trace file /oracle/base/diag/asm/+asm/+ASM1/trace/+ASM1_lgwr_3524.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
ORACLE_HOME = /oracle/product/11.2.0/grid
System name:    SunOS
Node name:      rac01
Release:        5.10
Version:        Generic_147440-26
Machine:        sun4u
Instance name: +ASM1
Redo thread mounted by this instance: 0 <none>
Oracle process number: 15
Unix process pid: 3524, image: oracle@rac01 (LGWR)

*** 2013-02-12 18:10:12.384
*** SESSION ID:(931.1) 2013-02-12 18:10:12.385
*** CLIENT ID:() 2013-02-12 18:10:12.385
*** SERVICE NAME:() 2013-02-12 18:10:12.385
*** MODULE NAME:() 2013-02-12 18:10:12.385
*** ACTION NAME:() 2013-02-12 18:10:12.385
[…]
*** 2013-02-12 18:10:20.264
NOTE: successfully read ACD block gn=1 blk=10752 via retry read
ORA-15062: ASM disk is globally closed
*** 2013-02-12 18:11:11.444
NOTE: successfully read ACD block gn=1 blk=10752 via retry read
ORA-15062: ASM disk is globally closed
NOTE: successfully read ACD block gn=2 blk=10752 via retry read
ORA-15062: ASM disk is globally closed
NOTE: successfully read ACD block gn=4 blk=10752 via retry read
ORA-15062: ASM disk is globally closed
NOTE: successfully read ACD block gn=5 blk=0 via retry read
ORA-15062: ASM disk is globally closed
[…]

 

According to the Oracle logs, there still seem to be ongoing ASM operations. Let's check the properties of this ASM diskgroup; maybe there are some differences.

 

$ asmcmd
ASMCMD> lsattr -l -G dump_dg
Name                     Value      
access_control.enabled   FALSE      
access_control.umask     066        
au_size                  1048576    
cell.smart_scan_capable  FALSE      
compatible.advm          11.2.0.0.0 
compatible.asm           11.2.0.0.0 
compatible.rdbms         11.2.0.0.0 
disk_repair_time         24h        
sector_size              512

 

 

Comparing it with another ASM diskgroup, there is no difference in attributes. The only real difference is that this diskgroup hosts an ACFS volume. Let's search differently...

 

Time for a debate between the Oracle DBAs and me: why is the diskgroup still synchronizing after 20 minutes?

  • Either the ASM diskgroup does not use the fast resync feature (when ACFS is involved)?
  • Or an ASM - ACFS bug?
  • Are there still I/Os going on?

 

Let's settle the debate with DTrace! A small DTrace one-liner to find the pid issuing the write requests.

 

# dtrace -n 'syscall::write:entry /execname == "oracle"/ \
{ @[pid] = count(); }'
^C
[…]
   3572       1080

 

Let's run a quick oradebug on this PID.

 

$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.3.0 Production on Tue Feb 12 18:23:36 2013
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> oradebug setospid 3572
Oracle pid: 27, Unix process pid: 3572, image: oracle@rac01 (ASMB)
SQL> oradebug unlimit
Statement processed.
SQL> oradebug EVENT 10046 TRACE NAME CONTEXT FOREVER, LEVEL 12
Statement processed.
SQL> oradebug tracefile_name
/oracle/base/diag/asm/+asm/+ASM1/trace/+ASM1_asmb_3572.trc
SQL> ! tail -f /oracle/base/diag/asm/+asm/+ASM1/trace/+ASM1_asmb_3572.trc
WAIT #0: nam='ASM file metadata operation' ela= 62 msgop=4 locn=0 p3=0 obj#=-1 tim=443471434898
WAIT #0: nam='ASM file metadata operation' ela= 61 msgop=4 locn=0 p3=0 obj#=-1 tim=443471435027
WAIT #0: nam='ASM file metadata operation' ela= 109 msgop=4 locn=0 p3=0 obj#=-1 tim=443471435205
WAIT #0: nam='ASM file metadata operation' ela= 64 msgop=4 locn=0 p3=0 obj#=-1 tim=443471435338
WAIT #0: nam='ASM background timer' ela= 4175 p1=0 p2=0 p3=0 obj#=-1 tim=443471439572
WAIT #0: nam='ASM file metadata operation' ela= 1 msgop=0 locn=3 p3=0 obj#=-1 tim=443471439636
WAIT #0: nam='ASM file metadata operation' ela= 64 msgop=4 locn=0 p3=0 obj#=-1 tim=443471439751
WAIT #0: nam='ASM file metadata operation' ela= 61 msgop=4 locn=0 p3=0 obj#=-1 tim=443471439967
WAIT #0: nam='ASM file metadata operation' ela= 59 msgop=4 locn=0 p3=0 obj#=-1 tim=443471440102
WAIT #0: nam='ASM file metadata operation' ela= 62 msgop=4 locn=0 p3=0 obj#=-1 tim=443471440231
WAIT #0: nam='ASM background timer' ela= 81 p1=0 p2=0 p3=0 obj#=-1 tim=443471440388
WAIT #0: nam='ASM file metadata operation' ela= 1 msgop=0 locn=3 p3=0 obj#=-1 tim=443471440444
WAIT #0: nam='ASM file metadata operation' ela= 63 msgop=4 locn=0 p3=0 obj#=-1 tim=443471440552
WAIT #0: nam='ASM file metadata operation' ela= 60 msgop=4 locn=0 p3=0 obj#=-1 tim=443471440681
WAIT #0: nam='ASM file metadata operation' ela= 60 msgop=4 locn=0 p3=0 obj#=-1 tim=443471440801
WAIT #0: nam='ASM file metadata operation' ela= 60 msgop=4 locn=0 p3=0 obj#=-1 tim=443471440937
WAIT #0: nam='ASM background timer' ela= 74 p1=0 p2=0 p3=0 obj#=-1 tim=443471441070
WAIT #0: nam='ASM file metadata operation' ela= 0 msgop=0 locn=3 p3=0 obj#=-1 tim=443471441119
WAIT #0: nam='ASM file metadata operation' ela= 60 msgop=4 locn=0 p3=0 obj#=-1 tim=443471441223
WAIT #0: nam='ASM file metadata operation' ela= 60 msgop=4 locn=0 p3=0 obj#=-1 tim=443471441347
WAIT #0: nam='ASM file metadata operation' ela= 60 msgop=4 locn=0 p3=0 obj#=-1 tim=443471441472
WAIT #0: nam='ASM file metadata operation' ela= 76 msgop=4 locn=0 p3=0 obj#=-1 tim=443471441614
[…]

SQL> oradebug EVENT 10046 TRACE NAME CONTEXT OFF
Statement processed.

 

 

New discussion: these are operations only on the metadata (synchronization with fast resync). So why is the status in ASM still "SYNCING"? There were no updates at all during the test!!

 

$ asmcmd lsdsk -pk | grep SYNCING
 DUMP_DG_0002   DUMP_DG_FG1   … CACHED     …   SYNCING NORMAL 
 DUMP_DG_0009   DUMP_DG_FG1   … CACHED     …   SYNCING NORMAL 
 DUMP_DG_0000   DUMP_DG_FG1   … CACHED     …   SYNCING NORMAL 
 DUMP_DG_0001   DUMP_DG_FG1   … CACHED     …   SYNCING NORMAL 
 DUMP_DG_0008   DUMP_DG_FG1   … CACHED     …   SYNCING NORMAL 
 DUMP_DG_0003   DUMP_DG_FG1   … CACHED     …   SYNCING NORMAL 

 

 

New discussion: are there really I/Os? Isn't it simply a display problem in the command output? For the DBAs, no; for me, yes. Let's check.

 

$ asmcmd
ASMCMD> iostat --io -G dump_dg 5
Group_Name  Dsk_Name      Reads  Writes 
DUMP_DG     DUMP_DG_0000  456    67329  
DUMP_DG     DUMP_DG_0001  223    67612  
DUMP_DG     DUMP_DG_0002  223    68516  
DUMP_DG     DUMP_DG_0003  223    66326  
DUMP_DG     DUMP_DG_0004  66887  106845 
DUMP_DG     DUMP_DG_0005  59264  7561   
DUMP_DG     DUMP_DG_0006  58953  6996   
DUMP_DG     DUMP_DG_0007  73000  10251  
DUMP_DG     DUMP_DG_0008  223    66383  
DUMP_DG     DUMP_DG_0009  223    66820
DUMP_DG     DUMP_DG_0010  66421  8176   
DUMP_DG     DUMP_DG_0011  58860  8546   

 

Group_Name  Dsk_Name      Reads  Writes 
DUMP_DG     DUMP_DG_0000  0.00   32.20  
DUMP_DG     DUMP_DG_0001  0.00   43.80  
DUMP_DG     DUMP_DG_0002  0.00   32.60  
DUMP_DG     DUMP_DG_0003  0.00   35.20  
DUMP_DG     DUMP_DG_0004  35.20  0.20   
DUMP_DG     DUMP_DG_0005  30.40  1.00   
DUMP_DG     DUMP_DG_0006  30.40  0.00   
DUMP_DG     DUMP_DG_0007  28.80  11.40  
DUMP_DG     DUMP_DG_0008  0.00   39.00  
DUMP_DG     DUMP_DG_0009  0.00   27.40  
DUMP_DG     DUMP_DG_0010  27.40  0.20   
DUMP_DG     DUMP_DG_0011  33.60  11.80

 

 

It seems so from Oracle's point of view. What about from the system's point of view?

 

# iostat -Mxnzt 5
[…]
    0.0  14.4   0.0  14.4  0.0  0.2  0.0  14.5   0   5 c1t5000097408231998d7
    0.0  26.4   0.0  14.5  0.0  0.2  0.0   8.5   0   6 c1t5000097408231998d6
    0.0  14.4   0.0  14.4  0.0  0.2  0.0  15.4   0   6 c1t5000097408231998d5
    0.0  16.2   0.0  16.0  0.0  0.2  0.0  15.0   0   6 c1t5000097408231998d4
   16.0  11.6  16.0   0.0  0.0  0.3  0.0  10.9   0   9 c1t500009740823ED98d7
   16.0   0.0  16.0   0.0  0.0  0.2  0.0  11.4   0   5 c1t500009740823ED98d6
   12.8   0.8  12.8   0.0  0.0  0.2  0.0  17.4   0   7 c1t500009740823ED98d5
   16.0   0.2  16.0   0.0  0.0  0.2  0.0  10.5   0   4 c1t500009740823ED98d4
   16.0  11.6  16.0   0.0  0.0  0.3  0.0  10.7   0   9 c3t500009740823EDA4d7
   16.0   0.0  16.0   0.0  0.0  0.2  0.0  11.4   0   5 c3t500009740823EDA4d6
   12.8   0.2  12.8   0.0  0.0  0.2  0.0  17.5   0   7 c3t500009740823EDA4d5
   16.0   0.2  16.0   0.0  0.0  0.2  0.0  10.4   0   4 c3t500009740823EDA4d4
    0.0  14.4   0.0  14.4  0.0  0.2  0.0  14.5   0   5 c3t50000974082319A4d7
    0.0  26.2   0.0  14.5  0.0  0.2  0.0   8.5   0   6 c3t50000974082319A4d6
    0.0  14.4   0.0  14.4  0.0  0.2  0.0  15.3   0   6 c3t50000974082319A4d5
    0.0  16.2   0.0  16.0  0.0  0.2  0.0  15.0   0   6 c3t50000974082319A4d4
   15.2   0.0  15.2   0.0  0.0  0.3  0.0  18.0   0   8 c1t500009740823ED98d13
   16.0   0.4  16.0   0.0  0.0  0.3  0.0  16.7   0   8 c1t500009740823ED98d12
   15.2   0.0  15.2   0.0  0.0  0.3  0.0  17.6   0   8 c3t500009740823EDA4d13
   16.4   0.0  16.0   0.0  0.0  0.3  0.0  17.9   0   9 c3t500009740823EDA4d12
    0.0  16.2   0.0  16.0  0.0  0.2  0.0  15.0   0   6 c3t50000974082319A4d13
    0.0  17.8   0.0  17.6  0.0  0.3  0.0  14.4   0   7 c3t50000974082319A4d12
    0.0  16.2   0.0  16.0  0.0  0.3  0.0  15.5   0   6 c1t5000097408231998d13
    0.0  17.8   0.0  17.6  0.0  0.3  0.0  14.9   0   7 c1t5000097408231998d12
[…]

 

 

Indeed, there are real physical I/Os (for information, the LUNs above do belong to the ASM diskgroup dump_dg).

 

Why does the resynchronization take so long when, from Oracle's point of view, only the metadata is being written!? It is really strange. Let's check with DTrace what is really happening during the reads and writes.

 

# dtrace -n 'io:::start { @[pid] = count(); }'
dtrace: description 'io:::start ' matched 6 probes
^C 

    13029                4
    13184                5
    13242                8
     2637               27
    13031              161
    13763             1256

 

# dtrace -n 'io:::start /pid == 13763/ \
{ @["disk I/O operation", args[1]->dev_pathname, \
args[0]->b_flags & B_READ ? "R" : "W"] \
= quantize(args[0]->b_lblkno); }'
dtrace: description 'io:::start ' matched 6 probes
^C  

  disk I/O operation           /devices/pseudo/emcp@96:g,blk         R                                                     

           value  ------------- Distribution ------------- count   
        33554432 |                                         0           
        67108864 |@                                        8           
       134217728 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  288     
       268435456 |                                         0                        

 

  disk I/O operation           /devices/pseudo/emcp@24:g,blk         W                                                    

           value  ------------- Distribution ------------- count   
        33554432 |                                         0           
        67108864 |@@@@                                     32      
       134217728 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@     280     
       268435456 |                                         0           

 

  disk I/O operation          /devices/pseudo/emcp@25:g,blk          W                                                    

         value  ------------- Distribution ------------- count   
        33554432 |                                         0           
        67108864 |@@@                                      24          
       134217728 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    288     
       268435456 |                                         0           

 

  disk I/O operation         /devices/pseudo/emcp@94:g,blk           R                                                    

           value  ------------- Distribution ------------- count   
        33554432 |                                         0           
        67108864 |@                                        8           
       134217728 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  296     
       268435456 |                                         0           

 

  disk I/O operation         /devices/pseudo/emcp@26:g,blk           W                                                    
           value  ------------- Distribution ------------- count   
        33554432 |                                         0           
        67108864 |@@@                                      24          
       134217728 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    296     
       268435456 |                                         0           

 

  disk I/O operation         /devices/pseudo/emcp@27:g,blk           W                                                     

           value  ------------- Distribution ------------- count    
        33554432 |                                         0            
        67108864 |@@@                                      24          
       134217728 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    296     
       268435456 |                                         0        

 

  disk I/O operation        /devices/pseudo/emcp@613:g,blk           W                                                    

         value  ------------- Distribution ------------- count   
         67108864 |                                         0           
       134217728 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 312     
       268435456 |                                         0           

 

  disk I/O operation       /devices/pseudo/emcp@913:g,blk            R                                                    

           value  ------------- Distribution ------------- count   
        67108864 |                                         0           
       134217728 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 320     
       268435456 |                                         0           

 

  disk I/O operation      /devices/pseudo/emcp@97:g,blk              R                                                    

           value  ------------- Distribution ------------- count  
        33554432 |                                         0           
        67108864 |@@                                       16          
       134217728 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   320     
       268435456 |                                         0           

 

  disk I/O operation     /devices/pseudo/emcp@95:g,blk               R                                                     

           value  ------------- Distribution ------------- count  
        33554432 |                                         0          
        67108864 |@                                        8           
       134217728 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  328     
       268435456 |                                         0               

 

  disk I/O operation     /devices/pseudo/emcp@912:g,blk               R                                                     

           value  ------------- Distribution ------------- count   
        67108864 |                                         0           
       134217728 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 344     
       268435456 |                                         0           

 

  disk I/O operation     /devices/pseudo/emcp@612:g,blk               W                                                    

           value  ------------- Distribution ------------- count   
        67108864 |                                         0           
       134217728 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 360     
       268435456 |                                         0           

 

 

We keep reading and writing the very same blocks!!
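
To confirm that the same blocks really are being revisited, the same probe can aggregate on the block number itself rather than on a quantize distribution (a sketch in the spirit of the one-liners above; adjust the pid to your own run):

# dtrace -n 'io:::start /pid == 13763/ { @[args[1]->dev_pathname, args[0]->b_flags & B_READ ? "R" : "W", args[0]->b_lblkno] = count(); }'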

 

To sum up: using "fastresync" on an ASM diskgroup that holds one or more ACFS volumes seems problematic. The metadata does get resynchronized, but the operation loops for far too long (more than an hour of resynchronization even though nothing had been modified). Time for a call to Oracle to get some clarification…
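
While waiting for Oracle's feedback, the progress of the resync can also be watched from the ASM side, for example with asmcmd (a sketch, assuming an 11.2 or later Grid Infrastructure; run it as the Grid Infrastructure owner with the ASM environment set):

$ asmcmd lsop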

 

Thanks (again) to David and Yahya for our joint analysis of this problem. "Hardware and software work together": that slogan fits this context particularly well.

 

10 February 2013 20:14

 

In a previous article, I covered how to set up a customized AI server for the Sparc architecture (deployment via Wanboot). As promised, I will now cover setting up an AI server for the x86 architecture. From an installation standpoint, the difference between the two architectures lies mainly in the initialization phase that takes place just before the installation starts.

 

On x86, the initialization phase is usually handled by the PXE/DHCP pair. You therefore need a DHCP server able to answer the PXE request sent by the client. It can be a dedicated server or one shared with the AI server. In the example below, a single server hosts both the DHCP and AI configuration.

 

There is a choice to make regarding the type of DHCP server: either the ISC DHCP server or the Solaris DHCP server. The ISC DHCP server is configured automatically when it runs on the AI server itself. However, I prefer to use the Solaris DHCP server.
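
For reference, letting installadm drive an ISC DHCP server located on the AI server only requires handing it an address range at service creation time (a sketch; the starting address and count below are illustrative):

# installadm create-service -a i386 -i 192.168.10.100 -c 10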

 

The DHCP and AI packages must be installed on the installation server from our repository server (to create the repositories, read this article). Then simply initialize the DHCP server with the right information.

 

# pkg install install/installadm SUNWdhcs

 

# /usr/sbin/dhcpconfig -D -r SUNWfiles -p /var/dhcp
Created DHCP configuration file.
Created dhcptab.
Added "Locale" macro to dhcptab.
Added server macro to dhcptab - aiserver.
DHCP server started.

 

# dhcpconfig -N 192.168.10.0 -m 255.255.255.0 -t 192.168.10.1
Added network macro to dhcptab - 192.168.10.0.
Created network table. 

 

# pntadm -L
192.168.10.0

 

 

Once these steps are done, the installation service for x86 clients has to be initialized.

 

# installadm create-service -a i386
Warning: Service svc:/network/dns/multicast:default is not online.
   Installation services will not be advertised via multicast DNS.

 

Creating service from: pkg:/install-image/solaris-auto-install
OK to use subdir of /export/auto_install to store image? [y/N]: y
DOWNLOAD              PKGS         FILES    XFER (MB)   SPEED
Completed              1/1       514/514  292.3/292.3 11.1M/s

 

PHASE                                      ITEMS
Installing new actions                   661/661
Updating package state database             Done
Updating image state                        Done
Creating fast lookup database               Done
Reading search index                        Done
Updating search index                        1/1

 

Creating i386 service: solaris11_1-i386
Image path: /export/auto_install/solaris11_1-i386

 

Refreshing install services
Warning: mDNS registry of service solaris11_1-i386 could not be verified.

 

Creating default-i386 alias

 

Setting the default PXE bootfile(s) in the local DHCP configuration
to:
bios clients (arch 00:00):  default-i386/boot/grub/pxegrub2
uefi clients (arch 00:07):  default-i386/boot/grub/grub2netx64.efi

 

Unable to update the DHCP SMF service after reconfiguration: DHCP
server is in an unexpected state: action [enable] state [offline]

 

The install service has been created and the DHCP configuration has
been updated, however the DHCP SMF service requires attention. Please
see dhcpd(8) for further information.

 

Refreshing install services
Warning: mDNS registry of service default-i386 could not be verified.

 

 

The service for x86 clients is now available.

 

# installadm list -m

Service/Manifest Name  Status   Criteria
---------------------  ------   --------

default-i386
   orig_default        Default  None

solaris11_1-i386
   orig_default        Default  None

 

 

For customization, I refer you to the previous article for more details. A specific manifest is created with the following commands.

 

# installadm export --service solaris11_1-i386 \
--manifest orig_default \
--output /export/auto_install/manifests/sol11.1-i386-001
# vi /export/auto_install/manifests/sol11.1-i386-001
# installadm create-manifest \
-f /export/auto_install/manifests/sol11.1-i386-001 \
-n solaris11_1-i386 -m sol11.1-i386-001 -d

 

 

For any further change to this manifest, use the following commands.

 

# vi /export/auto_install/manifests/sol11.1-i386-001
# installadm update-manifest \
-f /export/auto_install/manifests/sol11.1-i386-001 \
-n solaris11_1-i386 -m sol11.1-i386-001

 

 

To avoid keeping the default service and manifest, let's clean up the configuration a bit.

 

# installadm delete-service default-i386
# installadm delete-manifest -n solaris11_1-i386 -m orig_default

 

 

Now let's create the profile for a given client.

 

# sysconfig create-profile -o /export/auto_install/ref/profile.xml
# cd /export/auto_install/ref
# cp profile.xml ../clients/i386-01.xml
# vi /export/auto_install/clients/i386-01.xml

 

# installadm create-profile \
-f /export/auto_install/clients/i386-01.xml \
-n solaris11_1-i386 \
-p i386-01 -c mac="00:xx:xx:xx:xx:04"
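
If your installadm version provides the validate subcommand, the profile can be checked against the service before a client actually uses it (a sketch):

# installadm validate -n solaris11_1-i386 -p i386-01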

 

 

When creating the client, I enable serial console redirection as well as debug mode (remote ssh connection during the installation). For more details on serial redirection, I invite you to read this other article.

 

# installadm create-client -e 00xxxxxxxx04 -n solaris11_1-i386 \
-b console=ttya,livessh=enable,install_debug=enable

Warning: Service svc:/network/dns/multicast:default is not online.
   Installation services will not be advertised via multicast DNS.
Adding host entry for 00:xx:xx:xx:xx:04 to local DHCP configuration.

 

Local DHCP configuration complete, but the DHCP server SMF service is
offline. To enable the changes made, enable:
svc:/network/dhcp/server:ipv4.
Please see svcadm(1M) for further information.
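
As with create-service earlier, the DHCP SMF service may still have to be enabled by hand. The message above refers to the ISC FMRI; with the legacy Solaris DHCP server used here, the step would look more like this (a sketch, confirm the exact FMRI on your system):

# svcs -a | grep dhcp
# svcadm enable svc:/network/dhcp-server:default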

 

 

The AI server configuration is complete and a client has been generated (with a specific profile).

 

# installadm list -c -p -m

 
Service Name      Client Address    Arch   Image Path
------------      --------------    ----   ----------
solaris11_1-i386 00:xx:xx:xx:xx:04  i386  /export/auto_install/solaris11_1-i386

 

Service/Manifest Name  Status   Criteria
---------------------  ------   --------

solaris11_1-i386
   sol11.1-i386-001   Default  None 

 

Service/Profile Name  Criteria
--------------------  --------

solaris11_1-i386
   i386-01      mac = 00:xx:xx:xx:xx:04

 

 

All that remains is the DHCP configuration for this client.

 

# pntadm -A 192.168.10.123 -i 0100xxxxxxxx04 \
-m 0100xxxxxxxx04 -f "PERMANENT+MANUAL" 192.168.10.0

 

# pntadm -P 192.168.10.0 | grep 0100xxxxxxxx04
0100xxxxxxxx04  03  192.168.10.5  192.168.10.123   Zero   0100xxxxxxxx04

 

# dhtadm -g -A -m 0100xxxxxxxx04 -d \
":Include=`uname -n`:BootSrvA=192.168.10.5:BootFile=0100xxxxxxxx04:"
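
To double-check that the macro has been recorded in the dhcptab, the table can simply be dumped (the identifier is the same masked MAC-derived value as above):

# dhtadm -P | grep 0100xxxxxxxx04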

 

 

The client installation can now begin. From the ILOM of this x86 client, select the network card as the boot device, then in the GRUB menu choose entry 2 to start the installation.

 

SunOS Release 5.11 Version 11.1 64-bit
Copyright (c) 1983, 2012, Oracle and/or its affiliates. All rights reserved.
Remounting root read/write
Probing for device nodes ...
Preparing network image for use

 

Downloading solaris.zlib
--2013-01-30 20:51:33--  http://192.168.10.5:5555//export/auto_install/solaris11_1-i386/solaris.zlib
Connecting to 192.168.10.5:5555... connected.
HTTP request sent, awaiting response... 200 OK
Length: 135808512 (130M) [text/plain]
Saving to: `/tmp/solaris.zlib'

100%[======================================>] 135,808,512 57.3M/s   in 2.3s   

2013-01-30 20:51:35 (57.3 MB/s) - `/tmp/solaris.zlib' saved [135808512/135808512]

 

Downloading solarismisc.zlib
--2013-01-30 20:51:35--  http://192.168.10.5:5555//export/auto_install/solaris11_1-i386/solarismisc.zlib
Connecting to 192.168.10.5:5555... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11935744 (11M) [text/plain]
Saving to: `/tmp/solarismisc.zlib' 

100%[======================================>] 11,935,744  58.3M/s   in 0.2s   

2013-01-30 20:51:36 (58.3 MB/s) - `/tmp/solarismisc.zlib' saved [11935744/11935744]

 

Downloading .image_info
--2013-01-30 20:51:36--  http://192.168.10.5:5555//export/auto_install/solaris11_1-i386/.image_info
Connecting to 192.168.10.5.:5555... connected.
HTTP request sent, awaiting response... 200 OK
Length: 228 [text/plain]
Saving to: `/tmp/.image_info'

100%[======================================>] 228         --.-K/s   in 0s     

2013-01-30 20:51:36 (19.5 MB/s) - `/tmp/.image_info' saved [228/228]

 

Done mounting image
Configuring devices.
Hostname: i386-01
Setting debug mode to enable
Service discovery phase initiated
Service name to look up: solaris11_1-i386
Service discovery over multicast DNS failed
Service solaris11_1-i386 located at 192.168.10.5:5555 will be used
Service discovery finished successfully
Process of obtaining install manifest initiated
Using the install manifest obtained via service discovery

 

i386-01 console login:
Automated Installation started
The progress of the Automated Installation will be output to the console
Detailed logging is in the logfile at /system/volatile/install_log

 

Press RETURN to get a login prompt at any time.

 

Installer will be run in debug mode
20:52:02    Using XML Manifest: /system/volatile/ai.xml
20:52:02    Using profile specification: /system/volatile/profile
20:52:02    Using service list file: /var/run/service_list
20:52:02    Starting installation.
20:52:02    0% Preparing for Installation
20:52:03    100% manifest-parser completed.
20:52:03    0% Preparing for Installation
20:52:03    1% Preparing for Installation
20:52:03    2% Preparing for Installation
20:52:03    4% Preparing for Installation
20:52:07    6% target-discovery completed.
20:52:07    Selected Disk(s) : c8t0d0
20:52:07    10% target-selection completed.
20:52:07    12% ai-configuration completed.
20:52:07    14% var-share-dataset completed.
20:52:30    16% Beginning IPS transfer
20:52:30    Creating IPS image
20:52:34     Startup: Retrieving catalog 'solaris' ... Done
20:52:36     Startup: Caching catalogs ... Done
20:52:37     Startup: Refreshing catalog 'site' ... Done
20:52:37     Startup: Refreshing catalog 'solaris' ... Done
20:52:40     Startup: Caching catalogs ... Done
20:52:40    Installing packages from:
20:52:40        solaris
20:52:40            origin:  http://192.168.10.5:8000/
20:52:40        site
20:52:40            origin:  http://192.168.10.5:8001/
20:52:41     Startup: Refreshing catalog 'site' ... Done
20:52:41     Startup: Refreshing catalog 'solaris' ... Done
20:52:44    Planning: Solver setup ... Done
20:52:45    Planning: Running solver ... Done
20:52:45    Planning: Finding local manifests ... Done
20:52:45    Planning: Fetching manifests:   0/408  0% complete
20:52:53    Planning: Fetching manifests: 100/408  24% complete
[…]
20:53:11    Planning: Fetching manifests: 408/408  100% complete
20:53:22    Planning: Package planning ... Done
20:53:23    Planning: Merging actions ... Done
20:53:26    Planning: Checking for conflicting actions ... Done
20:53:28    Planning: Consolidating action changes ... Done
20:53:30    Planning: Evaluating mediators ... Done
20:53:33    Planning: Planning completed in 52.04 seconds
20:53:33    Please review the licenses for the following packages post-install:
20:53:33      runtime/java/jre-7                       (automatically accepted)
20:53:33      consolidation/osnet/osnet-incorporation  (automatically accepted,
20:53:33                                                not displayed)
20:53:33    Package licenses may be viewed using the command:
20:53:33      pkg info --license <pkg_fmri>
20:53:34    Download:     0/60319 items    0.0/822.8MB  0% complete
[…]
21:00:44    Download: 60010/60319 items  822.0/822.8MB  99% complete (650k/s)
21:00:45    Download: Completed 822.79 MB in 431.69 seconds (1.9M/s)
21:01:00     Actions:     1/85295 actions (Installing new actions)
21:01:01    16% Transferring contents
21:01:01    19% Transferring contents
21:01:05     Actions: 13914/85295 actions (Installing new actions)
21:01:06    45% Transferring contents
21:01:10     Actions: 18060/85295 actions (Installing new actions)
21:01:15     Actions: 18534/85295 actions (Installing new actions)
[…]
21:09:55     Actions: 83977/85295 actions (Installing new actions)
21:10:00     Actions: 84781/85295 actions (Installing new actions)
21:10:01     Actions: Completed 85295 actions in 540.82 seconds.
21:10:01    Finalize: Updating package state database ...  Done
21:10:03    Finalize: Updating image state ...  Done
21:10:15    Finalize: Creating fast lookup database ...  Done
21:10:25    Version mismatch:
21:10:25    Installer build version: pkg://solaris/entire@0.5.11,5.11-0.175.1.0.0.24.2:20120919T190135Z
21:10:25    Target build version: pkg://solaris/entire@0.5.11,5.11-0.175.1.1.0.4.0:20121106T001344Z
21:10:25    46% initialize-smf completed.
21:10:27    Setting console boot device property to ttya
21:10:27    Disabling boot loader graphical splash
21:10:27    Installing boot loader to devices: ['/dev/rdsk/c8t0d0s1']
21:10:32    Setting boot devices in firmware
21:10:32    54% boot-configuration completed.
21:10:32    55% update-dump-adm completed.
21:10:32    57% setup-swap completed.
21:10:32    58% device-config completed.
21:10:33    60% apply-sysconfig completed.
21:10:33    61% transfer-zpool-cache completed.
21:10:51    90% boot-archive completed.
21:10:51    92% transfer-ai-files completed.
21:10:52    99% create-snapshot completed.
21:10:52    Automated Installation succeeded.
21:10:52    System will be rebooted now
Automated Installation finished successfully
Auto reboot enabled. The system will be rebooted now
Log files will be available in /var/log/install/ after reboot
Jan 30 21:10:56 i386-01 reboot: initiated by root
WARNING: Fast reboot is not supported on this platform since some BIOS routines are in RAM
syncing file systems... done
rebooting...

 

No more excuses now: you can set up an AI server to deploy Sparc servers as well as i386 servers.

 
