Solaris Boot Environment And Lost Changes

Maybe you have been perplexed, like me, when a file goes missing after a new boot environment (BE) is activated.

In my example I know I created a file in my /root/Desktop folder. After a pkg update, the resulting new BE was automatically activated, and when I rebooted I noticed my file was missing.

This can be quite nasty if you have made system changes and not noticed that they were all lost, for example updates to /etc/hosts.

Faulty sequence:

After OS initial installation and before updates (pkg update)
Save a text file /root/Desktop/20181206-pkg-update
Current BE is openindiana

Do updates and reboot
Current BE is openindiana-1
File is gone

Working sequence:

Save a text file /root/Desktop/20181206-pkg-update
beadm create new BE and activate
reboot immediately
pkg update

Question:

Is there an easier way to update that avoids this problem, maybe by passing parameters?
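For what it is worth, pkg does take BE-related parameters. The sketch below updates into a BE with a name of my choosing without activating it, so at least the switch happens when I decide; I have not verified that this closes the whole window, since the clone is still taken at update time (the BE name is hypothetical).

# pkg update --be-name openindiana-20181206 --no-be-activate
# beadm activate openindiana-20181206
# reboot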

Read here also….
https://blog.teodeh.com/2013/02/25/solaris_11_patching_and_updates/

Tip: if you just need to restore a file, you can use beadm mount to mount the old BE and retrieve a specific file, or compare it with existing files.
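A minimal sketch, assuming the old BE is still named openindiana and /a is a free mount point:

# beadm mount openindiana /a
# cp /a/root/Desktop/20181206-pkg-update /root/Desktop/
# beadm umount openindiana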

Solaris DLMP Test VLAN

As usual use at own risk!

This may not apply to many people, but I have an environment where we use DLMP, which I really like. One weakness, however: if someone misconfigures a VLAN on a switch port, you can have serious networking issues without understanding what is happening. DLMP will not disable a port in an aggregation just because the VLANs are wrong; as long as there is link, it has no way of knowing whether the VLANs are set correctly. So I wrote a quick script, which I may improve later, that already helps speed up testing. Time is something you may not have much of when you are chasing intermittent and very odd network behavior.

And of course you will need to take the ports you want to test out of the aggregation first, and add them back afterwards if needed; a sketch follows.
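Pulling a port out of the aggregation and putting it back might look like this, using the link names from my example:

# dladm remove-aggr -l net3 xgaggr1
# python check_vlans.py net3
# dladm add-aggr -l net3 xgaggr1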

Let me know if anyone finds a better way, for example a DLMP built-in test, or testing at a lower level in the networking stack.

# cat check_vlans.py 
#!/usr/bin/python
import subprocess,re,sys

vlans={}
vlans[1915] = {'IP': '10.2.12.230/24', 'name': 'DR'}
vlans[1953] = {'IP': '10.2.13.230/24', 'name': 'PPE'}
vlans[1954] = {'IP': '10.2.16.230/23', 'name': 'TST'}
vlans[1912] = {'IP': '10.2.14.230/23', 'name': 'DEV'}
vlans[1913] = {'IP': '10.2.10.230/24', 'name': 'MGMT'}

def get_aggr_nets():
  ## Report which ports are currently in the aggregation
  ls_lines = subprocess.Popen(['dladm', 'show-link','xgaggr1'], stdout=subprocess.PIPE).communicate()[0].splitlines()
  line = re.sub(' +',' ',ls_lines[1])   ## [1] skips the header row; collapse column whitespace
  props = line.split(' ')
  print "LINK: {} CLASS: {} MTU: {} STATE: {} NETS: ".format(props[0],props[1],props[2],props[3]),
  print props[4:]

def vlan_test(nets):
  for net in nets:
      ## Create a scratch VLAN link on the port under test; the initial vid
      ## is arbitrary since modify-vlan retags it for every test below
      subprocess.Popen(['dladm', 'create-vlan','-l',str(net),'-v','1912','vlan1'], stdout=subprocess.PIPE).communicate()

      for i,v in vlans.iteritems() :
        print "Testing interface: {} over vlan id: {} {} using IP: {} Result: ".format(net,i,v['name'],v['IP']),

        ## Retag the scratch link and plumb the test address for this VLAN
        subprocess.Popen(['dladm', 'modify-vlan','-v',str(i),'vlan1'], stdout=subprocess.PIPE).communicate()
        subprocess.Popen(['ipadm', 'create-ip','vlan1'], stdout=subprocess.PIPE).communicate()
        subprocess.Popen(['ipadm', 'create-addr','-T','static','-a',v['IP'],'vlan1/v4'], stdout=subprocess.PIPE).communicate()

        ## Assume the gateway is host .1 on each 10.2.x.0 subnet
        subnet=v['IP'].split('.')
        gateway='10.2.' + subnet[2] + '.1'

        ## A quick ping to the gateway shows whether this VLAN is trunked on the port
        result = subprocess.Popen(['ping', '-i','vlan1',gateway,'57','1'], stdout=subprocess.PIPE).communicate()[0].splitlines()
        print result

        subprocess.Popen(['ipadm', 'delete-ip','vlan1'], stdout=subprocess.PIPE).communicate()

      ## Clean up the scratch VLAN link before the next port
      subprocess.Popen(['dladm', 'delete-vlan','vlan1'], stdout=subprocess.PIPE).communicate()

print "\n\nShow nets in xgaggr1:"
get_aggr_nets()
print "\n\nTesting net: " + sys.argv[1]
test_nets = [sys.argv[1]]
print test_nets
vlan_test(test_nets)

Example run.

root@usli-psvm-ld01 # python check_vlans.py net3


Show nets in xgaggr1:
LINK: xgaggr1 CLASS: aggr MTU: 1500 STATE: up NETS:  ['net1']


Testing net: net3
['net3']
Testing interface: net3 over vlan id: 1912 DEV using IP: 10.2.14.230/23 Result:  ['no answer from 10.2.14.1']
Testing interface: net3 over vlan id: 1953 PPE using IP: 10.2.13.230/24 Result:  ['no answer from 10.2.13.1']
Testing interface: net3 over vlan id: 1954 TST using IP: 10.2.16.230/23 Result:  ['no answer from 10.2.16.1']
Testing interface: net3 over vlan id: 1915 DR using IP: 10.2.12.230/24 Result:  ['no answer from 10.2.12.1']
Testing interface: net3 over vlan id: 1913 MGMT using IP: 10.2.10.230/24 Result:  ['no answer from 10.2.10.1']

root@usli-psvm-ld01 # python check_vlans.py net0


Show nets in xgaggr1:
LINK: xgaggr1 CLASS: aggr MTU: 1500 STATE: up NETS:  ['net1']


Testing net: net0
['net0']
Testing interface: net0 over vlan id: 1912 DEV using IP: 10.2.14.230/23 Result:  ['10.2.14.1 is alive']
Testing interface: net0 over vlan id: 1953 PPE using IP: 10.2.13.230/24 Result:  ['10.2.13.1 is alive']
Testing interface: net0 over vlan id: 1954 TST using IP: 10.2.16.230/23 Result:  ['10.2.16.1 is alive']
Testing interface: net0 over vlan id: 1915 DR using IP: 10.2.12.230/24 Result:  ['10.2.12.1 is alive']
Testing interface: net0 over vlan id: 1913 MGMT using IP: 10.2.10.230/24 Result:  ['10.2.10.1 is alive']

Solaris lp printer queue job ids

If you have a Unix print queue name that is long, your job IDs may be cut off in listings, so you end up trying to troubleshoot or cancel jobs and getting "not-found" messages.

lpstat output. Note all job IDs are cut off:

printer company_check_M402n now printing company_check_M402n-19101. enabled since Wed Dec 28 05:54:55 2016. available.
[..]
company_check_M402n-191 ebsuser_a         1165   Dec 27 15:36

The correct job IDs can be shown with a short script. The script below simply prints the first line of each spool request file, which holds the full job ID:

~/scripts# python check_spool.py 
Listing LP spool job id's
company_check_M402n-19104

# cat check_spool.py 
from os import listdir,path
from os.path import isfile, join
print "Listing LP spool job id's"
spoolpath='/var/spool/lp/requests/localhost/'
## Each request file's first line holds the full job id
onlyfiles = [f for f in listdir(spoolpath) if isfile(join(spoolpath, f))]
for f in onlyfiles:
  fname = path.abspath(spoolpath + f)
  with open(fname) as spoolfile:
    print spoolfile.readline().strip()
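With the full job ID recovered, cancelling the stuck job should work again, for example:

# cancel company_check_M402n-19104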

Solaris SFTP Containment Multiple Nodes

Previous post explaining SFTP containment: http://blog.ls-al.com/sftp-containment-solaris-10/

That solution does not work in a clustered environment. Since then I have also played with loopback (lofs in Solaris) mounts onto an NFS folder. That works too, but it had issues when placed in the vfstab at boot time.

Below is my final solution:
– Create an NFS share, INTERFACES, so it can be shared across multiple nodes.
– Since I am trying to limit the number of mounts, I used autofs. If that does not work at boot time, we can fall back to a permanent /etc/vfstab mount.
– In our case the application uses a logical path, so we need a soft link from the application tree to our containment area: svcxfr -> /opt/interfaces/svcxfr/ (creation sketched below).
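Creating the soft link in the application tree, using the paths from this setup:

# ln -s /opt/interfaces/svcxfr /apps/ebs11i/appltop/xxnp/11.5.0/interfaces/svcxfr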

Make direct automount
# grep direct /etc/auto_master
/- auto_direct -ro
# cat /etc/auto_direct
/opt/interfaces -rw,vers=3 10.2.13.35:/export/INTERFACES

# svcadm refresh autofs
# svcadm restart autofs

Ensure the sshd Match block points at the correct folder
# tail -10 /etc/ssh/sshd_config
Match User svcxfr
ChrootDirectory /opt/interfaces/svcxfr
AllowTCPForwarding no
X11Forwarding no
ForceCommand internal-sftp -u 017 -l info

Folders and permissions
# cd /opt
# ls -l | grep interfaces
drwxr-xr-x 3 root root 3 Dec 21 14:12 interfaces
# cd interfaces/
# ls -l | grep svcxfr
drwxr-xr-x 3 root root 3 Dec 21 14:13 svcxfr
# cd svcxfr/
# ls -l | grep uploads
drwxrwxr-x 2 ebsppe_a ebsppe 3 Dec 21 14:50 uploads

Check soft link
# cd /apps/ebs11i/appltop/xxnp/11.5.0/interfaces
# ls -l | grep interfaces
lrwxrwxrwx 1 root root 26 Dec 21 14:14 svcxfr -> /opt/interfaces/svcxfr/

Test client
$ sftp svcxfr@server1
Password:
Connected to server1.
sftp> dir
uploads
sftp> cd uploads
sftp> put zfsrest_test1.py
Uploading zfsrest_test1.py to /uploads/zfsrest_test1.py
zfsrest_test1.py 100% 1934 1.9KB/s 00:00
sftp> exit

You can check for sftp issues in the auth log.

For example, sftp containment does not work if root does not own the top-level directories.
# tail -f /var/log/authlog
Dec 21 14:49:48 server1 sshd[12790]: [ID 800047 auth.info] Accepted keyboard-interactive for svcxfr from 192.168.38.104 port 39788 ssh2
Dec 21 14:49:49 server1 sshd[12790]: [ID 800047 auth.info] subsystem request for sftp
Dec 21 14:50:04 server1 sshd[12790]: [ID 800047 auth.info] Received disconnect from 192.168.38.104: 11: disconnected by user

SFTP Containment Solaris 10

Using the SSH match directive it is possible to contain a user to an isolated folder.

This article shows how to get this done on Solaris 10. Of course using a more up-to-date version of Solaris is preferable, but in this case Solaris 10 is required for the application workload.

Your mileage may vary, and you could probably simplify this slightly. In our case the /apps tree can't be owned by root, and we have several apps nodes, so we did it this way so that all apps nodes see the uploaded files.

To contain end users to an isolated folder, the following must be true.

1. An SSH version new enough to allow "Match" configs. Solaris 10 needs patching to get a new enough sshd.

2. In our case SFTP containment to a path under our /apps tree is not possible, since the top levels need to be root-owned.

3. To accommodate the above, we create /opt/svcaccxfr and then lofs/bind mount /opt/svcaccxfr -> /apps/ebs11i/appltop/xxnp/11.5.0/interfaces/svcaccxfr (see the mount sketch after this list).

4. Ensure the permissions are correct under the svcaccxfr folder. The uploads folder needs the correct user and group, and mode 775. In our case this was set from a DB node, which mounts the whole /apps folder as NFSv3. When /apps is NFSv4, like we use on the apps nodes, you may have issues setting the permissions.

5. We also needed to set an exception in our clone process to flag /apps/ebs11i/appltop/xxnp/11.5.0/interfaces/svcaccxfr as root:root. Our clone process was setting the whole /apps tree recursively to the apps user and group, and root ownership is a requirement for the SFTP Match setup.
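For a one-off test before committing to vfstab, the lofs mount from step 3 can be done by hand; this mirrors the vfstab entry shown further down:

# mount -F lofs /apps/ebs11i/appltop/xxnp/11.5.0/interfaces/svcaccxfr /opt/svcaccxfr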

# ssh -V
Sun_SSH_1.1.7, SSH protocols 1.5/2.0, OpenSSL 0x1000113f

# grep svcaccxfr /etc/passwd 
svcaccxfr:x:403:340:Accounting xfr sftp account:/opt/svcaccxfr:/bin/false

# tail -10 /etc/ssh/sshd_config
Match User svcaccxfr
  #ChrootDirectory /apps/ebs11i/appltop/xxnp/11.5.0/interfaces/svcaccxfr
  ChrootDirectory /opt/svcaccxfr
  AllowTCPForwarding no
  X11Forwarding no
  ForceCommand internal-sftp -u 017 -l info

# ls -l /apps/ebs11i/appltop/xxnp/11.5.0/interfaces/svcaccxfr
total 3
drwxrwxr-x   2 ebsppe_a ebsppe         4 Oct 11 14:14 uploads

# ls -l /apps/ebs11i/appltop/xxnp/11.5.0/interfaces/ | grep svcaccxfr
drwxr-xr-x   3 root     root           3 Oct 11 12:38 svcaccxfr

# grep svcacc /etc/vfstab
## Special lofs/bind mount for SFTP containment svcaccxfr
/apps/ebs11i/appltop/xxnp/11.5.0/interfaces/svcaccxfr - /opt/svcaccxfr  lofs    -       yes      -

# ls -l /opt | grep svcaccxfr
drwxr-xr-x   3 root     root           3 Oct 11 12:38 svcaccxfr

# ls -l /opt/svcaccxfr
total 3
drwxrwxr-x   2 ebsppe_a ebsppe         4 Oct 11 14:14 uploads

Solaris Find Process Id tied to IP Address

Recently I needed to find out who was connecting to an Oracle database, and at the same time I wanted to see the load the specific connection adds to the CPU. In short, I needed the IP address and port tied to a Unix PID.

I wrote this quick and dirty python script.

#!/usr/bin/python
import subprocess

## No doubt you would want to exclude some non-local or expected IP addresses.
## The \*.\* entry filters out netstat's listening sockets.
excludeIPs="10.2.16.86|10.2.16.62|10.2.16.83|\*.\*"

## Remote address.port of every connection involving port 1521
p = subprocess.Popen("/usr/bin/netstat -an | grep 1521 | awk '{print $2}' | egrep -v '" + excludeIPs + "'", stdout=subprocess.PIPE, shell=True)
nonlocals = p.communicate()[0].splitlines()

if nonlocals:
  ## One pfiles pass over all PIDs; parsing the single blob is much faster
  ## than running pfiles once per connection
  p = subprocess.Popen("pfiles `ls /proc` 2>/dev/null", stdout=subprocess.PIPE, shell=True)
  pfiles = p.communicate()[0]

  for line in nonlocals:
    line = line.strip()
    (IP,port) = line.rsplit('.',1)
    print ("Going to find PID for connection with IP %s and port %s" % (IP,port) )

    ## pfiles header lines start in column one and carry the PID;
    ## indented lines carry the fd details, including "port: NNN"
    for pline in pfiles.splitlines():
      if pline[:1].strip() != '':
        pid = pline
      if "port: " + port in pline:
        print pid

I plan to enhance this script a little, but for now it did exactly what I needed.
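Once the script gives you a PID, watching the CPU load that connection adds is straightforward with prstat (the PID below is hypothetical):

# prstat -p 12345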

Check Logfiles Only a Few Minutes Back

This is an update post. Previously I had a post here: http://blog.ls-al.com/check-logfiles-for-recent-entries-only/

The code has been problematic around the start of a new year because the log entries lack a year. I updated the code a little to account for the year ticking over. I may still need to come up with a better way, but the below seems to work OK.

#!/usr/bin/python
#

#: Script Name  : checkLogs.py
#: Version      : 0.0.1.1
#: Description  : Check messages for last x minutes.  Used in conjunction with checkLogs.sh and a cron schedule

from datetime import datetime, timedelta

#suppressPhrases = ['ssd','offline']
suppressPhrases = []

#now = datetime(2015,3,17,7,28,00)						## Get time right now. ie cron job execution
now = datetime.now()
day_of_year = datetime.now().timetuple().tm_yday   		## Used for special case when year ticks over. Older log entries should be one year older.

## How long back to check. Making it 11 mins because cron runs every 10 mins
checkBack = 11

lines = []

#print "log entries newer than " + now.strftime('%b %d %H:%M:%S') + " minus " + str(checkBack) + " minutes"

with open('/var/adm/messages', 'r') as f:
    for line in f:
      myDate = str(now.year) + " " + line[:15]          ## Solaris syslog format like this: Mar 11 12:47:23 so need to add year

      if day_of_year >= 1 and day_of_year <= 31:        ## Log has no year, so special-case January: entries from
        if "Jan" not in myDate:                         ## other months (e.g. "2015 Dec 30") must get last year
          myDate = str(now.year - 1) + " " + line[:15]

      if myDate[9] == " ":                              ## "Mar  1" has a double space vs "Mar 15"; zero-pad the day
        myDate = myDate[:9] + "0" + myDate[10:]         ## (position 9 after the "YYYY " prefix) so %d parses cleanly

      #print "myDate: %s and now: %s" % (myDate,now)
      lt = datetime.strptime(myDate,'%Y %b %d %H:%M:%S')
      diff = now - lt
      if diff.days <= 0:
        if lt > now - timedelta(minutes=checkBack):
          #print myDate + " --- diff: " + str(diff)
          match = False
          for s in suppressPhrases:
            i = line.find(s)
            if i > -1:
              match = True
          if not match:
            lines.append(line)

if lines:
    message = '\n'.join(lines)
    print message										    # do some grepping for my specific errors here.. send message per mail...
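For completeness, the cron side the header refers to runs every 10 minutes; the wrapper path here is hypothetical:

0,10,20,30,40,50 * * * * /root/scripts/checkLogs.sh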

Solaris Multipath Incorrect Totals

From time to time we notice that some LUNs are not optimal. It could be because of off-lining a LUN or changes on the switches; I am not sure exactly why it happens. If multipathing is not showing the correct path counts, you may need to run cfgadm.

See how some LUNs here are showing only 4 paths. We expect 8.

# mpathadm list lu
        /dev/rdsk/c0t5000CCA04385ED60d0s2
                Total Path Count: 1
                Operational Path Count: 1
[..]
        /dev/rdsk/c0t600144F09D7311B500005605A40C0006d0s2
                Total Path Count: 8
                Operational Path Count: 8
        /dev/rdsk/c0t600144F09D7311B50000561ED8AB0007d0s2
                Total Path Count: 8
                Operational Path Count: 8
[..]
        /dev/rdsk/c0t600144F09D7311B50000538507080021d0s2
                Total Path Count: 8
                Operational Path Count: 8
        /dev/rdsk/c0t600144F09D7311B50000534309C40011d0s2
                Total Path Count: 4
                Operational Path Count: 4
        /dev/rdsk/c0t600144F09D7311B500005342FE86000Fd0s2
                Total Path Count: 4
                Operational Path Count: 4
        /dev/rdsk/c0t600144F09D7311B5000053D13E130029d0s2
                Total Path Count: 4
                Operational Path Count: 4
        /dev/rdsk/c0t600144F09D7311B50000566AE1CC0008d0s2
                Total Path Count: 4
                Operational Path Count: 4

I tried a few things, and it looks like cfgadm is what worked. It could also have been a couple of other things that triggered it, like destroying an unused LUN or changing a recently added LUN's target group to be more restrictive, but I doubt it. Most likely it was cfgadm.

# cfgadm -o show_SCSI_LUN -al
Ap_Id                          Type         Receptacle   Occupant     Condition
c1                             fc           connected    unconfigured unknown
c8                             fc-fabric    connected    configured   unknown
c8::20520002ac000f02,254       ESI          connected    configured   unknown
c8::21000024ff3db11d,0         disk         connected    configured   unknown
c8::21000024ff3db11d,1         disk         connected    configured   unknown
[..]
c8::21000024ff57d646,54        disk         connected    configured   unknown
c8::21000024ff57d646,56        disk         connected    configured   unknown
c8::21520002ac000f02,254       ESI          connected    configured   unknown
c8::22520002ac000f02,254       ESI          connected    configured   unknown
c8::23520002ac000f02,254       ESI          connected    configured   unknown
c9                             fc           connected    unconfigured unknown
c13                            fc-fabric    connected    configured   unknown
c13::20510002ac000f02,254      ESI          connected    configured   unknown
c13::21000024ff3db11c,0        disk         connected    configured   unknown
[..]
c13::21000024ff3db11c,44       disk         connected    configured   unknown
c13::21000024ff3db11c,46       disk         connected    configured   unknown
c13::21000024ff3db11c,48       disk         connected    configured   unknown
c13::21000024ff3db11c,50       disk         connected    unconfigured unknown
c13::21000024ff3db11c,52       disk         connected    unconfigured unknown
c13::21000024ff3db11c,54       disk         connected    unconfigured unknown
c13::21000024ff3db11c,56       disk         connected    configured   unknown
c13::21000024ff3db1b4,0        disk         connected    configured   unknown
c13::21000024ff3db1b4,1        disk         connected    configured   unknown
[..]
c13::21510002ac000f02,254      ESI          connected    configured   unknown
c13::22510002ac000f02,254      ESI          connected    configured   unknown
c13::23510002ac000f02,254      ESI          connected    configured   unknown

After I ran cfgadm..

# mpathadm list lu
        /dev/rdsk/c0t5000CCA04385ED60d0s2
                Total Path Count: 1
                Operational Path Count: 1
[..]
        /dev/rdsk/c0t600144F09D7311B500005605A40C0006d0s2
                Total Path Count: 8
                Operational Path Count: 8
        /dev/rdsk/c0t600144F09D7311B50000561ED8AB0007d0s2
                Total Path Count: 8
                Operational Path Count: 8
[..]
        /dev/rdsk/c0t600144F09D7311B5000053BE90620024d0s2
                Total Path Count: 8
                Operational Path Count: 8
        /dev/rdsk/c0t600144F09D7311B50000533C012A0009d0s2
                Total Path Count: 8
                Operational Path Count: 8
        /dev/rdsk/c0t600144F09D7311B50000533AAFF00007d0s2
                Total Path Count: 8
                Operational Path Count: 8
        /dev/rdsk/c0t600144F09D7311B50000538507080021d0s2
                Total Path Count: 8
                Operational Path Count: 8
        /dev/rdsk/c0t600144F09D7311B50000534309C40011d0s2
                Total Path Count: 8
                Operational Path Count: 8
        /dev/rdsk/c0t600144F09D7311B500005342FE86000Fd0s2
                Total Path Count: 8
                Operational Path Count: 8
        /dev/rdsk/c0t600144F09D7311B5000053D13E130029d0s2
                Total Path Count: 8
                Operational Path Count: 8
        /dev/rdsk/c0t600144F09D7311B50000566AE1CC0008d0s2
                Total Path Count: 8
                Operational Path Count: 8

I can also see the changes in messages…

# dmesg
Nov  9 03:59:21 solaris11 mac: [ID 469746 kern.info] NOTICE: ldoms-vsw0.vport15 registered
Nov  9 05:41:55 solaris11 scsi: [ID 583861 kern.info] ssd98 at scsi_vhci0: unit-address g600144f09d7311b50000534309c40011: f_tpgs
[..]
Dec 14 09:38:01 solaris11 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g600144f09d7311b50000566ae1cc0008 (ssd101) multipath status: optimal: path 343 fp16/ssd@w21000024ff3db1b5,38 is standby
Dec 14 09:38:01 solaris11 last message repeated 1 time
Dec 14 09:38:01 solaris11 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g600144f09d7311b5000053d13e130029 (ssd100) multipath status: optimal: path 344 fp16/ssd@w21000024ff3db1b5,36 is standby
Dec 14 09:38:01 solaris11 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g600144f09d7311b500005342fe86000f (ssd99) multipath status: optimal: path 345 fp16/ssd@w21000024ff3db1b5,34 is standby
Dec 14 09:38:01 solaris11 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g600144f09d7311b50000534309c40011 (ssd98) multipath status: optimal: path 346 fp16/ssd@w21000024ff3db1b5,32 is standby
Dec 14 09:38:07 solaris11 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g600144f09d7311b50000566ae1cc0008 (ssd101) multipath status: optimal: path 347 fp16/ssd@w21000024ff3db11d,38 is standby
Dec 14 09:38:07 solaris11 last message repeated 1 time
Dec 14 09:38:07 solaris11 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g600144f09d7311b5000053d13e130029 (ssd100) multipath status: optimal: path 348 fp16/ssd@w21000024ff3db11d,36 is standby
Dec 14 09:38:08 solaris11 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g600144f09d7311b500005342fe86000f (ssd99) multipath status: optimal: path 349 fp16/ssd@w21000024ff3db11d,34 is standby
Dec 14 09:38:08 solaris11 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g600144f09d7311b50000534309c40011 (ssd98) multipath status: optimal: path 350 fp16/ssd@w21000024ff3db11d,32 is standby
Dec 14 09:38:16 solaris11 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g600144f09d7311b50000566ae1cc0008 (ssd101) multipath status: optimal: path 351 fp21/ssd@w21000024ff3db1b4,38 is standby
Dec 14 09:38:16 solaris11 last message repeated 1 time
Dec 14 09:38:17 solaris11 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g600144f09d7311b5000053d13e130029 (ssd100) multipath status: optimal: path 352 fp21/ssd@w21000024ff3db1b4,36 is standby
Dec 14 09:38:17 solaris11 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g600144f09d7311b500005342fe86000f (ssd99) multipath status: optimal: path 353 fp21/ssd@w21000024ff3db1b4,34 is standby
Dec 14 09:38:17 solaris11 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g600144f09d7311b50000534309c40011 (ssd98) multipath status: optimal: path 354 fp21/ssd@w21000024ff3db1b4,32 is standby
Dec 14 09:38:22 solaris11 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g600144f09d7311b50000566ae1cc0008 (ssd101) multipath status: optimal: path 355 fp21/ssd@w21000024ff3db11c,38 is standby
Dec 14 09:38:22 solaris11 last message repeated 1 time
Dec 14 09:38:22 solaris11 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g600144f09d7311b5000053d13e130029 (ssd100) multipath status: optimal: path 356 fp21/ssd@w21000024ff3db11c,36 is standby
Dec 14 09:38:23 solaris11 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g600144f09d7311b500005342fe86000f (ssd99) multipath status: optimal: path 357 fp21/ssd@w21000024ff3db11c,34 is standby
Dec 14 09:38:23 solaris11 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g600144f09d7311b50000534309c40011 (ssd98) multipath status: optimal: path 358 fp21/ssd@w21000024ff3db11c,32 is standby

Solaris Snoop on File Access

If you find yourself trying to figure out where your operating system is spending time on reads and writes, try this little DTrace gem. The script is here: http://dtracebook.com/index.php/File_System:rwsnoop
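If you do not have the book script handy, a much cruder per-process read/write count can be had with a DTrace one-liner like this:

# dtrace -n 'syscall::read:entry,syscall::write:entry { @[execname] = count(); }'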

I ran it like below. "unknown" is socket access, and filtering out sshd and grep explains itself.

# ./rwsnoop.dtrace | egrep -v "sshd|grep|unknown"
  UID    PID CMD          D   BYTES FILE
    0    637 utmpd        R       4 /var/adm/wtmpx
  324   2884 java         W      77 /scratch/agtst1ML/MemoryMonitorLog.log
  324   2884 java         W      77 /scratch/agtst1ML/MemoryMonitorLog.log
  324   2884 java         W      77 /scratch/agtst1ML/MemoryMonitorLog.log
  324   2884 java         W      16 /devices/pseudo/poll@0:poll
  324   2884 java         W       8 /devices/pseudo/poll@0:poll
    1    593 nfsmapid     R      78 /etc/resolv.conf
    1    593 nfsmapid     R       0 /etc/resolv.conf
  324   2884 java         W      77 /scratch/agtst1ML/MemoryMonitorLog.log
    0      1 init         R    1006 /etc/inittab
    0      1 init         R       0 /etc/inittab
    0      1 init         W     412 /etc/svc/volatile/init-next.state
    0      1 init         W     412 /etc/svc/volatile/init-next.state
    0      1 init         R    1006 /etc/inittab
    0      1 init         R       0 /etc/inittab
    1    180 kcfd         R     976 /usr/lib/security/pkcs11_kernel.so.1

Icinga2 on Solaris 11

I typically prefer Nagios for network monitoring. Nagios itself is tricky to get going on Solaris, and I have had a long-running issue with it there. Despite the issue with Nagios Core Workers timing out, I still use Nagios on Solaris. If Linux is an option in a particular environment, it would be preferable to use a packaged Nagios from any of the popular distributions.

Having said that, I recently tried Icinga2 on Solaris, and here are some notes on getting it running. Consider yourself warned: this is not pretty. But it runs surprisingly well, and is lightweight compared to Nagios, at least on Solaris SPARC.

There are a few hoops to jump through before compiling icinga2, such as the compile environment; boost is particularly nasty. I will document that separately at some point in the future. As I said, you are on your own to get the compile environment and prerequisites in order.

Stage and compile:

# pwd
/usr/src
# wget https://github.com/Icinga/icinga2/archive/v2.3.11.tar.gz
# gzip -d v2.3.11.tar.gz 
# tar xf v2.3.11.tar 
# mv v2.3.11.tar icinga-v2.3.11.tar
# cd icinga2-2.3.11/
# mkdir build && cd build
# cmake -D MYSQL_INCLUDE_DIR=/usr/mysql/5.1/include -D MYSQL_LIB=/usr/mysql/5.1/lib -DICINGA2_WITH_PGSQL=OFF ..

Issue 1 — cmake:

# pwd
/usr/src/icinga2-2.3.11
# diff CMakeLists.txt CMakeLists.txt.orig 
135,136c135,136
<     set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -g -pthread -lm")
<     set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -pthread -DSOLARIS2=11 -D_POSIX_PTHREAD_SEMANTICS")
---
>     set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -g")
>     set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g")
139,140c139,140
<   set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl")
<   set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -Wl")
---
>   set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,--gc-sections")
>   set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -Wl,--gc-sections")

** rerun cmake after fixing CMakeLists.txt

Compile issue 2 — INFINITY:

# cd build/
# gmake VERBOSE=1

[..]
/usr/src/icinga2-2.3.11/lib/base/math-script.cpp:85:18: error: ‘INFINITY’ was not declared in this scope

# diff /usr/src/icinga2-2.3.11/lib/base/math-script.cpp /usr/src/icinga2-2.3.11/lib/base/math-script.cpp.orig
30,31d29
< #define INFINITY (__builtin_huge_val ())
< 

# gmake VERBOSE=1

** takes a very long time

Compile issue 3 — mysql.h:

/usr/src/icinga2-2.3.11/lib/db_ido_mysql/idomysqlconnection.hpp:27:19: fatal error: mysql.h: No such file or directory

# find / -name mysql.h
/usr/mysql/5.1/include/mysql/mysql.h
/usr/mysql/5.5/include/mysql.h

# cmake -D MYSQL_INCLUDE_DIR=/usr/mysql/5.5/include -D MYSQL_LIB=/usr/mysql/5.1/lib -DICINGA2_WITH_PGSQL=OFF ..
-- Could NOT find yajl (missing:  YAJL_LIBRARY YAJL_INCLUDE_DIR) 
running /usr/bin/cmake -E copy_if_different "/usr/src/icinga2-2.3.11/third-party/yajl/src/api/yajl_parse.h" "/usr/src/icinga2-2.3.11/build/third-party/yajl/src/../include/yajl"  2>&1
running /usr/bin/cmake -E copy_if_different "/usr/src/icinga2-2.3.11/third-party/yajl/src/api/yajl_gen.h" "/usr/src/icinga2-2.3.11/build/third-party/yajl/src/../include/yajl"  2>&1
running /usr/bin/cmake -E copy_if_different "/usr/src/icinga2-2.3.11/third-party/yajl/src/api/yajl_common.h" "/usr/src/icinga2-2.3.11/build/third-party/yajl/src/../include/yajl"  2>&1
running /usr/bin/cmake -E copy_if_different "/usr/src/icinga2-2.3.11/third-party/yajl/src/api/yajl_tree.h" "/usr/src/icinga2-2.3.11/build/third-party/yajl/src/../include/yajl"  2>&1
-- MySQL Include dir: /usr/mysql/5.5/include  library dir: /usr/mysql/5.1
-- MySQL client libraries: mysqlclient_r
-- Configuring done
-- Generating done
-- Build files have been written to: /usr/src/icinga2-2.3.11/build

** of course rerun gmake

Install:

# gmake install
[  1%] Built target mmatch
[ 10%] Built target yajl
[..]
-- Installing: /usr/local/share/doc/icinga2/markdown/21-debug.md
-- Installing: /usr/local/share/doc/icinga2/markdown/12-distributed-monitoring-ha.md

Attempt first run:

# export LD_LIBRARY_PATH=/usr/local/lib/:/usr/mysql/5.1/lib/mysql
# echo $LD_LIBRARY_PATH
/usr/local/lib/:/usr/mysql/5.1/lib/mysql
# /usr/local/sbin/icinga2 daemon
[2015-11-10 06:59:17 -0800] information/cli: Icinga application loader (version: r2.3.11-1)
[2015-11-10 06:59:17 -0800] information/cli: Loading application type: icinga/IcingaApplication
[..]
[2015-11-10 06:59:17 -0800] information/ConfigItem: Checked 1 UserGroup(s).
[2015-11-10 06:59:17 -0800] information/ConfigItem: Checked 1 IcingaApplication(s).
[2015-11-10 06:59:17 -0800] information/ConfigItem: Checked 1 ScheduledDowntime(s).
[2015-11-10 06:59:17 -0800] information/ScriptGlobal: Dumping variables to file '/usr/local/var/cache/icinga2/icinga2.vars'
[2015-11-10 06:59:17 -0800] information/DynamicObject: Restoring program state from file '/usr/local/var/lib/icinga2/icinga2.state'
[2015-11-10 06:59:17 -0800] information/DynamicObject: Restored 157 objects. Loaded 5 new objects without state.
[2015-11-10 06:59:17 -0800] information/ConfigItem: Triggering Start signal for config items
[2015-11-10 06:59:17 -0800] information/DbConnection: Resuming IDO connection: ido-mysql
[2015-11-10 06:59:17 -0800] information/ConfigItem: Activated all objects.
[2015-11-10 06:59:17 -0800] information/IdoMysqlConnection: MySQL IDO instance id: 1 (schema version: '1.13.0')

Web front-end:

Definitely read:
https://github.com/Icinga/icingaweb2
https://github.com/Icinga/icingaweb2/blob/master/doc/installation.md

# pwd
/var/apache2/2.2/htdocs
# mv master.zip icinga2_master_old.zip
# wget https://github.com/Icinga/icingaweb2/archive/master.zip

# unzip icinga2_master_20151110.zip
[..]
 inflating: icingaweb2-master/test/php/res/status/icinga.objects.cache  
  inflating: icingaweb2-master/test/php/res/status/icinga.status.dat  

Since I am upgrading in my case, I followed the header "Upgrading Icinga Web 2" in the installation doc.

Version before the upgrade:

# pwd
/var/apache2/2.2/htdocs/icingaweb2
# more VERSION 
v2.0.0-rc1

# mv icingaweb2 icingaweb2.20151110
# mv icingaweb2-master/ icingaweb2

I am not sure whether my upgrade actually needed the schema change (I did not verify), but I ran it just in case.

# pwd
/var/apache2/2.2/htdocs/icingaweb2.new/etc/schema/mysql-upgrades
# mysql -u root -p icinga < 2.0.0beta3-2.0.0rc1.sql 

** In addition: although the v2.0.0-rc1 web front-end is working fine for me, the latest (v2.0.0) is not. I have a problem on the dashboard I have not had time to track down. The error is: The filter column "view" is not allowed here.