CIFS ACLs on ZFS Problem

I recently had an issue with a CIFS share on a Solaris 11 box. I'm still not sure how it happened, but it turned out to be a strange idmap mapping. The Active Directory group was correct and had the correct members, yet the users in this group still could not write to the folder.

How to check identities in idmap:

# idmap show -cv rrosso@domain.com
winuser:rrosso@domain.com -> uid:2147483651
Source: Cache
Method: Ephemeral

# idmap show -cv DFS_Corp-CA-Dept-IT_rw@domain.com
wingroup:DFS_Corp-CA-Dept-IT_rw@domain.com -> gid:2147483667
Source: Cache
Method: Ephemeral

Let's see what the mapping rules look like:

# idmap list
add     winuser:*@domain.com  unixuser:*
add     wingroup:*@domain.com unixgroup:*
add     winuser:administrator@domain.com      unixuser:root
add     "wingroup:Domain Users@domain.com"    unixgroup:smbusers

The Active Directory read-write group whose members cannot write to the folder:

# idmap show -cv DFS_Eng-CA-Dirs-Engineering-Bugzilla_rw@domain.com
wingroup:DFS_Eng-CA-Dirs-Engineering-Bugzilla_rw@domain.com -> gid:2147484149
Source: Cache
Method: Ephemeral

Looking at the folder called Bugzilla, the broken ACE must be user:2147483813: the group's gid above (2147484149) does not appear anywhere in the ACL, and this entry is mapped to a user rather than a group.

root@zfs001:/tank/dfs/engdirs/engineering/engineering# /bin/ls -v | more
d---------+ 16 2147483650 smbusers      17 Oct 12 14:14 Bugzilla
0:user:2147483813:list_directory/read_data/add_file/write_data
/add_subdirectory/append_data/read_xattr/write_xattr/execute
/read_attributes/write_attributes/delete/read_acl/synchronize
:file_inherit/dir_inherit:allow
1:group:2147483763:list_directory/read_data/add_file/write_data
/add_subdirectory/append_data/read_xattr/write_xattr/execute
/delete_child/read_attributes/write_attributes/delete/read_acl
/synchronize:file_inherit/dir_inherit:allow
2:group:2147483660:list_directory/read_data/read_xattr/execute
/read_attributes/read_acl/synchronize:file_inherit/dir_inherit
:allow

Something looks odd here. On the Windows side we expect three groups to have permissions on this folder, but note the “user” listed in the first ACE.
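With a long ACL it is easy to miss one user entry among the groups. A minimal sketch, assuming the `/bin/ls -v` format shown above (each ACE starts with `index:type:`, continuation lines start with `/` or `:`), that prints one line per ACE so a stray `user` entry stands out. `list_aces` is a hypothetical helper name, not a Solaris command.

```shell
# list_aces (hypothetical helper): read `/bin/ls -v` or `/bin/ls -dv`
# output on stdin and print one line per ACE as "<index> <type> <id>".
# File lines and ACE continuation lines (starting with / or :) are skipped.
list_aces() {
    awk '
    {
        sub(/^[ \t]+/, "")                  # some listings indent the ACEs
        if ($0 !~ /^[0-9]+:/) next          # not the first line of an ACE
        split($0, f, ":")
        if (f[2] == "user" || f[2] == "group")
            print f[1], f[2], f[3]          # e.g. "0 user 2147483813"
        else
            print f[1], f[2], "-"           # owner@, group@, everyone@
    }'
}
```

For example, `/bin/ls -dv Bugzilla | list_aces | awk '$2 == "user"'` would print only the user ACEs on the folder.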

Let's find the three IDs. I left the grep wide open so it matches every UID and GID with that number, but really we are only after the GIDs:

# idmap dump -n | grep 2147483813
wingroup:Guests@BUILTIN ==      gid:2147483813
wingroup:DFS_Eng-CA-Dirs-Engineering-Bugzilla_rw@domain.com   ==      uid:2147483813

# idmap dump -n | grep 2147483763
winuser:Homey@domain.com     ==      uid:2147483763
wingroup:DFS_Eng-CA-Dirs-Engineering_rw@domain.com    ==      gid:2147483763

# idmap dump -n | grep 2147483660
winuser:Stewey@domain.com     ==      uid:2147483660
wingroup:DFS_Eng-CA-Dirs-Engineering_ro@domain.com    ==      gid:2147483660

# idmap dump -n | grep 2147484149
wingroup:DFS_Eng-CA-Dirs-Engineering-Bugzilla_rw@domain.com   ==      gid:2147484149
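The dump shows the Bugzilla_rw group mapped to a GID and, separately, to uid:2147483813. A rough sketch that scans `idmap dump -n` output for any `wingroup:` entry mapped to a `uid:`; the line format is assumed from the output above, and `find_group_uid_maps` is a hypothetical name.

```shell
# find_group_uid_maps (hypothetical helper): read `idmap dump -n` output on
# stdin and report any Windows group that is mapped to a UID, which is what
# the dump above shows for the Bugzilla_rw group.
find_group_uid_maps() {
    awk '
    $1 ~ /^wingroup:/ && $NF ~ /^uid:/ {
        win = $0
        sub(/[ \t]+[=<>]+[ \t].*$/, "", win)   # drop the separator and the Unix side
        print win " -> " $NF
    }'
}
```

On the CIFS server this would be run as `idmap dump -n | find_group_uid_maps`.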


After we removed and recreated the group in AD (it might take a little while to show up):

# idmap show -cv DFS_Eng-CA-Dirs-Engineering-Bugzilla_rw@domain.com
wingroup:DFS_Eng-CA-Dirs-Engineering-Bugzilla_rw@domain.com -> gid:2147484149
Source: Cache
Method: Ephemeral

# idmap dump -n | grep 2147483813
wingroup:Guests@BUILTIN ==      gid:2147483813
usid:S-1-5-21-1977730361-3076317898-4166923938-22371    ==      uid:2147483813

# idmap dump -n | grep 2147484149
wingroup:DFS_Eng-CA-Dirs-Engineering-Bugzilla_rw@domain.com   ==      gid:2147484149

Note that uid:2147483813 now resolves to a bare SID instead of the Bugzilla_rw group name. Permissions after re-applying them from Windows:

# /bin/ls -dv Bugzilla/
d---------+ 17 2147483650 smbusers      18 Nov 12 20:12 Bugzilla/
     0:group:2147483763:list_directory/read_data/add_file/write_data
         /add_subdirectory/append_data/read_xattr/write_xattr/execute
         /delete_child/read_attributes/write_attributes/delete/read_acl
         /synchronize:file_inherit/dir_inherit:allow
     1:group:2147483660:list_directory/read_data/read_xattr/execute
         /read_attributes/read_acl/synchronize:file_inherit/dir_inherit
         :allow
     2:group:2147484149:list_directory/read_data/add_file/write_data
         /add_subdirectory/append_data/read_xattr/write_xattr/execute
         /read_attributes/write_attributes/delete/read_acl/synchronize
         :file_inherit/dir_inherit:allow

For good measure, a quick check of a new folder we just created:

# /bin/ls -v | grep Test
d---------+  2 2147483740 smbusers       2 Nov 12 20:12 Test

Solaris Idmap Problems

When using the kernel-based CIFS server on Solaris 11, we found that the idmap service picks Domain Controllers located across a WAN link, which causes two problems:
A) slow authentication, or even worse,
B) idmap uses a server that disappears when the WAN link goes down, which causes havoc.

After watching the debug logs, I can see that idmap scans the SRV records in DNS to get a list of Domain Controllers in the forest. Even when config/site_name (a poorly documented setting) is set in the SMF properties for idmap, the discovery process still cycles through the whole list of DCs in the forest; if the first one is unreachable, it keeps going until it finds one that is. The order of the SRV records is essentially random, since Active Directory gives every SRV entry the same priority (0) and weight (100). So in our case, idmap's discovery routine uses what is basically a random server from a list of 21 Domain Controllers, no matter where they live, as long as it is reachable over LDAP.
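To see why equal weights make the choice look random, here is a sketch of the generic SRV selection rule from RFC 2782: keep the lowest priority, then pick randomly in proportion to the weight field. This is the standard client algorithm, not idmap's actual code, and whether idmap follows it exactly is an assumption. With every record at priority 0 and weight 100, each of our 21 DCs gets the same 1-in-21 chance. `pick_srv` is a hypothetical name.

```shell
# pick_srv (hypothetical helper): choose one target from
# `dig ... SRV +short` output on stdin (priority weight port target).
# Lowest priority wins; among those, a random pick weighted by "weight".
# Simplified: the RFC 2782 rule of ordering weight-0 records first is ignored.
pick_srv() {
    awk '
    BEGIN { srand() }                     # seeded from the clock (seconds)
    /^;/ || NF < 4 { next }               # skip dig comments and partial lines
    {
        prio[NR] = $1 + 0; weight[NR] = $2 + 0; target[NR] = $4
        if (!seen++ || prio[NR] < min) min = prio[NR]
    }
    END {
        total = 0
        for (i in prio) if (prio[i] == min) total += weight[i]
        r = rand() * total
        sum = 0
        for (i in prio) {
            if (prio[i] != min) continue
            sum += weight[i]
            if (sum >= r) { print target[i]; exit }
        }
    }'
}
```

Because `srand()` uses the time in seconds, two calls within the same second return the same host; run it in a loop with a `sleep 1` to watch the spread.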

If idmap would just use the DCs listed in the site we specify for this CIFS server, the service would be much more stable. It’s possible this is a bug that should be reported to Sun (Oracle); I am not sure.

My workaround:

In my case, I created local firewall rules on the unwanted (remote) Windows Domain Controllers to block the Solaris CIFS server from connecting to them. The idmap logs still show the unsuccessful connection attempts to those servers during discovery, but at least idmap cannot use them. Without the firewall block, idmap would happily attach to any reachable DC, even one in India or Australia.

PS C:\Users\Administrator.DOMAIN> netsh advfirewall firewall add rule name="Block Solaris IDMAPD" dir=In remoteip="172.19.8.62/32,172.19.8.64/32,172.21.8.33/32" Action="Block" protocol="Any" Profile="Domain,Private,Public" enable="no"

Ok.

PS C:\Users\Administrator.DOMAIN> netsh advfirewall firewall show rule name="Block Solaris IDMAPD"

Rule Name:                            Block Solaris IDMAPD
----------------------------------------------------------------------
Enabled:                              No
Direction:                            In
Profiles:                             Domain,Private,Public
Grouping:
LocalIP:                              Any
RemoteIP:                             172.19.8.62/32,172.19.8.64/32,172.21.8.33/32
Protocol:                             Any
Edge traversal:                       No
Action:                               Block
Ok.

PS C:\Users\Administrator.DOMAIN> netsh advfirewall firewall set rule name="Block Solaris IDMAPD" new enable="yes"

Updated 1 rule(s).
Ok.
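With several Solaris hosts, typing the `remoteip` list by hand gets error prone. A hypothetical shell helper that prints the same kind of `netsh` rule for any list of IPs (each as a single-host /32), ready to paste into the DC's prompt; the rule name and options copy the rule above, and the helper itself is only an illustration.

```shell
# netsh_block_rule (hypothetical helper): print a netsh command that adds a
# disabled inbound block rule for the given Solaris CIFS server IPs.
# Usage: netsh_block_rule IP [IP ...]
netsh_block_rule() {
    ips=""
    for ip in "$@"; do
        ips="${ips:+$ips,}$ip/32"        # comma-separated list of /32 hosts
    done
    printf 'netsh advfirewall firewall add rule name="Block Solaris IDMAPD" dir=In remoteip="%s" Action="Block" protocol="Any" Profile="Domain,Private,Public" enable="no"\n' "$ips"
}
```

The rule is created disabled, as above, so it can be reviewed and then turned on with the `set rule ... new enable="yes"` command.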

The log entries look like this:

# pwd
/var/svc/log

# tail -f system-idmap:default.log
LDAP: frdc001.domain.com:389: Can't connect to the LDAP server
frdc001.domain.com: Can't connect to the LDAP server - Operation now in progress
LDAP: dedc002.domain.com:389: Can't connect to the LDAP server
dedc002.domain.com: Can't connect to the LDAP server - Operation now in progress
Using server usdc001.domain.com:3268
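When the log scrolls by quickly, a short summary helps. A sketch, assuming the line format shown above (it may well differ between releases; the `Using server` line only shows up on some of them), that counts failed connections per DC and reports the server finally chosen. `summarize_idmap_log` is a hypothetical name.

```shell
# summarize_idmap_log (hypothetical helper): read idmap SMF log lines on
# stdin and print how often each DC was unreachable, plus the last
# "Using server" entry if the release logs one.
summarize_idmap_log() {
    awk '
    /^LDAP: .*Can.t connect/ {
        host = $2
        sub(/:[0-9]+:$/, "", host)          # "frdc001.domain.com:389:" -> host
        fails[host]++
    }
    /^Using server / { using = $3 }
    END {
        for (h in fails) print "unreachable", h, fails[h]
        if (using != "") print "using", using
    }'
}
```

Usage: `summarize_idmap_log < /var/svc/log/system-idmap:default.log | sort` (awk prints the hosts in no particular order, hence the `sort`).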

** Note:
Unfortunately, the “Using server” log entry is specific to SunOS 5.11 151.0.1.8, which I think translates to Solaris 11 Express. Even with debugging turned on for all, discovery or ldap, I did not get the “Using server” entries on 5.11 11.0.

Check what DNS returns for the forest. In our case, 21 DCs:

# dig _ldap._tcp.domain.com SRV +short
;; Truncated, retrying in TCP mode.
0 100 389 frdc001.domain.com.
<snip>
0 100 389 indc002.domain.com.
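A quick way to confirm that nothing in DNS favors one DC over another is to tally the priority and weight columns of the `dig +short` output (`priority weight port target`). A minimal sketch with a hypothetical `srv_summary` helper; if every record shares a single priority/weight pair, as described above, clients have nothing to prefer one DC by.

```shell
# srv_summary (hypothetical helper): tally SRV records from
# `dig ... SRV +short` output (priority weight port target) by
# priority/weight pair, skipping ";;" comments and partial lines.
srv_summary() {
    awk '
    /^;/ || NF < 4 { next }
    { key = "priority=" $1 " weight=" $2; count[key]++; total++ }
    END {
        for (k in count) print k ": " count[k] " record(s)"
        print "total: " (total + 0)
    }'
}
```

Usage: `dig _ldap._tcp.domain.com SRV +short | srv_summary`.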

Set debugging higher. Experiment with these; debug/all might be too verbose, especially on a production server:

# svccfg -s idmap setprop 'debug/all = integer: 0'
# svccfg -s idmap setprop 'debug/ldap = integer: 1'
# svccfg -s idmap setprop 'debug/discovery = integer: 1'

Refresh the service to load the configuration change:

# svcadm refresh svc:/system/idmap:default
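Since these properties get toggled on and off while troubleshooting, here is a small hypothetical wrapper that sets the same properties as above and refreshes the service. With `DRY_RUN=1` it only prints the commands, which is a cheap way to check them before running anything on the CIFS server.

```shell
# set_idmap_debug (hypothetical helper): apply the debug settings used above
# and refresh idmap so it picks them up. Usage: set_idmap_debug [level]
# With DRY_RUN=1 the commands are printed instead of executed.
set_idmap_debug() {
    level="${1:-1}"
    run=""
    [ "${DRY_RUN:-0}" = 1 ] && run="echo"
    $run svccfg -s idmap setprop "debug/all = integer: 0"
    for prop in ldap discovery; do
        $run svccfg -s idmap setprop "debug/$prop = integer: $level"
    done
    $run svcadm refresh svc:/system/idmap:default
}
```

Running `set_idmap_debug 0` afterwards would set the ldap and discovery levels back to 0 on a working server.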

Set site_name:

# svccfg -s idmap setprop 'config/site_name = astring: US'
# svcadm refresh svc:/system/idmap:default

If the site name is not set, the discovery process complains that no site was found. This does not really change anything, since it goes on to use any DC in the forest anyway, but I would expect discovery to behave better when the site is set.

Check the SRV records for the US site, as configured in Active Directory:

# dig _ldap._tcp.US._sites.domain.com SRV +short
0 100 389 usdc101.domain.com.
<snip>
0 100 389 usdc001.domain.com.

Check the CA site:

# dig _ldap._tcp.CA._sites.domain.com SRV +short
0 100 389 cadc001.domain.com.
0 100 389 cadc002.domain.com.
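Comparing the forest-wide list with a site list gives the DCs outside that site, which are the ones that would get the firewall block from the workaround. A sketch, assuming the `dig ... SRV +short` output was saved to files first; `outside_site` and the file names are hypothetical.

```shell
# outside_site (hypothetical helper): print the DCs that appear in the
# forest-wide SRV list but not in the per-site SRV list.
#   $1 = file with `dig _ldap._tcp.<domain> SRV +short` output
#   $2 = file with `dig _ldap._tcp.<site>._sites.<domain> SRV +short` output
outside_site() {
    awk '
    /^;/ || NF < 4 { next }                       # dig comments, <snip> lines
    FILENAME == ARGV[1] { site[$4] = 1; next }    # first file read: site DCs
    !($4 in site) && !seen[$4]++ { print $4 }     # forest DCs outside the site
    ' "$2" "$1"
}
```

For example: `dig _ldap._tcp.domain.com SRV +short > forest.txt`, `dig _ldap._tcp.US._sites.domain.com SRV +short > us.txt`, then `outside_site forest.txt us.txt`.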

Check that this service is running; it might be required:

# svcs name-service-cache
STATE          STIME    FMRI
online         Jun_04   svc:/system/name-service-cache:default

TODO:

– Check how the Solaris ZFS appliance handles this. It does not appear to suffer the same fate.

Links:

http://docs.oracle.com/cd/E19082-01/819-3194/adsetup-2/index.html