Sun Unified Storage 7410 put into production

Despite several speed bumps, we decided to keep the Sun storage system at PI. Here is what we found.

Our configuration:

  • 7410 Storage Controller with 16GB RAM, 2 x 2.3GHz Quad-Core processors and built-in 4 x 1Gb ethernet ports
  • 22 x 1TB SATA drives, Double parity RAID (14TB formatted)
  • 2 x 18GB Solid State write disks, RAID1
  • 100GB Solid State read disk

This configuration does not include redundant controllers. The system can be made fully redundant by doubling the price and clustering multiple controllers and J4400s.

Exported, we have:

  • /export/home – NFSv3 and CIFS home directories
  • /export/vmware – NFSv3 for VMware (currently 6 VMs)
  • exchangedb iSCSI lun – for Exchange database files
  • exchangelogs iSCSI lun – for Exchange transaction logs

Configuration issues:

  • The Sun tech who installed the system suggested dedicating one of the ethernet interfaces to the administrative BUI, as the analytics tool uses a lot of bandwidth. This turns out to be useful only if you reach the administrative interface from the same network the Sun system is on. Otherwise it sends all traffic out the interface configured for your default route, which is likely tied to your data interface. In our case we would almost always be reaching the admin interface from a different network, making the suggestion a waste of an interface.
  • We wanted to combine IPMP and LACP. This would let us aggregate two interfaces to each of two different network switches, so that if one switch died the 7410 would fail over to the other aggregate. The problem is that the IPMP options in the BUI offer no way to configure failback to a preferred aggregate. As we use iSCSI for email and NFS for VMware on the 7410, we would prefer traffic to stay on a specific switch. Because of this we ended up using LACP to aggregate all four interfaces. This makes the system not resilient to switch stack failure, but significantly reduces the chance of overloading the cross-switch links.
  • We found that whenever we reconfigured the network on the 7410, the system needed a reboot before it was accessible again. This isn’t an issue for us, as network settings are a set-once thing, but it could be a pain for others.

Performance:

  • We did a JetStress test of a 173GB test database on an iSCSI lun from the system. We achieved 750 IOPS. This was while the system was otherwise idle. Our NetApp FAS270 topped out at ~250 IOPS from the SATA shelf and ~400 IOPS from the local FC disk.
  • We ran iozone on the system. I am not very good at interpreting the results or graphing them, but it looks like we saw about 80MB/s average transfer for the test.
  • We see on average 130 IOPS use of the system, which is more than sufficient for us.
  • Performance can still be increased by adding up to 5 more (albeit expensive) 100GB Solid State read disks.
  • Aside from the Microsoft Office bug (see below) we have not heard complaints about performance in two weeks.
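
To put the IOPS figures above side by side, here is a small arithmetic sketch. It uses only the numbers quoted in the list; the variable names are mine, and the "headroom" figure assumes the JetStress result is a fair ceiling for our production workload, which a synthetic benchmark may not be.

```python
# IOPS figures quoted in the list above.
JETSTRESS_7410 = 750   # JetStress on an iSCSI lun, system otherwise idle
FAS270_SATA = 250      # approximate peak, NetApp FAS270 SATA shelf
FAS270_FC = 400        # approximate peak, NetApp FAS270 local FC disk
AVERAGE_LOAD = 130     # average IOPS we see in production

# How many times faster the 7410 tested than each FAS270 tier.
sata_speedup = JETSTRESS_7410 / FAS270_SATA
fc_speedup = JETSTRESS_7410 / FAS270_FC

# Share of the measured peak that our average load uses.
# Assumption: the JetStress peak is representative of our real workload.
utilisation = AVERAGE_LOAD / JETSTRESS_7410

print(f"7410 vs FAS270 SATA shelf: {sata_speedup:.1f}x")
print(f"7410 vs FAS270 FC disk:    {fc_speedup:.1f}x")
print(f"Average load vs measured peak: {utilisation:.0%}")
```

That works out to 3 times the SATA shelf, a little under 2 times the FC disk, and an average load of roughly 17% of the measured peak.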

Backups:

  • We were able to do a full backup of the NFS/CIFS fileshares from the 7410 using NDMP. The fileshares are not browsable via NDMP, so you must tell your NDMP client the full path of what you want to back up.
  • Incremental or differential backups via NDMP are still a mystery. I need to open a support case for this.
  • iSCSI luns cannot yet be backed up via NDMP. We are doing host-level Exchange backups instead. There is no “SnapManager for Exchange”-like tool as NetApp had. Sun claims NDMP backups of iSCSI luns are on the development board for later this year.

Other issues:

  • There is a file locking bug on shares that are exported over both NFS and CIFS. When opening Microsoft Office documents from such a share there is a 15-30 second wait while a file lock is acquired. Sun tech support explained this as the result of having to delve down to the individual SATA disk for the file lock. They are implementing a fix for the next release.
  • We created the fileshares with Reject non-UTF8 filenames enabled. This is the default setting and cannot be changed once the share is created. It caused problems when copying files from the NetApp to the 7410 using linux: the NetApp had some files with latin1-encoded names that would not copy. We were 6 hours into the data move when the issue showed itself. Our workaround was to copy these specific files over CIFS.
  • File permission mapping between CIFS and NFS is just as bad as using NetApp in Mixed mode. This is due to POSIX file permissions being inherently incompatible with ACLs. After a lot of work one can massage the permissions to work properly, but it’s mind-boggling madness. I understand this is not so big an issue with NFSv4, which uses ACLs by default, but we are stuck in the NFSv3 world.
  • Authenticated NFS (e.g. Kerberos) does not exist yet. Apparently this is coming in a future release.
  • User quotas don’t exist. There is a workaround: create a separate share for each user. This is unmanageable in my opinion. It’s a good thing Perimeter’s administration thinks quotas are too draconian to implement.
  • Snapshots are not named in a user-intelligible way. They show up with a unix timestamp (e.g. .auto-1238601600, which is 2009-04-01 16:00 UTC). They are only accessible from the root of the directory structure (e.g. \\solar\home\.zfs\snapshot). The Windows Previous Versions tab does not show up. Thus snapshots are a little less user-friendly, but they are still there and quite usable.
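
In hindsight, the non-UTF8 filename problem above could have been caught before the data move by scanning the old filer from a linux client. The following is a minimal sketch of such a scan, not what we actually ran. The mount point /mnt/netapp/home is a hypothetical example, and the function names are mine.

```python
import os
import sys
from typing import List


def is_utf8(name: bytes) -> bool:
    """Return True if the raw filename bytes decode cleanly as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def find_non_utf8(root: str) -> List[bytes]:
    """Walk root and return raw byte paths whose final name is not valid UTF-8."""
    bad = []
    # Passing a bytes path makes os.walk yield names as raw, undecoded bytes.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            if not is_utf8(name):
                bad.append(os.path.join(dirpath, name))
    return bad


if __name__ == "__main__":
    # Hypothetical mount point of the old filer; pass your own as argv[1].
    root = sys.argv[1] if len(sys.argv) > 1 else "/mnt/netapp/home"
    for path in find_non_utf8(root):
        # Decoding as latin1 never fails, so every offending path is printable.
        print(path.decode("latin-1"))
```

Anything this prints would need to be renamed, or copied some other way (CIFS worked for us), before an NFS copy to a share that rejects non-UTF8 names.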

All in all, I like the system. It is significantly cheaper than the alternatives, which was a major factor in deciding to keep the unit, and it gives as much expandability as the NetApp 3410 would have. The unit comes with free software upgrades, and it was purchased with a 3-year hardware warranty for a very cheap price (considering the cost of the unit).