Despite several speed bumps, we ended up deciding to keep the Sun storage system at PI. Here is what we found.
Our configuration:
- 7410 Storage Controller with 16GB RAM, 2 x 2.3GHz quad-core processors and 4 built-in 1Gb Ethernet ports
- 22 x 1TB SATA drives, Double parity RAID (14TB formatted)
- 2 x 18GB Solid State write disks, RAID1
- 100GB Solid State read disk
This does not provide redundant controllers. The system can be made fully redundant by doubling the price and clustering multiple controllers and J4400 disk shelves.
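If you are wondering how 22 x 1TB turns into 14TB: parity drives, hot spares, and the gap between decimal terabytes (how drives are sold) and binary tebibytes (how storage tools often report space) all eat into the raw figure. Here is a rough Python sketch of that arithmetic for a hypothetical layout of two hot spares plus two 10-drive double-parity groups. The post doesn't say how the 7410 actually laid out the pool, and the function ignores filesystem metadata overhead.

```python
# Hypothetical layout: the vdev arrangement the 7410 actually chose is not
# stated in the post; these numbers only illustrate the arithmetic.
TIB = 2**40  # bytes per tebibyte


def usable_tib(drives: int, drive_tb: float, vdev_width: int,
               parity: int, spares: int) -> float:
    """Approximate usable space of a pool built from parity groups.

    drives     -- total drives in the shelf
    drive_tb   -- marketed drive size in decimal terabytes (10**12 bytes)
    vdev_width -- drives per parity group
    parity     -- parity drives per group (2 for double parity)
    spares     -- drives held back as hot spares
    Ignores filesystem metadata and reservation overhead.
    """
    in_pool = drives - spares
    if in_pool % vdev_width:
        raise ValueError("drives minus spares must divide evenly into groups")
    groups = in_pool // vdev_width
    data_drives = groups * (vdev_width - parity)
    return data_drives * drive_tb * 10**12 / TIB


# 22 x 1TB, two hot spares, two 10-wide double-parity groups (assumed layout)
print(round(usable_tib(22, 1.0, 10, 2, 2), 2))  # 14.55
```

Other plausible layouts land in the same neighbourhood; for example, one spare plus three 7-drive groups gives about 13.64 TiB.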
Exported, we have:
- /export/home – NFSv3 and CIFS home directories
- /export/vmware – NFSv3 for VMware (currently 6 VMs)
- exchangedb iSCSI LUN – for Exchange database files
- exchangelogs iSCSI LUN – for Exchange transaction logs
Configuration issues:
- The Sun tech who installed the system suggested dedicating one of the Ethernet interfaces to the administrative BUI, since the Analytics tool uses a lot of bandwidth. This only helps if you reach the administrative interface from the same network the Sun system is on. Otherwise, the system sends all replies out the interface configured for your default route, which is likely your data interface. In our case we would almost always be reaching the admin interface from a different network, so following their suggestion would have wasted an interface.
- We wanted to combine IPMP and LACP: aggregate two interfaces to each of two different network switches, so that if one switch died the 7410 would fail over to the other aggregate. The problem was that the BUI's IPMP options had no setting to fail back to a preferred aggregate. Because we were serving iSCSI to Exchange (email) and NFS to VMware from the 7410, we preferred to keep traffic on a specific switch. In the end we used LACP to aggregate all four interfaces. This leaves the system vulnerable to a switch stack failure, but significantly reduces the chance of overloading the links between switches.
- We found that whenever we reconfigured the network on the 7410, the system needed a reboot before it was accessible again. This isn’t an issue for us, since network settings are set-once, but it could be a pain for others.
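The reason a dedicated admin interface only helps same-subnet clients comes down to how the outgoing interface is picked: by the most specific route that matches the destination. Here is a minimal Python sketch of that longest-prefix-match lookup. It assumes plain destination-based routing with no source-based or per-interface routes, and the interface names and addresses are made up for illustration.

```python
import ipaddress

# Hypothetical routing table: interface names and addresses are made up.
ROUTES = [
    (ipaddress.ip_network("10.0.1.0/24"), "nge0"),  # data network
    (ipaddress.ip_network("10.0.2.0/24"), "nge1"),  # dedicated admin network
    (ipaddress.ip_network("0.0.0.0/0"), "nge0"),    # default route via the data-side gateway
]


def egress_interface(dest: str) -> str:
    """Pick the outgoing interface for `dest` by longest-prefix match."""
    addr = ipaddress.ip_address(dest)
    # The default route matches everything, so this list is never empty.
    matches = [(net, ifname) for net, ifname in ROUTES if addr in net]
    _, ifname = max(matches, key=lambda m: m[0].prefixlen)
    return ifname


print(egress_interface("10.0.2.50"))     # nge1: admin client on the admin subnet
print(egress_interface("192.168.7.20"))  # nge0: off-subnet admin client follows the default route
```

Even if the request from 192.168.7.20 arrived on the admin interface, the reply leaves through whichever interface carries the default route.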
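To show why the missing failback option mattered, here is a toy Python model of two LACP aggregates in an IPMP-style failover group. It is not how Solaris IPMP is implemented. The aggregate names and the `failback` flag are hypothetical. Without failback, traffic stays on the standby switch even after the preferred switch comes back.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Toy failover group of two aggregates (names and behaviour hypothetical)."""
    preferred: str
    standby: str
    failback: bool
    active: str = ""

    def __post_init__(self) -> None:
        # Start on the preferred aggregate unless told otherwise.
        if not self.active:
            self.active = self.preferred

    def update(self, up: dict[str, bool]) -> str:
        """Re-evaluate which aggregate carries traffic, given link state."""
        if not up[self.active]:
            # Active aggregate failed: move to the other one if it is up.
            other = self.standby if self.active == self.preferred else self.preferred
            if up[other]:
                self.active = other
        elif self.failback and self.active != self.preferred and up[self.preferred]:
            # Preferred aggregate is healthy again and failback is enabled.
            self.active = self.preferred
        return self.active


for failback in (True, False):
    g = Group("aggr1", "aggr2", failback)
    g.update({"aggr1": False, "aggr2": True})                  # switch A dies
    print(failback, g.update({"aggr1": True, "aggr2": True}))  # switch A returns
# True aggr1
# False aggr2
```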
Performance:
- We ran a Jetstress test against a 173GB test database on an iSCSI LUN from the system and achieved 750 IOPS, with the system otherwise idle. By comparison, our NetApp FAS270 topped out at ~250 IOPS from its SATA shelf and ~400 IOPS from its local FC disks.
- We ran iozone on the system. I am not very good at interpreting the results or graphing them, but it looks like we averaged about 80MB/s throughput over the test.
- Our workload averages about 130 IOPS, so the system’s performance is more than sufficient for us.
- Performance can be increased further by adding up to 5 more (albeit expensive) 100GB Solid State read disks.
- Aside from the Microsoft Office bug (see below) we have not heard complaints about performance in two weeks.
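To put the numbers above in perspective, here is some back-of-the-envelope Python. It assumes the iozone run went over a single 1Gb link, which the post doesn't state. It also assumes that comparing a Jetstress peak with an everyday average is meaningful, which is only roughly true because the two workloads differ.

```python
# Raw 1Gb/s line rate in MB/s, before Ethernet/TCP/NFS overhead.
GBIT_LINE_RATE_MB_S = 1e9 / 8 / 1e6  # 125.0


def headroom(measured_peak: float, typical_load: float) -> float:
    """How many times the typical load the measured peak could absorb."""
    return measured_peak / typical_load


print(round(headroom(750, 130), 1))           # 5.8: Jetstress result vs. average load
print(round(headroom(750, 250), 1))           # 3.0: vs. the FAS270 SATA shelf
print(round(80 / GBIT_LINE_RATE_MB_S * 100))  # 64: iozone average as % of one 1Gb link
```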
Backups:
- We were able to do a full backup of the NFS/CIFS file shares from the 7410 using NDMP. The shares are not browsable via NDMP, so you must give your NDMP client the full path of what you want to back up.
- How to do incremental or differential backups via NDMP is still a mystery. I need to open a support case for this.
- iSCSI LUNs cannot yet be backed up via NDMP, so we are doing host-level Exchange backups for them. There is no tool like NetApp’s SnapManager for Exchange. Sun claims NDMP backups of iSCSI LUNs are on the development roadmap for later this year.
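NDMP backups of the "dump" type commonly express incremental and differential runs through classic Unix dump levels. The post doesn't establish whether the 7410’s NDMP server honours them. This Python sketch only shows the level rule. A level-N dump copies everything changed since the most recent earlier dump with a lower level. Repeating level 1 after a level-0 full gives differentials, and climbing levels (1, 2, 3…) gives incrementals.

```python
from typing import List, Optional


def reference_backup(history: List[int], level: int) -> Optional[int]:
    """Index of the earlier backup that a new dump at `level` is relative to.

    `history` lists the levels of previous dumps, oldest first.
    Returns None when no lower level exists, i.e. the dump is a full backup.
    """
    for i in range(len(history) - 1, -1, -1):
        if history[i] < level:
            return i
    return None


print(reference_backup([0, 1, 1], 1))  # 0: differential, relative to the full
print(reference_backup([0, 1, 2], 3))  # 2: incremental, relative to the last dump
print(reference_backup([0, 1], 0))     # None: a new level 0 is a full backup
```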
Other issues:
- There is a file-locking bug on shares exported via both NFS and CIFS. When opening Microsoft Office documents from such a share, there is a 15-30 second wait while a file lock is acquired. Sun tech support explained this as the result of having to go all the way down to the individual SATA disks for the file lock. They are implementing a fix for the next release.
- We created the file shares with “Reject non-UTF8” filenames enabled. This is the default and cannot be changed once the share is created. It caused problems when copying files from the NetApp to the 7410 using Linux: the NetApp had some Latin-1 encoded filenames that would not copy. We were 6 hours into the data move when the issue showed itself. Our workaround was to copy those specific files over CIFS.
- File permission mapping between CIFS and NFS is just as bad as using NetApp in mixed mode. This is because POSIX file permissions are inherently incompatible with ACLs. With a lot of work you can massage the permissions into working properly, but it’s mind-boggling madness. I understand this is less of an issue with NFSv4, which uses ACLs by default, but we are stuck in the NFSv3 world.
- Authenticated NFS (e.g., Kerberos) is not supported yet. Apparently it is coming in a future release.
- User quotas don’t exist. The workaround is to create a separate share for each user, which in my opinion is unmanageable. It’s a good thing Perimeter’s administration considers quotas too draconian to implement.
- Snapshots are not named in a user-friendly way: they show up with a Unix timestamp (e.g., .auto-1238601600). They are only accessible from the root of the directory structure (e.g., \\solar\home\.zfs\snapshot), and the Windows Previous Versions tab does not show up. So snapshots are a little less user-friendly, but they are still there and quite usable.
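Instead of finding Latin-1 filenames six hours into a copy, you could scan the source tree first. Here is a minimal Python pre-flight check that walks a directory as raw bytes, so undecodable names survive, and lists names that aren't valid UTF-8. The mount point in the example is hypothetical. The Latin-1 re-encoder is only correct for names you already know are Latin-1, because decoding as Latin-1 never fails.

```python
import os
from typing import List


def is_utf8(name: bytes) -> bool:
    """True if the raw filename bytes decode cleanly as UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_non_utf8(root: bytes) -> List[bytes]:
    """Walk `root` as bytes and return paths whose final component isn't UTF-8."""
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_utf8(name):
                bad.append(os.path.join(dirpath, name))
    return bad


def latin1_to_utf8(name: bytes) -> bytes:
    """Re-encode a filename that is assumed to be Latin-1 as UTF-8."""
    return name.decode("latin-1").encode("utf-8")


# Hypothetical mount point for the old NetApp export.
for path in find_non_utf8(b"/mnt/netapp/home"):
    print(path)
```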
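Because the automatic snapshot names are just Unix timestamps, they are easy to translate into readable dates. This small Python helper assumes names always follow the exact `.auto-<seconds since epoch>` pattern shown above and rejects anything else.

```python
from datetime import datetime, timezone


def snapshot_time(name: str) -> datetime:
    """Convert an automatic snapshot name like '.auto-1238601600' to UTC time."""
    prefix = ".auto-"
    if not name.startswith(prefix):
        raise ValueError(f"not an automatic snapshot name: {name!r}")
    return datetime.fromtimestamp(int(name[len(prefix):]), tz=timezone.utc)


print(snapshot_time(".auto-1238601600").isoformat())  # 2009-04-01T16:00:00+00:00
```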
All in all, I like the system. It is significantly cheaper than the alternatives, which was a major factor in our decision to keep the unit, and it offers as much expandability as the NetApp 3410 would have. The unit comes with free software upgrades, and we bought a 3-year hardware warranty for it at a very low price (considering the cost of the unit).
tony parsonage | 14-Apr-09 at 12:13 am
Hi,
I don’t suppose you have a step-by-step guide to setting up a CIFS/NFS share using NFSv4? We are having major issues, and I believe an ACL problem is why we can create files in Windows but get “I/O Error” when trying to list them in OpenSolaris.
Any ideas greatly appreciated.
Tony
jjstautt | 14-Apr-09 at 8:23 pm
Unfortunately, I don’t. The version of Ubuntu we use at work wasn’t quite there yet on the NFSv4 front, so we are stuck using NFSv3. A few things I’ve noticed:
1) Getting the idmap rules specified correctly is essential.
2) If we are modifying permissions from Linux, it is best to “su - username” before changing permissions on files owned by username. If we don’t, it seems to mess up username’s access from CIFS.
3) Make sure non-blocking mandatory locking is turned on if you are using both CIFS and NFS on the share.
Serdar Kaya | 17-Apr-09 at 1:51 am
Hello, we are now testing a 7410 with 3 read and 2 write SSDs and 22TB of SATA disk. The problem is that we have not been able to warm up the SSDs yet, so performance is worse than we expected. How about your system? I think it must be better by now.
mark | 27-Apr-09 at 12:10 am
Hi, we looked at one of these and are looking forward to purchasing one.
A few points of note:
1. At present, rebooting the device clears out the SSD cache, which then has to be warmed up again. Apparently this will be resolved in a future update.
2. A good reference for performance is Brendan’s blog; in short, if you don’t need the capacity, RAID1 (mirroring) is highly recommended. http://blogs.sun.com/brendan/
3. The SSDs only store randomly accessed blocks, streaming data is excluded from the L2ARC
4. Don’t forget to check the Fishworks Wiki to see if there are any relevant updates http://wikis.sun.com/display/FishWorks/Fishworks
5. Another good performance tip is to use an 8KB block size, as this makes your data more likely to sit in the L2ARC (it will also probably make it easier for de-duplication, in future updates, to find duplicate blocks).
Cheers
Mark
Charles Soto | 20-Jul-09 at 12:19 pm
We just racked a clustered 7410 configuration. I’m waiting on new 10Gb switches before we go into production, however. The single-head version we tested only had 10 drives, but I was getting similarly high performance (about the maximum I would expect with 1Gb LACP connections all around). This is going to replace our old Dell/EMC CX3-40, which costs almost as much to renew the support contract on as the X7410 cost to buy (well, the single-head version; we decided the extra few $K for the cluster was worth it for availability).
I think you may be incorrect about the need for a second J4400 array. The array is dual-attached, so each controller can see the pools configured on it. Yes, only one controller can currently “own” a pool, so if you want real active/active you need at least two pools, but that can be done on one array. Ideally, yes, you’ll want twice as many spindles to maintain good streaming speed.
jjstautt | 20-Jul-09 at 8:38 pm
You are correct, Charles. You can split the disks in the J4400 into two pools and share them between two controllers. That setup would be much more redundant than what Perimeter bought. I would ask Sun some very hard questions about whether the internals of the J4400 are fully redundant before doing this, though. I haven’t done that research, so I can’t tell you.
Trevayne O'Brien | 31-Aug-09 at 4:27 am
Is anyone having CIFS operation timeouts or dropouts on the Sun 7410?
Tony Parsonage | 02-Mar-10 at 2:33 am
Hi again,
My boss has had a “quick quick” moment and wants this 7410 put in. As a temporary measure I have allowed iSCSI access from Windows servers, but I noticed that the average disk IOPS was approximately 116 operations per second. Is this good? Does anybody have standard settings for iSCSI connections from Windows servers? (This is not a DB server.)
Regards
Tony
jjstautt | 02-Mar-10 at 9:18 am
That sounds low, but it’s been a while since I’ve looked at this. On the 7410, how you set the various properties of the iSCSI share (data compression, checksum, etc.) will affect the speed. You will want at least a 1Gbps link between your Windows hosts and the 7410; a 100Mb/s link would be akin to slapping a 10-year-old IDE drive into your server. Also, check the processor usage on your server: unless you have an iSCSI HBA, all the iSCSI processing will be done on the CPU.
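Here is a quick Python calculation of why the link speed matters. It gives the raw line rate of a link in MB/s and the most I/Os per second that link alone could carry at a given I/O size. The 64 KiB I/O size is an assumption chosen for illustration. With small I/Os, the link is much less likely to be the bottleneck, and protocol overhead lowers both ceilings in practice.

```python
def link_mb_per_s(link_mbit: float) -> float:
    """Raw line rate in megabytes per second (no protocol overhead)."""
    return link_mbit / 8


def max_iops(link_mbit: float, io_bytes: int) -> float:
    """Upper bound on IOPS that the link alone could carry at `io_bytes` per I/O."""
    return link_mb_per_s(link_mbit) * 1e6 / io_bytes


IO_SIZE = 64 * 1024  # hypothetical 64 KiB I/O size

for link in (100, 1000):
    print(link, link_mb_per_s(link), round(max_iops(link, IO_SIZE)))
# 100 12.5 191
# 1000 125.0 1907
```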