Sunday, July 14, 2013

Personal Data Storage

This post is a departure from the beaten track as we will discuss Redundant Arrays of Independent Disks, or RAIDs. Having reliable data storage is essential to a personal data storage plan. Many folks may chuckle at the thought of a personal data storage plan, but there are good reasons for having one, especially if you have lost data in the past. If you store large volumes of digital information such as tax records, medical records, vehicle documents, digital music, movies, photo collections, graphics, and libraries of source code or articles, then a data storage plan is essential to reduce the risk of loss. Loss can occur due to disk drive failures, accidental deletions, or hardware controller failures that wipe drives. Online services offer affordable subscription plans to back up your PC or store data in a cloud, but you risk the loss of privacy when using these services. Having local, portable, and reliable data storage is the best approach, and a personal data management plan is the centerpiece of the effort.

Such a plan should be designed around two points. First, there should be a portable, detachable, and reliable independent disk drive system. Second, there should be a backup system. We will focus on the first point in this post.

Figure 1: Completed RAID
After a lot of research, I settled on a barebones Sans Digital TowerRAID TR4UT+(B) model, Figure 1. The device has a maximum capacity of 16 TB, supports up to USB 3.0, and has an option to operate from a controller card, which improves data transfer rates over USB 3.0. Fault tolerance methods of cloning, numerous RAID levels, and JBOD are supported as well. Thus, the unit is well suited to long term, durable use.

Since the device is barebones, I had to find drives that are compatible. Fortunately, the device was compatible with 11 different drives ranging from 500 GB up to 4 TB across three vendors. I had to figure out what characteristics mattered and determine which of the drives were optimal for my needs. The approach I used was a spreadsheet matrix, Figure 2. The illustrated matrix is a shortened form and does not consider the 4 TB drive, as it was cost prohibitive from the start, as were several of the other drives. The 3 TB drive was included to create a spread against the other options. I computed the coefficients of performance, CP, then averaged them for the overall performance. In the end, I selected the Hitachi UltraStar 1 TB in this example and purchased 4 of them. They are high end server drives that are quiet and can sustain high data transfer rates for long periods of time.

Figure 2: Decision Matrix for Drive Selection and Purchase
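
For readers who would rather script the comparison than build a spreadsheet, here is a minimal sketch of the same idea. It is not the formula behind Figure 2: the drive names, the characteristics, and every number below are hypothetical, and defining each CP as a ratio against the best candidate for that characteristic is my assumption for illustration.

```python
# Hypothetical candidates and values, for illustration only.
# These are NOT the figures from the decision matrix in Figure 2.
candidates = {
    "Drive A": {"cost_per_tb": 90.0, "transfer_mb_s": 150.0, "mtbf_mhours": 1.2},
    "Drive B": {"cost_per_tb": 120.0, "transfer_mb_s": 180.0, "mtbf_mhours": 2.0},
    "Drive C": {"cost_per_tb": 75.0, "transfer_mb_s": 120.0, "mtbf_mhours": 1.0},
}

# Characteristics where a smaller number is the better one.
LOWER_IS_BETTER = {"cost_per_tb"}


def overall_scores(drives):
    """Average each drive's per-characteristic CP (assumed: ratio to the best value)."""
    traits = list(next(iter(drives.values())).keys())
    per_drive = {name: [] for name in drives}
    for trait in traits:
        values = [specs[trait] for specs in drives.values()]
        best = min(values) if trait in LOWER_IS_BETTER else max(values)
        for name, specs in drives.items():
            if trait in LOWER_IS_BETTER:
                cp = best / specs[trait]
            else:
                cp = specs[trait] / best
            per_drive[name].append(cp)
    return {name: sum(cps) / len(cps) for name, cps in per_drive.items()}


scores = overall_scores(candidates)
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Each CP lands between 0 and 1, so the average treats every characteristic equally. If one characteristic matters more to you, a weighted average would be the natural next step.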

Figure 3: Installed Drives 
After selection, purchasing, and installation of the drives, Figure 3, RAID 5 was selected for the drive configuration. RAID 5 permitted hot swappable drives should one fail and provided more usable disk space than the other redundant RAID modes: with four 1 TB drives, one drive's worth of capacity holds parity, leaving about 3 TB usable. RAID 5 is a cost effective mode providing good performance and redundancy, although writes are a little slow.
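
The capacity trade-off between RAID levels is simple arithmetic. The sketch below assumes equal-size drives and the textbook definitions of RAID 0, RAID 5, and RAID 10; the function name is my own, and a given enclosure may report slightly different numbers after formatting overhead.

```python
def usable_capacity_tb(level, drives, drive_tb):
    """Approximate usable capacity in TB for equal-size drives (textbook RAID levels)."""
    if level == "RAID0":
        # Striping only: all space is usable, but there is no redundancy.
        return drives * drive_tb
    if level == "RAID5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        # One drive's worth of space is consumed by distributed parity.
        return (drives - 1) * drive_tb
    if level == "RAID10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, 4 or more")
        # Every block is mirrored, so half the raw space is usable.
        return drives * drive_tb / 2
    raise ValueError(f"unsupported level: {level}")


for level in ("RAID0", "RAID5", "RAID10"):
    print(level, usable_capacity_tb(level, drives=4, drive_tb=1.0), "TB usable")
```

With four 1 TB drives, RAID 5 keeps three drives' worth of space while still surviving a single drive failure, which is why it beats mirroring on capacity.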

The final part of the process was to initialize and format the drives. The File Allocation Table file systems, FAT and FAT32, are not viable options as they provide little recovery support. New Technology File System, NTFS, improves reliability and security among other features. There is also the GUID Partition Table, GPT, which is a partitioning scheme rather than a file system: it replaces the older Master Boot Record, MBR, layout and breaks through older limitations such as the roughly 2 TB cap on disk size. Current versions of Mac OS and MS Windows support GPT disks on a read and write level. Therefore, in a forward looking expectation of future movement towards GPT, the RAID was initialized with GPT and then formatted. The formatting process took a long time.
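
One of the older limitations that GPT lifts is the size cap of the Master Boot Record (MBR) scheme, which stores sector addresses in 32 bits. A rough sketch of the arithmetic follows, assuming 512-byte logical sectors (the enclosure may present a different sector size) and the roughly 3 TB of usable space that four 1 TB drives give in RAID 5.

```python
SECTOR_BYTES = 512        # assumed logical sector size
MBR_MAX_SECTORS = 2**32   # MBR stores sector addresses in 32 bits

# Largest disk an MBR partition table can address under that assumption.
mbr_limit_bytes = SECTOR_BYTES * MBR_MAX_SECTORS


def needs_gpt(size_bytes, sector_bytes=SECTOR_BYTES):
    """Return True when a disk is too large for MBR to address in full."""
    return size_bytes > sector_bytes * MBR_MAX_SECTORS


array_bytes = 3 * 10**12  # about 3 TB usable from four 1 TB drives in RAID 5

print(f"MBR limit: {mbr_limit_bytes / 10**12:.2f} TB")
print("3 TB array needs GPT:", needs_gpt(array_bytes))
```

Under these assumptions a single 1 TB drive would still fit within MBR, but the combined array does not, so GPT is the practical choice for a volume of this size.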

In the end, the RAID unit was accessible by both Windows and the MacBook Pro. All the data and personal information on disparate USB drives, memory sticks, and the local machines were consolidated to the RAID device. For the first time, all my music, movies, professional files, and personal data were in one place with the strongest protection. The final cost was less than $650. The cost can be kept down if you shop around for the components on sites such as Amazon. It took about 8 hours of direct effort, although the formatting and file transfers ran while I did other things.

While I will still use my memory sticks and a 1 TB portable USB drive with my notebooks, the RAID is the primary storage device. It can be moved relatively easily if I change locations, or swapped between computers if necessary. The device can also be installed as a serverless network drive and hung off of a wireless router. I prefer not to use it in that manner as the risk of exposure or loss of privacy slightly increases.

Overall, the system is quiet and has a low power drain while in operation, with heightened data protection. I encourage others to rethink how they are storing their data and invest in a solid, reliable solution. As solid state drives come into increasing use, the traditional magnetic platter drives will drop in price dramatically. This will enable more folks to build drive arrays like mine at lower costs, then convert them later to solid state systems as those prices drop.