If there is one area of IT infrastructure that has been a consistent disaster, it is the storage and management of unstructured data: essentially MS Office files, PDFs, maybe now MP3 recordings of conference calls, and so on. Most firms have gone from having hundreds of Windows file servers to a bunch of big network attached storage (NAS) clusters and/or SharePoint databases. The data held on this infrastructure is growing exponentially, is mostly valueless to the firm, is insecure, and is incredibly expensive to store, back up, and manage... and most firms do it badly. I consistently blame Microsoft for this, but in truth the entire industry carries a lot of the blame.
Anyway, every year I update my belief system about what IT should be doing to improve in this area, or what I think the top five best practices should be.
1. Move most end user unstructured data to SharePoint. Because the metadata is held in a SQL database, you have a much better shot at securing, classifying, and understanding what is in all those MS Office documents. In addition, archival and versioning give you some ability to age out redundant or old data.
a. Modern versions of SharePoint allow you to keep the metadata in SQL but “blob out” the files (i.e. store them in standard file folder structures). This reduces cost, which makes using SharePoint as your prime unstructured data repository feasible.
b. On-premises SharePoint can also be integrated with SkyDrive Pro (essentially SharePoint in Office 365) to really reduce cost.
2. Separate infrastructure data needing high performance/high availability from basic MS Office files. Enterprises tend to spend way too much on NAS because they put user profile data and application data on the same infrastructure as Office documents. If end users can’t get to their profile data, they often can’t log in. The typical answer? High-resiliency, high-availability NAS across two data centres. Bah!
a. Put the profile data into a profile management tool like AppSense, which stores this critical info in SQL.
b. Put application logs and data on smaller, high-performance disk (maybe SSD).
c. And put Office files in less expensive, resilient configurations, whether SharePoint, NAS, or ideally the cloud.
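To make the split in point 2 concrete, here is a rough sketch of the routing decision in Python. It is only an illustration: the tier names, the extension list, and the `pick_tier` function are all made up for this example, and a real estate would classify data with far more than a file extension.

```python
from enum import Enum
from pathlib import PureWindowsPath


class Tier(Enum):
    PROFILE_STORE = "profile management tool backed by SQL"
    FAST_DISK = "small, high-performance disk (maybe SSD)"
    CHEAP_RESILIENT = "less expensive resilient storage (SharePoint, NAS, or cloud)"


# Assumption: these extensions stand in for "application logs and data".
APP_EXTENSIONS = {".log", ".mdf", ".ldf"}


def pick_tier(path: str, is_user_profile: bool = False) -> Tier:
    """Return the storage tier a file would land on under the 2a/2b/2c split."""
    if is_user_profile:
        # 2a: profile data goes to the profile management tool.
        return Tier.PROFILE_STORE
    suffix = PureWindowsPath(path).suffix.lower()
    if suffix in APP_EXTENSIONS:
        # 2b: application logs and data go to fast disk.
        return Tier.FAST_DISK
    # 2c: everything else (Office files, PDFs, recordings) goes to cheap storage.
    return Tier.CHEAP_RESILIENT


print(pick_tier(r"D:\apps\trading\app.log").name)       # FAST_DISK
print(pick_tier(r"\\nas01\finance\Budget.XLSX").name)   # CHEAP_RESILIENT
```

The point of the sketch is simply that the expensive tier should only ever see the small set of data that actually needs it.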
3. Get yourself one of the new breed of unstructured data management tools that provide audit, access controls, usage patterns, etc. Think Varonis or Stealthbits. IT infrastructure teams need to get these tools in place to deal with audit and classification pressures, and once you have them, make sure you deploy the elements of those products that put accountability for managing access controls back with the data owner! IT should not be in the access to data business. We spend inordinate amounts of time and resource granting or taking away access to data we don’t understand or own, via requests from business folk who may or may not have the authority to approve it. STOP. Give the job back to the data owners.
4. Keep fighting the good fight on compliance and data retention. It is not easy to get businesses to recognize that keeping a backup of Windows folders for 7 years is NOT a regulatory retention strategy. But keep trying. Cloud archiving is getting cheaper and expanding from message-type data (email, chat, etc.) to broader unstructured data. We will eventually get business users to understand that they need to tag data, move the data they need to keep to archive locations, and go back to keeping no more than a few months of backup data.
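The end state described in point 4 (tagged data goes to an archive, backups only live for a few months) can be sketched roughly as below. Everything here is an assumption for illustration: the 90-day window is my stand-in for "a few months", and the `Document` class and function names are hypothetical rather than any product's API.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "a few months" of backups is taken to mean 90 days.
BACKUP_WINDOW = timedelta(days=90)


@dataclass
class Document:
    name: str
    must_retain: bool  # tag set by the business owner, not by IT


def retention_home(doc: Document) -> str:
    """Tagged documents are retained in the archive; backups are not retention."""
    return "archive" if doc.must_retain else "primary storage only"


def prune_backups(backup_dates: list[date], today: date) -> list[date]:
    """Keep only the backups that are still inside the backup window."""
    return [d for d in backup_dates if today - d <= BACKUP_WINDOW]


backups = [date(2013, 1, 1), date(2013, 5, 1)]
print(prune_backups(backups, today=date(2013, 6, 1)))
# [datetime.date(2013, 5, 1)]
print(retention_home(Document("board-minutes.docx", must_retain=True)))
# archive
```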
5. Try the cloud. The economics are inarguable. Example: 100 GB on SkyDrive can be had for as little as $100 per year. On most enterprise NAS configurations, that same 100 GB would cost between $1,500 and $3,000 per year.
a. I think all of Microsoft’s offerings in this space are interesting: SkyDrive, SkyDrive Pro, Office 365, Azure.
b. I particularly think the StorSimple on-site appliance, backed with archiving/tiered storage in the Azure cloud, needs to be looked at more closely for unstructured MS Office data.
c. And you cannot beat Amazon Web Services for certain data types.
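To put the price gap from point 5 in numbers, here is a back-of-envelope calculation in Python. The per-100 GB prices are the ones quoted above; the 10 TB estate size and the assumption that cost scales linearly with capacity are mine, purely for illustration.

```python
# Prices per 100 GB per year, as quoted in point 5.
CLOUD_PER_100GB = 100
NAS_LOW_PER_100GB = 1500
NAS_HIGH_PER_100GB = 3000


def annual_cost(gigabytes: float, price_per_100gb: float) -> float:
    """Annual cost in dollars, assuming price scales linearly with capacity."""
    return gigabytes / 100 * price_per_100gb


estate_gb = 10_000  # hypothetical 10 TB of Office files

print(annual_cost(estate_gb, CLOUD_PER_100GB))     # 10000.0
print(annual_cost(estate_gb, NAS_LOW_PER_100GB))   # 150000.0
print(annual_cost(estate_gb, NAS_HIGH_PER_100GB))  # 300000.0
```

On those figures the NAS option comes out at 15 to 30 times the cloud price per gigabyte, before anyone argues about what is or isn't included in either number.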
Well, as usual, I have run out of the energy to type long before I have run out of things to complain about (smiley). But I do think the potential design patterns for dealing with unstructured data are simplifying and coming together, and smarter people than I will make some of this work.