I recently had an interesting conversation with Eric Toczek at Storage Made Easy on our company scrum call about ‘cloud-locked data’ that I thought was worthy of a short blog post.
The main focus of our call was figuring out how to help a company move its data from one cloud storage provider to another, and the difficulties therein.
In this particular case the difficulties boiled down to:
(i) The amount of data: Around 5 petabytes of stored data required moving.
(ii) Getting the data out of the service: This was a B2B SaaS share-and-sync product that imposed per-day caps on data movement and, since the API was the only way to get the data out, also applied API throttling and ‘back-off’ responses. This meant getting the data out was anything but trivial (see the retry sketch after this list).
(iii) Permissions on data: The permissions on the data were proprietary to the service and did not come out with the data ‘on the way out’.
(iv) Old data: How much of the 5PB was old, stale data no longer required? The company did not know, so there was a chicken-and-egg situation around the effort required to discover which data was no longer useful and could be deleted rather than moved.
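To give a feel for item (ii), here is a minimal sketch in Python, using the requests library, of the kind of throttle-aware retry loop a bulk export has to wrap around every API call. The endpoint URL and token are placeholders; the SaaS product’s real API will of course differ.

```python
import time
import requests

API = "https://api.example-saas.com/v1/files"  # placeholder endpoint
TOKEN = "..."  # placeholder credential

def fetch_with_backoff(url, max_retries=8):
    """GET a URL, honouring 429 throttling with exponential back-off."""
    delay = 1.0
    for attempt in range(max_retries):
        resp = requests.get(url, headers={"Authorization": f"Bearer {TOKEN}"})
        if resp.status_code == 429:
            # Respect the server's Retry-After hint if it sends one,
            # otherwise back off exponentially.
            wait = float(resp.headers.get("Retry-After", delay))
            time.sleep(wait)
            delay = min(delay * 2, 60.0)
            continue
        resp.raise_for_status()
        return resp
    raise RuntimeError(f"still throttled after {max_retries} retries: {url}")
```

Multiply that per-request delay, plus a per-day data cap, across millions of files and it becomes clear why moving petabytes out through an API alone can take months.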
In our experience this type of situation is not unusual. In this age of digital transformation, companies should be just as aware of how they can get their data ‘out’ of a cloud storage service as of how they get their data ‘in’. I’ve seen many online arguments about how it is impractical on Cloud to move large amounts of data around once committed. I ask: what is the alternative? Locked in forever? Increased charges because you can’t get out? Feature envy because the service you chose is lagging behind? Poor service because you can’t move? It makes no sense to be locked in because there was no detailed plan for future data portability.
The example I referred to is a commercial Sync and Share solution wrapped with storage, but what if it was just a pure storage service such as Amazon S3, Azure Blob Storage, or Google Cloud Storage? Surely things there would be much more open, yes? Actually, not necessarily. In our experience, third-party products that act as an interface into these clouds very often proprietize the data, meaning the stored data cannot be accessed without using the vendor-supplied application.
Why is this a problem? The Amazon / Azure / Google ecosystems have an ever-increasing plethora of services that enable companies to extract maximum value from their data assets, from analytics at one end of the spectrum to AI and deep learning at the other. If the data is proprietized then companies instantly lose access to these ecosystems because the stored data is in a format that cannot be understood. Some vendors may provide a connector of some sort to get the data out and back into its native format, but they often charge for this, and the process can become more convoluted, onerous and costly the larger the data set becomes.
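One way to spot proprietization early is to look at what actually lands in the bucket. Below is a minimal sketch, in Python with the standard boto3 S3 client, of how one might spot-check objects written through a gateway product; the bucket name is hypothetical.

```python
import boto3

# Hypothetical bucket populated via a third-party gateway product.
BUCKET = "corp-gateway-backed-bucket"

s3 = boto3.client("s3")

# List a handful of objects and inspect what the gateway actually stored.
resp = s3.list_objects_v2(Bucket=BUCKET, MaxKeys=10)
for obj in resp.get("Contents", []):
    head = s3.head_object(Bucket=BUCKET, Key=obj["Key"])
    print(obj["Key"], head.get("ContentType"), head["ContentLength"])
    # Warning signs of proprietized data: opaque hashed key names,
    # fixed-size encrypted chunks, or a vendor-specific content type
    # instead of your original file names and formats.
```

If the listing shows your files with their native names and formats, services such as analytics and AI tooling can consume them directly; if it shows opaque chunks, you are back to depending on the vendor’s connector.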
So what are the key takeaways or recommendations with regard to cloud-stored data:
1. Always ask and investigate the right questions on the way in, particularly with regard to how data can be taken out of the service, if required, at a future point. If necessary, ask for some contractual commitment, but note that this is unlikely with a B2B SaaS service.
2. Check whether the vendor being chosen stores the data in a proprietary format, particularly if the data is going to be stored on one of the big three Cloud vendors, where there is access to other tools and apps to gain more value from the data.
3. Consider a multi-cloud strategy. There are varying service and cost models that can accommodate large archives of data at fairly low cost, providing future choice (see the lifecycle sketch after this list).
4. Consider lock-in: your data should not be locked solely to a particular vendor. Before procuring a service, think about what you would have to do if you required direct access to the data. Solutions like the File Fabric allow the best of both worlds, as companies do not have to migrate old data to leverage new features: they can add new storage providers and migrate old ones over time whilst still leveraging the same technology. The Global File System enables folder shares on the old and new storage providers to be transparent to the end user, with direct access to the data still possible from both old and new services.
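To make recommendation 3 concrete, here is a minimal sketch, again in Python with boto3, of an S3 lifecycle rule that moves ageing data to cheaper storage tiers. The bucket name and day thresholds are hypothetical and should be tuned to your own retention needs.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and thresholds; adjust to your retention policy.
s3.put_bucket_lifecycle_configuration(
    Bucket="corp-archive-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-stale-data",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "Transitions": [
                    # Move to infrequent-access storage after 90 days...
                    {"Days": 90, "StorageClass": "STANDARD_IA"},
                    # ...and to deep archive after a year.
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    },
)
```

A rule like this keeps rarely touched data cheap to hold while it remains in an open, natively accessible format, which is exactly the future choice recommendation 3 is about.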