We've been building a system for higher-level management of storage assets in the cloud for some time. The Elastacloud Data Catalogue (EDC) manages the state, history and utilisation of cloud storage. Unlike other offerings, the EDC takes a data-oriented view of storage rather than a directory-oriented one. The fundamental difference is that we manage how data is used and the effect that use has on storage, rather than treating storage as a directory of unknown content. The EDC deals with data content, origin, lineage, provenance, quality, existence, utilisation and metrics.
Core to this is a graph database that records the assets under management on the low-level storage medium as vertices, linked by edges that describe how those assets relate to one another. This allows the graph to be traversed to discover related files, measure the distance between assets and establish the provenance of data assets surfaced in BI tooling, the natural visualisation facet of advanced analytics.
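The EDC's internal storage model isn't described here, so what follows is only a minimal sketch, assuming a simple in-memory directed graph in which each vertex is a file path and each edge records that one asset was derived from another. The class `AssetGraph` and its methods `add_lineage`, `related`, `distance` and `provenance` are hypothetical names chosen for illustration, not the EDC API. The sketch shows how "related files", "distance" and "provenance" all fall out of ordinary graph traversal.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AssetGraph:
    """Hypothetical stand-in for a catalogue graph (not the EDC API).

    Vertices are data assets identified by storage path; a directed edge
    source -> derived records that `derived` was produced from `source`.
    """
    feeds: dict[str, set[str]] = field(default_factory=dict)         # source -> derived assets
    derived_from: dict[str, set[str]] = field(default_factory=dict)  # derived -> source assets

    def add_lineage(self, source: str, derived: str) -> None:
        """Record one lineage edge, registering both vertices in both maps."""
        self.feeds.setdefault(source, set()).add(derived)
        self.derived_from.setdefault(derived, set()).add(source)
        self.feeds.setdefault(derived, set())
        self.derived_from.setdefault(source, set())

    def related(self, asset: str) -> set[str]:
        """Assets one hop away, upstream or downstream."""
        return self.feeds.get(asset, set()) | self.derived_from.get(asset, set())

    def distance(self, a: str, b: str) -> Optional[int]:
        """Fewest hops between two assets, ignoring edge direction (breadth-first search)."""
        if a == b:
            return 0
        seen = {a}
        queue = deque([(a, 0)])
        while queue:
            node, hops = queue.popleft()
            for nxt in self.related(node):
                if nxt == b:
                    return hops + 1
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, hops + 1))
        return None  # the two assets are not connected

    def provenance(self, asset: str) -> set[str]:
        """Every upstream asset the given asset was (transitively) derived from."""
        upstream: set[str] = set()
        stack = list(self.derived_from.get(asset, ()))
        while stack:
            node = stack.pop()
            if node not in upstream:
                upstream.add(node)
                stack.extend(self.derived_from.get(node, ()))
        return upstream


# Illustrative usage with made-up paths: two raw files cleaned and combined into a BI report.
g = AssetGraph()
g.add_lineage("raw/sales.csv", "clean/sales.parquet")
g.add_lineage("raw/customers.csv", "clean/customers.parquet")
g.add_lineage("clean/sales.parquet", "bi/revenue_report.pbix")
g.add_lineage("clean/customers.parquet", "bi/revenue_report.pbix")

print(sorted(g.provenance("bi/revenue_report.pbix")))
# ['clean/customers.parquet', 'clean/sales.parquet', 'raw/customers.csv', 'raw/sales.csv']
print(g.distance("raw/sales.csv", "raw/customers.csv"))  # 4
```

Keeping both a forward (`feeds`) and a reverse (`derived_from`) index is one simple way to make provenance queries (walk upstream) as cheap as impact queries (walk downstream); a production graph database would typically provide equivalent traversals natively.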
We achieve this with low-friction meta-management, as shown in this diagram: