Review: Cohesity Data Platform

Product Review: Cohesity Storage Architecture – A Hyperconverged Storage Platform

The Cohesity data management system aims to help enterprises wrangle chaotic data growth and complex storage environments by hyperconverging secondary storage onto an infinitely scalable, intelligent data platform.

The core platform includes global deduplication and compression across all nodes (both in-line and post-process), replication optimized for multi-site protection, real-time indexing, and cloud integration.

Cohesity has built a web-scale, distributed, multi-layered framework for secondary storage. It consists of physical and software layers that work together to support the application layer, which in turn delivers the platform's storage functions.

[Figure: Cohesity Secondary Storage Workflow]

The Cohesity website describes its product as “physically a shared-nothing distributed architecture.” The design runs on low-cost, high-performance commodity hardware, with either three or four nodes in each system. Each node has its own compute and storage resources, and the nodes are linked by a dual 10GbE network; software ties them together so that they operate as a single, coherent system.

The foundation of Cohesity’s platform is its Open Architecture for Scalable Intelligent Storage (OASIS) filesystem, which consolidates multiple data storage workloads onto a single platform. One of the most notable elements of the system is SnapTree. SnapTree allows frequent, near-instant snapshots of the data while keeping the data fully hydrated, which supports stringent RPO/RTO goals. While many systems limit snapshot frequency and cap the number of snapshots that can be kept, SnapTree lets Cohesity users take as many snapshots as desired. Another key part is a true global deduplication capability, which ensures that the same deduplicated block is not written twice across the nodes.
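
To make the "same block is not written twice" idea concrete, here is a minimal Python sketch of a content-addressed block store. It is an illustration only, not Cohesity's implementation: the `DedupStore` class, the fixed block size, and the use of SHA-256 fingerprints are all assumptions made for this example.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store (hypothetical): identical blocks are kept once."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}          # fingerprint -> block bytes, stored once
        self.physical_writes = 0  # how many blocks were actually written

    def write(self, data):
        """Split data into fixed-size blocks and return their fingerprints."""
        recipe = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            fingerprint = hashlib.sha256(block).hexdigest()
            if fingerprint not in self.blocks:
                # First time this content is seen anywhere: store it.
                self.blocks[fingerprint] = block
                self.physical_writes += 1
            # Duplicate content only adds a reference to the existing block.
            recipe.append(fingerprint)
        return recipe

    def read(self, recipe):
        """Rebuild the original data from its list of fingerprints."""
        return b"".join(self.blocks[fp] for fp in recipe)


store = DedupStore(block_size=4)
first = store.write(b"AAAABBBBAAAA")   # three logical blocks, two unique
second = store.write(b"BBBBCCCC")      # "BBBB" is already stored
print(store.physical_writes)
```

In this toy run, five logical blocks are written but only three are physically stored, because the repeated `AAAA` and `BBBB` blocks are recognized by their fingerprints. A global scheme would share the fingerprint index across all nodes rather than keeping it in one process.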

OASIS is built from several distinct components, each assigned a single, specific role, so that they can all operate simultaneously. These components enable seamless scaling as nodes are added or changed, and the design also supports high availability of all hardware and software parts. The system uses hardware resources, including compute and different tiers of storage (SSDs, HDDs, cloud), to manage multiple transactions and quality-of-service levels for the different workloads that co-exist on the system.

To maintain data integrity, Cohesity has created software that handles how data is written to the system. Cohesity calls it TOWS, or Tier-Optimized Write Scheme. Spinning disks prefer sequential I/O, so data on that tier is written out-of-place; an SSD doesn’t mind random I/O, so incoming writes are placed in the “correct” location straight away.
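
The difference between the two write paths can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the actual TOWS code: the `TieredWriter` class, the append-only list standing in for sequential disk writes, and the `tier` argument are all hypothetical names chosen for illustration.

```python
class TieredWriter:
    """Toy model (hypothetical) of in-place SSD writes vs. out-of-place HDD writes."""

    def __init__(self):
        self.ssd = {}        # logical address -> data, overwritten in place
        self.hdd_log = []    # append-only log, models sequential disk writes
        self.hdd_index = {}  # logical address -> position of newest log record

    def write(self, address, data, tier):
        if tier == "ssd":
            # Random I/O is cheap on flash: update the target location directly.
            self.ssd[address] = data
        elif tier == "hdd":
            # Out-of-place: append sequentially, then repoint the index.
            self.hdd_index[address] = len(self.hdd_log)
            self.hdd_log.append((address, data))
        else:
            raise ValueError(f"unknown tier: {tier!r}")

    def read(self, address):
        if address in self.ssd:
            return self.ssd[address]
        return self.hdd_log[self.hdd_index[address]][1]


writer = TieredWriter()
writer.write(7, b"v1", tier="hdd")
writer.write(3, b"x", tier="hdd")
writer.write(7, b"v2", tier="hdd")    # appended; the old record stays in the log
writer.write(42, b"a", tier="ssd")
writer.write(42, b"b", tier="ssd")    # overwritten in place
print(writer.read(7), len(writer.hdd_log), len(writer.ssd))
```

Note that the HDD log grows with every write, including overwrites, so a real out-of-place scheme also needs some way to reclaim stale records. That part is omitted here.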

The OASIS filesystem exposes its capabilities through a set of interfaces that together constitute the service layer. This service layer is what makes the filesystem usable across many different storage workflows, and it supports storage protocols such as NFS and SMB.

It also enables replication between different clusters to support disaster recovery and data availability. It has built-in search and a MapReduce framework to support instant search and file content analytics. If the standard analytics workloads aren’t enough, custom code can be injected via a Java interface to run, for example, a customized search for Social Security numbers (SSNs).
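
As a rough idea of what such a custom job looks like, the sketch below expresses an SSN search as a map step and a reduce step in plain Python. It does not use Cohesity's Java interface, which is not shown in this review; the function names, the in-memory `files` dictionary, and the hyphenated `NNN-NN-NNNN` pattern are assumptions for illustration.

```python
import re
from collections import defaultdict

# Assumption: only hyphenated SSNs of the form 123-45-6789 are matched.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def map_file(path, text):
    """Map step: emit (path, (line number, match)) for every SSN-like string."""
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in SSN_PATTERN.finditer(line):
            yield path, (line_number, match.group())


def reduce_hits(pairs):
    """Reduce step: group the emitted hits by file path."""
    grouped = defaultdict(list)
    for path, hit in pairs:
        grouped[path].append(hit)
    return dict(grouped)


def find_ssns(files):
    """Run the map step over every file, then reduce the combined output."""
    pairs = []
    for path, text in files.items():
        pairs.extend(map_file(path, text))
    return reduce_hits(pairs)


# Made-up sample data; the SSN is a placeholder value.
files = {
    "hr/notes.txt": "Employee 123-45-6789 joined.\nNo ID here.",
    "eng/readme.txt": "Build 2024-01-15 passed.",
}
results = find_ssns(files)
print(results)
```

Because each file is mapped independently, the map step is the part a distributed framework could spread across nodes, with only the much smaller list of hits passed on to the reduce step.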

The application layer supports all the storage workflows which currently include comprehensive data protection, Test/Dev, File Shares, Analytics and Cloud integration. The application layer provides cloning, scheduling, policy management, backup software, application adapters, integration with LDAP/AD, data archival, pre-built analytics apps, as well as a powerful architecture for creating custom analytic applications.

Cohesity CS2000 Series

Cohesity sells its software bundled on Intel-based 2U CS2000 appliances, typically as a four-node cluster of hybrid storage (three-node cluster options exist). The CS2000 series is a standards-based hardware platform built on high-end commodity components. The system is designed to integrate into existing environments and to incorporate data storage systems that are already in place.

The discounted price for a four-node box comes to somewhere between $80,000 and $100,000. According to founder Mohit Aron, other backup storage and software with a similar capacity would cost at least $200,000.

In an interview, Mohit Aron noted that they “are building the infrastructure and the platform that can deploy some native applications to solve these customer use cases. In the future, we want to expand and have third parties write software on our platform.” Cohesity does not seek a monopoly on its own system; it wants to form partnerships to further develop and unlock the system's potential.

[Figure: Cohesity Tech Spec C2300 C2500]

Cohesity serves as an alternative to other enterprise systems like ClearSky Data, which serves customers by tiering data between data storage systems, cloud integration and PPS. Other options include all-flash systems like Pure Storage, which aim to let companies build and maintain their own data storage management center as they need.

Though Cohesity manages all of the information in a centralized hub, the replication and movement of data within the system is designed to protect backed-up data and support disaster recovery. Even if one part of the system crashes, the whole is not at risk.

With a style reminiscent of the Google File System, which Aron himself helped develop, and with the visionary behind Nutanix steering the ship at Cohesity, it is certainly an option to consider and to watch as it evolves.

For those in the Chicago area, Cohesity will be giving a presentation at Wildfire on April 5. Visit their event page for more information and to register.

Lindsey Cobb

Lindsey Cobb, a Georgia native and former history major, is a technology researcher who is fascinated by the past and future of technology. When she is not engrossed in the prophecy of science fiction stories, Lindsey is likely to be planning her next adventurous trip or petting every dog she meets.