"Due to our new storage virtualization capability, we’ve realized an 80% space savings over tape. Backups now finish 16 times faster. Restore times have improved as well, up to six times faster on average."
— Sacramento County Dept. of Human Assistance
Why Data Deduplication?
Data deduplication is popular in disk storage today because it reduces the amount of disk space needed to store data. The average UNIX® or Windows® enterprise disk volume contains thousands or even millions of duplicate data objects. As these objects are modified, distributed, backed up, and archived, the duplicates are stored again and again. The result is inefficient use of storage resources, which deduplication helps to prevent.
How Much Space Does Deduplication Actually Save?
Deduplication vendors often claim that their products offer 20:1, 50:1, or even greater data reduction ratios. These claims actually refer to the "time-based" space savings effect of deduplication on repetitive data backups. Since data backups contain largely unchanged data, once the first full backup has been stored, all subsequent full backups will see a very high occurrence of deduplication.
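To see how repeated full backups drive such high ratios, the arithmetic can be sketched as follows. This is a simplified model, not any vendor's method: it assumes every full backup has the same logical size and that a fixed fraction of the data (the hypothetical `change_rate` parameter) is new in each backup after the first. The function name and the sample figures are illustrative only.

```python
def time_based_ratio(full_backup_gb: float, num_backups: int, change_rate: float) -> float:
    """Return logical data written divided by physical data stored.

    Simplifying assumptions (not from any specific product):
    - every full backup has the same logical size
    - after the first backup, only `change_rate` of each backup is new data
    """
    if full_backup_gb <= 0:
        raise ValueError("full_backup_gb must be positive")
    if num_backups < 1:
        raise ValueError("num_backups must be at least 1")
    if not 0.0 <= change_rate <= 1.0:
        raise ValueError("change_rate must be between 0 and 1")

    # Logical data: what the backup application sends over time.
    logical_gb = full_backup_gb * num_backups
    # Physical data: the first full backup plus the new data in each later one.
    physical_gb = full_backup_gb + (num_backups - 1) * full_backup_gb * change_rate
    return logical_gb / physical_gb


if __name__ == "__main__":
    # Hypothetical figures: a 1,000 GB full backup retained 30 times, 2% change each time.
    ratio = time_based_ratio(1000, 30, 0.02)
    print(f"{ratio:.1f}:1")
```

Under these assumptions the ratio grows with the number of retained backups, which is why a single backup (`num_backups=1`) yields only 1:1 in this model while long retention periods approach the ratios vendors quote.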
In non-backup data environments, such as file archival or infrequently accessed unstructured data, the rules of time-based data reduction ratios do not apply. In these environments, volumes do not receive a steady supply of redundant full backups, but may still contain a large amount of resident duplicate data objects. The ability to reduce the space requirements in these volumes through deduplication is measured in "spatial" terms. In other words, if a 500GB data archival volume can be reduced to 300GB through deduplication, the spatial reduction is 40%.
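The spatial calculation in the example above can be written out as a short sketch. The function names are illustrative, and the second helper simply converts a vendor-style ratio such as 20:1 into the equivalent percentage so the two kinds of figures can be compared.

```python
def spatial_reduction(before_gb: float, after_gb: float) -> float:
    """Fraction of space saved when a volume shrinks from before_gb to after_gb."""
    if before_gb <= 0:
        raise ValueError("before_gb must be positive")
    if not 0 <= after_gb <= before_gb:
        raise ValueError("after_gb must be between 0 and before_gb")
    return (before_gb - after_gb) / before_gb


def ratio_to_reduction(ratio: float) -> float:
    """Convert an N:1 reduction ratio into the equivalent fraction of space saved."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return 1 - 1 / ratio


if __name__ == "__main__":
    # The 500 GB archival volume reduced to 300 GB from the text.
    print(f"{spatial_reduction(500, 300):.0%}")  # prints 40%
    # A 20:1 backup ratio expressed as a percentage, for comparison.
    print(f"{ratio_to_reduction(20):.0%}")       # prints 95%
```

Expressed this way, a 40% spatial reduction corresponds to a ratio of roughly 1.7:1, which shows why the two measures should not be compared directly.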
Space savings through deduplication can provide substantial cost benefits through reduced storage capacity requirements. As with any new technology, however, there are many approaches and techniques being implemented to accomplish data deduplication. Because of this, users should look beyond the actual deduplication of data and also consider design factors such as data reliability and performance overhead.
About Champion
Champion Solutions Group is an Independent Solutions Provider that has specialized for more than thirty years in solutions and services that reduce costs, increase productivity, and improve application availability. Our solution offerings include Virtualization, Data Management, Business Continuity, Maintenance Consulting and Managed Services. Headquartered in Boca Raton, Florida, Champion has offices in the following regions: Florida, Georgia, the Carolinas, DC/Virginia/Maryland, Delaware Valley (Philadelphia / New Jersey), Connecticut, Massachusetts, Ohio and western Pennsylvania.