SPEAKER: Philip Shilane (EMC, Data Domain)

TITLE: Topics in Deduplicated Backup Storage

ABSTRACT: Any individual or organization with critical data has to protect that data against disasters. Backup storage in particular is growing rapidly because customers want to retain multiple weeks or months of backups: the backup system must ingest the entire primary storage system each week and retain many near-identical versions. Deduplicated storage, which replaces repeated data patterns with references to an earlier copy, is a promising technique for handling data growth, throughput requirements, and storage needs given the special properties of backup storage. The first hour of this talk will provide background on backup storage and describe the basic architecture of a deduplicated storage system. The talk will then become a free-ranging discussion of how design goals interact with architecture decisions. If we optimize deduplication compression, have we hurt our read-back performance? How do we clean up deleted data when references span multiple terabytes or petabytes of storage and there are too many references to maintain in memory? Recent research on deduplication for primary storage will be discussed briefly. Food will be provided, and there will be a raffle item.

BIO: Philip Shilane has worked on deduplicated backup storage since 2006, first at Data Domain and then at EMC. Phil's projects include improving storage compression and performance, WAN optimization, dataset analysis, and synthetic backup generation. Phil works for EMC in Princeton, NJ under the CTO of the Core Technology Division. His research is regularly published in venues such as USENIX FAST and ATC. He is an inventor of 17 US patents with 40+ currently under submission. He graduated from Princeton University in 2008 with a PhD in CS focusing on 3D object retrieval and similarity analysis.
Before that, he completed MS and BS degrees in CS at Stanford University, focusing on artificial intelligence.