We have found that some of the most commonly used benchmarks are flawed, and many research papers do not provide a clear enough picture of file system performance. We believe that a good performance evaluation should use micro-benchmarks to highlight both the good and bad qualities of a file system, as well as general-purpose benchmarks or traces to indicate how it would perform under expected and realistic workloads. Nevertheless, care should be taken to ensure that general-purpose benchmarks accurately reflect real-world workloads. In addition, benchmarks should scale well, and results should be reproducible and comparable across papers.
In this project, we survey file system benchmarks used in many recent research papers. We found that no single benchmark adequately measures file system performance. We show how some commonly accepted and widely used benchmarks and benchmarking techniques can easily conceal overheads, unfairly over-emphasize overheads, or more generally emphasize or de-emphasize many of the file system's properties. We offer suggestions on how to create and conduct benchmarks so that they provide a fairer and more accurate picture of file system performance.
We describe our views on the future of file system benchmarking. To that end, we have been developing several technologies: fine-grained file system tracing, efficient file system replaying, automated file system benchmarking tools, and low-overhead detailed file system behavior visualization tools.
This project now expands into evaluating new multi-dimensional optimization techniques for storage systems, with Big Data data sets as one key scientific case study; we often use large HDF5-based data sets (many gigabytes to a few terabytes).
# | Title | Formats | Published In | Date | Comments |
1 | Performance and Resource Utilization of FUSE User-Space File Systems | PDF BibTeX | ACM Transactions on Storage (TOS) | May 2019 | FUSE Article Online Appendices |
2 | Cluster and Single-Node Analysis of Long-Term Deduplication Patterns | PDF BibTeX | ACM Transactions on Storage (TOS) | May 2018 | |
3 | vNFS: Maximizing NFS Performance with Compounds and Vectorized I/O | PDF BibTeX | ACM Transactions on Storage (TOS) | Sep 2017 | |
4 | Filebench: A Flexible Framework for File System Benchmarking | BibTeX | ;login: The USENIX Magazine | Mar 2016 | |
5 | Unifying Biological Image Formats with HDF5 | BibTeX | Communications of the ACM | Oct 2009 | |
6 | Notes on a Nine Year Study of File System and Storage Benchmarking | BibTeX | Byte and Switch | Jul 2009 | |
7 | A Nine Year Study of File System and Storage Benchmarking | PS PDF BibTeX | ACM Transactions on Storage (TOS) | May 2008 | Online data appendix |
# | Title | Formats | Published In | Date | Comments |
1 | A Practical Auto-Tuning Framework for Storage Systems | PDF BibTeX | Stony Brook U. CS TechReport FSL-19-01 | Jan 2019 | Ph.D. Dissertation Defense |
2 | A Practical, Real-Time Auto-Tuning Framework for Storage Systems | PDF BibTeX | Stony Brook U. CS TechReport FSL-18-01 | Apr 2018 | Ph.D. Dissertation Proposal |
3 | Parametric Optimization of Storage Systems | PDF BibTeX | Stony Brook U. CS TechReport FSL-16-01 | Jan 2016 | Ph.D. Research Proficiency Exam (RPE) |
4 | Design and Implementation of an Open-Source Deduplication Platform for Research | PDF BibTeX | Stony Brook U. CS TechReport FSL-15-03 | Dec 2015 | Ph.D. Research Proficiency Exam (RPE) |
5 | A Context Aware Block Layer: The Case for Block Layer Deduplication | PDF BibTeX | Stony Brook U. CS TechReport FSL-12-04 | May 2012 | M.S. Thesis |
6 | A Nine Year Study of File System and Storage Benchmarking | PS PDF BibTeX | Stony Brook U. CS TechReport FSL-07-01 | May 2007 | Online data appendix |
7 | Versatile File System Tracing with Tracefs | PS PDF BibTeX | Stony Brook U. CS TechReport FSL-04-05 | Aug 2004 | M.S. Thesis |
# | Name | Program | Member Since |
1 | Umit Ibrahim Akgun | PhD | Sep 2017 |
2 | Tyler Estro | PhD | May 2018 |
# | Name | Program | Period | Current Location |
1 | Ming Chen | PhD | May 2012 - Apr 2017 | Software Engineer, Datadog (New York, NY) |
2 | Nikolai Joukov | PhD | Jan 2004 - Dec 2006 | Research Staff Member, Storage and Data Services Research group, IBM T. J. Watson Research Center (Hawthorne, NY) |
3 | Sonam Mandal | PhD | Jun 2013 - Dec 2016 | Staff Software Engineer, LinkedIn (Sunnyvale, CA) |
4 | Wei Su | PhD | May 2019 - Dec 2021 | Performance and Capacity Engineer, Facebook (Menlo Park, CA) |
5 | Vasily Tarasov | PhD | Jan 2008 - Nov 2013 | Research Staff Member, Scale-out Storage Software, IBM Research - Almaden (San Jose, USA) |
6 | Avishay Traeger | PhD | Sep 2003 - Aug 2008 | Senior Principal Software Engineer, Red Hat (Raanana, Israel) |
7 | Charles P. Wright | PhD | May 2003 - May 2006 | Partner, Senior Software Architect, Illumon (New York, NY) |
8 | Prafful Agarwal | MS | Jan 2018 - Dec 2019 | |
9 | Akshat Aranya | MS | May 2003 - Aug 2004 | Software Development Engineer III, AWS Elemental, Elemental Technologies (Portland, OR) |
10 | Akshay Aurora | MS | Jan 2019 - Dec 2019 | Software Engineer, Databricks (San Francisco, CA) |
11 | Geetika Babu Bangera | MS | Jan 2017 - Dec 2017 | Member Technical Staff, Software, NetApp, Inc. (Sunnyvale, CA) |
12 | Arvind Chaudhary | MS | Sep 2014 - Dec 2015 | Member of Technical Staff, CNA group, VMware Inc. (Palo Alto, CA) |
13 | Udit Kaushik Chitalia | MS | May 2014 - May 2015 | Software Engineer, Twitter (San Francisco, CA) |
14 | Tyler Estro | MS | Sep 2017 - Apr 2018 | PhD candidate, Stony Brook University, Computer Science Department |
15 | Sujay Godbole | MS | Sep 2008 - Dec 2009 | Member of Technical Staff, Core Storage Group (ESX), VMware Inc. (Cambridge, MA) |
16 | Darshan Godhia | MS | Jan 2017 - Dec 2017 | Software Engineer, YouTube Engineering, Google (San Bruno, CA) |
17 | Shivanshu Goswami | MS | Aug 2016 - Dec 2017 | TBA |
18 | Mayur Jadhav | MS | Jun 2017 - May 2018 | GPGPU Machine Learning Engineer, Intel (Folsom, CA) |
19 | Pragesh Jagnani | MS | Jan 2019 - Dec 2019 | Software Development Engineer, Amazon Selection and Catalog Systems, Amazon (Seattle, WA) |
20 | Deepak Jain | MS | Sep 2012 - Dec 2013 | Member of Technical Staff, Project FVP - Engineering, PernixData Inc. (San Jose, USA) |
21 | Mehul Jain | MS | Jan 2019 - Dec 2019 | Software Engineer 2, Twitter (San Francisco, CA) |
22 | Farhaan Jalia | MS | Jan 2017 - Dec 2017 | Member of Technical Staff II, Cloud Native Group, VMware Inc. (Bellevue, WA) |
23 | Sagar Jeevan | MS | Jun 2019 - Dec 2019 | Software Engineer II, Dell Technologies (Isilon) (Seattle, WA) |
24 | Aneesh Joshi | MS | Aug 2019 - May 2020 | Member of Technical Staff, Core Data Path, Nutanix, Inc. (San Jose, CA) |
25 | Shobhit Khandelwal | MS | Jan 2019 - Dec 2019 | Member of Technical Staff, Pure Storage (Mountain View, CA) |
26 | Koundinya Santhosh Kumar | MS | Sep 2010 - Dec 2011 | Senior Development Software Engineer, Advanced Software Development and Performance, SanDisk (Milpitas, CA) |
27 | Noopur Anil Maheshwari | MS | Aug 2017 - Dec 2018 | Software Engineer, HPE (Nimble Storage) (Sunnyvale, CA) |
28 | Manu Mathew | MS | Jan 2018 - Dec 2018 | Software Engineer, SolidFire Element OS, NetApp (Raleigh, NC) |
29 | Amar Mudrankit | MS | Jan 2011 - May 2012 | Software Engineer, Advanced Development Group at Fusion-IO (San Jose, CA) |
30 | Ritika Nevatia | MS | Sep 2018 - Dec 2019 | Software Engineer, iCloud, Apple Inc. (Seattle, WA) |
31 | Dongju Ok | MS | Sep 2014 - May 2016 | Software Engineer, Application Team, Commvault Systems Inc. (Tinton Falls, NJ) |
32 | Karthikeyani Palanisami | MS | May 2012 - Jun 2013 | Member of Technical Staff, Project MARS - Engineering, NetApp Inc (Sunnyvale, USA) |
33 | Nidhi Panpalia | MS | Jan 2017 - Dec 2017 | Development Engineer, AWS Lambda, Amazon (Seattle, WA) |
34 | Dhanashri Patil | MS | Jan 2018 - Dec 2018 | Senior Software Engineer, Dell Technologies (Isilon) (Seattle, WA) |
35 | Dhivahar Perumal | MS | Sep 2018 - May 2019 | Software Engineer, Data Services Team (CASL), Nimble Storage - HPE (San Jose, USA) |
36 | Vinothkumar Raja | MS | Sep 2016 - Dec 2017 | Software Engineer, Pure Storage Inc. (Mountain View, CA) |
37 | Venkatakrishnan Rajagopalan | MS | Jan 2016 - Dec 2016 | Member of the Technical Staff, VMware Inc. (Palo Alto, CA) |
38 | Vineeth Ramesh | MS | Jan 2018 - Dec 2018 | Software Engineer, Dialpad (San Francisco, CA) |
39 | Rahul Rane | MS | Jan 2018 - Dec 2018 | Software Engineer, HPE (Nimble Storage) (Sunnyvale, CA) |
40 | Shubhi Rani | MS | Sep 2015 - Dec 2016 | Member of Technical Staff, VMware Inc. (Palo Alto, CA) |
41 | Nehil Shah | MS | Jan 2017 - Dec 2017 | Software Development Engineer, Amazon AWS Infrastructure - Enterprise Networking, Amazon (Seattle, WA) |
42 | Krapi Ravindra Shah | MS | Jan 2019 - Dec 2019 | Assistant VP, Data Platforms, Tradeweb Markets LLC. (Jersey City, NJ) |
43 | Rushabh Shah | MS | Jan 2017 - Dec 2017 | Software Engineer, Facebook Inc. (Menlo Park, CA) |
44 | Mukul Sharma | MS | Aug 2016 - Dec 2017 | Member of the Technical Staff, Core Data Path, Nutanix (San Jose, CA) |
45 | Varun Shastry | MS | Sep 2014 - Dec 2015 | Member of Technical Staff, Disaster Recovery Team, Nutanix Inc. (San Jose, CA) |
46 | Siddesh Shinde | MS | Jan 2018 - Dec 2018 | Member of Technical Staff, Core Data Path, Cohesity Inc (San Jose, CA) |
47 | Gyumin Sim | MS | Jan 2010 - Dec 2010 | Software Engineer, Data Center Power Team, Google (Mountain View, CA) |
48 | Nilesh Somani | MS | May 2018 - Dec 2019 | Senior Software Engineer, Storage Team, Robin Systems Inc. (San Jose, CA) |
49 | Jatin Sood | MS | Jan 2019 - Dec 2019 | |
50 | Aayush Sureka | MS | Sep 2018 - Dec 2019 | Principal Member of Technical Staff, Transactions group, Oracle Database (Redwood Shores, CA) |
51 | Sagar Trehan | MS | Sep 2012 - Dec 2013 | Member of Technical Staff, CASL Performance Group - Engineering, Nimble Storage Inc (San Jose, USA) |
52 | Maryia Maskaliova | BS/MS | May 2017 - May 2018 | Software Engineer, Android Performance |
53 | Leixiang Wu | BS/MS | Feb 2015 - May 2017 | Software Development Engineer, Amazon Prime Video, Amazon (Seattle, WA) |
54 | Amrith Arunachalam | BS | May 2018 - Dec 2018 | |
55 | Abraham Spitalny | BS | Jul 2019 - Dec 2019 | Software Development Engineer I, Ads, Amazon (New York, NY) |
56 | Kevin Sun | BS | May 2017 - May 2018 | |
57 | Tim Wong | BS | Dec 2004 - Jun 2005 | Associate, Volatility Arbitrage, Global Asset Allocation, Applied Quantitative Research (Greenwich, CT) |
58 | Yinuo Zhang | BS | Aug 2019 - May 2020 | |
59 | Henry Nelson | HS | Sep 2015 - Aug 2017 | CS undergraduate at CMU |