How Facebook Leverages NVMe™ Cloud Storage in the Datacenter


By Ross Stenfort, Facebook

Source: Accelerating Facebook’s Hardware Infrastructure, Vijay Rao

Many of today’s data center challenges are met head-on with NVMe™ solutions, which improve performance and endurance and address a range of needs: evolving security requirements, effective communication between systems across the data center ecosystem, and scaling accurately. Facebook is one of the world’s largest players in the cloud market, and we employ NVMe cloud storage technology in our data centers around the clock to address hyperscale issues.

How Facebook Manages the Data of 2.5 Billion Users Each Month with NVMe Technology
In 2019, Facebook’s numbers show over 2.7 billion users, plus 1 billion Instagram users and 1.3 billion more on Messenger. Now imagine the amount of data that needs to be transferred and stored. Our data centers are enormous and must keep up with the more than 2.5 billion people who use our products every month. We leverage NVMe technology to maintain our data environment and to overcome challenges with de-allocation, scaling, debugging, SSD endurance, performance testing and security.

NVMe Tackles Facebook Data Center Issues: A Challenge-by-Challenge Overview
NVMe architecture works like a virtual superhero when it comes to the Facebook challenges mentioned above. I have created a breakdown of how NVMe technology comes into play for each challenge.

Challenge One: De-allocation
Latency is the main issue with de-allocation, and older methods are time consuming and difficult. The NVMe 1.4 specification allows the SSD to advertise its preferred de-allocation size, so systems can be optimized using standard mechanisms. NVMe technology also helps with garbage collection: the system sends a hint to the drive that helps it garbage collect while reducing write amplification and improving performance and endurance.
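To illustrate why an advertised preferred size is useful, a host can shrink each de-allocation request so that it starts on the drive's preferred boundary and covers a whole number of preferred-size units. The sketch below is only an illustration under stated assumptions, not Facebook's implementation: `align_deallocate_range` is a hypothetical helper, and the granularity and alignment are passed in as plain logical-block counts (in NVMe 1.4 such values would be derived from Identify Namespace fields such as NPDG and NPDA).

```python
def align_deallocate_range(start_lba, num_blocks, granularity, alignment):
    """Shrink [start_lba, start_lba + num_blocks) to a sub-range that starts
    on an `alignment` boundary and whose length is a multiple of `granularity`.

    All arguments are counts of logical blocks (hypothetical inputs).
    Returns (aligned_start, aligned_length), or None if no aligned
    sub-range fits inside the request.
    """
    if granularity <= 0 or alignment <= 0:
        raise ValueError("granularity and alignment must be positive")
    if num_blocks < 0 or start_lba < 0:
        raise ValueError("start_lba and num_blocks must be non-negative")

    end_lba = start_lba + num_blocks
    # Round the start up to the next alignment boundary.
    aligned_start = -(-start_lba // alignment) * alignment
    if aligned_start >= end_lba:
        return None
    # Round the remaining length down to a whole number of granules.
    aligned_length = (end_lba - aligned_start) // granularity * granularity
    if aligned_length == 0:
        return None
    return aligned_start, aligned_length


if __name__ == "__main__":
    # Example with an assumed preferred size of 256 logical blocks.
    print(align_deallocate_range(100, 1000, granularity=256, alignment=256))
```

Blocks trimmed off the edges of the request are simply left allocated in this sketch; a real host would decide separately whether to de-allocate them anyway.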

Challenge Two: Managing at Scale
Since Facebook does not allow unique vendor tools within its data centers, our vendors run into a restricted-access problem and need a way around it. We use NVMe Command Line Tools (CLI) so that each vendor can have unique aspects and access. The CLI is open source and also provides a standard way of reading everything.
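As a small sketch of what a standard way of reading everything can look like, the snippet below wraps the open source `nvme` command (nvme-cli) and asks for JSON output so the same code path works regardless of drive vendor. The helper names `build_nvme_command` and `read_nvme_json` and the device path are assumptions made for illustration; this is not a description of Facebook's tooling.

```python
import json
import subprocess


def build_nvme_command(subcommand, device):
    """Build an nvme-cli argument list that requests JSON output."""
    return ["nvme", subcommand, device, "--output-format=json"]


def read_nvme_json(subcommand, device):
    """Run an nvme-cli subcommand and return its parsed JSON output.

    Requires nvme-cli to be installed and usually root privileges.
    """
    result = subprocess.run(
        build_nvme_command(subcommand, device),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command fails
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Hypothetical device path; adjust for the system under test.
    smart = read_nvme_json("smart-log", "/dev/nvme0")
    for key, value in sorted(smart.items()):
        print(f"{key}: {value}")
```

Because the output is iterated generically, the sketch does not depend on any particular field name in the SMART log.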

Challenge Three: Suppliers Drive Debugging
Another restriction issue for vendors comes into play during the debugging process. Facebook had to find a way for suppliers to debug their drives without physical access to our data centers. We employ NVMe telemetry technology to enable smooth communication with our vendors.

Challenge Four: How to Prevent SSD Drive Fatigue
Facebook manages SSD endurance carefully, because a write-intensive workload may exceed a drive’s endurance. Even though a drive is rated for a specific number of drive writes per day, a write-intensive workload can wear it out before the end of its expected lifetime. NVMe’s Namespace Management command nicely alleviates this problem. For example, Facebook engineers can take a 512 GB drive and configure it as a 256 GB drive. The application sees a 256 GB drive, but twice the number of drive writes per day are available because the capacity seen by the application is halved.
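The arithmetic behind the 512 GB example can be sketched in a few lines. This is a first-order illustration only: it assumes the drive's daily write budget stays fixed when the namespace is made smaller, and the rated value of 1.0 drive writes per day is a made-up number, not a Facebook or vendor figure. `effective_dwpd` is a hypothetical helper name.

```python
def effective_dwpd(rated_dwpd, rated_capacity_gb, exposed_capacity_gb):
    """Drive writes per day as seen by an application using a smaller namespace.

    The daily write budget (rated_dwpd * rated_capacity_gb) is assumed to stay
    constant, so exposing less capacity spreads it over fewer gigabytes.
    """
    if not 0 < exposed_capacity_gb <= rated_capacity_gb:
        raise ValueError("exposed capacity must be positive and no larger than rated capacity")
    daily_write_budget_gb = rated_dwpd * rated_capacity_gb
    return daily_write_budget_gb / exposed_capacity_gb


if __name__ == "__main__":
    # Assumed rating of 1.0 DWPD on a 512 GB drive, exposed as 256 GB.
    print(effective_dwpd(1.0, 512, 256))
```

With those assumed inputs the application-visible figure doubles, which matches the example above.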

Challenge Five: Performance Testing
Performance testing can be tricky when a lot of trim is being executed, because it is difficult to tell how many blocks contain data and how many do not. To understand how much free space exists, Facebook again uses NVMe Namespace technology to determine the number of LBAs that contain data.
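One way to picture this is with the namespace capacity and utilization values that an NVMe drive reports in its Identify Namespace data (NCAP and NUSE, both counted in logical blocks). The sketch below is an assumption about how such values could be turned into a free-space figure before a test run; `namespace_usage` is a hypothetical helper and the sample numbers are invented.

```python
def namespace_usage(ncap, nuse):
    """Summarize how full a namespace is.

    ncap: maximum number of logical blocks that may be allocated (NCAP).
    nuse: number of logical blocks currently allocated (NUSE).
    """
    if ncap <= 0:
        raise ValueError("ncap must be positive")
    if not 0 <= nuse <= ncap:
        raise ValueError("nuse must be between 0 and ncap")
    return {
        "allocated_lbas": nuse,
        "free_lbas": ncap - nuse,
        "fraction_used": nuse / ncap,
    }


if __name__ == "__main__":
    # Invented sample values: 1,000,000 blocks of capacity, 250,000 in use.
    print(namespace_usage(ncap=1_000_000, nuse=250_000))
```

Checking this figure before and after a trim-heavy workload gives a simple view of how much of the drive actually holds data during the test.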

Challenge Six: Security
NVMe technology solutions are helpful when it comes to Facebook security issues. For example, NVMe’s Security Send and Security Receive commands allow security protocols to be tunneled through NVMe technology. Facebook also uses NVMe architecture together with open source tooling for Opal security and to address secure boot issues.

Learn More About NVMe Technology and Facebook
To learn more, please join us at our next webinar, “How Facebook and Microsoft Leverage NVMe™ Cloud Storage,” where you can gain additional insight into how hyperscalers like Facebook and Microsoft chose NVMe flash for their storage.

For an in-depth look, watch my FMS 2019 presentation on how NVMe technology and Facebook work together.