Utilizing NVMe® Technology for Meta’s Hyperscale Cloud Storage

By Ross Stenfort, Hardware System Engineer, Meta

At Meta, our goal is to give our users the optimal user experience. The NVM Express® (NVMe®) 2.0 specifications, which were released this past June, were restructured for easier development and contain features that address new use cases for the cloud. In this blog, I’ll discuss how hyperscale cloud data centers utilize NVMe technology and the standout features for cloud applications.

NVMe technology scalability in Meta’s cloud storage

To paint a picture of Meta's scale: billions of users rely on its suite of applications, including Facebook, Messenger, WhatsApp and Instagram. As our infrastructure scales with our applications and as new features are added, we require our storage solutions to scale as well.

NVMe technology provides the low latency, high performance, and scalability needed for these hyperscale scenarios. The NVMe architecture has been a part of our servers since 2013 and we use NVMe technology to connect our flash and SSDs.

Benefits of NVM Express technology performance and management features

NVMe technology offers improved performance, Quality of Service (QoS) and scalability compared to legacy standards. Its management capabilities, exposed through the NVMe Management Interface (NVMe-MI™) specification, have also proved beneficial.

We have found NVMe technology useful in three categories of cloud applications:

  1. NVMe SSDs can be used for caching applications
  2. NVMe SSDs can be applied to database applications
  3. NVMe SSDs can also be used as boot drives on servers

Large-scale companies such as ours often have cloud applications that rely on being able to host multiple applications or multiple customers on the same set of hardware – often referred to as multi-tenancy. Overall, NVMe technology will allow companies to both improve the efficiency of these services and easily scale their services.

NVMe 2.0 specifications meet hyperscale cloud data center needs

New features such as Domains and Partitions and Endurance Group Management will benefit hyperscale cloud data centers. Domains and Partitions addresses the multi-tenancy use case: it partitions SSDs effectively and removes conflict between tenants, which helps performance, latency and isolation. With Endurance Group Management, we can optimize the use of an SSD for a specific usage scenario. For example, if a customer uses only 256 GB of capacity to store their data and the SSD contains 512 GB of NAND flash, Endurance Group Management may be used to reconfigure the drive as a 256 GB SSD with twice the number of drive writes per day, because we have reduced the capacity seen by the application.
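The arithmetic behind the 256 GB / 512 GB example can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not from the specification: the drive's total write budget is treated as fixed, any change in write amplification is ignored, and the 5-year warranty and 1 drive-write-per-day starting rating are hypothetical values chosen for the example.

```python
def drive_writes_per_day(write_budget_tb: float,
                         exposed_capacity_gb: float,
                         warranty_years: float = 5.0) -> float:
    """Drive writes per day = write budget / (exposed capacity * warranty days).

    Assumption: write_budget_tb (total terabytes the NAND can absorb) does not
    change when the exposed capacity is reconfigured.
    """
    warranty_days = warranty_years * 365
    exposed_capacity_tb = exposed_capacity_gb / 1000
    return write_budget_tb / (exposed_capacity_tb * warranty_days)


# Hypothetical drive: 512 GB of NAND, rated at 1 drive write per day for 5 years.
full_capacity_gb = 512
write_budget_tb = 1.0 * (full_capacity_gb / 1000) * 5 * 365

dwpd_full = drive_writes_per_day(write_budget_tb, 512)  # all capacity exposed
dwpd_half = drive_writes_per_day(write_budget_tb, 256)  # half the capacity exposed

print(f"512 GB exposed: {dwpd_full:.2f} drive writes per day")
print(f"256 GB exposed: {dwpd_half:.2f} drive writes per day")
```

Under these assumptions, halving the exposed capacity doubles the drive writes per day, which matches the example above. Real drives may behave differently, since the extra spare area also affects write amplification.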

Telemetry offers a standardized mechanism of debugging drives in the data center with no physical access – meaning remote debug information can be used to resolve issues. Telemetry optimizations and enhanced log pages are enabling us to manage applications more effectively.

Finally, the reorganization of the NVMe 2.0 specifications will help cloud users innovate more quickly. This is extremely important for cloud applications because new use cases and increased workloads are constantly emerging. The restructured specifications will allow NVMe technology to keep pace with those emerging use cases.

Learn More

Recently, I was interviewed about the NVMe cloud market and the impactful new NVMe 2.0 specifications features. I invite you to watch my video to learn more.

View the series of NVMe 2.0 specifications videos on the NVM Express YouTube Channel to learn more about the latest features and market use cases. If you're a cloud provider looking to learn more about NVMe technology, you can download the latest specifications.