HPE Unveils Its Generative AI Supercomputer Platform
HPE has finally lifted the veil on its highly anticipated generative AI supercomputer platform, designed to help enterprises build, fine-tune, and run powerful large language models in their own data centers.
The announcement comes as HPE and its competitor Supermicro both roll out significant updates to the product portfolios they offer for running generative AI workloads, including powerful new servers equipped with Nvidia's latest Blackwell GPUs, unveiled at the recent GTC 2024 conference.
Built in close collaboration with Nvidia, the generative AI supercomputer draws on Nvidia's high-performance computing expertise to give developers the software and services needed to build advanced models, backed by robust computational capacity.
HPE says the generative AI supercomputing platform it launched last November is now available to order, offering an ideal solution for enterprises that need to run AI projects on their own servers. The system is billed as a full-stack solution for developing and training large language models, powered by the Nvidia GH200 Grace Hopper Superchip and including everything needed to get started with generative AI: liquid cooling, accelerated computing, networking, storage, and AI services.
HPE says the supercomputer platform is aimed primarily at large enterprises, research institutions, and government agencies, and is available for direct purchase or through HPE GreenLake on a pay-as-you-go basis. It comes pre-configured for fine-tuning and inference workloads, bundling compute, storage, software, networking, and consulting services to help enterprises venture into generative AI.
At its core, the system provides high-performance AI computing clusters built on HPE ProLiant DL380a Gen11 servers and Nvidia H100 GPUs, integrating Nvidia Spectrum-X Ethernet networking and BlueField-3 data processing units to optimize AI workloads. HPE adds its own machine learning and analytics software, while the Nvidia AI Enterprise 5.0 platform includes Nvidia's newly released NIM microservices to simplify AI development.
HPE says the system will support a range of large language models, both proprietary and open source. It is well suited to lightweight fine-tuning of AI models, retrieval-augmented generation (RAG), and scale-out inference; HPE claims that fine-tuning a 70-billion-parameter Llama 2 model on the 16-node system takes just six minutes.
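For readers unfamiliar with the term, "lightweight fine-tuning" usually means parameter-efficient methods such as LoRA, which train small adapter matrices rather than every model weight. The sketch below illustrates the general idea using the open-source Hugging Face transformers and peft libraries; it is not HPE's tooling, and the model name, dataset path, and hyperparameters are placeholder assumptions.

```python
# Minimal LoRA fine-tuning sketch (illustrative only; not HPE's stack).
# Assumes the transformers, peft, and datasets packages; the model name,
# dataset file, and hyperparameters are placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA trains small low-rank adapter matrices instead of every weight,
# which is what makes this style of fine-tuning "lightweight".
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Assumed local JSONL dataset with a "text" field of training examples.
data = load_dataset("json", data_files="train.jsonl")["train"]
data = data.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
                batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama2-lora",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
```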
Moreover, the product aims to help bridge the AI skills gap, with HPE Services providing the expertise needed to design, deploy, and manage on-premises platforms and implement AI projects.
HPE President and CEO Antonio Neri emphasizes that many businesses need a "hybrid design approach" to support the entire AI lifecycle. "From training and tuning models on-premises, in hosted facilities, or in public clouds, to inference at the edge, AI is a hybrid cloud workload," he explains.
AI Software Stack
While putting the finishing touches on the generative AI supercomputing platform, HPE worked with Nvidia to develop the software needed to make use of it, including the HPE Machine Learning Inference Software, which is available in technical preview starting today. The software will help customers deploy AI models quickly and securely on their own infrastructure, and it integrates Nvidia's new NIM microservices to provide access to optimized foundation models delivered in pre-built containers.
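For context on what a NIM microservice looks like in practice: each NIM packages a model behind an OpenAI-compatible HTTP endpoint running inside a container. The sketch below queries such an endpoint with Python's requests library; the host, port, and model identifier are illustrative assumptions, not details HPE or Nvidia published here.

```python
# Hedged sketch: querying a NIM microservice's OpenAI-compatible endpoint.
# The URL, port, and model name are assumed for illustration; consult the
# container's documentation for the actual values in a real deployment.
import requests

NIM_URL = "http://localhost:8000/v1/chat/completions"  # assumed local container

payload = {
    "model": "meta/llama2-70b",  # placeholder model identifier
    "messages": [
        {"role": "user", "content": "Summarize our Q3 incident reports."}
    ],
    "max_tokens": 256,
}

resp = requests.post(NIM_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```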
Additionally, HPE has developed a RAG reference architecture that lets large language models draw on proprietary datasets to enrich their knowledge. HPE has also released the HPE Machine Learning Data Management Software, Machine Learning Development Environment Software, and Machine Learning Inference Software to support generative AI development.
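HPE has not published the internals of its RAG reference architecture, but the general retrieval-augmented generation pattern it refers to looks roughly like the sketch below: embed a proprietary corpus, retrieve the passages most similar to a query, and prepend them to the model's prompt. The embed and generate functions here are toy stand-ins, not any vendor's API.

```python
# Generic retrieval-augmented generation (RAG) sketch, not HPE's reference
# architecture. The toy embed() and generate() below stand in for a real
# embedding model and a deployed LLM; both are illustrative assumptions.
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy stand-in for an embedding model: hashed bag-of-words vector."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    return vec

def generate(prompt: str) -> str:
    """Toy stand-in for the LLM call (in practice, e.g. a hosted endpoint)."""
    return f"[model response grounded in a prompt of {len(prompt)} chars]"

def rag_answer(question: str, corpus: list[str], top_k: int = 2) -> str:
    # 1. Embed the proprietary documents and the question.
    doc_vecs = np.stack([embed(doc) for doc in corpus])
    q_vec = embed(question)
    # 2. Retrieve the top_k documents most similar to the question (cosine).
    sims = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-9)
    context = "\n\n".join(corpus[i] for i in np.argsort(sims)[::-1][:top_k])
    # 3. Ground the model's answer in the retrieved context.
    prompt = (f"Answer using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    return generate(prompt)

print(rag_answer("What is our refund policy?", [
    "Refunds are issued within 30 days of purchase.",
    "Support hours are 9am-5pm on weekdays.",
]))
```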
Finally, HPE has previewed upcoming servers based on Nvidia's new Blackwell GPU architecture, built around the Nvidia GB200 Grace Blackwell Superchip and the HGX B200 and HGX B100 GPUs.
Supermicro Launches First Servers with Blackwell GPUs
While HPE is set to announce more details on Grace-based servers in the coming weeks, Supermicro seems to be one step ahead. At the GTC 2024 conference, Supermicro introduced a range of new servers with the new GB200 Grace Blackwell Superchip and Blackwell-based B200 and B100 Tensor Core GPUs. Moreover, Supermicro states that existing systems based on Nvidia HGX H100 and H200 are “ready” for the new GPUs, allowing customers to enhance their existing data center investments by simply purchasing the chips.
Supermicro says it will be the first to market with servers built on the Nvidia HGX B200 8-GPU and HGX B100 8-GPU platforms later this year. The new systems will feature eight Nvidia Blackwell GPUs connected via fifth-generation NVLink interconnect technology delivering 1.8TB per second of bandwidth, and Supermicro pledges that their large language model training performance will be 3 times that of systems based on Nvidia's previous Hopper architecture.
Kaustubh Sanghani, Nvidia's Vice President of GPU Product Management, remarks, "Supermicro continues to deliver an amazing range of accelerated computing platform servers finely tuned for AI training and inference, meeting any demand in the current market."
To meet the demands of on-premises large language model workloads, Supermicro is introducing a series of new MGX servers equipped with the GB200 Grace Blackwell Superchip, which goes beyond a standalone GPU by pairing two Blackwell GPUs with a Grace CPU. The Superchip promises a significant boost for AI inference workloads, with a claimed 30x performance increase over the previous generation.
For the most demanding large language model workloads, Supermicro previewed a soon-to-launch rack-scale server based on the Nvidia GB200 NVL72, which connects 36 Nvidia Grace CPUs and 72 Blackwell GPUs in a single rack. Each GPU in this configuration uses the latest Nvidia NVLink technology, providing GPU-to-GPU communication at up to 1.8 terabytes per second.