Huawei announces new supercomputer to train AI

The cluster’s innovative super node architecture significantly increases its overall computing power

At Huawei Connect 2023, David Wang, Executive Director of Huawei’s Board of Directors, Chairman of the ICT Infrastructure Management Board and President of the Enterprise BG, shared Huawei’s experience supporting various industrial sectors in his speech “Accelerate Intelligence”. Wang also announced the launch of the Atlas 900 SuperCluster, a new AI computing cluster and the latest product in the Ascend series, which uses an entirely new architecture. Alongside it, he announced nine new intelligent industry solutions based on Huawei’s Intelligent Transformation reference architecture, designed to meet the specific needs of industries such as finance, government, manufacturing, energy and railways.

“A new chapter of intelligent transformation is opening up, with great opportunities and challenges ahead of us,” Wang said. “We must work together, delve into specific industry scenarios, and build a strong network backbone to power countless new AI models and applications. Together, we can help all industries in their intelligent transformation.”

Following the many recent breakthroughs in foundation models, a vast range of new AI models and applications is emerging and being integrated into an increasingly diverse set of sectors and scenarios. However, data, computing power, algorithms and application delivery are struggling to keep pace, and all four will be critical to enabling the intelligent transformation of industries. To address these challenges, Wang called for joint efforts to support connectivity, computing, and smart industries, which will be key to solving the problems encountered in implementing artificial intelligence and scenario-specific models. Enormous computing power is critical for developing foundation models. Moving away from the traditional approach of “stacking” servers, Huawei’s new AI clusters, supported by recent innovations, stand out at both the system and architectural levels. Furthermore, integrating computing power with transmission and storage has made it possible to overcome current bottlenecks.

As more and more foundation models with trillions of parameters emerge, Huawei has launched its Atlas 900 SuperCluster, designed specifically for training massive AI models. The Atlas 900 SuperCluster comes with Huawei’s cutting-edge Xinghe Network CloudEngine XH16800 switch, which has high-density 800GE ports. The cluster’s innovative super node architecture dramatically increases its overall computing power and takes the speed and efficiency of foundation model training to a new level. Additionally, Huawei has leveraged its strengths in computing, storage, networking and energy to systematically improve system reliability at the component, node, cluster and service levels. System reliability is critical for training large foundation models, and this approach has extended the cluster’s ability to support continuous model training from several days to a month or more.

To further accelerate the development of foundation models, Huawei also launched a more open and easy-to-use version of its Compute Architecture for Neural Networks, CANN 7.0. This architecture is not only compatible with other available AI frameworks, acceleration libraries, and mainstream foundation models, but also opens up lower-level functionality. These more open capabilities allow AI frameworks and acceleration libraries to directly invoke and manage compute resources, so developers can customize their own high-performance operators to make their foundation models more distinctive and competitive. Huawei also updated its Ascend C programming language for Transformer network models. More efficient programming and simplified operator deployment logic reduce the development time for a fusion operator from two person-months to two person-weeks, significantly accelerating the development of AI models and applications.

Wang also announced the publication of the white paper “Accelerating Intelligent Transformation”. It is a collection of case studies and best practices from Huawei, its customers and partners, intended to help all industries successfully begin their intelligent transformation journey. The white paper finds that artificial intelligence is driving industry upgrades by serving more and more scenarios and is becoming an important growth engine for the progress of society. It also states that collaboration between different roles in business, academia and research circles will be important for new applications of artificial intelligence and for the development of the entire sector. Specifically, different players in the ecosystem will need to work together to ensure that AI is designed to benefit everyone by quickly identifying emerging trends, continuously pursuing technological innovation, and rapidly improving engineering practices. To broaden and deepen the application of AI across industries, they will need to enable a broad range of models and applications. The white paper, which has already received support from several academics, examines the latest trends and developments in the field of artificial intelligence and explores 63 scenario-specific AI applications from 16 different industries. Additionally, it features a set of innovative intelligent transformation best practices across 18 different industries.