System Tuning Guide for AMD Instinct™ GPU Servers with EPYC 7002 CPUs

Application Note
Part Number: 57286_1.00
© 2021 Advanced Micro Devices, Inc. All rights reserved.

Disclaimer

The information contained herein is for informational purposes only and is subject to change without notice. This document may contain technical inaccuracies, omissions, and typographical errors. AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability, or fitness for particular purposes, with respect to the operation or use of AMD hardware, software, or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD's products or technology are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale.

This information is subject to the terms and conditions of the Export Control MOU, as well as other applicable agreements, between AMD and the recipient of this document. You shall adhere to all applicable U.S., European, and other export laws, including but not limited to the U.S. Export Administration Regulations ("EAR") (15 CFR Sections 730-774) and E.U. Council Regulation (EC) No 428/2009 of 5 May 2009. Further, pursuant to Section 740.6 of the EAR, You hereby certify that, except pursuant to a license granted by the United States Department of Commerce Bureau of Industry and Security or as otherwise permitted pursuant to a License Exception under the EAR, You will not (1) export, re-export, or release to a national of a country in Country Groups D:1, E:1, or E:2 any restricted technology, software, or source code it receives from AMD, or (2) export to Country Groups D:1, E:1, or E:2 the direct product of such technology or software, if such foreign produced direct product is subject to national security controls as identified on the Commerce Control List (currently found in Supplement 1 to Part 774 of EAR). For the most current Country Group listings, or for additional information about the EAR or Your obligations under those regulations, please refer to the U.S. Bureau of Industry and Security's website at http://www.bis.doc.gov.

Trademarks

AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. PCIe is a registered trademark of PCI-SIG. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

Dolby Laboratories, Inc. Manufactured under license from Dolby Laboratories.

Rovi Corporation. This device is protected by U.S. patents and other intellectual property rights. The use of Rovi Corporation's copy protection technology in the device must be authorized by Rovi Corporation and is intended for home and other limited pay-per-view uses only, unless otherwise authorized in writing by Rovi Corporation. Reverse engineering or disassembly is prohibited.

USB Implementers Forum, Inc. USB Type-C and USB-C are trademarks of USB Implementers Forum, Inc.

USE OF THIS PRODUCT IN ANY MANNER THAT COMPLIES WITH THE MPEG ACTUAL OR DE FACTO VIDEO AND/OR AUDIO STANDARDS IS EXPRESSLY PROHIBITED WITHOUT ALL NECESSARY LICENSES UNDER APPLICABLE PATENTS.
SUCH LICENSES MAY BE ACQUIRED FROM VARIOUS THIRD PARTIES INCLUDING, BUT NOT LIMITED TO, THOSE LICENSES IN THE MPEG PATENT PORTFOLIO, WHICH ARE AVAILABLE FROM MPEG LA, L.L.C., 6312 S. FIDDLERS GREEN CIRCLE, SUITE 400E, GREENWOOD VILLAGE, COLORADO 80111.

Contents

1 Introduction
2 SBIOS Settings
3 Optimized PCIe Performance Targets

Figures

Figure 3–1 PCIe Transfer Types - Without Instinct Infinity Fabric Installed
Figure 3–2 Target Ranges with 18 Gbps EPYC Infinity Fabric
Figure 3–3 Target Ranges with 16 Gbps EPYC Infinity Fabric

Tables

Table 2–1 SBIOS Setting Descriptions

1 Introduction

This application note lists the SBIOS settings and other means to target maximum performance for AMD Instinct™ GPU servers with AMD EPYC™ 7002 CPUs. It describes how to tune system parameters for optimal PCIe® bandwidth and latency, lower CPU control latency, and higher GPU performance. Idle power may increase in order to provide additional performance for GPU-centric applications. If this hinders the deployment, work with the systems manager to determine the settings favorable for the specific scenario.

2 SBIOS Settings

This section describes the SBIOS settings, their priority, and the relevance of each option. The settings you enable depend on your deployment criteria; read the notes in the table below and set each option appropriately.

Note: The names and parameters of some SBIOS settings may vary across platform vendors. The table below pulls information from the Workload Tuning Guide for AMD EPYC™ 7002 Series Processor Based Servers (https://developer.amd.com/wp-content/resources/56745_0.80.pdf).

Table 2–1 SBIOS Setting Descriptions

Advanced ▷ PCIe ▷ Above 4G Decoding: Enable
  Priority: CRITICAL
  Relevance: Necessary for GPU large-BAR support (all GPU memory mapped into the PCIe® address space) and high-performance GPU DMA. A quick way to confirm large-BAR support from the operating system is sketched below.

Enable Enhanced Preferred IO on all PCIe ports
  Priority: CRITICAL
  Relevance: For peak PCIe® performance, it is necessary to run the AMD I/O Power Management Utility (https://developer.amd.com/iopm-utility/) after every boot. Doing so improves PCIe® bandwidth by up to 60% (a 9.8 GB/s improvement) for transfers between 256 KB and 256 MB.
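The check below is not part of the original guide; it is a minimal sketch of how large-BAR support can be confirmed from Linux once Above 4G Decoding is enabled, assuming the lspci utility is available and the GPUs report AMD's PCI vendor ID 0x1002. On a large-BAR configuration, one of the reported regions should be a 64-bit prefetchable window roughly the size of the GPU's HBM, rather than a 256 MB aperture.

    # List the PCIe memory regions (BARs) advertised by AMD devices (vendor ID 0x1002).
    # Expect a 64-bit prefetchable region sized close to the GPU memory capacity.
    sudo lspci -d 1002: -vv | grep -i "region"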
AMD CBS ▷ NBIO Common Options ▷ IOMMU: Disable*
  Priority: CRITICAL
  Relevance: Refer to IOMMU Note 1 below.

Advanced ▷ PCIe Subsystem Setting ▷ SR-IOV: Disable
  Priority: CRITICAL
  Relevance: Disables Single Root IO Virtualization.

AMD CBS ▷ NBIO Common Options ▷ PCIe Ten Bit Tag Support: Enable
  Priority: CRITICAL
  Relevance: Improves PCIe® performance by enabling a larger number of active/outstanding transactions. With PCIe® Gen 4, an adapter should support 10-bit extended tags to achieve full bandwidth.

AMD CBS ▷ CPU Common Options ▷ Global C-state Control: Auto
  Priority: CRITICAL
  Relevance: This should not be disabled; disabling it also disables the clock-gating (CC1, or S/W C1) and power-gating (CC6, or S/W C2) states.

AMD CBS ▷ NBIO Common Options ▷ SMU Common Options ▷ Fixed SOC Pstate: P0
  Priority: Important
  Relevance: Algorithm Performance Boost (APB) controls the P-states for the Data Fabric. Under certain scenarios involving low-bandwidth but latency-sensitive traffic (and memory latency checkers), the transition from low power to full power can adversely impact latency. Setting APBDIS to 1 (to disable APB) and specifying a fixed Infinity Fabric P-state of 0 forces the Infinity Fabric and memory controllers into full-power mode, eliminating any such latency jitter.

• AMD CBS ▷ NBIO Common Options ▷ SMU Common Options ▷ xGMI Link Width Control: Manual
• AMD CBS ▷ NBIO Common Options ▷ SMU Common Options ▷ xGMI Force Link Width: 2
• AMD CBS ▷ NBIO Common Options ▷ SMU Common Options ▷ xGMI Force Link Width Control: Force
• AMD CBS ▷ NBIO Common Options ▷ SMU Common Options ▷ xGMI Force Link Width: 1
• AMD CBS ▷ NBIO Common Options ▷ SMU Common Options ▷ xGMI Max Link Width Control: Manual
  Priority: Important
  Relevance: Forces the Infinity Fabric links between the EPYC CPUs to maximum width (x16).

AMD CBS ▷ SMU Debug Options ▷ SMU Feature Enable/Disable ▷ LCLK Deep Sleep: Disabled
  Priority: Recommended
  Relevance: Disables CPU LCLK deep sleep.

• AMD CBS ▷ NBIO Common Options ▷ SMU Common Options ▷ Determinism Control: Manual
• AMD CBS ▷ NBIO Common Options ▷ SMU Common Options ▷ Determinism Slider: Power
  Priority: Recommended
  Relevance: Ensures maximum performance levels for each CPU in a large population of identically configured CPUs by throttling CPUs only when they reach the same cTDP.

AMD CBS ▷ UMC Common Options ▷ DDR4 Common Options ▷ DRAM Controller Configuration ▷ DRAM Power Options ▷ Power Down Enable: Disabled
  Priority: Recommended
  Relevance: Prevents the DRAM controllers from powering down, for lower-latency DRAM access.

Run cpupower idle-set -d 2
  Priority: Recommended
  Relevance: Disables the power-gating (C6) state on all cores.

AMD CBS ▷ DF Common Options ▷ Memory Addressing ▷ NUMA nodes per socket
  Priority: Application Dependent
  Relevance: With NPS1, all eight memory channels are interleaved. With NPS2, every four channels are interleaved with each other. With NPS4, every pair of channels is interleaved. Normal operation is NPS1; however, AMD's machine learning data transfer library RCCL prefers NPS2. A quick way to confirm the resulting NUMA layout from the operating system is sketched below.
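The check below is not part of the original guide; it is a minimal sketch of how the NPS selection can be confirmed from Linux after a reboot, assuming the numactl package is installed. The number of NUMA nodes reported per socket should match the chosen NPS value (for example, two nodes on a dual-socket NPS1 system, eight nodes on a dual-socket NPS4 system).

    # Total NUMA nodes divided by the socket count gives the effective NPS value.
    numactl --hardware | grep -i "available"      # e.g. "available: 2 nodes (0-1)" for NPS1 on two sockets
    lscpu | grep -iE "socket\(s\)|numa node\(s\)" # cross-check sockets and NUMA nodes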
AMD CBS ▷ UMC Common Options ▷ DDR4 Common Options ▷ Security ▷ TSME: Disabled
  Priority: Some deployments require encryption
  Relevance: Disables transparent secure memory encryption (TSME). The impact of this encryption is 5 ns–7 ns of additional memory latency.

• AMD CBS ▷ DF Common Options ▷ Link ▷ 4-Link xGMI Max Speed: 18Gbps
• AMD CBS ▷ DF Common Options ▷ Link ▷ 3-Link xGMI Max Speed: 18Gbps
  Priority: Important if supported
  Relevance: Forces the xGMI links between the EPYC CPUs to maximum speed (18 Gbps) if the server supports it. Provides up to 12.5% faster GPU-to-remote-CPU DRAM, GPU-to-GPU, and GPU-to-NIC transfers.

• AMD CBS ▷ CPU Common Options ▷ Performance ▷ CCD/Core/Thread Enablement: Accept
• AMD CBS ▷ CPU Common Options ▷ Performance ▷ CCD/Core/Thread Enablement ▷ SMT Control: Disable
  Priority: Application Dependent
  Relevance: Disables SMT for higher per-core performance.

1. In certain systems, it may be necessary to enable the IOMMU in the SBIOS. When doing this, the operating system must be configured to put the IOMMU in pass-through mode. For Ubuntu, this is performed by:
   a. Setting IOMMU in the SBIOS to Enabled.
   b. Editing /etc/default/grub and setting the Linux default line to:
      GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=on iommu=pt"
   c. Running 'update-grub'.

3 Optimized PCIe Performance Targets

This section describes the measured PCIe® performance developed from data gathered on an A+A GPU server with eight AMD Instinct™ MI50 GPUs and dual AMD EPYC™ 7742 CPUs. This server connects all eight MI50s directly to the 7742s via PCIe® Gen4 x16 ports, without PCIe® switches. The insertion of PCIe® switches between the GPUs and CPUs will reduce bandwidth and increase latency.

Note: The SBIOS settings above were enabled to achieve these results. Systems that deviate from the above SBIOS implementation may achieve different results.

The format described in the figures below matches the output generated by rocm-bandwidth-test executed with no parameters on a dual-socket AMD EPYC™ 7742 Gen4 PCIe® server with no PCIe® switches and eight AMD Instinct™ MI50 Gen4 PCIe® GPUs. An example invocation is sketched after the figure captions below.

Note: This is as measured by rocm-bandwidth-test v2.4.0 or later. Systems with differing boards or topologies will exhibit different performance.

Figure 3–1 PCIe Transfer Types - Without Instinct Infinity Fabric Installed

Figure 3–2 Target Ranges with 18 Gbps EPYC Infinity Fabric

Figure 3–3 Target Ranges with 16 Gbps EPYC Infinity Fabric
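The example below is not part of the original guide; it is a minimal sketch of how a tuned system can be compared against the target ranges above, assuming ROCm and the rocm-bandwidth-test utility are installed and on the PATH. Running the tool with no parameters produces the copy-bandwidth matrices referenced for the figures in this section; the file name used here is only illustrative.

    # Run the default sweep (no parameters), matching the format referenced above,
    # and keep a copy of the output for comparison against the target ranges.
    rocm-bandwidth-test | tee rbt_results.txt

    # List the options supported by the installed release before using any flags
    # (option names can vary between ROCm versions).
    rocm-bandwidth-test -h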