SUMMARY
This article describes a problem, and its resolution, in which the UEB reports kernel panics or kernel hang messages on Hyper-V 2012 servers with NUMA spanning enabled. The problem occurs only with Hyper-V server 2012 / 2012R2 and CentOS6/RHEL6 VMs. CentOS5/RHEL5 VMs do not recognize NUMA and so are not affected. Prior versions of Hyper-V server running CentOS6/RHEL6 VMs are not affected either.
ISSUE
The UEB reports kernel panics or kernel hang messages on Hyper-V 2012 servers with NUMA spanning enabled. The problem occurs only with Hyper-V server 2012 / 2012R2 and CentOS6/RHEL6 VMs. CentOS5/RHEL5 VMs do not recognize NUMA and so are not affected. Prior versions of Hyper-V server running CentOS6/RHEL6 VMs are not affected either.
Problem
Adding more than 6GB of RAM causes a kernel panic with the text "PANIC: early exception 06 rip 10:fffff… error 0 cr2 0". Usually the UEB is CentOS6 and the server is Hyper-V 2012 or 2012R2 with NUMA spanning enabled. Alternatively, with 6GB of RAM or less, the following kernel hang message may appear, in which case the system hangs during shutdown.
INFO: task flush-253:1:39591 blocked for more than 120 seconds.
      Not tainted 2.6.32-504.1.3.el6_bp.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
flush-253:1 D 0000000000000000 0 39591 2 0x00000080
 ffff880037d3b9d0 0000000000000046 0000000000000000 ffff88017f857100
 ffff880037d3b990 ffffffffa000461c 00003572e88abc43 ffffffff8113ae0c
 0000000200b73b70 000000010377f21b ffff8800bd4a8638 ffff880037d3bfd8
Call Trace:
 [<ffffffffa000461c>] ? dm_table_unplug_all+0x5c/0x100 [dm_mod]
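When triaging a saved console or syslog capture, the two symptoms described above can be searched for mechanically. The following is a minimal sketch, not a supported tool: the function name classify_console_log and the labels it returns are hypothetical, and the patterns are taken only from the message text quoted in this article.

```python
import re
from typing import List

# Patterns built from the two messages quoted in this article.
PANIC_RE = re.compile(r"PANIC: early exception 06")
HUNG_RE = re.compile(r"INFO: task \S+ blocked for more than \d+ seconds")


def classify_console_log(text: str) -> List[str]:
    """Return labels for the symptoms found in a captured console log."""
    found = []
    if PANIC_RE.search(text):
        found.append("early-exception panic")
    if HUNG_RE.search(text):
        found.append("hung task")
    return found


if __name__ == "__main__":
    sample = "INFO: task flush-253:1:39591 blocked for more than 120 seconds."
    print(classify_console_log(sample))
```

A match is only a hint that this article may apply; the host version and NUMA spanning setting still need to be confirmed.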
Resolution
The root cause is a bug in the Hyper-V 2012 NUMA logic that affects any CentOS6 or RHEL6 Linux VM. The workaround is to disable NUMA entirely in the Linux VM. This workaround is included in Hyper-V UEB 9.0.0 and later.
For the workaround, perform the following steps:
- In the UEB Linux VM, edit /boot/grub/grub.conf and append numa=off to the end of every kernel line, after the other kernel parameters. Example below.
Before:
kernel /vmlinuz-2.6.32-504.1.3.el6_bp.x86_64 ro root=UUID=56ec91c2-6524-45de-8db4-b15d90d992ae rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_LVM_LV=vg_root/lv_swap KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM quiet loglevel=1 nmi_watchdog=0 rd_NO_PLYMOUTH max_loop=128 elevator=noop
After:
kernel /vmlinuz-2.6.32-504.1.3.el6_bp.x86_64 ro root=UUID=56ec91c2-6524-45de-8db4-b15d90d992ae rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_LVM_LV=vg_root/lv_swap KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM quiet loglevel=1 nmi_watchdog=0 rd_NO_PLYMOUTH max_loop=128 elevator=noop numa=off
- [Optional] Disable NUMA in "Hyper-V Settings", then restart the Hyper-V service.
- [Optional] In the UEB VM Settings, "Memory" settings, check the "Enable Dynamic Memory" option.
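The grub.conf edit in the first step can be sketched in code. This is a minimal illustration under stated assumptions, not the method used by the UEB update: the helper names add_numa_off and cmdline_has_numa_off are hypothetical, the sample text is invented for the demonstration, and the sketch works on text in memory rather than writing to /boot/grub/grub.conf. Back up the real file before changing it.

```python
import re


def add_numa_off(grub_conf_text: str) -> str:
    """Append numa=off to every kernel line that does not already have it."""
    out = []
    for line in grub_conf_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # preserve the original line ending
        # Commented-out lines start with '#', so they do not match here.
        is_kernel = re.match(r"\s*kernel\s", body) is not None
        has_flag = re.search(r"(^|\s)numa=off(\s|$)", body) is not None
        if is_kernel and not has_flag:
            body = body.rstrip() + " numa=off"
        out.append(body + ending)
    return "".join(out)


def cmdline_has_numa_off(cmdline: str) -> bool:
    """Check a kernel command line (e.g. the text of /proc/cmdline)."""
    return "numa=off" in cmdline.split()


if __name__ == "__main__":
    # Hypothetical, shortened grub.conf entry for demonstration only.
    sample = (
        "title CentOS\n"
        "\tkernel /vmlinuz-2.6.32 ro quiet elevator=noop\n"
        "\tinitrd /initramfs.img\n"
    )
    print(add_numa_off(sample), end="")
```

The function is idempotent, so running it on an already edited file leaves the file unchanged. After the VM is rebooted, passing the contents of /proc/cmdline to cmdline_has_numa_off is one way to confirm that the parameter took effect.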
References
Related social.technet article about CentOS6 panics on Hyper-V 2012:
https://social.technet.microsoft.com/Forums/windowsserver/en-US/4e7b89d8-62a4-4dcd-9181-c0d186c6060b/centos-63-on-windows-server-2012-hyperv-30-bug-panic-early-exception?forum=winserverhyperv
Microsoft TechNet: CentOS and RHEL virtual machines on Hyper-V:
https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/Supported-CentOS-and-Red-Hat-Enterprise-Linux-virtual-machines-on-Hyper-V