SUMMARY
Prevent Intel 82574 NICs from going offline
ISSUE
Purpose
Prevent Intel 82574 NICs from going offline
Description
Some Unitrends systems, such as the R813, have Intel 82574 NICs. These NICs are vulnerable to certain ASPM faults that cause the NIC to stop working and accumulate errors.
Below are typical symptoms when this occurs.
# lspci |grep Eth
04:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
05:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection (rev ff)
# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:25:90:C7:41:AF
          inet addr:192.168.134.51  Bcast:192.168.134.255  Mask:255.255.255.0
          inet6 addr: fe80::225:90ff:fec7:41af/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:215 errors:747324309330 dropped:124554051555 overruns:0 frame:498216206220
          TX packets:38 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:26075 (25.4 KiB)  TX bytes:5512 (5.3 KiB)
          Interrupt:17 Memory:fb6e0000-fb700000
# ethtool -i eth0
driver: e1000e
version: 2.3.2-k
firmware-version: 1.9-0
bus-info: 0000:05:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
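The "(rev ff)" marker in the lspci output above can be checked for with a short script. The following is a minimal sketch, assuming that marker is a reliable sign of an 82574 that has stopped responding. The function name find_dead_82574 is hypothetical and is not part of any Unitrends tooling.

```shell
#!/bin/sh
# Hypothetical helper: reads `lspci` output on stdin and prints the bus
# address (first column) of every 82574 controller reported as "(rev ff)".
find_dead_82574() {
    grep '82574' | grep '(rev ff)' | awk '{print $1}'
}

# Usage on a live system (requires lspci):
#   lspci | find_dead_82574
```

With the sample output shown above, this would list 05:00.0, the same device that ethtool reports as bus-info 0000:05:00.0 for eth0.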
Cause
Intel 82574 NICs are vulnerable to certain ASPM faults when used with version 2.1.4 or later of the Linux e1000e driver.
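To see whether an interface falls within the affected driver range, the `ethtool -i` output can be compared against 2.1.4. This is a rough sketch only: the function name e1000e_affected is hypothetical, it assumes GNU `sort -V` is available, and it ignores any suffix after a hyphen in the version string (for example the "-k" in "2.3.2-k").

```shell
#!/bin/sh
# Hypothetical helper: reads `ethtool -i <iface>` output on stdin.
# Returns 0 if the driver is e1000e at version 2.1.4 or later, 1 otherwise.
e1000e_affected() {
    info=$(cat)
    driver=$(printf '%s\n' "$info" | awk '/^driver:/ {print $2}')
    # Anchor on ^version: so that "firmware-version:" is not matched,
    # then drop any suffix such as "-k".
    version=$(printf '%s\n' "$info" | awk '/^version:/ {print $2}' | sed 's/-.*//')
    [ "$driver" = "e1000e" ] || return 1
    [ -n "$version" ] || return 1
    # The lower of the two versions sorts first; if that is 2.1.4,
    # the installed driver is 2.1.4 or later.
    lowest=$(printf '%s\n%s\n' "2.1.4" "$version" | sort -V | head -n 1)
    [ "$lowest" = "2.1.4" ]
}

# Usage on a live system (requires ethtool):
#   ethtool -i eth0 | e1000e_affected && echo "driver is in the affected range"
```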
Resolution
If you have unitrends-drivers-2.6.32_279.el6.x86_64-2014011025.x86_64 or later (from release 7.4.0) installed, this should already be resolved, unless you added a NIC with 82574 controllers after installing that package.
If you are experiencing this problem with the symptoms above, it can be resolved as follows:
1) Download the nic82574.sh script
wget ftp://ftp.unitrends.com/support/scripts/nic82574.sh
2) Run the script to patch the NIC EEPROM and set a kernel parameter to avoid the problem.
sh nic82574.sh
3) Reboot
Third-Party Sources
http://sourceforge.net/projects/e1000/files/e1000e%20stable/eeprom_fix_82574_or_82583/ where Intel says the problem can be fixed with an EEPROM bit change applied by a Linux shell script.
https://bugzilla.redhat.com/show_bug.cgi?id=632650 reports an important change in e1000e 2.1.4 and later around ASPM. The user resolution was to set 'pcie_aspm=off' in the kernel params.
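Whether the pcie_aspm=off parameter mentioned in the Red Hat report is active can be checked against the running kernel's command line. This is a minimal sketch; the function name has_aspm_off is hypothetical, and it is an assumption that the parameter of interest on a given system is the same one named in that bug report.

```shell
#!/bin/sh
# Hypothetical helper: $1 is the kernel command line text.
# Returns 0 if "pcie_aspm=off" appears as a separate argument, 1 otherwise.
has_aspm_off() {
    for arg in $1; do
        [ "$arg" = "pcie_aspm=off" ] && return 0
    done
    return 1
}

# Usage on a live system:
#   has_aspm_off "$(cat /proc/cmdline)" && echo "ASPM is disabled at boot"
```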