Apr 29

Don’t forget to change the config-register before upgrading your Cisco 4500-E with a SUP-8E!!!

I recently acquired a brand new Cisco 4506-E with the Supervisor 8-E. The Supervisor came with the latest K9 (crypto) software. This was very odd for me; I almost always have to upgrade my new network equipment when I get it. After configuring the switch, I relocated it to the wiring closet where it is going to spend the rest of its life.

Two days before the scheduled installation, I turned it on to make a couple of changes that needed to be made. Once it was booted up, I issued the “show ip int brief” command, and the only interfaces that showed up were the TenGigabit interfaces on the Supervisor. I did some more digging and found the following error in the log: %C4K_CHASSIS-3-BACKPLANESEEPROMREADFAILED. Cisco’s Error Message Decoder stated that the chassis was bad and should be returned.

After replacing the chassis, I still had the same error and none of the line cards worked. Cisco then sent me a replacement Supervisor 8-E. I installed the replacement Supervisor 8-E, configured it enough to get on the network, then pulled the configuration file onto it via TFTP. Knowing it took 5-7 minutes to boot up, I stepped away, and when I came back I was able to validate that the config had loaded (from the console port).
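
If you are retracing these steps, the TFTP copy is a one-liner; the server address and file name below are placeholders, not the ones I used:

Switch# copy tftp://192.0.2.10/4506e-config.cfg startup-config
Switch# reload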


After the chassis was booted up and the configuration was validated, I attempted to configure the SSH version and generate the crypto key. The commands were not there. I thought this was very odd because there was only one software version for this platform, and the original Supervisor came with the correct version. After checking, I found this version did not have the SSH feature, so I needed to upgrade the switch to the crypto image cat4500es8-universalk9.SPA.03.03.00.XO.151-1.XO.bin. The only difference in the file name is the k9.
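
The symptom from the CLI looked roughly like this (output approximate; on the non-crypto image the parser simply does not recognize the crypto keyword):

Switch(config)# crypto key generate rsa
                ^
% Invalid input detected at '^' marker.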

I copied the file to the bootflash and changed the boot statement to “boot system flash bootflash:/cat4500es8-universalk9.SPA.03.03.00.XO.151-1.XO.bin”, saved the configuration, and reloaded. The switch ignored the boot statement and used the first file in the bootflash. I thought that I had the boot statement wrong, so I tried it without the /; same thing.
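
For reference, the whole sequence looked something like this (the TFTP server address is a placeholder):

Switch# copy tftp://192.0.2.10/cat4500es8-universalk9.SPA.03.03.00.XO.151-1.XO.bin bootflash:
Switch# configure terminal
Switch(config)# boot system flash bootflash:/cat4500es8-universalk9.SPA.03.03.00.XO.151-1.XO.bin
Switch(config)# end
Switch# write memory
Switch# show bootvar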

After reading the configuration guide, I found that the config-register needed to be changed to 0x0102. This configuration register value tells the switch to read the boot statement. So I entered the command config-register 0x0102, saved the configuration, and reloaded. The switch now booted up with the new image and I was able to configure SSH on the switch.
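
Put together, the fix and the SSH setup looked roughly like this (the domain name is a placeholder; use show version after the reload to confirm the register took effect):

Switch# configure terminal
Switch(config)# config-register 0x0102
Switch(config)# end
Switch# write memory
Switch# reload

Then, once the k9 image was up:

Switch# configure terminal
Switch(config)# ip domain-name example.net
Switch(config)# crypto key generate rsa modulus 2048
Switch(config)# ip ssh version 2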

As with so many other Cisco products, I thought I could simply change the boot statement, save, and reload. That assumption failed me on this upgrade. Because of this, I always recommend reading the configuration guides or release notes. Sometimes I get in a hurry and skip them; when I do, I usually get reminded that I need to read the documentation.

Have you found other networking devices where you have to change the config-register to tell them to read the boot statement? If so, please share your experience!

Mar 04

Servers keep dropping their network connection: is it the server or the network’s fault?

I had just arrived home from work when I received a call stating that many servers were dropping their network connection. The voice on the other end was very concerned that there was a major problem. I promptly logged into the network and started looking at the network equipment.

The specific servers were connected to Brocade MLXe switches via Multi-Chassis Trunking (MCT). If you are not familiar with MCT, it is similar to Cisco’s Virtual Port Channel (vPC): it allows two MLX chassis to act like a single switch from the server’s view. LACP is used to create a LAG (trunk/EtherChannel) to the server.
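
To give you an idea of what that looks like on the MLX side, here is a rough sketch of a dynamic LAG tied to an MCT client (the LAG name, IDs, and client rbridge-id are made up for illustration; see the NetIron MCT configuration guide for the full cluster setup):

lag "server1" dynamic id 10
 ports ethernet 13/32
 primary-port 13/32
 deploy
!
cluster CNS-Cluster 1
 client server1
  rbridge-id 300
  client-interface ethernet 13/32
  deploy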

Upon reviewing my log, I found the following.

Feb 28 16:08:14:W:LACP: 13/32 state changes from LACP_BLOCKED to FORWARD
Feb 28 16:08:14:I:LACP: Port 13/32 mux state transition: not aggregate -> aggregate
Feb 28 16:08:14:I:CLUSTER FSM: Cluster CNS-Cluster (Id: 1), client (RBridge Id: 161) – Remote client CCEP up
Feb 28 16:08:12:I:LACP: Port 13/32 partner port state transition: not aggregate -> aggregate
Feb 28 16:08:12:I:LACP: Port 13/32 rx state transition: defaulted -> current
Feb 28 16:06:43:I:LACP: Port 13/32 rx state transition: current -> expired (reason: timeout)
Feb 28 16:05:29:I:CLUSTER FSM: Cluster CNS-Cluster (Id: 1), client (RBridge Id: 161) – Remote client CCEP down
Feb 28 16:05:29:I:CLUSTER FSM: Cluster CNS-Cluster (Id: 1), client (RBridge Id: 161) – Remote client CCEP up
Feb 28 16:05:26:I:CLUSTER FSM: Cluster CNS-Cluster (Id: 1), client (RBridge Id: 161) – Remote client CCEP down
Feb 28 16:05:26:I:LACP: Port 13/32 mux state transition: aggregate -> not aggregate (reason: peer is out of sync)
Feb 28 16:05:26:W:LACP: 13/32 state changes from FORWARD to DOWN
Feb 28 16:04:36:I:RSTP: VLAN VLAN: 110 Port 12/32 – STP State FORWARDING (EnableFwding)
Feb 28 16:04:36:I:RSTP: VLAN VLAN: 110 Port 12/32 – STP State LEARNING (EnableLearning)
Feb 28 16:03:05:I:RSTP: VLAN VLAN: 110 Port 12/32 – STP State FORWARDING (EnableFwding)
Feb 28 16:03:05:I:RSTP: VLAN VLAN: 110 Port 12/32 – STP State LEARNING (EnableLearning)
Feb 28 14:29:28:I:RSTP: VLAN VLAN: 110 Port 16/6 – STP State FORWARDING (EnableFwding)

One thing I quickly noticed was the lack of interface up/down entries. I eventually concluded that the log entries were the result of the interfaces coming back UP. The mass outage that I was called about wasn’t such a mass outage after all. Yes, the log showed many servers going down, but not all at the same time, and none of them were down now.

After some more conversation with the server team the next day, we came to the conclusion that all of the HP Gen8 servers were having this issue. They would drop their connection, send an SNMP trap, then recover by the time a support engineer could take a look at the server. I was surprised to hear this had been going on for many weeks. Knowing that the Brocade MLXe MCT had been stable for a couple of years, I felt safe suggesting that the server team update the NIC drivers on the servers. There was an update available, and that resolved the issue.

I have had wireless NIC drivers cause connectivity issues in the past, but never on a server.
Can any of you share any stories where a server network card driver caused an issue?

Back to the switch log: where were the port UP/DOWN entries?


After some more searching, I realized that I had the following command applied to the interface: “no snmp-server enable traps link-change”. I have this command on every interface that is NOT an uplink interface. It prevents the switch from sending interface up/down traps to the monitoring system. I do this because I don’t want to receive port up/down traps when the servers do their scheduled reboots, or when the server team takes a server down for maintenance.
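
On the MLX that is applied per interface, something like this (the port number is just an example, and I have abbreviated the prompt):

MLX(config)# interface ethernet 15/35
MLX(config-if)# no snmp-server enable traps link-change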

I removed the “no snmp-server enable traps link-change” command from a test interface, connected my PC to the port, then disconnected it. I received the following log entries.

Feb 26 09:39:40:I:System: Interface ethernet 15/35, state down – link down
Feb 26 09:39:02:I:RSTP: VLAN VLAN: 4 Port 15/35 – STP State FORWARDING (EnableFwding)
Feb 26 09:39:02:I:RSTP: VLAN VLAN: 4 Port 15/35 – STP State LEARNING (EnableLearning)
Feb 26 09:39:02:I:System: Interface ethernet 15/35, state up

This confirmed that the “no snmp-server enable traps link-change” command was also suppressing the link up/down log entries, even though it should only suppress the SNMP traps. After talking to Brocade support, I learned this is a software defect in 5.2d.

Have you run into this software defect on the MLX?
Have you run into this driver issue on the HP Gen8 servers? If so, what switches were you using?

No account needed to post a reply; find the Reply button below and add your comment!!!