Dec 24

Redundancy testing: do you do it?

During an outage, will your systems continue to run and provide services to your customers? Unless you test it, you won’t know.

Testing the redundancy in your network is very important. Unfortunately, it usually doesn’t get done. For years I have suggested that we perform annual redundancy testing. Every year the testing gets denied.

Here is a list of things I would like to test.

– Make sure each power supply (in dual-powered equipment) can handle the load by itself.
– Make sure dual-connected servers stay online when one of the two switches is powered off.
– Move the entire data center to one UPS, then to the other, to make sure each can handle the load alone.
– Fail every Active/Standby pair over to the standby to make sure the standby configuration and hardware work.
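
Before pulling a power supply or moving load onto one UPS, a quick paper check can flag the obvious cases where the surviving unit would be overloaded. The sketch below is only an illustration: the `PowerSupply` class, the `survives_single_failure` function, and all wattage figures are made up for the example, and it is no substitute for the live test.

```python
from dataclasses import dataclass


@dataclass
class PowerSupply:
    name: str
    capacity_watts: float  # rated capacity (hypothetical figures below)


def survives_single_failure(supplies, load_watts):
    """Return True if the load still fits after any one supply is lost."""
    if len(supplies) < 2:
        return False  # a single supply has no redundancy at all
    total = sum(s.capacity_watts for s in supplies)
    # The worst case is losing the largest supply.
    largest = max(s.capacity_watts for s in supplies)
    return load_watts <= total - largest


if __name__ == "__main__":
    # Hypothetical chassis with two 750 W supplies.
    chassis = [PowerSupply("PS1", 750), PowerSupply("PS2", 750)]
    for load in (600, 900):
        ok = survives_single_failure(chassis, load)
        print(f"{load} W load survives one failure: {ok}")
```

The same check applies to a pair of UPS units: treat each UPS as a supply and the whole data center as the load. A pass here only means the numbers add up on paper. The real test is still to fail one side and watch what happens.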

During a switch upgrade I took down multiple services. All of the servers were supposed to be dual-homed to the other switch, but they still went down. Later we found out that the redundant switch port was not configured. Another server never had its network cards set up for active/standby, even though the standby network card was connected to the standby switch. Another server didn’t even have the cable run to the second switch. Everybody thought these services were redundant, but they were not.
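
The three failures above (unconfigured switch port, NICs not set up for active/standby, missing cable) are the kind of thing an inventory audit could catch before an upgrade. Here is a minimal sketch, assuming you can collect those three facts per server from somewhere. The `Server` fields, the `redundancy_gaps` function, and the server names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    standby_cabled: bool           # a cable actually runs to the second switch
    standby_port_configured: bool  # the second switch port is configured
    nic_failover_enabled: bool     # NICs are set up as an active/standby pair


def redundancy_gaps(server):
    """List the reasons a 'dual-homed' server would not survive losing its primary switch."""
    gaps = []
    if not server.standby_cabled:
        gaps.append("no cable to second switch")
    if not server.standby_port_configured:
        gaps.append("standby switch port not configured")
    if not server.nic_failover_enabled:
        gaps.append("NICs not set up for active/standby")
    return gaps


if __name__ == "__main__":
    # Hypothetical inventory mirroring the three failures described above.
    inventory = [
        Server("app01", True, False, True),
        Server("db01", True, True, False),
        Server("web01", False, False, True),
        Server("mail01", True, True, True),
    ]
    for srv in inventory:
        gaps = redundancy_gaps(srv)
        status = "OK" if not gaps else "; ".join(gaps)
        print(f"{srv.name}: {status}")
```

An audit like this only checks what the records say. It would not have replaced the failover test, but it might have shortened the list of surprises.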

Many times I don’t have time to perform failover testing when I deploy new equipment. Without testing in the production environment, we don’t know how the other equipment is going to react to a failure. In my network we have many TAPs and bypass switches. If the switch on one side of the TAP fails, does the port go down on the other side of the TAP? I hope it goes down. The device on the other side needs to know that its neighbor just went down.

Do you perform annual redundancy testing? If so, what problems have you found and what outages have you avoided?

Please share. Maybe your experiences can help others justify performing these tests!

Nov 12

N7K-AC-6.0KW Power supply failure in a Cisco Nexus 7010

After many years working on network equipment, I have seen too many power supplies fail. These failures are usually on old power supplies that have been running for many years. I have also had newer power supplies fail due to high temperatures, usually caused by air conditioner failures. I did have the pleasure of being in the room when a power supply blew in an old FORE Systems chassis. There was a loud pop and a bright flash, accompanied by a loud yell from the tech who was sitting in front of it at the time.

The latest power supply failure was on an N7K-AC-6.0KW. If you are not familiar with this power supply, it is a 6 kW supply that takes two 220 V power circuits. The Nexus 7010 holds three of these power supplies. Nobody was around when this failure happened, but I’m sure it was loud when it blew. When the component failed, it blew a hole in the housing of the power supply. Thankfully it didn’t catch fire.

These power supply failures are why I don’t like single power supply devices. I want all of my network gear to have more than one removable power supply. Unfortunately, network vendors don’t always provide an option for dual removable power supplies. I have found that some vendors significantly increase the price of a network device when they add removable power supplies. In critical areas, the extra cost is worth it.

What power supply failures have you had to work on?
Have you had a power supply catch fire?
Have you experienced a power supply failure taking down a chassis when there was still a good supply in the chassis?
If so, tell us about it!

Failed N7K-AC-6.0KW
