The Importance Of Change Management And Record Keeping
October 14, 2009 at 12:00 am
For the past two months something has been going on with our firewall at the office.
I use JFFNMS to monitor my network and I began receiving notifications that certain services and ports on the firewall were randomly closing and opening. I tried to see what was going on, but nothing turned up in the event logs. It would happen once or twice a day, but none of the users ever mentioned the services not being available or having trouble accessing the Internet, so I didn’t make much of it.
About two weeks ago it started happening more and more often, and now the firewall was crashing every 3-4 days. Whenever the server crashed, everyone lost external email and Internet access, so this clearly was not an acceptable situation. I tried some suggested fixes, including updating the NIC drivers, but nothing worked. The machine would run out of resources, eventually lock up, and need a hard reboot. I decided it was probably time to call Microsoft to find out what was going on.
Before I laid out the $270 for a support call, I needed to do some research into when this problem really started (MS was probably going to ask me for that information anyway). I looked back through my email archive to see when I started receiving the failure notifications. I had been focusing only on the past two weeks, when the machine started crashing, and had forgotten that the problem actually started much earlier.
Now comes the important part. I used that information to look through the log I keep of server and network changes and maintenance. It turns out that two months ago I made a change to the firewall rules, and the problem started the very next day. It was hard to see how this particular change would cause the machine to crash, but it was too coincidental, so I disabled that rule and restarted the firewall service. There have been no warnings, problems or crashes since.
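If your change log is in a form a script can read, the same lookup can be automated. What follows is only a minimal sketch in Python. The function name, the one-week window, and every date and entry in the sample data are made up for illustration; they are not my actual log.

```python
from datetime import date, timedelta

def changes_before(first_alert, change_log, window_days=7):
    """Return change-log entries made in the window_days leading up to
    first_alert (inclusive), newest first."""
    start = first_alert - timedelta(days=window_days)
    hits = [(when, note) for when, note in change_log
            if start <= when <= first_alert]
    return sorted(hits, reverse=True)

# Hypothetical change log: (date of change, what was changed).
change_log = [
    (date(2009, 7, 30), "Exchange: applied updates"),
    (date(2009, 8, 11), "Firewall: changed rule to block a list of domains"),
    (date(2009, 10, 5), "Firewall: updated NIC drivers"),
]

# Hypothetical date of the first failure notification in the email archive.
first_alert = date(2009, 8, 12)

suspects = changes_before(first_alert, change_log)
for when, note in suspects:
    print(when.isoformat(), note)
```

The point is to search backward from the first warning, not from the first crash. Anything that changed shortly before the first warning is a suspect, even if it looks unrelated.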
I hope you can see the value of good record keeping. If you are making changes to your network without keeping a detailed record of what you are doing and when you’re doing it, it is going to be extremely difficult to diagnose problems later.
You can keep the log any way that works for you; just be sure you are keeping something. I used to keep it manually in an actual notebook, but I later switched to using a blog. That just made it easier to have everything logged by date. Then I categorize the posts so it’s easy to find all the entries about my Exchange server or firewall. If you keep the log electronically, you would be wise to keep it someplace other than on a file server on your network. If you can’t get into the network or server because of a problem, you don’t want to also be kept from accessing the change log, which might help you fix it.
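Whatever tool you use, each entry needs little more than a date, a category, and a description of what changed. Here is a rough Python sketch of that structure, assuming nothing about any particular blog software. The `ChangeEntry` fields, the `by_category` helper, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class ChangeEntry:
    when: date       # the day the change was made
    category: str    # e.g. "firewall" or "exchange"
    summary: str     # what was changed and why

def by_category(log, category):
    """Return the entries in one category, oldest first (case-insensitive)."""
    wanted = category.lower()
    matches = [entry for entry in log if entry.category.lower() == wanted]
    return sorted(matches, key=lambda entry: entry.when)

# Hypothetical entries, deliberately out of order.
log = [
    ChangeEntry(date(2009, 10, 5), "Firewall", "Updated NIC drivers"),
    ChangeEntry(date(2009, 7, 30), "Exchange", "Applied updates"),
    ChangeEntry(date(2009, 8, 11), "Firewall", "Changed rule to block a list of domains"),
]

firewall_history = by_category(log, "firewall")
for entry in firewall_history:
    print(entry.when.isoformat(), entry.summary)
```

A spreadsheet or a paper notebook with the same three columns does the same job. What matters is that the date and the system are recorded every time.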
Geeky details: Here is what I think was really happening. The rule I changed blocked access to some domains I didn’t want people accessing. These domains were still sending us email (or, more accurately, spam), but the return emails/NDRs were being blocked. The firewall rule was also set to create a log entry whenever it fired. As the volume of incoming email increased, the system couldn’t keep up with logging the failed return emails, and the firewall was configured to shut down the firewall service if logging failed too many times in a row. So once I turned off logging for that rule, everything went back to normal.
So let that be a lesson to you too. Unless you’re running a truly high-security system, don’t let a logging failure bring down the server. Logs are important, but usually not more important than uptime.
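To make the trade-off concrete, here is a toy Python model of the two policies. It is not the firewall’s real configuration interface. The `RuleLogger` class, its `fail_closed` switch, and the threshold of three consecutive failures are assumptions invented for this sketch.

```python
class RuleLogger:
    """Toy model of a service that logs every rule hit."""

    def __init__(self, sink, max_failures=3, fail_closed=False):
        self.sink = sink                  # callable that writes one log entry
        self.max_failures = max_failures  # consecutive failures tolerated
        self.fail_closed = fail_closed    # True: stop the service when logging keeps failing
        self.consecutive_failures = 0
        self.dropped = 0                  # log entries that could not be written
        self.service_running = True

    def log(self, message):
        """Try to write one entry; return True if it was written."""
        if not self.service_running:
            return False
        try:
            self.sink(message)
        except OSError:
            self.consecutive_failures += 1
            self.dropped += 1
            if self.fail_closed and self.consecutive_failures >= self.max_failures:
                self.service_running = False  # the logging failure takes the service down
            return False
        self.consecutive_failures = 0
        return True

def overloaded_sink(message):
    # Stand-in for a log store that cannot keep up with the volume.
    raise OSError("log queue full")

strict = RuleLogger(overloaded_sink, fail_closed=True)
lenient = RuleLogger(overloaded_sink, fail_closed=False)
for i in range(5):
    strict.log(f"blocked NDR {i}")
    lenient.log(f"blocked NDR {i}")

print("strict still running:", strict.service_running)
print("lenient still running:", lenient.service_running, "dropped:", lenient.dropped)
```

The strict logger gives up after the third failed write and takes the service with it. The lenient one loses log entries but keeps passing traffic, which is the better trade for most small networks.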