The Importance Of Change Management And Record Keeping
on October 14, 2009 at 12:00 am
For the past two months something has been going on with our firewall at the office.
I use JFFNMS to monitor my network and I began receiving notifications that certain services and ports on the firewall were randomly closing and opening. I tried to see what was going on, but nothing turned up in the event logs. It would happen once or twice a day, but none of the users ever mentioned the services not being available or having trouble accessing the Internet, so I didn’t make much of it.
Beginning about two weeks ago it started happening more and more often, and this time the firewall was also crashing every 3-4 days. When the server crashed, everyone lost external email and Internet access, so this clearly was not an acceptable situation. I tried some suggested fixes, including updating the NIC drivers, but nothing worked. The machine would run out of resources, eventually lock up, and need to be hard booted. I decided it was probably time to call Microsoft to find out what was going on.
Before I laid out the $270 for a support call, I needed to do some research into when this problem really started (MS was probably going to ask me for that information anyway). I looked back through my email archive to see when I started receiving the failure notifications. I had been focusing only on the past two weeks, when the machine started crashing, and had forgotten that the problem actually started much earlier.
Now comes the important part. I used that date to look through the log I keep of server and network changes and maintenance. It turns out that two months ago I made a change to the firewall rules, and the problem started the very next day. It was hard to see how this particular change could cause the machine to crash, but the timing was too much of a coincidence, so I disabled that rule and restarted the firewall service. There have been no warnings, problems or crashes since.
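The lookup described above (take the date of the first alert, then scan the change log for anything made shortly before it) is simple enough to sketch in code. This is a minimal illustration, not the author's actual tool: the `ChangeEntry` record, the `suspect_changes` function and the 7-day default window are all hypothetical, and the sample entries are made up.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ChangeEntry:
    """One line in a hypothetical server/network change log."""
    when: date
    system: str        # e.g. "firewall", "exchange"
    description: str


def suspect_changes(log, first_symptom, lookback_days=7):
    """Return changes made in the window leading up to the first symptom.

    The window runs from `lookback_days` before the first symptom up to and
    including the symptom date itself; results are newest first.
    """
    start = first_symptom - timedelta(days=lookback_days)
    hits = [e for e in log if start <= e.when <= first_symptom]
    return sorted(hits, key=lambda e: e.when, reverse=True)


# Made-up sample entries, for illustration only.
log = [
    ChangeEntry(date(2009, 7, 2), "exchange", "Applied update rollup"),
    ChangeEntry(date(2009, 8, 13), "firewall", "Added rule blocking unwanted domains (logging on)"),
]
for e in suspect_changes(log, first_symptom=date(2009, 8, 14)):
    print(e.when, e.system, e.description)
```

The key input is the date of the *first* symptom, not the date things got bad; starting from the crashes two weeks ago would have put the firewall change well outside the window.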
I hope you can see the value of good record keeping. If you are making changes to your network without keeping a detailed record of what you are doing and when you’re doing it, it is going to be extremely difficult to diagnose problems later.
You can keep the log any way that works for you; just be sure you are keeping something. I used to keep it manually in an actual notebook, but I later switched to using a blog. That made it easier to have everything logged by date, and I categorize the posts so it’s easy to find all the entries about my Exchange server or firewall. If you keep the log electronically, you would be wise to keep it someplace other than on a file server on your network. If you can’t get into the network or server because of a problem, you don’t want to also be locked out of the change log that might help you fix it.
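If a blog is more than you want to maintain, a plain dated, categorized text file gives you the same two properties (entries by date, easy filtering by system). The sketch below is one assumed way to do that with JSON lines; the function names `record_change` and `entries_for` and the file layout are hypothetical, not the author's setup. Per the advice above, the file itself should live somewhere other than a file server on the network you are logging.

```python
import json
from datetime import date
from pathlib import Path


def record_change(log_path, system, description, when=None):
    """Append one dated, categorized entry to a JSON-lines change log."""
    entry = {
        "date": (when or date.today()).isoformat(),  # ISO dates sort correctly as text
        "system": system,                              # the "category", e.g. "firewall"
        "description": description,
    }
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def entries_for(log_path, system):
    """Return every entry for one system, oldest first."""
    with Path(log_path).open(encoding="utf-8") as f:
        entries = [json.loads(line) for line in f if line.strip()]
    return sorted((e for e in entries if e["system"] == system),
                  key=lambda e: e["date"])
```

Appending rather than rewriting the file keeps old entries intact, which is the whole point of a change log.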
Geeky details: What I think was really happening was that the rule I changed blocked access to some domains I didn’t want people accessing. These domains were still sending us email, or, more accurately, spam, but our return emails/NDRs to them were being blocked. The rule was also set to create a log entry each time it fired. As the volume of incoming email increased, the system couldn’t keep up with logging the blocked return messages, and the firewall was configured to shut down the firewall service if logging failed too many times in a row. So once I turned off logging for that rule, everything went back to normal.
So let that be a lesson to you too. Unless you’re running a really high-security system, don’t let a logging failure bring down the server. Logs are important, but usually not more important than uptime.
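The trade-off above comes down to a single policy choice: when log writes keep failing, stop the service (fail closed) or drop the entries and keep running (fail open). The sketch below models only that choice; `RuleLogger`, its parameters and the `overloaded_sink` function are hypothetical and are not how the Microsoft firewall in this story implements its setting.

```python
class RuleLogger:
    """Hypothetical log wrapper with a configurable logging-failure policy.

    fail_closed=True mirrors the behaviour described above: after
    `max_consecutive_failures` failed writes in a row, the service stops.
    fail_closed=False drops entries it cannot write and keeps running.
    """

    def __init__(self, sink, max_consecutive_failures=5, fail_closed=False):
        self.sink = sink                    # callable that writes one entry; may raise OSError
        self.max_failures = max_consecutive_failures
        self.fail_closed = fail_closed
        self.consecutive_failures = 0
        self.dropped = 0
        self.service_running = True

    def log(self, entry):
        if not self.service_running:
            return                          # service already shut down; nothing to log
        try:
            self.sink(entry)
            self.consecutive_failures = 0   # any success resets the streak
        except OSError:
            self.consecutive_failures += 1
            self.dropped += 1
            if self.fail_closed and self.consecutive_failures >= self.max_failures:
                self.service_running = False  # high-security choice: stop the service


def overloaded_sink(entry):
    """Simulates a logging backend that can't keep up."""
    raise OSError("log queue full")


strict = RuleLogger(overloaded_sink, max_consecutive_failures=3, fail_closed=True)
lenient = RuleLogger(overloaded_sink, max_consecutive_failures=3, fail_closed=False)
for i in range(10):
    strict.log(f"blocked NDR {i}")
    lenient.log(f"blocked NDR {i}")
print(strict.service_running, lenient.service_running)  # False True
```

Both loggers lose entries while the sink is overloaded; only the fail-closed one takes the service down with it.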




Discussion (19)
I cannot stress enough how dead-on this post is. I took over the network I currently manage a little over 18 months ago. A colleague of mine was the former IT Manager here (he also built this network from the ground up), and so I assumed (yes, I know) that everything would be running smoothly and correctly documented.
He had this network for five years before I took control, and here is what I found in the way of documentation:
- Approximately 20 “pages” of wiki documentation on the theory of how things work in the network.
- Approximately 700 pages of printed out server logs PER SERVER (for 6 servers). This is stuff like TCP/IP connection requests and security event logs. I have since shredded and recycled all this, as it is totally useless in paper form (I’ll never read it – I leafed through it but there is nothing useful there).
- A building diagram on the wall of my office listing machine names and printer names in their respective locations (more than 50% of that information was wrong).
- Network cables and jacks labeled with roman numerals (XI looks exactly like IX when there is no indication which direction it was written from).
- A note from him telling me to call him with any questions (almost every answer would turn out to be “I don’t remember” or “Google that”).
Now, we’ve also got a rather… unique cabling infrastructure (we are basically playing the TCP/IP version of Ping Pong with our internet access and subnets, bouncing back and forth between buildings). I think there are like 5 extra 24-port switches to accomplish this, where a single ROUTER would do just fine.
This all combines to form the Voltron of clusterfucks when there is a problem.
I have produced over 300 pages of usable documentation since I took charge of this network; however, there is no trending data or change log from before I got here.
I have literally no idea what caused some of the issues we have with DNS and Group Policy (I believe the DNS issues are causing the GP issues), because the only thing in the event logs is a note that the policy engine previously logged the cause of the issue, and there are no event logs from earlier than about a month before I took control of this network.
A simple change log and an event log archiving regimen would have prevented all the frustration I have experienced over the past 18 months.
Wow, Joe, that sounds familiar; it’s just like my environment when I took over. I have been here almost 10 years now and started after 9/11. You got some kind of documentation. Do you know what I got?
A list of passwords. And most of them were dictionary words. Can you imagine how many viruses were running around? Think back to the Blaster worm… Ugh.
Network diagrams that looked like they were made up in someone’s head as “What I wish the network looked like.”
It took me years to figure everything out and then slowly change it to where we are at now.
Ask me how much documentation I have… Yeah I will ignore you :)
The important stuff is heavily documented in my personal files; for example, when I upgraded a box, I wrote down all the old settings and which SPs and patches I was applying.
I requested a technical writer to follow me around but they said it wasn’t in the budget. LOL!
Nice lol. Yeah I don’t advertise that I have documentation, but it will be readily apparent if anything happens to me.
So how about Change Control Logs? Anyone have a good sample sheet they’d like to share?
I wouldn’t mind seeing something as well. If no one posts one I will go searching on Monday.
Change management is huge. Admittedly I am not as good at it as I should be.
There is a lot of material out there on this, but I have found that it is important to try to conform to the published ITIL standards.
A product I found that will help you be more ITIL compliant (not that I am, or my network for that matter… YIKES!) is called Service Now (http://www.service-now.com/).
I recommend that those of you with very large installations take a look at this product through the demo, which you can find here: https://www.service-now.com/demo/
It is a thoroughly intricate, detailed product (I mean down to the short hairs), provided you implement stringent guidelines for using it on a daily basis.
The cost for our setup, as a hosted app, was around $2000/yr, if I remember correctly. But give them a call; they are quite cool and will even hook you up with a 30-day dealio to get the feel for it, and you can just go live from there.
Youch, 2 G’s a year… Sounds expensive. What does it do that SpiceWorks doesn’t, besides being ITIL compliant?
Quick FYI, the JFFNMS link is not complete and doesn’t work.
Fixed! Thanks Joe.
Can anyone suggest some open source or free change management software? I think this is great information. I know a lot of my clients are not using anything for change management right now.
I’d love to see an example of what others are doing as well.
Thanks for the interesting information.
I’m not satisfied with the answer to the question of why records should be kept in archives.