Tackling the ‘whole pasta buffet’ mess of a network configuration – preamble to a series
From spaghetti code to pasta buffet
If you have a programming background you might be familiar with the term ‘spaghetti code’ – describing a program whose internal structure is so messed up that its source code reminds you of a plate of spaghetti. When managing a local computer network, configuration changes can erode a once clean structure over time until the result resembles not just ‘spaghetti code’ but a whole ‘pasta buffet’ – after a bunch of hungry guests have paid it a visit.
And that is definitely the situation I’m currently facing at work. How it came to this, you can read about in a minute. Since just describing a despicable situation is somewhat dull and helps nobody, I’ll make this review of my current situation the starting point of a series of blog posts about how I try to tackle it with the help of PowerShell and probably some other tools as well. If you are the organized type, working for a large institution with a good-sized IT budget, you might have ITIL-conformant processes for your network and a well-stocked CMDB in place. In that case, you can stop reading here. If not, by the end of this series you might end up with a set of information that would feel very much at home in a CMDB – or can serve as a low-cost substitute for one.
Abbreviated long-term history of an institutional local network
To give you an idea how this ‘pasta buffet’ mess in a local network can come into existence, consider this history of a LAN system for an organization over the last fifteen years:
- You start with a network of about 60 workstations and a few servers – where none of the latter has to offer services to the outside world.
- Since you’re lucky, your internet provider assigns you a full class C net of official IP addresses. So you just assign IP addresses from this pool directly to all your machines.
- The world-wide web comes to your organization. Now you have a web server hosting the institutional website. Of course this has to be available from the outside.
- After some strategic decisions your organization starts building an R&D department excelling in web development. Suddenly you have to manage more front-end servers, application servers, and database servers than you ever imagined necessary.
- Somebody tells you that exposing all internal systems to the internet via the use of official IP addresses is a bad idea. So you start using private IP addresses internally. For the systems which are accessible from the outside you decide to use static NAT entries on your gateway router. Since this doesn’t work well in all situations, some systems keep their directly assigned official IP address.
- You suddenly realize that giving your servers official IPs by static NAT leads to problems when internal clients try to access them. Your remedy for this is a ‘split brain’ DNS configuration in which host name resolution for internal clients returns the private IP of a server system.
- Merger time! Your organization does a merge with two others which were just partners before. Suddenly you have two more office locations and VPN tunnels and routing between all of them.
- As a part of the merger, the local IT teams are merged too, now working together on networking problems and remedies for them. One of the first outcomes is an organization-wide plan for the use of new private IP address ranges. So you start to assign addresses from this new range to new computer systems.
- The unification of formerly distinct local IT structures continues. You get a new Active Directory Domain and start to migrate your servers and client PCs to this new domain. Of course this includes configuring and using new DNS and DHCP servers as well.
- Wait a minute: Some servers can’t be migrated just like that. To be able to continue using existing installations of SharePoint and Exchange you keep the old domain running for quite some time.
- Bad guys all over the web. Your humble gateway router with its old-fashioned access-list-based restrictions is no longer secure enough. You introduce a firewall appliance between your LAN and the internet. Your old gateway router is still in use for routing and some VPN tunneling. Your plan is to replace its functionality piece by piece with equivalent features offered by your shiny new appliance.
- You learn that the best way to manage servers visible to the outside world is to put them into a separate network segment called a ‘Demilitarized Zone’ (DMZ). For that you need a consecutive range of official IP addresses. Luckily your internet provider still has some left (this is about the time when the IPv4 pool finally drains) and assigns you 64 addresses. You start moving servers out of the inner LAN into the DMZ.
- And finally: It’s moving time! Your funders have evaluated your institution and recommended that two of your locations in neighbouring cities should be merged into one. As usual, the hope is for synergistic effects to happen afterwards.
- Rejoice: You’ll get a whole new, state-of-the-art data center! The downside is that this doesn’t free you from moving your current server systems into the new data center. And of course, the systems from your second, soon-to-be-former location have to be moved too. Now it’s really time to start planning how to tackle this mess of an organically grown network configuration…
How to get a grip on this situation
Well, well, well. Of course we should always have cleaned up completely right after making configuration changes to our network. But that retrospective thought is not helping at all, so what to do right now? Anyone in a similar situation should do quite some network and system configuration cleanup as soon as possible. But even if everything were neat and tidy, moving a data center still means that the configuration of most servers and other network components will change. And the following general thoughts don’t apply only to this quite specific situation of a data center move and merge, so you might profit from reading them even if you just feel a bit ‘unwell’ about the current state of a local network you administer.
At the heart of planning for network configuration changes lies the need for current, up-to-date information. So no matter what you plan to do, make sure that you have all necessary information available in an easy-to-use format. What information you really need depends on your plans, but for many purposes you’ll need at least a common set of information about your systems and network. So our first goal is to get up-to-date configuration information. And since we’re about to change configurations iteratively, it doesn’t make much sense to collect this information without applying some automation to the gathering process. That brings us to the question of which information sources are readily available for automated extraction. The list of possible sources partly depends on the size and type of your network. You might not have a firewall appliance, or even a single machine running Windows. But the following list includes a few items specific to a Microsoft-centric shop:
- DNS, both forward and reverse lookup zones
- Configuration files from switches, gateway routers, firewall appliances
- Active Directory: Information in there ranges from locations and IP subnets to computer accounts
- Polling devices directly using WMI or SNMP
- Network monitoring systems (BigBrother, Nagios, WhatsUp, …)
Getting and combining information from these sources can be a demanding task. Some of the items above describe only a broad category (e.g. “gateway routers”), so the abstract goal of “getting configuration information from a gateway router” might result in many slightly different implementations, depending on your specific router brand and model. Other items allow much more standardized querying. It doesn’t make any difference whether your DNS server runs Microsoft’s own implementation, BIND on Windows or Linux, or something else entirely – they all answer the same query protocol. The same holds true when using SNMP to query devices. To a lesser extent, WMI is also an abstraction over different kinds of machines, but there you’re limited to the Windows world.
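Since DNS is the most standardized of these sources, here is a minimal sketch of what automated extraction can look like – written in Python for brevity, although the series itself will lean on PowerShell. All host names and addresses below are made up, and the resolver function is injectable so the sweep can be demonstrated (and tested) without touching live DNS:

```python
import csv
import io
import ipaddress
import socket

def sweep_reverse_dns(network, resolve=socket.gethostbyaddr):
    """Try a reverse DNS lookup for every host address in `network`.

    Returns a dict of ip -> hostname for all addresses that resolved.
    `resolve` is injectable so the sweep can run against a fake zone.
    """
    inventory = {}
    for ip in ipaddress.ip_network(network).hosts():
        try:
            hostname, _aliases, _addrs = resolve(str(ip))
            inventory[str(ip)] = hostname
        except OSError:  # NXDOMAIN, timeout, no resolver available, ...
            continue
    return inventory

# Demo with a fake resolver over hypothetical hosts instead of live DNS:
FAKE_ZONE = {"10.1.2.1": "gw01.example.org", "10.1.2.10": "web01.example.org"}

def fake_resolve(ip):
    if ip in FAKE_ZONE:
        return (FAKE_ZONE[ip], [], [ip])
    raise socket.herror(1, "Unknown host")

inventory = sweep_reverse_dns("10.1.2.0/28", resolve=fake_resolve)

# Dump the result as CSV, ready for a spreadsheet or a later import step:
out = io.StringIO()
csv.writer(out).writerows(sorted(inventory.items()))
print(out.getvalue())
```

Against a real network you would drop the `resolve` argument (falling back to `socket.gethostbyaddr`) and point the sweep at your actual subnets.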
In which format should we document our network?
The important thing for being able to combine information from many different sources is to build an abstraction over what you might call your conceptual or application domain. For example, firewall configurations are always about systems, interfaces, IP addresses, and allowed or denied traffic. This list does not show what your firewall actually does for you, but it is nonetheless working on these items. For our limited purpose of getting general network configuration information, the differences between models lie in the way each one requires you to write down the rules and the systems they apply to. Generally speaking, the network devices we want to gather information from all share the same conceptual domain of computers, devices, NICs, IP addresses, network masks, DNS resolution, etc. Unfortunately it’s our task to translate all their funny little dialects into a common and manageable form. So how do we choose a suitable format to translate into? Do we have to start from scratch, adjusting and expanding the format as we go? Or is there a kind of ‘standardized system and network configuration documentation format’ available to build on? Actually, there is work going on in this area. But it is foundational work still in progress, and getting into it when you ‘just want to describe a few systems in a network using XML’ can result in quite some overhead. To give you a short overview:
- The IETF has a workgroup on a standard protocol for configuring network devices called NETCONF.
- Since configuring network devices involves passing configuration data around, there are several (!) proposals for NETCONF data modeling languages. You’ll find that the IETF workgroup for that runs under the name of NETMOD and that one of the proposed modeling languages is called YANG. Another one goes by the name of KALUA, but YANG looks more ‘mainstream’ at the moment – if you can apply this term to a proposal.
- If you want to use XML: there is an exact, reversible mapping from YANG to an equivalent XML syntax. This mapping is called YIN, so nothing is lost when translating between the two forms.
- The technical documents about YANG currently only contain elementary building blocks for data modeling applied to network devices. RFC 6021 describes “Common YANG data types” and contains definitions for concepts like counters, object identifiers, timestamps, physical and MAC addresses, IP addresses, domain names, hosts, and the like. All very important, but there is still quite a gap between that and describing all currently relevant aspects of a computer system or a router.
- For describing items like routers or computers you have to write a YANG Data Model. The NETMOD workgroup web page currently lists several papers in draft status with Data Models for the areas of (general) System Management and the configuration of IP, SNMP, and Routing. The ‘oldest’ of these documents is dated March 2011, so this is really work in progress. Still, you might get some ideas from these drafts about how best to describe network devices. For example, the System Management draft gives a basic definition of the entity ‘system’.
At the moment it looks like we’re a bit too early to take full advantage of established standards for describing network devices. Just to give you one example: suppose you collect all this important information about your network devices using these shiny new standards. Wouldn’t it be nice to later reuse this information by importing it into a CMDB? At the time of this writing I wasn’t able to find any CMDB system able to import device configurations given in YANG. On the other hand, NETCONF already has backing from manufacturers like Cisco and Juniper, so this doesn’t look like a dead end at all. So I opt for a pragmatic use of soon-to-be standards: have a look at them whenever you make decisions about how to encode configuration data, and make your own XML compatible with the drafts. But while doing so, keep your learning and encoding overhead low.
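To illustrate what ‘make your own XML’ could mean in practice, here is a small Python sketch that encodes basic facts about one system as XML. The element names are my own approximation, loosely in the spirit of the NETMOD drafts but not a faithful rendering of any YANG module, and the example host is hypothetical:

```python
import xml.etree.ElementTree as ET

def system_to_xml(name, location, interfaces):
    """Encode basic facts about one system as an XML element.

    `interfaces` is a list of (name, ip, prefix-length) tuples.
    Element names are a home-grown approximation, not a YANG module.
    """
    system = ET.Element("system")
    ET.SubElement(system, "name").text = name
    ET.SubElement(system, "location").text = location
    ifs = ET.SubElement(system, "interfaces")
    for if_name, ip, prefixlen in interfaces:
        iface = ET.SubElement(ifs, "interface", name=if_name)
        ET.SubElement(iface, "ip").text = ip
        ET.SubElement(iface, "prefix-length").text = str(prefixlen)
    return system

# Hypothetical example host:
doc = system_to_xml("web01", "rack 4, new data center",
                    [("eth0", "10.1.2.10", 24)])
print(ET.tostring(doc, encoding="unicode"))
```

The point is not this particular vocabulary but having one serializer per information source that all emit the same, draft-inspired structure.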
Define your goals, decide what to do
Depending on your individual situation, the changes to plan for your network and systems configuration vary a lot. This planning is always a demanding intellectual process for which any current configuration information can only be a basis. So we don’t attempt to automate the planning process itself, we just try to assist it as well as possible. But developing these assisting tools must still leave us enough time to do the planning itself. Put the other way round: it’s really good to have neat and complete documentation of your network. But finishing the tools for producing this documentation one day before moving the data center is definitely ‘too late to satisfy’.
Don’t stop with generating reports, generate ‘action templates’ as well
If you have a strong vision of the ‘goal state’ for your network and all this nice documentation about its current state, do you really want to write down and execute by hand all the changes necessary for this transformation? Why not generate ‘action templates’ from the information about the current state? But be careful: don’t execute the changes directly as you generate them. Write them to a file and review them one by one. You might even want to run some automated tests after every small or large reconfiguration.
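As a sketch of the idea, in Python and assuming the current and target states are simple hostname-to-IP mappings: diff the two states and emit one reviewable action line per difference. The host names and the netsh-style command text are placeholders to be reviewed and adapted by hand, not commands to run blindly:

```python
def plan_readdressing(current, target):
    """Generate human-reviewable 'action template' lines from the
    difference between the current and the target IP plan.

    `current` and `target` both map hostname -> IP address. The
    netsh-style command text is only a placeholder for review.
    """
    actions = []
    for host in sorted(set(current) | set(target)):
        old, new = current.get(host), target.get(host)
        if old == new:
            continue  # nothing to do for this host
        if new is None:
            actions.append(f"# {host}: decommissioned? remove DNS record for {old}")
        elif old is None:
            actions.append(f"# {host}: new system, assign {new}")
        else:
            actions.append(f"{host}: netsh interface ip set address "
                           f'"LAN" static {new}  # was {old}')
    return actions

# Hypothetical current state and goal state:
current = {"web01": "192.168.5.10", "db01": "192.168.5.20"}
target  = {"web01": "10.1.2.10", "db01": "192.168.5.20", "app01": "10.1.2.30"}

for line in plan_readdressing(current, target):
    print(line)
```

Writing the result to a file instead of printing it gives you exactly the kind of artifact to review line by line before touching any system.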
Check outcomes and side effects
Applying changes is always an error-prone process, so you should consider how to check whether any applied change leads to undesired results (such as interruption of services). Automated tests can be as helpful after network configuration changes as they are in software development. For some tests you can employ network and device monitoring software, if available at your site. But some tests might take quite a long time, or only make sense to run once or twice after actual configuration changes; they might not fit well into a monitoring system whose primary purpose is constant monitoring. So one task to keep in mind is to write test scripts that show you whether the configuration changes applied to network components worked out OK.
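A trivial example of such a test script, again as a Python sketch: a TCP reachability check you could run against a list of critical services right after a change. The demo uses a throwaway local listener instead of real servers, so nothing here depends on your actual network:

```python
import socket

def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def run_checks(checks):
    """Run a list of (host, port) checks and return the failures."""
    return [(h, p) for h, p in checks if not tcp_reachable(h, p)]

# Demo against a throwaway local listener instead of real servers:
listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

print(run_checks([("127.0.0.1", port)]))   # port open: no failures
listener.close()
print(run_checks([("127.0.0.1", port)]))   # port closed now: one failure
```

In real use, the check list would name your critical services (DNS, mail, web, databases), and a non-empty failure list right after a reconfiguration tells you to roll back or investigate.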