Unfinished Draft – Work in Progress
Author: Martin Jackson [martin@guyver.demon.co.uk]
I've been playing around with Linux since 1994, using it for simple tasks such as DHCP and DNS services. Linux has been my hobby for many years, and for the last few years I have had the luck to be paid for doing my hobby ;-).
Personally I've been engaged in multiple contracts over the last couple of years delivering mainly:
Point solutions that require Linux, typically email, file and print servers;
Pilots using Linux, e.g. Linux desktop and thin client.
Recently I've been hired quite a lot to do core strategic project work such as:
Converting old company hardware into diskless X terminals based on Linux running Citrix;
Building Super Computing Farms using Linux for Electronic Design Automation Work;
And ...
Project managing and implementing migrations from Solaris to Linux.
The model I typically use for my deployments is based on the Bootstrapping an Infrastructure paper [BootInfra01] by Steve Traugott and Joel Huddleston; it's available at www.infrastructures.org. I recommend you read this highly informative, high-level paper.
I can't mention any companies or any specific details about their inner workings in this paper since I'm under certain NDAs, but I'll summarise a basic infrastructure as follows:
Just 200 users situated in 4 sites across the US and UK;
A heterogeneous environment of:
Windows NT/2K and XP
Windows 2K Active Directory with Exchange 2K email services
Sun Solaris 6,7 & 8 on UltraSparc stations
DNS and DHCP on Windows 2K
NIS on Solaris
Version control? What's that then? I have to admit I was in luck, since I typically don't have to inherit someone else's Linux deployment, so I can put in place processes and tools to track and maintain the versions of the OS and applications that I'll be deploying.
The Linux distribution that I typically use is Redhat which, while a tad bulky, allows you to easily install, remove and track software using the RPM toolset.
In the past I worked for Nortel Networks supporting their global Windows and Unix environment. This experience was invaluable because it taught me that in order to effectively maintain a large number of boxes there are 3 things that you really need:
Configuration Management;
Software and Hardware Inventory;
Change Management/Control.
I use alist (http://www.brains2bytes.com/alist/). This handy tool allows you to collect the software and hardware configuration data from a Linux/Unix/Windows box and store it somewhere safe, so if that machine ever has a failure I will know the following:
What the machine is;
What software it had on it;
A copy of the key configuration files;
What the network and system state is.
One of the major benefits of alist is that it stores the configuration on a central server: a perl script uploads the data to a central daemon, and the audited data is then browsable using a cgi script on a typical web server, say Apache.
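To make the idea concrete, here is a minimal Python sketch of the kind of snapshot such an audit gathers. It is not alist's own code or data format: the function name collect_snapshot, the field names and the list of configuration files are all assumptions made for illustration. The installed software list and the upload to a central server are left out.

```python
import json
import platform
from pathlib import Path

# Hypothetical list of "key configuration files"; adjust to suit the site.
DEFAULT_CONFIG_FILES = ["/etc/hosts", "/etc/resolv.conf", "/etc/fstab"]


def collect_snapshot(config_files=DEFAULT_CONFIG_FILES):
    """Return a dict describing this machine and its key config files."""
    snapshot = {
        "hostname": platform.node(),
        "system": platform.system(),
        "release": platform.release(),
        "machine": platform.machine(),
        "config_files": {},
    }
    for name in config_files:
        # Missing or unreadable files are recorded as None rather than failing.
        try:
            snapshot["config_files"][name] = Path(name).read_text(errors="replace")
        except OSError:
            snapshot["config_files"][name] = None
    return snapshot


if __name__ == "__main__":
    # Print the snapshot as JSON so it can be stored somewhere safe.
    print(json.dumps(collect_snapshot(), indent=2))
```

Keeping the snapshot as plain JSON means it can be stored and compared with ordinary tools after a machine failure.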
Then there is the physical audit of software and hardware. This is the slow and painful part which nobody really likes to do, but it needs to be done so that you will know the exact lie of the land with respect to:
Licensing – What you currently have and what you need to buy;
Assets – What you've got and what's missing!
Warranty – What's covered and what's not.
For licensing I use PhpMyInventory (http://phpmyinventory.sourceforge.net). It's a good piece of software but it has a few weaknesses: it doesn't understand the concept of location or hostnames (you have to purchase the official copy), so you have to tag the hostname either as part of the serial number field or as a peripheral. (*Tip*: use different databases (i.e. websites) for each geographic location.)
If you want really detailed inventory information using PhpMyInventory without having to do excessive work, use base level primitives for your system type, e.g. HP Vectra VL400, Pentium 3 Processor, 0 MB RAM and CD, then create peripheral types for the specific system data, e.g. 700 MHz CPU, 128 MB DRAM SIMM. This way you can have minimal system types which are easily audited.
Honorable mention
An honorable mention is Inventory (http://inventory.sourceforge.net). This newcomer looks very promising, but it's a bit too flashy for me personally; however, it can be easily rebranded through the use of selectable cascading stylesheets. It offers a high level of granularity, but this increased granularity makes it more difficult to use than PhpMyInventory.
I constantly have trouble finding system documentation [typically because someone's cleaned up my desk!], so I typically scan and store warranty information as a PDF so I don't have to worry too much about where I left it. I typically store this information on a browsable web server in directories such as /systems/systemname/files, but for those who like something a little more formal I suggest SDMS.
Simple Document Management System (http://sdms.cafuego.net/) is, as the name states, a simple DMS. I use this to store scanned PDFs and other documentation and files related to a specific machine.
Change management or control is more about process than technology. It deals with the notification, approval and implementation of changes to a system or infrastructure. The method I typically use is the creation of a change control process that includes implementors, approvers, stakeholders, a change control form and an email system to tie it all together.
In essence, change management deals with making changes to an infrastructure trackable and repeatable and, possibly most important, allows for the rolling back of a change.
Implementors who wish to make a change to an infrastructure submit a change control form that has the following information:
The name of the implementor, e.g. John Doe;
A brief description of what the change entails, e.g. Configuring all Linux boxes to accept digital certificates [more secure than passwords] for ssh logins rather than normal passwords. This is typically aimed at management: short, sweet and easily understandable by the layman;
Other applications that will be impacted, e.g. The server may require a reboot that would adversely impact a key service, for example NIS authentication;
The date the approval is needed by, to save the implementor any time wasted doing forward preparation for the change;
The planned date of the change, i.e. When the change is going to take place;
A detailed description of the change, i.e. Typically aimed at the experienced technician;
Roll back plan, i.e. What you are going to do if something goes wrong. This makes you think about what you will do if something does not go according to plan;
Reason for the change, i.e. Why you are doing it, and finally...
Estimated downtime, i.e. How long the service outage is going to last.
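The form above maps naturally onto a small record type. The following Python sketch is only an illustration of that mapping, not a tool I actually use: the class name ChangeRequest, the field names and the two sanity checks are assumptions.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class ChangeRequest:
    implementor: str
    summary: str                # brief description, aimed at management
    impacted_services: str      # other applications that will be impacted
    approval_needed_by: date
    planned_date: date
    detailed_description: str   # aimed at the experienced technician
    rollback_plan: str
    reason: str
    estimated_downtime_minutes: int

    def problems(self):
        """Return a list of reasons the form is not ready to submit."""
        found = []
        for field in fields(self):
            value = getattr(self, field.name)
            if isinstance(value, str) and not value.strip():
                found.append(f"{field.name} is empty")
        if self.approval_needed_by > self.planned_date:
            found.append("approval is needed after the planned change date")
        if self.estimated_downtime_minutes < 0:
            found.append("estimated downtime cannot be negative")
        return found


if __name__ == "__main__":
    request = ChangeRequest(
        implementor="John Doe",
        summary="Switch ssh logins from passwords to certificates",
        impacted_services="NIS authentication during the reboot",
        approval_needed_by=date(2003, 6, 2),
        planned_date=date(2003, 6, 9),
        detailed_description="Roll out new sshd configuration to all boxes",
        rollback_plan="",  # deliberately left blank to show the check
        reason="Certificates are more secure than passwords",
        estimated_downtime_minutes=15,
    )
    print(request.problems())
```

Refusing a form with an empty roll back plan is the useful part: it forces the implementor to think about failure before the change is approved.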
This is a master or policy server from which all client changes originate. The basic theory is that all system changes will come from a central server, ensuring that all clients conform to an established baseline. In plain English this means you will be able to roll out changes that will ensure that all the machines that you want to configure from this central server will be set up exactly how you wish them to be.
To do this I set up the following:
Apt-Get repository. To kick things off I typically create my own APT-GET server. APT stands for Advanced Package Tool; it originates from Debian Linux and its main function is to download and install a software package from a central repository. Doesn't sound too impressive, does it? Well, it also downloads and installs all the package's dependencies. The version I use is APT-RPM (https://moin.conectiva.com.br/AptRpm), an APT port configured specifically for Redhat Package Manager files.
Now you don't really have to create your own; you can use one of the numerous APT-RPM repositories available on the web such as freshrpms, dag or fedora, but I usually create my own for speed and stability reasons.
APT-RPM has two parts: a webserver with RPMs arranged into a specific directory structure [typically built from an official Redhat distro and its updates] and a client application called apt-get with its related support files. The client can be used to:
Upgrade an entire distribution to the latest Redhat updates;
Install a specific piece of software and all its dependencies;
Remove a specific piece of software and all its dependencies;
Upgrade a specific piece of software.
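The four client tasks above map onto a handful of apt-get subcommands. The Python sketch below builds the command lines and can optionally run them. It assumes the APT-RPM client accepts the same update, dist-upgrade, install and remove subcommands and the -y flag as the stock Debian apt-get; the function names and the action labels are made up for this example.

```python
import subprocess


def apt_get_command(action, package=None):
    """Build the apt-get argument list for one of the four client tasks."""
    if action == "upgrade-distribution":
        return ["apt-get", "-y", "dist-upgrade"]
    if action not in ("install", "upgrade-package", "remove"):
        raise ValueError(f"unknown action: {action}")
    if not package:
        raise ValueError(f"{action} needs a package name")
    if action == "remove":
        return ["apt-get", "-y", "remove", package]
    # Installing an already installed package upgrades it to the newest version.
    return ["apt-get", "-y", "install", package]


def run_action(action, package=None, dry_run=True):
    """Refresh the package index, then carry out the action."""
    commands = [["apt-get", "update"], apt_get_command(action, package)]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Dry run only: shows what would be executed on the client.
    run_action("install", "openssh-server")
```

The dry run default is deliberate, so the same function can be used to preview a change before it goes into a cron script.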
FTP Server. An FTP server is used to serve files out to FTP clients. I typically use an FTP server to share out an entire Redhat distribution set, e.g. 7.3, 8.0 and 9. I typically take the distribution I downloaded for the APT repositories, then hardlink and/or mount it using a bind or loopback option [depending on the Linux version] to make the entire distribution set available through http and ftp.
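The hardlink approach can be sketched in a few lines of Python. This is an illustration under assumptions rather than the exact procedure I follow: the function name hardlink_tree is invented, and hard links only work when the source and target trees live on the same filesystem.

```python
import os


def hardlink_tree(source_root, target_root):
    """Recreate source_root under target_root, hard-linking every file.

    Returns the number of new links created, so a second run over an
    unchanged tree returns 0.
    """
    linked = 0
    for dirpath, _dirnames, filenames in os.walk(source_root):
        relative = os.path.relpath(dirpath, source_root)
        target_dir = os.path.normpath(os.path.join(target_root, relative))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            target = os.path.join(target_dir, name)
            if not os.path.lexists(target):
                # Same data on disk, second name: no extra space is used.
                os.link(os.path.join(dirpath, name), target)
                linked += 1
    return linked
```

Because the files are linked rather than copied, the distribution appears under both the web root and the FTP root while occupying disk space only once.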
Kickstart Scripts. These scripts are basically pre-written answer files used for building a Redhat box using automated answers. I typically store the kickstart configuration files on a central webserver, then boot from a netboot floppy image from the Redhat distribution and kick off an automated build just by typing in linux ks=http://webserver/customkickstart.configfile at the boot prompt.
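Since kickstart files are plain text, they are easy to generate per machine type. The Python sketch below renders a minimal one. Treat it as a starting point under assumptions: the install URL, the password hash, the partition sizes and the package names are placeholders, and the set of required directives differs between Redhat releases, so check the result against the Kickstart documentation for the release being installed.

```python
def render_kickstart(install_url, root_password_hash, packages,
                     timezone="Europe/London"):
    """Return the text of a minimal kickstart file for a network install."""
    lines = [
        "install",
        f"url --url {install_url}",
        "lang en_US",
        "keyboard us",
        "network --bootproto dhcp",
        f"rootpw --iscrypted {root_password_hash}",
        f"timezone {timezone}",
        "bootloader --location=mbr",
        "clearpart --all",
        "part / --fstype ext3 --size 1024 --grow",
        "part swap --size 512",
        "reboot",
        "",
        "%packages",
    ]
    lines.extend(packages)
    lines += ["", "%post", "# site-specific commands go here"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values; none of these come from a real deployment.
    print(render_kickstart("http://webserver/redhat/9",
                           "$1$placeholderhash",
                           ["@ Base", "openssh-server"]))
```

The %post section is where a freshly built box would typically be handed over to the configuration tools described here.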
CFEngine. Now this is a very clever tool. It consists of a series of programs and services that allow you to define a central policy and roll out that policy to any number of servers or clients. For the Microsofties out there, it's basically an SMS for Linux [the CFEngine crew describe it as part of a computer immune system that will fix any problems that it detects]. CFEngine uses a high level description language to describe what a declared machine [or group type] should be like, e.g. directories should be softlinked to certain other directories and /tmp should be cleared of files every 7 days.
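The two example policies can be illustrated in Python to show what "describing how a machine should be" means in practice. This is not CFEngine syntax and not how CFEngine is implemented; it is a sketch of the same idea, where each function checks the current state and only acts when the machine has drifted from the policy. The function names are invented for the example.

```python
import os
import time


def ensure_symlink(link_path, target):
    """Make link_path a symlink to target; return True if a change was made."""
    if os.path.islink(link_path):
        if os.readlink(link_path) == target:
            return False  # already as the policy demands, nothing to do
        os.remove(link_path)
    elif os.path.lexists(link_path):
        raise FileExistsError(f"{link_path} exists and is not a symlink")
    os.symlink(target, link_path)
    return True


def tidy_directory(directory, max_age_days, now=None):
    """Delete regular files directly in directory older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for entry in os.scandir(directory):
        is_plain_file = entry.is_file(follow_symlinks=False)
        if is_plain_file and entry.stat(follow_symlinks=False).st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Running either function twice in a row is harmless, which is the property that lets a policy be applied repeatedly from cron without side effects.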
Host Install Tools deal with installing hosts without the need for human intervention. This frees me to deal with other elements of my workload, since my time could easily be sucked up building boxes one by one: swapping CDs, clicking the right options, making the correct updates to a base distribution and making sure I make exactly the same choices for each individual machine.
G4U – Stands for Ghost for Unix. g4u is basically a set of scripts that compresses a raw disk image to a file and saves it to an FTP server for future download to another hard disk
APT-RPM, apt-get from a cron script or through cfengine
Cfengine, configured as a front-end to cron
Honorable Mentions -
SIS Suite - http://www.systemimager.org/
FAI -
PartImage -
LUI - http://oss.software.ibm.com/developerworks/projects/lui/
Recovering from early and big problems – Rolling back
DSH – Parallel tasking shell which allows you to run automated tasks across an entire set of computers
RPM
AD
DNS
NIS
OpenLDAP
NTP
RTools
NFS
CIFS
Rsync
Mirrordir
Rdist
CFEngine
APT-GET Repository
Autofs, Automount NFS Servers
/app linux -D OS=linux
/app sunos5 -D OS=sunos5
Crond
CFengine
Apt-get
Rug
Yum
Alist
cfengine
Rsync
honorable mention
SCC
Rsync
CFEngine
apt-get
Yum
SMTP , Sendmail, Postfix
Linux null client mc file
smart host
CUPS Printing
QTCups, Gnome Print Manager?
Samba?
Syslog host
Nagios/Netsaint
References
BootInfra01 – Bootstrapping an Infrastructure by
Steve Traugott, Sterling Software, NASA Ames Research Center -- stevegt@TerraLuna.Org
Joel Huddleston, Level 3 Communications -- joelh@TerraLuna.Org
http://www.infrastructures.org/papers/bootstrap/