In my work for TOPP, I’m in the middle of some changes to our build system. We’re using an in-house build tool called fassembler. Because it’s completely specific to our needs, and was written mostly from scratch, it has some pretty great features (e.g. color-coded output, database initialization). Our config files are stored in Subversion, checked out, and then compared against when there’s an update. If they differ, you’re prompted to replace, discard, view a diff of, or merge the files. This is great when you’re running a single build.
As the Deployment Manager for openplans.org, however, I’m running tens or hundreds of builds. My goal is to make building and maintaining a deployment easier, so I need to be able to run builds unattended, and not in a way that blindly discards or overwrites local configuration changes.
Enter Gentoo Linux. Gentoo is a Linux distribution where all of the packages are built from source. Build options can be set system-wide, or for each individual package, before installing a piece of software. A fully installed Gentoo system, whether a server or desktop, can contain hundreds of packages, and users don’t have time to sit interactively through the building and updating of each one.
Gentoo uses a script called etc-update to handle the merging of configuration files separately from the building of software. It works by saving the new configuration with a mangled name (e.g. httpd.conf would become ._cfg0000_httpd.conf), building a list of these files, and then allowing the user to diff, overwrite, discard, or merge each new configuration. It lets you configure which tools to use, defaulting to diff, smerge, and nano. I’m a vi user, but my editor is already set at the system level, so the script picks that up. smerge is just fine for me, but I prefer colordiff (some screenshots) because of its nicely readable output, so I have that overridden in a configuration file.
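That override is just a line in etc-update’s configuration file. Something along these lines, assuming the stock diff_command variable and %file placeholders (check your version of the file for the exact names):

```sh
# /etc/etc-update.conf (excerpt)
# Use colordiff for readable, colorized diffs instead of plain diff.
diff_command="colordiff -u %file1 %file2"
```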
etc-update is licensed under version 2 of the GPL, so we will be redistributing it bundled with the rest of our build software. Where our situation differs, however, is that we can build in a myriad of locations, and the configuration files are specific to each build. In Gentoo’s version of the script, portage (their packaging system) is queried for the location of configuration files, but we don’t have the luxury of a system-level tool to do that work for us. I looked at a few possible solutions to the problem:
- The command line: etc-update already includes a way to pass directories on the command line, but this requires too much typing by the user.
- A custom script: easy to type, but it means installing modified versions of the script all over the place, which is harder to maintain.
- Reading from the environment: this requires the user to set the environment somehow, which adds extra steps and is pretty hacky.
- Looking in a path relative to the current script: some magic involved, but if we at least keep a configuration file relative to the script, it’s fairly straightforward, and the only magic is in knowing where the list of directories is saved.
Based on these options, I decided on the last one. But it all hinges on knowing, at execution time, where the script is located. Well, I know how the script was called; that’s available as Arg0 ($0) in the shell, so I figured it would be pretty easy to go from there to the actual location of the script.
Being a Python programmer, my first instinct was to code the logic in Python. This wasn’t too tough. I took advantage of the fact that you can pipe a script to the Python interpreter, and used bash string interpolation to hardcode the argument into the script. Since it was a multiline program, I used a bash here document to keep it readable. Here’s an example script (that just prints Arg0).
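A minimal sketch of that technique: the here document is fed to the Python interpreter, and "$0" is expanded by bash before Python ever runs.

```bash
#!/bin/bash
# "$0" is interpolated by bash, so the invocation name ends up
# hardcoded in the Python source that the interpreter receives.
python <<END_OF_PYTHON
print("$0")
END_OF_PYTHON
```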
It took me about five minutes to put together a final script. It first checked to see whether the script was called with any path information (e.g. relative: ../script.sh, or absolute: /home/script.sh). If not, it looked for the script file in the directories listed in $PATH. Failing that, it joined the current directory to Arg0 to find the actual location. (Python’s os.path.join will simply discard the base path if the path being joined is already absolute, so absolute invocations come through unharmed.)
This script worked, and was easy to read for Python programmers. It bothered me a bit, however, because: 1) I was embedding a Python script inside a bash script, which could be rather confusing, and 2) it was 32 lines long, not exactly the shortest of solutions. This is that script:
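In sketch form (the structure follows the description above; variable names and details are approximate):

```bash
#!/bin/bash
# Resolve the directory containing this script, using Python embedded
# in a here document. "$0" is expanded by bash before Python runs.
SCRIPT_DIR=$(python <<END_OF_PYTHON
import os

arg0 = "$0"

if os.sep not in arg0:
    # No path information: search each directory on the search path.
    for d in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(d, arg0)
        if os.path.isfile(candidate):
            arg0 = candidate
            break

# Join against the current directory; os.path.join discards the base
# when arg0 is already absolute, and normpath cleans up any "..".
full = os.path.normpath(os.path.join(os.getcwd(), arg0))
print(os.path.dirname(full))
END_OF_PYTHON
)

echo "$SCRIPT_DIR"
```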
My next thought was to re-implement the algorithm natively in bash. Unfortunately, bash doesn’t have the Python standard library at its disposal. Thankfully, however, there are a number of commands that achieve more or less what I wrote above. I use readlink -f /basepath/../somepath to convert two joined paths into a normalized path. The only problem with this is that when you execute a symlink to a shell script, it returns the location of the actual file and not the symlink. I’m not really sure this is a problem that merits any worrying, but I could imagine having a single “source” script and symlinking it into different environments. The second command I needed to replicate was os.path.dirname (used to extract the directory from the script’s full path); luckily the dirname program handles this identically.
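For example, given a hypothetical symlinked script:

```sh
$ ln -s /opt/builds/shared/update.sh ~/bin/update.sh
$ readlink -f ~/bin/update.sh
/opt/builds/shared/update.sh
```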
I ran into one final problem in translating this algorithm to bash, and that was splitting the $PATH variable. Normally the for..in control structure in bash splits a string on spaces. We could use sed or tr to convert the colon-separated path into a space-separated one, but that runs into problems when you have spaces in your directory names. Here’s where the $IFS variable saves us. $IFS tells bash which characters to use to split a string into words. For our purposes, we temporarily save $IFS and set it to a single colon. This lets you perform a simple “for DIR in $PATH”. If you’ve got colons in your directory names, well hey, you could have used Python… ;-) Here’s that script:
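Again in sketch form (structure and names approximate):

```bash
#!/bin/bash
SCRIPT="$0"

# If $0 contains no slash, the script was found via the search path,
# so walk $PATH ourselves. Setting IFS to a colon makes the for loop
# split on colons instead of whitespace.
case "$SCRIPT" in
    */*) ;;
    *)
        OLD_IFS="$IFS"
        IFS=":"
        for DIR in $PATH; do
            if [ -f "$DIR/$SCRIPT" ]; then
                SCRIPT="$DIR/$SCRIPT"
                break
            fi
        done
        IFS="$OLD_IFS"
        ;;
esac

# Make the path absolute, then let readlink -f normalize it (this also
# resolves symlinks, so a symlinked script reports its target's location).
case "$SCRIPT" in
    /*) ;;
    *) SCRIPT="$(pwd)/$SCRIPT" ;;
esac

SCRIPT_DIR=$(dirname "$(readlink -f "$SCRIPT")")
echo "$SCRIPT_DIR"
```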
The same script is 20 lines in bash, which is an improvement. At this point I was happy enough with the result that I started to embed it into our local copy of etc-update. In doing so, however, I ran across a usage of the type built-in command that piqued my interest. It was being used to test for the existence of egrep on the system. It turns out that “type -p command” looks for a file-based command and prints its path if it exists. I figured this could be used in an even shorter bash-only script, and wrote a test script to do so. In checking out the various permutations (invoked through a symlink, found on the path, etc.) I found out something interesting: when you invoke a script that’s found through the path, bash sets Arg0 to the full path. “Great!” I thought, combine that with readlink from above, and I have a one-liner.
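That one-liner amounts to something like this:

```bash
# bash has already put the full path in $0 when the script was found
# via $PATH; readlink -f handles relative invocations and symlinks.
SCRIPT_DIR=$(dirname "$(readlink -f "$0")")
```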
And then it hit me.
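Give or take the exact spelling, the realization was this:

```bash
# which resolves $0 against the search path and prints the full path;
# readlink -f then canonicalizes it and resolves any symlinks.
SCRIPT_DIR=$(dirname "$(readlink -f "$(which "$0")")")
```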
From the which man page:
Which takes one or more arguments. For each of its arguments it prints to stdout the full path of the executables that would have been executed when this argument had been entered at the shell prompt. It does this by searching for an executable or script in the directories listed in the environment variable PATH using the same algorithm as bash(1).
The Captain Obvious award of the day goes to me: which $0 will always return the full path, as bash sees it, of the script file.