Tuesday, August 25, 2009

Episode #57: At Your Services

Ed embarks:

Sometimes, administrators and users need to alter the state of running services on a Windows box, especially those services that automatically start during system boot. Many users know the services control panel GUI, with its familiar "Startup Type" column and values of Manual, Disabled, and Automatic. The Automatic services, in particular, are those that start up when the system is booted, even if no one logs onto the machine.

As described in Episode #40, Ed's Heresy, you can launch the services control GUI at the command line using:

C:\> services.msc

Look through all those services and their startup types, and you'll see an awful lot of automatic ones. Unlike Linux, Windows doesn't have the concept of different run-levels which start different sets of services. Instead, all automatic services are... well... automatically started at bootup. Vista and later did introduce a new option for automatic services startup type, called "Automatic (Delayed start)", which causes Windows to activate a service after all of the initial boot activities are completed, making bootup faster. So, I guess we do have a bit of fine-grained differentiation in bootup services: those that are done during the boot itself [Startup Type= Automatic] and those that happen right afterward [Startup Type= Automatic (Delayed Start)]. Of course, "Manual" services are initiated by user action, such as clicking the "Start" button in the services.msc GUI.

But, speaking honestly and bluntly, why use the crappy services.msc GUI when we can interact with our services at the command line? One of the real gems of the Windows command line is the Service Control command, known as "sc" for short. In the olden days, we could use the "net start" command to get a list of running services. But, good old "net start" pales in comparison to the mighty sc. I only use "net start" when I'm on a Windows 2000 box that lacks the sc command from the Resource Kit. I'm happy to report that sc is included in XP Pro and later.

We can use sc to get a list of all services on the box, regardless of their state, by running:

C:\> sc query state= all

Please note that sc is finicky about spaces in its command options. You have to enter it this way: "state equals space all". If you omit the space, it doesn't work. If you put a space before the equals, it doesn't work. Annoying, to be sure, but once you get the hang of it, it's almost tolerable. In fact, that could be the new marketing line for Windows: "Once you get the hang of it, it's almost tolerable." Microsoft marketing reps can feel free to contact me if they'd like to license that line.

Anyway, we can get more detail about each service's state using:

C:\> sc queryex state= all

The queryex there gives us additional information, including the PID each service is running inside of. If you'd like to focus on a single service, looking at all of its details, you could run:

C:\> sc qc [SERVICE_NAME]

The qc stands for "query configuration", and shows all kinds of neat stuff, including the binary path and command line flags used to launch the executable.

That [SERVICE_NAME] option used in the sc command must be the "SERVICE_NAME" and not the "DISPLAY_NAME", both of which are shown in the "sc query" output. The former is the internal name of the service used within Windows, while the latter is a more human-friendly form of the name used in the GUI. Most Windows admins think in terms of DISPLAY_NAME, but the sc command thinks otherwise. To translate the DISPLAY_NAME in your head into the SERVICE_NAME that sc expects, you could use WMIC:

C:\> wmic service where displayname="[DISPLAY_NAME]" get name

You can even use substring wildcards on DISPLAY_NAME here with:

C:\> wmic service where (displayname like "%something%") get name
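For example, to find the SERVICE_NAME lurking behind the "Remote Procedure Call (RPC)" display name, you'd run something like this (output trimmed; a box with the RPC Locator service installed will show a second match):

C:\> wmic service where (displayname like "%remote procedure%") get name
Name
RpcSs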

OK, now, fully armed with the sc-friendly SERVICE_NAME, we can proceed. If you want to stop or start a service immediately, you can use this simple command:

C:\> sc [start|stop] [SERVICE_NAME]

Please note, however, if you stop a service whose type is "Automatic" or "Automatic (Delayed Start)", that service will come back the next time you reboot. To change that behavior, you'll have to alter the startup type of the service, making it either manual or disabled. To do so, you could use the sc command as follows:

C:\> sc config [SERVICE_NAME] start= demand

Note that the GUI refers to such services as "manual", but the sc command calls them "demand". Consistency, thy name is Windows.... NOT! (Hey! There's another potential marketing campaign!) Note that if the service is currently running, this change in its configuration won't stop it... we are merely changing its configuration, not its current state.
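For instance, to flip the Print Spooler service (SERVICE_NAME spooler) over to manual start at the next boot, you'd run:

C:\> sc config spooler start= demand
[SC] ChangeServiceConfig SUCCESS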

To configure a service to be disabled at next boot, you could run:

C:\> sc config [SERVICE_NAME] start= disabled

For Automatic, use "start= auto" and for Automatic (Delayed Start), use "start= delayed-auto".

Note that there are two other service start types: boot and system. These are for device drivers, and I wouldn't mess with them unless you first create a snapshot of your virtual machine. What? You are running Windows on _real_ hardware and not a VM? Yikes. Be very, very careful with your config!

Another nifty feature of the sc command is its ability to show us service dependencies, as follows:

C:\> sc enumdepend [SERVICE_NAME] [buffer_size]

Try this for the RPC Service as follows:

C:\> sc enumdepend rpcss

It'll show you the start of the list of services that depend on rpcss, but then complain that its default buffer for gathering this information is too small. You can specify a bigger buffer by putting an integer at the end for the output display buffer, such as 8092. How convenient it is that it allows you to specify your display buffer size. Yeah, right.
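So, to see the full dependency list for rpcss, you'd run:

C:\> sc enumdepend rpcss 8092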

Another fine aspect of the sc command is that it can be used remotely, provided that you have admin-level SMB access of a remote system. Any of the sc commands listed above can be directed to a remote system by simply adding \\[IP_addr] right after the sc, as in:

C:\> sc \\[IP_addr] [other_options]
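For example, to list all services on a remote machine at a (made-up) address of 10.10.10.50, you'd run:

C:\> sc \\10.10.10.50 query state= all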

Oh, and one more thing... All those automatic services associated with the underlying operating system do start in a specific order that Microsoft has carefully planned. That order is stored in the registry as a list, and can be displayed using:

C:\> reg query hklm\system\currentcontrolset\control\servicegrouporder

The list looks hideous in the output of the reg command. If you'd like it to look a little prettier, you can open that registry key in regedit.

Hal wants to know where Ed gets off:

Oh rats. Welcome to another episode of "this would be simple if every flavor of Unix didn't do this a little bit differently". The first simplifying assumption I'm going to make is to throw the *BSD folks off the bus. Sorry guys, if you're reading this then you already know how boot services work on your OS and you don't need me to tell you.

For everybody else in the world, the boot service controls on your system have typically evolved from some knockoff version of the System V boot sequencing scheme. This means there's probably a directory on your system called /etc/init.d (or perhaps /etc/rc*/init.d) that contains a whole pile of scripts. Generally, each one of these scripts is responsible for starting and stopping a particular service.

One of the nice features of this system is that you can run the boot scripts manually to start and stop services. The scripts accept "start", "stop", and often the "restart" option as well:

# /etc/init.d/ssh stop
* Stopping OpenBSD Secure Shell server sshd [ OK ]
# /etc/init.d/ssh start
* Starting OpenBSD Secure Shell server sshd [ OK ]
# /etc/init.d/ssh restart
* Restarting OpenBSD Secure Shell server sshd [ OK ]

In general, using the /etc/init.d scripts is the preferred method for starting and stopping services-- as opposed to just killing them-- because the script may perform additional service-specific cleanup actions.

Now when the system is booting we need to be careful that services get started in the correct dependency order. For example, there's no point in starting network services like SSH and Apache until the network interfaces have been initialized. Boot sequencing is the job of the /etc/rc*.d (sometimes /etc/rc*/rc*.d) directories.

If you look into these directories, you'll find a bunch of files named Snn* and Knn*-- for example "S16ssh". These "files" are actually links back to the scripts in /etc/init.d. The numbers are used to make sure the scripts are run by the init process in the correct sequence, and there are usually gaps left in the numbering so that you can add your own scripts at appropriate points in the sequence. The leading "S" tells init to run the script with the "start" option to start the given service at boot time. "K" means kill or "stop".
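You can see this for yourself by looking at one of the links (the exact name and number will vary from system to system; S16ssh here is from an Ubuntu box, and the output is trimmed):

# ls -l /etc/rc2.d/S16ssh
lrwxrwxrwx ... /etc/rc2.d/S16ssh -> ../init.d/ssh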

So why are there lots of different rc*.d directories? The basic idea was that Unix systems were supposed to be able to boot to different "run levels", numbered 1-5, that enabled different levels of functionality. I'm old enough to remember a time when booting to run level 2 meant "multi-user mode" and run level 3 meant "multi-user mode plus network file sharing" (yes, I know, get me a walker and you kids stay off my lawn!). These days, the whole "run level" concept has gotten wildly confused. Many Linux systems use run level 3 for multi-user and run level 5 for "multi-user with GUI", but Ubuntu boots into run level 2 by default. Solaris boots using both the run level 2 and run level 3 scripts, which is just wacky.

The reason the whole run level question is relevant is that in order to enable or disable certain services from being started at boot time, you need to change the links in the appropriate rc*.d directory. To do that, you need to know what run level your system is booting into by default. Some systems have a "runlevel" command which will tell you what run level you're currently booted into. On other systems you'll need to find the default run level setting in /etc/inittab-- "grep default /etc/inittab" usually works. The alternative is to just change the links in all of the rc*.d directories, just to be on the safe side.

Say you're on an Ubuntu system and you're booting into run level 2. You "cd /etc/rc2.d" and start messing with the links. You can see the services that are started at this run level with a simple "ls S*". If you want to make changes, just remember that the init program ignores any script that doesn't start with an "S" or a "K". So one way to prevent a service from being started at boot time is to just rename the link. There are lots of different conventions people use for this: some people change the "S" links to "K" links, others give them names like "aaa*" (sorts at the beginning of the directory), "zzz*" (sorts to the end), or ".NO*" (hides links from normal "ls"). You can also just remove the links.
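For instance, to keep sshd from starting at boot on that hypothetical Ubuntu box using the "K" convention, you might do:

# mv /etc/rc2.d/S16ssh /etc/rc2.d/K16ssh

or, if you prefer hiding the link from init entirely:

# mv /etc/rc2.d/S16ssh /etc/rc2.d/.NO_S16ssh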

The only problem with messing with the link names directly is that vendor patches and software updates will often restore the original links in your rc*.d directories. So after updating your system it's a good idea to check your rc*.d directories to make sure your changes haven't been reverted.

To deal with this problem, most Unix variants have some sort of tool that manages boot time configuration of services. The tool typically manages some extra "meta-data" associated with each boot script that says which scripts should be started at the different run-levels. For example, Red Hat derived systems (RHEL, Fedora, CentOS, etc) use a command-line tool called chkconfig (originally developed for IRIX) that uses meta-data stored in special comments at the top of each boot script. Debian has update-rc.d and sysv-rc-conf which just rename "S" links to "K" links on your behalf (and vice versa). Solaris has the XML horror that is the Service Management Framework (SMF). You'll need to read the docs for your particular flavor of Unix.
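For example, on a Red Hat flavored box you might disable sshd at boot and confirm the change with something like:

# chkconfig sshd off
# chkconfig --list sshd
sshd    0:off   1:off   2:off   3:off   4:off   5:off   6:off

The rough Debian/Ubuntu equivalent (the service is called "ssh" there) would be "update-rc.d -f ssh remove" to drop the links, and "update-rc.d ssh defaults" to put them back later.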

Tuesday, August 18, 2009

Episode #56: Find the Missing JPEG

Hal is helpful:

As a way of giving back to the community, I occasionally drop in and answer questions on the Ubuntu forums. One of the users on the forum posed the following question:

I have about 1300 pictures that I am trying to organize.
They are numbered sequentially eg. ( 0001.jpg -> 1300.jpg )
The problem is that I seem to be missing some...

Is there a fast way to be able to scan the directory to see which ones I am missing? Other than to do it manually, which would take a long time.


A fast way to scan the directory? How about some command-line kung fu:

for i in $(seq -w 1 1300); do [ ! -f $i.jpg ] && echo $i.jpg; done

The main idiom that I think is important here is the use of the test operator ("[ ... ]") and the short-circuit "and" operation ("&&") as a quick-and-dirty "if" statement in the middle of the loop. Ed and I have both used this trick in various Episodes, but I don't think we've called it out explicitly. For simple conditional expressions, it sure saves a lot of typing over using a full-blown "if ... then ..." kind of construct.
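If you haven't run across that idiom before, here's a trivial stand-alone example-- the echo only fires if the test inside the brackets succeeds:

$ [ -f /etc/passwd ] && echo found
found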

I'm also making use of the seq command to produce a sequence of numbers. In particular, I'm using the "-w" option so that the smaller numbers are "padded with zeroes" so that they match the required file names. However, while seq is commonly found on Linux systems, you may not have it on other Unix platforms. Happily, bash includes a printf routine, so I could also write my loop as:

for ((i=1; $i <= 1300; i++)); do file=$(printf "%04d.jpg" $i); \
[ ! -f $file ] && echo $file; done
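And in case the printf format string looks mysterious, here's the zero-padding on its own:

$ printf "%04d.jpg\n" 42
0042.jpg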


Update from a Loyal Reader: Jeff Haemer came up with a nifty solution that uses the brace expansion feature in bash 4.0 and later (plus a clever exploitation of standard error output) to solve this problem with much less typing. You can read about it in his blog posting.

Now I have to confess one more thing. The other reason I picked this problem to talk about is that I'm pretty sure it's going to be one of those "easy for Unix, hard for Windows problems". Let's see if Ed can solve this problem without using four nested for loops, shall we?

Ed retorts:
Apparently our little rules have changed. Now, it seems that we get to impose constraints on our shell kung fu sparring partners, huh? Hal doesn't want four nested FOR loops. As if I didn't have enough constraints working in cmd.exe, Hal the sadist wants to impose more. Watch out, big guy. Perhaps next time, I'll suggest you solve a challenge without using any letter in the qwerty row of the keyboard. That should spice things up a bit. Of course, you'd probably use perl and just encode everything. But I digress.

One of the big frustrations of cmd.exe is its limitations on formulating output in a completely flexible fashion. Counting is easy, thanks to the FOR /L loop. But prepending zeros to shorter integers... that's not so easy. The Linux printf command, with its % notation, is far more flexible than what we've got in cmd.exe. The most obvious way to do this is that which Hal prohibits, namely four FOR /L loops. But, our dominatrix Hal says no to four nested FOR loops. What can we do?

Well, I've got a most delightful little kludge to create leading zeros, and it only requires one FOR /L loop counter plus a little substring action like I discussed in Episode #48: Parse-a-Palooza. Here is the result:

c:\> cmd.exe /v:on /c "for /l %i in (10001,1,11300) do @set name=%i & set 
fullname=!name:~1,4!.jpg & dir !fullname! >nul 2>nul || echo !fullname! Missing"
0008.jpg Missing
0907.jpg Missing
1200.jpg Missing

Here, I'm launching a cmd.exe with /v:on to perform delayed variable expansion. That'll let my variables inside my command change as the command runs. Then, I have cmd.exe run the command (/c) of a FOR /L loop. That'll be an incrementing counter. I use %i as the iterator variable, counting from 10001 to 11300 in steps of 1. "But," you might think, "You are ten thousand too high in your counts." "Ah..." I respond, "That extra 10,000 gives me my leading zeros, provided that I shave off the 1 in front." And, that's just what I do. In the body of my FOR loop, I store my current iterator variable value of %i in a variable called "name". Remember, you cannot perform substring operations on iterator variables themselves, so we squirrel away their values elsewhere. I then introduce another variable called fullname, which is the value of name itself (when referring to delay-expanded vars, we use !var! and not %var%), but with a substring operation of (~1,4), which means that I want characters starting at an offset of 1 and printing four characters (in this case digits). With offset counting starting at 0, we are shaving off that leading 1 from our ten-thousand-too-high iterator. I throw the output of my dir command away (>nul) as well as its standard error (2>nul).
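If you'd like to see the leading-zero kludge in isolation before wading through the whole loop, here's a little illustrative one-liner showing the shave-off-the-leading-1 action:

c:\> cmd.exe /v:on /c "set num=10042& echo !num:~1,4!"
0042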

Then, I use the || operator, which I mentioned in Episode #47, to run the echo command only if the dir command fails (i.e., there is no such file). I display the name of the missing file, and the word "Missing".

There are many other ways to do this as well, such as using IF NOT EXIST [filename]. But, my initial approach used dir with the || to match more closely Hal's use of &&. The IF statement is far more efficient, though, because it doesn't require running the dir command and disposing of its standard output and standard error. So, we get better performance with:
c:\> cmd.exe /v:on /c "for /l %i in (10001,1,11300) do @set name=%i &
set fullname=!name:~1,4!.jpg & IF NOT EXIST !fullname! echo !fullname!
Missing"
In the end, we have a method for generating leading zeros without using nested loops by instead relying on substring operations, all to make the rather unreasonable Mr. Pomeranz happy. :)

Tuesday, August 11, 2009

Episode #55: Fishing for Network Configs

Ed kicks it off:

Man, we've covered a lot of topics in our 54 episodes prior to this one. But, in our rush to get you the latest chocolate-covered command-line fu, occasionally we've missed some fundamentals. People write in with questions (which we love) about such items, inspiring a new episode. Back in May, we received a great question from Johnny C:

> I have a suggestion for command line kungfu.
> I need to be able to change my IP Address back and forth from DHCP
> where everything is dynamic to a dedicated IP address.
> I've worked with this for a while and my problems have been not able
> to update DNS on Windows

Ah... good one, sir! Let's face it: the built-in Windows GUI associated with network configuration changes is horrible... forcing numerous clicks through various screens to make even small tweaks. At least we don't have to live through the dreaded reboots of the Windows 95 era just to change IP addresses anymore.

On Windows, for manipulating network configs at the command line, netsh rocks, and it can do what you want, Johnny C, and much more. In fact, when I've got a lazy summer afternoon with nothing better to do, I fire up netsh (or the equally fun and interesting wmic command) and just explore, sometimes for hours on end. The netsh command (like wmic) can run in two modes: either as a little command interpreter of its own (just type netsh and hit Enter), which lends itself to exploration, or as a single-shot command of netsh followed by various options.

To get a glimpse of the capabilities of netsh, run the following:
C:\> netsh
netsh> ?

The following commands are available:

Commands in this context:
.. - Goes up one context level.
? - Displays a list of commands.
abort - Discards changes made while in offline mode.
add - Adds a configuration entry to a list of entries.
advfirewall - Changes to the `netsh advfirewall' context.
alias - Adds an alias.
bridge - Changes to the `netsh bridge' context.
bye - Exits the program.
commit - Commits changes made while in offline mode.
delete - Deletes a configuration entry from a list of entries.
dhcpclient - Changes to the `netsh dhcpclient' context.
dump - Displays a configuration script.
exec - Runs a script file.
exit - Exits the program.
firewall - Changes to the `netsh firewall' context.
help - Displays a list of commands.
http - Changes to the `netsh http' context.
interface - Changes to the `netsh interface' context.
ipsec - Changes to the `netsh ipsec' context.
lan - Changes to the `netsh lan' context.
nap - Changes to the `netsh nap' context.
netio - Changes to the `netsh netio' context.
offline - Sets the current mode to offline.
online - Sets the current mode to online.
p2p - Changes to the `netsh p2p' context.
popd - Pops a context from the stack.
pushd - Pushes current context on stack.
quit - Exits the program.
ras - Changes to the `netsh ras' context.
rpc - Changes to the `netsh rpc' context.
set - Updates configuration settings.
show - Displays information.
unalias - Deletes an alias.
winhttp - Changes to the `netsh winhttp' context.
winsock - Changes to the `netsh winsock' context.
wlan - Changes to the `netsh wlan' context.

Nice! Lots of very useful stuff, including "interface" and "firewall" (the latter of which we discussed in Episode #30). There are also some really nifty settings in the ipsec (on 2003 and later) and wlan (on Vista and later) contexts. To change to an individual context, just type its name (such as "interface") and then type ? at the netsh> prompt to get more info about it. You can then navigate down by entering follow-up commands and contexts, and then pop back up to earlier contexts by entering dot-dot (".."). I wish there were a "back" command instead of .., but I can cope. There's even a pushd and popd command for netsh contexts, rather similar to the pushd and popd for directories we discussed in Episode #52.

One of my most common uses of netsh is to change IP address settings of the machine. In the spirit of the cliche "Give a man a fish and feed him for a day... teach him to fish and feed him for life", let me show you how you can fish around inside of netsh.

We first invoke netsh and then move to the interface context:

C:\> netsh
netsh> interface
netsh interface> ?

Here, you can see options for various elements we can configure on the machine. Of particular interest to us now is ip (on XP and 2003) or ipv4 (on Vista and later). Happily, you can just type "ip" on Vista, and it will take you to the ipv4 context, so our netsh commands for changing addresses and such are compatible between various versions of our beloved Windows operating system.

netsh interface> ip
netsh interface ip> set ?

Now, we can get a sense of the various items we can set, including addresses, dns, and wins. But, wouldn't it be nice if Windows would give us examples of how to set each? Well, ask and ye shall receive:

netsh interface ip> set address ?
Usage: set address [name=]<string> [[source=]dhcp|static]
            [[address=]<IPv4 address>[/<integer>]] [[mask=]<IPv4 mask>]
            [[gateway=]<IPv4 address>|none [gwmetric=]<integer>]
            [[type=]unicast|anycast] [[subinterface=]<string>]
            [[store=]active|persistent]

Examples:

set address name="Local Area Connection" source=dhcp
set address "Local Area connection" static 10.0.0.9 255.0.0.0 10.0.0.1 1

If you'd like to get a list of all interfaces available on the machine, you could run (from the normal C:\> prompt, not within netsh):

C:\> netsh interface show interface


I know... it looks like it is redundantly repeating itself twice back to back, and it is. But, that's the command. Now, we know how to refer to our network interfaces for manipulating them.

Then, to set an IP address, we could just run the command:
C:\> netsh interface ip set address name="Local Area Connection"
static 10.10.10.10 255.255.255.0 10.10.10.1 1


This will set our IP address to 10.10.10.10, with a netmask of 255.255.255.0, a default gateway of 10.10.10.1, and a routing metric (number of hops to that gateway) of 1.

For DHCP, we simply run:
C:\> netsh interface ip set address name="Local Area Connection" source=dhcp

OK.... now to answer Johnny C's specific question, setting our primary DNS server:

C:\> netsh interface ip set dnsserver name="Local Area Connection"
static 10.10.10.85 primary


And, if you'd rather get that info from DHCP, you could use:
C:\> netsh interface ip set dnsserver name="Local Area Connection" source=dhcp

I frequently find myself changing between my laboratory network and my production network, which have completely different IP addressing schemes. To help make a quick switch between them, I don't use one of those goofy network configurator GUIs, because, well, they are kind of tawdry. Instead, I've created two simple scripts that I keep on my desktop: test.bat and prod.bat. Each one contains two netsh commands. The first command sets my IP address, netmask, and default gateway for either prod or test, and the second command sets my DNS server. When I want to invoke them, I simply run them with admin privs (based on either being logged in as admin, or right clicking and selecting "run as administrator").
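For the curious, there's nothing fancy in those scripts-- prod.bat contains little more than the two netsh commands we just covered (the interface name and addresses are placeholders for your own production settings):

netsh interface ip set address name="Local Area Connection" static 10.10.10.10 255.255.255.0 10.10.10.1 1
netsh interface ip set dnsserver name="Local Area Connection" static 10.10.10.85 primary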

Hal kicks it old school:

Where the Windows way is to have one big command that does everything, remember the Unix design religion is to have a bunch of small commands that do simple things and then combine them to produce the effect you want. It doesn't help that Unix systems have been networked devices since the earliest days of the Internet-- we've got multiple generations of command interfaces to deal with. But let me try to hit the high points.

Suppose our system is normally configured to use DHCP but we want to manually move it onto a new network with a static address assignment. Step one is to change your IP address with the ifconfig command:

# ifconfig eth0 10.10.10.1 netmask 255.255.255.0
# ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:0C:29:18:C3:0D
inet addr:10.10.10.1 Bcast:10.10.10.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fe18:c30d/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:55 errors:0 dropped:0 overruns:0 frame:0
TX packets:158 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:7542 (7.3 KiB) TX bytes:32712 (31.9 KiB)
Interrupt:18 Base address:0x2024

As you can see from the example above, you can also use ifconfig to display information about an interface's configuration ("ifconfig -a" will display the configuration information for all interfaces on the system).

However, changing the IP address with ifconfig doesn't have any impact on your routing table. You'll probably need to add a default route when you change the IP address:

# route add default gw 10.10.10.254
# netstat -rn
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
10.10.10.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
0.0.0.0 10.10.10.254 0.0.0.0 UG 0 0 0 eth0

I should note that as you move from one Unix distribution to another, the route command syntax tends to change in minor ways, rendering a command that is syntactically correct on one system completely useless on another. The above example is for Linux. Oh and by the way, if your DHCP configuration has created a default route for the network you used to be on, you can remove it with "route del default gw <old gateway IP>".

The last thing you have to do is update your list of local DNS servers. This list is configured in /etc/resolv.conf. Most DHCP clients will simply overwrite the contents of this file with the name servers and local domain they learn from their DHCP servers, but you can also edit this file directly. A sample file might look like:

nameserver 10.10.10.100
search somedomain.com

Replace "somedomain.com" with the default domain you want the host to use for looking up unqualified host names. You can have multiple "nameserver" lines in your file for redundancy. However, I warn you that the timeout on the first lookup is long enough that your users will pick up the phone and call you to tell you the "network is down" before the system fails over to the next name server in the list.

The combination of ifconfig, route, and editing your resolv.conf file should be sufficient to get you manually moved onto a new network. The more interesting question is how do you revert back to using DHCP to configure your network interface? Assuming your machine is configured by default to use DHCP, the easiest thing is to just shut down and then reactivate your network interface. Of course the process for doing this is completely different for each Unix system you encounter. On most Linux systems, however, the following will work:

# ifdown eth0
# ifup eth0

Determining IP information for eth0... done.
# ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:0C:29:18:C3:0D
inet addr:192.168.1.1 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fe18:c30d/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:56 errors:0 dropped:0 overruns:0 frame:0
TX packets:228 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:7884 (7.6 KiB) TX bytes:44273 (43.2 KiB)
Interrupt:18 Base address:0x2024

By the way if you're looking for your static network interface configuration information, you'll find it in /etc/sysconfig/network-scripts/ifcfg-eth0 on Red Hat systems (including CentOS and Fedora) and in /etc/network/interfaces on Debian systems. This is the place where you can set the default interface configuration parameters to be used when booting. Be aware, however, that on modern Ubuntu systems network configuration is under the control of NetworkManager by default-- a GUI-based network configuration tool very reminiscent of the Windows network configuration GUI. Helpful for people coming over from the Windows environment I guess, but kind of a pain for us old farts who are used to configuring network interfaces with vi.
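To give you a taste, a static configuration in Debian's /etc/network/interfaces looks something like this (reusing the addresses from the example above):

auto eth0
iface eth0 inet static
    address 10.10.10.1
    netmask 255.255.255.0
    gateway 10.10.10.254

Switching that stanza back to "iface eth0 inet dhcp" (and dropping the address, netmask, and gateway lines) returns the interface to DHCP at the next ifup or reboot.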

Tuesday, August 4, 2009

Episode #54: chmod Squad

Hal says:

I was doing some work for a customer recently that involved a lot of manipulating file system permissions and ownerships and I realized that we've not yet had a Command Line Kung Fu Episode on this subject. That's just crazy! So let's address this shortcoming right here and now.

You're probably familiar with the chown command for changing the owner (and group owner, if desired) of a file or group of files. Add the "-R" (recursive) option, and you can set the ownerships on an entire directory structure:

# chown -R hal:users /home/hal

These days you typically have to be root to use the chown command. In the olden days of Unix, it was common to allow any user to use chown and let them "give away" their own files to other users. But this caused problems for systems that used disk quotas because it was a loophole that allowed malicious users to consume other users' quotas. And in the really old days the chown command wouldn't strip the set-UID and set-GID bits when files were chowned, thus allowing you to create set-UID binaries belonging to other users-- obviously a huge security problem.

The problem with having to chown everything as root, however, is that all too often I see administrators making a mistake like this:

# cd /home/hal
# chown -R hal:users .* # *NEVER* DO THIS! DANGER!

Why is the above chown command so dangerous? The problem is that the glob ".*" matches the ".." link that points back to the parent directory of the current directory. So in the example above we're going to end up making the entire /home file system owned by "hal". This is actually incredibly difficult to recover from, short of restoring from a backup, because it's not safe to assume that every file in a user's home directory is going to be owned by that user.

Anyway, the safe way to chown a user's "dot-files" and directories is:

# chown -R hal:users .[^.]*     # SAFE

The "[^.]" means "match any character EXCEPT period", thus protecting you from matching the ".." link. Of course it would also skip files and directories named things like "..example", but you wouldn't expect to find these in a typical file system.

Having set the ownerships on a set of files, you can also control the access rights or permissions on those files. There are three categories of permissions-- the permissions for the primary owner of the file ("user" permissions), the group owner of the file ("group" permissions), and "everybody else" ("other" in Unix parlance). For each group you can allow "read" access (view the contents of a file or get a listing of a directory), "write" (modify the contents of a file or add, remove, and/or rename files in a directory), and/or "execute" privileges (execute a file as a program or access files in a directory). In "absolute" mode with the chmod command, we express these permissions as a vector of octal digits: "read" is 4, "write" is 2, and "execute" is 1. Here are some common examples:

$ chmod 755 /home/hal     # rwx for me, r+x for group and other
$ chmod 666 insecure # rw for everybody, "world writable"
$ chmod 700 private.dir # only I have access here

Because the default assumption for the Unix operating system is to create world-writable files and directories, we use the umask value to express which bits we want NOT to be set when new files are created. The common Unix umask default is "022" which means "don't set the write bits for group and other". Some sites enforce a default umask of "077", which requires you to explicitly use chmod to allow others to access your files ("discretionary access control" is the term usually bandied about here).
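You can watch the umask do its thing with a quick test (output trimmed):

$ umask 077
$ touch newfile
$ ls -l newfile
-rw------- 1 hal users 0 ... newfile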

There's actually an optional leading fourth digit you can use with the chmod command, which covers the "set-UID" (4), "set-GID" (2), and "sticky-bit" settings (1). Here are some examples showing the typical permission settings for some common Unix programs and directories:

# chmod 4755 /bin/su              # set-UID
# chmod 2755 /usr/sbin/sendmail # set-GID
# chmod 1777 /tmp # "sticky"

The "sticky-bit" is another interesting piece of Unix history. Back in the very old days when computers, disks, and even memory were vastly slower than they are today, there was a significant start-up cost with loading a program into memory before execution. For commonly-used programs like ls, the total overhead was enormous. So certain executables were marked with the "sticky-bit" as a signal to the kernel so that the program image would tend to "stick around" in memory so that the program didn't have to constantly be reloaded. Of course, in the age of shared libraries plus fast disks and memory this use has long since stopped having any value. Nowadays, the sticky bit is used as a marker on world-writable directories like /tmp and prevents anybody except the owner of the file from removing or renaming the file.

As you're probably aware, chmod also supports "symbolic" mode for expressing changes to the permissions of files:

$ chmod go-w insecure   # strip the write bits for group and other
$ chmod a+x myscript # make file executable for everybody ("all")
$ chmod +x myscript # same as above

The first part of the syntax is the groups to apply the permission change to: "u" for the primary user or owner of the file, "g" for the group owner, and "o" for other or everybody else. You can also use "a" to represent all groups, or just leave it off completely because "a" is the default. The next item in the symbolic description is a plus or a minus sign depending on whether you're "adding" or "subtracting" permission bits. Finally you specify the bit(s) you want to add or subtract.

Why is symbolic mode useful? Well, I recently needed to make an entire directory structure be only accessible by its owner. You can't really use absolute file modes for this, because while "chmod -R 600 ..." would work fine for regular files, it wouldn't do for the directories (directories must have "x" set to be usable). "chmod -R 700 ..." would be fine for directories, but not appropriate for regular files. You could hack something together with find, but it's much easier to just do:

# chmod -R go-rwx /some/directory

This strips all bits for the "group" and "other" categories, but leaves the current permissions for the owner of the file alone.

Unfortunately, I now have to turn things over to Ed. I say "unfortunately", because I'm going to have to endure his gloating about the higher level of permissions granularity implemented in NTFS as compared to the typical Unix file system. To this my only response is, "Oh yeah? Tell me about ownerships and permissions in FAT file systems, boyo."

Ed responds:
I thought we had an agreement to never bring up FAT. As far as I'm concerned, it never happened. NTFS has been with us since Day 1. Yeah... right.

Anyway, since Hal brought up FAT, I'll address it briefly. There is no concept of security with FAT file systems. Every user is God and can access everything. It's a relic from the days when DOS (and later Windows) machines were expected to be single-user systems not connected to, you know, a network or anything. The security model was to diligently implement no security model whatsoever. Mission accomplished!

But, then, there came NTFS. Ahhh... finally, a modern file system for Windows boxen. NTFS has the concept of ownership and even more fine-grained controls than our Unix brethren have. As Windows NT matured into Windows 2000 and then XP/2003, and then Vista/2008 and beyond, these fine-grained permissions and the ability to manipulate them at the command line has expanded significantly. In fact, we're now to the point where these features are simultaneously flexible enough to use yet complex enough to be overwhelming and almost unusable. That happens a lot in the Windows world.

From the command-line, we can see the ownership of a file or directory using the dir command with the /q option:

C:\> dir /q

Nice! Easy! Cool!

Yeah, but wait. Now things get ugly. If the hostname\username is particularly long, it will be truncated in the display, which allocates only 23 characters for the hostname backslash username. None of the other output formatting options of dir (/b, /w, /d, and /x) fix this, because they either override /q (leaving off the ownership info) or keep the truncation. This problem is livable, but still kind of annoying.

To change the owner of a file or directory, a la the Linux chown command, we can use the icacls command found in 2003 SP2, Vista, 2008, and Windows 7, as follows:

C:\> icacls [filename] /setowner [username]
To change the owner in a recursive fashion, use the /t option with icacls, because, as everyone knows, the word "recursive" has no letter t in it. That makes it easy to remember, right? (Actually, I think they were going after the mnemonic for "tree", but a /s or even a /r would be more palatable). And, /c makes icacls continue despite an error with one or more files as it is processing through a /t or wildcard.

Now, look at that list of supported Windows versions for icacls... there's an important one missing from the list. Which one? Cue the Jeopardy music and pause. Sadly, it's our good friend, Windows XP. Unfortunately, I haven't found a way at the command line using only built-in tools to change owner in XP. You could mount the XP system's drive from a 2003/Vista/2008/7 box and run icacls, but that's not exactly using built-in tools, now, is it? You can copy the 2003 SP2 version of icacls.exe to Windows XP, and it'll work... but again, that's not exactly using built-in tools, and I have no idea whether that violates some sort of Windows license limitation. And, I don't want to know, so don't send me any mail about it.

The Vista version won't run on XP though. I've also found that icacls on 2003 (whether running in 2003 or... ahem... copied to XP) is quite buggy as well, often giving errors when trying to change owners. This 2003 icacls fail is a known issue documented by Microsoft, and they released a hotfix for it which is seldom installed. So, does this count as not using a built-in command? :)

To change ownership in the GUI on XP, you can apply a ridiculous process described by Microsoft here.

Now, XP does include the cacls command (not icacls), as does Win2K and all of the later versions of Windows. The cacls command lets you change permissions, the rough equivalent of the Linux chmod command. But, cacls will not let you change the owner.

The syntax for the cacls command lets us specify the file or directory name we want to change, followed by a bunch of potential options. We can grant access rights with /G [user:perm]. The perms supported at the command line are R (Read), W (Write), C (Change, sometimes referred to as "Modify" in Windows documentation), and F (Full Control). These rights are actually conglomerations of the very fine-grained rights built into NTFS, which are described here.

We can revoke these access rights with /R [user]. Note that you cannot revoke individual rights (R/W/C/F), but instead you revoke all of them at a given time for a user. Revocation is an all-or-nothing situation. The /E is used to edit the existing ACL, as opposed to the default, which replaces the ACL. We often want to use /E so that we don't blow away any access rights already there. There is also a /D [user] option in cacls, which explicitly denies the user access to the object, again on an all or nothing basis. These deny rights override any allow rights, thankfully.

With that overview under our belts, to frame the following fu, I'd like to mimic Hal's commands, mapping them into the Windows world to the extent we can.

We start with Hal's first command:
$ chmod 755 /home/hal     # rwx for me, r+x for group and other
In Windows, we can achieve the same thing with these three commands:
C:\> cacls [filename] /G [username]:F
C:\> cacls [filename] /E /G [groupname]:R
C:\> cacls [filename] /E /G Everyone:R
Note that the R here is a conglomerated Read, which includes reading and executing the file (again, those conglomerated rights are defined here). Also, in the first of these three commands, we've not used /E, so we blow away all existing access rights to start out and add full control for our username. Hal's assigning permissions absolutely, not relative to existing permissions, so we leave off the /E. In our follow-up commands, though, we edit the existing rights (/E), building on them with a couple of extra commands to match roughly what Hal has done.

Next, our sparring buddy ran:
$ chmod 666 insecure      # rw for everybody, "world writable"

We can roughly mimic this with:
C:\> cacls [filename] /G [username]:F
C:\> cacls [filename] /E /G Everyone:C

Now, the execute capability is baked into both Full control (F) and Change (C), so we're not really removing execute capability here. With icacls (not cacls), we can access the fine-grained rights and make a file readable and writable (without granting execute) with:

C:\> icacls [filename] /grant Everyone:RW
Remember, /G is for cacls, and /grant is for icacls. Consistency is a beautiful thing.

And then, Hal pulled this out of his ear:
$ chmod 700 private.dir   # only I have access here
In our comfy Windows world, we could run:
C:\> cacls [filename] /G [username]:F
Without the /E above, this snips off everyone else's access.

Hal then wowed us all with:
$ chmod go-w insecure   # strip the write bits for group and other
This one is a bear to do at the command-line in Windows, because the /R option is all or nothing when revoking permissions. We'd need to analyze the existing ACL first, and then manipulate it with a /E to build the ACL we want. It would require a script to do this by my estimation.

Hal then popped off:
$ chmod a+x myscript    # make file executable for everybody ("all")
Which we can do with:
C:\> cacls [filename] /E /G Everyone:R
Again, in cacls, R includes Read and Execute.

And, finally, Hal did:
# chmod -R go-rwx /some/directory
Which, mapped to our own personal insanity, is:
C:\> cacls [directory] /p [user]:F /t
In this one, we've used the /p to replace the existing ACLs for the given user name, which we are giving full control (F), in a recursive fashion (/t).

Be careful when playing with icacls and cacls, because they are a great way to very much hose up your system. The icacls command has an option to save ACLs into a file for later inspection or even restoring from:

C:\> icacls [file_or_dir] /save [aclfile]

Again, we have a /t or /c option here. The restore function is invoked with /restore. It should be noted that the aclfile holds only relative paths. Thus, if you are going to do a restore, you need to run the command in the same directory that you used for the icacls /save.