Tuesday, May 25, 2010

Episode #96: Hardware Death Watch

Hal's Laptop Is Having Issues

This is pretty much a SANS instructor's worst nightmare. I'm headed out to teach Forensics 508 in VA Beach, and I fire up my laptop to get some work done on the plane. The CPU fan makes a choking sound, the laptop beeps, and the screen flashes "Fan error". Thankfully, a little gentle coercion rendered the system bootable, but I'm clearly looking at a complete fan failure in the near future. So I want to keep an eye on my hardware so I can prevent an incident that involves the magic smoke.

There are a number of different ways of getting information about your hardware on Linux. The simplest is probably lshw:

# lshw
elk
description: Notebook
product: 7668CTO
vendor: LENOVO
version: ThinkPad X61s
serial: LVA9486
width: 64 bits
capabilities: smbios-2.4 dmi-2.4 vsyscall64 vsyscall32
configuration: administrator_password=disabled boot=normal chassis=notebook...

lshw provides a ton of other info on your BIOS, CPU(s), memory, disk drives, display and so on-- almost 400 lines of output on my laptop! Note that there's also the report-hw command which reports similar information, but was designed to help with debugging hardware auto-detection and so has lots of extra output that makes things less readable overall.

While lshw is good for getting an overview of the hardware configuration of your system, it doesn't probe any of the internal hardware sensors in your computer. To talk to the sensors in your CPU(s) and disk drives, you'll need a couple of other packages that are standard with most Linux distros these days: lm-sensors and smartmontools. lm-sensors interacts with the CPU sensors and smartmontools lets you get information from your disk drives, assuming they're modern enough to support the SMART device interface.

To get started with the lm-sensors package, you'll need to load the appropriate kernel modules for your device. Happily, the package includes a tool called sensors-detect that will auto-detect the kernel modules you need, and even offer to update your configuration so that the appropriate modules will be automatically loaded whenever your system boots. Here's an excerpt from the output of this program:

# sensors-detect
# sensors-detect revision 5249 (2008-05-11 22:56:25 +0200)

This program will help you determine which kernel modules you need
to load to use lm_sensors most effectively. It is generally safe
and recommended to accept the default answers to all questions,
unless you know what you're doing.

We can start with probing for (PCI) I2C or SMBus adapters.
Do you want to probe now? (YES/no): yes
[...]

Now follows a summary of the probes I have just done.
Just press ENTER to continue:

Driver `coretemp' (should be inserted):
Detects correctly:
* Chip `Intel Core family thermal sensor' (confidence: 9)

I will now generate the commands needed to load the required modules.
Just press ENTER to continue:

To load everything that is needed, add this to /etc/modules:

#----cut here----
# Chip drivers
coretemp
#----cut here----

Do you want to add these lines automatically? (yes/NO) yes
# cat /etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.

loop
lp
rtc

# Generated by sensors-detect on Sun May 23 10:59:47 2010
# Chip drivers
coretemp

Once the appropriate drivers are loaded, you can just run the sensors command-- and you don't even have to be root:

$ sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +39.0°C (crit = +127.0°C)
temp2: +39.0°C (crit = +100.0°C)

thinkpad-isa-0000
Adapter: ISA adapter
fan1: 3872 RPM
fan2: 0 RPM
temp1: +39.0°C
temp2: +46.0°C
temp3: +46.0°C
temp4: +37.0°C
ERROR: Can't get value of subfeature temp5_input: Can't read
temp5: +0.0°C
ERROR: Can't get value of subfeature temp6_input: Can't read
temp6: +0.0°C
ERROR: Can't get value of subfeature temp7_input: Can't read
temp7: +0.0°C
ERROR: Can't get value of subfeature temp8_input: Can't read
temp8: +0.0°C
temp9: +42.0°C
temp10: +38.0°C
ERROR: Can't get value of subfeature temp11_input: Can't read
temp11: +0.0°C
ERROR: Can't get value of subfeature temp12_input: Can't read
temp12: +0.0°C
ERROR: Can't get value of subfeature temp13_input: Can't read
temp13: +0.0°C
ERROR: Can't get value of subfeature temp14_input: Can't read
temp14: +0.0°C
ERROR: Can't get value of subfeature temp15_input: Can't read
temp15: +0.0°C
ERROR: Can't get value of subfeature temp16_input: Can't read
temp16: +0.0°C

coretemp-isa-0000
Adapter: ISA adapter
Core 0: +39.0°C (high = +100.0°C, crit = +100.0°C)

coretemp-isa-0001
Adapter: ISA adapter
Core 1: +39.0°C (high = +100.0°C, crit = +100.0°C)

Clearly, not all temperature sensors are supported on all CPU architectures. But at least this allows me to keep up my morbid death watch on my fans and my CPU temp.
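If you'd rather not eyeball that output every few minutes, the fan lines are easy to pick out with awk. What follows is only a rough sketch: the check_fans function name and the 1000 RPM threshold are my own illustrative assumptions, not anything that ships with lm-sensors, and the demo feeds in a fragment of the output above rather than talking to real hardware.

```shell
#!/bin/sh
# Sketch: flag fan readings below a threshold in `sensors`-style output.
# check_fans and the default of 1000 RPM are illustrative assumptions.
check_fans() {
    min_rpm="${1:-1000}"
    awk -v min="$min_rpm" '
        # Fan lines look like "fan1: 3872 RPM"
        /^fan[0-9]+:/ && $3 == "RPM" {
            status = ($2 + 0 < min) ? "LOW" : "ok"
            print $1, $2, status
        }'
}

# Demo on a fragment of the output shown above. On a live system you
# would run:  sensors | check_fans 1000
check_fans 1000 <<'EOF'
thinkpad-isa-0000
Adapter: ISA adapter
fan1: 3872 RPM
fan2: 0 RPM
temp1: +39.0°C
EOF
```

Note that a reading of 0 RPM may simply mean there is no second fan installed, so treat "LOW" as a hint to go look rather than as proof of failure.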

The smartmontools package includes the smartctl command for probing your disk drives. The easiest way to get started is to just use the "-a" option to dump all available info about your drive:

# smartctl -a /dev/sda
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model: ST9500420AS
Serial Number: 5VJ09ARF
Firmware Version: 0002SDM1
User Capacity: 500,107,862,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Sun May 23 11:11:33 2010 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

[...]

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 118 099 006 Pre-fail Always - 188548774
3 Spin_Up_Time 0x0003 100 098 085 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 269
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 074 060 030 Pre-fail Always - 27946375
9 Power_On_Hours 0x0032 098 098 000 Old_age Always - 2501
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 037 020 Old_age Always - 202
184 Unknown_Attribute 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Unknown_Attribute 0x0032 100 099 000 Old_age Always - 121
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 061 051 045 Old_age Always - 39 (Lifetime Min/Max 28/39)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 8
193 Load_Cycle_Count 0x0032 098 098 000 Old_age Always - 5470
194 Temperature_Celsius 0x0022 039 049 000 Old_age Always - 39 (0 11 0 0)
195 Hardware_ECC_Recovered 0x001a 047 043 000 Old_age Always - 188548774
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 115611929676228
241 Unknown_Attribute 0x0000 100 253 000 Old_age Offline - 978381419
242 Unknown_Attribute 0x0000 100 253 000 Old_age Offline - 1156631671
254 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
[...]

Again, there's a ton of other output from this command, which I'm not showing in the interests of space. There are smartctl options to just dump out specific pieces of the above info, and they're all documented in the manual page.

Frankly, I think it's pretty cool that I can retrieve the disk model and serial number without having to crack the case. I can also read the temp of the drive itself and at the airflow output, which is of interest to me right now. But as you can see, I can also get information about the number of hours on the drive and so on. This could be used to help alert you to drives that may be needing replacement before they actually fail.
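For a quick look at just one or two of those attributes, a little awk does the job. This is a sketch that assumes the ten-column attribute table layout shown above (raw value in the tenth column); smart_attr is a made-up helper name, and the demo runs on two lines copied from the output above instead of a live drive.

```shell
#!/bin/sh
# Sketch: print the RAW_VALUE column for one named SMART attribute.
# smart_attr is a hypothetical helper; it assumes the table layout
# shown above, where the attribute name is field 2 and the raw value
# starts at field 10.
smart_attr() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Demo on lines from the output above. On a live system you would run
# something like:  smartctl -A /dev/sda | smart_attr Power_On_Hours
sample='  9 Power_On_Hours 0x0032 098 098 000 Old_age Always - 2501
194 Temperature_Celsius 0x0022 039 049 000 Old_age Always - 39 (0 11 0 0)'

printf '%s\n' "$sample" | smart_attr Power_On_Hours
printf '%s\n' "$sample" | smart_attr Temperature_Celsius
```

The "-A" option to smartctl prints only the attribute table, which keeps the input small.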

So there's plenty of information available for me to keep an eye on things this week as I'm teaching my class. Keep your fingers crossed for me. In the meantime, let's see what Ed and Tim have up their sleeves.

Ed responds a little nervously:
I get shivers thinking about system failure as a presenter at conferences. I used to travel with two laptops to keep my mind at ease, but in the past year, I started carrying just one as my back started to hurt. Rumor has it that there are backup laptops that will materialize in an instant at a conference, but you never know. USB tokens with backup presentations are a good idea.

As for commands to check hardware information on Windows, our little friend WMIC comes in handy. For details about the motherboard, we could run:
C:\> wmic baseboard list full

The output here will show us Manufacturer and SerialNumber, among other things.

For CPU information, we can run:
C:\> wmic cpu list full

This will show us a description of the CPU, its manufacturer, and speed.

But, in that glut of output, there are also a couple of useful items that may indicate trouble on our system. Let's zoom in on them:
C:\> wmic cpu get currentclockspeed,maxclockspeed

If you see a big difference in these numbers, it could be due to a couple of reasons. First off, your system may be running under a low power condition, so it slows down the processor to save power, making currentclockspeed lower than maxclockspeed. That's nothing to worry about. The other condition, however, is that your system has gotten kinda hot, so it's slowing itself down. That's something to worry about.

To get a feel for the temperature of your system, you could run:
C:\> wmic /namespace:\\root\wmi PATH MSAcpi_ThermalZoneTemperature get CurrentTemperature
CurrentTemperature
3172

Now, it should be noted that pulling this temperature data isn't supported on all hardware, and on some hardware, it never changes beyond boot time. Still, on many modern non-virtual systems, it'll tell you your temperature in tenths of a kelvin. I just went to Google and did a search for "317.2 degrees kelvin to" and before I finished typing, the predictive search responded with:
317.2 kelvin = 44.05 degrees Celsius

Cool, Google. A little creepy, but cool. "Google: A little creepy, but cool" should be Google's new motto, supplanting "Don't be Evil."

Of course, then, I type "44.05 degrees Celsius to f" and it pops up and tells me my system is running at 111.29 degrees Fahrenheit. Toasty.
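If you'd rather not ask a search engine, the arithmetic is simple enough to script. Here is a small sketch in shell and awk (handy if you have a Unix box or Cygwin nearby); the decikelvin_to_cf name is invented for illustration, and the input is the tenths-of-a-kelvin value from the wmic output above.

```shell
#!/bin/sh
# Sketch: convert a reading in tenths of a kelvin to Celsius and
# Fahrenheit. decikelvin_to_cf is a hypothetical helper name.
decikelvin_to_cf() {
    awk -v dk="$1" 'BEGIN {
        c = dk / 10 - 273.15     # kelvin to Celsius
        f = c * 9 / 5 + 32       # Celsius to Fahrenheit
        printf "%.2f C = %.2f F\n", c, f
    }'
}

decikelvin_to_cf 3172   # prints: 44.05 C = 111.29 F
```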

The ScriptInternals guys have put together a list of the items you can read using this command besides the CurrentTemperature. You can pull all of this data with:

C:\> wmic /namespace:\\root\wmi PATH MSAcpi_ThermalZoneTemperature get *

While all this temperature stuff is nice, what about a prediction of whether our hard drive is hosed? We can pull that information with:

C:\> wmic /namespace:\\root\wmi PATH MSStorageDriver_FailurePredictStatus get predictfailure
PredictFailure
FALSE

Whew, that's a relief. If this output says TRUE, your drive is ready to give up the ghost soon, so you should back up immediately! You don't want to fall into the "Hal Pomeranz conference laptop deathwatch trap".

Tim sometimes wishes a presenter's laptop would die:

We've all been there: a presentation where the presenter is just reading every word on every slide with no extra content or commentary. That presenter's laptop needs to die, to take one for the team so the rest of us can live.

I've heard Ed and Hal present, both are great speakers, so their laptops are not required to become martyrs. Let's give them a bit of a check up.

Checking a laptop's status in PowerShell is very similar to what Ed did. Here are the PowerShell versions of Ed's commands.

Motherboard - Manufacturer and Serial Number:
PS C:\> gwmi win32_baseboard


CPU information - Description, Manufacturer, and Speed:
PS C:\> gwmi win32_processor


Temperature:
PS C:\> Get-WmiObject -class MSAcpi_ThermalZoneTemperature -Namespace root\WMI


We have the same problem as Ed: Kelvin. Let's convert to Fahrenheit. My undergraduate degree was in Engineering, and I had to take a Thermodynamics class. One thing I remember is that 0 Kelvin is -273.15 Celsius. I also remember how to convert Celsius to Fahrenheit: add 40, multiply by 9, divide by 5, and finally subtract 40. Subtracting 233.15 from the Kelvin value does the Celsius conversion and the "add 40" in one step. Here it is in only one line.

PS C:\> (((Get-WmiObject -class "MSAcpi_ThermalZoneTemperature" -Namespace "root\WMI").CurrentTemperature / 10 - 233.15) * 9 / 5) - 40

124.79


Let's check the drive status:
PS C:\> Get-WmiObject -class MSStorageDriver_FailurePredictStatus -Namespace root\WMI | Select Active, PredictFailure

Active PredictFailure
------ --------------
True False


Good news, the drive is alive, and not predicted to die!

One other thing I like to check is the battery:

PS C:\> gwmi Win32_Battery | select est*

EstimatedChargeRemaining EstimatedRunTime
------------------------ ----------------
97 231


I can run for almost 4 hours. That's a long presentation, and a lot of slides to read.

Tuesday, May 18, 2010

Episode #95: I Screen, You Screen, We All Screen for...

Ed's Tan, Rested, and Ready:

I'm back from vacation, and wanted to thank my fellow CLKF'ers for holding down the fort while I was away. Tim and Hal did a bang up job responding to the hundreds of thousands of e-mails from adoring fans, managing the hordes of Bodacious Research Assistants on the 83rd floor of Kung Fu Towers (our skyscraper that holds the world-wide headquarters of our blog and the infrastructure necessary to support it), and dealing with any IT issues that came up in our shop while I was absent. Tim mentioned to me that one of these issues dealt with a user whose GUI was giving him problems. He was complaining that the program didn't fit on his screen. Hmmmm... probably an issue with the screen resolution.

We can check the screen resolution of a remote system from cmd.exe using every Windows user's best friend at the command line, wmic, thusly:
C:\> wmic /node:IPaddr /user:Admin /password:Password desktopmonitor get screenwidth, screenheight


ScreenHeight ScreenWidth
600 800
So, we can see that this GUI was a little tiny by modern standards. Tim provided some verbal coaching to the user about how to change this, and... voila! Problem solved.

Even one's best friends can be annoying sometimes, and wmic certainly has its frustrating parts. Note how we asked for screenwidth followed by screenheight, but wmic gave them to us backwards? That's because wmic always returns attributes in alphabetical order by attribute name (screenheight is alphabetically before screenwidth). The alphabetical fetish is hard coded into wmic, and there's no way around it using wmic by itself. That's why I usually manually alphabetize the attributes I ask for in my wmic commands. It makes me feel like my computer is doing what I want. I ask for them alphabetically, and it gives them to me alphabetically. You see, one doesn't use cmd.exe... it uses you.

But, who wants to look at screen resolutions listed backwards (600X800)? Clearly, alphabetical order here is lame. We've gotta reverse that, which we can do with our other little buddy, the cmd.exe FOR /F loop, a quirky little parser dude:

C:\> for /f "skip=1 tokens=1,2" %i in ('"wmic /node:IPaddr /user:Admin /password:password desktopmonitor get screenheight, screenwidth"') do @echo %jX%i
1024X768
Here, I'm running a FOR /F loop to parse the output of my wmic command. I set my parsing options to skip down 1 line (because I want to bypass the column titles), and tokenize around the first and second columns of my output. My iterator variable will be %i, and because I have two tokens, %j will be automagically allocated. I then include my wmic command, which is wrapped in a single quote followed by a double quote (' "). The single quote tells the FOR loop I'll be executing a command. The double quotes let me use a command that has special characters in it, such as a comma, without having to resort to the funky ^ character to escape it. It reads a little nicer this way. Just a little.

Note that in my command, I have alphabetized my requested attributes, my standard practice with wmic, so that I can more easily keep in my head their order when dealing with them in the body of my parsing loop. Finally, in the body of the loop (after the "do"), I turn off command display (@) and echo out my variables, reversed, with an X in between (for resolution). So, we see %jX%i, or 1024X768 in this example.

Whew! That's ugly... but it is easily extensible for all kinds of wmic madness.

Furthermore, remember that we can replace our /node:IPaddr with /node:@filename, having a file with one IP address or machine name per line, and we can pull information from a bunch of boxes about their screen resolution.

Unfortunately, there is no way to alter the screen resolution at the cmd.exe command line using only built-in tools. The wmic desktopmonitor alias has no callable methods, nor does the desktop alias. There are some great third party tools for doing so, like Display Changer, which is free for personal and educational use.

Tim is pale, tired, and slow

Those silly people and their GUI's. All sorts of problems with color and resolution. Ironically, the problem was found via the command line since the user wasn't able to determine the resolution he was running.

Here is the PowerShell version of the command. It is very similar to Ed's command, except it will (usually) prompt for credentials via a dialog box (GUI). The credentials are stored in a secure string. A secure string is encrypted in memory and zero'ed when no longer used.

PS C:\> $cred = Get-Credential
PS C:\> Get-WmiObject win32_desktopmonitor -ComputerName GuiMachine -Credential $cred |
select screenwidth, screenheight

screenwidth screenheight
----------- ------------
800 600
One noticeable difference between wmic and Get-WmiObject (alias gwmi) is that the full class name has to be used in PowerShell. This means that you typically have to type Win32_ (case insensitive) before the class name.

We can shorten this command to one line as well as use aliases and shortened parameter names.

PS C:\> gwmi win32_desktopmonitor -comp GuiMachine -cred (Get-Credential) | select screen*
screenheight screenwidth
------------ -----------
600 800
Let's take a step back and look at the properties of the $cred variable that holds our credentials.

PS C:\> $cred
UserName Password
-------- --------
sillyuser System.Security.SecureString
Hrm, can we see what the password contains?

PS C:\> ConvertFrom-SecureString $cred.Password
01000000d08c9ddf0115d1118c7a00c04fc297be01000000477d77c
aaec31c478b9568787c422fb10000000002000000000003660000c0
00000010000000232a3a9ecb092c10661956b28dee0f63000000000
4800000a0000000100000009d240c479361e0156ba4b63f995270de
18000000521e807650133832cfe5fc675cf3c7b8f71d4a5b0d4fa1f
114000000da5bfe8edf24c21b17a326989a82dd83ad1fb69c
Nope. There is a way, but it can only be decrypted by the same user on the same machine. This is a much safer option than typing the clear text password on the command line. If you want, you can read the details on DPAPI, but we will go into this more in a future episode.

We can even save the credentials in a file and import them for later use. First, export:

PS C:\> ConvertFrom-SecureString $cred.Password | Out-File encryptedpass.txt
Then import the password, and recreate the credential.

PS C:\> $pass = ConvertTo-SecureString (cat encryptedpass.txt)
PS C:\> $cred = New-Object System.Management.Automation.PSCredential -ArgumentList "myuser",$pass
You can even use another key to encrypt the exported file by using the -key parameter.

The only goofy thing with Get-Credential is that it pops up a dialog box to prompt for the credentials, silly GUI's. You can edit the registry to change the behavior so it prompts on the command line.

PS C:\> Set-ItemProperty HKLM:\SOFTWARE\Microsoft\PowerShell\1\ShellIds -Name ConsolePrompting -Value True
Now we see the prompt on the command line.

PS C:\> $cred = Get-Credential
Supply values for the following parameters:
Credential
User: myuser
Password for user myuser: ****************
If we wanted to get the resolution on a number of machines we can use the following command.

PS C:\> Get-Content servers.txt |
% { gwmi win32_desktopmonitor -comp $_ -cred $cred } |
select SystemName, ScreenWidth, ScreenHeight

SystemName ScreenWidth ScreenHeight
---------- ----------- ------------
Machine1 800 600
Machine2 640 480
Machine3 1440 900
Machine4 1024 768
Let's see how easy this is for Hal...

Hal Isn't Sure Which End Is Up:

It turns out there are a couple of answers to the "What's my screen resolution?" question on a typical Unix system running some X Windows based display. First there's the old, reliable xdpyinfo command. To tell you just how old this command is, I can remember that one of the first shell scripts I ever wrote back in the 1980's parsed the output of xdpyinfo when setting up my default windowing environment. xdpyinfo dumps out a ton of information-- some useful and some not so much-- but here's a quick idiom for grabbing the screen resolution from the output:

$ xdpyinfo | awk '/dimensions:/ {print $2}'
1920x1200

And, yes, that's "width x height" unlike the Windows "standard" ordering. Crazy Unix people, what will they think of next?

However, the modern mechanism for interacting with your display(s) is the xrandr command. Short for "X Resize and Rotate", xrandr lets you query the current state of the display but, as you might guess from the command name, is really designed to allow you to manipulate the display from the command line or from within a shell script.

You can output the current display info with "xrandr -q":

$ xrandr -q
Screen 0: minimum 320 x 200, current 1920 x 1200, maximum 8192 x 8192
VGA1 connected 1920x1200+0+0 (normal left inverted right x axis y axis) 519mm x 324mm
1920x1200 60.0*+
1280x1024 75.0
1024x768 75.1 60.0
800x600 75.0 60.3
640x480 75.0 60.0
720x400 70.1
LVDS1 connected (normal left inverted right x axis y axis)
1024x768 50.0 + 85.0 75.0 70.1 60.0 40.0
832x624 74.6
800x600 85.1 72.2 75.0 60.3 56.2
640x480 85.0 72.8 75.0 60.0 59.9
720x400 85.0
640x400 85.1
640x350 85.1
0x0 0.0

This is the output from my laptop in its configuration in my office, where I have it connected to an external display ("VGA1" in the xrandr output) in addition to its internal video display ("LVDS1", named for Low-Voltage Differential Signaling, the interface that drives the laptop panel). You can see all of the supported resolutions for each display. The "*" marks the mode that is currently active-- here I'm only using my external monitor at 1920x1200 just like we saw in the xdpyinfo output.
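If all you want is the current mode for each output, you can fish the starred lines out of "xrandr -q" with awk. This is a minimal sketch that assumes the output layout shown above; active_modes is a name I made up, and the demo uses a trimmed copy of that output so it runs without an X display.

```shell
#!/bin/sh
# Sketch: list "<output> <mode>" for every mode flagged with "*" in
# `xrandr -q` style output. active_modes is a hypothetical helper.
active_modes() {
    awk '
        # Output header lines: "VGA1 connected 1920x1200+0+0 (...)"
        $2 == "connected" || $2 == "disconnected" { output = $1; next }
        # Mode lines: the current mode carries a "*" after its refresh rate
        /\*/ { print output, $1 }'
}

# Demo on a trimmed copy of the output above. On a live system you
# would run:  xrandr -q | active_modes
active_modes <<'EOF'
Screen 0: minimum 320 x 200, current 1920 x 1200, maximum 8192 x 8192
VGA1 connected 1920x1200+0+0 (normal left inverted right x axis y axis) 519mm x 324mm
   1920x1200      60.0*+
   1280x1024      75.0
LVDS1 connected (normal left inverted right x axis y axis)
   1024x768       50.0 +   85.0     75.0
   800x600        85.1     72.2
EOF
```

On that sample the only line printed is "VGA1 1920x1200", since the laptop panel has no active mode.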

But the power of xrandr is its ability to completely control how your displays are set up. For example, here's the xrandr command I use when I'm teaching and I want my laptop display and the external projector to be showing the exact same image:

xrandr --output LVDS1 --mode 1024x768 --output VGA1 --mode 1024x768 --same-as LVDS1

But the two displays don't have to be showing the same image:

xrandr --output VGA1 --auto --output LVDS1 --auto --right-of VGA1

Here the "--auto" after each display means "use the display's preferred mode", which is normally its highest available resolution: 1920x1200 in the case of my external monitor and 1024x768 for my laptop display. And note that instead of "--same-as" I'm using "--right-of" to position the laptop display virtually to the right of my external monitor (where it sits physically on my desk). The upshot is that I can drag windows off the right-hand side of my external monitor and they'll show up on my laptop display. It's kind of cool, but my laptop display is really too small to be of much use when I'm working at my desk. By the way, there are also "--left-of", "--above", and "--below" positioning options, just like you might expect.

If I want to reset things to my default desktop environment-- laptop display off and external monitor at max resolution-- all I need to do is:

xrandr --output LVDS1 --off --output VGA1 --auto

But suppose this was a desktop machine with dual displays. Personally, I prefer to run my dual displays in "portrait" mode (more code in my display windows that way):

xrandr --output VGA1 --auto --rotate right \
--output VGA2 --auto --rotate right --right-of VGA1

The "--rotate" option handles orienting the display into portrait mode, and you can go either "right" or "left", depending how your monitor mount swivels. There's even "--rotate inverted" which I suppose might be useful if you're trying to display from a projector suspended upside-down from the ceiling (though most projectors these days have an internal setting to deal with that).

I have to say that xrandr is one of the coolest things to happen in X Windows for a while. It used to be much more painful to manipulate display configurations. But now it's totally straightforward.

Tuesday, May 11, 2010

Episode #94: A Date With Death

Hal checks into the mailbag

We received a note recently from a new reader, Ray Kano, who had a question for the blog:
Is there any way using WMIC to write a taskkill command that will kill [processes by name and] based on a date-time stamp?

Now obviously Ray is looking for a Windows solution, and I'll let Tim clean up on that side of the house since Ed is still on vacation. But the question got me thinking if there was an analogous command on Unix for killing processes by name and by date. This turns out to be a lot harder in Unix than I thought it would be, but I learned a lot in the process of figuring out the solution.

My first thought was to do something clever with /proc. I had just assumed that the date-time stamps on the /proc/<pid> directories corresponded with the date the process was spawned. Nothing could be further from the truth:

# uptime
14:55:26 up 23:51, 6 users, load average: 0.43, 0.26, 0.19
# date
Sun May 2 14:55:28 PDT 2010
# stat /proc/1
File: `/proc/1'
Size: 0 Blocks: 0 IO Block: 1024 directory
Device: 3h/3d Inode: 533233 Links: 7
Access: (0555/dr-xr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2010-05-02 14:55:31.256904804 -0700
Modify: 2010-05-02 14:55:31.256904804 -0700
Change: 2010-05-02 14:55:31.256904804 -0700

Note that the system has been up just under a day, but the MAC times on the /proc/1 directory belonging to the init process are all set to the moment I ran the stat command to retrieve the data. Now I did this test on my Linux system and haven't checked other Unix platforms, but clearly relying on /proc isn't going to be a portable solution.

My next thought was to check and see if the killall or pkill commands had options for selecting processes based on date and time. It turns out pkill has the "-o" and "-n" options for killing the oldest or newest processes that match your search criteria, but nothing more selective than that. killall is no help at all.

If you've been reading this blog for a while, you can probably guess where I went next: my "little friend" lsof. But guess what? As far as I can tell, lsof has no capability to even output the starting date and time of a process, much less select processes based on that information.

This was starting to really interest me now. I was sure that the kernel keeps track of the starting date and time of each process, but there didn't seem to be any simple way of getting at this data. In desperation, I started reading the ps manual page and discovered that you can get ps to output a couple of different time values: the "start_time" and the "etime", which is short for "elapsed time". Let's check out "start_time" first with the ps output from one of my Linux servers that's been up for a while:

# ps -eo pid,comm,start_time
PID COMMAND START
1 init 2009
...
1009 sendmail Jan04
1020 sendmail Jan04
...
29399 sshd 12:11
29401 sshd 12:11
...

The "-o" option allows me to specify a list of fields to output. Note that the names of the various fields and the list of available fields can vary from OS to OS, but the ones I'm using here are pretty standard across many Unix variants.

But.. *yuck*! The output format here is not helpful at all. Processes that were started today show up with HH:MM format. But processes started yesterday or earlier just show up as MonDD, and processes started before Jan 1 of the current year show up as YYYY. I can't do anything useful with this stuff.

Keeping my fingers crossed, I tried using "etime" instead of "start_time":

# ps -eo pid,comm,etime
PID COMMAND ELAPSED
1 init 369-07:14:45
...
1009 sendmail 118-01:28:11
1020 sendmail 118-01:28:10
...
29399 sshd 03:47:33
29401 sshd 03:47:31
...
29777 ps 00:00
...

OK, I can work with this. The elapsed time format is [[days-]HH:]MM:SS, which is still kind of a pain but not impossible. I can easily break each line up into a number of tokens. But the problem is that sometimes minutes and seconds will be the third and fourth tokens, sometimes the fourth and fifth tokens, and sometimes even the fifth and sixth tokens. Life would be better if we could reverse the time format so that it was SS:MM[:HH[-days]], which would make everything nice and regular.

I can handle the necessary field reversal with a little awk fu:

# ps -eo pid,comm,etime | tail -n +2 | sed 's/[-:]/ /g' | \
awk '{print $1, $2, $6, $5, $4, $3}'

1 init 59 23 07 369
...
1009 sendmail 25 37 01 118
1020 sendmail 24 37 01 118
...
29399 sshd 47 56 03
29401 sshd 45 56 03
...
29803 ps 00 00
...

Here I'm using the tail command to drop the initial header line and then using sed to turn the dash and colons in the time format to spaces. From there it's a matter of using awk to selectively reverse the last four fields of output. awk doesn't complain if some of the fields don't exist, it simply outputs an empty string.

With the fields now in a canonical order, all I need to do is convert the time value into a format that's useful for comparisons-- like say total elapsed seconds:

# ps -eo pid,comm,etime | tail -n +2 | sed 's/[-:]/ /g' | \
awk '{print $1, $2, $6, $5, $4, $3}' | \
awk '{print $1, $2, ($3 + $4 * 60 + $5 * 3600 + $6 * 86400)}'

1 init 31908525
...
1009 sendmail 10201331
1020 sendmail 10201330
...
29399 sshd 14493
29401 sshd 14491
...
29809 ps 0
...

That's more like it! So I've demonstrated that I can get to a list of PIDs, process names, and total seconds that the process has been running. I'm sure that if I thought about it some more, I could come up with a single awk statement to do what I'm doing with two statements above, but I think the above code is clearer and it wasn't really that hard to type.
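For the curious, here is one way the single-awk version might look. Treat it as a sketch rather than the definitive answer: etime_to_seconds is a name I invented, it assumes the etime column is the last field on each line and that the command name contains no spaces, and it splits only the time field so that dashes in command names are left alone. The demo input is copied from the etime output above.

```shell
#!/bin/sh
# Sketch: convert `ps -eo pid,comm,etime` output to "PID NAME SECONDS"
# with a single awk program. etime_to_seconds is a hypothetical name.
etime_to_seconds() {
    awk 'NR > 1 {                          # skip the header line
        n = split($NF, t, /[-:]/)          # [[days-]HH:]MM:SS -> 2..4 pieces
        secs = t[n] + t[n-1] * 60          # seconds and minutes always exist
        if (n >= 3) secs += t[n-2] * 3600  # hours, if present
        if (n >= 4) secs += t[n-3] * 86400 # days, if present
        print $1, $2, secs
    }'
}

# Demo on lines from the output above. On a live system you would run:
#   ps -eo pid,comm,etime | etime_to_seconds
etime_to_seconds <<'EOF'
  PID COMMAND             ELAPSED
    1 init           369-07:14:45
29399 sshd               03:47:33
29777 ps                    00:00
EOF
```

Counting pieces from the right-hand end is what makes the variable-length format manageable: seconds are always the last piece, minutes the one before, and so on.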

But remember the original request was for a command to kill processes by name and date-time stamp, and not just output data for all processes. So our second awk statement is going to change anyway. Let's suppose that we wanted to kill all sshd processes that had been around for longer than 10 days. We could output the PIDs of the matching processes as follows:

# ps -eo pid,comm,etime | tail -n +2 | sed 's/[-:]/ /g' | \
awk '{print $1, $2, $6, $5, $4, $3}' | \
awk '($2 == "sshd") && (($3 + $4 * 60 + $5 * 3600 + $6 * 86400) > 864000) {print $1}'

5725

Other queries would be simpler. For example, let's output the PIDs of all sshd processes that have been active less than one day:

# ps -eo pid,comm,etime | tail -n +2 | sed 's/[-:]/ /g' | \
awk '{print $1, $2, $6, $5, $4, $3}' | \
awk '($2 == "sshd") && ($6 == "") {print $1}'

4727
4729
4805
4807
29399
29401

Here all we're doing is confirming that the sixth field is unset, which must mean that the process has been running less than one day. We don't need to do any math at all.

Anyway, now that we can select and output PIDs at will, the final solution is just putting the whole command in backticks and using it as an argument to the kill command:

# kill -9 `ps -eo pid,comm,etime | ...`
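
One wrinkle with the backtick approach is that kill complains if the pipeline matches nothing. A small wrapper can guard against that. This is a sketch under my own assumptions: kill_pids is a hypothetical name, and it defaults to TERM rather than -9 so processes get a chance to clean up.

```shell
# Hypothetical wrapper: read PIDs on stdin and signal them, but refuse to
# run kill with an empty argument list. Defaults to TERM rather than KILL.
kill_pids() {    # usage: ... | kill_pids [SIGNAL]
    sig=${1:-TERM}
    pids=$(cat)
    if [ -z "$pids" ]; then
        echo "no matching processes" >&2
        return 1
    fi
    # $pids is deliberately unquoted so each PID becomes its own argument.
    kill -s "$sig" $pids
}

# Demo on a throwaway process rather than anything real.
sleep 60 &
echo "$!" | kill_pids TERM
```

In real use the left side of the pipe would be whichever ps pipeline selects the PIDs you want.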

Whoosh! That sure was a lot of work for a simple request! I'm sort of shocked that Unix makes this so difficult. Could this be an opportunity for Tim to show me up with some Windows magic?

Tim opens Ed's mail

If "glory is fleeting, but obscurity is forever" (Napoleon) then that "fu" is going to live longer than either of us. Too bad Ed isn't here to bask in the glory of how easy this is in Windows. Of course, he is basking in the sun on vacation this week.

While Ed is gone, I like to take a peek through his mail. Bills. Junk. More bills. Victoria's Secret catalog. A shipment of peanut butter, a stuffed water buffalo and some latex? Uh... Anyway, I did steal some of it too. Not the "other" stuff, but this easy episode.

Way back in episode 22, Ed killed processes with wmic. This topic has been revisited a few times, including my favorite episode, Advanced Process Whack-a-Mole. If "wmic process" were a dead horse, we would have severely beaten it. We do have the new twist of searching based on the creation date, and it is pretty easy.

C:\> wmic process where (name="cmd.exe" AND creationdate ^< "20100511060000.000000-300") delete

The date format is yyyymmddhhmmss.mmmmmm-TTT, where TTT is the timezone offset from GMT in minutes. The offset is required in the query; if you remove it you will get an Invalid Query error. In my case -300 means 300 minutes behind GMT (GMT -5).

Also, we have to escape any greater than or less than signs. They are normally used for redirection, and the caret (^) character is used to escape them. I don't know how to make it sound more confusing, like Hal's section.

Tim opens his mail

This task is even easier in PowerShell, and it is pretty self explanatory, too.

PS C:\> Get-Process cmd | ? { $_.StartTime -lt "2010/5/11 6:00" } | Stop-Process

We can even try to find processes that have been running for longer than an hour.

PS C:\> Get-Process cmd | ? { $_.StartTime -lt (Get-Date).AddHours(-1) } | Stop-Process

In both cases, we use Get-Process to find processes named cmd. The next step is to filter based on the start time. Finally, we kill whatever is left.

Sorry Hal, for not making this portion totally unreadable and for not making this way more complicated than it should be. Got a bit of shell envy this week?

Signed, sealed, delivered.

Tuesday, May 4, 2010

Episode #93: Of Ports and Paths

Tim is sweating in Texas:

This week's episode is inspired by another one of our readers, Aaron Goad. He was working on a cool bit of fu to map out all of the executables that are listening for incoming network connections. Based on the information gathered, he hoped to create profiles for the different server types in a given environment. The data would be used to create histograms based on server types, and make it easy to find one-off processes that could be anything from a backup client to a netcat backdoor. He is planning on writing a paper on it for his SANS Gold Certification. Good luck, Aaron. Aaron also sent us his command, which was 99% of the way there, but there was one problem, which I'll explain later.

I'm in Texas this week visiting some family. It is only May, but dang it is hot. Ed is out this week on vacation, so I'm going to work overtime this week in this (literal) sweat shop. I'm hoping I'll get paid overtime too. Let's see $0 times 1.5 times...never mind.

We'll start off with the command in the classic Windows shell since that is what Aaron sent us. Here is my version of the command.

C:\> for /f "tokens=1,2,3,7 delims=: " %a in ('netstat -nao ^| find 
^"LISTENING^" ^| find /v ^"::^"') do @(for /f "tokens=1,*" %n in ('"wmic process
where processId=%d get caption,executablepath | find ".""') do @echo Protocol=%a,
IP=%b, Port=%c, PID=%d, Name=%n, Path=%o)


Protocol=TCP, IP=0.0.0.0, Port=135, PID=776, Name=svchost.exe,
Path=C:\Windows\system32\svchost.exe
Protocol=TCP, IP=0.0.0.0, Port=912, PID=2368, Name=vmware-authd.exe,
Path=C:\Program Files\VMware\VMware Player\vmware-authd.exe
Protocol=TCP, IP=0.0.0.0, Port=49153, PID=892, Name=svchost.exe,
Path=C:\Windows\System32\svchost.exe
Protocol=TCP, IP=0.0.0.0, Port=49154, PID=952, Name=svchost.exe,
Path=C:\Windows\system32\svchost.exe
Protocol=TCP, IP=0.0.0.0, Port=49155, PID=520, Name=lsass.exe,
Path=C:\Windows\system32\lsass.exe
Protocol=TCP, IP=0.0.0.0, Port=49157, PID=512, Name=services.exe,
Path=C:\Windows\system32\services.exe
...

I did cheat a little this week by filtering out all IPv6 addresses. All the extra colons really screw up our makeshift parser. IPv6 addresses are filtered out by removing all the lines containing "::" using the /v switch with find.

The cleaned up netstat output, which is just IPv4 listeners, is split using our For loop. Regular readers are well aware that there is no good way to parse text in the classic shell, so we have to use our good ol' For loop for this task (again). We use the delimiters colon and space to get the 1st, 2nd, 3rd, and 7th tokens which represent the protocol, local address, local port, and process id respectively.

Next, we need to use wmic to get the executable name and path. This is the part that caused the problems for Aaron. When wmic returns the properties, it sorts the properties alphabetically. The ExecutablePath property comes before the Name property. So what? Well, the path typically contains spaces which our parser uses as delimiters. There isn't a way to know how many spaces are in the path, so we don't know which variable will contain the Name property. The problem can be fixed by getting the name property first, but how? The Caption property contains the same value as the Name property and C comes before E. Problem solved. We can then use the 1st and *th tokens, where the 1st is the Caption and the *th contains the rest of the line (the Executable Path).

Now we have all the values we want:
%a tcp or udp
%b local ip
%c local port
%d is pid
%n is name
%o is executable path

With these variables we can dump them to a file or do whatever we want with them.

Tim's second shift, PowerShell

Unfortunately, PowerShell does not include a nice objectified version of netstat, so we will have to parse it ourselves. However, we do have regular expressions to help us parse.

Here is the command in PowerShell.

PS C:\> netstat -ano |
? { $_ -match [regex]'\s+(?<Protocol>\S+)\s+(?<LocalAddress>(\[.*?\])|([0-9\.]+)):(?<LocalPort>\d+).+LISTENING.+?(?<PID>\d+$)' } |
select @{Name="Protocol";Expression={$matches.Protocol}},
@{Name="LocalAddress";Expression={$matches.LocalAddress}},
@{Name="LocalPort";Expression={$matches.LocalPort}},
@{Name="Name";Expression={(Get-Process -id $matches.PID).Name}},
@{Name="Path";Expression={(Get-Process -id $matches.PID).Path}}


Protocol LocalAddress LocalPort Name Path
-------- ------------ --------- ---- ----
TCP 0.0.0.0 135 svchost C:\Windows\system32\svchost.exe
TCP 0.0.0.0 445 System
TCP 0.0.0.0 49154 svchost C:\Windows\system32\svchost.exe
TCP 0.0.0.0 49155 lsass C:\Windows\system32\lsass.exe
TCP 0.0.0.0 49157 services C:\Windows\system32\services.exe
TCP 192.168.70.1 139 System
TCP [::] 135 svchost C:\Windows\system32\svchost.exe
TCP [::] 445 System
TCP [::] 49152 wininit C:\Windows\system32\wininit.exe
TCP [::] 49157 services C:\Windows\system32\services.exe
TCP [::1] 49159 ccApp C:\Program Files\Common Files\...
...

This command looks really nasty, but it isn't too bad. It is just three portions.

netstat -ano | [regular expression] | [output cleanup]

The middle section uses a regular expression for filtering and for named groups (also called named captures or named capture groups). It will filter out lines that do not contain LISTENING so we are left with only listeners. The named capture groups will contain the protocol, local address, local port, and process id (pid). The syntax for a capture group is (?<Name>Expression). The variable $matches contains the information for the named captures, and it can be used later in the command in our output.

Next, we use Select-Object and calculated properties to clean up the output into a nice object. The calculated properties, also called custom columns, are created using a hashtable. A hashtable is specified by @{ key1=value1; key2=value2; ... }. The hashtable for a calculated property uses the Name and Expression keys. Our first three custom columns are just our named captures from the regular expression. The remaining two columns require a bit more work. Inside the property expression we use Get-Process to retrieve the details for a process and then select the property we want, name and path.

It does take a little more work to get the command into a nice object, but it does make it easy to export or pipe into other commands.

So there is all the Windows fu for the week. Hal, whatcha got?

Hal is sweating a bit in Oregon too:

I have to admit at first I was feeling pretty cocky about this one. "Oh gee, I have to parse the output of several commands and produce a nice report? <sarcasm>That's really tough for us Unix folks!</sarcasm>"

The easy part was pulling the basic information together. I'm going to use my little friend "lsof -i" to dump information about network sockets on the system, using the "-n" (show IPs, not hostnames) and "-P" (show port numbers, not port names) options. A little awk fu will get us the PID, protocol, address, and port information for just the processes that are in "LISTEN" state:

# lsof -nP -i | awk '/LISTEN/ {print $2 " " $7 " " $8}'
...
4107 TCP *:902
4219 TCP *:8903
4219 TCP *:8902
...
18877 TCP 127.0.0.1:53
18877 TCP 10.66.1.2:53
18877 TCP 172.17.18.1:53
18877 TCP 172.17.17.1:53
18877 TCP 127.0.0.1:953
18877 TCP [::1]:953
...

I've edited the output here a bit in the interests of space, but I've left in a few representative entries that will turn out to be interesting in various ways.

Our first issue is splitting the port numbers from the IP addresses. As Tim points out, IPv6 addressing makes this a little more difficult than just splitting on colons. I decided to opt for a sed solution:

# lsof -nP -i | awk '/LISTEN/ {print $2 " " $7 " " $8}' | sed -r 's/:([0-9]+)$/ \1/'
...
4107 TCP * 902
4219 TCP * 8903
4219 TCP * 8902
...
18877 TCP 127.0.0.1 53
18877 TCP 10.66.1.2 53
18877 TCP 172.17.18.1 53
18877 TCP 172.17.17.1 53
18877 TCP 127.0.0.1 953
18877 TCP [::1] 953
...

Here my sed expression is matching the last "colon followed by some digits" at the end of the line and replacing that with a space followed by those digits. This effectively removes the colon and inserts a space. A little ugly, but I'm not working up a sweat so far.
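
To see why anchoring at the end of the line matters, here is the same expression run against canned input. It is a sketch only: split_port is a hypothetical wrapper name, and I've used -E, which current GNU and BSD sed accept as the equivalent of -r.

```shell
# Hypothetical wrapper around the sed expression from the pipeline.
# -E enables extended regular expressions, like -r in GNU sed.
split_port() {
    sed -E 's/:([0-9]+)$/ \1/'
}

printf '%s\n' '4107 TCP *:902' '18877 TCP [::1]:953' | split_port
# prints:
# 4107 TCP * 902
# 18877 TCP [::1] 953
```

Only the final colon is replaced, so the colons inside the bracketed IPv6 address survive.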

The next trick is getting the executable path. Unfortunately, this is where everything goes pear-shaped. My little friend lsof only outputs the base name of the command, and will even truncate the command name if it exceeds 9 characters, so that's no help. But then I recalled that the /proc file system contains the information we need:

# readlink /proc/4219/exe
/usr/lib/vmware/bin/vmware-hostd

The /proc file system features a /proc/<pid>/exe symlink that points to the executable file. But guess what? This is only a feature of the Linux /proc file system. Unfortunately, other Unix operating systems (e.g. Solaris) may not have this link. So I needed to come up with something more portable.

When in doubt, dip back into the lsof bag of tricks:

# lsof -a -p 4219 -d txt
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
vmware-ho 4219 root txt REG 253,2 49355280 230822 /usr/lib/vmware/bin/vmware-hostd

Here I'm using lsof to dump the files related to the "text segment" ("-d txt") for PID 4219 ("-p 4219"). The "-a" option does a logical "and" of the two conditions rather than "or" which is (rather oddly, IMHO) the default for lsof.

As you can see, on my Linux system the output is a header line plus a line that describes the executable. On other Unix architectures, however, you may also get a bunch of additional lines that describe all of the shared libraries required by the executable. The good news is that the actual executable is always listed first. So the next trick is to extract the last field from the first line after the header:

# lsof -a -p 4219 -d txt | awk '/txt/ {print $NF}' | head -1
/usr/lib/vmware/bin/vmware-hostd

Here I'm matching on the string "txt" in the non-header lines and dumping the last field with $NF. I then use head to make sure I only get the first non-header line just in case there are multiple lines of output.

Looking good so far, but check out this interesting example:

# lsof -a -p 4107 -d txt | awk '/txt/ {print $NF}' | head -1
(deleted)
# lsof -a -p 4107 -d txt
COMMAND ... NAME
vmware-au ... /usr/sbin/vmware-authdlauncher.#prelink#.gvYLje (deleted)

Here I've edited out the middle columns of output from the second command so you can more clearly see what's going on. Our hero VMware is running an executable that was subsequently deleted. Because our first command uses $NF to dump out the last field delimited by whitespace, we just get the "(deleted)" bit. The work-around is to explicitly dump the 9th column (the executable path) and then the 10th column (the "deleted" marker) if it exists:

# lsof -a -p 4107 -d txt | awk '/txt/ {print $9 " " $10}' | head -1
/usr/sbin/vmware-authdlauncher.#prelink#.gvYLje (deleted)
# lsof -a -p 4219 -d txt | awk '/txt/ {print $9 " " $10}' | head -1
/usr/lib/vmware/bin/vmware-hostd
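
A slightly more general variant would print everything from the 9th column onward, which also copes with paths that contain spaces. This is a sketch rather than what I actually ran: txt_path is a hypothetical name, it assumes the nine-column lsof layout shown above, and the second sample line is made up for illustration.

```shell
# Hypothetical filter: reads "lsof -a -p PID -d txt" output on stdin and
# prints the NAME column (field 9 onward) of the first txt line.
txt_path() {
    awk '$4 == "txt" {
        path = $9
        for (i = 10; i <= NF; i++) path = path " " $i
        print path
        exit    # only the first match, like head -1
    }'
}

printf '%s\n' \
    'COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME' \
    'vmware-ho 4219 root txt REG 253,2 49355280 230822 /usr/lib/vmware/bin/vmware-hostd' |
    txt_path    # prints /usr/lib/vmware/bin/vmware-hostd

# Made-up line with a space in the path and a deleted marker.
echo 'demo 1234 root txt REG 8,1 1024 42 /tmp/demo binary (deleted)' | txt_path
# prints /tmp/demo binary (deleted)
```

Matching on the FD column instead of the string "txt" anywhere in the line also avoids false hits on paths that happen to contain "txt".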

Whew! With me so far? We're in the home stretch now. All we have to do is take our initial lsof pipeline that outputs PID, protocol, IP, and port and combine that with our hack to recover the executable names:

# lsof -nP -i | awk '/LISTEN/ {print $2 " " $7 " " $8}' | sed -r 's/:([0-9]+)$/ \1/' | \
while read pid rest; do
echo "$rest" `lsof -a -p $pid -d txt | awk '/txt/ {print $9 " " $10}' | head -1`;
done

...
TCP * 902 /usr/sbin/vmware-authdlauncher.#prelink#.gvYLje (deleted)
TCP * 8903 /usr/lib/vmware/bin/vmware-hostd
TCP * 8902 /usr/lib/vmware/bin/vmware-hostd
...
TCP 127.0.0.1 53 /usr/local/depot/bind/9.6.1-P1/sbin/named
TCP 10.66.1.2 53 /usr/local/depot/bind/9.6.1-P1/sbin/named
TCP 172.17.18.1 53 /usr/local/depot/bind/9.6.1-P1/sbin/named
TCP 172.17.17.1 53 /usr/local/depot/bind/9.6.1-P1/sbin/named
TCP 127.0.0.1 953 /usr/local/depot/bind/9.6.1-P1/sbin/named
TCP [::1] 953 /usr/local/depot/bind/9.6.1-P1/sbin/named
...

This looks pretty fugly, but it's actually quite simple. We're using a while loop to read the output of our first lsof command line-by-line. We pull the PID out of the first field of each line and then save the rest in $rest. The only statement inside the while loop simply echoes $rest followed by the executable path name extracted by our crazy lsof concoction.

Alert readers may note that my echo statement includes quotes around $rest. Why did I do that? Well, remember that in many cases in our output the IP address appears as "*". If we just did "echo $rest" without quotes around $rest, then the "*" would be expanded as a shell glob and we'd end up echoing the names of the files in whatever directory we were in when we ran the command. This is definitely not what we want!
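
If you want to see the difference for yourself, a scratch directory makes it obvious. This is just a demonstration sketch: glob_demo and the file names alpha and beta are made up.

```shell
# Hypothetical demo: run in a subshell inside a fresh scratch directory so
# the glob has something predictable to match.
glob_demo() {
    (
        cd "$(mktemp -d)" || exit 1
        touch alpha beta
        rest='TCP * 902'
        echo $rest      # unquoted: the shell expands * to the file names
        echo "$rest"    # quoted: the * is passed through untouched
    )
}

glob_demo
# prints:
# TCP alpha beta 902
# TCP * 902
```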

I can't say that I'm overly happy with the amount of code I needed to sling around to solve this week's puzzle. The Linux-specific solution that uses readlink is much cleaner, but I'll leave that one as an exercise to the reader.
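
For readers who want a head start on that exercise, here is roughly what the Linux-only version might look like. Consider it a sketch under assumptions: exe_path and listeners_with_paths are hypothetical names, the lsof column numbers are the ones from my system above, and nothing here will work where /proc/<pid>/exe doesn't exist.

```shell
# Hypothetical helper: resolve a PID to its executable via the Linux-only
# /proc/<pid>/exe symlink. Prints nothing if the link cannot be read.
exe_path() {
    readlink "/proc/$1/exe" 2>/dev/null
}

# Hypothetical wrapper: same front end as the pipeline above, with the
# inner lsof call replaced by exe_path.
listeners_with_paths() {
    lsof -nP -i | awk '/LISTEN/ {print $2 " " $7 " " $8}' |
        sed -E 's/:([0-9]+)$/ \1/' |
        while read -r pid rest; do
            echo "$rest" "$(exe_path "$pid")"
        done
}
```

Run listeners_with_paths as root to see every listener. Unprivileged users generally can't read the exe link for other users' processes, so those paths will come back empty.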