Tuesday, October 27, 2009

Episode #66: Log Jam

Tim logs in:

This episode we take a look at logs, the window to the soul of your computer. Ok, maybe not, but we'll still look at them anyway. Windows has different ways to view the Event Log via the command line depending on the version.

In Windows 2003, XP, and older versions the classic event logs were Application, System, and Security. Beginning with Vista, the entire event log system changed. There is now a plethora of new logs in addition to the classic logs. To fully discuss the differences we would need an entire white paper, so we will stick with getting the familiar information from the event logs. To make it even more fun, since Windows 7 and Server 2008 R2 are officially out, we now have PowerShell Version 2 with its additional cmdlets and another way to access the Event Log.

We will kick off this week with PowerShell and its methods for retrieving information from the Event Log. In v1, the only option for viewing the event log was Get-EventLog. It would "get" the standard event logs such as System or Application. In v2 we have Get-WinEvent, which allows retrieval of a wider range of logs. It also allows us to work with saved event log files (.evtx) which is a great new feature! With Vista and beyond, Get-WinEvent is the recommended method, but we will describe both cmdlets since there are a lot of XP and Windows 2003 machines in addition to Vista and Windows 2008 (R1) machines without PowerShell v2.

Here is a demonstration that there is a difference in the number of logs accessible by the powershell commands.

PS C:\> (Get-WinEvent -ListLog *).Count
164
PS C:\> (Get-EventLog -List).Count
12


Whoa, there are a lot more logs! We don't have the space to go over them all, so let's look at a practical example of searching the logs for specific events, such as an account lockout. I have a cheat sheet of Event IDs at my desk, but I'm not there right now so let's find any event containing the word "locked".

PS C:\> Get-WinEvent -logname security | ? {$_.Message -like "*locked*"} | fl
TimeCreated : 10/24/2009 8:57:20 AM
ProviderName : Microsoft-Windows-Security-Auditing
Id : 4740
Message : A user account was locked out.
Subject:
Security ID: S-1-5-19
Account Name: MYWIN7BOX$
Account Domain: MYDOMAIN
Logon ID: 0x3e7

Account That Was Locked Out:
Security ID: S-1-5-21-1111111111-2222222222-3333333333-4000
Account Name: mrjones

Additional Information:
Caller Computer Name: MYWIN7BOX

PS C:\> Get-EventLog -LogName security -Message *locked* | fl
Index : 3533232
EntryType : SuccessAudit
InstanceId : 4740
Message : A user account was locked out.

Subject:
Security ID: S-1-5-19
Account Name: MYWIN7BOX$
Account Domain: MYDOMAIN
Logon ID: 0x3e7

Account That Was Locked Out:
Security ID: S-1-5-21-1111111111-2222222222-3333333333-4000
Account Name: mrjones

Additional Information:
Caller Computer Name: MYWIN7BOX
Category : (13824)
CategoryNumber : 13824
ReplacementStrings : {test, MYWIN7BOX, S-1-5-21-1111111111-2222222222-33...}
Source : Microsoft-Windows-Security-Auditing
TimeGenerated : 10/24/2009 8:57:20 AM
TimeWritten : 10/24/2009 8:57:20 AM


You'll notice that Get-EventLog has a parameter that allows searching for a specific string in the message while Get-WinEvent does not. Oh well, you can't win them all. Now we know the Event ID we are looking for, so we can use that for the search. We pipe the command into fl, which is the alias for Format-List, so that it is easier to read. To save space going forward we'll let PowerShell use its default format (Format-Table).


PS C:\> Get-EventLog Security | ? { $_.EventId -eq 4740}
Index Time EntryType Source InstanceID Message
----- ---- --------- ------ ---------- -------
3533232 Oct 24 08:57 SuccessA... Microsoft-Windows... 4740 A user ...

PS C:\> Get-WinEvent -FilterHashtable @{LogName="Security"; ID=4740}
TimeCreated ProviderName Id Message
----------- ------------ -- -------
10/24/2009 8:57:... Microsoft-Window... 4740 A user account w...


Using the FilterHashtable parameter available with Get-WinEvent allows us to do some filtering within the command instead of further down the pipeline, which makes it much much faster. Here is the time difference:

PS C:\> measure-command {Get-EventLog Security | ? { $_.EventId -eq 4740}} |
select TotalMilliseconds

TotalMilliseconds : 1898.7783

PS C:\> measure-command {Get-WinEvent -logname security | ? {$_.ID -eq 4740}} |
select TotalMilliseconds

TotalMilliseconds : 24219.317

PS C:\> measure-command {Get-WinEvent -FilterHashtable @{LogName="Security"; ID=4740}} |
select TotalMilliseconds

TotalMilliseconds : 61.8189


The keys typically used for filtering are LogName, ID, StartTime, EndTime, and UserID (SID). You can see a full list in the help page for that parameter (Get-Help Get-WinEvent -Parameter FilterHashtable). The help page has a decent description of the parameter and its usage. The examples (Get-Help Get-WinEvent -Examples) are better at describing how it works.

Now we know there was an account locked at 8:57. To see what happened within a minute of that event the Before and After parameters are used to narrow the scope.

PS C:\> Get-EventLog -logname security -After 8:56 -Before 8:58
Index Time EntryType Source InstanceID Message
----- ---- --------- ------ ---------- -------
3533474 Oct 24 08:58 SuccessA... Microsoft-Windows... 5447 A Windo...
3533473 Oct 24 08:58 SuccessA... Microsoft-Windows... 5447 A Windo...
...

PS C:\> Get-WinEvent -FilterHashtable @{ logname="Security"; StartTime = "08:56";
EndTime = "08:58" }

TimeCreated ProviderName Id Message
----------- ------------ -- -------
10/24/2009 08:58... Microsoft-Window... 5447 A Windows Filter...
10/24/2009 08:58... Microsoft-Window... 5447 A Windows Filter...
...



Now let's try to find some other events such as service started (7036) or stopped (7035), event log service started (6006) or stopped (6005), or system shutdown (1074). These are all in the System Event Log and we can find these with one big command.

PS C:\> Get-EventLog -logname system | ? { $_.EventID -eq 7036 -or
$_.EventID -eq 7035 -or $_.EventID -eq 6006 -or $_.EventID -eq 6005
-or $_.EventID -eq 1074 }


PS C:\> Get-WinEvent -FilterHashtable @{LogName="System";
ID=7035,7036,6005,6006,1074}


In this example the Get-WinEvent command is much faster, my tests showed it to be 10x faster.

Getting context between event logs is always handy. Unlike Get-Eventlog, which limits us to viewing only one log, Get-WinEvent we can look across all the logs.

PS C:\> Get-WinEvent -FilterHashtable @{LogName="System"
ID=7035,7036,6005,6006,1074},@{LogName="Security"; ID=4740} -Oldest


We can use an array of hash tables for filtering to get all the account lockouts and all of the interesting events from the system log. These filters are a union of the results, not an intersection (adds not subtracts).

Both commands also support the -ComputerName option will allows us to interrogate another system. Get-WinEvent allows us to use another set of credentials to connect.
PS C:\> Get-EventLog -ComputerName Ed
...
PS C:\> $cred = Get-Credential
PS C:\> Get-WinEvent -ComputerName Ed -Credential $cred


The Get-Credential cmdlet will popup a prompt to ask you for your credentials. The credentials are stored in a variable then used to connect to Ed's machine.

So that is how we do it in powershell, but what about the accessing the "new" event log with regular old windows command line?

Windows Command Line - Vista, 2008 and 7

Since Vista there is a new sheriff in town for dealing with Event Log, wevtutil.exe. Too bad this is a bad sheriff. In episode 15, Ed stated, "The wevtutil query syntax is impossibly complex, and something I frankly loath." I completely agree. Everything about this command is sideways, even the default format isn't readable. I highly suggest using PowerShell since anything but the most basic query gets ugly. However, this command gives us a lot of control over the event log besides querying such as enumerating logs, getting or setting log configuration, getting log status, exporting, archiving, and clearing logs. Querying is the goal of this episode, so here is the syntax with the most common options:

wevtutil { qe | query-events } <LogName> /q:<XPathQuery>
/c:<# of events to return> /f:[XML|Text|RenderedXml]


Let's use the command to view the System log.

C:> wevtutil qe System
<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System><Pr
ovider Name='Service Control Manager' Guid='{555908d1-a6d7-4695-8e1e-26931d2012f
4}' EventSourceName='Service Control Manager'/><EventID Qualifiers='49152'>7000<
/EventID><Version>0</Version><Level>2</Level><Task>0</Task><Opcode>0</Opcode><Ke
ywords>0x8080000000000000</Keywords><TimeCreated SystemTime='2009-10-23T19:07:02
.954026500Z'/><EventRecordID>433362</EventRecordID><Correlation/><Execution Proc
essID='460' ThreadID='5740'/><Channel>System</Channel><Computer>mycomputer.mydom
ain.locl</Computer><Security/></System><EventData><Data Name='param1'>Diagnostic
Service Host</Data><Data Name='param2'>%%1297</Data></EventData></Event>
<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System><Pr
...


Uh oh, by default the output is in XML, which means the /f:Text (or /format:Text) option is something we are going to use a lot.

C:> wevtutil qe System
Event[0]:
Log Name: System
Source: Service Control Manager
Date: 2009-10-23T14:07:02.954
Event ID: 7000
Task: N/A
Level: Error
Opcode: N/A
Keyword: Classic
User: N/A
User Name: N/A
Computer: mywin7box
Description:
...


To search the logs you need to use an XPath query. As you remember from above we were trying to find a locked account event, here is the equivalent search using wevtutil.

C:\> wevtutil qe Security /q:*[System[(EventID=4740)]] /f:text

Event[0]:
Log Name: Security
Source: Microsoft-Windows-Security-Auditing
Date: 2009-10-24T14:08:59.279
Event ID: 4740
Task: User Account Management
Level: Information
...

In the powershell section there were five different Event IDs we were interested in, and wevtutil can do a similar search.

C:\> wevtutil qe System /q:"*[System[(EventID=7035 or EventID=7036 or
EventID=6005 or EventID=6006 or EventID=1074)]]" /f:text | find "Event ID"

Event ID: 7036
Event ID: 7036
Event ID: 1074
Event ID: 6006
Event ID: 7036
...


Unfortunately, there isn't a nice way to get results from two logs and combine them. Even worse, querying for a specific time range isn't pretty.

C:\> wevtutil.exe qe System /q:"*[System[(EventID=1074 and
TimeCreated[@SystemTime > '2009-10-23T21:04:15.000000000Z'])]]" /f:text


Enough of that nasty command, what do the rest of you guys have?

Ed kicks it Old-Skool:

On XP and 2003 boxen, we don't have wevtutil.exe... thank goodness. Instead, we have the really nice built-in VB script (Dear Mr. Editor... that's not a typo... I really did mean "really nice VB script." Please don't remove.) called eventquery.vbs. By default, eventquery.vbs runs locally, although you can provide it with a "/s " and it'll pull logs off of remote Windows boxen, provided that you have administrative SMB access of said machines. Of course, you should avoid logging in directly to a console or terminal session with admin privs, and instead use "/u \ /p [password]" to specify some user other than your current logon.

We can then specify which event log we're interested in with the /l option followed by the log name, which could include "application", "system", "security", "DNS Server", or user-defined log names. To hit all logs, specify a log of "/l *". To specify multiple logs, you can re-use /l multiple times.

The real magick with eventquery.vbs is its filter language, specified with /fi followed by a filter in quotes. Our filters are built of an event attribute name, followed by an operator, followed by a value. Attribute names include Datetime, Type, (event) ID, User, Computer, Source, and Category. We then have operators, such as eq (equals), ne (not equals), gt (I don't have to keep spelling them out, do I?), lt (of course not), le (this really isn't cute anymore), and ge (so I'll stop doing it). To specify more complex filters, just put multiple /fi "[filters]", one after the other, and the system will AND them all together.

For output, we specify /fo (which stands for "format output") followed by list, table, or csv. The default is table, but I find csv to be the most useful when I want to do more in-depth analysis at the command-line or in a spreadsheet. For quick looks, though table is prolly best.

You can list the most recent logs with the /r option followed by an integer N. If N is negative, it'll list the N oldest events. You can specify a range of integers to get the matching set of events in the logs, but I don't find that all that useful.

By default, the output of eventquery.vbs is really skimpy, showing only the Type, Event ID, Date Time, Source, and ComputerName of each log. To get more data (including descriptions and associated accounts), we can specify verbose mode, with a handy /v. That /v is hugely important, because, without it, eventquery.vbs doesn't include a description or many other really important details. I almost always use /v with this command.

So, let's apply our handy little eventquery.vbs to match Tim's searches above.

Let's start by looking for all events associated with the word "locked":

C:\> eventquery.vbs /L security /fo csv /v | find /i "locked"
"Audit Success","644","10/26/2009 5:26:48 AM","Security","WILMA","Account Manage
ment","NT AUTHORITY\SYSTEM","User Account Locked Out: Target Account Name:
bob Target Account ID: WILMA\bob Caller Machine Name: WILMA
Caller User Name: WILMA$ Caller Domain: WORKGROUP Caller L
ogon ID: (0x0,0x3E7)"
Note that I did my find to look for the string "locked" in a case-insensitive fashion with the /i. Also, note that the Event ID here is 644.

I can search based on that Event ID by specifying a filter (with /fi):

C:\> eventquery.vbs /L security /fo csv /v /fi "id eq 644"
Microsoft (R) Windows Script Host Version 5.6
Copyright (C) Microsoft Corporation 1996-2001. All rights reserved.

"Type","Event","Date Time","Source","ComputerName","Category","User","Descriptio
n"
"Audit Success","644","10/26/2009 5:26:48 AM","Security","WILMA","Account Manage
ment","NT AUTHORITY\SYSTEM","User Account Locked Out: Target Account Name:
bob Target Account ID: WILMA\bob Caller Machine Name: WILMA
Caller User Name: WILMA$ Caller Domain: WORKGROUP Caller L
ogon ID: (0x0,0x3E7)"

Like Tim, let's see what happened from a minute before up to a minute after this event occurred:

C:\> eventquery.vbs /L security /fo csv /v /fi "datetime gt 10/26/2009,05:25:48AM"
/fi "datetime lt 10/26/2009,05:27:48AM"
Microsoft (R) Windows Script Host Version 5.6
Copyright (C) Microsoft Corporation 1996-2001. All rights reserved.

"Type","Event","Date Time","Source","ComputerName","Category","User","Descriptio
n"
"Audit Success","644","10/26/2009 5:26:48 AM","Security","WILMA","Account Manage
ment","NT AUTHORITY\SYSTEM","User Account Locked Out: Target Account Name:
bob Target Account ID: WILMA\bob Caller Machine Name: WILMA
Caller User Name: WILMA$ Caller Domain: WORKGROUP Caller L
ogon ID: (0x0,0x3E7)"
"Audit Failure","680","10/26/2009 5:26:47 AM","Security","WILMA","Account Logon"
,"NT AUTHORITY\SYSTEM","Logon attempt by: MICROSOFT_AUTHENTICATION_PACKAGE
_V1_0 Logon account: bob Source Workstation: WILMA Error Code:
0xC000006A"
"Audit Failure","680","10/26/2009 5:26:45 AM","Security","WILMA","Account Logon"
,"NT AUTHORITY\SYSTEM","Logon attempt by: MICROSOFT_AUTHENTICATION_PACKAGE
_V1_0 Logon account: bob Source Workstation: WILMA Error Code:
0xC000006A"
"Audit Success","517","10/26/2009 5:26:38 AM","Security","WILMA","System Event",
"NT AUTHORITY\SYSTEM","The audit log was cleared Primary User Name:
SYSTEM Primary Domain: NT AUTHORITY Primary Logon ID: (0x0,0x3
E7) Client User Name: Administrator Client Domain: WILMA Client L
ogon ID: (0x0,0x11E5A)"
"Audit Failure","680","10/26/2009 5:26:48 AM","Security","WILMA","Account Logon"
,"NT AUTHORITY\SYSTEM","Logon attempt by: MICROSOFT_AUTHENTICATION_PACKAGE
_V1_0 Logon account: bob Source Workstation: WILMA Error Code:
0xC000006A"
"Audit Failure","680","10/26/2009 5:26:50 AM","Security","WILMA","Account Logon"
,"NT AUTHORITY\SYSTEM","Logon attempt by: MICROSOFT_AUTHENTICATION_PACKAGE
_V1_0 Logon account: bob Source Workstation: WILMA Error Code:
0xC0000234"

See here how I've bundled together two filters to implement a time range. And we see... a bunch of failed logon attempts. Well, that makes sense. That's what locked out the account.

Continuing to mimic Tim's fu, here's how we can find service started events:

C:\> eventquery.vbs /l system /fi "id eq 7036" /v
Microsoft (R) Windows Script Host Version 5.6
Copyright (C) Microsoft Corporation 1996-2001. All rights reserved.


------------------------------------------------------------------------------
Listing the events in 'system' log of host 'WILMA'
------------------------------------------------------------------------------
Type Event Date Time Source ComputerName
Category User Description
------------- ------ ----------------------- ----------------- ---------------
--------------- -------------------- -----------
Information 7036 10/26/2009 5:50:58 AM Service Control M WILMA
None N/A The Task Scheduler service entered the runn
ing state.
Information 7036 10/26/2009 5:50:16 AM Service Control M WILMA
None N/A The Task Scheduler service entered the stop
ped state.

To get information about event ID 7035 (service stopped), 6006 (event log service started), 6005 (event log service stopped), or 1074 (system shutdown), just substitute in those Event IDs in this query. "But," you might say, "Tim did all of those in a single command, with PowerShell doing a logical OR between them." Yes, and if Tim ran with scissors or jumped off the Brooklyn Bridge, would you do that too? Well, unfortunately, while eventquery.vbs does have "AND" (intersection) capabilities in its filters, I haven't been able to get it to implement an "OR" (union) style filters. If I want to do that, I simply run the command multiple times with different filters, usually separated on a single command-line with &.

So there. :P


Hal gets cut off at the knees:

You know something? I suggested the topic for this Episode. What the heck was I thinking? There's no possible way I'm going to be able to cover all of the different log file locations and log file formats for every flavor of Unix. So I'm going to stick with Red Hat type operating systems (including RHEL, CentOS, and Fedora) and hope you all will get enough hints to figure this out for the particular forest of Unix you find yourself in. Ready? Let's do it!

As far as login type events go, one fruitful source of information is wherever your system puts its LOG_AUTH type logs. The tricky bit is that Linux systems use LOG_AUTHPRIV for this instead of LOG_AUTH. So, let's first check out syslog.conf and find out where these logs are going:

# grep auth /etc/syslog.conf
# Don't log private authentication messages!
*.info;mail.none;news.none;authpriv.none;cron.none /var/log/messages
# The authpriv file has restricted access.
authpriv.* /var/log/secure

So on this system, the authpriv.* stuff is ending up in /var/log/secure. Note that the line that talks about /var/log/messages above explicitly prevents authpriv logs from ending up in the messages file-- that's what the "authpriv.none" bit is about.

To go along with Tim and Ed's example of looking for account lockout events, I set up a test system and deliberately failed logging in to activate the account lockout feature. Let's try some obvious grep searches on /var/log/secure:

# grep lock /var/log/secure
# grep hal /var/log/secure
...
Oct 26 17:19:03 deer sshd[10960]: Failed password for hal from 192.168.1.1 port 45903 ssh2
Oct 26 17:19:07 deer sshd[10960]: Failed password for hal from 192.168.1.1 port 45903 ssh2
Oct 26 17:19:11 deer sshd[10960]: Failed password for hal from 192.168.1.1 port 45903 ssh2
Oct 26 17:19:11 deer sshd[10960]: PAM 2 more authentication failures; logname= uid=0 euid=0 tty=ssh ruser= rhost=associates.deer-run.com user=hal
Oct 26 17:19:18 deer sshd[10962]: pam_tally(sshd:auth): user hal (500) tally 4, deny 3
Oct 26 17:19:20 deer sshd[10962]: Failed password for hal from 192.168.1.1 port 48941 ssh2
Oct 26 17:20:17 deer sshd[10962]: pam_tally(sshd:auth): user hal (500) tally 5, deny 3

Yes, that's right, the account lockout messages in Linux don't actually contain the string "lock" anywhere in the message, which is more than a little annoying. Unless you're searching for the messages related to a specific user as we're doing here, you need to know to search for the string "pam_tally" to find the account lockout events. pam_tally being the Linux PAM module that handles account lockout on failure. Yes, this is extremely non-obvious. Sorry about that, nobody asked for my opinion when pam_tally was being developed.

By the way, I elided out some other login messages from /var/log/secure in the example above so that it would be easier to see what the failure messages looked like. But here is the rest of the grep output so that you can see some successful authentication logs:

# grep hal /var/log/secure
Oct 26 15:23:00 deer sshd[9508]: Accepted password for hal from 192.168.1.1 port 27691 ssh2
Oct 26 15:23:00 deer sshd[9508]: pam_unix(sshd:session): session opened for user hal by (uid=0)
Oct 26 15:27:07 deer sshd[9508]: pam_unix(sshd:session): session closed for user hal
Oct 26 15:52:27 deer sshd[10667]: Accepted password for hal from 192.168.1.1 port 20043 ssh2
Oct 26 15:52:27 deer sshd[10667]: pam_unix(sshd:session): session opened for user hal by (uid=0)
Oct 26 17:17:30 deer su: pam_unix(su:session): session opened for user root by hal(uid=500)
...

As you can see, grep-ing for "pam_unix" (yet another Linux PAM module) will get you not only log in and log out events, but even su attempts. But those logs don't show you the remote IP address that the user is connecting in from-- you'll need to look for the "Accepted password" lines for that. Are we having fun yet?

You may find it easier to just use the last command:

# last
hal pts/0 192.168.1.1 Mon Oct 26 15:52 still logged in
hal pts/0 192.168.1.1 Mon Oct 26 15:23 - 15:27 (00:04)
...

Of course last only shows you successful logins. On Linux systems, lastb will show you failed logins:

# lastb
hal ssh:notty 192.168.1.1 Mon Oct 26 17:19 - 17:19 (00:00)
hal ssh:notty 192.168.1.1 Mon Oct 26 17:19 - 17:19 (00:00)
hal ssh:notty 192.168.1.1 Mon Oct 26 17:19 - 17:19 (00:00)
hal ssh:notty 192.168.1.1 Mon Oct 26 17:19 - 17:19 (00:00)
hal ssh:notty 192.168.1.1 Mon Oct 19 10:56 - 10:56 (00:00)


Now let's talk about service start-up events. Actually, let's not. It turns out that there is no consistently logged record of when particular services are restarted on the system. Oh, the daemon itself may choose to log some start-up events, but there's no global system-level logging of these kind of events.

Finding the logs created by a given daemon presents another problem. Often Unix daemons will log to LOG_DAEMON, so you can look at /etc/syslog.conf and find out where these logs end up (hint: it's /var/log/messages on a typical Linux system). But there will always be oddball daemons like Apache with their own application-specific log files (/var/log/httpd/* on Red Hat). It's a mess.

Maybe the easiest way to figure out when a daemon was started is to just look at the output of ps:

# ps -ef | grep http
root 11253 1 0 18:05 ? 00:00:00 /usr/sbin/httpd
apache 11255 11253 0 18:05 ? 00:00:00 /usr/sbin/httpd
apache 11256 11253 0 18:05 ? 00:00:00 /usr/sbin/httpd
apache 11257 11253 0 18:05 ? 00:00:00 /usr/sbin/httpd
apache 11258 11253 0 18:05 ? 00:00:00 /usr/sbin/httpd
apache 11259 11253 0 18:05 ? 00:00:00 /usr/sbin/httpd
# ps -ef | grep sshd
root 5725 1 0 Jul27 ? 00:00:00 /usr/sbin/sshd
root 11032 5725 0 17:27 ? 00:00:00 sshd: hal [priv]
hal 11034 11032 0 17:27 ? 00:00:00 sshd: hal@pts/2

As you can see, the web server was started at 18:05 today. But all we know about the ssh server is that it was started sometime on Jul27.

At this point you're probably asking yourself, "Why is this all so difficult?" Historically, Unix has always left logging up to the discretion of the application developer. There isn't a single central auditing service (or consistent formatting requirements) for applications in general. If you're working on a system that enforces kernel-level auditing (e.g. enabling BSM under Solaris) then you can pull out these sorts of events from the kernel audit logs if you know what you're doing. But unfortunately, kernel-level auditing is an optional feature and there are still plenty of sites out there that don't enable it. So unfortunately we often just have to take what the app developers decide to give us, even when it's not very much.