I'm one of those messy desktop people. There I said it, I keep a messy desktop with tons of files all over the place (partly due to the fact that when you do
# time find / -name msfconsole
I actually stopped it at around 15 minutes because I couldn't wait that long. There are many factors in the performance equation of the above command, such as the overall speed of the system, how busy the system is when you execute the command, and some even say that find is slower if its checking across different file system types ("/" would also include mounted USB drives). A quicker way to find files is to use the locate command:
$ locate msfconsole | grep -v .svn
This command reads from a database (which is generated on a regular basis) that consists of a listing of files on the system. It's MUCH faster:
$ time locate msfconsole
I'm wondering what Ed's going to do on Windows, unless he's come up with a way to get an animated ASCII search companion dog. :)
One thing I will note about the locate command is that it's going to do sub-expression matching, whereas "find ... -name ..." will do an exact match against the file name. To see the difference, check out the following two commands:
# find / -name vmware
# locate vmware
[... 5000+ addtl lines of output not shown ...]
Also, as Paul notes above, the database used by the locate command is updated regularly via cron. The program that builds the database is updatedb, and you can run this by hand if you want to index and search your current file system image, not the image from last night.
I was curious whether doing a find from the root was faster than running updatedb followed by locate. Note that before running the timing tests below, I did a "find / -name vmware" to force everything into the file cache on my machine. Then I ran:
# time find / -name vmware >/dev/null
# time updatedb
# time locate vmware >/dev/null
It's interesting to me that updatedb+locate is twice as fast as doing the find. I guess this shouldn't really be that surprising, since find is going to end up calling stat(2) on every file whereas updatedb just has to collect file names.
In Windows, the dir command is often used to search for files with a given name. There are a variety of ways to do this. One of the most obvious but less efficient ways to do this involves running dir recursively (/s) scraping through its results with the find or findstr command to look for what we want. I'll use the findstr command here, because it gives us more extensibility if we want to match on regex:
C:\> dir /b /s c:\ | findstr /i vmwareThere are a couple of things here that may not be intuitive. First off, what's with the /b? This indicates that we want the bare form of output, which will omit the extra stuff dir adds to a directory listing, including the volume name, number of files in a directory, free bytes, etc. But, when used with the /s option to recurse subdirectories, /b takes on an additional meaning. It tells dir to show full paths to files, which is what we really want to see to know the file's location. Try running the command without /b, and you'll see that it doesn't show what we want. The /b makes it show what we want: the full path to the file so we know its location. Oh, and the /i makes findstr case insensitive.
But, you know, dumping all of the directory and file names on standard out and then scraping through them with findstr is incredibly inefficient. There is a better way, more analogous to the "find / -name" feature Paul and Hal use above:
C:\> dir /b /s c:\*vmware*
This command seems to imply that it will simply look inside of the c:\ directory itself for vmware, doesn't it? But, it will actually recurse that directory looking for matching names because of the /s. And, when it finds one, it will then display its full path because of the /b. I put *vmware* here to make this look for any file that has the string vmware in its name so that its functionality matches what we had earlier. If you omit the *'s, you'll only see files and directories whose name exactly matches vmware. This approach is significantly faster than piping things through the findstr command. Also note that it is automatically case insensitive, because, well, that's the way that dir rolls.
How much faster? I'm going to use Cygwin so I can get the time command for comparison. The $ prompt you see below is from Cygwin running on my XP box:
$ time cmd.exe /c "dir /s /b C:\ | findstr /i vmware > nul"
Now, let's try the other approach:
$ time cmd.exe /c "dir /s /b C:\*vmware* > nul"
It takes about half the time doing it this more efficient way. Oh, and note how I'm using the Cygwin time command here. I use time to invoke a cmd.exe with the /c option, which will make cmd.exe run a command for me and then go away when the command is done. Cygwin's time command will then show me how long the command took. I use time to invoke a cmd.exe /c rather than directly invoking a dir so that I can rely on the dir command built-into cmd.exe instead of running the dir command included in Cygwin.
OK... so we have a more efficient way of finding files than simply scraping through standard output of dir. But, what about an analogous construct to the locate command that Hal and Paul talk about above? Well, Windows 2000 and later include the the Indexing Service, designed to make searching for files more efficient by creating an index. You can invoke this service at the command line by running:
C:\> sc start cisvc
Windows will then dutifully index your hard drive, making searches faster. What kind of searches? Well, let's see what it does for our searches using dir:
$ time cmd.exe /c "dir /s /b C:\*vmware* > nul"
Uh-oh... The Windows indexing service doesn't help the dir command, whether used this way or in combination with the find command. Sorry, but dir doesn't consult the index, and instead just looks through the complete file system directory every time. But, the indexing service does improve the performance of the Start-->Search GUI based search. You can control which directories are included in the index via a GUI tool that can be accessed by running:
C:\> ciadv.mscAlso, in that GUI, if you select System-->Query the Catalog, you get a nice GUI form for entering a query that relies on the indexing service. I haven't found a built-in cmd.exe feature for searching directories faster using the indexing service, but there is an API for writing your own tools in VBS or other languages for quering the index. Microsoft describes that API and the indexing service in more detail here.