Friday, May 22, 2009

Episode #39: Replacing Strings in Multiple Files

Hal Starts Off:

Wow, our last several Episodes have been really long! So I thought I'd give everybody a break and just show you a cool little sed idiom that I use all the time:

# sed -i.bak 's/foo/bar/g' *

Here we're telling sed to replace all instances of the string "foo" with the string "bar" in all files in the current directory. The useful trick is the "-i.bak" option, which causes sed to make an automatic backup copy of each file as <filename>.bak before doing the global search and replace.
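
To see the backup behavior for yourself, here is a minimal sketch that runs the idiom in a scratch directory. The file name sample.txt and the demo_dir variable are made up for illustration, and it assumes a sed that accepts -i with an attached suffix (GNU and BSD sed both do):

```shell
# Work in a throwaway directory so no real files are touched.
demo_dir=$(mktemp -d)
printf 'foo and more foo\n' > "$demo_dir/sample.txt"

# Edit in place, keeping the original as sample.txt.bak.
sed -i.bak 's/foo/bar/g' "$demo_dir/sample.txt"

cat "$demo_dir/sample.txt"       # the edited file
cat "$demo_dir/sample.txt.bak"   # the untouched original
```

If the result isn't what you wanted, copying the .bak file back over the edited one undoes the change.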

By the way, you can even do this across an entire directory structure, with a little help from the find and xargs commands:

# find . -type f | xargs sed -i.bak 's/foo/bar/g'

Of course, you could use other search criteria than just "-type f" if you wanted to be more selective about which files you ran sed against.
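
For instance, a more selective variant might touch only .txt files and hand them to sed with find's own "-exec ... +" instead of xargs, which also copes with spaces in file names. This is only a sketch in a scratch directory; the tree_dir variable, the directory layout, and the file names are invented for the example:

```shell
# Build a small throwaway tree, including names with spaces in them.
tree_dir=$(mktemp -d)
mkdir -p "$tree_dir/sub dir"
printf 'foo\n' > "$tree_dir/top.txt"
printf 'foo foo\n' > "$tree_dir/sub dir/my notes.txt"
printf 'foo\n' > "$tree_dir/skip.log"

# Only regular files ending in .txt are edited; each gets a .bak copy.
find "$tree_dir" -type f -name '*.txt' -exec sed -i.bak 's/foo/bar/g' {} +

# List what is in the tree now: the .txt files, their backups, and skip.log.
find "$tree_dir" -type f | sort
```

Because the new backups end in .bak rather than .txt, they never match the -name test and so are not edited themselves.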

Oh dear, I hope this isn't one of those "easy for Unix, hard for Windows" things again. Ed gets so grumpy when I do that.

Ed jumps in:
You nailed it, Hal, with that characterization. Unfortunately, cmd.exe doesn't include the ability to do find and replace of strings within lines of a file using a built-in command. We can search for strings using the find command, and even process regex with findstr. But, the replacement part just doesn't exist there.

Thus, most reasonable people will either rely on a separately installed tool to do this, or use Powershell.

For a separately installed tool, my first approach would be to use Cygwin, the free Linux-like environment for Windows, and then just run the sed command Hal uses above. Nice, easy, and sensical.

Alternatively, you could download and install a tool called replace.exe.

Or, there's another one called Find And Replace Text, which, as you might guess, is called FART for short.

To do this in Powershell efficiently, I asked Tim Medin, our go-to guy for Powershell, to comment.

Tim (our Powershell Go-To Guy) says:
This morning when Ed asked me to do a "quick" write up for Powershell, I thought to myself, "This won't be too bad..." I was wrong.

By default there are aliases for many of the commands in Powershell, so I'll show both the long and short versions of the commands (yes, even the short command is long relative to sed).

The Long Way
PS C:\> Get-ChildItem -exclude *.bak | Where-Object {$_.Attributes -ne "Directory"} |
ForEach-Object { Copy-Item $_ "$($_).bak"; (Get-Content $_) -replace
"foo","bar" | Set-Content -path $_ }

The Short Way (using built in aliases)
PS C:\> gci -ex *.bak | ? {$_.Attributes -ne "Directory"} | % { cp $_ "$($_).bak";
(gc $_) -replace "foo","bar" | sc -path $_ }

This command is rather long, so let's go through it piece by piece.
gci -ex *.bak | ? {$_.Attributes -ne "Directory"}

The first portion gets all files that don't end in .bak. Without this exclusion, it will process file1.txt and the new file1.txt.bak. Processing file1.txt.bak results in file1.txt.bak.bak, but it doesn't do this endlessly, just twice.

The Where-Object (with an alias of ?) ensures that we only work with files and not directories because Get-Content on a directory throws an error.

ForEach-Object { Copy-Item $_ "$($_).bak"; (Get-Content $_) -replace "foo","bar" |
Set-Content -path $_ }
Once we have the files we want (and not directories), we act on each file with ForEach-Object (alias %). For those of you who haven't yet fallen asleep, I'll further break down the inner portion of the ForEach-Object:

Copy-Item $_ "$($_).bak"
First, we copy the file to our backup .bak file. We have to use the $() in order to use our variable in a string so we can append .bak.

Finally, we get to the search and replace (and it's about time, too!).
(Get-Content $_) -replace "foo","bar" | Set-Content -path $_

Get-Content (gc) gets the contents of the file. We wrap it in parentheses so we can act on its output in order to do our replace. The output is then piped to Set-Content (sc) and written back to our file.

We could make this work a little better if we used variables, but then we are more in script-land than shell-land, which probably violates the almighty laws of this blog (OK, we may already be there). For kicks, I'll show you how we can use variables so you can add it to your big bloated belt of Windows-fu.
$a = (gci | ? {$_.Attributes -ne "Directory"}); $a | % { cp $_ "$($_).bak";
(gc $_) -replace "foo","bar" | sc -path $_ }

The difference between our original command and this command is that the $a variable grabs a snapshot of the directory before we copy files, so we won't operate on the new .bak files.

After all this work we have done the same thing as the mighty sed. Sadly, even the power of Powershell is no match for the efficiency of sed.

Ed closes it out:
Thanks for that, Tim. Nice stuff!