Lately we've had some of our loyal readers-- mostly the Windows folk-- asking about command-line tools for accessing web pages. When these questions come up, I just smile serenely, because it's easy to do this in Unix. Ed and Tim on the other hand turn a little green and start sweating.
Among the Unix tribes, there seem to be two popular command-line tools for grabbing web pages. Some folks prefer curl, but I generally stick with wget since it's the tool I learned first. Both curl and wget support HTTP, HTTPS, and FTP as well as proxies of various types along with different forms of authentication. curl also supports other protocols like TFTP, FTPS (FTP over SSL), and SCP and SFTP.
Using wget couldn't be simpler:
$ wget blog.commandlinekungfu.com
--2009-11-18 18:47:51-- http://blog.commandlinekungfu.com/
Resolving blog.commandlinekungfu.com... 184.108.40.206
Connecting to blog.commandlinekungfu.com|220.127.116.11|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `index.html'
[ <=> ] 173,805 18.3K/s in 9.3s
2009-11-18 18:48:01 (18.3 KB/s) - `index.html' saved 
Or, if you don't like all the noisy chatter, you can use the "-q" option:
$ wget -q blog.commandlinekungfu.com
Notice that wget doesn't overwrite the first index.html file we downloaded. Instead it appends a uniqe number to the file name. If we downloaded another index.html file, it would end up as index.html.2 and so on. You can also use "-O" to specify an output file name.
Maybe my favorite feature, however, is the "-r" (recursive) option that allows me to grab an entire tree of content in a single operation. So all I have to do when I want to back up the content here at Command Line Kung Fu is just run "wget -r -q blog.commandlinekungfu.com" and wait a few minutes for everything to download.
There are lots of other neat features to wget, but I think I've already demoralized my Windows brethren enough for one Episode. Let's see what Ed and Tim cook up.
Tim orders take-out:
Unfortunately (again), there isn't a built-in cmdlet to do the equivalent of wget or curl, but we can access the .NET libraries and recreate some of the functionality of the Linux commands. By default, the .NET Framework supports URIs that begin with http:, https:, ftp:, and file: scheme identifiers, so it isn't quite as full featured as Linux, but it is all we have.
PS C:\> (New-Object System.Net.WebClient).DownloadString(
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
This will grab files in text format and it can be used further down the pipeline, such as saving the file by piping it in to Out-File. What if we want to grab non-text files or just download the file? We can use the DownloadFile method and specify where we want to save the file.
PS C:\> (New-Object System.Net.WebClient).DownloadFile(
What happens if the file doesn't exist? It raises an 404 error and the file (obviously) isn't opened.
PS C:\> (New-Object System.Net.WebClient).DownloadString(
Exception calling "DownloadString" with "1" argument(s): "The remote
server returned an error: (404) Not Found."
At line:1 char:49
+ (New-Object System.Net.WebClient).DownloadString( <<<<
Sadly, there isn't way to perform recursive requests without doing some heavy duty scripting.
Ed Sits Here Trying Not To Think About What Hal Called His Part of this Article:
Ahhh... an age-old question. I remember about a year ago, I was having a discussion with Kevin Johnson, bon vivant and web attacker extraordinaire, when this issue came up. I posed the following question to KJ0 (as we call him): "Suppose you've just hacked a Windows box in a pen test. You have command shell access of said machine. Tell me what you want to do now, using only built in command-line capabilities." I went further, boldly mixing my metaphors: "Consider me as /dev/WindowsCommandLine... where do you want to go today?"
Kevin shrugged, smiled, and said, "Easy... wget."
I felt a piercing pain stab at my heart. "The wget tool has sooo many options... can we scale it back a bit? What do you really want to do?" Kevin said, "Sure... I just wanna fetch a web page and write it to the file system. How can I do that at the command line in Windows?"
I spent a little time fitzing around with various Windows commands, and ultimately settled on using the built-in Windows telnet client to formulate HTTP requests manually. It's crude, but works.
"But, Ed," you argue, "Microsoft removed the telnet client from Windows Vista and some of the 6 million versions of Windows 7." I respond: Yes, it's no longer installed by default, but on the professional versions of Vista and 7, you can run:
C:\> start /w pkgmgr /iu:"TelnetClient"
Even if it doesn't finish, kill that cmd.exe, and you should have the Telnet Client ready to run. Note that the installation package for the telnet client is built-in, but the tool itself just isn't installed. The machine doesn't have to reach out to the Internet to get it. Also, note that /iu stands for install update, you need that :, and you should observe the case of the Cap-T and Cap-C in TelnetClient. Oh, and by the way... if you want the Telnet service, change "TelnetClient" to "TelnetServer". To remove either after you've installed, it, run the same command, except substitute /uu: (uninstall update) for /iu:.
So, now equipped with the telnet client, let's make an HTTP request.
Now, you might think that we should simply echo an HTTP request and pipe it into the telnet client, right? Bzzzzzt. Sorry, my friend, but the Windows telnet client doesn't like anything to come into its Standard Input at the command line, and it certainly doesn't send it to the target machine. Try sniffing it sometime. Another annoying part about the Windows telnet client is that it doesn't like you to touch its Standard Output. The telnet client is somehow clairvoyant, and if you follow it with a > or a |, it knows and refuses to send any packets. Gee, thanks, Microsoft.
For this one, we're going to have to go interactive.
C:\> telnet -f log.txt
I'm logging output to the file log.txt, where our output will reside.
At our new telnet prompt, we can open a connection to port 80 on our destination web server. We'll use the "o" option to open a connection:
Microsoft Telnet> o www.google.com 80
Connecting To www.google.com...
Now, depending on the version of Windows you are using, it may or may not look like a connection is made. Either way, just trust that it has, and start typing your HTTP request. Typically, you'll be getting a page, which you can do by entering:
GET [pagename] HTTP/1.1
Hit Enter Enter at the end, and you'll have the page displayed on your screen. Hit Enter again, and you'll get back to your telnet prompt. Type "quit" to exit.
More importantly, you should now have your page in the output file log.txt. Yes, you'll have the HTTP response header in front of it, but that's not so bad. It's ugly, but it'll let you grab a file quickly from a server.