Overview
Wget is a network utility for retrieving files from the Web using HTTP and FTP, two of
the most widely used Internet protocols. It works non-interactively, so it can keep
running in the background after you have logged off. The program supports recursive
retrieval of web-authoring pages as well as FTP sites. You can use Wget
to make mirrors of archives and home pages, or to travel the Web like a WWW robot.
Examples
The examples are divided into three sections for clarity. The first section is a
tutorial for beginners. The second explains some of the more complex program
features. The third contains advice for mirror administrators, as well as even more
complex features (that some would call perverted).
Say you want to download a URL. Just type:
wget http://foo.bar.com/
But what happens if the connection is slow and the file is large? The connection
will probably fail, possibly more than once, before the whole file is retrieved.
In this case, Wget keeps trying to get the file until it either retrieves the
whole of it or exceeds the default number of tries (20). It is easy to change the
number of tries to 45, to ensure that the whole file arrives safely:
wget --tries=45 http://foo.bar.com/jpg/flyweb.jpg
wget -t 45 -o log http://foo.bar.com/jpg/flyweb.jpg &
The ampersand at the end of the line makes Wget work in the background, and
'-o log' writes its progress to the file 'log' instead of the terminal. For an
unlimited number of retries, use '-t inf'.
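The background-and-log pattern can be wrapped in a small shell function. This is only a minimal sketch: the function name start_download and its argument order are hypothetical and not part of Wget. It assumes a POSIX shell and that wget is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of Wget): start a download in the
# background, log to a file, and report the background job's PID.
start_download() {
    url=$1
    log=$2
    tries=${3:-45}      # pass "inf" for an unlimited number of retries
    # Wget's messages go to the log file, so its stdout/stderr are discarded.
    wget -t "$tries" -o "$log" "$url" >/dev/null 2>&1 &
    echo "started wget (PID $!), logging to $log"
}

# Example call (needs network access), left commented out:
#   start_download http://foo.bar.com/jpg/flyweb.jpg log inf
#   tail -f log         # watch the progress; Ctrl-C stops watching only
```

Because the job is detached with '&', stopping 'tail' does not stop the download.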
FTP is just as simple to use. Wget takes care of the anonymous login and password:
wget ftp://foo.download.com/welcome.msg
ftp://foo.download.com/welcome.msg
=> 'welcome.msg'
Connecting to foo.download.com:21... connected!
Logging in as anonymous ... Logged in!
==> TYPE I ... done. ==> CWD not needed.
==> PORT ... done. ==> RETR welcome.msg ... done.
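To fetch a whole documentation tree recursively and quietly ('-q' suppresses Wget's
output), again with up to 45 tries per file:
wget -q --tries=45 -r \
http://download-east.oracle.com/otndoc/oracle9i/901_doc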
To read the list of URLs from a file, use '-i':
wget -i file
If you specify '-' as the file name, the URLs are read from
standard input.
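Reading URLs from standard input makes it possible to generate the list on the fly. The sketch below is illustrative only: the function name list_urls and the "partNN.tar.gz" naming scheme are assumptions, not something the original examples define.

```shell
#!/bin/sh
# Print one URL per line for a numbered series of files.
# The "partNN.tar.gz" naming scheme is purely illustrative.
list_urls() {
    base=$1
    first=$2
    last=$3
    i=$first
    while [ "$i" -le "$last" ]; do
        printf '%s/part%02d.tar.gz\n' "$base" "$i"
        i=$((i + 1))
    done
}

# Feed the generated list to Wget on standard input (needs network access):
#   list_urls http://foo.bar.com/pub 1 3 | wget -i -
```

Running list_urls on its own first is a cheap way to check the list before any download starts.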
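To create a mirror image of a site, with the same directory structure as the
original, using only one try per document and saving the log to 'gnulog':
wget -r -t1 http://foo.bar.com/ -o gnulog
To retrieve only the first layer of links from a page:
wget -r -l1 http://www.yahoo.com/
To retrieve the index page of a site and show the original server headers:
wget -S http://www.lycos.com/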
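Suppose you want to download all the GIFs from a directory on an HTTP server.
'wget http://host/dir/*.gif' does not work, because HTTP retrieval does not support
globbing. In that case, use:
wget -r -l1 --no-parent -A.gif http://host/dir/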
It is a bit of a kludge, but it works perfectly. '-r -l1' means to retrieve
recursively, with a maximum depth of 1. '--no-parent' means that references to the
parent directory are ignored, and '-A.gif' means to download only the GIF files.
'-A "*.gif"' would have worked too.
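The accept list given to '-A' may hold several comma-separated suffixes, so the same approach extends to more than one file type. The wrapper below is a minimal sketch; the function name fetch_by_suffix is hypothetical, and http://host/dir/ is the placeholder URL from the example above.

```shell
#!/bin/sh
# Hypothetical helper: fetch only files with the given suffixes from one
# directory level. Suffixes are passed as a comma-separated list, which is
# the form Wget's -A (accept list) option takes.
fetch_by_suffix() {
    suffixes=$1     # e.g. "gif" or "gif,jpg"
    url=$2
    wget -r -l1 --no-parent -A "$suffixes" "$url"
}

# Example call (needs network access), left commented out:
#   fetch_by_suffix gif,jpg http://host/dir/
```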
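Suppose you were in the middle of downloading when Wget was interrupted, and you do
not want to clobber the files already present. Use '-nc':
wget -nc -r http://foo.bar.com/
To supply your own username and password for FTP, encode them in the URL:
wget ftp://name:password@foo.bar.com/myfile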
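If you wish Wget to keep a mirror of a page (or FTP subdirectories), use '--mirror',
which is shorthand for '-r -N', that is, recursion plus time-stamping (current Wget
releases also set infinite recursion depth and keep FTP directory listings). You can
put Wget in the crontab file, asking it to recheck a site each Sunday: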
0 0 * * 0 wget --mirror ftp://x.y.z/pub -o /var/weeklog
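For unattended runs it can help to record whether each mirror job succeeded. The following is a sketch under stated assumptions: the function name run_mirror, the status-file path, and the script path mirror.sh are all hypothetical. It relies only on Wget exiting with status 0 on success and a non-zero status on failure.

```shell
#!/bin/sh
# Hypothetical cron wrapper: run the mirror job, then append a one-line
# summary (timestamp plus Wget's exit status) to a separate status file.
run_mirror() {
    url=$1
    log=$2
    status_file=$3
    rc=0
    wget --mirror "$url" -o "$log" || rc=$?
    printf '%s exit=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$rc" >> "$status_file"
    return "$rc"
}

# A script (hypothetical path /usr/local/bin/mirror.sh) would end with:
#   run_mirror ftp://x.y.z/pub /var/weeklog /var/weeklog.status
# and the crontab entry would call the script instead of wget directly:
#   0 0 * * 0 /usr/local/bin/mirror.sh
```

The status file then holds one line per weekly run, which is quicker to scan than the full Wget log.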
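You may wish to do the same with someone's home page, but without downloading all
the images. To mirror only the HTML files:
wget --mirror -A.html http://www.w3.org/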
The Wget sources and full documentation are available at the following links:
http://www.gnu.org/software/wget/wget.html
http://www.lns.cornell.edu/public/COMP/info/wget/wget_toc.html
http://www.interlog.com/~tcharron/wgetwin.html