Wget

From LaurasWiki

On Wget

Directory Options

Directory-Options (http://www.gnu.org/software/wget/manual/html_node/Directory-Options.html#Directory-Options)

-nH
--no-host-directories
   Disable generation of host-prefixed directories. By default, invoking Wget with -r http://fly.srk.fer.hr/ 
   will create a structure of directories beginning with fly.srk.fer.hr/. This option disables such behavior. 
-x
--force-directories
   The opposite of -nd (--no-directories): create a hierarchy of directories, even if one would not have been created otherwise. 
   E.g. wget -x http://fly.srk.fer.hr/robots.txt will save the downloaded file to fly.srk.fer.hr/robots.txt.
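
As a rough illustration of how the two options above combine, the sketch below wraps them in a small shell function. The function name fetch_keep_path is hypothetical and not part of Wget. The assumption is that -x recreates the remote directory path locally while -nH drops the leading host-name directory.

```shell
# Hypothetical helper (not part of Wget): fetch one URL, keeping its
# remote directory path but omitting the host-name directory.
fetch_keep_path() {
    # -x  : force creation of a directory hierarchy, even for one file
    # -nH : do not create the host-prefixed top-level directory
    wget -x -nH "$1"
}

# Usage (nothing runs until the function is called):
#   fetch_keep_path http://fly.srk.fer.hr/robots.txt
```

Wrapping the call in a function keeps the flag combination in one place, so it can be reused for several URLs.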

Download Options

Download-Options (http://www.gnu.org/software/wget/manual/html_node/Download-Options.html#Download-Options)

-c 
--continue
   Continue getting a partially-downloaded file. 
-w seconds
--wait=seconds
   Wait the specified number of seconds between the retrievals. Use of this option is recommended, as 
   it lightens the server load by making the requests less frequent.
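
A minimal sketch of using -c and -w together follows. The function name polite_fetch and its argument order (wait time first, then URLs) are assumptions made for this example only.

```shell
# Hypothetical helper: resume partial downloads and pause between files.
# First argument: seconds to wait between retrievals.
# Remaining arguments: one or more URLs.
polite_fetch() {
    wait_seconds=$1
    shift
    # -c : continue partially-downloaded files
    # -w : wait the given number of seconds between retrievals
    wget -c -w "$wait_seconds" "$@"
}

# Usage:
#   polite_fetch 2 http://example.com/a.iso http://example.com/b.iso
```

The wait only matters when more than one file is retrieved in the same run, which is why the helper accepts several URLs.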

HTTP Options

HTTP-Options (http://www.gnu.org/software/wget/manual/html_node/HTTP-Options.html#HTTP-Options)

-E
--html-extension
   If a file of type application/xhtml+xml or text/html is downloaded and the URL does not end with 
   the regexp \.[Hh][Tt][Mm][Ll]?, this option will cause the suffix .html to be appended to the local 
   filename. This is useful, for instance, when you're mirroring a remote site that uses .asp pages, but 
   you want the mirrored pages to be viewable on your stock Apache server. Another good use for this 
   is when you're downloading CGI-generated materials. A URL like http://site.com/article.cgi?25 will 
   be saved as article.cgi?25.html.
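
The naming rule described above can be approximated in plain shell. This is only a sketch of the rule as stated here, not Wget's actual implementation. It assumes the content type has already been determined to be HTML, and the function name html_local_name is made up for illustration.

```shell
# Hypothetical sketch of the -E naming rule: take the last path component
# of a URL and append .html unless it already ends in .htm or .html
# (case-insensitive). URLs ending in "/" are not handled here.
html_local_name() {
    name=${1##*/}    # strip everything up to the last slash
    case "$name" in
        *.[Hh][Tt][Mm] | *.[Hh][Tt][Mm][Ll])
            printf '%s\n' "$name" ;;
        *)
            printf '%s\n' "$name.html" ;;
    esac
}

# Usage (quote the URL so the shell does not treat ? as a glob):
#   html_local_name 'http://site.com/article.cgi?25'
```

The two case patterns mirror the regexp \.[Hh][Tt][Mm][Ll]? anchored at the end of the name.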

Recursive Retrieval Options

Recursive-Retrieval-Options (http://www.gnu.org/software/wget/manual/html_node/Recursive-Retrieval-Options.html#Recursive-Retrieval-Options)

-k
--convert-links
   After the download is complete, convert the links in the document to make them suitable for local viewing. 
   This affects not only the visible hyperlinks, but any part of the document that links to external content, such 
   as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.
-K
--backup-converted
   When converting a file, back up the original version with a .orig suffix. Affects the behavior of -N 
   (see HTTP Time-Stamping Internals). 
-l depth
--level=depth
   Specify the maximum recursion depth as depth (see Recursive Download). The default maximum depth is 5.
-m
--mirror
   Turn on options suitable for mirroring. This option turns on recursion and time-stamping, sets infinite recursion 
   depth and keeps FTP directory listings. It is currently equivalent to -r -N -l inf --no-remove-listing.
-p
--page-requisites
   This option causes Wget to download all the files that are necessary to properly display a given HTML page. 
   This includes such things as inlined images, sounds, and referenced stylesheets.
-r
--recursive
   Turn on recursive retrieving. See Recursive Download for more details.
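
One plausible way to combine the recursive options above into a locally browsable copy is sketched below. The function name offline_copy and the optional second argument are assumptions for this example. The fallback depth of 5 simply repeats the default mentioned under -l.

```shell
# Hypothetical helper: recursive download prepared for offline viewing.
# First argument: start URL. Second argument (optional): maximum depth.
offline_copy() {
    url=$1
    depth=${2:-5}    # fall back to the documented default depth of 5
    # -r : recursive retrieval
    # -l : maximum recursion depth
    # -p : also fetch page requisites (images, stylesheets, ...)
    # -k : convert links for local viewing once the download is done
    # -K : keep a .orig backup of each converted file
    wget -r -l "$depth" -p -k -K "$url"
}

# Usage:
#   offline_copy http://example.com/docs/ 2
```

Passing the depth explicitly makes it harder to start an unexpectedly large crawl by accident.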

Directory-Based Limits

Directory-Based Limits (http://www.gnu.org/software/wget/manual/html_node/Directory_002dBased-Limits.html#Directory_002dBased-Limits)

-np
--no-parent
no_parent = on (the equivalent .wgetrc setting)
   The simplest, and often very useful, way of limiting directories is to disallow retrieval of links that 
   refer to the hierarchy above the beginning directory, i.e. disallowing ascent to the parent 
   directory/directories.
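
A short sketch of mirroring a single subtree without climbing into parent directories follows. The function name mirror_subtree and the example URL are placeholders, and the flag combination is one reasonable choice rather than a recommended recipe.

```shell
# Hypothetical helper: mirror one directory of a site and stay inside it.
mirror_subtree() {
    # -m  : mirroring options (recursion, time-stamping, infinite depth)
    # -np : never ascend to the parent directory
    # -p  : also fetch page requisites
    # -k  : convert links for local viewing
    wget -m -np -p -k "$1"
}

# Usage (the trailing slash marks the URL as a directory):
#   mirror_subtree http://example.com/docs/manual/
```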