Making wget/cURL appear as a browser

When downloading files with wget, or mirroring websites with the -m option, it is quite common to see different content and request behaviour than a browser would get. This is typically because the web server and/or the site code detects wget for what it is – a robot, a non-PEBKAC-operated client.

Many sites assume you’re up to no good, or that you’re attempting to mirror their website, and choose to block your requests.

However, it’s actually pretty easy to make wget’s requests look like they’ve been generated by a browser.

  1. Disable robots.txt reading
  2. Modify the user agent string to something “legitimate”
  3. Load any default cookies you think you need, keep session cookies as they may come in handy at a later date
  4. Optionally set your referer string to something the website wants to see
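The four steps above can be sketched as a single wget invocation. The user-agent string, referer, and URLs below are placeholder values for illustration – substitute a real browser string and whatever the target site expects:

```shell
#!/usr/bin/env bash
# Placeholder user agent -- swap in one copied from a real browser.
UA="Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"

# Build the options as an array, one entry per step:
WGET_OPTS=(
  -e robots=off                     # step 1: don't read robots.txt
  --user-agent="$UA"                # step 2: "legitimate" user agent
  --load-cookies ~/.cookies.txt     # step 3: load default cookies...
  --save-cookies ~/.cookies.txt     # ...and persist them
  --keep-session-cookies            # ...including session cookies
  --referer="https://example.com/"  # step 4: placeholder referer
)

# Actual use would be: wget "${WGET_OPTS[@]}" https://example.com/some/file
printf '%s\n' "${WGET_OPTS[@]}"
```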
To avoid having to find a valid browser string every time, I keep one in ~/.browser.txt, and in .bashrc I define aliases for wget and curl with some specific options useful when grabbing content from sites that don’t play nice.

if [ -f ~/.browser.txt ] ; then
  USERAGENT="$(cat ~/.browser.txt)"
else
  # Fallback if ~/.browser.txt doesn't exist
  USERAGENT="Mozilla/5.0 (X11; U; Linux i686; en-us) AppleWebKit/531.2+ (KHTML, like Gecko) Safari/531.2+ Epiphany/2.29.5"
fi

WGET_ARGS="-e robots=off --user-agent '${USERAGENT}'"
CURL_ARGS="--user-agent '${USERAGENT}'"
WGET_COOKIE_ARGS="--load-cookies ~/.cookies.txt --save-cookies ~/.cookies.txt --keep-session-cookies"
CURL_COOKIE_ARGS="--cookie ~/.cookies.txt --cookie-jar ~/.cookies.txt"

alias wget="wget ${WGET_ARGS}"
alias curl="curl ${CURL_ARGS}"
alias wgetcookies="wget ${WGET_ARGS} ${WGET_COOKIE_ARGS}"
alias curlcookies="curl ${CURL_ARGS} ${CURL_COOKIE_ARGS}"
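Once the snippet is sourced, plain wget and curl pick up the extra options transparently, and the *cookies variants add cookie persistence on top. You can confirm what an alias expands to without hitting the network; the user-agent value here is a dummy for the demo:

```shell
#!/usr/bin/env bash
shopt -s expand_aliases           # needed for alias expansion in scripts

USERAGENT="DummyAgent/1.0"        # dummy value for the demo
WGET_ARGS="-e robots=off --user-agent '${USERAGENT}'"
alias wget="wget ${WGET_ARGS}"

alias wget                        # prints the full expanded definition
# Typical use once the real aliases are in place:
#   wgetcookies -m https://example.com/
```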

The referer string you should simply set once you know what it is. Both wget and curl accept --referer as an argument.
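For example, with a placeholder referer URL (use whatever page the site expects you to have arrived from):

```shell
#!/usr/bin/env bash
REFERER="https://example.com/downloads.html"   # placeholder value

# wget spells the option --referer=URL:
WGET_REF="--referer=${REFERER}"
# curl spells it --referer URL (short form: -e):
CURL_REF="--referer ${REFERER}"

# The commands would look like (placeholder file URL):
echo "wget ${WGET_REF} https://example.com/file.zip"
echo "curl ${CURL_REF} -O https://example.com/file.zip"
```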
