Do, or do not. There is no ‘try’

May 18

Source from: http://www.linux.com/articles/113867
By David Coulson on November 25, 2004 (8:00:00 AM)

With ever-present threats from online attackers and script kiddies, administrators need a firewall on the border of any network. A Linux box can make a particularly effective and capable firewall at a fraction of the cost of a Cisco or Check Point system.

The most obvious use for a firewall is to block unwanted traffic from entering or leaving a network. Firewalls can also allow specific connections from outside hosts to reach internal systems, such as a mail or Web server, either behind the firewall or on a trusted or “de-militarized zone” (DMZ) segment.

Almost every version of the 2.x series of the Linux kernel has a different firewall implementation, with 2.0 using ipfwadm, 2.2 using ipchains, and 2.4 implementing netfilter. 2.6 continues to use netfilter, as it is essentially a plug-in framework within the network subsystem for whichever firewall implementation we choose to use. 2.4 and 2.6 support both ipfwadm and ipchains through backports of the systems to netfilter. They also support iptables, which was specifically written for use with netfilter. iptables is now the standard firewall on Linux systems. With a wide range of plug-in modules and third-party additions, it fits almost every need.

Getting the kernel ready for iptables

Nearly all distributions come with support for iptables. For those who like their own fresh kernel, compiled from scratch, simply select all of the options under “IPv4: netfilter” to provide support for everything iptables has on offer. Once you have a kernel with netfilter and iptables support, verify that iptables is available with the following command:

# iptables -L| grep Chain
Chain INPUT (policy ACCEPT)
Chain FORWARD (policy ACCEPT)
Chain OUTPUT (policy ACCEPT)

If iptables is compiled as modules, as it is in many distributions, the kernel will automatically load the necessary modules for you when it first executes iptables and as you add rules that require a specific module.

iptables structures the filtering processes into a number of tables, which are built by the kernel at boot time. Each table has a distinct function within the network stack and allows an administrator to construct a variety of rules to perform operations against packets heading to, from, or through a firewall. A standard installation will have three distinct tables: filter, which is used to perform filtering on IP packets; nat, used to modify IP or port information to permit, for example, Internet access by a non-routable block; and mangle, which allows you to modify packets’ Type of Service (ToS) values, or to mark packets for lookup in another rule. 2.6 kernels also have a “raw” table, used to perform packet filtering outside of the connection tracking processes.
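
As a quick way to see these tables side by side, the sketch below loops over the four table names mentioned above and lists each one. The run helper and the DRY_RUN variable are assumptions added for illustration: by default the commands are only printed, and setting DRY_RUN=0 (as root) would execute them.

```shell
# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$@"; else "$@"; fi
}

# List the rules held in each table. The raw table may be missing
# on kernels older than 2.6.
list_tables() {
    for table in filter nat mangle raw; do
        run iptables -t "$table" -L -n
    done
}

list_tables
```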

Filtering packets

The filter table is split into three separate chains. Each chain is a list of rules that filters packets at a specific point in the routing system on the firewall. The INPUT chain is used to match packets hitting the firewall host, OUTPUT to match packets originating on the firewall, and FORWARD contains rules to match packets routed from one interface to another across the firewall.

iptables offers support for connection tracking, which was lacking in both ipchains and ipfwadm. With connection tracking, the kernel keeps a database of existing connections to allow return packets for connections to pass through the firewall. Previous Linux firewall implementations had to check for specific packet types that were common with new connections, or even open ports up for connections. iptables instead allows the state of a connection to be used in a rule, permitting new or existing connections to be handled differently by the kernel. The connection tracking processes within iptables also track multiple connections that are associated with each other, such as FTP data traffic, or ICMP packets returned from a failed connection.

Generally it’s a good idea to populate a firewall ruleset with rules to allow all loopback traffic on the firewall, and allow existing connections permitted by other rules to pass traffic across the firewall. Firewall rules are constructed using a variety of checks that match a specific type of packet, a packet to a host or port, or even a packet to or from a specific interface. Rules are inserted into the chain that corresponds to the point in the routing process where we want to check packets. Should a packet match a rule, the kernel will process the packet according to the target of that rule, such as dropping the packet or allowing it to pass through the firewall unfiltered.

The iptables command manages the kernel iptables system, through which you can add, insert, and delete rules on the firewall. As we’ve not selected a specific table, our rules will manipulate the filter table, and the appropriate chain as defined in the command. Firewall rules are very simple, and have a selection of attributes which must be matched for the packet to activate the specific target. A typical firewall rule entry names the chain, the arguments to match (interfaces, addresses, protocol, and ports), and a target, although IP arguments can be omitted if they are not necessary for the rule to be matched. You can find detailed information on the specific format of each argument, and the variety of targets available, within the netfilter documentation.
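
To make the shape of a rule concrete, here is a minimal sketch that assembles one from its parts and prints it. The build_rule function is hypothetical and only covers a few of the arguments iptables accepts; passing an empty string omits that argument, which mirrors how unneeded arguments are simply left out of a rule.

```shell
# Hypothetical helper: build_rule CHAIN IN_IF PROTO DPORT TARGET
# An empty value leaves that match out of the rule entirely.
build_rule() {
    chain=$1
    in_if=$2
    proto=$3
    dport=$4
    target=$5
    rule="iptables -A $chain"
    [ -n "$in_if" ] && rule="$rule -i $in_if"
    [ -n "$proto" ] && rule="$rule -p $proto"
    [ -n "$dport" ] && rule="$rule --dport $dport"
    echo "$rule -j $target"
}

build_rule INPUT eth2 tcp 22 ACCEPT   # interface, protocol and port all matched
build_rule INPUT lo "" "" ACCEPT      # only the interface is matched
```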

We build our initial configuration with the following commands:

Drop all packets on the firewall in each of the three chains:
iptables -P INPUT DROP
iptables -P OUTPUT DROP
iptables -P FORWARD DROP

Allow traffic in and out over the loopback address for local services on the firewall:
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT

Drop packets with unknown connection states, such as TCP acknowledgement packets not related to an existing connection:
iptables -A FORWARD -m state --state INVALID -j DROP

Allow packets which are from an existing connection, or packets which are associated with another active connection:
iptables -A FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT

At this point the firewall is somewhat limited, as it does not allow connections in or out, nor does it allow any traffic to be forwarded between interfaces. If we have eth0 as our outside interface, facing the Internet, eth1 as our DMZ, holding our untrusted hosts, and eth2 as our trusted network, we can continue the configuration to permit traffic between and to the specific network interfaces:

iptables -A INPUT -i eth2 -p tcp --dport 22 -j ACCEPT
iptables -A FORWARD -i eth2 -o eth1 -j ACCEPT
iptables -A FORWARD -i eth2 -o eth0 -j ACCEPT

The first rule allows anyone coming from our internal network to connect to our firewall via SSH, which runs on TCP port 22. The iptables -p switch selects the IP protocol used, which could be tcp, udp, icmp, and so forth depending upon the specific requirements of the connection. We also allow the internal network attached to eth2 to connect to our DMZ network on eth1, as well as hosts on the Internet, without restriction.
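
One consequence of the DROP policies is worth keeping in mind: replies that the firewall itself sends, such as the return half of an SSH session, pass through the OUTPUT chain, and the state rules shown so far only cover FORWARD. A minimal sketch of one way to handle this, assuming you want the same connection tracking on the firewall's own traffic, is below. The run helper and DRY_RUN variable are illustrative assumptions; the commands are printed unless DRY_RUN=0 is set.

```shell
# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$@"; else "$@"; fi
}

# Let packets belonging to already-permitted connections reach and
# leave the firewall host itself.
stateful_local_rules() {
    run iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
    run iptables -A OUTPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
}

stateful_local_rules
```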

Network address translation

It is rare nowadays for a network to use only routable IP blocks. The majority of production environments include address blocks commonly known as 1918 addresses, as they are defined in RFC 1918. These specific blocks, 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16, are not routed on the Internet, so they must be translated to a routable address before you attempt to make a connection from the internal network to the Internet. This process of rewriting the addresses on a packet is known as network address translation. NAT can also be used to allow hosts on the Internet to access network services that run on devices with non-routable addresses. This means you can run multiple network services on the same public IP address, with each existing on distinct hosts on the DMZ or internal networks.
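
If you want to check whether a given address falls inside one of the RFC 1918 blocks, a small shell function is enough. The is_rfc1918 name is made up for this sketch, and it assumes the argument is already a well-formed dotted-quad IPv4 address.

```shell
# Hypothetical helper: succeeds if the address is in 10.0.0.0/8,
# 172.16.0.0/12 (172.16.x.x through 172.31.x.x) or 192.168.0.0/16.
# Assumes a well-formed dotted-quad address; no validation is done.
is_rfc1918() {
    case $1 in
        10.*) return 0 ;;
        172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
        192.168.*) return 0 ;;
        *) return 1 ;;
    esac
}

for addr in 10.1.0.5 172.20.3.4 207.166.198.1; do
    if is_rfc1918 "$addr"; then
        echo "$addr is private and needs NAT"
    else
        echo "$addr is publicly routable"
    fi
done
```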

You add a NAT rule using iptables in almost exactly the same way you add a filter rule, although the target is somewhat different because you must inform the kernel you want to rewrite the packet.

iptables -t nat -A POSTROUTING -o eth0 -s 10.1.0.0/16 -j SNAT --to-source 207.166.198.1

This rule will rewrite any packet leaving eth0 that currently has a source address within 10.1.0.0/16, which would be a block within our internal network, so it leaves eth0 with a source address of 207.166.198.1. This type of rule is known as a Source NAT rule, as it modifies the source address of the packet. Source NAT rules are always placed into the POSTROUTING chain within the nat table. The kernel checks the POSTROUTING chain after it has made the routing decision, so it already knows which interface the packet will head out of. You can also establish a Destination NAT rule, where you rewrite a packet coming in from the outside and translate it onto an internal address. These rules are placed in the PREROUTING chain, which is checked as soon as the packet enters the firewall, allowing the kernel to perform the routing based upon the translated destination, rather than the outside address.

For instance, if you wanted to permit Web traffic from the outside to reach the internal IP address 10.2.1.2 of your Web server on the DMZ, you would specify:

iptables -t nat -A PREROUTING -i eth0 -d 207.166.198.20 -p tcp --dport 80 -j DNAT --to-destination 10.2.1.2

This rule rewrites packets heading to TCP port 80 on 207.166.198.20, an outside IP address, to the internal IP of 10.2.1.2. As we’ve not specified a TCP port for the inside address, the destination port will not be modified, so you can run the Web service on port 80 as you normally would.
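
Because the FORWARD policy was set to DROP earlier, translating the destination is only half the job: the translated packet still has to be allowed across the firewall. The sketch below pairs a Destination NAT rule with a matching FORWARD rule, assuming the Web server sits on the DMZ interface eth1 as described above. The run helper and DRY_RUN variable are illustrative assumptions; commands are printed unless DRY_RUN=0 is set.

```shell
# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$@"; else "$@"; fi
}

publish_web_server() {
    # Rewrite the destination before routing takes place.
    run iptables -t nat -A PREROUTING -i eth0 -d 207.166.198.20 -p tcp --dport 80 -j DNAT --to-destination 10.2.1.2
    # FORWARD sees the translated address, so match on 10.2.1.2 here.
    # Return traffic is covered by the RELATED,ESTABLISHED rule.
    run iptables -A FORWARD -i eth0 -o eth1 -d 10.2.1.2 -p tcp --dport 80 -m state --state NEW -j ACCEPT
}

publish_web_server
```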

Mangle

The mangle table is rarely used, although it is particularly powerful as the basis for routing table manipulation or traffic prioritization on the network. The most common use for mangle is to mark a packet with an integer value, which can be looked up in another rule. Alternatively, using iproute2, the mark can direct the packet to a distinct routing table. For example, if you use the POSTROUTING chain in the nat table, you can’t match rules against the interface the packet came in on. However, if you mark the packet using the mangle table when it first enters the firewall, you can create a POSTROUTING rule that checks for the mark and rewrites the packet on the way out as appropriate:

iptables -t mangle -A PREROUTING -i eth2 -j MARK --set-mark 0x2
iptables -t nat -I POSTROUTING -o eth2 -m mark --mark 0x2 -j SNAT --to-source 192.168.1.4
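
The iproute2 side of this can be sketched as follows: a policy rule sends packets carrying the mark to a separate routing table, which holds its own default route. The table number 100, the gateway 192.0.2.1, and the choice of eth1 are placeholders invented for this example, as are the run helper and DRY_RUN variable; commands are printed unless DRY_RUN=0 is set.

```shell
# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$@"; else "$@"; fi
}

policy_route_marked() {
    # Mark packets as they arrive on eth2.
    run iptables -t mangle -A PREROUTING -i eth2 -j MARK --set-mark 0x2
    # Look up marked packets in routing table 100 (placeholder number).
    run ip rule add fwmark 0x2 table 100
    # Give that table its own default route (placeholder gateway).
    run ip route add default via 192.0.2.1 dev eth1 table 100
}

policy_route_marked
```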

Going further

I’ve only scratched the surface of what you can accomplish using iptables and netfilter. The projects have a substantial user base, with very active mailing lists and an IRC channel on irc.freenode.net, #netfilter, where users can discuss configuration issues.

May 18
Mastering Wget
nguyen | Linux Docs | May 18th, 2008

by Gina Trapani

http://lifehacker.com/software/top/geek-to-live%E2%80%94mastering-wget-161202.php

Your browser does a good job of fetching web documents and displaying them, but there are times when you need an extra strength download manager to get those tougher HTTP jobs done.

A versatile, old school Unix program called Wget is a highly hackable, handy little tool that can take care of all your downloading needs. Whether you want to mirror an entire web site, automatically download music or movies from a set of favorite weblogs, or transfer huge files painlessly on a slow or intermittent network connection, Wget’s for you.

Wget, the “non-interactive network retriever,” is called at the command line. The format of a Wget command is:
wget [option]… [URL]…

The URL is the address of the file(s) you want Wget to download. The magic in this little tool is the long menu of options available that make some really neat downloading tasks possible. Here are some examples of what you can do with Wget and a few dashes and letters in the [option] part of the command.

Mirror an entire web site

Say you want to back up your blog or create a local copy of an entire directory of a web site for archiving or reading later. The command:
wget -m http://ginatrapani.googlepages.com

Will save the two pages that exist on the ginatrapani.googlepages.com site in a folder named just that on your computer. The -m in the command stands for “mirror this site.”

Say you want to retrieve all the pages in a site PLUS the pages that site links to. You’d go with:
wget -H -r --level=1 -k -p http://ginatrapani.googlepages.com

This command says, “Download all the pages (-r, recursive) on http://ginatrapani.googlepages.com plus one level (--level=1) into any other sites it links to (-H, span hosts), and convert the links in the downloaded version to point to the other sites’ downloaded version (-k). Oh yeah, and get all the components like images that make up each page (-p).”

Warning: Beware, those with small hard drives! This type of command will download a LOT of data from sites that link out a lot (like blogs)! Don’t try to back up the Internet, because you’ll run out of disk space!
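
One way to keep a recursive download from getting out of hand is to give Wget a quota and a pause between requests. The sketch below adds --quota and --wait to the command above; the 200m limit and one-second wait are arbitrary values picked for illustration, and the run helper and DRY_RUN variable are assumptions (the command is printed unless DRY_RUN=0 is set).

```shell
# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$@"; else "$@"; fi
}

# Same span-hosts mirror as above, but stop starting new downloads
# once roughly 200 MB have been fetched and wait one second
# between retrievals.
bounded_mirror() {
    run wget -H -r --level=1 -k -p --quota=200m --wait=1 "$1"
}

bounded_mirror http://ginatrapani.googlepages.com
```

The quota applies to recursive retrievals like this one; it does not cut off a single file that is already downloading.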

Resume large file downloads on a flaky connection

Say you’re piggybacking the neighbor’s wifi and every time someone microwaves popcorn you lose the connection, and your video download (naughty you!) keeps crapping out halfway through. Direct Wget to resume partial downloads for big files on intermittent connections.

To set Wget to resume an interrupted download of this 16MB “Mavericks Surf Highlights 2006: Wipeouts” short from Google Video, use:
wget -c --output-document=mavericks.avi "http://vp.video.google.com/videodownload?version=0&secureurl=qgAAAJCWpcRd5eI2k3sm3LWJZMjLyLFiTxk_KqUrRYbrzLTEw8hwMV30m3MRz6rYMTxGqWIfWMQjNJsP0fNXUMc34jzoPcy6z-qHde5UVD29Po6_9b_-d3J5AQpVROUPRqzkJriangEl2IMkKBJd08Q7TTJIAC_r6XID-fNYPLKHm1KRvx0smOslivNLGmyZsCsZmVNVN0jaw5-dloWtzPlI86zIubh1XvJsTg2u_YaHcaAB&sigh=-BbV2h_bIFVuVg4D-h6MUTxuErM&begin=0&len=139433&docid=6059494448346363884"

(Apologies for the humungous, non-wrapping URL.)

The -c (“continue”) option sets Wget to resume a partial download if the transfer is interrupted. You’ll also notice the URL is in quotes, necessary for any address with &’s in it. Also, since that URL is so long, you can specify the name of the output file explicitly – in this case, mavericks.avi.
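
Wget retries on its own, but if it gives up (or the machine goes to sleep), you have to rerun the command. A minimal sketch of a wrapper that reruns the same -c download a few times is below. The fetch_with_resume function, the WGET and RETRY_DELAY variables, and the default of five attempts are all assumptions made for this example.

```shell
# Hypothetical wrapper: fetch_with_resume URL OUTFILE [ATTEMPTS]
# Reruns "wget -c" until it succeeds or the attempts run out.
fetch_with_resume() {
    url=$1
    out=$2
    attempts=${3:-5}
    n=1
    while [ "$n" -le "$attempts" ]; do
        # -c picks up from whatever part of $out already exists.
        "${WGET:-wget}" -c --output-document="$out" "$url" && return 0
        n=$((n + 1))
        # Pause before the next attempt (10 seconds unless overridden).
        sleep "${RETRY_DELAY:-10}"
    done
    return 1
}
```

You would call it with the quoted URL and a filename, for example fetch_with_resume "http://example.com/big.avi" big.avi, where the URL is a placeholder.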

Schedule hourly downloads of a file

The nice thing about any command line script is that it’s very easy to automate. For instance, if there was a constantly-changing file that you wanted to download every hour, you could use cron or Windows Task Scheduler and Wget to do just that. Or, if there was a very large file you wanted your computer to fetch in the middle of the night while you slept, instead of right this moment when you need all your bandwidth to get other work done, you could easily schedule the Wget command to run at a later time.

As proof of concept, yesterday I scheduled an hourly download of Lifehacker’s daily traffic chart to run automatically. The command looked like this:
wget --output-document=traffic_$(date +\%Y\%m\%d\%H).gif "http://sm3.sitemeter.com/rpc/v6/server.asp?a=GetChart&n=9&p1=sm3lifehacker&p2=&p3=3&p4=0&p5=64\%2E249\%2E116\%2E138&p6=HTML&p7=1&p8=\%2E\%3Fa\%3Dstatistics&p9=&rnd=7209"

Notice the use of the %Y, %m, %d and %H datetime parameters, which result in unique filenames, so each hour the command wouldn’t overwrite the file with the same name generated the hour before. Note also that the %’s have to be escaped with a backslash, because cron treats an unescaped % in a crontab command as a newline.
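
If the backslashes get hard to read, one alternative is to put the command in a small script and have cron call the script instead, since % needs no escaping inside a script file. The sketch below only shows the filename part; the hourly_name function and the script path in the comment are made up for illustration.

```shell
# Builds the same hour-stamped name as above: traffic_YYYYMMDDHH.gif
hourly_name() {
    echo "traffic_$(date +%Y%m%d%H).gif"
}

hourly_name

# Hypothetical crontab entry that runs a wrapper script at minute 0
# of every hour (the path is an assumption):
#   0 * * * * /home/user/bin/fetch-traffic-chart.sh
```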

Just for fun I threw together a little animated gif of the hourly chart image, that displays the movement of Lifehacker’s traffic yesterday from 2PM to midnight:
animated-traffic-chart.gif

Automatically download music

This last technique, suggested by Jeff Veen, is by far my favorite use of Wget. These days there are tons of directories, aggregators, filters and weblogs that point off to interesting types of media. Using Wget, you can create a text file list of your favorite sites that, say, link to MP3 files, and schedule it to automatically download any newly-added MP3’s from those sites each day or week.

First, create a text file called mp3_sites.txt, and list URLs of your favorite sources of music online one per line (like http://del.icio.us/tag/system:filetype:mp3 or stereogum.com). Be sure to check out my previous feature on how to find free music on the web for more ideas.

Then use the following Wget command to go out and fetch those MP3’s:
wget -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off -i mp3_sites.txt

That Wget recipe recursively downloads only MP3 files linked from the sites listed in mp3_sites.txt that are newer than any you’ve already downloaded. There are a few other specifications in there – like to not create a new directory for every music file, to ignore robots.txt and to not crawl up to the parent directory of a link. Jeff breaks it all down in his original post.
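
For reference, here is the same command with each option annotated. The fetch_mp3s wrapper, the run helper, and the DRY_RUN variable are assumptions added for this sketch; the command is printed unless DRY_RUN=0 is set.

```shell
# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$@"; else "$@"; fi
}

fetch_mp3s() {
    # -r            recurse into linked pages
    # -l1           but only one level deep
    # -H            follow links to other hosts
    # -t1           try each file once
    # -nd           save everything in one directory
    # -N            skip files that are not newer than the local copy
    # -np           never climb to the parent directory
    # -A.mp3        accept only files ending in .mp3
    # -erobots=off  ignore robots.txt
    # -i FILE       read the starting URLs from FILE
    run wget -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off -i "$1"
}

fetch_mp3s mp3_sites.txt
```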

The great thing about this technique is that once this command is scheduled, you get an ever-rotating jukebox of new music Wget fetches for you while you sleep. With a good set of trusted sources, you’ll never have to go looking for new music again – Wget will do all the work for you.