Tuesday, April 21, 2009

Book of Linux Sys Admin

1 Introduction

1.1 Duties of Administrator

1. Installing and Configuring Servers

2. Installing and Configuring Application Software

3. Creating and Maintaining User Accounts

4. Backing Up and Restoring Files

5. Monitoring and Tuning Performance

6. Configuring a Secure System

7. Using Tools to Monitor Security

1.2 Processes

A process is basically an executing program. All the work performed by a UNIX system is carried out by processes. The UNIX operating system stores a great deal of information about processes and provides a number of mechanisms by which you can manipulate both processes and the information about them.

All the long-term information stored on a UNIX system, like most computers today, is stored in files, which are organized into a hierarchical directory structure. Each file on a UNIX system has a number of attributes that serve different purposes. As with processes, there is a collection of commands that allows users and Systems Administrators to modify these attributes.

Among the most important attributes of files and processes are those associated with user identification and access control. Since UNIX is a multiuser operating system, it must provide mechanisms that restrict what users (and their processes) can do and where they can go. An understanding of how this is achieved is essential for a Systems Administrator.

1.2.1 Multiple users

UNIX is a multi-user operating system. This means that at any one time there are multiple people all sharing the computer and its resources. The operating system must have some way of identifying the users and protecting one user's resources from the other users.

1.2.2 Identifying users

Before you can use a UNIX computer you must first log in. The login process requires that you have a username and a password. By entering your username you identify yourself to the operating system.

1.2.3 Users and groups

In addition to a unique username UNIX also places every user into at least one group. Groups are used to provide or restrict access to a collection of users and are specified by the /etc/group file.

To find out what groups you are a member of use the groups command. It is possible to be a member of more than one group.

1.2.4 Names and numbers

As you've seen, each user and group has a unique name. However, the operating system does not use these names internally; the names are used for the benefit of the human users.

For its own purposes the operating system actually uses numbers to represent each user and group (numbers are more efficient to store). This is achieved by each username having an equivalent user identifier (UID) and every group name having an equivalent group identifier (GID).

The association between username and UID is stored in the /etc/passwd file. The association between group name and GID is stored in the /etc/group file.

To find out your UID and initial GID, try the following command

$ grep username /etc/passwd

where username is your username. This command will display your entry in the /etc/passwd file. The third field is your UID and the fourth is your initial GID.

$ id

The id command can be used to discover the username, UID, group name and GID of any user.
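For example, running id on a hypothetical account might produce output like this (the names and numbers are only illustrative):

$ id
uid=500(david) gid=500(staff) groups=500(staff),10(wheel)

The uid and gid fields show the UID and initial GID, while groups lists every group the user belongs to.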

1.2.5 Commands and processes

Whenever you run a program, whether by typing its name at the command line or launching it from X-Windows, a process is created. It is the process, a program in execution together with its data and operating system data structures, that performs the work of the program.

The UNIX command line that you use to enter commands is actually another program/command called the shell. The shell is responsible for asking you for a command and then attempting to execute the command.

1.2.5.1 Where are the commands?

For you to execute a command, for example ls, that command must be in one of the directories in your search path. The search path is a list of directories maintained by the shell.

When you ask the shell to execute a command it will look in each of the directories in your search path for a file with the same name as the command. When it finds the executable program it will run it. If it doesn't find the executable program it will report command_name: not found.
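You can inspect your own search path by displaying the PATH environment variable (the directories shown here are only a typical example):

$ echo $PATH
/usr/local/bin:/usr/bin:/bin

The shell checks each of these directories, in order, when it looks for a command.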

Linux and most UNIX operating systems supply a command called which. The purpose of this command is to search through your search path for a particular command and tell you where it is.

For example, the command which ls on my machine aldur returns /usr/bin/ls. This means that the program for ls is in the directory /usr/bin.

1.2.6 Controlling Processes

For every process that is created the UNIX operating system stores information including

o its real UID, GID and its effective UID and GID

o the code and variables used by the process (its address map)

o the status of the process

o its priority

o its parent process

1.2.7 Process UID and GID

In order for the operating system to know what a process is allowed to do it must store information about who owns the process (UID and GID). The UNIX operating system stores two types of UID and two types of GID.

1.2.7.1 Real UID and GID

A process' real UID and GID will be the same as the UID and GID of the user who ran the process. Therefore any process you execute will have your UID and GID. The real UID and GID are used for accounting purposes.

1.2.7.2 Effective UID and GID

The effective UID and GID are used to determine what operations a process can perform. In most cases the effective UID and GID will be the same as the real UID and GID.

However, using special file permissions, it is possible to change the effective UID and GID.

1.3 Files

All the information stored by UNIX onto disk is stored in files. Under UNIX even directories are just special types of files. A previous reading has already introduced you to the basic UNIX directory hierarchy. The purpose of this section is to fill in some of the detail.

1.3.1 File types

UNIX supports a small number of different file types. The following table summarises these different file types. What the different file types are and what their purpose is will be explained as we progress. File types are signified by a single character.

File type   Meaning
-           a normal file
d           a directory
l           a symbolic link
b           a block device file
c           a character device file
p           a fifo or named pipe

For current purposes you can think of these file types as falling into three categories:

o "normal" files

o directories, or directory files. Remember, for UNIX a directory is just another file which happens to contain the names of files and their inodes. An inode is an operating system data structure used to store information about a file (explained later).

o special or device files. These provide access to devices connected to the computer. Why they exist and what they are used for will be explained in more detail later in the text.

Quite obviously it is possible to have different types of normal files based on the data they contain. You can have text files, executable files, sound files and images. If you're unsure what type of normal file you have, the UNIX file command might help.

The file command looks for a magic number inside a data file. If the file contains a certain magic number then it must be a certain type of file. The magic numbers and the corresponding file descriptions are contained in a text data file. On a Red Hat system you should find this information in the file /usr/lib/magic.
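For example, running file on the password file produces something like the following (the exact description varies between systems):

$ file /etc/passwd
/etc/passwd: ASCII text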

1.3.2 File attributes

UNIX stores a variety of information about each file including

o where the file's data is stored on the disk

o what the file's name is

o who owns the file

o who is allowed to do what with the file

o how big the file is

o when the file was last modified

o how many links there are to the file

UNIX uses a data structure called an inode to store all of this information (except for the filename). Every file on a UNIX system must have an associated inode. You can find out which inode a file has by using the ls -i command.
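For example (the inode number shown here is arbitrary):

$ ls -i /etc/passwd
135123 /etc/passwd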

1.3.2.1 Viewing file attributes

To examine the various attributes associated with a file you can use the -l switch of the ls command.

1.3.3 File protection

Given that there can be many people sharing a UNIX computer it is important that the operating system provide some method of restricting access to files. I don't want you to be able to look at my personal files.

UNIX achieves this by:

o restricting users to three valid operations. Under UNIX there are only three things you can do to a file (or directory): read, write or execute it.

o allowing the file owner to specify who can perform these operations on a file. The file owner can use the user and group concepts of UNIX to restrict which users (actually, which processes owned by particular users) can perform these tasks.

1.3.4 File operations

UNIX provides three basic operations that can be performed on a file or a directory. The following table summarises those operations.

It is important to recognise that the operations are slightly different depending whether they are being applied to a file or a directory.

Operation   Effect on a file                               Effect on a directory
read        read the contents of the file                  find out what files are in the directory, e.g. ls
write       delete the file or add something to the file   be able to create or remove a file from the directory
execute     be able to run a file/program                  be able to access a file within a directory

1.3.5 Users, groups and others

Processes wishing to access a file on a UNIX computer are placed into one of three categories

user

The individual user who owns the file (by default the user that created the file but this can be changed). In figure 5.1 the owner is the user david.

group

The collection of people that belong to the group that owns the file (by default the group to which the file's creator belongs). In figure 5.1 the group is staff.

other

Anybody that doesn't fall into the first two categories.

1.3.6 File permissions

Each user category (user, group and other) has its own set of file permissions. These control which file operations each particular user category can perform.

File permissions are the first field of file attributes to appear in the output of ls -l. File permissions actually consist of four fields: file type, user permissions, group permissions, and other permissions.

Three sets of file permissions

As the diagram shows, the file permissions for a file are divided into three different sets: one for the user, one for the group which owns the file and one for everyone else.

A letter indicates that the particular category of user has permission to perform that operation on the file. A - indicates that they can't.

In the above diagram the owner can read, write and execute the file (rwx). The group can read and write the file (rw-), while other cannot do anything with the file (---).
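As an illustration, a file with these permissions would appear in ls -l output something like this (the owner, group, size, date and file name are hypothetical):

-rwxrw---- 1 david staff 1024 Feb 1 12:00 myfile

The first character is the file type (- for a normal file), followed by the three sets of permissions: rwx for the user david, rw- for the group staff and --- for everyone else.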

Symbolic and numeric permissions

rwxr-x-w- is referred to as symbolic permissions. The permissions are represented using a variety of symbols.

There is another method for representing file permissions called numeric or absolute permissions where the file permissions are represented using numbers.

Symbols

The following table summarises the symbols that can be used in representing file permissions using the symbolic method.

Symbol   Purpose
r        read
w        write
x        execute
s        setuid or setgid (depending on location)
t        sticky bit

1.3.7 Special permissions
1.3.7.1 Sticky bit on a file

In the past, having the sticky bit set on a file meant that when the file was executed the code for the program would "stick" in RAM. Normally, once a program had finished, its code was removed from RAM and that area used for something else.

The sticky bit was used on programs that were executed regularly. If the code for a program is already in RAM the program will start much quicker because the code doesn't have to be loaded from disk.

However today with the advent of shared libraries and cheap RAM most modern Unices ignore the sticky bit when it is set on a file.

1.3.7.2 Sticky bit on a directory

The /tmp directory on UNIX is used by a number of programs to store temporary files regardless of the user. For example when you use elm (a UNIX mail program) to send a mail message, while you are editing the message it will be stored as a file in the /tmp directory.

Modern UNIX operating systems (including Linux) use the sticky bit on a directory to make /tmp directories more secure. Try the command ls -ld /tmp. What do you notice about the file permissions of /tmp?
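On a typical Linux system the output looks something like this (sizes and dates will differ):

$ ls -ld /tmp
drwxrwxrwt 12 root root 4096 Apr 21 09:00 /tmp

Note the t in place of the final x: the sticky bit is set, and everyone has read, write and execute permission on the directory.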

If the sticky bit is set on a directory you can only delete or rename a file in that directory if you are the owner of the directory, the owner of the file, or the superuser.

1.3.8 Changing passwords

When you use the passwd command to change your password, the command will actually change the contents of either the /etc/passwd or /etc/shadow file. These are the files where your password is stored. Older Linux systems kept the password in /etc/passwd by default; most modern systems use /etc/shadow.

As has been mentioned previously the UNIX operating system uses the effective UID and GID of a process to decide whether or not that process can modify a file. Also the effective UID and GID are normally the UID and GID of the user who executes the process.

This means that if I use the passwd command to modify the contents of the /etc/passwd file (I write to the file) then I must have write permission on the /etc/passwd file. Let's find out.

What are the file permissions on the /etc/passwd file?

$ ls -l /etc/passwd

-rw-r--r-- 1 root root 697 Feb 1 21:21 /etc/passwd

On the basis of these permissions should I be able to write to the /etc/passwd file? No. Only the user who owns the file, root, has write permission. Then how does the passwd command change my password?

setuid and setgid

This is where the setuid and setgid file permissions enter the picture. Let's have a look at the permissions for the passwd command (first we find out where it is).

$ which passwd

/usr/bin/passwd

$ ls -l /usr/bin/passwd

-rws--x--x 1 root bin 7192 Oct 16 06:10 /usr/bin/passwd

Notice the s symbol in the file permissions of the passwd command; this specifies that the command is setuid.

The setuid and setgid permissions are used to change the effective UID and GID of a process. When I execute the passwd command a new process is created. The real UID and GID of this process will match my UID and GID. However the effective UID and GID (the values used to check file permissions) will be set to that of the command.

In the case of the passwd command the effective UID will be that of root because the setuid permission is set, while the effective GID will be my group's because the setgid bit is not set.

1.3.9 Numeric permissions

Up until now we have been using symbols like r w x s t to represent file permissions. However the operating system itself doesn't use symbols, instead it uses numbers. When you use symbolic permissions, the commands translate between the symbolic permission and the numeric permission.

With numeric or absolute permissions the file permissions are represented using octal (base 8) numbers rather than symbols. The following table summarises the relationship between the symbols used in symbolic permissions and the numbers used in numeric permissions.

To obtain the numeric permissions for a file you add the numbers for all the permissions that are allowed together.

Symbol   Number
s        4000 (setuid), 2000 (setgid)
t        1000
r        400 (user), 40 (group), 4 (other)
w        200 (user), 20 (group), 2 (other)
x        100 (user), 10 (group), 1 (other)
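For example, a file with symbolic permissions rwxr-x--- has numeric permission 400+200+100 (user rwx) plus 40+10 (group r-x), giving 750. The setuid passwd program shown earlier (rws--x--x) works out to 4000+400+200+100+10+1 = 4711.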

1.3.10 Changing file permissions

The UNIX operating system provides a number of commands for users to change the permissions associated with a file. The following table provides a summary.

Command   Purpose
chmod     change the file permissions for a file
umask     set the default file permissions for any files to be created. Usually run as the user logs in.
chgrp     change the group owner of a file
chown     change the user owner of a file
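As a quick illustration, the following commands (the file, user and group names are hypothetical) set a file's permissions in both numeric and symbolic form and then change its owner and group:

$ chmod 750 report.txt
$ chmod u+x,go-w report.txt
$ chown david report.txt
$ chgrp staff report.txt

Note that only the file's owner (or root) may run chmod on it, and on most systems only root may chown a file to another user.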

1.4 Process Control and Multitasking

The UNIX kernel can keep track of many processes at once, dividing its time between the jobs submitted to it. Each process submitted to the kernel is given a unique process ID.

Single-tasking operating systems, like DOS or the Macintosh System, can only perform one job at a time. A user of a single-tasking system can switch to different windows, running different applications, but only the application that is currently being used is active. Any other task that has been started is suspended until the user switches back to it. A suspended job receives no operating system resources, and stays just as it was when it was suspended. When a suspended job is reactivated, it begins right where it left off, as if nothing had happened.

The UNIX operating system will simultaneously perform multiple tasks for a single user. Activating an application does not have to cause other applications to be suspended.

Actually, it only appears that UNIX is performing the jobs simultaneously. In reality, it is running only one job at a time, but quickly switching between all of its ongoing tasks. The UNIX kernel will execute some instructions from job A, and then set job A aside, and execute instructions from job B. The concept of switching between queued jobs is called process scheduling.

1.4.1 Viewing processes

UNIX provides a utility called ps (process status) for viewing the status of all the unfinished jobs that have been submitted to the kernel. The ps command has a number of options to control which processes are displayed, and how the output is formatted.

ps

to see the status of the "interesting" jobs that belong to you. The output of the ps command, without any options specified, will include the process ID, the terminal from which the process was started, the amount of time the process has been running, and the name of the command that started the process.
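A hypothetical run might look like this:

$ ps
  PID TTY          TIME CMD
 1234 pts/0    00:00:00 bash
 5678 pts/0    00:00:00 ps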

ps -ef

to see a complete listing of all the processes currently scheduled. The -e option causes ps to include all processes (including ones that do not belong to you), and the -f option causes ps to give a long listing. The long listing includes the process owner, the process ID, the ID of the parent process, processor utilization, the time of submission, the process's terminal, the total time for the process, and the command that started the process.

ps -ef | grep yourusername

where "yourusername" is replaced by your user name, will cause the output of the ps -ef command to be filtered for those entries that contain your username.

1.4.2 Killing processes

Occasionally, you will find a need to terminate a process. The UNIX shell provides a utility called kill to terminate processes. You may only terminate processes that you own (i.e., processes that you started). The syntax for the kill command is kill [-options] process-ID.

To kill a process, you must first find its process ID number using the ps command. Some processes refuse to die easily, and you can use the "-9" option to force termination of the job.

To force termination of a job whose process ID is 111, enter the command

kill -9 111

1.5 Boot Process, Init, and Shutdown

An important and powerful aspect of Red Hat Linux is the open, user-configurable method it uses for starting the operating system. Users are free to configure many aspects of the boot process, including specifying the programs launched at boot-time. Similarly, system shutdown gracefully terminates processes in an organized and configurable way, although customization of this process is rarely required. Understanding how the boot and shutdown processes work not only allows customization of Red Hat Linux, but also makes it easier to troubleshoot problems related to starting or shutting down the system.

1.6 The Boot Process

Below are the basic stages of the boot process for an x86 system:

1. The system BIOS checks the system and launches the first stage boot loader on the MBR of the primary hard disk.

2. The first stage boot loader loads itself into memory and launches the second stage boot loader from the /boot/ partition.

3. The second stage boot loader loads the kernel into memory, which in turn loads any necessary modules and mounts the root partition read-only.

4. The kernel transfers control of the boot process to the /sbin/init program.

5. The /sbin/init program loads all services and user-space tools, and mounts all partitions listed in /etc/fstab.

6. The user is presented with a login prompt for the freshly booted Linux system.

Because configuration of the boot process is more common than the customization of the shutdown process, the remainder of this chapter discusses in detail how the boot process works and how it can be customized to suit specific needs.

1.7 A Detailed Look at the Boot Process

The beginning of the boot process varies depending on the hardware platform being used. However, once the kernel is found and loaded by the boot loader, the default boot process is identical across all architectures. This chapter focuses on the x86 architecture.

1.7.1 The BIOS

When an x86 computer is booted, the processor looks at the end of system memory for the Basic Input/Output System or BIOS program and runs it. The BIOS controls not only the first step of the boot process, but also provides the lowest level interface to peripheral devices. For this reason it is written into read-only, permanent memory and is always available for use.

Other platforms use different programs to perform low-level tasks roughly equivalent to those of the BIOS on an x86 system. For instance, Itanium-based computers use the Extensible Firmware Interface (EFI) Shell, while Alpha systems use the SRM console.

Once loaded, the BIOS tests the system, looks for and checks peripherals, and then locates a valid device with which to boot the system. Usually, it checks any diskette drives and CD-ROM drives present for bootable media, then, failing that, looks to the system's hard drives. In most cases, the order of the drives searched while booting is controlled with a setting in BIOS, and it often looks first on the master IDE device on the primary IDE bus. The BIOS then loads into memory whatever program is residing in the first sector of this device, called the Master Boot Record or MBR. The MBR is only 512 bytes in size and contains machine code instructions for booting the machine, called a boot loader, along with the partition table. Once the BIOS finds and loads the boot loader program into memory, it yields control of the boot process to it.

1.7.2 The Boot Loader

This section looks at the boot loaders for the x86 platform. Depending on the system's architecture, the boot process may differ slightly.

Under Red Hat Linux two boot loaders are available: GRUB or LILO. GRUB is the default boot loader, but LILO is available for those who require or prefer it. Both boot loaders for the x86 platform are broken into at least two stages. The first stage is a small machine code binary on the MBR. Its sole job is to locate the second stage boot loader and load the first part of it into memory.

GRUB is the newer boot loader and has the advantage of being able to read ext2 and ext3 partitions and load its configuration file — /boot/grub/grub.conf — at boot time. With LILO, the second stage boot loader uses information on the MBR to determine the boot options available to the user. This means that any time a configuration change is made or the kernel is manually upgraded, the /sbin/lilo -v -v command must be executed to write the appropriate information to the MBR.

Once the second stage boot loader is in memory, it presents the user with the Red Hat Linux initial, graphical screen showing the different operating systems or kernels it has been configured to boot. On this screen a user can use the arrow keys to choose which operating system or kernel they wish to boot and press [Enter]. If no key is pressed, the boot loader will load the default selection after a configurable period of time has passed.

If Symmetric Multi-Processor (SMP) kernel support is installed, there will be more than one option present the first time the system is booted. In this situation, LILO will display linux, which is the SMP kernel, and linux-up, which is for single processors. GRUB displays Red Hat Linux (<kernel-version>-smp), which is the SMP kernel, and Red Hat Linux (<kernel-version>), which is for single processors. If any problems occur using the SMP kernel, try selecting the non-SMP kernel upon rebooting.

Once the second stage boot loader has determined which kernel to boot, it locates the corresponding kernel binary in the /boot/ directory. The kernel binary is named using the following format: /boot/vmlinuz-<kernel-version> (where <kernel-version> corresponds to the kernel version specified in the boot loader's settings).

The boot loader then places the appropriate initial RAM disk image, called an initrd, into memory. The initrd is used by the kernel to load drivers necessary to boot the system. This is particularly important if SCSI hard drives are present or if the system uses the ext3 file system.

Do not remove the /initrd/ directory from the file system for any reason. Removing this directory will cause the system to fail with a kernel panic error message at boot time.

Once the kernel and the initrd image are loaded into memory, the boot loader hands control of the boot process to the kernel.

1.7.2.1 Boot Loaders for Other Architectures

Once the Red Hat Linux kernel loads and hands off the boot process to the init command, the same sequence of events occurs on every architecture. So the main difference between each architecture's boot process is in the application used to find and load the kernel.

For example, the Alpha architecture uses the aboot boot loader, while the Itanium architecture uses the ELILO boot loader.

1.7.3 The Kernel

When the kernel is loaded, it immediately initializes and configures the computer's memory and configures the various hardware attached to the system, including all processors, I/O subsystems, and storage devices. It then looks for the compressed initrd image in a predetermined location in memory, decompresses it, mounts it, and loads all necessary drivers. Next, it initializes virtual devices related to the file system, such as LVM or software RAID before unmounting the initrd disk image and freeing up all the memory the disk image once occupied.

The kernel then creates a root device, mounts the root partition read-only, and frees any unused memory.

At this point, the kernel is loaded into memory and operational. However, since there are no user applications that allow meaningful input to the system, not much can be done with it. In order to set up the user environment, the kernel executes the /sbin/init program.

1.7.4 The /sbin/init Program

The /sbin/init program (also called init) coordinates the rest of the boot process and configures the environment for the user. When the init command starts, it becomes the parent or grandparent of all of the processes that start up automatically on a Red Hat Linux system. First, it runs the /etc/rc.d/rc.sysinit script, which sets the environment path, starts swap, checks the file systems, and takes care of everything the system needs to have done at system initialization. For example, most systems use a clock, so rc.sysinit reads the /etc/sysconfig/clock configuration file to initialize the hardware clock. Another example: if there are special serial port processes that must be initialized, rc.sysinit executes the /etc/rc.serial file.

The init command then runs the /etc/inittab script, which describes how the system should be set up in each SysV init runlevel. Among other things, the /etc/inittab sets the default runlevel and dictates that /sbin/update should be run whenever it starts a given runlevel.

Next, the init command sets the source function library, /etc/rc.d/init.d/functions, for the system. This spells out how to start or kill a program and how to determine the PID of a program.

The init program starts all of the background processes by looking in the appropriate rc directory for the runlevel specified as default in /etc/inittab. The rc directories are numbered to correspond to the runlevels they represent. For instance, /etc/rc.d/rc5.d/ is the directory for runlevel 5.

When booting to runlevel 5, the init program looks in the /etc/rc.d/rc5.d/ directory to determine which processes to start and stop.

Below is an example listing of the /etc/rc.d/rc5.d/ directory:

K05innd -> ../init.d/innd

K05saslauthd -> ../init.d/saslauthd

K10psacct -> ../init.d/psacct

K12mysqld -> ../init.d/mysqld

K15httpd -> ../init.d/httpd

K15postgresql -> ../init.d/postgresql

K16rarpd -> ../init.d/rarpd

K20bootparamd -> ../init.d/bootparamd

K20nfs -> ../init.d/nfs

K20rstatd -> ../init.d/rstatd

K25squid -> ../init.d/squid

K34yppasswdd -> ../init.d/yppasswdd

K35dhcpd -> ../init.d/dhcpd

K35smb -> ../init.d/smb

K45arpwatch -> ../init.d/arpwatch

K45named -> ../init.d/named

K50snmpd -> ../init.d/snmpd

K50snmptrapd -> ../init.d/snmptrapd

K55routed -> ../init.d/routed

K61ldap -> ../init.d/ldap

K65identd -> ../init.d/identd

K74ypserv -> ../init.d/ypserv

K74ypxfrd -> ../init.d/ypxfrd

S08ip6tables -> ../init.d/ip6tables

S08ipchains -> ../init.d/ipchains

S08iptables -> ../init.d/iptables

S90crond -> ../init.d/crond

S90cups -> ../init.d/cups

S90xfs -> ../init.d/xfs

As illustrated in this listing, none of the scripts that actually start and stop the services are located in the /etc/rc.d/rc5.d/ directory. Rather, all of the files in /etc/rc.d/rc5.d/ are symbolic links pointing to scripts located in the /etc/rc.d/init.d/ directory. Symbolic links are used in each of the rc directories so that the runlevels can be reconfigured by creating, modifying, and deleting the symbolic links without affecting the actual scripts they reference.

The name of each symbolic link begins with either a K or an S. The K links are processes that are killed on that runlevel, while those beginning with an S are started.

The init command first stops all of the K symbolic links in the directory by issuing the /etc/rc.d/init.d/<command> stop command, where <command> is the process to be killed. It then starts all of the S symbolic links by issuing /etc/rc.d/init.d/<command> start.

After the system is finished booting, it is possible to log in as root and execute these same scripts to start and stop services. For instance, the command /etc/rc.d/init.d/httpd stop will stop the Apache Web server.

Each of the symbolic links is numbered to dictate start order. The order in which the services are started or stopped can be altered by changing this number. The lower the number, the earlier the service is started. Symbolic links with the same number are started alphabetically.

One of the last things the init program executes is the /etc/rc.d/rc.local file. This file is useful for system customization. See Section 1.8 Running Additional Programs at Boot Time for more on using the rc.local file.

After the init command has progressed through the appropriate rc directory for the runlevel, the /etc/inittab script forks a /sbin/mingetty process for each virtual console (login prompts) allocated to the runlevel. Runlevels 2 through 5 get all six virtual consoles, while runlevel 1 (single user mode) gets only one and runlevels 0 and 6 get none. The /sbin/mingetty process opens communication pathways to tty devices, sets their modes, prints the login prompt, gets the user name, and initiates the login process for the user.

In runlevel 5, the /etc/inittab runs a script called /etc/X11/prefdm. The prefdm script executes the preferred X display manager — gdm, kdm, or xdm, depending on the contents of the /etc/sysconfig/desktop file.

At this point, the system is operating on runlevel 5 and displaying a login screen.

1.8 Running Additional Programs at Boot Time

The /etc/rc.d/rc.local script is executed by the init command at boot time or when changing runlevels. Adding commands to this script is an easy way to perform necessary tasks like starting special services or initializing devices without writing complex initialization scripts in the /etc/rc.d/init.d/ directory and creating symbolic links.
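For example, appending a line such as the following to /etc/rc.d/rc.local (the log file name is arbitrary) records the time of each boot:

echo "System booted at `date`" >> /var/log/boot-times.log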

The /etc/rc.serial script is used if serial ports must be set up at boot time. This script runs setserial commands to configure the system's serial ports. See the setserial man page for more information.

1.9 SysV Init Runlevels

The SysV init runlevel system provides a standard process for controlling which programs init launches or halts when initializing a runlevel. SysV init was chosen because it is easier to use and more flexible than the traditional BSD-style init process.

The configuration files for SysV init are located in the /etc/rc.d/ directory. Within this directory are the rc, rc.local, rc.sysinit, and, optionally, the rc.serial scripts as well as the following directories:

init.d/

rc0.d/

rc1.d/

rc2.d/

rc3.d/

rc4.d/

rc5.d/

rc6.d/

The init.d/ directory contains the scripts used by the /sbin/init command when controlling services. Each of the numbered directories represents one of the runlevels configured by default under Red Hat Linux.

1.9.1 Runlevels

A runlevel is a state, or mode, defined by the services listed in the SysV /etc/rc.d/rc<x>.d/ directory, where <x> is the number of the runlevel.

The idea behind SysV init runlevels revolves around the fact that different systems can be used in different ways. For example, a server runs more efficiently without the drag on system resources created by the X Window System. Other times, a system administrator may need to operate the system at a lower runlevel to perform diagnostic tasks, such as fixing disk corruption in runlevel 1, when no other users can possibly be on the system.

The characteristics of a given runlevel determine which services are halted and started by init. For instance, runlevel 1 (single user mode) halts any network services, while runlevel 3 starts these services. By assigning specific services to be halted or started on a given runlevel, init can quickly change the mode of the machine without the user manually stopping and starting services.

The following runlevels are defined by default for Red Hat Linux:

0 — Halt

1 — Single-user text mode

2 — Not used (user-definable)

3 — Full multi-user text mode

4 — Not used (user-definable)

5 — Full multi-user graphical mode (with an X-based login screen)

6 — Reboot

In general, users operate Red Hat Linux at runlevel 3 or runlevel 5, both full multi-user modes. Users sometimes customize runlevels 2 and 4 to meet specific needs, since they are not used by default.

The default runlevel for the system is listed in /etc/inittab. To find out the default runlevel for a system, look for the line similar to the one below near the top of /etc/inittab:

id:5:initdefault:

The default runlevel listed in the example above is five, as the number after the first colon indicates. To change it, edit /etc/inittab as root.

Be very careful when editing /etc/inittab. Simple typos can cause the system to become unbootable. If this happens, either use a boot diskette, enter single-user mode, or enter rescue mode to boot the computer and repair the file.

It is possible to change the default runlevel at boot-time by modifying the arguments passed by the boot loader to the kernel.

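For example, at the GRUB boot screen you can press [e] to edit the kernel line and append the desired runlevel number (the kernel file and root device below are only illustrative):

kernel /vmlinuz-2.4.20-8 ro root=/dev/hda2 1

The trailing 1 boots the system into runlevel 1 (single-user mode) for that boot only; the default in /etc/inittab is left unchanged.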

1.9.2 Runlevel Utilities

One of the best ways to configure runlevels is to use an initscript utility. These tools are designed to simplify the task of maintaining files in the SysV init directory hierarchy and relieve system administrators from having to directly manipulate the numerous symbolic links in the subdirectories of /etc/rc.d/.

Red Hat Linux provides three such utilities:

/sbin/chkconfig — The /sbin/chkconfig utility is a simple command-line tool for maintaining the /etc/rc.d/init.d directory hierarchy.

/sbin/ntsysv — The ncurses-based /sbin/ntsysv utility provides an interactive text-based interface, which some find easier to use than chkconfig.

Services Configuration Tool — The graphical Services Configuration Tool (redhat-config-services) program is a flexible GTK2-based utility for configuring runlevels.
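For example, to see whether a service such as httpd is configured to start in each runlevel, and then to enable it for runlevels 3 and 5 (the listing shown is illustrative, and chkconfig --level must be run as root):

$ chkconfig --list httpd
httpd           0:off   1:off   2:off   3:off   4:off   5:off   6:off

# chkconfig --level 35 httpd on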

1.10 Shutting Down

To shut down Red Hat Linux, the root user may issue the /sbin/shutdown command. The shutdown man page has a complete list of options, but the two most common uses are:

/sbin/shutdown -h now

/sbin/shutdown -r now

After shutting everything down, the -h option will halt the machine, and the -r option will reboot.

Non-root users can use the reboot and halt commands to shut down the system while in runlevels 1 through 5. However, not all Linux operating systems support this feature.

If the computer does not power itself down, be careful not to turn off the computer until a message appears indicating that the system is halted.

Failure to wait for this message can mean that not all the hard drive partitions are unmounted, and can lead to file system corruption.

2 Managing User Accounts

2.1 What's an account?

When a computer is used by many people it is usually necessary to differentiate between the users, for example, so that their private files can be kept private. This is important even if the computer can only be used by a single person at a time, as with most microcomputers. Thus, each user is given a unique username, and that name is used to log in.

There's more to a user than just a name, however. An account is all the files, resources, and information belonging to one user. The term hints at banks, and in a commercial system each account usually has some money attached to it, and that money vanishes at different speeds depending on how much the user stresses the system. For example, disk space might have a price per megabyte and day, and processing time might have a price per second.

2.2 Creating a user

The Linux kernel itself treats users as mere numbers. Each user is identified by a unique integer, the user id or uid, because numbers are faster and easier for a computer to process than textual names. A separate database outside the kernel assigns a textual name, the username, to each user id. The database contains additional information as well.

To create a user, you need to add information about the user to the user database, and create a home directory for him. It may also be necessary to educate the user, and set up a suitable initial environment for him.

Most Linux distributions come with a program for creating accounts. There are several such programs available. Two command line alternatives are adduser and useradd; there may be a GUI tool as well. Whatever the program, the result is that there is little if any manual work to be done. Even if the details are many and intricate, these programs make everything seem trivial.

2.3 /etc/passwd and other informative files

The basic user database in a Unix system is the text file, /etc/passwd (called the password file), which lists all valid usernames and their associated information. The file has one line per username, and is divided into seven colon-delimited fields:

o Username.

o Password, in an encrypted form.

o Numeric user id.

o Numeric group id.

o Full name or other description of account.

o Home directory.

o Login shell (program to run at login).

The format is explained in more detail on the passwd manual page.
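For example, a (hypothetical) entry, showing the seven fields in order:

david:Ep6mckrOLChF.:500:500:David Jones:/home/david:/bin/bash

The second field here is the encrypted password; on systems with shadow passwords (see below) it contains only the marker x.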

Any user on the system may read the password file, so that they can, for example, learn the name of another user. This means that the password (the second field) is also available to everyone. The password is stored in encrypted form, so in theory there is no problem. However, the encryption is breakable, especially if the password is weak (e.g., it is short or it can be found in a dictionary). Therefore it is not a good idea to have the password in the password file.

Many Linux systems have shadow passwords. This is an alternative way of storing the password: the encrypted password is stored in a separate file, /etc/shadow, which only root can read. The /etc/passwd file only contains a special marker in the second field. Any program that needs to verify a user is setuid, and can therefore access the shadow password file. Normal programs, which only use the other fields in the password file, can't get at the password.

2.4 Picking numeric user and group ids

On most systems it doesn't matter what the numeric user and group ids are, but if you use the Network filesystem (NFS), you need to have the same uid and gid on all systems. This is because NFS also identifies users with the numeric uids. If you aren't using NFS, you can let your account creation tool pick them automatically.

If you are using NFS, you'll have to invent a mechanism for synchronising account information. One alternative is the NIS system (see XXX network-admin-guide).

However, you should try to avoid re-using numeric uids (and textual usernames), because the new owner of the uid (or username) may get access to the old owner's files (or mail, or whatever).

2.5 Initial environment: /etc/skel

When the home directory for a new user is created, it is initialised with files from the /etc/skel directory. The system administrator can create files in /etc/skel that will provide a nice default environment for users. For example, he might create a /etc/skel/.profile that sets the EDITOR environment variable to some editor that is friendly towards new users.

However, it is usually best to try to keep /etc/skel as small as possible, since it will be next to impossible to update existing users' files. For example, if the name of the friendly editor changes, all existing users would have to edit their .profile. The system administrator could try to do it automatically, with a script, but that is almost certainly going to break someone's file.

Whenever possible, it is better to put global configuration into global files, such as /etc/profile. This way it is possible to update it without breaking users' own setups.

2.6 Creating a user by hand

To create a new account manually, follow these steps:

Edit /etc/passwd with vipw and add a new line for the new account. Be careful with the syntax. Do not edit directly with an editor! vipw locks the file, so that other commands won't try to update it at the same time. You should make the password field be `*', so that it is impossible to log in.

Similarly, edit /etc/group with vigr, if you need to create a new group as well.

Create the home directory of the user with mkdir.

Copy the files from /etc/skel to the new home directory.

Fix ownerships and permissions with chown and chmod. The -R option is most useful. The correct permissions vary a little from one site to another, but usually the following commands do the right thing:

cd /home/newusername

chown -R username.group .

chmod -R go=u,go-w .

chmod go= .

Set the password with passwd.

After you set the password in the last step, the account will work. You shouldn't set it until everything else has been done; otherwise the user may inadvertently log in while you're still copying the files.

It is sometimes necessary to create dummy accounts that are not used by people. For example, to set up an anonymous FTP server (so that anyone can download files from it, without having to get an account first), you need to create an account called ftp. In such cases, it is usually not necessary to set the password (last step above). Indeed, it is better not to, so that no-one can use the account, unless they first become root, since root can become any user.

2.7 Changing user properties

There are a few commands for changing various properties of an account (i.e., the relevant field in /etc/passwd):

chfn

Change the full name field.

chsh

Change the login shell.

passwd

Change the password.

The super-user may use these commands to change the properties of any account. Normal users can only change the properties of their own account. It may sometimes be necessary to disable these commands (with chmod) for normal users, for example in an environment with many novice users.

Other tasks need to be done by hand. For example, to change the username, you need to edit /etc/passwd directly (with vipw, remember). Likewise, to add the user to, or remove the user from, more groups, you need to edit /etc/group (with vigr). Such tasks tend to be rare, however, and should be done with caution: for example, if you change the username, e-mail will no longer reach the user unless you also create a mail alias.

2.8 Removing a user

To remove a user, you first remove all his files, mailboxes, mail aliases, print jobs, cron and at jobs, and all other references to the user. Then you remove the relevant lines from /etc/passwd and /etc/group (remember to remove the username from all groups it's been added to). It may be a good idea to first disable the account (see below), before you start removing stuff, to prevent the user from using the account while it is being removed.

Remember that users may have files outside their home directory. The find command can find them:

find / -user username

However, note that the above command will take a long time, if you have large disks. If you mount network disks, you need to be careful so that you won't trash the network or the server.

Some Linux distributions come with special commands to do this; look for deluser or userdel. However, it is easy to do it by hand as well, and the commands might not do everything.

2.9 Disabling a user temporarily

It is sometimes necessary to temporarily disable an account, without removing it. For example, the user might not have paid his fees, or the system administrator may suspect that a cracker has got the password of that account.

The best way to disable an account is to change its shell into a special program that just prints a message. This way, whoever tries to log into the account will fail, and will know why. The message can tell the user to contact the system administrator so that any problems may be dealt with.

It would also be possible to change the username or password to something else, but then the user won't know what is going on. Confused users mean more work.

A simple way to create the special programs is to write `tail scripts':

#!/usr/bin/tail +2

This account has been closed due to a security breach.

Please call 555-1234 and wait for the men in black to arrive.

The first two characters (`#!') tell the kernel that the rest of the line is a command that needs to be run to interpret this file. The tail command in this case outputs everything except the first line to the standard output.

If user billg is suspected of a security breach, the system administrator would do something like this:

# chsh -s /usr/local/lib/no-login/security billg

# su - billg

This account has been closed due to a security breach.

Please call 555-1234 and wait for the men in black to arrive.

#

The purpose of the su is to test that the change worked, of course.

Tail scripts should be kept in a separate directory, so that their names don't interfere with normal user commands.

2.10 Command Line Configuration

If you prefer command line tools or do not have the X Window System installed, use this chapter to configure users and groups.

2.10.1 Adding a User

To add a user to the system, first issue the useradd command to create a locked user account:

useradd <username>

Unlock the account by issuing the passwd command to assign a password and set password aging guidelines:

passwd <username>

The command line options for useradd are listed in Table 25-1.

Option          Description
-c comment      Comment for the user
-d home-dir     Home directory to be used instead of default /home/username
-e date         Date for the account to be disabled in the format YYYY-MM-DD
-f days         Number of days after the password expires until the account is disabled. (If 0 is specified, the account is disabled immediately after the password expires. If -1 is specified, the account is not disabled after the password expires.)
-g group-name   Group name or group number for the user's default group (the group must exist prior to being specified here)
-G group-list   List of additional (other than default) group names or group numbers, separated by commas, of which the user is a member (the groups must exist prior to being specified here)
-m              Create the home directory if it does not exist.
-M              Do not create the home directory.
-n              Do not create a user private group for the user.
-r              Create a system account with a UID less than 500 and without a home directory.
-p password     The password encrypted with crypt.
-s shell        User's login shell, which defaults to /bin/bash.
-u uid          User ID for the user, which must be unique and greater than 499.

Table 25-1. useradd Command Line Options

2.10.2 Adding a Group

To add a group to the system, use the command groupadd:

groupadd <group-name>

The command line options for groupadd are listed in Table 25-2.

Option   Description
-g gid   Group ID for the group, which must be unique and greater than 499.
-r       Create a system group with a GID less than 500.
-f       Exit with an error if the group already exists. (The group is not altered.) If -g and -f are specified, but the group already exists, the -g option is ignored.

Table 25-2. groupadd Command Line Options

2.10.3 Password Aging

For security reasons, it is a good practice to require users to change their passwords periodically. This can be done when adding or editing a user on the Password Info tab of the User Manager.

To configure password expiration for a user from a shell prompt, use the chage command, followed by an option from Table 25-3, followed by the username of the user.

Shadow passwords must be enabled to use the chage command.

Option    Description
-m days   Specify the minimum number of days between password changes. If the value is 0, the password does not expire.
-M days   Specify the maximum number of days for which the password is valid. When the number of days specified by this option plus the number of days specified with the -d option is less than the current day, the user must change passwords before using the account.
-d days   Specify the number of days since January 1, 1970 on which the password was changed.
-I days   Specify the number of inactive days after the password expiration before locking the account. If the value is 0, the account is not locked after the password expires.
-E date   Specify the date on which the account is locked, in the format YYYY-MM-DD. Instead of the date, the number of days since January 1, 1970 can also be used.
-W days   Specify the number of days before the password expiration date to warn the user.

Table 25-3. chage Command Line Options

If the chage command is followed directly by a username (with no options), it displays the current password aging values and allows them to be changed.

If a system administrator wants a user to set a password the first time the user logs in, the user's password can be set to expire immediately, forcing the user to change it immediately after logging in for the first time.

To force a user to configure a password the first time the user logs in at the console, follow these steps. Note, this process does not work if the user logs in using the SSH protocol.

Lock the user's password — If the user does not exist, use the useradd command to create the user account, but do not give it a password so that it remains locked.

If the password is already enabled, lock it with the command:

usermod -L username

Force immediate password expiration — Type the following command:

chage -d 0 username

This command sets the value for the date the password was last changed to the epoch (January 1, 1970). This value forces immediate password expiration no matter what password aging policy, if any, is in place.

Unlock the account — There are two common approaches to this step. The administrator can assign an initial password or assign a null password.

Do not use the passwd command to set the password as it disables the immediate password expiration just configured.

To assign an initial password, use the following steps:

Start the command line Python interpreter with the python command. It displays the following:

Python 2.2.2 (#1, Dec 10 2002, 09:57:09)

[GCC 3.2.1 20021207 (Red Hat Linux 8.0 3.2.1-2)] on linux2

Type "help", "copyright", "credits" or "license" for more information.

>>>

At the prompt, type the following (replacing password with the password to encrypt and salt with a combination of exactly 2 upper or lower case alphabetic characters, digits, the dot (.) character, or the slash (/) character, such as ab or 12):

import crypt; print crypt.crypt("password","salt")

The output is the encrypted password similar to 12CsGd8FRcMSM.

Type [Ctrl]-[D] to exit the Python interpreter.

Cut and paste the exact encrypted password output, without leading or trailing blank spaces, into the following command:

usermod -p "encrypted-password" username

Instead of assigning an initial password, a null password can be assigned using the command:

usermod -p "" username

While using a null password is convenient for both the user and the administrator, there is a slight risk that a third party can log in first and access the system. To minimize this threat, it is recommended that the administrator verify that the user is ready to log in when the account is unlocked.

In either case, upon initial log in, the user is prompted for a new password.

2.10.4 Explaining the Process

The following steps illustrate what happens if the command useradd juan is issued on a system that has shadow passwords enabled:

A new line for juan is created in /etc/passwd. The line has the following characteristics:

o It begins with the username juan.

o There is an x for the password field indicating that the system is using shadow passwords.

o A UID at or above 500 is created. (Under Red Hat Linux, UIDs and GIDs below 500 are reserved for system use.)

o A GID at or above 500 is created.

o The optional GECOS information is left blank.

o The home directory for juan is set to /home/juan/.

o The default shell is set to /bin/bash.

A new line for juan is created in /etc/shadow. The line has the following characteristics:

o It begins with the username juan.

o Two exclamation points (!!) appear in the password field of the /etc/shadow file, which locks the account.

If an encrypted password is passed using the -p flag, it is placed in the /etc/shadow file on the new line for the user.

The password is set to never expire.

A new line for a group named juan is created in /etc/group. A group with the same name as a user is called a user private group. The line created in /etc/group has the following characteristics:

o It begins with the group name juan.

o An x appears in the password field indicating that the system is using shadow group passwords.

o The GID matches the one listed for user juan in /etc/passwd.

A new line for a group named juan is created in /etc/gshadow. The line has the following characteristics:

o It begins with the group name juan.

o An exclamation point (!) appears in the password field of the /etc/gshadow file, which locks the group.

o All other fields are blank.

A directory for user juan is created in the /home/ directory. This directory is owned by user juan and group juan. However, it has read, write, and execute privileges only for the user juan. All other permissions are denied.

The files within the /etc/skel/ directory (which contain default user settings) are copied into the new /home/juan/ directory.

At this point, a locked account called juan exists on the system. To activate it, the administrator must next assign a password to the account using the passwd command and, optionally, set password aging guidelines.

3 File System Structure

Why Share a Common Structure?

An operating system's file system structure is its most basic level of organization. Almost all of the ways an operating system interacts with its users, applications, and security model are dependent upon the way it stores its files on a storage device. It is crucial for a variety of reasons that users, as well as programs, be able to refer to a common guideline to know where to read and write files.

A file system can be seen in terms of two different logical categories of files:

o Shareable vs. unsharable files

o Variable vs. static files

Shareable files are those that can be accessed by various hosts; unsharable files are not available to any other hosts. Variable files can change at any time without any intervention; static files, such as read-only documentation and binaries, do not change without an action from the system administrator or an agent that the system administrator has placed in motion to accomplish that task.

The reason for looking at files in this manner is to help correlate the function of the file with the permissions assigned to the directories which hold them. The way in which the operating system and its users interact with a given file determines the directory in which it is placed, whether that directory is mounted read-only or read-write, and the level of access each user has to that file. The top level of this organization is crucial: access to the underlying directories can be restricted, and security problems may arise if the top level is left disorganized or lacks a widely used structure.

However, having a structure does not mean very much unless it is a standard. Competing structures can actually cause more problems than they fix. Because of this, Red Hat has chosen the most widely-used file system structure and extended it only slightly to accommodate special files used within Red Hat Linux.

3.1 Overview of the Filesystem Hierarchy Standard (FHS)

Red Hat is committed to the Filesystem Hierarchy Standard (FHS), a collaborative document that defines the names and locations of many files and directories.

The FHS document is the authoritative reference to any FHS-compliant file system, but the standard leaves many areas undefined or extensible. This section is an overview of the standard and a description of the parts of the file system not covered by the standard.

The complete standard is available at: http://www.pathname.com/fhs

Compliance with the standard means many things, but the two most important are compatibility with other compliant systems and the ability to mount the /usr/ partition read-only, because it contains common executables and should not be changed by users. Since /usr/ can be mounted read-only, it can be mounted from a CD-ROM or from another machine via a read-only NFS mount.

3.1.1 FHS Organization

The directories and files noted here are a small subset of those specified by the FHS document. Refer to the latest FHS document for the most complete information.


The /dev/ Directory

The /dev/ directory contains file system entries which represent devices that are attached to the system. These files are essential for the system to function properly.

The /etc/ Directory

The /etc/ directory is reserved for configuration files that are local to the machine. No binaries are to be put in /etc/. Any binaries that were once located in /etc/ should be placed into /sbin/ or possibly /bin/.

The X11/ and skel/ directories are subdirectories of the /etc/ directory:

/etc

- X11/

- skel/

The /etc/X11/ directory is for X11 configuration files such as XF86Config. The /etc/skel/ directory is for "skeleton" user files, which are used to populate a home directory when a user is first created.
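As an illustration, on a typical Red Hat Linux system the skeleton files are just the standard shell startup files; the exact contents vary between releases, but a listing might look like:

$ ls -A /etc/skel

.bash_logout .bash_profile .bashrc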

The /lib/ Directory

The /lib/ directory should contain only those libraries that are needed to execute the binaries in /bin/ and /sbin/. These shared library images are particularly important for booting the system and executing commands within the root file system.

The /mnt/ Directory

The /mnt/ directory is for temporarily mounted file systems, such as CD-ROMs and floppy disks.

The /opt/ Directory

The /opt/ directory provides storage for large, static application software packages.

A package placing files in the /opt/ directory creates a directory bearing the same name as the package. This directory in turn holds files that otherwise would be scattered throughout the file system, giving the system administrator an easy way to determine the role of each file within a particular package.

For example, if sample is the name of a particular software package located within the /opt/ directory, then all of its files could be placed within directories inside the /opt/sample/ directory, such as /opt/sample/bin/ for binaries and /opt/sample/man/ for manual pages.

Large packages that encompass many different sub-packages, each of which accomplish a particular task, also go within the /opt/ directory, giving that large package a standardized way to organize itself. In this way, our sample package may have different tools that each go in their own sub-directories, such as /opt/sample/tool1/ and /opt/sample/tool2/, each of which can have their own bin/, man/, and other similar directories.

The /proc/ Directory

The /proc/ directory contains special files that either extract information from or send information to the kernel.

Due to the great variety of data available within /proc/ and the many ways this directory can be used to communicate with the kernel, an entire chapter has been devoted to the subject.

The /sbin/ Directory

The /sbin/ directory is for executables used only by the root user. The executables in /sbin/ are only used to boot and mount /usr/ and perform system recovery operations. The FHS says: "/sbin typically contains files essential for booting the system in addition to the binaries in /bin. Anything executed after /usr is known to be mounted (when there are no problems) should be placed in /usr/sbin. Local-only system administration binaries should be placed into /usr/local/sbin."

At a minimum, the following programs should be in /sbin/:

arp, clock, getty, halt, init, fdisk, fsck.*, grub, ifconfig, lilo, mkfs.*, mkswap, reboot, route, shutdown, swapoff, swapon, update

The /usr/ Directory

The /usr/ directory is for files that can be shared across a whole site. The /usr/ directory usually has its own partition, and it should be mountable read-only. At minimum, the following directories should be subdirectories of /usr/:

/usr

- bin/

- dict/

- doc/

- etc/

- games/

- include/

- kerberos/

- lib/

- libexec/

- local/

- sbin/

- share/

- src/

- tmp -> ../var/tmp/

- X11R6/

The bin/ directory contains executables, dict/ contains word lists, doc/ contains non-FHS-compliant documentation, etc/ contains system-wide configuration files, games/ is for games, include/ contains C header files, kerberos/ contains binaries and much more for Kerberos, and lib/ contains object files and libraries that are not designed to be directly utilized by users or shell scripts. The libexec/ directory contains small helper programs called by other programs, sbin/ is for system administration binaries (those that do not belong in the /sbin/ directory), share/ contains files that are not architecture-specific, src/ is for source code, and X11R6/ is for the X Window System (XFree86 on Red Hat Linux).

The /usr/local/ Directory

The FHS says: "The /usr/local hierarchy is for use by the system administrator when installing software locally. It needs to be safe from being overwritten when the system software is updated. It may be used for programs and data that are shareable among a group of hosts, but not found in /usr."

The /usr/local/ directory is similar in structure to the /usr/ directory. It has the following subdirectories, which are similar in purpose to those in the /usr/ directory:

/usr/local

- bin/

- doc/

- etc/

- games/

- include/

- lib/

- libexec/

- sbin/

- share/

- src/

The /var/ Directory

Since the FHS requires that it be possible to mount /usr/ read-only, any programs that write log files or need spool/ or lock/ directories should write them under the /var/ directory. The FHS states /var/ is for: "variable data files. This includes spool directories and files, administrative and logging data, and transient and temporary files."

System log files such as messages and lastlog go in the /var/log/ directory. The /var/lib/rpm/ directory contains the RPM system databases. Lock files go in the /var/lock/ directory, usually in directories particular to the program using the file. The /var/spool/ directory has subdirectories for various systems that need to store data files.

/usr/local/ in Red Hat Linux

In Red Hat Linux, the intended use for the /usr/local/ directory is slightly different from that specified by the FHS. The FHS says that /usr/local/ should be where software that is to remain safe from system software upgrades is stored. Since system upgrades under Red Hat Linux are performed safely with the rpm command and the graphical Package Management Tool application, it is not necessary to protect files by putting them in /usr/local/. Instead, the /usr/local/ directory is used for software that is local to the machine.

For instance, if the /usr/ directory is mounted as a read-only NFS share from a remote host, it is still possible to install a package or program under the /usr/local/ directory.

3.1.2 Special File Locations

Red Hat Linux extends the FHS structure slightly to accommodate special files.

Most files pertaining to the Red Hat Package Manager (RPM) are kept in the /var/lib/rpm/ directory.

The /var/spool/up2date/ directory contains files used by Red Hat Update Agent, including RPM header information for the system. This location may also be used to temporarily store RPMs downloaded while updating the system.

Another location specific to Red Hat Linux is the /etc/sysconfig/ directory. This directory stores a variety of configuration information. Many scripts that run at boot time use the files in this directory.

Finally, one more directory worth noting is the /initrd/ directory. It is empty, but is used as a critical mount point during the boot process.

Do not remove the /initrd/ directory for any reason. Removing this directory will cause the system to fail to boot with a kernel panic error message.

3.2 The ext2 File System

The standard on-disk file system used by Linux is called ext2fs, for historical reasons. Linux was originally programmed with a Minix-compatible file system, to ease exchanging data with the Minix development system, but that file system was severely restricted by 14-character file-name limits and a maximum file-system size of 64 MB. The Minix file system was superseded by a new file system, which was christened the extended file system (extfs). A later redesign of this file system to improve performance and scalability and to add a few features led to the second extended file system (ext2fs).


This file system uses a multi-level indexed allocation method to allocate disk blocks to files. The information regarding a file is kept in a special block called an inode (index node). Apart from other information, the inode contains 15 pointers to data blocks. The first 12 of these pointers point directly to data blocks; the remaining three point to a single, a double, and a triple indirect block, respectively. The default block size of ext2fs is 1 KB, although 2 KB and 4 KB blocks are also supported.
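To see how far this indexing scheme reaches, here is a back-of-the-envelope calculation assuming the default 1 KB block size and 4-byte block pointers (so one indirect block holds 256 pointers); these figures illustrate the method rather than quote exact ext2 limits:

12 direct pointers: 12 x 1 KB = 12 KB

1 single indirect block: 256 x 1 KB = 256 KB

1 double indirect block: 256 x 256 x 1 KB = 64 MB

1 triple indirect block: 256 x 256 x 256 x 1 KB = 16 GB

Larger block sizes increase both the number of pointers per indirect block and the block size itself, pushing the maximum file size up considerably.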

3.3 The ext3 File System

The revised version of ext2fs is ext3fs, which offers superior availability, data integrity, and speed, as well as an easy transition from ext2.

Availability

After an unclean system shutdown (an unexpected power failure or system crash), an ext2 file system cannot be mounted until its consistency has been checked by the e2fsck program. The amount of time that the e2fsck program takes is determined primarily by the size of the file system, and for today's relatively large (many tens of gigabytes) file systems, this takes a long time. Also, the more files you have on the file system, the longer the consistency check takes. File systems that are several hundreds of gigabytes in size may take an hour or more to check. This severely limits availability.

By contrast, ext3 does not require a file system check, even after an unclean system shutdown, except for certain rare hardware failure cases (e.g. hard drive failures). This is because the data is written to disk in such a way that the file system is always consistent. The time to recover an ext3 file system after an unclean system shutdown does not depend on the size of the file system or the number of files; rather, it depends on the size of the "journal" used to maintain consistency. The default journal size takes about a second to recover (depending on the speed of the hardware).

Data Integrity

Using the ext3 file system can provide stronger guarantees about data integrity in case of an unclean system shutdown. You choose the type and level of protection that your data receives. You can choose to keep the file system consistent, but allow for damage to data on the file system in the case of unclean system shutdown; this can give a modest speed up under some but not all circumstances. Alternatively, you can choose to ensure that the data is consistent with the state of the file system; this means that you will never see garbage data in recently-written files after a crash. The safe choice, keeping the data consistent with the state of the file system, is the default.

Speed

Despite writing some data more than once, ext3 is often faster (higher throughput) than ext2 because ext3's journaling optimizes hard drive head motion. You can choose from three journaling modes to optimize speed, optionally choosing to trade off some data integrity.
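The journaling mode is selected at mount time with the data option; a brief sketch, assuming the file system lives on /dev/hda1 (substitute your own device):

# mount -t ext3 -o data=writeback /dev/hda1 /mnt

Here data=writeback is the fastest but least protective mode, data=ordered is the default described above, and data=journal additionally journals the file data itself.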

Easy Transition

It is easy to change from ext2 to ext3 and gain the benefits of a robust journaling file system, without reformatting. That's right, there is no need to do a long, tedious, and error-prone backup-reformat-restore operation in order to experience the advantages of ext3.
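As a sketch of that transition, again assuming the file system lives on /dev/hda1 (substitute your own device), adding a journal to an existing ext2 file system takes one command:

# tune2fs -j /dev/hda1

Afterward, change the file system type for that partition in /etc/fstab from ext2 to ext3 so that it is mounted with journaling on the next boot.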

4 Configuring TCP/IP Networking

In this chapter, we walk you through all the necessary steps to set up TCP/IP networking on your machine. Starting with the assignment of IP addresses, we slowly work our way through the configuration of TCP/IP network interfaces and introduce a few tools that come in handy when hunting down network installation problems.

Most of the tasks covered in this chapter will generally have to be done only once. Afterward, you have to touch most configuration files only when adding a new system to your network or when you reconfigure your system entirely. Some of the commands used to configure TCP/IP, however, have to be executed each time the system is booted. This is usually done by invoking them from the system /etc/rc* scripts.

Commonly, the network-specific part of this procedure is contained in a script. The name of this script varies in different Linux distributions. In many older Linux distributions, it is known as rc.net or rc.inet. Sometimes you will also see two scripts named rc.inet1 and rc.inet2; the former initializes the kernel part of networking and the latter starts basic networking services and applications. In modern distributions, the rc files are structured in a more sophisticated arrangement; here you may find scripts in the /etc/init.d/ (or /etc/rc.d/init.d/) directory that create the network devices and other rc files that run the network application programs. This book's examples are based on the latter arrangement.

This chapter discusses parts of the script that configure your network interfaces, while applications will be covered in later chapters. After finishing this chapter, you should have established a sequence of commands that properly configure TCP/IP networking on your computer. You should then replace any sample commands in your configuration scripts with your commands, make sure the script is executed from the basic rc script at startup time, and reboot your machine. The networking rc scripts that come along with your favorite Linux distribution should provide a solid example from which to work.

4.1 Mounting the /proc Filesystem

Some of the configuration tools of the Linux NET-2 and NET-3 release rely on the /proc filesystem for communicating with the kernel. This interface permits access to kernel runtime information through a filesystem-like mechanism. When mounted, you can list its files like any other filesystem, or display their contents. Typical items include the loadavg file, which contains the system load average, and meminfo, which shows current core memory and swap usage.

To this, the networking code adds the net directory. It contains a number of files that show things like the kernel ARP tables, the state of TCP connections, and the routing tables. Most network administration tools get their information from these files.

The proc filesystem (or procfs, as it is also known) is usually mounted on /proc at system boot time. The best method is to add the following line to /etc/fstab:

# procfs mount point:

none /proc proc defaults

Then execute mount /proc from your /etc/rc script. The procfs is now configured into most kernels by default. If the procfs is not in your kernel, you will get a message such as: mount: fs type procfs not supported by kernel. You will then have to recompile the kernel and answer “yes” when asked for procfs support.
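Once /proc is mounted, a quick read confirms that the interface works; the numbers below are made up, but the shape of the output should resemble:

$ cat /proc/loadavg

0.08 0.03 0.01 1/42 1276

$ ls /proc/net

arp dev raw route snmp tcp udp unix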

4.2 Installing the Binaries

If you are using one of the prepackaged Linux distributions, it will contain the major networking applications and utilities along with a coherent set of sample files. The only case in which you might have to obtain and install new utilities is when you install a new kernel release. As they occasionally involve changes in the kernel networking layer, you will need to update the basic configuration tools. This update at least involves recompiling, but sometimes you may also be required to obtain the latest set of binaries. These binaries are available at their official home site at ftp.inka.de/pub/comp/Linux/networking/NetTools/, packaged in an archive called net-tools-XXX.tar.gz, where XXX is the version number. The release matching Linux 2.0 is net-tools-1.45.

If you want to compile and install the standard TCP/IP network applications yourself, you can obtain the sources from most Linux FTP servers. All modern Linux distributions include a fairly comprehensive range of TCP/IP network applications, such as World Wide Web browsers, telnet and ftp programs, and other network applications, such as talk. If you do find something you need to compile yourself, the chances are good that it will compile under Linux from source quite simply if you follow the instructions included in the source package.

4.3 Setting the Hostname

Most, if not all, network applications rely on you to set the local host's name to some reasonable value. This setting is usually made during the boot procedure by executing the hostname command. To set the hostname to name, enter:

# hostname name

It is common practice to use the unqualified hostname without specifying the domain name. For instance, hosts at the Virtual Brewery might be called vale.vbrew.com or vlager.vbrew.com. These are their official fully qualified domain names (FQDNs). Their local hostnames would be the first component of the name, such as vale. However, as the local hostname is frequently used to look up the host's IP address, you have to make sure that the resolver library is able to resolve it. This usually means that you have to enter the name in /etc/hosts.
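Putting this together for the host vale, using the book's Virtual Brewery addresses (substitute your own name and address), the boot-time sequence might be sketched as:

# hostname vale

# echo '172.16.1.3 vale.vbrew.com vale' >> /etc/hosts

Afterwards, the resolver can translate the short name vale to its IP address even before any name service is running.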

Some people suggest using the domainname command to set the kernel's idea of a domain name to the remaining part of the FQDN. This way you could combine the output from hostname and domainname to get the FQDN again. However, this is at best only half correct. domainname is generally used to set the host's NIS domain, which may be entirely different from the DNS domain to which your host belongs. Instead, to ensure that the short form of your hostname is resolvable with all recent versions of the hostname command, either add it as an entry in your local Domain Name Server or place the fully qualified domain name in the /etc/hosts file. You may then use the --fqdn argument to the hostname command, and it will print the fully qualified domain name.

4.4 Assigning IP Addresses

If you configure the networking software on your host for standalone operation (for instance, to be able to run the INN Netnews software), you can safely skip this section, because the only IP address you will need is for the loopback interface, which is always 127.0.0.1.

Things are a little more complicated with real networks like Ethernets. If you want to connect your host to an existing network, you have to ask its administrators to give you an IP address on this network. When setting up a network all by yourself, you have to assign IP addresses yourself.

Hosts within a local network should usually share addresses from the same logical IP network. Hence, you have to assign an IP network address. If you have several physical networks, you have to either assign them different network numbers, or use subnetting to split your IP address range into several subnetworks. Subnetting will be revisited in the next section.

When picking an IP network number, much depends on whether you intend to get on the Internet in the near future. If so, you should obtain an official IP address now. Ask your network service provider to help you. If you want to obtain a network number, just in case you might get on the Internet someday, request a Network Address Application Form from hostmaster@internic.net, or your country's own Network Information Center, if there is one.

If your network is not connected to the Internet and won't be in the near future, you are free to choose any legal network address. Just make sure no packets from your internal network escape to the real Internet. To make sure no harm can be done even if packets did escape, you should use one of the network numbers reserved for private use. The Internet Assigned Numbers Authority (IANA) has set aside several network numbers from classes A, B, and C that you can use without registering: the class A network 10.0.0.0, the 16 class B networks 172.16.0.0 through 172.31.0.0, and the 256 class C networks 192.168.0.0 through 192.168.255.0. These addresses are valid only within your private network and are not routed between real Internet sites.

Picking your addresses from one of these network numbers is not only useful for networks completely unconnected to the Internet; you can still implement a slightly more restricted access using a single host as a gateway. To your local network, the gateway is accessible by its internal IP address, while the outside world knows it by an officially registered address (assigned to you by your provider).

Throughout the remainder of the book, we will assume that the brewery's network manager uses a class B network number, say 172.16.0.0. Of course, a class C network number would definitely suffice to accommodate both the Brewery's and the Winery's networks. We'll use a class B network here for the sake of simplicity; it will make the subnetting examples in the next section of this chapter a little more intuitive.

4.5 Creating Subnets

To operate several Ethernets (or other networks, once a driver is available), you have to split your network into subnets. Note that subnetting is required only if you have more than one broadcast network—point-to-point links don't count. For instance, if you have one Ethernet, and one or more SLIP links to the outside world, you don't need to subnet your network.

To accommodate the two Ethernets, the Brewery's network manager decides to use 8 bits of the host part as additional subnet bits. This leaves another 8 bits for the host part, allowing for 254 hosts on each of the subnets. She then assigns subnet number 1 to the brewery, and gives the winery number 2. Their respective network addresses are thus 172.16.1.0 and 172.16.2.0. The subnet mask is 255.255.255.0.

vlager, which is the gateway between the two networks, is assigned a host number of 1 on both of them, which gives it the IP addresses 172.16.1.1 and 172.16.2.1, respectively.

Note that in this example we are using a class B network to keep things simple, but a class C network would be more realistic. With the new networking code, subnetting is not limited to byte boundaries, so even a class C network may be split into several subnets. For instance, you could use two bits of the host part for the subnet number, giving you four possible subnets with 64 addresses (62 usable hosts) each.
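As a hypothetical worked example, splitting the class C network 192.168.1.0 with two subnet bits yields the netmask 255.255.255.192 and these four subnets:

192.168.1.0 hosts 192.168.1.1 - 192.168.1.62

192.168.1.64 hosts 192.168.1.65 - 192.168.1.126

192.168.1.128 hosts 192.168.1.129 - 192.168.1.190

192.168.1.192 hosts 192.168.1.193 - 192.168.1.254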

4.6 Writing hosts and networks Files

After you have subnetted your network, you should prepare for some simple sort of hostname resolution using the /etc/hosts file. If you are not going to use DNS or NIS for address resolution, you have to put all hosts in the hosts file.

Even if you want to run DNS or NIS during normal operation, you should have some subset of all hostnames in /etc/hosts. You should have some sort of name resolution, even when no network interfaces are running, for example, during boot time. This is not only a matter of convenience, but it allows you to use symbolic hostnames in your network rc scripts. Thus, when changing IP addresses, you only have to copy an updated hosts file to all machines and reboot, rather than edit a large number of rc files separately. Usually you put all local hostnames and addresses in hosts, adding those of any gateways and NIS servers used.

You should make sure your resolver only uses information from the hosts file during initial testing. Sample files that come with your DNS or NIS software may produce strange results. To make all applications use /etc/hosts exclusively when looking up the IP address of a host, you have to edit the /etc/host.conf file. Comment out any lines that begin with the keyword order by preceding them with a hash sign, and insert the line:

order hosts
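The resulting /etc/host.conf might then look like this minimal sketch (the multi on line, which allows a host to have more than one address in /etc/hosts, is a common companion setting assumed here, not something the procedure above requires):

# /etc/host.conf - use only /etc/hosts during initial testing

order hosts

multi on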

The hosts file contains one entry per line, consisting of an IP address, a hostname, and an optional list of aliases for the hostname. The fields are separated by spaces or tabs, and the address field must begin in the first column. Anything following a hash sign (#) is regarded as a comment and is ignored.

Hostnames can be either fully qualified or relative to the local domain. For vale, you would usually enter the fully qualified name, vale.vbrew.com, and vale by itself in the hosts file, so that it is known by both its official name and the shorter local name.

This is an example of how a hosts file at the Virtual Brewery might look. Two special names are included, vlager-if1 and vlager-if2, which give the addresses for both interfaces used on vlager:

#

# Hosts file for Virtual Brewery/Virtual Winery

#

# IP FQDN aliases

#

127.0.0.1 localhost

#

172.16.1.1 vlager.vbrew.com vlager vlager-if1

172.16.1.2 vstout.vbrew.com vstout

172.16.1.3 vale.vbrew.com vale

#

172.16.2.1 vlager-if2

172.16.2.2 vbeaujolais.vbrew.com vbeaujolais

172.16.2.3 vbardolino.vbrew.com vbardolino

172.16.2.4 vchianti.vbrew.com vchianti

Just as with a host's IP address, it is sometimes convenient to use a symbolic name for a network number. Therefore, the hosts file has a companion called /etc/networks that maps network names to network numbers, and vice versa. At the Virtual Brewery, we might install a networks file like this:

# /etc/networks for the Virtual Brewery

brew-net 172.16.1.0

wine-net 172.16.2.0

4.7 Interface Configuration for IP

After setting up your hardware, you have to make these devices known to the kernel networking software. A couple of commands are used to configure the network interfaces and initialize the routing table. These tasks are usually performed from the network initialization script each time you boot the system. The basic tools for this process are called ifconfig (where “if” stands for interface) and route.

ifconfig is used to make an interface accessible to the kernel networking layer. This involves the assignment of an IP address and other parameters, and activation of the interface, also known as “bringing up” the interface. Being active here means that the kernel will send and receive IP datagrams through the interface. The simplest way to invoke it is with:

ifconfig interface ip-address

This command assigns ip-address to interface and activates it. All other parameters are set to default values. For instance, the default network mask is derived from the network class of the IP address, such as 255.255.0.0 for a class B address.

route allows you to add or remove routes from the kernel routing table. It can be invoked as:

route [add|del] [-net|-host] target [if]

The add and del arguments determine whether to add or delete the route to target. The -net and -host arguments tell the route command whether the target is a network or a host (a host is assumed if you don't specify). The if argument is again optional, and allows you to specify to which network interface the route should be directed—the Linux kernel makes a sensible guess if you don't supply this information. This topic will be explained in more detail in succeeding sections.

4.7.1 The Loopback Interface

The very first interface to be activated is the loopback interface:

# ifconfig lo 127.0.0.1

Occasionally, you will see the dummy hostname localhost being used instead of the IP address. ifconfig will look up the name in the hosts file, where an entry should declare it as the hostname for 127.0.0.1:

# Sample /etc/hosts entry for localhost

127.0.0.1 localhost

To view the configuration of an interface, you invoke ifconfig, giving it only the interface name as argument:

$ ifconfig lo

lo Link encap:Local Loopback

inet addr:127.0.0.1 Mask:255.0.0.0

UP LOOPBACK RUNNING MTU:3924 Metric:1

RX packets:0 errors:0 dropped:0 overruns:0 frame:0

TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

Collisions:0

As you can see, the loopback interface has been assigned a netmask of 255.0.0.0, since 127.0.0.1 is a class A address.

Now you can almost start playing with your mini-network. What is still missing is an entry in the routing table that tells IP that it may use this interface as a route to destination 127.0.0.1. This is accomplished by using:

# route add 127.0.0.1

Again, you can use localhost instead of the IP address, provided you've entered it into your /etc/hosts.

Next, you should check that everything works fine, for example by using ping. ping is the networking equivalent of a sonar device. The command is used to verify that a given address is actually reachable, and to measure the delay that occurs when sending a datagram to it and back again. The time required for this process is often referred to as the “round-trip time”:

# ping localhost

PING localhost (127.0.0.1): 56 data bytes

64 bytes from 127.0.0.1: icmp_seq=0 ttl=255 time=0.4 ms

64 bytes from 127.0.0.1: icmp_seq=1 ttl=255 time=0.4 ms

64 bytes from 127.0.0.1: icmp_seq=2 ttl=255 time=0.4 ms

^C

--- localhost ping statistics ---

3 packets transmitted, 3 packets received, 0% packet loss

round-trip min/avg/max = 0.4/0.4/0.4 ms

#

When you invoke ping as shown here, it will continue emitting packets forever, unless interrupted by the user. The ^C marks the place where we pressed Ctrl-C.

The previous example shows that packets for 127.0.0.1 are properly delivered and a reply is returned to ping almost instantaneously. This shows that you have successfully set up your first network interface.

If the output you get from ping does not resemble that shown in the previous example, you are in trouble. Check whether any error messages indicate that some file hasn't been installed properly. Check that the ifconfig and route binaries you use are compatible with the kernel release you run, and above all, that the kernel has been compiled with networking enabled (you see this from the presence of the /proc/net directory). If you get an error message saying "Network unreachable," you probably got the route command wrong. Make sure you use the same address you gave to ifconfig.

The steps previously described are enough to use networking applications on a standalone host. After adding the lines mentioned earlier to your network initialization script and making sure it will be executed at boot time, you may reboot your machine and try out various applications. For instance, telnet localhost should establish a telnet connection to your host, giving you a login: prompt.

However, the loopback interface is useful not only as an example in networking books, or as a test bed during development, but is actually used by some applications during normal operation. Therefore, you always have to configure it, regardless of whether your machine is attached to a network or not.

4.7.2 Ethernet Interfaces

Configuring an Ethernet interface is pretty much the same as for the loopback interface; it just requires a few more parameters when you are using subnetting.

At the Virtual Brewery, we have subnetted the IP network, which was originally a class B network, into class C subnetworks. To make the interface recognize this, the ifconfig incantation would look like this:

# ifconfig eth0 vstout netmask 255.255.255.0

This command assigns the eth0 interface the IP address of vstout (172.16.1.2). If we omitted the netmask, ifconfig would deduce the netmask from the IP network class, which would result in an incorrect netmask of 255.255.0.0. Now a quick check shows:

# ifconfig eth0

eth0 Link encap 10Mbps Ethernet HWaddr 00:00:C0:90:B3:42

inet addr 172.16.1.2 Bcast 172.16.1.255 Mask 255.255.255.0

UP BROADCAST RUNNING MTU 1500 Metric 1

RX packets 0 errors 0 dropped 0 overrun 0

TX packets 0 errors 0 dropped 0 overrun 0

You can see that ifconfig automatically sets the broadcast address (the Bcast field) to the usual value, which is the host's network number with all the host bits set. Also, the maximum transmission unit (the maximum size of IP datagrams the kernel will generate for this interface) has been set to the maximum size of Ethernet packets: 1,500 bytes. The defaults are usually what you will use, but all these values can be overridden with special options if required.

Just as for the loopback interface, you now have to install a routing entry that informs the kernel about the network that can be reached through eth0. For the Virtual Brewery, you might invoke route as:

# route add -net 172.16.1.0

At first this looks a little like magic, because it's not really clear how route detects which interface to route through. However, the trick is rather simple: the kernel checks all interfaces that have been configured so far and compares the destination address (172.16.1.0 in this case) to the network part of the interface address (that is, the bitwise AND of the interface address and the netmask). The only interface that matches is eth0.

Now, what's that -net option for? This is used because route can handle both routes to networks and routes to single hosts (as you saw before with localhost). When given an address in dotted quad notation, route attempts to guess whether it is a network or a hostname by looking at the host part bits. If the address's host part is zero, route assumes it denotes a network; otherwise, route takes it as a host address. Therefore, route would think that 172.16.1.0 is a host address rather than a network number, because it cannot know that we use subnetting. We have to tell route explicitly that it denotes a network, so we give it the -net flag.

Of course, the route command is a little tedious to type, and it's prone to spelling mistakes. A more convenient approach is to use the network names we defined in /etc/networks. This approach makes the command much more readable; even the -net flag can be omitted because route knows that 172.16.1.0 denotes a network:

# route add brew-net

Now that you've finished the basic configuration steps, we want to make sure that your Ethernet interface is indeed running happily. Choose a host from your Ethernet, for instance vlager, and type:

# ping vlager

PING vlager: 64 byte packets

64 bytes from 172.16.1.1: icmp_seq=0. time=11. ms

64 bytes from 172.16.1.1: icmp_seq=1. time=7. ms

64 bytes from 172.16.1.1: icmp_seq=2. time=12. ms

64 bytes from 172.16.1.1: icmp_seq=3. time=3. ms

^C

----vlager.vbrew.com PING Statistics----

4 packets transmitted, 4 packets received, 0% packet loss

round-trip (ms) min/avg/max = 3/8/12

If you don't see similar output, something is broken. If you encounter unusual packet loss rates, this hints at a hardware problem, like bad or missing terminators. If you don't receive any replies at all, you should check the interface configuration with netstat. The packet statistics displayed by ifconfig should tell you whether any packets have been sent out on the interface at all. If you have access to the remote host too, you should go over to that machine and check the interface statistics. This way you can determine exactly where the packets got dropped. In addition, you should display the routing information with route to see if both hosts have the correct routing entry. route prints out the complete kernel routing table when invoked without any arguments (-n just makes it print addresses as dotted quads instead of using the hostname):

# route -n

Kernel routing table

Destination Gateway Genmask Flags Metric Ref Use Iface

127.0.0.1 * 255.255.255.255 UH 1 0 112 lo

172.16.1.0 * 255.255.255.0 U 1 0 10 eth0

The Flags column contains a list of flags set for each interface. U is always set for active interfaces, and H says the destination address denotes a host. If the H flag is set for a route that you meant to be a network route, you have to reissue the route command with the -net option. To check whether a route you have entered is used at all, check to see if the Use field in the second-to-last column increases between two invocations of ping.

4.7.3 Routing Through a Gateway

In the previous section, we covered only the case of setting up a host on a single Ethernet. Quite frequently, however, one encounters networks connected to one another by gateways. These gateways may simply link two or more Ethernets, but may also provide a link to the outside world, such as the Internet. In order to use a gateway, you have to provide additional routing information to the networking layer.

The Ethernets of the Virtual Brewery and the Virtual Winery are linked through such a gateway, namely the host vlager. Assuming that vlager has already been configured, we just have to add another entry to vstout's routing table that tells the kernel it can reach all hosts on the Winery's network through vlager. The appropriate incantation of route is shown below; the gw keyword tells it that the next argument denotes a gateway:

# route add wine-net gw vlager

Of course, any host on the Winery network you wish to talk to must have a routing entry for the Brewery's network. Otherwise you would only be able to send data to the Winery network from the Brewery network, but the hosts on the Winery would be unable to reply.

This example describes only a gateway that switches packets between two isolated Ethernets. Now assume that vlager also has a connection to the Internet (say, through an additional SLIP link). Then we would want datagrams to any destination network other than the Brewery to be handed to vlager. This action can be accomplished by making it the default gateway for vstout:

# route add default gw vlager

The network name default is a shorthand for 0.0.0.0, which denotes the default route. The default route matches every destination and will be used if there is no more specific route that matches. You do not have to add this name to /etc/networks because it is built into route.

If you see high packet loss rates when pinging a host behind one or more gateways, this may hint at a very congested network. Packet loss is not so much due to technical deficiencies as to temporary excess loads on forwarding hosts, which makes them delay or even drop incoming datagrams.

4.7.4 Configuring a Gateway

Configuring a machine to switch packets between two Ethernets is pretty straightforward. Assume we're back at vlager, which is equipped with two Ethernet cards, each connected to one of the two networks. All you have to do is configure both interfaces separately, giving them their respective IP addresses and matching routes, and that's it.

It is quite useful to add information on the two interfaces to the hosts file as shown in the following example, so we have handy names for them, too:

172.16.1.1 vlager.vbrew.com vlager vlager-if1

172.16.2.1 vlager-if2

The sequence of commands to set up the two interfaces is then:

# ifconfig eth0 vlager-if1

# route add brew-net

# ifconfig eth1 vlager-if2

# route add wine-net

If this sequence doesn't work, make sure your kernel has been compiled with support for IP forwarding enabled. One good way to do this is to ensure that the first number on the second line of /proc/net/snmp is set to 1.
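On kernels that support the sysctl interface through /proc, forwarding can also be inspected and switched on at runtime; a brief sketch (run as root; a 0 means forwarding is off):

# cat /proc/sys/net/ipv4/ip_forward

0

# echo 1 > /proc/sys/net/ipv4/ip_forward

This change does not survive a reboot, so it belongs in the network initialization script along with the ifconfig and route commands above.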

4.7.5 IP Alias

New kernels support a feature that can completely replace the dummy interface and serve other useful functions. IP Alias allows you to configure multiple IP addresses onto a physical device. In the simplest case, you could replicate the function of the dummy interface by configuring the host address as an alias onto the loopback interface and completely avoid using the dummy interface. In more complex uses, you could configure your host to look like many different hosts, each with its own IP address. This configuration is sometimes called “Virtual Hosting,” although technically it is also used for a variety of other techniques.

To configure an alias for an interface, you must first ensure that your kernel has been compiled with support for IP Alias (check that you have a /proc/net/ip_alias file; if not, you will have to recompile your kernel). Configuration of an IP alias is virtually identical to configuring a real network device; you use a special name to indicate it's an alias that you want. For example:

# ifconfig lo:0 172.16.1.1

This command would produce an alias for the loopback interface with the address 172.16.1.1. IP aliases are referred to by appending :n to the actual network device, in which “n” is an integer. In our example, the network device we are creating the alias on is lo, and we are creating an alias numbered zero for it. This way, a single physical device may support a number of aliases.

Each alias may be treated as though it is a separate device, and as far as the kernel IP software is concerned, it will be; however, it will be sharing its hardware with another interface.
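For instance, to make a single Ethernet card answer to two additional addresses on the Brewery subnet (the extra addresses here are purely illustrative), you might configure:

# ifconfig eth0:0 172.16.1.20

# ifconfig eth0:1 172.16.1.21

Each alias then shows up in the ifconfig listing as its own interface, eth0:0 and eth0:1, while sharing the physical device eth0.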

4.8 All About ifconfig

There are many more parameters to ifconfig than we have described so far. Its normal invocation is this:

ifconfig interface [address [parameters]]

interface is the interface name, and address is the IP address to be assigned to the interface. This may be either an IP address in dotted quad notation or a name that ifconfig will look up in /etc/hosts.

If ifconfig is invoked with only the interface name, it displays that interface's configuration. When invoked without any parameters, it displays all interfaces you have configured so far; the -a option forces it to show the inactive ones as well. A sample invocation for the Ethernet interface eth0 may look like this:

# ifconfig eth0

eth0 Link encap 10Mbps Ethernet HWaddr 00:00:C0:90:B3:42

inet addr 172.16.1.2 Bcast 172.16.1.255 Mask 255.255.255.0

UP BROADCAST RUNNING MTU 1500 Metric 0

RX packets 3136 errors 217 dropped 7 overrun 26

TX packets 1752 errors 25 dropped 0 overrun 0

The MTU and Metric fields show the current MTU and metric value for that interface. The metric value is traditionally used by some operating systems to compute the cost of a route. Linux doesn't use this value yet, but defines it for compatibility, nevertheless.

The RX and TX lines show how many packets have been received or transmitted error free, how many errors occurred, how many packets were dropped (probably because of low memory), and how many were lost because of an overrun. Receiver overruns usually occur when packets come in faster than the kernel can service the last interrupt. The flag values printed by ifconfig roughly correspond to the names of its command-line options; they will be explained later.

The following is a list of parameters recognized by ifconfig with the corresponding flag names. Options that simply turn on a feature also allow it to be turned off again by preceding the option name with a dash; a combined invocation using several of these parameters is sketched after the list.

up

This option makes an interface accessible to the IP layer. This option is implied when an address is given on the command line. It may also be used to reenable an interface that has been taken down temporarily using the down option. This option corresponds to the flags UP and RUNNING.

down

This option marks an interface inaccessible to the IP layer. This effectively disables any IP traffic through the interface. Note that this option will also automatically delete all routing entries that use this interface.

netmask mask

This option assigns a subnet mask to be used by the interface. It may be given as either a 32-bit hexadecimal number preceded by 0x, or as a dotted quad of decimal numbers. While the dotted quad format is more common, the hexadecimal representation is often easier to work with. Netmasks are essentially binary, and it is easier to do binary-to-hexadecimal than binary-to-decimal conversion.
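For example, the following two invocations assign the same /24 netmask to vstout's Ethernet interface, once in dotted quad and once in hexadecimal notation:

# ifconfig eth0 172.16.1.2 netmask 255.255.255.0

# ifconfig eth0 172.16.1.2 netmask 0xffffff00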

pointopoint address

This option is used for point-to-point IP links that involve only two hosts. This option is needed to configure SLIP or PLIP interfaces, for example. If a point-to-point address has been set, ifconfig displays the POINTOPOINT flag.

broadcast address

The broadcast address is usually made up from the network number by setting all bits of the host part. Some IP implementations (systems derived from BSD 4.2, for instance) use a different scheme in which all host part bits are cleared instead. The broadcast option adapts to these strange environments. If a broadcast address has been set, ifconfig displays the BROADCAST flag.

irq

This option allows you to set the IRQ line used by certain devices. This is especially useful for PLIP, but may also be useful for certain Ethernet cards.

metric number

This option may be used to assign a metric value to the routing table entry created for the interface. This metric is used by the Routing Information Protocol (RIP) to build routing tables for the network. The default metric used by ifconfig is zero. If you don't run a RIP daemon, you don't need this option at all; if you do, you will rarely need to change the metric value.

mtu bytes

This sets the Maximum Transmission Unit, which is the maximum number of octets the interface is able to handle in one transaction. For Ethernets, the MTU defaults to 1,500 (the largest allowable size of an Ethernet packet); for SLIP interfaces, it is 296. (There is no constraint on the MTU of SLIP links; this value is a good compromise.)

arp

This is an option specific to broadcast networks such as Ethernets or packet radio. It enables the use of the Address Resolution Protocol (ARP) to detect the physical addresses of hosts attached to the network. For broadcast networks, it is on by default. If ARP is disabled, ifconfig displays the NOARP flag.

-arp

This option disables the use of ARP on this interface.

promisc

This option puts the interface in promiscuous mode. On a broadcast network, this makes the interface receive all packets, regardless of whether they were destined for this host or not. This allows network traffic analysis using packet filters and such, also called Ethernet snooping. Usually, this is a good technique for hunting down network problems that are otherwise hard to detect. Tools such as tcpdump rely on this.

On the other hand, this option allows attackers to do nasty things, such as skim the traffic of your network for passwords. You can protect against this type of attack by prohibiting just anyone from plugging their computers into your Ethernet. You could also use secure authentication protocols, such as Kerberos or the secure shell login suite. This option corresponds to the PROMISC flag.

-promisc

This option turns promiscuous mode off.

allmulti

Multicast addresses are like Ethernet broadcast addresses, except that instead of automatically including everybody, the only people who receive packets sent to a multicast address are those programmed to listen to it. This is useful for applications like Ethernet-based videoconferencing or network audio, to which only those interested can listen. Multicast addressing is supported by most, but not all, Ethernet drivers. When this option is enabled, the interface receives and passes multicast packets for processing. This option corresponds to the ALLMULTI flag.

-allmulti

This option turns multicast addresses off.
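Pulling several of the options above together, a single (illustrative) invocation for vstout's Ethernet interface might look like this; the mtu value simply restates the Ethernet default:

# ifconfig eth0 172.16.1.2 netmask 255.255.255.0 broadcast 172.16.1.255 mtu 1500

And, following the dash convention described before the list, ARP could later be disabled on the same interface with:

# ifconfig eth0 -arp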

4.9 The netstat Command

netstat is a useful tool for checking your network configuration and activity. It is in fact a collection of several tools lumped together. We discuss each of its functions in the following sections.

4.9.1 Displaying the Routing Table

When you invoke netstat with the -r flag, it displays the kernel routing table in the way we've been doing with route. On vstout, it produces:

# netstat -nr

Kernel IP routing table

Destination Gateway Genmask Flags MSS Window irtt Iface

127.0.0.1 * 255.255.255.255 UH 0 0 0 lo

172.16.1.0 * 255.255.255.0 U 0 0 0 eth0

172.16.2.0 172.16.1.1 255.255.255.0 UG 0 0 0 eth0

The -n option makes netstat print addresses as dotted quad IP numbers rather than the symbolic host and network names. This option is especially useful when you want to avoid address lookups over the network (e.g., to a DNS or NIS server).

The second column of netstat's output shows the gateway to which the routing entry points. If no gateway is used, an asterisk is printed instead. The third column shows the “generality” of the route, i.e., the network mask for this route. When given an IP address to find a suitable route for, the kernel steps through each of the routing table entries, taking the bitwise AND of the address and the genmask before comparing it to the target of the route.

The fourth column displays the following flags that describe the route:

o G : The route uses a gateway.

o U : The interface to be used is up.

o H : Only a single host can be reached through the route. For example, this is the case for the loopback entry 127.0.0.1.

o D : This route is dynamically created. It is set if the table entry has been generated by a routing daemon like gated or by an ICMP redirect message.

o M : This route is set if the table entry was modified by an ICMP redirect message.

o ! : The route is a reject route and datagrams will be dropped.

The next three columns show the MSS, Window, and irtt that will be applied to TCP connections established via this route. The MSS is the Maximum Segment Size: the size of the largest datagram the kernel will construct for transmission via this route. The Window is the maximum amount of data the system will accept in a single burst from a remote host. The acronym irtt stands for “initial round-trip time.”

The TCP protocol ensures that data is reliably delivered between hosts by retransmitting a datagram if it has been lost. It keeps a running count of how long a datagram takes to be delivered to the remote end and acknowledged, so that it knows how long to wait before assuming a datagram needs to be retransmitted; this interval is called the round-trip time. The initial round-trip time is the value TCP uses when a connection is first established. For most network types the default value is fine, but for some slow networks, notably certain types of amateur packet radio networks, it is too short and causes unnecessary retransmissions. The irtt value can be set using the route command. Values of zero in these fields mean that the default is being used.

Finally, the last field displays the network interface that this route will use.

4.9.2 Displaying Interface Statistics

When invoked with the -i flag, netstat displays statistics for the network interfaces currently configured. If the -a option is also given, it prints all interfaces present in the kernel, not only those that have been configured currently. On vstout, the output from netstat will look like this:

# netstat -i

Kernel Interface table

Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flags

lo 0 0 3185 0 0 0 3185 0 0 0 BLRU

eth0 1500 0 972633 17 20 120 628711 217 0 0 BRU

The MTU and Met fields show the current MTU and metric values for that interface. The RX and TX columns show how many packets have been received or transmitted error-free (RX-OK/TX-OK) or damaged (RX-ERR/TX-ERR); how many were dropped (RX-DRP/TX-DRP); and how many were lost because of an overrun (RX-OVR/TX-OVR).

The last column shows the flags that have been set for this interface. These characters are one-character versions of the long flag names that are printed when you display the interface configuration with ifconfig:

o B : A broadcast address has been set.

o L : This interface is a loopback device.

o M : All packets are received (promiscuous mode).

o O : ARP is turned off for this interface.

o P : This is a point-to-point connection.

o R : Interface is running.

o U : Interface is up.

4.9.3 Displaying Connections

netstat supports a set of options to display active or passive sockets. The options -t, -u, -w, and -x show active TCP, UDP, RAW, or Unix socket connections. If you provide the -a flag in addition, sockets that are waiting for a connection (i.e., listening) are displayed as well. This display will give you a list of all servers that are currently running on your system.

Invoking netstat -ta on vlager produces this output:

$ netstat -ta

Active Internet Connections

Proto Recv-Q Send-Q Local Address Foreign Address (State)

tcp 0 0 *:domain *:* LISTEN

tcp 0 0 *:time *:* LISTEN

tcp 0 0 *:smtp *:* LISTEN

tcp 0 0 vlager:smtp vstout:1040 ESTABLISHED

tcp 0 0 *:telnet *:* LISTEN

tcp 0 0 localhost:1046 vbardolino:telnet ESTABLISHED

tcp 0 0 *:chargen *:* LISTEN

tcp 0 0 *:daytime *:* LISTEN

tcp 0 0 *:discard *:* LISTEN

tcp 0 0 *:echo *:* LISTEN

tcp 0 0 *:shell *:* LISTEN

tcp 0 0 *:login *:* LISTEN

This output shows most servers simply waiting for an incoming connection. However, the fourth line shows an incoming SMTP connection from vstout, and the sixth line tells you there is an outgoing telnet connection to vbardolino.

Using the -a flag by itself will display all sockets from all families.

4.10 Checking the ARP Tables

On some occasions, it is useful to view or alter the contents of the kernel's ARP tables, for example when you suspect a duplicate Internet address is the cause for some intermittent network problem. The arp tool was made for situations like this. Its command-line options are:

arp [-v] [-t hwtype] -a [hostname]

arp [-v] [-t hwtype] -s hostname hwaddr

arp [-v] -d hostname [hostname…]

All hostname arguments may be either symbolic hostnames or IP addresses in dotted quad notation. The first invocation displays the ARP entry for the IP address or host specified, or all hosts known if no hostname is given. For example, invoking arp on vlager may yield:

# arp -a

IP address HW type HW address

172.16.1.3 10Mbps Ethernet 00:00:C0:5A:42:C1

172.16.1.2 10Mbps Ethernet 00:00:C0:90:B3:42

172.16.2.4 10Mbps Ethernet 00:00:C0:04:69:AA

which shows the Ethernet addresses of vale, vstout, and vchianti.

You can limit the display to the hardware type specified using the -t option. This may be ether, ax25, or pronet, standing for 10 Mbps Ethernet, AMPR AX.25, and IEEE 802.5 token ring equipment, respectively.

The -s option is used to permanently add hostname's Ethernet address to the ARP tables. The hwaddr argument specifies the hardware address, which is by default expected to be an Ethernet address specified as six hexadecimal bytes separated by colons. You may also set the hardware address for other types of hardware, using the -t option.

Sometimes ARP queries for a remote host fail, for instance when its ARP driver is buggy or when another host on the network erroneously identifies itself with that host's IP address; such problems require you to add the IP address to the ARP table manually. Hard-wiring IP addresses in the ARP table is also a (very drastic) measure to protect yourself from hosts on your Ethernet that pose as someone else.

Invoking arp with the -d switch deletes all ARP entries relating to the given host. This switch may be used to force the interface to re-attempt obtaining the Ethernet address for the IP address in question. This is useful when a misconfigured system has broadcast wrong ARP information (of course, you have to reconfigure the broken host first).

The -s option may also be used to implement proxy ARP. This is a special technique through which a host, say gate, acts as a gateway to another host named fnord by pretending that both addresses refer to the same host, namely gate. It does so by publishing an ARP entry for fnord that points to its own Ethernet interface. Now when a host sends out an ARP query for fnord, gate will return a reply containing its own Ethernet address. The querying host will then send all datagrams to gate, which dutifully forwards them to fnord.

These contortions may be necessary when you want to access fnord from a DOS machine with a broken TCP implementation that doesn't understand routing too well. When you use proxy ARP, it will appear to the DOS machine as if fnord was on the local subnet, so it doesn't have to know about how to route through a gateway.

Another useful application of proxy ARP is when one of your hosts acts as a gateway to some other host only temporarily, for instance, through a dial-up link. In a previous example, we encountered the laptop vlite, which was connected to vlager through a PLIP link from time to time. Of course, this application will work only if the address of the host you want to provide proxy ARP for is on the same IP subnet as your gateway. vstout could proxy ARP for any host on the Brewery subnet (172.16.1.0), but never for a host on the Winery subnet (172.16.2.0).

The proper invocation to provide proxy ARP for fnord is given below; of course, the given Ethernet address must be that of gate:

# arp -s fnord 00:00:c0:a1:42:e0 pub

The proxy ARP entry may be removed again by invoking:

# arp -d fnord

11.4 Name Service and Resolver Configuration

TCP/IP networking may rely on different schemes to convert names into addresses. The simplest way is a host table stored in /etc/hosts. This is useful only for small LANs that are run by one single administrator and otherwise have no IP traffic with the outside world.

Alternatively, you can use the Berkeley Internet Name Domain service (BIND) for resolving hostnames to IP addresses. Configuring BIND can be a real chore, but once you've done it, you can easily make changes in the network topology. On Linux, as on many other Unixish systems, name service is provided through a program called named. At startup, it loads a set of master files into its internal cache and waits for queries from remote or local user processes. There are different ways to set up BIND, and not all require you to run a name server on every host.

This chapter can do little more than give a rough sketch of how DNS works and how to operate a name server. It should be sufficient if you have a small LAN and an Internet uplink. For the most current information, you may want to check the documentation contained in the BIND source package, which supplies manual pages, release notes, and the BIND Operator's Guide (BOG). Don't let this name scare you off; it's actually a very useful document. For a more comprehensive coverage of DNS and associated issues, you may find DNS and BIND by Paul Albitz and Cricket Liu (O'Reilly) a useful reference. DNS questions may be answered in a newsgroup called comp.protocols.tcp-ip.domains. For technical details, the Domain Name System is defined by RFC numbers 1033, 1034, and 1035.

11.4.1 The Resolver Library

The term resolver refers not to a special application, but to the resolver library. This is a collection of functions that can be found in the standard C library. The central routines are gethostbyname(3) and gethostbyaddr(3), which look up all IP addresses associated with a hostname, and vice versa. They may be configured to simply look up the information in /etc/hosts, to query a number of DNS name servers, or to use the hosts database of the Network Information Service (NIS).

The resolver functions read configuration files when they are invoked. From these configuration files, they determine what databases to query, in which order, and other details relevant to how you've configured your environment. The older Linux standard library, libc, used /etc/host.conf as its master configuration file, but Version 2 of the GNU standard library, glibc, uses /etc/nsswitch.conf. We'll describe each in turn, since both are commonly used.
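
On glibc systems, a convenient way to exercise the resolver configuration from the command line is the getent utility, which performs lookups through the same mechanism as the library routines; the hostname below is just an example:

$ getent hosts vlager.vbrew.com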

11.4.2 The host.conf File

The /etc/host.conf file tells the older Linux standard library resolver functions which services to use, and in what order.

Options in host.conf must appear on separate lines. Fields may be separated by white space (spaces or tabs). A hash sign (#) introduces a comment that extends to the next newline. The following options are available:

order

This option determines the order in which the resolving services are tried. Valid options are bind for querying the name server, hosts for lookups in /etc/hosts, and nis for NIS lookups. Any or all of them may be specified. The order in which they appear on the line determines the order in which the respective services are tried.

multi

multi takes on or off as options. This determines if a host in /etc/hosts is allowed to have several IP addresses, which is usually referred to as being “multi-homed.” The default is off. This flag has no effect on DNS or NIS queries.

nospoof

As we'll explain in the section on reverse lookups (11.4.10), DNS allows you to find the hostname belonging to an IP address by using the in-addr.arpa domain. Attempts by name servers to supply a false hostname are called spoofing. To guard against this, the resolver can be configured to check whether the original IP address is in fact associated with the obtained hostname. If not, the name is rejected and an error is returned. This behavior is turned on by setting nospoof on.

alert

This option takes on or off as arguments. If it is turned on, any spoof attempts will cause the resolver to log a message to the syslog facility.

trim

This option takes an argument specifying a domain name that will be removed from hostnames before lookup. This is useful for hosts entries, for which you might only want to specify hostnames without a local domain. If you specify your local domain name here, it will be removed from a lookup of a host with the local domain name appended, thus allowing the lookup in /etc/hosts to succeed. The domain name you add must end with the (.) character (e.g., linux.org.au.) if trim is to work correctly. trim options accumulate; you can consider your host as being local to several domains.

Example 6-1. Sample host.conf File

# /etc/host.conf

# We have named running, but no NIS (yet)

order bind,hosts

# Allow multiple addrs

multi on

# Guard against spoof attempts

nospoof on

# Trim local domain (not really necessary).

trim vbrew.com.

11.4.3 The nsswitch.conf File

Version 2 of the GNU standard library includes a more powerful and flexible replacement for the older host.conf mechanism. The concept of the name service has been extended to include a variety of different types of information. Configuration options for all of the different functions that query these databases have been brought together into a single configuration file: nsswitch.conf.

The nsswitch.conf file allows the system administrator to configure a wide variety of different databases. We'll limit our discussion to options that relate to host and network IP address resolution. You can easily find more information about the other features by reading the GNU standard library documentation.

Options in nsswitch.conf must appear on separate lines. Fields may be separated by whitespace (spaces or tabs). A hash sign (#) introduces a comment that extends to the next newline. Each line describes a particular service; hostname resolution is one of these. The first field in each line is the name of the database, ending with a colon. The database name associated with host address resolution is hosts. A related database is networks, which is used for resolution of network names into network addresses. The remainder of each line stores options that determine the way lookups for that database are performed.

The following options are available:

dns

Use the Domain Name System (DNS) service to resolve the address. This makes sense only for host address resolution, not network address resolution. This mechanism uses the /etc/resolv.conf file that we'll describe later in the chapter.

files

Search a local file for the host or network name and its corresponding address. This option uses the traditional /etc/hosts and /etc/network files.

nis or nisplus

Use the Network Information System (NIS) to resolve the host or network address.

The order in which the services are listed in the service description determines the order in which they are queried when attempting to resolve a name. The services are queried from left to right, and by default searching stops as soon as a resolution succeeds.

A simple example of host and network database specification that would mimic our configuration using the older libc standard library is shown in Example 6-2.

Example 6-2. Sample nsswitch.conf File

# /etc/nsswitch.conf

#

# Example configuration of GNU Name Service Switch functionality.

# Information about this file is available in the `libc6-doc' package.

hosts: dns files

networks: files

This example causes the system to look up hosts first in the Domain Name System, and then in the /etc/hosts file if DNS can't find them. Network name lookups would be attempted using only the /etc/networks file.

You are able to control the lookup behavior more precisely using “action items” that describe what action to take given the result of the previous lookup attempt. Action items appear between service specifications, and are enclosed within square brackets, [ ]. The general syntax of the action statement is:

[ [!] status = action ... ]

There are two possible actions:

return

Controls returns to the program that attempted the name resolution. If a lookup attempt was successful, the resolver will return with the details, otherwise it will return a zero result.

continue

The resolver will move on to the next service in the list and attempt resolution using it.

The optional (!) character specifies that the status value should be inverted before testing; that is, it means “not.”

The available status values on which we can act are:

success

The requested entry was found without error. The default action for this status is return.

notfound

There was no error in the lookup, but the target host or network could not be found. The default action for this status is continue.

unavail

The service queried was unavailable. This could mean that the hosts or networks file was unreadable for the files service or that a name server or NIS server did not respond for the dns or nis services. The default action for this status is continue.

tryagain

This status means the service is temporarily unavailable. For the files service, this would usually indicate that the relevant file was locked by some process. For other services, it may mean the server was temporarily unable to accept connections. The default action for this status is continue.

A simple example of how you might use this mechanism is shown in Example 6-3.

Example 6-3. Sample nsswitch.conf File Using an Action Statement

# /etc/nsswitch.conf

#

# Example configuration of GNU Name Service Switch functionality.

# Information about this file is available in the `libc6-doc' package.

hosts: dns [!UNAVAIL=return] files

networks: files

This example attempts host resolution using the Domain Name Service system. If the return status is anything other than unavailable, the resolver returns whatever it has found. If, and only if, the DNS lookup attempt returns an unavailable status, the resolver attempts to use the local /etc/hosts. This means that we should use the hosts file only if our name server is unavailable for some reason.
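
The same mechanism can express other policies. The following line, a hypothetical variant not taken from the examples above, consults /etc/hosts first and gives up immediately if the name is simply absent there, so DNS is queried only when the hosts file itself cannot be used:

hosts: files [NOTFOUND=return] dns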

11.4.4 Configuring Name Server Lookups Using resolv.conf

When configuring the resolver library to use the BIND name service for host lookups, you also have to tell it which name servers to use. There is a separate file for this called resolv.conf. If this file does not exist or is empty, the resolver assumes the name server is on your local host.

To run a name server on your local host, you have to set it up separately, as will be explained in the following section. If you are on a local network and have the opportunity to use an existing name server, this should always be preferred. If you use a dialup IP connection to the Internet, you would normally specify the name server of your service provider in the resolv.conf file.

The most important option in resolv.conf is nameserver, which gives the IP address of a name server to use. If you specify several name servers by giving the nameserver option several times, they are tried in the order given. You should therefore put the most reliable server first. The current implementation allows you to have up to three nameserver statements in resolv.conf. If no nameserver option is given, the resolver attempts to connect to the name server on the local host.

Two other options, domain and search, let you use shortcut names for hosts in your local domain. Usually, when just telnetting to another host in your local domain, you don't want to type in the fully qualified hostname, but use a name like gauss on the command line and have the resolver tack on the mathematics.groucho.edu part.

This is just the domain statement's purpose. It lets you specify a default domain name to be appended when DNS fails to look up a hostname. For instance, when given the name gauss, the resolver fails to find gauss. in DNS, because there is no such top-level domain. When given mathematics.groucho.edu as a default domain, the resolver repeats the query for gauss with the default domain appended, this time succeeding.

That's just fine, you may think, but as soon you get out of the Math department's domain, you're back to those fully qualified domain names. Of course, you would also want to have shorthands like quark.physics for hosts in the Physics department's domain.

This is when the search list comes in. A search list can be specified using the search option, which is a generalization of the domain statement. Where the latter gives a single default domain, the former specifies a whole list of them, each to be tried in turn until a lookup succeeds. This list must be separated by blanks or tabs.

The search and domain statements are mutually exclusive and may not appear more than once. If neither option is given, the resolver will try to guess the default domain from the local hostname using the getdomainname(2) system call. If the local hostname doesn't have a domain part, the default domain will be assumed to be the root domain.

If you decide to put a search statement into resolv.conf, you should be careful about what domains you add to this list. Resolver libraries prior to BIND 4.9 used to construct a default search list from the domain name when no search list was given. This default list was made up of the default domain itself, plus all of its parent domains up to the root. This caused some problems because DNS requests wound up at name servers that were never meant to see them.

Assume you're at the Virtual Brewery and want to log in to foot.groucho.edu. By a slip of your fingers, you mistype foot as foo, which doesn't exist. GMU's name server will therefore tell you that it knows no such host. With the old-style search list, the resolver would now go on trying the name with vbrew.com and com appended. The latter is problematic because groucho.edu.com might actually be a valid domain name. Their name server might then even find foo in their domain, pointing you to one of their hosts—which clearly was not intended.

For some applications, these bogus host lookups can be a security problem. Therefore, you should usually limit the domains on your search list to your local organization, or something comparable. At the Mathematics department of Groucho Marx University, the search list would commonly be set to maths.groucho.edu and groucho.edu.
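
As a sketch, a resolv.conf for a host in the Mathematics department might then look like this (the name server address is illustrative only; we simply reuse gauss's address from later examples in this chapter):

# /etc/resolv.conf for a Mathematics department host

search maths.groucho.edu groucho.edu

nameserver 149.76.4.23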

If default domains sound confusing to you, consider this sample resolv.conf file for the Virtual Brewery:

# /etc/resolv.conf

# Our domain

domain vbrew.com

#

# We use vlager as central name server:

nameserver 172.16.1.1

When resolving the name vale, the resolver looks up vale and, failing this, vale.vbrew.com.

11.4.5 Resolver Robustness

When running a LAN inside a larger network, you definitely should use central name servers if they are available. The name servers develop rich caches that speed up repeat queries, since all queries are forwarded to them. However, this scheme has a drawback: when a fire destroyed the backbone cable at Olaf's university, no more work was possible on his department's LAN because the resolver could no longer reach any of the name servers. This situation caused difficulties with most network services, such as X terminal logins and printing.

Although it is not very common for campus backbones to go down in flames, one might want to take precautions against cases like this.

One option is to set up a local name server that resolves hostnames from your local domain and forwards all queries for other hostnames to the main servers. Of course, this is applicable only if you are running your own domain.

Alternatively, you can maintain a backup host table for your domain or LAN in /etc/hosts. This is very simple to do. You simply ensure that the resolver library queries DNS first, and the hosts file next. In an /etc/host.conf file you'd use “order bind,hosts”, and in an /etc/nsswitch.conf file you'd use “hosts: dns files”, to make the resolver fall back to the hosts file if the central name server is unreachable.
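
Such a backup table for the Brewery LAN might look like the following sketch, reusing the addresses from the examples in this chapter:

# /etc/hosts - backup entries in case the name server is unreachable

127.0.0.1 localhost

172.16.1.1 vlager.vbrew.com vlager

172.16.1.2 vstout.vbrew.com vstout

172.16.1.3 vale.vbrew.com vale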

11.4.6 How DNS Works

DNS organizes hostnames in a domain hierarchy. A domain is a collection of sites that are related in some sense—because they form a proper network (e.g., all machines on a campus, or all hosts on BITNET), because they all belong to a certain organization (e.g., the U.S. government), or because they're simply geographically close. For instance, universities are commonly grouped in the edu domain, with each university or college using a separate subdomain, below which their hosts are subsumed. Groucho Marx University has the groucho.edu domain, while the LAN of the Mathematics department is assigned maths.groucho.edu. Hosts on the departmental network would have this domain name tacked onto their hostname, so erdos would be known as erdos.maths.groucho.edu. This is called the fully qualified domain name (FQDN), which uniquely identifies this host worldwide.

Figure 6-1 shows a section of the namespace. The entry at the root of this tree, which is denoted by a single dot, is quite appropriately called the root domain and encompasses all other domains. To indicate that a hostname is a fully qualified domain name, rather than a name relative to some (implicit) local domain, it is sometimes written with a trailing dot. This dot signifies that the name's last component is the root domain.

Figure 6-1. A part of the domain namespace

Depending on its location in the name hierarchy, a domain may be called top-level, second-level, or third-level. More levels of subdivision occur, but they are rare. This list details several top-level domains you may see frequently:

Domain  Description

edu     (Mostly U.S.) educational institutions like universities.

com     Commercial organizations and companies.

org     Non-commercial organizations. Private UUCP networks are often in this domain.

net     Gateways and other administrative hosts on a network.

mil     U.S. military institutions.

gov     U.S. government institutions.

uucp    Officially, all site names formerly used as UUCP names without domains have been moved to this domain.

Historically, the first four of these were assigned to the U.S., but recent changes in policy have meant that these domains, called generic Top-Level Domains (gTLDs), are now considered global in nature. Negotiations are currently underway to broaden the range of gTLDs, which may result in increased choice in the future.

Outside the U.S., each country generally uses a top-level domain of its own named after the two-letter country code defined in ISO-3166. Finland, for instance, uses the fi domain; fr is used by France, de by Germany, and au by Australia. Below this top-level domain, each country's NIC is free to organize hostnames in whatever way it wants. Australia has second-level domains similar to the international top-level domains, named com.au and edu.au. Other countries, like Germany, don't use this extra level, but have slightly longer names that refer directly to the organizations running a particular domain. It's not uncommon to see hostnames like ftp.informatik.uni-erlangen.de. Chalk that up to German efficiency.

Of course, these national domains do not imply that a host below that domain is actually located in that country; it means only that the host has been registered with that country's NIC. A Swedish manufacturer might have a branch in Australia and still have all its hosts registered with the se top-level domain.

Organizing the namespace in a hierarchy of domain names nicely solves the problem of name uniqueness; with DNS, a hostname has to be unique only within its domain to give it a name different from all other hosts worldwide. Furthermore, fully qualified names are easy to remember. Taken by themselves, these are already very good reasons to split up a large domain into several subdomains.

DNS does even more for you than this. It also allows you to delegate authority over a subdomain to its administrators. For example, the maintainers at the Groucho Computing Center might create a subdomain for each department; we already encountered the math and physics subdomains above. When they find the network at the Physics department too large and chaotic to manage from outside (after all, physicists are known to be an unruly bunch of people), they may simply pass control of the physics.groucho.edu domain to the administrators of this network. These administrators are free to use whatever hostnames they like and assign them IP addresses from their network in whatever fashion they desire, without outside interference.

To this end, the namespace is split up into zones, each rooted at a domain. Note the subtle difference between a zone and a domain: the domain groucho.edu encompasses all hosts at Groucho Marx University, while the zone groucho.edu includes only the hosts that are managed by the Computing Center directly, for example, those at the Mathematics department. The hosts at the Physics department belong to a different zone, namely physics.groucho.edu. In Figure 6-1, the start of a zone is marked by a small circle to the right of the domain name.

11.4.7 Name Lookups with DNS

At first glance, all this domain and zone fuss seems to make name resolution an awfully complicated business. After all, if no central authority controls what names are assigned to which hosts, how is a humble application supposed to know?

Now comes the really ingenious part about DNS. If you want to find the IP address of erdos, DNS says, “Go ask the people who manage it, and they will tell you.”

In fact, DNS is a giant distributed database. It is implemented by so-called name servers that supply information on a given domain or set of domains. For each zone there are at least two, or at most a few, name servers that hold all authoritative information on hosts in that zone. To obtain the IP address of erdos, all you have to do is contact the name server for the groucho.edu zone, which will then return the desired data.

Easier said than done, you might think. So how do I know how to reach the name server at Groucho Marx University? In case your computer isn't equipped with an address-resolving oracle, DNS provides for this, too. When your application wants to look up information on erdos, it contacts a local name server, which conducts a so-called iterative query for it. It starts off by sending a query to a name server for the root domain, asking for the address of erdos.maths.groucho.edu. The root name server recognizes that this name does not belong to its zone of authority, but rather to one below the edu domain. Thus, it tells you to contact an edu zone name server for more information and encloses a list of all edu name servers along with their addresses. Your local name server will then go on and query one of those, for instance, a.isi.edu. In a manner similar to the root name server, a.isi.edu knows that the groucho.edu people run a zone of their own, and points you to their servers. The local name server will then present its query for erdos to one of these, which will finally recognize the name as belonging to its zone, and return the corresponding IP address.

This looks like a lot of traffic being generated for looking up a measly IP address, but it's really minuscule compared to the amount of data that would have to be transferred if we were still stuck with HOSTS.TXT. There's still room for improvement with this scheme, however.

To improve response time during future queries, the name server stores the information obtained in its local cache. So the next time anyone on your local network wants to look up the address of a host in the groucho.edu domain, your name server will go directly to the groucho.edu name server.

Of course, the name server will not keep this information forever; it will discard it after some time. The expiration interval is called the time to live, or TTL. Each datum in the DNS database is assigned such a TTL by administrators of the responsible zone.

11.4.8 Types of Name Servers

Name servers that hold all information on hosts within a zone are called authoritative for this zone, and sometimes are referred to as master name servers. Any query for a host within this zone will end up at one of these master name servers.

Master servers must be fairly well synchronized. Thus, the zone's network administrator must make one the primary server, which loads its zone information from data files, and make the others secondary servers, which transfer the zone data from the primary server at regular intervals.

Having several name servers distributes workload; it also provides backup. When one name server machine fails in a benign way, like crashing or losing its network connection, all queries will fall back to the other servers. Of course, this scheme doesn't protect you from server malfunctions that produce wrong replies to all DNS requests, such as from software bugs in the server program itself.

You can also run a name server that is not authoritative for any domain. This is useful, as the name server will still be able to conduct DNS queries for the applications running on the local network and cache the information. Hence it is called a caching-only server.

11.4.9 The DNS Database

We have seen that DNS not only deals with IP addresses of hosts, but also exchanges information on name servers. DNS databases may have, in fact, many different types of entries.

A single piece of information from the DNS database is called a resource record (RR). Each record has a type associated with it describing the sort of data it represents, and a class specifying the type of network it applies to. The latter accommodates the needs of different addressing schemes, like IP addresses (the IN class), Hesiod addresses (used by MIT's Kerberos system), and a few more. The prototypical resource record type is the A record, which associates a fully qualified domain name with an IP address.

A host may be known by more than one name. For example, you might have a server that provides both FTP and World Wide Web services, which you give two names: ftp.machine.org and www.machine.org. However, one of these names must be identified as the official or canonical hostname, while the others are simply aliases referring to it. The difference is that the canonical hostname is the one with an associated A record, while the others only have a record of type CNAME that points to the canonical hostname.
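
To illustrate, the zone data for such a host might contain the following two records (the address is made up; we assume ftp.machine.org is the canonical name):

ftp.machine.org. IN A 172.16.5.2

www.machine.org. IN CNAME ftp.machine.org.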

We will not go through all record types here, but we will give you a brief example. Example 6-4 shows a part of the domain database that is loaded into the name servers for the physics.groucho.edu zone.

Example 6-4. An Excerpt from the named.hosts File for the Physics Department

; Authoritative Information on physics.groucho.edu.

@ IN SOA niels.physics.groucho.edu. janet.niels.physics.groucho.edu. (

1999090200 ; serial no

360000 ; refresh

3600 ; retry

3600000 ; expire

3600 ; default ttl

)

;

; Name servers

IN NS niels

IN NS gauss.maths.groucho.edu.

gauss.maths.groucho.edu. IN A 149.76.4.23

;

; Theoretical Physics (subnet 12)

niels IN A 149.76.12.1

IN A 149.76.1.12

nameserver IN CNAME niels

otto IN A 149.76.12.2

quark IN A 149.76.12.4

down IN A 149.76.12.5

strange IN A 149.76.12.6

...

; Collider Lab. (subnet 14)

boson IN A 149.76.14.1

muon IN A 149.76.14.7

bogon IN A 149.76.14.12

...

Apart from the A and CNAME records, you can see a special record at the top of the file, stretching several lines. This is the SOA resource record signaling the Start of Authority, which holds general information on the zone the server is authoritative for. The SOA record comprises, for instance, the default time to live for all records.

Note that all names in the sample file that do not end with a dot should be interpreted relative to the physics.groucho.edu domain. The special name (@) used in the SOA record refers to the domain name by itself.

We have seen earlier that the name servers for the groucho.edu domain somehow have to know about the physics zone so that they can point queries to their name servers. This is usually achieved by a pair of records: the NS record that gives the server's FQDN, and an A record that associates an address with that name. Since these records are what holds the namespace together, they are frequently called glue records. They are the only instances of records in which a parent zone actually holds information on hosts in the subordinate zone. The glue records pointing to the name servers for physics.groucho.edu are shown in Example 6-5.

Example 6-5. An Excerpt from the named.hosts File for GMU

; Zone data for the groucho.edu zone.

@ IN SOA vax12.gcc.groucho.edu. joe.vax12.gcc.groucho.edu. (

1999070100 ; serial no

360000 ; refresh

3600 ; retry

3600000 ; expire

3600 ; default ttl

)

....

;

; Glue records for the physics.groucho.edu zone

physics IN NS niels.physics.groucho.edu.

IN NS gauss.maths.groucho.edu.

niels.physics IN A 149.76.12.1

gauss.maths IN A 149.76.4.23

...

11.4.10 Reverse Lookups

Finding the IP address belonging to a host is certainly the most common use for the Domain Name System, but sometimes you'll want to find the canonical hostname corresponding to an address. Finding this hostname is called reverse mapping, and is used by several network services to verify a client's identity. When using a single hosts file, reverse lookups simply involve searching the file for a host that owns the IP address in question.

With DNS, an exhaustive search of the namespace is out of the question. Instead, a special domain, in-addr.arpa, has been created that contains the IP addresses of all hosts in a reversed dotted quad notation. For instance, an IP address of 149.76.12.4 corresponds to the name 4.12.76.149.in-addr.arpa. The resource-record type linking these names to their canonical hostnames is PTR.
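
For instance, the PTR record linking quark's address back to its canonical hostname would read, written in fully qualified form:

4.12.76.149.in-addr.arpa. IN PTR quark.physics.groucho.edu.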

Creating a zone of authority usually means that its administrators have full control over how they assign addresses to names. Since they usually have one or more IP networks or subnets at their hands, there's a one-to-many mapping between DNS zones and IP networks. The Physics department, for instance, comprises the subnets 149.76.8.0, 149.76.12.0, and 149.76.14.0.

Consequently, new zones in the in-addr.arpa domain have to be created along with the physics zone, and delegated to the network administrators at the department: 8.76.149.in-addr.arpa, 12.76.149.in-addr.arpa, and 14.76.149.in-addr.arpa. Otherwise, installing a new host at the Collider Lab would require them to contact their parent domain to have the new address entered into their in-addr.arpa zone file.

The zone database for subnet 12 is shown in Example 6-6. The corresponding glue records in the database of their parent zone are shown in Example 6-7.

Example 6-6. An Excerpt from the named.rev File for Subnet 12

; the 12.76.149.in-addr.arpa domain.

@ IN SOA niels.physics.groucho.edu. janet.niels.physics.groucho.edu. (

1999090200 360000 3600 3600000 3600

)

2 IN PTR otto.physics.groucho.edu.

4 IN PTR quark.physics.groucho.edu.

5 IN PTR down.physics.groucho.edu.

6 IN PTR strange.physics.groucho.edu.

Example 6-7. An Excerpt from the named.rev File for Network 149.76

; the 76.149.in-addr.arpa domain.

@ IN SOA vax12.gcc.groucho.edu. joe.vax12.gcc.groucho.edu. (

1999070100 360000 3600 3600000 3600

)

...

; subnet 4: Mathematics Dept.

1.4 IN PTR sophus.maths.groucho.edu.

17.4 IN PTR erdos.maths.groucho.edu.

23.4 IN PTR gauss.maths.groucho.edu.

...

; subnet 12: Physics Dept, separate zone

12 IN NS niels.physics.groucho.edu.

IN NS gauss.maths.groucho.edu.

niels.physics.groucho.edu. IN A 149.76.12.1

gauss.maths.groucho.edu. IN A 149.76.4.23

...

A drawback of the in-addr.arpa scheme is that zones can be created only as supersets of IP networks. An even more severe restriction is that these networks' netmasks have to be on byte boundaries. All subnets at Groucho Marx University have a netmask of 255.255.255.0, hence an in-addr.arpa zone could be created for each subnet. However, if the netmask were 255.255.255.128 instead, creating zones for the subnet 149.76.12.128 would be impossible, because there's no way to tell DNS that the 12.76.149.in-addr.arpa domain has been split into two zones of authority, with host addresses ranging from 1 through 127 and 128 through 255, respectively.

11.4.11 Running named

named (pronounced name-dee) provides DNS on most Unix machines. It is a server program originally developed for BSD to provide name service to clients, and possibly to other name servers. BIND Version 4 was around for some time and appeared in most Linux distributions. The new release, Version 8, has been introduced in most Linux distributions, and is a big change from previous versions. It has many new features, such as support for DNS dynamic updates, DNS change notifications, much improved performance, and a new configuration file syntax. Please check the documentation contained in the source distribution for details.

This section requires some understanding of the way DNS works. If the following discussion is all Greek to you, you may want to reread the section on how DNS works (11.4.6).

named is usually started at system boot time and runs until the machine goes down again. Implementations of BIND prior to Version 8 take their information from a configuration file called /etc/named.boot and various files that map domain names to addresses. The latter are called zone files. Versions of BIND from Version 8 onwards use /etc/named.conf in place of /etc/named.boot.

To run named at the prompt, enter:

# /usr/sbin/named

named will come up and read the named.boot file and any zone files specified therein. It writes its process ID to /var/run/named.pid in ASCII, downloads any zone files from primary servers, if necessary, and starts listening on port 53 for DNS queries.
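
Since the process ID is recorded in /var/run/named.pid, you can conveniently signal the running server; traditionally, sending named a hangup signal makes it reload its configuration and zone files:

# kill -HUP `cat /var/run/named.pid`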

11.4.12 The BIND 8 named.conf File

BIND Version 8 introduced a range of new features, and with these came a new configuration file syntax. The named.boot file, with its simple single-line statements, was replaced by the named.conf file, which has a syntax like that of gated, resembling C source code.

The new syntax is more complex, but fortunately a tool has been provided that automates conversion from the old syntax to the new syntax. In the BIND 8 source package, a perl program called named-bootconf.pl is provided that will read your existing named.boot file from stdin and convert it into the equivalent named.conf format on stdout. To use it, you must have the perl interpreter installed.

You should use the script somewhat like this:

# cd /etc

# named-bootconf.pl <named.boot >named.conf

The script then produces a named.conf that looks like that shown in Example 6-9. We've cleaned out a few of the helpful comments the script includes to help show the almost direct relationship between the old and the new syntax.

Example 6-9. The BIND 8 equivalent named.conf File for vlager

//

// /etc/named.boot file for vlager.vbrew.com

options {

directory "/var/named";

};

zone "." {

type hint;

file "named.ca";

};

zone "vbrew.com" {

type master;

file "named.hosts";

};

zone "0.0.127.in-addr.arpa" {

type master;

file "named.local";

};

zone "16.172.in-addr.arpa" {

type master;

file "named.rev";

};

If you take a close look, you will see that each of the one-line statements in named.boot has been converted into a C-like statement enclosed within { } characters in the named.conf file.

The comments, which in the named.boot file were indicated by a semicolon (;), are now indicated by two forward slashes ( // ).

The directory statement has been translated into an options paragraph with a directory clause.

The cache and primary statements have been converted into zone paragraphs with type clauses of hint and master, respectively.

The zone files do not need to be modified in any way; their syntax remains unchanged.

The new configuration syntax allows for many new options that we haven't covered here. If you'd like information on the new options, the best source of information is the documentation supplied with the BIND Version 8 source package.

11.4.13 The DNS Database Files

Master files loaded by named, like named.hosts, always have a domain associated with them, which is called the origin. This is the domain name specified with the cache and primary options. Within a master file, you are allowed to specify domain and host names relative to this domain. A name given in a configuration file is considered absolute if it ends in a single dot; otherwise, it is considered relative to the origin. The origin by itself may be referred to using (@).

The data contained in a master file is split up into resource records (RRs). RRs are the smallest units of information available through DNS. Each resource record has a type. A records, for instance, map a hostname to an IP address, and a CNAME record associates an alias for a host with its official hostname. To see an example, look at Example 6-11, which shows the named.hosts master file for the Virtual Brewery.

Resource record representations in master files share a common format:

[domain] [ttl] [class] type rdata

Fields are separated by spaces or tabs. An entry may be continued across several lines if an opening parenthesis occurs before the first newline and the last field is followed by a closing parenthesis. Anything between a semicolon and a newline is ignored. A description of the format terms follows:

domain

This term is the domain name to which the entry applies. If no domain name is given, the RR is assumed to apply to the domain of the previous RR.

ttl

In order to force resolvers to discard information after a certain time, each RR is assigned a time to live (TTL). The ttl field specifies the time in seconds that the information is valid after it has been retrieved from the server. It is a decimal number with at most eight digits. If no ttl value is given, the field value defaults to that of the minimum field of the preceding SOA record.

class

This is an address class, like IN for IP addresses or HS for objects in the Hesiod class. For TCP/IP networking, you have to specify IN.

If no class field is given, the class of the preceding RR is assumed.

type

This describes the type of the RR. The most common types are A, SOA, PTR, and NS. The following sections describe the various types of RRs.

rdata

This holds the data associated with the RR. The format of this field depends on the type of RR. In the following discussion, it will be described for each RR separately.

The following is a partial list of RRs used in DNS master files. There are a couple more that we will not explain; they are experimental and generally of little use.

SOA

This RR describes a zone of authority (SOA means “Start of Authority”). It signals that the records following the SOA RR contain authoritative information for the domain. Every master file included by a primary statement must contain an SOA record for this zone. The resource data contains the following fields:

origin

This field is the canonical hostname of the primary name server for this domain. It is usually given as an absolute name.

contact

This field is the email address of the person responsible for maintaining the domain, with the "@" sign replaced by a dot. For instance, if the responsible person at the Virtual Brewery were janet, this field would contain janet.vbrew.com.

serial

This field is the version number of the zone file, expressed as a single decimal number. Whenever data is changed in the zone file, this number should be incremented. A common convention is to use a number that reflects the date of the last update, with a version number appended to it to cover the case of multiple updates occurring on a single day, e.g., 2000012600 being update 00 that occurred on January 26, 2000.

The serial number is used by secondary name servers to recognize zone information changes. To stay up to date, secondary servers request the primary server's SOA record at certain intervals and compare the serial number to that of the cached SOA record. If the number has changed, the secondary servers transfer the whole zone database from the primary server.

refresh

This field specifies the interval in seconds that the secondary servers should wait between checking the SOA record of the primary server. Again, this is a decimal number with at most eight digits.

Generally, the network topology doesn't change too often, so this number should specify an interval of roughly a day for larger networks, and even more for smaller ones.

retry

This number determines the intervals at which a secondary server should retry contacting the primary server if a request or a zone refresh fails. It must not be too low, or a temporary failure of the server or a network problem could cause the secondary server to waste network resources. One hour, or perhaps one-half hour, might be a good choice.

expire

This field specifies the time in seconds after which a secondary server should finally discard all zone data if it hasn't been able to contact the primary server. You should normally set this field to at least a week (604,800 seconds), but increasing it to a month or more is also reasonable.

minimum

This field is the default ttl value for resource records that do not explicitly contain one. The ttl value specifies the maximum amount of time other name servers may keep the RR in their cache. This time applies only to normal lookups, and has nothing to do with the time after which a secondary server should try to update the zone information.

If the topology of your network does not change frequently, a week or even more is probably a good choice. If single RRs change more frequently, you could still assign them smaller ttls individually. If your network changes frequently, you may want to set minimum to one day (86,400 seconds).

A

This record associates an IP address with a hostname. The resource data field contains the address in dotted quad notation.

The hostname that appears in an A record is considered the official or canonical hostname (a multihomed host, like niels in Example 6-4, may have several A records for the same name). All other hostnames are aliases and must be mapped onto the canonical hostname using a CNAME record. If the canonical name of our host were vlager, we'd have an A record that associated that hostname with its IP address. Since we may also want another name associated with that address, say news, we'd create a CNAME record that associates this alternate name with the canonical name. We'll talk more about CNAME records shortly.

NS

NS records are used to specify a zone's primary server and all its secondary servers. An NS record points to a master name server of the given zone, with the resource data field containing the hostname of the name server.

You will meet NS records in two situations: The first situation is when you delegate authority to a subordinate zone; the second is within the master zone database of the subordinate zone itself. The sets of servers specified in both the parent and delegated zones should match.

The NS record specifies the name of the primary and secondary name servers for a zone. These names must be resolved to an address so they can be used. Sometimes the servers belong to the domain they are serving, which causes a “chicken and egg” problem; we can't resolve the address until the name server is reachable, but we can't reach the name server until we resolve its address. To solve this dilemma, we can configure special A records directly into the name server of the parent zone. The A records allow the name servers of the parent domain to resolve the IP address of the delegated zone name servers. These records are commonly called glue records because they provide the “glue” that binds a delegated zone to its parent.

CNAME

This record associates an alias with a host's canonical hostname. It provides an alternate name by which users can refer to the host whose canonical name is supplied as a parameter. The canonical hostname is the one the master file provides an A record for; aliases are simply linked to that name by a CNAME record, but don't have any other records of their own.

PTR

This type of record is used to associate names in the in-addr.arpa domain with hostnames. It is used for reverse mapping of IP addresses to hostnames. The hostname given must be the canonical hostname.

MX

This RR announces a mail exchanger for a domain. Mail exchangers are discussed in Section 17.4.1. The syntax of an MX record is:

[domain] [ttl] [class] MX preference host

host names the mail exchanger for domain. Every mail exchanger has an integer preference associated with it. A mail transport agent that wants to deliver mail to domain tries all hosts that have an MX record for this domain until it succeeds. The one with the lowest preference value is tried first, then the others, in order of increasing preference value.
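
For illustration, a pair of MX records for the Virtual Brewery might look like this (using vstout as a backup exchanger is an assumption made here, not part of the sample files below):

vbrew.com. IN MX 10 vlager.vbrew.com.

vbrew.com. IN MX 20 vstout.vbrew.com.

Mail for vbrew.com is first offered to vlager; only if vlager cannot be reached is vstout, with the higher preference value, tried.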

HINFO

This record provides information on the system's hardware and software. Its syntax is:

[domain] [ttl] [class] HINFO hardware software

The hardware field identifies the hardware used by this host. Special conventions are used to specify this. A list of valid “machine names” is given in the Assigned Numbers RFC (RFC-1700). If the field contains any blanks, it must be enclosed in double quotes. The software field names the operating system software used by the system. Again, a valid name from the Assigned Numbers RFC should be chosen.

An HINFO record to describe an Intel-based Linux machine should look something like:

tao 36500 IN HINFO IBM-PC LINUX2.2

and HINFO records for Linux running on Motorola 68000-based machines might look like:

cevad 36500 IN HINFO ATARI-104ST LINUX2.0

jedd 36500 IN HINFO AMIGA-3000 LINUX2.0

11.4.14 Caching-only named Configuration

There is a special type of named configuration that we'll talk about before we explain how to build a full name server configuration. It is called a caching-only configuration. It doesn't really serve a domain, but acts as a relay for all DNS queries produced on your host. The advantage of this scheme is that it builds up a cache so only the first query for a particular host is actually sent to the name servers on the Internet. Any repeated request will be answered directly from the cache in your local name server.

A named.boot file for a caching-only server looks like this:

; named.boot file for caching-only server

directory /var/named

primary 0.0.127.in-addr.arpa named.local ; localhost network

cache . named.ca ; root servers

In addition to this named.boot file, you must set up the named.ca file with a valid list of root name servers. You could copy and use Example 6-10 for this purpose. No other files are needed for a caching-only server configuration.
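
For BIND 8, applying the conversion rules described in the section on the BIND 8 named.conf file (11.4.12), the equivalent named.conf would look roughly like this sketch:

// named.conf for a caching-only server

options {

directory "/var/named";

};

zone "." {

type hint;

file "named.ca";

};

zone "0.0.127.in-addr.arpa" {

type master;

file "named.local";

};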

11.4.15 Writing the Master Files

Example 6-10, Example 6-11, Example 6-12, and Example 6-13 give sample files for a name server at the brewery, located on vlager. Due to the nature of the network discussed (a single LAN), the example is pretty straightforward.

The named.ca cache file shown in Example 6-10 shows sample hint records for a root name server. A typical cache file usually describes about a dozen name servers. You can obtain the current list of name servers for the root domain using the nslookup tool described in the next section.

Example 6-10. The named.ca File

;

; /var/named/named.ca Cache file for the brewery.

; We're not on the Internet, so we don't need

; any root servers. To activate these

; records, remove the semicolons.

;

;. 3600000 IN NS A.ROOT-SERVERS.NET.

;A.ROOT-SERVERS.NET. 3600000 A 198.41.0.4

;. 3600000 NS B.ROOT-SERVERS.NET.

;B.ROOT-SERVERS.NET. 3600000 A 128.9.0.107

;. 3600000 NS C.ROOT-SERVERS.NET.

;C.ROOT-SERVERS.NET. 3600000 A 192.33.4.12

;. 3600000 NS D.ROOT-SERVERS.NET.

;D.ROOT-SERVERS.NET. 3600000 A 128.8.10.90

;. 3600000 NS E.ROOT-SERVERS.NET.

;E.ROOT-SERVERS.NET. 3600000 A 192.203.230.10

;. 3600000 NS F.ROOT-SERVERS.NET.

;F.ROOT-SERVERS.NET. 3600000 A 192.5.5.241

;. 3600000 NS G.ROOT-SERVERS.NET.

;G.ROOT-SERVERS.NET. 3600000 A 192.112.36.4

;. 3600000 NS H.ROOT-SERVERS.NET.

;H.ROOT-SERVERS.NET. 3600000 A 128.63.2.53

;. 3600000 NS I.ROOT-SERVERS.NET.

;I.ROOT-SERVERS.NET. 3600000 A 192.36.148.17

;. 3600000 NS J.ROOT-SERVERS.NET.

;J.ROOT-SERVERS.NET. 3600000 A 198.41.0.10

;. 3600000 NS K.ROOT-SERVERS.NET.

;K.ROOT-SERVERS.NET. 3600000 A 193.0.14.129

;. 3600000 NS L.ROOT-SERVERS.NET.

;L.ROOT-SERVERS.NET. 3600000 A 198.32.64.12

;. 3600000 NS M.ROOT-SERVERS.NET.

;M.ROOT-SERVERS.NET. 3600000 A 202.12.27.33

;

Example 6-11. The named.hosts File

;

; /var/named/named.hosts Local hosts at the brewery

; Origin is vbrew.com

;

@ IN SOA vlager.vbrew.com. janet.vbrew.com. (

2000012601 ; serial

86400 ; refresh: once per day

3600 ; retry: one hour

3600000 ; expire: 42 days

604800 ; minimum: 1 week

)

IN NS vlager.vbrew.com.

;

; local mail is distributed on vlager

IN MX 10 vlager

;

; loopback address

localhost. IN A 127.0.0.1

;

; Virtual Brewery Ethernet

vlager IN A 172.16.1.1

vlager-if1 IN CNAME vlager

; vlager is also news server

news IN CNAME vlager

vstout IN A 172.16.1.2

vale IN A 172.16.1.3

;

; Virtual Winery Ethernet

vlager-if2 IN A 172.16.2.1

vbardolino IN A 172.16.2.2

vchianti IN A 172.16.2.3

vbeaujolais IN A 172.16.2.4

;

; Virtual Spirits (subsidiary) Ethernet

vbourbon IN A 172.16.3.1

vbourbon-if1 IN CNAME vbourbon

Example 6-12. The named.local File

;

; /var/named/named.local Reverse mapping of 127.0.0

; Origin is 0.0.127.in-addr.arpa.

;

@ IN SOA vlager.vbrew.com. joe.vbrew.com. (

1 ; serial

360000 ; refresh: 100 hrs

3600 ; retry: one hour

3600000 ; expire: 42 days

360000 ; minimum: 100 hrs

)

IN NS vlager.vbrew.com.

1 IN PTR localhost.

Example 6-13. The named.rev File

;; /var/named/named.rev Reverse mapping of our IP addresses

; Origin is 16.172.in-addr.arpa.

;

@ IN SOA vlager.vbrew.com. joe.vbrew.com. (

16 ; serial

86400 ; refresh: once per day

3600 ; retry: one hour

3600000 ; expire: 42 days

604800 ; minimum: 1 week

)

IN NS vlager.vbrew.com.

; brewery

1.1 IN PTR vlager.vbrew.com.

2.1 IN PTR vstout.vbrew.com.

3.1 IN PTR vale.vbrew.com.

; winery

1.2 IN PTR vlager-if2.vbrew.com.

2.2 IN PTR vbardolino.vbrew.com.

3.2 IN PTR vchianti.vbrew.com.

4.2 IN PTR vbeaujolais.vbrew.com.

11.4.16 Verifying the Name Server Setup

nslookup is a great tool for checking the operation of your name server setup. It can be used both interactively with prompts and as a single command with immediate output. In the latter case, you simply invoke it as:

$ nslookup hostname

nslookup queries the name server specified in resolv.conf for hostname. (If this file names more than one server, nslookup chooses one at random.)
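
You may also name the server to query as a second argument, which is handy for testing a particular name server directly; for example, using vlager's address from earlier in this chapter:

$ nslookup metalab.unc.edu 172.16.1.1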

The interactive mode, however, is much more exciting. Besides looking up individual hosts, you may query for any type of DNS record and transfer the entire zone information for a domain.

When invoked without an argument, nslookup displays the name server it uses and enters interactive mode. At the > prompt, you may type any domain name you want to query. By default, it asks for type A records, those containing the IP address relating to the domain name.

You can look up other record types by issuing:

> set type=type

in which type is one of the resource record names described earlier, or ANY. You might have the following nslookup session:

$ nslookup

Default Server: tao.linux.org.au

Address: 203.41.101.121

> metalab.unc.edu

Server: tao.linux.org.au

Address: 203.41.101.121

Name: metalab.unc.edu

Address: 152.2.254.81

>

The output first displays the DNS server being queried, and then the result of the query.

If you try to query for a name that has no IP address associated with it, but other records were found in the DNS database, nslookup returns with an error message saying “No type A records found.” However, you can make it query for records other than type A by issuing the set type command. To get the SOA record of unc.edu, you would issue:

> unc.edu

Server: tao.linux.org.au

Address: 203.41.101.121

*** No address (A) records available for unc.edu

> set type=SOA

> unc.edu

Server: tao.linux.org.au

Address: 203.41.101.121

unc.edu

origin = ns.unc.edu

mail addr = host-reg.ns.unc.edu

serial = 1998111011

refresh = 14400 (4H)

retry = 3600 (1H)

expire = 1209600 (2W)

minimum ttl = 86400 (1D)

unc.edu name server = ns2.unc.edu

unc.edu name server = ncnoc.ncren.net

unc.edu name server = ns.unc.edu

ns2.unc.edu internet address = 152.2.253.100

ncnoc.ncren.net internet address = 192.101.21.1

ncnoc.ncren.net internet address = 128.109.193.1

ns.unc.edu internet address = 152.2.21.1

In a similar fashion, you can query for MX records:

> set type=MX

> unc.edu

Server: tao.linux.org.au

Address: 203.41.101.121

unc.edu preference = 0, mail exchanger = conga.oit.unc.edu

unc.edu preference = 10, mail exchanger = imsety.oit.unc.edu

unc.edu name server = ns.unc.edu

unc.edu name server = ns2.unc.edu

unc.edu name server = ncnoc.ncren.net

conga.oit.unc.edu internet address = 152.2.22.21

imsety.oit.unc.edu internet address = 152.2.21.99

Using a type of ANY returns all resource records associated with a given name.

A practical application of nslookup, besides debugging, is to obtain the current list of root name servers. You can obtain this list by querying for all NS records associated with the root domain:

> set type=NS

> .

Server: tao.linux.org.au

Address: 203.41.101.121

Non-authoritative answer:

(root) name server = A.ROOT-SERVERS.NET

(root) name server = H.ROOT-SERVERS.NET

(root) name server = B.ROOT-SERVERS.NET

Authoritative answers can be found from:

A.ROOT-SERVERS.NET internet address = 198.41.0.4

H.ROOT-SERVERS.NET internet address = 128.63.2.53

B.ROOT-SERVERS.NET internet address = 128.9.0.107

To see the complete set of available commands, use the help command in nslookup.

11.4.17 Other Useful Tools

There are a few tools that can help you with your tasks as a BIND administrator. We will briefly describe two of them here. Please refer to the documentation that comes with these tools for more information on how to use them.

hostcvt helps you with your initial BIND configuration by converting your /etc/hosts file into master files for named. It generates both the forward (A) and reverse mapping (PTR) entries, and takes care of aliases. Of course, it won't do the whole job for you, as you may still want to tune the timeout values in the SOA record, for example, or add MX records. Still, it may help you save a few aspirins. hostcvt is part of the BIND source, but can also be found as a standalone package on a few Linux FTP servers.

After setting up your name server, you may want to test your configuration. Two good tools make this job much simpler: the first, dnswalk, is a Perl-based package; the second is nslint. Both walk your DNS database looking for common mistakes and verify that the information they find is consistent. Two other useful tools are host and dig, which are general-purpose DNS database query tools. You can use these tools to manually inspect and diagnose DNS database entries.
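
For example, assuming the Virtual Brewery zone from earlier in this chapter, you might inspect individual entries with invocations like these:

$ host vlager.vbrew.com

$ dig vbrew.com mx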

These tools are likely to be available in prepackaged form. dnswalk and nslint are available in source from http://www.visi.com/~barr/dnswalk/ and ftp://ftp.ee.lbl.gov/nslint.tar.Z.

5 Questions

1. Explain the different duties of a system administrator.

2. How can you obtain the UID of a user given the username?

3. How can you obtain the username of a user given the UID?

4. What is the difference between real UID and effective UID of a process?

5. What is the difference between real GID and effective GID of a process?

6. Give an example where the effective UID and GID are useful.

7. What are the different types of files that are supported by Unix?

8. What are the effects of read, write and execute operation on a directory?

9. What is significance of sticky bit on a file?

10. What is significance of sticky bit on a directory?

11. What is significance of setuid and setgid permission of a file?

12. What is the usage of the umask command?

13. Explain the Linux boot process.

14. What is the function of the /sbin/init process?

15. What is a runlevel? Explain the different runlevels supported by Linux.

16. Explain how the initialization scripts are organized in the /etc/rc.d directory.

17. What is the significance of /etc/inittab file?

18. How can you change default login type from text to graphical?

19. How can you disable system shutdown when alt+ctrl+del is pressed?

20. What is a shadow password?

21. How can you remove a user?

22. How can you temporarily disable a user account?

23. What is password aging?

24. Explain the layout of a Linux file system.

25. What is an I-node?

26. What is the /proc file system? How is it useful?

27. What is the difference between the domainname and hostname commands?

28. What is the significance of /etc/hosts file?

29. What is the significance of /etc/networks file?

30. What is the significance of /etc/host.conf file?

31. How can you configure a network interface?

32. What is IP aliasing?

33. What is ARP cache? How can you check it?

34. What is MTU (Maximum Transfer Unit)?

35. What is the usage of netstat command?

36. What is a resolver?

37. What is the significance of /etc/nsswitch.conf file?

38. What is the significance of /etc/resolv.conf file?

39. What are the different types of name servers?

40. What is a DNS resource record? Describe the different fields in a resource record.

41. What is reverse lookup?

42. What is the difference between a zone and a domain? What are zone files?

43. What is FQDN?

44. What are different classes of IP addresses? Identify range of networks and hosts in each of the classes.

45. What is the significance of the two network addresses 0.0.0.0 and 127.0.0.0?

46. What is the significance of all 0’s and all 1’s in the host part of an IP address?

47. What are the different IP address ranges that are reserved for private networks?

48. What is Address Resolution Protocol (ARP)? How does it work? What is ARP cache? What is Reverse Address Resolution Protocol (RARP)?

49. What is a subnet-mask or net-mask?

50. What is domain name service? In Unix systems which program is responsible for providing this service and how does it do so?

6 Why Internet Firewalls?

A firewall is a form of protection that allows a network to connect to the Internet while maintaining a degree of security.

7 What Are You Trying to Protect?

A firewall is basically a protective device. If you are building a firewall, the first thing you need to worry about is what you’re trying to protect. When you connect to the Internet, you’re putting three things at risk:

o Your data: the information you keep on the computers

o Your resources: the computers themselves

o Your reputation

8 Your Data

Your data has three separate characteristics that need to be protected:

o Secrecy: you might not want other people to know it.

o Integrity: you probably don't want other people to change it.

o Availability: you almost certainly want to be able to use it yourself.

9 Your Resources

Even if you have data you don't care about - even if you enjoy reinstalling your operating system every week because it exercises the disks, or something like that - if other people are going to use your computers, you probably would like to benefit from this use in some way. Most people want to use their own computers, or they want to charge other people for using them. Even people who give away computer time and disk space usually expect to get good publicity and thanks for it; they aren't going to get it from intruders. You spend good time and money on your computing resources, and it is your right to determine how they are used.

10 Your Reputation

An intruder appears on the Internet with your identity. Anything he does appears to come from you. What are the consequences?

Most of the time, the consequences are simply that other sites - or law enforcement agencies - start calling you to ask why you're trying to break into their systems. Sometimes, such impostors cost you a lot more than lost time. An intruder who actively dislikes you, or simply takes pleasure in making life difficult for strangers, may send electronic mail or post news messages that purport to come from you. Generally, people who choose to do this aim for maximum hatefulness, rather than believability, but even if only a few people believe these messages, the cleanup can be long and humiliating. Anything even remotely believable can do permanent damage to your reputation.

11 What Is an Internet Firewall?

A firewall is a secure and trusted machine that sits between a private network and a public network. The firewall machine is configured with a set of rules that determine which network traffic will be allowed to pass and which will be blocked or refused. In some large organizations, you may even find a firewall located inside their corporate network to segregate sensitive areas of the organization from other employees. Many cases of computer crime occur from within an organization, not just from outside.

Firewalls can be constructed in quite a variety of ways. The most sophisticated arrangement involves a number of separate machines and is known as a perimeter network. Two machines act as "filters" called chokes to allow only certain types of network traffic to pass, and between these chokes reside network servers such as a mail gateway or a World Wide Web proxy server. This configuration can be very safe and allows a great range of control over who can connect, both from the inside to the outside and from the outside to the inside. This sort of configuration might be used by large organizations.

As we've mentioned, firewalls are a very effective type of network security. In building construction, a firewall is designed to keep a fire from spreading from one part of the building to another. In theory, an Internet firewall serves a similar purpose: it prevents the dangers of the Internet from spreading to your internal network. In practice, an Internet firewall is more like a moat of a medieval castle than a firewall in a modern building. It serves multiple purposes:

o It restricts people to entering at a carefully controlled point.

o It prevents attackers from getting close to your other defenses.

o It restricts people to leaving at a carefully controlled point.

An Internet firewall is most often installed at the point where your protected internal network connects to the Internet, as shown in the following figure.

[Figure: an Internet firewall placed at the point where the protected internal network connects to the Internet]

All traffic coming from the Internet or going out from your internal network passes through the firewall. Because it does, the firewall has the opportunity to make sure that this traffic is acceptable. What does “acceptable” mean to the firewall? It means that whatever is being done - email, file transfers, remote logins, or any kinds of specific interactions between specific systems - conforms to the security policy of the site. Security policies are different for every site; some are highly restrictive and others fairly open.

Logically, a firewall is a separator, a restricter, an analyzer. The physical implementation of the firewall varies from site to site. Most often, a firewall is a set of hardware components - a router, a host computer, or some combination of routers, computers, and networks with appropriate software. There are various ways to configure this equipment; the configuration will depend upon a site's particular security policy, budget, and overall operations.

A firewall is very rarely a single physical object, although some of the newest commercial products attempt to put everything into the same box. Usually, a firewall has multiple parts, and some of these parts may do other tasks besides function as part of the firewall. Your Internet connection is almost always part of your firewall. Even if you have a firewall in a box, it isn't going to be neatly separable from the rest of your site; it's not something you can just drop in.

We’ve compared a firewall to the moat of a medieval castle, and like a moat, a firewall is not invulnerable. It doesn't protect against people who are already inside; it works best if coupled with internal defenses; and, even if you stock it with alligators, people sometimes manage to swim across. A firewall is also not without its drawbacks; building one requires significant expense and effort, and the restrictions it places on insiders can be a major annoyance.

Given the limitations and drawbacks of firewalls, why would anybody bother to install one? Because a firewall is the most effective way to connect a network to the Internet and still protect that network. The Internet presents marvelous opportunities. Millions of people are out there exchanging information. The benefits are obvious: the chances for publicity, customer service, and information gathering. The popularity of the information superhighway is increasing everybody's desire to get out there. The risks should also be obvious: any time you get millions of people together, you get crime; it's true in a city, and it's true on the Internet. Any superhighway is fun only while you're in a car. If you have to live or work by the highway, it's loud, smelly, and dangerous.

How can you benefit from the good parts of the Internet without being overwhelmed by the bad? Just as you’d like to drive on a highway without suffering the nasty effects of putting a freeway off-ramp into your living room, you need to carefully control the contact that your network has to the Internet. A firewall is a tool for doing that, and in most situations, it’s the single most effective tool for doing that.

There are other uses of firewalls. For example, like the firewalls in a building, they can be used to divide parts of a site from each other when these parts have distinct security needs (and we'll discuss these uses in passing, as appropriate). The focus of this book, however, is on firewalls as they're used between a site and the Internet.

Firewalls offer significant benefits, but they can't solve every security problem. The following sections briefly summarize what firewalls can and cannot do to protect your systems and your data.

12 What Can a Firewall Do?

Firewalls can do a lot for your site's security. In fact, some advantages of using firewalls extend even beyond security, as described below.

1. A firewall is a focus for security decisions

Think of a firewall as a choke point. All traffic in and out must pass through this single, narrow checkpoint. A firewall gives you an enormous amount of leverage for network security because it lets you concentrate your security measures on this checkpoint: the point where your network connects to the Internet. Focusing your security in this way is far more efficient than spreading security decisions and technologies around, trying to cover all the bases in a piecemeal fashion.

2. A firewall can enforce security policy

Many of the services that people want from the Internet are inherently insecure. The firewall is the traffic cop for these services. It enforces the site’s security policy, allowing only “approved” services to pass through and those only within the rules set up for them. For example, one site's management may decide that certain services such as Sun's Network File System (NFS) and Network Information Services (NIS) are simply too risky to be used across the firewall. It doesn't matter what system tries to run them or what user wants them. The firewall will keep potentially dangerous services strictly inside the firewall. A firewall may be called upon to help enforce more complicated policies. For example, perhaps only certain systems within the firewall are allowed to transfer files to and from the Internet; by using other mechanisms to control which users have access to those systems, you can control which users have these capabilities. Depending on the technologies you choose to implement your firewall, a firewall may have a greater or lesser ability to enforce such policies.

3. A firewall can log Internet activity efficiently

Because all traffic passes through the firewall, the firewall provides a good place to collect information about system and network use - and misuse. As a single point of access, the firewall can record what occurs between the protected network and the external network.

4. A firewall limits your exposure

Although this point is most relevant to the use of internal firewalls, sometimes a firewall will be used to keep one section of your site’s network separate from another section. By doing this, you keep problems that impact one section from spreading through the entire network. In some cases, you’ll do this because one section of your network may be more trusted than another; in other cases, because one section is more sensitive than another. Whatever the reason, the existence of the firewall limits the damage that a network security problem can do to the overall network.

13 What Can't a Firewall Do?

Firewalls offer excellent protection against network threats, but they aren’t a complete security solution. Certain threats are outside the control of the firewall. You need to figure out other ways to protect against these threats by incorporating physical security, host security, and user education into your overall security plan. Some of the weaknesses of firewalls are discussed below.

1. A firewall can't protect you against malicious insiders

A firewall might keep a system user from being able to send proprietary information out of an organization over a network connection; so would simply not having a network connection. But that same user could copy the data onto disk, tape, or paper and carry it out of the building in his or her briefcase.

If the attacker is already inside the firewall, a firewall can do virtually nothing for you. Inside users can steal data, damage hardware and software, and subtly modify programs without ever coming near the firewall. Insider threats require internal security measures, such as host security and user education.

2. A firewall can’t protect you against connections that don’t go through it

A firewall can effectively control the traffic that passes through it; however, there is nothing a firewall can do about traffic that doesn’t pass through it. For example, what if the site allows dial-in access to internal systems behind the firewall? The firewall has absolutely no way of preventing an intruder from getting in through such a modem.

Sometimes, technically expert users or system administrators set up their own “back doors” into the network (such as a dial-up modem connection), either temporarily or permanently, because they chafe at the restrictions that the firewall places upon them and their systems. The firewall can do nothing about this. It’s really a people-management problem, not a technical problem.

3. A firewall can't protect against completely new threats

A firewall is designed to protect against known threats. A well-designed one may also protect against new threats. However, no firewall can automatically defend against every new threat that arises. Periodically people discover new ways to attack, using previously trustworthy services, or using attacks that simply hadn’t occurred to anyone before.

4. A firewall can't protect against viruses

Firewalls can’t keep viruses out of a network. Although many firewalls scan all incoming traffic to determine whether it is allowed to pass through to the internal network, the scanning is mostly for source and destination addresses and port numbers, not for the details of the data. Even with sophisticated packet filtering or proxying software, virus protection in a firewall is not very practical. There are simply too many types of viruses and too many ways a virus can hide within data.

14 Methods of Attack

As a network administrator, it is important that you understand the nature of potential attacks on computer security. We'll briefly describe the most important types of attacks so that you can better understand precisely what the Linux IP firewall will protect you against. You should do some additional reading to ensure that you are able to protect your network against other types of attacks. Here are some of the more important methods of attack and ways of protecting yourself against them:

Unauthorized access

This simply means that people who shouldn't use your computer services are able to connect and use them. For example, people outside your company might try to connect to your company accounting machine or to your NFS server.

There are various ways to avoid this attack by carefully specifying who can gain access through these services. You can prevent network access to all except the intended users.

Exploitation of known weaknesses in programs

Some programs and network services were not originally designed with strong security in mind and are inherently vulnerable to attack. The BSD remote services (rlogin, rexec, etc.) are an example.

The best way to protect yourself against this type of attack is to disable any vulnerable services or find alternatives. With Open Source, it is sometimes possible to repair the weaknesses in the software.

Denial of service

Denial of service attacks cause the service or program to cease functioning or prevent others from making use of the service or program. These may be performed at the network layer by sending carefully crafted and malicious datagrams that cause network connections to fail. They may also be performed at the application layer, where carefully crafted application commands are given to a program that cause it to become extremely busy or stop functioning.

Preventing suspicious network traffic from reaching your hosts and preventing suspicious program commands and requests are the best ways of minimizing the risk of a denial of service attack. It's useful to know the details of the attack method, so you should educate yourself about each new attack as it gets publicized.

Spoofing

This type of attack causes a host or application to mimic the actions of another. Typically the attacker pretends to be an innocent host by forging the IP addresses in network packets. For example, a well-documented exploit of the BSD rlogin service can use this method to mimic a TCP connection from another host by guessing TCP sequence numbers.

To protect against this type of attack, verify the authenticity of datagrams and commands. Prevent datagram routing with invalid source addresses. Introduce unpredictability into connection control mechanisms, such as TCP sequence numbers and the allocation of dynamic port addresses.

Eavesdropping

This is the simplest type of attack. A host is configured to "listen" to and capture data not belonging to it. Carefully written eavesdropping programs can take usernames and passwords from user login network connections. Broadcast networks like Ethernet are especially vulnerable to this type of attack.

To protect against this type of threat, avoid use of broadcast network technologies and enforce the use of data encryption.

IP firewalling is very useful in preventing or reducing unauthorized access, network-layer denial of service, and IP spoofing attacks. It is not very useful in avoiding exploitation of weaknesses in network services or programs, or in preventing eavesdropping.

15 Packet Filtering

Packet filtering systems route packets between internal and external hosts, but they do it selectively. They allow or block certain types of packets in a way that reflects a site’s own security policy as shown in the following figure. The type of router used in a packet filtering firewall is known as a screening router.

[Figure: a screening router selectively passing packets between internal hosts and the Internet according to the site's security policy]

Every packet has a set of headers containing certain information. The main information is:

o IP source address

o IP destination address

o Protocol (whether the packet is a TCP, UDP, or ICMP packet)

o TCP or UDP source port

o TCP or UDP destination port

o ICMP message type

In addition, the router knows things about the packet that aren’t reflected in the packet headers, such as:

o The interface the packet arrives on

o The interface the packet will go out on

The fact that servers for particular Internet services reside at certain port numbers lets the router block or allow certain types of connections simply by specifying the appropriate port number (e.g., TCP port 23 for Telnet connections) in the set of rules specified for packet filtering.
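
For instance, using the Linux iptables tool described later in this chapter, a sketch of a rule blocking forwarded Telnet connections arriving on an assumed external interface eth0 would be:

iptables -A FORWARD -i eth0 -p tcp --dport 23 -j DROP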

To understand how packet filtering works, let’s look at the difference between an ordinary router and a screening router. An ordinary router simply looks at the destination address of each packet and picks the best way it knows to send that packet towards that destination. The decision about how to handle the packet is based solely on its destination. There are two possibilities: the router knows how to send the packet towards its destination, and it does so; or the router does not know how to send the packet towards its destination, and it returns the packet, via an ICMP “destination unreachable” message, to its source.

A screening router, on the other hand, looks at packets more closely. In addition to determining whether or not it can route a packet towards its destination, a screening router also determines whether or not it should. “Should” or “should not” are determined by the site's security policy, which the screening router has been configured to enforce.

Although it is possible for only a screening router to sit between an internal network and the Internet, this places an enormous responsibility on the screening router. Not only does it need to perform all routing and routing decision-making, but it is the only protecting system; if its security fails (or crumbles under attack), the internal network is exposed. Furthermore, a straightforward screening router can’t modify services. A screening router can permit or deny a service, but it can’t protect individual operations within a service. If a desirable service has insecure operations, or if the service is normally provided with an insecure server, packet filtering alone can’t protect it.

16 Proxy Services

Proxy services are specialized application or server programs that run on a firewall host: either a dual-homed host with an interface on the internal network and one on the external network, or some other bastion host that has access to the Internet and is accessible from the internal machines. These programs take users’ requests for Internet services (such as FTP and Telnet) and forward them, as appropriate according to the site’s security policy, to the actual services. The proxies provide replacement connections and act as gateways to the services. For this reason, proxies are sometimes known as application-level gateways. (Firewall terminology differs: whereas we use the term proxy service to encompass the entire proxy approach, other authors refer to application-level gateways and circuit-level gateways.) Proxy services sit, more or less transparently, between a user on the inside (on the internal network) and a service on the outside (on the Internet). Instead of talking to each other directly, each talks to a proxy. Proxies handle all the communication between users and Internet services behind the scenes.

[Figure: proxy services on a dual-homed host mediating between internal clients and real servers on the Internet]

Transparency is the major benefit of proxy services. It's essentially smoke and mirrors. To the user, a proxy server presents the illusion that the user is dealing directly with the real server. To the real server, the proxy server presents the illusion that the real server is dealing directly with a user on the proxy host (as opposed to the user’s real host).

How do proxy services work? Let's look at the simplest case, where we add proxy services to a dual-homed host.

As the above figure shows, a proxy service requires two components: a proxy server and a proxy client. In this situation, the proxy server runs on the dual-homed host. A proxy client is a special version of a normal client program (i.e., a Telnet or FTP client) that talks to the proxy server rather than to the “real” server out on the Internet; in addition, if users are taught special procedures to follow, normal client programs can often be used as proxy clients. The proxy server evaluates requests from the proxy client, and decides which to approve and which to deny. If a request is approved, the proxy server contacts the real server on behalf of the client (thus the term “proxy”), and proceeds to relay requests from the proxy client to the real server, and responses from the real server to the proxy client.

In some proxy systems, instead of installing custom client proxy software, you'll use standard software, but set up custom user procedures for using it. A proxy service is a software solution, not a firewall architecture per se. You can use proxy services in conjunction with any of the firewall architectures.

The proxy server doesn’t always just forward users' requests on to the real Internet services. The proxy server can control what users do, because it can make decisions about the requests it processes. Depending on your site’s security policy, requests might be allowed or refused. For example, the FTP proxy might refuse to let users export files, or it might allow users to import files only from certain sites. More sophisticated proxy services might allow different capabilities to different hosts, rather than enforcing the same restrictions on all hosts.

Using a Combination of Techniques and Technologies

The “right solution” to building a firewall is seldom a single technique; it’s usually a carefully crafted combination of techniques to solve different problems. Which problems you need to solve depends on what services you want to provide your users and what level of risk you’re willing to accept. Which techniques you use to solve those problems depends on how much time, money, and expertise you have available. Some protocols (e.g., Telnet and SMTP) can be more effectively handled with packet filtering. Others (e.g., FTP, Archie, Gopher, and WWW) are more effectively handled with proxies.

17 Firewalls

Information security is commonly thought of as a process and not a product. However, standard security implementations usually employ some form of dedicated mechanism to control access privileges and restrict network resources to users who are authorized, identifiable, and traceable. Linux includes several powerful tools to assist administrators and security engineers with network-level access control issues.

Aside from VPN solutions such as CIPE or IPSec, firewalls are one of the core components of network security implementation. Several vendors market firewall solutions catering to all levels of the marketplace: from home users protecting one PC to data center solutions safeguarding vital enterprise information. Firewalls can be standalone hardware solutions, such as firewall appliances by Cisco, Nokia, and Sonicwall. There are also proprietary software firewall solutions developed for home and business markets by vendors such as Checkpoint, McAfee, and Symantec.

Apart from the differences between hardware and software firewalls, there are also differences in the way firewalls function that separate one solution from another. Table 1 details three common types of firewalls and how they function:

Method: NAT

Description: Network Address Translation (NAT) places internal network IP subnetworks behind one or a small pool of external IP addresses, masquerading all requests to one source rather than several.

Advantages:

o Can be configured transparently to machines on a LAN

o Protects many machines and services behind one or more external IP address(es), simplifying administration duties

o Restriction of user access to and from the LAN can be configured by opening and closing ports on the NAT firewall/gateway

Disadvantages:

o Cannot prevent malicious activity once users connect to a service outside of the firewall

Method: Packet Filter

Description: Packet filtering firewalls read each data packet that passes within and outside of a LAN. They can read and process packets by header information and filter packets based on sets of programmable rules implemented by the firewall administrator. The Linux kernel has built-in packet filtering functionality through the netfilter kernel subsystem.

Advantages:

o Customizable through the iptables front-end utility

o Does not require any customization on the client side, as all network activity is filtered at the router level rather than at the application level

o Since packets are not transmitted through a proxy, network performance is faster due to the direct connection from client to remote host

Disadvantages:

o Cannot filter packets for content like proxy firewalls

o Processes packets at the protocol layer, but cannot filter packets at the application layer

o Complex network architectures can make establishing packet filtering rules difficult, especially if coupled with IP masquerading or local subnets and DMZ networks

Method: Proxy

Description: Proxy firewalls filter all requests of a certain protocol or type from LAN clients to a proxy machine, which then makes those requests to the Internet on behalf of the local client. A proxy machine acts as a buffer between malicious remote users and the internal network client machines.

Advantages:

o Gives administrators control over what applications and protocols function outside of the LAN

o Some proxy servers can cache data so that clients can access frequently requested data from the local cache rather than having to use the Internet connection to request it, which is convenient for cutting down on unnecessary bandwidth consumption

o Proxy services can be logged and monitored closely, allowing tighter control over resource utilization on the network

Disadvantages:

o Proxies are often application-specific (HTTP, Telnet, etc.) or protocol-restricted (most proxies work with TCP-connected services only)

o Application services cannot run behind a proxy, so your application servers must use a separate form of network security

o Proxies can become a network bottleneck, as all requests and transmissions are passed through one source rather than via direct client-to-remote-service connections

Table 1. Firewall Types

18 Packet Filtering Using Linux

Traffic moves through a network in packets. A network packet is a collection of data in a specific size and format. In order to transmit a file over a network, the sending computer must first break the file into packets using the rules of the network protocol. Each of these packets holds a small part of the file data. Upon receiving the transmission, the target computer reassembles the packets into the file.

Every packet contains information, which helps it navigate the network and move toward its destination. The packet can tell computers along the way, as well as the destination machine, where it came from, where it is going, and what type of packet it is, among other things. Most packets are designed to carry data, although some protocols use packets in special ways. For example, the Transmission Control Protocol (TCP) uses a SYN packet, which contains no data, to initiate communication between two systems.

The Linux kernel has the built-in ability to filter packets, allowing some of them into the system while stopping others. The 2.4 kernel’s netfilter has three built-in tables or rules lists. They are as follows:

o filter — The default table for handling network packets.

o nat — Used to alter packets that create a new connection.

o mangle — Used for specific types of packet alteration.

Each of these tables in turn has a group of built-in chains that correspond to the actions performed on the packet by netfilter. The built-in chains for the filter table are as follows:

o INPUT — Applies to network packets that are targeted for the host.

o OUTPUT — Applies to locally-generated network packets.

o FORWARD — Applies to network packets routed through the host.

The built-in chains for the nat table are as follows:

o PREROUTING — Alters network packets when they arrive

o OUTPUT — Alters locally-generated network packets before they are sent out

o POSTROUTING — Alters network packets before they are sent out

The built-in chains for the mangle table are as follows:

o INPUT — Alters network packets targeted for the host

o OUTPUT — Alters locally-generated network packets before they are sent out

o FORWARD — Alters network packets routed through the host

o PREROUTING — Alters incoming network packets before they are routed

o POSTROUTING — Alters network packets before they are sent out

Every network packet received by or sent out of a Linux system is subject to at least one table.

A packet may be checked against multiple rules within each table before emerging at the end of the chain. The structure and purpose of these rules may vary, but they usually seek to identify a packet coming from or going to a particular IP address or set of addresses when using a particular protocol and network service.

Regardless of their destination, when packets match a particular rule in one of the tables, a target or action is applied to them. If the rule specifies an ACCEPT target for a matching packet, the packet skips the rest of the rule checks and is allowed to continue to its destination. If a rule specifies a DROP target, that packet is refused access to the system and nothing is sent back to the host that sent the packet. If a rule specifies a QUEUE target, the packet is passed to user space. If a rule specifies the optional REJECT target, the packet is dropped, but an error packet is sent to the packet’s originator.

Every chain has a default policy to ACCEPT, DROP, REJECT, or QUEUE. If none of the rules in the chain apply to the packet, then the packet is dealt with in accordance with the default policy.

The iptables command configures these tables, as well as sets up new tables if necessary.
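
As a minimal sketch, the following commands set the default policy of the filter table's FORWARD chain to DROP and then allow forwarded packets from an assumed internal network 192.168.1.0/24:

iptables -P FORWARD DROP

iptables -A FORWARD -s 192.168.1.0/24 -j ACCEPT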

19 Differences between iptables and ipchains

At first glance, ipchains and iptables appear to be quite similar. Both methods of packet filtering use chains of rules operating within the Linux kernel to decide not only which packets to let in or out, but also what to do with packets that match certain rules. However, iptables offers a much more extensible way of filtering packets, giving the administrator a greater amount of control without building a great deal of complexity into the system.

Specifically, users comfortable with ipchains should be aware of the following significant differences between ipchains and iptables before attempting to use iptables:

Under iptables, each filtered packet is processed using rules from only one chain rather than multiple chains. For instance, a FORWARD packet coming into a system using ipchains would have to go through the INPUT, FORWARD, and OUTPUT chains in order to move along to its destination. However, iptables only sends packets to the INPUT chain if they are destined for the local system and only sends them to the OUTPUT chain if the local system generated the packets. For this reason, place the rule designed to catch a particular packet in the chain that will actually see the packet.

The DENY target has been changed to DROP. In ipchains, packets that matched a rule in a chain could be directed to the DENY target. This target must be changed to DROP under iptables.

Order matters when placing options in a rule. Previously, with ipchains, the order of the rule options did not matter. The iptables command uses stricter syntax. For example, in iptables commands the protocol (ICMP, TCP, or UDP) must be specified before the source or destination ports.
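
For example, the following rule is valid because the protocol (-p tcp) appears before the port option it enables; reversing the order would produce an error:

iptables -A INPUT -p tcp --dport 80 -j ACCEPT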

When specifying network interfaces to be used with a rule, you must only use incoming interfaces (-i option) with INPUT or FORWARD chains and outgoing interfaces (-o option) with FORWARD or OUTPUT chains. This is necessary because OUTPUT chains are no longer used by incoming interfaces, and INPUT chains are not seen by packets moving through outgoing interfaces.

This is not a comprehensive list of the changes, given that iptables represents a fundamentally rewritten network filter.

20 Options Used in iptables Commands

Rules that allow packets to be filtered by the kernel are put in place by running the iptables command. When using the iptables command, specify the following options:

Packet Type — Dictates what type of packets the command filters.

Packet Source/Destination — Dictates which packets the command filters based on the source or destination of the packet.

Target — Dictates what action is taken on packets matching the above criteria.

The options used with a given iptables rule must be grouped logically, based on the purpose and conditions of the overall rule, in order for the rule to be valid.

21 Tables

A powerful aspect of iptables is that multiple tables can be used to decide the fate of a particular packet. Thanks to the extensible nature of iptables, specialized tables can be created and stored in the /lib/modules/<kernel-version>/kernel/net/ipv4/netfilter/ directory, where <kernel-version> corresponds to the kernel version number.

The default table, named filter, contains the standard built-in INPUT, OUTPUT, and FORWARD chains. This is similar to the standard chains in use with ipchains. However, by default, iptables also includes two additional tables that perform specific packet filtering jobs. The nat table can be used to modify the source and destination addresses recorded in packets, and the mangle table alters packets in specialized ways.

Each table contains default chains that perform necessary tasks based on the purpose of the table, although new chains can be added to any table.

Structure

Many iptables commands have the following structure:

iptables [-t <table-name>] <command> <chain-name> <parameter-1> <option-1> <parameter-n> <option-n>

In this example, the <table-name> option allows the user to select a table other than the default filter table to use with the command. The <command> option dictates a specific action to perform, such as appending or deleting the rule specified by the <chain-name> option. Following the <chain-name> are pairs of parameters and options that define what will happen when a packet matches the rule.

When looking at the structure of an iptables command, it is important to remember that, unlike most other commands, the length and complexity of an iptables command can change based on its purpose. A simple command to remove a rule from a chain can be very short, while a command designed to filter packets from a particular subnet using a variety of specific parameters and options can be rather lengthy. When creating iptables commands it is helpful to recognize that some parameters and options may create the need for other parameters and options to further specify the previous option's request. In order to construct a valid rule, this must continue until every parameter and option that requires another set of options is satisfied.
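
Putting this structure together, a hypothetical command that appends a rule to the INPUT chain of the default filter table, dropping TCP packets aimed at port 22 from an assumed network 10.0.0.0/8, might look like this:

iptables -t filter -A INPUT -s 10.0.0.0/8 -p tcp --dport 22 -j DROP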

Type iptables -h to see a comprehensive list of iptables command structures.

22 Commands

Commands tell iptables to perform a specific action. Only one command is allowed per iptables command string. With the exception of the help command, all commands are written in upper-case characters.

The iptables commands are as follows:

-A — Appends the iptables rule to the end of the specified chain. This is the command used to simply add a rule when rule order in the chain does not matter.

-C — Checks a particular rule before adding it to the user-specified chain. This command can help you construct complicated iptables rules by prompting you for additional parameters and options.

-D — Deletes a rule in a particular chain by number (such as 5 for the fifth rule in a chain). You can also type the entire rule, and iptables will delete the rule in the chain that matches it.

-E — Renames a user-defined chain. This does not affect the structure of the table.

-F — Flushes the selected chain, which effectively deletes every rule in the chain. If no chain is specified, this command flushes every rule from every chain.

-h — Provides a list of command structures, as well as a quick summary of command parameters and options.

-I — Inserts a rule in a chain at a point specified by a user-defined integer value. If no number is specified, iptables will place the command at the top of the chain.

-L — Lists all of the rules in the chain specified after the command. To list all rules in all chains in the default filter table, do not specify a chain or table. Otherwise, the following syntax should be used to list the rules in a specific chain in a particular table:

iptables -L <chain-name> -t <table-name>

Powerful options for the -L command, described under Listing Options below, provide rule numbers and allow more verbose rule descriptions.

-N — Creates a new chain with a user-specified name.

-P — Sets the default policy for a particular chain, so that when packets traverse an entire chain without matching a rule, they will be sent on to a particular target, such as ACCEPT or DROP.

-R — Replaces a rule in a particular chain. The rule's number must be specified after the chain's name. The first rule in a chain corresponds to rule number one.

-X — Deletes a user-specified chain. Deleting a built-in chain for any table is not allowed.

-Z — Zeros the byte and packet counters in all chains for a particular table.
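
As an illustration of several of these commands working together, the following hypothetical sequence creates a user-defined chain, appends a rule to it, lists it with rule numbers, then flushes and deletes it (mychain is an arbitrary example name):

iptables -N mychain

iptables -A mychain -p tcp --dport 25 -j ACCEPT

iptables -L mychain --line-numbers

iptables -F mychain

iptables -X mychain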

23 Parameters

Once certain iptables commands are specified, including those used to add, append, delete, insert, or replace rules within a particular chain, parameters are required to construct a packet filtering rule.

-c — Resets the counters for a particular rule. This parameter accepts the PKTS and BYTES options to specify what counter to reset.

-d — Sets the destination hostname, IP address, or network of a packet that will match the rule. When matching a network, the following IP address/netmask formats are supported:

N.N.N.N/M.M.M.M — Where N.N.N.N is the IP address range and M.M.M.M is the netmask.

N.N.N.N/M — Where N.N.N.N is the IP address range and M is the netmask.

-f — Applies this rule only to fragmented packets.

By using the ! option after this parameter, only unfragmented packets will be matched.

-i — Sets the incoming network interface, such as eth0 or ppp0. With iptables, this optional parameter may only be used with the INPUT and FORWARD chains when used with the filter table and the PREROUTING chain with the nat and mangle tables.

This parameter also supports the following special options:

! — Tells this parameter not to match, meaning that any specified interfaces are specifically excluded from this rule.

+ — A wildcard character used to match all interfaces which match a particular string. For example, the parameter -i eth+ would apply this rule to any Ethernet interfaces but exclude any other interfaces, such as ppp0.

If the -i parameter is used but no interface is specified, then every interface is affected by the rule.

-j — Tells iptables to jump to a particular target when a packet matches a particular rule. Valid targets to be used after the -j option include the standard options, ACCEPT, DROP, QUEUE, and RETURN, as well as extended options that are available through modules loaded by default with the Red Hat Linux iptables RPM package, such as LOG, MARK, and REJECT, among others. See the iptables man page for more information on these and other targets.

You may also direct a packet matching this rule to a user-defined chain outside of the current chain so that other rules can be applied to the packet.

If no target is specified, the packet moves past the rule with no action taken. However, the counter for this rule is still increased by one, as the packet matched the specified rule.

-o — Sets the outgoing network interface for a rule and may only be used with OUTPUT and FORWARD chains in the filter table, and the POSTROUTING chain in the nat and mangle tables. This parameter's options are the same as those of the incoming network interface parameter (-i).

-p — Sets the IP protocol for the rule, which can be either icmp, tcp, udp, or all, to match every supported protocol. In addition, any protocols listed in /etc/protocols may also be used. If this option is omitted when creating a rule, the all option is the default.

-s — Sets the source for a particular packet using the same syntax as the destination (-d) parameter.
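
Combining several of these parameters, a sketch of a rule accepting TCP packets that arrive on eth1 from an assumed internal network 192.168.1.0/24 and are destined for an assumed server network 10.0.0.0/255.0.0.0 might be:

iptables -A FORWARD -i eth1 -s 192.168.1.0/24 -d 10.0.0.0/255.0.0.0 -p tcp -j ACCEPT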

24 Match Options

Different network protocols provide specialized matching options which may be set in specific ways to match a particular packet using that protocol. Of course, the protocol must first be specified in the iptables command, by using -p <protocol-name> (where <protocol-name> is the target protocol), to make the options for that protocol available.

TCP Protocol

These match options are available for the TCP protocol (-p tcp):

--dport — Sets the destination port for the packet. Use either a network service name (such as www or smtp), port number, or range of port numbers to configure this option. To browse the names and aliases of network services and the port numbers they use, view the /etc/services file. The --destination-port match option is synonymous with --dport.

To specify a specific range of port numbers, separate the two numbers with a colon (:), such as -p tcp --dport 3000:3200. The largest acceptable valid range is 0:65535.

Use an exclamation point character (!) after the --dport option to tell iptables to match all packets which do not use that network service or port.

--sport — Sets the source port of the packet using the same options as --dport. The --source-port match option is synonymous with --sport.

--syn — Applies to all TCP packets designed to initiate communication, commonly called SYN packets. Any packets that carry a data payload are not touched. Placing an exclamation point character (!) as a flag after the --syn option causes all non-SYN packets to be matched.

--tcp-flags — Allows TCP packets with specific bits, or flags, set to be matched with a rule. The --tcp-flags match option accepts two parameters. The first parameter is the mask, which sets the flags to be examined in the packet. The second parameter refers to the flag that must be set in order to match.

The possible flags are: ACK, FIN, PSH, RST, SYN, URG, ALL, NONE.

For example, an iptables rule which contains -p tcp --tcp-flags ACK,FIN,SYN SYN will only match TCP packets that have the SYN flag set and the ACK and FIN flags unset.

Using the exclamation point character (!) after --tcp-flags reverses the effect of the match option.

--tcp-option — Attempts to match with TCP-specific options that can be set within a particular packet. This match option can also be reversed with the exclamation point character (!).

UDP Protocol

These match options are available for the UDP protocol (-p udp):

--dport — Specifies the destination port of the UDP packet, using the service name, port number, or range of port numbers. The --destination-port match option is synonymous with --dport.

--sport — Specifies the source port of the UDP packet, using the service name, port number, or range of port numbers. The --source-port match option is synonymous with --sport.

ICMP Protocol

These match options are available for the Internet Control Message Protocol (ICMP) (-p icmp):

--icmp-type — Sets the name or number of the ICMP type to match with the rule. A list of valid ICMP names can be seen by typing the iptables -p icmp -h command.

Modules with Additional Match Options

Additional match options are also available through modules loaded by the iptables command. To use a match option module, load the module by name using the -m option, such as -m <module-name> (replacing <module-name> with the name of the module).

A large number of modules are available by default. It is even possible to create your own modules to provide additional match option functionality.

Many modules exist, but only the most popular modules are discussed here.

limit module — Allows a limit to be placed on how many packets are matched to a particular rule. This is especially beneficial when logging rule matches, so that a flood of matching packets will not fill up the system logs with repetitive messages or use up system resources.

The limit module enables the following options:

--limit — Sets the number of matches for a particular range of time, specified with a number and time modifier arranged in a <number>/<time> format. For example, using --limit 5/hour only lets a rule match five times in a single hour.

If a number and time modifier are not used, the default value of 3/hour is assumed.

--limit-burst — Sets a limit on the number of packets able to match a rule at one time. This option should be used in conjunction with the --limit option, and it accepts a number to set the burst threshold.

If no number is specified, only five packets are initially able to match the rule.
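
For example, a sketch that logs incoming TCP SYN packets but limits the matches to five log entries per hour:

iptables -A INPUT -p tcp --syn -m limit --limit 5/hour -j LOG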

state module — Enables state matching. The state module enables the following options:

--state — Matches a packet with the following connection states:

ESTABLISHED — The matching packet is associated with other packets in an established connection.

INVALID — The matching packet cannot be tied to a known connection.

NEW — The matching packet is either creating a new connection or is part of a two-way connection not previously seen.

RELATED — The matching packet is starting a new connection related in some way to an existing connection.

These connection states can be used in combination with one another by separating them with commas, such as -m state --state INVALID,NEW.
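
A common use of the state module, shown here as a sketch, is to accept packets belonging to connections that are already established while dropping packets that cannot be identified:

iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

iptables -A INPUT -m state --state INVALID -j DROP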

mac module — Enables hardware MAC address matching.

The mac module enables the following option:

--mac-source — Matches a MAC address of the network interface card that sent the packet. To exclude a MAC address from a rule, place an exclamation point (!) after the --mac-source match option.

To view other match options available through modules, refer to the iptables man page.

Target Options

Once a packet has matched a particular rule, the rule can direct the packet to a number of different targets that decide its fate and, possibly, take additional actions. Each chain has a default target, which is used if none of the rules on that chain match a packet or if none of the rules which match the packet specify a target.

The following are the standard targets:

<user-defined-chain> — Replace <user-defined-chain> with the name of a user-defined chain within the table. This target passes the packet to the target chain.

ACCEPT — Allows the packet to successfully move on to its destination or another chain.

DROP — Drops the packet without responding to the requester. The system that sent the packet is not notified of the failure.

QUEUE — The packet is queued for handling by a user-space application.

RETURN — Stops checking the packet against rules in the current chain. If the packet with a RETURN target matches a rule in a chain called from another chain, the packet is returned to the first chain to resume rule checking where it left off. If the RETURN rule is used on a built-in chain and the packet cannot move up to its previous chain, the default target for the current chain decides what action to take.

In addition to these standard targets, various other targets may be used with extensions called target modules, which work in the same way as the match option modules described above.

There are many extended target modules, most of which only apply to specific tables or situations. A couple of the most popular target modules included by default in Red Hat Linux are:

LOG — Logs all packets that match this rule. Since the packets are logged by the kernel, the /etc/syslog.conf file determines where these log entries are written. By default, they are placed in the /var/log/messages file.

Various options can be used after the LOG target to specify the way in which logging occurs:

--log-level — Sets the priority level of a logging event. A list of priority levels can be found in the syslog.conf man page.

--log-ip-options — Any options set in the header of an IP packet are logged.

--log-prefix — Places a string of up to 29 characters before the log line when it is written. This is useful for writing syslog filters for use in conjunction with packet logging.

--log-tcp-options — Any options set in the header of a TCP packet are logged.

--log-tcp-sequence — Writes the TCP sequence number for the packet in the log.

REJECT — Sends an error packet back to the remote system and drops the packet.

The REJECT target accepts --reject-with <type> (where <type> is the rejection type) which allows more detailed information to be sent back with the error packet. The message port-unreachable is the default <type> error given if no other option is used. For a full list of <type> options that can be used, see the iptables man page.
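
For instance, a sketch that first logs and then rejects incoming Telnet attempts (the log prefix is an arbitrary string):

iptables -A INPUT -p tcp --dport 23 -j LOG --log-prefix "telnet attempt: "

iptables -A INPUT -p tcp --dport 23 -j REJECT --reject-with tcp-reset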

Other target extensions, including several that are useful for IP masquerading using the nat table or with packet alteration using the mangle table, can be found in the iptables man page.

Listing Options

The default list command, iptables -L, provides a very basic overview of the default filter table's current chains. Additional options provide more information:

-v — Display verbose output, such as the number of packets and bytes each chain has seen, the number of packets and bytes each rule has matched, and which interfaces apply to a particular rule.

-x — Expands numbers into their exact values. On a busy system, the number of packets and bytes seen by a particular chain or rule may be abbreviated using K (thousands), M (millions), and G (billions) at the end of the number. This option forces the full number to be displayed.

-n — Displays IP addresses and port numbers in numeric format, rather than the default hostname and network service format.

--line-numbers — Lists rules in each chain next to their numeric order in the chain. This option is useful when attempting to delete a specific rule in a chain, or to locate where to insert a rule within a chain.

-t — Specifies a table name.
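
Combining these options, a typical verbose listing of the INPUT chain in the filter table might be requested as:

iptables -L INPUT -t filter -v -n --line-numbers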

Chains and rules

If you’ve used ipchains in the past, you will be familiar with the concept of chains and rules already.

A rule is simply an entry with a specific set of parameters that must be met before the operation the rule performs is carried out.

You might check the source IP, destination port, protocol and so on, to block or enable traffic, for example. These rules are entered into a chain, which applies to a specific type of traffic; so there is a chain for traffic originating from our gateway machine, another for traffic addressed to the machine, and another for traffic being forwarded by it between two other machines. But this is iptables, so there is bound to be a table in there somewhere. In ipchains, filtering, masquerading and mangling are all done within the same chains, which, while simple, makes organising your configuration a nightmare if you want to do anything really complex. iptables separates these three types of packet handling into different tables, each containing an independent set of chains.

The filter table contains the INPUT, OUTPUT and FORWARD chains, as in ipchains, although all default chains are named in uppercase now. The nat – which is used for masquerading and similar functions – and mangle tables have slightly different chains, as they need to handle packets in different ways. Both have a PREROUTING chain, which applies to packets heading into the machine, whether they are going to be routed or accepted by the local machine and an OUTPUT chain for packets which originated locally. The nat table also has a POSTROUTING chain, for packets which have already had their routing decision made, but are yet to actually travel across an interface. Using iptables is very straightforward, and the syntax is much like ipchains, so if you’re used to the latter already, it won’t take too long to become familiar with iptables.

Firstly, we want to enable all local traffic and drop everything else:

iptables -A INPUT -i lo -j ACCEPT

iptables -A INPUT -i eth0 -d 10.0.0.0/8 -j ACCEPT

iptables -P INPUT DROP

The first noticeable difference is the use of DROP rather than DENY. ipchains used DENY and REJECT but, because the two names sound so similar, DENY was renamed DROP in iptables. DROP literally just drops the packet and doesn't return anything to the sender.

All traffic through a network is sent in the form of packets. The start of each packet says where it's going, where it came from, the type of the packet, and other administrative details. This start of the packet is called the header. The rest of the packet, containing the actual data being transmitted, is usually called the body. Some protocols, such as TCP, which is used for web traffic, mail, and remote logins, use the concept of a connection: before any packets with actual data are sent, various setup packets (with special headers) are exchanged saying “I want to connect”, “OK” and “Thanks”. Then normal packets are exchanged.

25 What is a packet filter?

A packet filter is a piece of software which looks at the header of packets as they pass through, and decides the fate of the entire packet. It might decide to deny the packet (i.e. discard the packet as if it had never been received), accept the packet (i.e. let the packet go through), or reject the packet (like deny, but tell the source of the packet that it has done so). Under Linux, packet filtering is built into the kernel (as a kernel module at the moment), and there are a few trickier things we can do with packets, but the general principle of looking at the headers and deciding the fate of the packet is still there.

26 What is the need of a packet filter?

Control: When you are using a Linux box to connect your internal network to another network (say, the Internet) you have an opportunity to allow certain types of traffic, and disallow others. For example, the header of a packet contains the destination address of the packet, so you can prevent packets going to a certain part of the outside network.

Security: When your Linux box is the only thing between the chaos of the Internet and your nice, orderly network, it's nice to know you can restrict what comes tromping in your door. For example, you might allow anything to go out from your network, but you might be worried about the well-known `Ping of Death' coming in from malicious outsiders. As another example, you might not want outsiders telnetting to your Linux box, even though all your accounts have passwords; maybe you want (like most people) to be an observer on the Internet, and not a server (willing or otherwise). Simply don't let anyone connect in, by having the packet filter reject incoming packets used to set up connections.

Watchfulness: Sometimes a badly configured machine on the local network will decide to spew packets to the outside world. It's nice to tell the packet filter to let you know if anything abnormal occurs; maybe you can do something about it, or maybe you're just curious by nature.

Linux kernels have had packet filtering since the 1.1 series. The first generation, based on ipfw from BSD, was ported by Alan Cox in late 1994. This was enhanced by Jos Vos and others for Linux 2.0. The user-space tool "ipfwadm" controlled the kernel filtering rules. In mid-1998, for Linux 2.2, Rusty Russell reworked the kernel quite heavily, with the help of Michael Neuling, and introduced the userspace tool "ipchains". Finally, the fourth-generation tool, "iptables", and another kernel rewrite occurred in mid-1999 for Linux 2.4.

To configure a Linux box as a firewall you need a kernel that has the netfilter infrastructure in it. netfilter is a general framework inside the Linux kernel into which other components (such as the iptables module) can plug. The iptables tool talks to the kernel and tells it what packets to filter: it inserts and deletes rules from the kernel's in-memory packet filtering table. This means that whatever you set up will be lost upon reboot unless you arrange for the rules to be saved and restored, as described later.

27 Netfilter and IP Tables (2.4 Kernels)

While developing IP Firewall Chains, Paul Russell decided that IP firewalling should be less difficult; he soon set about the task of simplifying aspects of datagram processing in the kernel firewalling code and produced a filtering framework that was both much cleaner and much more flexible. He called this new framework netfilter.

Note: At the time of preparation of this book the netfilter design had not yet stabilized. We hope you'll forgive any errors in the description of netfilter or its associated configuration tools that result from changes that occurred after preparation of this material. We considered the netfilter work important enough to justify the inclusion of this material, despite parts of it being speculative in nature. If you're in any doubt, the relevant HOWTO documents will contain the most accurate and up-to-date information on the detailed issues associated with the netfilter configuration.

So what was wrong with IP chains? They vastly improved the efficiency and management of firewall rules. But the way they processed datagrams was still complex, especially in conjunction with firewall-related features like IP masquerade and other forms of address translation. Part of this complexity existed because IP masquerade and Network Address Translation were developed independently of the IP firewalling code and integrated later, rather than having been designed as a true part of the firewall code from the start. If a developer wanted to add yet more features in the datagram processing sequence, he would have had difficulty finding a place to insert the code and would have been forced to make changes in the kernel in order to do so.

Still, there were other problems. In particular, the "input" chain described input to the IP networking layer as a whole. The input chain affected both datagrams destined for this host and datagrams to be routed by this host. This was somewhat counterintuitive because it confused the function of the input chain with that of the forward chain, which applied only to datagrams to be forwarded, but which always followed the input chain. If you wanted to treat datagrams for this host differently from datagrams to be forwarded, it was necessary to build complex rules that excluded one or the other. The same problem applied to the output chain.

Inevitably some of this complexity spilled over into the system administrator's job because it was reflected in the way that rulesets had to be designed. Moreover, any extensions to filtering required direct modifications to the kernel, because all filtering policies were implemented there and there was no way of providing a transparent interface into it. netfilter addresses both the complexity and the rigidity of older solutions by implementing a generic framework in the kernel that streamlines the way datagrams are processed and provides a capability to extend filtering policy without having to modify the kernel.

Let's take a look at two of the key changes made. Figure 4 illustrates how datagrams are processed in the netfilter implementation. The key differences are the removal of the masquerading function from the core code and a change in the locations of the input and output chains. To accompany these changes, a new and extensible configuration tool called iptables was created.

In IP chains, the input chain applies to all datagrams received by the host, irrespective of whether they are destined for the local host or routed to some other host. In netfilter, the input chain applies only to datagrams destined for the local host, and the forward chain applies only to datagrams destined for another host. Similarly, in IP chains, the output chain applies to all datagrams leaving the local host, irrespective of whether the datagram is generated on the local host or routed from some other host. In netfilter, the output chain applies only to datagrams generated on this host and does not apply to datagrams being routed from another host. This change alone offers a huge simplification of many firewall configurations.

In Figure 4, the components labeled “demasq” and “masq” are separate kernel components responsible for the incoming and outgoing processing of masqueraded datagrams. These have been reimplemented as netfilter modules.

Consider the case of a configuration for which the default policy for each of the input, forward, and output chains is deny. In IP chains, six rules would be needed to allow any session through a firewall host: two each in the input, forward, and output chains (one would cover each forward path and one would cover each return path). You can imagine how this could easily become extremely complex and difficult to manage when you want to mix sessions that could be routed and sessions that could connect to the local host without being routed. IP chains allow you to create chains that would simplify this task a little, but the design isn't obvious and requires a certain level of expertise.

In the netfilter implementation with iptables, this complexity disappears completely. For a service to be routed across the firewall host, but not terminate on the local host, only two rules are required: one each for the forward and the reverse directions in the forward chain. This is the obvious way to design firewalling rules, and will serve to simplify the design of firewall configurations immensely.
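As a sketch, assuming we want to pass SMTP through the firewall (and that the FORWARD chain has a default-deny policy), the two rules would look like:

iptables -A FORWARD -p tcp --dport smtp -j ACCEPT
iptables -A FORWARD -p tcp --sport smtp -j ACCEPT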

Figure 4. Datagram processing chain in netfilter


28 How Packets Traverse The Filters

The kernel starts with three lists of rules; these lists are called firewall chains or just chains. The three chains are called INPUT, OUTPUT and FORWARD.

[Diagram: the INPUT, FORWARD and OUTPUT chains]

The three circles represent the three chains mentioned above. When a packet reaches a circle in the diagram, that chain is examined to decide the fate of the packet. If the chain says to DROP the packet, it is killed there, but if the chain says to ACCEPT the packet, it continues traversing the diagram.

A chain is a checklist of rules. Each rule says “if the packet header looks like this, then here’s what to do with the packet”. If the rule doesn't match the packet, then the next rule in the chain is consulted. Finally, if there are no more rules to consult, then the kernel looks at the chain policy to decide what to do. In a security-conscious system, this policy usually tells the kernel to DROP the packet.

1. When a packet comes in (say, through the Ethernet card) the kernel first looks at the destination of the packet: this is called “routing”.

2. If it's destined for this box, the packet passes downwards in the diagram, to the INPUT chain. If it passes this, any processes waiting for that packet will receive it.

3. Otherwise, if the kernel does not have forwarding enabled, or it doesn't know how to forward the packet, the packet is dropped. If forwarding is enabled, and the packet is destined for another network interface (if you have another one), then the packet goes rightwards on our diagram to the FORWARD chain. If it is ACCEPTed, it will be sent out.

4. Finally, a program running on the box can send network packets. These packets pass through the OUTPUT chain immediately: if it says ACCEPT, then the packet continues out to whatever interface it is destined for.

29 Specifying Fragments

Sometimes a packet is too large to fit down a wire all at once. When this happens, the packet is divided into fragments, and sent as multiple packets. The other end reassembles these fragments to reconstruct the whole packet.

The problem with fragments is that only the first fragment carries the full set of protocol headers: looking inside the packet for protocol headers (such as is done by the TCP, UDP and ICMP extensions) is not possible for later fragments, as those headers are only contained in the first fragment.

If you are doing connection tracking or NAT, then all fragments will get merged back together before they reach the packet filtering code, so you need never worry about fragments. Otherwise, you can insert the small `ip_defrag.o' module, which reassembles fragments in the same way (note: this is only allowed if you are the only connection between the two networks).

Otherwise, it is important to understand how fragments get treated by the filtering rules. Any filtering rule that asks for information we don't have will not match. This means that the first fragment is treated like any other packet, but second and further fragments won't be. Thus a rule -p tcp --sport www (specifying a source port of "www") will never match a fragment other than the first fragment. Neither will the opposite rule -p tcp --sport ! www.

However, you can specify a rule specifically for second and further fragments, using the "-f" (or "--fragment") flag. It is also legal to specify that a rule does not apply to second and further fragments, by preceding the "-f" with "!". Usually it is regarded as safe to let second and further fragments through, since filtering will affect the first fragment and thus prevent reassembly on the target host; however, bugs have been known to allow crashing of machines simply by sending fragments. Note for network-heads: malformed packets (TCP, UDP and ICMP packets too short for the firewalling code to read the ports or ICMP code and type) are dropped when such examinations are attempted. So are TCP fragments starting at position 8.

As an example, the following rule will drop any fragments going to 192.168.1.1:

# iptables -A OUTPUT -f -d 192.168.1.1 -j DROP

Extensions to iptables: New tests

iptables is extensible, meaning that both the kernel and the iptables tool can be extended to provide new features. Some of these extensions are standard, and others are more exotic. Extensions can be made by other people and distributed separately for niche users.

Kernel extensions normally live in the kernel module subdirectory, such as /lib/modules/2.3.15/net. They are currently (Linux 2.3.15) not demand loaded, so you will need to manually insert the ones you want. In the future they may be loaded on demand again. Extensions to the iptables program are shared libraries which usually live in /usr/local/lib/iptables/, although a distribution would put them in /lib/iptables or /usr/lib/iptables.

Extensions come in two types: new targets, and new tests; we'll talk about new targets below. Some protocols automatically offer new tests: currently these are TCP, UDP and ICMP, as shown below. For these you can specify the new tests on the command line after the "-p" option, which will load the extension. For explicit new tests, use the "-m" option to load the extension, after which the extended options become available. To get help on an extension, use the option that loads it ("-p" or "-m") followed by "-h" or "--help".
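For example, the following commands print the extra options offered by the implicit TCP tests and by the limit match extension (limit is one of the standard extensions, used here purely for illustration):

iptables -p tcp -h
iptables -m limit -h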

30 Examples of IPtables

The Linux kernel features a powerful networking subsystem called netfilter. The netfilter subsystem provides stateful or stateless packet filtering as well as NAT and IP masquerading services. Netfilter also has the ability to mangle IP header information for advanced routing and connection state management. Netfilter is controlled through the IPTables utility.

IPTables Overview

The power and flexibility of netfilter is implemented through the IPTables interface. This command-line tool is similar in syntax to its predecessor, IPChains; however, IPTables uses the netfilter subsystem to enhance network connection, inspection, and processing; whereas IPChains used intricate rule sets for filtering source and destination paths, as well as connection ports for both. IPTables features advanced logging, pre- and post-routing actions, network address translation, and port forwarding all in one command-line interface.

This section provides an overview of IPTables.

Using IPTables

The first step in using IPTables is to start the IPTables service. This can be done with the command:



service iptables start




To make IPTables start by default whenever the system is booted, you must change runlevel status on the service using chkconfig.






chkconfig --level 345 iptables on




The syntax of IPTables is separated into tiers. The main tier is the chain. A chain specifies the state at which a packet will be manipulated. The usage is as follows:






iptables -A chain -j target




The -A appends a rule at the end of an existing ruleset. The chain is the name of the chain for a rule. The three built-in chains of IPTables (that is, the chains that affect every packet which traverses a network) are INPUT, OUTPUT, and FORWARD. These chains are permanent and cannot be deleted.



When creating an IPTables ruleset, it is critical to remember that order is important. For example, if a rule specifies that any packets from the local 192.168.100.0/24 subnet be dropped, and a rule is then appended (-A) which allows packets from 192.168.100.13 (which is within the dropped, restricted subnet), the appended rule is ignored. You must set a rule to allow 192.168.100.13 first, and then set a drop rule on the subnet, as sketched below.
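A minimal sketch of the correct ordering (the INPUT chain is used here for illustration):

iptables -A INPUT -s 192.168.100.13 -j ACCEPT
iptables -A INPUT -s 192.168.100.0/24 -j DROP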



The only exception to rule ordering in IPTables is the default policy (-P). A policy applies only to packets that match no rule in the chain, so IPTables honors rules regardless of whether they were added before or after the policy was set.



Basic Firewall Policies



Some basic policies established from the beginning can aid as a foundation for building more detailed, user-defined rules. IPTables uses policies (-P) to create default rules. Security-minded administrators usually elect to drop all packets as a policy and only allow specific packets on a case-by-case basis. The following rules will block all incoming and outgoing packets on a network gateway:






iptables -P INPUT DROP


iptables -P OUTPUT DROP




Additionally, it is recommended that any forwarded packets — network traffic that is to be routed from the firewall to its destination node — be denied as well, to restrict internal clients from inadvertent exposure to the Internet. To do this, use the following rule:






iptables -P FORWARD DROP




After setting the policy chains, you can now create new rules for your particular network and security requirements. The following sections will outline some common rules you may implement in the course of building your IPTables firewall.



Saving and Restoring IPtables Rules



Firewall rules are only valid for the time the computer is on. If the system is rebooted, the rules are automatically flushed and reset. To save the rules so that they will load later, use the following command:






/sbin/service iptables save




The rules will be stored in the file /etc/sysconfig/iptables and will be applied whenever the service is started, restarted, or the machine rebooted.



INPUT Filtering



Keeping remote attackers out of a LAN is an important aspect of network security, if not the most important. The integrity of a LAN should be protected from malicious remote users through the use of stringent firewall rules. In the following example, the LAN (which uses a private class C 192.168.1.0/24 IP range) rejects telnet access to the firewall from the outside:






iptables -A INPUT -p tcp --dport telnet -j REJECT


iptables -A INPUT -p udp --dport telnet -j REJECT




The rule rejects all outside TCP and UDP connections to the telnet service (typically port 23) with a connection refused error message. Rules using the --sport or --dport options can use either port numbers or common service names, so --dport telnet and --dport 23 are both acceptable. However, if the port number is changed in /etc/services, then using the service name instead of explicitly stating the port number will not work.
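For example, assuming the standard /etc/services mapping of telnet to port 23, the following two rules are equivalent:

iptables -A INPUT -p tcp --dport telnet -j REJECT
iptables -A INPUT -p tcp --dport 23 -j REJECT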



There is a distinction between the REJECT and DROP target actions. The REJECT target denies access and returns a connection refused error to users who attempt to connect to the service. The DROP target, as the name implies, drops the packet without any warning to the sender. Administrators can use their own discretion when using these targets; however, to avoid user confusion and repeated connection attempts, the REJECT target is recommended.



There may be times when certain users require remote access to the LAN from outside the LAN. Secure services, such as SSH and CIPE, can be used for encrypted remote connection to LAN services. For administrators with PPP-based resources (such as modem banks or bulk ISP accounts), dialup access can be used to circumvent firewall barriers securely, as modem connections are typically behind a firewall/gateway because they are direct connections. However, for remote users with broadband connections, special cases can be made. You can configure IPTables to accept connections from remote SSH and CIPE clients. For example, to allow remote SSH access to the LAN, the following may be used:






iptables -A INPUT -p tcp --dport 22 -j ACCEPT


iptables -A INPUT -p udp --dport 22 -j ACCEPT




CIPE connection requests from the outside can be accepted with the following command (replacing x with your device number):






iptables -A INPUT -p udp -i cipcbx  -j ACCEPT




Since CIPE uses its own virtual device which transmits datagram (UDP) packets, the rule allows the cipcb interface for incoming connections, instead of matching source or destination ports (though port matches can be used in place of device options). For more information about using CIPE, consult its documentation.



There are other services for which you may need to define INPUT rules.



OUTPUT Filtering



There may be instances when an administrator must control the outbound connections that certain users on the internal network can make. Perhaps the administrator wants an accountant to connect to a special port; specialized rules for such cases can be established using the OUTPUT chain in IPTables. The OUTPUT action places restrictions on outbound data.



For example, it may be prudent for an administrator to install a VPN client on the gateway to allow the entire internal network to access a remote LAN (such as a satellite office). To use CIPE as the VPN client installed on the gateway, use a rule similar to the following:






iptables -A OUTPUT -p udp -o cipcbx -j ACCEPT




More elaborate rules can be created that control access to specific subnets, or even specific nodes, within a LAN. You can also restrict certain dubious services such as trojans, worms, and other client/server viruses from contacting their servers. For example, there are some trojans that scan networks for services on ports from 31337 to 31340 (called the elite ports in cracking lingo). Since there are no legitimate services that communicate via these non-standard ports, blocking them can effectively diminish the chances that potentially infected nodes on your network communicate with their remote master servers. Note that the following rule is only useful if your default OUTPUT policy is set to ACCEPT; if your default OUTPUT policy is DROP, this rule is not needed.






iptables -A OUTPUT -o eth0 -p tcp --dport 31337 --sport 31337 -j DROP 
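To cover the entire 31337 to 31340 range mentioned above, a port range can be used. This is a sketch; adjust the interface name to match your gateway:

iptables -A OUTPUT -o eth0 -p tcp --dport 31337:31340 -j DROP
iptables -A OUTPUT -o eth0 -p tcp --sport 31337:31340 -j DROP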






FORWARD and NAT Rules



Most organizations are allotted a limited number of publicly routable IP addresses from their ISP. Due to this limited allowance, administrators must find creative ways to share access to Internet services without giving scarce IP addresses to every node on the LAN. Using a class C private IP address range is the common way to allow all nodes on a LAN to properly access network services internally and externally. Edge routers (such as firewalls) can receive incoming transmissions from the Internet and route the packets to the intended LAN node; at the same time, they can also route outgoing requests from a LAN node to the remote Internet service. This forwarding of network traffic can become dangerous at times, especially with the availability of modern cracking tools that can spoof internal IP addresses and make the remote attacker's machine act as a node on your LAN. To prevent this, IPTables provides routing and forwarding policies that can be implemented to prevent aberrant usage of network resources.



The FORWARD policy allows an administrator to control where packets can be routed. For example, to allow forwarding for an entire internal IP address range (assuming the gateway has an internal IP address on eth1), the following rule can be set:






iptables -A FORWARD -i eth1 -j ACCEPT




In this example, the -i option is used to accept only packets arriving on the internal interface: -i matches the device a packet comes in on (in this case, the internally assigned device eth1), so only traffic entering from the internal network is forwarded.



By default, IPv4 policy in Linux kernels disables support for IP forwarding, which prevents machines running Linux from functioning as dedicated edge routers. To enable IP forwarding, run the following command or place it in your firewall initialization script:






echo "1" > /proc/sys/net/ipv4/ip_forward




If this command is run via shell prompt, then the setting is not remembered after a reboot. Thus it is recommended that it be added to the firewall initialization script.
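On systems that read /etc/sysctl.conf at boot, an alternative is to add the following line to that file so that forwarding is enabled automatically:

net.ipv4.ip_forward = 1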



FORWARD rules can be implemented to restrict certain types of traffic to the LAN only, such as local network file shares through NFS or Samba. The following rules drop forwarded connections to the NetBIOS/Samba ports:






iptables -A FORWARD -p tcp --dport 137:139 -j DROP


iptables -A FORWARD -p udp --dport 137:139 -j DROP




To take the restrictions a step further, block all outside connections that attempt to spoof private IP address ranges to infiltrate your LAN. If a LAN uses the 192.168.1.0/24 range, rules can be set so that the Internet-facing network device (for example, eth0) drops any packets arriving on it with a source address in your LAN's IP range. Because it is recommended to reject forwarded packets as a default policy, any other spoofed source address arriving at the external-facing device (eth0) will be rejected automatically.






iptables -A FORWARD -p tcp -s 192.168.1.0/24 -i eth0 -j DROP


iptables -A FORWARD -p udp -s 192.168.1.0/24 -i eth0 -j DROP




Rules can also be set to route traffic to certain machines, such as a dedicated HTTP or FTP server, preferably one that is isolated from the internal network in a demilitarized zone (DMZ). To set a rule for routing all incoming HTTP requests to a dedicated HTTP server at IP address 10.0.4.2 and port 80 (outside of the 192.168.1.0/24 range of the LAN), network address translation (NAT) calls a PREROUTING chain to forward the packets to the proper destination:






iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j DNAT --to 10.0.4.2:80




With this command, all HTTP connections to port 80 from outside of the LAN are routed to the HTTP server on a network separate from the rest of the internal network. This form of network segmentation can prove safer than allowing HTTP connections directly to a machine on the internal network. If the HTTP server is configured to accept secure connections, then port 443 must be forwarded as well.
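Following the same pattern, secure HTTP traffic could be forwarded with a rule such as:

iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 443 -j DNAT --to 10.0.4.2:443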



31 Sample Firewall Configuration








#!/bin/bash

##########################################################################
# IPTABLES VERSION
# This sample configuration is for a single host firewall configuration
# with no services supported by the firewall machine itself.
##########################################################################

# USER CONFIGURABLE SECTION

# The name and location of the iptables utility.
IPTABLES=iptables

# The path to the iptables executable.
PATH="/sbin"

# Our internal network address space and its supporting network device.
OURNET="172.29.16.0/24"
OURBCAST="172.29.16.255"
OURDEV="eth0"

# The outside address and the network device that supports it.
ANYADDR="0/0"
ANYDEV="eth1"

# The TCP services we wish to allow to pass - "" empty means all ports
# note: comma separated
TCPIN="smtp,www"
TCPOUT="smtp,www,ftp,ftp-data,irc"

# The UDP services we wish to allow to pass - "" empty means all ports
# note: comma separated
UDPIN="domain"
UDPOUT="domain"

# The ICMP types we wish to allow to pass - "" empty means all types
# ref: /usr/include/netinet/ip_icmp.h for type numbers
# note: comma separated
ICMPIN="0,3,11"
ICMPOUT="8,3,11"

# Logging; uncomment the following line to enable logging of datagrams
# that are blocked by the firewall.
# LOGGING=1

# END USER CONFIGURABLE SECTION

###########################################################################

# Flush the FORWARD chain rules.
$IPTABLES -F FORWARD

# We want to deny forwarded traffic by default.
$IPTABLES -P FORWARD DROP

# Drop all datagrams destined for this host received from outside.
$IPTABLES -A INPUT -i $ANYDEV -j DROP

# SPOOFING
# We should not accept any datagrams with a source address matching ours
# from the outside, so we drop them.
$IPTABLES -A FORWARD -s $OURNET -i $ANYDEV -j DROP

# SMURF
# Disallow ICMP to our broadcast address to prevent "Smurf" style attacks.
$IPTABLES -A FORWARD -p icmp -i $ANYDEV -d $OURBCAST -j DROP

# We should accept fragments; in iptables we must do this explicitly.
$IPTABLES -A FORWARD -f -j ACCEPT

# TCP
# We will accept all TCP datagrams belonging to an existing connection
# (i.e. not connection-initiation SYN packets) for the TCP ports we're
# allowing through. This should catch more than 95% of all valid TCP packets.
$IPTABLES -A FORWARD -m multiport -p tcp -d $OURNET --dports $TCPIN ! --syn -j ACCEPT
$IPTABLES -A FORWARD -m multiport -p tcp -s $OURNET --sports $TCPIN ! --syn -j ACCEPT

# TCP - INCOMING CONNECTIONS
# We will accept connection requests from the outside only on the
# allowed TCP ports.
$IPTABLES -A FORWARD -m multiport -p tcp -i $ANYDEV -d $OURNET --dports $TCPIN --syn -j ACCEPT

# TCP - OUTGOING CONNECTIONS
# We will accept all outgoing TCP connection requests on the allowed
# TCP ports.
$IPTABLES -A FORWARD -m multiport -p tcp -i $OURDEV -d $ANYADDR --dports $TCPOUT --syn -j ACCEPT

# UDP - INCOMING
# We will allow UDP datagrams in on the allowed ports and back.
$IPTABLES -A FORWARD -m multiport -p udp -i $ANYDEV -d $OURNET --dports $UDPIN -j ACCEPT
$IPTABLES -A FORWARD -m multiport -p udp -i $ANYDEV -s $OURNET --sports $UDPIN -j ACCEPT

# UDP - OUTGOING
# We will allow UDP datagrams out to the allowed ports and back.
$IPTABLES -A FORWARD -m multiport -p udp -i $OURDEV -d $ANYADDR --dports $UDPOUT -j ACCEPT
$IPTABLES -A FORWARD -m multiport -p udp -i $OURDEV -s $ANYADDR --sports $UDPOUT -j ACCEPT

# ICMP - INCOMING
# We will allow ICMP datagrams in of the allowed types.
# (The multiport match does not support ICMP, so we add one rule per type.)
for TYPE in ${ICMPIN//,/ }; do
    $IPTABLES -A FORWARD -p icmp --icmp-type $TYPE -i $ANYDEV -d $OURNET -j ACCEPT
done

# ICMP - OUTGOING
# We will allow ICMP datagrams out of the allowed types.
for TYPE in ${ICMPOUT//,/ }; do
    $IPTABLES -A FORWARD -p icmp --icmp-type $TYPE -i $OURDEV -d $ANYADDR -j ACCEPT
done

# DEFAULT and LOGGING
# All remaining datagrams fall through to the default
# policy and are dropped. They will be logged if you've
# configured the LOGGING variable above.
#
if [ "$LOGGING" ]
then
    # Log barred TCP
    $IPTABLES -A FORWARD -p tcp -j LOG
    # Log barred UDP
    $IPTABLES -A FORWARD -p udp -j LOG
    # Log barred ICMP
    $IPTABLES -A FORWARD -p icmp -j LOG
fi
#
# end.




32 IP Accounting



In today’s world of commercial Internet service, it is becoming increasingly important to know how much data you are transmitting and receiving on your network connections. If you are an Internet Service Provider and you charge your customers by volume, this will be essential to your business. If you are a customer of an Internet Service Provider that charges by data volume, you will find it useful to collect your own data to ensure the accuracy of your Internet charges.



There are other uses for network accounting that have nothing to do with dollars and bills. If you manage a server that offers a number of different types of network services, it might be useful to you to know exactly how much data is being generated by each one. This sort of information could assist you in making decisions, such as what hardware to buy or how many servers to run.



The Linux kernel provides a facility that allows you to collect all sorts of useful information about the network traffic it sees. This facility is called IP accounting.



33 Configuring the Kernel for IP Accounting



The Linux IP accounting feature is very closely related to the Linux firewall software. The places you want to collect accounting data are the same places that you would be interested in performing firewall filtering: into and out of a network host, and in the software that does the routing of datagrams. If you haven't read the section on firewalls, now is probably a good time to do so, as we will be using some of the concepts described in Chapter 9.



To activate the Linux IP accounting feature, you should first see if your Linux kernel is configured for it. Check to see if the /proc/net/ip_acct file exists. If it does, your kernel already supports IP accounting. If it doesn't, you must build a new kernel, ensuring that you answer “Y” to the options in 2.0 and 2.2 series kernels:






Networking options  --->


                    [*] Network firewalls


                    [*] TCP/IP networking


                    ...


                    [*] IP: accounting




or in 2.4 series kernels:






Networking options  --->


    [*] Network packet filtering (replaces ipchains)






34 Configuring IP Accounting



Because IP accounting is closely related to IP firewall, the same tool was designated to configure it, so ipchains or iptables are used to configure IP accounting. The command syntax is very similar to that of the firewall rules, so we won't focus on it, but we will discuss what you can discover about the nature of your network traffic using this feature.



The general command syntax for ipchains and iptables is:






# ipchains -A chain rule-specification


# iptables -A chain rule-specification




The ipchains and iptables commands allow you to specify direction in a manner more consistent with the firewall rules. IP Firewall Chains doesn't allow you to configure a rule that aggregates both directions, but it does allow you to configure rules in the forward chain that the older implementation did not. We'll see the difference that makes in some examples a little later.



The commands are much the same as firewall rules, except that the policy rules do not apply here. We can add, insert, delete, and list accounting rules. In the case of ipchains and iptables, all valid rules are accounting rules, and any command that doesn't specify the -j option performs accounting only.



The rule specification parameters for IP accounting are the same as those used for IP firewall. These are what we use to define precisely what network traffic we wish to count and total.
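For example, the following rule specifies no -j target, so it counts all forwarded TCP traffic without deciding its fate:

# iptables -A FORWARD -p tcp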



35 Accounting by Address



Let's work with an example to illustrate how we'd use IP accounting.



Imagine we have a Linux-based router that serves two departments at the Virtual Brewery. The router has two Ethernet devices, eth0 and eth1, each of which services a department; and a PPP device, ppp0, that connects us via a high-speed serial link to the main campus of the Groucho Marx University.



Let's also imagine that for billing purposes we want to know the total traffic generated by each of the departments across the serial link, and for management purposes we want to know the total traffic generated between the two departments.



The following table shows the interface addresses we will use in our example:

Iface    address       netmask
eth0     172.16.3.0    255.255.255.0
eth1     172.16.4.0    255.255.255.0





To answer the question, "How much data does each department generate on the PPP link?", we could use rules like these with iptables:






# iptables -A FORWARD -i ppp0 -d 172.16.3.0/24


# iptables -A FORWARD -o ppp0 -s 172.16.3.0/24


# iptables -A FORWARD -i ppp0 -d 172.16.4.0/24


# iptables -A FORWARD -o ppp0 -s 172.16.4.0/24




The first pair of rules says, "Count all data traveling in either direction across the interface named ppp0 with a source or destination address of 172.16.3.0/24." The second pair does the same for the second Ethernet network at our site.



To answer the second question, "How much data travels between the two departments?", we need rules that look like this:






# iptables -A FORWARD -s 172.16.3.0/24 -d 172.16.4.0/24


# iptables -A FORWARD -s 172.16.4.0/24 -d 172.16.3.0/24




These rules will count all datagrams with a source address belonging to one of the department networks and a destination address belonging to the other.



36 Accounting by Service Port



Okay, let's suppose we also want a better idea of exactly what sort of traffic is being carried across our PPP link. We might, for example, want to know how much of the link the FTP, SMTP, and World Wide Web services are consuming.



A script of rules to enable us to collect this information might look like:






#!/bin/sh


# Collect ftp, smtp and www volume statistics for data carried on our


# PPP link using iptables.


#


iptables -A FORWARD -i ppp0 -m tcp -p tcp --sport ftp-data:ftp


iptables -A FORWARD -o ppp0 -m tcp -p tcp --dport ftp-data:ftp


iptables -A FORWARD -i ppp0 -m tcp -p tcp --sport smtp


iptables -A FORWARD -o ppp0 -m tcp -p tcp --dport smtp


iptables -A FORWARD -i ppp0 -m tcp -p tcp --sport www


iptables -A FORWARD -o ppp0 -m tcp -p tcp --dport www




There are a couple of interesting features in this configuration. Firstly, we've specified the protocol. When we specify ports in our rules, we must also specify a protocol, because TCP and UDP provide separate sets of ports. Since all of these services are TCP-based, we've specified TCP as the protocol.

Secondly, we've specified the two services ftp and ftp-data in one command. Both ipchains and iptables accept either single ports or ranges of ports, which is what we've used here: the syntax "ftp-data:ftp" means "ports ftp-data (20) through ftp (21)". When a rule specifies a range of ports, data received for any port in the range is added to that rule's totals. Remembering that the FTP service uses two ports, the command port and the data transfer port, we've counted them together to total the FTP traffic.

Lastly, note that the older ipfwadm and ipchains commands require a source address (the special notation "0/0" matches all addresses) whenever ports are specified; the iptables rules above do not need one.



We can expand on the second point a little to give us a different view of the data on our link. Let's now imagine that we class FTP, SMTP, and World Wide Web traffic as essential traffic, and all other traffic as nonessential. If we were interested in seeing the ratio of essential traffic to nonessential traffic, we would want to total several services together in a single counter.



How do we do this with the ipchains or iptables commands, since they allow only one argument in their port specification? We can exploit user-defined chains in accounting just as easily as in firewall rules. Consider the following approach:






# ipchains -N a-essent


# ipchains -N a-noness


# ipchains -A a-essent -j ACCEPT


# ipchains -A a-noness -j ACCEPT


# ipchains -A forward -i ppp0 -p tcp -s 0/0 ftp-data:ftp -j a-essent


# ipchains -A forward -i ppp0 -p tcp -s 0/0 smtp -j a-essent


# ipchains -A forward -i ppp0 -p tcp -s 0/0 www -j a-essent


# ipchains -A forward -j a-noness




Here we create two user-defined chains, one called a-essent, where we capture accounting data for essential services, and another called a-noness, where we capture accounting data for nonessential services. We then add rules to our forward chain that match our essential services and jump to the a-essent chain, where we have just one rule that accepts all datagrams and counts them. The last rule in our forward chain jumps to our a-noness chain, where again a single rule accepts and counts all datagrams. The rule that jumps to the a-noness chain will never be reached by any of our essential services, as they will have been accepted in their own chain. Our tallies for essential and nonessential services will therefore be available in the rules within those chains.

This is just one approach you could take; there are others. Our iptables implementation of the same approach would look like:






# iptables -N a-essent


# iptables -N a-noness


# iptables -A a-essent -j ACCEPT


# iptables -A a-noness -j ACCEPT


# iptables -A FORWARD -i ppp0 -m tcp -p tcp --sport ftp-data:ftp -j a-essent


# iptables -A FORWARD -i ppp0 -m tcp -p tcp --sport smtp -j a-essent


# iptables -A FORWARD -i ppp0 -m tcp -p tcp --sport www -j a-essent


# iptables -A FORWARD -j a-noness




This looks simple enough. Unfortunately, there is a small but unavoidable problem when trying to do accounting by service type. You will remember that we discussed the role the MTU plays in TCP/IP networking in an earlier chapter. The MTU defines the largest datagram that will be transmitted on a network device. When a router receives a datagram that is larger than the MTU of the interface that needs to retransmit it, the router performs a trick called fragmentation: it breaks the large datagram into pieces no longer than the MTU of the interface and transmits these pieces. The router builds new headers to put in front of each of these pieces, and these are what the remote machine uses to reconstruct the original data.

Unfortunately, during the fragmentation process the port is lost for all but the first fragment. This means that IP accounting can't properly count fragmented datagrams; it can reliably count only the first fragment, or unfragmented datagrams.

There is a small trick that ensures that, while we won't be able to know exactly what port the second and later fragments were for, we can still count them. (An early version of the Linux accounting software assigned fragments a fake port number, 0xFFFF, for this purpose.) To capture the second and later fragments, we could use a rule like:






# iptables -A FORWARD -i ppp0 -m tcp -p tcp -f




This rule won't tell us what the original port for the data was, but at least we can see how much of our data consists of fragments and account for the volume of traffic they consume.



In 2.2 kernels you can select a kernel compile-time option that negates this whole issue if your Linux machine is acting as the single access point for a network. If you enable the IP: always defragment option when you compile your kernel, all received datagrams will be reassembled by the Linux router before routing and retransmission. This operation is performed before the firewall and accounting software sees the datagram, and thus you will have no fragments to deal with. In 2.4 kernels you compile and load the netfilter forward-fragment module.



37 Accounting of ICMP Datagrams



The ICMP protocol does not use service port numbers and is therefore a little bit more difficult to collect details on. ICMP uses a number of different types of datagrams. Many of these are harmless and normal, while others should only be seen under special circumstances. Sometimes people with too much time on their hands attempt to maliciously disrupt the network access of a user by generating large numbers of ICMP messages. This is commonly called ping flooding. While IP accounting cannot do anything to prevent this problem (IP firewalling can help, though!) we can at least put accounting rules in place that will show us if anybody has been trying.



ICMP doesn't use ports as TCP and UDP do. Instead ICMP has ICMP message types. We can build rules to account for each ICMP message type.



An IP accounting rule to collect information about the volume of ping data that is being sent to you or that you are generating might, with iptables, look like this:






# iptables -A FORWARD -m icmp -p icmp --icmp-type echo-request


# iptables -A FORWARD -m icmp -p icmp --icmp-type echo-reply


# iptables -A FORWARD -m icmp -p icmp -f




The first rule collects information about the “ICMP Echo Request” datagrams (ping requests), and the second rule collects information about the “ICMP Echo Reply” datagrams (ping replies). The third rule collects information about ICMP datagram fragments. This is a trick similar to that described for fragmented TCP and UDP datagrams.



If you specify source and/or destination addresses in your rules, you can keep track of where the pings are coming from, such as whether they originate inside or outside your network. Once you've determined where the rogue datagrams are coming from, you can decide whether you want to put firewall rules in place to prevent them or take some other action, such as contacting the owner of the offending network to advise them of the problem, or perhaps even legal action if the problem is a malicious act.
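As a sketch, assuming for illustration an internal network of 192.168.1.0/24, the following rules separately count pings leaving your network and pings arriving at it:

# iptables -A FORWARD -p icmp --icmp-type echo-request -s 192.168.1.0/24
# iptables -A FORWARD -p icmp --icmp-type echo-request -d 192.168.1.0/24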



38 Accounting by Protocol



Let's now imagine that we are interested in knowing how much of the traffic on our link is TCP, UDP, and ICMP. We would use rules like the following:






# iptables -A FORWARD -i ppp0 -m tcp -p tcp


# iptables -A FORWARD -o ppp0 -m tcp -p tcp


# iptables -A FORWARD -i ppp0 -m udp -p udp


# iptables -A FORWARD -o ppp0 -m udp -p udp


# iptables -A FORWARD -i ppp0 -m icmp -p icmp


# iptables -A FORWARD -o ppp0 -m icmp -p icmp




With these rules in place, all of the traffic flowing across the ppp0 interface will be analyzed to determine whether it is TCP, UDP, or ICMP traffic, and the appropriate counters will be updated for each. The iptables example splits incoming flow from outgoing flow because its syntax demands a separate rule for each direction.





39 Using IP Accounting Results



It is all very well to be collecting this information, but how do we actually get to see it? To view the collected accounting data and the configured accounting rules, we use our firewall configuration commands, asking them to list our rules. The packet and byte counters for each of our rules are listed in the output.



The ipfwadm, ipchains, and iptables commands differ in how accounting data is handled, so we will treat them independently.



Listing Accounting Data with iptables



The iptables command behaves very similarly to the ipchains command. Again, we must use the -v option when listing our rules to see the accounting counters. To list our accounting data, we would use:






# iptables -L -v




Just as for the ipchains command, you can use the -x argument to show the output with exact counter values rather than rounded unit figures.





40 Resetting the Counters



The IP accounting counters will overflow if you leave them long enough. If they overflow, you will have difficulty determining the value they actually represent. To avoid this problem, you should read the accounting data periodically, record it, and then reset the counters back to zero to begin collecting accounting information for the next accounting interval.






# iptables -Z




You can even combine the list and zeroing actions together to ensure that no accounting data is lost in between:






# iptables -L -Z -v




These commands will first list the accounting data and then immediately zero the counters and begin counting again. If you are interested in collecting and using this information regularly, you would probably want to put this command into a script that recorded the output and stored it somewhere, and execute the script periodically using the cron command.
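A minimal sketch of such a script (the log file path and script location are illustrative):

#!/bin/sh
# Record the current counters with exact, numeric figures, then zero them.
iptables -L -Z -v -n -x >> /var/log/ip-acct.log

A crontab entry such as "0 * * * * /usr/local/sbin/ip-acct.sh" would then collect the totals hourly.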



41 Flushing the Ruleset



One last command that might be useful allows you to flush all the IP accounting rules you have configured. This is most useful when you want to radically alter your ruleset without rebooting the machine. iptables supports the -F argument for this purpose:






# iptables -F




This flushes all of your configured IP accounting rules, removing them all and saving you having to remove each of them individually. Note that flushing the rules does not remove any user-defined chains, only the rules within them; iptables provides the -X argument to delete empty user-defined chains.



42 IP Masquerade and Network Address Translation



You don't have to have a good memory to remember a time when only large organizations could afford to have a number of computers networked together by a LAN. Today network technology has dropped so much in price that two things have happened. First, LANs are now commonplace, even in many household environments. Certainly many Linux users will have two or more computers connected by Ethernet. Second, network resources, particularly IP addresses, are now scarce; while they used to be free, they are now bought and sold.



Most people with a LAN will probably also want an Internet connection that every computer on the LAN can use. The IP routing rules are quite strict in how they deal with this situation. Traditional solutions to this problem would have involved requesting an IP network address, perhaps a class C address for small sites, assigning each host on the LAN an address from this network and using a router to connect the LAN to the Internet.



In a commercialized Internet environment, this is quite an expensive proposition. First, you'd be required to pay for the network address that is assigned to you. Second, you'd probably have to pay your Internet Service Provider for the privilege of having a suitable route to your network put in place so that the rest of the Internet knows how to reach you. This might still be practical for companies, but domestic installations don't usually justify the cost.



Fortunately, Linux provides an answer to this dilemma. This answer involves a component of a group of advanced networking features called Network Address Translation (NAT). NAT describes the process of modifying the network addresses contained within datagram headers while they are in transit. This might sound odd at first, but we'll show that it is ideal for solving the problem we've just described, one that many network administrators have encountered. IP masquerade is the name given to one type of network address translation that allows all of the hosts on a private network to use the Internet at the price of a single IP address.



IP masquerading allows you to use a private (reserved) IP network address on your LAN and have your Linux-based router perform some clever, real-time translation of IP addresses and ports. When it receives a datagram from a computer on the LAN, it takes note of the type of datagram it is, “TCP,” “UDP,” “ICMP,” etc., and modifies the datagram so that it looks like it was generated by the router machine itself (and remembers that it has done so). It then transmits the datagram onto the Internet with its single connected IP address. When the destination host receives this datagram, it believes the datagram has come from the routing host and sends any reply datagrams back to that address. When the Linux masquerade router receives a datagram from its Internet connection, it looks in its table of established masqueraded connections to see if this datagram actually belongs to a computer on the LAN, and if it does, it reverses the modification it did on the forward path and transmits the datagram to the LAN computer.



A simple example is illustrated in Figure 1



Figure 1. A typical IP masquerade configuration






We have a small Ethernet network using one of the reserved network addresses. The network has a Linux-based masquerade router providing access to the Internet. One of the workstations on the network (192.168.1.3) wishes to establish a connection to the remote host 209.1.106.178 on port 8888. The workstation routes its datagram to the masquerade router, which identifies this connection request as requiring masquerade services. It accepts the datagram and allocates a port number to use (1035), substitutes its own IP address and port number for those of the originating host, and transmits the datagram to the destination host. The destination host believes it has received a connection request from the Linux masquerade host and generates a reply datagram. The masquerade host, upon receiving this datagram, finds the association in its masquerade table and reverses the substitution it performed on the outgoing datagram. It then transmits the reply datagram to the originating host.



The local host believes it is speaking directly to the remote host. The remote host knows nothing about the local host at all and believes it has received a connection from the Linux masquerade host. The Linux masquerade host knows these two hosts are speaking to each other, and on what ports, and performs the address and port translations necessary to allow communication.



This might all seem a little confusing, and it can be, but it works and is really quite simple to configure. So don't worry if you don't understand all the details yet.





43 Side Effects and Fringe Benefits



The IP masquerade facility comes with its own set of side effects, some of which are useful and some of which might become bothersome.



None of the hosts on the supported network behind the masquerade router are ever directly seen; consequently, you need only one valid and routable IP address to allow all hosts to make network connections out onto the Internet. This has a downside; none of those hosts are visible from the Internet and you can't directly connect to them from the Internet; the only host visible on a masqueraded network is the masquerade machine itself. This is important when you consider services such as mail or FTP. It helps determine what services should be provided by the masquerade host and what services it should proxy or otherwise treat specially.



Second, because none of the masqueraded hosts are visible, they are relatively protected from attacks from outside; this could simplify or even remove the need for firewall configuration on the masquerade host. You shouldn't rely too heavily on this, though. Your whole network will be only as safe as your masquerade host, so you should use a firewall to protect it if security is a concern.



Third, IP masquerade will have some impact on the performance of your networking. In typical configurations this will probably be barely measurable. If you have large numbers of active masquerade sessions, though, you may find that the processing required at the masquerade machine begins to impact your network throughput. IP masquerade must do a good deal of work for each datagram compared to the process of conventional routing. That 386SX16 machine you have been planning on using as a masquerade machine supporting a dial-up link to the Internet might be fine, but don't expect too much if you decide you want to use it as a router in your corporate network at Ethernet speeds.



Last, some network services just won't work through masquerade, or at least not without a lot of help. Typically, these are services that rely on incoming sessions to work, such as the Direct Client-to-Client (DCC) feature in IRC, or certain types of video and audio multicasting services. Some of these services have specially developed kernel modules that provide solutions, and we'll talk about those in a moment. For others, it is possible that you will find no support, so be aware that masquerading won't be suitable in all situations.



44 Configuring the Kernel for IP Masquerade



To use the IP masquerade facility, your kernel must be compiled with masquerade support. You must select the following options when configuring a 2.2 series kernel:






Networking options  --->


                    [*] Network firewalls


                    [*] TCP/IP networking


                    [*] IP: firewalling


                    [*] IP: masquerading


                    --- Protocol-specific masquerading support will be built as modules.


                    [*] IP: ipautofw masq support


                    [*] IP: ICMP masquerading




Note that some of the masquerade support is available only as a kernel module. This means that you must ensure that you “make modules” in addition to the usual “make zImage” when building your kernel.



The 2.4 series kernels no longer offer IP masquerade support as a kernel compile time option. Instead, you should select the network packet filtering option:






Networking options  --->


    [M] Network packet filtering (replaces ipchains)




In the 2.2 series kernels, a number of protocol-specific helper modules are created during kernel compilation. Some protocols begin with an outgoing request on one port, and then expect an incoming connection on another. Normally these cannot be masqueraded, as there is no way of associating the second connection with the first without peering inside the protocols themselves. The helper modules do just that; they actually look inside the datagrams and allow masquerading to work for supported protocols that otherwise would be impossible to masquerade. The supported protocols are:

Module             Protocol
ip_masq_ftp        FTP
ip_masq_irc        IRC
ip_masq_raudio     RealAudio
ip_masq_cuseeme    CU-See-Me
ip_masq_vdolive    VDO Live
ip_masq_quake      IdSoftware's Quake





You must load these modules manually using the insmod command to implement them. Note that these modules cannot be loaded using the kerneld daemon. Each of the modules takes an argument specifying what ports it will listen on. For the RealAudio™ module you might use:






# insmod ip_masq_raudio.o ports=7070,7071,7072           




The ports you need to specify depend on the protocol. An IP masquerade mini-HOWTO written by Ambrose Au explains more about the IP masquerade modules and how to configure them.[2]



The netfilter package includes modules that perform similar functions. For example, to provide connection tracking of FTP sessions, you'd load and use the ip_conntrack_ftp and ip_nat_ftp modules.
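For example, you might load the FTP helpers like this (assuming your kernel built them as modules and modprobe can find them):

# modprobe ip_conntrack_ftp
# modprobe ip_nat_ftp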





45 Configuring IP Masquerade



If you've already read the firewall and accounting chapters, it probably comes as no surprise that the ipfwadm, ipchains, and iptables commands are used to configure the IP masquerade rules as well.



Masquerade rules are a special class of filtering rule. You can masquerade only datagrams that are received on one interface that will be routed to another interface. To configure a masquerade rule you construct a rule very similar to a firewall forwarding rule, but with special options that tell the kernel to masquerade the datagram. The ipfwadm command uses the -m option, ipchains uses -j MASQ, and iptables uses -j MASQUERADE to indicate that datagrams matching the rule specification should be masqueraded.



Let's look at an example. A computing science student at Groucho Marx University has a number of computers at home internetworked onto a small Ethernet-based local area network. She has chosen to use one of the reserved private Internet network addresses for her network. She shares her accommodation with other students, all of whom have an interest in using the Internet. Because student living conditions are very frugal, they cannot afford to use a permanent Internet connection, so instead they use a simple dial-up PPP Internet connection. They would all like to be able to share the connection to chat on IRC, surf the Web, and retrieve files by FTP directly to each of their computers—IP masquerade is the answer.



The student first configures a Linux machine to support the dial-up link and to act as a router for the LAN. The IP address she is assigned when she dials up isn't important. She configures the Linux router with IP masquerade and uses one of the private network addresses for her LAN: 192.168.1.0. She ensures that each of the hosts on the LAN has a default route pointing at the Linux router.



The following iptables commands are all that are required to make masquerading work in her configuration:






# iptables -t nat -P POSTROUTING DROP


# iptables -t nat -A POSTROUTING -o ppp0 -j MASQUERADE 




Now whenever any of the LAN hosts try to connect to a service on a remote host, their datagrams will be automatically masqueraded by the Linux masquerade router. The first command prevents the Linux machine from translating any other datagrams and also adds some security.
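One detail worth checking: masquerading only applies to datagrams the machine forwards, so IP forwarding must be enabled on the router. A common way to turn it on (one of several) is:

# echo 1 > /proc/sys/net/ipv4/ip_forward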



To list the masquerade rules you have created, use the -L argument to the iptables command (ipfwadm uses -l), as we described earlier while discussing firewalls.



To list the rule we created earlier we use:






# iptables -t nat -L


Chain PREROUTING (policy ACCEPT)


target     prot opt source               destination         


 


Chain POSTROUTING (policy DROP)


target     prot opt source               destination         


MASQUERADE  all  --  anywhere             anywhere


 


Chain OUTPUT (policy ACCEPT)


target     prot opt source               destination         




Again, masquerade rules appear with a target of MASQUERADE.



46 Setting Timing Parameters for IP Masquerade



When each new connection is established, the IP masquerade software creates an association in memory between each of the hosts involved in the connection. You can view these associations at any time by looking at the /proc/net/ip_masquerade file. These associations will time out after a period of inactivity, though. With the ipfwadm and ipchains implementations you are able to set the length of these timers; the iptables implementation uses much longer default timers and does not allow you to set them.



Each of these values represents a timer used by the IP masquerade software, measured in seconds. The following table summarizes the timers and their meanings:

Name     Description
tcp      TCP session timeout. How long a TCP connection may remain idle before the association for it is removed.
tcpfin   TCP timeout after FIN. How long an association will remain after a TCP connection has been disconnected.
udp      UDP session timeout. How long a UDP connection may remain idle before the association for it is removed.
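As a hedged illustration (the values here are arbitrary), the 2.2-series ipchains command sets the three timers in the order tcp, tcpfin, udp:

# ipchains -M -S 3600 600 120

This would time out idle TCP sessions after an hour, FIN-closed sessions after ten minutes, and idle UDP sessions after two minutes.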







47 Handling Name Server Lookups



Handling domain name server lookups from the hosts on the LAN with IP masquerading has always presented a problem. There are two ways of accommodating DNS in a masquerade environment. You can tell each of the hosts that they use the same DNS that the Linux router machine does, and let IP masquerade do its magic on their DNS requests. Alternatively, you can run a caching name server on the Linux machine and have each of the hosts on the LAN use the Linux machine as their DNS. Although a more aggressive action, this is probably the better option because it reduces the volume of DNS traffic travelling on the Internet link and will be marginally faster for most requests, since they'll be served from the cache. The downside to this configuration is that it is more complex. Section 6.3.4 in Chapter 6 describes how to configure a caching name server.





48 More About Network Address Translation



The netfilter software is capable of many different types of Network Address Translation. IP Masquerade is one simple application of it.



It is possible, for example, to build NAT rules that translate only certain addresses or ranges of addresses and leave all others untouched, or to translate addresses into pools of addresses rather than just a single address, as masquerade does. You can in fact use the iptables command to generate NAT rules that map just about anything, with combinations of matches using any of the standard attributes, such as source address, destination address, protocol type, port number, etc.



Translating the Source Address of a datagram is referred to as “Source NAT,” or SNAT, in the netfilter documentation. Translating the Destination Address of a datagram is known as “Destination NAT,” or DNAT. Translating the TCP or UDP port is known by the term REDIRECT. SNAT, DNAT, and REDIRECT are targets that you may use with the iptables command to build more complex and sophisticated rules.
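To make this concrete, here is a sketch of an SNAT and a DNAT rule; the interface names and addresses are purely illustrative assumptions:

# iptables -t nat -A POSTROUTING -o eth1 -s 192.168.1.0/24 -j SNAT --to-source 203.0.113.5

# iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 80 -j DNAT --to-destination 192.168.1.10

The first rewrites the source address of LAN datagrams leaving eth1 to a single public address; the second redirects incoming web traffic to an internal server.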



49 Questions





1. What are the common methods of attacks in computer networks?



2. What is spoofing?



3. What is a Denial of Service (DoS) attack?



4. What is an Internet firewall?



5. What are the advantages of an Internet firewall?



6. What are the disadvantages of an Internet firewall?



7. What is IP packet filtering? How does it work?



8. What is a proxy server?



9. What is Network Address Translation (NAT)?



10. How do Linux kernels from v2.4 onwards provide firewall capabilities?



11. What is the netfilter engine?



12. What is iptables?



13. What is the difference between ipchains and iptables?



14. Define the following in the context of Linux firewalls:



rules, chains, tables



15. What are the different tables and built-in chains available with iptables?



16. Explain briefly the different chains in the filter table.



17. Explain briefly the different chains in the nat table.



18. Explain briefly the different chains in the mangle table.



19. Describe the general format of an iptables rule.



20. What are the different targets or actions that can be associated with a packet when it matches a firewall rule?



21. What is the difference between the DROP and REJECT target options of an iptables rule?



22. How can you save iptables rules?



23. Write ipchains/iptables rules to block all input tcp packets except http (port 80) and telnet (port 23) traffic to a Linux machine.



24. Write ipchains/iptables rules to allow only pinging to a Linux machine.



25. Write ipchains/iptables rules to allow only outgoing http (port 80) traffic from a Linux machine.



26. What is IP accounting?



27. What are the uses of IP accounting?



28. Explain with examples how you can use ipchains/iptables to perform IP accounting by address.



29. Explain with examples how you can use ipchains/iptables to perform IP accounting by port.



30. Explain with examples how you can use ipchains/iptables to perform IP accounting of ICMP packets.



31. What is IP Masquerading? What are its benefits?



32. Write iptables rules to configure a Linux machine to do masquerading.



50 The Network Information System (NIS)



When you're running a local area network, your overall goal is usually to provide an environment for your users that makes the network transparent. An important stepping stone is keeping vital data such as user account information synchronized among all hosts. This provides users with the freedom to move from machine to machine without the inconvenience of having to remember different passwords and copy data from one machine to another. Data that is centrally stored doesn't need to be replicated, so long as there is some convenient means of accessing it from a network-connected host. By storing important administrative information centrally, you can ensure the consistency of that data, increase flexibility for the users by allowing them to move from host to host in a transparent way, and make the system administrator's life much easier by leaving only a single copy of the information to maintain.



We previously discussed an important example of this concept that is used on the Internet—the Domain Name System (DNS). DNS serves a limited range of information, the most important being the mapping between hostname and IP address. For other types of information, there is no such specialized service. Moreover, if you manage only a small LAN with no Internet connectivity, setting up DNS may not seem to be worth the trouble.



This is why Sun developed the Network Information System (NIS). NIS provides generic database access facilities that can be used to distribute, for example, information contained in the passwd and groups files to all hosts on your network. This makes the network appear as a single system, with the same accounts on all hosts. Similarly, you can use NIS to distribute the hostname information from /etc/hosts to all machines on the network.



NIS is based on RPC, and comprises a server, a client-side library, and several administrative tools. Originally, NIS was called Yellow Pages, or YP, which is still used to refer to it. Unfortunately, the name is a trademark of British Telecom, which required Sun to drop that name. As things go, some names stick with people, and so YP lives on as a prefix to the names of most NIS-related commands such as ypserv and ypbind.



Today, NIS is available for virtually all Unixes, and there are even free implementations.



51 Getting Acquainted with NIS



NIS keeps database information in files called maps, which contain key-value pairs. An example of a key-value pair is a user's login name and the encrypted form of their login password. Maps are stored on a central host running the NIS server, from which clients may retrieve the information through various RPC calls. Quite frequently, maps are stored in DBM files.



The maps themselves are usually generated from master text files such as /etc/hosts or /etc/passwd. For some files, several maps are created, one for each search key type. For instance, you may search the hosts file for a hostname as well as for an IP address. Accordingly, two NIS maps are derived from it, called hosts.byname and hosts.byaddr. Table 1 lists common maps and the files from which they are generated.



Table 1. Some Standard NIS Maps and Corresponding Files

Master File        Map(s)                                  Description
/etc/hosts         hosts.byname, hosts.byaddr              Maps IP addresses to host names
/etc/networks      networks.byname, networks.byaddr        Maps IP network addresses to network names
/etc/passwd        passwd.byname, passwd.byuid             Maps encrypted passwords to user login names
/etc/group         group.byname, group.bygid               Maps group IDs to group names
/etc/services      services.byname, services.bynumber      Maps service descriptions to service names
/etc/rpc           rpc.byname, rpc.bynumber                Maps Sun RPC service numbers to RPC service names
/etc/protocols     protocols.byname, protocols.bynumber    Maps protocol numbers to protocol names
/usr/lib/aliases   mail.aliases                            Maps mail aliases to mail alias names





For some maps, people commonly use nicknames, which are shorter and therefore easier to type. Note that these nicknames are understood only by ypcat and ypmatch, two tools for checking your NIS configuration. To obtain a full list of nicknames understood by these tools, run the following command:






$ ypcat -x


Use "passwd" for "passwd.byname"


Use "group" for "group.byname"


Use "networks" for "networks.byaddr"


Use "hosts" for "hosts.byaddr"


Use "protocols" for "protocols.bynumber"


Use "services" for "services.byname"


Use "aliases" for "mail.aliases"


Use "ethers" for "ethers.byname"




The NIS server program is traditionally called ypserv. For an average network, a single server usually suffices; large networks may choose to run several of these on different machines and different segments of the network to relieve the load on the server machines and routers. These servers are synchronized by making one of them the master server, and the others slave servers. Maps are created only on the master server's host. From there, they are distributed to all slaves.



We have been talking very vaguely about “networks.” There's a distinctive term in NIS that refers to a collection of all hosts that share part of their system configuration data through NIS: the NIS domain. Unfortunately, NIS domains have absolutely nothing in common with the domains we encountered in DNS. To avoid any ambiguity throughout this chapter, we will therefore always specify which type of domain we mean.



NIS domains have a purely administrative function. They are mostly invisible to users, except for the sharing of passwords between all machines in the domain. Therefore, the name given to an NIS domain is relevant only to the administrators. Usually, any name will do, as long as it is different from any other NIS domain name on your local network. For instance, the administrator at the Virtual Brewery may choose to create two NIS domains, one for the Brewery itself, and one for the Winery, which she names brewery and winery respectively. Another quite common scheme is to simply use the DNS domain name for NIS as well.



To set and display the NIS domain name of your host, you can use the domainname command. When invoked without any argument, it prints the current NIS domain name; to set the domain name, you must become the superuser:






# domainname brewery




NIS domains determine which NIS server an application will query. For instance, the login program on a host at the Winery should, of course, query only the Winery's NIS server (or one of them, if there are several) for a user's password information, while an application on a Brewery host should stick with the Brewery's server.



One mystery now remains to be solved: how does a client find out which server to connect to? The simplest approach would use a configuration file that names the host on which to find the server. However, this approach is rather inflexible because it doesn't allow clients to use different servers (from the same domain, of course) depending on their availability. Therefore, NIS implementations rely on a special daemon called ypbind to detect a suitable NIS server in their NIS domain. Before performing any NIS queries, an application first finds out from ypbind which server to use.



ypbind probes for servers by broadcasting to the local IP network; the first to respond is assumed to be the fastest one and is used in all subsequent NIS queries. After a certain interval has elapsed, or if the server becomes unavailable, ypbind probes for active servers again.



Dynamic binding is useful only when your network provides more than one NIS server. Dynamic binding also introduces a security problem. ypbind blindly believes whoever answers, whether it be a humble NIS server or a malicious intruder. Needless to say, this becomes especially troublesome if you manage your password databases over NIS. To guard against this, the Linux ypbind program provides you with the option of probing the local network to find the local NIS server, or configuring the NIS server hostname in a configuration file.



52 NIS Versus NIS+



NIS and NIS+ share little more than their name and a common goal. NIS+ is structured entirely differently from NIS. Instead of a flat namespace with disjoint NIS domains, NIS+ uses a hierarchical namespace similar to that of DNS. Instead of maps, so-called tables are used that are made up of rows and columns, in which each row represents an object in the NIS+ database and the columns cover properties of the objects that NIS+ knows and cares about. Each table for a given NIS+ domain comprises those of its parent domains. In addition, an entry in a table may contain a link to another table. These features make it possible to structure information in many ways.



NIS+ additionally supports secure and encrypted RPC, which helps greatly to solve the security problems of NIS. Traditional NIS has an RPC Version number of 2, while NIS+ is Version 3.



53 The Client Side of NIS



If you are familiar with writing or porting network applications, you may notice that most of the NIS maps listed previously correspond to library functions in the C library. For instance, to obtain passwd information, you generally use the getpwnam and getpwuid functions, which return the account information associated with the given username or numerical user ID, respectively. Under normal circumstances, these functions perform the requested lookup on the standard file, such as /etc/passwd.



An NIS-aware implementation of these functions, however, modifies this behavior and places an RPC call to the NIS server, which looks up the username or user ID. This happens transparently to the application. The function may treat the NIS data as though it has been appended to the original passwd file so both sets of information are available to the application and used, or as though it has completely replaced it so that the information in the local passwd is ignored and only the NIS data is used.



For traditional NIS implementations, there were certain conventions for which maps were replaced and which were appended to the original information. Some, like the passwd maps, required kludgy modifications of the passwd file which, when done incorrectly, would open up security holes. To avoid these pitfalls, NYS and the GNU libc use a general configuration scheme that determines whether a particular set of client functions uses the original files, NIS, or NIS+, and in which order. This scheme will be described later in this chapter.
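As a taste of that scheme (described in full later), a minimal sketch of the relevant /etc/nsswitch.conf entries might look like this; the exact services and their order are site-specific assumptions:

passwd: files nis
group: files nis
hosts: files nis dns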



54 Running an NIS Server



There are two possible NIS server configurations: master and slave. The slave configuration provides a live backup machine, should your master server fail. We will cover the configuration only for a master server here. The server documentation will explain the differences, should you wish to configure a slave server.



There are currently two NIS servers freely available for Linux: one contained in Tobias Reber's yps package, and the other in Peter Eriksson's ypserv package. It doesn't matter which one you run.



After installing the server program (ypserv) in /usr/sbin, you should create the directory that is going to hold the map files your server is to distribute. When setting up an NIS domain for the brewery domain, the maps would go to /var/yp/brewery. The server determines whether it is serving a particular NIS domain by checking if the map directory is present. If you are disabling service for some NIS domain, make sure to remove the directory as well.



Maps are usually stored in DBM files to speed up lookups. They are created from the master files using a program called makedbm (for Tobias's server) or dbmload (for Peter's server).



Transforming a master file into a form that dbmload can parse usually requires some awk or sed magic, which tends to be a little tedious to type and hard to remember. Therefore, Peter Eriksson's ypserv package contains a Makefile (called ypMakefile) that manages the conversion of the most common master files for you. You should install it as Makefile in your map directory and edit it to reflect the maps you want the NIS server to share. Towards the top of the file, you'll find the all target that lists the services ypserv offers. By default, the line looks something like this:






all: ethers hosts networks protocols rpc services passwd group netid




If you don't want to produce, for example, the ethers.byname and ethers.byaddr maps, simply remove the ethers prerequisite from this rule. To test your setup, you can start with just one or two maps, like the services.* maps.



After editing the Makefile, while in the map directory, type make. This will automatically generate and install the maps. You have to make sure to update the maps whenever you change the master files, otherwise the changes will remain invisible to the network.
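For example, after changing /etc/passwd on the master for the brewery domain used earlier, you might regenerate the maps like this:

# cd /var/yp/brewery
# make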



The section “Setting Up an NIS Client with GNU libc” will explain how to configure the NIS client code. If your setup doesn't work, you should try to find out whether requests are arriving at your server. If you specify the --debug command-line flag to ypserv, it prints debugging messages to the console about all incoming NIS queries and the results returned. These should give you a hint as to where the problem lies. Tobias's server doesn't have this option.



55 NIS Server Security



NIS used to have a major security flaw: it left your password file readable by virtually anyone in the entire Internet, which made for quite a number of possible intruders. As long as an intruder knew your NIS domain name and the address of your server, he could simply send it a request for the passwd.byname map and instantly receive all your system's encrypted passwords. With a fast password-cracking program like crack and a good dictionary, guessing at least a few of your users' passwords is rarely a problem.



This is what the securenets option is all about. It simply restricts access to your NIS server to certain hosts, based on their IP addresses or network numbers. The latest version of ypserv implements this feature in two ways. The first relies on a special configuration file called /etc/ypserv.securenets and the second conveniently uses the /etc/hosts.allow and /etc/hosts.deny files. Thus, to restrict access to hosts from within the Brewery, their network manager would add the following line to hosts.allow:






ypserv: 172.16.2.




This would let all hosts from IP network 172.16.2.0 access the NIS server. To shut out all other hosts, a corresponding entry in hosts.deny would have to read:






ypserv: ALL




IP numbers are not the only way you can specify hosts or networks in hosts.allow and hosts.deny. Please refer to the hosts_access(5) manual page on your system for details. However, be warned that you cannot use host or domain names for the ypserv entry. If you specify a hostname, the server tries to resolve this hostname—but the resolver in turn calls ypserv, and you fall into an endless loop.



To configure securenets security using the /etc/ypserv.securenets method, you need to create its configuration file, /etc/ypserv.securenets. This configuration file is simple in structure. Each line describes a host or network of hosts that will be allowed access to the server. Any address not described by an entry in this file will be refused access. A line beginning with a # will be treated as a comment. Example 1 shows what a simple /etc/ypserv.securenets would look like:



Example 1. Sample ypserv.securenets File






# allow connections from local host - necessary


host 127.0.0.1


# same as 255.255.255.255 127.0.0.1


#


# allow connections from any host on the Virtual Brewery network


255.255.255.0   172.16.1.0


#




The first entry on each line is the netmask to use for the entry, with host being treated as a special keyword meaning “netmask 255.255.255.255.” The second entry on each line is the IP address to which to apply the netmask.



A third option is to use the secure portmapper instead of the securenets option in ypserv. The secure portmapper (portmap-5.0) uses the hosts.allow scheme as well, but offers this for all RPC servers, not just ypserv. However, you should not use both the securenets option and the secure portmapper at the same time, because of the overhead this authorization incurs.
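If you do use the secure portmapper, the hosts.allow entry names the portmapper rather than ypserv; for the Brewery network above, a sketch of such an entry would be:

portmap: 172.16.2.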



56 Setting Up an NIS Client with GNU libc



We will now describe and discuss the configuration of an NIS client using the GNU libc library support. Your first step should be to tell the GNU libc NIS client which server to use for NIS service. We mentioned earlier that the Linux ypbind allows you to configure the NIS server to use. The default behavior is to query the server on the local network. If the host you are configuring is likely to move from one domain to another, such as a laptop, you would leave the /etc/yp.conf file empty and it would query on the local network for the local NIS server wherever it happens to be.



A more secure configuration for most hosts is to set the server name in the /etc/yp.conf configuration file. A very simple file for a host on the Winery's network may look like this:






# yp.conf - YP configuration for GNU libc library.


#


ypserver vbardolino




The ypserver statement tells your host to use the host supplied as the NIS server for the local domain. In this example we've specified the NIS server as vbardolino. Of course, the IP address corresponding to vbardolino must be set in the hosts file; alternatively, you may use the IP address itself with the server argument.



In the form shown in the example, the ypserver command tells ypbind to use the named server regardless of what the current NIS domain may be. If, however, you are moving your machine between different NIS domains frequently, you may want to keep information for several domains in the yp.conf file. You can have information on the servers for various NIS domains in yp.conf by specifying the information using the domain statement. For instance, you might change the previous sample file to look like this for a laptop:






# yp.conf - YP configuration for GNU libc library.


# 


domain winery server vbardolino 


domain brewery server vstout 




This lets you bring up the laptop in either of the two domains by simply setting the desired NIS domain at boot time using the domainname command. The NIS client then uses whichever server is relevant for the current domain.



There is a third option you may want to use. It covers the case when you don't know the name or IP address of the server to use in a particular domain, but still want the ability to use a fixed server on certain domains. Imagine we want to insist on using a specified server while operating within the Winery domain, but want to probe for the server to use while in the Brewery domain. We would modify our yp.conf file again to look like this instead:






# yp.conf - YP configuration for GNU libc library.


# 


domain winery server vbardolino 


domain brewery broadcast




The broadcast keyword tells ypbind to use whichever NIS server it finds for the domain. After creating this basic configuration file and making sure it is world-readable, you should run your first test to connect to your server. Make sure to choose a map your server distributes, like hosts.byname, and try to retrieve it by using the ypcat utility:






# ypcat hosts.byname


172.16.2.2      vbeaujolais.vbrew.com    vbeaujolais


172.16.2.3      vbardolino.vbrew.com     vbardolino


172.16.1.1      vlager.vbrew.com         vlager


172.16.2.1      vlager.vbrew.com         vlager


172.16.1.2      vstout.vbrew.com         vstout


172.16.1.3      vale.vbrew.com           vale


172.16.2.4      vchianti.vbrew.com       vchianti




The output you get should resemble that just shown. If you get an error message instead that says: Can't bind to server which serves domain, then either the NIS domain name you've set doesn't have a matching server defined in yp.conf, or the server is unreachable for some reason. In the latter case, make sure that a ping to the host yields a positive result, and that it is indeed running an NIS server. You can verify the latter by using rpcinfo, which should produce the following output:






# rpcinfo -u serverhost ypserv


program 100004 version 1 ready and waiting


program 100004 version 2 ready and waiting









1 Network File System (NFS)



Network File System (NFS) is a way to share files between machines on a network as if the files were located on the client's local hard drive. NFS is probably the most prominent network service using RPC. A mixture of kernel support and user-space daemons on the client side, along with an NFS server on the server side, makes this possible. A Linux machine can be both an NFS server and an NFS client, which means that it can export file systems to other systems and mount file systems exported from other machines.



NFS offers a number of useful features:



o Data accessed by all users can be kept on a central host, with clients mounting this directory at boot time. For example, you can keep all user accounts on one host and have all hosts on your network mount /home from that host. If NFS is installed alongside NIS, users can log into any system and still work on one set of files.



o Data consuming large amounts of disk space can be kept on a single host.



o Administrative data can be kept on a single host.



NFS is useful for sharing directories of files between multiple users on the same network. For example, a group of users working on the same project can have access to the files for that project using a shared directory of the NFS file system (commonly known as an NFS share) mounted in the directory /myproject. To access the shared files, the user goes into the /myproject directory on his machine. There are no passwords to enter or special commands to remember. Users work as if the directory is on their local machines.



57 Preparing NFS



Before you can use NFS, be it as server or client, you must make sure your kernel has NFS support compiled in. Newer kernels have a simple interface on the proc filesystem for this, the /proc/filesystems file, which you can display using cat:






$ cat /proc/filesystems



minix



ext2



msdos



nodev proc



nodev nfs





If nfs is missing from this list, you have to compile your own kernel with NFS enabled, or perhaps you will need to load the kernel module if your NFS support was compiled as a module.
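If your NFS support was built as a module, loading it is usually a single command (module name as used by mainline kernels):

# modprobe nfs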



58 Mounting an NFS Volume



The mounting of NFS volumes closely resembles that of regular file systems. Invoke mount using the following syntax:






# mount -t nfs nfs_volume local_dir options







nfs_volume is given as remote_host:remote_dir. Since this notation is unique to NFS filesystems, you can leave out the -t nfs option.



There are a number of additional options that you can specify to mount upon mounting an NFS volume. These may be given either following the -o switch on the command line or in the options field of the /etc/fstab entry for the volume. In both cases, multiple options are separated by commas and must not contain any whitespace characters. Options specified on the command line always override those given in the fstab file.



Here is a sample entry from /etc/fstab:






# volume mount point type options



news:/var/spool/news /var/spool/news nfs timeo=14,intr





This volume can then be mounted using this command:






# mount news:/var/spool/news





In the absence of an fstab entry, NFS mount invocations look a lot uglier. For instance, suppose you mount your users' home directories from a machine named moonshot, which uses a default block size of 4 K for read/write operations. You might increase the block size to 8 K to obtain better performance by issuing the command:






# mount moonshot:/home /home -o rsize=8192,wsize=8192





The list of all valid options is described in its entirety in the nfs manual page. The following is a partial list of options you would probably want to use:



rsize=n and wsize=n



These specify the datagram size used by the NFS clients on read and write requests, respectively. The default depends on the version of kernel, but is normally 1,024 bytes.





timeo=n



This sets the time (in tenths of a second) the NFS client will wait for a request to complete. The default value is 7 (0.7 seconds). What happens after a timeout depends on whether you use the hard or soft option.





hard



Explicitly mark this volume as hard-mounted. This is on by default. With this option, a message is reported to the console when a major timeout occurs, and the client continues retrying indefinitely.





soft



Soft-mount (as opposed to hard-mount) the volume. This option causes an I/O error to be reported to the process attempting a file operation when a major timeout occurs.





intr



Allow signals to interrupt an NFS call. Useful for aborting when the server doesn't respond.



Except for rsize and wsize, all of these options apply to the client's behavior if the server should become temporarily inaccessible. They work together in the following way: Whenever the client sends a request to the NFS server, it expects the operation to have finished after a given interval (specified by the timeo option). If no confirmation is received within this time, a so-called minor timeout occurs, and the operation is retried with the timeout interval doubled. After reaching a maximum timeout of 60 seconds, a major timeout occurs.



By default, a major timeout causes the client to print a message to the console and start all over again, this time with an initial timeout interval twice that of the previous cascade. Potentially, this may go on forever. Volumes that stubbornly retry an operation until the server becomes available again are called hard-mounted. The opposite variety, called soft-mounted, generate an I/O error for the calling process whenever a major timeout occurs. Because of the write-behind introduced by the buffer cache, this error condition is not propagated to the process itself before it calls the write function the next time, so a program can never be sure that a write operation to a soft-mounted volume has succeeded at all.



Whether you hard- or soft-mount a volume depends partly on taste but also on the type of information you want to access from a volume. For example, if you mount your X programs by NFS, you certainly would not want your X session to go berserk just because someone brought the network to a grinding halt by firing up seven copies of Doom at the same time or by pulling the Ethernet plug for a moment. By hard-mounting the directory containing these programs, you make sure that your computer waits until it is able to re-establish contact with your NFS server. On the other hand, non-critical data such as NFS-mounted news partitions or FTP archives may also be soft-mounted, so if the remote machine is temporarily unreachable or down, it doesn't hang your session. If your network connection to the server is flaky or goes through a loaded router, you may either increase the initial timeout using the timeo option or hard-mount the volumes. NFS volumes are hard-mounted by default.



Hard mounts present a problem because, by default, the file operations are not interruptible. Thus, if a process attempts, for example, a write to a remote server and that server is unreachable, the user's application hangs and the user can't do anything to abort the operation. If you use the intr option in conjunction with a hard mount, any signals received by the process interrupt the NFS call so that users can still abort hanging file accesses and resume work (although without saving the file).
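For example (reusing the hypothetical moonshot server from earlier), a hard mount that can still be interrupted would be requested like this:

# mount moonshot:/home /home -o hard,intr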



Usually, the rpc.mountd daemon in some way or other keeps track of which directories have been mounted by what hosts. This information can be displayed using the showmount program, which is also included in the NFS server package:






# showmount -e moonshot



Export list for localhost:



/home <anon clnt>



# showmount -d moonshot



Directories on localhost:



/home



# showmount -a moonshot



All mount points on localhost:



localhost:/home





59 Mounting NFS File Systems using /etc/fstab



An alternate way to mount an NFS share from another machine is to add a line to the /etc/fstab file. The line must state the hostname of the NFS server, the directory on the server being exported, and the directory on the local machine where the NFS share is to be mounted. You must be root to modify the /etc/fstab file.



The general syntax for the line in /etc/fstab is as follows:






server:/usr/local/pub /pub nfs rsize=8192,wsize=8192,timeo=14,intr





The mount point /pub must exist on the client machine. After adding this line to /etc/fstab on the client system, type the command mount /pub at a shell prompt, and the mount point /pub will be mounted from the server.



60 Mounting NFS File Systems using autofs



A third option for mounting an NFS share is the use of autofs. Autofs uses the automount daemon to manage your mount points by only mounting them dynamically when they are accessed.



Autofs consults the master map configuration file /etc/auto.master to determine which mount points are defined. It then starts an automount process with the appropriate parameters for each mount point. Each line in the master map defines a mount point and a separate map file that defines the file systems to be mounted under this mount point. For example, the /etc/auto.misc file might define mount points in the /misc directory; this relationship would be defined in the /etc/auto.master file.



Each entry in auto.master has three fields. The first field is the mount point. The second field is the location of the map file, and the third field is optional. The third field can contain information such as a timeout value.



For example, to mount the directory /proj52 on the remote machine penguin.example.net at the mount point /misc/myproject on your machine, add the following line to auto.master:






/misc /etc/auto.misc --timeout 60





Add the following line to /etc/auto.misc:






myproject -rw,soft,intr,rsize=8192,wsize=8192 penguin.example.net:/proj52





The first field in /etc/auto.misc is the name of the /misc subdirectory. This directory is created dynamically by automount. It should not actually exist on the client machine. The second field contains mount options such as rw for read and write access. The third field is the location of the NFS export including the hostname and directory.
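Once autofs is running, merely accessing the path triggers the mount; for example, listing the directory is enough:

$ ls /misc/myproject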



Autofs is a service. To start the service, type the following command at a shell prompt:






/sbin/service autofs restart





To view the active mount points, type the following command at a shell prompt:






/sbin/service autofs status





If you modify the /etc/auto.master configuration file while autofs is running, you must tell the automount daemon(s) to reload by typing the following command at a shell prompt:






/sbin/service autofs reload





61 How NFS Works



Let's have a look at how NFS works. First, a client tries to mount a directory from a remote host on a local directory just the same way it does a physical device. However, the syntax used to specify the remote directory is different. For example, to mount /home from host vlager to /users on vale, the administrator issues the following command on vale:






# mount -t nfs vlager:/home /users







mount will try to connect to the rpc.mountd mount daemon on vlager via RPC. The server will check if vale is permitted to mount the directory in question, and if so, return it a file handle. This file handle will be used in all subsequent requests to files below /users.



When someone accesses a file over NFS, the kernel places an RPC call to rpc.nfsd (the NFS daemon) on the server machine. This call takes the file handle, the name of the file to be accessed, and the user and group IDs of the user as parameters. These are used in determining access rights to the specified file. In order to prevent unauthorized users from reading or modifying files, user and group IDs must be the same on both hosts.



On most Unix implementations, the NFS functionality of both client and server is implemented as kernel-level daemons that are started from user space at system boot. These are the NFS Daemon (rpc.nfsd) on the server host, and the Block I/O Daemon (biod) on the client host. To improve throughput, biod performs asynchronous I/O using read-ahead and write-behind; also, several rpc.nfsd daemons are usually run concurrently.



The current NFS implementation of Linux is a little different from the classic NFS in that the server code runs entirely in user space, so running multiple copies simultaneously is more complicated. The current rpc.nfsd implementation offers an experimental feature that allows limited support for multiple servers. Olaf Kirch developed kernel-based NFS server support featured in 2.2 Version Linux kernels. Its performance is significantly better than the existing userspace implementation.



62 The NFS Daemons



If you want to provide NFS service to other hosts, you have to run the rpc.nfsd and rpc.mountd daemons on your machine. As RPC-based programs, they are not managed by inetd, but are started up at boot time and register themselves with the portmapper; therefore, you have to make sure to start them only after rpc.portmap is running. Usually, you'd use something like the following example in one of your network boot scripts:






if [ -x /usr/sbin/rpc.mountd ]; then



/usr/sbin/rpc.mountd; echo -n " mountd"



fi



if [ -x /usr/sbin/rpc.nfsd ]; then



/usr/sbin/rpc.nfsd; echo -n " nfsd"



fi





The ownership information of the files an NFS daemon provides to its clients usually contains only numerical user and group IDs. If both client and server associate the same user and group names with these numerical IDs, they are said to share their uid/gid space. For example, this is the case when you use NIS to distribute the passwd information to all hosts on your LAN.



On some occasions, however, the IDs on different hosts do not match. Rather than updating the uids and gids of the client to match those of the server, you can use the rpc.ugidd mapping daemon to work around the disparity. Using the map_daemon option explained later, you can tell rpc.nfsd to map the server's uid/gid space to the client's uid/gid space with the aid of the rpc.ugidd on the client. Unfortunately, the rpc.ugidd daemon isn't supplied on all modern Linux distributions, so if you need it and yours doesn't have it, you will need to compile it from source.



rpc.ugidd is an RPC-based server that is started from your network boot scripts, just like rpc.nfsd and rpc.mountd:






if [ -x /usr/sbin/rpc.ugidd ]; then



/usr/sbin/rpc.ugidd; echo -n " ugidd"



fi





63 Exporting NFS File Systems



Now we'll look at how we configure the NFS server. Specifically, we'll look at how we tell the NFS server what filesystems it should make available for mounting, and the various parameters that control the access clients will have to the filesystem. The server determines the type of access that is allowed to the server's files. The /etc/exports file lists the filesystems that the server will make available for clients to mount and use.



By default, rpc.mountd disallows all directory mounts, which is a rather sensible attitude. If you wish to permit one or more hosts to NFS-mount a directory, you must export it, that is, specify it in the exports file. A sample file may look like this:






# exports file for vlager



/home vale(rw) vstout(rw) vlight(rw)



/usr/X11R6 vale(ro) vstout(ro) vlight(ro)



/usr/TeX vale(ro) vstout(ro) vlight(ro)



/ vale(rw,no_root_squash)



/home/ftp (ro)





Each line defines a directory and the hosts that are allowed to mount it. A hostname is usually a fully qualified domain name but may additionally contain the * and ? wildcards, which act the way they do with the Bourne shell. For instance, lab*.foo.com matches lab01.foo.com as well as laboratory.foo.com. The host may also be specified using an IP address range in the form address/netmask. If no hostname is given, as with the /home/ftp directory in the previous example, any host matches and is allowed to mount the directory.



When checking a client host against the exports file, rpc.mountd looks up the client's hostname using the gethostbyaddr call. With DNS, this call returns the client's canonical hostname, so you must make sure not to use aliases in exports. In an NIS environment the returned name is the first match from the hosts database, and with neither DNS nor NIS, the returned name is the first hostname found in the hosts file that matches the client's address.



64 Hostname Formats



The host(s) can be in the following forms:



Single machine — A fully qualified domain name (that can be resolved by the server), hostname (that can be resolved by the server), or an IP address



Series of machines specified with wildcards — Use the * or ? character to specify a string match. Wildcards are not to be used with IP addresses; however, they may accidentally work if reverse DNS lookups fail. When specifying wildcards in fully qualified domain names, dots (.) are not included in the wildcard. For example, *.example.com includes one.example.com but does not include one.two.example.com.



IP networks — Use a.b.c.d/z, where a.b.c.d is the network and z is the number of bits in the netmask (for example 192.168.0.0/24). Another acceptable format is a.b.c.d/netmask, where a.b.c.d is the network and netmask is the netmask (for example, 192.168.100.8/255.255.255.0).



Netgroups — In the format @group-name, where group-name is the NIS netgroup name.
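Combining these forms, a hedged example exports line (all names here are illustrative) might read:

/projects @staff(rw) 192.168.0.0/24(ro) *.example.com(ro)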



The hostname is followed by an optional comma-separated list of flags, enclosed in parentheses. Some of the values these flags may take are:



secure



This flag insists that requests be made from a reserved source port, i.e., one that is less than 1,024. This flag is set by default.





insecure



This flag reverses the effect of the secure flag.





ro



This flag causes the NFS mount to be read-only. This flag is enabled by default.





rw



This option mounts the file hierarchy read-write.





root_squash



This security feature denies the superusers on the specified hosts any special access rights by mapping requests from uid 0 on the client to the uid 65534 (that is, -2) on the server. This uid should be associated with the user nobody.





no_root_squash



Don't map requests from uid 0. This option is on by default, so superusers have superuser access to your system's exported directories.





link_relative



This option converts absolute symbolic links (where the link contents start with a slash) into relative links. This option makes sense only when a host's entire filesystem is mounted; otherwise, some of the links might point to nowhere, or even worse, to files they were never meant to point to. This option is on by default.





link_absolute



This option leaves all symbolic links as they are (the normal behavior for Sun-supplied NFS servers).





map_identity



This option tells the server to assume that the client uses the same uids and gids as the server. This option is on by default.





map_daemon



This option tells the NFS server to assume that client and server do not share the same uid/gid space. rpc.nfsd then builds a list that maps IDs between client and server by querying the client's rpc.ugidd daemon.





map_static



This option allows you to specify the name of a file that contains a static map of uids and gids. For example, map_static=/etc/nfs/vlight.map would specify the /etc/nfs/vlight.map file as a uid/gid map. The syntax of the map file is described in the exports(5) manual page.





map_nis



This option causes the NIS server to do the uid and gid mapping.





anonuid and anongid



These options allow you to specify the uid and gid of the anonymous account. This is useful if you have a volume exported for public mounts.





sync or async



If sync is specified, the server does not reply to requests before the changes made by the request are written to the disk.



Any error in parsing the exports file is reported to syslogd's daemon facility at level notice whenever rpc.nfsd or rpc.mountd is started up.



Note that hostnames are obtained from the client's IP address by reverse mapping, so the resolver must be configured properly. If you use BIND and are very security conscious, you should enable spoof checking in your host.conf file.
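A minimal sketch of the relevant /etc/host.conf lines (directive names as supported by the traditional Linux resolver):

order hosts,bind
nospoof on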



The NFS Server Configuration Tool can also be used to configure a system as an NFS server.



To use the NFS Server Configuration Tool, you must be running the X Window System, have root privileges, and have the redhat-config-nfs RPM package installed. To start the application, select Main Menu Button (on the Panel) => System Settings => Server Settings => NFS Server, or type the command redhat-config-nfs.








Figure 1. NFS Server Configuration Tool



To add an NFS share, click the Add button. The dialog box shown in Figure 2 will appear.



The Basic tab requires the following information:



Directory — Specify the directory to share, such as /tmp.



Host(s) — Specify the host(s) to which to share the directory.



Basic permissions — Specify whether the directory should have read-only or read/write permissions.








Figure 2. Add Share



The General Options tab allows the following options to be configured:



o Allow connections from port 1024 and higher — Services started on port numbers less than 1024 must be started as root. Select this option to allow the NFS service to be started by a user other than root. This option corresponds to insecure.



o Allow insecure file locking — Do not require a lock request. This option corresponds to insecure_locks.



o Disable subtree checking — If a subdirectory of a file system is exported, but the entire file system is not exported, the server checks to see if the requested file is in the subdirectory exported. This check is called subtree checking. Select this option to disable subtree checking. If the entire file system is exported, selecting to disable subtree checking can increase the transfer rate. This option corresponds to no_subtree_check.



o Sync write operations on request — Enabled by default, this option does not allow the server to reply to requests before the changes made by the request are written to the disk. This option corresponds to sync. If this is not selected, the async option is used.



o Force sync of write operations immediately — Do not delay writing to disk. This option corresponds to no_wdelay.



The User Access tab allows the following options to be configured:



o Treat remote root user as local root — By default, the user and group IDs of the root user are both 0. Root squashing maps the user ID 0 and the group ID 0 to the user and group IDs of anonymous so that root on the client does not have root privileges on the NFS server. If this option is selected, root is not mapped to anonymous, and root on a client has root privileges to exported directories. Selecting this option can greatly decrease the security of the system. Do not select it unless it is absolutely necessary. This option corresponds to no_root_squash.



o Treat all client users as anonymous users — If this option is selected, all user and group IDs are mapped to the anonymous user. This option corresponds to all_squash.



o Specify local user ID for anonymous users — If Treat all client users as anonymous users is selected, this option lets you specify a user ID for the anonymous user. This option corresponds to anonuid.



o Specify local group ID for anonymous users — If Treat all client users as anonymous users is selected, this option lets you specify a group ID for the anonymous user. This option corresponds to anongid.



To edit an existing NFS share, select the share from the list, and click the Properties button. To delete an existing NFS share, select the share from the list, and click the Delete button.



After clicking OK to add, edit, or delete an NFS share from the list, the changes take place immediately — the server daemon is restarted, and the old configuration file is saved as /etc/exports.bak. The new configuration is written to /etc/exports.



The NFS Server Configuration Tool reads and writes directly to the /etc/exports configuration file. Thus, the file can be modified manually after using the tool, and the tool can be used after modifying the file manually (provided the file was modified with correct syntax).



65 Starting and Stopping the Server



On the server that is exporting NFS file systems, the nfs service must be running. View the status of the NFS daemon with the following command:






/sbin/service nfs status





Start the NFS daemon with the following command:






/sbin/service nfs start





Stop the NFS daemon with the following command:






/sbin/service nfs stop





To start the nfs service at boot time, use the command:






/sbin/chkconfig --level 345 nfs on





You can also use chkconfig, ntsysv or the Services Configuration Tool to configure which services start at boot time.







2 Backups





Your data is valuable. It will cost you time and effort to re-create it, and that costs money or at least personal grief and tears; sometimes it can't even be re-created, e.g., if it is the results of some experiments. Since it is an investment, you should protect it and take steps to avoid losing it.



There are basically four reasons why you might lose data: hardware failures, software bugs, human action, or natural disasters. Although modern hardware tends to be quite reliable, it can still break seemingly spontaneously. The most critical piece of hardware for storing data is the hard disk, which relies on tiny magnetic fields remaining intact in a world filled with electromagnetic noise. Modern software doesn't even tend to be reliable; a rock-solid program is an exception, not a rule. Humans are quite unreliable: they will either make a mistake, or they will be malicious and destroy data on purpose. Nature might not be evil, but it can wreak havoc even when being good. All in all, it is a small miracle that anything works at all.



Backups are a way to protect the investment in data. By having several copies of the data, it does not matter as much if one is destroyed (the cost is only that of the restoration of the lost data from the backup).



It is important to do backups properly. Like everything else that is related to the physical world, backups will fail sooner or later. Part of doing backups well is to make sure they work; you don't want to notice that your backups didn't work. Adding insult to injury, you might have a bad crash just as you're making the backup; if you have only one backup medium, it might be destroyed as well, leaving you with the smoking ashes of hard work. Or you might notice, when trying to restore, that you forgot to back up something important, like the user database on a 15,000-user site. Best of all, all your backups might be working perfectly, but the last known tape drive reading the kind of tapes you used was the one that now has a bucketful of water in it.



66 Selecting the Backup Medium



The most important decision regarding backups is the choice of backup medium. You need to consider cost, reliability, speed, availability, and usability.



Cost is important, since you should preferably have several times more backup storage than what you need for the data. A cheap medium is usually a must.



Reliability is extremely important, since a broken backup can make a grown man cry. A backup medium must be able to hold data without corruption for years. The way you use the medium affects its reliability as a backup medium. A hard disk is typically very reliable, but as a backup medium it is not very reliable if it is in the same computer as the disk you are backing up.



Speed is usually not very important, if backups can be done without interaction. It doesn't matter if a backup takes two hours, as long as it needs no supervision. On the other hand, if the backup can't be done when the computer would otherwise be idle, then speed is an issue.



Availability is obviously necessary, since you can't use a backup medium if it doesn't exist. Less obvious is the need for the medium to be available even in the future, and on computers other than your own. Otherwise you may not be able to restore your backups after a disaster.



Usability is a large factor in how often backups are made. The easier it is to make backups, the better. A backup medium mustn't be hard or boring to use.



The typical alternatives are floppies and tapes. Floppies are very cheap, fairly reliable, not very fast, very available, but not very usable for large amounts of data. Tapes are cheap to somewhat expensive, fairly reliable, fairly fast, quite available, and, depending on the size of the tape, quite comfortable.



There are other alternatives. They are usually not very good on availability, but if that is not a problem, they can be better in other ways. For example, magneto-optical disks can combine the good sides of both floppies (they're random access, making restoration of a single file quick) and tapes (they hold a lot of data).



67 Selecting the Backup Tool



There are many tools that can be used to make backups. The traditional UNIX tools used for backups are tar, cpio, and dump. In addition, there is a large number of third-party packages (both freeware and commercial) that can be used. The choice of backup medium can affect the choice of tool.





tar and cpio are similar, and mostly equivalent from a backup point of view. Both are capable of storing files on tapes, and retrieving files from them. Both are capable of using almost any media, since the kernel device drivers take care of the low level device handling and the devices all tend to look alike to user level programs. Some UNIX versions of tar and cpio may have problems with unusual files (symbolic links, device files, files with very long pathnames, and so on), but the Linux versions should handle all files correctly.





dump is different in that it reads the filesystem directly from the disk device, not through the files in it. It is also written specifically for backups; tar and cpio are really for archiving files, although they work for backups as well.



Reading the filesystem directly has some advantages. It makes it possible to back files up without affecting their time stamps; with tar and cpio, you would have to mount the filesystem read-only first. Directly reading the filesystem is also more efficient if everything needs to be backed up, since it can be done with much less disk head movement. The major disadvantage is that it makes the backup program specific to one filesystem type; the Linux dump program understands the ext2 filesystem only.
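For example, if the tree being backed up is a separate filesystem mounted on /usr (an assumption; adjust for your own layout), it could be remounted read-only for the duration of a tar or cpio backup and then switched back:

# mount -o remount,ro /usr
(make the backup)
# mount -o remount,rw /usr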





dump also directly supports backup levels (which we'll be discussing below); with tar and cpio this has to be implemented with other tools.



A comparison of the third party backup tools is beyond the scope of this book. The Linux Software Map lists many of the freeware ones.



68 Simple Backups



A simple backup scheme is to back up everything once, then back up everything that has been modified since the previous backup. The first backup is called a full backup; the subsequent ones are incremental backups. A full backup is often more laborious than incremental ones, since there is more data to write to the tape, and a full backup might not fit onto one tape (or floppy). Restoring from incremental backups can be many times more work than restoring from a full one. Restoration can be optimised so that you always back up everything modified since the previous full backup; this way, backups are a bit more work, but there should never be a need to restore more than a full backup and one incremental backup.



If you want to make backups every day and have six tapes, you could use tape 1 for the first full backup (say, on a Friday), and tapes 2 to 5 for the incremental backups (Monday through Thursday). Then you make a new full backup on tape 6 (second Friday), and start doing incremental ones with tapes 2 to 5 again. You don't want to overwrite tape 1 until you've got a new full backup, lest something happens while you're making the full backup. After you've made a full backup to tape 6, you want to keep tape 1 somewhere else, so that when your other backup tapes are destroyed in the fire, you still have at least something left. When you need to make the next full backup, you fetch tape 1 and leave tape 6 in its place.



If you have more than six tapes, you can use the extra ones for full backups. Each time you make a full backup, you use the oldest tape. This way you can have full backups from several previous weeks, which is good if you want to find an old, now deleted file, or an old version of a file.
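As a sketch, such a schedule could be driven by cron, with the operator changing tapes by hand; the two script paths below are illustrative placeholders, not standard tools:

# root's crontab: full backup every Friday at 03:00,
# incremental backups Monday through Thursday at 03:00
0 3 * * 5 /root/bin/full-backup
0 3 * * 1-4 /root/bin/incremental-backup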



69 Making Backups with tar



A full backup can easily be made with tar:






# tar --create --file /dev/ftape /usr/src


tar: Removing leading / from absolute path names in the archive


#




The example above uses the GNU version of tar and its long option names. The traditional version of tar only understands single character options. The GNU version can also handle backups that don't fit on one tape or floppy, and also very long paths; not all traditional versions can do these things. (Linux only uses GNU tar.) If your backup doesn't fit on one tape, you need to use the --multi-volume (-M) option:






# tar -cMf /dev/fd0H1440 /usr/src


tar: Removing leading / from absolute path names in the archive


Prepare volume #2 for /dev/fd0H1440 and hit return:


#




Note that you should format the floppies before you begin the backup, or else use another window or virtual terminal and do it when tar asks for a new floppy.
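On Linux a floppy can be formatted with fdformat, for example (using the same device as above):

# fdformat /dev/fd0H1440

After you've made a backup, you should check that it is OK, using the --compare (-d) option: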






# tar --compare --verbose -f /dev/ftape


usr/src/


usr/src/linux


usr/src/linux-1.2.10-includes/


....


#




Failing to check a backup means that you will not notice that your backups aren't working until after you've lost the original data. An incremental backup can be done with tar using the --newer (-N) option:






# tar --create --newer '8 Sep 1995' --file /dev/ftape /usr/src --verbose


tar: Removing leading / from absolute path names in the archive


usr/src/


usr/src/linux-1.2.10-includes/


usr/src/linux-1.2.10-includes/include/


usr/src/linux-1.2.10-includes/include/linux/


usr/src/linux-1.2.10-includes/include/linux/modules/


usr/src/linux-1.2.10-includes/include/asm-generic/


usr/src/linux-1.2.10-includes/include/asm-i386/


usr/src/linux-1.2.10-includes/include/asm-mips/


usr/src/linux-1.2.10-includes/include/asm-alpha/


usr/src/linux-1.2.10-includes/include/asm-m68k/


usr/src/linux-1.2.10-includes/include/asm-sparc/


usr/src/patch-1.2.11.gz


#




Unfortunately, tar can't notice when a file's inode information has changed; for example, when its permission bits have been changed, or when its name has been changed. This can be worked around by using find to compare the current filesystem state with lists of files that have been previously backed up. Scripts and programs for doing this can be found on Linux ftp sites.
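A minimal sketch of a related time-stamp approach, assuming a stamp file /var/log/backup.stamp (an illustrative path) that is touched after every successful backup; find's -cnewer test compares the inode change time, so permission changes are picked up too, and directories are excluded so that tar does not recurse into them and archive unchanged files:

# find /usr/src -cnewer /var/log/backup.stamp ! -type d -print | tar --create --verbose --files-from - --file /dev/ftape
# touch /var/log/backup.stamp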



Restoring files with tar



The --extract (-x) option for tar extracts files:






# tar --extract --same-permissions --verbose --file /dev/fd0H1440


usr/src/


usr/src/linux


usr/src/linux-1.2.10-includes/


usr/src/linux-1.2.10-includes/include/


usr/src/linux-1.2.10-includes/include/linux/


usr/src/linux-1.2.10-includes/include/linux/hdreg.h


usr/src/linux-1.2.10-includes/include/linux/kernel.h


...


#




You can also extract only specific files or directories (which includes all their files and subdirectories) by naming them on the command line:






# tar xpvf /dev/fd0H1440 usr/src/linux-1.2.10-includes/include/linux/hdreg.h


usr/src/linux-1.2.10-includes/include/linux/hdreg.h


#




Use the --list (-t) option, if you just want to see what files are on a backup volume:






# tar --list --file /dev/fd0H1440


usr/src/


usr/src/linux


usr/src/linux-1.2.10-includes/


usr/src/linux-1.2.10-includes/include/


usr/src/linux-1.2.10-includes/include/linux/


usr/src/linux-1.2.10-includes/include/linux/hdreg.h


usr/src/linux-1.2.10-includes/include/linux/kernel.h


...


#




Note that tar always reads the backup volume sequentially, so for large volumes it is rather slow. It is not possible, however, to use random access techniques with a tape drive or some other sequential medium anyway.



tar doesn't handle deleted files properly. If you need to restore a filesystem from a full and an incremental backup, and you have deleted a file between the two backups, it will exist again after you have done the restore. This can be a big problem, if the file has sensitive data that should no longer be available.



70 Multilevel Backups



The simple backup method outlined in the previous section is often quite adequate for personal use or small sites. For more heavy duty use, multilevel backups are more appropriate.



The simple method has two backup levels: full and incremental backups. This can be generalised to any number of levels. A full backup would be level 0, and the different levels of incremental backups levels 1, 2, 3, etc. At each incremental backup level you back up everything that has changed since the previous backup at the same or a previous level.



The purpose of doing this is that it allows a longer backup history cheaply. In the example in the previous section, the backup history went back to the previous full backup. This could be extended by having more tapes, but each new tape would extend the history by only a week, which might be too expensive. A longer backup history is useful, since deleted or corrupted files are often not noticed for a long time. Even a version of a file that is not very up to date is better than no file at all.



With multiple levels the backup history can be extended more cheaply. For example, if we buy ten tapes, we could use tapes 1 and 2 for monthly backups (first Friday each month), tapes 3 to 6 for weekly backups (other Fridays; note that there can be five Fridays in one month, so we need four more tapes), and tapes 7 to 10 for daily backups (Monday to Thursday). With only four more tapes, we've been able to extend the backup history from two weeks (after all daily tapes have been used) to two months. It is true that we can't restore every version of each file during those two months, but what we can restore is often good enough.



Figure 1 shows which backup level is used each day, and which backups can be restored from at the end of the month.





Figure 1. A sample multilevel backup schedule.






Backup levels can also be used to keep filesystem restoration time to a minimum. If you have many incremental backups with monotonically growing level numbers, you need to restore all of them in order to rebuild the whole filesystem. Instead you can use level numbers that do not grow monotonically, and keep down the number of backups that must be restored.



To minimise the number of tapes needed to restore, you could use a smaller level for each incremental tape. However, then the time to make the backups increases (each backup copies everything modified since the previous full backup). A better scheme is suggested by the dump manual page and described in Table 1 below. Use the following succession of backup levels: 3, 2, 5, 4, 7, 6, 9, 8, 9, etc. This keeps both the backup and restore times low. The most you ever have to back up is two days' worth of work. The number of tapes needed for a restore depends on how long you leave between full backups, but it is less than in the simple schemes.



Table 1. Efficient backup scheme using many backup levels

Tape    Level   Backup (days)   Restore tapes
1       0       n/a             1
2       3       1               1, 2
3       2       2               1, 3
4       5       1               1, 2, 4
5       4       2               1, 2, 5
6       7       1               1, 2, 5, 6
7       6       2               1, 2, 5, 7
8       9       1               1, 2, 5, 7, 8
9       8       2               1, 2, 5, 7, 9
10      9       1               1, 2, 5, 7, 9, 10
11      9       1               1, 2, 5, 7, 9, 10, 11
...     9       1               1, 2, 5, 7, 9, 10, 11, ...





A fancy scheme can reduce the amount of labour needed, but it does mean there are more things to keep track of. You must decide if it is worth it.





dump has built-in support for backup levels. For tar and cpio, backup levels must be implemented with shell scripts.
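A hedged sketch of dump's level support, assuming the filesystem to back up lives on /dev/hda2 and the tape is /dev/ftape (both device names are illustrative): the numeric option selects the backup level, and -u records the date and level of each dump in /etc/dumpdates, which is how dump later decides what has changed since the previous lower-level dump:

# dump -0u -f /dev/ftape /dev/hda2
(later, following the schedule in Table 1)
# dump -3u -f /dev/ftape /dev/hda2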





71 What to Back Up



You want to back up as much as possible. The major exception is software that can be easily reinstalled, though even such packages may have configuration files that are important to back up, lest you have to do all the work of configuring them all over again. Another major exception is the /proc filesystem; since it only contains data that the kernel generates automatically, it is never a good idea to back it up. The /proc/kcore file, especially, is unnecessary, since it is just an image of your current physical memory; it's pretty large as well.



Gray areas include the news spool, log files, and many other things in /var. You must decide what you consider important.



The obvious things to back up are user files (/home) and system configuration files (/etc, but possibly other things scattered all over the filesystem).
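Following the earlier examples (with /dev/ftape as the assumed backup device), a minimal sketch covering those obvious targets would be:

# tar --create --file /dev/ftape /home /etc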





72 Compressed Backups



Backups take a lot of space, which can cost quite a lot of money. To reduce the space needed, the backups can be compressed. There are several ways of doing this. Some programs have support for compression built in; for example, the --gzip (-z) option for GNU tar pipes the whole backup through the gzip compression program before writing it to the backup medium.
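For example, re-using the device from the earlier examples:

# tar --create --gzip --file /dev/ftape /usr/src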



Unfortunately, compressed backups can cause trouble. Due to the nature of how compression works, if a single bit is wrong, all the rest of the compressed data will be unusable. Some backup programs have some built-in error correction, but no method can handle a large number of errors. This means that if the backup is compressed the way GNU tar does it, with the whole output compressed as a unit, a single error loses all the rest of the backup. Backups must be reliable, so this method of compression is not a good idea.



An alternative way is to compress each file separately. This still means that the affected file is lost, but all other files are unharmed. The lost file would have been corrupted anyway, so this situation is not much worse than not using compression at all. The afio program (a variant of cpio) can do this.
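A rough sketch of per-file compression with afio (hedged; consult the afio manual page, as option details vary between versions): -o writes an archive, -v is verbose, and -Z compresses each file individually as it is archived:

# find /usr/src -print | afio -o -v -Z /dev/ftape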



Compression takes some time, which may make the backup program unable to write data fast enough for a tape drive. This can be avoided by buffering the output (either internally, if the backup program is smart enough, or by using another program), but even that might not work well enough. This should only be a problem on slow computers.



Log Files





Log files are files that contain messages about the system, including the kernel, services, and applications running on it. There are different log files for different information. For example, there is a default system log file, a log file just for security messages, and a log file for cron tasks.





Log files can be very useful when trying to troubleshoot a problem with the system, such as trying to load a kernel driver, or when looking for unauthorized login attempts to the system. This chapter discusses where to find log files, how to view log files, and what to look for in log files.





Some log files are controlled by a daemon called syslogd. The list of log files maintained by syslogd, and the kinds of messages routed to each, can be found in the /etc/syslog.conf configuration file.
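For example, a Red Hat default /etc/syslog.conf contains rules similar to the following (exact defaults vary between releases):

# Log anything of level info or higher, except private authentication,
# mail, and cron messages, to the main system log.
*.info;mail.none;authpriv.none;cron.none        /var/log/messages
# Authentication and security messages go to a restricted file.
authpriv.*                                      /var/log/secure
# Cron task messages get their own file.
cron.*                                          /var/log/cron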





Locating Log Files





Most log files are located in the /var/log directory. Some applications, such as httpd and samba, have a directory within /var/log for their log files. Notice the multiple files in the log file directory with numbers after them; these are created when the log files are rotated. Log files are rotated so their file sizes do not become too large. The logrotate package contains a cron task that automatically rotates log files according to the /etc/logrotate.conf configuration file and the configuration files in the /etc/logrotate.d directory. By default, it is configured to rotate every week and keep four weeks' worth of previous log files.
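A minimal /etc/logrotate.conf reflecting those defaults might look like this (a sketch, not the complete file shipped with the package):

# rotate log files weekly
weekly
# keep four weeks' worth of backlogs
rotate 4
# create a new (empty) log file after rotating the old one
create
# package-specific rules are kept here
include /etc/logrotate.d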





Viewing Log Files





Most log files are in plain text format. You can view them with any text editor such as Vi or Emacs. Some log files are readable by all users on the system; however, root privileges are required to read most log files.





To view system log files in an interactive, real-time application, use the Log Viewer. To start the application, go to the Main Menu Button (on the Panel) => System Tools => System Logs, or type the command redhat-logviewer at a shell prompt.








Figure 29-1. Log Viewer



The application only displays log files that exist; thus, the list might differ from the one shown in Figure 29-1. To view the complete list of log files that it can view, refer to the configuration file, /etc/sysconfig/redhat-logviewer.



By default, the currently viewable log file is refreshed every 30 seconds. To change the refresh rate, select Edit => Preferences from the pulldown menu. The window shown in Figure 29-2 will appear. In the Log Files tab, click the up and down arrows beside the refresh rate to change it. Click Close to return to the main window. The refresh rate is changed immediately. To refresh the currently viewable file manually, select File => Refresh Now or press [Ctrl]-[R].



To filter the contents of the log file for keywords, type the keyword or keywords in the Filter for text field, and click Filter. Click Reset to reset the contents.



You can also change where the application looks for the log files from the Log Files tab. Select the log file from the list, and click the Change Location button. Type the new location of the log file or click the Browse button to locate the file location using a file selection dialog. Click OK to return to the preferences, and click Close to return to the main window.








Figure 29-2. Log File Locations



Examining Log Files



Log Viewer can be configured to display an alert icon beside lines that contain key alert words. To add alert words, select Edit => Preferences from the pulldown menu, and click on the Alerts tab. Click the Add button to add an alert word. To delete an alert word, select the word from the list, and click Delete.








Figure 29-3. Alerts





73 Questions





1. What is Network Information System (NIS)?



2. What are the advantages of using NIS? How does it differ from Domain Name System (DNS)?



3. How does NIS keep information?



4. What is the difference between NIS and NIS+?



5. Identify the steps required to configure a Linux machine as an NIS server.



6. Identify the steps required to configure a Linux machine as an NIS client.



7. How does an NIS domain differ from a DNS domain?



8. What are NIS map files?



9. How can you set the NIS domain name of a Linux machine?



10. Describe the different ways by which an NIS client may find an NIS server.



11. What security flaws are associated with NIS? How can they be mitigated?



12. How can you test the successful installation of an NIS client?



13. What is Network File System (NFS)?



14. How does NFS work?



15. In what ways is NFS useful? Give examples.



16. Identify the steps required to configure a Linux machine as an NFS server.



17. Identify the steps required to configure a Linux machine as an NFS client.



18. How can you check whether a Linux kernel has support for NFS or not?



19. What is the difference between hard-mounting and soft-mounting an NFS share?



20. What are the different daemons that implement NFS server and client services?



21. Explain the following export options that are available when configuring an NFS server.



secure, ro, rw, root_squash, sync



22. How can you mount an NFS share using /etc/fstab?



23. How can you mount an NFS share using autofs?



24. What are major timeouts and minor timeouts in NFS?



25. How can you keep track of which directories have been mounted by which hosts on an NFS server?



26. Explain the significance of the following options when mounting an NFS share.



rsize, wsize, timeo, hard, soft, intr



27. What are the common causes of data loss?



28. What are the different factors that drive the selection of a backup medium?



29. Compare floppies and tapes as backup media.



30. What is the difference between a full backup and an incremental backup?



31. What is a multilevel backup?
