A Hands on Guide
Version 1.23 20060725 Edition
Copyright © 2002, 2003, 2004, 2005, 2006 Machtelt Garrels
First published December 2002
Many people still believe that learning Linux is difficult, or that only experts can understand how a Linux system works. Though there is a lot of free documentation available, the documentation is widely scattered on the Web, and often confusing, since it is usually oriented toward experienced UNIX or Linux users. Today, thanks to the advancements in development, Linux has grown in popularity both at home and at work. The goal of this guide is to show people of all ages that Linux can be simple and fun, and used for all kinds of purposes.
This guide was created as an overview of the Linux Operating System, geared toward new users as an exploration tour and getting started guide, with exercises at the end of each chapter. For more advanced trainees it can be a desktop reference, and a collection of the base knowledge needed to proceed with system and network administration. This book contains many real life examples derived from the author's experience as a Linux system and network administrator, trainer and consultant. We hope these examples will help you to get a better understanding of the Linux system and that you feel encouraged to try out things on your own.
Everybody who wants to get a "CLUE", a Command Line User Experience, with Linux (and UNIX in general) will find this book useful.
This document is published in the Guides section of the Linux Documentation Project collection at http://www.tldp.org/guides.html; you can also download PDF and PostScript formatted versions here.
The most recent edition is available at http://tille.xalasys.com/training/tldp/.
This guide is available in print from Fultus.com Books by Print On Demand. Fultus distributes this document to many bookstores, including Baker & Taylor and the on-line bookstores Amazon.com, Amazon.co.uk, BarnesAndNoble.com and Google's Froogle.
The guide has been translated into Hindi by:
Many thanks to all the people who shared their experiences. And especially to the Belgian Linux users for hearing me out every day and always being generous in their comments.
Also a special thought for Tabatha Marshall for doing a really thorough revision, spell check and styling, and to Eugene Crosser for spotting the errors that we two overlooked.
And thanks to all the readers who notified me about missing topics and who helped to pick out the last errors, unclear definitions and typos by going through the trouble of mailing me all their remarks. These are also the people who help me keep this guide up to date, like Filipus Klutiero who did a complete review in 2005 and 2006, and Alexey Eremenko who sent me the foundation for chapter 11.
Finally, a big thank you for the volunteers who are currently translating this document in French, Swedish, German, Farsi, Hindi and more. It is a big work that should not be underestimated; I admire your courage.
Missing information, missing links, missing characters? Mail it to the maintainer of this document:
Don't forget to check with the latest version first!
© 2002-2006 Machtelt Garrels.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation. A copy of the license is included in Appendix D entitled "GNU Free Documentation License".
The author and publisher have made every effort in the preparation of this book to ensure the accuracy of the information. However, the information contained in this book is offered without warranty, either express or implied. Neither the author nor the publisher nor any dealer or distributor will be held liable for any damages caused or alleged to be caused either directly or indirectly by this book.
The logos, trademarks and symbols used in this book are the properties of their respective owners.
You will require a computer and a medium containing a Linux distribution. Most of this guide applies to all Linux distributions - and UNIX in general. Apart from time, there are no further specific requirements.
The Installation HOWTO contains helpful information on how to obtain Linux software and install it on your computer. Hardware requirements and coexistence with other operating systems are also discussed.
An interesting alternative for those who don't dare to take the step of an actual Linux installation on their machine are the Linux distributions that you can run from a CD, such as the Knoppix distribution.
The following typographic and usage conventions occur in this text:
Table 1. Typographic and usage conventions
The following images are used:
This guide aims to be the foundation for all other materials that you can get from The Linux Documentation Project. As such, it provides you with the fundamental knowledge needed by anyone who wants to start working with a Linux system, while at the same time it tries to consciously avoid re-inventing the hot water. Thus, you can expect this book to be incomplete and full of links to sources of additional information on your system, on the Internet and in your system documentation.
The first chapter is an introduction to the subject on Linux; the next two discuss absolute basic commands. Chapters 4 and 5 discuss some more advanced but still basic topics. Chapter 6 is needed for continuing with the rest, since it discusses editing files, an ability you need to pass from Linux newbie to Linux user. The following chapters discuss somewhat more advanced topics that you will have to deal with in everyday Linux use.
All chapters come with exercises that will test your preparedness for the next chapter.
In order to understand the popularity of Linux, we need to travel back in time, about 30 years ago...
Imagine computers as big as houses, even stadiums. While the sizes of those computers posed substantial problems, there was one thing that made this even worse: every computer had a different operating system. Software was always customized to serve a specific purpose, and software for one given system didn't run on another system. Being able to work with one system didn't automatically mean that you could work with another. It was difficult, both for the users and the system administrators.
Computers were extremely expensive then, and sacrifices had to be made even after the original purchase just to get the users to understand how they worked. The total cost per unit of computing power was enormous.
Technologically the world was not quite that advanced, so they had to live with the size for another decade. In 1969, a team of developers in the Bell Labs laboratories started working on a solution for the software problem, to address these compatibility issues. They developed a new operating system, which was
The Bell Labs developers named their project "UNIX."
The code recycling features were very important. Until then, all commercially available computer systems were written in a code specifically developed for one system. UNIX on the other hand needed only a small piece of that special code, which is now commonly named the kernel. This kernel is the only piece of code that needs to be adapted for every specific system and forms the base of the UNIX system. The operating system and all other functions were built around this kernel and written in a higher programming language, C. This language was especially developed for creating the UNIX system. Using this new technique, it was much easier to develop an operating system that could run on many different types of hardware.
The software vendors were quick to adapt, since they could sell ten times more software almost effortlessly. Weird new situations came in existence: imagine for instance computers from different vendors communicating in the same network, or users working on different systems without the need for extra education to use another computer. UNIX did a great deal to help users become compatible with different systems.
Throughout the next couple of decades the development of UNIX continued. More things became possible to do and more hardware and software vendors added support for UNIX to their products.
UNIX was initially found only in very large environments with mainframes and minicomputers (note that a PC is a "micro" computer). You had to work at a university, for the government or for large financial corporations in order to get your hands on a UNIX system.
But smaller computers were being developed, and by the end of the 80's, many people had home computers. By that time, there were several versions of UNIX available for the PC architecture, but none of them were truly free and more important: they were all terribly slow, so most people ran MS DOS or Windows 3.1 on their home PCs.
By the beginning of the 90s home PCs were finally powerful enough to run a full blown UNIX. Linus Torvalds, a young man studying computer science at the university of Helsinki, thought it would be a good idea to have some sort of freely available academic version of UNIX, and promptly started to code.
He started to ask questions, looking for answers and solutions that would help him get UNIX on his PC. Below is one of his first posts in comp.os.minix, dating from 1991:
From the start, it was Linus' goal to have a free system that was completely compliant with the original UNIX. That is why he asked for POSIX standards, POSIX still being the standard for UNIX.
In those days plug-and-play wasn't invented yet, but so many people were interested in having a UNIX system of their own, that this was only a small obstacle. New drivers became available for all kinds of new hardware, at a continuously rising speed. Almost as soon as a new piece of hardware became available, someone bought it and submitted it to the Linux test, as the system was gradually being called, releasing more free code for an ever wider range of hardware. These coders didn't stop at their PC's; every piece of hardware they could find was useful for Linux.
Back then, those people were called "nerds" or "freaks", but it didn't matter to them, as long as the supported hardware list grew longer and longer. Thanks to these people, Linux is now not only ideal to run on new PC's, but is also the system of choice for old and exotic hardware that would be useless if Linux didn't exist.
Two years after Linus' post, there were 12000 Linux users. The project, popular with hobbyists, grew steadily, all the while staying within the bounds of the POSIX standard. All the features of UNIX were added over the next couple of years, resulting in the mature operating system Linux has become today. Linux is a full UNIX clone, fit for use on workstations as well as on middle-range and high-end servers. Today, a lot of the important players on the hard- and software market each have their team of Linux developers; at your local dealer's you can even buy pre-installed Linux systems with official support - eventhough there is still a lot of hard- and software that is not supported, too.
Today Linux has joined the desktop market. Linux developers concentrated on networking and services in the beginning, and office applications have been the last barrier to be taken down. We don't like to admit that Microsoft is ruling this market, so plenty of alternatives have been started over the last couple of years to make Linux an acceptable choice as a workstation, providing an easy user interface and MS compatible office applications like word processors, spreadsheets, presentations and the like.
On the server side, Linux is well-known as a stable and reliable platform, providing database and trading services for companies like Amazon, the well-known online bookshop, US Post Office, the German army and such. Especially Internet providers and Internet service providers have grown fond of Linux as firewall, proxy- and web server, and you will find a Linux box within reach of every UNIX system administrator who appreciates a comfortable management station. Clusters of Linux machines are used in the creation of movies such as "Titanic", "Shrek" and others. In post offices, they are the nerve centers that route mail and in large search engine, clusters are used to perform internet searches.These are only a few of the thousands of heavy-duty jobs that Linux is performing day-to-day across the world.
It is also worth to note that modern Linux not only runs on workstations, mid- and high-end servers, but also on "gadgets" like PDA's, mobiles, a shipload of embedded applications and even on experimental wristwatches. This makes Linux the only operating system in the world covering such a wide range of hardware.
Whether Linux is difficult to learn depends on the person you're asking. Experienced UNIX users will say no, because Linux is an ideal operating system for power-users and programmers, because it has been and is being developed by such people.
Everything a good programmer can wish for is available: compilers, libraries, development and debugging tools. These packages come with every standard Linux distribution. The C-compiler is included for free - as opposed to many UNIX distributions demanding licensing fees for this tool. All the documentation and manuals are there, and examples are often included to help you get started in no time. It feels like UNIX and switching between UNIX and Linux is a natural thing.
In the early days of Linux, being an expert was kind of required to start using the system. Those who mastered Linux felt better than the rest of the "lusers" who hadn't seen the light yet. It was common practice to tell a beginning user to "RTFM" (read the manuals). While the manuals were on every system, it was difficult to find the documentation, and even if someone did, explanations were in such technical terms that the new user became easily discouraged from learning the system.
The Linux-using community started to realize that if Linux was ever to be an important player on the operating system market, there had to be some serious changes in the accessibility of the system.
Companies such as RedHat, SuSE and Mandriva have sprung up, providing packaged Linux distributions suitable for mass consumption. They integrated a great deal of graphical user interfaces (GUIs), developed by the community, in order to ease management of programs and services. As a Linux user today you have all the means of getting to know your system inside out, but it is no longer necessary to have that knowledge in order to make the system comply to your requests.
Nowadays you can log in graphically and start all required applications without even having to type a single character, while you still have the ability to access the core of the system if needed. Because of its structure, Linux allows a user to grow into the system: it equally fits new and experienced users. New users are not forced to do difficult things, while experienced users are not forced to work in the same way they did when they first started learning Linux.
While development in the service area continues, great things are being done for desktop users, generally considered as the group least likely to know how a system works. Developers of desktop applications are making incredible efforts to make the most beautiful desktops you've ever seen, or to make your Linux machine look just like your former MS Windows or MacIntosh workstation. The latest developments also include 3D acceleration support and support for USB devices, single-click updates of system and packages, and so on. Linux has these, and tries to present all available services in a logical form that ordinary people can understand. Below is a short list containing some great examples; these sites have a lot of screenshots that will give you a glimpse of what Linux on the desktop can be like:
The idea behind Open Source software is rather simple: when programmers can read, distribute and change code, the code will mature. People can adapt it, fix it, debug it, and they can do it at a speed that dwarfs the performance of software developers at conventional companies. This software will be more flexible and of a better quality than software that has been developed using the conventional channels, because more people have tested it in more different conditions than the closed software developer ever can.
The Open Source initiative started to make this clear to the commercial world, and very slowly, commercial vendors are starting to see the point. While lots of academics and technical people have already been convinced for 20 years now that this is the way to go, commercial vendors needed applications like the Internet to make them realize they can profit from Open Source. Now Linux has grown past the stage where it was almost exclusively an academic system, useful only to a handful of people with a technical background. Now Linux provides more than the operating system: there is an entire infrastructure supporting the chain of effort of creating an operating system, of making and testing programs for it, of bringing everything to the users, of supplying maintenance, updates and support and customizations, etcetera. Today, Linux is ready to accept the challenge of a fast-changing world.
While Linux is probably the most well-known Open Source initiative, there is another project that contributed enormously to the popularity of the Linux operating system. This project is called SAMBA, and its achievement is the reverse engineering of the Server Message Block (SMB)/Common Internet File System (CIFS) protocol used for file- and print-serving on PC-related machines, natively supported by MS Windows NT and OS/2, and Linux. Packages are now available for almost every system and provide interconnection solutions in mixed environments using MS Windows protocols: Windows-compatible (up to and includingWinXP) file- and print-servers.
Maybe even more successful than the SAMBA project is the Apache HTTP server project. The server runs on UNIX, Windows NT and many other operating systems. Originally known as "A PAtCHy server", based on existing code and a series of "patch files", the name for the matured code deserves to be connoted with the native American tribe of the Apache, well-known for their superior skills in warfare strategy and inexhaustible endurance. Apache has been shown to be substantially faster, more stable and more feature-full than many other web servers. Apache is run on sites that get millions of visitors per day, and while no official support is provided by the developers, the Apache user community provides answers to all your questions. Commercial support is now being provided by a number of third parties.
In the category of office applications, a choice of MS Office suite clones is available, ranging from partial to full implementations of the applications available on MS Windows workstations. These initiatives helped a great deal to make Linux acceptable for the desktop market, because the users don't need extra training to learn how to work with new systems. With the desktop comes the praise of the common users, and not only their praise, but also their specific requirements, which are growing more intricate and demanding by the day.
The Open Source community, consisting largely of people who have been contributing for over half a decade, assures Linux' position as an important player on the desktop market as well as in general IT application. Paid employees and volunteers alike are working diligently so that Linux can maintain a position in the market. The more users, the more questions. The Open Source community makes sure answers keep coming, and watches the quality of the answers with a suspicious eye, resulting in ever more stability and accessibility.
Listing all the available Linux software is beyond the scope of this guide, as there are tens of thousands of packages. Throughout this course we will present you with the most common packages, which are almost all freely available. In order to take away some of the fear of the beginning user, here's a screenshot of one of your most-wanted programs. You can see for yourself that no effort has been spared to make users who are switching from Windows feel at home:
A lot of the advantages of Linux are a consequence of Linux' origins, deeply rooted in UNIX, except for the first advantage, of course:
Although there are a large number of Linux implementations, you will find a lot of similarities in the different distributions, if only because every Linux machine is a box with building blocks that you may put together following your own needs and views. Installing the system is only the beginning of a longterm relationship. Just when you think you have a nice running system, Linux will stimulate your imagination and creativeness, and the more you realize what power the system can give you, the more you will try to redefine its limits.
Linux may appear different depending on the distribution, your hardware and personal taste, but the fundamentals on which all graphical and other interfaces are built, remain the same. The Linux system is based on GNU tools (Gnu's Not UNIX), which provide a set of standard ways to handle and use the system. All GNU tools are open source, so they can be installed on any system. Most distributions offer pre-compiled packages of most common tools, such as RPM packages on RedHat and Debian packages (also called deb or dpkg) on Debian, so you needn't be a programmer to install a package on your system. However, if you are and like doing things yourself, you will enjoy Linux all the better, since most distributions come with a complete set of development tools, allowing installation of new software purely from source code. This setup also allows you to install software even if it does not exist in a pre-packaged form suitable for your system.
A list of common GNU software:
Many commercial applications are available for Linux, and for more information about these packages we refer to their specific documentation. Throughout this guide we will only discuss freely available software, which comes (in most cases) with a GNU license.
To install missing or new packages, you will need some form of software management. The most common implementations include RPM, dpkg and Ximian Red Carpet. RPM is the RedHat Package Manager, which is used on a variety of Linux systems, eventhough the name does not suggest this. Dpkg is the Debian package management system, which uses an interface called apt-get, that can manage RPM packages as well. Ximian Red Carpet is a third party implementation of RPM with a graphical front-end. Other third party software vendors may have their own installation procedures, sometimes resembling the InstallShield and such, as known on MS Windows and other platforms. As you advance into Linux, you will likely get in touch with one or more of these programs.
The Linux kernel (the bones of your system, see Section 126.96.36.199) is not part of the GNU project but uses the same license as GNU software. A great majority of utilities and development tools (the meat of your system), which are not Linux-specific, are taken from the GNU project. Because any usable system must contain both the kernel and at least a minimal set of utilities, some people argue that such a system should be called a GNU/Linux system.
In order to obtain the highest possible degree of independence between distributions, this is the sort of Linux that we will discuss throughout this course. If we are not talking about a GNU/Linux system, the specific distribution, version or program name will be mentioned.
Prior to installation, the most important factor is your hardware. Since every Linux distribution contains the basic packages and can be built to meet almost any requirement (because they all use the Linux kernel), you only need to consider if the distribution will run on your hardware. LinuxPPC for example has been made to run on MacIntosh and other PowerPCs and does not run on an ordinary x86 based PC. LinuxPPC does run on the new Macs, but you can't use it for some of the older ones with ancient bus technology. Another tricky case is Sun hardware, which could be an old SPARC CPU or a newer UltraSparc, both requiring different versions of Linux.
Some Linux distributions are optimized for certain processors, such as Athlon CPUs, while they will at the same time run decent enough on the standard 486, 586 and 686 Intel processors. Sometimes distributions for special CPUs are not as reliable, since they are tested by fewer people.
Most Linux distributions offer a set of programs for generic PCs with special packages containing optimized kernels for the x86 Intel based CPUs. These distributions are well-tested and maintained on a regular basis, focusing on reliant server implementation and easy installation and update procedures. Examples are Debian, Ubuntu, Fedora, SuSE and Mandriva, which are by far the most popular Linux systems and generally considered easy to handle for the beginning user, while not blocking professionals from getting the most out of their Linux machines. Linux also runs decently on laptops and middle-range servers. Drivers for new hardware are included only after extensive testing, which adds to the stability of a system.
While the standard desktop might be Gnome on one system, another might offer KDE by default. Generally, both Gnome and KDE are available for all major Linux distributions. Other window and desktop managers are available for more advanced users.
The standard installation process allows users to choose between different basic setups, such as a workstation, where all packages needed for everyday use and development are installed, or a server installation, where different network services can be selected. Expert users can install every combination of packages they want during the initial installation process.
The goal of this guide is to apply to all Linux distributions. For your own convenience, however, it is strongly advised that beginners stick to a mainstream distribution, supporting all common hardware and applications by default. The following are very good choices for novices:
Downloadable ISO-images can be obtained from LinuxISO.org. The main distributions can be purchased in any decent computer shop.
In this chapter, we learned that:
A practical exercise for starters: install Linux on your PC. Read the installation manual for your distribution and/or the Installation HOWTO and do it.
Things you must know BEFORE starting a Linux installation:
The full checklist can be found at http://www.tldp.org/HOWTO/Installation-HOWTO/index.html.
In the following chapters we will find out if the installation has been successful.
In order to work on a Linux system directly, you will need to provide a user name and password. You always need to authenticate to the system. As we already mentioned in the exercise from Chapter 1, most PC-based Linux systems have two basic modes for a system to run in: either quick and sober in text console mode, which looks like DOS with mouse, multitasking and multi-user features, or in graphical mode, which looks better but eats more system resources.
This is the default nowadays on most desktop computers. You know you will connect to the system using graphical mode when you are first asked for your user name, and then, in a new window, to type your password.
To log in, make sure the mouse pointer is in the login window, provide your user name and password to the system and clickor press Enter.
After entering your user name/password combination, it can take a little while before the graphical environment is started, depending on the CPU speed of your computer, on the software you use and on your personal settings.
To continue, you will need to open a terminal window or xterm for short (X being the name for the underlying software supporting the graphical environment). This program can be found in the-> , or menu, depending on what window manager you are using. There might be icons that you can use as a shortcut to get an xterm window as well, and clicking the right mouse button on the desktop background will usually present you with a menu containing a terminal window application.
While browsing the menus, you will notice that a lot of things can be done without entering commands via the keyboard. For most users, the good old point-'n'-click method of dealing with the computer will do. But this guide is for future network and system administrators, who will need to meddle with the heart of the system. They need a stronger tool than a mouse to handle all the tasks they will face. This tool is the shell, and when in graphical mode, we activate our shell by opening a terminal window.
The terminal window is your control panel for the system. Almost everything that follows is done using this simple but powerful text tool. A terminal window should always show a command prompt when you open one. This terminal shows a standard prompt, which displays the user's login name, and the current working directory, represented by the twiddle (~):
Another common form for a prompt is this one:
In the above example, user will be your login name, hosts the name of the machine you are working on, and dir an indication of your current location in the file system.
Later we will discuss prompts and their behavior in detail. For now, it suffices to know that prompts can display all kinds of information, but that they are not part of the commands you are giving to your system.
To disconnect from the system in graphical mode, you need to close all terminal windows and other applications. After that, hit the logout icon or findin the menu. Closing everything is not really necessary, and the system can do this for you, but session management might put all currently open applications back on your screen when you connect again, which takes longer and is not always the desired effect. However, this behavior is configurable.
When you see the login screen again, asking to enter user name and password, logout was successful.
You know you're in text mode when the whole screen is black, showing (in most cases white) characters. A text mode login screen typically shows some information about the machine you are working on, the name of the machine and a prompt waiting for you to log in:
The login is different from a graphical login, in that you have to hit the Enter key after providing your user name, because there are no buttons on the screen that you can click with the mouse. Then you should type your password, followed by another Enter. You won't see any indication that you are entering something, not even an asterisk, and you won't see the cursor move. But this is normal on Linux and is done for security reasons.
When the system has accepted you as a valid user, you may get some more information, called the message of the day, which can be anything. Additionally, it is popular on UNIX systems to display a fortune cookie, which contains some general wise or unwise (this is up to you) thoughts. After that, you will be given a shell, indicated with the same prompt that you would get in graphical mode.
Logging out is done by entering the logout command, followed by Enter. You are successfully disconnected from the system when you see the login screen again.
Now that we know how to connect to and disconnect from the system, we're ready for our first commands.
These are the quickies, which we need to get started; we will discuss them later in more detail.
Table 2-1. Quickstart commands
You type these commands after the prompt, in a terminal window in graphical mode or in text mode, followed by Enter.
Commands can be issued by themselves, such as ls. A command behaves different when you specify an option, usually preceded with a dash (-), as in ls -a. The same option character may have a different meaning for another command. GNU programs take long options, preceded by two dashes (--), like ls --all. Some commands have no options.
The argument(s) to a command are specifications for the object(s) on which you want the command to take effect. An example is ls /etc, where the directory /etc is the argument to the ls command. This indicates that you want to see the content of that directory, instead of the default, which would be the content of the current directory, obtained by just typing ls followed by Enter. Some commands require arguments, sometimes arguments are optional.
You can find out whether a command takes options and arguments, and which ones are valid, by checking the online help for that command, see Section 2.3.
In Linux, like in UNIX, directories are separated using forward slashes, like the ones used in web addresses (URLs). We will discuss directory structure in-depth later.
The symbols . and .. have special meaning when directories are concerned. We will try to find out about those during the exercises, and more in the next chapter.
Try to avoid logging in with or using the system administrator's account, root. Besides doing your normal work, most tasks, including checking the system, collecting information etc., can be executed using a normal user account with no special permissions at all. If needed, for instance when creating a new user or installing new software, the preferred way of obtaining root access is by switching user IDs, see Section 3.2.1 for an example.
Almost all commands in this book can be executed without system administrator privileges. In most cases, when issuing a command or starting a program as a non-privileged user, the system will warn you or prompt you for the root password when root access is required. Once you're done, leave the application or session that gives you root privileges immediately.
Reading documentation should become your second nature. Especially in the beginning, it is important to read system documentation, manuals for basic commands, HOWTOs and so on. Since the amount of documentation is so enormous, it is impossible to include all related documentation. This book will try to guide you to the most appropriate documentation on every subject discussed, in order to stimulate the habit of reading the man pages.
Several special key combinations allow you to do things easier and faster with the GNU shell, Bash, which is the default on almost any Linux system, see Section 188.8.131.52. Below is a list of the most commonly used features; you are strongly suggested to make a habit out of using them, so as to get the most out of your Linux experience from the very beginning.
Table 2-2. Key combinations in Bash
The last two items in the above table may need some extra explanantions. For instance, if you want to change into the directory directory_with_a_very_long_name, you are not going to type that very long name, no. You just type on the command line cd dir, then you press Tab and the shell completes the name for you, if no other files are starting with the same three characters. Of course, if there are no other items starting with "d", then you might just as wel type cd d and then Tab. If more than one file starts with the same characters, the shell will signal this to you, upon which you can hit Tab twice with short interval, and the shell presents the choices you have:
In the above example, if you type "a" after the first two characters and hit Tab again, no other possibilities are left, and the shell completes the directory name, without you having to type the string "rthere":
Of course, you'll still have to hit Enter to accept this choice.
In the same example, if you type "u", and then hit Tab, the shell will add the "ff" for you, but then it protests again, because multiple choices are possible. If you type Tab Tab again, you'll see the choices; if you type one or more characters that make the choice unambiguous to the system, and Tab again, or Enter when you've reach the end of the file name that you want to choose, the shell completes the file name and changes you into that directory - if indeed it is a directory name.
This works for all file names that are arguments to commands.
The same goes for command name completion. Typing ls and then hitting the Tab key twice, lists all the commands in your PATH (see Section 3.2.1) that start with these two characters:
GNU/Linux is all about becoming more self-reliant. And as usual with this system, there are several ways to achieve the goal. A common way of getting help is finding someone who knows, and however patient and peace-loving the Linux-using community will be, almost everybody will expect you to have tried one or more of the methods in this section before asking them, and the ways in which this viewpoint is expressed may be rather harsh if you prove not to have followed this basic rule.
A lot of beginning users fear the man (manual) pages, because they are an overwhelming source of documentation. They are, however, very structured, as you will see from the example below on: man man.
Reading man pages is usually done in a terminal window when in graphical mode, or just in text mode if you prefer it. Type the command like this at the prompt, followed by Enter:
The documentation for man will be displayed on your screen after you press Enter:
Browse to the next page using the space bar. You can go back to the previous page using the b-key. When you reach the end, man will usually quit and you get the prompt back. Type q if you want to leave the man page before reaching the end, or if the viewer does not quit automatically at the end of the page.
Each man page usually contains a couple of standard sections, as we can see from the man man example:
Some commands have multiple man pages. For instance, the passwd command has a man page in section 1 and another in section 5. By default, the man page with the lowest number is shown. If you want to see another section than the default, specify it after the man command:
man 5 passwd
If you want to see all man pages about a command, one after the other, use the -a to man:
man -a passwd
This way, when you reach the end of the first man page and press SPACE again, the man page from the next section will be displayed.
In addition to the man pages, you can read the info pages about a command, using the info command. These usually contain more recent information and are somewhat easier to use. The man pages for some commands refer to the info pages.
Get started by typing info info in a terminal window:
Use the arrow keys to browse through the text and move the cursor on a line starting with an asterisk, containing the keyword about which you want info, then hit Enter. Use the P and N keys to go to the previous or next subject. The space bar will move you one page further, no matter whether this starts a new subject or an info page for another command. Use Q to quit. The info program has more information.
A short index of explanations for commands is available using the whatis command, like in the examples below:
This displays short information about a command, and the first section in the collection of man pages that contains an appropriate page.
If you don't know where to get started and which man page to read, apropos gives more information. Say that you don't know how to start a browser, then you could enter the following command:
After pressing Enter you will see that a lot of browser related stuff is on your machine: not only web browsers, but also file and FTP browsers, and browsers for documentation. If you have development packages installed, you may also have the accompanying man pages dealing with writing programs having to do with browsers. Generally, a command with a man page in section one, so one marked with "(1)", is suitable for trying out as a user. The user who issued the above apropos might consequently try to start the commands galeon, lynx or opera, since these clearly have to do with browsing the world wide web.
Most GNU commands support the --help, which gives a short explanation about how to use the command and a list of available options. Below is the output of this option with the cat command:
Don't despair if you prefer a graphical user interface. Konqueror, the default KDE file manager, provides painless and colourful access to the man and Info pages. You may want to try "info:info" in the Location address bar, and you will get a browsable Info page about the info command. Similarly, "man:ls" will present you with the man page for the ls command. You even get command name completion: you will see the man pages for all the commands starting with "ls" in a scroll-down menu. Entering "info:/dir" in the address location toolbar displays all the Info pages, arranged in utility categories. Excellent content, including the Konqueror Handbook. Start up from the menu or by typing the command konqueror in a terminal window, followed by Enter; see the screenshot below.
The Gnome Help Browser is very user friendly as well. You can start it selecting-> from the Gnome menu, by clicking the lifeguard icon on your desktop or by entering the command gnome-help in a terminal window. The system documentation and man pages are easily browsable with a plain interface.
The nautilus file manager provides a searchable index of the man and Info pages, they are easily browsable and interlinked. Nautilus is started from the command line, or clicking your home directory icon, or from the Gnome menu.
The big advantage of GUIs for system documentation is that all information is completely interlinked, so you can click through in the "SEE ALSO" sections and wherever links to other man pages appear, and thus browse and acquire knowledge without interruption for hours at the time.
Some commands don't have separate documentation, because they are part of another command. cd, exit, logout and pwd are such exceptions. They are part of your shell program and are called shell built-in commands. For information about these, refer to the man or info page of your shell. Most beginning Linux users have a Bash shell. See Section 184.108.40.206 for more about shells.
If you have been changing your original system configuration, it might also be possible that man pages are still there, but not visible because your shell environment has changed. In that case, you will need to check the MANPATH variable. How to do this is explained in Section 220.127.116.11.
Some programs or packages only have a set of instructions or references in the directory /usr/share/doc. See Section 3.3.4 to display.
In the worst case, you may have removed the documentation from your system by accident (hopefully by accident, because it is a very bad idea to do this on purpose). In that case, first try to make sure that there is really nothing appropriate left using a search tool, read on in Section 3.3.3. If so, you may have to re-install the package that contains the command to which the documentation applied, see Section 7.5.
Linux traditionally operates in text mode or in graphical mode. Since CPU power and RAM are not the cost anymore these days, every Linux user can afford to work in graphical mode and will usually do so. This does not mean that you don't have to know about text mode: we will work in the text environment throughout this course, using a terminal window.
Linux encourages its users to acquire knowledge and to become independent. Inevitably, you will have to read a lot of documentation to achieve that goal; that is why, as you will notice, we refer to extra documentation for almost every command, tool and problem listed in this book. The more docs you read, the easier it will become and the faster you will leaf through manuals. Make reading documentation a habit as soon as possible. When you don't know the answer to a problem, refering to the documentation should become a second nature.
Most of what we learn is by making mistakes and by seeing how things can go wrong. These exercises are made to get you to read some error messages. The order in which you do these exercises is important.
Don't forget to use the Bash features on the command line: try to do the exercises typing as few characters as possible!
Log in again with your user name and password.
These are some exercises to help you get the feel.
A simple description of the UNIX system, also applicable to Linux, is this:
"On a UNIX system, everything is a file; if something is not a file, it is a process."
This statement is true because there are special files that are more than just files (named pipes and sockets, for instance), but to keep things simple, saying that everything is a file is an acceptable generalization. A Linux system, just like UNIX, makes no difference between a file and a directory, since a directory is just a file containing names of other files. Programs, services, texts, images, and so forth, are all files. Input and output devices, and generally all devices, are considered to be files, according to the system.
In order to manage all those files in an orderly fashion, man likes to think of them in an ordered tree-like structure on the hard disk, as we know from MS-DOS (Disk Operating System) for instance. The large branches contain more branches, and the branches at the end contain the tree's leaves or normal files. For now we will use this image of the tree, but we will find out later why this is not a fully accurate image.
Most files are just files, called regular files; they contain normal data, for example text files, executable files or programs, input for or output from a program and so on.
While it is reasonably safe to suppose that everything you encounter on a Linux system is a file, there are some exceptions.
The -l option to ls displays the file type, using the first character of each input line:
This table gives an overview of the characters determining the file type:
Table 3-1. File types in a long list
In order not to always have to perform a long listing for seeing the file type, a lot of systems by default don't issue just ls, but ls -F, which suffixes file names with one of the characters "/=*|@" to indicate the file type. To make it extra easy on the beginning user, both the -F and --color options are usually combined, see Section 18.104.22.168. We will use ls -F throughout this document for better readability.
As a user, you only need to deal directly with plain files, executable files, directories and links. The special file types are there for making your system do what you demand from it and are dealt with by system administrators and programmers.
Now, before we look at the important files and directories, we need to know more about partitions.
Most people have a vague knowledge of what partitions are, since every operating system has the ability to create or remove them. It may seem strange that Linux uses more than one partition on the same disk, even when using the standard installation procedure, so some explanation is called for.
One of the goals of having different partitions is to achieve higher data security in case of disaster. By dividing the hard disk in partitions, data can be grouped and separated. When an accident occurs, only the data in the partition that got the hit will be damaged, while the data on the other partitions will most likely survive.
This principle dates from the days when Linux didn't have journaled file systems and power failures might have lead to disaster. The use of partitions remains for security and robustness reasons, so a breach on one part of the system doesn't automatically mean that the whole computer is in danger. This is currently the most important reason for partitioning. A simple example: a user creates a script, a program or a web application that starts filling up the disk. If the disk contains only one big partition, the entire system will stop functioning if the disk is full. If the user stores the data on a separate partition, then only that (data) partition will be affected, while the system partitions and possible other data partitions keep functioning.
Mind that having a journaled file system only provides data security in case of power failure and sudden disconnection of storage devices. This does not protect your data against bad blocks and logical errors in the file system. In those cases, you should use a RAID (Redundant Array of Inexpensive Disks) solution.
There are two kinds of major partitions on a Linux system:
Most systems contain a root partition, one or more data partitions and one or more swap partitions. Systems in mixed environments may contain partitions for other system data, such as a partition with a FAT or VFAT file system for MS Windows data.
Most Linux systems use fdisk at installation time to set the partition type. As you may have noticed during the exercise from Chapter 1, this usually happens automatically. On some occasions, however, you may not be so lucky. In such cases, you will need to select the partition type manually and even manually do the actual partitioning. The standard Linux partitions have number 82 for swap and 83 for data, which can be journaled (ext3) or normal (ext2, on older systems). The fdisk utility has built-in help, should you forget these values.
Apart from these two, Linux supports a variety of other file system types, such as the relatively new Reiser file system, JFS, NFS, FATxx and many other file systems natively available on other (proprietary) operating systems.
The standard root partition (indicated with a single forward slash, /) is about 100-500 MB, and contains the system configuration files, most basic commands and server programs, system libraries, some temporary space and the home directory of the administrative user. A standard installation requires about 250 MB for the root partition.
Swap space (indicated with swap) is only accessible for the system itself, and is hidden from view during normal operation. Swap is the system that ensures, like on normal UNIX systems, that you can keep on working, whatever happens. On Linux, you will virtually never see irritating messages like Out of memory, please close some applications first and try again, because of this extra memory. The swap or virtual memory procedure has long been adopted by operating systems outside the UNIX world by now.
Using memory on a hard disk is naturally slower than using the real memory chips of a computer, but having this little extra is a great comfort. We will learn more about swap when we discuss Processes in Chapter 4.
Linux generally counts on having twice the amount of physical memory in the form of swap space on the hard disk. When installing a system, you have to know how you are going to do this. An example on a system with 512 MB of RAM:
The last option will give the best results when a lot of I/O is to be expected.
Read the software documentation for specific guidelines. Some applications, such as databases, might require more swap space. Others, such as some handheld systems, might not have any swap at all by lack of a hard disk. Swap space may also depend on your kernel version.
The kernel is on a separate partition as well in many distributions, because it is the most important file of your system. If this is the case, you will find that you also have a /boot partition, holding your kernel(s) and accompanying data files.
The rest of the hard disk(s) is generally divided in data partitions, although it may be that all of the non-system critical data resides on one partition, for example when you perform a standard workstation installation. When non-critical data is separated on different partitions, it usually happens following a set pattern:
Once the partitions are made, you can only add more. Changing sizes or properties of existing partitions is possible but not advisable.
The division of hard disks into partitions is determined by the system administrator. On larger systems, he or she may even spread one partition over several hard disks, using the appropriate software. Most distributions allow for standard setups optimized for workstations (average users) and for general server purposes, but also accept customized partitions. During the installation process you can define your own partition layout using either your distribution specific tool, which is usually a straight forward graphical interface, or fdisk, a text-based tool for creating partitions and setting their properties.
A workstation or client installation is for use by mainly one and the same person. The selected software for installation reflects this and the stress is on common user packages, such as nice desktop themes, development tools, client programs for E-mail, multimedia software, web and other services. Everything is put together on one large partition, swap space twice the amount of RAM is added and your generic workstation is complete, providing the largest amount of disk space possible for personal use, but with the disadvantage of possible data integrity loss during problem situations.
On a server, system data tends to be separate from user data. Programs that offer services are kept in a different place than the data handled by this service. Different partitions will be created on such systems:
Servers usually have more memory and thus more swap space. Certain server processes, such as databases, may require more swap space than usual; see the specific documentation for detailed information. For better performance, swap is often divided into different swap partitions.
All partitions are attached to the system via a mount point. The mount point defines the place of a particular data set in the file system. Usually, all partitions are connected through the root partition. On this partition, which is indicated with the slash (/), directories are created. These empty directories will be the starting point of the partitions that are attached to them. An example: given a partition that holds the following directories:
We want to attach this partition in the filesystem in a directory called /opt/media. In order to do this, the system administrator has to make sure that the directory /opt/media exists on the system. Preferably, it should be an empty directory. How this is done is explained later on in this chapter. Then, using the mount command, the administrator can attach the partition to the system. When you look at the content of the formerly empty directory /opt/media, it will contain the files and directories that are on the mounted medium (hard disk or partition of a hard disk, CD, DVD, flash card, USB or other storage device).
During system startup, all the partitions are thus mounted, as described in the file /etc/fstab. Some partitions are not mounted by default, for instance if they are not constantly connected to the system, such like the storage used by your digital camera. If well configured, the device will be mounted as soon as the system notices that it is connected, or it can be user-mountable, i.e. you don't need to be system administrator to attach and detach the device to and from the system. There is an example in Section 9.3.
On a running system, information about the partitions and their mount points can be displayed using the df command (which stands for disk full or disk free). In Linux, df is the GNU version, and supports the -h or human readable option which greatly improves readability. Note that commercial UNIX machines commonly have their own versions of df and many other commands. Their behavior is usually the same, though GNU versions of common tools often have more and better features.
The df command only displays information about active non-swap partitions. These can include partitions from other networked systems, like in the example below where the home directories are mounted from a file server on the network, a situation often encountered in corporate environments.
For convenience, the Linux file system is usually thought of in a tree structure. On a standard Linux system you will find the layout generally follows the scheme presented below.
This is a layout from a RedHat system. Depending on the system admin, the operating system and the mission of the UNIX machine, the structure may vary, and directories may be left out or added at will. The names are not even required; they are only a convention.
The tree of the file system starts at the trunk or slash, indicated by a forward slash (/). This directory, containing all underlying directories and files, is also called the root directory or "the root" of the file system.
Directories that are only one level below the root directory are often preceded by a slash, to indicate their position and prevent confusion with other directories that could have the same name. When starting with a new system, it is always a good idea to take a look in the root directory. Let's see what you could run into:
Table 3-2. Subdirectories of the root directory
How can you find out which partition a directory is on? Using the df command with a dot (.) as an option shows the partition the current directory belongs to, and informs about the amount of space used on this partition:
As a general rule, every directory under the root directory is on the root partition, unless it has a separate entry in the full listing from df (or df -h with no other options).
Read more in man hier.
For most users and for most common system administration tasks, it is enough to accept that files and directories are ordered in a tree-like structure. The computer, however, doesn't understand a thing about trees or tree-structures.
Every partition has its own file system. By imagining all those file systems together, we can form an idea of the tree-structure of the entire system, but it is not as simple as that. In a file system, a file is represented by an inode, a kind of serial number containing information about the actual data that makes up the file: to whom this file belongs, and where is it located on the hard disk.
Every partition has its own set of inodes; throughout a system with multiple partitions, files with the same inode number can exist.
Each inode describes a data structure on the hard disk, storing the properties of a file, including the physical location of the file data. When a hard disk is initialized to accept data storage, usually during the initial system installation process or when adding extra disks to an existing system, a fixed number of inodes per partition is created. This number will be the maximum amount of files, of all types (including directories, special files, links etc.) that can exist at the same time on the partition. We typically count on having 1 inode per 2 to 8 kilobytes of storage.
At the time a new file is created, it gets a free inode. In that inode is the following information:
The only information not included in an inode, is the file name and directory. These are stored in the special directory files. By comparing file names and inode numbers, the system can make up a tree-structure that the user understands. Users can display inode numbers using the -i option to ls. The inodes have their own separate space on the disk.
When you want the system to execute a command, you almost never have to give the full path to that command. For example, we know that the ls command is in the /bin directory (check with which -a ls), yet we don't have to enter the command /bin/ls for the computer to list the content of the current directory.
The PATH environment variable takes care of this. This variable lists those directories in the system where executable files can be found, and thus saves the user a lot of typing and memorizing locations of commands. So the path naturally contains a lot of directories containing bin somewhere in their names, as the user below demonstrates. The echo command is used to display the content ("$") of the variable PATH:
In this example, the directories /opt/local/bin, /usr/X11R6/bin, /usr/bin, /usr/sbin and /bin are subsequently searched for the required program. As soon as a match is found, the search is stopped, even if not every directory in the path has been searched. This can lead to strange situations. In the first example below, the user knows there is a program called sendsms to send an SMS message, and another user on the same system can use it, but she can't. The difference is in the configuration of the PATH variable:
Note the use of the su (switch user) facility, which allows you to run a shell in the environment of another user, on the condition that you know the user's password.
A backslash indicates the continuation of a line on the next, without an Enter separating one line from the other.
In the next example, a user wants to call on the wc (word count) command to check the number of lines in a file, but nothing happens and he has to break off his action using the Ctrl+C combination:
The use of the which command shows us that this user has a bin-directory in his home directory, containing a program that is also called wc. Since the program in his home directory is found first when searching the paths upon a call for wc, this "home-made" program is executed, with input it probably doesn't understand, so we have to stop it. To resolve this problem there are several ways (there are always several ways to solve a problem in UNIX/Linux): one answer could be to rename the user's wc program, or the user can give the full path to the exact command he wants, which can be found by using the -a to the which command:
If the user uses programs in the other directories more frequently, he can change his path to look in his own directories last:
A path, which is the way you need to follow in the tree structure to reach a given file, can be described as starting from the trunk of the tree (the / or root directory). In that case, the path starts with a slash and is called an absolute path, since there can be no mistake: only one file on the system can comply.
In the other case, the path doesn't start with a slash and confusion is possible between ~/bin/wc (in the user's home directory) and bin/wc in /usr, from the previous example. Paths that don't start with a slash are always relative.
In relative paths we also use the . and .. indications for the current and the parent directory. A couple of practical examples:
The kernel is the heart of the system. It manages the communication between the underlying hardware and the peripherals. The kernel also makes sure that processes and daemons (server processes) are started and stopped at the exact right times. The kernel has a lot of other important tasks, so many that there is a special kernel-development mailing list on this subject only, where huge amounts of information are shared. It would lead us too far to discuss the kernel in detail. For now it suffices to know that the kernel is the most important file on the system.
When I was looking for an appropriate explanation on the concept of a shell, it gave me more trouble than I expected. All kinds of definitions are available, ranging from the simple comparison that "the shell is the steering wheel of the car", to the vague definition in the Bash manual which says that "bash is an sh-compatible command language interpreter," or an even more obscure expression, "a shell manages the interaction between the system and its users". A shell is much more than that.
A shell can best be compared with a way of talking to the computer, a language. Most users do know that other language, the point-and-click language of the desktop. But in that language the computer is leading the conversation, while the user has the passive role of picking tasks from the ones presented. It is very difficult for a programmer to include all options and possible uses of a command in the GUI-format. Thus, GUIs are almost always less capable than the command or commands that form the backend.
The shell, on the other hand, is an advanced way of communicating with the system, because it allows for two-way conversation and taking initiative. Both partners in the communication are equal, so new ideas can be tested. The shell allows the user to handle a system in a very flexible way. An additional asset is that the shell allows for task automation.
Just like people know different languages and dialects, the computer knows different shell types:
The file /etc/shells gives an overview of known shells on a Linux system:
Your default shell is set in the /etc/passwd file, like this line for user mia:
To switch from one shell to another, just enter the name of the new shell in the active terminal. The system finds the directory where the name occurs using the PATH settings, and since a shell is an executable file (program), the current shell activates it and it gets executed. A new prompt is usually shown, because each shell has its typical appearance:
Your home directory is your default destination when connecting to the system. In most cases it is a subdirectory of /home, though this may vary. Your home directory may be located on the hard disk of a remote file server; in that case your home directory may be found in /nethome/your_user_name. In another case the system administrator may have opted for a less comprehensible layout and your home directory may be on /disk6/HU/07/jgillard.
Whatever the path to your home directory, you don't have to worry too much about it. The correct path to your home directory is stored in the HOME environment variable, in case some program needs it. With the echo command you can display the content of this variable:
You can do whatever you like in your home directory. You can put as many files in as many directories as you want, although the total amount of data and files is naturally limited because of the hardware and size of the partitions, and sometimes because the system administrator has applied a quota system. Limiting disk usage was common practice when hard disk space was still expensive. Nowadays, limits are almost exclusively applied in large environments. You can see for yourself if a limit is set using the quota command:
In case quotas have been set, you get a list of the limited partitions and their specific limitations. Exceeding the limits may be tolerated during a grace period with fewer or no restrictions at all. Detailed information can be found using the info quota or man quota commands.
Your home directory is indicated by a tilde (~), shorthand for /path_to_home/user_name. This same path is stored in the HOME variable, so you don't have to do anything to activate it. A simple application: switch from /var/music/albums/arno/2001 to images in your home directory using one elegant command:
Later in this chapter we will talk about the commands for managing files and directories in order to keep your home directory tidy.
As we mentioned before, most configuration files are stored in the /etc directory. Content can be viewed using the cat command, which sends text files to the standard output (usually your monitor). The syntax is straight forward:
cat file1 file2 ... fileN
In this section we try to give an overview of the most common configuration files. This is certainly not a complete list. Adding extra packages may also add extra configuration files in /etc. When reading the configuration files, you will find that they are usually quite well commented and self-explanatory. Some files also have man pages which contain extra documentation, such as man group.
Table 3-3. Most common configuration files
Throughout this guide we will learn more about these files and study some of them in detail.
Devices, generally every peripheral attachment of a PC that is not the CPU itself, is presented to the system as an entry in the /dev directory. One of the advantages of this UNIX-way of handling devices is that neither the user nor the system has to worry much about the specification of devices.
Users that are new to Linux or UNIX in general are often overwhelmed by the amount of new names and concepts they have to learn. That is why a list of common devices is included in this introduction.
Table 3-4. Common devices
In the /var directory we find a set of directories for storing specific non-constant data (as opposed to the ls program or the system configuration files, which change relatively infrequently or never at all). All files that change frequently, such as log files, mailboxes, lock files, spoolers etc. are kept in a subdirectory of /var.
As a security measure these files are usually kept in separate parts from the main system files, so we can keep a close eye on them and set stricter permissions where necessary. A lot of these files also need more permissions than usual, like /var/tmp, which needs to be writable for everyone. A lot of user activity might be expected here, which might even be generated by anonymous Internet users connected to your system. This is one reason why the /var directory, including all its subdirectories, is usually on a separate partition. This way, there is for instance no risk that a mail bomb, for instance, fills up the rest of the file system, containing more important data such as your programs and configuration files.
One of the main security systems on a UNIX system, which is naturally implemented on every Linux machine as well, is the log-keeping facility, which logs all user actions, processes, system events etc. The configuration file of the so-called syslogdaemon determines which and how long logged information will be kept. The default location of all logs is /var/log, containing different files for access log, server logs, system messages etc.
In /var we typically find server data, which is kept here to separate it from critical data such as the server program itself and its configuration files. A typical example on Linux systems is /var/www, which contains the actual HTML pages, scripts and images that a web server offers. The FTP-tree of an FTP server (data that can be downloaded by a remote client) is also best kept in one of /var's subdirectories. Because this data is publicly accessible and often changeable by anonymous users, it is safer to keep it here, away from partitions or directories with sensitive data.
On most workstation installations, /var/spool will at least contain an at and a cron directory, containing scheduled tasks. In office environments this directory usually contains lpd as well, which holds the print queue(s) and further printer configuration files, as well as the printer log files.
On server systems we will generally find /var/spool/mail, containing incoming mails for local users, sorted in one file per user, the user's "inbox". A related directory is mqueue, the spooler area for unsent mail messages. These parts of the system can be very busy on mail servers with a lot of users. News servers also use the /var/spool area because of the enormous amounts of messages they have to process.
The /var/lib/rpm directory is specific to RPM-based (RedHat Package Manager) distributions; it is where RPM package information is stored.
Besides the name of the file, ls can give a lot of other information, such as the file type, as we already discussed. It can also show permissions on a file, file size, inode number, creation date and time, owners and amount of links to the file. With the -a option to ls, files that are normally hidden from view can be displayed as well. These are files that have a name starting with a dot. A couple of typical examples include the configuration files in your home directory. When you've worked with a certain system for a while, you will notice that tens of files and directories have been created that are not automatically listed in a directory index. Next to that, every directory contains a file named just dot (.) and one with two dots (..), which are used in combination with their inode number to determine the directory's position in the file system's tree structure.
You should really read the Info pages about ls, since it is a very common command with a lot of useful options. Options can be combined, as is the case with most UNIX commands and their options. A common combination is ls -al; it shows a long list of files and their properties as well as the destinations that any symbolic links point to. ls -latr displays the same files, only now in reversed order of the last change, so that the file changed most recently occurs at the bottom of the list. Here are a couple of examples:
On most Linux versions ls is aliased to color-ls by default. This feature allows to see the file type without using any options to ls. To achieve this, every file type has its own color. The standard scheme is in /etc/DIR_COLORS:
Table 3-5. Color-ls default color scheme
More information is in the man page. The same information was in earlier days displayed using suffixes to every non-standard file name. For mono-color use (like printing a directory listing) and for general readability, this scheme is still in use:
Table 3-6. Default suffix scheme for ls
A description of the full functionality and features of the ls command can be read with info coreutils ls.
To find out more about the kind of data we are dealing with, we use the file command. By applying certain tests that check properties of a file in the file system, magic numbers and language tests, file tries to make an educated guess about the format of a file. Some examples:
The file command has a series of options, among others the -z option to look into compressed files. See info file for a detailed description. Keep in mind that the results of file are not absolute, it is only a guess. In other words, file can be tricked.
... Is not a difficult thing to do. Today almost every system is networked, so naturally files get copied from one machine to another. And especially when working in a graphical environment, creating new files is a piece of cake and is often done without the approval of the user. To illustrate the problem, here's the full content of a new user's directory, created on a standard RedHat system:
On first sight, the content of a "used" home directory doesn't look that bad either:
But when all the directories and files starting with a dot are included, there are 185 items in this directory. This is because most applications have their own directories and/or files, containing user-specific settings, in the home directory of that user. Usually these files are created the first time you start an application. In some cases you will be notified when a non-existent directory needs to be created, but most of the time everything is done automatically.
Furthermore, new files are created seemingly continuously because users want to save files, keep different versions of their work, use Internet applications, and download files and attachments to their local machine. It doesn't stop. It is clear that one definitely needs a scheme to keep an overview on things.
In the next section, we will discuss our means of keeping order. We only discuss text tools available to the shell, since the graphical tools are very intuitive and have the same look and feel as the well known point-and-click MS Windows-style file managers, including graphical help functions and other features you expect from this kind of applications. The following list is an overview of the most popular file managers for GNU/Linux. Most file managers can be started from the menu of your desktop manager, or by clicking your home directory icon, or from the command line, issuing these commands:
These applications are certainly worth giving a try and usually impress newcomers to Linux, if only because there is such a wide variety: these are only the most popular tools for managing directories and files, and many other projects are being developed. Now let's find out about the internals and see how these graphical tools use common UNIX commands.
A way of keeping things in place is to give certain files specific default locations by creating directories and subdirectories (or folders and sub-folders if you wish). This is done with the mkdir command:
Creating directories and subdirectories in one step is done using the -p option:
If the new file needs other permissions than the default file creation permissions, the new access rights can be set in one move, still using the mkdir command, see the Info pages for more. We are going to discuss access modes in the next section on File Security.
The name of a directory has to comply with the same rules as those applied on regular file names. One of the most important restrictions is that you can't have two files with the same name in one directory (but keep in mind that Linux is, like UNIX, a case sensitive operating system). There are virtually no limits on the length of a file name, but it is usually kept shorter than 80 characters, so it can fit on one line of a terminal. You can use any character you want in a file name, although it is advised to exclude characters that have a special meaning to the shell. When in doubt, check with Appendix C.
Now that we have properly structured our home directory, it is time to clean up unclassified files using the mv command:
This command is also applicable when renaming files:
It is clear that only the name of the file changes. All other properties remain the same.
Detailed information about the syntax and features of the mv command can be found in the man or Info pages. The use of this documentation should always be your first reflex when confronted with a problem. The answer to your problem is likely to be in the system documentation. Even experienced users read man pages every day, so beginning users should read them all the time. After a while, you will get to know the most common options to the common commands, but you will still need the documentation as a primary source of information. Note that the information contained in the HOWTOs, FAQs, man pages and such is slowly being merged into the Info pages, which are today the most up-to-date source of online (as in readily available on the system) documentation.
Copying files and directories is done with the cp command. A useful option is recursive copy (copy all underlying files and subdirectories), using the -R option to cp. The general syntax is
cp [-R] fromfile tofile
As an example the case of user newguy, who wants the same Gnome desktop settings user oldguy has. One way to solve the problem is to copy the settings of oldguy to the home directory of newguy:
This gives some errors involving file permissions, but all the errors have to do with private files that newguy doesn't need anyway. We will discuss in the next part how to change these permissions in case they really are a problem.
Use the rm command to remove single files, rmdir to remove empty directories. (Use ls -a to check whether a directory is empty or not). The rm command also has options for removing non-empty directories with all their subdirectories, read the Info pages for these rather dangerous options.
On Linux, just like on UNIX, there is no garbage can - at least not for the shell, although there are plenty of solutions for graphical use. So once removed, a file is really gone, and there is generally no way to get it back unless you have backups, or you are really fast and have a real good system administrator. To protect the beginning user from this malice, the interactive behavior of the rm, cp and mv commands can be activated using the -i option. In that case the system won't immediately act upon request. Instead it will ask for confirmation, so it takes an additional click on the Enter key to inflict the damage:
We will discuss how to make this option the default in Chapter 7, which discusses customizing your shell environment.
In the example on moving files we already saw how the shell can manipulate multiple files at once. In that example, the shell finds out automatically what the user means by the requirements between the square braces "[" and "]". The shell can substitute ranges of numbers and upper or lower case characters alike. It also substitutes as many characters as you want with an asterisk, and only one character with a question mark.
All sorts of substitutions can be used simultaneously; the shell is very logical about it. The Bash shell, for instance, has no problem with expressions like ls dirname/*/*/*[2-3].
In other shells, the asterisk is commonly used to minimize the efforts of typing: people would enter cd dir* instead of cd directory. In Bash however, this is not necessary because the GNU shell has a feature called file name completion. It means that you can type the first few characters of a command (anywhere) or a file (in the current directory) and if no confusion is possible, the shell will find out what you mean. For example in a directory containing many files, you can check if there are any files beginning with the letter A just by typing ls A and pressing the Tab key twice, rather than pressing Enter. If there is only one file starting with "A", this file will be shown as the argument to ls (or any shell command, for that matter) immediately.
A very simple way of looking up files is using the which command, to look in the directories listed in the user's search path for the required file. Of course, since the search path contains only paths to directories containing executable programs, which doesn't work for ordinary files. The which command is useful when troubleshooting "Command not Found" problems. In the example below, user tina can't use the acroread program, while her colleague has no troubles whatsoever on the same system. The problem is similar to the PATH problem in the previous part: Tina's colleague tells her that he can see the required program in /opt/acroread/bin, but this directory is not in her path:
The problem can be solved by giving the full path to the command to run, or by re-exporting the content of the PATH variable:
Using the which command also checks to see if a command is an alias for another command:
These are the real tools, used when searching other paths beside those listed in the search path. The find tool, known from UNIX, is very powerful, which may be the cause of a somewhat more difficult syntax. GNU find, however, deals with the syntax problems. This command not only allows you to search file names, it can also accept file size, date of last change and other file properties as criteria for a search. The most common use is for finding file names:
find <path> -name <searchstring>
This can be interpreted as "Look in all files and subdirectories contained in a given path, and print the names of the files containing the search string in their name" (not in their content).
Another application of find is for searching files of a certain size, as in the example below, where user peter wants to find all files in the current directory or one of its subdirectories, that are bigger than 5 MB:
If you dig in the man pages, you will see that find can also perform operations on the found files. A common example is removing files. It is best to first test without the -exec option that the correct files are selected, after that the command can be rerun to delete the selected files. Below, we search for files ending in .tmp:
Later on (in 1999 according to the man pages, after 20 years of find), locate was developed. This program is easier to use, but more restricted than find, since its output is based on a file index database that is updated only once every day. On the other hand, a search in the locate database uses less resources than find and therefore shows the results nearly instantly.
Most Linux distributions use slocate these days, security enhanced locate, the modern version of locate that prevents users from getting output they have no right to read. The files in root's home directory are such an example, these are not normally accessible to the public. A user who wants to find someone who knows about the C-shell may issue the command locate .cshrc, to display all users who have a customized configuration file for the C shell. Supposing the users root and jenny are running C shell, then only the file /home/jenny/.cshrc will be displayed, and not the one in root's home directory. On most systems, locate is a symbolic link to the slocate program:
User tina could have used locate to find the application she wanted:
Directories that don't contain the name bin can't contain the program - they don't contain executable files. There are three possibilities left. The file in /usr/local/bin is the one tina would have wanted: it is a link to the shell script that starts the actual program:
In order to keep the path as short as possible, so the system doesn't have to search too long every time a user wants to execute a command, we add /usr/local/bin to the path and not the other directories, which only contain the binary files of one specific program, while /usr/local/bin contains other useful programs as well.
Again, a description of the full features of find and locate can be found in the Info pages.
A simple but powerful program, grep is used for filtering input lines and returning certain patterns to the output. There are literally thousands of applications for the grep program. In the example below, jerry uses grep to see how he did the thing with find:
All UNIXes with just a little bit of decency have an online dictionary. So does Linux. The dictionary is a list of known words in a file named words, located in /usr/share/dict. To quickly check the correct spelling of a word, no graphical application is needed:
Who is the owner of that home directory next to mine? Hey, there's his telephone number!
And what was the E-mail address of Arno again?
find and locate are often used in combination with grep to define some serious queries. For more information, see Chapter 5 on I/O redirection.
Characters that have a special meaning to the shell have to be escaped. The escape character in Bash is backslash, as in most shells; this takes away the special meaning of the following character. The shell knows about quite some special characters, among the most common /, ., ? and *. A full list can be found in the Info pages and documentation for your shell.
For instance, say that you want to display the file "*" instead of all the files in a directory, you would have to use
The same goes for filenames containing a space:
cat This\ File
Apart from cat, which really doesn't do much more than sending files to the standard output, there are other tools to view file content.
The easiest way of course would be to use graphical tools instead of command line tools. In the introduction we already saw a glimpse of an office application, OpenOffice. Other examples are the GIMP (start up with gimp from the command line), the GNU Image Manipulation Program; xpdf to view Portable Document Format files (PDF); GhostView (gv) for viewing PostScript files; Mozilla/FireFox, links (a text mode browser), Konqueror, Opera and many others for web content; XMMS, CDplay and others for multimedia file content; AbiWord, Gnumeric, KOffice etc. for all kinds of office applications and so on. There are thousands of Linux applications; to list them all would take days.
Instead we keep concentrating on shell- or text-mode applications, which form the basics for all other applications. These commands work best in a text environment on files containing text. When in doubt, check first using the file command.
So let's see what text tools we have that are useful to look inside files.
Undoubtedly you will hear someone say this phrase sooner or later when working in a UNIX environment. A little bit of UNIX history explains this:
You already know about pagers by now, because they are used for viewing the man pages.
These two commands display the n first/last lines of a file respectively. To see the last ten commands entered:
head works similarly. The tail command has a handy feature to continuously show the last n lines of a file that changes all the time. This -f option is often used by system administrators to check on log files. More information is located in the system documentation files.
Since we know more about files and their representation in the file system, understanding links (or shortcuts) is a piece of cake. A link is nothing more than a way of matching two or more file names to the same set of file data. There are two ways to achieve this:
The two link types behave similar, but are not the same, as illustrated in the scheme below:
Note that removing the target file for a symbolic link makes the link useless.
Each regular file is in principle a hardlink. Hardlinks can not span across partitions, since they refer to inodes, and inode numbers are only unique within a given partition.
It may be argued that there is a third kind of link, the user-space link, which is similar to a shortcut in MS Windows. These are files containing meta-data which can only be interpreted by the graphical file manager. To the kernel and the shell these are just normal files. They may end in a .desktop or .lnk suffix; an example can be found in ~/.gnome-desktop:
This example is from a KDE desktop:
Creating this kind of link is easy enough using the features of your graphical environment. Should you need help, your system documentation should be your first resort.
In the next section, we will study the creation of UNIX-style symbolic links using the command line.
The symbolic link is particularly interesting for beginning users: they are fairly obvious to see and you don't need to worry about partitions.
The command to make links is ln. In order to create symlinks, you need to use the -s option:
ln -s targetfile linkname
In the example below, user freddy creates a link in a subdirectory of his home directory to a directory on another part of the system:
Symbolic links are always very small files, while hard links have the same size as the original file.
The application of symbolic links is widespread. They are often used to save disk space, to make a copy of a file in order to satisfy installation requirements of a new program that expects the file to be in another location, they are used to fix scripts that suddenly have to run in a new environment and can generally save a lot of work. A system admin may decide to move the home directories of the users to a new location, disk2 for instance, but if he wants everything to work like before, like the /etc/passwd file, with a minimum of effort he will create a symlink from /home to the new location /disk2/home.
The Linux security model is based on the one used on UNIX systems, and is as rigid as the UNIX security model (and sometimes even more), which is already quite robust. On a Linux system, every file is owned by a user and a group user. There is also a third category of users, those that are not the user owner and don't belong to the group owning the file. For each category of users, read, write and execute permissions can be granted or denied.
We already used the long option to list files using the ls -l command, though for other reasons. This command also displays file permissions for these three user categories; they are indicated by the nine characters that follow the first character, which is the file type indicator at the beginning of the file properties line. As seen in the examples below, the first three characters in this series of nine display access rights for the actual user that owns the file. The next three are for the group owner of the file, the last three for other users. The permissions are always in the same order: read, write, execute for the user, the group and the others. Some examples:
The first file is a regular file (first dash). Users with user name marise or users belonging to the group users can read and write (change/move/delete) the file, but they can't execute it (second and third dash). All other users are only allowed to read this file, but they can't write or execute it (fourth and fifth dash).
The second example is an executable file, the difference: everybody can run this program, but you need to be root to change it.
The Info pages explain how the ls command handles display of access rights in detail, see the section What information is listed.
For easy use with commands, both access rights or modes and user groups have a code. See the tables below.
Table 3-7. Access mode codes
This straight forward scheme is applied very strictly, which allows a high level of security even without network security. Among other functions, the security scheme takes care of user access to programs, it can serve files on a need-to-know basis and protect sensitive data such as home directories and system configuration files.
You should know what your user name is. If you don't, it can be displayed using the id command, which also displays the default group you belong to and eventually other groups of which you are a member:
Your user name is also stored in the environment variable USER:
A normal consequence of applying strict file permissions, and sometimes a nuisance, is that access rights will need to be changed for all kinds of reasons. We use the chmod command to do this, and eventually to chmod has become an almost acceptable English verb, meaning the changing of the access mode of a file. The chmod command can be used with alphanumeric or numeric options, whatever you like best.
The example below uses alphanumeric options in order to solve a problem that commonly occurs with new users:
The + and - operators are used to grant or deny a given right to a given group. Combinations separated by commas are allowed. The Info and man pages contain useful examples. Here's another one, which makes the file from the previous example a private file to user asim:
The kind of problem resulting in an error message saying that permission is denied somewhere is usually a problem with access rights in most cases. Also, comments like, "It worked yesterday," and "When I run this as root it works," are most likely caused by the wrong file permissions.
When using chmod with numeric arguments, the values for each granted access right have to be counted together per group. Thus we get a 3-digit number, which is the symbolic value for the settings chmod has to make. The following table lists the most common combinations:
Table 3-9. File protection with chmod
If you enter a number with less than three digits as an argument to chmod, omitted characters are replaced with zeros starting from the left. There is actually a fourth digit on Linux systems, that precedes the first three and sets special access modes. Everything about these and many more are located in the Info pages.
When you type id on the command line, you get a list of all the groups that you can possibly belong to, preceded by your user name and ID and the group name and ID that you are currently connected with. However, on many Linux systems you can only be actively logged in to one group at the time. By default, this active or primary group is the one that you get assigned from the /etc/passwd file. The fourth field of this file holds users' primary group ID, which is looked up in the /etc/group file. An example:
The fourth field in the line from /etc/passwd contains the value "501", which represents the group asim in the above example. From /etc/group we can get the name matching this group ID. When initially connecting to the system, this is the group that asim will belong to.
Apart from his own private group, user asim can also be in the groups users and web. Because these are secondary groups to this user, he will need to use the newgrp to log into any of these groups. In the example, asim needs to create files that are owned by the group web.
When asim creates new files now, they will be in group ownership of the group web instead of being owned by the group asim:
Logging in to a new group prevents you from having to use chown (see Section 22.214.171.124) or calling your system administrator to change ownerships for you.
See the manpage for newgrp for more information.
When a new file is saved somewhere, it is first subjected to the standard security procedure. Files without permissions don't exist on Linux. The standard file permission is determined by the mask for new file creation. The value of this mask can be displayed using the umask command:
Instead of adding the symbolic values to each other, as with chmod, for calculating the permission on a new file they need to be subtracted from the total possible access rights. In the example above, however, we see 4 values displayed, yet there are only 3 permission categories: user, group and other. The first zero is part of the special file attributes settings, which we will discuss in Section 126.96.36.199 and Section 4.1.6. It might just as well be that this first zero is not displayed on your system when entering the umask command, and that you only see 3 numbers representing the default file creation mask.
Each UNIX-like system has a system function for creating new files, which is called each time a user uses a program that creates new files, for instance, when downloading a file from the Internet, when saving a new text document and so on. This function creates both new files and new directories. Full read, write and execute permission is granted to everybody when creating a new directory. When creating a new file, this function will grant read and write permissions for everybody, but set execute permissions to none for all user categories. This, before the mask is applied, a directory has permissions 777 or rwxrwxrwx, a plain file 666 or rw-rw-rw-.
The umask value is subtracted from these default permissions after the function has created the new file or directory. Thus, a directory will have permissions of 775 by default, a file 664, if the mask value is (0)002. This is demonstrated in the example below:
If you log in to another group using the newgrp command, the mask remains unchanged. Thus, if it is set to 002, files and directories that you create while being in the new group will also be accessible to the other members of that group; you don't have to use chmod.
The root user usually has stricter default file creation permissions:
These defaults are set system-wide in the shell resource configuration files, for instance /etc/bashrc or /etc/profile. You can change them in your own shell configuration file, see Chapter 7 on customizing your shell environment.
When a file is owned by the wrong user or group, the error can be repaired with the chown (change owner) and chgrp (change group) commands. Changing file ownership is a frequent system administrative task in environments where files need to be shared in a group. Both commands are very flexible, as you can find out by using the --help option.
The chown command can be applied to change both user and group ownership of a file, while chgrp only changes group ownership. Of course the system will check if the user issuing one of these commands has sufficient permissions on the file(s) she wants to change.
In order to only change the user ownership of a file, use this syntax:
chown newuser file
If you use a colon after the user name (see the Info pages), group ownership will be changed as well, to the primary group of the user issuing the command. On a Linux system, each user has his own group, so this form can be used to make files private:
If jacky would like to share this file, without having to give everybody permission to write it, he can use the chgrp command:
This way, users in the group project will be able to work on this file. Users not in this group have no business with it at all.
Both chown and chgrp can be used to change ownership recursively, using the -R option. In that case, all underlying files and subdirectories of a given directory will belong to the given user and/or group.
For the system admin to not be bothered solving permission problems all the time, special access rights can be given to entire directories, or to separate programs. There are three special modes:
On UNIX, as on Linux, all entities are in some way or another presented to the system as files with the appropriate file properties. Use of (predefined) paths allows the users and the system admin to find, read and manipulate files.
We've made our first steps toward becoming an expert: we discussed the real and the fake structure of the file system, and we know about the Linux file security model, as well as several other security precautions that are taken on every system by default.
The shell is the most important tool for interaction with the system. We learned several shell commands in this chapter, which are listed in the table below.
Table 3-10. New commands
We also stressed the fact that you should READ THE MAN PAGES. This documentation is your first-aid kit and contains the answers to many questions. The above list contains the basic commands that you will use on a daily basis, but they can do much more than the tasks we've discussed here. Reading the documentation will give you the control you need.
Last but not least, a handy overview of file permissions:
Just login with your common user ID.
Now that we are more used to our environment and we are able to communicate a little bit with our system, it is time to study the processes we can start in more detail. Not every command starts a single process. Some commands initiate a series of processes, such as mozilla; others, like ls, are executed as a single command.
Furthermore, Linux is based on UNIX, where it has been common policy to have multiple users running multiple commands, at the same time and on the same system. It is obvious that measures have to be taken to have the CPU manage all these processes, and that functionality has to be provided so users can switch between processes. In some cases, processes will have to continue to run even when the user who started them logs out. And users need a means to reactivate interrupted processes.
We will explain the structure of Linux processes in the next sections.
Interactive processes are initialized and controlled through a terminal session. In other words, there has to be someone connected to the system to start these processes; they are not started automatically as part of the system functions. These processes can run in the foreground, occupying the terminal that started the program, and you can't start other applications as long as this process is running in the foreground. Alternatively, they can run in the background, so that the terminal in which you started the program can accept new commands while the program is running. Until now, we mainly focussed on programs running in the foreground - the length of time taken to run them was too short to notice - but viewing a file with the less command is a good example of a command occupying the terminal session. In this case, the activated program is waiting for you to do something. The program is still connected to the terminal from where it was started, and the terminal is only useful for entering commands this program can understand. Other commands will just result in errors or unresponsiveness of the system.
While a process runs in the background, however, the user is not prevented from doing other things in the terminal in which he started the program, while it is running.
The shell offers a feature called job control which allows easy handling of multiple processes. This mechanism switches processes between the foreground and the background. Using this system, programs can also be started in the background immediately.
Running a process in the background is only useful for programs that don't need user input (via the shell). Putting a job in the background is typically done when execution of a job is expected to take a long time. In order to free the issuing terminal after entering the command, a trailing ampersand is added. In the example, using graphical mode, we open an extra terminal window from the existing one:
The full job control features are explained in detail in the bash Info pages, so only the frequently used job control applications are listed here:
Table 4-1. Controlling processes
More practical examples can be found in the exercises.
Most UNIX systems are likely to be able to run screen, which is useful when you actually want another shell to execute commands. Upon calling screen, a new session is created with an accompanying shell and/or commands as specified, which you can then put out of the way. In this new session you may do whatever it is you want to do. All programs and operations will run independent of the issuing shell. You can then detach this session, while the programs you started in it continue to run, even when you log out of the originating shell, and pick your screen up again any time you like.
This program originates from a time when virtual consoles were not invented yet, and everything needed to be done using one text terminal. To addicts, it still has meaning in Linux, even though we've had virtual consoles for almost ten years.
Automatic or batch processes are not connected to a terminal. Rather, these are tasks that can be queued into a spooler area, where they wait to be executed on a FIFO (first-in, first-out) basis. Such tasks can be executed using one of two criteria:
Daemons are server processes that run continuously. Most of the time, they are initialized at system startup and then wait in the background until their service is required. A typical example is the networking daemon, xinetd, which is started in almost every boot procedure. After the system is booted, the network daemon just sits and waits until a client program, such as an FTP client, needs to connect.
A process has a series of characteristics:
The ps command is one of the tools for visualizing processes. This command has several options which can be combined to display different process attributes.
With no options specified, ps only gives information about the current shell and eventual processes:
Since this does not give enough information - generally, at least a hundred processes are running on your system - we will usually select particular processes out of the list of all processes, using the grep command in a pipe, see Section 188.8.131.52, as in this line, which will select and display all processes owned by a particular user:
ps -ef | grep username
This example shows all processes with a process name of bash, the most common login shell on Linux systems:
In these cases, the grep command finding lines containing the string bash is often displayed as well on systems that have a lot of idletime. If you don't want this to happen, use the pgrep command.
Bash shells are a special case: this process list also shows which ones are login shells (where you have to give your username and password, such as when you log in in textmode or do a remote login, as opposed to non-login shells, started up for instance by clicking a terminal window icon). Such login shells are preceded with a dash (-).
More info can be found the usual way: ps --help or man ps. GNU ps supports different styles of option formats; the above examples don't contain errors.
Note that ps only gives a momentary state of the active processes, it is a one-time recording. The top program displays a more precise view by updating the results given by ps (with a bunch of options) once every five seconds, generating a new list of the processes causing the heaviest load periodically, meanwhile integrating more information about the swap space in use and the state of the CPU, from the proc file system:
The first line of top contains the same information displayed by the uptime command:
The data for these programs is stored among others in /var/run/utmp (information about currently connected users) and in the virtual file system /proc, for example /proc/loadavg (average load information). There are all sorts of graphical applications to view this data, such as the Gnome System Monitor and lavaps. Over at FreshMeat and SourceForge you will find tens of applications that centralize this information along with other server data and logs from multiple servers on one (web) server, allowing monitoring of the entire IT infrastructure from one workstation.
The relations between processes can be visualized using the pstree command:
The -u and -a options give additional information. For more options and what they do, refer to the Info pages.
In the next section, we will see how one process can create another.
A new process is created because an existing process makes an exact copy of itself. This child process has the same environment as its parent, only the process ID number is different. This procedure is called forking.
After the forking process, the address space of the child process is overwritten with the new process data. This is done through an exec call to the system.
The fork-and-exec mechanism thus switches an old command with a new, while the environment in which the new program is executed remains the same, including configuration of input and output devices, environment variables and priority. This mechanism is used to create all UNIX processes, so it also applies to the Linux operating system. Even the first process, init, with process ID 1, is forked during the boot procedure in the so-called bootstrapping procedure.
This scheme illustrates the fork-and-exec mechanism. The process ID changes after the fork procedure:
There are a couple of cases in which init becomes the parent of a process, while the process was not started by init, as we already saw in the pstree example. Many programs, for instance, daemonize their child processes, so they can keep on running when the parent stops or is being stopped. A window manager is a typical example; it starts an xterm process that generates a shell that accepts commands. The window manager then denies any further responsibility and passes the child process to init. Using this mechanism, it is possible to change window managers without interrupting running applications.
Every now and then things go wrong, even in good families. In an exceptional case, a process might finish while the parent does not wait for the completion of this process. Such an unburied process is called a zombie process.
When a process ends normally (it is not killed or otherwise unexpectedly interrupted), the program returns its exit status to the parent. This exit status is a number returned by the program providing the results of the program's execution. The system of returning information upon executing a job has its origin in the C programming language in which UNIX has been written.
The return codes can then be interpreted by the parent, or in scripts. The values of the return codes are program-specific. This information can usually be found in the man pages of the specified program, for example the grep command returns -1 if no matches are found, upon which a message on the lines of "No files found" can be printed. Another example is the Bash Builtin command true, which does nothing except return an exit status of 0, meaning success.
Processes end because they receive a signal. There are multiple signals that you can send to a process. Use the kill command to send a signal to a process. The command kill -l shows a list of signals. Most signals are for internal use by the system, or for programmers when they write code. As a user, you will need the following signals:
Table 4-2. Common signals
You can read more about default actions that are taken when sending a signal to a process in man 7 signal.
As promised in the previous chapter, we will now discuss the special modes SUID and SGID in more detail. These modes exist to provide normal users the ability to execute tasks they would normally not be able to do because of the tight file permission scheme used on UNIX based systems. In the ideal situation special modes are used as sparsely as possible, since they include security risks. Linux developers have generally tried to avoid them as much as possible. The Linux ps version, for example, uses the information stored in the /proc file system, which is accessible to everyone, thus avoiding exposition of sensitive system data and resources to the general public. Before that, and still on older UNIX systems, the ps program needed access to files such as /dev/mem and /dev/kmem, which had disadvantages because of the permissions and ownerships on these files:
With older versions of ps, it was not possible to start the program as a common user, unless special modes were applied to it.
While we generally try to avoid applying any special modes, it is sometimes necessary to use an SUID. An example is the mechanism for changing passwords. Of course users will want to do this themselves instead of having their password set by the system administrator. As we know, user names and passwords are listed in the /etc/passwd file, which has these access permissions and owners:
Still, users need to be able to change their own information in this file. This is achieved by giving the passwd program special permissions:
When called, the passwd command will run using the access permissions of root, thus enabling a common user to edit the password file which is owned by the system admin.
SGID modes on a file don't occur nearly as frequently as SUID, because SGID often involves the creation of extra groups. In some cases, however, we have to go through this trouble in order to build an elegant solution (don't worry about this too much - the necessary groups are usually created upon installation). This is the case for the write and wall programs, which are used to send messages to other users' terminals (ttys). The write command writes a message to a single user, while wall writes to all connected users.
Sending text to another user's terminal or graphical display is normally not allowed. In order to bypass this problem, a group has been created, which owns all terminal devices. When the write and wall commands are granted SGID permissions, the commands will run using the access rights as applicable to this group, tty in the example. Since this group has write access to the destination terminal, also a user having no permissions to use that terminal in any way can send messages to it.
In the example below, user joe first finds out on which terminal his correspondent is connected, using the who command. Then he sends her a message using the write command. Also illustrated are the access rights on the write program and on the terminals occupied by the receiving user: it is clear that others than the user owner have no permissions on the device, except for the group owner, which can write to it.
User jenny gets this on her screen:
After receiving a message, the terminal can be cleared using the Ctrl+L key combination. In order to receive no messages at all (except from the system administrator), use the mesg command. To see which connected users accept messages from others use who -w. All features are fully explained in the Info pages of each command.
One of the most powerful aspects of Linux concerns its open method of starting and stopping the operating system, where it loads specified programs using their particular configurations, permits you to change those configurations to control the boot process, and shuts down in a graceful and organized way.
Beyond the question of controlling the boot or shutdown process, the open nature of Linux makes it much easier to determine the exact source of most problems associated with starting up or shutting down your system. A basic understanding of this process is quite beneficial to everybody who uses a Linux system.
A lot of Linux systems use lilo, the LInux LOader for booting operating systems. We will only discuss GRUB, however, which is easier to use and more flexible. Should you need information about lilo, refer to the man pages and HOWTOs. Both systems support dual boot installations, we refer to the HOWTOs on this subject for practical examples and background information.
When an x86 computer is booted, the processor looks at the end of the system memory for the BIOS (Basic Input/Output System) and runs it. The BIOS program is written into permanent read-only memory and is always available for use. The BIOS provides the lowest level interface to peripheral devices and controls the first step of the boot process.
The BIOS tests the system, looks for and checks peripherals, and then looks for a drive to use to boot the system. Usually it checks the floppy drive (or CD-ROM drive on many newer systems) for bootable media, if present, and then it looks to the hard drive. The order of the drives used for booting is usually controlled by a particular BIOS setting on the system. Once Linux is installed on the hard drive of a system, the BIOS looks for a Master Boot Record (MBR) starting at the first sector on the first hard drive, loads its contents into memory, then passes control to it.
This MBR contains instructions on how to load the GRUB (or LILO) boot-loader, using a pre-selected operating system. The MBR then loads the boot-loader, which takes over the process (if the boot-loader is installed in the MBR). In the default Red Hat Linux configuration, GRUB uses the settings in the MBR to display boot options in a menu. Once GRUB has received the correct instructions for the operating system to start, either from its command line or configuration file, it finds the necessary boot file and hands off control of the machine to that operating system.
This boot method is called direct loading because instructions are used to directly load the operating system, with no intermediary code between the boot-loaders and the operating system's main files (such as the kernel). The boot process used by other operating systems may differ slightly from the above, however. For example, Microsoft's DOS and Windows operating systems completely overwrite anything on the MBR when they are installed without incorporating any of the current MBR's configuration. This destroys any other information stored in the MBR by other operating systems, such as Linux. The Microsoft operating systems, as well as various other proprietary operating systems, are loaded using a chain loading boot method. With this method, the MBR points to the first sector of the partition holding the operating system, where it finds the special files necessary to actually boot that operating system.
GRUB supports both boot methods, allowing you to use it with almost any operating system, most popular file systems, and almost any hard disk your BIOS can recognize.
GRUB contains a number of other features; the most important include:
The kernel, once it is loaded, finds init in sbin and executes it.
When init starts, it becomes the parent or grandparent of all of the processes that start up automatically on your Linux system. The first thing init does, is reading its initialization file, /etc/inittab. This instructs init to read an initial configuration script for the environment, which sets the path, starts swapping, checks the file systems, and so on. Basically, this step takes care of everything that your system needs to have done at system initialization: setting the clock, initializing serial ports and so forth.
Then init continues to read the /etc/inittab file, which describes how the system should be set up in each run level and sets the default run level. A run level is a configuration of processes. All UNIX-like systems can be run in different process configurations, such as the single user mode, which is referred to as run level 1 or run level S (or s). In this mode, only the system administrator can connect to the system. It is used to perform maintenance tasks without risks of damaging the system or user data. Naturally, in this configuration we don't need to offer user services, so they will all be disabled. Another run level is the reboot run level, or run level 6, which shuts down all running services according to the appropriate procedures and then restarts the system.
Commonly, run level 3 is configured to be text mode on a Linux machine, and run level 5 initializes the graphical login and environment. More about run levels in the next section, see Section 4.2.5.
After having determined the default run level for your system, init starts all of the background processes necessary for the system to run by looking in the appropriate rc directory for that run level. init runs each of the kill scripts (their file names start with a K) with a stop parameter. It then runs all of the start scripts (their file names start with an S) in the appropriate run level directory so that all services and applications are started correctly. In fact, you can execute these same scripts manually after the system is finished booting with a command like /etc/init.d/httpd stop or service httpd stop logged in as root, in this case stopping the web server.
None of the scripts that actually start and stop the services are located in /etc/rc<x>.d. Rather, all of the files in /etc/rc<x>.d are symbolic links that point to the actual scripts located in /etc/init.d. A symbolic link is nothing more than a file that points to another file, and is used in this case because it can be created and deleted without affecting the actual scripts that kill or start the services. The symbolic links to the various scripts are numbered in a particular order so that they start in that order. You can change the order in which the services start up or are killed by changing the name of the symbolic link that refers to the script that actually controls the service. You can use the same number multiple times if you want a particular service started or stopped right before or after another service, as in the example below, listing the content of /etc/rc5.d, where crond and xfs are both started from a linkname starting with "S90". In this case, the scripts are started in alphabetical order.
After init has progressed through the run levels to get to the default run level, the /etc/inittab script forks a getty process for each virtual console (login prompt in text mode). getty opens tty lines, sets their modes, prints the login prompt, gets the user's name, and then initiates a login process for that user. This allows users to authenticate themselves to the system and use it. By default, most systems offer 6 virtual consoles, but as you can see from the inittab file, this is configurable.
/etc/inittab can also tell init how it should handle a user pressing Ctrl+Alt+Delete at the console. As the system should be properly shut down and restarted rather than immediately power-cycled, init is told to execute the command /sbin/shutdown -t3 -r now, for instance, when a user hits those keys. In addition, /etc/inittab states what init should do in case of power failures, if your system has a UPS unit attached to it.
On most RPM-based systems the graphical login screen is started in run level 5, where /etc/inittab runs a script called /etc/X11/prefdm. The prefdm script runs the preferred X display manager, based on the contents of the /etc/sysconfig/desktop directory. This is typically gdm if you run GNOME or kdm if you run KDE, but they can be mixed, and there's also the xdm that comes with a standard X installation.
But there are other possibilities as well. On Debian, for instance, There is an initscript for each of the display managers, and the content of the /etc/X11/default-display-manager is used to determine which one to start. More about the graphical interface can be read in Section 7.3. Ultimately, your system documentation will explain the details about the higher level aspects of init.
The /etc/default and/or /etc/sysconfig directories contain entries for a range of functions and services, these are all read at boot time. The location of the directory containing system defaults might be somewhat different depending on your Linux distribution.
Besides the graphical user environment, a lot of other services may be started as well. But if all goes well, you should be looking at a login prompt or login screen when the boot process has finished.
The idea behind operating different services at different run levels essentially revolves around the fact that different systems can be used in different ways. Some services cannot be used until the system is in a particular state, or mode, such as being ready for more than one user or having networking available.
There are times in which you may want to operate the system in a lower mode. Examples are fixing disk corruption problems in run level 1 so no other users can possibly be on the system, or leaving a server in run level 3 without an X session running. In these cases, running services that depend upon a higher system mode to function does not make sense because they will not work correctly anyway. By already having each service assigned to start when its particular run level is reached, you ensure an orderly start up process, and you can quickly change the mode of the machine without worrying about which services to manually start or stop.
Available run levels are generally described in /etc/inittab, which is partially shown below:
Feel free to configure runlevels 2 and 4 as you see fit. Many users configure those runlevels in a way that makes the most sense for them while leaving the standard runlevels 3 and 5 alone. This allows them to quickly move in and out of their custom configuration without disturbing the normal set of features at the standard runlevels.
If your machine gets into a state where it will not boot due to a bad /etc/inittab or will not let you log in because you have a corrupted /etc/passwd file (or if you have simply forgotten your password), boot into single-user mode.
The discussion of run levels, scripts and configurations in this guide tries to be as general as possible. Lots of variations exist. For instance, Gentoo Linux stores scripts in /etc/runlevels. Other systems might first run through (a) lower runlevel(s) and execute all the scripts in there before arriving at the final runlevel and executing those scripts. Refer to your system documentation for more information.
The chkconfig or update-rc.d utilities, when installed on your system, provide a simple command-line tool for maintaining the /etc/init.d directory hierarchy. These relieve system administrators from having to directly manipulate the numerous symbolic links in the directories under /etc/rc[x].d.
In addition, some systems offer the ntsysv tool, which provides a text-based interface; you may find this easier to use than chkconfig's command-line interface. On SuSE Linux, you will find the yast and insserv tools. For Mandrake easy configuration, you may want to try DrakConf, which allows among other features switching between run levels 3 and 5. In Mandriva this became the Mandriva Linux COntrol Center.
Most distributions provide a graphical user interface for configuring processes, check with your system documentation.
All of these utilities must be run as root. The system administrator may also manually create the appropriate links in each run level directory in order to start or stop a service in a certain run level.
UNIX was not made to be shut down, but if you really must, use the shutdown command. After completing the shutdown procedure, the -h option will halt the system, while -r will reboot it.
The reboot and halt commands are now able to invoke shutdown if run when the system is in runlevels 1-5, and thus ensure proper shutdown of the system,but it is a bad habit to get into, as not all UNIX/Linux versions have this feature.
If your computer does not power itself down, you should not turn off the computer until you see a message indicating that the system is halted or finished shutting down, in order to give the system the time to unmount all partitions. Being impatient may cause data loss.
While managing system resources, including processes, is a task for the local system administrator, it doesn't hurt a common user to know something about it, especially where his or her own processes and their optimal execution are concerned.
We will explain a little bit on a theoretical level about system performance, though not as far as hardware optimization and such. Instead, we will study the daily problems a common user is confronted with, and actions such a user can take to optimally use the resources available. As we learn in the next section, this is mainly a matter of thinking before acting.
Bash offers a built-in time command that displays how long a command takes to execute. The timing is highly accurate and can be used on any command. In the example below, it takes about a minute and a half to make this book:
The GNU time command in /usr/bin (as opposed to the shell built-in version) displays more information that can be formatted in different ways. It also shows the exit status of the command, and the total elapsed time. The same command as the above using the independent time gives this output:
Refer again to the Info pages for all the information.
To a user, performance means quick execution of commands. To a system manager, on the other hand, it means much more: the system admin has to optimize system performance for the whole system, including users, all programs and daemons. System performance can depend on a thousand tiny things which are not accounted for with the time command:
In short: the load depends on what is normal for your system. My old P133 running a firewall, SSH server, file server, a route daemon, a sendmail server, a proxy server and some other services doesn't complain with 7 users connected; the load is still 0 on average. Some (multi-CPU) systems I've seen were quite happy with a load of 67. There is only one way to find out - check the load regularly if you want to know what's normal. If you don't, you will only be able to measure system load from the response time of the command line, which is a very rough measurement since this speed is influenced by a hundred other factors.
Keep in mind that different systems will behave different with the same load average. For example, a system with a graphics card supporting hardware acceleration will have no problem rendering 3D images, while the same system with a cheap VGA card will slow down tremendously while rendering. My old P133 will become quite uncomfortable when I start the X server, but on a modern system you hardly notice the difference in the system load.
A big environment can slow you down. If you have lots of environment variables set (instead of shell variables), long search paths that are not optimized (errors in setting the path environment variable) and such, the system will need more time to search and read data.
In X, window managers and desktop environments can be real CPU-eaters. A really fancy desktop comes with a price, even when you can download it for free, since most desktops provide add-ons ad infinitum. Modesty is a virtue if you don't buy a new computer every year.
The priority or importance of a job is defined by it's nice number. A program with a high nice number is friendly to other programs, other users and the system; it is not an important job. The lower the nice number, the more important a job is and the more resources it will take without sharing them.
Making a job nicer by increasing its nice number is only useful for processes that use a lot of CPU time (compilers, math applications and such). Processes that always use a lot of I/O time are automatically rewarded by the system and given a higher priority (a lower nice number), for example keyboard input always gets highest priority on a system.
Defining the priority of a program is done with the nice command.
Most systems also provide the BSD renice command, which allows you to change the niceness of a running command. Again, read the man page for your system-specific information.
Use of these commands is usually a task for the system administrator. Read the man page for more info on extra functionality available to the system administrator.
On every Linux system, many programs want to use the CPU(s) at the same time, even if you are the only user on the system. Every program needs a certain amount of cycles on the CPU to run. There may be times when there are not enough cycles because the CPU is too busy. The uptime command is wildly inaccurate (it only displays averages, you have to know what is normal), but far from being useless. There are some actions you can undertake if you think your CPU is to blame for the unresponsiveness of your system:
If none of these solutions are an option in your particular situation, you may want to upgrade your CPU. On a UNIX machine this is a job for the system admin.
When the currently running processes expect more memory than the system has physically available, a Linux system will not crash; it will start paging, or swapping, meaning the process uses the memory on disk or in swap space, moving contents of the physical memory (pieces of running programs or entire programs in the case of swapping) to disk, thus reclaiming the physical memory to handle more processes. This slows the system down enormously since access to disk is much slower than access to memory. The top command can be used to display memory and swap use. Systems using glibc offer the memusage and memusagestat commands to visualize memory usage.
If you find that a lot of memory and swap space are being used, you can try:
While I/O limitations are a major cause of stress for system admins, the Linux system offers rather poor utilities to measure I/O performance. The ps, vmstat and top tools give some indication about how many programs are waiting for I/O; netstat displays network interface statistics, but there are virtually no tools available to measure the I/O response to system load, and the iostat command gives a brief overview of general I/O usage. Various graphical front-ends exist to put the output of these commands in a humanly understandable form.
Each device has its own problems, but the bandwidth available to network interfaces and the bandwidth available to disks are the two primary causes of bottlenecks in I/O performance.
Network I/O problems:
Disk I/O problems:
This kind of problem is more difficult to detect, and usually takes extra hardware in order to re-divide data streams over buses, controllers and disks, if overloaded hardware is cause of the problem. One solution to solve this is a RAID array configuration optimized for input and output actions. This way, you get to keep the same hardware. An upgrade to faster buses, controlers and disks is usually the other option.
If overload is not the cause, maybe your hardware is gradually failing, or not well connected to the system. Check contacts, connectors and plugs to start with.
Users can be divided in several classes, depending on their behavior with resource usage:
You can see that system requirements may vary for each class of users, and that it can be hard to satisfy everyone. If you are on a multi-user system, it is useful (and fun) to find out habits of other users and the system, in order to get the most out of it for your specific purposes.
For the graphical environment, there are a whole bunch of monitoring tools available. Below is a screen shot of the Gnome System Monitor, which has features for displaying and searching process information, and monitoring system resources:
There are also a couple of handy icons you can install in the task bar, such as a disk, memory and load monitor. xload is another small X application for monitoring system load. Find your favorite!
As a non-privileged user, you can only influence your own processes. We already saw how you can display processes and filter out processes that belong to a particular user, and what possible restrictions can occur. When you see that one of your processes is eating too much of the system's resources, there are two things that you can do:
In the case that you want the process to continue to run, but you also want to give the other processes on the system a chance, you can renice the process. Appart from using the nice or renice commands, top is an easy way of spotting the troublesome process(es) and reducing priority.
Identify the process in the "NI" column, it will most likely have a negative priority. Type r and enter the process ID of the process that you want to renice. Then enter the nice value, for instance "20". That means that from now on, this process will take 1/5 of the CPU cycles at the most.
Examples of processes that you want to keep on running are emulators, virtual machines, compilers and so on.
If you want to stop a process because it hangs or is going totally berserk in the way of I/O consumption, file creation or use of other system resources, use the kill command. If you have the opportunity, first try to kill the process softly, sending it the SIGTERM signal. This is an instruction to terminate whatever it is doing, according to procedures as described in the code of the program:
In the example above, user joe stopped his Mozilla browser because it hung.
Some processes are a little bit harder to get rid of. If you have the time, you might want to send them the SIGINT signal to interrupt them. If that does not do the trick either, use the strongest signal, SIGKILL. In the example below, joe stops a Mozilla that is frozen:
In such cases, you might want to check that the process is really dead, using the grep filter again on the PID. If this only returns the grep process, you can be sure that you succeeded in stopping the process.
Among processes that are hard to kill is your shell. And that is a good thing: if they would be easy to kill, you woud loose your shell every time you type Ctrl-C on the command line accidentally, since this is equivalent to sending a SIGINT.
In a graphical environment, the xkill program is very easy to use. Just type the name of the command, followed by an Enter and select the window of the application that you want to stop. It is rather dangerous because it sends a SIGKILL by default, so only use it when an application hangs.
A Linux system can have a lot to suffer from, but it usually suffers only during office hours. Whether in an office environment, a server room or at home, most Linux systems are just idling away during the morning, the evening, the nights and weekends. Using this idle time can be a lot cheaper than buying those machines you'd absolutely need if you want everything done at the same time.
There are three types of delayed execution:
The following sections discuss each possibility.
The Info page on sleep is probably one of the shortest there is. All sleep does is wait. By default the time to wait is expressed in seconds.
So why does it exist? Some practical examples:
Somebody calls you on the phone, you say "Yes I'll be with you in half an hour" but you're about drowned in work as it is and bound to forget your lunch:
(sleep 1800; echo "Lunch time..") &
When you can't use the at command for some reason, it's five o'clock, you want to go home but there's still work to do and right now somebody is eating system resources:
(sleep 10000; myprogram) &
Make sure there's an auto-logout on your system, and that you log out or lock your desktop/office when submitting this kind of job, or run it in a screen session.
When you run a series of printouts of large files, but you want other users to be able to print in between:
lp lotoftext; sleep 900; lp hugefile; sleep 900; lp anotherlargefile
Printing files is discussed in Chapter 8.
Programmers often use the sleep command to halt script or program execution for a certain time.
The at command executes commands at a given time, using your default shell unless you tell the command otherwise (see the man page).
The options to at are rather user-friendly, which is demonstrated in the examples below:
Typing Ctrl+D quits the at utility and generates the "EOT" message.
User steven does a strange thing here combining two commands; we will study this sort of practice in Chapter 5, Redirecting Input and Output.
The -m option sends mail to the user when the job is done, or explains when a job can't be done. The command atq lists jobs; perform this command before submitting jobs in order prevent them from starting at the same time as others. With the atrm command you can remove scheduled jobs if you change your mind.
It is a good idea to pick strange execution times, because system jobs are often run at "round" hours, as you can see in Section 4.4.4 the next section. For example, jobs are often run at exactly 1 o'clock in the morning (e.g. system indexing to update a standard locate database), so entering a time of 0100 may easily slow your system down rather than fire it up. To prevent jobs from running all at the same time, you may also use the batch command, which queues processes and feeds the work in the queue to the system in an evenly balanced way, preventing excessive bursts of system resource usage. See the Info pages for more information.
The cron system is managed by the cron daemon. It gets information about which programs and when they should run from the system's and users' crontab entries. Only the root user has access to the system crontabs, while each user should only have access to his own crontabs. On some systems (some) users may not have access to the cron facility.
At system startup the cron daemon searches /var/spool/cron/ for crontab entries which are named after accounts in /etc/passwd, it searches /etc/cron.d/ and it searches /etc/crontab, then uses this information every minute to check if there is something to be done. It executes commands as the user who owns the crontab file and mails any output of commands to the owner.
On systems using Vixie cron, jobs that occur hourly, daily, weekly and monthly are kept in separate directories in /etc to keep an overview, as opposed to the standard UNIX cron function, where all tasks are entered into one big file.
Example of a Vixie crontab file:
Some variables are set, and after that there's the actual scheduling, one line per job, starting with 5 time and date fields. The first field contains the minutes (from 0 to 59), the second defines the hour of execution (0-23), the third is day of the month (1-31), then the number of the month (1-12), the last is day of the week (0-7, both 0 and 7 are Sunday). An asterisk in these fields represents the total acceptable range for the field. Lists are allowed; to execute a job from Monday to Friday enter 1-5 in the last field, to execute a job on Monday, Wednesday and Friday enter 1,3,5.
Then comes the user who should run the processes which are listed in the last column. The example above is from a Vixie cron configuration where root runs the program run-parts on regular intervals, with the appropriate directories as options. In these directories, the actual jobs to be executed at the scheduled time are stored as shell scripts, like this little script that is run daily to update the database used by the locate command:
Users are supposed to edit their crontabs in a safe way using the crontab -e command. This will prevent a user from accidentally opening more than one copy of his/her crontab file. The default editor is vi (see Chapter 6, but you can use any text editor, such as gvim or gedit if you feel more comfortable with a GUI editor.
When you quit, the system will tell you that a new crontab is installed.
This crontab entry reminds billy to go to his sports club every Thursday night:
After adding a new scheduled task, the system will tell you that a new crontab is installed. You do not need to restart the crond daemon for the changes to take effect. In the example, billy added a new line pointing to a backup script:
The backup.sh script is executed every Thursday and Sunday. See Section 7.2.5 for an introduction to shell scripting. Keep in mind that output of commands, if any, is mailed to the owner of the crontab file. If no mail service is configured, you might find the output of your commands in your local mailbox, /var/spool/mail/<your_username>, a plain text file.
Linux is a multi-user, multi-tasking operating system that has a UNIX-like way of handling processes. Execution speed of commands can depend on a thousand tiny things. Among others, we learned a lot of new commands to visualize and handle processes. Here's a list:
Table 4-3. Process handling commands
These are some exercises that will help you get the feel for processes running on your system.
Most Linux commands read input, such as a file or another attribute for the command, and write output. By default, input is being given with the keyboard, and output is displayed on your screen. Your keyboard is your "standard input" (stdin) device, and the screen or a particular terminal window is the "standard output" (stdout) device.
However, since Linux is a flexible system, these default settings don't necessarily have to be applied. The standard output, for example, on a heavily monitored server in a large environment may be a printer.
Sometimes you will want to put output of a command in a file, or you may want to issue another command on the output of one command. This is known as redirecting output. Redirection is done using either the ">" (greater-than symbol), or using the "|" (pipe) operator which sends the standard output of one command to another command as standard input.
As we saw before, the cat command concatenates files and puts them all together to the standard output. By redirecting this output to a file, this file name will be created - or overwritten if it already exists, so take care.
Redirecting "nothing" to an existing file is equal to emptying the file:
This process is called truncating.
The same redirection to an nonexistent file will create a new empty file with the given name:
Chapter 7 gives some more examples on the use of this sort of redirection.
Some examples using piping of commands:
To find a word within some text, display all lines matching "pattern1", and exclude lines also matching "pattern2" from being displayed:
grep pattern1 file | grep -v pattern2
To display output of a directory listing one page at a time:
ls -la | less
To find a file in a directory:
ls -l | grep part_of_file_name
In another case, you may want a file to be the input for a command that normally wouldn't accept a file as an option. This redirecting of input is done using the "<" (less-than symbol) operator.
Below is an example of sending a file to somebody, using input redirection.
If the user mike exists on the system, you don't need to type the full address. If you want to reach somebody on the Internet, enter the fully qualified address as an argument to mail.
This reads a bit more difficult than the beginner's cat file | mail someone, but it is of course a much more elegant way of using the available tools.
The following example combines input and output redirection. The file text.txt is first checked for spelling mistakes, and the output is redirected to an error log file:
spell < text.txt > error.log
The following command lists all commands that you can issue to examine another file when using less:
The -i option is used for case-insensitive searches - remember that UNIX systems are very case-sensitive.
If you want to save output of this command for future reference, redirect the output to a file:
Output of one command can be piped into another command virtually as many times as you want, just as long as these commands would normally read input from standard input and write output to the standard output. Sometimes they don't, but then there may be special options that instruct these commands to behave according to the standard definitions; so read the documentation (man and info pages) of the commands you use if you should encounter errors.
Again, make sure you don't use names of existing files that you still need. Redirecting output to existing files will replace the content of those files.
Instead of overwriting file data, you can also append text to an existing file using two subsequent greater-than signs:
The date command would normally put the last line on the screen; now it is appended to the file wishlist.
There are three types of I/O, which each have their own identifier, called a file descriptor:
In the following descriptions, if the file descriptor number is omitted, and the first character of the redirection operator is <, the redirection refers to the standard input (file descriptor 0). If the first character of the redirection operator is >, the redirection refers to the standard output (file descriptor 1).
Some practical examples will make this more clear:
ls > dirlist 2>&1
will direct both standard output and standard error to the file dirlist, while the command
ls 2>&1 > dirlist
will only direct standard output to dirlist. This can be a useful option for programmers.
Things are getting quite complicated here, don't confuse the use of the ampersand here with the use of it in Section 184.108.40.206, where the ampersand is used to run a process in the background. Here, it merely serves as an indication that the number that follows is not a file name, but rather a location that the data stream is pointed to. Also note that the bigger-than sign should not be separated by spaces from the number of the file descriptor. If it would be separated, we would be pointing the output to a file again. The example below demonstrates this:
The first command that nancy executes is correct (eventhough no errors are generated and thus the file to which standard error is redirected is empty). The second command expects that 2 is a file name, which does not exist in this case, so an error is displayed.
All these features are explained in detail in the Bash Info pages.
If your process generates a lot of errors, this is a way to thoroughly examine them:
command 2>&1 | less
This is often used when creating new software using the make command, such as in:
Constructs like these are often used by programmers, so that output is displayed in one terminal window, and errors in another. Find out which pseudo terminal you are using issuing the tty command first:
You can use the tee command to copy input to standard output and one or more output files in one move. Using the -a option to tee results in appending input to the file(s). This command is useful if you want to both see and save output. The > and >> operators do not allow to perform both actions simultaneously.
This tool is usually called on through a pipe (|), as demonstrated in the example below:
When a program performs operations on input and writes the result to the standard output, it is called a filter. One of the most common uses of filters is to restructure output. We'll discuss a couple of the most important filters below.
As we saw in Section 220.127.116.11, grep scans the output line per line, searching for matching patterns. All lines containing the pattern will be printed to standard output. This behavior can be reversed using the -v option.
Some examples: suppose we want to know which files in a certain directory have been modified in February:
The grep command, like most commands, is case sensitive. Use the -i option to make no difference between upper and lower case. A lot of GNU extensions are available as well, such as --colour, which is helpful to highlight searchterms in long lines, and --after-context, which prints the number of lines after the last matching line. You can issue a recursive grep that searches all subdirectories of encountered directories using the -r option. As usual, options can be combined.
Regular expressions can be used to further detail the exact character matches you want to select out of all the input lines. The best way to start with regular expressions is indeed to read the grep documentation. An excellent chapter is included in the info grep page. Since it would lead us too far discussing the ins and outs of regular expressions, it is strongly advised to start here if you want to know more about them.
Play around a bit with grep, it will be worth the trouble putting some time in this most basic but very powerful filtering command. The exercises at the end of this chapter will help you to get started, see Section 5.5.
The command sort arranges lines in alphabetical order by default:
But there are many more things sort can do. Looking at the file size, for instance. With this command, directory content is sorted smallest files first, biggest files last:
ls -la | sort -nk 5
The sort command is also used in combination with the uniq program (or sort -u) to sort output and filter out double entries:
In this chapter we learned how commands can be linked to each other, and how input from one command can be used as output for another command.
Input/output redirection is a common task on UNIX and Linux machines. This powerful mechanism allows flexible use of the building blocks UNIX is made of.
The most commonly used redirections are > and |.
These exercises give more examples on how to combine commands. The main goal is to try and use the Enter key as little as possible.
All exercises are done using a normal user ID, so as to generate some errors. While you're at it, don't forget to read those man pages!
It is very important to be able to use at least one text mode editor. Knowing how to use an editor on your system is the first step to independence.
We will need to master an editor by the next chapter as we need it to edit files that influence our environment. As an advanced user, you may want to start writing scripts, or books, develop websites or new programs. Mastering an editor will immensely improve your productivity as well as your capabilities.
Our focus is on text editors, which can also be used on systems without a graphical environment and in terminal windows. The additional advantage of mastering a text editor is in using it on remote machines. Since you don't need to transfer the entire graphical environment over the network, working with text editors tremendously improves network speed.
There are, as usual, multiple ways to handle the problem. Let's see what editors are commonly available:
The ed editor is line-oriented and used to create, display, modify and otherwise manipulate text files, both interactively and by use in shell scripts.
ed is the original text editor on UNIX machines, and thus widely available. For most purposes, however, it is superceded by full-screen editors such as emacs and vi, see below.
Emacs is the extensible, customizable, self-documenting, real-time display editor, known on many UNIX and other systems. The text being edited is visible on the screen and is updated automatically as you type your commands. It is a real-time editor because the display is updated very frequently, usually after each character or pair of characters you type. This minimizes the amount of information you must keep in your head as you edit. Emacs is called advanced because it provides facilities that go beyond simple insertion and deletion: controlling subprocesses; automatic indentation of programs; viewing two or more files at once; editing formatted text; and dealing in terms of characters, words, lines, sentences, paragraphs, and pages, as well as expressions and comments in several different programming languages.
Self-documenting means that at any time you can type a special character, Ctrl+H, to find out what your options are. You can also use it to find out what any command does, or to find all the commands that pertain to a topic. Customizable means that you can change the definitions of Emacs commands in little ways. For example, if you use a programming language in which comments start with `<**' and end with `**>', you can tell the Emacs comment manipulation commands to use those strings. Another sort of customization is rearrangement of the command set. For example, if you prefer the four basic cursor motion commands (up, down, left and right) on keys in a diamond pattern on the keyboard, you can rebind the keys that way.
Extensible means that you can go beyond simple customization and write entirely new commands, programs in the Lisp language that are run by Emacs's own Lisp interpreter. Emacs is an online extensible system, which means that it is divided into many functions that call each other, any of which can be redefined in the middle of an editing session. Almost any part of Emacs can be replaced without making a separate copy of all of Emacs. Most of the editing commands of Emacs are written in Lisp already; the few exceptions could have been written in Lisp but are written in C for efficiency. Although only a programmer can write an extension, anybody can use it afterward.
When run under the X Window System (started as xemacs) Emacs provides its own menus and convenient bindings to mouse buttons. But Emacs can provide many of the benefits of a window system on a text-only terminal. For instance, you can look at or edit several files at once, move text between files, and edit files while running shell commands.
Vim stands for Vi IMproved. It used to be Vi IMitation, but there are so many improvements that a name change was appropriate. Vim is a text editor which includes almost all the commands from the UNIX program vi and a lot of new ones.
Commands in the vi editor are entered using only the keyboard, which has the advantage that you can keep your fingers on the keyboard and your eyes on the screen, rather than moving your arm repeatedly to the mouse. For those who want it, mouse support and a GUI version with scrollbars and menus can be activated.
We will refer to vi or vim throughout this book for editing files, while you are of course free to use the editor of your choice. However, we recommend to at least get the vi basics in the fingers, because it is the standard text editor on almost all UNIX systems, while emacs can be an optional package. There may be small differences between different computers and terminals, but the main point is that if you can work with vi, you can survive on any UNIX system.
Apart from the vim command, the vIm packages may also provide gvim, the Gnome version of vim. Beginning users might find this easier to use, because the menus offer help when you forgot or don't know how to perform a particular editing task using the standard vim commands.
The vi editor is a very powerful tool and has a very extensive built-in manual, which you can activate using the :help command when the program is started (instead of using man or info, which don't contain nearly as much information). We will only discuss the very basics here to get you started.
What makes vi confusing to the beginner is that it can operate in two modes: command mode and insert mode. The editor always starts in command mode. Commands move you through the text, search, replace, mark blocks and perform other editing tasks, and some of them switch the editor to insert mode.
This means that each key has not one, but likely two meanings: it can either represent a command for the editor when in command mode, or a character that you want in a text when in insert mode.
Moving through the text is usually possible with the arrow keys. If not, try:
SHIFT-G will put the prompt at the end of the document.
Pressing the Esc key switches back to command mode. If you're not sure what mode you're in because you use a really old version of vi that doesn't display an "INSERT" message, type Esc and you'll be sure to return to command mode. It is possible that the system gives a little alert when you are already in command mode when hitting Esc, by beeping or giving a visual bell (a flash on the screen). This is normal behavior.
Instead of reading the text, which is quite boring, you can use the vimtutor to learn you first Vim commands. This is a thirty minute tutorial that teaches the most basic Vim functionality in eight easy exercises. While you can't learn everything about vim in just half an hour, the tutor is designed to describe enough of the commands that you will be able to easily use Vim as an all-purpose editor.
In UNIX and MS Windows, if Vim has been properly installed, you can start this program from the shell or command line, entering the vimtutor command. This will make a copy of the tutor file, so that you can edit it without the risk of damaging the original. There are a few translated versions of the tutor. To find out if yours is available, use the two-letter language code. For French this would be vimtutor fr (if installed on the system).
Throughout the last decade the office domain has typically been dominated by MS Office, and, let's face it: the Microsoft Word, Excel and PowerPoint formats are industry standards that you will have to deal with sooner or later.
This monopoly situation of Microsoft proved to be a big disadvantage for getting new users to Linux, so a group of German developers started the StarOffice project, that was, and is still, aimed at making an MS Office clone. Their company, StarDivision, was acquired by Sun Microsystems by the end of the 1990s, just before the 5.2 release. Sun continues development but restricted access to the sources. Nevertheless, development on the original set of sources continues in the Open Source community, which had to rename the project to OpenOffice. OpenOffice is now available for a variety of platforms, including MS Windows, Linux, MacOS and Solaris. There is a screenshot in Section 1.3.2.
Almost simultaneously, a couple of other quite famous projects took off. Also a very common alternative to using MS Office is KOffice, the office suite that used to be popular among SuSE users. Like the original, this clone incorporates an MS Word and Excel compatible program, and much more.
Smaller projects deal with particular programs of the MS example suite, such as Abiword and MS Wordview for compatibility with MS Word documents, and Gnumeric for viewing and creating Excel compatible spreadsheets.
Current distributions usually come with all the necessary tools. Since these provide excellent guidelines and searchable indexes in themenus, we won't discuss them in detail. For references, see you system documentation or the web sites of the projects, such as
Try to limit the use of office documents for the purposes they were meant for: the office.
An example: it drives most Linux users crazy if you send them a mail that says in the body something like: "Hello, I want to tell you something, see attach", and then the attachement proves to be an MS Word compatible document like: "Hello my friend, how is your new job going and will you have time to have lunch with me tomorrow?" Also a bad idea is the attachment of your signature in such a file, for instance. If you want to sign messages or files, use GPG, the PGP-compatible GNU Privacy Guard or SSL (Secure Socket Layer) certificates.
These users are not annoyed because they are unable to read these documents, or because they are worried that these formats typically generate much larger files, but rather because of the implication that they are using MS Windows, and possibly because of the extra work of starting some additional programs.
In the next chapter, we start configuring our environment, and this might include editing all kinds of files that determine how a program behave.
Don't edit these files with any office component!
The default file format specification would make the program add several lines of code, defining the format of the file and the fonts used. These lines won't be interpreted in the correct way by the programs depending on them, resulting in errors or a crash of the program reading the file. In some cases, you can save the file as plain text, but you'll run into trouble when making this a habit.
If you really insist, try gedit, kedit, kwrite or xedit; these programs only do text files, which is what we will be needing. If you plan on doing anything serious, though, stick to a real text mode editor such as vim or emacs.
An acceptable alternative is gvim, the Gnome version of vim. You still need to use vi commands, but if you are stuck, you can look them up in the menus.
In this chapter we learned to use an editor. While it depends on your own individual preference which one you use, it is necessary to at least know how to use one editor.
The vi editor is available on every UNIX system.
Most Linux distributions include an office suite and a graphical text editor.
As we mentioned before, it is easy enough to make a mess of the system. We can't put enough stress on the importance of keeping the place tidy. When you learn this from the start, it will become a good habit that will save you time when programming on a Linux or UNIX system or when confronted with system management tasks. Here are some ways of making life easier on yourself:
On some systems, the quota system may force you to clean up from time to time, or the physical limits of your hard disk may force you to make more space without running any monitoring programs. This section discusses a number of ways, besides using the rm command, to reclaim disk space.
Run the quota -v command to see how much space is left.
Sometimes the content of a file doesn't interest you, but you need the file name as a marker (for instance, you just need the timestamp of a file, a reminder that the file was there or should be there some time in the future). Redirecting the output of a null command is how this is done in the Bourne and Bash shells:
The process of reducing an existing file to a file with the same name that is 0 bytes large is called "truncating."
For creating a new empty file, the same effect is obtained with the touch command. On an existing file, touch will only update the timestamp. See the Info pages on touch for more details.
To "almost" empty a file, use the tail command. Suppose user andy's wishlist becomes rather long because he always adds stuff at the end but never deletes the things he actually gets. Now he only wants to keep the last five items:
Some Linux programs insist on writing all sorts of output in a log file. Usually there are options to only log errors, or to log a minimal amount of information, for example setting the debugging level of the program. But even then, you might not care about the log file. Here are some ways to get rid of them or at least set some limits to their size:
Regularly clean out your mailbox, make sub-folders and automatic redirects using procmail (see the Info pages) or the filters of your favorite mail reading application. If you have a trash folder, clean it out on a regular basis.
To redirect mail, use the .forward file in your home directory. The Linux mail service looks for this file whenever it has to deliver local mail. The content of the file defines what the mail system should do with your mail. It can contain a single line holding a fully qualified E-mail address. In that case the system will send all your mail to this address. For instance, when renting space for a website, you might want to forward the mail destined for the webmaster to your own account in order not to waste disk space. The webmaster's .forward may look like this:
Using mail forwarding is also useful to prevent yourself from having to check several different mailboxes. You can make every address point to a central and easily accessible account.
You can ask your system administrator to define a forward for you in the local mail aliases file, like when an account is being closed but E-mail remains active for a while.
When several users need access to the same file or program, when the original file name is too long or too difficult to remember, use a symbolic link instead of a separate copy for each user or purpose.
Multiple symbolic links may have different names, e.g. a link may be called monfichier in one user's directory, and mylink in another's. Multiple links (different names) to the same file may also occur in the same directory. This is often done in the /lib directory: when issuing the command
ls -l /lib
you will see that this directory is plenty of links pointing to the same files. These are created so that programs searching for one name would not get stuck, so they are pointed to the correct/current name of the libraries they need.
The shell contains a built-in command to limit file sizes, ulimit, which can also be used to display limitations on system resources:
Cindy is not a developer and doesn't care about core dumps, which contain debugging information on a program. If you do want core dumps, you can set their size using the ulimit command. Read the Info pages on bash for a detailed explanation.
Compressed files are useful because they take less space on your hard disk. Another advantage is that it takes less bandwidth to send a compressed file over your network. A lot of files, such as the man pages, are stored in a compressed format on your system. Yet unpacking these to get a little bit of information and then having to compress them again is rather time-consuming. You don't want to unpack a man page, for instance, read about an option to a command and then compress the man page again. Most people will probably forget to clean up after they found the information they needed.
So we have tools that work on compressed files, by uncompressing them only in memory. The actual compressed file stays on your disk as it is. Most systems support zgrep, zcat, bzless and such to prevent unnecessary decompressing/compressing actions. See your system's binary directory and the Info pages.
See Chapter 9 for more on the actual compressing of files and examples on making archives.
We already mentioned a couple of environment variables, such as PATH and HOME. Until now, we only saw examples in which they serve a certain purpose to the shell. But there are many other Linux utilities that need information about you in order to do a good job.
What other information do programs need apart from paths and home directories?
A lot of programs want to know about the kind of terminal you are using; this information is stored in the TERM variable. In text mode, this will be the linux terminal emulation, in graphical mode you are likely to use xterm. Lots of programs want to know what your favorite editor is, in case they have to start an editor in a subprocess. The shell you are using is stored in the SHELL variable, the operating system type in OS and so on. A list of all variables currently defined for your session can be viewed entering the printenv command.
The environment variables are managed by the shell. As opposed to regular shell variables, environment variables are inherited by any program you start, including another shell. New processes are assigned a copy of these variables, which they can read, modify and pass on in turn to their own child processes.
There is nothing special about variable names, except that the common ones are in upper case characters by convention. You may come up with any name you want, although there are standard variables that are important enough to be the same on every Linux system, such as PATH and HOME.
An individual variable's content is usually displayed using the echo command, as in these examples:
If you want to change the content of a variable in a way that is useful to other programs, you have to export the new value from your environment into the environment that runs these programs. A common example is exporting the PATH variable. You may declare it as follows, in order to be able to play with the flight simulator software that is in /opt/FlightGear/bin:
This instructs the shell to not only search programs in the current path, $PATH, but also in the additional directory /opt/FlightGear/bin.
However, as long as the new value of the PATH variable is not known to the environment, things will still not work:
Exporting variables is done using the shell built-in command export:
In Bash, we normally do this in one elegant step:
The same technique is used for the MANPATH variable, that tells the man command where to look for compressed man pages. If new software is added to the system in new or unusual directories, the documentation for it will probably also be in an unusual directory. If you want to read the man pages for the new software, extend the MANPATH variable:
You can avoid retyping this command in every window you open by adding it to one of your shell setup files, see Section 7.2.2.
The following table gives an overview of the most common predefined variables:
Table 7-1. Common environment variables
A lot of variables are not only predefined but also preset, using configuration files. We discuss these in the next section.
When entering the ls -al command to get a long listing of all files, including the ones starting with a dot, in your home directory, you will see one or more files starting with a . and ending in rc. For the case of bash, this is .bashrc. This is the counterpart of the system-wide configuration file /etc/bashrc.
When logging into an interactive login shell, login will do the authentication, set the environment and start your shell. In the case of bash, the next step is reading the general profile from /etc, if that file exists. bash then looks for ~/.bash_profile, ~/.bash_login and ~/.profile, in that order, and reads and executes commands from the first one that exists and is readable. If none exists, /etc/bashrc is applied.
When a login shell exits, bash reads and executes commands from the file ~/.bash_logout, if it exists.
This procedure is explained in detail in the login and bash man pages.
Let's look at some of these config files. First /etc/profile is read, in which important variables such as PATH, USER and HOSTNAME are set:
These lines check the path to set: if root opens a shell (user ID 0), it is checked that /sbin, /usr/sbin and /usr/local/sbin are in the path. If not, they are added. It is checked for everyone that /usr/X11R6/bin is in the path.
All trash goes to /dev/null if the user doesn't change this setting.
Here general variables are assigned their proper values.
If the variable INPUTRC is not set, and there is no .inputrc in the user's home directory, then the default input control file is loaded.
All variables are exported, so that they are available to other programs requesting information about your environment.
All readable shell scripts from the /etc/profile.d directory are read and executed. These do things like enabling color-ls, aliasing vi to vim, setting locales etc. The temporary variable i is unset to prevent it from disturbing shell behavior later on.
Then bash looks for a .bash_profile in the user's home directory:
This very straight forward file instructs your shell to first read ~/.bashrc and then ~/.bash_login. You will encounter the source built-in shell command regularly when working in a shell environment: it is used to apply configuration changes to the current environment.
The ~/.bash_login file defines default file protection by setting the umask value, see Section 18.104.22.168. The ~/.bashrc file is used to define a bunch of user-specific aliases and functions and personal environment variables. It first reads /etc/bashrc, which describes the default prompt (PS1) and the default umask value. After that, you can add your own settings. If no ~/.bashrc exists, /etc/bashrc is read by default:
Upon logout, the commands in ~/.bash_logout are executed, which can for instance clear the terminal, so that you have a clean window upon logging out of a remote session, or upon leaving the system console:
Let's take a closer look at how these scripts work in the next section. Keep info bash close at hand.
The Bash prompt can do much more than displaying such simple information as your user name, the name of your machine and some indication about the present working directory. We can add other information such as the current date and time, number of connected users etc.
Before we begin, however, we will save our current prompt in another environment variable:
When we change the prompt now, for example by issuing the command PS1="->", we can always get our original prompt back with the command PS1=$MYPROMPT. You will, of course, also get it back when you reconnect, as long as you just fiddle with the prompt on the command line and avoid putting it in a shell configuration file.
In order to understand these prompts and the escape sequences used, we refer to the Bash Info or man pages.
Variables are exported so the subsequently executed commands will also know about the environment. The prompt configuration line that you want is best put in your shell configuration file, ~/.bashrc.
If you want, prompts can execute shell scripts and behave different under different conditions. You can even have the prompt play a tune every time you issue a command, although this gets boring pretty soon. More information can be found in the Bash-Prompt HOWTO.
A shell script is, as we saw in the shell configuration examples, a text file containing shell commands. When such a file is used as the first non-option argument when invoking Bash, and neither the `-c' nor `-s' option is supplied, Bash reads and executes commands from the file, then exits. This mode of operation creates a non-interactive shell. When Bash runs a shell script, it sets the special parameter `0' to the name of the file, rather than the name of the shell, and the positional parameters are set to the remaining arguments, if any are given. If no additional arguments are supplied, the positional parameters are unset.
A shell script may be made executable by using the chmod command to turn on the execute bit. When Bash finds such a file while searching the PATH for a command, it spawns a sub-shell to execute it. In other words, executing
is equivalent to executing
bash filename ARGUMENTS
if `filename' is an executable shell script. This sub-shell reinitializes itself, so that the effect is as if a new shell had been invoked to interpret the script, with the exception that the locations of commands remembered by the parent (see hash in the Info pages) are retained by the child.
Most versions of UNIX make this a part of the operating system's command execution mechanism. If the first line of a script begins with the two characters `#!', the remainder of the line specifies an interpreter for the program. Thus, you can specify bash, awk, perl or some other interpreter or shell and write the rest of the script file in that language.
The arguments to the interpreter consist of a single optional argument following the interpreter name on the first line of the script file, followed by the name of the script file, followed by the rest of the arguments. Bash will perform this action on operating systems that do not handle it themselves.
Bash scripts often begin with `#! /bin/bash' (assuming that Bash has been installed in `/bin'), since this ensures that Bash will be used to interpret the script, even if it is executed under another shell.
A very simple script consisting of only one command, that says hello to the user executing it:
The script actually consists of only one command, echo, which uses the value of ($) the USER environment variable to print a string customized to the user issuing the command.
Another one-liner, used for displaying connected users:
Here is a script consisting of some more lines, that I use to make backup copies of all files in a directory. The script first makes a list of all the files in the current directory and puts it in the variable LIST. Then it sets the name of the copy for each file, and then it copies the file. For each file, a message is printed:
Just entering a line like mv * *.old won't work, as you will notice when trying this on a set of test files. An echo command was added in order to display some activity. echo's are generally useful when a script won't work: insert one after each doubted step and you will find the error in no time.
The /etc/rc.d/init.d directory contains loads of examples. Let's look at this script that controls the fictive ICanSeeYou server:
First, with the . command (dot) a set of shell functions, used by almost all shell scripts in /etc/rc.d/init.d, is loaded. Then a case command is issued, which defines 4 different ways the script can execute. An example might be ICanSeeYou start. The decision of which case to apply is made by reading the (first) argument to the script, with the expression $1.
When no compliant input is given, the default case, marked with an asterisk, is applied, upon which the script gives an error message. The case list is ended with the esac statement. In the start case the server program is started as a daemon, and a process ID and lock are assigned. In the stop case, the server process is traced down and stopped, and the lock and the PID are removed. Options, such as the daemon option, and functions like killproc, are defined in the /etc/rc.d/init.d/functions file. This setup is specific to the distribution used in this example. The initscripts on your system might use other functions, defined in other files, or none at all.
Upon success, the script returns an exit code of zero to its parent.
This script is a fine example of using functions, which make the script easier to read and the work done faster. Note that they use sh instead of bash, to make them useful on a wider range of systems. On a Linux system, calling bash as sh results in the shell running in POSIX-compliant mode.
The bash man pages contain more information about combining commands, for- and while-loops and regular expressions, as well as examples. A comprehensible Bash course for system administrators and power users, with exercises, from the same author as this Introduction to Linux guide, is at http://tille.xalasys.com/training/bash/. Detailed description of Bash features and applications is in the reference guide Advanced Bash Scripting.
The average user may not care too much about his login settings, but Linux offers a wide variety of flashy window and desktop managers for use under X, the graphical environment. The use and configuration of window managers and desktops is straightforward and may even resemble the standard MS Windows, MacIntosh or UNIX CDE environment, although many Linux users prefer flashier desktops and fancier window managers. We won't discuss the user specific configuration here. Just experiment and read the documentation using the built-in Help functions these managers provide and you will get along fine.
We will, however, take a closer look at the underlying system.
The X Window System is a network-transparent window system which runs on a wide range of computing and graphics machines. X Window System servers run on computers with bitmap displays. The X server distributes user input to and accepts output requests from several client programs through a variety of different interprocess communication channels. Although the most common case is for the client programs to be running on the same machine as the server, clients can be run transparently from other machines (including machines with different architectures and operating systems) as well. We will learn how to do this in Chapter 10 on networking and remote applications.
X supports overlapping hierarchical sub-windows and text and graphics operations, on both monochrome and color displays. The number of X client programs that use the X server is quite large. Some of the programs provided in the core X Consortium distribution include:
We refer again to the man pages of these commands for detailed information. More explanations on available functions can be found in the Xlib - C language X Interface manual that comes with your X distribution, the X Window System Protocol specification, and the various manuals and documentation of X toolkits. The /usr/share/doc directory contains references to these documents and many others.
Many other utilities, window managers, games, toolkits and gadgets are included as user-contributed software in the X Consortium distribution, or are available using anonymous FTP on the Internet. Good places to start are http://www.x.org and http://www.xfree.org.
Furthermore, all your graphical applications, such as your browser, your E-mail program, your image viewing programs, sound playing tools and so on, are all clients to your X server. Note that in normal operation, that is in graphical mode, X clients and the X server on Linux run on the same machine.
From the user's perspective, every X server has a display name in the form of:
This information is used by the application to determine how it should connect to the X server and which screen it should use by default (on displays with multiple monitors):
On POSIX systems, the default display name is stored in your DISPLAY environment variable. This variable is set automatically by the xterm terminal emulator. However, when you log into another machine on a network, you might need to set DISPLAY by hand to point to your display, see Section 10.3.3.
More information can be found in the X man pages.
The layout of windows on the screen is controlled by special programs called window managers. Although many window managers will honor geometry specifications as given, others may choose to ignore them (requiring the user to explicitly draw the window's region on the screen with the pointer, for example).
Since window managers are regular (albeit complex) client programs, a variety of different user interfaces can be built. The X Consortium distribution comes with a window manager named twm, but most users prefer something more fancy when system resources permit. Sawfish and Enlightenment are popular examples which allow each user to have a desktop according to mood and style.
A desktop manager makes use of one window manager or another for arranging your graphical desktop in a convenient way, with menubars, drop-down menus, informative messages, a clock, a program manager, a file manager and so on. Among the most popular desktop managers are Gnome and KDE, which both run on almost any Linux distribution and many other UNIX systems.
The X distribution that used to come with Linux, XFree86, uses the configuration file XF86Config for its initial setup. This file configures your video card and is searched for in a number of locations, although it is usually in /etc/X11.
If you see that the file /etc/X11/XF86Config is present on your system, a full description can be found in the Info or man pages about XF86Config.
Because of licensing issues with XFree86, newer systems usually come with the X.Org distribution of the X server and tools. The main configuration file here is xorg.conf, usually also in /etc/X11. The file consists of a number of sections that may occur in any order. The sections contain information about your monitor, your video adaptor, the screen configuration, your keyboard etcetera. As a user, you needn't worry too much about what is in this file, since everything is normally determined at the time the system is installed.
Should you need to change graphical server settings, however, you can run the configuration tools or edit the configuration files that set up the infrastructure to use the XFree86 server. See the man pages for more information; your distribution might have its own tools. Since misconfiguration may result in unreadable garbage in graphical mode, you may want to make a backup copy of the configuration file before attempting to change it, just to be on the safe side.
Setting the keyboard layout is done using the loadkeys command for text consoles. Use your local X configuration tool or edit the Keyboard section in XF86Config manually to configure the layout for graphical mode. The XkbdLayout is the one you want to set:
This is the default. Change it to your local settings by replacing the quoted value with any of the names listed in the subdirectories of your keymaps directory. If you can't find the keymaps, try displaying their location on your system issuing the command
It is possible to combine layout settings, like in this example:
Make a backup of the /etc/X11/XF86Config file before editing it! You will need to use the root account to do this.
Log out and reconnect in order to reload X settings.
The Gnome Keyboard Applet enables real-time switching between layouts; no special pemissions are needed for using this program. KDE has a similar tool for switching between keyboard layouts.
Use the setfont tool to load fonts in text mode. Most systems come with a standard inputrc file which enables combining of characters, such as the French "é" (meta characters). The system admin should then add the line
to the /etc/bashrc file.
Setting time information is usually done at installation time. After that, it can be kept up to date using an NTP (Network Time Protocol) client. Most Linux systems run ntpd by default:
See your system manual and the documentation that comes with the NTP package. Most desktop managers include tools to set the system time, providing that you have access to the system administrator's account.
For setting the time zone correct, you can use tzconfig or timezone commands. Timezone information is usually set during the installation of your machine. Many systems have distribution-specific tools to configure it, see your system documentation.
If you'd rather get your messages from the system in Dutch or French, you may want to set the LANG and LANGUAGE environment variables, thus enabling locale support for the desired language and eventually the fonts related to character conventions in that language.
With most graphical login systems, such as gdm or kdm, you have the possibility to configure these language settings before logging in.
Note that on most systems, the default tends to be en_US.UTF-8 these days. This is not a problem, because systems where this is the default, will also come with all the programs supporting this encoding. Thus, vi can edit all the files on your system, cat won't behave strange and so on.
Trouble starts when you connect to an older system not supporting this font encoding, or when you open a UTF-8 encoded file on a system supporting only 1-byte character fonts. The recode utility might come in handy to convert files from one character set to another. Read the man pages for an overview of features and usage. Another solution might be to temporarily work with another encoding definition, by setting the LANG environment variable:
The list of HOWTOs contains references to Bangla, Belarusian, Chinese, Esperanto, Finnish, Francophone, Hebrew, Hellenic, Latvian, Polish, Portugese, Serbian, Slovak, Slovenian, Spanish, Thai and Turkish localization instructions.
Most people are surprised to see that they have a running, usable computer after installing Linux; most distributions contain ample support for video and network cards, monitors and other external devices, so there is usually no need to install extra drivers. Also common tools such as office suites, web browsers, E-mail clients and such are included in the main distributions. Even so, an initial installation might not meet your requirements.
If you just can't find what you need, maybe it is not installed on your system. It may also be that you have the required software, but it does not do what it is supposed to do. Remember that Linux moves fast, and software improves on a daily basis. Don't waste your time troubleshooting problems that might already be resolved.
You can update your system or add packages to it at any time you want. Most software comes in packages. Extra software may be found on your installation CDs or on the Internet. The website of your Linux distribution is a good place to start looking for additional software and contains instructions about how to install it on your type of Linux, see Appendix A. Always read the documentation that comes with new software, and any installation guidelines the package might contain. All software comes with a README file, which you are very strongly advised to read.
RPM, the RedHat Package Manager, is a powerful package manager that you can use to install, update and remove packages. It allows you to search for packages and keeps track of the files that come with each package. A system is built-in so that you can verify the authenticity of packages downloaded from the Internet. Advanced users can build their own packages with RPM.
An RPM package consists of an archive of files and meta-data used to install and erase the archive files. The meta-data includes helper scripts, file attributes, and descriptive information about the package. Packages come in two varieties: binary packages, used to encapsulate software to be installed, and source packages, containing the source code and recipe necessary to produce binary packages.
Many other distributions support RPM packages, among the popular ones RedHat Enterprise Linux, Mandriva (former Mandrake), Fedora Core and SuSE Linux. Apart from the advice for your distribution, you will want to read man rpm.
Most packages are simply installed with the upgrade option, -U, whether the package is already installed or not. The RPM package contains a complete version of the program, which overwrites existing versions or installs as a new package. The typical usage is as follows:
rpm -Uvh /path/to/rpm-package(s)
The -v option generates more verbose output, and -h makes rpm print a progress bar:
New kernel packages, however, are installed with the install option -i, which does not overwrite existing version(s) of the package. That way, you will still be able to boot your system with the old kernel if the new one does not work.
You can also use rpm to check whether a package is installed on your system:
Or you can find out which package contains a certain file or executable:
Note that you need not have access to administrative privileges in order to use rpm to query the RPM database. You only need to be root when adding, modifying or deleting packages.
Below is one last example, demonstrating how to uninstall a package using rpm:
Note that uninstalling is not that verbose by default, it is normal that you don't see much happening. When in doubt, use rpm -qa again to verify that the package has been removed.
RPM can do much more than the couple of basic functions we discussed in this introduction; the RPM HOWTO contains further references.
This package format is the default on Debian GNU/Linux, where dselect, and, nowadays more common, aptitude, is the standard tool for managing the packages. It is used to select packages that you want to install or upgrade, but it will also run during the installation of a Debian system and help you to define the access method to use, to list available packages and to configure packages.
The Debian web site contains all information you need, including a "dselect Documentation for Beginners".
According to the latest news, the Debian package format is becoming more and more popular. At the time of this writing, 5 of the top-10 distributions use it. Also apt-get (see Section 22.214.171.124 is becoming extremely popular, also on non-DEB systems.
The largest part of Linux programs is Free/Open Source, so source packages are available for these programs. Source files are needed for compiling your own program version. Sources for a program can be downloaded from its web site, often as a compressed tarball (program-version.tar.gz or similar). For RPM-based distributions, the source is often provided in the program-version.src.rpm. Debian, and most distributions based on it, provide themselves the adapted source which can be obtained using apt-get source.
Specific requirements, dependencies and installation instructions are provided in the README file. You will probably need a C compiler, gcc. This GNU C compiler is included in most Linux systems and is ported to many other platforms.
The first thing you do after installing a new system is applying updates; this applies to all operating systems and Linux is not different.
The updates for most Linux systems can usually be found on a nearby site mirroring your distribution. Lists of sites offering this service can be found at your distribution's web site, see Appendix A.
Updates should be applied regularly, daily if possible - but every couple of weeks would be a reasonable start. You really should try to have the most recent version of your distribution, since Linux changes constantly. As we said before, new features, improvements and bug fixes are supplied at a steady rhythm, and sometimes important security problems are addressed.
The good news is that most Linux distributions provide tools so that you don't have to upgrade tens of packages daily by hand. The following sections give an overview of "package manager managers." There is much more to this subject, even regular updates of source packages is manageable automatically; we only list the most commonly known systems. Always refer to the documentation for your specific distribution for advised procedures.
The Advanced Package Tool is a management system for software packages. The command line tool for handling packages is apt-get, which comes with an excellent man page describing how to install and update packages and how to upgrade singular packages or your entire distribution. APT has its roots in the Debian GNU/Linux distribution, where it is the default manager for the Debian packages. APT has been ported to work with RPM packages as well. The main advantage of APT is that it is free and flexible to use. It will allow you to set up systems similar to the distribution specific (and in some cases commercial) ones listed in the next sections.
Generally, when first using apt-get, you will need to get an index of the available packages. This is done using the command
After that, you can use apt-get to upgrade your system:
Do this often, it's an easy way to keep your system up-to-date and thus safe.
Apart from this general usage, apt-get is also very fast for installing individual packages. This is how it works:
Note the -c option to the su command, which indicates to the root shell to only execute this command, and then return to the user's environment. This way, you cannot forget to quit the root account.
If there are any dependencies on other packages, apt-get will download and install these supporting packages.
More information can be found in the APT HOWTO.
Update Agent, which originally only supported RedHat RPM packages, is now ported to a wider set of software, including non-RedHat repositories. This tool provides a complete system for updating the RPM packages on a RedHat or Fedora Core system. On the command line, type up2date to update your system. On the desktop, by default a small icon is activated, telleng you whether or not there are updates available for your system.
Yellowdog's Updater Modified (yum) is another tool that recently became more popular. It is an interactive but automated update program for installing, updating or removing RPM packages on a system. It is the tool of choice on Fedora systems.
On SuSE Linux, everything is done with YaST, Yet another Setup Tool, which supports a wide variety of system administration tasks, among which updating RPM packages. Starting from SuSE Linux 7.1 you can also upgrade using a web interface and YOU, Yast Online Update.
Mandrake Linux and Mandriva provide so-called URPMI tools, a set of wrapper programs that make installing new software easier for the user. These tools combine with RPMDrake and MandrakeUpdate to provide everything needed for smooth install and uninstall of software packages. MandrakeOnline offers an extended range of services and can automatically notify administrators when updates are available for your particular Mandrake system. See man urpmi, among others, for more info.
Also the KDE and Gnome desktop suites have their own (graphical) versions of package managers.
Most Linux installations are fine if you periodically upgrade your distribution. The upgrade procedure will install a new kernel when needed and make all necessary changes to your system. You should only compile or install a new kernel manually if you need kernel features that are not supported by the default kernel included in your Linux distribution.
Whether compiling your own optimized kernel or using a pre-compiled kernel package, install it in co-existence with the old kernel until you are sure that everything works according to plan.
Then create a dual boot system that will allow you to choose which kernel to boot by updating your boot loader configuration file grub.conf. This is a simple example:
After the new kernel has proven to work, you may remove the lines for the old one from the GRUB config file, although it is best to wait a couple of days just to be sure.
This is basically done in the same way as installing packages manually, except that you have to append the file system of the CD to your machine's file system to make it accessible. On most systems, this will be done automatically upon insertion of a CD in the drive because the automount daemon is started up at boot time. If your CD is not made available automatically, issue the mount command in a terminal window. Depending on your actual system configuration, a line similar to this one will usually do the trick:
mount /dev/cdrom /mnt/cdrom
On some systems, only root can mount removable media; this depends on the configuration.
For automation purposes, the CD drive usually has an entry in /etc/fstab, which lists the file systems and their mount points, that make up your file system tree. This is such a line:
This indicates that the system will understand the command mount /mnt/cdrom. The noauto option means that on this system, CDs are not mounted at boot time.
You may even try to right click on the CD icon on your desktop to mount the CD if your file manager doesn't do it for you. You can check whether it worked issuing the mount command with no arguments:
After mounting the CD, you can change directories, usually to the mount point /mnt/cdrom, where you can access the content of the CD-ROM. Use the same commands for dealing with files and directories as you would use for files on the hard disk.
In order to get the CD out of the drive after you've finished using it, the file system on the CD should be unused. Even being in one of the subdirectories of the mount point, /mnt/cdrom in our example, will be considered as "using the file system", so you should get out of there. Do this for instance by typing cd with no arguments, which will put you back in your home directory. After that, you can either use the command
When everything has its place, that means already half the work is done.
While keeping order is important, it is equally important to feel at home in your environment, whether text or graphical. The text environment is controlled through the shell setup files. The graphical environment is primarily dependent on the X server configuration, on which a number of other applications are built, such as window and desktop managers and graphical applications, each with their own config files. You should read the system and program specific documentation to find out about how to configure them.
Regional settings such as keyboard setup, installing appropriate fonts and language support are best done at installation time.
Software is managed either automatically or manually using a package system.
Printing from within an application is very easy, selecting theoption from the menu.
From the command line, use the lp or lpr command.
These commands can read from a pipe, so you can print the output of commands using
command | lp
There are many options available to tune the page layout, the number of copies, the printer that you want to print to if you have more than one available, paper size, one-side or double-sided printing if your printer supports this feature, margins and so on. Read the man pages for a complete overview.
Once the file is accepted in the print queue, an identification number for the print job is assigned:
To view (query) the print queue, use the lpq or lpstat command. When entered without arguments, it displays the contents of the default print queue.
Which is the default printer on a system that has access to multiple printers?
What is the status of my printer(s)?
If you don't like what you see from the status commands, use lprm or cancel to delete jobs.
In the graphical environment, you may see a popup window telling you that the job has been canceled.
In larger environments, lpc may be used to control multiple printers. See the Info or man pages on each command.
There are many GUI print tools used as a front-end to lp, and most graphical applications have a print function that uses lp. See the built-in Help functions and program specific documentation for more.
If we want to get something sensible out of the printer, files should be formatted first. Apart from an abundance of formatting software, Linux comes with the basic UNIX formatting tools and languages.
Modern Linux systems support direct printing, without any formatting by the user, of a range of file types: text, PDF, PostScript and several image formats like PNG, JPEG, BMP and GIF.
For those file formats that do need formatting, Linux comes with a lot of formatting tools, such as the pdf2ps, fax2ps and a2ps commands, that convert other formats to PostScript.
Apart from these command line tools there are a lot of graphical word processing programs. Several complete office suites are available, many are free. These do the formatting automatically upon submission of a print job. Just to name a few: OpenOffice, KOffice, AbiWord, WordPerfect, etc.
The following are common languages in a printing context:
Anything that you can send to the printer can normally be sent to the screen as well. Depending on the file format, you can use one of these commands:
Until a couple of years ago, the choice for Linux users was simple: everyone ran the same old LPD from BSD's Net-2 code. Then LPRng became more popular, but nowadays most modern Linux distributions use CUPS, the Common UNIX Printing System. CUPS is an implementation of the Internet Printing Protocol (IPP), an HTTP-like RFC standard replacement protocol for the venerable (and clunky) LPD protocol. CUPS is distributed under the GNU Public License. CUPS is also the default print system on MacOS X.
Most distributions come with a GUI for configuring networked and local (parallel port or USB) printers. They let you choose the printer type from a list and allow easy testing. You don't have to bother about syntax and location of configuration files. Check your system documentation before you attempt installing your printer.
CUPS can also be configured using a web interface that runs on port 631 on your computer. To check if this feature is enabled, try browsing to localhost:631/help.
As more and more printer vendors make drivers for CUPS available, CUPS will allow easy connection with almost any printer that you can plug into a serial, parallel, or USB port, plus any printer on the network. CUPS will ensure a uniform presentation to you and your applications of all different types of printers.
Printers that only come with a Win9x driver could be problematic if they have no other support. Check with the hardware compatibility HOWTO when in doubt.
In the past, your best choice would have been a printer with native PostScript support in the firmware, since nearly all UNIX or Linux software producing printable output, produces it in PostScript, the publishing industry's printer control language of choice. PostScript printers are usually a bit more expensive, but it is a device-independent, open programming language and you're always 100% sure that they will work. These days, however, the importance of this rule of thumb is dwindling.
In this section, we will discuss what you can do as a user when something goes wrong. We won't discuss any problems that have to do with the daemon-part of the printing service, as that is a task for system administrators.
If you print the wrong file, the job may be canceled using the command lprm jobID, where jobID is in the form printername-printjobnumber (get it from information displayed by lpq or lpstat). This will work when other jobs are waiting to be printed in this printer's queue. However, you have to be really quick if you are the only one using this printer, since jobs are usually spooled and send to the printer in only seconds. Once they arrive on the printer, it is too late to remove jobs using Linux tools.
What you can try in those cases, or in cases where the wrong print driver is configured and only rubbish comes out of the printer, is power off the printer. However, that might not be the best course of action, as you might cause paper jams and other irregularities.
Use the lpq command and see if you can spot your job:
Lots of printers have web interfaces these days, which can display status information by typing the printer's IP address in your web browser:
If your job ID is not there and not on the printer, contact your system administrator. If your job ID is listed in the output, check that the printer is currently printing. If so, just wait, your job will get done in due time.
If the printer is not printing, check that it has paper, check the physical connections to both electricity and data network. If that's okay, the printer may need restarting. Ask your system admin for advice.
In the case of a network printer, try printing from another host. If the printer is reachable from your own host (see Chapter 10 for the ping utility), you may try to put the formatted file on it, like file.ps in case of a PostScript printer, using an FTP client. If that works, your print system is misconfigured. If it doesn't work, maybe the printer doesn't understand the format you are feeding it.
The GNU/Linux Printing site contains more tips and tricks.
The Linux print service comes with a set of printing tools based on the standard UNIX LPD tools, whether it be the SystemV or BSD implementation. Below is a list of print-related commands.
Configuring and testing printers involves being in the possession of one, and having access to the root account. If so, you may try:
The following exercises can be done without printer or root access.
Although Linux is one of the safest operating systems in existence, and even if it is designed to keep on going, data can get lost. Data loss is most often the consequence of user errors, but occasionally a system fault, such as a power failure, is the cause, so it's always a good idea to keep an extra copy of sensitive and/or important data.
In most cases, we will first collect all the data to back up in a single archive file, which we will compress later on. The process of archiving involves concatenating all listed files and taking out unnecessary blanks. In Linux, this is commonly done with the tar command. tar was originally designed to archive data on tapes, but it can also make archives, known as tarballs.
tar has many options, the most important ones are cited below:
It is common to leave out the dash-prefix with tar options, as you can see from the examples below.
In the example below, an archive is created and unpacked.
This example also illustrates the difference between a tarred directory and a bunch of tarred files. It is advisable to only compress directories, so files don't get spread all over when unpacking the tarball (which may be on another system, where you may not know which files were already there and which are the ones from the archive).
When a tape drive is connected to your machine and configured by your system administrator, the file names ending in .tar are replaced with the tape device name, for example:
tar cvf /dev/tape mail/
The directory mail and all the files it contains are compressed into a file that is written on the tape immediately. A content listing is displayed because we used the verbose option.
The tar tool supports the creation of incremental backups, using the -N option. With this option, you can specify a date, and tar will check modification time of all specified files against this date. If files are changed more recent than date, they will be included in the backup. The example below uses the timestamp on a previous archive as the date value. First, the initial archive is created and the timestamp on the initial backup file is shown. Then a new file is created, upon which we take a new backup, containing only this new file:
Standard errors are redirected to /dev/null. If you don't do this, tar will print a message for each unchanged file, telling you it won't be dumped.
This way of working has the disadvantage that it looks at timestamps on files. Say that you download an archive into the directory containing your backups, and the archive contains files that have been created two years ago. When checking the timestamps of those files against the timestamp on the initial archive, the new files will actually seem old to tar, and will not be included in an incremental backup made using the -N option.
A better choice would be the -g option, which will create a list of files to backup. When making incremental backups, files are checked against this list. This is how it works:
The next day, user jimmy works on file3 a bit more, and creates file4. At the end of the day, he makes a new backup:
These are some very simple examples, but you could also use this kind of command in a cronjob (see Section 4.4.4), which specifies for instance a snapshot file for the weekly backup and one for the daily backup. Snapshot files should be replaced when taking full backups, in that case.
More information can be found in the tar documentation.
Data, including tarballs, can be compressed using zip tools. The gzip command will add the suffix .gz to the file name and remove the original file.
Uncompress gzipped files with the -d option.
bzip2 works in a similar way, but uses an improved compression algorithm, thus creating smaller files. See the bzip2 info pages for more.
Linux software packages are often distributed in a gzipped tarball. The sensible thing to do after unpacking that kind of archives is find the README and read it. It will generally contain guidelines to installing the package.
The GNU tar command is aware of gzipped files. Use the command
tar zxvf file.tar.gz
for unzipping and untarring .tar.gz or .tgz files. Use
tar jxvf file.tar.bz2
for unpacking tar archives that were compressed with bzip2.
The GNU project provides us with the jar tool for creating Java archives. It is a Java application that combines multiple files into a single JAR archive file. While also being a general purpose archiving and compression tool, based on ZIP and the ZLIB compression format, jar was mainly designed to facilitate the packing of Java code, applets and/or applications in a single file. When combined in a single archive, the components of a Java application, can be downloaded much faster.
Unlike tar, jar compresses by default, independent from other tools - because it is basically the Java version of zip. In addition, it allows individual entries in an archive to be signed by the author, so that origins can be authenticated.
The syntax is almost identical as for the tar command, we refer to info jar for specific differences.
Saving copies of your data on another host is a simple but accurate way of making backups. See Chapter 10, Communications, for more information on scp, ftp and many more.
In the next section we'll discuss local backup devices.
On most Linux systems, users have access to the floppy disk device. The name of the device may vary depending on the size and number of floppy drives, contact your system admin if you are unsure. On some systems, there will likely be a link /dev/floppy pointing to the right device, probably /dev/fd0 (the auto-detecting floppy device) or /dev/fd0H1440 (set for 1,44MB floppies).
fdformat is the low-level floppy disk formatting tool. It has the device name of the floppy disk as an option. fdformat will display an error when the floppy is write-protected.
The mformat command (from the mtools package) is used to create DOS-compatible floppies which can then be accessed using the mcopy, mdir and other m-commands.
Graphical tools are also available.
After the floppy is formatted, it can be mounted into the file system and accessed as a normal, be it small, directory, usually via the /mnt/floppy entry.
Should you need it, install the mkbootdisk utility, which makes a floppy from which the current system can boot.
The dd command can be used to put data on a disk, or get it off again, depending on the given input and output devices. An example:
Note that the dumping is done on an unmounted device. Floppies created using this method will not be mountable in the file system, but it is of course the way to go for creating boot or rescue disks. For more information on the possibilities of dd, read the man pages.
This tool is part of the GNU coreutils package.
On some systems users are allowed to use the CD-writer device. Your data will need to be formatted first. Use the mkisofs command to do this in the directory containing the files you want to backup. Check with df that enough disk space is available, because a new file about the same size as the entire current directory will be created:
The -J and -r options are used to make the CD-ROM mountable on different systems, see the man pages for more. After that, the CD can be created using the cdrecord tool with appropriate options:
Depending on your CD-writer, you now have the time to smoke a cigarette and/or get a cup of coffee. Upon finishing the job, you will get a confirmation message:
There are some graphical tools available to make it easier on you. One of the popular ones is xcdroast, which is freely available from the X-CD-Roast web site and is included on most systems and in the GNU directory. Both the KDE and Gnome desktop managers have facilities to make your own CDs.
These devices are usually mounted into the file system. After the mount procedure, they are accessed as normal directories, so you can use the standard commands for manipulating files.
In the example below, images are copied from a USB camera to the hard disk:
If the camera is the only USB storage device that you ever connect to your system, this is safe. But keep in mind that USB devices are assigned entries in /dev as they are connected to the system. Thus, if you first connect a USB stick to your system, it will be on the /dev/sda entry, and if you connect your camera after that, it will be assigned to /dev/sdb - provided that you do not have any SCSI disks, which are also on /dev/sd*. On newer systems, since kernel 2.6, a hotplug system called HAL (Hardware Abstraction Layer) ensures that users don't have to deal with this burden. If you want to check where your device is, type dmesg after inserting it.
You can now copy the files:
Likewise, a jazz drive may be mounted on /mnt/jazz.
Appropriate lines should be added in /etc/modules.conf and /etc/fstab to make this work. Refer to specific hardware HOWTOs for more information. On systems with a 2.6.x kernel or higher, you may also want to check the man pages for modprobe and modprobe.conf.
This is done using tar (see above). The mt tool is used for controlling the magnetic tape device, like /dev/st0. Entire books have been written about tape backup, therefore, refer to our reading-list in Appendix B for more information. Keep in mind that databases might need other backup procedures because of their architecture.
The appropriate backup commands are usually put in one of the cron directories in order to have them executed on a regular basis. In larger environments, the freely available Amanda backup suite or a commercial solution may be implemented to back up multiple machines. Working with tapes, however, is a system administration task beyond the scope of this document.
Most Linux distributions offer their own tools for making your life easy. A short overview:
The rsync program is a fast and flexible tool for remote backup. It is common on UNIX and UNIX-like systems, easy to configure and use in scripts. While the r in rsync stands for "remote", you do not need to take this all too literally. Your "remote" device might just as well be a USB storage device or another partition on your hard disk, you do not need to have two separated machines.
As discussed in Section 126.96.36.199, we will first have to mount the device. This is done as root:
Note that this guideline requires USB support to be installed on your system. See the USB Guide for help if this does not work. Check with dmesg that /dev/sda1 is indeed the device to mount.
Then you can start the actual backup, for instance of the /home/karl directory:
As usual, refer to the man pages for more.
Here's a list of the commands involving file backup:
Table 9-1. Backup commands
A protocol is, simply put, a set of rules for communication.
Linux supports many different networking protocols. We list only the most important:
The Transport Control Protocol and the Internet Protocol are the two most popular ways of communicating on the Internet. A lot of applications, such as your browser and E-mail program, are built on top of this protocol suite.
Very simply put, IP provides a solution for sending packets of information from one machine to another, while TCP ensures that the packets are arranged in streams, so that packets from different applications don't get mixed up, and that the packets are sent and received in the correct order.
The Internet was originally developed three decades ago for the United States Department of Defense (DoD), mainly for the purpose of interconnecting different-brand computers. Another reason for the development of TCP/IP was to provide a reliable data transport system over an unreliable network.
TCP/IP networking has been present in Linux since its beginnings. It has been implemented from scratch. It is one of the most robust, fast and reliable implementations and is one of the key factors of the success of Linux. Linux and networking are made for each other, in so much that not connecting your Linux system to the network may result in slow startup and other troubles. Even if you don't use any network connections to other computers, networking protocols are used for internal system and application communications. Linux expects to be networked.
A good starting point for learning more about TCP and IP is in the following documents:
Nobody expected the Internet to grow as fast as it does. IP proved to have quite some disadvantages when a really large number of computers is in a network, the most important being the availability of unique addresses to assign to each machine participating. Thus, IP version 6 was deviced to meet the needs of today's Internet.
Unfortunately, not all applications and services support IPv6, yet. A migration is currently being set in motion in many environments that can benefit from an upgrade to IPv6. For some applications, the old protocol is still used, for applications that have been reworked the new version is already active. So when checking your network configuration, sometimes it might be a bit confusing since all kinds of measures can be taken to hide one protocol from the other so as the two don't mix up connections.
More information can be found in the following documents:
The Linux kernel has built-in support for PPP (Point-to-Point-Protocol), SLIP (Serial Line IP) and PLIP (Parallel Line IP). PPP is the most popular way individual users access their ISP (Internet Service Provider), although in densely populated areas it is often being replaced by PPPOE, PPP over Ethernet, the protocol used in cable modem connections.
Most Linux distributions provide easy-to-use tools for setting up an Internet connection. The only thing you basically need is a username and password to connect to your Internet Service Provider (ISP), and a telephone number in the case of PPP. These data are entered in the graphical configuration tool, which will likely also allow for starting and stopping the connection to your provider.
The Linux kernel has built-in ISDN capabilities. Isdn4linux controls ISDN PC cards and can emulate a modem with the Hayes command set ("AT" commands). The possibilities range from simply using a terminal program to full connection to the Internet.
Check your system documentation.
Appletalk is the name of Apple's internetworking stack. It allows a peer-to-peer network model which provides basic functionality such as file and printer sharing. Each machine can simultaneously act as a client and a server, and the software and hardware necessary are included with every Apple computer.
Linux provides full AppleTalk networking. Netatalk is a kernel-level implementation of the AppleTalk Protocol Suite, originally for BSD-derived systems. It includes support for routing AppleTalk, serving UNIX and AFS file systems using AppleShare and serving UNIX printers and accessing AppleTalk printers.
For compatibility with MS Windows environments, the Samba suite, including support for the NMB and SMB protocols, can be installed on any UNIX-like system. The Server Message Block protocol (also called Session Message Block, NetBIOS or LanManager protocol) is used on MS Windows 3.11, NT, 95/98, 2K and XP to share disks and printers.
The basic functions of the Samba suite are: sharing Linux drives with Windows machines, accessing SMB shares from Linux machines, sharing Linux printers with Windows machines and sharing Windows printers with Linux machines.
Most Linux distributions provide a samba package, which does most of the server setup and starts up smbd, the Samba server, and nmbd, the netbios name server, at boot time by default. Samba can be configured graphically, via a web interface or via the command line and text configuration files. The daemons make a Linux machine appear as an MS Windows host in an MS Windows My Network Places/Network Neighbourhood window; a share from a Linux machine will be indistinguishable from a share on any other host in an MS Windows environment.
More information can be found at the following locations:
All the big, userfriendly Linux distributions come with various graphical tools, allowing for easy setup of the computer in a local network or for connecting it to an Internet Service Provider. These tools can be started up from the command line or from a menu:
Your system documentation provides plenty of advice and information about availability and use of tools.
Information you'll need to provide:
The graphical helper tools edit a specific set of network configuration files, using a couple of basic commands. The exact names of the configuration files and their location in the file system is largely dependent on your Linux distribution and version. However, a couple of network configuration files are common on all UNIX systems:
The distribution-specific scripts and graphical tools are front-ends to ip (or ifconfig and route on older systems) to display and configure the kernel's networking configuration.
The ip command is used for assigning IP addresses to interfaces, for setting up routes to the Internet and to other networks, for displaying TCP/IP configurations etcetera.
The following commands show IP address and routing information:
Things to note:
While ip is the most novel way to configure a Linux system, ifconfig is still very popular. Use it without option fo displaying network interface information:
Here, too, we note the most important aspects of the interface configuration:
Both ifconfig and ip display more detailed configuration information and a number of statistics about each interface and, maybe most important, whether it is "UP" and "RUNNING".
On your laptop which you usually connect to the company network using the onboard Ethernet connection, but which you are now to configure for dial-in at home or in a hotel, you might need to activate the PCMCIA card. This is done using the cardctl control utility. However, a good distribution should provide PCMCIA support in the network configuration tools, preventing users from having to execute PCMCIA commands manually.
Further discussion of network configuration is out of the scope of this document. Your primary source for extra information is the man pages for the services you want to set up. Additional reading:
On a Linux machine, the device name lo or the local loop is linked with the internal 127.0.0.1 address. The computer will have a hard time making your applications work if this device is not present; it is always there, even on computers which are not networked.
The first ethernet device, eth0 in the case of a standard network interface card, points to your local LAN IP address. Normal client machines only have one network interface card. Routers, connecting networks together, have one network device for each network they serve.
If you use a modem to connect to the Internet, your network device will probably be named ppp0. This is normally also the case for connections using a cable modem.
There are many more names, for instance for Virtual Private Network interfaces (VPNs), and multiple interfaces can be active simultaneously, so that the output of the ifconfig or ip commands might become quite extensive when no options are used. Even multiple interfaces of the same type can be active. In that case, they are numbered sequentially: the first will get the number 0, the second will get a suffix of 1, the third will get 2, and so on. This is the case on many application servers, on machines which have a failover configuration, on routers, firewalls and many more.
Apart from the ip command for displaying the network configuration, there's the common netstat command which has a lot of options and is generally useful on any UNIX system.
Routing information can be displayed with the -nr option to the netstat command:
This is a typical client machine in an IP network. It only has one network device, eth0. The lo interface is the local loop.
When this machine tries to contact a host that is on another network than its own, indicated by the line starting with 0.0.0.0, it will send the connection requests to the machine (router) with IP address 192.168.42.1, and it will use its primary interface, eth0, to do this.
Hosts that are on the same network, the line starting with 192.168.42.0, will also be contacted through the primary network interface, but no router is necessary, the data are just put on the network.
Machines can have much more complicated routing tables than this one, with lots of different "Destination-Gateway" pairs to connect to different networks. If you have the occasion to connect to an application server, for instance at work, it is most educating to check the routing information.
An impressive amount of tools is focused on network management and remote administration of Linux machines. Your local Linux software mirror will offer plenty of those. It would lead us too far to discuss them in this document, so please refer to the program-specific documentation.
We will only discuss some common UNIX/Linux text tools in this section.
To display information on hosts or domains, use the host command:
Similar information can be displayed using the dig command, which gives additional information about how records are stored in the name server.
To check if a host is alive, use ping. If your system is configured to send more than one packet, interrupt ping with the Ctrl+C key combination:
To check the route that packets follow to a network host, use the traceroute command:
Specific domain name information can be queried using the whois command, as is explained by many whois servers, like the one below:
For other domain names than .com, .net, .org and .edu, specify the whois server, such as this one for .be domains:
The Linux system is a great platform for offering networking services. In this section, we will try to give an overview of most common network servers and applications.
Offering a service to users can be approached in two ways. A daemon or service can run in standalone mode, or it can be dependent on another service to be activated.
Network services that are heavily and/or continuously used, usually run in the standalone mode: they are independent program daemons that are always running. They are most likely started up at system boot time, and they wait for requests on the specific connection points or ports for which they are set up to listen. When a request comes, it is processed, and the listening continues until the next request. A web server is a typical example: you want it to be available 24 hours a day, and if it is too busy it should create more listening instances to serve simultaneous users. Other examples are the large software archives such as Sourceforge or your Tucows mirror, which must handle thousands of FTP requests per day.
An example of a standalone network service on your home computer might be the named (name daemon), a caching name server. Standalone services have there own processes running, you can check any time using ps:
Most services on your home PC, such as the FTP service, don't have a running daemon, yet you can use them:
Let's see in the next section how this is arranged.
On your home PC, things are usually a bit calmer. You may have a small network, for instance, and you may have to transfer files from one PC to another from time to time, using FTP or Samba (for connectivity with MS Windows machines). In those cases, starting all the services which you only need occasionally and having them run all the time would be a waste of resources. So in smaller setups, you will find the necessary daemons dependent on a central program, that listen on all the ports of the services for which it is responsible.
This super-server, the Internet services daemon, is started up at system initialization time. There are two common implementations: inetd and xinetd (the extended Internet services daemon). One or the other is usually running on every Linux system:
The services for which the Internet daemon is responsible, are listed in its configuration file, /etc/inetd.conf, for inetd, and in the directory /etc/xinetd.d for xinetd. Commonly managed services include file share and print services, SSH, FTP, telnet, the Samba configuration daemon, talk and time services.
As soon as a connection request is received, the central server will start an instance of the required server. Thus, in the example below, when user bob starts an FTP session to the local host, an FTP daemon is running as long as the session is active:
Of course, the same happens when you open connections to remote hosts: either a daemon answers directly, or a remote (x)inetd starts the service you need and stops it when you quit.
Sendmail is the standard mail server program or Mail Transport Agent for UNIX platforms. It is robust, scalable, and when properly configured with appropriate hardware, handles thousands of users without blinking. More information about how to configure Sendmail is included with the sendmail and sendmail-cf packages, you may want to read the README and README.cf files in /usr/share/doc/sendmail. The man sendmail and man aliases are also useful.
Qmail is another mail server, gaining popularity because it claims to be more secure than Sendmail. While Sendmail is a monolithic program, Qmail consists of smaller interacting program parts that can be better secured. Postfix is another mail server which is gaining popularity.
These servers handle mailing lists, filtering, virus scanning and much more. Free and commercial scanners are available for use with Linux. Examples of mailing list software are Mailman, Listserv, Majordomo and EZmlm. See the web page of your favorite virus scanner for information on Linux client and server support. Amavis and Spamassassin are free implementations of a virus scanner and a spam scanner.
The most popular protocols to access mail remotely are POP3 and IMAP4. IMAP and POP both allow offline operation, remote access to new mail and they both rely on an SMTP server to send mail.
While POP is a simple protocol, easy to implement and supported by almost any mail client, IMAP is to be preferred because:
There are plenty of both text and graphical E-mail clients, we'll just name a few of the common ones. Pick your favorite.
The UNIX mail command has been around for years, even before networking existed. It is a simple interface to send messages and small files to other users, who can then save the message, redirect it, reply to it and such.
While it is not commonly used as a client anymore, the mail program is still useful, for example to mail the output of a command to somebody:
mail <future.employer@whereIwant2work.com> < cv.txt
The elm mail reader is a much needed improvement to mail, and so is pine (Pine Is Not ELM). The mutt mail reader is even more recent and offers features like threading.
For those users who prefer a graphical interface to their mail (and a tennis elbow or a mouse arm), there are hundreds of options. The most popular for new users are Mozilla Mail/Thunderbird, which has easy anti-spam configuring options, and the Ximian MS Exchange clone, Evolution.
There are also tens of web mail applications available, such as Squirrelmail, Yahoo! mail, gmail from Google and Hotmail.
An overview is available via the Linux Mail User HOWTO.
Most Linux distributions include fetchmail, a mail-retrieval and forwarding utility. It fetches mail from remote mail servers (POP, IMAP and some others) and forwards it to your local delivery system. You can then handle the retrieved mail using normal mail clients. It can be run in daemon mode to repeatedly poll one or more systems at a specified interval. Information and usage examples can be found in the Info pages; the directory /usr/share/doc/fetchmail-<version> contains a full list of features and a FAQ for beginners.
The procmail filter can be used for filtering incoming mail, to create mailing lists, to pre-process mail, to selectively forward mail and more. The accompanying formail program, among others, enables generation of auto-replies and splitting up mailboxes. Procmail has been around for years on UNIX and Linux machines and is a very robust system, designed to work even in the worst circumstances. More information may be found in the /usr/share/doc/procmail-<version> directory and in the man pages.
Apache is by far the most popular web server, used on more than half of all Internet web servers. Most Linux distributions include Apache. Apache's advantages include its modular design, SSL support, stability and speed. Given the appropriate hardware and configuration it can support the highest loads.
On Linux systems, the server configuration is usually done in the /etc/httpd directory. The most important configuration file is httpd.conf; it is rather self-explanatory. Should you need help, you can find it in the httpd man page or on the Apache website.
A number of web browsers, both free and commercial, exist for the Linux platform. Netscape Navigator as the only decent option has long been a thing of the past, as Mozilla/Firefox offers a competitive alternative running on many other operating systems, like MS Windows and MacOS X as well.
Amaya is the W3C browser. Opera is a commercial browser, compact and fast. Many desktop managers offer web browsing features in their file manager, like nautilus.
Among the popular text based browsers are lynx and links. You may need to define proxy servers in your shell, by setting the appropriate variables. Text browsers are fast and handy when no graphical environment is available, such as when used in scripts.
On a Linux system, an FTP server is typically run from xinetd, using the WU-ftpd server, although the FTP server may be configured as a stand-alone server on systems with heavy FTP traffic. See the exercises.
Other FTP servers include among others Ncftpd and Proftpd.
Most Linux distributions contain the anonftp package, which sets up an anonymous FTP server tree and accompanying configuration files.
Most Linux distributions include ncftp, an improved version of the common UNIX ftp command, which you may also know from the Windows command line. The ncftp program offers extra features such as a nicer and more comprehensible user interface, file name completion, append and resume functions, bookmarking, session management and more:
Excellent help with lot of examples can be found in the man pages. And again, a number of GUI applications are available.
Various clients and systems are available in each distribution, replacing the old-style IRC text-based chat. A short and incomplete list of the most popular programs:
Running a Usenet server involves a lot of expertise and fine-tuning, so refer to the INN homepage for more information.
There are a couple of interesting newsgroups in the comp.* hierarchy, which can be accessed using a variety of text and graphical clients. A lot of mail clients support newsgroup browsing as well, check your program or see your local Open Source software mirror for text clients such as tin, slrnn and mutt, or download Mozilla or one of a number of other graphical clients.
Deja.com keeps a searchable archive of all newsgroups, powered by Google. This is a very powerful instrument for getting help: chances are very high that somebody has encountered your problem, found a solution and posted it in one of the newsgroups.
All these applications need DNS services to match IP addresses to host names and vice versa. A DNS server does not know all the IP addresses in the world, but networks with other DNS servers which it can query to find an unknown address. Most UNIX systems can run named, which is part of the bind (Berkeley Internet Name Domain) package distributed by the Internet Software Consortium. It can run as a stand-alone caching nameserver, which is often done on Linux systems in order to speed up network access.
Your main client configuration file is /etc/resolv.conf, which determines the order in which Domain Name Servers are contacted:
DHCP is the Dynamic Host Configuration Protocol, which is gradually replacing good old bootp in larger environments. It is used to control vital networking parameters such as IP addresses and name servers of hosts. DHCP is backward compatible with bootp. For configuring the server, you will need to read the HOWTO.
DHCP client machines will usually be configured using a GUI that configures the dhcpcd, the DHCP client daemon. Check your system documentation if you need to configure your machine as a DHCP client.
Traditionally, users are authenticated locally, using the information stored in /etc/passwd and /etc/shadow on each system. But even when using a network service for authenticating, the local files will always be present to configure system accounts for administrative use, such as the root account, the daemon accounts and often accounts for additional programs and purposes.
These files are often the first candidates for being examined by hackers, so make sure the permissions and ownerships are strictly set as should be:
Linux can use PAM, the Pluggable Authentication Module, a flexible method of UNIX authentication. Advantages of PAM:
The directory /etc/pam.d contains the PAM configuration files (used to be /etc/pam.conf). Each application or service has its own file. Each line in the file has four elements:
Shadow password files are automatically detected by PAM.
More information can be found in the pam man pages or at the Linux-PAM project homepage.
The Lightweight Directory Access Protocol is a client-server system for accessing global or local directory services over a network. On Linux, the OpenLDAP implementation is used. It includes slapd, a stand-alone server; slurpd, a stand-alone LDAP replication server; libraries implementing the LDAP protocol and a series of utilities, tools and sample clients.
The main benefit of using LDAP is the consolidation of certain types of information within your organization. For example, all of the different lists of users within your organization can be merged into one LDAP directory. This directory can be queried by any LDAP-enabled applications that need this information. It can also be accessed by users who need directory information.
Other LDAP or X.500 Lite benefits include its ease of implementation (compared to X.500) and its well-defined Application Programming Interface (API), which means that the number of LDAP-enabled applications and LDAP gateways should increase in the future.
On the negative side, if you want to use LDAP, you will need LDAP-enabled applications or the ability to use LDAP gateways. While LDAP usage should only increase, currently there are not very many LDAP-enabled applications available for Linux. Also, while LDAP does support some access control, it does not possess as many security features as X.500.
Since LDAP is an open and configurable protocol, it can be used to store almost any type of information relating to a particular organizational structure. Common examples are mail address lookups, central authentication in combination with PAM, telephone directories and machine configuration databases.
See your system specific information and the man pages for related commands such as ldapmodify and ldapsearch for details. More information can be found in the LDAP Linux HOWTO, which discusses installation, configuration, running and maintenance of an LDAP server on Linux. The LDAP Implementation HOWTO describes the technical aspects of storing application data in an LDAP server. The author of this Introduction to Linux document also wrote an LDAP Operations HOWTO, describing the basics everyone should know about when dealing with LDAP management, operations and integration of services.
There are a couple of different ways to execute commands or run programs on a remote machine and have the output, be it text or graphics, sent to your workstation. The connections can be secure or insecure. While it is of course advised to use secure connections instead of transporting your password over the network unencrypted, we will discuss some practical applications of the older (unsafe) mechanisms, as they are still useful in a modern networked environment, such as for troubleshooting or running exotic programs.
The rlogin and rsh commands for remote login and remote execution of commands are inherited from UNIX. While seldom used because they are blatantly insecure, they still come with almost every Linux distribution for backward compatibility with UNIX programs.
Telnet, on the other hand, is still commonly used, often by system and network administrators. Telnet is one of the most powerful tools for remote access to files and remote administration, allowing connections from anywhere on the Internet. Combined with an X server, remote graphical applications can be displayed locally. There is no difference between working on the local machine and using the remote machine.
Because the entire connection is unencrypted, allowing telnet connections involves taking high security risks. For normal remote execution of programs, Secure SHell or ssh is advised. We will discuss the secure method later in this section.
However, telnet is still used in many cases. Below are some examples in which a mail server and a web server are tested for replies:
Checking that a mail server works:
Checking that a web server answers to basic requests:
This is perfectly safe, because you never have to give a username and/or password for getting the data you want, so nobody can snoop that important information off the cable.
As we already explained in Chapter 7 (see Section 7.3.3), the X Window system comes with an X server which serves graphics to clients that need a display.
It is important to realize the distinction between the X server and the X client application(s). The X server controls the display directly and is responsible for all input and output via keyboard, mouse and display. The X client, on the other hand, does not access the input and output devices directly. It communicates with the X server which handles input and output. It is the X client which does the real work, like computing values, running applications and so forth. The X server only opens windows to handle input and output for the specified client.
In normal operation (runlevel five, graphical mode), every Linux workstation is an X server to itself, even if it only runs client applications. All the applications you are running (for example, Gimp, a terminal window, your browser, your office application, your CD playing tool, and so on) are clients to your X server. Server and client are running on the same machine in this case.
This client/server nature of the X system makes it an ideal environment for remote execution of applications and programs. Because the process is actually being executed on the remote machine, very little CPU power is needed on the local host. Such machines, purely acting as servers for X, are called X terminals and were once very popular. More information may be found in the Remote X applications mini-HOWTO.
If you would want to use telnet to display graphical applications running on a remote machine, you first need to give the remote machine access to your display (to your X server!) using the xhost command, by typing a command similar to the one below in a terminal window on your local machine:
After that, connect to the remote host and tell it to display graphics on the local machine by setting the environment variable DISPLAY:
After completing this step, any application started in this terminal window will be displayed on your local desktop, using remote resources for computing, but your local graphical resources (your X server) for displaying the application.
This procedure assumes that you have some sort of X server (XFree86, Exceed, Cygwin) already set up on the machine where you want to display images. The architecture and operating system of the client machine are not important as long as they allow you to run an X server on it.
Mind that displaying a terminal window from the remote machine is also considered to be a display of an image.
Most UNIX and Linux systems now run Secure SHell in order to leave out the security risks that came with telnet. Most Linux systems will run a version of OpenSSH, an Open Source implementation of the SSH protocol, providing secure encrypted communications between untrusted hosts over an untrusted network. In the standard setup X connections are automatically forwarded, but arbitrary TCP/IP ports may also be forwarded using a secure channel.
The ssh client connects and logs into the specified host name. The user must provide his identity to the remote machine as specified in the sshd_config file, which can usually be found in /etc/ssh. The configuration file is rather self-explanatory and by defaults enables most common features. Should you need help, you can find it in the sshd man pages.
When the user's identity has been accepted by the server, the server either executes the given command, or logs into the machine and gives the user a normal shell on the remote machine. All communication with the remote command or shell will be automatically encrypted.
The session terminates when the command or shell on the remote machine exits and all X11 and TCP/IP connections have been closed.
When connecting to a host for the first time, using any of the programs that are included in the SSH collection, you need to establish the authenticity of that host and acknowledge that you want to connect:
It is important that you type "yes", in three characters, not just "y". This edits your ~/.ssh/known_hosts file, see Section 10.3.4.3.
If you just want to check something on a remote machine and then get your prompt back on the local host, you can give the commands that you want to execute remotely as arguments to ssh:
If the X11Forwarding entry is set to yes and the user is using X applications, the DISPLAY environment variable is set, the connection to the X11 display is automatically forwarded to the remote side in such a way that any X11 programs started from the shell will go through the encrypted channel, and the connection to the real X server will be made from the local machine. The user should not manually set DISPLAY. Forwarding of X11 connections can be configured on the command line or in the sshd configuration file.
The value for DISPLAY set by ssh will point to the server machine, but with a display number greater than zero. This is normal, and happens because ssh creates a proxy X server on the server machine (that runs the X client application) for forwarding the connections over the encrypted channel.
This is all done automatically, so when you type in the name of a graphical application, it is displayed on your local machine and not on the remote host. We use xclock in the example, since it is a small program which is generally installed and ideal for testing:
SSH will also automatically set up Xauthority data on the server machine. For this purpose, it will generate a random authorization cookie, store it in Xauthority on the server, and verify that any forwarded connections carry this cookie and replace it by the real cookie when the connection is opened. The real authentication cookie is never sent to the server machine (and no cookies are sent in the plain).
Forwarding of arbitrary TCP/IP connections over the secure channel can be specified either on the command line or in a configuration file.
The ssh client/server system automatically maintains and checks a database containing identifications for all hosts it has ever been used with. Host keys are stored in $HOME/.ssh/known_hosts in the user's home directory. Additionally, the file /etc/ssh/ssh_known_hosts is automatically checked for known hosts. Any new hosts are automatically added to the user's file. If a host's identification ever changes, ssh warns about this and disables password authentication to prevent a Trojan horse from getting the user's password. Another purpose of this mechanism is to prevent man-in-the-middle attacks which could otherwise be used to circumvent the encryption. In environments where high security is needed, sshd can even be configured to prevent logins to machines whose host keys have changed or are unknown.
The SSH suite provides scp as a secure alternative to the rcp command that used to be popular when only rsh existed. scp uses ssh for data transfer, uses the same authentication and provides the same security as ssh. Unlike rcp, scp will ask for passwords or passphrases if they are needed for authentication:
Any file name may contain a host and user specification to indicate that the file is to be copied to/from that host. Copies between two remote hosts are permitted. See the Info pages for more information.
If you would rather use an FTP-like interface, use sftp:
The ssh-keygen command generates, manages and converts authentication keys for ssh. It can create RSA keys for use by SSH protocol version 1 and RSA or DSA keys for use by SSH protocol version 2.
Normally each user wishing to use SSH with RSA or DSA authentication runs this once to create the authentication key in $HOME/.ssh/identity, id_dsa or id_rsa. Additionally, the system administrator may use this to generate host keys for the system.
Normally this program generates the key and asks for a file in which to store the private key. The public key is stored in a file with the same name but .pub appended. The program also asks for a passphrase. The passphrase may be empty to indicate no passphrase (host keys must have an empty passphrase), or it may be a string of arbitrary length.
There is no way to recover a lost passphrase. If the passphrase is lost or forgotten, a new key must be generated and copied to the corresponding public keys.
We will study SSH keys in the exercises. All information can be found in the man or Info pages.
VNC or Virtual Network Computing is in fact a remote display system which allows viewing a desktop environment not only on the local machine on which it is running, but from anywhere on the Internet and from a wide variety of machines and architectures, including MS Windows and several UNIX distributions. You could, for example, run MS Word on a Windows NT machine and display the output on your Linux desktop. VNC provides servers as well as clients, so the opposite also works and it may thus be used to display Linux programs on Windows clients. VNC is probably the easiest way to have X connections on a PC. The following features make VNC different from a normal X server or commercial implementations:
More information can be found in the VNC client man pages (man vncviewer) or on the VNC website.
In order to ease management of MS Windows hosts, recent Linux distributions support the Remote Desktop Protocol (RDP), which is implemented in the rdesktop client. The protocol is used in a number of MicroSoft products, including Windows NT Terminal Server, Windows 2000 Server, Windows XP and Windows 2003 Server.
Surprise your friends (or management) with the fullscreen mode, multiple types of keyboard layouts and single application mode, just like the real thing. The man rdesktop manual provides more information. The project's homepage is at http://www.rdesktop.org/.
Cygwin provides substantial UNIX functionality on MS Windows systems. Apart from providing UNIX command line tools and graphical applications, it can also be used to display a Linux desktop on an MS Windows machine, using remote X. From a Cygwin Bash shell, type the command
/usr/X11R6/bin/XWin.exe -query your_linux_machine_name_or_IP
The connection is by default denied. You need to change the X Display Manager (XDM) configuration and possibly the X Font Server (XFS) configuration to enable this type of connection, where you get a login screen on the remote machine. Depending on your desktop manager (Gnome, KDE, other), you might have to change some configurations there, too.
If you do not need to display the entire desktop, you can use SSH in Cygwin, just like explained in Section 10.3.3.2. without all the fuss of editing configuration files.
As soon as a computer is connected to the network, all kinds of abuse becomes possible, be it a UNIX-based or any other system. Admittedly, mountains of papers have been spilled on this subject and it would lead us too far to discuss the subject of security in detail. There are, however, a couple of fairly logical things even a novice user can do to obtain a very secure system, because most break-ins are the result of ignorant or careless users.
Maybe you are asking yourself if this all applies to you, using your computer at home or working at your office on a desktop in a fairly protected environment. The questions you should be asking yourself, however, are more on the lines of:
Presuming you don't, we will quickly list the steps you can take to secure your machine. Extended information can be found in the Linux Security HOWTO.
The goal is to run as few services as possible. If the number of ports that are open for the outside world are kept to a minimum, this is all the better to keep an overview. If services can't be turned off for the local network, try to at least disable them for outside connections.
A rule of thumb is that if you don't recognize a particular service, you probably won't need it anyway. Also keep in mind that some services are not really meant to be used over the Internet. Don't rely on what should be running, check which services are listening on what TCP ports using the netstat command:
Things to avoid:
Stop running services using the chkconfig command, the initscripts or by editing the (x)inetd configuration files.
Its ability to adapt quickly in an ever changing environment is what makes Linux thrive. But it also creates a possibility that security updates have been released even while you are installing a brand new version, so the first thing you should do (and this goes for about any OS you can think of) after installing is getting the updates as soon as possible. After that, update all the packages you use regularly.
Some updates may require new configuration files, and old files may be replaced. Check the documentation, and ensure that everything runs normal after updating.
Most Linux distributions provide mailing list services for security update announcements, and tools for applying updates to the system. General Linux only security issues are reported among others at Linuxsecurity.com.
Updating is an ongoing process, so it should be an almost daily habit.
In the previous section we already mentioned firewall capabilities in Linux. While firewall administration is one of the tasks of your network admin, you should know a couple of things about firewalls.
Firewall is a vague term that can mean anything that acts as a protective barrier between us and the outside world, generally the Internet. A firewall can be a dedicated system or a specific application that provides this functionality. Or it can be a combination of components, including various combinations of hardware and software. Firewalls are built from "rules" that are used to define what is allowed to enter and/or exit a given system or network.
After disabling unnecessary services, we now want to restrict accepted services as to allow only the minimum required connections. A fine example is working from home: only the specific connection between your office and your home should be allowed, connections from other machines on the Internet should be blocked.
The first line of defense is a packet filter, which can look inside IP packages and make decisions based on the content. Systems running the ipchains firewall are based on 2.2 kernels. Newer systems (2.4 kernel) use iptables, a next generation packet filter for Linux, and the Gnome Lokkit tool. This tool was only created to provide an easy interface for normal users. It sets up a basic firewall configuration for a desktop, a dial-up or cable modem connection, and that's about it. It should not be used in larger environments.
One of the most noteworthy enhancements in the newer kernels is the stateful inspection feature, which not only tells what is inside a packet, but also detects if a packet belongs or is related to a new or existing connection.
Development is ongoing, so it is best to check with each new version of a distribution which system is being used.
More information can be found at the netfilter/iptables project page.
TCP wrapping provides much the same results as the packet filters, but works differently. The wrapper actually accepts the connection attempt, then examines configuration files and decides whether to accept or reject the connection request. It controls connections at the application level rather than at the network level.
TCP wrappers are typically used with xinetd to provide host name and IP-address-based access control. In addition, these tools include logging and utilization management capabilities that are easy to configure.
The advantages of TCP wrappers are that the connecting client is unaware that wrappers are used, and that they operate separately from the applications they protect.
The host based access is controlled in the hosts.allow and hosts.deny files. More information can be found in the TCP wrapper documentation files in /usr/share/doc/tcp_wrappers-<version>/ and in the man pages for the host based access control files, which contain examples.
Proxies can perform various duties, not all of which have much to do with security. But the fact that they are an intermediary make proxies a good place to enforce access control policies, limit direct connections through a firewall, and control how the network behind the proxy looks to the Internet.
Usually in combination with a packet filter, but sometimes all by themselves, proxies provide an extra level of control. More information can be found in the Firewall HOWTO or on the Squid website.
Some servers may have their own access control features. Common examples include Samba, X11, Bind, Apache and CUPS. For every service you want to offer check which configuration files apply.
If anything, the UNIX way of logging all kinds of activities into all kinds of files confirms that "it is doing something." Of course, log files should be checked regularly, manually or automatically. Firewalls and other means of access control tend to create huge amounts of log files, so the trick is to try and only log abnormal activities.
Intrusion Detection Systems are designed to catch what might have gotten past the firewall. They can either be designed to catch an active break-in attempt in progress, or to detect a successful break-in after the fact. In the latter case, it is too late to prevent any damage, but at least we have early awareness of a problem. There are two basic types of IDS: those protecting networks, and those protecting individual hosts.
For host based IDS, this is done with utilities that monitor the file system for changes. System files that have changed in some way, but should not change, are a dead give-away that something is amiss. Anyone who gets in and gets root access will presumably make changes to the system somewhere. This is usually the very first thing done, either so he can get back in through a backdoor, or to launch an attack against someone else, in which case, he has to change or add files to the system. Some systems come with the tripwire monitoring system, which is documented at the Tripwire Open Source Project website.
Network intrusion detection is handled by a system that sees all the traffic that passes the firewall (not by portscanners, which advertise usable ports). Snort is an Open Source example of such a program. Whitehats.com features an open Intrusion detection database, arachNIDS.
Some general things you should keep in mind:
How can you tell? This is a checklist of suspicious events:
In short, stay calm. Then take the following actions in this order:
Linux and networking go hand in hand. The Linux kernel has support for all common and most uncommon network protocols. The standard UNIX networking tools are provided in each distribution. Next to those, most distributions offer tools for easy network installation and management.
Linux is well known as a stable platform for running various Internet services, the amount of Internet software is endless. Like UNIX, Linux can be just as well used and administered from a remote location, using one of several solutions for remote execution of programs.
We briefly touched the subject of security. Linux is an ideal firewall system, light and cheap, but can be used in several other network functions such as routers and proxy servers.
Increasing network security is mainly done by applying frequent updates and common sense.
Most likely, your system is already installed with audio drivers and the configuration was done at installation time. Likewise, should you ever need to replace your audio hardware, most systems provide tools that allow easy setup and configuration of the device. Most currently available plug-and-play sound cards should be recognized automatically. If you can hear the samples that are played during configuration, just clickand everything will be set up for you.
If your card is not detected automatically, you may be presented with a list of sound cards and/or of sound card properties from which to choose. After that, you will have to provide the correct I/O port, IRQ and DMA settings. Information about these settings can be found in your sound card documentation. If you are on a dual boot system with MS Windows, this information can be found in t he Windows Control Panel as well.
There are generally two types of sound architecture: the older Open Sound System or OSS, which works with every UNIX-like system, and the newer Advanced Linux Sound Architecture or ALSA, that has better support for Linux, as the name indicates. ALSA also has more features and allows for faster driver development. We will focus here on the ALSA system.
Today, almost all mainstream audio chipsets are supported. Only some high-end professional solutions and some cards developed by manufacturers refusing to document their chipset specifications are unsupported. An overview of supported devices can be found on the ALSA site at http://www.alsa-project.org/alsa-doc/index.php?vendor=All#matrix.
Configuring systems installed with ALSA is done using the alsaconf tool. Additionally, distributions usually provide their own tools for configuring the sound card; these tools might even integrate the old and the new way of handling sound devices.
The cdp package comes with most distributions and provides cdp or cdplay, a text-based CD player. Desktop managers usually include a graphical tool, such as the gnome-cd player in Gnome, that can be started from a menu.
Be sure to understand the difference between an audio CD and a data CD. You do not have to mount an audio CD into the file system in order to listen to it. This is because the data on such a CD are not Linux file system data; they are accessed and sent to the audio output channel directly, using a CD player program. If your CD is a data CD containing .mp3 files, you will first need to mount it into the file system, and then use one of the programs that we discuss below in order to play the music. How to mount CDs into the file system is explained in Section 7.5.5.
The cdparanoia tool from the package with the same name reads audio directly as data from the CD, without analog conversions, and writes data to a file or pipe in different formats, of which .wav is probably the most popular. Various tools for conversion to other formats, formats, such as .mp3, come with most distributions or are downloadable as separate packages. The GNU project provides several CD playing, ripping and encoding tools, database managers; see the Free Software Directory, Audio section for detailed information.
Audio-CD creation is eased, among many others, with the kaudiocreator tool from the KDE suite. It comes with clear information from the KDE Help Center.
CD burning is covered in general in Section 9.2.2.
The popular .mp3 format is widely supported on Linux machines. Most distributions include multiple programs that can play these files. Among many other applications, XMMS, which is presented in the screenshot below, is one of the most wide-spread, partially because it has the same look and feel as the Windows tool.
Also very popular for playing music are AmaroK, a KDE application that is steadily gaining popularity, and MPlayer, which can also play movies.
In text mode, you can use the mplayer command:
It would lead us too far to discuss all possible audio formats and ways to play them. An (incomplete) overview of other common sound playing and manipulating software:
Check your system documentation and man pages for particular tools and detailed explanations on how to use them.
aumix and alsamixer are two common text tools for adjusting audio controls. Use the arrow keys to toggle settings. The alsamixer has a graphical interface when started from the Gnome menu or as gnome-alsamixer from the command line. The kmix tool does the same in KDE.
Regardless of how you choose to listen to music or other sounds, remember that there may be other people who may not be interested in hearing you or your computer. Try to be courteous, especially in office environments. Use a quality head-set, rather than the ones with the small ear pieces. This is better for your ears and causes less distraction for your colleagues.
Various tools are again available that allow you to record voice and music. For recording voice you can use arecord on the command line:
"Interrupt" means that the application has caught a Ctrl+C. Play the sample using the simple play command.
This is a good test that you can execute prior to testing applications that need voice input, like Voice over IP (VoIP). Keep in mind that the microphone input should be activated. If you don't hear your own voice, check your sound settings. It often happens that the microphone is muted or on verry low volume. This can be easily adjusted using alsamixer or your distribution-specific graphical interface to the sound system.
In KDE you can start the krec utility, Gnome provides the gnome-sound-recorder.
Various players are available:
Most likely, you will find one of these in your graphical menus.
Keep in mind that all codecs necessary for viewing different types of video might not be on your system by default. You can get a long way downloading W32codecs and libdvdcss.
The LDP released a document that is very appropriate for this section. It is entitled DVD Playback HOWTO and describes the different tools available for viewing movies on a system that has a DVD drive. It is a fine addition to the DVD HOWTO that explains installation of the drive.
For watching TV there is choice of the following tools, among many others for watching and capturing TV, video and other streams:
Internet telephony, or more common, Voice over IP (VoIP) or digital telephony allows parties to exchange voice data flows over the network. The big difference is that the data flows over a general purpose network, the Internet, contrary to conventional telephony, that uses a dedicated network of voice transmission lines. The two networks can be connected, however, under special circumstances, but for now this is certainly not a standard. In other words: it is very likely that you will not be able to call people who are using a conventional telephone. If it is possible at all, it is likely that you will need to pay for a subscription.
While there are currently various applications available for free download, both free and proprietary, there are some major drawbacks to telephony over the Internet. Most noticably, the system is unreliable, it can be slow or there can be a lot of noise on the connection, and it can thus certainly not be used to replace conventional telephony - think about emergency calls. While some providers take their precautions, there is no guarantee that you can reach the party that you want to call.
Most applications currently do not use encryption, so be aware that it is potentially easy for someone to eavesdrop on your conversations. If security is a concern for you, read the documentation that comes with your VoIP client. Additionally, if you are using a firewall, it should be configured to allow incoming connections from anywhere, so using VoIP also includes taking risks on the level of site security.
First of all, you need a provider offering the service. This service might integrate traditional telephony and it might or might not be free. Among others are SIPphone, Vonage, Lingo, AOL TotalTalk and many locally accessible providers offering the so-called "full phone service". Internet phone service only is offered by Skype, SIP Broker, Google and many others.
If you want to set up a server of your own, you might want to look into Asterisk.
On the client side, the applications that you can use depend on your network configuration. If you have a direct Internet connection, there won't be any problems, provided that you know on what server you can connect, and usually that you also have a username and password to authenticate to the service.
If you are behind a firewall that does Network Address Translation (NAT), however, some services might not work, as they will only see the IP address of the firewall and not the address of your computer, which might well be unroutable over the Internet, for instance when you are in a company network and your IP address starts with 10., 192.168. or another non-routable subnet prefix. This depends on the protocol that is used by the application.
Also, available bandwidth might be a blocking factor: some applications are optimized for low bandwidth consumption, while others might require high bandwidth connections. This depends on the codec that is used by the application.
Among the most common applications are the Skype client, which has an interface that reminds of instant messaging, and X-Lite, the free version of the XTen softphone, which looks like a mobile telephone. However, while these programs are available for free download and very popular, they are not free as in free speech: they use proprietary protocols and/or are only available in binary packages, not in source format.
VoIP applications are definitely a booming market. Volunteers try to document the current status at http://www.voip-info.org/.
The GNU/Linux platform is fully multi-media enabled. A wide variety of devices like sound cards, tv-cards, headsets, microphones, CD and DVD players is supported. The list of applications is sheer endless.
As an extra means of orientation for new users with a Windows background, the table below lists MS-DOS commands with their Linux counterparts. Keep in mind that Linux commands usually have a number of options. Read the Info or man pages on the command to find out more.
Table B-1. Overview of DOS/Linux commands
The following features are standard in every shell. Note that the stop, suspend, jobs, bg and fg commands are only available on systems that support job control.
Table C-1. Common Shell Features
The table below shows major differences between the standard shell (sh), Bourne Again SHell (bash), Korn shell (ksh) and the C shell (csh).
Table C-2. Differing Shell Features
The Bourne Again SHell has many more features not listed here. This table is just to give you an idea of how this shell incorporates all useful ideas from other shells: there are no blanks in the column for bash. More information on features found only in Bash can be retrieved from the Bash info pages, in the "Bash Features" section.
You should at least read one manual, being the manual of your shell. The preferred choice would be info bash, bash being the GNU shell and easiest for beginners. Print it out and take it home, study it whenever you have 5 minutes.
See Appendix B if you are having difficulties to assimilate shell commands.
The purpose of this License is to make a manual, textbook, or other functional and useful document "free" in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.
This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.
We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference.
This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated herein. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you". You accept the license if you copy, modify or distribute the work in a way requiring permission under copyright law.
A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.
A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.
The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections then there are none.
The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words.
A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to thwart or discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount of text. A copy that is not "Transparent" is called "Opaque".
Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF designed for human modification. Examples of transparent image formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by some word processors for output purposes only.
The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text.
A section "Entitled XYZ" means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as "Acknowledgements", "Dedications", "Endorsements", or "History".) To "Preserve the Title" of such a section when you modify the Document means that it remains a section "Entitled XYZ" according to this definition.
The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers are considered to be included by reference in this License, but only as regards disclaiming warranties: any other implication that these Warranty Disclaimers may have is void and has no effect on the meaning of this License.
You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and you may publicly display copies.
If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.
It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.
You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:
GNU FDL Modification Conditions
If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles.
You may add a section Entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties--for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.
You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.
You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers.
The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.
In the combination, you must combine any sections Entitled "History" in the various original documents, forming one section Entitled "History"; likewise combine any sections Entitled "Acknowledgements", and any sections Entitled "Dedications". You must delete all sections Entitled "Endorsements".
You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.
You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.
A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, is called an "aggregate" if the copyright resulting from the compilation is not used to limit the legal rights of the compilation's users beyond what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document.
If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the entire aggregate, the Document's Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate.
Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License, and all the license notices in the Document, and any Warranty Disclaimers, provided that you also include the original English version of this License and the original versions of those notices and disclaimers. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version will prevail.
If a section in the Document is Entitled "Acknowledgements", "Dedications", or "History", the requirement (section 4) to Preserve its Title (section 1) will typically require changing the actual title.
You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. Any other attempt to copy, modify, sublicense or distribute the Document is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.
The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/.
Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation.
To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices just after the title page:
If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the "with...Texts." line with this:
If you have Invariant Sections without Cover Texts, or some other combination of the three, merge those two alternatives to suit the situation.
If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software.