The Answer Guy 35: Listing "Just the Links": It's the only way, Luke
"The Linux Gazette...making Linux just a little more fun!"
From Jerry Giles on Thu, 05 Nov 1998
Sorry for the intrusion but I came across your name while browsing
for Linux. I am currently in a CIS program at the local college
and a recent test had an item I still can't find the answer to.
The professor asked what command to use to list "only the linked
files" in a directory. He is expecting us to use ls with flags, I
guess, but I've looked at all the flags given in the text and
nothing seems to address this. Can you help?
Thanks, jerry giles
Either you misunderstand, or your professor isn't being very
precise. The 'ls' command "lists links" --- all directory
entries are links! Some of these are symbolic links;
others are "hard" links (which we think of as "normal"
directory entries). The 'ls' command can't list anything
but links. It can list other information that it extracts
from the inodes to which each of these links points (via the
stat() function).
So, the question is essentially meaningless as you've
relayed it.
Now, if the question was about listing symbolic links
there are a couple of simple answers that do make sense.
ls -l | grep '^l'
... this filters a "long" listing of all the links (hard and
"symbolic") and displays only the lines which start with the
letter l. In a "long" directory listing the first block of
characters (field) is a string which encodes the type and
permissions of the file to which each directory link
points (l for "symlink", d for "directory", s for "socket",
p for "FIFO/named pipe", b and c for "block" and "character"
special device nodes --- normally only found under the /dev/
directory --- and "-" (dash) for "regular" files).
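If you'd rather not depend on the exact layout of 'ls -l' output, the shell's own '-L' test (true for a symbolic link, without following it) gives the same answer. This is a minimal sketch assuming a POSIX-style shell; 'list_symlinks' is a made-up helper name, not a standard command:

```shell
# list_symlinks (hypothetical helper): print the names of the symbolic
# links in one directory (default: the current one) without parsing 'ls -l'.
list_symlinks() (
    cd -- "${1:-.}" || exit 1
    for f in * .*; do
        # -L is true for any symlink, dangling or not; it never follows the link.
        [ -L "$f" ] && printf '%s\n' "$f"
    done
    exit 0   # don't let the last failed [ -L ] test become our exit status
)
```

For example, 'list_symlinks /' should print home, opt and var on a system laid out like the one below.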
The second field in a long listing is the "link count."
This tells you how many "hard links" point to the same
inode that this one does.
Here's an example of my own root directory:
drwxr-xr-x 14 root root 1024 Sep 27 17:19 .
drwxr-xr-x 14 root root 1024 Sep 27 17:19 ..
-rw-r--r-- 2 root root 219254 Sep 27 17:19 System.map
drwxr-xr-x 2 root root 2048 Sep 12 03:25 bin
drwxr-xr-x 2 root root 1024 Sep 27 17:20 boot
drwxr-xr-x 2 root root 1024 Aug 31 06:40 cdrom
drwxr-xr-x 21 root root 4096 Nov 4 03:12 etc
lrwxrwxrwx 1 root root 15 Apr 20 1998 home -> /usr/local/home
drwxr-xr-x 5 root root 2048 Sep 16 23:48 lib
drwxr-xr-x 2 root root 12288 Mar 10 1998 lost+found
drwxr-xr-x 9 root root 1024 Aug 31 06:40 mnt
lrwxrwxrwx 1 root root 14 Mar 31 1998 opt -> /usr/local/opt
dr-xr-xr-x 63 root root 0 Oct 13 02:25 proc
drwx--x--x 13 root root 2048 Oct 31 17:47 root
drwxr-xr-x 5 root root 2048 Sep 16 23:48 sbin
drwxrwxrwt 8 temp root 3072 Nov 5 09:33 tmp
drwxr-xr-x 30 root root 1024 Aug 31 13:32 usr
lrwxrwxrwx 1 root root 13 Aug 31 06:40 var -> usr/local/var
-rw-r--r-- 1 root root 732668 Sep 27 17:19 vmlinuz
This was generated with the command: 'ls -al /'
The number in the second field (the first number on each
of these lines) is the "link count." This is the number
of hard links (non-symlinks) that point to the same inode.
Thus my root directory has 14 links to it. The ".." entry
in each of /'s subdirectories points back up to it.
In other words /usr/.. points back to /, as do /etc/..,
/dev/.., and all the others that are just one level down
from it. /usr/local/.. points to /usr and so on.
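On a traditional Unix filesystem (ext2 and its descendants, for instance), this makes a directory's link count predictable. It is 2 (its entry in its parent plus its own '.') plus one for each subdirectory's '..'. Here is a quick sketch to check that, assuming GNU coreutils 'stat' ('-c %h' prints the hard-link count). 'dir_links_demo' is just an illustrative name. Be aware that some filesystems don't maintain this convention and report 1 for every directory.

```shell
# dir_links_demo (illustrative name): a directory's link count is
# 2 + (number of subdirectories) on ext2/ext4-style filesystems.
dir_links_demo() (
    top=$(mktemp -d) || exit 1
    mkdir "$top/a" "$top/b" "$top/c"   # three subdirectories, three ".." links
    stat -c %h "$top"                  # 5 here: 2 + 3
    rm -rf -- "$top"
)
```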
We see that 'System.map' has a link count of 2. That means
that there is another name for this file. Somewhere on this
filesystem there is another hard link to it.
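You can watch this happen on a scratch file. The sketch below assumes GNU coreutils 'stat', where '%i' is the inode number, '%h' the link count and '%n' the name. 'demo_hardlink' is a hypothetical name.

```shell
# demo_hardlink (hypothetical name): two hard links are two names for one inode.
demo_hardlink() (
    dir=$(mktemp -d) || exit 1
    echo hello > "$dir/original"
    ln "$dir/original" "$dir/another-name"   # make a second hard link
    # Both lines show the same inode number and a link count of 2.
    stat -c '%i %h %n' "$dir/original" "$dir/another-name"
    rm -rf -- "$dir"
)
```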
Most Unix newbies are used to thinking of the 'ls' command
as a listing of files. This is wrong. The 'ls' command
is a listing of links to files. When you add parameters
like "-l" to the 'ls' command, you are listing the links,
AND SOME INFORMATION ABOUT THE FILES TO WHICH THEY POINT.
(Under the hood the 'ls' command is stat()'ing each of
these entries.) A Unix/Linux directory consists of a list
of names and inode numbers. All of the rest of the information
that we associate with the file (its type, ownership,
permissions, link count, all three time/date stamps, size,
and --- most importantly --- the list of blocks that
contains the file's contents) is stored in the inode.
To understand the difference better, create a subdirectory
(~/tmp/experiment). Put a few arbitrary links into it
(use the 'ln' command to make "hard links", the 'ln -s'
command to make some symlinks, and maybe some 'cp' commands
to copy in a few files). Now use the 'chmod' command to remove
your own execute ("x") rights to that directory
('chmod a-x ~/tmp/experiment').
- (technically this is a "demonstration" rather
than a true "experiment" but that's a bit of
scientific method hairsplitting that I'll only
mention in passing).
You should still be able to run 'ls ~/tmp/experiment' (be sure
to use the real 'ls' command --- NOT SOME ALIAS, SHELL FUNCTION
OR SCRIPT). That should work. (If it doesn't, you probably
have 'ls' aliased to 'ls --color' or something like that;
try issuing the command /bin/ls, or try the command
'unalias ls' for the duration of this experiment.) Once you
can issue the 'ls' command with no options and get a
list of the file names in the "~/tmp/experiment" directory,
then try 'ls -l' or 'ls -i'.
You should get a whole stream of "Permission denied"
messages. Note that you also have to do all of this from
outside of the directory: issuing the 'cd' command to get into a
directory requires that you have "execute" permission on
that directory.
The reason that you get these "Permission denied" errors
is that, to give any other information about a file
(other than the link names), the 'ls' command needs to access
the inodes (which requires "execute" permission on the
directory). You can do an 'ls' or an 'ls -a' on the
directory because these only provide lists of the link
names. These variations of the command don't need access
to any other information about the files (which is all
stored in the inode).
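Here is one way to script that demonstration. It is a sketch that assumes GNU 'ls' and uses a 'mktemp -d' scratch directory instead of ~/tmp/experiment. 'perm_demo' is a made-up name. One caution: root ignores directory permission bits, so run this as an ordinary user, or the 'ls -l' will simply succeed. 'command ls' skips aliases and shell functions, much like typing /bin/ls.

```shell
# perm_demo (made-up name): read permission lists names;
# search ("x") permission is needed to reach the inodes.
perm_demo() (
    exp=$(mktemp -d) || exit 1
    echo data > "$exp/plain"
    ln "$exp/plain" "$exp/hardlink"
    ln -s plain "$exp/symlink"
    chmod a-x "$exp"

    echo "== names only (readdir, no stat needed):"
    command ls "$exp"
    echo "== long listing (must stat each inode; fails unless you are root):"
    command ls -l "$exp" 2>&1 || true

    chmod u+x "$exp"   # restore search permission so the cleanup works
    rm -rf -- "$exp"
)
```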
So, now that you (hopefully) understand what links really
are, you can understand something about the 'rm' command.
'rm' doesn't remove files. 'rm' removes links
to files. The filesystem driver then checks the link count. If that's
zero (and no process still has the file open)
then the file is actually removed.
Note the important element here: file removal happens
indirectly, as part of the filesystem's maintenance. The
'rm' and similar commands just call "unlink()" (the system
call).
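A small sketch of 'rm' merely unlinking one name. It assumes GNU 'stat' ('-c %h' is the link count), and 'unlink_demo' is a hypothetical name. It prints 2, then 1, then the file's contents, which survive because another link still points at the inode.

```shell
# unlink_demo (hypothetical name): 'rm' removes one name, not the data.
unlink_demo() (
    dir=$(mktemp -d) || exit 1
    echo "still here" > "$dir/first"
    ln "$dir/first" "$dir/second"
    stat -c %h "$dir/first"    # 2 links
    rm "$dir/first"            # unlink() one of the names
    stat -c %h "$dir/second"   # 1 link left
    cat "$dir/second"          # the contents are untouched
    rm -rf -- "$dir"
)
```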
There was also an extra clause I snuck in. If I open a
file (with an editor, for example) and then I use 'rm'
to remove that file, what happens? (Let's assume that there
was only one hard link to the file.)
Nothing spectacular. The link count is zero but the file
is open. The filesystem maintenance routines leave the
inode and the data blocks of the file alone so long as the
file is open. As soon as the file is closed, these routines
will detect the zero link count and then remove the file.
If a dozen processes have the file open, then all of
them must close it before the file is truly removed.
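The shell can play the part of the editor that keeps the file open. In this sketch a descriptor held with 'exec' stands in for the open file, and 'open_unlinked_demo' is an illustrative name. After 'rm' the name is gone, but the data is still readable through the open descriptor until it is closed.

```shell
# open_unlinked_demo (illustrative name): an open file outlives its last name.
open_unlinked_demo() (
    dir=$(mktemp -d) || exit 1
    echo "open but nameless" > "$dir/file"
    exec 3< "$dir/file"    # hold the file open on descriptor 3
    rm "$dir/file"         # link count drops to zero...
    [ -e "$dir/file" ] || echo "name is gone"
    cat <&3                # ...but the data is still readable through fd 3
    exec 3<&-              # closing the last descriptor lets the fs free it
    rm -rf -- "$dir"
)
```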
Removal actually involves a few steps. All of the
data blocks that are allocated to the file are reassigned to
the "free list." You can think of the free list as a "special
file" that "owns" all of the free space on the disk. The
actual implementation is different for different filesystems.
Then the inode is marked as deleted, or it's zeroed out
(filesystem and version specific).
Now, back to your original question:
A more precise way to find all of the "symlinks" in a
directory is to use the 'find' command. Try the command:
find / -maxdepth 1 -type l -print
... (GNU 'find' defaults to "-print" so you can leave that
off under Linux).
The "-maxdepth 1" part prevents 'find' from traversing
down the whole file tree. GNU 'find' wants it placed before
tests such as -type, and prints a warning if it isn't.
(Note: I tend to use "file tree"
or "file hierarchy" to refer to all the files *and all the
mounted filesystems* below a point, and "filesystem" to
refer to all of the files on a single mounted fs. This is a
subtle point of confusion.)
Now, if the question was "find all of the regular files with
a link count greater than 1" you'd use:
find ... -maxdepth 1 -type f -links +1
... where the ellipsis is a list of one or more directories
and/or filenames and the other parameters test for the
various conditions that I described (and prevent traversal
down the tree, of course). In GNU find many of the
numeric conditions can be specified as "+x", "x" or
"-x", where +x means "more than x", -x means "less than x"
and plain x means "exactly x." That's a subtlety of the
'find' command.
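Here is a throwaway demonstration of '-links +1'. It is a sketch that assumes GNU 'find' for '-printf' ('%f' is the bare file name). 'demo_links_find' is an illustrative name, and the files live in a 'mktemp -d' scratch directory. Only the two names sharing an inode are printed, so the output is alias and shared, while single is skipped.

```shell
# demo_links_find (illustrative name): regular files with more than one hard link.
demo_links_find() (
    dir=$(mktemp -d) || exit 1
    touch "$dir/single" "$dir/shared"
    ln "$dir/shared" "$dir/alias"   # "shared" and "alias" now have 2 links each
    # -maxdepth is an option, so GNU find wants it before tests like -type.
    find "$dir" -maxdepth 1 -type f -links +1 -printf '%f\n' | LC_ALL=C sort
    rm -rf -- "$dir"
)
```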
A last interpretation of this question that I can imagine
is: find all of the links to a given file (inode). To do
this you start with the inode. If it is not a directory (*)
and it has a link count of more than one, then search the
whole filesystem for any other link that has a matching
inode number. This is a non-trivial question for a first-term
Unix student. It entails writing a script in a few parts.
* (We don't have to search for the additional
hard links to directories, because they should
all be in ./*/.. --- that is, they are all . or
.. entries in the current directory and the ones
just below us. If you were to use some custom
code to force the creation of some other
hard link to a directory, fsck would probably
have fits about the anomaly in the directory
structure. Some versions of Unix have
historically allowed root (superuser) to
create hard links to directories, but the
GNU utilities under Linux won't allow it,
so you'd have to write your own code or
you'd have to directly modify the fs with a
hex editor.)
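It's easy to confirm that refusal. GNU 'ln' rejects a directory as the target of a hard link, and the Linux kernel's link() system call returns EPERM for directories even when root asks. This is a minimal sketch, and 'dir_link_demo' is a made-up name.

```shell
# dir_link_demo (made-up name): try to hard-link a directory.
dir_link_demo() (
    dir=$(mktemp -d) || exit 1
    mkdir "$dir/d"
    if ln "$dir/d" "$dir/d2" 2>/dev/null; then
        echo "hard link created (unexpected on Linux)"
    else
        echo "refused"
    fi
    rm -rf -- "$dir"
)
```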
I'll just walk through one example to get us warmed up:
In my root directory example above I saw that System.map
had a link count of 2. It's a regular file. So I want
to find the other link to it.
First I find the inode.
'ls -ail /' gives us:
2 drwxr-xr-x 14 root root 1024 Sep 27 17:19 .
2 drwxr-xr-x 14 root root 1024 Sep 27 17:19 ..
13 -rw-r--r-- 2 root root 219254 Sep 27 17:19 System.map
4019 drwxr-xr-x 2 root root 2048 Sep 12 03:25 bin
56241 drwxr-xr-x 2 root root 1024 Sep 27 17:20 boot
14 lrwxrwxrwx 1 root root 13 Aug 31 06:40 var
(etc).
... the numbers in the first field here are the inode numbers,
identifying the filesystem data structures to which these links
point. We note that the '.' and '..' (current and parent
directories) both point to the same inode *for the root
directory*. (For any other directory this would not be the
case.)
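A quick way to see that for any directory is a sketch assuming GNU 'stat' ('%i' is the inode number). 'show_dot_inodes' is a hypothetical name. For / the two numbers match, and for any other directory they differ.

```shell
# show_dot_inodes (hypothetical name): print the inode numbers of DIR/. and DIR/..
show_dot_inodes() {
    # ${1%/} strips one trailing slash, so "/" becomes "/." and "/.."
    stat -c '%i %n' -- "${1%/}/." "${1%/}/.."
}
```

Try 'show_dot_inodes /' and then 'show_dot_inodes /usr'.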
... so I want to find all links on this filesystem (*)
which point to inode number 13.
- (not on any other filesystem that's mounted
--- they each have their own inode number "13")
So, here's the command to do that:
find / -mount -inum 13
... whoa! That was easy. The "-mount" option tells the
find command not to traverse across any mount points (it's
the same as the -xdev option).
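The same idea can be wrapped up as a small function. This sketch assumes GNU 'stat' ('-c %i' prints the inode number), and 'links_to' is a hypothetical name. The second argument should be the root of the file's filesystem. Inode numbers are only unique within one filesystem, so starting anywhere else either narrows the search or, if the start is on a different filesystem, can produce false matches.

```shell
# links_to (hypothetical name): every name under $2, on the same
# filesystem, that points at the inode of file $1.
links_to() {
    find "$2" -xdev -inum "$(stat -c %i -- "$1")" 2>/dev/null
}
```

Run 'links_to /System.map /' to reproduce the search above.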
To do this for each of the items in a directory, the hard
part is finding the root of the filesystem on which each
file resides. In my example this was deceptively easy
because the link I was looking at was in the root directory
(which obviously is at the root of its filesystem).
If I had a script or program that would "find the root
of the filesystem on which a given file resided" (let's call
it "fsrootof"), then I could write the rest of this
script:
find ... -type f -links +1 -printf "%i %p\n" |
while read -r i f; do
    find "$(fsrootof "$f")" -mount -inum "$i"
done
... this is a bit of shell script code that uses 'find' to
generate a list of the inode numbers and names/paths (the -printf
option to the first 'find') of "regular files" with link
counts greater than 1. That list is fed into a simple shell
loop (a mill) that reads each line as an "inode" and a
"path" (later referred to as $i and $f respectively). The
body of that loop calls my mythical script or program to
find the "root of the filesystem of the file" and uses
that as the search point for the second find command.
Just off hand I can't think of a way to implement this
'fsrootof' command using simple shell scripting. It
would probably best be done as a C program or a Perl script
(making direct use of some system calls to stat the file
and some other trick to traverse upwards, following
.. links, until we cross a mountpoint). I'd have to dig
up the sources to the 'find' command to see how they do that.
So, maybe I'll leave that as the "Linux Gazette Reader
Challenge" (implement 'fsrootof' as described above).
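For what it's worth, here is one possible answer to the challenge. It is a shell sketch that leans on GNU 'stat' ('-c %d' prints the device number) to do the stat()ing and walks upward through '..' until the device number changes. That is the same st_dev comparison 'find -mount'/'-xdev' relies on, so a bind mount of the same filesystem is not treated as a boundary. This is an assumption-laden sketch, not a tested utility.

```shell
# fsrootof: print the top directory (mount point) of the filesystem
# that holds $1, by climbing ".." until the device number changes.
fsrootof() (
    dir=$1
    [ -d "$dir" ] || dir=$(dirname -- "$dir")   # files: start from their directory
    # Resolve symlinks and ".." so we climb the physical path.
    dir=$(cd -P -- "$dir" && pwd -P) || exit 1
    dev=$(stat -c %d -- "$dir") || exit 1
    while [ "$dir" != / ]; do
        parent=$(dirname -- "$dir")
        # A parent on a different device means $dir is the mount point.
        [ "$(stat -c %d -- "$parent")" = "$dev" ] || break
        dir=$parent
    done
    printf '%s\n' "$dir"
)
```

With something like this defined, the 'while read' loop above can call 'fsrootof' for each path it reads.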
Copyright © 1998, James T. Dennis
Published in The Linux Gazette Issue 35 December 1998