- "Well, well - what have we here?" Woomert's fingers flew over the keyboard as he fired off the one-liner. After about a second, he smiled but kept watching the screen - which, after another second or two, printed a list of filenames.

- "There you are, Willard - a list of unique names. I'm glad your system had the module that I needed - it's a common one, but I wasn't certain. Copy those off to another directory, delete all the others, and copy them back, and you're all done. You could even automate the process by writing..." A mischievous grin flashed over Woomert's face as he paused for a second. "...a program. Well, a one-line shell script, anyway."

- "That... that's it???" Willard stared in hope and disbelief at the screen where the short list of files beckoned for action. He quickly created a subdirectory in "/tmp", copied the files by carefully using "cp" and backticks around Woomert's script, and scanned them by using "less". When he turned toward Woomert a few seconds later, his face was shining with joy.

- "Mr. Foonly... you've saved me. I promise I'll be far more careful from now on, and I'll talk to our administrator about setting up a - what did you call it, a ``chroot jail''? - anyway, I'm really grateful. How can I ever repay you?"

- "Well, you could bring me large loads of gold and jewels..." Woomert stopped and laughed at the look of dismay on the young man's face, "just kidding. I have a suggestion for you, though, that you might put some thought into. You seem to have some aptitude for programming - I was just looking at your "randfile.c", and except for the obvious errors, you were doing pretty well. I'd suggest you take a few programming courses at the local vocational school as a start - when you're just starting out, it's difficult to get anywhere, particularly in languages like C and C++ where there are many, many traps and pitfalls for the unwary.
They work well for their specific purposes, mind you - but you should have some formal training to understand the background of what you're doing, or you end up with a mess."

- "A vocational school." Willard seemed struck by the idea. "Say, I never thought of that; I just knew that college was too expensive for me right now, and I wanted to learn somehow. Great idea, Mr. Foonly; I'll run down there and find out what it takes as soon as possible! I'll even put practicing C aside for now, until I do learn some of the background... what about the stuff that you were using? I'd heard of PERL before."

- "Well, it's not called ``PERL'', since it's not an abbreviation - although some people have come up with back-formations for what it stands for [1]. It's ``Perl'' if you're talking about the language, and ``perl'' for the executable name. Yes, I think that learning Perl would be a very good idea, especially if you're going to back it up with a later study of C; you'll find that it's easy to learn and keep learning, allows you to become competent quickly, and avoids many of the problems of the older languages that have you dealing with abstruse issues like memory management and bad pointers. I'd suggest picking up a good book - be careful, there are many poorly-written books on Perl, but I can definitely recommend ``Learning Perl'' by Randal Schwartz and Tom Phoenix - and studying it. An evening or two of that, and you'll be able to get in trouble even more efficiently than you did with your C program." Woomert grinned at the somewhat woebegone-looking Willard, who finally grinned back.

- "Well, I've actually read up on it a little bit before, but I'd read all kinds of things on the Net about Perl being hard to read, or hard to understand, so I was a little reluctant to study it. Actually," Willard looked abashed, "after seeing your code, I know what they mean. Is it always that complicated?"

- "Not at all.
I use these one-liners because I understand Perl well, and because they're not code that I'm leaving for someone else to use. In fact, if you're interested, I can explain what I did and show how it would look in a script."

- "Mr. Foonly, I'd be fascinated. After all, I'm going to be learning this stuff - what better way to start than by hearing you explain it?"

Smiling, Woomert extracted his cell phone from the quick-release waterproof stainless steel holder that he'd recently invented. "Hold on while I get Frink. He'd like to see this too, I'm sure. Hello, Frink? Got a case here... actually, it's solved already, but you might want to see the method. Ten minutes? See you then." He returned the phone to its holster. "We'll just have some of this excellent brew that I've made up until he gets here. It's a pure, fine-pluck, high-altitude rolled Nepalese tea that's got a wonderful smoky flavor. A cup for you?..."

A bit later, Frink showed up, looking like he'd torn himself away from some project or another. He also looked disappointed, but Woomert immediately forestalled him.

- "Frink, I know that you strongly prefer to participate in my cases; I do also, since you're now going to be my partner. However, there are times when a case just sneaks up on you and turns into a knotty problem before you can blink, and you have to get things tied up before it loops and replicates itself into some huge number of variables." Both of them glanced over at Willard who was by now unsuccessfully trying to choke down his laughter. "Willard, for example, understands precisely what I mean. Anyway, be assured that I would not have left you out if there was not a time element involved; as it turned out, I was able to solve the problem quickly, but there was always the chance that we'd need every available second. Let me tell you about it and judge for yourself."

A few moments sufficed to explain what had come before, and Frink nodded and smiled at Woomert.

- "Thanks, Woomert.
I was feeling left out, and I appreciate your explaining that. Good communications between partners are important, aren't they? That's a lesson all its own." The two of them grinned at each other before turning to the computer.

- "Go ahead, Frink. Can you break this one out for Willard? I'll be right here, so if you get stuck, I'll keep it going."

- "All right, then. Let's see." Frink stared at the code on the screen, forehead furrowed in concentration.
- "All right. ``-MDigest::MD5=md5'' is pretty easy: you're loading the ``Digest::MD5'' module and importing the ``md5'' function from it, just as we've talked about before. ``-we'', we know about - enable warnings and execute what follows as a script. ``-0'', now... ah, I remember - a number after it is the octal code of the end-of-line definition for the files we're reading in, and with no number at all it's a null. Oh, I get it! You're effectively disabling the EOL, thus ``slurping'' entire files, one at a time. Right?" Woomert silently applauded; Frink grinned and turned back to the screen before him.

- "Next. You copy @ARGV right at the start - this saves the list of file names so you can re-use them, since @ARGV is going to change as we read in the files. Furthermore, you didn't have to use a BEGIN block to do this since we're not looping the entire script, as we would be with a ``-n'' or a ``-p'' switch. Next... uh, next it gets pretty tricky. I'll admit that you've just lost me, although I can explain what you did further on: you copied the values in the %h hash to an array so you could use Perl's ``pretty print'' mechanism: an array in double-quotes is printed with spaces between the elements, which was what you wanted. The ``\n'' at the end also deserves a comment: normally, you'd use the ``-l'' switch on the command line which would append the EOL to every line that was printed, but you'd redefined EOL as a null, so that wouldn't help - so you had to use the ``\n''. How's that?"

- "Well done, partner. Now, here's the rest of the story - are you following this, Willard? Speak up if you don't understand something. While Frink is ``chanting his beads'', so to speak, and learning in the process, you're our reviewer for this run: if it's not being clearly explained, we'd like to hear from you." Willard cleared his throat.

- "Well - actually, I understand it all so far.
I'm guessing that a ``module'' is like a C library, and ``Digest::MD5'' probably has to do with, well, generating MD5 sums - I've heard of this but am not really sure of what that means. Other than that, yes, I think I've got it." Frink spoke up.

- "An MD5 digest, or sum (sometimes also called a hash), is used as an effectively unique ID for strings, most commonly file contents. If you get a file and its MD5 hash, you can check it using commonly available tools to make sure that the file hasn't changed in any way by generating a new sum from the file and comparing it with the one you've received. In fact, here's a useful little utility that I use to do exactly that, instead of having to visually compare them:
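Frink's listing is not reproduced in this text. A minimal sketch of such a checker, assuming it takes a file name and the expected hex sum on the command line, might look like the following; the script name ``md5check'' and the sub names ``file_md5_hex'' and ``md5_matches'' are invented for illustration and are not from the article.

```perl
#!/usr/bin/perl
# md5check (hypothetical name): compare a file's MD5 sum with an expected one
use strict;
use warnings;
use Digest::MD5;

# Return the hex MD5 digest of a file's contents.
sub file_md5_hex {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!\n";
    binmode $fh;                       # digest the raw bytes of the file
    my $sum = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $sum;
}

# True if the file's digest equals the expected sum (case is ignored).
sub md5_matches {
    my ($path, $expected) = @_;
    return lc( file_md5_hex($path) ) eq lc($expected);
}

# Assumed usage: md5check <file> <expected_md5>
if (@ARGV == 2) {
    my ($file, $expected) = @ARGV;
    print md5_matches($file, $expected) ? "OK: $file\n" : "MISMATCH: $file\n";
}
```

Letting the program do the string comparison avoids eyeballing two 32-character hex strings.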
Makes it a little easier, I think. Anyway, back to Woomert's explanation... I'd like to see how he pulled off this particular trick." Woomert smiled at his partner.

- "Obviously, you're talking about the ``@h{map{md5($_)}<>}=@a'' bit, right? Yeah, that one is a little complex if you're not used to it. What I did there is use a hash slice to populate %h - it's a neat little idiom to keep in mind. If you think about how a hash is structured:

    key1 => value1
    key2 => value2
    key3 => value3
    key4 => value4
    key5 => value5

...you'll see that it's an array of keys which point to an array of values. Consequently, we can treat it as such; as an example, we can create a hash of the alphabet and letters' numerical positions by saying

    @alpha{ 1 .. 26 } = "a" .. "z";    # The range operator, '..' generates the two lists
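To make the slice idiom concrete, here is a minimal sketch that builds the same alphabet hash and then reads several values back out with a slice. The sub name ``alphabet_by_position'' is made up for this example.

```perl
use strict;
use warnings;

# Build a hash from two parallel lists with a hash slice.
# Keys are the positions 1..26, values are the letters, as in the text.
sub alphabet_by_position {
    my %alpha;
    @alpha{ 1 .. 26 } = 'a' .. 'z';
    return %alpha;
}

my %alpha = alphabet_by_position();
print "$alpha{1} $alpha{26}\n";        # prints "a z"

# A slice can also read several values at once, in the order the keys are given.
my @word = @alpha{ 16, 5, 18, 12 };
print "@word\n";                       # prints "p e r l"
```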
The ``@'' sigil before the hash name simply indicates the context
of what is going on; what tells us about the type of variable we're using
are the curly braces following the variable name - that indicates a hash.
If we saw square brackets, we'd know we were dealing with an array slice
instead.

Still, that doesn't explain everything - so here's the rest of it. Since we're reading in the file contents one large slurp at a time, meaning that we get one entire file's worth when we read the special ``<>'' filehandle, I simply used the map function to do an implicit loop over it - and run the ``md5()'' routine over each of those chunks of text. I would have had to do something very different if these weren't text files - a file that contained a null would have thrown off the count - but they were. My safety margin was in the fact that the ``-w'' switch would warn me if I had an unbalanced hash - which would happen if there was a null anywhere in there. So, I created a hash of keys which were MD5 digests of the file contents, and assigned the array of file names that I'd created earlier as the values. It's important to note that hashes do not store the key-value pairs in the order that they're assigned... but it wasn't a factor here, since we were really dealing with arrays which are stored in order. Now, Frink, I'll leave this one thing to you. Why did this produce a list of unique file names?" Frink laughed.

- "Thanks, Woomert. I actually do know this one. Since a hash's keys are unique - values don't have to be, but keys do - every time that you added a key/value pair where the key already existed in the hash, the old value for that key simply got overwritten. Voila - a unique list. In fact, I can now break all this out in a script... mmm, I'll have to change a few things, since the way you did it is implicit in that hash slice mechanism:

After a moment or two, Willard suddenly spoke up.

- "Say, I think I understand this stuff. Why, that doesn't look complicated at all! I'm not sure about the ``$_'' and the ``$/'' variables, but I'd think I can find out about those - Perl does have good documentation, right?" Frink and Woomert both laughed, and Frink fielded the question.

- "The best.
In fact, it all comes with Perl - and is augmented with every module you install. It's all available via the ``perldoc'' program; start by reading ``perldoc perldoc'', and you'll never find yourself at a loss for information about Perl."

Somewhat later, after the very grateful Willard had headed for home and (finally) a night of sleep, Frink and Woomert were relaxing with a rare recording of Burundi Ubuhuba nose-singing that was accompanied by a thumb-piano and zither. As usual, the food accompanying the music was tasty and highly appropriate: dinner consisted of curried ingelegde vis (a spicy fish recipe that Woomert had learned at Cape Malay) and futari (squash and yams) on the side, with East African samosa bread and spicy piri-piri sauce for the adventurous. Pickled African peaches wrapped up the menu.

Suddenly, there was a loud jangling noise from the outside, followed by cursing that would blister cheap paint (Woomert had providentially done the house and the out-buildings in a top-grade epoxy, so they weren't affected), and by police sirens shortly thereafter.

- "Ah." Woomert casually leaned back in his chair, nibbling on one last tasty peach. "That would be the Zigamorphs. Back to prison they go for violating their probation; they had been explicitly told to stay out of my neighborhood."

- "What... happened, Woomert? It sounded pretty bad."

- "I knew they'd come calling soon, and had set a trap for them. Just a very basic numerical complement program which would throw a steel-cage exception when it detected a null [2]. One of these days, Frink, the criminals will become intelligent - mark my words, it's a simple matter of selection pressure. Until then, we can all sleep safe in our beds..."
[2] A zigamorph, according to the Jargon File, is a hex 'FF' character (11111111). A numerical complement of this would, of course, be all zeros - a null.
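The script that Frink types out for Willard is not reproduced in this text. As a rough sketch only, and not Frink's actual code, the same hash-slice trick could look like this in script form. Two things here are assumptions rather than anything from the story: the sub name ``unique_by_content'', and the choice to undefine ``$/'' for a true slurp instead of using the null separator that ``-0'' provides, which lets the sketch cope with files that contain nulls.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Return one file name for each distinct file content among @files.
sub unique_by_content {
    my @files = @_;

    # One MD5 digest per file, in the same order as @files.
    my @digests = map {
        my $file = $_;
        open my $fh, '<', $file or die "Can't open $file: $!\n";
        binmode $fh;
        my $content = do { local $/; <$fh> };   # undefined $/ reads the whole file
        close $fh;
        md5( defined $content ? $content : '' );
    } @files;

    my %h;
    @h{@digests} = @files;      # hash slice: a repeated digest overwrites the earlier name
    return sort values %h;      # hashes are unordered, so sort for stable output
}

# Assumed usage: pass the candidate files as arguments.
print join(' ', unique_by_content(@ARGV)), "\n" if @ARGV;
```

When two files have identical contents, the name that appears later in the argument list is the one that survives, since each later assignment to the same key replaces the earlier value.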
Ben is a Contributing Editor for Linux Gazette and a member of The Answer Gang.

Ben was born in Moscow, Russia in 1962. He became interested in
electricity at age six--promptly demonstrating it by sticking a fork into
a socket and starting a fire--and has been falling down technological mineshafts
ever since. He has been working with computers since the Elder Days, when
they had to be built by soldering parts onto printed circuit boards and
programs had to fit into 4k of memory. He would gladly pay good money to any
psychologist who can cure him of the resulting nightmares.

Ben's subsequent experiences include creating software in nearly a dozen
languages, network and database maintenance during the approach of a hurricane,
and writing articles for publications ranging from sailing magazines to
technological journals. Having recently completed a seven-year
Atlantic/Caribbean cruise under sail, he is currently docked in Baltimore, MD,
where he works as a technical instructor for Sun Microsystems.

Ben has been working with Linux since 1997, and credits it with his complete
loss of interest in waging nuclear warfare on parts of the Pacific Northwest.

Published in Issue 91 of Linux Gazette, June 2003