We are in bvmlinux now!
With the help of misc.c:decompress_kernel(),
we are going to decompress piggy.o
to get the resident kernel image linux/vmlinux.
This file is of pure 32-bit startup code.
Unlike previous two files, it has no ".code16" statement in the source file.
Refer to
Using as: Writing 16-bit Code for details.
5.1. Decompress Kernel
The segment base addresses in segment descriptors (which correspond to
segment selector __KERNEL_CS and __KERNEL_DS) are equal to 0;
therefore, the logical address offset (in segment:offset format) will
be equal to its linear address if either of these segment selectors
is used.
For zImage, CS:EIP is at logical address 10:1000
(linear address 0x1000) now;
for bzImage, 10:100000 (linear address 0x100000).
As paging is not enabled, linear address is identical to physical address.
Check IA-32 Manual (Vol.1. Ch.3.3. Memory Organization, and
Vol.3. Ch.3. Protected-Mode Memory Management) and
Linux Device Drivers: Memory Management in Linux for address issue.
It comes from setup.S that BX=0 and
ESI=INITSEG<<4.
.text
///////////////////////////////////////////////////////////////////////////////
startup_32()
{
cld;
cli;
DS = ES = FS = GS = __KERNEL_DS;
SS:ESP = *stack_start; // end of user_stack[], defined in misc.c
// all segment registers are reloaded after protected mode is enabled
// check that A20 really IS enabled
EAX = 0;
do {
1: DS:[0] = ++EAX;
} while (DS:[0x100000]==EAX);
EFLAGS = 0;
clear BSS; // from _edata to _end
struct moveparams mp; // subl $16,%esp
if (!decompress_kernel(&mp, ESI)) { // return value in AX
restore ESI from stack;
EBX = 0;
goto __KERNEL_CS:100000;
// see linux/arch/i386/kernel/head.S:startup_32
}
/*
* We come here, if we were loaded high.
* We need to move the move-in-place routine down to 0x1000
* and then start it with the buffer addresses in registers,
* which we got from the stack.
*/
3: move move_rountine_start..move_routine_end to 0x1000;
// move_routine_start & move_routine_end are defined below
// prepare move_routine_start() parameters
EBX = real mode pointer; // ESI value passed from setup.S
ESI = mp.low_buffer_start;
ECX = mp.lcount;
EDX = mp.high_buffer_star;
EAX = mp.hcount;
EDI = 0x100000;
cli; // make sure we don't get interrupted
goto __KERNEL_CS:1000; // move_routine_start();
}
/* Routine (template) for moving the decompressed kernel in place,
* if we were high loaded. This _must_ PIC-code ! */
///////////////////////////////////////////////////////////////////////////////
move_routine_start()
{
move mp.low_buffer_start to 0x100000, mp.lcount bytes,
in two steps: (lcount >> 2) words + (lcount & 3) bytes;
move/append mp.high_buffer_start, ((mp.hcount + 3) >> 2) words
// 1 word == 4 bytes, as I mean 32-bit code/data.
ESI = EBX; // real mode pointer, as that from setup.S
EBX = 0;
goto __KERNEL_CS:100000;
// see linux/arch/i386/kernel/head.S:startup_32()
move_routine_end:
}
Didn't find _edata and
_end definitions?
No problem, they are defined in the "internal linker script".
Without -T (--script=) option specified, ld uses
this builtin script to link compressed/bvmlinux.
Use "ld --verbose" to display this script, or check
Appendix B. Internal Linker Script.
decompress_kernel() calls
gunzip() -> inflate(), which are defined in
linux/lib/inflate.c,
to decompress resident kernel image to
low buffer (pointed by output_data) and
high buffer (pointed by high_buffer_start, for
bzImage only).
the size of the uncompressed input data modulo 2^32
Notes: a.
ID2 value can be 158 (0x9e, \236) for gzip 0.5;
b.
XFL value 4 - compressor used fastest algorithm;
c.
FLG bit 0, FTEXT, does not indicate any "extra field".
We can use this file format knowledge to find out
the beginning of gzipped linux/vmlinux.
We can see that the gzipped file begins at 0x4c50 in the above example.
The four bytes before "1f 8b 08 00" is input_len
(0x0011011e, in little endian), and 0x4c50+0x0011011e=0x114d6e equals to
the size of bzImage
(/boot/vmlinuz-2.4.20-28.9).
static uch *inbuf; /* input buffer */
static unsigned insize = 0; /* valid bytes in inbuf */
static unsigned inptr = 0; /* index of next byte to be processed in inbuf */
///////////////////////////////////////////////////////////////////////////////
static int gunzip(void)
{
Check input buffer for {ID1, ID2, CM}, must be
{0x1f, 0x8b, 0x08} (normal case), or
{0x1f, 0x9e, 0x08} (for gzip 0.5);
Check FLG (flag byte), must not set bit 1, 5, 6 and 7;
Ignore {MTIME, XFL, OS};
Handle optional structures, which correspond to FLG bit 2, 3 and 4;
inflate(); // handle compressed blocks
Validate {CRC32, ISIZE};
}
When get_byte(), defined in
linux/arch/i386/boot/compressed/misc.c,
is called for the first time,
it calls fill_inbuf() to setup input buffer
inbuf=input_data and
insize=input_len.
Symbol input_data and
input_len are defined in
piggy.o linker script.
See Section 2.5.
5.3. inflate()
// some important definitions in misc.c
#define WSIZE 0x8000 /* Window size must be at least 32k,
* and a power of two */
static uch window[WSIZE]; /* Sliding window buffer */
static unsigned outcnt = 0; /* bytes in output buffer */
// linux/lib/inflate.c
#define wp outcnt
#define flush_output(w) (wp=(w),flush_window())
STATIC unsigned long bb; /* bit buffer */
STATIC unsigned bk; /* bits in bit buffer */
STATIC unsigned hufts; /* track memory usage */
static long free_mem_ptr = (long)&end;
///////////////////////////////////////////////////////////////////////////////
STATIC int inflate()
{
int e; /* last block flag */
int r; /* result code */
unsigned h; /* maximum struct huft's malloc'ed */
void *ptr;
wp = bb = bk = 0;
// inflate compressed blocks one by one
do {
hufts = 0;
gzip_mark() { ptr = free_mem_ptr; };
if ((r = inflate_block(&e)) != 0) {
gzip_release() { free_mem_ptr = ptr; };
return r;
}
gzip_release() { free_mem_ptr = ptr; };
if (hufts > h)
h = hufts;
} while (!e);
/* Undo too much lookahead. The next read will be byte aligned so we
* can discard unused bits in the last meaningful byte. */
while (bk >= 8) {
bk -= 8;
inptr--;
}
/* write the output window window[0..outcnt-1] to output_data,
* update output_ptr/output_data, crc and bytes_out accordingly, and
* reset outcnt to 0. */
flush_output(wp);
/* return success */
return 0;
}
free_mem_ptr is used in
misc.c:malloc() for dynamic memory allocation.
Before inflating each compressed block, gzip_mark()
saves the value of free_mem_ptr;
After inflation, gzip_release() will
restore this value.
This is how it "free()" the memory allocated in
inflate_block().
Gzip uses
Lempel-Ziv coding (LZ77) to compress files.
The compressed data format is specified in
RFC 1951.
inflate_block() will inflate compressed blocks,
which can be treated as a bit sequence.
The data structure of each compressed block is outlined below:
BFINAL (1 bit)
0 - not the last block
1 - the last block
BTYPE (2 bits)
00 - no compression
remaining bits until the byte boundary;
LEN (2 bytes);
NLEN (2 bytes, the one's complement of LEN);
data (LEN bytes);
01 - compressed with fixed Huffman codes
{
literal (7-9 bits, represent code 0..287, excluding 256);
// See RFC 1951, table in Paragraph 3.2.6.
length (0-5 bits if literal > 256, represent length 3..258);
// See RFC 1951, 1st alphabet table in Paragraph 3.2.5.
data (of literal bytes if literal < 256);
distance (5 plus 0-13 extra bits if literal == 257..285, represent
distance 1..32768);
/* See RFC 1951, 2nd alphabet table in Paragraph 3.2.5,
* but statement in Paragraph 3.2.6. */
/* Move backward "distance" bytes in the output stream,
* and copy "length" bytes */
}* // can be of multiple instances
literal (7 bits, all 0, literal == 256, means end of block);
10 - compressed with dynamic Huffman codes
HLIT (5 bits, # of Literal/Length codes - 257, 257-286);
HDIST (5 bits, # of Distance codes - 1, 1-32);
HCLEN (4 bits, # of Code Length codes - 4, 4 - 19);
Code Length sequence ((HCLEN+4)*3 bits)
/* The following two alphabet tables will be decoded using
* the Huffman decoding table which is generated from
* the preceeding Code Length sequence. */
Literal/Length alphabet (HLIT+257 codes)
Distance alphabet (HDIST+1 codes)
// Decoding tables will be built from these alphpabet tables.
/* The following is similar to that of fixed Huffman codes portion,
* except that they use different decoding tables. */
{
literal/length
(variable length, depending on Literal/Length alphabet);
data (of literal bytes if literal < 256);
distance (variable length if literal == 257..285, depending on
Distance alphabet);
}* // can be of multiple instances
literal (literal value 256, which means end of block);
11 - reserved (error)
Note that data elements are packed into bytes starting from
Least-Significant Bit (LSB) to Most-Significant Bit (MSB), while
Huffman codes are packed starting with MSB.
Also note that literal value 286-287 and
distance codes 30-31 will never actually occur.
With the above data structure in mind and RFC 1951 by hand,
it is not too hard to understand inflate_block().
Refer to related paragraphs in RFC 1951 for Huffman coding and
alphabet table generation.
For more details, refer to linux/lib/inflate.c,
gzip source code (many in-line comments) and related reference materials.