Tuesday 29 May 2007

Smart tricks, part 1: A bootable message

Some time ago I was wondering how a boot loader works: why it gets booted and what it does. Well, I did some research, and what I found out is awesome. A boot loader works in an unrestricted environment; unlike a program running under Linux or Windows, it can access all BIOS features. Unrestricted. Of course, writing boot loader programs is much harder, because such a bare environment doesn't provide any of the libraries we know, not even something as basic as libc. Nevertheless, I wanted to try writing a small program that runs at the same level as an OS.

My idea was to create a trick program. The trick was to write (or rather print) a message on the screen and then just stop, leaving the message on. I googled a lot for help, and I also read a lot of code from the Linux kernel's x86 boot code and from Lilo. What I found out is that such a program runs in an uninitialized environment: its registers contain who knows what, not some reasonable RAM addresses. The only properly set register is the instruction pointer, %cs:%ip (not %eip, because such programs start in 16-bit real mode)! I really tried for a long time to write some smart assembler program, but it just wouldn't work, although I already knew how to print characters. And you can't even debug your program at this stage, for obvious reasons. Now I know why it failed. I'll explain later how I got around the problem; first I owe you an explanation of why it didn't work. When I started reading "Understanding the Linux Kernel" by Daniel P. Bovet and Marco Cesati, I learned that the memory addresses you work with at any level are not the addresses the processor actually accesses. They are further translated into physical addresses according to information you provide in special registers; in 16-bit real mode these are the segment registers (%cs, %ds, %es and %ss). That's why it didn't work: I never initialized those registers, so random data was read.
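In 16-bit real mode, the translation is simple arithmetic. The segment register value is multiplied by 16 and the offset is added. Here is a minimal Python sketch of that rule. The function name is my own, and the 1 MiB wrap-around assumes 8086 behaviour (A20 line off). It shows why the same offset reads different memory depending on what happens to be in the segment register.

```python
def real_mode_address(segment, offset):
    """Physical address that segment:offset refers to in 16-bit real mode."""
    # The segment is shifted left by 4 bits (times 16) and the offset added.
    # The 8086 had 20 address lines, so the result wraps around at 1 MiB.
    return ((segment << 4) + offset) & 0xFFFFF


# The BIOS loads a boot sector at physical 0x7C00; both pairs name that spot.
print(hex(real_mode_address(0x0000, 0x7C00)))  # 0x7c00
print(hex(real_mode_address(0x07C0, 0x0000)))  # 0x7c00
# With an uninitialized segment register, the same offset lands elsewhere.
print(hex(real_mode_address(0x1234, 0x7C00)))  # 0x19f40
```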

Now, the problem was that I didn't know exactly what values to initialize my registers with. I still don't, but after reading a few chapters of that book I would probably be better able to try some combinations. So the solution I came up with was not to read data from memory and then process it, but to hard-code all the data into the executable as immediate values (because, as mentioned, the instruction pointer was the only register that worked right without much effort). Of course, that meant a lot of work, and any change to the message I wanted to display would be almost impossibly hard to make by hand. But this is why I labeled this post "python": I decided to automate the process a bit by writing a Python script that generates an assembler file for the given text. Here it is, "bootgen.py":
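#!/usr/bin/env python
# bootgen.py: turn a text file into a GNU as (AT&T syntax) source file that
# prints the text with BIOS teletype output and then hangs.
from sys import argv, exit

def printc(c):
    # %ah = 0x0e selects INT 0x10 "teletype output"; %al holds the character.
    print(" movw $0x%x,%%ax" % (0x0e00 + c))
    # %bh = 0 is the display page, %bl = 7 the colour (used in graphics modes only).
    print(" movw $0x7,%bx")
    print(" int $0x10")

if len(argv) < 2:
    exit("usage: %s message.txt" % argv[0])

# Read raw bytes so that every character is a single byte value (0-255).
with open(argv[1], "rb") as f:
    data = bytearray(f.read())

print(".code16")
print(".text")
print(".global _start")
print("_start:")
print(" sti")
print(" cld")
print(" movw $0x1202,%ax")  # INT 0x10, %ah=0x12, %bl=0x30: 400 scan lines for VGA text modes (80x25)
print(" movb $0x30,%bl")
print(" int $0x10")

for c in data:
    if c == 10:  # '\n'
        # The BIOS needs CR (13) and then LF (10) to start a new line.
        printc(13)
        printc(10)
    else:
        printc(c)

print("loop:")
print(" jmp loop")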
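Here I should explain a bit. I'll skip explaining the Python syntax; I assume some familiarity with it. As you can see, this script outputs an assembler file that is ready to be assembled and run!

What does the printc function do? It emits the code that prints one character on the screen. Interrupt 0x10 is the BIOS video services interrupt, and the value in %ah selects which service you want. Why do I add 0x0e00 to the character value? That puts 0x0e into %ah, which selects "teletype output": the BIOS prints the character in %al at the cursor and moves the cursor on. %bh holds the display page (0) and %bl a colour, but teletype output uses that colour only in graphics modes. In text mode, colours need another service, such as %ah=0x09 (write character and attribute).

Next come the header lines. I'll skip a detailed explanation; I copied them from Lilo and Linux myself, so just believe me that they should be there. In short, .code16 makes the assembler produce 16-bit code and _start is the entry point. The int $0x10 call with %ax=0x1202 and %bl=0x30 selects 400 scan lines for VGA text modes, the usual setting for 80x25 text; the BIOS applies it at the next mode set.

Now, the loop is the most important part of the script. It reads the given input file character by character and outputs the equivalent assembler code to print each one on the screen. The BIOS needs \r\n to start a new line, while Unix text files use just \n, so the script emits a carriage return (13) before each line feed (10). The last two lines are just an endless loop that keeps the message on the screen.

To create the assembler source, prepare your text message and run python bootgen.py text.txt > boot.s. Then assemble it. As it's AT&T syntax, you need the GNU assembler: as -o boot.o boot.s will do. The linking command is a bit more complicated: ld boot.o -o boot.bin --oformat binary -Ttext 1000. --oformat binary writes raw machine code without any ELF headers. -Ttext 1000 links the code for address 0x1000 (ld reads the value as hexadecimal). Since the generated code uses only immediate values and a relative jump, it works wherever the BIOS actually loads it.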
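To make the "add 0x0e00" trick concrete, here is a small Python sketch of what that sum puts into each half of %ax. The helper name is hypothetical, not part of bootgen.py. It also shows why a character code must fit in one byte: anything larger would spill into %ah and select a different BIOS service.

```python
def teletype_registers(ch):
    """Return (ah, al) for the value bootgen.py loads into %ax for ch."""
    code = ord(ch)
    # %al is only 8 bits wide; a larger code would change %ah, i.e. the
    # BIOS service number, instead of the character.
    if code > 0xFF:
        raise ValueError("not a single-byte character: %r" % ch)
    ax = 0x0E00 + code
    return ax >> 8, ax & 0xFF


for ch in "Hi\r\n":
    ah, al = teletype_registers(ch)
    print("ax=0x%04x  ah=0x%02x  al=0x%02x" % ((ah << 8) | al, ah, al))
```

The first line printed is ax=0x0e48  ah=0x0e  al=0x48, for "H". Every character keeps 0x0e in %ah, so every call asks the BIOS for teletype output.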

Now boot.bin is a ready bootable image. The only thing left to do is to put it somewhere it can be booted from. What I find funny is creating a bootable CD with this file. For one thing, it's very easy to recover from, yet not obvious enough to be lame ;). You could of course overwrite the MBR, but that would be very cruel. Besides, with my technique the final executables are quite big: every character costs 8 bytes of code. It's more efficient to set up the segment registers properly and then process a string. If you do decide to overwrite the MBR, take care not to overwrite all 512 bytes. Only the first 446 bytes are boot code. Bytes 446 to 509 hold the partition table and the last two bytes are the 0x55 0xAA boot signature, so keep your code under 446 bytes and leave the rest alone.
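Both the size problem and the "don't overwrite the partition table" rule can be sketched in Python. This is a minimal sketch under stated assumptions:

- the instruction sizes are the short 16-bit encodings I'd expect GNU as to pick for the lines bootgen.py prints;
- the sector follows the classic MBR layout;
- generated_size and splice_into_mbr are hypothetical names of my own.

It only works on bytes in memory. Writing a sector back to a real disk is deliberately left out.

```python
# Assumption: GNU as encodes these .code16 instructions in the short forms below.
HEADER_BYTES = 1 + 1 + 3 + 2 + 2  # sti, cld, movw $imm16,%ax, movb $imm8,%bl, int $0x10
PER_CHAR_BYTES = 3 + 3 + 2         # movw $imm16,%ax, movw $imm16,%bx, int $0x10
TRAILER_BYTES = 2                  # jmp loop, a short jump to itself (EB FE)
MBR_CODE_AREA = 446                # classic MBR: partition table starts at 0x1BE


def generated_size(text):
    """Bytes of code generated for text, counting each newline as CR + LF."""
    printed = sum(2 if ch == "\n" else 1 for ch in text)
    return HEADER_BYTES + printed * PER_CHAR_BYTES + TRAILER_BYTES


def splice_into_mbr(old_sector, code):
    """Return a new 512-byte sector: code in front, old partition table and
    0x55 0xAA signature (bytes 446-511) copied unchanged."""
    if len(old_sector) != 512 or old_sector[510:512] != b"\x55\xaa":
        raise ValueError("expected a 512-byte sector ending in 55 AA")
    if len(code) > MBR_CODE_AREA:
        raise ValueError("code is %d bytes, only %d fit" % (len(code), MBR_CODE_AREA))
    # Pad the code area with zero bytes, then append the untouched tail.
    return code.ljust(MBR_CODE_AREA, b"\x00") + old_sector[MBR_CODE_AREA:]


print(generated_size("Hello!\n"))                                         # 75
print((MBR_CODE_AREA - HEADER_BYTES - TRAILER_BYTES) // PER_CHAR_BYTES)  # 54
```

So, under these assumptions, a message of about 54 characters already fills the MBR code area.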

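I hope I was helpful and gave you some ideas for playing around in a different way. I have one more piece of code for you: a shell script that automates creating the CD image. It takes two arguments, the message file and a name for the output. It expects bootgen.py in its own directory and leaves name.iso there.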
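#!/bin/sh
# Usage: sh <this script> message.txt name  (builds name.iso next to the script)
set -e
if [ $# -ne 2 ]; then
    echo "usage: $0 message.txt name" >&2
    exit 1
fi
/bin/cp "$1" "$(dirname "$0")"
cd "$(dirname "$0")"
msg=$(basename "$1")
/bin/mkdir "$2"
/bin/mkdir "$2/boot.catalog"
# Generate, assemble and link the boot program.
python bootgen.py "$msg" > "$2/boot.s"
as -o "$2/boot.o" "$2/boot.s"
ld "$2/boot.o" -o "$2/boot.bin" --oformat binary -Ttext 1000
# Put the sources on the CD too, along with the command that built it.
/bin/cp "$msg" "$2/"
/bin/cp "$(basename "$0")" "$2/"
/bin/cp bootgen.py "$2/"
/bin/echo "$0 $*" > "$2/gencmd"
# Build the image, copy it into the CD tree, then build it again,
# so the final CD also carries a copy of the first image.
mkisofs -b boot.bin -c boot.catalog -no-emul-boot -o "$2.iso" "$2/"
cp "$2.iso" "$2/"
mkisofs -b boot.bin -c boot.catalog -no-emul-boot -o "$2.iso" "$2/"
/bin/rm -rf "$2/"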
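If you want to check that mkisofs really wrote a no-emulation boot entry, you can read the El Torito structures back with a few lines of Python. This sketch is based on my reading of the El Torito specification: the field offsets come from it, while the function name is my own. It needs Python 3 and reads the whole image into memory, which is fine for a tiny CD like this one.

```python
import struct
import sys

SECTOR = 2048  # CD-ROM logical sector size


def inspect_eltorito(iso):
    """Describe the default boot entry of an El Torito CD image (bytes)."""
    # The Boot Record Volume Descriptor sits in sector 17.
    brvd = iso[17 * SECTOR:18 * SECTOR]
    if brvd[0:7] != b"\x00CD001\x01" or brvd[7:30] != b"EL TORITO SPECIFICATION":
        raise ValueError("no El Torito boot record in sector 17")
    # Offset 0x47: little-endian sector number of the boot catalog.
    (catalog_lba,) = struct.unpack_from("<I", brvd, 0x47)
    catalog = iso[catalog_lba * SECTOR:catalog_lba * SECTOR + 64]
    # Validation entry: header ID 1, key bytes 55 AA, 16-bit words summing to 0.
    if catalog[0] != 1 or catalog[0x1E:0x20] != b"\x55\xaa":
        raise ValueError("bad validation entry in boot catalog")
    if sum(struct.unpack_from("<16H", catalog, 0)) & 0xFFFF != 0:
        raise ValueError("boot catalog checksum mismatch")
    # The initial/default entry follows at offset 32.
    indicator, media, segment, _system, _unused, count, rba = struct.unpack_from(
        "<BBHBBHI", catalog, 32)
    return {
        "bootable": indicator == 0x88,
        "no_emulation": media == 0,
        "load_segment": segment or 0x07C0,  # 0 means the default segment 0x07C0
        "sector_count": count,              # 512-byte virtual sectors to load
        "load_rba": rba,                    # CD sector where boot.bin starts
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(inspect_eltorito(f.read()))
```

Run it with the image name as its only argument. A load_segment of 0x07C0 means the BIOS puts boot.bin at physical address 0x7C00.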

2 comments:

LEW21 said...

You've definitely been learning English for too long.

Quote:
Some time ago I had been wandering how does a boot loader work

As if you couldn't just write "was thinking". :P

AdamW said...

Heh, maybe. All the more so because, now that I think about it, "was thinking" would probably be even more correct.