LFB Emulation and LFBemu


                           Introduction

Tired of not being able to play some games because they require
a so-called Linear Frame Buffer (LFB)? Or want to forget about
bank switching in your program that uses VESA graphics modes?
This document explains how to get an LFB working, and the
accompanying LFBemu program, which comes with full source code,
shows the technique in practice.
This information is meant for programmers. The provided program
is not a magic tool that turns on an LFB which was disabled for
some stupid reason or by mistake at the manufacturer's factory.
Don't think that it's enough to run this program and then play
your games requiring LFB. It doesn't work this way, although it
really is possible to build such a program out of all this
stuff, and that would be a desirable thing.


                    Download version 2.2 now!


                           Assumptions...

The author assumes that the reader (i.e. you) knows what
VESA/VBE stands for, what a Linear Frame Buffer (LFB) is, what
"bank switching" is, and is experienced in Protected Mode
programming for i386+ CPUs.


                            The Idea...

By means of page translation  we  can  emulate  such  things  as
Linear Frame Buffer on video  cards  which  don't  have  such  a
feature.

Suppose we have set up a VESA graphics mode 640x480x8bpp (#101h)
and our program runs after that in 32-bit Protected Mode.

Let's reserve as much linear address space as is needed to fit
the entire screen (say, at a resolution of 640x480x8bpp),
starting at (say) the linear address of 2MB.

Let's map each page of this linear region onto the standard VGA
buffer (the 64KB window at the start of the VGA region, physical
addresses 0A0000h through 0AFFFFh) so that

 1st page points to the beginning of the buffer (0A0000h)
 2nd page points to the beginning + 4KB
 3rd page points to the beginning + 8KB
 4th             ...              + 12KB
 ...
16th             ...              + 60KB

17th page points to the beginning of the buffer (0A0000h)
18th page points to the beginning + 4KB
19th page points to the beginning + 8KB
20th             ...              + 12KB
 ...
32nd             ...              + 60KB

33rd page points to the beginning of the buffer (0A0000h)
 ...
and so on

Let's mark pages 1...16 (the first 64KB block) as present and
read/write and the rest of the pages as not present.
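
The mapping above can be sketched in C. This is only an
illustration, not code taken from LFBemu: the function names and
the flat array of page-table entries are assumptions made for
the example. The bit values are the i386 "present" (bit 0) and
"read/write" (bit 1) flags of a page-table entry.

```c
#include <stdint.h>

#define PAGE_SIZE      4096u
#define BANK_SIZE      65536u                   /* one 64KB VGA window  */
#define PAGES_PER_BANK (BANK_SIZE / PAGE_SIZE)  /* 16 pages per block   */
#define VGA_WINDOW     0xA0000u
#define PTE_PRESENT    0x001u                   /* i386 PTE bit 0       */
#define PTE_WRITABLE   0x002u                   /* i386 PTE bit 1       */

/* Number of 4KB pages needed to cover a screen of the given size. */
uint32_t lfb_page_count(uint32_t width, uint32_t height,
                        uint32_t bytes_per_pixel)
{
    uint32_t bytes = width * height * bytes_per_pixel;
    return (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
}

/* Fill page-table entries for the emulated LFB (hypothetical helper).
   ptes[i] describes the (i+1)-th page of the region. Every page points
   into the 64KB VGA window; only the pages of present_bank (0-based)
   get the present bit. */
void lfb_build_ptes(uint32_t *ptes, uint32_t count, uint32_t present_bank)
{
    for (uint32_t i = 0; i < count; i++) {
        uint32_t phys = VGA_WINDOW + (i % PAGES_PER_BANK) * PAGE_SIZE;
        uint32_t pte  = phys | PTE_WRITABLE;
        if (i / PAGES_PER_BANK == present_bank)
            pte |= PTE_PRESENT;
        ptes[i] = pte;
    }
}
```

For 640x480x8bpp this gives 75 pages, i.e. four full 64KB blocks
plus part of a fifth one.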

Let's see what happens if we try to access a byte in the  region
of 2MB...2MB+64KB.  Nothing  interesting  really  happens  -  we
either read data from VGA buffer or write it to the  buffer  and
see changes on the screen.

But what if we try to access a byte at the address  of  2MB+64KB
or 2MB+65KB or whatever outside the linear  space  made  out  of
present pages? Well, we will cause a Page Fault Exception. "What
a pity!" you would say... "It's not gonna help us, a  ridiculous
exception. What does it have to do with all this stuff, huh?"...

But guess what? It's something we really desire! If we write a
Page Fault Handler routine, we can get the linear address of the
byte we tried to access but failed to because of a not present
page. Remember the CR2 register? Yep, we get that address from
this register upon the exception.

What's next? It's simple... Having this address, we can find out
which 64KB block (block 1 = pages 1...16, block 2 = pages
17...32, etc.) contains the byte we tried to access.
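
As a small sketch of that calculation (the names are made up for
this example, and banks are counted from 0 here rather than from
1 as in the text), the block number and the offset inside the
block follow directly from the faulting address taken from CR2:

```c
#include <stdint.h>

#define LFB_BASE  0x200000u   /* emulated LFB starts at 2MB, as above */
#define BANK_SIZE 0x10000u    /* 64KB                                 */

/* Returns the 0-based bank that contains fault_addr, or -1 if the
   address lies outside the emulated frame buffer of lfb_size bytes. */
int lfb_bank_of(uint32_t fault_addr, uint32_t lfb_size)
{
    if (fault_addr < LFB_BASE || fault_addr - LFB_BASE >= lfb_size)
        return -1;
    return (int)((fault_addr - LFB_BASE) / BANK_SIZE);
}

/* Offset of fault_addr inside its bank (caller checked the range). */
uint32_t lfb_bank_offset(uint32_t fault_addr)
{
    return (fault_addr - LFB_BASE) % BANK_SIZE;
}
```

A fault outside the region (the -1 case) has nothing to do with
the emulation and would have to be handled some other way.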

We then mark all pages of this block as present ones,  mark  all
pages in other blocks as not present, switch a bank (YEAH,  here
we can switch bank from #1 to #2 because the  byte  we  want  to
access is in the bank  #2)  and  perform  the  interrupt  return
(IRETD) from the page fault handler.

The program now restarts the instruction which caused the
exception, and the read or write of that byte completes normally
because the page is now present. (Basically, this is how virtual
memory and swapping work: one can extend physical RAM size by
means of free space on the HDD.)

Since we switch banks upon the page fault, we may go deeper  and
deeper into the video card RAM...

The nice thing about this design  is  that  a  user  application
doesn't know anything about all those banks, pages,  exceptions.
Everything is processed inside the Page Fault Handler.  And  the
user application just works with the linear space from  2MB  and
up to whatever the limit is (depending on the screen resolution)
and that's it. The user application by itself  doesn't  have  to
worry about anything.


             It's a Piece of Cake, isn't it? :)

Unfortunately, not. :) It turns out that everything works just
fine unless we want to read/write a word or a double word whose
parts reside in different blocks (i.e. one byte of the
word/double word is in one block but the rest of the bytes are
in the other one). So if we keep our design as is, we end up
with an infinite loop. Let's see what happens...

If we try to access a word/dword crossing the block boundary, we
may have 2 (or more) exceptions: one for the byte(s) in the
first block and then, immediately after that exception has been
handled, a second one for the byte(s) in the next block. This is
because in order to complete the read/write instruction we have
to have access to all of the bytes involved.

So, if we mark the pages of the 1st block as present and all
others as not present upon the 1st exception, we get a second
exception for the byte(s) in the 2nd block. We do the same thing
here, i.e. we mark the pages of the 2nd block as present and all
others as not present. After we finish with the second
exception, the CPU restarts the instruction which caused all
these exceptions and we get a 3rd exception (because now the
pages of the 1st block are not present) and then a 4th and so
on, sitting in a tight loop around a single instruction.
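
The loop is easy to reproduce with a toy model. The sketch below
is an assumption-laden simplification, not a description of the
real CPU: it checks the bytes of an access in address order and
lets the "handler" make the faulting bank the only present one.

```c
#include <stdint.h>

#define BANK_SIZE 0x10000u   /* 64KB */

/* 1 if an access of `size` bytes at LFB offset `off` touches two banks. */
int crosses_bank(uint32_t off, uint32_t size)
{
    return off / BANK_SIZE != (off + size - 1) / BANK_SIZE;
}

/* Simulates the naive handler where exactly one bank is present at a
   time. Returns the number of page faults taken before the access
   completes, or -1 if it still has not completed after max_faults. */
int naive_fault_count(uint32_t off, uint32_t size,
                      uint32_t present_bank, int max_faults)
{
    int faults = 0;
    for (;;) {
        uint32_t bank = present_bank;
        int ok = 1;
        for (uint32_t i = 0; i < size; i++) {
            bank = (off + i) / BANK_SIZE;
            if (bank != present_bank) { ok = 0; break; }
        }
        if (ok)
            return faults;          /* all bytes reachable: done      */
        if (faults == max_faults)
            return -1;              /* give up: this is the loop      */
        faults++;
        present_bank = bank;        /* handler: switch to that bank   */
    }
}
```

An access that stays inside one bank needs at most one fault,
while a dword at offset 0FFFEh never completes, no matter how
many faults are allowed.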


                       It's all Over. :(
                           Is it? :)

One possible approach is to temporarily use one extra page in
place of that 2nd block inside the page fault handler... We
don't map this page onto the VGA buffer, it's just a piece of
RAM.

Basically the problem with infinite loop arises  from  the  fact
that we can not physically have 2 or more banks  being  selected
at the same instant. A video card has  only  one  current  bank.
This is why we always had only one 64KB-block  made  of  present
pages at any given moment.

So we just map this extra page in place of that 2nd block, but
we don't mark the pages of the previous block as not present.
Thus, after IRETD is issued, the instruction which accesses the
word/dword residing in separate blocks can complete. Of course,
before mapping in that extra page we should write something into
its first dword because the interrupted instruction has to read
the correct information. But then, after this instruction
completes, we have to write the changes made to the extra page
back to the screen because we want to see this information
correct as well.

Also, after we finish with this damn instruction which causes a
bunch of exceptions, bank switches and data transfers, we have
to restore our primary state, i.e. the pages of only one of the
blocks are present and the extra page is not mapped.

Now this is a real problem... We have to stop after that
instruction and put everything back. But how???

Scared? :) It's easy enough. You don't really have to
disassemble that instruction on the fly to find out its length
and then put something into the code segment of the program
right after this instruction. That's unrealistic. What you can
do is just modify the EFLAGS.TF flag of the interrupted program
(EFLAGS is pushed onto the stack before control passes to the
code of the page fault handler) so that it becomes 1 after
IRETD.

This bit (Trap Flag) makes it possible to debug a program
step-by-step, i.e. at the end of each instruction an Int 1 is
generated automatically so that you can examine the state of the
registers. This mode of operation of the CPU is called
Single-Step mode. You may trace a program this way and I bet
you've done this hundreds of times before. Haven't you? :)

This way, after the instruction finally completes, we get the
Int 1 which is passed to its handler. Inside this handler we can
unmap the extra page, mark the pages of the next block as
present and all others as not present, clear the EFLAGS.TF bit
on the stack and issue IRETD once again, now from the
single-step exception handler :).
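
In C terms, flipping the flag comes down to one bit in the saved
EFLAGS image. The sketch below assumes a 32-bit exception taken
without a privilege change, where the CPU leaves EIP, CS and
EFLAGS on the handler's stack (the page fault additionally
pushes an error code on top, which the handler has to remove
before IRETD). The structure and function names are made up for
the example.

```c
#include <stdint.h>

#define EFLAGS_TF 0x00000100u   /* Trap Flag is bit 8 of EFLAGS */

/* Return frame used by IRETD for a same-privilege 32-bit exception. */
struct fault_frame {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
};

/* Page fault handler side: ask for an Int 1 after the next instruction. */
void request_single_step(struct fault_frame *f)
{
    f->eflags |= EFLAGS_TF;
}

/* Single-step handler side: return to normal execution. */
void cancel_single_step(struct fault_frame *f)
{
    f->eflags &= ~EFLAGS_TF;
}
```

Note that only the copy on the stack is changed; the new value
takes effect when IRETD reloads EFLAGS from it.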

Easy? No? Why not? The concept is very simple. Well, it's kind
of a lot of work to be done, but it really works... I've managed
to complete this program in a day from scratch. You don't have
to. :) Learn from my code, understand what's going on and then
use this idea on your own if you want.


                   Page Fault Handler Actions

(if the offset within the current bank is < 3, i.e. a dword is
probably being accessed):

 1. switch to next 64KB bank (w/o remapping)
 2. read a dword from the video buffer (at addr 0a0000h)
 3. write this dword to an extra page  (at  offs  0  within  the
    page)
 4. map the extra page instead of  a  corresponding  non-present
    page of LFB (e.g. 1st page within this next 64KB bank)
 5. switch the 64KB bank back (w/o remapping)
 6. set single-step flag
 7. perform IRETD

             What happens between the two exceptions
                   (page fault and single-step)

 8. the instruction reads/writes data from/to the previous 64KB
    bank and the extra page, so it reads/writes everything
    correctly. Upon instruction completion a single-step
    exception occurs.

                  Single-Step Handler Actions

 9. read a dword from the extra page (at offs 0 within the page)
10. write the dword to the video buffer (at addr 0a0000h)
11. switch to next 64KB bank (with remapping)
12. clear the single-step flag
13. perform IRETD
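
The 13 steps can be walked through with a toy model in which an
array stands in for the card's memory. This is a sketch under
stated assumptions, not LFBemu's code: the names are invented,
only a 4-byte write that straddles two banks is modelled, and
the next bank is selected before the dword is written back, on
the assumption that the write-back is meant to land at the start
of the next bank.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u
#define BANK_SIZE 65536u
#define NUM_BANKS 5u                 /* enough for 640x480x8bpp      */

uint8_t vram[NUM_BANKS][BANK_SIZE];  /* stand-in for the card memory */
uint8_t extra_page[PAGE_SIZE];       /* ordinary RAM, not video RAM  */

/* Walks through steps 1-13 for a 4-byte write at LFB offset `off`.
   Returns the bank left selected afterwards, or -1 if the access does
   not straddle two banks (that case is not modelled here). */
int straddling_write32(uint32_t off, const uint8_t data[4])
{
    uint32_t bank = off / BANK_SIZE, in_bank = off % BANK_SIZE;
    uint32_t hw_bank;                /* bank selected on the "card"  */

    if (in_bank <= BANK_SIZE - 4 || bank + 1 >= NUM_BANKS)
        return -1;

    /* Page fault handler */
    hw_bank = bank + 1;                         /* 1: next bank      */
    memcpy(extra_page, vram[hw_bank], 4);       /* 2-3: save dword   */
    /* 4: extra page now stands in for page 0 of the next bank       */
    hw_bank = bank;                             /* 5: bank back      */
    /* 6-7: set TF, IRETD                                            */

    /* 8: the restarted instruction writes through both mappings     */
    for (uint32_t i = 0; i < 4; i++) {
        uint32_t o = in_bank + i;
        if (o < BANK_SIZE)
            vram[hw_bank][o] = data[i];         /* previous bank     */
        else
            extra_page[o - BANK_SIZE] = data[i];/* extra page        */
    }

    /* Single-step handler */
    hw_bank = bank + 1;                         /* select next bank  */
    memcpy(vram[hw_bank], extra_page, 4);       /* 9-10: write back  */
    /* 11: next bank's pages present, extra page unmapped            */
    /* 12-13: clear TF, IRETD                                        */
    return (int)hw_bank;
}
```

Copying the dword into the extra page first is what keeps the
bytes of the next bank that the instruction does not touch
intact when the dword is written back.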


                        What's missing?

This is actually a good question... We can now either read or
write words/dwords that straddle two banks... But is it all we
need? Funny somehow, but it's not the end yet. :) What about
such instructions as MOVSB/W/D and XCHG which have to read and
write at the same time? For example, if we want to copy a sprite
from one location on the screen to another on the same screen,
then this becomes a problem. We can not get in between the
reading and the writing inside a single instruction. It's
atomic, so we can not do much about this. We probably can allow
a lot more exceptions to happen and take care of this with those
"extra" pages, but this becomes a lot of extra work. But don't
worry. What we've done is not too bad at all. Probably we can
not move data around the screen w/o any extra buffers, but we
now don't have any problems with separate reading and writing
from/to the screen. It's a very good achievement.

An addition... LFBemu now does emulate things like MOVSx and
XCHG correctly. It now uses up to 3 extra pages. 3 extra pages
are enough to cover simultaneous access to 2 memory cells
(words/dwords/etc) crossing bank boundaries.


                        About the Program

The program requires a few things...

0. A computer. An x86 computer. :)
1. i386+ CPU.
2. Real mode at startup (i.e. no crappy Windows, EMM386.EXE and
   stuff like that). In other words, real-mode DOS is needed.
3. Of course a VESA compatible video card (I guess VESA  1.2  or
   better).
4. The VESA BIOS should have the VESA Protected Mode Interface,
   a set of 32-bit PMode functions that allow us to select a
   bank while staying in PMode. Since version 2.0 of LFBemu,
   it's not a requirement anymore. It would just be better to
   have this thing because otherwise a v86 task will be used in
   order to switch banks through the real-mode VESA BIOS. And
   this is much slower.
5. Probably a user. Better both a programmer and a user. :) I'd
   say it should be a single human being who is both :) because
   this program doesn't simply enable LFB and then let you play
   existing games requiring LFB. The program is actually
   intended to show how to emulate LFB.
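
For reference, the bank switch mentioned in item 4 is VBE
function 4F05h (int 10h with AX=4F05h, BH=0 to set the window
position, BL=window number, DX=position in window granularity
units). The sketch below only prepares the register values. The
structure and function names are invented for the example, and
the granularity is assumed to have been read from the mode
information block, where it is given in KB.

```c
#include <stdint.h>

/* Register values for one int 10h call (hypothetical container). */
struct vbe_regs {
    uint16_t ax, bx, dx;
};

/* Builds the registers for VBE function 4F05h, subfunction 0 (set
   window position), so that window A shows the given 64KB bank.
   win_granularity_kb must be nonzero. */
struct vbe_regs vbe_set_bank_regs(uint32_t bank, uint16_t win_granularity_kb)
{
    struct vbe_regs r;
    r.ax = 0x4F05;      /* VBE display window control                */
    r.bx = 0x0000;      /* BH = 0: set position, BL = 0: window A    */
    r.dx = (uint16_t)(bank * 64u / win_granularity_kb);
    return r;
}
```

With a 64KB granularity DX is simply the bank number; with a 4KB
granularity bank 3 becomes position 48.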

Besides that, there is no  reason  to  run  this  program  on  a
computer equipped with a VESA card that does support LFB. :)  It
will work, but it's ridiculous. Isn't it? :)

To run the program just type lfbemu or run it from your shell
program (like Norton Commander). The program contains everything
it needs inside itself, i.e. it doesn't need to load any files
or run other programs. Just a nice single program. :)


                       The Second Idea...

The idea is simple: put this code into an existing DOS extender
(DPMI host). This would make it possible to run programs
requiring LFB because both the DPMI service functions and the
emulation would be done in a single PMode program.


                  What's New in this Release?

1. LFBemu 2.2 now does emulate things like MOVSx and XCHG
   correctly. It now uses up to 3 extra pages. 3 extra pages are
   enough to cover simultaneous access to 2 memory cells (words/
   dwords/etc) crossing bank boundaries.
2. Some minor changes made in version 2.1... 
   The program is a bit reorganized so that  the  .COM  file  is
   again very short.
   v86 emulation doesn't call real-mode BIOS int 10h by means of
   the "int" instruction. Instead it uses a far call, which
   saves time in the bank switching procedure. It should be
   quite fast now.
3. Nothing much... In version 2.0 IRQs are  disabled  while  the
   program is in PMode so that if BIOS enables  interrupts,  the
   program doesn't crash.
4. Since version 2.0, a v86 task is used to pass bank switching
   to the real-mode BIOS. This makes it possible to emulate LFB
   even on cards that lack the VESA Protected Mode Interface.


                       Bugs and Stuff alike

1. The program doesn't hang at the point where you see a nice
   picture. It just waits 10 seconds and then terminates.
2. If the program starts making a sound and then freezes up  the
   system then there is probably a bug. More likely in BIOS  but
   who knows -- v86 handling is tough. :)


                     Guarantees, Warranties,
                  Responsibilities and other crap...

You use this information and program at your own risk. The
author can not guarantee anything and can not be responsible
for any misuse of the information and program as well as for any
harm it can do. The program and information are provided as is
to the public domain. If you haven't read this -- it's your
problem, not mine. :)


                            Copyrights

Yep, this is about this weird symbol "(c)" :). It means that
this software and information are intellectual property and
have an owner. You may use and distribute this package for
educational and non-commercial purposes w/o any problems for
free (well, shipping cost is up to you :). Omitting the
information about the primary author and copyrights, earning
money from this or anything else similar to such actions is
strictly prohibited. Should you ever have questions or concerns,
contact the author.


                       Contact Information

Author  : Alexei A. Frounze
e-mail  : alexfru@chat.ru
homepage: http://alexfru.chat.ru
mirror  : http://members.nbci.com/alexfru/
pmode   : http://welcome.to/pmode/