January 14th, 2009 by nkeynes
Memory system rewrite
Posted in Development

The memory system rewrite is now merged. There are a few things I’m not completely happy with yet, and the old page_map isn’t entirely gone, but on the whole it’s simpler, faster, and much more consistent. Perhaps more importantly, UTLB translation is now _very_ cheap (3-instruction overhead[0] for OSes using the typical 4K page), so Linux now boots and runs at full speed on my systems. There are probably a few lingering issues, and I’m still working on a good test suite for it[1], but most bugs are likely to be in things that never worked before anyway.
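
For a rough idea of why the 4K case can be so cheap: if every 4K guest page has its own slot in a flat table, a translation is little more than a shift, a table load, and an add. The sketch below only illustrates that general technique and is not the actual lxdream code; the names (page_table, vm_init, vm_map_page, vm_translate) are made up for the example, and it assumes a 32-bit guest address space with one host pointer per page.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_BITS  12                          /* 4K pages */
#define PAGE_SIZE  (1u << PAGE_BITS)
#define PAGE_COUNT (1u << (32 - PAGE_BITS))    /* 1M slots for 32-bit addresses */

/* Hypothetical table: one host pointer per guest page, NULL if unmapped. */
static uint8_t **page_table;

/* Allocate an empty table. Returns 0 on success, -1 on allocation failure. */
int vm_init(void)
{
    page_table = calloc(PAGE_COUNT, sizeof(uint8_t *));
    return page_table != NULL ? 0 : -1;
}

/* Map the 4K guest page starting at vaddr onto a host buffer. */
void vm_map_page(uint32_t vaddr, uint8_t *host_page)
{
    assert((vaddr & (PAGE_SIZE - 1)) == 0);    /* must be page-aligned */
    page_table[vaddr >> PAGE_BITS] = host_page;
}

/* Translate a guest address to a host pointer: shift, load, add.
 * Returns NULL for an unmapped page, where a real emulator would
 * raise a TLB miss exception instead. */
uint8_t *vm_translate(uint32_t vaddr)
{
    uint8_t *page = page_table[vaddr >> PAGE_BITS];
    if (page == NULL)
        return NULL;
    return page + (vaddr & (PAGE_SIZE - 1));
}
```

The price of this layout is memory (one pointer per possible page) and the work of updating the table whenever the guest changes its TLB, in exchange for a lookup that needs no search at access time.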

I also have some work in progress on the operand cache (nominally the original reason I started the rewrite…), but it’s still showing more of a performance hit than I would like (10-15%). So at the moment I’m thinking it will probably wait until the next version to be finished and fully integrated. It does need to be done eventually for correctness, though, since the SH4 doesn’t ensure cache coherency in hardware.

In any case, once the MMU tests are done I’ll get back to the translator upgrade. At this stage it looks like 0.9.1 will end up being almost purely a performance release, but since it should be at least twice as fast overall as 0.9, no one is really going to complain about that, right? ^_^

[0] We might be able to special-case SDRAM access and get that case down to 0 instructions, but I’m leaving that aside until after the op-cache is done…
[1] Annoyingly enough, there doesn’t seem to be a good way to recover from TLB multi-hit resets on the DC, which makes it a little hard to test that aspect of things… Even more annoyingly, the DC BIOS _does_ vector manual resets through 0x8c000018, but not any other kind of reset.

2 Responses to “Memory system rewrite”

  1. Lahr says:

    Awesome. You can NEVER go wrong with a performance update.

  2. Gondos says:

    Hi Nathan,
    Good to see some steady progress on your project, I know it’s hard to keep focused; I sadly gave up working on the DC emulator I was toying with.
    If you want some tips for your dynrec, I came up with a 1-pass DRC that gave pretty decent code. Given the instruction set of the SH4, constant propagation is a great addition to the register allocator (and not too difficult to add) and can really reduce the size of the generated code (even further if you consider “MOV.x @(disp, PC), Rn” instructions to be generating constants). Also, lazily flushing the T bit can virtually remove the need for setx operations in a basic block and allows for some sweet “cmp reg1,reg2/jx” combos (especially if you allow your basic blocks to continue through branches not taken).
    Well, that was just my 2 cents; I’m sure you already have nice plans for your dynamic recompiler :)
    Keep up the good work,
