February 10th, 2009 by nkeynes
Timing and update
Posted in Development

I’ve had some high-level performance numbers kicking around for a while. Give or take a few percent they’re fairly consistent, at least for the workloads I’ve been profiling. Expressed as milliseconds of real time per second of emulated time, the runtime looks something like this on my machine with current svn head (0.9 was quite a lot slower):

  • 695 ms – 3979 ms Rendering and display
  • 264 ms SH4 Translated code
  • 76 ms SH4 mem read/write
  • 55 ms SH4 support code
  • 64 ms AICA/ARM
  • 57 ms Tile accelerator
  • 45 ms Render Scene parsing
  • 9 ms Miscellaneous

SH4 total: 395 ms; all non-rendering: 570 ms; grand total: 1.265 s / 4.549 s. The two rendering numbers are for two otherwise fairly similar machines, one with a 9800GTX and the other with an 8600M GT (I’m sure you can guess which one is which). The times are also with the TLB off (enabling it still has an appreciable impact, although less than it used to).
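For anyone who wants to check the arithmetic, here is a minimal sketch that re-derives the totals from the breakdown above. The dictionary names and the "faster GPU" / "slower GPU" labels are mine, purely for illustration; the numbers are the ones listed.

```python
# Cost of each component, in ms of real time per second of emulated time,
# copied from the breakdown above.
sh4 = {"translated code": 264, "mem read/write": 76, "support code": 55}
other = {
    "AICA/ARM": 64,
    "tile accelerator": 57,
    "render scene parsing": 45,
    "miscellaneous": 9,
}
# Illustrative labels only: one rendering figure per test machine.
rendering = {"faster GPU": 695, "slower GPU": 3979}

sh4_total = sum(sh4.values())
non_rendering = sh4_total + sum(other.values())
grand_totals = {gpu: non_rendering + ms for gpu, ms in rendering.items()}

for gpu, total in grand_totals.items():
    # 1000 ms of real time per emulated second would be exactly full speed.
    speed = 1000 / total
    print(f"{gpu}: {total} ms per emulated second ({speed:.2f}x real time)")
```

In other words, the faster machine runs at a bit under 0.8x full speed and the slower one at a bit over 0.2x.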

By way of comparison, I’d originally budgeted CPU time along the following lines (although I was targeting a much slower machine than the one I have now):

  • 50% SH4 emulation
  • 25% rendering
  • 25% everything else

Rendering performance is quite obviously atrocious at the moment and needs a lot of work, which may or may not make it into 0.9.1 at this stage. At the moment though I’m still focusing on SH4 performance. While we’re technically under budget already, one has to bear in mind that a) the SH4 is currently underclocked by a good bit, b) there are a few things I will need to add for accuracy that are likely to hurt performance (possibly severely in some cases), and c) given the rendering problems I’m now aiming for 50% rendering / 35% SH4 / 15% everything else, as this may be a more achievable target.
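To put rough numbers on the budget comparison, here is a small sketch. It assumes the percentages are shares of the 1000 ms of real time available per emulated second at full speed (that reading is an assumption, as is every name in the code), and it uses the rendering figure from the faster machine.

```python
# Assumption: a budget percentage is a share of the 1000 ms of real time
# available per emulated second when running at full speed.
FULL_SPEED_MS = 1000

# Measured costs in ms per emulated second (rendering: faster machine).
measured = {"SH4": 395, "rendering": 695, "everything else": 175}

# Budgets as whole percentages, to keep the arithmetic in integers.
budgets = {
    "original": {"SH4": 50, "rendering": 25, "everything else": 25},
    "revised": {"SH4": 35, "rendering": 50, "everything else": 15},
}


def headroom(budget):
    """Return ms of slack per component; negative means over budget."""
    return {
        part: pct * FULL_SPEED_MS // 100 - measured[part]
        for part, pct in budget.items()
    }


for name, budget in budgets.items():
    print(name, headroom(budget))
```

Under that reading, the SH4 has about 105 ms of slack against the original 50% budget but is about 45 ms over the revised 35% one, and rendering is well over in both cases.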

In the meantime, the new multi-pass translator is slowly taking shape, having gone through a few fairly major revisions by now. I’ve ended up with a fairly simple low-level 2-op IR that (mostly) maps directly to the x86[0], plus a few ‘macro ops’ to express the handful of SH4 instructions that don’t have a simple representation on x86. I’m currently working on the x86 codegen backend, which should be ‘done’ sometime this week, so the whole caboodle may actually be fully working by the end of the month after all.

[0] Having said that, it’s not at all x86-specific, and it should be a heck of a lot easier to add other targets than with the old codebase. You know, should anyone actually want to do that…
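As a rough illustration of the 2-op IR idea described above (a toy model, not the actual translator), each op has a destination and a source, simple ops lower to a single x86-style instruction, and anything else is treated as a macro op and expanded separately. The opcode table, operand names and helper-call expansion are all hypothetical, and `div1` is used only as a plausible example of an instruction without a one-to-one x86 equivalent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """A 2-operand IR op, read as: dst = dst <opcode> src."""
    opcode: str
    dst: str
    src: str


# Hypothetical table of IR opcodes that lower to one x86 instruction each.
SIMPLE_OPS = {"mov": "mov", "add": "add", "sub": "sub",
              "and": "and", "or": "or", "xor": "xor"}


def lower(op):
    """Lower one IR op to a list of x86-style assembly lines."""
    if op.opcode in SIMPLE_OPS:
        return [f"{SIMPLE_OPS[op.opcode]} {op.dst}, {op.src}"]
    # Macro op: no direct equivalent, so expand it. Here the expansion is
    # just a call to a made-up helper routine.
    return [f"; macro {op.opcode} {op.dst}, {op.src}",
            f"call helper_{op.opcode}"]


def lower_block(ops):
    """Lower a sequence of IR ops, keeping their order."""
    lines = []
    for op in ops:
        lines.extend(lower(op))
    return lines


# Operand names such as [r_m] are placeholders for guest register slots.
block = [
    Op("mov", "eax", "[r_m]"),
    Op("add", "[r_n]", "eax"),
    Op("div1", "[r_n]", "[r_m]"),
]
print("\n".join(lower_block(block)))
```

Keeping the lowering table separate from the IR is what would make a second target cheap in a design like this: another backend only needs its own table and macro expansions.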
