Archive for October, 2008

October 31st, 2008 by nkeynes
Core Optimization
Posted in Development

I’ve been working on “core” optimization this week (which for this purpose is everything except rendering and I/O) – aiming to get the test loads down to around 40% CPU usage on my machine (leaving the rest for rendering). By comparison, 0.9 is running at 80-100%, and that’s with the SH4 underclocked.

Compiler optimizations:

  1. Turn on -fomit-frame-pointer (for 32-bit builds). I’ve been wanting to do this for a while, but it had the slight problem of completely breaking exception handling. Fortunately there is a solution: build with -fexceptions (or one of the other flags that emit eh_frame sections) and use _Unwind_Backtrace instead of manual frame-pointer chasing.
  2. Enable SSE2 math for i386-linux (already enabled on all the other platforms)
  3. Convert the functions called from the translator to use a register-passing calling convention (regparm) – a decent 5-6% improvement. (Note that items 1-3 all apply only to 32-bit code; the 64-bit ABI already behaves this way by default.)
  4. OS X: Disable PIC code generation (I now discover that for some ineffable reason Apple enable it by default, unlike most platforms) – this alone is about a 12% speedup, which pretty much brings it back to par with the Linux version. If I’d known about this earlier…
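Item 1 swaps frame-pointer chasing for the unwinder, which reads the eh_frame tables instead. A minimal sketch of a backtrace helper built that way, assuming GCC or Clang and the `<unwind.h>` interface they ship; `capture_backtrace` and `trace_one_frame` are hypothetical names, not lxdream’s actual code:

```c
#include <stdint.h>
#include <unwind.h> /* GCC/Clang unwinder interface (libgcc_s or libunwind) */

/* Hypothetical helper: collect return addresses of the current call stack
 * by walking the eh_frame unwind tables rather than following saved frame
 * pointers, so it keeps working when built with -fomit-frame-pointer. */
struct trace_state {
    void **frames; /* output buffer */
    int count;     /* frames recorded so far */
    int max;       /* capacity of the buffer */
};

/* Called once per frame by _Unwind_Backtrace. */
static _Unwind_Reason_Code trace_one_frame(struct _Unwind_Context *ctx, void *arg)
{
    struct trace_state *st = arg;
    uintptr_t ip = (uintptr_t)_Unwind_GetIP(ctx);

    if (ip == 0)
        return _URC_END_OF_STACK;   /* nothing more to record */
    if (st->count >= st->max)
        return _URC_END_OF_STACK;   /* buffer full: stop the walk */
    st->frames[st->count++] = (void *)ip;
    return _URC_NO_REASON;          /* keep unwinding */
}

/* Fill frames[0..max-1] with return addresses; returns how many were recorded. */
static int capture_backtrace(void **frames, int max)
{
    struct trace_state st = { frames, 0, max };
    _Unwind_Backtrace(trace_one_frame, &st);
    return st.count;
}
```

The walk only reaches frames whose object files carry unwind tables, which is why the build needs `-fexceptions` (or `-fasynchronous-unwind-tables`) once `-fomit-frame-pointer` is on.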

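For item 3, the attribute only has to appear on the helper declarations, and only for 32-bit x86. A minimal sketch assuming GCC or Clang; `REGPARM`, `mem_read_long`, `mem_write_long` and the tiny `guest_ram` array are hypothetical stand-ins for the real translator helpers:

```c
#include <stdint.h>

#if defined(__i386__) && defined(__GNUC__)
/* 32-bit x86: pass the first three integer arguments in EAX, EDX, ECX. */
#define REGPARM __attribute__((regparm(3)))
#else
/* x86-64 (and most other targets): arguments already go in registers. */
#define REGPARM
#endif

#define RAM_SIZE 64
static uint8_t guest_ram[RAM_SIZE]; /* hypothetical, tiny guest memory */

/* Hypothetical helpers called from translated code; the emitter must load
 * arguments into the registers this declaration implies. */
static uint32_t REGPARM mem_read_long(uint32_t addr)
{
    addr &= RAM_SIZE - 4; /* wrap and align to 4 bytes (sketch only) */
    return (uint32_t)guest_ram[addr] |
           ((uint32_t)guest_ram[addr + 1] << 8) |
           ((uint32_t)guest_ram[addr + 2] << 16) |
           ((uint32_t)guest_ram[addr + 3] << 24);
}

static void REGPARM mem_write_long(uint32_t addr, uint32_t val)
{
    addr &= RAM_SIZE - 4;
    for (int i = 0; i < 4; i++)
        guest_ram[addr + i] = (uint8_t)(val >> (8 * i)); /* little-endian */
}
```

Because generated code calls these helpers directly, the emitter has to place the first three arguments in EAX, EDX and ECX on 32-bit builds; on x86-64 the System V ABI already uses RDI, RSI and RDX, so no attribute is needed.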
Translated code generation:

  1. Remove all the ugly generated fpscr checks and branches for the different FPU modes, and just check fpscr once at the start of the translation block – if it’s different from last time, flush and retranslate. Small win (about 3-4%) on FP code. (This was suggested a long time ago by dknute, but I hadn’t gotten around to doing it until now.)
  2. Implement SSE3 versions of FIPR and FTRV. The latter alone gives a 4.5% overall improvement on typical rendering tests (where FTRV makes up roughly 1-2% of executed instructions), which is pretty good for tweaking one instruction.
  3. Optimize the store-queue write path a little bit (used fairly heavily by most apps)
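Item 1 replaces per-instruction FPU-mode branches with a single check per block. A minimal sketch of that idea, assuming the translation depends only on FPSCR’s PR (bit 19) and SZ (bit 20) bits; `struct xlat_block`, `translate_block` and `check_block_mode` are hypothetical, and a real translator would emit this comparison as host code at the block’s entry point rather than call C functions:

```c
#include <stdbool.h>
#include <stdint.h>

/* SH4 FPSCR mode bits that change how FP instructions must be translated
 * (only PR and SZ are tracked in this sketch). */
#define FPSCR_PR 0x00080000u /* bit 19: double-precision operations */
#define FPSCR_SZ 0x00100000u /* bit 20: 64-bit FMOV transfers */
#define FPSCR_MODE_MASK (FPSCR_PR | FPSCR_SZ)

/* Hypothetical translation-cache entry. */
struct xlat_block {
    uint32_t pc;         /* guest address of the block */
    uint32_t fpscr_mode; /* mode bits the host code was generated for */
    int translations;    /* number of times the block has been translated */
};

/* Stand-in for the real code generator: records which mode was assumed. */
static void translate_block(struct xlat_block *blk, uint32_t fpscr)
{
    blk->fpscr_mode = fpscr & FPSCR_MODE_MASK;
    blk->translations++;
}

/* Block-entry check: one comparison replaces per-instruction mode branches.
 * Returns true if the block had to be thrown away and retranslated. */
static bool check_block_mode(struct xlat_block *blk, uint32_t fpscr)
{
    if ((fpscr & FPSCR_MODE_MASK) == blk->fpscr_mode)
        return false;            /* same mode as last time: run cached code */
    translate_block(blk, fpscr); /* mode changed: flush and retranslate */
    return true;
}
```

The cost moves from every FP instruction to one compare per block entry, at the price of a retranslation whenever a program switches modes, which is why it pays off on FP-heavy code.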

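For reference, FIPR is a four-element dot product and FTRV multiplies a four-element vector by the 4x4 XMTRX matrix. One way to express them with SSE intrinsics is sketched below; this is illustrative C rather than the emitted x86 code, it assumes XMTRX is stored column by column (four consecutive floats per column), and it ignores any differences from the SH4’s own rounding. `fipr` and `ftrv` are hypothetical names, with a scalar fallback for non-x86 hosts:

```c
#if defined(__x86_64__) || defined(__i386__)
#include <pmmintrin.h> /* SSE3 intrinsics, including _mm_hadd_ps */
#define SKETCH_USE_SSE 1
#else
#define SKETCH_USE_SSE 0
#endif

/* FIPR FVm,FVn: dot product of two 4-vectors. */
#if SKETCH_USE_SSE
__attribute__((target("sse3")))
#endif
static float fipr(const float fvm[4], const float fvn[4])
{
#if SKETCH_USE_SSE
    __m128 p = _mm_mul_ps(_mm_loadu_ps(fvm), _mm_loadu_ps(fvn));
    p = _mm_hadd_ps(p, p); /* (p0+p1, p2+p3, p0+p1, p2+p3) */
    p = _mm_hadd_ps(p, p); /* full sum in every lane */
    return _mm_cvtss_f32(p);
#else
    return fvm[0] * fvn[0] + fvm[1] * fvn[1] + fvm[2] * fvn[2] + fvm[3] * fvn[3];
#endif
}

/* FTRV XMTRX,FVn: fv = XMTRX * fv, with column j stored at xmtrx[4*j]. */
#if SKETCH_USE_SSE
__attribute__((target("sse3")))
#endif
static void ftrv(const float xmtrx[16], float fv[4])
{
#if SKETCH_USE_SSE
    __m128 v = _mm_loadu_ps(fv);
    /* Sum of each column scaled by the matching (broadcast) vector element. */
    __m128 r = _mm_mul_ps(_mm_loadu_ps(xmtrx), _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0)));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(xmtrx + 4), _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1))));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(xmtrx + 8), _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2))));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(xmtrx + 12), _mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 3, 3, 3))));
    _mm_storeu_ps(fv, r);
#else
    float in[4] = { fv[0], fv[1], fv[2], fv[3] };
    for (int i = 0; i < 4; i++)
        fv[i] = xmtrx[i] * in[0] + xmtrx[4 + i] * in[1] +
                xmtrx[8 + i] * in[2] + xmtrx[12 + i] * in[3];
#endif
}
```

In this formulation only FIPR uses an SSE3-specific instruction (`_mm_hadd_ps`); the FTRV half is plain SSE, replacing sixteen scalar multiplies and twelve adds with four vector multiplies and three vector adds.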
I’ve also added a couple of new configure options: --disable-optimized turns off all optimizations and compiles with -g3, and --enable-profiled does (surprisingly enough) a profiling build.

Results after all of the above (on one particular test load): 32-bit OS X: 36% faster; 32-bit Linux: 21% faster; 64-bit Linux: 12% faster. The 32-bit and 64-bit versions now perform almost identically, with the 64-bit build just a hair in front.

Of course, this doesn’t directly translate into equivalently better frame-rates as we’re more limited by render performance than core speed at the moment, but every little bit still helps.

October 25th, 2008 by nkeynes
Lxdream 0.9 “Shiny” Released
Posted in Releases

Go get it now on the download page. It’s looking very nice.

This is the first version where we can say that most software should “just work”, outside of a small number of known issues – so please report any other problems you encounter. Note however that the focus of 0.9 has been on accuracy – performance has not substantially changed from earlier versions. That will be the main aim of the work for 0.9.1 “Speedy”, along with timing precision and a few other things.

What’s new

  • Improved accuracy + compatibility (aka many bugfixes)
  • Shadow volumes, render-to-texture, fogging
  • Light-gun support

More details in the release notes

October 24th, 2008 by nkeynes
Week of bugs
Posted in Development

I’ve been tidying up a number of little issues that have been hanging around – nothing major but it’s good to get them out of the way. So this should be about it for the release now unless something release-critical turns up.

Changes

  • Fix assorted minor compile warnings
  • Fix save-state compatibility between 32-bit and 64-bit platforms
  • Fix save-state loading in headless mode
  • Fix make distcheck
  • Increase ALSA start buffer size (sounds much less choppy now)
  • SH4: Fix yet another flag-clobbering case. That should be all of them now *fingers crossed*
  • PVR2: Fix texcache reset breaking data structure invariants
  • PVR2: Fix FBO reuse when using more than 2 buffer sizes (crash)
  • GUI: Display an error message when unable to run rather than just disabling the button
  • GTK: Remove annoying error messages when loading save-state previews
  • OSX: Add preferences toolbar item to main window
  • OSX: Fix changes to path properties not taking effect until restart
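The changelog doesn’t say what broke, but a common source of 32-bit/64-bit save-state incompatibility is dumping structs whose size or padding depends on the host ABI (for example, `long` and pointer fields). A minimal sketch of the usual remedy, writing each field at a fixed width in a fixed byte order; `struct cpu_snapshot` and the helper names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical piece of emulator state. In general a struct's in-memory
 * layout can differ between builds, so each field is written explicitly. */
struct cpu_snapshot {
    uint32_t pc;
    uint32_t sr;
    uint64_t cycle_count;
};

#define SNAPSHOT_SIZE 16 /* 4 + 4 + 8 bytes, independent of the host ABI */

static void put_u32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i)); /* little-endian byte order */
}

static uint32_t get_u32(const uint8_t *p)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)p[i] << (8 * i);
    return v;
}

static void put_u64(uint8_t *p, uint64_t v)
{
    put_u32(p, (uint32_t)v);
    put_u32(p + 4, (uint32_t)(v >> 32));
}

static uint64_t get_u64(const uint8_t *p)
{
    return (uint64_t)get_u32(p) | ((uint64_t)get_u32(p + 4) << 32);
}

static void snapshot_save(const struct cpu_snapshot *s, uint8_t out[SNAPSHOT_SIZE])
{
    put_u32(out, s->pc);
    put_u32(out + 4, s->sr);
    put_u64(out + 8, s->cycle_count);
}

static void snapshot_load(struct cpu_snapshot *s, const uint8_t in[SNAPSHOT_SIZE])
{
    s->pc = get_u32(in);
    s->sr = get_u32(in + 4);
    s->cycle_count = get_u64(in + 8);
}
```

Because the on-disk format is defined byte by byte, a state saved by a 64-bit build loads unchanged in a 32-bit one, regardless of struct padding or host endianness.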

October 16th, 2008 by nkeynes
Feature-freeze for 0.9
Posted in Development

The triangle sort improvements are in now (that being the most requested fix in the poll), which has made a big difference to many scenes. It’s still not 100% correct (and the 100% correct version needs a lot more time than I have right now), but the situations it gets wrong tend not to crop up so often in real scenes. From this point I’m just tracking down a few rendering glitches and related bugs which may or may not be fixed in time for release.

The bad news is that 0.9 will not have complete or perfect PVR2 emulation as originally envisioned. The good news is that most of the features that are still missing now are used fairly rarely, and the overall rendering quality is a huge jump over 0.8.*. And we’ll only continue to improve the quality and performance in following releases, of course.

What didn’t make the cut:

  • Accumulation buffer + bump mapping (although the error dialogs have been killed)
  • Shadow volumes for translucent polygons
  • Strip buffers

Changes

  • PVR2: Render-to-texture actually works now, with some internal simplifications
  • PVR2: Made triangle sorting algorithm approximate correctness a little more closely.
  • SH4: Fixed a few places where the T flag was accidentally being clobbered
  • OSX: Add the default config file into the bundle
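The post doesn’t describe the sorting algorithm itself, so what follows is only a generic illustration of why a simple per-triangle sort is an approximation. It orders translucent triangles back to front by one depth key each (here, hypothetically, the farthest vertex, with `z` taken as distance from the viewer); `struct tri`, `sort_translucent` and the key choice are all assumptions, not lxdream’s code:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical translucent triangle; z[] is distance from the viewer. */
struct tri {
    float z[3];
    int id;
};

/* One depth key per triangle: its farthest vertex. */
static float tri_sort_key(const struct tri *t)
{
    float k = t->z[0];
    if (t->z[1] > k) k = t->z[1];
    if (t->z[2] > k) k = t->z[2];
    return k;
}

static int compare_back_to_front(const void *a, const void *b)
{
    float ka = tri_sort_key(a), kb = tri_sort_key(b);
    if (ka > kb) return -1; /* farther triangles are drawn first */
    if (ka < kb) return 1;
    return 0;
}

static void sort_translucent(struct tri *tris, size_t n)
{
    qsort(tris, n, sizeof *tris, compare_back_to_front);
}
```

A single key per triangle cannot order triangles that intersect or overlap cyclically, since no valid back-to-front order exists for them; getting those right takes per-pixel ordering or splitting triangles, which costs considerably more.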

October 5th, 2008 by nkeynes
Shadow volumes are in
Posted in Development

for opaque polygons, and they’re looking quite nice. And it didn’t even have as big a performance hit as I had expected. Translucent shadows need to be dealt with separately, alongside the translucent poly sorting (and will probably be quite a bit more expensive, but then again they’re more expensive on the actual PVR2 as well).

Other than that, Real Life(tm) has been quite busy recently, so progress has been a little slower than one might like, but we’re still on track for 0.9 at the moment.

Changes

  • Add opaque shadow volumes
  • Fix some punchout ordering problems
  • Add EXT_packed_depth_stencil support