[AS3] Fast, full-screen, touch-every-pixel updating.
After experimenting with doom-style FPS rendering, I wanted to share some results that might help other people speed up games/visual demos that require a touch-every-pixel approach. After seeing Max's doom clone posted here (http://blog.brokenfunction.com/2007/...ash-plays-doom) I was curious how he was hitting such high frame rates. I had fast ray-tracing code that could COMPUTE the pixel color to write at a screen position very quickly, but actually doing a single pixel WRITE to the screen wasn't very fast. I was using a bitmapData as a screen buffer and poking into it with bitmapData.setPixel32().
So I got in touch with him and picked his brain about his code (turns out, we live in the same city and attend the same university - small world). He mentioned that he was also using pixel-by-pixel poking, but that there was a paletteMap operation involved, and some other optimizations that he wasn't very specific about. Running with his idea, I came up with the following arrangement - I don't know if it's exactly what he's doing but it's certainly much faster than what I had before. Here's what I did:
For a bitmap screen of size X by Y, first define a ByteArray that's big enough to hold 32 bits for each pixel (so its .length is X*Y*4) - this is the exact size that bitmapData.setPixels() needs to fill every pixel of the bitmap. The next logical step would be to call byteArray.writeUnsignedInt() for each pixel (so X*Y calls total). But it turns out that writeUnsignedInt() is slow. (Which, as an aside, makes very little sense to me - it's a 32 bit machine, so writing one 32 bit value should be a single store - but who knows what the flash VM really does inside there. It probably has endianness-safety in there, so it can't use the native machine's store-word instruction.)
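For reference, here is a rough sketch of that straightforward one-call-per-pixel version, so there is something concrete to compare against. The function name fillSlow and its parameters are made up for illustration; only the ByteArray and BitmapData calls are real API.

```actionscript
import flash.display.BitmapData;
import flash.utils.ByteArray;

// Slow baseline: fill a bitmap with one color using one
// writeUnsignedInt() call per pixel.
// fillSlow, target and argb are hypothetical names.
function fillSlow(target:BitmapData, argb:uint):void {
    var count:int = target.width * target.height;
    var buffer:ByteArray = new ByteArray();
    buffer.length = count * 4;          // 4 bytes (ARGB) per pixel

    buffer.position = 0;
    for (var i:int = 0; i < count; i++) {
        buffer.writeUnsignedInt(argb);  // advances position by 4 each call
    }

    buffer.position = 0;                // setPixels() reads from the current position
    target.setPixels(target.rect, buffer);
}
```

Usage would be something like fillSlow(bitmapData, 0xFF336699). It gives the right picture; it is the X*Y calls to writeUnsignedInt() that cost the time.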
On the other hand, the ByteArray's access operator, [], is pretty speedy. The catch is that you can only write single bytes (0-255 values). If you want to push all 4 bytes of a color in, you'll need four writes (bytes[i++]=alpha; bytes[i++]=red; bytes[i++]=green; etc). This will tank the performance, back to the speed of writeUnsignedInt(). The workaround, which I think Max uses too, is to give up on 32 bit true color and instead just use palette-style color indexing. This impacts image quality - but you can still make decent looking screens with good choices of palette colors. After all, wolf3d did it, doom did it, and quake 1 did it - and those games look pretty good side-by-side with current day flash games. Besides, 32 bit color has more addressable colors than there are addressable pixels on the screen, so in some sense you're always looking at a palettized image (i.e. the set of colors contained in the image is a subset of the set of colors supported by the display hardware).
To implement this idea, write into the ByteArray using the access operator (bytes[i]=something), but increment i by 4 for every consecutive write so that you keep poking into different pixels of the same color plane. I chose to write into the alpha plane, which is rasterized into bytes[0], bytes[4], bytes[8] ... etc. For example, to fill the whole bitmap with a flat color, do:
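var color : int = 120; // palette index to fill with (0..255)
var bytes : ByteArray = new ByteArray();
bytes.length = X*Y*4; // 4 bytes (ARGB) per pixel, zero-filled
var offset : int = 0; // byte 0 of each pixel is its alpha byte
for (var y : int = 0; y < Y; y++) { // rows first: the buffer is row-major
    for (var x : int = 0; x < X; x++) {
        bytes[offset] = color; // write the index into the alpha byte
        offset += 4; // skip ahead to the next pixel's alpha byte
    }
}
bitmapData.setPixels(rectangle, bytes); // rectangle is sized for full-screen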
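A renderer usually wants to poke one specific pixel rather than flood the whole buffer, so here is a small sketch of the offset arithmetic for that. The helper name plot and its parameters are hypothetical; it assumes the same row-major, 4-bytes-per-pixel layout as the fill loop, with the alpha byte first.

```actionscript
import flash.utils.ByteArray;

// Hypothetical helper: store a palette index for the pixel at (px, py).
// Assumes a row-major buffer with `width` pixels per row, 4 bytes per pixel,
// and the alpha byte at the start of each pixel.
function plot(bytes:ByteArray, width:int, px:int, py:int, index:int):void {
    // (py * width + px) is the pixel number; << 2 multiplies by 4
    // to get the byte offset of that pixel's alpha byte.
    bytes[(py * width + px) << 2] = index;
}
```

In a tight inner loop you would probably inline this and carry a running offset (as in the fill loop) rather than pay for a function call per pixel.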
This is not the final image yet; these palette indices must be promoted into full-fledged 32-bit colors. BitmapData.paletteMap can do this for every pixel. Eerily (well it's not that eerie really), paletteMap is practically instantaneous. Predefine a palette array whose length is 256 and whose every entry is a 32 bit uint. Attached is a free sample palette (it's ripped from wolf3d - I like their color choices). Pass that in as the alpha palette map, and set the other channels' palette maps to null:
bitmapData.paletteMap(bitmapData, rectangle, new Point(0,0) , null, null, null, palette.pal);
.paletteMap will walk over each pixel, read the 0-255 indices stored in the alpha channel, and overwrite all 4 channels (ARGB) with the full color values found in the palette array.
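If you don't have a palette file handy, here is a minimal sketch of building a 256-entry palette in code and running the promotion step. The gray ramp is only a stand-in for a real palette, and the function names are made up. It assumes the BitmapData was created with transparency enabled (so it actually keeps an alpha channel) and that the red/green/blue bytes of the buffer were left at zero, so the null maps for those channels add nothing to the result.

```actionscript
import flash.display.BitmapData;
import flash.geom.Point;

// Hypothetical stand-in palette: 256 opaque grays, one per index.
function makeGrayPalette():Array {
    var palette:Array = new Array(256);
    for (var i:int = 0; i < 256; i++) {
        // >>> 0 keeps the value as an unsigned 0xAARRGGBB number
        palette[i] = (0xFF000000 | (i << 16) | (i << 8) | i) >>> 0;
    }
    return palette;
}

// Promote the indices stored in the alpha channel to full ARGB colors, in place.
// Assumes `target` was created with transparent = true.
function expandIndices(target:BitmapData, palette:Array):void {
    target.paletteMap(target, target.rect, new Point(0, 0),
                      null, null, null, palette);
}
```

Build the palette once at startup and reuse it every frame; only the setPixels() and paletteMap() calls need to happen per frame.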
To put my money where my mouth is, here are two versions of a doom-clone. Version (1) uses setPixel32 and version (2) uses byteArray / paletteMap. Their raytracing methods are identical (equally optimized); the only differences are in the way pixels get onto the screen. On my PC, (2) is about twice as fast. Actually, I'd like to know how it runs/improves on other machines, so leave a post if you like. The maps look different because they use totally different texture sets - I pulled the latter from wolf3d so that they would be renderable by the chosen palette. The doom-dudes hanging around are palette best-fits, so their color looks a teensy bit off (doom and wolf3d use different palettes; doom's has fewer basic colors but more shades of each one, to support its dynamic lighting capabilities).
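For anyone wondering what a palette best-fit looks like in code, here is one simple way to do it: pick the palette entry with the smallest squared distance in RGB space. This is a sketch of the general idea, not necessarily how the sprites above were converted, and nearestIndex is a made-up name.

```actionscript
// Hypothetical helper: index of the palette entry closest to `color`,
// measured by squared distance in RGB space (alpha is ignored).
function nearestIndex(color:uint, palette:Array):int {
    var r:int = (color >> 16) & 0xFF;
    var g:int = (color >> 8) & 0xFF;
    var b:int = color & 0xFF;

    var best:int = 0;
    var bestDist:int = int.MAX_VALUE;
    for (var i:int = 0; i < palette.length; i++) {
        var p:uint = palette[i];
        var dr:int = ((p >> 16) & 0xFF) - r;
        var dg:int = ((p >> 8) & 0xFF) - g;
        var db:int = (p & 0xFF) - b;
        var dist:int = dr * dr + dg * dg + db * db;
        if (dist < bestDist) {
            bestDist = dist;
            best = i;
        }
    }
    return best;
}
```

This is a linear search per color, so run it once when converting textures and sprites at load time, not per pixel per frame.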
Just one more time for summary/emphasis, for fast touch-every-pixel updating: (1) make a full-screen-sized X*Y*4 ByteArray, (2) only write into one of its channels using the [] access operator, skipping 4 bytes per pixel-write, (3) call setPixels and then paletteMap, which promotes those 0-255 ranged palette indices into 32 bit ARGB colors (e.g. 0xFFFFFFFF).
I have a few more things to say about the subject of texture mapping, coming up...