Click to See Complete Forum and Search --> : [AS3] Fast, full-screen, touch-every-pixel updating.

08-10-2007, 03:55 PM
After experimenting with doom-style FPS rendering, I wanted to share some results that might help other people speed up their games/visual demos that require a touch-every-pixel approach. After seeing Max's doom clone posted here (http://blog.brokenfunction.com/2007/07/20/flash-plays-doom) I was curious how he was hitting such high frame rates. I had fast ray-tracing codes that could COMPUTE the pixel color to write at a screen position very quickly, but actually doing a single pixel WRITE to screen wasn't very fast. I was using a bitmapData as a screen buffer and poking into it with bitmapData.setPixel32().

So I got in touch with him and picked his brain about his code (turns out, we live in the same city and attend the same university - small world). He mentioned that he was also using pixel-by-pixel poking, but that there was a paletteMap operation involved, and some other optimizations that he wasn't very specific about. Running with his idea, I came up with the following arrangement - I don't know if it's exactly what he's doing but it's certainly much faster than what I had before. Here's what I did:

For a bitmap screen of size X by Y, first define a ByteArray that's big enough to hold 32 bits for each pixel (so it's .length is X*Y*4) - this is the exact size that bitmapData.setPixels() needs to fill every pixel of the bitmap. The next logical step would be to call byteArray.writeUnsignedInt() for each pixel (so X*Y calls total). But it turns out that writeUnsignedInt() is slow. (Which, as an aside, makes very little sense to me - it's a 32 bit machine so writing one 32 bit value should be an atomic operation - but who knows what the flash VM really does inside there. It's probably got endianess-safety in there so it can't use the native machines store-word instruction)

On the other hand, the ByteArray's access operator, [], is pretty speedy. The catch is that you can only write single bytes (0-255 values). If you want to push all 4 bytes of a color in, you'll need four calls ( bytes[i++]=alpha; bytes[i++]=red; bytes[i++]=green, etc). This will tank the performance, back to the speed of writeUnsignedInt(). The workaround, which I think Max does too, is to give up on 32 bit true color and instead just use palette-style color indexing. This impacts image quality - but you can still make decent looking screens with good choices of palette colors. After all, wolf3d did it, doom did it, and quake 1 did it - and those games look pretty good side-by-side with current day flash games. Besides, 32 bit color is more addressable colors than addressable pixels on the screen, so in some sense, you're always looking at a palettized image (i.e. the colorset contained in the image is a subset of colorset by the display hardware).

To implement this idea, write into the ByteArray using the access operator (bytes[i]=something) , but increment i by 4 for every consecutive write so that you keep poking into different pixels of the same color plane. I chose to write into the alpha plane, which is rasterized into bytes[0], bytes[4], bytes[8] ... etc. For example, to fill the whole bitmap with flat color, do:

var color : int = 120; // ...something (0..255)
var bytes : ByteArray = new ByteArray;
bytes.length = X*Y*4;
var offset : int = 0;

for (var x : int = 0; x < X; x++)
for (var y : int = 0; y < Y; y++) {
bytes[offset] = color;
offset += 4; }
bitmapData.setPixels(rectangle, bytes); // rectangle is sized for full-screen

This is not the final image yet, these palette indices must be promoted into full fledged 32-bit colors. BitmapData.paletteMap can do this for every pixel. Eerily (well it's not that eerie really), paletteMap is practically instantaneous. Predefine a palette array, whose length is 256 and every entry is a 32 bit uint. Attached is a free sample palette (it's ripped from wolf3d - I like their color choices). Pass that in as the alpha palette map, set the other channels palette maps to null:

bitmapData.paletteMap(bitmapData, rectangle, new Point(0,0) , null, null, null, palette.pal);

.paletteMap will walk over each pixel, read the 0-255 indices stored in the alpha channel, and overwrite all 4 channels (ARGB) with the full color values found in the palette array.

To put my money where my mouth is, here are two versions of a doom-clone. One uses setPixel32 and the other uses byteArray / paletteMap. Their raytracing methods are identical (equally optimized), the only differences are in the way pixels get onto the screen. On my PC, (2) is about twice as fast. Actually, I'd like to know how it runs/improves on other machines so leave a post if you like. The maps look different because they use totally different texture sets - I pulled the latter from wolf3d so that they would be renderable by the chosen palette. The doom-dudes hanging around are palette best-fits, so their color looks a teensy bit off (doom/wolf3d use different palettes, dooms has fewer basic colors but more shades of each one, to support it's dynamic lighting capabilities).

Just one more time for summary/emphasis, for fast touch-every-pixel updating: (1) make a full screen sized X*Y*4 ByteArray, (2) but only write into one of it's channels using the [] access operator, where you skip 4 bytes per pixel-write, (3) call paletteMap, which promotes those 0-255 ranged palette indices into 32 bit colors (0xFFFFFFFF).

I have a few more things to say about the subject of texture mapping, coming up...

08-10-2007, 04:13 PM
I have a few more things to say about the subject of texture mapping, coming up...bring it on :D - will have an in depth look at the details - perhaps pixel loops arenīt that bad

08-10-2007, 04:35 PM
... and here's a followup post about how to speed up bitmap-to-bitmap texturemapping a bit. The basic loop for a texture mapping a vertical span of pixels is:

x, the screen x where the strip is being drawn
y_start, where the vertical strip begins in screen space
y_end, where the vertical strip begins in screen space
u, the texture "x" coordinate to draw
(which is constant given dooms view restrictions)
v_start, the texel "y" coordinate that should be written at (x, y_start)
dv_dy, if i step on pixel, this is how many texels i should advance
Vmax, the height of the texture

v = v_start;
for (y = y_start; y < y_end; y++)
screen.setPixel32 (x,y , texture.getPixel32(u,v) );
v = (v + dv_dy) % Vmax;

The last post was about eliminating the setPixel32 operation, but getPixel32 is equally slow. The first idea is just to stuff the bitmap into an array or bytearray and read pixels using the [] operator, but you can actually do it faster than this. You don't need the random access property of [], consecutive texture accesses typically identical or adjacent. The trick is to model each column of texture with a circular linked list. At the beginning of the span, do a (slow - linear walk) seek operation to fetch a list pointer pointing at v_start. Inside the loop, watch the v coordinate, and everytime it steps past a whole pixel, advance the list pointer one texel. This is just an assignment operation, and cheaper than an array derefence (at least, according to senocular - I took his/er word for it). The %Vmax operation pops out for free because of the circular links.

Similar ideas can be used to render the ceilings and floors, however the walk is along x in screen space. It's also more complicated because both u and v change as you increment x (and the signs of du_dx and dv_dx can be arbitrary). So instead of a singly-linked list, the structure needs to be doubly linked and a "linked-matrix". Each texel will hold pointers to all four neighboring texels - +u, -u, +v, -v, and all four edges of the texture wrap around circularly, here's a code pasting:

public class Texel
// A node of the Texture graph structure.
public var color8bit : int;
public var next_y : Texel;
public var prev_y : Texel;
public var next_x : Texel;
public var prev_x : Texel;
public function Texel (color : uint)
color8bit = color & 0xFF;
next_y = null;
prev_y = null;
next_x = null;
prev_x = null;

All in all, this bloats your texture data considerably. A 64x64 texture should be 4096 bytes. The first hit is that an int has to be used to represent the 0-255 value because AS3 doesn't have a byte type. That puts it up to 16384 bytes. Then every texel gets four pointers tacked onto it, bloating to 81920 bytes (atleast... iirc references are 32 bits in flash but I am not certain). It is my impression that this kind of flash app will be more CPU bound than memory hungry, so I think it's a fair trade.

08-10-2007, 04:37 PM
Nice writeup, even I understood that! Interesting tradeoff between speed and exact colors.

Anyone know why "writeUnsignedInt() " is slow? It seems like the best solution is just out of reach.

08-10-2007, 06:08 PM
Brilliant! I've been wondering a lot how they make these things so fast, particularly some of the awesome demos at flashscene. Thanks!

08-10-2007, 06:31 PM
Excellent post, and perfect timing ( I've been reading up on ray casting and I've been itching to dive in and play ).


08-10-2007, 10:22 PM
fantastic, great post.

Goes to show, asking for something really works.

vote 1 for knowledge base.

09-05-2007, 02:24 PM
After some more work on this doom-clone, I'd like to share more progress... this thread seemed like as good of a place as any.

Although 256 color palette images that the last code generated looked pretty good, I found a decent way to extend this into "almost true-color" quality. The trick is to add an additional layer that acts as a lighting map. It's another 8-bit (0-255) byteArray image that is generated alongside the regular framebuffer. Every time a pixel is written an (x,y) position in the framebuffer, a lightmap-pixel is written into the lighting map at the same location(x,y) There's basically no work spent computing a new offset, it's always adjacent to the framebuffer pixel.

Furthermore, the doom rasterization algorithm always writes a bunch of pixels with the same projection distance (z-value?) at the same time. That is, columns of consecutive pixels are filled with wall texture are inside a single function call, and rows of consecutive pixels are filled floor texture are drawn inside a single function call. In both of these cases, these pixels are all at the same projection distance from the camera. So if you make your lighting model only depend upon projection distance, you don't need per-pixel recomputation of lighting level, because it's constant for the whole span. This is how doom does it's "darkening" or "black-fogging" lighting model. I think a lot of other 1993-1998 era FPS games worked like this too - far away objects are black shifted, as if the player himself is the source of the light. The only expense to this model is one additional byteArray[] write per pixel, which hurts performance, but you're getting a lot of visual appeal for that smallish cost.

Now, promoting palette images to true colors images is more complicated, but it's still the same idea:
1a. Write palette indices into the blue channel of a byteArray.
1b. Write darkness indices (255 = near and bright, 0 = far and dark) into the green channel of the same byteArray.
These two rendering steps (1a/1b) occur concurrently. Every pixel write into the blue channel is accompanied by a lightmap write into green channel - 1 byte away, right next door.
2. Like last time, write into a bitmapData using setPixels. This copies both channels concurrently - so we now have some funky blue/green image (note the byteArray should have it's alpha channel seeded with 255's when it is created - otherwise the alpha-multiplication semantics built into bitmapData will zero out your precious green/blue data as soon as your write it in).
3. Now a paletteMap operation takes place:
Use the blueArray argument to assign the red/green/blue channels of the output image (this is exactly what we did before, except we were using the alpha channel as the source). The greenArray should be used to generate a new alpha channel. Each 32 bit entry of the greenArray has 8-bits of alpha data following by 24 bits of zeros - so it doesn't modulate color, just alpha. Note this can be all be done in one call! The redArray and alphaArray arguments are zero.

And finally, some flash-specific cleverness..
4. Place the finished image on top of a black backdrop (setting the stage default color to black will do just fine). Our completed image is partially transparent, so some pixels are see-through to the black behind them, and get mixed together with it. Closeup pixels are nearly opaque so they don't get much black mixed into them and appear brighter. Further pixels are almost completely transparent and are heavily fogged/blacked. Voila! - distance based lighting model.

For a sample, here's a downloadable (zipped) .swf that demos these features and the current state of the game. Some artwork has been pulled to keep the filesize low. There are also some other features added in here and there:

* Better clipping of things against map geometry (last submission was pretty wonky although you might not have noticed)
* Crude particle system for wall and player shrapnel. Sort of like the blood droplets from doom - but the target audience for this game is not into blood so... it's shrapnel! Yeah, that's the ticket. shrapnel.
* Beam weapons (lasers) - work like dooms hitscan weapons but they leave a tracer for 1 frame.
* Sector based lighting - similar to dooms lighting abilities.
* Player avatar artwork- finally some stuff that wasn't just jacked from other games! (That's why's it's uglies)

Walk around with arrow keys, + shoot with spacebar. (you can walk the big green dude around with ASDF+G - but his guns don't pop out of the right spots yet).

Feedback welcome! A lot of gameplay is still up in the air, but I am shooting (har) for a game that plays mostly like 4-player coop doom (run, shoot enemies, open doors, ride elevators, snatch repair item) but has some additional sim-like elements (power management between weapons/shields/engines like X-Wing, damage taken affects mech performance like Privateer 1, and a garage or upgrade system where you can repair, purchase new weapons, or mech chassis between maps).

09-05-2007, 02:51 PM
ok my first feedback,-
canīt you upload those swfīs,- its somewhat irritating to first fownload the zip, unzip it and then drag the swf in the browser.

nice update, though still to heavy for my old computer. Wouldnīt it be nice to en/disable some viewports or setting the amount of viewports?
If the map size and the parsing time are within acceptable values this engine could have lots of potential a series of games either single player or multiplayer via a server.
Hr hr wich makes me think of that recent anouncement of id software that they were to release a quake multiplayer game for the browser. We already saw that dome clone,- but what about a duke3d, blood or hexen inspired game :D

09-06-2007, 09:52 PM
Here's a single viewport version for the budget minded. :)
It's not quite identical, all the things are cloaked like doom's spectre monster. So you won't be able to enjoy the animations very much, they're just shimmering blobs.

I can't get .swf's to upload in the attachment dialog... says it's an invalid file type. (Something I am doing wrong?)

EDIT: heh 1337 kb. And what's up with 5k+ views on the other? That can't be right.