Trying to design some object pooling

Good tests and good posts 002.

I can't help but still feel pooling is more than worth the effort. In a real world example where you'd use pooling ( Let's say a game with a lot of bullets and particles ) you may not get the maximum benefit straight away, but after a certain point the game will max out in complexity ( ie the number of objects needed ), so no more will need to be created; they'll just be popped off an array and given their init data to start them off.

Sticking with it being more real world, let's take a bullet as an example. At its most basic it needs an image to plot ( Either a sprite or bitmap data to blit, let's stick with blitting ), some coords to say where it starts, speed / direction, a reference to the playfield bitmap to blit it to and methods to update its position, dispose of it, possibly have collision checks in there etc.

If you do it as a complete re-instantiation each time, you'd pass the init details to the constructor and that would then set everything up.

With pooling, you'd have the constructor which would just have one-shot info passed to it, such as a reference to the playfield, and then a separate init method to give it its speed / direction / start point etc.
But in the constructor you'd be able to set up any speed-up constants you may need, eg shorter references to Math.sin, Math.cos, PI etc. Anything that only needs setting once ( Even to the point of creating the copyPixels plotting point, eg pos=new Point(), and any rectangle objects that are needed; then in the init method you can just set the individual properties of these ).

With this approach, it would take in effect two method calls ( The constructor when you first create an instance, and the init() method ) with data being passed to both, but like I said above the game will soon have enough objects not to need any more, so it will only be the init method being called with less data being passed to it to set things up.
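To make the constructor / init() split concrete, here is a minimal sketch. It is written in TypeScript rather than AS3 so it can be run outside the Flash player, and the names Playfield, PooledBullet and BulletPool are made up for illustration; the actual blitting is left out.

```typescript
// Hypothetical names throughout: Playfield, PooledBullet and BulletPool are
// illustrative only and do not come from any post in this thread.
interface Playfield {
  width: number;
  height: number;
}

class PooledBullet {
  // One-shot state: created once in the constructor, then only mutated.
  readonly pos = { x: 0, y: 0 };
  private readonly playfield: Playfield;
  private vx = 0;
  private vy = 0;
  alive = false;

  constructor(playfield: Playfield) {
    this.playfield = playfield;
  }

  // Per-shot state: called every time the bullet leaves the pool.
  init(x: number, y: number, speed: number, angle: number): void {
    this.pos.x = x;
    this.pos.y = y;
    this.vx = Math.cos(angle) * speed;
    this.vy = Math.sin(angle) * speed;
    this.alive = true;
  }

  // Move one step and mark the bullet dead once it leaves the playfield.
  update(): void {
    this.pos.x += this.vx;
    this.pos.y += this.vy;
    const f = this.playfield;
    if (this.pos.x < 0 || this.pos.y < 0 || this.pos.x >= f.width || this.pos.y >= f.height) {
      this.alive = false;
    }
  }
}

class BulletPool {
  private readonly free: PooledBullet[] = [];
  private readonly playfield: Playfield;
  created = 0; // how many times the constructor has actually run

  constructor(playfield: Playfield) {
    this.playfield = playfield;
  }

  // Pop a spare bullet if there is one, otherwise construct a new one.
  spawn(x: number, y: number, speed: number, angle: number): PooledBullet {
    let bullet = this.free.pop();
    if (bullet === undefined) {
      bullet = new PooledBullet(this.playfield);
      this.created++;
    }
    bullet.init(x, y, speed, angle);
    return bullet;
  }

  // Hand a dead bullet back so the next spawn() can reuse it.
  release(bullet: PooledBullet): void {
    this.free.push(bullet);
  }
}
```

Once the pool has warmed up, spawn() only ever ends up calling init(), which is the saving described above.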

Again, only relatively small savings, but it all adds up, and in terms of code it's really next to nothing to add.

Squize.

Hmmm yea you are probably right Squize...now that I think about it I do actually instantiate a lot more than just a simple "projectile" object.

I actually create two classes. The projectile which extends "MovingEntity"..and inside that I create several "Vector" classes as well as several references etc etc...

Then I create a "BitmapClipRenderer" which uses my BitmapClip class and sets up some bitmap and bitmapData objects to render that to the main screen...so all in all quite a few things are being created just for a projectile...

If I did pooling, all I'd be doing is sending something like a "params" object which, like you said, holds data about the target, speed, etc... and then the BitmapClip to "copy", which wouldn't create a new BitmapClip but rather would just swap its array reference to the input bitmapData array...(whatever)

Anyways with the number of projectiles and particles I am creating like this it probably would make a bit of a difference actually.

I messed around with pooling while implementing doom-a-like, because I was creating a lot of data structures that were built, used and discarded all in the same frame. They were POD types for describing spans of pixels in a bitmap, very much flyweights. I was never impressed with pooling.

I didn't take down any numbers to back it up, but pooling did not seem to have any impact on the time spent on instantiation. The only reasons I would take a stab at it again would be (i) if I really wanted to avoid invoking the GC on such a regular basis, or (ii) if I was working with a very strict memory footprint (mobile device?). None of the games I have made since do any pooling, they just gobble the heap and exploit the GC as god intended. [Commandment #11 ;) ]. I haven't looked back, and at this point I don't think the architectural adjustments (however minor they may be) are worth the effort.

Quake-a-like has data structures with similar single-frame lifetimes, uses no pooling, and outperforms doom-a-like by lightyears. I know it's comparing apples and oranges because they work so differently under the hood, but my opinion is that pooling will never be an optimization that has a measurable impact on your final frame rate, unless your example is contrived to do so.

I tried to think of a good way to make a pooling class that was strongly typed / templated like the C++ STL but never came up with it. I like newblack's solution, so maybe it's worth giving another shot if I ever find myself in situation (i) or (ii).
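For what it's worth, a strongly typed pool is easy to express in a language with generics. The sketch below is TypeScript, not AS3 (which has no user-defined generic classes), and TypedPool and PixelSpan are hypothetical names. It is not newblack's solution, just one possible shape for the idea.

```typescript
// Hypothetical sketch: TypedPool and PixelSpan are illustrative names only.
class TypedPool<T> {
  private readonly free: T[] = [];
  private readonly create: () => T;
  private readonly reset: (item: T) => void;
  allocations = 0; // how many objects were really constructed

  constructor(create: () => T, reset: (item: T) => void) {
    this.create = create;
    this.reset = reset;
  }

  // Reuse a released object if one is available, otherwise build a new one.
  acquire(): T {
    if (this.free.length > 0) {
      return this.free.pop() as T;
    }
    this.allocations++;
    return this.create();
  }

  // Clear the object and keep it for the next acquire().
  release(item: T): void {
    this.reset(item);
    this.free.push(item);
  }
}

// A plain-data span of pixels, like the single-frame structures described above.
interface PixelSpan {
  row: number;
  start: number;
  end: number;
}

const spanPool = new TypedPool<PixelSpan>(
  () => ({ row: 0, start: 0, end: 0 }),
  (span) => {
    span.row = 0;
    span.start = 0;
    span.end = 0;
  }
);
```

The compiler then rejects releasing anything that is not a PixelSpan into spanPool, which is the type safety a templated pool would give you.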

Pooling isn't always the best solution, but for certain things ( Such as a lot of on-screen objects like in a shoot'em up ) I think it's better to do it than not.

Squize.

AGHH!! so much conflicting advice! lol

It seems to remain true that pooling is faster, even if only slightly. With the GC being a possible factor in any object or bitmap heavy game, pooling has advantages.

Unless someone posts an example that shows pooling to be slower, I still don't see a valid argument against it.

Well, I have been meaning to do some performance testing in Flash for a while now, and this seems to be the time.

As far as I see it, there are three major variations that can be tested here:
Naive - Creating new instances
Pooling - Reusing the same set of instances
Modified Flyweight - Feeding in primitive data
This time I will use something more substantial as a test. I will be using my IsoTest which extends IsoSprite.

I believe this is a pretty generic representation of a common game object, it has a set of coordinates, draws itself and has an update function among others.

It is by no means a heavyweight, but it is far larger than a point/rectangle.

The code for these two classes follows:

Code:

// File: Display/IsoSprite.as
package Display {
    import flash.display.BitmapData;
    import flash.geom.Point;

    /**
     * Base class for the update/render architecture.
     * IsoSprite contains the isometric coordinates for all drawn objects.
     *
     * Depending on desired behavior, you may simply extend this class and
     * add instances of it to the DefaultController, or create your own
     * controller/factory system.
     *
     * @author Ryan Turner
     */
    public class IsoSprite {
        //==========================
        // Coordinates =============
        //==========================

        // Coordinates
        public var x:Number;
        public var y:Number;
        public var z:Number;

        // Invalidation marks the sprite for removal
        public var valid:Boolean;

        // Marks the sprite as having its position changed
        // Will not be depth sorted if false
        public var sort:Boolean;

        //==========================================
        // Constructor =============================
        //==========================================

        /**
         * Creates an IsoSprite
         * @param _x
         * @param _y
         * @param _z
         */
        public function IsoSprite(_x:Number, _y:Number, _z:Number) {
            x = _x;
            y = _y;
            z = _z;
            valid = true;
            sort = false;
        }

        //==========================================
        // Override ================================
        //==========================================

        /**
         * Draw this sprite at point p on bitmap data bd.
         * @param p
         * @param bd
         */
        public function draw(p:Point, bd:BitmapData):void {
        }

        /**
         * Update the sprite
         * @param t Time, in ms since the last frame
         */
        public function update(t:Number):void {
        }

        /**
         * Clean up any assets used by the sprite
         */
        public function dispose():void {
            valid = false;
        }

        //==========================================
        // Compare =================================
        //==========================================

        /**
         * Returns -1 if a should be in front of b.
         * @param a
         * @param b
         * @return
         */
        public static function comp(a:IsoSprite, b:IsoSprite):Number {
            if (a.x > b.x) return 1;
            if (a.x < b.x) return -1;
            if (a.y > b.y) return 1;
            if (a.y < b.y) return -1;
            if (a.z > b.z) return 1;
            if (a.z < b.z) return -1;
            return 0;
        }
    }
}

// File: Display/Sprite/IsoTest.as
package Display.Sprite {
    import Display.IsoSprite;
    import flash.display.BitmapData;
    import flash.geom.Rectangle;
    import flash.geom.Point;

    public class IsoTest extends IsoSprite {
        private var sheet:BitmapData;
        private var r:Rectangle;

        public function IsoTest(_x:Number, _y:Number, _z:Number, _sheet:BitmapData, _rectangle:Rectangle) {
            super(_x, _y, _z);
            sheet = _sheet;
            r = _rectangle;
        }

        // Blit the rectangle r of the sprite sheet onto bd at point p
        public override function draw(p:Point, bd:BitmapData):void {
            bd.copyPixels(sheet, r, p);
        }
    }
}

The basic premise of the test will be similar:
We want an efficient method of drawing short lived sprites that contain simple behavior. In detail:
Object life will be 20-60 frames
Objects keep track of position
Objects draw themselves
Objects have an update function

There is not much to be said about the first test, where new instances are created each time. It will form the baseline as the worst possible method.

The object pool test will be harder to simulate in a for loop as there will be only one instance in use at a time. I do not believe this will impact performance in any way though, as long as I fill the pool with ~400 objects to begin with.
I will be simulating newblack's sample code, which performs a hashtable lookup and a push/pop for each element. A faster alternative would be a circular array; however, I don't feel like coding one at the moment and the difference should be negligible.
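For reference, the circular array alternative could look something like the following. This is a hedged sketch in TypeScript (not AS3), with RingPool as a hypothetical name. It assumes the pool holds more objects than are ever alive at once, so the oldest slot is always free to recycle, and it makes no claim about how much faster it would be.

```typescript
// Hypothetical sketch: RingPool is an illustrative name, not code from this thread.
class RingPool<T> {
  private readonly items: T[] = [];
  private next = 0; // index of the slot handed out by the next acquire()

  constructor(capacity: number, create: () => T) {
    if (capacity < 1) {
      throw new Error("capacity must be at least 1");
    }
    // Every object is constructed up front, before any timing-critical code.
    for (let i = 0; i < capacity; i++) {
      this.items.push(create());
    }
  }

  // Hand out the oldest slot and advance, wrapping around at the end.
  // There is no push/pop and no release(): slots are simply overwritten.
  acquire(): T {
    const item = this.items[this.next] as T;
    this.next = (this.next + 1) % this.items.length;
    return item;
  }
}
```

With objects living 20 to 60 frames, a capacity comfortably above the peak number of live objects (such as the ~400 mentioned above) keeps a slot from being recycled while it is still in use.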

The flyweight pattern will involve a single class that represents every IsoSprite. Essentially, make every member of IsoSprite an array, and loop through these primitives. The code should make it clear.
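To give a rough idea of what "make every member an array" means before the test code, here is a small sketch. It is TypeScript rather than AS3, IsoSpriteSet and its methods are hypothetical names, and the swap-with-last removal is my own choice for the example rather than anything the tests below do.

```typescript
// Hypothetical sketch: IsoSpriteSet and its members are illustrative names.
class IsoSpriteSet {
  // One array per former IsoSprite member; index i is "sprite i".
  private readonly xs: number[] = [];
  private readonly ys: number[] = [];
  private readonly zs: number[] = [];
  private readonly life: number[] = []; // frames left to live

  add(x: number, y: number, z: number, frames: number): void {
    this.xs.push(x);
    this.ys.push(y);
    this.zs.push(z);
    this.life.push(frames);
  }

  size(): number {
    return this.life.length;
  }

  // Age every sprite in one loop; expired sprites are removed in place.
  update(): void {
    let i = 0;
    while (i < this.life.length) {
      this.life[i] -= 1;
      if (this.life[i] <= 0) {
        this.removeAt(i); // the last sprite now sits at i, so do not advance
      } else {
        i++;
      }
    }
  }

  // Visit every live sprite; a real renderer would blit inside plot().
  draw(plot: (x: number, y: number, z: number) => void): void {
    for (let i = 0; i < this.life.length; i++) {
      plot(this.xs[i], this.ys[i], this.zs[i]);
    }
  }

  // Overwrite slot i with the last sprite, then shrink every array by one.
  private removeAt(i: number): void {
    const last = this.life.length - 1;
    this.xs[i] = this.xs[last];
    this.ys[i] = this.ys[last];
    this.zs[i] = this.zs[last];
    this.life[i] = this.life[last];
    this.xs.pop();
    this.ys.pop();
    this.zs.pop();
    this.life.pop();
  }
}
```

No per-sprite object is ever constructed, and update/draw become plain loops over numbers instead of one method call per sprite.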

Tests:
I don't feel like doing all three in a single file, so first:

Creating new instances
Quite straightforward, 50,000 iterations of creating an IsoTest, drawing it x number of times and then repeating.

Code:

import flash.display.BitmapData;
import flash.geom.Point;
import flash.geom.Rectangle;
import flash.utils.getTimer;
import Display.Sprite.IsoTest;

var display:BitmapData = new BitmapData(800, 600);
var copy:BitmapData = new BitmapData(400, 400);
var area:Rectangle = new Rectangle(0, 0, 40, 40);
var p:Point = new Point(0, 0);
var it:IsoTest;

var n:Number = getTimer();
for (var i:int = 0; i < 50000; i++) {
    // Instantiate a new IsoTest
    it = new IsoTest(0, 0, 0, copy, area);
    // Perform 20/60 draws (Varies by test)
    for (var j:int = 0; j < 60; j++) {
        it.draw(p, display);
    }
}
trace(getTimer() - n);

Results:
Lifespan of 20 frames: 1970ms
Lifespan of 60 frames: 5816ms
Pretty straightforward: tripling the lifespan roughly triples the time, so the cost scales almost linearly with the number of draws.

Object Pool
I don't route my method calls through another object; I manually simulate an object pool in the for loop. I don't think this will cause any major difference.

The change from the previous test is that initially (and before the timer starts) I populate an array with 400 objects. From there, instead of instantiating, I perform a hashtable lookup, pop off an element, reset its members and push it back on at the end.

Code:

import flash.display.BitmapData;
import flash.geom.Point;
import flash.geom.Rectangle;
import flash.utils.Dictionary;
import flash.utils.getTimer;
import Display.Sprite.IsoTest;

var display:BitmapData = new BitmapData(800, 600);
var copy:BitmapData = new BitmapData(400, 400);
var area:Rectangle = new Rectangle(0, 0, 40, 40);
var p:Point = new Point(0, 0);
var it:IsoTest;
var aPool:Array = new Array();
var dict:Dictionary = new Dictionary();

// Fill the pool before the timer starts
for (var k:int = 0; k < 400; k++) {
    aPool.push(new IsoTest(0, 0, 0, copy, area));
}

var n:Number = getTimer();
for (var i:int = 0; i < 50000; i++) {
    // Hashtable lookup (simulated; the result is not used)
    dict[IsoTest];
    // Pool
    it = aPool.pop();
    it.x = 0;
    it.y = 0;
    it.z = 0;
    // Note that I do not reset its bitmapData or drawing area
    // Perform 20/60 draws (Varies by test)
    for (var j:int = 0; j < 60; j++) {
        it.draw(p, display);
    }
    // Stick it back on
    aPool.push(it);
}
trace(getTimer() - n);

Results:
Lifespan of 20 frames: 1940ms
Lifespan of 60 frames: 5781ms
Very small differences: roughly 30ms was saved in each test (30ms at 20 frames, 35ms at 60 frames).

Modified Flyweight
This is probably the least "OOPish" design. I don't find it that big of an issue though since it is very encapsulated and hidden behind an object. Like with the pool, I didn't feel it necessary to create a complete implementation, only simulate what is being done.

Of particular interest is that a flyweight is essentially three separate pools, but as these pools only act on numbers they are still faster than a single pool of objects.

Code:

import flash.display.BitmapData;
import flash.geom.Point;
import flash.geom.Rectangle;
import flash.utils.getTimer;
import Display.Sprite.IsoTest;

var display:BitmapData = new BitmapData(800, 600);
var copy:BitmapData = new BitmapData(400, 400);
var area:Rectangle = new Rectangle(0, 0, 40, 40);
var p:Point = new Point(0, 0);

// Arrays of primitives
var ax:Array = new Array();
var ay:Array = new Array();
var az:Array = new Array();

// Fill em up, to 400 like with the pool
for (var k:int = 0; k < 400; k++) {
    ax.push(0);
    ay.push(0);
    az.push(0);
}

// Note that a flyweight is basically a pool, but with primitives instead of objects.
var n:Number = getTimer();
for (var i:int = 0; i < 50000; i++) {
    ax.push(0);
    ay.push(0);
    az.push(0);
    // Perform 20/60 draws (Varies by test)
    for (var j:int = 0; j < 60; j++) {
        display.copyPixels(copy, area, p);
    }
    ax.pop();
    ay.pop();
    az.pop();
}
trace(getTimer() - n);

Results:
Lifespan of 20 frames: 1838ms
Lifespan of 60 frames: 5450ms
This provides approximately a 130ms gain over 20 frames, and a 360ms gain over 60 frames. Well over 4 times the speed increase gained by a pool.
I might as well mention why there is this gain. In a 60 frame life object, you have 1 call to the constructor, but 120 calls to the update/render methods. A flyweight inlines these calls into a loop. Sure, you have a few more arrays, but you also remove the overhead of 120 method calls.

Also, I forgot to call the update() method in these tests. Since it's empty it wouldn't really change much, and I didn't feel like redoing all the tests. Calling update() would have increased the speed gain of the flyweight by a moderate amount as well (ie it would handicap the other two tests with 50,000*60 extra method calls).

Conclusion
Yes, a pool will gain you, optimistically, 3% speed (and this test showed only 1.5% at best), but frankly that's not worth it, and a flyweight will provide closer to 7%.

What I really want to make the most clear here is that performance on this scale might as well not exist. There is a reason big O notation ignores constant factors, even ones worth hundreds of percent. It does not matter. In the grand scheme of things, this entire section of your engine may take 3% of your processing time, rendering even a 10% gain completely worthless.
Things like switching from an exponential algorithm to a logarithmic one will help; squeaking fractions of a percent out by complicating your architecture won't.

Note that using an object pool will increase your framerate from 30fps to 30.0135fps. (Assuming that 3% of execution time is spent dealing with instantiation, a heavy guess).
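For anyone wondering where 30.0135 comes from: it appears to follow from multiplying the assumed 3% share of frame time by the 1.5% saving measured in the 20-frame pool test. A quick worked version, where f and s are just labels for those two numbers:

```latex
\begin{align*}
f &= 0.03 && \text{share of frame time spent on instantiation (assumed)} \\
s &= 0.015 && \text{saving from pooling (20-frame test: } 30/1970 \text{)} \\
\text{fps}' &= \frac{30}{1 - f s} = \frac{30}{1 - 0.00045} \approx 30.0135
\end{align*}
```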

What this test does show is something that should make a lot of sense: one call to the constructor is nothing compared to the hundreds of method calls that an object will receive over its life. If you want speed, inline those functions. Don't bother optimizing something that is only called once when there is something else getting 120 calls right beside it.

Don't get me wrong, there are places for pools, but lightweight objects aren't one of them.

The only thing I will mention is garbage collection. Object pools and flyweights both have the advantage of not creating tonnes of orphaned objects. I have not experienced it myself, but orphaning hundreds of objects a frame may cause excessive use of the garbage collector.

Whew...