VR Headtracking

I've been trying a few methods to do headtracking with a webcam, and I'm more and more convinced it's possible in Flash with items you have at home.

Attached is the beginning of my experiments, using the same kind of on-screen demo environment shown in the YouTube videos for WiiFlash and PS3Eye. It doesn't have the same amount of freedom yet: there is no rotation around the y-axis, and rotation around the x-axis is automatically kept pointed at the horizon. A single light source develops z-depth issues when the detection area fluctuates, but that can be smoothed out with multi-sampling. Once I mount a couple of lights on a hat, I'm sure the increased width will be easier to work with.
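As a rough illustration of the multi-sampling idea, the detected width could be averaged over the last few frames before it is turned into a depth. This is a minimal sketch in Python rather than the ActionScript used in this thread; the class name and window size are made up for the example.

```python
from collections import deque


class WidthSmoother:
    """Moving average over the last `window` bounding-box widths."""

    def __init__(self, window=5):
        # A bounded deque drops the oldest sample automatically.
        self.samples = deque(maxlen=window)

    def add(self, width):
        """Record the newest width and return the current average."""
        self.samples.append(width)
        return sum(self.samples) / len(self.samples)


smoother = WidthSmoother(window=3)
for raw_width in [10, 12, 30, 11]:  # 30 is a one-frame glitch
    smoothed = smoother.add(raw_width)
```

An average only dampens a one-frame spike; taking the median of the same window would reject it entirely, at the cost of a little more lag.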

I think for full freedom (or close to it), I'll need a third light source either at the brim or the top of the hat. This will allow z-depth determination by height, and y-axis rotation by width and the distance between the points. I don't know how to do rotation with two points without constraining right->left and left->right (i.e. you couldn't look left if you are standing on the left), which is fine for windowing but isn't full freedom.
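The three-light idea can be sketched with a simple pinhole-camera model. Everything here is an assumption for illustration: the focal length in pixels and the real spacing of the lights are taken as known calibration values, the head is assumed not to pitch or roll, and the function names are hypothetical. The code is Python, not the ActionScript from the post.

```python
import math


def depth_from_height(height_px, real_height, focal_px):
    """Pinhole model: a vertical offset of real_height at depth z spans
    focal_px * real_height / z pixels, and yaw does not change it."""
    return focal_px * real_height / height_px


def yaw_from_width(width_px, real_width, focal_px, z):
    """Horizontal spacing shrinks by cos(yaw); returns |yaw| in radians."""
    c = width_px * z / (focal_px * real_width)
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against pixel noise


# Hypothetical numbers: two lights 20 cm apart, a third light 10 cm
# above them, and a focal length of 500 px.
z = depth_from_height(height_px=50, real_height=10, focal_px=500)
yaw = yaw_from_width(width_px=50, real_width=20, focal_px=500, z=z)
```

With only two lights, a narrower spacing could mean either "farther away" or "turned", which is the constraint described above. The width alone still gives only the magnitude of the yaw, so something else (for example which way a brim-mounted light shifts relative to the other two) would have to supply the sign.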

Any ideas on better methods?

Here's some ugly code I'm working on:

ActionScript Code:

// ActionScript 2 timeline code.
// Expects a Video object with the instance name webcam_video on the stage.
import flash.display.BitmapData;
import flash.geom.Rectangle;
import flash.geom.Point;
import flash.geom.Matrix;

// set up rendering variables
var Ox = Stage.width / 2;
var Oy = Stage.height / 2;
var focalLength = 100;
var cam = new Object();
cam.x = 0;
cam.y = 0;
cam.z = 100;

// light source width multiplier
// for a single point of light try values between 5 and 20
// for two points of light, or a light bar, try values between 0.1 and 5
var scaler = 10;

// attach the webcam video to a video object
my_cam = Camera.get();
webcam_video.attachVideo(my_cam);

// create a mirror image of the webcam video, scaled 2* for help with precision later
createEmptyMovieClip('holder', getNextHighestDepth());
now = new BitmapData(webcam_video._width * 2, webcam_video._height * 2);
holder.attachBitmap(now, holder.getNextHighestDepth());
with (holder) {
    _x = webcam_video._width;
    _y = webcam_video._height + 10;
    _xscale = -50;
    _yscale = 50;
}

// create a box to place around the isolated light source for debugging
createEmptyMovieClip('box', getNextHighestDepth());
with (box) {
    lineStyle(1, 0xFFFFFF);
    lineTo(100, 0);
    lineTo(100, 100);
    lineTo(0, 100);
    lineTo(0, 0);
}

// create some target objects
createEmptyMovieClip('targets', getNextHighestDepth());
MovieClip.prototype.makeTargets = function(targetObj) {
    for (var n = 0; n < targetObj.names.length; n++) {
        this.createEmptyMovieClip(targetObj.names[n], 100 + n);
        var t = this[targetObj.names[n]];
        // concentric rings, drawn as very thick one-pixel-long lines
        with (t) {
            lineStyle(100, targetObj.colors[n]);
            lineTo(1, 0);
            lineStyle(50, 0xffffff);
            moveTo(0, 0);
            lineTo(1, 0);
            lineStyle(25, targetObj.colors[n]);
            moveTo(0, 0);
            lineTo(1, 0);
            lineStyle(12, 0xffffff);
            moveTo(0, 0);
            lineTo(1, 0);
        }
        // 3d position of the target, read later by adjust3d
        t.x = targetObj.x[n];
        t.y = targetObj.y[n];
        t.z = targetObj.z[n];
    }
};
targetsObject = new Object();
targetsObject.names = new Array('target1', 'target2', 'target3', 'target4', 'target5', 'target6');
targetsObject.x = new Array(0, 200, -300, -100, -300, 300);
targetsObject.y = new Array(0, 200, -100, -100, -300, -300);
targetsObject.z = new Array(100, 200, 300, 500, 700, 700);
// numeric color values (the original passed them as strings)
targetsObject.colors = new Array(0xff0000, 0x0000ff, 0x00b000, 0xff00ff, 0xffff00, 0xff6666);
targets.makeTargets(targetsObject);

// work the magic
this.onEnterFrame = function() {
    // draw the current webcam image to a bitmapdata object
    // scaling up will help with accuracy a bit
    var matrix = new Matrix();
    matrix.scale(2, 2);
    now.draw(webcam_video, matrix);

    // eliminate all but the brightest colors
    now.threshold(now, now.rectangle, new Point(0, 0), '<=', 0xFF666666, 0xFF000000, 0xFF0000FF, false);

    // find the bounding box of the brightest color
    var redBox = now.getColorBoundsRect(0x00FF0000, 0x00FFFFFF, false);

    // TO-DO -- use multiple lights, subdivide the bounding box and repeat to isolate each light

    // if a light source is detected, track it
    if (redBox.width > 0) {
        // align and resize box indicator for debugging
        box._x = redBox.x / 2 - redBox.width / 4 + webcam_video._x;
        box._y = redBox.y / 2 - redBox.height / 4 + webcam_video._y;
        box._width = redBox.width / 2;
        box._height = redBox.height / 2;

        // set z based on bounding box width and scaler value
        cam.z = redBox.width * scaler;

        // determine the head position in 3d space (remember that holder has been scaled to 2*)
        var ratio = focalLength / (focalLength + cam.z);
        cam.x = (holder._width / 2 - redBox.x) * ratio * 4;
        cam.y = (redBox.y - holder._height / 2) * ratio * 4;
        // keeps the camera pointed at horizon (not read by the render functions below)
        cam.rotY = 1 - (redBox.y / holder._height);

        // TO-DO -- decide which freedoms are important, rotation vs position in x/y planes
        // perhaps 3 points of light can be used to triangulate all freedoms using x,y,width,height ratios

        // render targets and perspective lines
        for (var i = 0; i < targetsObject.names.length; i++) {
            adjust3d(targets[targetsObject.names[i]], cam);
        }
        renderLines(cam);
    }
};

// 3d perspective rendering
function adjust3d(obj, cam) {
    // avoid a zero depth difference
    var TminusC = (obj.z - cam.z == 0) ? 1 : obj.z - cam.z;
    // avoid dividing by zero
    var ratio = (focalLength + TminusC == 0) ? 0.00000001 : focalLength / (focalLength + TminusC);
    obj._x = Ox + (obj.x - cam.x) * ratio;
    obj._y = Oy + (obj.y - cam.y) * ratio;
    obj._xscale = obj._yscale = ratio * 100;
    // hide targets that are behind the camera
    obj._visible = !(obj.z < cam.z - focalLength);
    obj.swapDepths(Math.round(10000 - obj.z - cam.z));
}

// project a 3d point to screen coordinates relative to the camera
function project(x, y, z, cam) {
    var ratio = focalLength / (focalLength + z - cam.z);
    return {x: Ox + (x - cam.x) * ratio, y: Oy + (y - cam.y) * ratio};
}

// draw one white perspective line on _root between two 3d points
function line3d(x1, y1, z1, x2, y2, z2, cam) {
    var s = project(x1, y1, z1, cam);
    var e = project(x2, y2, z2, cam);
    _root.lineStyle(1, 0xffffff);
    _root.moveTo(s.x, s.y);
    _root.lineTo(e.x, e.y);
}

// render perspective lines
function renderLines(cam) {
    var w = Stage.width;
    var h = Stage.height;
    var near = 200;
    var far = 100000;
    _root.clear();
    // lines running into the distance along the top and bottom of the room
    for (var x = -w; x <= w; x += 200) {
        line3d(x, -h, near, x, -h, far, cam);
        line3d(x, h, near, x, h, far, cam);
    }
    // lines running into the distance along the left and right walls
    for (var y = -h; y <= h; y += 200) {
        line3d(-w, y, near, -w, y, far, cam);
        line3d(w, y, near, w, y, far, cam);
    }
    // rectangular frames at increasing depth
    for (var z = near; z <= far; z *= 1.5) {
        line3d(w, h, z, w, -h, z, cam);   // right edge
        line3d(-w, h, z, -w, -h, z, cam); // left edge
        line3d(w, h, z, -w, h, z, cam);   // bottom edge
        line3d(w, -h, z, -w, -h, z, cam); // top edge
    }
}

stop();

Note: for this to work, you need either a dark room or a piece of film over the webcam lens to filter out all light other than your light source.

Nice to see you're asking for help rather than just linking to a vid =-)

Unfortunately I've never worked with a webcam, so I don't think I'll be of much help.

"Note: for this to work, you need either a dark room or a piece of film over the webcam lens to filter out all light other than your light source."

That's why I'm hesitant about this webcam stuff; it seems that to get something working you need to be in really specific conditions. Would it work better if you chose a color that is not seen much in the real world or in an office (magenta?) and used that for the points?

You would have to go to an arts and crafts store, buy those little sticky dots and slap them on your head. Then isolate that color in the bitmap. It might work better, and in any light.

A piece of film from some bad negatives taped to your webcam lens is all you need for daytime; that's not too difficult (OK, if you're of the digital age, ask your mom/grandma). Turning off the lights after dark isn't too hard either, and I've used everything from a lighter, to my mouse, to my cellphone as the targeting light source.

I've done some testing using specific color values, but the issue is with reflected light. Even if magenta doesn't exist in the real scene, it does show up in reflected light on a webcam, since the true colors are resolved to a limited subset of the spectrum (256 levels per channel). A single magenta-colored point of light away from your targeting dot will make the getColorBoundsRect result worthless.

I was really interested in this post and did a quick search to see if there were any resources on how tracking was accomplished with Sony's EyeToy for PlayStation, and I stumbled upon this interesting video.

http://www.gametrailers.com/player/u...es/169289.html

In the video the guy recreates that Wii VR tracking demo on the PlayStation 3 using the PS3Eye camera. He wears a pair of glasses with infrared lights mounted on them, and the camera picks these up. The camera uses film to filter out everything except the infrared light, similar to how Jerry described.

I was wondering if you tried this, Jerry. I'm guessing you've already seen it since you mentioned the film and PS3Eye. The conditions may be a bit specific, but if this is something you can set up for less than $50 it's definitely worth investigating, if only as an artistic experiment.

I'm starting to think I should invest in a webcam just for new interactive development.

Yes, I'm inspired by both the WiiFlash and PS3Eye demonstrations. I checked out the source, but it's dependent upon libraries I don't have. One of the source files for the PS3Eye demo uses lookups for a lot of values, which helps with speed I'm sure.

You can buy a cheap webcam for $20, or a used one for $10. I have a cheap one that is over 6 years old, and it works fine for this. A piece of scrap negative can't even be figured into the cost; you could get it for free at any photo shop if you don't have some old photo envelopes lying around. So for $10, you could be set up.

Being able to move around objects is so immersive it shouldn't be overlooked as a fun (dare I say it after a recent thread here, even addicting) factor for games!

You could do some clever OCR to get the colourbounds thing working.

One or two odd pixels here and there are bound to be discernible from a small area of more solid colour.

You could also do some pre-calibration to get it working, so if you're sitting in front of a magenta wall, it tells you to move your ass somewhere else or use a different colour.
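One simple way to tell odd pixels from a solid area is to drop matching pixels that have too few matching neighbours before taking the bounding box. The sketch below is Python (not ActionScript) and works on a plain 0/1 mask; the function names and the neighbour count are assumptions for illustration only.

```python
def bounds(mask):
    """Bounding box (x, y, width, height) of all set pixels, or None."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        return None
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


def drop_isolated(mask, min_neighbours=2):
    """Clear set pixels that have fewer than min_neighbours set neighbours."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            # Count set pixels in the 3x3 window, minus the pixel itself.
            n = sum(mask[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))) - 1
            if n >= min_neighbours:
                out[y][x] = 1
    return out


mask = [
    [0, 0, 0, 0, 0, 1],  # stray reflection in the corner
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
raw_box = bounds(mask)                   # stretched by the stray pixel
clean_box = bounds(drop_isolated(mask))  # covers only the solid dot
```

The stray pixel stretches the raw box across the whole row, while the filtered box hugs the 2x2 dot.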

I've been experimenting with face-recognition based tracking lately. Although it is currently too slow to be usable (1-3 fps), I do get x/y tracking with NO special equipment. Z shouldn't be too hard to get via scale comparison. Rotation is far more problematic for this method.

You can see a demo of the tracking here:
http://suckatmath.com/personal/faced...acedetect.html

I threw this together with Papervision3D yesterday (my first pv3d experiment) and got a very natural interface: you move your head, you move the camera.

If you'd like to see the code, I just open-sourced it. http://code.google.com/p/deface

Nice examples, both Jerry and 5tons :)
5tons, your example is pretty slow and can take several seconds to update if I move my head. Nevertheless it traces my head as it should! Good job :D

I would love to try some of this stuff some day, so thanks for sharing your progress!

Nice work 5tons! Thanks for posting your source as well!

As you said, it's too slow currently to be used, but there is potential there.

The tracking stuff was actually just a side goal of the main face-detection project, but now I'm finding it quite interesting. How's this for an idea:
Use face detection as a calibration step to determine which colors are "face", then use those colors to filter the pixels acquired through standard frame-differencing motion detection, or just in a threshold operation. A colorbounds should then give you a fairly good idea of where in the frame the user's face is, and from there you can use a little bit of edge detection for orientation calculation and location refinement.

It's a rough idea, but I think that it might be made to work.

Without looking at the source, 5tons, and going on what you've just said, I'm curious...

Does the whole face-tracking code re-execute from scratch every frame, or does it use the last position and previous frame information to get a better grip on where the face is?

It does use previous information to narrow the search window for the face (basically it scales the previously found rectangle up and down by 0.2 as the new limits), but each frame DOES require a fairly expensive preprocessing step to calculate what is called an integral image, on which the face classifiers work.
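For readers who haven't met the term, an integral image (summed-area table) stores at each position the sum of all pixels above and to the left, so the sum over any rectangle takes four lookups. The following is a generic Python sketch of that standard construction, not the code from the deface project.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum of everything above plus the running sum of this row.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x, y, w, h):
    """Sum of the w*h rectangle whose top-left pixel is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


img = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(img)
lower_right = box_sum(ii, 1, 1, 2, 2)  # 5 + 6 + 8 + 9
```

Building the table touches every pixel once per frame, which is the expensive step mentioned above; after that each rectangle feature costs the same regardless of its size.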

mr_malee:
Actually you could use IR LEDs instead of regular lights, so you could use it in regular lighting conditions.

Great Work jerryscript! ;)

Btw, I remember you mentioned this method to me when I was working on my OSR engine, but I found it is faster to:

- Store the image in a byte array (approx. 12 ms for a 320*240 image).
- Analyze the byte array against a lighting threshold and store the result in a binary array.
- Analyze the binary array instead.

It is a LOT faster to analyze an array in memory than to use getPixel or threshold, about 10 times faster actually (a lot more coding is required, though).
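The data flow of those three steps might look like the sketch below. It is written in Python purely to show the idea, so it says nothing about the Flash timings quoted above; the function names and the tiny 4x3 test frame are invented for the example.

```python
def threshold_to_binary(gray, threshold):
    """gray: bytes of 8-bit brightness values; returns a bytearray of 0/1."""
    return bytearray(1 if v > threshold else 0 for v in gray)


def bright_bounds(binary, width):
    """Bounding box (x, y, w, h) of the 1s in a row-major binary array."""
    xs, ys = [], []
    for i, v in enumerate(binary):
        if v:
            y, x = divmod(i, width)  # flat index -> row and column
            xs.append(x)
            ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


# A 4x3 frame stored row by row, one brightness byte per pixel.
frame = bytes([
    10, 20, 10, 10,
    10, 250, 240, 10,
    10, 10, 230, 10,
])
binary = threshold_to_binary(frame, 200)
box = bright_bounds(binary, width=4)
```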

Hmmm, 5tons' demo, and Mr_Malee and lesli_felix's post made me think...

1- click on several points on the face to determine a color average

2- click on the eyes and determine their color average

3 - use threshold and getColorBoundsRect to determine the face and eye screen positions

4 - use the spatial relation between the eyes and the face to determine orientation and positioning including depth

With this method, you increase your points of reference from 1 to 6 making positioning and orientation much easier to calculate, and more accurate. The farther the eyes are from the top of the face, the greater the cam.rotX. The farther from the bottom of the face, the lesser the cam.rotX. The farther from the left side, the greater the cam.rotY (converse for right side). The greater the size of the face and/or the distance between the eyes, the lesser the cam.z.
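Those relations can be written down directly. The sketch below is Python rather than ActionScript, and the neutral eye height, reference eye distance and reference depth are hypothetical calibration values. The outputs are unitless scores that grow in the directions described above, not true angles.

```python
import math


def head_pose(face, left_eye, right_eye,
              neutral_eye_height=0.4, ref_eye_dist=60.0, ref_z=100.0):
    """face: (x, y, width, height); eyes: (x, y) centres in the same image.

    Returns (rot_x, rot_y, z).
    """
    fx, fy, fw, fh = face
    mid_x = (left_eye[0] + right_eye[0]) / 2
    mid_y = (left_eye[1] + right_eye[1]) / 2
    # Farther from the top of the face -> greater rot_x.
    rot_x = (mid_y - fy) / fh - neutral_eye_height
    # Farther from the left side of the face -> greater rot_y.
    rot_y = (mid_x - fx) / fw - 0.5
    # Larger eye spacing -> smaller z.
    eye_dist = math.hypot(right_eye[0] - left_eye[0],
                          right_eye[1] - left_eye[1])
    z = ref_z * ref_eye_dist / eye_dist
    return rot_x, rot_y, z


face = (100, 50, 200, 250)
centred = head_pose(face, (160, 150), (240, 150))
turned = head_pose(face, (200, 150), (280, 150))
```

Using the face size instead of the eye spacing for z would work the same way; the eye spacing is used here only because it is a single number.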

hmmm, just thought of something else... If I end up posting an example using an image of my face, you are all going to regret it!

that sounds just crazy enough to work. Get a demo up so I can rotate stuff with me noggin :D

Just an update here. I've only been able to work on this in my spare time, but at least there is good progress, and no need for special lights or other equipment. Current status:

1- head recognition : buggy. I first attempted colorBounds reduction via motion detection, but this doesn't work well for those with long, full, voluptuous hair. I may have to resort to an initialization matrix via a histogram of a scaled-down bitmapData (down to 10px), but I really want to avoid histograms

2- facial orientation : buggy, but improving fast. Using a modified version of GSkinner's ColorMatrix class in combination with paletteMap, I can easily find the eye height, and with a bit of twiddling of contrast/brightness I can connect the dots between the eyes

3- noise reduction : not implemented yet. I'm experimenting with a couple of algorithms to try to make different test sets cancel each other's errors, and to make each test set ignore out-of-bounds data. For those interested in the algorithm I'm trying to adapt, you can read about it here : http://www.mii.lt/informatica/pdf/INFO537.pdf

My current process is as follows:

1- grab webcam image and store in two bitmapData's via draw

2- use difference blend mode and threshold to determine motion area

3- copyPixels of motion area to new bitmapData (this cuts out as much of the image as possible resulting in a smaller image area for further processing)

4- adjust the brightness and contrast for initial color reduction (GSkinner's)

5- paletteMap the results to as few colors as possible (down to 9 or even 6)

6- copyPixels from a narrow band in the center of the motion detection bitmapData at what is assumed to be the forehead height, then use threshold to determine the eye height

7- floodFill the paletteMap bitmapData in the assumed forehead region, then use threshold to determine the facial width

8- copyPixels using the facial width and eye height as a guide to grab the eye region, then use threshold to determine the eye positions

9- draw an ellipse using bitmapFill in a separate sprite with 2:1 (h:w) proportions based on the floodFill/threshold results to show only the face, then draw a box from eye to eye.
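The first three steps of that process (two frames, difference plus threshold, crop to the motion area) can be sketched as follows. This is plain Python on grey-level lists rather than Flash BitmapData calls, and the function names and threshold value are assumptions for the example.

```python
def motion_bounds(prev, curr, threshold=30):
    """Bounding box (x, y, w, h) of pixels that changed by more than
    `threshold` between two equal-sized grey-level frames, or None."""
    xs, ys = [], []
    for y, (row_a, row_b) in enumerate(zip(prev, curr)):
        for x, (a, b) in enumerate(zip(row_a, row_b)):
            if abs(a - b) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


def crop(img, box):
    """Copy only the boxed region so later steps process fewer pixels."""
    x, y, w, h = box
    return [row[x:x + w] for row in img[y:y + h]]


prev = [[10, 10, 10, 10] for _ in range(4)]
curr = [
    [10, 12, 10, 10],
    [10, 200, 12, 10],
    [10, 10, 180, 10],
    [10, 10, 10, 10],
]
box = motion_bounds(prev, curr)
region = crop(curr, box)
```

Small changes such as sensor noise (the 12s here) stay under the threshold, so only the two strongly changed pixels define the motion area.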

The result is that you can now track the pitch and yaw of the head. I doubt I'll include roll unless it's absolutely necessary or I find a cheap way to do it. Once I work out a good head detection method (regardless of hair), it will be possible to make both fulcrum adjustments, and pitch and yaw, which should be enough for VR with nothing more than a webcam! Mr Malee's Amazing Noggin Turning Control System is not far away! :)

Attached is an example swf. The text boxes in the lower left are for brightness (left text box) and contrast (right text box). For my system and lighting conditions, I get the best results with brightness adjusted based on conditions (0 to -50), and contrast set as high as possible (up to 100).

Note- this only works with head motion, not full body motion yet! If anyone has some good ideas for how to find the head regardless of how much of the body is showing (and regardless of hair style), I would appreciate your suggestions!

Well, since posting that demo back in the thread, I've got some optimization help from FlashGuru on the face detection code. It's still not quite fast enough to use alone in a realtime situation, but it's somewhere between 10 and 20 fps. We used it as an intermittent position corrector for a mean-shift head tracking application, and that worked very well.
I haven't updated the code in the svn repository yet, but will probably do that in the coming days or weeks.

You could take a similar approach, using face detection for initialization and correction, but letting faster/looser algorithms do the actual tracking.
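The structure of such a hybrid loop might look like this sketch. It is Python, and both callables are placeholders: `detect` stands in for the slow face detector and `fast_track` for whatever cheaper tracker is used (mean-shift in the project mentioned above). The correction interval is an arbitrary example value.

```python
def track(frames, detect, fast_track, correct_every=10):
    """Yield one position per frame.

    detect(frame) -> position or None            (slow, accurate)
    fast_track(frame, last_position) -> position (fast, may drift)
    """
    position = None
    for i, frame in enumerate(frames):
        if position is None or i % correct_every == 0:
            # Initialise, or periodically correct accumulated drift.
            found = detect(frame)
            if found is not None:
                position = found
            elif position is not None:
                # Detector missed this frame: fall back to the fast tracker.
                position = fast_track(frame, position)
        else:
            position = fast_track(frame, position)
        yield position


# Toy stand-ins: the "detector" returns the true position (always 0),
# while the "tracker" drifts by one pixel per frame.
positions = list(track([0] * 6,
                       detect=lambda frame: frame,
                       fast_track=lambda frame, last: last + 1,
                       correct_every=3))
```

In the toy run the drift grows for two frames and is reset on every third, which is the intermittent-correction behaviour described above.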