[Resolved] recursion and parsing X

**tutash** · 08-15-2002, 12:14 PM

Howdy. I wrote this script, and it's working pretty well, but I doubt that it's efficient, or as elegant as it could be (not to mention any possible bugs I may have missed). I thought it would be cool for people to help clean it up so that we could share it with all those in need.

Code:

// x numbers the variables created from each top-level node
x = 1;
function doXML(whichXML) {
    if (whichXML == null) {
        return;  // nothing left to parse
    }
    if (whichXML.hasChildNodes()) {
        // store the node's first child value as nodeName_x
        xx = whichXML.nodeName+"_"+x;
        this[xx] = whichXML.firstChild.nodeValue;
        whichXML = whichXML.firstChild.nextSibling;
        doXML(whichXML);
    } else {
        // move on to the next top-level node
        x++;
        theXML = theXML.nextSibling;
        if (theXML == null) {
            return;
        } else {
            doXML(theXML);
        }
    }
}
// 
doXML(theXML);

I'm still having a hard time understanding recursion, so I'm sure it could work better.

Anyway,

enjoy

tutash

**VAYKENT** · 08-15-2002, 01:41 PM

Here's what I did quite a while ago... it too needs some dusting off - but it might give you some ideas.

/*
Recurser
Richard Lyman
*/

// Add this to the end of your traverser function. *** Outside of the 'if/then' condition given below ***
// currentNode is wherever you are in the XML object.
// traverser is the function that deals with the nodes you pass it.

if(currentNode.hasChildNodes()){
    traverser(currentNode.childNodes[0]);
} else if(currentNode.nextSibling){
    traverser(currentNode.nextSibling);
} else if(!currentNode.hasChildNodes() && !currentNode.nextSibling){
    traverser(currentNode.parentNode.nextSibling);
}

// To skip 'whitespace' and terminate the recursion when you reach the end...
// Add this as the first line of your 'traverser' function.
// Be careful though... it will go right over any textNodes, so you might want to deal with that differently.

if(currentNode!=null && currentNode.nodeType!=3){

// Bulk of code to deal with the node.

}
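The traversal order described above (children first, then the next sibling, then back up through the parents) can be sketched as a "next node" helper. This is a minimal sketch in plain JavaScript rather than Flash ActionScript: `makeNode` and `makeText` are hypothetical stand-ins that only imitate the `firstChild`, `nextSibling`, `parentNode`, `nodeType`, and `nodeName` properties of an XML node, and `nextInDocumentOrder` and `collectNames` are illustrative names. Unlike the snippet above, the helper keeps climbing until it finds an ancestor that has a next sibling, so it does not stop early on deeper trees.

```javascript
// Hypothetical stand-ins for XML nodes: only the properties used below.
function makeNode(name, children) {
  var node = { nodeName: name, nodeType: 1, nodeValue: null,
               childNodes: children || [], firstChild: null,
               nextSibling: null, parentNode: null };
  for (var i = 0; i < node.childNodes.length; i++) {
    node.childNodes[i].parentNode = node;
    node.childNodes[i].nextSibling = node.childNodes[i + 1] || null;
  }
  node.firstChild = node.childNodes[0] || null;
  return node;
}
function makeText(value) {
  return { nodeName: null, nodeType: 3, nodeValue: value, childNodes: [],
           firstChild: null, nextSibling: null, parentNode: null };
}

// Children first, then the next sibling, then the nearest ancestor's sibling.
function nextInDocumentOrder(node) {
  if (node.firstChild) {
    return node.firstChild;
  }
  while (node) {
    if (node.nextSibling) {
      return node.nextSibling;
    }
    node = node.parentNode; // climb until some ancestor has a sibling
  }
  return null; // ran off the top of the tree
}

// Visit every node without recursion and record the tag names (nodeType 1).
function collectNames(root) {
  var names = [];
  for (var n = root; n; n = nextInDocumentOrder(n)) {
    if (n.nodeType == 1) {
      names.push(n.nodeName);
    }
  }
  return names;
}

var tree = makeNode("things", [
  makeNode("customer", [makeNode("name", [makeText("Robert Shoehorn")])]),
  makeNode("supplier", [makeNode("company", [makeText("Widgets R Us")])])
]);
var visited = collectNames(tree);
```

The sketch assumes `root` is the top of the tree; started from an inner node, it would continue into that node's later siblings as well.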

**shoehorn** · 08-15-2002, 07:44 PM

Both of you seem to be afraid of - or unaware of - "while" loops. Trying to emulate while loops with if statements and function calls can be a bit dangerous, because the ultimate depth of the recursion is equal to the number of elements you have to look at:

Code:

call recursive function:
  check first element
  call recursive function:
    check second element
    call recursive function:
      check third element
      ...
        check final element
        return
      ...
    return
  return
return

So if you have 1000 xml elements (not an unlikely event) then you will end up 1000 function calls deep. This is the kind of thing that brings up the message box: "One of your scripts is taking a long time to complete."

Here's a sample of a typical while loop used to parse XML:

Code:

var node = xmlTree.firstChild;

while (node) {   // if node is valid, this will pass
  if (node.nodeType == 1){
    // this is a tag node
    // do whatever you want here
  } else {
    // this is a text node
  }

  node = node.nextSibling;  // traverse the tree -
    // just ask the current node for the next sibling,
    // and if there isn't one node gets set to null
}

Now if you are working with XML that has any sort of meaningful structure (i.e. is more than one level deep, has different tags at different depths) you'll probably have several functions that perform this same kind of loop. The while loop has an advantage over the recursive loop, because it is much simpler.
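The depth argument can be checked with a small experiment. This is a sketch in plain JavaScript, not Flash ActionScript; `makeNode` is a hypothetical stand-in for an XML node, and `walkSiblingRecursive`, `walkLoop`, and `measure` are illustrative names. It compares a walker that recurses into the next sibling with one that loops over siblings and recurses only into children.

```javascript
// Hypothetical stand-in for an XML node: only the properties used below.
function makeNode(name, children) {
  var node = { nodeName: name, nodeType: 1, childNodes: children || [],
               firstChild: null, nextSibling: null };
  for (var i = 0; i < node.childNodes.length; i++) {
    node.childNodes[i].nextSibling = node.childNodes[i + 1] || null;
  }
  node.firstChild = node.childNodes[0] || null;
  return node;
}

var maxDepth = 0;

// Style A: every step, including "go to the next sibling", is a function call.
function walkSiblingRecursive(node, depth) {
  if (!node) {
    return;
  }
  if (depth > maxDepth) {
    maxDepth = depth;
  }
  walkSiblingRecursive(node.firstChild, depth + 1);
  walkSiblingRecursive(node.nextSibling, depth + 1);
}

// Style B: a while loop handles siblings; only children cost a call level.
function walkLoop(node, depth) {
  while (node) {
    if (depth > maxDepth) {
      maxDepth = depth;
    }
    walkLoop(node.firstChild, depth + 1);
    node = node.nextSibling;
  }
}

// Run one walker and report the deepest call level it reached.
function measure(walker, root) {
  maxDepth = 0;
  walker(root, 1);
  return maxDepth;
}

// One parent with 200 children: a wide, shallow document.
var items = [];
for (var i = 0; i < 200; i++) {
  items.push(makeNode("item"));
}
var root = makeNode("list", items);

var depthA = measure(walkSiblingRecursive, root); // grows with the sibling count
var depthB = measure(walkLoop, root);             // grows only with nesting
```

With this input the first style reaches 201 nested calls and the second reaches 2, so the loop keeps call depth tied to how deeply the XML is nested rather than to how many elements it has.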

**VAYKENT** · 08-16-2002, 12:27 AM

Sorry, I'm not afraid of and certainly not unaware of... and no - I'm not offended.

Besides, you'll run into Flash MX's '255 limit' before you get anywhere anyway.

I personally like extracting the functionality I need, so I can apply it anywhere I want. Using a while loop is great - but it would still need to be wrapped in a 'callable' form... like a function, to optimize its usage (unless it's only used once of course).

Plus - in Flash 5 - I never got the 'script is taking too long' error. If you get that, then something else wasn't planned properly (or you could use one of the many scripts that distributes the processing over several frames).

I'll have to argue with your statement of 'while loops are much simpler', since we both use the same code.

Plus - I always work with XML that has meaningful structure. It doesn't matter how wide or long the XML is structured - function recursion can still handle it just fine.

Having said all that (remember - I'm not offended)... I appreciate your counterexample, I just disagree with some of your statements.

**tutash** · 08-16-2002, 08:58 AM

I like the fact that shoehorn brought up: recursion can wipe out Flash. However, I believe it is still something that, as a Flash Programmer, I need to understand fully.

My intention is to parse XML into objects using the least amount of code possible. I was under the impression that a recursive function was the best way to get there.

Of course, the code needs to have some error correction built in, but I think I'm on the right track. I'll be working with the concepts VAYKENT laid out for me, and include them in the script. I'll also try to accomplish the same functionality using shoehorn's linear 'while loop' technique.

My challenge: generate a list of logically ordered objects from XML, while keeping the script totally abstracted and as small as possible.

I use a similar script now to handle data, though it's not very refined. Also, though I'm sure I will run into an XML document that breaks the code, I'm the one deciding how external data is to be formatted, so I shouldn't be surprised.

I concede that shoehorn's argument is a valid one, and that development of a script based on a 'while loop' is of equal importance to this thread.

enjoy

tutash

**tutash** · 08-16-2002, 02:47 PM

I've begun using standard XML for testing (Moreover.com).
I've come up with a new script, but I think it still needs some error correction. It's working well; it parses rather quickly.

Take a look:

Code:

// 
function doXML(whichXML, x) {
    while (whichXML[x].hasChildNodes) {
        whichXML[x].ignoreWhite = true;
        if (String(whichXML[x].nodeName) == "null" || String(whichXML[x].nodeName) == "") {
            whichXML[x].removeNode();
            c--;
        }
        v = (whichXML[x].nodeName+"_"+c);
        this[v] = whichXML[x].firstChild.nodeValue;
        doXML(whichXML[x].childNodes, 1);
        x++;
    }
    c++;
}
// 
// 
theXML = theXML.childNodes[2].childNodes;
doXML(theXML, 1);
//

enjoy

tutash
[Edited by tutash on 08-16-2002 at 03:29 PM]

**VAYKENT** · 08-16-2002, 02:54 PM

A little suggestion - if you say:

something.firstChild.nextSibling.nextSibling

You could also say:

something.childNodes[2]

and you should get the same result.

'childNodes' is a zero-indexed array where each element of the array holds a reference to that 'child node'... if that helps... People get messed up with childNodes, when they forget to 'ignoreWhite', but I see you've considered that in your loop, just make sure you set it on your base XML Object.
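A quick way to see both points is to build a tiny tree by hand. This is a sketch in plain JavaScript, not Flash ActionScript: `makeNode` and `makeText` are hypothetical stand-ins for XML nodes, and the whitespace text nodes are inserted manually to imitate what a parser produces from an indented file when whitespace is not ignored.

```javascript
// Hypothetical stand-ins for XML nodes: only the properties used below.
function makeNode(name, children) {
  var node = { nodeName: name, nodeType: 1, nodeValue: null,
               childNodes: children || [], firstChild: null, nextSibling: null };
  for (var i = 0; i < node.childNodes.length; i++) {
    node.childNodes[i].nextSibling = node.childNodes[i + 1] || null;
  }
  node.firstChild = node.childNodes[0] || null;
  return node;
}
function makeText(value) {
  return { nodeName: null, nodeType: 3, nodeValue: value, childNodes: [],
           firstChild: null, nextSibling: null };
}

// Without whitespace nodes, both expressions reach the same third child.
var tidy = makeNode("something", [makeNode("a"), makeNode("b"), makeNode("c")]);
var viaSiblings = tidy.firstChild.nextSibling.nextSibling;
var viaIndex = tidy.childNodes[2];

// With whitespace kept, text nodes sit between the tags and shift every index.
var padded = makeNode("something", [
  makeText("\n  "), makeNode("a"),
  makeText("\n  "), makeNode("b"),
  makeText("\n")
]);
var thirdChild = padded.childNodes[2];  // a whitespace text node (nodeType 3)
var firstTag = padded.childNodes[1];    // the <a> tag lands at index 1, not 0
```

That index shift is one plausible reason a script ends up starting its loops at index 1 or reaching for `childNodes[2]` to find the first real tag.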

**tutash** · 08-16-2002, 03:35 PM

Thanks VAYKENT and shoehorn!

I guess I made a recursive 'while loop' function. Anyway, it works really well. I just need to include some error correction. We'll see. This code took me all day, but I believe I got what I want. Now it just needs to be cleaned up. I'm sure it can be made better (faster). I'll also need to test it against a non-recursive version to see if it's any faster to do it that way.

The end result of this function is a numbered list of object sets, based on tag names. It's an 'automatic' script that requires no more logic. I think it's totally abstract, but I may be missing something.

enjoy

tutash

**tutash** · 08-16-2002, 11:43 PM

I'm wondering what varieties of tree would mess this script up.

Any ideas?

Obviously I need some code to tie up the crap tags at the beginning of the .xml, but what type of .xml structure can make this code fail, and what type of corrections would be needed to make it function correctly, keeping in mind that the function would be smaller when recursive as above?

If I've broken any rules of recursion, please let me know. I'm also interested in sorting these arrays, even though they're not traditionally arrays, or whatever...

enjoy

tutash

**shoehorn** · 08-18-2002, 04:58 PM

VAYKENT -

Yes, of course you'd want to put the code inside a function. I guess that was an assumption I should have stated. I also wanted to imply that for structured XML, it is desirable to write a number of functions that will have this same kind of while loop, but will work on different levels/depths of the XML. So for example, if you have this XML:

Code:

<things>
  <customer>
    <name>Robert Shoehorn</name>
  </customer>

  <supplier>
    <company>Widgets R Us</company>
  </supplier>
</things>

You'd have one function to parse the outer THINGS tag, then one function for each of the inner tags, CUSTOMER and SUPPLIER:

Code:

function parseThings(xml) {
  // xml is the <things> node
  var node = xml.firstChild;

  while (node) {
    if (node.nodeName == "customer") {
      parseCustomer(node);
    }

    if (node.nodeName == "supplier") {
      parseSupplier(node);
    }

    node = node.nextSibling;
  }
}

function parseCustomer(xml) {
  // look at the customer node
}

function parseSupplier(xml) {
  // look at the supplier node
}

That said, this goes a bit against the grain of what tutash is trying to do/learn. There is a time and place for complete abstraction, for writing simple and general purpose functions. But in this case, writing a function that simply copies data from XML onto another object means one is just putting off doing anything useful with the data, it increases the memory needed, and it doesn't take advantage of the free functionality built into the tools.

Okay, now I'm on a minor rant. Have you ever seen someone do this:

Code:

var dataArray = new Array();

function storeData(xml) {
  var d = new Object();
  d.name = xml.attributes.name;
  d.foo = xml.attributes.foo;
  d.bar = xml.attributes.bar;

  dataArray.push(d);
}

Now sometimes - rarely - there is a reason to do this. But I've seen a lot of people write this code as sort of the first step in "parsing" some XML. They're just copying data! Now the data is stuck in some custom object that, in this case, doesn't do anything but keep an extra copy of the data lying around to confuse everybody.

There's nothing wrong with this if there's an actual reason for the new object, which assumes the programmer actually knows something about objects, in which case their code would look more like this:

Code:

function dataObject(xml) {
  this.name = xml.attributes.name;
  this.foo = xml.attributes.foo;
}

dataObject.prototype.doSomething = function() {
  // something using this.name
  // something using this.foo
}

So...

**VAYKENT** · 08-18-2002, 05:09 PM

Wow... you hit the nail on the head. I HATE it when someone asks for help in taking an XML object and putting it into an Array object....

I completely agree with your approach of 'functions' for 'nodes'. That approach is similar to what I think SAX is... firing off functions based on the node you come across...

Thanks for the comments!

**tutash** · 08-19-2002, 11:07 AM

Thanks, shoehorn, for adding those bits for handling the code. I can see what you're trying to do/learn, and you're right, it is against the grain of what I'm trying to do, which is to make the smallest bit of code I can to parse out objects from an XML doc. I've kept everything abstract so that the code can be reused anywhere, irrespective of the data. I've noticed that you're catching specific XML nodes to pass to external functions, which is cool, but doesn't really fit in with abstraction. I can see running these functions on the data after the XML has been made into objects, but making specific function calls based on node names during the parsing defeats the purpose of having generic, abstracted code: reusability.

Simply copying data from XML onto other objects in Flash is what I'm looking to do: a small bit of code that allows for quick and easy creation of dynamic Flash from any XML source. Doing something with the information contained in the nodes would of course be the next step, but I'd rather leave that up to those implementing their own solutions to their specific problems.

Again, thanks for fleshing out the thread: you make some very good points about handling the data.

enjoy

tutash

**shoehorn** · 08-19-2002, 09:13 PM

You've already got an abstract data model - we call it XML. What you want is a slightly more brain-dead data model so that bad programmers can do things without having a clue what they are doing. This may sound a bit rude, but I think we've got something of a philosophical difference here.

Now even if you are just trying to copy data from a server to the client, I would argue that you've got a pretty specific application in mind. A specific application is going to require specific code, not some generalized method that is "magically useful." I would also suggest that you will want to create some custom objects for storing this data, but this probably goes against your goals of using generic code. I'm not sure how you ever plan on getting anything done.

XML is abstract. It is generic. It is reusable. Use the tools you have.

**VAYKENT** · 08-19-2002, 09:31 PM

Good to see that I'm not the only one with strong opinions.

I agree with what you said...

...BUT...

Let's suppose that he just wanted a framework that he could apply later to 'parse' through his XML... in that case I think "some generalized method that is 'magically useful'" might be ok - as long as it was tweaked to fit each "specific application".

I have one text file that I can depend on to go through EVERY node in an XML document. If I want to deal with specific nodes in a certain way, I'll just plop some specific code in the middle of my generic code that says - when you run across this kind of node, pass it to this kind of function...
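One way to picture that arrangement is a generic walker that visits every node and hands particular tag names to particular functions. This is a sketch in plain JavaScript, not the text file mentioned above and not Flash ActionScript: `makeNode` and `makeText` are hypothetical stand-ins for XML nodes, and `walk`, `handlers`, and `seen` are illustrative names.

```javascript
// Hypothetical stand-ins for XML nodes: only the properties used below.
function makeNode(name, children) {
  var node = { nodeName: name, nodeType: 1, nodeValue: null,
               childNodes: children || [], firstChild: null, nextSibling: null };
  for (var i = 0; i < node.childNodes.length; i++) {
    node.childNodes[i].nextSibling = node.childNodes[i + 1] || null;
  }
  node.firstChild = node.childNodes[0] || null;
  return node;
}
function makeText(value) {
  return { nodeName: null, nodeType: 3, nodeValue: value, childNodes: [],
           firstChild: null, nextSibling: null };
}

// Generic part: visit every tag node, looping over siblings and recursing
// into children. The only specific part is the lookup in the handlers table.
function walk(node, handlers) {
  while (node) {
    if (node.nodeType == 1) {
      if (handlers.hasOwnProperty(node.nodeName)) {
        handlers[node.nodeName](node); // pass this kind of node to its function
      }
      walk(node.firstChild, handlers);
    }
    node = node.nextSibling;
  }
}

// Specific part: what to do when certain tags turn up.
var seen = [];
var handlers = {
  name: function (node) { seen.push("name=" + node.firstChild.nodeValue); },
  company: function (node) { seen.push("company=" + node.firstChild.nodeValue); }
};

var tree = makeNode("things", [
  makeNode("customer", [makeNode("name", [makeText("Robert Shoehorn")])]),
  makeNode("supplier", [makeNode("company", [makeText("Widgets R Us")])])
]);
walk(tree, handlers);
```

The walker stays the same from project to project; only the `handlers` table changes, which is roughly the middle ground between the fully generic and fully specific approaches argued over in this thread.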

I like doing things that way - every once in a while.

For the most part I code my XML handling from the ground up, since it's so natural.

**tutash** · 08-20-2002, 12:30 AM

I totally agree with both of you, but saying that this code is a "slightly more brain-dead data model so that bad programmers can do things without having a clue what they are doing" is, I think, truly counter to my uses and my intention.

This script is an exercise in parsing XML using a recursive function. By understanding the use of recursion, I hope to better my ability for handling XML in general.

I presented my script as a study of recursion, and its use in handling XML. I'm not suggesting its use for any project more complex than a ticker, or some other simple XML gizmo. It's a quick application of XML to Flash, in a very reusable form. It's not the solution to, as you put it, "XML that has any sort of meaningful structure", and I don't think I've presented it in that way.

If I wanted to post my working code, the code I'm doing for my projects, I'm sure you'd become quite bored with the thread! :-)

That said, I'm glad that shoehorn has enough passion about programming to admit: "This may sound a bit rude, but I think we've got something of a philosophical difference here."

It would be nice to see him give an example of how a recursive function could be put to good use at the more "refined" level he's used to.

Recursion must have a use somewhere, don't you think?

In any case, I plan to put it to good use.

enjoy

tutash

**tutash** · 08-20-2002, 11:40 AM

Back to the problem at hand:

The objects parsed out from the function need to be hierarchical. more like:

Code:

_level0.topNode_1.innerNode_1

...so the recursion has to take its parent process into account. Right now, top level nodes refer to their children with c. This leaves structured nodes (c defining top level nodes and child nodes), but no obvious hierarchy. Not that hierarchy is a must; node relationships are still expressed through c, but their hierarchical relationships to their parent nodes are missing.

My next task would be to use recursion to nest each object in its appropriate parent object. This would make the resulting data more structurally similar to the XML object itself. The code shouldn't have to get much bigger. We'll see.
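The nesting idea might look something like the following. This is a minimal sketch in plain JavaScript, not working Flash code: `makeNode` and `makeText` are hypothetical stand-ins for XML nodes, `toObject` is an illustrative name, and the `_text` key and the `name_N` numbering scheme are assumptions made for the example.

```javascript
// Hypothetical stand-ins for XML nodes: only the properties used below.
function makeNode(name, children) {
  var node = { nodeName: name, nodeType: 1, nodeValue: null,
               childNodes: children || [], firstChild: null, nextSibling: null };
  for (var i = 0; i < node.childNodes.length; i++) {
    node.childNodes[i].nextSibling = node.childNodes[i + 1] || null;
  }
  node.firstChild = node.childNodes[0] || null;
  return node;
}
function makeText(value) {
  return { nodeName: null, nodeType: 3, nodeValue: value, childNodes: [],
           firstChild: null, nextSibling: null };
}

// Build one plain object per tag; each child tag becomes a nested object
// stored under "tagName_N", where N counts same-named siblings from 1.
function toObject(node) {
  var result = {};
  var counts = {}; // local to this call, so numbering restarts per parent
  for (var child = node.firstChild; child; child = child.nextSibling) {
    if (child.nodeType == 3) {
      result._text = child.nodeValue; // assumed key for text content
    } else if (child.nodeType == 1) {
      var name = child.nodeName;
      counts[name] = counts.hasOwnProperty(name) ? counts[name] + 1 : 1;
      result[name + "_" + counts[name]] = toObject(child); // recurse one level down
    }
  }
  return result;
}

var book = makeNode("book", [
  makeNode("page", [
    makeNode("word", [makeText("alpha")]),
    makeNode("word", [makeText("beta")])
  ]),
  makeNode("page", [makeNode("word", [makeText("gamma")])])
]);
var data = toObject(book);
var secondWord = data.page_1.word_2._text;
```

Because each call returns the object for its own subtree, the parent only has to store the result; no path strings or shared counters need to be passed down.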

enjoy

tutash

**VAYKENT** · 08-20-2002, 12:52 PM

Now you've got me worried... it sounds like you're trying to take it out of an XML object, and put it into an XML-like object....

So - I'm lost.

**tutash** · 08-20-2002, 01:24 PM

VAYKENT,
I'm just trying to parse all of the objects out using one recursive function. I want to get the same results I would get using looping, but with only one function. After you parse your XML using traditional methods, you usually end up with objects structured hierarchically: a sort of mirror image of the XML structure. That's what I'm going for.

The point: if I know the structure of the data being passed into Flash, I can address it logically, and without any more coding. Let's say I've got 300 pages of text, with associated word definitions (say, 12+ per page). I would like the structure of the XML to dictate the structure of the objects parsed from it. I can have other functions looking to see if the "page objects" exist, generate pages based on the data's existence, and populate them with the appropriate sub-objects. This is just one example. I'm looking to have the data drive the application, sort of like a browser.

Do you think this is a bad approach?

enjoy

tutash
[Edited by tutash on 08-20-2002 at 01:41 PM]

**tutash** · 08-21-2002, 03:49 PM

This is almost there. I'm getting hierarchical objects, but the numbering is wrong. Any suggestions?

Code:

// 
x = 0;
c = 2;
// 
// 
function doXML(whichXML, x, nodePass) {
    while (whichXML[x].hasChildNodes) {
        // error correction
        whichXML[x].ignoreWhite = true;
        if (String(whichXML[x].nodeName) == "null" ) {
            whichXML[x].removeNode();
            c--;
        }
        // the meat
        v = string(nodePass+whichXML[x].nodeName+"_"+(x));
        this[v] = whichXML[x].childNodes[0].nodeValue;
        // recurse
        doXML(whichXML[x].childNodes, 1, string(v+"."));
        x++;
    }
    x--;
    c++;
}
// 
theXML = theXML.childNodes[2].childNodes;
// 
doXML(theXML, 1, "");

It's still a bit rough. The inner function call allows for depth now. I've tested it at 4 levels of depth and it works. If I can get the numbering right, I'll add in error correction, and more custom parsing, like shoehorn suggested.
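For the numbering, one option is to count same-named tags separately inside each parent instead of reusing the child index or a shared counter. The sketch below assumes that is the numbering wanted (for example `page_1.word_2`); it is plain JavaScript rather than Flash ActionScript, `makeNode` and `makeText` are hypothetical stand-ins for XML nodes, and `flatten` is an illustrative name.

```javascript
// Hypothetical stand-ins for XML nodes: only the properties used below.
function makeNode(name, children) {
  var node = { nodeName: name, nodeType: 1, nodeValue: null,
               childNodes: children || [], firstChild: null, nextSibling: null };
  for (var i = 0; i < node.childNodes.length; i++) {
    node.childNodes[i].nextSibling = node.childNodes[i + 1] || null;
  }
  node.firstChild = node.childNodes[0] || null;
  return node;
}
function makeText(value) {
  return { nodeName: null, nodeType: 3, nodeValue: value, childNodes: [],
           firstChild: null, nextSibling: null };
}

// Write dotted keys such as "page_1.word_2" into out. The counts table is
// declared inside the function, so every parent numbers its own children
// from 1 and text nodes never consume a number.
function flatten(node, prefix, out) {
  var counts = {};
  for (var child = node.firstChild; child; child = child.nextSibling) {
    if (child.nodeType != 1) {
      continue; // skip text and whitespace nodes
    }
    var name = child.nodeName;
    counts[name] = counts.hasOwnProperty(name) ? counts[name] + 1 : 1;
    var key = prefix + name + "_" + counts[name];
    var first = child.firstChild;
    out[key] = (first && first.nodeType == 3) ? first.nodeValue : null;
    flatten(child, key + ".", out); // children extend the parent's key
  }
  return out;
}

var book = makeNode("book", [
  makeText("\n  "),
  makeNode("page", [
    makeNode("word", [makeText("alpha")]),
    makeNode("word", [makeText("beta")])
  ]),
  makeText("\n  "),
  makeNode("page", [makeNode("word", [makeText("gamma")])])
]);
var flat = flatten(book, "", {});
```

Numbering by child index would label the two pages 1 and 3 here because of the whitespace nodes between them; counting per tag name gives `page_1` and `page_2`.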

enjoy

tutash

**VAYKENT** · 08-21-2002, 04:18 PM

Which number??

Is that the currently 'working' code?

Is there a currently 'working' sample of XML to go with it?

If I can get both, then I'd be glad to test it out.
