Grabbing a URL via a Regular Expression (special cases)

**Endoplasmic** · 03-28-2009, 11:59 AM

After 2 nights of *almost* getting it I realized that Flash's RegExp doesn't support lookbehinds.

So I'm wondering if anyone could help point me in the right direction that is away from this wall I'm currently facing.

Here are the 3 scenarios:

1) http://example.com
2) <a href="http://example.com">http://example.com</a>
3) <br><br>http://example.com<br><br>

Basically what I want is to ignore scenario #2. I know the solution, but I can't seem to nail it. All that needs to happen is to select the http (and everything after it) as long as it's not immediately preceded by "> or by =", but since I don't know a way for RegExp to look backwards, I'm sort of at a loss.

Here's hoping that someone has run into this issue before.
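
One way to get the effect of a lookbehind without engine support is to match every URL and then inspect the characters just before each match inside a replace callback. The sketch below is TypeScript rather than ActionScript (both use ECMAScript-style regex syntax); the name `linkBareUrls`, the `https?` protocol list, and the character class are assumptions for illustration, not anything from a library.

```typescript
// Hypothetical helper: wrap a URL in <a href> unless the two characters
// immediately before it are `">` or `="`, which would mean the URL is
// already inside an anchor (scenario #2).
const URL_PATTERN = /https?:\/\/[a-zA-Z0-9\/@?:#&+._=-]+/g;

function linkBareUrls(text: string): string {
  return text.replace(URL_PATTERN, (match: string, offset: number): string => {
    // Emulate a negative lookbehind: look at the text just before the match.
    const before = text.slice(Math.max(0, offset - 2), offset);
    if (before === '">' || before === '="') {
      return match; // already linked, leave it alone
    }
    return `<a href="${match}">${match}</a>`;
  });
}
```

With no capture groups in the pattern, the callback's second argument is the match offset. ActionScript 3's String.replace() also accepts a function that receives the match, any groups, the index, and the full string, so the same idea should carry over, but treat that port as untested.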

**Marchut** · 03-31-2009, 04:27 AM

You want to isolate the address from a string that might have a syntax of any of those three options?

**Endoplasmic** · 03-31-2009, 08:57 PM

Well basically I want to be able to find case #1 and #3 and NOT #2. That way I can wrap the <a href> stuff around the URLs like normal.

I'm starting to wonder if regular expressions are not the answer and I'm making something harder than it should be.

**Chris_Seahorn** · 03-31-2009, 10:19 PM

This example uses three hard-coded text fields (on top) so I can reproduce your examples exactly, without having to add slashes or replace straight quotes with straight apostrophes to simulate them (as I would if I populated the fields via script instead).

The lower fields show what the RegExp returns. In the real world I would use "ignored" as a flag value and react to it; this example just displays it so you can see it on the frontend.

http://www.km-codex.com/examples/RegEX_IsolateUrls.html

Is this where you are heading?

var testpattern1:String = txt1.text;
testpattern1 = testpattern1.replace( /<a.*?a>/g, "ignored" );

var testpattern2:String = txt2.text;
testpattern2 = testpattern2.replace( /<a.*?a>/g, "ignored" );

var testpattern3:String = txt3.text;
testpattern3 = testpattern3.replace( /<a.*?a>/g, "ignored" );

txt4.text=testpattern1;
txt5.text=testpattern2;
txt6.text=testpattern3;

If you instead want to rip out ALL http references in the strings...smoke the originals and reformat as <a href> wrapped http's...well...we can do that too. As it stands, you said you want to ignore the ones that are already linked.

This is heavily covered in the Cookbook btw.
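
For the "wrap everything that isn't already wrapped" variant, a common trick is a single pattern with two alternatives: the first swallows a complete anchor element, the second captures a bare URL. A rough TypeScript sketch follows (not ActionScript, though the regex syntax is the same family); `wrapUnlinkedUrls` and the URL character class are made-up names and assumptions for the example.

```typescript
// Alternative 1 matches a whole <a ...>...</a> element and captures nothing.
// Alternative 2 (the capture group) matches a URL that is not inside one.
const ANCHOR_OR_URL =
  /<a\b[^>]*>[\s\S]*?<\/a>|(https?:\/\/[a-zA-Z0-9\/@?:#&+._=-]+)/gi;

function wrapUnlinkedUrls(text: string): string {
  return text.replace(
    ANCHOR_OR_URL,
    (match: string, bareUrl: string | undefined): string =>
      // No captured URL means the match was an existing anchor: keep it as is.
      bareUrl === undefined ? match : `<a href="${bareUrl}">${bareUrl}</a>`
  );
}
```

Because the regex engine consumes the whole anchor before it can ever see the URL inside it, no lookbehind is needed. In ActionScript the unmatched group may not arrive as `undefined`, so that check is worth verifying before relying on it there.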

**Endoplasmic** · 03-31-2009, 10:48 PM

Here's where I came to tonight (not gonna lie it's pretty bananas):

ActionScript Code:

private static function checkURL(text:String):String {
            //match up [url]  [url=xxxxx]
            var url:String = text;
            var startLink:String = "<font color=\"#" + Config.POST_LINK_COLOUR.toString(16) + "\"><u>";
            var endLink:String = "</u></font>";

            //handle regular [url] links
            var regularURL:RegExp = new RegExp("\\[url\\]([a-zA-Z0-9/@?:#&+._=-]*)\\[/url\\]", "gi");
            url = url.replace(regularURL, startLink + "<a href=\"$1\" target=\"_blank\">$1</a>" + endLink);
            
            //handle links that have "[url=????]"
            var specialURL:RegExp = new RegExp("\\[url=(("+PROTOCOLS+")://){1}([a-zA-Z0-9/@?:#&+._=-]*)\\](.*?)\\[/url\\]", "gi");
            url = url.replace(specialURL, startLink + "<a href=\"$1$3\" target=\"_blank\">$4</a>" + endLink);            
            
            //handle URLs that don't have tags
            var inlineURL:RegExp = new RegExp("("+PROTOCOLS+")://[a-zA-Z0-9/@?#&+._=-]*", "gi");    
            var matches:Array = url.match(inlineURL);
            var startIndex:int = 0;
            
            //loop through the array and replace the HTML to what it should be
            for(var i:int = 0; i < matches.length; i++){
                var urlStart:int = url.indexOf(matches[i], startIndex);
                
                //look before the URL to see if it needs to be parsed
                var prevText:String = url.substring(urlStart-8, urlStart);
                
                if(prevText != "a href=\"" && prevText != "_blank\">"){
                    //parse it!
                    var firstHalf:String = url.substring(0, urlStart);
                    var secondHalf:String = url.substring(urlStart + matches[i].length);
                    
                    //display the new text
                    firstHalf += startLink + "<a href=\"" + matches[i] + "\" target=\"_blank\">" + matches[i] + "</a>" + endLink;
                    url = firstHalf + secondHalf;
                    
                    //skip past the markup just inserted (it contains the URL twice),
                    //otherwise a repeated URL would be found inside it and never linked
                    startIndex = firstHalf.length;
                } else {
                    //already linked: push the start index forward past this occurrence
                    startIndex = urlStart + matches[i].length;
                }
            }
            
            return url;
        }

So basically that allows me to run my [url ] [/url ] and [url = ][/url ] tags FIRST then after those have been parsed I can handle the inline ones.
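
The ordering described here (tags first, inline URLs last) can be sketched compactly as three replace passes. This is a simplified TypeScript rendition, not the function above: it assumes `https?` in place of the PROTOCOLS constant, drops the font/underline wrapper, and uses the hypothetical name `renderLinks`. The last pass skips any anchor produced by the first two instead of checking the preceding 8 characters.

```typescript
const URL_CHARS = "[a-zA-Z0-9/@?:#&+._=-]";

function renderLinks(text: string): string {
  // Pass 1: [url]address[/url]
  let out = text.replace(
    new RegExp(`\\[url\\](${URL_CHARS}*)\\[/url\\]`, "gi"),
    '<a href="$1" target="_blank">$1</a>'
  );
  // Pass 2: [url=address]label[/url]
  out = out.replace(
    new RegExp(`\\[url=(https?://${URL_CHARS}*)\\](.*?)\\[/url\\]`, "gi"),
    '<a href="$1" target="_blank">$2</a>'
  );
  // Pass 3: bare URLs. Whole anchors from passes 1 and 2 are matched first
  // and returned unchanged, so their URLs are never wrapped a second time.
  return out.replace(
    new RegExp(`<a\\b[^>]*>[\\s\\S]*?</a>|(https?://${URL_CHARS}+)`, "gi"),
    (match: string, bareUrl: string | undefined): string =>
      bareUrl === undefined
        ? match
        : `<a href="${bareUrl}" target="_blank">${bareUrl}</a>`
  );
}
```

Running the tag passes first matters because the inline pass would otherwise wrap the address inside [url=...] and break the tag syntax.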

**Chris_Seahorn** · 03-31-2009, 10:51 PM

I love bananas
