chorpeac (MVC) · Posted August 20, 2004

I am using a regular expression in an ASP.NET app to extract the URL from the href of an anchor tag. It works great unless some attribute comes before the href. I got it working for attributes that follow the href:

Regex.Replace(input,"<a href=\"(?<link>((.|\n)*?))\" ([^>]*)>(?<text>((.|\n)*?))</a>","${link}");

So if I have
<a href="https://www.neowin.net">Neowin</a>
or
<a href="https://www.neowin.net" target="_blank">Neowin</a>
I get https://www.neowin.net. But if I have
<a class="regularlink" href="https://www.neowin.net">Neowin</a>
it doesn't work.

The "([^>]*)" part allows attributes after the href, so I tried putting it before the href too, like this:

Regex.Replace(input,"<a ([^>]*) href=\"(?<link>((.|\n)*?))\" ([^>]*)>(?<text>((.|\n)*?))</a>","${link}");

But that doesn't work. Does anyone have any ideas?

Thread: https://www.neowin.net/forum/topic/206405-regular-expressions/

primortal (Subscriber²) · Posted August 20, 2004

Playing around with RegexBuddy (http://www.regexbuddy.com/), I found this pattern for extracting domains, and it seemed to work on anchors that have extra attributes:

\b((?#protocol)https?|ftp)://((?#domain)[-A-Z0-9.]+)((?#file)/[-A-Z0-9+&@#/%=~_|!:,.;]*)?((?#parameters)\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?
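
For anyone who wants to try that URL pattern, here is a minimal sketch of how it behaves. It is written in Python rather than the thread's C#, purely so it is easy to run; `find_urls` and the sample string are hypothetical. The character classes only list uppercase letters, so the sketch assumes the pattern is meant to be used case-insensitively and passes `re.IGNORECASE`.

```python
import re

# URL pattern quoted above; the (?#...) groups are inline regex comments.
URL_PATTERN = re.compile(
    r"\b((?#protocol)https?|ftp)://"
    r"((?#domain)[-A-Z0-9.]+)"
    r"((?#file)/[-A-Z0-9+&@#/%=~_|!:,.;]*)?"
    r"((?#parameters)\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?",
    re.IGNORECASE,  # assumption: the uppercase-only classes rely on this flag
)


def find_urls(html):
    """Return every URL-looking substring, wherever it sits in the markup."""
    return [m.group(0) for m in URL_PATTERN.finditer(html)]


sample = '<a class="regularlink" href="https://www.neowin.net/forum/">Neowin</a>'
print(find_urls(sample))
```

Note that this finds URLs anywhere in the input, including in the visible link text, so it extracts addresses but does not replace the anchor element itself.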

chorpeac (MVC, Author) · Posted August 20, 2004

RegexBuddy... nice :D Does it describe what each symbol means, so I don't have to scour the net?

chorpeac (MVC, Author) · Posted August 20, 2004

OK, sorry, let me rephrase what I am trying to do. I want to replace every anchor tag of the form <a href="http://address">link text</a> with the text inside its href attribute. My problem is that the expression I have won't replace anchors that have attributes before the href attribute. So how can I modify this:

<a href=\"(?<link>((.|\n)*?))\" ([^>]*)>(?<text>((.|\n)*?))</a>

Or is there another pattern that will do it?

primortal (Subscriber²) · Posted August 20, 2004

Not sure; something like this: http://www.regexbuddy.com/create.html

chorpeac (MVC, Author) · Posted August 20, 2004

I found this one:

(?<anchor><\s*a\s*(?:(?:\b\w+\b\s*(?:=\s*(?:"[^"]*"|'[^']*'|[^"'<> ]+)\s*)?)*)/?\s*>)(?<linktext>.*)<\s*/a\s*>

but I am still trying to figure out how to get the value of the href out of it.
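
One possible way to get at the href with that pattern is a two-pass approach: match the whole opening tag first, then search inside the captured tag for the href attribute. The sketch below is a Python port, so named groups are spelled `(?P<name>...)` instead of .NET's `(?<name>...)`. `HREF` and `href_of` are hypothetical names, not something from the thread, and the three value alternatives (double-quoted, single-quoted, unquoted) are copied from the pattern above.

```python
import re

# Python port of the anchor pattern quoted above.
ANCHOR = re.compile(
    r"""(?P<anchor><\s*a\s*(?:(?:\b\w+\b\s*(?:=\s*(?:"[^"]*"|'[^']*'|[^"'<> ]+)\s*)?)*)/?\s*>)"""
    r"""(?P<linktext>.*)<\s*/a\s*>"""
)

# Hypothetical second pattern: finds the href attribute inside an opening tag.
HREF = re.compile(r"""\bhref\s*=\s*(?:"([^"]*)"|'([^']*)'|([^"'<> ]+))""")


def href_of(html):
    """Return the href value of the first anchor in html, or None."""
    tag = ANCHOR.search(html)
    if tag is None:
        return None
    href = HREF.search(tag.group("anchor"))
    if href is None:
        return None
    # Exactly one of the three alternatives takes part in a match.
    return next(g for g in href.groups() if g is not None)


print(href_of('<a class="regularlink" href="https://www.neowin.net">Neowin</a>'))
```

Because `(?<linktext>.*)` is greedy, the anchor pattern can run from the first opening tag to the last `</a>` on a line, so this sketch is only meant for input with one anchor per line.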

chorpeac (MVC, Author) · Posted August 23, 2004

Anyone have an idea?

mrbester (MVC) · Posted August 23, 2004

I think you were so close with your original try it'll hurt.

Regex.Replace(input,"<a ([^>]*) href=\"(?<link>((.|\n)*?))\" ([^>]*)>(?<text>((.|\n)*?))</a>","${link}");

will look for <a  href="... (TWO spaces between a and href, possibly with other content between them, but not necessarily).

Regex.Replace(input,"<a ([^>]*)href=\"(?<link>((.|\n)*?))\" ([^>]*)>(?<text>((.|\n)*?))</a>","${link}");

might well work. All the bracketed bit is saying is "anything other than a > (the end of the tag); a zero-length match is acceptable".
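
To check that suggestion, here is a rough sketch that runs it against the sample anchors from the question. It is a Python port, not the thread's C#: named groups become `(?P<link>...)` and the replacement is done with a callback instead of `"${link}"`. `SUGGESTED` is the pattern as posted above. `LOOSE` is a hypothetical looser variant, not something proposed in the thread, that drops the mandatory space after the href value and only handles double-quoted hrefs.

```python
import re

# Python port of the pattern suggested above (space before href removed).
SUGGESTED = re.compile(
    r'<a ([^>]*)href="(?P<link>((.|\n)*?))" ([^>]*)>(?P<text>((.|\n)*?))</a>'
)

# Hypothetical looser variant (not from the thread): attributes, or none,
# may sit on either side of href, and no space is needed after its value.
LOOSE = re.compile(
    r'<a\b[^>]*?\bhref="(?P<link>[^"]*)"[^>]*>(?P<text>.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def replace_anchors(pattern, html):
    """Replace each matched anchor element with the value of its href."""
    return pattern.sub(lambda m: m.group("link"), html)


samples = [
    '<a href="https://www.neowin.net" target="_blank">Neowin</a>',
    '<a class="regularlink" href="https://www.neowin.net" target="_blank">Neowin</a>',
    '<a class="regularlink" href="https://www.neowin.net">Neowin</a>',
]

for html in samples:
    print(replace_anchors(SUGGESTED, html), "|", replace_anchors(LOOSE, html))
```

In this port the suggested pattern rewrites the first two samples but leaves the third untouched, because `\" ([^>]*)>` still demands a space after the closing quote of the href, and that sample has none. The looser variant rewrites all three. The same pattern text should carry over to .NET with `(?<link>...)` and `"${link}"`, though that has not been run here, and any regex like this remains fragile on real-world HTML.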