Regular Expressions


Question

I have a regular expression in an ASP.NET app I am using. I am trying to extract the URL from the href of an anchor tag. It works great unless there is some attribute before the href. I got it working for attributes that follow the href:

Regex.Replace(input,"<a href=\"(?<link>((.|\n)*?))\" ([^>]*)>(?<text>((.|\n)*?))</a>","${link}");

So if I have a

<a href="https://www.neowin.net">Neowin</a> or

<a href="https://www.neowin.net" target="_blank">Neowin</a>

I will get the https://www.neowin.net

But if I have a <a class="regularlink" href="https://www.neowin.net">Neowin</a>

it doesn't work.

So this "([^>]*)" allows attributes after the href, and I tried it before the href too, like this:

Regex.Replace(input,"<a ([^>]*) href=\"(?<link>((.|\n)*?))\" ([^>]*)>(?<text>((.|\n)*?))</a>","${link}");

But that doesn't work. Anyone have any ideas?

7 answers to this question

  • 0

Playing around with RegexBuddy (http://www.regexbuddy.com/), I found they had this for extracting domains, and it seemed to work with attributes in the anchor tag:

\b((?#protocol)https?|ftp)://((?#domain)[-A-Z0-9.]+)((?#file)/[-A-Z0-9+&@#/%=~_|!:,.;]*)?((?#parameters)\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?
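For anyone who wants to try that pattern outside RegexBuddy, here is a minimal sketch in Python rather than .NET. The `find_urls` helper is a made-up name, and the case-insensitive flag is an assumption, added because the character classes only list `A-Z`. The pattern finds URLs anywhere in the input, including link text, not only inside an href.

```python
import re

# RegexBuddy's URL pattern from the post; the (?#...) parts are inline comments.
# re.IGNORECASE is an assumption: the classes only list A-Z.
URL_PATTERN = re.compile(
    r"\b((?#protocol)https?|ftp)://((?#domain)[-A-Z0-9.]+)"
    r"((?#file)/[-A-Z0-9+&@#/%=~_|!:,.;]*)?"
    r"((?#parameters)\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?",
    re.IGNORECASE,
)


def find_urls(text):
    """Return every URL-looking substring in text (hypothetical helper)."""
    return [match.group(0) for match in URL_PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = '<a class="regularlink" href="https://www.neowin.net" target="_blank">Neowin</a>'
    print(find_urls(sample))
```

Because it ignores the tag structure entirely, the position of the href among the other attributes does not matter, but it cannot by itself replace the whole anchor with its URL.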

  • 0

Ok sorry, let me rephrase what I am trying to do.

I want to replace all <a href="http://address">link text</a> (Anchor Tags) with the text that is present inside the href attribute.

My problem is, the expression string I have won't replace anchors that have some attributes before the href attribute.

So how can I modify this

<a href=\"(?<link>((.|\n)*?))\" ([^>]*)>(?<text>((.|\n)*?))</a>

Or is there another string that will do it...

  • 0

I found this one

(?<anchor><\s*a\s*(?:(?:\b\w+\b\s*(?:=\s*(?:"[^"]*"|'[^']*'|[^"'<> ]+)\s*)?)*)/?\s*>)(?<linktext>.*)<\s*/a\s*>

but I am still trying to figure out how to get the value of the href attribute out of it.

Found this one here
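One way to get at the href value is to work in two steps: match the whole anchor first, then search its attribute text for `href`. The sketch below is in Python rather than .NET, so named groups are written `(?P<name>...)` instead of `(?<name>...)`. The patterns are simplified variations, not the one quoted above, and `extract_links` is a hypothetical helper. It assumes attribute values never contain `>`.

```python
import re

# Step 1: an anchor tag, capturing its raw attribute text and its link text.
ANCHOR = re.compile(
    r"<\s*a\b(?P<attrs>[^>]*)>(?P<text>.*?)<\s*/a\s*>",
    re.IGNORECASE | re.DOTALL,
)

# Step 2: the href attribute, with a double-quoted, single-quoted or bare value.
# The lookbehind stops attributes such as data-href from matching.
HREF = re.compile(
    r"""(?<![\w-])href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^"'<>\s]+))""",
    re.IGNORECASE,
)


def extract_links(html):
    """Return (href, link text) pairs for every anchor that has an href."""
    links = []
    for tag in ANCHOR.finditer(html):
        attr = HREF.search(tag.group("attrs"))
        if attr is None:
            continue  # e.g. <a name="top">, which has no href
        # Exactly one of the three value groups took part in the match.
        value = next(g for g in attr.groups() if g is not None)
        links.append((value, tag.group("text")))
    return links


if __name__ == "__main__":
    print(extract_links('<a class="regularlink" href="https://www.neowin.net">Neowin</a>'))
```

Splitting the work this way means the href can sit anywhere in the tag, since the second pattern does not care what comes before or after it.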

  • 0

I think you were so close with your original try it'll hurt.

Regex.Replace(input,"<a ([^>]*) href=\"(?<link>((.|\n)*?))\" ([^>]*)>(?<text>((.|\n)*?))</a>","${link}");

will look for <a  href="... (TWO spaces between a and href, possibly with other content between them, but not necessarily)

Regex.Replace(input,"<a ([^>]*)href=\"(?<link>((.|\n)*?))\" ([^>]*)>(?<text>((.|\n)*?))</a>","${link}");

might well work (all the bracketed bit is saying is "anything other than a > [the end of tag], a zero match is acceptable")
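Here is a minimal sketch of the same idea, written in Python rather than .NET, so `(?<link>...)` becomes `(?P<link>...)` and `${link}` becomes `\g<link>`. It is a variation on the pattern above, not the exact expression. The patterns above also appear to require a literal space right after the closing quote of the href, so this version makes the attributes on both sides optional. `anchors_to_urls` is a hypothetical name, and double-quoted href values are assumed.

```python
import re

# Optional attributes before the href ([^>]*?) and after it ([^>]*).
# The lookbehind keeps attributes such as data-href from being mistaken for href.
ANCHOR_WITH_HREF = re.compile(
    r'<a\s[^>]*?(?<![\w-])href="(?P<link>[^"]*)"[^>]*>(?P<text>.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def anchors_to_urls(html):
    """Replace each anchor tag with the value of its href attribute."""
    return ANCHOR_WITH_HREF.sub(r"\g<link>", html)


if __name__ == "__main__":
    samples = [
        '<a href="https://www.neowin.net">Neowin</a>',
        '<a href="https://www.neowin.net" target="_blank">Neowin</a>',
        '<a class="regularlink" href="https://www.neowin.net">Neowin</a>',
    ]
    for sample in samples:
        print(anchors_to_urls(sample))
```

Using `[^"]*` for the link instead of `(.|\n)*?` keeps the match from running past the closing quote, and `[^>]*` keeps the attribute parts inside a single tag.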
