Need help with regex <img> and alt= tags


Question

I'm trying to match the alt= and src= attributes of image tags with regular expressions...

I'm reusing code from this forum (thanks).

<p align="center"><a title="Gato con carrito de compra de supermercado invisible" class="imagelink" rel="attachment" id="p203" href="http://www.ecnc.com/blog/2007/07/23/voy-al-supermercado-me-compras-unas-toallas-femeninas-mi-amor/gato-con-carrito-de-compra-de-supermercado-invisible/">
<img alt="Gato con carrito de compra de supermercado invisible" id="image203" src="https://www.lecnc.com/blog/wp-content/uploads/2007/07/carrito_de_compra_invisible.jpg" /></a></p>
<p>Fuente: <a target="_blank" title="Funny animals" href="http://www.flickr.com/photos/funny_animals/380169474/">Flickr Funny Animals </a></p>

I'm trying to match alt= and src=

Code I'm using

/\< *[img][^\>]*alt *= *[\"\']{0,1}([^>]*) *src *= *[\"\']{0,1}([^\"\'\ >]*)/

It matches both, but match 1 (the alt= value) comes back as

Gato con carrito de compra de supermercado invisible" id="image203"

Now I need to drop the id="image203" part from that match.

Any suggestions are much appreciated.


3 answers to this question


  • 0

This is working for me. I don't know why people keep using the complicated regex from the other thread.

\<img.+?alt="(.+?)".+?src="(.+?)".+?\/>
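For anyone who wants to try it, here is a minimal sketch of that pattern run against the image tag from the question. It uses Python's re module purely for illustration (the thread does not say which engine is in use), and the variable names are made up for the example.

```python
import re

# The <img> line from the question, with the closing tags that follow it.
html = ('<img alt="Gato con carrito de compra de supermercado invisible" '
        'id="image203" src="https://www.lecnc.com/blog/wp-content/uploads/'
        '2007/07/carrito_de_compra_invisible.jpg" /></a></p>')

# Same pattern as above; Python does not need the backslashes before < and /.
pattern = re.compile(r'<img.+?alt="(.+?)".+?src="(.+?)".+?/>')

m = pattern.search(html)
alt, src = m.group(1), m.group(2)
print(alt)  # the alt text only, without the id attribute
print(src)

# Every .+? must consume at least one character, so a tag with no space
# before the closing /> does not match at all.
no_space = pattern.search('<img alt="a" src="b"/>')
print(no_space)
```

The lazy (.+?) stops at the first closing quote, which is what keeps id="image203" out of the alt capture.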

Edited by PKHelloNasty

  • 0

Both of those assume that the alt attribute appears before the src attribute. They also assume that the values are quoted, which we all know ain't necessarily so when it comes to FrontPage-created pages (or HTML4 for that matter). PKHN's also assumes you're writing XHTML. Countering these assumptions makes for an annoyingly complicated regex, or calls for a selection that drills down to what you're after: first get an image tag, then get its attributes, and then look through those to finally retrieve what you're after.

Lest we forget, > can appear unencoded in a quoted attribute value (though frowned upon), so /\<img[^>]*>/ might only match '<img alt="My pic" title="My pictures >' in '<img alt="My pic" title="My pictures > my pic" src=mypic.gif>' (note unquoted src attribute as there aren't any spaces in the value), meaning you have to check if the occurrence of > is before end of string or the next tag start.
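A rough sketch of that drill-down, in Python for illustration only: first match a whole img tag while treating quoted values as units (so a > inside quotes does not end the tag), then pick the attributes out of it. The names IMG_TAG, ATTR and img_attributes are invented for this example, and it is not a full HTML parser; an unbalanced quote inside a tag will make it miss that tag.

```python
import re

# Naive tag matcher: stops at the first ">", even inside a quoted value.
NAIVE = re.compile(r'<img[^>]*>')

# Step 1: match a whole <img ...> tag. Quoted values are consumed as units,
# so a ">" inside quotes does not end the tag early.
IMG_TAG = re.compile(r'''<img\b(?:"[^"]*"|'[^']*'|[^>"'])*>''', re.IGNORECASE)

# Step 2: pull name=value pairs out of one tag. The value may be
# double-quoted, single-quoted, or unquoted (HTML4 style).
ATTR = re.compile(
    r'''([a-zA-Z][-\w:]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))''')

def img_attributes(html):
    """Return one dict (lowercased attribute name -> value) per <img> tag."""
    result = []
    for tag in IMG_TAG.finditer(html):
        attrs = {}
        for name, dq, sq, bare in ATTR.findall(tag.group(0)):
            # Exactly one of the three value groups is filled in.
            attrs[name.lower()] = dq or sq or bare
        result.append(attrs)
    return result

sample = '<img alt="My pic" title="My pictures > my pic" src=mypic.gif>'
print(NAIVE.search(sample).group(0))  # cut off inside the title value
print(img_attributes(sample))         # alt, title and src, in any order
```

Because the attributes end up in a dict, the order of alt and src no longer matters, and a missing attribute can be handled with attrs.get('alt').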


  • 0
Quoting PKHelloNasty:

This is working for me. I don't know why people keep using the complicated regex from the other thread.

\<img.+?alt="(.+?)".+?src="(.+?)".+?\/>

Thanks...

I also tested this solution from regexadvice. It works fine, but as you said, it isn't pretty...

<img[^>]*?alt=\x22([^\x22]*)\x22[^>]*?src=\x22([^\x22]*)[^>]*?>|<img[^>]*?src=\x22([^\x22]*)\x22[^>]*?alt=\x22([^\x22]*)\x22[^>]*?>

It should match both *src* and *alt* regardless of the order in which they appear in the *img* tag.
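One thing to watch with this pattern is that the capture groups differ between the two alternatives: when alt comes first, groups 1 and 2 hold alt and src, and when src comes first, groups 3 and 4 hold src and alt. A small sketch in Python (used here only for illustration; the helper name alt_and_src and the test tags are made up) that normalises the result:

```python
import re

# The regexadvice pattern from above, unchanged. \x22 is the double quote.
PATTERN = re.compile(
    r'<img[^>]*?alt=\x22([^\x22]*)\x22[^>]*?src=\x22([^\x22]*)[^>]*?>'
    r'|<img[^>]*?src=\x22([^\x22]*)\x22[^>]*?alt=\x22([^\x22]*)\x22[^>]*?>')

def alt_and_src(tag):
    """Return (alt, src) for the first matching <img> tag, or None."""
    m = PATTERN.search(tag)
    if m is None:
        return None
    if m.group(1) is not None:
        # First alternative matched: alt came before src.
        return m.group(1), m.group(2)
    # Second alternative matched: src came before alt.
    return m.group(4), m.group(3)

print(alt_and_src('<img alt="A cat" id="image203" src="cat.jpg" />'))
print(alt_and_src('<img src="cat.jpg" alt="A cat">'))
```

Both calls return the same (alt, src) pair. The pattern still requires double-quoted values, so unquoted or single-quoted attributes will not match.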


This topic is now closed to further replies.