[PHP] Little Regular Expression help


Question

Hey guys,

This is it for the day, I promise :p

I currently have this code for a regular expression and replacement:

$patterns = '/(.*?)(http:\/\/)*(www\S*)/'; 			// replace urls with CLICK
$replacements = '$1<a href="http://$3" target="blank">(Click)</A>';
$cjmsg = preg_replace($patterns, $replacements, $cjmsg);

This regex matches either http://www.* or www.*, but I need it to match http://* as well.

I guess this means eradicating the http://www.* case and handling http://* and www.* separately, but I have no idea how to do this.

Thanks,

Garnett


11 answers to this question


  • 0

Thanks, but that throws up quite a few problems

www.* is fine

http://www.* makes http://http//www.(whateverelse)

http://* doesn't do anything

However, changing it to

'/(.*?)(((http:\/\/)*(www\S*))|(http:\/\/\S*))/' does the same as above, except that http://* is now caught, but the link only gets a / after http://

Thanks for trying, any more help would be appreciated

Thanks,

Garnett
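
For what it's worth, with the pattern just above and assuming the replacement string from the first post is unchanged, $3 holds the whole "http://www..." text, so 'http://$3' doubles the scheme, and for a plain http://* address $3 is empty. A minimal sketch of one way round that: consume "http://" when it is there, otherwise only look ahead for "www.", and capture everything after the scheme in a single group. The function name click_links is made up for illustration, and \S+ is kept from the original pattern, so trailing punctuation still ends up inside the link.

```php
<?php
// Hypothetical helper, not from the thread: wraps http://* and www.* in a (Click) link.
function click_links(string $msg): string
{
    // \b            : start at a word boundary
    // http://       : consumed, so it is NOT part of $1
    // (?=www\.)     : lookahead only, so "www." stays in $1
    // (\S+)         : the rest of the address, without the scheme
    $pattern = '~\b(?:http://|(?=www\.))(\S+)~i';

    // The scheme is always added back here, exactly once.
    return preg_replace($pattern, '<a href="http://$1" target="_blank">(Click)</a>', $msg);
}

echo click_links("www.google.co.uk\nhttp://www.google.co.uk\nhttp://google.co.uk\n");
```

All three forms end up with a single http:// in the href, because the scheme is never part of the captured group.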

  • 0

<?php

	$test_data = "Test 1: http://www.google.co.uk\n";
	$test_data.= "Test 2: http://maps.google.co.uk\n";
	$test_data.= "Test 3: http://google.co.uk\n";
	$test_data.= "Test 4: www.google.co.uk\n";
	$test_data.= "Test 5: maps.google.co.uk\n";

	echo preg_replace('/(\s|\()((http:\/\/([^:\s]+:?[^@\s]+@)?[_\.0-9a-z-]*(:\d+)?([\/?#]\S*[^),\.\s])?)|(www\.[_\.0-9a-z-]*(:\d+)?([\/?#]\S*[^),\.\s])?))/iu', '$1<a href="$2">(Click)</a>', $test_data);
?>

Matches everything but the last maps.google.co.uk, which is what you want it to do, right?

  • 0

Indeed. You can simply prefix the string with a space before running it through preg_replace then remove it afterwards. If you put the whole lot in a function you can make it entirely transparent to the calling script:

<?php

function links($str)
{
	$str = ' '. $str;

	$str = preg_replace('/(\s|\()((http:\/\/([^:\s]+:?[^@\s]+@)?[_\.0-9a-z-]*(:\d+)?([\/?#]\S*[^),\.\s])?)|(www\.[_\.0-9a-z-]*(:\d+)?([\/?#]\S*[^),\.\s])?))/iu', '$1<a href="$2">(Click)</a>', $str);

	return substr($str, 1);
}

$test_data = "http://www.google.co.uk should link to Google\n";
$test_data.= "Google Maps is here: http://maps.google.co.uk\n";
$test_data.= "http://google.co.uk\n";
$test_data.= "www.google.co.uk is also Google\n";
$test_data.= "Finally, maps.google.co.uk doesn't link anywhere\n";

echo links($test_data);

?>

  • 0

Thanks, that works well. However, if I put in jhhd www.google.co.uk then the link doesn't work: it puts in the address of my website followed by the link I entered

e.g. http://calltoarms.net/forums/GarnettsC2ASi...ww.google.co.uk
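
The link comes out relative because an href of "www.google.co.uk" has no scheme, so the browser resolves it against the current page. One possible single-pass fix, sketched here with preg_replace_callback, is to add http:// only when the match doesn't already start with it. The function name link_with_scheme and the simplified pattern are assumptions for illustration (the username:password and port parts are left out to keep it short).

```php
<?php
// Hypothetical helper, not from the thread.
function link_with_scheme(string $str): string
{
    // Simplified pattern: a URL follows whitespace or "(", starts with
    // "http://" or "www.", and may not end in ")", "," or ".".
    $pattern = '/(\s|\()((?:http:\/\/|www\.)[^\s),]*[^\s),.])/i';

    $callback = function (array $m): string {
        $url = $m[2];
        // Prepend the scheme only if it is missing.
        $href = stripos($url, 'http://') === 0 ? $url : 'http://' . $url;
        return $m[1] . '<a href="' . htmlspecialchars($href, ENT_QUOTES) . '">(Click)</a>';
    };

    // Same trick as before: pad with a space so a URL at the very start matches,
    // then strip the padding again.
    return substr(preg_replace_callback($pattern, $callback, ' ' . $str), 1);
}

echo link_with_scheme("jhhd www.google.co.uk\n");
```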

  • 0

Here's version 3 :)

Had to split the regular expression into two separate calls so that www. links get http:// added to the href, but otherwise it still does the same thing.

<?php

function links($str)
{
	$str = ' '. $str;

	$str = preg_replace('/(\s|\()(http:\/\/([^:\s]+:?[^@\s]+@)?[_\.0-9a-z-]*(:\d+)?([\/?#]\S*[^),\.\s])?)/iu', '$1<a href="$2">$2</a>', $str);
	$str = preg_replace('/(\s|\()(www\.[_\.0-9a-z-]*(:\d+)?([\/?#]\S*[^),\.\s])?)/iu', '$1<a href="http://$2">$2</a>', $str);

	return substr($str, 1);
}

$test_data = "http://www.google.co.uk should link to Google\n";
$test_data.= "Google Maps is here: http://maps.google.co.uk\n";
$test_data.= "http://google.co.uk\n";
$test_data.= "www.google.co.uk is also Google\n";
$test_data.= "While maps.google.co.uk doesn't link anywhere\n";

echo links($test_data);

?>

  • 0

(\s|\() - Matches any whitespace character or an open bracket, in case the user puts the URL in brackets

(http:\/\/ - Matches http:// (the opening bracket starts the group that captures the whole URL as $2)

([^:\s]+:?[^@\s]+@)? - Matches the optional username:password@ portion

[_\.0-9a-z-]* - Matches 0 or more of _ . 0-9 a-z and - (the i flag makes this case-insensitive)

(:\d+)? - Matches the optional port number that follows the domain.

([\/?#]\S*[^),\.\s])?) - Matches an optional slash, question mark or hash (for paths, query strings and named anchors), followed by any number of non-white-space characters, ending in a character that's not a closing bracket, comma, full stop or white-space character. This is basically everything that appears after the domain name (i.e. the path and filename). Because the last character can't be a closing bracket, comma or full stop, punctuation that trails the URL in a sentence is left out of the link. This entire part is optional, as noted by the question mark; the final closing bracket closes the group opened before http:\/\/.

The second expression is a bit simpler. It doesn't include the username:password bit and changes the main match so that it is prefixed by "www.", otherwise it is exactly the same.
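
The breakdown above can also be written into the pattern itself using the x (extended) modifier, which ignores whitespace outside character classes and treats # as the start of a comment. This is only a sketch of the first expression: the ~ delimiter is swapped in so the slashes need no escaping, and the function name url_parts and the array keys are made up for illustration.

```php
<?php
// Hypothetical helper, not from the thread: returns the pieces the first
// expression captures, or null if no http:// URL is found.
function url_parts(string $text): ?array
{
    $pattern = '~
        (\s|\()                    # $1: whitespace or an opening bracket
        (                          # $2: the whole URL
            http://                #   literal scheme
            ([^:\s]+:?[^@\s]+@)?   #   $3: optional username:password@
            [_.0-9a-z-]*           #   host name characters
            (:\d+)?                #   $4: optional port
            ([/?#]\S*[^),.\s])?    #   $5: optional path, query or fragment
        )
    ~ixu';

    // Pad with a space, as in links(), so a URL at the very start can match.
    if (!preg_match($pattern, ' ' . $text, $m)) {
        return null;
    }

    // Trailing groups that did not take part in the match are absent from $m.
    return [
        'url'      => $m[2],
        'userinfo' => $m[3] ?? '',
        'port'     => $m[4] ?? '',
        'path'     => $m[5] ?? '',
    ];
}

print_r(url_parts('see http://user:pw@maps.google.co.uk:8080/path?q=1, more'));
```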

This topic is now closed to further replies.