I need a reg_ex for urls


Go to solution Solved by mollick2,

Question

Original Poster    324

Hello! I need a regular expression to check a string (a URL) to see if it's a landing URL (e.g. neowin.net) or a sub page (e.g. neowin.net/forums), or, if possible, whether the URL has any file extension (css, html, png).

7 answers to this question

  • 0
mollick2    19

You can actually accomplish this with a single regular expression. No need for a bunch of explodes and multiple regular expressions.

 

Edit: Here's a link so you can test it: http://regex101.com/r/xT3wZ9/1

(^https?:\/\/?)?([w]{3}|[a-z0-9]+\.)?([a-z0-9]+)(\.[a-z\.]{2,6}+)(\/[a-z0-9\-\_\/]+)+(\.[a-z0-9]+)?(\?[a-z0-9\=\-\_\&\+]*)?

Using this in a preg_match will get you the different parts of the URL.

$url = "http://www.neowin.net/forum/topic/1229911-i-need-a-reg-ex-for-urls/";
$regex = "/(^https?:\/\/?)?([w]{3}|[a-z0-9]+\.)?([a-z0-9]+)(\.[a-z\.]{2,6}+)(\/[a-z0-9\-\_\/]+)+(\.[a-z0-9]+)?(\?[a-z0-9\=\-\_\&\+]*)?/i";
// preg_match() returns 1 or 0; the captured groups land in the third argument.
preg_match($regex, $url, $url_parts);

/**
* 
* $url_parts:
*
* Array
* (
*     [0] => http://www.neowin.net/forum/topic/1229911-i-need-a-reg-ex-for-urls/
*     [1] => http://  	// Protocol
*     [2] => www.  		//Subdomain or www
*     [3] => neowin  	//Domain
*     [4] => .net  		//TLD
*     [5] => /forum/topic/1229911-i-need-a-reg-ex-for-urls/  //directory & file
* )
*
**/

It also works for non-rewritten URLs:

$url = "http://www.example.net/app/index.php?pagename=account&id=0983477868";
$regex = "/(^https?:\/\/?)?([w]{3}|[a-z0-9]+\.)?([a-z0-9]+)(\.[a-z\.]{2,6}+)(\/[a-z0-9\-\_\/]+)+(\.[a-z0-9]+)?(\?[a-z0-9\=\-\_\&\+]*)?/i";
// preg_match() returns 1 or 0; the captured groups land in the third argument.
preg_match($regex, $url, $url_parts);

/**
*
* $url_parts
*
* Array
* (
*    [0] => http://www.example.net/app/index.php?pagename=account&id=0983477868
*    [1] => http://		//protocol
*    [2] => www.		//subdomain or www
*    [3] => example		//domain name
*    [4] => .net 		//TLD
*    [5] => /app/index 	//Directory & file
*    [6] => .php 		//File extension
*    [7] => ?pagename=account&id=0983477868 //Query
* )
* 
**/
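
To tie this back to the original question, here is a rough sketch of how the same pattern could sort a URL into "landing", "sub page" or "file". The helper name classify_url() is made up for illustration. Treating "no match" as a landing URL is an assumption: it relies on the pattern requiring at least one path segment, and since the pattern is not anchored, any string it cannot match at all would also be reported as "landing".

```php
<?php
// Pattern copied from the post above; classify_url() is a hypothetical helper.
$regex = '/(^https?:\/\/?)?([w]{3}|[a-z0-9]+\.)?([a-z0-9]+)(\.[a-z\.]{2,6}+)(\/[a-z0-9\-\_\/]+)+(\.[a-z0-9]+)?(\?[a-z0-9\=\-\_\&\+]*)?/i';

function classify_url($regex, $url) {
    // preg_match() returns 1 on a match and fills $parts with the captured groups.
    if (preg_match($regex, $url, $parts) !== 1) {
        // Assumption: the pattern needs a path segment, so a bare domain does not match.
        return "landing";
    }
    // Group 6 is the optional file extension, e.g. ".php".
    if (!empty($parts[6])) {
        return "file (" . ltrim($parts[6], ".") . ")";
    }
    return "sub page";
}

echo classify_url($regex, "neowin.net"), "\n";                                            // landing
echo classify_url($regex, "neowin.net/forums"), "\n";                                     // sub page
echo classify_url($regex, "http://www.example.net/app/index.php?pagename=account"), "\n"; // file (php)
```

The extension check uses !empty() because preg_match() leaves trailing groups that did not take part in the match out of the result array.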
  • 0
Seahorsepip    610

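// Assumes $url has no scheme, e.g. "neowin.net/forums/style.css"
$parts = explode("/", $url, 2);
$urlend = isset($parts[1]) ? $parts[1] : "";   // everything after the domain

// Extension of the last path segment, "" if there is none
$filetype = pathinfo($urlend, PATHINFO_EXTENSION);

if ($urlend) {
    echo "url has something behind the domain";
    if ($filetype) {
        echo "filetype is $filetype";
    }
}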

Just some messy code written on my phone :P

  • 0
Seahorsepip    610

Code fails with pages like neowin.net/?lol=true

It treats the stuff behind the domain as a sub page. You could add an extra check to see whether $urlend starts with "?".
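
Something like this could work as that extra check. It is only a sketch: url_end() and is_sub_page() are made-up helper names, and it assumes the URL comes without a scheme (no "http://" in front), same as the code above.

```php
<?php
// Hypothetical helper: everything after the first "/" of a scheme-less URL.
function url_end($url) {
    $parts = explode("/", $url, 2);
    return isset($parts[1]) ? $parts[1] : "";
}

// Hypothetical helper: true only if there is a real path behind the domain.
function is_sub_page($url) {
    $urlend = url_end($url);
    // An empty rest, or one that starts with "?", still counts as the landing page.
    return $urlend !== "" && $urlend[0] !== "?";
}

var_dump(is_sub_page("neowin.net/?lol=true")); // bool(false)
var_dump(is_sub_page("neowin.net/forums"));    // bool(true)
```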

  • 0
Original Poster    324

Yeah, I did it in the end lol, though I went into a lot more detail: I have around 7 explodes and 4 regular expressions :P to rip apart logs, file types, URLs, error codes, and lots more xD I built my own not-too-shabby Squid analytics system in about half a day... quite happy with myself.

  • 0
Seahorsepip    610

 

Nice regex, I really need to learn more about regex myself :P

  • 0
Original Poster    324

 

Wow, very nice xD I have never been that good with regex, as I hardly use it and never needed to know more than the basics... Thank you for this, it will help me a lot in this and future projects.

  • 0
mollick2    19

Yeah, I used to hate working with regex until recently. Using tools like regex101 or regexpal makes it a lot easier to see exactly what you're doing.
