
I need a reg_ex for urls

Question

Original Poster    324

Hello! I need a regular expression to check a string (a URL) to see if it's a landing URL (e.g. neowin.net), a sub page (neowin.net/forums), or, if possible, whether the URL has ANY file extension (css, html, png).


7 answers to this question

Recommended Posts

  • 0
mollick2    19

You can actually accomplish this with a single regular expression. No need for a bunch of explodes and multiple regular expressions.

 

Edit: Here's a link so you can test it: http://regex101.com/r/xT3wZ9/1

(^https?:\/\/?)?([w]{3}|[a-z0-9]+\.)?([a-z0-9]+)(\.[a-z\.]{2,6}+)(\/[a-z0-9\-\_\/]+)+(\.[a-z0-9]+)?(\?[a-z0-9\=\-\_\&\+]*)?

Using this in a preg_match will get you the different parts of the URL.

$url = "http://www.neowin.net/forum/topic/1229911-i-need-a-reg-ex-for-urls/";
$regex = "/(^https?:\/\/?)?([w]{3}|[a-z0-9]+\.)?([a-z0-9]+)(\.[a-z\.]{2,6}+)(\/[a-z0-9\-\_\/]+)+(\.[a-z0-9]+)?(\?[a-z0-9\=\-\_\&\+]*)?/i";
preg_match($regex, $url, $url_parts); // preg_match() itself returns 1 or 0; the captured groups are written to $url_parts

/**
* 
* $url_parts:
*
* Array
* (
*     [0] => http://www.neowin.net/forum/topic/1229911-i-need-a-reg-ex-for-urls/
*     [1] => http://  	// Protocol
*     [2] => www.  		//Subdomain or www
*     [3] => neowin  	//Domain
*     [4] => .net  		//TLD
*     [5] => /forum/topic/1229911-i-need-a-reg-ex-for-urls/  //directory & file
* )
*
**/

It also works for URLs that aren't rewritten, i.e. ones with a file name and a query string:

$url = "http://www.example.net/app/index.php?pagename=account&id=0983477868";
$regex = "/(^https?:\/\/?)?([w]{3}|[a-z0-9]+\.)?([a-z0-9]+)(\.[a-z\.]{2,6}+)(\/[a-z0-9\-\_\/]+)+(\.[a-z0-9]+)?(\?[a-z0-9\=\-\_\&\+]*)?/i";
preg_match($regex, $url, $url_parts); // preg_match() itself returns 1 or 0; the captured groups are written to $url_parts

/**
*
* $url_parts
*
* Array
* (
*    [0] => http://www.example.net/app/index.php?pagename=account&id=0983477868
*    [1] => http://		//protocol
*    [2] => www.		//subdomain or www
*    [3] => example		//domain name
*    [4] => .net 		//TLD
*    [5] => /app/index 	//Directory & file
*    [6] => .php 		//File extension
*    [7] => ?pagename=account&id=0983477868 //Query
* )
* 
**/
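
To tie this back to the original question, here is a rough sketch of how the captured groups could be turned into a landing / sub page / file answer. The function name classify_url() and the three labels are made up for illustration. One thing to be aware of: the pattern requires at least one path segment, so a bare domain such as neowin.net does not match at all. The sketch simply treats "no match" as a landing URL, which is an assumption and will also swallow strings that are not URLs.

```php
<?php
// Pattern copied from the post above; the delimiters and the i flag are the same as in the examples.
const URL_REGEX = '/(^https?:\/\/?)?([w]{3}|[a-z0-9]+\.)?([a-z0-9]+)(\.[a-z\.]{2,6}+)(\/[a-z0-9\-\_\/]+)+(\.[a-z0-9]+)?(\?[a-z0-9\=\-\_\&\+]*)?/i';

// Hypothetical helper: returns "landing", "subpage" or "file (<extension>)".
function classify_url(string $url): string
{
    if (!preg_match(URL_REGEX, $url, $parts)) {
        // Assumption: the pattern needs a path segment, so no match means no sub page.
        return 'landing';
    }

    // Trailing groups that did not take part in the match are absent from $parts.
    $extension = $parts[6] ?? '';

    return $extension !== ''
        ? 'file (' . ltrim($extension, '.') . ')'
        : 'subpage';
}

foreach (['neowin.net', 'neowin.net/forums', 'neowin.net/style.css'] as $url) {
    echo $url, ' => ', classify_url($url), PHP_EOL;
}
```

Group [6] is only filled when the path ends in something like .php or .css, which is why checking that one index is enough to spot a file extension.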
  • 0
Seahorsepip    610

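// Assumes $url has no scheme, e.g. "neowin.net/forums".
$urlend = explode("/", $url, 2)[1] ?? "";   // everything after the first "/", or "" if there is none
$pieces = explode(".", $urlend);
// Only treat the last piece as a file type if $urlend actually contains a dot.
$filetype = count($pieces) > 1 ? $pieces[count($pieces) - 1] : "";
if ($urlend) {
    echo "url has something behind the domain";
    if ($filetype) {
        echo "filetype is $filetype";
    }
}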

Just some messy code written on my phone :P

  • 0
Seahorsepip    610

Code fails with pages like neowin.net/?lol=true

It sees the stuff behind the URL as a subpage; you could add an extra check to see whether $urlend starts with "?".
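
One way to sidestep the query-string problem, sketched here as an alternative rather than as anyone's actual solution from this thread, is to let PHP's built-in parse_url() split off the query and pathinfo() read the extension. The function name describe_url() and the returned array keys are hypothetical. parse_url() only recognises the host when a scheme is present, so the sketch prepends http:// to inputs like neowin.net/forums.

```php
<?php
// Hypothetical helper: reports whether a URL is a landing page and which file extension it has.
function describe_url(string $url): array
{
    // Without a scheme, parse_url() would treat "neowin.net/forums" as one long path.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }

    // PHP_URL_PATH excludes the query string, so "/?lol=true" yields just "/".
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        $path = ''; // null when there is no path, false when the URL is seriously malformed
    }

    return [
        'is_landing' => $path === '' || $path === '/',
        'extension'  => pathinfo($path, PATHINFO_EXTENSION), // "" when there is none
    ];
}

var_dump(describe_url('neowin.net/?lol=true'));
var_dump(describe_url('neowin.net/img/logo.png'));
```

Because the query string never reaches the path check, neowin.net/?lol=true is reported as a landing page instead of a subpage.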

  • 0
Original Poster    324


Yeah, I did it in the end lol, though I went into a lot more detail: I have around 7 explodes and 4 regular expressions :P to rip apart logs, file types, URLs, error codes, and lots more xD I built my own not-too-shabby Squid analytics system in about half a day... quite happy with myself

  • 0
Seahorsepip    610

 


Nice regex expression, I really need to learn more about regex myself :P

  • 0
Original Poster    324

 


Wow, very nice xD I have never been that good with regex as I hardly use it, so I never needed to know more than the basics... Thank you for this; it will help me a lot in this and future projects.

  • 0
mollick2    19

Yeah, I used to hate working with regex until recently. Using tools like regex101 or regexpal makes it a lot easier to see exactly what you're doing.
