I need a reg_ex for urls

7 replies to this topic

#1 Original Poster

Original Poster

    Systems Developer

  • Tech Issues Solved: 1
  • Joined: 15-July 08
  • Location: my room
  • OS: windows 7/8, Kali, ubuntu, OSx 10.9
  • Phone: Android

Posted 17 September 2014 - 10:20

Hello! I need a regular expression to check a string (a URL) to see if it's a landing URL (e.g. neowin.net) or a sub page (neowin.net/forums), or, if possible, to see whether the URL has any file extension (css, html, png).






#2 +Seahorsepip

Seahorsepip

    http://seapip.com

  • Tech Issues Solved: 27
  • Joined: 23-January 11
  • Location: Netherlands
  • OS: Windows 8.1 Pro
  • Phone: Nexus 5

Posted 17 September 2014 - 21:04

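// Everything after the first "/" (assumes $url has no "http://" prefix).
$parts = explode("/", $url, 2);
$urlend = isset($parts[1]) ? $parts[1] : "";
// Text after the last "." in that part, or "" when it contains no dot.
$pieces = explode(".", $urlend);
$filetype = count($pieces) > 1 ? $pieces[count($pieces) - 1] : "";

if ($urlend) {
    echo "url has something behind the domain";
    if ($filetype) {
        echo "filetype is $filetype";
    }
}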


Just some messy code written on my phone :p

#3 +Seahorsepip

Seahorsepip

Posted 17 September 2014 - 21:09

The code fails with pages like neowin.net/?lol=true.
It sees the stuff behind the domain as a sub page; you could add an extra check to see whether $urlend starts with "?".
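
A rough sketch of that extra check, assuming PHP's built-in parse_url() and pathinfo() are acceptable in place of explode(). The function name classify_url() and the labels it returns are made up for illustration and are not from the code above:

```php
<?php
// Hypothetical helper: classify_url() and its labels are illustrative only.
function classify_url($url)
{
    // parse_url() only recognises the host when a scheme is present,
    // so prepend one for inputs such as "neowin.net/forums".
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        $url = 'http://' . $url;
    }

    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return array('type' => 'invalid', 'extension' => '');
    }

    // The query string ("?lol=true") is reported separately from the path,
    // so it is not mistaken for a sub page.
    $path = isset($parts['path']) ? $parts['path'] : '';
    $type = ($path === '' || $path === '/') ? 'landing' : 'subpage';

    // pathinfo() gives an empty string when the last segment has no dot.
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    return array('type' => $type, 'extension' => $extension);
}
```

With this, neowin.net/?lol=true comes back as a landing page, neowin.net/forums as a sub page, and a path ending in something like site.css reports "css" as its extension.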

#4 OP Original Poster

Original Poster

Posted 19 September 2014 - 12:35

Yeah, I did it in the end lol, though I went into a lot more detail: I have around 7 explodes and 4 regular expressions :p to rip apart logs, file types, URLs, error codes, and lots more xD. I built my own not-too-shabby Squid analytics system in about half a day... quite happy with myself.



#5 mollick2

mollick2

    Neowinian

  • Tech Issues Solved: 3
  • Joined: 12-June 08

Posted 20 September 2014 - 20:38   Best Answer

You can actually accomplish this with a single regular expression. No need for a bunch of explodes and multiple regular expressions.

Edit: Here's a link so you can test it: http://regex101.com/r/xT3wZ9/1

(^https?:\/\/?)?([w]{3}|[a-z0-9]+\.)?([a-z0-9]+)(\.[a-z\.]{2,6}+)(\/[a-z0-9\-\_\/]+)+(\.[a-z0-9]+)?(\?[a-z0-9\=\-\_\&\+]*)?

Passing this to preg_match() together with an array for the matches will get you the different parts of the URL.

$url = "http://www.neowin.net/forum/topic/1229911-i-need-a-reg-ex-for-urls/";
$regex = "/(^https?:\/\/?)?([w]{3}|[a-z0-9]+\.)?([a-z0-9]+)(\.[a-z\.]{2,6}+)(\/[a-z0-9\-\_\/]+)+(\.[a-z0-9]+)?(\?[a-z0-9\=\-\_\&\+]*)?/i";
// preg_match() returns 1 or 0; the captured parts are written to the third argument.
preg_match($regex, $url, $url_parts);

/**
* 
* $url_parts:
*
* Array
* (
*     [0] => http://www.neowin.net/forum/topic/1229911-i-need-a-reg-ex-for-urls/
*     [1] => http://  	// Protocol
*     [2] => www.  		//Subdomain or www
*     [3] => neowin  	//Domain
*     [4] => .net  		//TLD
*     [5] => /forum/topic/1229911-i-need-a-reg-ex-for-urls/  //directory & file
* )
*
**/

It also works for non-rewritten URLs (ones with a file extension and a query string):

$url = "http://www.example.net/app/index.php?pagename=account&id=0983477868";
$regex = "/(^https?:\/\/?)?([w]{3}|[a-z0-9]+\.)?([a-z0-9]+)(\.[a-z\.]{2,6}+)(\/[a-z0-9\-\_\/]+)+(\.[a-z0-9]+)?(\?[a-z0-9\=\-\_\&\+]*)?/i";
// preg_match() returns 1 or 0; the captured parts are written to the third argument.
preg_match($regex, $url, $url_parts);

/**
*
* $url_parts
*
* Array
* (
*    [0] => http://www.example.net/app/index.php?pagename=account&id=0983477868
*    [1] => http://		//protocol
*    [2] => www.		//subdomain or www
*    [3] => example		//domain name
*    [4] => .net 		//TLD
*    [5] => /app/index 	//Directory & file
*    [6] => .php 		//File extension
*    [7] => ?pagename=account&id=0983477868 //Query
* )
* 
**/
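
One way the captured groups might be turned into an answer to the original question, as a sketch only. The pattern is copied unchanged from this post; describe_url() and its array keys are hypothetical names. Note that the path group is mandatory in this pattern, so a bare landing URL such as neowin.net produces no match at all, which the sketch reports as null:

```php
<?php
// Pattern copied from the post; describe_url() and its keys are
// hypothetical names used only for this sketch.
function describe_url($url)
{
    $regex = "/(^https?:\/\/?)?([w]{3}|[a-z0-9]+\.)?([a-z0-9]+)(\.[a-z\.]{2,6}+)(\/[a-z0-9\-\_\/]+)+(\.[a-z0-9]+)?(\?[a-z0-9\=\-\_\&\+]*)?/i";

    // preg_match() returns 1 on a match and fills $m with the captured groups.
    if (preg_match($regex, $url, $m) !== 1) {
        // No "/path" part was found, e.g. for a bare domain.
        return null;
    }

    return array(
        'domain'    => $m[3] . $m[4],
        'path'      => $m[5],
        // Trailing groups that did not take part in the match are absent
        // from $m; skipped groups in the middle are empty strings.
        'extension' => (isset($m[6]) && $m[6] !== '') ? ltrim($m[6], '.') : '',
        'query'     => isset($m[7]) ? $m[7] : '',
    );
}
```

A null result can then be treated as "probably a landing URL", a non-empty 'path' as a sub page, and a non-empty 'extension' as a file request, though that reading is an assumption layered on top of the regex rather than something it guarantees.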


#6 +Seahorsepip

Seahorsepip

Posted 22 September 2014 - 05:22

 

Nice regex expression, I really need to learn more about regex myself :p



#7 OP Original Poster

Original Poster

Posted 22 September 2014 - 07:24

 

Wow, very nice xD. I have never been that good with regex, as I hardly use it and so never needed to know more than the basics... Thank you for this; it will help me a lot in this and future projects.



#8 mollick2

mollick2

Posted 22 September 2014 - 22:53

Yeah, I used to hate working with regex until recently. Using tools like regex101 or regexpal makes it a lot easier to see exactly what you're doing.