I need a reg_ex for urls


Question

Hello! I need a regular expression to check a string (a URL) to see whether it's a landing URL (e.g. neowin.net) or a sub page (e.g. neowin.net/forums), or, if possible, whether the URL has any file extension (css, html, png).


  • 0

You can actually accomplish this with a single regular expression. No need for a bunch of explodes and multiple regular expressions.

Edit: Here's a link so you can test it: http://regex101.com/r/xT3wZ9/1

(^https?:\/\/?)?([w]{3}|[a-z0-9]+\.)?([a-z0-9]+)(\.[a-z\.]{2,6}+)(\/[a-z0-9\-\_\/]+)+(\.[a-z0-9]+)?(\?[a-z0-9\=\-\_\&\+]*)?

Using this in preg_match will get you the different parts of the URL. Note that preg_match itself only returns 1 or 0; the parts are written to the array passed as the third argument.

$url = "https://www.neowin.net/forum/topic/1229911-i-need-a-reg-ex-for-urls/";
$regex = "/(^https?:\/\/?)?([w]{3}|[a-z0-9]+\.)?([a-z0-9]+)(\.[a-z\.]{2,6}+)(\/[a-z0-9\-\_\/]+)+(\.[a-z0-9]+)?(\?[a-z0-9\=\-\_\&\+]*)?/i";
preg_match($regex, $url, $url_parts);

/**
* 
* $url_parts:
*
* Array
* (
*     [0] => https://www.neowin.net/forum/topic/1229911-i-need-a-reg-ex-for-urls/
*     [1] => https://  	// Protocol
*     [2] => www.  		//Subdomain or www
*     [3] => neowin  	//Domain
*     [4] => .net  		//TLD
*     [5] => /forum/topic/1229911-i-need-a-reg-ex-for-urls/  //directory & file
* )
*
**/
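
As a rough sketch of how the captured groups might be consumed, the pattern can be wrapped in a small function that labels each part. The name url_parts and the array keys are hypothetical, not from this post. Because the pattern requires at least one "/path" segment, a bare landing URL such as neowin.net does not match at all, and the sketch returns null in that case.

```php
<?php
// Hypothetical wrapper around the pattern from this post.
function url_parts(string $url): ?array
{
    $regex = '/(^https?:\/\/?)?([w]{3}|[a-z0-9]+\.)?([a-z0-9]+)(\.[a-z\.]{2,6}+)(\/[a-z0-9\-\_\/]+)+(\.[a-z0-9]+)?(\?[a-z0-9\=\-\_\&\+]*)?/i';

    // preg_match() returns 1 on a match, 0 on no match and false on error;
    // the captured groups are written to $m.
    if (preg_match($regex, $url, $m) !== 1) {
        return null;
    }

    // Trailing groups that did not take part in the match are absent
    // from $m, so the last two need a default.
    return [
        'protocol'  => $m[1],
        'subdomain' => $m[2],
        'domain'    => $m[3],
        'tld'       => $m[4],
        'path'      => $m[5],
        'extension' => $m[6] ?? '',
        'query'     => $m[7] ?? '',
    ];
}
```

A null result can then be treated as "landing URL or not a URL", and a non-empty 'extension' as "has a file extension".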

It also works for non-rewritten URLs (a file name with an extension plus a query string):

$url = "http://www.example.net/app/index.php?pagename=account&id=0983477868";
$regex = "/(^https?:\/\/?)?([w]{3}|[a-z0-9]+\.)?([a-z0-9]+)(\.[a-z\.]{2,6}+)(\/[a-z0-9\-\_\/]+)+(\.[a-z0-9]+)?(\?[a-z0-9\=\-\_\&\+]*)?/i";
preg_match($regex, $url, $url_parts);

/**
*
* $url_parts
*
* Array
* (
*    [0] => http://www.example.net/app/index.php?pagename=account&id=0983477868
*    [1] => http://		//protocol
*    [2] => www.		//subdomain or www
*    [3] => example		//domain name
*    [4] => .net 		//TLD
*    [5] => /app/index 	//Directory & file
*    [6] => .php 		//File extension
*    [7] => ?pagename=account&id=0983477868 //Query
* )
* 
**/
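
For the original three-way question (landing URL, sub page, or file), a different technique from the regex above may be enough: PHP's built-in parse_url() and pathinfo(). The following is only a sketch under that assumption; the function name classify_url and the returned keys are made up for illustration.

```php
<?php
// Hypothetical helper, not from the thread: classifies a URL as a landing
// page, a sub page, or a file, using parse_url() and pathinfo().
function classify_url(string $url): array
{
    // parse_url() only recognises the host when a scheme is present,
    // so prepend one for inputs such as "neowin.net/forums".
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }

    // The path component is null when absent and false for malformed URLs.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        $path = '';
    }

    // Extension of the last path segment, or '' when there is none.
    $extension = pathinfo($path, PATHINFO_EXTENSION);

    if ($path === '' || $path === '/') {
        $type = 'landing';
    } elseif ($extension !== '') {
        $type = 'file';
    } else {
        $type = 'subpage';
    }

    return ['type' => $type, 'extension' => $extension];
}
```

Since the query string is parsed separately from the path, a URL like neowin.net/?lol=true is still classified as a landing page here.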

  • 0

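// Assumes $url has no scheme, e.g. "neowin.net/forums/style.css"
$urlend = explode("/", $url, 2)[1] ?? "";
// Look for the extension only after the domain, so the dot in ".net" is not counted
$pieces = explode(".", $urlend);
$filetype = count($pieces) > 1 ? $pieces[count($pieces) - 1] : "";
if ($urlend) {
    echo "url has something behind the domain";
    if ($filetype) {
        echo "filetype is $filetype";
    }
}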

Just some messy code written on my phone :P


  • 0

The code fails with pages like neowin.net/?lol=true

It treats the query string after the domain as a sub page. You could add an extra check to see whether $urlend starts with "?".
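
The extra check suggested here could look roughly like the following. This is a minimal sketch that assumes the URL is given without a scheme (as in the explode code above); has_subpage is a hypothetical name.

```php
<?php
// Hypothetical helper: true when something other than a bare query string
// follows the domain. Assumes $url has no scheme, e.g. "neowin.net/forums".
function has_subpage(string $url): bool
{
    // A limit of 2 splits the host from everything after the first "/".
    $parts = explode('/', $url, 2);
    $urlend = $parts[1] ?? '';

    // "neowin.net/" and "neowin.net/?lol=true" are still the landing page.
    return $urlend !== '' && $urlend[0] !== '?';
}
```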


  • 0

Yeah, I did it in the end, lol, though I went into a lot more detail: I have around 7 explodes and 4 regular expressions :P to rip apart logs, file types, URLs, error codes, and lots more xD I built my own not-too-shabby Squid analytics system in about half a day... quite happy with myself.


  • 0

 

Nice regex, I really need to learn more about regex myself :P


  • 0

 

Wow, very nice xD I have never been that good with regex, as I hardly use it, so I never needed to know more than the basics... Thank you for this; it will help me a lot in this and future projects.


This topic is now closed to further replies.