Original Poster Posted September 17, 2014

Hello! I need a regular expression to check a string (a URL) to see if it's a landing URL (e.g. neowin.net) or a sub page (e.g. neowin.net/forums), or, if possible, to see if the URL has ANY file extension (css, html, png).
mollick2 Posted September 20, 2014

You can actually accomplish this with a single regular expression. No need for a bunch of explodes and multiple regexes.

Edit: Here's a link so you can test it: http://regex101.com/r/xT3wZ9/1

    (^https?:\/\/?)?([w]{3}|[a-z0-9]+\.)?([a-z0-9]+)(\.[a-z\.]{2,6}+)(\/[a-z0-9\-\_\/]+)+(\.[a-z0-9]+)?(\?[a-z0-9\=\-\_\&\+]*)?

Using this in a preg_match will get you the different parts of the URL.

    $url = "https://www.neowin.net/forum/topic/1229911-i-need-a-reg-ex-for-urls/";
    $regex = "/(^https?:\/\/?)?([w]{3}|[a-z0-9]+\.)?([a-z0-9]+)(\.[a-z\.]{2,6}+)(\/[a-z0-9\-\_\/]+)+(\.[a-z0-9]+)?(\?[a-z0-9\=\-\_\&\+]*)?/i";
    // preg_match() itself returns 1 or 0; the captured parts go into the third argument.
    preg_match($regex, $url, $url_parts);

    /**
     * $url_parts:
     *
     * Array
     * (
     *     [0] => https://www.neowin.net/forum/topic/1229911-i-need-a-reg-ex-for-urls/
     *     [1] => https://   // Protocol
     *     [2] => www.       // Subdomain or www
     *     [3] => neowin     // Domain
     *     [4] => .net       // TLD
     *     [5] => /forum/topic/1229911-i-need-a-reg-ex-for-urls/   // Directory & file
     * )
     */

It also works for non-rewritten URLs:

    $url = "http://www.example.net/app/index.php?pagename=account&id=0983477868";
    $regex = "/(^https?:\/\/?)?([w]{3}|[a-z0-9]+\.)?([a-z0-9]+)(\.[a-z\.]{2,6}+)(\/[a-z0-9\-\_\/]+)+(\.[a-z0-9]+)?(\?[a-z0-9\=\-\_\&\+]*)?/i";
    preg_match($regex, $url, $url_parts);

    /**
     * $url_parts:
     *
     * Array
     * (
     *     [0] => http://www.example.net/app/index.php?pagename=account&id=0983477868
     *     [1] => http://    // Protocol
     *     [2] => www.       // Subdomain or www
     *     [3] => example    // Domain name
     *     [4] => .net       // TLD
     *     [5] => /app/index // Directory & file
     *     [6] => .php       // File extension
     *     [7] => ?pagename=account&id=0983477868   // Query
     * )
     */
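As a rough sketch of how the regex above could answer the original question (landing page, sub page, or file extension), something like the following might work. The function name `classify_url` and the returned array shape are made up for illustration and are not from the thread. It also assumes the input is already a well-formed URL, because it treats "no match" as "landing page".

```php
<?php
// Hypothetical helper, not from the thread: classifies a URL string using
// the regex posted above. Assumes the input is already a well-formed URL.
function classify_url(string $url): array
{
    $regex = "/(^https?:\/\/?)?([w]{3}|[a-z0-9]+\.)?([a-z0-9]+)(\.[a-z\.]{2,6}+)"
           . "(\/[a-z0-9\-\_\/]+)+(\.[a-z0-9]+)?(\?[a-z0-9\=\-\_\&\+]*)?/i";

    // The path group (5) is mandatory in this pattern, so a bare domain
    // such as "neowin.net" produces no match at all.
    if (preg_match($regex, $url, $parts) !== 1) {
        return ['type' => 'landing', 'extension' => null];
    }

    // Group 6 is the file extension; it is missing or empty when absent.
    $extension = !empty($parts[6]) ? ltrim($parts[6], '.') : null;

    return ['type' => 'subpage', 'extension' => $extension];
}
```

For example, `classify_url('neowin.net/forums')` reports a sub page with no extension, while a URL ending in `/app/index.php` reports the extension `php`. Because "no match" is read as "landing page", strings that are not URLs at all would also be classified that way, so validate the input first if that matters.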
Seahorsepip Veteran Posted September 17, 2014

    // First path segment after the domain ("" if there is none).
    $urlend = explode("/", $url)[1] ?? "";
    // Text after the last dot in the URL. Note that for a URL without a
    // file extension this is not a real file type (e.g. "net/forums").
    $filetype = explode(".", $url);
    $filetype = $filetype[count($filetype) - 1];
    if ($urlend) {
        echo "url has something behind the domain";
        if ($filetype) {
            echo "filetype is $filetype";
        }
    }

Just some messy code written on my phone :P
Seahorsepip Veteran Posted September 17, 2014

The code fails with pages like neowin.net/?lol=true: it sees the stuff behind the URL as a subpage. You could add an extra check to see whether $urlend starts with "?".
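The extra check described above could look roughly like this. It is only a sketch: the function name `has_subpage` is hypothetical, and it assumes the URL is given without a scheme prefix (e.g. "neowin.net/forums"), as in the explode() snippet.

```php
<?php
// Hypothetical helper illustrating the extra "?" check; assumes $url has no
// scheme prefix such as "http://".
function has_subpage(string $url): bool
{
    // Split into at most two pieces: the host and everything after the first "/".
    $pieces = explode('/', $url, 2);
    $urlend = $pieces[1] ?? '';

    // Nothing after the slash, or only a query string, still counts as the landing page.
    if ($urlend === '' || $urlend[0] === '?') {
        return false;
    }

    return true;
}
```

With this check, neowin.net/?lol=true is treated as the landing page, while neowin.net/forums still counts as a sub page.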
Original Poster Posted September 19, 2014

(In reply to Seahorsepip's explode() snippet.)

Yeah, I did it in the end lol, though I went into a lot more detail: I have around 7 explodes and 4 regular expressions :P to rip apart logs, file types, URLs, error codes, and lots more xD I built my own not-too-shabby Squid analytics system in about half a day... quite happy with myself.
Seahorsepip Veteran Posted September 22, 2014

(In reply to mollick2's regex answer.)

Nice regex, I really need to learn more about regex myself :P
Original Poster Posted September 22, 2014

(In reply to mollick2's regex answer.)

Wow, very nice xD I have never been that good with regex, as I hardly use it and so never needed to know more than the basics... Thank you for this, it will help me a lot in this and future projects.
mollick2 Posted September 22, 2014

Yeah, I used to hate working with regex until recently. Using tools like regex101 or regexpal makes it a lot easier to see exactly what you're doing.