I wanted to share a regular expression that works for me. I was trying to shorten a URL from Nike’s website using my own URL shortener, and my script rejected the long URL as badly formatted. There was nothing wrong with Nike’s URL; my regex was faulty. Time for a change. I searched online for a better regex for my URL shortener script and found this one in the regexlib.com library.
RegEx
^(http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*$
It’s a handful. All on one line, by the way.
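To make the expression easier to read, here is a rough sketch that rebuilds it from named pieces. The variable names are my own invention, and the ! delimiter and i modifier are assumptions for use with PHP's preg_match; the fragments themselves are copied from the expression above.

```php
<?php
// Hypothetical variable names; each fragment is copied from the expression above.
$scheme   = '(http|https|ftp)\://';
$userinfo = '([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*';   // optional user:pass@

// One IPv4 octet: 250-255, 200-249, 000-199, 10-99.
// The single-digit alternative differs per position, so it is added below.
$octet = '25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}';
$ipv4  = '(' . $octet . '|[1-9])\.(' . $octet . '|[1-9]|0)\.('
       . $octet . '|[1-9]|0)\.(' . $octet . '|[0-9])';

// Optional subdomains, a name, then a listed TLD or any two-letter code.
$domain = '([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.'
        . '(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2})';
$port   = '(\:[0-9]+)*';

// Nowdoc keeps the backslashes and the apostrophe exactly as written.
$path = <<<'RE'
(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*
RE;

// Assumption: '!' as delimiter (it never occurs in the pattern) and the i modifier.
$urlregex = '!^' . $scheme . $userinfo . '(' . $ipv4 . '|localhost|' . $domain . ')'
          . $port . $path . '$!i';

foreach (['http://www.sysrage.net', 'ftp://user:pass@host.com:123', 'http://example'] as $url) {
    echo $url, ' => ', preg_match($urlregex, $url) ? 'valid' : 'invalid', "\n";
}
```

The first two URLs should come out as valid and the last one as invalid, since a bare host name without a dot and TLD fits none of the three host alternatives.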
URL Validation
Assuming we are using an HTML form to submit the long URL, here’s how to validate the URL with PHP’s preg_match function and the i (case-insensitive) modifier. Older scripts used eregi for this, but eregi was deprecated in PHP 5.3 and removed in PHP 7. preg_match needs delimiters around the pattern; here ! serves as the delimiter because it does not appear anywhere in the expression. If the posted URL matches the regular expression, preg_match returns 1 and we can shorten the URL. If the URL is invalid, we reject it and send out a message saying the URL is not in the correct format.
$urlregex = "!^(http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|me|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*$!i";
if (preg_match($urlregex, $_POST['url'])) {
    // URL is valid: shorten the long URL
} else {
    // URL is invalid: reject it
}
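As a rough sketch of the accept/reject flow described above, the check can be wrapped in a small helper that also copes with a missing form field. The function name check_submitted_url and the returned array keys are hypothetical, and the short pattern is only a stand-in so the example stays readable; in practice you would pass the full expression.

```php
<?php
// Hypothetical helper: validates a submitted form and reports the outcome.
function check_submitted_url(array $form, string $pattern): array
{
    // Treat a missing or non-string field the same as an invalid URL.
    $url = isset($form['url']) && is_string($form['url']) ? trim($form['url']) : '';

    if ($url !== '' && preg_match($pattern, $url) === 1) {
        return ['ok' => true, 'url' => $url, 'message' => ''];
    }
    return ['ok' => false, 'url' => $url, 'message' => 'The URL is not in the correct format.'];
}

// Stand-in pattern for illustration only; swap in the full expression.
$pattern = '!^(http|https|ftp)\://[^\s]+$!i';

// In a real script the first argument would be $_POST.
$result = check_submitted_url(['url' => ' http://www.sysrage.net '], $pattern);
echo $result['ok'] ? "Shorten: {$result['url']}\n" : $result['message'] . "\n";
```

Returning a result array instead of echoing inside the helper keeps the validation separate from whatever the page does with the message.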
Matches
http://www.sysrage.net | https://64.81.85.161/site/file.php?cow=moo's | ftp://user:pass@host.com:123
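One last thing: you can add more TLDs to the regular expression. I added the .me domain since one of my domains is a .me, although the [a-zA-Z]{2} alternative at the end of the list already accepts any two-letter country code, so only longer TLDs strictly need adding.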
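If you expect to keep adding TLDs, one option is to hold them in an array and build that part of the expression from it. This is only a sketch: it covers just the host-name fragment of the expression, the variable names are made up, and mobi and travel are example additions rather than anything from my script.

```php
<?php
// Generic TLDs from the expression above.
$tlds = ['com', 'edu', 'gov', 'int', 'mil', 'net', 'org', 'biz', 'arpa',
         'info', 'name', 'pro', 'aero', 'coop', 'museum'];

// Example additions (illustrative only): adding a TLD is now a one-line change.
$tlds[] = 'mobi';
$tlds[] = 'travel';

// preg_quote() escapes regex metacharacters; '!' is the delimiter assumed here.
$alternatives = implode('|', array_map(function ($tld) {
    return preg_quote($tld, '!');
}, $tlds));

// Host-name fragment only; [a-zA-Z]{2} still covers two-letter country codes.
$hostPattern = '!^([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(' . $alternatives . '|[a-zA-Z]{2})$!i';

var_dump(preg_match($hostPattern, 'example.travel')); // int(1): in the list
var_dump(preg_match($hostPattern, 'example.me'));     // int(1): two-letter rule
var_dump(preg_match($hostPattern, 'example.shop'));   // int(0): not in the list
```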