Preg Match circumflex ^ in php
21-08-2019
Question
I can't quite get my head around what the ^ is doing in my preg_match.
if (preg_match('~^(\d\d\d\d)(-(\d{1,2})(-(\d{1,2}))?)?$~', trim($date), $dateParts)) {
    // echo the $dateParts and do some magic with them
    print_r($dateParts);
} else {
    // tell me the date is formatted wrong
    echo 'The date is formatted incorrectly';
}
As I see it, this checks whether $date matches the format, which I read as 4 decimals - 1 or 2 decimals - 1 or 2 decimals.
If it does match, the if statement displays the date; if it doesn't, it gives an error about incorrect date formatting.
However, when I pass it just the year, $date = '1977', with nothing else (no day or month), it still evaluates to true and displays the date parts. I would have thought it would throw an error?
Can someone point out what I'm missing in the regular expression? I'm guessing it's the ^, or possibly the ?$ at the end means it only matches part of it?
Solution
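There is no need to group absolutely everything. This looks nicer and matches exactly the same strings (the only difference is that the year, month, and day are no longer captured as separate groups):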
preg_match('~^\d{4}(-\d{1,2}(-\d{1,2})?)?$~', trim($date), $dateParts)
This also explains why "1977" is accepted - the month and day parts are both optional (the question mark makes something optional).
To do what you say ("4 decimals - 1 or 2 decimals - 1 or 2 decimals"), you need to remove both the optional groups:
preg_match('~^\d{4}-\d{1,2}-\d{1,2}$~', trim($date), $dateParts)
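To make the difference concrete, here is a minimal sketch that runs the two patterns above side by side. The variable names $optional and $strict are just labels chosen for this illustration; the patterns themselves are the ones from this answer.

```php
<?php
$optional = '~^\d{4}(-\d{1,2}(-\d{1,2})?)?$~';  // month and day optional
$strict   = '~^\d{4}-\d{1,2}-\d{1,2}$~';        // month and day required

foreach (['1977', '1977-8', '1977-08-21'] as $date) {
    $loose = preg_match($optional, $date, $dateParts);
    $exact = preg_match($strict, $date);
    echo $date, ': optional=', $loose, ' strict=', $exact,
        ' parts=', json_encode($dateParts), PHP_EOL;
}
// 1977: optional=1 strict=0 parts=["1977"]
// 1977-8: optional=1 strict=0 parts=["1977-8","-8"]
// 1977-08-21: optional=1 strict=1 parts=["1977-08-21","-08-21","-21"]
```

Note that optional groups which did not take part in the match are simply missing from the end of $dateParts, so check with isset() before reading them.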
The "^" and "$" have nothing to do with the issue you are seeing. They are just start-of-string and end-of-string anchors, making sure that nothing other than what the pattern describes is in the checked string. Leave them off and "blah 1977-01-01 blah" will start to match.
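A quick way to see what the anchors do is to test the same input with and without them. This is only a sketch; $anchored and $unanchored are names made up for the example.

```php
<?php
$anchored   = '~^\d{4}-\d{1,2}-\d{1,2}$~';  // whole string must be a date
$unanchored = '~\d{4}-\d{1,2}-\d{1,2}~';    // a date anywhere in the string

$input = 'blah 1977-01-01 blah';

var_dump(preg_match($anchored, $input));    // int(0)
var_dump(preg_match($unanchored, $input));  // int(1)
```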
OTHER TIPS
Try this:
'~^(\d\d\d\d)-(\d{1,2})-(\d{1,2})$~'
The problem was that the '?' characters in the regex made the month and day optional.
^ and $ anchor your pattern to the beginning and end, respectively, of the string passed in. The ? is a quantifier, matching 0 or 1 of the preceding pattern (in this case, the parenthesised bit).
Your pattern matches a year, or a year and a month, or a year and a month and a date; if you follow the parentheses, you'll see the final ? is operating on the parens surrounding the whole of the pattern after the year.
^ # beginning of string
(\d\d\d\d) #year
(
-(\d{1,2}) #month after a dash
(
-(\d{1,2}) #date after a dash
)? #date optional
)? # month and date optional
$ # end of string
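If you like that commented layout, PCRE's x (extended) modifier lets you keep it in the actual code: whitespace in the pattern is ignored and # starts a comment that runs to the end of the line. A sketch of the original pattern written that way (the comments must not contain the ~ delimiter):

```php
<?php
$pattern = '~
    ^                   # beginning of string
    (\d\d\d\d)          # year
    (
        -(\d{1,2})      # month after a dash
        (
            -(\d{1,2})  # day after a dash
        )?              # day optional
    )?                  # month and day optional
    $                   # end of string
~x';

foreach (['1977', '1977-08', '1977-08-21', '77-08-21'] as $date) {
    echo $date, ': ', preg_match($pattern, $date) ? 'match' : 'no match', PHP_EOL;
}
// 1977: match
// 1977-08: match
// 1977-08-21: match
// 77-08-21: no match
```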
Ok, let's break this up for you:
- '~^(\d\d\d\d)(-(\d{1,2})(-(\d{1,2}))?)?$~'
- ~ - in the beginning and the end are RegExp-delimiters, so they are not really part of the regular expression.
- ^ - Means "This is the beginning of the string" (it would only mean "beginning of a line" with the m modifier, which is not used here)
- Avoids matches in the middle of the string, and anchors it so that the start of the string must match
- (\d\d\d\d) - Matches (and captures) four digits, and is not optional
- This could also be written as \d{4}
- (-(\d{1,2})(-(\d{1,2}))?)? - Matches (and captures) an optional group, which itself contains a second optional group.
- It says that if this group exists, it must be a dash followed by one or two digits (the month), optionally followed by another dash and one or two digits (the day)
- $ - Means end of string, so this, together with ^ in the beginning of the string means that the whole string must match the Regexp.
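Some examples of what this Regex will match:
- 1982-08-11
- 1982-30-01
- 8127-99-52
- 2009-10 (the day part is optional)
- 1977 (the month and day parts are both optional)
Some examples that will NOT match:
- 82-08-11 (the year must be four digits)
- 2009/10 (the separator must be a dash)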
As you can see, this regex will accept some "dates" that are not really valid dates, so I would probably also run the result through some sort of date-handling function, such as checkdate or strtotime.
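As a rough sketch of that second step, the captured parts can be handed to PHP's built-in checkdate(). The helper name isValidDate is hypothetical, and this version assumes you want all three parts to be required:

```php
<?php
// Hypothetical helper: true only if $date is YYYY-M-D and a real calendar date.
function isValidDate(string $date): bool
{
    if (!preg_match('~^(\d{4})-(\d{1,2})-(\d{1,2})$~', trim($date), $m)) {
        return false; // wrong format
    }
    // checkdate() takes month, day, year, in that order.
    return checkdate((int) $m[2], (int) $m[3], (int) $m[1]);
}

var_dump(isValidDate('1982-08-11')); // bool(true)
var_dump(isValidDate('1982-30-01')); // bool(false), there is no month 30
var_dump(isValidDate('2009-02-31')); // bool(false), February has no 31st
var_dump(isValidDate('1977'));       // bool(false), month and day missing
```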