I needed to convert certain characters within the attribute values of any tag. For example, I wanted <input value="Oliver<Nassar" to become <input value="Oliver&lt;Nassar". As you can see, I needed to convert the < character to its entity equivalent, &lt;. In fact, I wanted to do the same for the > character (which becomes &gt;).
While I am doing a replacement, the regular expression I came up with can just as readily be used to remove those characters. Here's the expression:
(\s{1}[a-z\-]+\s?=\s?([\'|"]{1}))([^\2]*)\2
And here it is split up into separate components:
(
\s{1}
[a-z\-]+
\s?
=
\s?
(
[\'|"]{1}
)
)
([^\2]*)
\2
The logic is as follows:
- Capture a whitespace character, attribute name and delimiter (eg. ' or ") as the first back-reference (eg. value=")
- The \s? marks that between the attribute name (eg. value) and the equal sign, an optional whitespace character can be defined
- The same goes for after the equal sign
- As the second back-reference, capture the delimiter (eg. ' or ")
- The third back-reference is then set up to capture the attribute value (eg. Oliver<Nassar), and will stop when it reaches the delimiter that was captured as the second back-reference (eg. the ' or " character)
- Match the end of the attribute equation by searching for the second back-reference. This turns out to be required: inside a character class, PCRE reads \2 as an octal character code rather than a back-reference, so it is the non-greedy matching together with this closing \2 that actually ends the value at the delimiter
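To make the three back-references concrete, here is a minimal sketch that runs the expression through preg_match and prints each group. The $sample string is an assumption based on the example at the top of the post, and the i and U flags are the ones used in the PHP version further down.

```php
<?php
// The expression from above, joined into one string with the i and U flags.
$pattern = '/(\s{1}[a-z\-]+\s?=\s?([\'|"]{1}))([^\2]*)\2/iU';

// Assumed sample input, taken from the example at the top of the post.
$sample = '<input value="Oliver<Nassar" />';

preg_match($pattern, $sample, $match);

// $match[0] holds the whole attribute; $match[1] to $match[3] are the groups.
echo $match[1], "\n"; // whitespace, attribute name, equal sign, opening delimiter
echo $match[2], "\n"; // the delimiter on its own
echo $match[3], "\n"; // the raw attribute value
```

For this input the groups come out as  value=" (with the leading space), then ", then Oliver<Nassar.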
In PHP, the full expression, with replacement, turned into this:
echo preg_replace_callback(
    '/' .
        '(' .
            '\s{1}' .
            '[a-z\-]+' .
            '\s?' .
            '=' .
            '\s?' .
            '(' .
                '[\'|"]{1}' .
            ')' .
        ')' .
        '([^\2]*)' .
        '\2' .
    '/iU',
    function($match) {
        // $match[1]: attribute name, equal sign and opening delimiter
        // $match[3]: attribute value, $match[2]: closing delimiter
        $replacedVersion = $match[1] . str_replace(
            array('<', '>'),
            array('&lt;', '&gt;'),
            $match[3]
        ) . $match[2];
        return $replacedVersion;
    },
    $markup
);
As you can see from above, I included the iU flags to make the expression case-insensitive and its quantifiers non-greedy. The preg_replace_callback function then takes the contents of the third back-reference (the attribute value) and encodes the < and > symbols to their respective HTML entities, before putting the closing delimiter from the second back-reference back on the end.
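As a quick check of the replacement, the sketch below wraps the same expression and callback in a helper and runs it on the sample from the top of the post. The function name encodeAngleBracketsInAttributes is made up for this illustration; it is not part of the original code.

```php
<?php
// Hypothetical helper name; it wraps the expression and callback shown above.
function encodeAngleBracketsInAttributes($markup)
{
    $pattern = '/(\s{1}[a-z\-]+\s?=\s?([\'|"]{1}))([^\2]*)\2/iU';

    return preg_replace_callback(
        $pattern,
        function ($match) {
            // Encode only the attribute value (third back-reference).
            $value = str_replace(
                array('<', '>'),
                array('&lt;', '&gt;'),
                $match[3]
            );
            // Reassemble: name and opening delimiter, value, closing delimiter.
            return $match[1] . $value . $match[2];
        },
        $markup
    );
}

echo encodeAngleBracketsInAttributes('<input value="Oliver<Nassar" />'), "\n";
// prints: <input value="Oliver&lt;Nassar" />
```

Only the attribute value is touched; the tag's own < and > are left alone because they fall outside the matched attribute.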
Note that this logic is general enough to apply to any tag (eg. a, script, style, etc).
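To illustrate that point, here is a small sketch that applies the same expression to a couple of other tags, one with single-quoted and one with double-quoted attributes. The snippets and the data-label attribute are invented for the example.

```php
<?php
$pattern = '/(\s{1}[a-z\-]+\s?=\s?([\'|"]{1}))([^\2]*)\2/iU';

// Invented snippets: different tags, both delimiter styles.
$snippets = array(
    '<a href=\'/search?q=a>b\'>link</a>',
    '<script data-label="x<y"></script>',
);

$results = array();
foreach ($snippets as $snippet) {
    $results[] = preg_replace_callback($pattern, function ($match) {
        // Encode the value, then restore the closing delimiter.
        return $match[1]
            . str_replace(array('<', '>'), array('&lt;', '&gt;'), $match[3])
            . $match[2];
    }, $snippet);
}

echo implode("\n", $results), "\n";
// prints:
// <a href='/search?q=a&gt;b'>link</a>
// <script data-label="x&lt;y"></script>
```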
Hope that helps.