GREP - MSUB Find and Replace for "Last update:" entry

The Argument

When I first developed my web pages, I included an entry giving the date when I last updated the page. However, it became clear that there is a distinction between changes to the content of a page and changes to its format. So, I changed the existing "Last updated:" entry to "Content last updated:" and added a new entry (in HTML comments) for "HTML/Format last updated:".

An interesting problem was the fact that I wanted to keep the date value in the replacement string. MSUB offers this feature, implemented using the back-apostrophe `. The back-apostrophe starts and ends the string (in the search pattern) which is to be added to the replace string. In the replace pattern, the insertion place is marked by a single back-apostrophe.
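
For readers who know other regular-expression tools better than MSUB, the preserved-text feature is roughly comparable to a capture group and a backreference. The following is a minimal sketch in Python's re module, not MSUB itself; the sample line and the deliberately loose pattern are assumptions made only for illustration.

```python
import re

# Hypothetical sample line, not taken from a real page.
line = "Last updated: 3rd March 2000"

# The parentheses play the role of MSUB's pair of back-apostrophes:
# whatever they enclose is kept for use in the replacement.
pattern = r"Last updated: (.+)"

# \1 plays the role of the single back-apostrophe in the replace pattern.
result = re.sub(pattern, r"Content last updated: \1", line)
print(result)  # Content last updated: 3rd March 2000
```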

The Regular Expressions

Search Expression

^:w*"<p>Last updated: "`:d:d?('st'|'nd'|'rd'|'th')' ':t' ':d:d:d:d`:w*'</p>'$

To simplify the explanation, I have grouped the expression into three segments, (1) the label, (2) the day and month, and (3) the year.


^:w*"<p>Last updated: "

HTML tag characters such as the forward slash, less-than, and greater-than characters can be interpreted as special characters by RE programs. Certainly, MSUB doesn't like them, so they have to be quoted. We anchor the search text with a start-of-line character, followed by any whitespace, followed by a paragraph tag (I could have used "p|P" to catch upper- and lower-case tags, but I knew that the tags would be in lower case). Next comes the actual text of interest, "Last updated: " (the start-of-line and whitespace just indicate the context). After this comes the date.


`:d:d?('st'|'nd'|'rd'|'th')' ':t

I want to preserve the date, so that it is retained in the replace-string. For this, I use a back-apostrophe marker here to indicate the start of the text to be preserved. The date appears in a variable format. I have specified it as a digit possibly followed by one other digit, to allow for one- and two-digit numbers ("1" and "21", for instance). After the digits, there is one of four suffixes, such as "st" (21st) or "nd" (22nd). After this comes a space, and this is followed by any alphabetic characters (the month in any form of letters). After the month comes the year.
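
To see which day-and-month forms this segment accepts, here is a small sketch using Python's re module rather than MSUB. The character-class translations (:d as [0-9], :t as one or more letters) are my assumptions about equivalent behaviour, and the sample strings are made up.

```python
import re

# Assumed Python equivalents: :d -> [0-9], :t -> [A-Za-z]+
day_month = re.compile(r"[0-9][0-9]?(st|nd|rd|th) [A-Za-z]+")

# Hypothetical samples: the first three fit the format, the last two do not
# (missing suffix, and too many digits).
samples = ["1st March", "21st March", "22nd June", "3 March", "123rd May"]

# fullmatch requires the whole sample to fit the segment.
matches = {s: bool(day_month.fullmatch(s)) for s in samples}

for sample, ok in matches.items():
    print(sample, "->", "match" if ok else "no match")
```

Note that the pattern checks only the shape of the date: it would also accept "1th" or "22st", which is the same leniency the MSUB expression has.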


' ':d:d:d:d`:w*'</p>'$

After the text of the month, there is a space followed by a four-digit year. This is the end of the date string, so we end the preserved-text with another back-apostrophe.

After the year, I allow for any whitespace which appears between the date and the trailing end-of-paragraph tag, which is immediately followed by an end-of-line. This is not essential but does help provide a context. By specifying as much context as possible (and as unique a context as possible), we can help avoid mistaking invalid but matching strings for the real thing.
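
The effect of the surrounding context can be sketched with an approximate translation of the whole search expression into Python's re module. This is not MSUB syntax: the class translations (:w as \s, :d as [0-9], :t as [A-Za-z]+) and the helper name find_date are assumptions for illustration, and the test lines are invented.

```python
import re

# The three parts mirror the three segments described above.
LABEL = r"^\s*<p>Last updated: "
DATE = r"([0-9][0-9]?(?:st|nd|rd|th) [A-Za-z]+ [0-9][0-9][0-9][0-9])"
TAIL = r"\s*</p>$"
search = re.compile(LABEL + DATE + TAIL)


def find_date(line):
    """Return the preserved date text, or None if the line does not match."""
    m = search.match(line)
    return m.group(1) if m else None


# A well-formed entry, with leading whitespace: the date is captured.
print(find_date("  <p>Last updated: 3rd March 2000</p>"))

# The label is not at the start of the paragraph: rejected by the context.
print(find_date("<p>Note: Last updated: 3rd March 2000</p>"))

# Extra text after the closing tag: rejected by the end-of-line anchor.
print(find_date("<p>Last updated: 3rd March 2000</p> and more"))
```

Only the first line yields a date; the other two contain the same date text but fail on context, which is the point of anchoring both ends.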

Replace Expression

^"<p>Content last updated: "`"</p>"$^'<!-- HTML/Format last updated: 3rd March 2000 -->'$
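
As a rough illustration of the complete find-and-replace, here is a sketch in Python's re module. It assumes that the "$^" in the middle of the replace expression produces a line break, so that one input line becomes two output lines; the function name update_line and the sample input are hypothetical, and the pattern is an approximate translation rather than MSUB syntax.

```python
import re

# Approximate translation of the search expression; the group holds the date.
search = re.compile(
    r"^\s*<p>Last updated: "
    r"([0-9][0-9]?(?:st|nd|rd|th) [A-Za-z]+ [0-9]{4})"
    r"\s*</p>$"
)

# \1 stands in for the single back-apostrophe; "\n" stands in for "$^".
replacement = (
    r"<p>Content last updated: \1</p>"
    "\n"
    "<!-- HTML/Format last updated: 3rd March 2000 -->"
)


def update_line(line):
    """Rewrite a matching 'Last updated' line; return other lines unchanged."""
    return search.sub(replacement, line)


print(update_line("  <p>Last updated: 21st February 2000</p>"))
print(update_line("<p>Some other paragraph</p>"))
```

In this sketch the leading whitespace is part of the match and so is dropped from the output, while the date from the original line is carried over and the format date is fixed text.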

Copyright © Neil Carter

Content last updated: 2000-06-30