When I first developed my web pages, I included an entry giving the date when I last updated the page. However, it became clear that changes to the content of a page are distinct from changes to its format. So, I changed the existing "Last updated:" entry to "Content last updated:" and added a new entry (in HTML comments) for "HTML/Format last updated:".
An interesting problem was that I wanted to keep the date value in the replacement string. MSUB offers this feature, implemented using the back-apostrophe `. In the search pattern, a pair of back-apostrophes starts and ends the string which is to be carried over to the replace string. In the replace pattern, the insertion place is marked by a single back-apostrophe.
^:w*"<p>Last updated: "`:d:d?('st'|'nd'|'rd'|'th')' ':t' ':d:d:d:d`:w*'</p>'$
To simplify the explanation, I have grouped the expression into three segments, (1) the label, (2) the day and month, and (3) the year.
^:w*"<p>Last updated: "
HTML tag characters, such as the forward slash, less-than, and greater-than characters, can be interpreted as special characters by RE programs. Certainly, MSUB doesn't like them, so they have to be quoted. We anchor the search text with a start-of-line character, followed by any whitespace, followed by a paragraph tag (I could have used an alternation such as p|P to catch upper- and lower-case tags, but I knew that the tags would be in lower case). Next comes the actual text of interest, "Last updated: " (the start-of-line and whitespace just indicate the context). After this comes the date.
I want to preserve the date, so that it is retained in the replace string. For this, I use a back-apostrophe marker here to indicate the start of the text to be preserved. The date appears in a variable format. I have specified it as a digit possibly followed by one other digit, to allow for one- and two-digit numbers ("1" and "21", for instance). After the digits, there is one of four suffixes, such as "st" (21st) or "nd" (22nd). After this comes a space, followed by any alphabetic characters (the month in any form of letters).
After the text of the month, there is a space followed by a four-digit year. This is the end of the date string, so we end the preserved-text with another back-apostrophe.
After the year, I allow for any whitespace which appears between the date and the trailing end-of-paragraph tag, which is immediately followed by an end-of-line. This is not essential but does help provide a context. By specifying as much context as possible (and as unique a context as possible), we can help avoid mistaking invalid but matching strings for the real thing.
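For readers who do not have MSUB to hand, the search pattern described above can be approximated with Python's re module. This is only a sketch: the mapping from MSUB's :w, :d and :t classes to Python character classes is my assumption, the name SEARCH is made up for illustration, and the capture group stands in for the back-apostrophe pair.

```python
import re

# Rough Python equivalent of the MSUB search pattern, split into the same
# three segments. The MSUB-to-Python class mapping is an assumption.
SEARCH = re.compile(
    r"""
    ^[ \t]*<p>Last\ updated:\      # (1) label, after optional leading whitespace
    (                              # start of the preserved text (the date)
      \d\d?(?:st|nd|rd|th)         # (2) day: one or two digits plus a suffix
      \ [A-Za-z]+                  #     a space, then the month as letters
      \ \d{4}                      # (3) a space, then a four-digit year
    )                              # end of the preserved text
    [ \t]*</p>$                    # optional whitespace, closing tag, end of line
    """,
    re.VERBOSE | re.MULTILINE,
)

match = SEARCH.search("  <p>Last updated: 3rd March 2000</p>")
if match:
    print(match.group(1))
```

The extra context does its job here too: a line with no day number, or with text after the closing tag, is not matched.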
^"<p>Content last updated: "`"</p>"$^'<!-- HTML/Format last updated: 3rd March 2000 -->'$
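The same substitution can be sketched in Python, where the backreference \1 plays the part of the single back-apostrophe in the replace pattern. The names SEARCH, REPLACE and update_page are hypothetical, and I am assuming that the two ^...$ pairs in the MSUB replace pattern produce two output lines.

```python
import re

# Search pattern: the date is captured in group 1 so that it can be reused.
SEARCH = re.compile(
    r"^[ \t]*<p>Last updated: "
    r"(\d\d?(?:st|nd|rd|th) [A-Za-z]+ \d{4})"
    r"[ \t]*</p>$",
    re.MULTILINE,
)

# Replacement: the relabelled entry with the preserved date (\1), followed
# by a second line holding the new format-date comment.
REPLACE = (
    r"<p>Content last updated: \1</p>" "\n"
    "<!-- HTML/Format last updated: 3rd March 2000 -->"
)

def update_page(html: str) -> str:
    """Return the page text with every matching 'Last updated' line rewritten."""
    return SEARCH.sub(REPLACE, html)

print(update_page("<p>Last updated: 21st June 1999</p>"))
```

Lines that do not match the search pattern pass through unchanged, so the function could be run over a whole page.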
Copyright © Neil Carter
Content last updated: 2000-06-30