How to Write a Vim Indent Script: A Pascal Example

Writing an indent file that indents perfectly is almost impossible, since it would require Vim to be a compiler. Vim is a text editor, not a compiler (ignoring its ability to parse its own configuration files). Consequently, indent files for some languages are only compromises.

Vim allows the user to control the way their text is indented. We do this with a Vim script file, which contains some code that returns a numeric value. The file has an appropriate name, and is stored in an appropriate directory.

For Vim to load the correct indent script, it must have some means of identifying the filetype. Vim already knows about lots of filetypes, but if yours is not already in Vim's list, you may have to provide your own means of identifying it.
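
As a minimal sketch of supplying your own detection, a file in the ftdetect subdirectory of your runtime directory can map a file extension to a filetype. The filetype name mylang and the extension .myl are hypothetical, chosen only for illustration.

```vim
" ftdetect/mylang.vim (hypothetical filetype and extension)
" Mark any buffer whose name ends in .myl as filetype "mylang",
" unless a filetype has already been set for it.
autocmd BufRead,BufNewFile *.myl setfiletype mylang
```

With this in place, Vim would look for indent/mylang.vim whenever such a file is opened, provided filetype indenting is enabled (:filetype indent on).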

A Vim indent script works by following a protocol explained in Vim's documentation. The basic steps are thus:

  1. The script file is named after the filetype with the extension ".vim". In our case, the filetype is "pascal", so our script file is named pascal.vim.
  2. The script must be located in the indent subdirectory of your Vim runtime directory (called $VIMFILES here; typically ~/.vim on Unix or $HOME/vimfiles on Windows).
  3. To be a good citizen, the script should check to ensure it doesn't override another.
  4. The script must provide an expression that returns a numeric value corresponding to the indentation level measured in columns (i.e. spaces, not ‘indents’ or tabs).
  5. The script should indicate under what conditions the indentation is performed.
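
The five steps above can be sketched as a skeleton script. This is only an outline under stated assumptions: the filetype mylang, the function name GetMyLangIndent, the indentkeys value, and the placeholder rule are all hypothetical and are not taken from the Pascal script.

```vim
" indent/mylang.vim (steps 1 and 2: named after the filetype, in indent/)

" Step 3: do not override an indent script that is already loaded.
if exists("b:did_indent")
   finish
endif
let b:did_indent = 1

" Step 4: the expression Vim evaluates to get the indent, in columns.
setlocal indentexpr=GetMyLangIndent(v:lnum)

" Step 5: re-indent on a new line, or when "end" is typed at line start.
setlocal indentkeys=o,O,0=end

function! GetMyLangIndent(lnum)
   " Placeholder rule: copy the indent of the previous non-blank line.
   let prev = prevnonblank(a:lnum - 1)
   if prev == 0
      return 0
   endif
   return indent(prev)
endfunction
```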

To make the correct indent choice every time, the script needs to know exactly what context or state the source is in at that point. To be able to do this for any position in the file, the source code has to be completely parsed. And if your program can do that, you may as well make it a compiler! Having said that, we can get things right often enough to make it worthwhile. Doing so requires that we make a few assumptions about the source code. This obviously introduces errors, but these can easily be corrected. Where the indenter has introduced an error, things sometimes get cumulatively worse; this can be fixed by correcting the error manually and rerunning the script from the new, corrected position. Think of it as 'resynchronising' the indenter.

General Script Issues

There are a number of special functions, variables (options), and commands that we need to use for this Vim script. These are:

exists("identifier")
Returns true if this identifier already exists. Handy for checking that we're not about to overwrite something.
getline(line number)
Returns a string containing the given line's contents.
prevnonblank(line number)
Returns a number corresponding to the line number of the first line (including and going backwards from the specified line) that isn't merely blank.
indent(line number)
Returns the indent of the specified line, measured in columns.
setlocal
Sets the value of an option, but only for the current buffer or window.
indentexpr
Vim evaluates the expression held in this option (usually a function call) to calculate the required indent for a line.
indentkeys
A comma-separated list of keys and keywords. When one of them is typed in Insert mode, Vim re-evaluates indentexpr for the current line. In other words, this option dictates when the indentation process is performed.
shiftwidth
The number of columns (i.e. the number of spaces, or the equivalent number of tabs) corresponding to one indent level. Often set by the user to suit their own taste.

Note that option names may carry an ampersand (&) as a suffix or prefix. Where the ampersand is placed at the end of an option name (in combination with setlocal), it resets that option to its default value. Where the ampersand is placed at the front of the name, it refers to the option's current value.
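
The following sketch exercises these functions and both ampersand forms on a scratch buffer. The buffer contents, the variable names, and the shiftwidth of 3 are assumptions made for the example.

```vim
" Open a scratch buffer and give it three lines; line 2 is blank.
new
call setline(1, ['begin', '', '   x := 1;'])

" Ampersand suffix: reset indentkeys to its default, then append to it.
setlocal indentkeys&
setlocal indentkeys+=0=end

" Ampersand prefix: read the current value of an option.
setlocal shiftwidth=3
let step = &shiftwidth

" Line 2 is blank, so searching backwards from it lands on line 1.
let prev = prevnonblank(2)
" The text of line 3, including its leading spaces.
let text = getline(3)
" Line 3 starts with three spaces, so its indent is 3 columns.
let cols = indent(3)

echo exists('step') prev text cols step
```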

Testing Scripts

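Vim suppresses error messages raised while evaluating indentexpr (unless the 'debug' option contains msg), so there is no error message if you get the function name wrong in the call (i.e. you call a non-existent function). Likewise, if you call a script-local function without its "s:" prefix, the intended function is simply not called.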

It is convenient to be able to reload the script without having to close and reopen Vim, or the file. To allow this, comment out the

let b:did_indent = 1

line, and ensure the function keyword is suffixed with an exclamation mark (function!). You can then reload the script with the command :runtime indent/pascal.vim.

It is helpful to display the value of certain variables (or merely to indicate that a certain point has been reached in the script). A quick and dirty way of doing this is to add a statement like:

echo "hello"

You can also pause the script while you read your messages; maybe you have more than one such message being displayed, but later messages overwrite the earlier ones before you've had a chance to read them. You can achieve this effect by using the input function, which will wait until you press enter:

call input("x = " . x)

Note that you have to concatenate (using the dot operator) the variable's value onto the string containing your prompt.
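
As an alternative sketch, echomsg records its output in Vim's message history, so nothing is lost when later messages overwrite earlier ones on screen. The helper name PascalIndentDebug is hypothetical and is not part of the script.

```vim
" Hypothetical helper: log a labelled value to the message history.
function! PascalIndentDebug(label, value)
   echomsg a:label . ' = ' . string(a:value)
endfunction

" Example call, as it might appear inside the indent function.
call PascalIndentDebug('indnt', 6)
```

Run :messages afterwards to review everything that was logged.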

Indentation Principles

Obviously, the rules for indentation are derived from a coding standard. We need to identify the patterns (keywords) that change the indentation level. Some keywords, such as begin, change the indentation of the following line, whilst others, such as until, change the indentation of their own line.

Some keywords cause a relative change in indentation. For instance, begin gives rise to an increase in the existing indentation level. In contrast, some keywords force a specific indentation. For instance, program is placed on column 0.

Some keywords always give rise to the same indentation rule. For example, type always increases the indentation level by one (and is itself always placed at column 0). On the other hand, some keywords cause different indentation depending on their structure. For example, if indents the following line by one level if the ‘then’ part is a single statement that is placed on the next line. However, if does not indent the following line if the ‘then’ part contains more than one statement (i.e. the following line starts with a begin). There are more indentation rules for if, but this isn't a tutorial about coding standards!
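
These kinds of rule can be sketched in a few lines. The function below is a hypothetical illustration, not the Pascal script itself: it handles only program, begin, and until, and it assumes shiftwidth is set to a non-zero value.

```vim
" Hypothetical sketch of absolute and relative indentation rules.
function! SketchIndent(lnum)
   let this_line = getline(a:lnum)
   let prev_num = prevnonblank(a:lnum - 1)
   if prev_num == 0
      " No earlier line to base the indent on.
      return 0
   endif
   let ind = indent(prev_num)

   " Absolute rule: program always goes at column 0.
   if this_line =~# '^\s*program\>'
      return 0
   endif

   " Relative rule affecting the next line: begin indents what follows.
   if getline(prev_num) =~# '^\s*begin\>'
      let ind += &shiftwidth
   endif

   " Relative rule affecting its own line: until unindents itself.
   if this_line =~# '^\s*until\>'
      let ind -= &shiftwidth
   endif

   return ind
endfunction
```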

An indent script's performance should be a balance between accuracy and speed. We could make the script extremely accurate, but it would take ages to do sufficient parsing of the Pascal file. On the other hand, we don't want to make the script so fast that it makes loads of mistakes.

A simple example of the dilemma is indenting "begin". If it comes after if, for or else, then the indentation should remain the same. If, on the other hand, it is the opening begin of a routine, then it should appear in the first column of the line. However, to determine which applies, we have to parse backwards. But there may be lots of intervening lines between this line and the line containing the keyword that tells us where to place this begin. Comments may intervene in the first case, and both comments and identifier lists in the second case. How many lines should we backtrack?

In general, we should assume that the code already conforms to our coding standard. This makes it easier to write the regular expressions and makes the script more accurate. The other advantage is that it helps reveal where we've not followed our standard (the script may indent the code badly, thus showing up our error).

Indentation Rules Grouped by Keyword

Here are the patterns that affect the indentation of the line on which they appear, or the line after. We typically assume we are reading top to bottom, so we don't (generally) refer to preceding code.

program
At column 0 always.
{$compiler directive
At column 0 always (ignore (*)
uses
At column 0 always. Indent next line.
const, var
type
At column 0 always. Indent next line.
record, object
At single indent. Indent next line.
private
At single indent. Indent next line.
procedure, function
end
Unindent.
end.
End of program source. At column 0 always.
begin
Indent next line.
if, for
else

Explaining The Script

The full script can be downloaded here. The following description contains only snippets of the code.

The script is essentially a series of tests which set a numeric value dependent on whether the given line (sometimes taking into account previous lines) contains, or does not contain, certain strings (i.e. language keywords).

Basically, we tell Vim whether to indent, unindent, or leave unchanged the current line by checking it against a set of indentation rules (i.e. a coding standard). Since we frequently base the current line's indentation on the previous line's, we always assume the previous line is indented properly.

In designing the script, we need to consider whether each test should allow any further tests, which might override the level it has set. If this test should be final, then we just place a return statement at the end of the if block.

The script will be case sensitive, even though Pascal is not. This is a personal choice; I prefer to ensure consistency, in that all keywords are lowercase.

When scanning the code, we nearly always ignore blank lines and comment lines. It is convenient to place the code to do this into its own routine, which returns the line number of the previous line that is not blank and not commented-out.

This is a breakdown of the regular expression used to match comment lines:

^  start-of-line FOLLOWED BY
\s*   zero or more whitespace chars (space/tab) FOLLOWED BY
\( start sub-expression 1
\(    start sub-expression 1a
(        a literal opening bracket FOLLOWED BY
\*          a literal asterisk
\)    end sub-expression 1a
\|    OR
\(    start sub-expression 1b
\*          a literal asterisk FOLLOWED BY
\           a literal space
\)    end sub-expression 1b
\|    OR
\(    start sub-expression 1c
\*       a literal asterisk FOLLOWED BY
)        a literal closing bracket
\)    end sub-expression 1c
\|    OR
{     a literal opening brace
\|    OR
}     a literal closing brace
\) end sub-expression 1
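
Assembled into a single string, the pieces of this breakdown can be tried against a few sample lines. This is only a sketch: the variable name skip_lines and the sample lines are made up for the example.

```vim
" The pattern assembled from the breakdown. Single quotes keep the
" backslashes intact for the regular expression engine.
let skip_lines = '^\s*\(\((\*\)\|\(\*\ \)\|\(\*)\)\|{\|}\)'

" Each expression evaluates to 1 for a match and 0 otherwise.
" The first five lines start with a comment marker; the last does not.
echo '(* start of a comment'    =~# skip_lines
echo '   * middle of a comment' =~# skip_lines
echo '   *) end of a comment'   =~# skip_lines
echo '{ brace comment'          =~# skip_lines
echo '} end of brace comment'   =~# skip_lines
echo 'total := a * b;'          =~# skip_lines
```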

Our coding standard dictates that comments cannot come before executable code on the same line. Comments can, however, appear at the end of a line containing executable code. In other words, nothing can appear after a comment, apart from another comment. This means we can ignore any line starting with a comment. Conversely, we have to allow for comments to appear after keywords.

First, we follow the Vim conventions for a script file by placing some information about the script's identity at the top of the file.

" Vim indent file
" Language:    Pascal
" Maintainer:  Neil Carter <n.carter@swansea.ac.uk>
" Created:     2004 Jul 13
" Last Change: 2005 Jun 15

Only load this indent file when no other was loaded, and flag the fact that we've loaded this script.

if exists("b:did_indent")
   finish
endif
let b:did_indent = 1

Tell Vim which function to run when it's performing indentation.

setlocal indentexpr=GetPascalIndent(v:lnum)

Appending an & to an option name in a set command resets it to its default value. += means add (concatenate) the following values to the existing value. == means equals. =~ matches a regular expression, honouring the "ignorecase" option; a question mark after it (=~?) means always ignore case, and a hash (=~#) means always match case.
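
A short sketch of how the matching operators differ. The test strings and variable names are arbitrary, and the example saves and restores the user's ignorecase setting.

```vim
let save_ic = &ignorecase

" Plain =~ follows 'ignorecase': no match while it is off (result 0).
set noignorecase
let plain_noic = 'BEGIN' =~ 'begin'

" With 'ignorecase' on, the same test matches (result 1).
set ignorecase
let plain_ic = 'BEGIN' =~ 'begin'

" =~? always ignores case (result 1), whatever the option says.
let forced_ic = 'BEGIN' =~? 'begin'

" =~# always matches case (result 0), whatever the option says.
let forced_cs = 'BEGIN' =~# 'begin'

let &ignorecase = save_ic
echo plain_noic plain_ic forced_ic forced_cs
```

Using =~# throughout is one way to keep a script case sensitive regardless of the user's settings.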

Make sure we don't keep redefining this function. Comment–out the finish to be able to reload the script without having to quit and reload Vim.

if exists("*GetPascalIndent")
   finish
endif

We first write a function to skip over commented-out lines. When we initialise the loop, we have to subtract one to start prevnonblank() on the line before the current one. Otherwise, it would return the current line! We delimit the SKIP_LINES string with single quotes so that Vim stores it literally, leaving every backslash intact for the regular expression engine. Within the regular expression, the escape character is the backslash (\).

To make the function local to the script (i.e. not callable from outside), we prefix it with s:.

We add an exclamation mark "!" suffix to the function keyword so that our function definition can override an existing one with the same name. If we didn't do that, Vim would issue an error if the new function's name was the same as an existing one.

function! s:GetPrevNonCommentLineNum( line_num )
   
   let SKIP_LINES = '^\s*\(\((\*\)\|\(\*\ \)\|\(\*)\)\|{\|}\)'
   
   let nline = a:line_num
   while nline > 0
      let nline = prevnonblank(nline-1)
      if getline(nline) !~? SKIP_LINES
         break
      endif
   endwhile
   
   return nline
endfunction

Here's where the fun really starts. We create the function that's going to look at our code, and return the appropriate indent level for the current line.

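First of all, we immediately return zero if we are asked about line 0. The intention is that the first line should not be indented, but note that Vim numbers buffer lines from 1, so the test would need to compare against 1 to catch the first line of the file. The a: prefix tells Vim that the line_num variable is the one given in the function's argument list.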

if a:line_num == 0
   return 0
endif

Remember, there may be exceptions where we don't want the first line to go at column 0, but we can't please all the people all the time; we have to assume our coding standard applies. It's up to the writer of the Pascal file to avoid running our script over any lines that shouldn't conform to the indentation standard.

We now get the line to be indented; typically this is the line of our file-to-be-indented (AKA the current buffer) on which the cursor is currently placed. However, by making it a parameter of the function, we enable other script writers to obtain our Pascal indent for any line they choose.

let this_line = getline( a:line_num )

We now get the first line that contains executable code before the specified line. We might need it later on. We use the function we wrote earlier on in this script, so we add the s: prefix (for Script). We also obtain the existing indent level for the line.

let prev_codeline_num = s:GetPrevNonCommentLineNum( a:line_num )
let prev_codeline = getline( prev_codeline_num )
let indnt = indent( prev_codeline_num )

Our first pattern check is for being in the middle of a comment block. We assume that lines starting with zero or more whitespace characters (spaces and tabs) followed by an asterisk (*) are comments. It's conceivable that a line starting like this might be a multi-line mathematical formula where the asterisk is a multiplication operator, but this is the first example of a compromise in our accuracy. We return the existing indent level (so it's unchanged).

if this_line =~ '^\s*\*'
   return indent( a:line_num )
endif

Notice that we delimit the pattern with single quotes ('). This saves us from having to double backslashes, which we would have to do if we used double-quotes.
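
A minimal sketch of the difference, using the comment pattern from above. The variable names are chosen only for the example.

```vim
" The same pattern written both ways: the double-quoted form needs
" every backslash doubled to survive string parsing.
let single = '^\s*\*'
let double = "^\\s*\\*"

" Both strings hold identical text, so this comparison echoes 1.
echo single ==# double
```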

The difference between strings delimited with double-quotes and single-quotes is tricky to understand, and seems to vary depending on how they are used. I admit to being confused by it myself. For more information, use :help expr-string, :help literal-string, and :help 41.4 (scroll down to the end of the section on LOGIC OPERATIONS).

Be careful: remember that indent scripts can interfere with the operation of Vim's commenting feature.

Having multiple exit points is not really elegant coding practice, but given this is a small script, and we want fast performance, optimisations are acceptable.

The following pattern matches a start of line (^), followed by zero or more whitespace characters (\s*), followed by either the keyword const or the keyword var, followed by an end of line ($). Note the \< and \> patterns, which correspond to the beginning and the end, respectively, of a word (they match the position only, and do not consume the first or last letter of the word). We use these to avoid matching the keywords where they appear as part of a larger word. For instance, we wouldn't want the const pattern to match "constant".

if this_line =~ '^\s*\<\(const\|var\)\>$'
   return 0
endif

Beware: make sure you have the brackets and word delimiters in the correct order. Also, it is easy to be fooled into thinking that \< and \> form a sub-expression, but they don't! For instance, the following pattern will match "constant" and "multivar": '\(\<const\|var\>\)'
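
The pitfall can be checked directly. In this sketch the variable names are arbitrary; each result is 1 for a match and 0 otherwise.

```vim
" Word delimiters inside the group: each binds to only one alternative.
let wrong = '\(\<const\|var\>\)'
" Word delimiters outside the group: they apply to both alternatives.
let right = '\<\(const\|var\)\>'

" Both are 1: \<const matches the start of "constant", and var\>
" matches the end of "multivar".
let wrong_constant = 'constant' =~# wrong
let wrong_multivar = 'multivar' =~# wrong

" Both are 0: neither word is exactly const or var.
let right_constant = 'constant' =~# right
let right_multivar = 'multivar' =~# right

" This is 1: a whole-word match.
let right_var = 'var' =~# right

echo wrong_constant wrong_multivar right_constant right_multivar right_var
```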

In scripts, all conditions of an if statement must be on the same line (unless a line-continuation backslash is used). For example:

if this_line =~ '^\s*begin\>' && prev_codeline !~ '^\s*\<\(if\|for\|else\)\>'
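
A minimal sketch of splitting such a condition with Vim script's line continuation, where the continued line starts with a backslash. This only works when the 'cpoptions' option does not contain the C flag, so the sketch sets the Vim default first. The function name IsRoutineBegin is hypothetical.

```vim
" Use Vim's default 'cpoptions' so that line continuation is allowed.
let s:save_cpo = &cpo
set cpo&vim

" Hypothetical helper: true for a begin that does not follow if/for/else.
function! IsRoutineBegin(this_line, prev_codeline)
   return a:this_line =~# '^\s*begin\>'
      \ && a:prev_codeline !~# '^\s*\<\(if\|for\|else\)\>'
endfunction

" Restore the user's setting.
let &cpo = s:save_cpo
unlet s:save_cpo
```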

If we get to the end of the function, we probably haven't matched anything, which means this is a non-indenting code line, so we just keep the same indentation as the previous code line.

   return indnt
endfunction

Indentation Limitations

This is a list of situations where the indenter gets it wrong.

* procedure can't easily be indented. It appears in three situations:

procedure within an object declaration is placed at column zero, whilst it should be placed at two indentations.

* The following isn't indented properly:

if condition then
   statement1
else
   statement2;
   statement3;

In this case, statement3 should be unindented.


Copyright © Neil Carter

Content last updated: 2008-08-22