Writing an indent file that indents perfectly is almost impossible, since it would require Vim to be a compiler. Vim is a text editor, not a compiler (ignoring its ability to parse its own configuration files). Consequently, indent files for some languages are only compromises.
Vim allows the user to control the way their text is indented. We do this with a Vim script file, which contains some code that returns a numeric value. The file has an appropriate name, and is stored in an appropriate directory.
For Vim to load the correct indent script, it must have some means of identifying the filetype. Vim already knows about lots of filetypes, but if yours is not in Vim's list, you will have to provide your own means of identifying it.
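As a rough sketch of that last case, a new filetype can be registered with a small detection script. The filetype name mylang, the .myl extension and the file location below are hypothetical, chosen only for illustration:

```vim
" ~/.vim/ftdetect/mylang.vim  (hypothetical file name and location)
" Mark any buffer whose name ends in .myl as filetype 'mylang', unless a
" filetype has already been set for it.
autocmd BufRead,BufNewFile *.myl setfiletype mylang
```

With something like this in place, and with :filetype indent on set, Vim should look for indent/mylang.vim in its runtime path when such a file is opened.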
A Vim indent script works by following a protocol explained in Vim's documentation. The basic steps are thus:
To make the correct indent choice every time, the indenter must know exactly what context or state the source is in when it makes its choice. To do this for any position in the file, the source code has to be completely parsed. And if your program can do that, you may as well make it a compiler! Having said that, we can get things right often enough to make it worthwhile. Doing so requires that we make a few assumptions about the source code. This obviously introduces errors, but these can easily be corrected. Where the indenter has introduced an error, things sometimes get cumulatively worse; the remedy is to correct the error by hand and rerun the script from the new, corrected, position. Think of it as 'resynchronising' the indenter.
There are a number of special functions, variables (options), and commands that we need to use for this Vim script. These are:
exists("identifier")
getline(line number)
prevnonblank(line number)
indent(line number)
setlocal
indentexpr
indentkeys
shiftwidth
Note that identifiers may have an ampersand (&) suffix or prefix. Where the ampersand is placed at the end of an option-identifier (in combination with setlocal), it resets that option to its default value. Where the ampersand is placed at the front of the identifier, it refers to the option's current value.
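As a rough illustration of how the items listed above fit together, here is a minimal sketch of an indent script. It is not the Pascal script described in this article: the filetype mylang, the function name GetMylangIndent and the two rules it applies are assumptions made up for the example.

```vim
" indent/mylang.vim  (hypothetical filetype, for illustration only)

" exists(): skip this file if an indent script was already loaded for the buffer.
if exists("b:did_indent")
  finish
endif
let b:did_indent = 1

" setlocal + indentexpr: the expression Vim evaluates to get a line's indent.
setlocal indentexpr=GetMylangIndent(v:lnum)

" Trailing ampersand: reset indentkeys to its default value, then add a
" trigger so that the line is re-indented when 'end' is typed at its start.
setlocal indentkeys&
setlocal indentkeys+=0=end

function! GetMylangIndent(line_num)
  " prevnonblank(): nearest non-blank line at or above the given line number.
  let prev_num = prevnonblank(a:line_num - 1)
  if prev_num == 0
    " Nothing above this line, so start at column 0.
    return 0
  endif

  " indent(): existing indent of that line, in columns.
  let ind = indent(prev_num)

  " getline(): text of a line.  Leading ampersand: the option's current value.
  if getline(prev_num) =~# '\<begin\s*$'
    let ind += &shiftwidth
  endif
  if getline(a:line_num) =~# '^\s*end\>'
    let ind -= &shiftwidth
  endif
  return ind
endfunction
```

The sketch assumes that the previous line is already indented correctly, which is the same assumption the Pascal script makes.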
There is no error message if you get the function name wrong in the call (i.e. you call a non-existent function). Likewise, if you call a script-local function without its "s:" prefix, the intended function is simply not called.
It is convenient to be able to reload the indent script without having to close and reopen Vim, or the file. To allow this, comment out the
let b:did_indent = 1
line, and ensure that each function keyword is suffixed with an exclamation mark (!). You can then reload the script with the command :runtime indent/pascal.vim.
It is helpful to display the value of certain variables (or merely to indicate that a certain point has been reached in the script). A quick and dirty way of doing this is to add a statement like:
echo "hello"
You can also pause the script while you read your messages; maybe you have more than one such message being displayed, but later messages overwrite the earlier ones before you've had a chance to read them. You can achieve this effect by using the input function, which will wait until you press enter:
call input("x = " . x)
Note that you have to concatenate (using the dot operator) the variable's value onto the string containing your prompt.
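If pausing is inconvenient, one alternative (not used in the original script) is :echomsg, which keeps each message in Vim's message history so that it can be reviewed later with :messages. The helper below is a hypothetical sketch; the name DebugShow is an assumption:

```vim
" Hypothetical helper: report a variable's name and value.
function! DebugShow(name, value)
  " string() renders any value (number, string, list) as readable text.
  let msg = a:name . ' = ' . string(a:value)
  " echomsg stores the text in the message history, so it can be read
  " again with :messages even if later output overwrites it on screen.
  echomsg msg
  return msg
endfunction

call DebugShow('indnt', 4)
```

Calls like the last line can be scattered through the indent function while debugging and removed afterwards.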
Obviously, the rules for indentation are derived from a coding standard. We need to identify the patterns (keywords) that change the indentation level. Some keywords, such as begin, change the indentation of the following line, whilst some keywords, such as until, change the indentation of their own line.
Some keywords cause a relative change in indentation. For instance, begin gives rise to an increase in the existing indentation level. In contrast, some keywords force a specific indentation. For instance, program is placed on column 0.
Some keywords always give rise to the same indentation rule. For example, type always increases the indentation level by one (and is itself always placed at column 0). On the other hand, some keywords cause different indentation depending on their structure. For example, the if keyword indents the following line by one level if the 'then' part is a single statement that is placed on the next line. However, it does not indent the following line if the 'then' part contains more than one statement (i.e. starts with a begin). There are more indentation rules for if, but this isn't a tutorial about coding standards!
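The three kinds of rule described above (a keyword that affects the following line, a keyword that affects its own line, and a keyword with a fixed position) might be sketched as follows. The function name DemoRuleIndent and its parameters are hypothetical; it takes the line text as arguments instead of reading the buffer so that it can be tried in isolation:

```vim
" Hypothetical helper showing the three kinds of rule.
"   prev_line   - text of the previous code line
"   this_line   - text of the line being indented
"   prev_indent - indent of the previous code line, in columns
function! DemoRuleIndent(prev_line, this_line, prev_indent)
  " Absolute rule: 'program' always goes at column 0.
  if a:this_line =~# '^\s*program\>'
    return 0
  endif

  let ind = a:prev_indent

  " Relative rule acting on the following line: a line ending in 'begin'
  " indents the line after it by one level.
  if a:prev_line =~# '\<begin\s*$'
    let ind += &shiftwidth
  endif

  " Relative rule acting on the keyword's own line: 'until' moves back
  " one level.
  if a:this_line =~# '^\s*until\>'
    let ind -= &shiftwidth
  endif

  return ind
endfunction
```

For instance, with shiftwidth set to 2, :echo DemoRuleIndent('begin', 'x := 1;', 2) gives 4.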
An indent script's performance should be a balance between accuracy and speed. We could make the script extremely accurate, but it would take ages to do sufficient parsing of the Pascal file. On the other hand, we don't want to make the script so fast that it makes loads of mistakes.
A simple example of the dilemma is indenting "begin". If it comes after if, for or else, then the indentation should remain the same. If, on the other hand, it is the opening begin of a routine, then it should appear in the first column of the line. However, to determine which applies, we have to parse backwards. But there may be lots of intervening lines between this line and the line containing the keyword that tells us where to place this begin. Comments may intervene in the first case, and both comments and identifier lists in the second case. How many lines should we backtrack?
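One possible compromise, sketched below, is to cap the number of lines examined when searching backwards. This is only an illustration of the trade-off, not the approach taken by the Pascal script; the function name DemoFindAbove and its parameters are assumptions:

```vim
" Hypothetical helper: search upwards from line_num for a line matching
" pattern, giving up after max_back non-blank lines have been examined.
" Returns the matching line number, or 0 if none was found within the limit.
function! DemoFindAbove(line_num, pattern, max_back)
  let lnum = a:line_num
  let tries = 0
  while tries < a:max_back
    let lnum = prevnonblank(lnum - 1)
    if lnum == 0
      " Ran off the top of the buffer.
      return 0
    endif
    if getline(lnum) =~# a:pattern
      return lnum
    endif
    let tries += 1
  endwhile
  " Limit reached without a match.
  return 0
endfunction
```

A small limit keeps the indenter fast but misses keywords that lie further back; a large one is more accurate but slower on long declarations.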
In general, we should assume that the code already conforms to our coding standard. This makes it easier to write the regular expressions and makes the script more accurate. The other advantage is that it helps reveal where we've not followed our standard (the script may indent the code badly, thus showing up our error).
Here are the patterns that affect the indentation of the line on which they appear, or the line after. We typically assume we are reading top to bottom, so we don't (generally) refer to preceding code.
* program
* {$ compiler directive
* (*
* uses
* const, var
* type
* record, object
* private
* procedure, function
* end
* end.
* begin
* if, for: if the next line is not begin, indent it, and unindent the subsequent line (back to the same level as this line); if the next line is begin, give it the same indentation.
* else: if the next line is not begin, indent it, and unindent the subsequent line; if the next line is begin, give it the same indentation.
The full script is available for download; the following description contains only snippets of the code.
The script is essentially a series of tests which set a numeric value dependent on whether the given line (sometimes taking into account previous lines) contains, or does not contain, certain strings (i.e. language keywords).
Basically, we tell Vim whether to indent, un-indent, or leave unchanged the current line by checking it against a set of indentation rules (i.e. a coding standard). Since we frequently base the current line's indentation on the previous line's, we always assume the previous line is indented properly.
In designing the script, we need to consider whether each test should allow any further tests, which might override the level it has set. If this test should be final, then we just place a return statement at the end of the if–then block.
The script will be case sensitive, even though Pascal is not. This is a personal choice; I prefer to ensure consistency, in that all keywords are lowercase.
When scanning the code, we nearly always ignore blank lines and comment lines. It is convenient to place the code to do this into its own routine, which returns the line number of the previous line that is not blank and not commented-out.
This is a breakdown of the regular expression used to match comment lines:
    ^        start-of-line
             FOLLOWED BY
    \s*      zero or more whitespace chars (space/tab)
             FOLLOWED BY
    \(       start sub-expression 1
    \(       start sub-expression 1a
    (        a literal opening bracket
             FOLLOWED BY
    \*       a literal asterisk
    \)       end sub-expression 1a
    \|       OR
    \(       start sub-expression 1b
    \*       a literal asterisk
             FOLLOWED BY
    \        a literal space (a backslash followed by a space)
    \)       end sub-expression 1b
    \|       OR
    \(       start sub-expression 1c
    \*       a literal asterisk
             FOLLOWED BY
    )        a literal closing bracket
    \)       end sub-expression 1c
    \|       OR
    {        a literal opening brace
    \|       OR
    }        a literal closing brace
    \)       end sub-expression 1
Our coding standard dictates that comments cannot come before executable code on the same line. Comments can, however, appear at the end of a line containing executable code. In other words, nothing can appear after a comment, apart from another comment. This means we can ignore any line starting with a comment. Conversely, we have to allow for comments to appear after keywords.
First, we follow the Vim conventions for a script file by placing some information about the script's identity at the top of the file.
" Vim indent file
" Language: Pascal
" Maintainer: Neil Carter <n.carter@swansea.ac.uk>
" Created: 2004 Jul 13
" Last Change: 2005 Jun 15
Only load this indent file when no other was loaded, and flag the fact that we've loaded this script.
if exists("b:did_indent")
  finish
endif
let b:did_indent = 1
Tell Vim which function to run when it's performing indentation.
setlocal indentexpr=GetPascalIndent(v:lnum)
Appending an & to an option name (for example, setlocal indentkeys&) sets it to its default value. += means add (concatenate) the following values to the existing value. == means equals. =~ matches a regular expression, with case handled according to the "ignorecase" option; a question mark after it (=~?) always ignores case, and a hash after it (=~#) always matches case.
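The effect of the "ignorecase" option on the match operators can be checked with a few throwaway comparisons. This is only a demonstration snippet, not part of the indent script; the variable names are made up:

```vim
" Save the user's setting so that the demonstration leaves it untouched.
let s:save_ic = &ignorecase

" Plain =~ follows 'ignorecase', so its result can differ between users.
set noignorecase
let plain_noic = 'BEGIN' =~ '^begin$'
set ignorecase
let plain_ic = 'BEGIN' =~ '^begin$'

" =~? always ignores case; =~# always matches case.
let forced_ic = 'BEGIN' =~? '^begin$'
let forced_case = 'BEGIN' =~# '^begin$'

let &ignorecase = s:save_ic
echo [plain_noic, plain_ic, forced_ic, forced_case]
```

When sourced, this should print [0, 1, 1, 0], which suggests that a script intended to be case sensitive is safer using =~# than plain =~.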
Make sure we don't keep redefining this function. Comment out the finish to be able to reload the script without having to quit and reload Vim.
if exists("*GetPascalIndent")
  finish
endif
We first write a function to skip over commented-out lines. When we initialise the loop, we have to subtract one to start prevnonblank() on the line before the current one. Otherwise, it would return the current line! We delimit the SKIP_LINES string with single quotes so that its backslashes reach the regular expression engine unchanged; within the pattern, the backslash (\) is the escape character.
To make the function local to the script (i.e. not callable from outside), we prefix it with s:.
We add an exclamation mark "!" suffix to the function keyword so that our function definition can override an existing one with the same name. If we didn't do that, Vim would issue an error if the new function's name was the same as an existing one.
function! s:GetPrevNonCommentLineNum( line_num )
  " Lines that start with a comment marker are skipped.
  let SKIP_LINES = '^\s*\(\((\*\)\|\(\*\ \)\|\(\*)\)\|{\|}\)'
  let nline = a:line_num
  while nline > 0
    let nline = prevnonblank(nline-1)
    if getline(nline) !~? SKIP_LINES
      break
    endif
  endwhile
  return nline
endfunction
Here's where the fun really starts. We create the function that's going to look at our code, and return the appropriate indent level for the current line.
First of all, we immediately return a zero if we're on the first line, since the first line should not be indented (Vim numbers lines from 1). The a: prefix tells Vim that the line_num variable is the one given in the function's argument list.
if a:line_num == 1
  return 0
endif
Remember, there may be exceptions where we don't want the first line to go at column 0, but we can't please all the people all the time; we have to assume our coding standard applies. It's up to the writer of the Pascal file to avoid running our script over any lines that shouldn't conform to the indentation standard.
We now get the line to be indented; typically this is the line of the file being indented (AKA the current buffer) on which the cursor is currently placed. However, by making it a parameter of the function, we enable other script writers to obtain our Pascal indent for any line they choose.
let this_line = getline( a:line_num )
We now get the nearest line before the specified line that contains executable code. We might need it later on. We use the function we wrote earlier on in this script, so we add the s: prefix (for Script). We also obtain the existing indent level for that line.
let prev_codeline_num = s:GetPrevNonCommentLineNum( a:line_num )
let prev_codeline = getline( prev_codeline_num )
let indnt = indent( prev_codeline_num )
Our first pattern check is for being in the middle of a comment block. We assume that lines starting with zero or more whitespace (spaces and tabs) followed by an asterisk (*) are comments. It's conceivable that a line starting like this might be a multi-line mathematical formula where the asterisk is a multiplication operator, but this is the first example of a compromise in our accuracy. We return the existing indent level (so it's unchanged).
if this_line =~ '^\s*\*'
  return indent( a:line_num )
endif
Notice that we delimit the pattern with single quotes ('). This saves us from having to double the backslashes, which we would have to do if we used double quotes.
The difference between strings delimited with double-quotes and single-quotes is tricky to understand, and seems to vary depending on how they are used. I admit to being confused by it myself. For more information, use :help expr-string, :help literal-string, and :help 41.4 (scroll down to the end of the section on LOGIC OPERATIONS).
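A small experiment may help with the distinction. In the sketch below (the variable names are made up for the demonstration), the same pattern is written both ways and the two strings are compared:

```vim
" In a single-quoted string a backslash is just a backslash.
let single = '^\s*\<begin\>'

" In a double-quoted string a backslash starts an escape sequence, so each
" backslash meant for the regular expression has to be doubled.
let double = "^\\s*\\<begin\\>"

" The two strings hold exactly the same characters.
let same = (single ==# double)

" "\t" is one character (a tab); '\t' is two (a backslash and a t).
let tab_lengths = [len("\t"), len('\t')]

echo same tab_lengths
```

Here same is 1 and tab_lengths is [1, 2].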
Be careful: remember that indent scripts can interfere with the operation of Vim's commenting feature.
Having multiple exit points is not really elegant coding practice, but given this is a small script, and we want fast performance, optimisations are acceptable.
The following pattern matches a start of line (^) followed by zero or more whitespace (\s*) followed by either the keyword const, or the keyword var, followed by an end of line. Note the \< and \> patterns, which match the beginning and the end, respectively, of a word (they are zero-width, so they do not consume the first or last letter of the word). We use these to avoid matching the strings where they appear as part of a larger word. For instance, we wouldn't want the const pattern to match "constant".
if this_line =~ '^\s*\<\(const\|var\)\>$'
  return 0
endif
Beware: make sure you have the brackets and word delimiters in the correct order. Also, it is easy to be fooled into thinking that \< and \> form a sub-expression, but they don't! For instance, the following pattern will match "constant" and "multivar":
'\(\<const\|var\>\)'
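The difference can be confirmed by trying both orderings against a few sample words. This is a throwaway check, with variable names invented for the purpose:

```vim
" Word delimiters inside the group: \< binds only to 'const' and \> only
" to 'var', so longer words slip through.
let loose = '\(\<const\|var\>\)'

" Word delimiters outside the group: both alternatives must be whole words.
let strict = '\<\(const\|var\)\>'

let loose_hits = ['constant' =~# loose, 'multivar' =~# loose]
let strict_hits = ['constant' =~# strict, 'multivar' =~# strict]
let strict_whole = ['const' =~# strict, 'var' =~# strict]

echo loose_hits strict_hits strict_whole
```

The loose pattern matches both of the longer words ([1, 1]), whereas the strict pattern rejects them ([0, 0]) while still matching the bare keywords ([1, 1]).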
In scripts, all conditions of an if statement must be on the same line (unless the next line begins with a line-continuation backslash; see :help line-continuation). For example:
if this_line =~ '^\s*begin\>' && prev_codeline !~ '^\s*\<\(if\|for\|else\)\>'
If we get to the end of the function, we probably haven't matched anything, which means this is a non-indenting code line, so we just keep the same indentation.
  return indnt
endfunction
This is a list of situations where the indenter gets it wrong.
* procedure can't easily be indented. It appears in three situations: procedure within an object declaration is placed at column zero, whilst it should be placed at two indentations.
* The following isn't indented properly:
if condition then
  statement1
else
  statement2;
  statement3;
In this case, statement3 should be unindented.
Copyright © Neil Carter
Content last updated: 2008-08-22