Intro
The C++ lambda preprocessor (clamp) converts C++ code containing
lambda expressions into ordinary C++ code. Here's a simple example:
std::vector<int> v;
// ...
std::for_each (v.begin(), v.end()
    , lambda (int &p) {
        if (p == 5) p = 0;
    } );
This example uses the standard algorithm for_each to apply an anonymous function to each
element of a vector. The anonymous function accepts an integer
parameter by reference, and resets the value to zero if it is
currently five (a simple, but not very useful example). The
preprocessor replaces the entire lambda expression in its output, so
that the C++ compiler ends up seeing something like the following:
std::for_each (v.begin(), v.end()
    , lambda_generator_1<void, int &>::generate () );
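For intuition only, here is a hand-written function object that behaves like the one returned by the generated call. The names ResetFive and reset_fives are hypothetical, and clamp's real output, which is built from the lambda_generator_1 template, will look different.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical hand-written stand-in for the generated function object.
struct ResetFive {
    // Same signature as the lambda: the element is taken by reference.
    void operator()(int &p) const {
        if (p == 5) p = 0;
    }
};

// Apply the function object to every element, as the for_each call does.
inline void reset_fives(std::vector<int> &v) {
    std::for_each(v.begin(), v.end(), ResetFive());
}
```

Calling reset_fives(v) on a vector holding 5, 3, 5 leaves it holding 0, 3, 0.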
The exact nature of the template lambda_generator_1 is beyond the scope of this
introduction, except to say that its generate() member function returns a function
object by value. The function object has, in this case, a member
function void operator()(int &)
which for_each applies to each element of
the vector. Some people would probably prefer to use the standard
transform algorithm for this example, as in:
std::transform (v.begin(), v.end(), v.begin()
    , lambda int (int p) {
        return (p == 5) ? 0 : p;
    } );
This example shows an anonymous function that returns a value, in this
case int. Rather than hard-wiring a value into the function body, it
is also possible to include contextual information in the function
object. For instance:
void reset (std::vector<int> &v, int val) {
    std::transform (v.begin(), v.end(), v.begin()
        , lambda int (int p) {
            return (p == __ctx(val)) ? 0 : p;
        } );
}
The __ctx expression is an example of
context information bound by value. The clamp preprocessor also
supports reference semantics for contextual information via __ref expressions. For example:
int sum = 0;
std::for_each (v.begin(), v.end()
    , lambda (int p) { __ref(sum) += p; });
This, of course, calculates the sum of elements in the vector.
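The difference between the two kinds of context binding can be sketched with hand-written function objects in plain C++. The class names ResetIfEqual and AddTo and the helper total are hypothetical, chosen for illustration; this is not the code clamp generates. A by-value binding becomes a member that holds a copy, while a by-reference binding becomes a reference member that aliases the caller's variable.

```cpp
#include <algorithm>
#include <vector>

// By-value context, comparable to __ctx(val): the object keeps its own copy.
class ResetIfEqual {
public:
    explicit ResetIfEqual(int val) : val_(val) {}
    int operator()(int p) const { return (p == val_) ? 0 : p; }
private:
    int val_;
};

// By-reference context, comparable to __ref(sum): the object aliases the
// caller's variable, so updates are visible after the algorithm returns.
class AddTo {
public:
    explicit AddTo(int &sum) : sum_(sum) {}
    void operator()(int p) const { sum_ += p; }
private:
    int &sum_;
};

inline void reset(std::vector<int> &v, int val) {
    std::transform(v.begin(), v.end(), v.begin(), ResetIfEqual(val));
}

inline int total(const std::vector<int> &v) {
    int sum = 0;
    std::for_each(v.begin(), v.end(), AddTo(sum));
    return sum;
}
```

Because for_each copies its function object, the reference member matters here: every copy of AddTo still refers to the same sum.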
Getting into some more complicated examples, it is possible to name
the type of the function object generated by a lambda expression by
simply omitting the function body. You have to do this, for instance,
if you want to use an anonymous function generated by a lambda
expression as a function parameter or return value. For example, the
type of the expression from the previous example:
lambda (int p) { __ref(sum) += p; }
can be referred to in the code as "lambda (int &) (int)".
The first pair of brackets contains the context binding
(or closure) parameters, and the second pair contains the function
parameters. The closure parameter list is optional for context-less
functions, as is the return type for functions returning void, such as this one. Putting all of that
together, here's a templated function that returns a function object:
template<typename T>
lambda bool (T) (const T &)
match (const T &target) {
    return lambda bool (const T &candidate) {
        return candidate == __ctx(target);
    };
}
// Use a generated comparison object
std::vector<int>::iterator
    i = std::find_if (v.begin(), v.end(), match (7));
This find_if example returns an iterator to
the first 7 in the vector (or v.end(), if
none) using an instantiation of the match
template with an int parameter. For a vector
of strings, you could do the following:
std::vector<std::string>::iterator
    i = std::find_if (v.begin(), v.end()
        , match (std::string("hello")));
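For comparison, the same idea can be written by hand in plain C++. In this sketch the class name Matcher is hypothetical; it plays the role of the type that the text above spells as lambda bool (T) (const T &), with the target stored by value in the way __ctx(target) would store it.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical named equivalent of the function object returned by match().
template <typename T>
class Matcher {
public:
    explicit Matcher(const T &target) : target_(target) {}
    bool operator()(const T &candidate) const { return candidate == target_; }
private:
    T target_;  // a copy of the target, like a by-value context binding
};

// Factory function so that T is deduced from the argument.
template <typename T>
Matcher<T> match(const T &target) {
    return Matcher<T>(target);
}
```

The calls look the same as in the lambda version, for example std::find_if (v.begin(), v.end(), match (7)). The cost is the named class, which is what the lambda expression saves you from writing.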
I wrote the preprocessor just for fun. There doesn't seem to be any
way to achieve real lambda expressions in pure C++, since it won't let
you insert a function definition in the middle of an expression. The
limits of what pure C++ allows are pretty well exhausted by the
Lambda expressions simplify some coding tasks, so it would be nice to
have them in C++. In the time it takes to extract a one-liner
into a named function, you could probably write two lambda
expressions. The saving is larger still in cases that would otherwise
require a named class to hold context information.
clamp scans its input for lambda expressions, passing any plain C++
through unchanged. When it encounters a lambda expression, it extracts
the function body into a separate file. It also generates a class
template with a suitable operator() and
(where necessary) member variables to store any context binding. This
class template also goes into a separate file. The whole lambda
expression is then replaced in the output by a single constructor
call, which creates an object of the templated class.
The first line of the output is always a #include directive, which
drags in the generated templates and (indirectly) the function bodies.
The generated templates do not refer explicitly to any types used in
the original lambda expressions, which is why they can be included
before any user code. The actual types are only bound at the point of
use. Because of this, the clamp parser doesn't have to know what scope
a lambda expression appears in, or where the required types are
defined. This also makes including lambda expressions in templated
code a breeze, since the type binding is done within the template
scope where the expression was originally used.
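To illustrate how a generated template can avoid naming any user types, here is one possible shape for it. This is an assumption made for illustration, not clamp's actual output: only the names lambda_generator_1 and generate() come from the example in the introduction, while the nested struct function and the helper reset_fives_generated are hypothetical. The function body is written inline here, whereas clamp keeps it in a separate file.

```cpp
#include <algorithm>
#include <vector>

// Assumed shape only. The template mentions no concrete types: the return
// type R and parameter type A are supplied where the lambda is used.
template <typename R, typename A>
struct lambda_generator_1 {
    struct function {
        R operator()(A p) const {
            // Body of the original lambda expression.
            if (p == 5) p = 0;
        }
    };
    // Returns the function object by value.
    static function generate() { return function(); }
};

// Point of use: the types are bound here, after user types are visible.
inline void reset_fives_generated(std::vector<int> &v) {
    std::for_each(v.begin(), v.end(),
                  lambda_generator_1<void, int &>::generate());
}
```

Since the body is only checked against real types when the template is instantiated, the template itself can appear ahead of every declaration it will eventually be used with.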
The clamp preprocessor consists of a lexical analyser (lexer) written
in flex, a parser written in bison and a code generator in plain C++.
The clamp parser mostly tries to ignore everything in the input file,
letting the lexer copy input to output. When the lexer encounters the
lambda keyword, it enters a different mode
("start condition" in flex terminology) in which it behaves like a
normal lexer and supplies tokens to the parser. The parser does some
messy stuff redirecting output and resetting the lexer mode as
necessary.
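The two-mode idea can be shown with a toy scanner in plain C++. This is a rough sketch and not how clamp is implemented: it folds the lexer and parser into a single loop, ignores string literals, comments and body-less lambda types, and replaces each lambda expression with a made-up placeholder (LAMBDA_1, LAMBDA_2, ...) rather than a real generator call. The names ScanResult, is_ident_char and scan are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Toy illustration of the two scanning modes; not clamp's implementation.
struct ScanResult {
    std::string output;                // pass-through text with placeholders
    std::vector<std::string> lambdas;  // captured lambda expressions
};

inline bool is_ident_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

inline ScanResult scan(const std::string &in) {
    const std::string keyword = "lambda";
    ScanResult r;
    std::size_t i = 0;
    while (i < in.size()) {
        // Copy mode: pass text through until the keyword appears as a
        // whole identifier.
        std::size_t end = i + keyword.size();
        bool at_keyword = in.compare(i, keyword.size(), keyword) == 0
            && (i == 0 || !is_ident_char(in[i - 1]))
            && (end >= in.size() || !is_ident_char(in[end]));
        if (!at_keyword) {
            r.output += in[i++];
            continue;
        }
        // Lambda mode: capture text up to the brace that closes the body.
        std::size_t start = i;
        int depth = 0;
        bool seen_brace = false;
        while (i < in.size()) {
            char c = in[i++];
            if (c == '{') { ++depth; seen_brace = true; }
            else if (c == '}') { --depth; }
            if (seen_brace && depth == 0) break;
        }
        r.lambdas.push_back(in.substr(start, i - start));
        std::ostringstream placeholder;
        placeholder << "LAMBDA_" << r.lambdas.size();
        r.output += placeholder.str();
        // Falling out of this branch returns the loop to copy mode.
    }
    return r;
}
```

Given the input f(lambda (int &p) { if (p == 5) { p = 0; } }); the scanner emits f(LAMBDA_1); and stores the lambda text separately. An identifier such as my_lambda_x is copied through untouched.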
Note: clamp is actually pretty dumb. It performs purely syntactic
transformations on the input, without really understanding scope,
types or variables. This will no doubt result in some incomprehensible
errors from the C++ compiler if something goes wrong. This is also the
reason that clamp requires the __ctx and
__ref keywords, since it wouldn't otherwise
be able to tell that an expression relies on surrounding context
information.