Writing a Handler for Custom Ruby Syntax (DSL)

This guide explains how to use YARD to document a Domain Specific Language (DSL) or custom Ruby syntax.

A Hello World Handler

The most basic handler is implemented by inheriting from the YARD::Handlers::Ruby::Base class. Subclassing registers our handler immediately, and YARD checks it whenever a statement is parsed. The following is the most basic handler:

class MyHandler < YARD::Handlers::Ruby::Base
  handles :class

  process do
    puts "Handling a class statement!"
  end
end

This handler will tell us whenever a class is processed.

Note: the process do ... end block is equivalent to defining a #process method.

How Handlers Get Called

To understand how and when this handler is called, we must briefly explain how YARD processes source files. A Ruby source file is parsed statement by statement. For each statement, YARD checks the list of registered handlers for all of the handlers that are set to "handle" the statement. Every handler that matches is called (by executing its #process method).
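The dispatch described above can be pictured with a plain-Ruby sketch. This is not YARD's implementation: MiniHandler, its dispatch method and the Statement struct are hypothetical names invented for illustration. The sketch only shows how subclassing can register a handler (through Ruby's inherited hook) and how each statement is offered to every handler whose matcher fits.

```ruby
# Hypothetical stand-in for a parsed statement; only the node type matters here.
Statement = Struct.new(:type, :source)

# Hypothetical miniature of a handler base class (not YARD's code).
class MiniHandler
  @registry = []

  class << self
    attr_reader :registry

    # Ruby calls this hook whenever a subclass is defined, which is one way
    # a handler can be registered simply by subclassing.
    def inherited(subclass)
      super
      MiniHandler.registry << subclass
    end

    # Record the node types this handler wants to see.
    def handles(*types)
      matchers.concat(types)
    end

    def matchers
      @matchers ||= []
    end

    def handles?(statement)
      matchers.include?(statement.type)
    end

    # Offer one statement to every registered handler that matches it.
    def dispatch(statement)
      registry.select { |handler| handler.handles?(statement) }
              .map { |handler| handler.new(statement).process }
    end
  end

  attr_reader :statement

  def initialize(statement)
    @statement = statement
  end
end

class ClassLogger < MiniHandler
  handles :class

  def process
    "Handling a class statement: #{statement.source}"
  end
end

statements = [
  Statement.new(:class, "class Foo; end"),
  Statement.new(:def, "def bar; end")
]
results = statements.flat_map { |s| MiniHandler.dispatch(s) }
puts results
```

Only the :class statement produces a result here, because no registered handler declared an interest in :def.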

Nodes and the AST

Statements are made available to the #process method (through the statement object) as nodes of an Abstract Syntax Tree (AST). Each node in the AST has a #type which identifies the kind of node. YARD uses Ripper to parse Ruby source into the AST, and therefore a full list of node types can be found by running ruby -rripper -e 'puts Ripper::EVENTS' or in irb:

>> require 'ripper'
=> true
>> Ripper::EVENTS
=> [:BEGIN, :END, :alias, :alias_error, :aref, :aref_field,
 :arg_ambiguous, :arg_paren, :args_add, :args_add_block, :args_add_star,
 :args_new, :array, :assign, :assign_error, :assoc_new,
 :assoclist_from_args, :bare_assoc_hash, :begin, :binary, :block_var,
 :block_var_add_block, :block_var_add_star, :blockarg, :bodystmt,
 :brace_block, :break, :call, :case, :class, :class_name_error, :command,
 :command_call, :const_path_field, :const_path_ref, :const_ref, :def,
 :defined, :defs, :do_block, :dot2, :dot3, :dyna_symbol, :else, :elsif,
 :ensure, :excessed_comma, :fcall, :field, :for, :hash, :if, :if_mod,
 :ifop, :lambda, :magic_comment, :massign, :method_add_arg,
 :method_add_block, :mlhs_add, :mlhs_add_star, :mlhs_new, :mlhs_paren,
 :module, :mrhs_add, :mrhs_add_star, :mrhs_new, :mrhs_new_from_args,
 :next, :opassign, :operator_ambiguous, :param_error, :params, :paren,
 :parse_error, :program, :qwords_add, :qwords_new, :redo, :regexp_add,
 :regexp_literal, :regexp_new, :rescue, :rescue_mod, :rest_param, :retry,
 :return, :return0, :sclass, :stmts_add, :stmts_new, :string_add,
 :string_concat, :string_content, :string_dvar, :string_embexpr,
 :string_literal, :super, :symbol, :symbol_literal, :top_const_field,
 :top_const_ref, :unary, :undef, :unless, :unless_mod, :until, :until_mod,
 :var_alias, :var_field, :var_ref, :void_stmt, :when, :while, :while_mod,
 :word_add, :word_new, :words_add, :words_new, :xstring_add,
 :xstring_literal, :xstring_new, :yield, :yield0, :zsuper, :CHAR,
 :__end__, :backref, :backtick, :comma, :comment, :const, :cvar, :embdoc,
 :embdoc_beg, :embdoc_end, :embexpr_beg, :embexpr_end, :embvar, :float,
 :gvar, :heredoc_beg, :heredoc_end, :ident, :ignored_nl, :int, :ivar, :kw,
 :label, :lbrace, :lbracket, :lparen, :nl, :op, :period, :qwords_beg,
 :rbrace, :rbracket, :regexp_beg, :regexp_end, :rparen, :semicolon, :sp,
 :symbeg, :tlambda, :tlambeg, :tstring_beg, :tstring_content,
 :tstring_end, :words_beg, :words_sep]

You should consult Ripper documentation on the meaning of each node type, though currently the documentation for these nodes is sparse.

You do not need to know each node, just that there are many kinds of nodes to express the various Ruby statements. We will use these nodes to tell our handler what statement to match.
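To get a feel for these node types, we can ask Ripper directly which nodes a snippet produces. The following is a minimal sketch using only Ruby's standard library. It inspects Ripper's raw S-expressions, which YARD wraps in its own node objects, so the shape of the data is an approximation of what a handler sees. The node_types helper is a hypothetical name introduced for this example.

```ruby
require 'ripper'

# List the node types of a statement list, skipping the :void_stmt
# placeholders Ripper can emit for empty statements.
def node_types(statements)
  statements.map(&:first).reject { |type| type == :void_stmt }
end

source = <<~RUBY
  class Foo
    def bar; end
  end
RUBY

# Ripper.sexp returns [:program, [statement, ...]].
top_level = Ripper.sexp(source)[1]
top_level_types = node_types(top_level)
puts top_level_types.inspect

# A :class node is [:class, name, superclass, body]; the body is a
# [:bodystmt, [statement, ...], ...] node holding the nested statements.
class_body = top_level.first[3]
nested_types = node_types(class_body[1])
puts nested_types.inspect
```

The top-level statement is a :class node and the statement nested in its body is a :def node, which are the same symbols we pass to handles.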

Matchers

The handles declaration above tells YARD which statements a handler should process. We call these declarations "matchers", because they determine if the current statement matches the handler.

The most basic matcher is a Symbol value that represents the node type of the statement. In our example above, we are looking for any statement which is represented by the :class node, also known as the "class" statement. The full list of node types is the Ripper::EVENTS list shown above.

A handler can have multiple handles statements and multiple matchers in each statement. The following is also valid:

class MyHandler < YARD::Handlers::Ruby::Base
  handles :class, :sclass
  handles :module

  process do end
end

The above handler would handle classes, singleton classes, and modules.

Note: :sclass is the node for class << obj blocks.
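As a rough check of which statements such a handler would see, the sketch below asks Ripper for the node type of a few snippets and compares each against the three symbols passed to handles. It uses raw Ripper output rather than YARD's handler machinery, and first_statement_type is a hypothetical helper written for this example.

```ruby
require 'ripper'

# Return the Ripper node type of the first top-level statement in +source+.
def first_statement_type(source)
  Ripper.sexp(source)[1].first.first
end

# The node types named in the handles declarations above.
handled = [:class, :sclass, :module]

snippets = [
  "class Foo; end",
  "class << self; end",
  "module Bar; end",
  "def baz; end"
]

matched = snippets.select { |src| handled.include?(first_statement_type(src)) }

snippets.each do |src|
  type = first_statement_type(src)
  verdict = matched.include?(src) ? "matched" : "skipped"
  puts "#{src.ljust(20)} #{type.inspect.ljust(10)} #{verdict}"
end
```

The method definition is skipped because :def is not among the declared matchers.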

Meta and Special Matchers

We discussed basic matchers based on a node type, but you can also create more complex custom matchers by subclassing the HandlesExtension class, whose instances respond to #matches?. YARD has a few of these matchers already available for common tasks, like matching method calls and conditionals.

Specifically, the new-style handlers provide two matcher extensions, method_call and meta_type. The first can be used in the form:

handles method_call(:describe)

This will match calls to the describe method in forms such as:

object.describe do ... end
describe(foo)
describe 'a', 'b', 'c'
...
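The reason a dedicated matcher is convenient becomes clearer when we look at how Ripper represents these call forms: each one parses to a different top-level node type, so a single Symbol matcher would not cover them all. The sketch below uses raw Ripper output only, and first_statement_type is a hypothetical helper for this example.

```ruby
require 'ripper'

# Return the Ripper node type of the first top-level statement in +source+.
def first_statement_type(source)
  Ripper.sexp(source)[1].first.first
end

forms = [
  "object.describe do end",
  "describe(foo)",
  "describe 'a', 'b', 'c'"
]

# Each call form is represented by a different kind of node.
types = forms.map { |src| first_statement_type(src) }

forms.zip(types).each do |src, type|
  puts "#{src.ljust(24)} #{type.inspect}"
end
```

The three forms come back as :method_add_block, :method_add_arg and :command respectively, even though all of them call the same method.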

You can also match all conditionals (if, unless, etc.) in one shot with:

handles meta_type(:condition)

This calls #condition? on the node. The full set of meta-types that can be tested for is found in the AstNode class.
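The idea behind a meta-type matcher can be sketched in a few lines of plain Ruby. Everything here is hypothetical and simplified: Node stands in for YARD's AST nodes, the list of conditional types is an assumed subset chosen for illustration, and MetaTypeMatcher is not YARD's class. It only shows a matcher that answers #matches? by calling a predicate on the node.

```ruby
# Hypothetical node with a predicate method, standing in for an AST node.
Node = Struct.new(:type) do
  # Assumption: a simplified set of conditional node types.
  def condition?
    %i[if unless if_mod unless_mod].include?(type)
  end
end

# Hypothetical matcher: responds to #matches? by calling "<name>?" on the node.
class MetaTypeMatcher
  def initialize(name)
    @predicate = :"#{name}?"
  end

  def matches?(node)
    node.respond_to?(@predicate) && node.public_send(@predicate)
  end
end

matcher = MetaTypeMatcher.new(:condition)
nodes = %i[if unless_mod class].map { |type| Node.new(type) }

# Keep only the nodes whose predicate returns true.
matched = nodes.select { |node| matcher.matches?(node) }.map(&:type)
puts matched.inspect # prints [:if, :unless_mod]
```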

Creating a Simple DSL Handler

Now that we have the basics out of the way, we can create our first handler for a DSL syntax. Let's say, for our example, that our framework has a method cattr_accessor that we want to document in our HTML documentation as class-level read/write attributes. For example, we want to document this:

class OurClass
  cattr_accessor :foo
end

As if it were written like this:

class OurClass
  class << self
    attr_accessor :foo
  end
end

With YARD, it's quite simple. Here is our handler:

class ClassAttributeHandler < YARD::Handlers::Ruby::AttributeHandler
  handles method_call(:cattr_accessor)
  namespace_only

  process do
    # explicit super(): implicit super is not allowed in block-defined methods
    push_state(:scope => :class) { super() }
  end
end

First we should note that we've subclassed the AttributeHandler class so that it does most of the legwork of creating the actual attribute objects for us, since our DSL is basically an attribute but in the "class" scope. We then set up a matcher for the cattr_accessor method call (described above).

You'll now notice something we never discussed before, the namespace_only method. This declaration tells our handler that we should only match method calls inside a namespace (class or module), not inside a method. This is not strictly necessary, but it avoids dealing with dynamic attributes and method calls that may not really be attribute declarations at all.

Our process method simply calls #push_state to set our scope to the "class" level, then calls super to run the AttributeHandler's process method inside that state. As a result, the AttributeHandler logic runs at the class level and creates the attributes on our class itself rather than as instance-level attributes.
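The push-then-restore behavior can be illustrated with a small plain-Ruby sketch. It does not use YARD: MiniAttributeHandler, MiniClassAttributeHandler and the name argument to process are hypothetical, and this push_state is a simplified stand-in for the real helper. The sketch shows state being overridden only for the duration of the block, and a block-defined method calling its parent with explicit super arguments (Ruby does not support implicit super in methods created with define_method).

```ruby
# Hypothetical parent handler that records what it creates and in which scope.
class MiniAttributeHandler
  attr_reader :scope, :created

  def initialize
    @scope = :instance
    @created = []
  end

  # Temporarily override the scope for the duration of the block,
  # restoring the previous value afterwards.
  def push_state(opts)
    saved = @scope
    @scope = opts.fetch(:scope, @scope)
    yield
  ensure
    @scope = saved
  end

  def process(name)
    @created << [name, scope]
  end
end

class MiniClassAttributeHandler < MiniAttributeHandler
  # define_method mirrors a block-based "process do ... end" declaration;
  # methods defined this way must pass arguments to super explicitly.
  define_method(:process) do |name|
    push_state(:scope => :class) { super(name) }
  end
end

handler = MiniClassAttributeHandler.new
handler.process(:foo)
puts handler.created.inspect # prints [[:foo, :class]]
puts handler.scope.inspect   # prints :instance
```

The parent logic runs unchanged, but it observes the :class scope while the block is active, and the original scope is back in place once the block returns.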

Creating and Modifying Objects in a Handler and Processing Blocks

We just saw a very simple handler that didn't do very much manipulation or object creation. Often, however, the purpose of a handler is to create a new CodeObject or modify an existing one. To illustrate how to create and manipulate these code objects in YARD, let's look at a very simple DSL that creates new method objects that we'd want to document. Our DSL would create instance methods using the method "methodify":

class SomeClass
  methodify "foo" do
    raise NotImplementedError
  end
end

In the above example, we'd want to document "foo" as an instance method inside of "SomeClass". This time we will not subclass an existing handler, but rather we will create the method object ourselves. Let's look at the handler code to achieve this.

class MethodifyHandler < YARD::Handlers::Ruby::Base
  handles method_call(:methodify)
  namespace_only

  process do
    name = statement.parameters.first.jump(:tstring_content, :ident).source
    object = YARD::CodeObjects::MethodObject.new(namespace, name)
    register(object)
    parse_block(statement.last.last, :owner => object)

    # modify the object
    object.dynamic = true

    # add custom metadata to the object
    object['custom_field'] = 'Generated by Methodify'
  end
end

From the previous example you should already be familiar with the first few lines of this handler. We are matching a method call for "methodify" inside a namespace.

The process method is where it all gets interesting. On the first line of the method, we access the statement object, which is the root node of our current statement. Because our statement is a method call, we are dealing with a MethodCallNode which has a list of parameters. We then take the first parameter and "jump" inside the string's quotes to get the inner text, which will become our method name. The next line creates our MethodObject by name in our current "namespace" (the current lexical module/class).

Now we need to #register the object. Calling this method is not strictly necessary, but it is a helper method in handlers that adds common attributes to an object, such as the line range of its source code, the file the object is located in, the source language, and other attributes.

We then parse the block (the inside of the method). By default, YARD does not parse statements inside a block unless told to do so with this method. Again, it is not strictly necessary, but it allows YARD to run handlers for statements inside of our method (like generating a tag for that "raise" call). The #parse_block method does this for us, and takes two parameters: the node containing the block and any extra state information to push while inside the block (similar to the push_state method we saw before). Here, statement.last.last is the list of statements inside our block. For our state, we use :owner to specify that we are inside of the "foo" method. We use :owner instead of :namespace because a method is not a namespace. To clarify, :owner is a special state value that keeps track of the lexical position inside non-namespace objects like methods. The distinction between an owner and a namespace is important because of Ruby's name resolution rules (YARD must always know what "namespace" it is inside of).

After parsing the method contents, we set some more data on our new object. Neither of these steps is necessary; they are just here to illustrate that we can modify our object after it's been created. First we make it "dynamic", because it was generated dynamically (just as a note to the user). We then create a custom field on our object that will store a little notice that the method was created with our DSL. We could utilize this information later in a custom theme, if we wanted.
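To make the name extraction step more concrete, here is a minimal sketch of the same idea performed on raw Ripper output, using only the standard library. It does not call YARD's #jump; find_node is a hypothetical helper that performs a comparable depth-first search over Ripper's nested arrays, and it searches the whole statement for the first string rather than starting from the first parameter.

```ruby
require 'ripper'

# Hypothetical helper: depth-first search of a raw Ripper S-expression for
# the first node whose type is one of +types+. Returns nil if none is found.
def find_node(sexp, *types)
  return nil unless sexp.is_a?(Array)
  return sexp if types.include?(sexp.first)

  sexp.each do |child|
    found = find_node(child, *types)
    return found if found
  end
  nil
end

source = <<~RUBY
  methodify "foo" do
    raise NotImplementedError
  end
RUBY

statement = Ripper.sexp(source)[1].first

# Raw Ripper prefixes scanner tokens with "@", for example
# [:@tstring_content, "foo", [line, column]].
name = find_node(statement, :@tstring_content)[1]
puts name

# The block is a separate child node; looking inside it is what lets the
# "raise" call be discovered at all.
block = find_node(statement, :do_block)
raised = find_node(block, :@const)[1]
puts raised
```

The first lookup yields the text between the quotes, which is what becomes the method name, and the second shows that the exception constant is only reachable by descending into the block.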

Running Our Handlers in YARD

We talked about how to implement handlers, but you may still be wondering where this Ruby code goes and how we call on it. There are a few ways to answer this question, but in every case we would create a separate .rb source file with our handler and other extension code, and load it into our runtime. A good place to put extensions is in a yard_extensions.rb file in the root of the project, or in a separate directory created for these files.

If you're running inside of a Rake task, you need only require your Ruby source file to have the handlers loaded into the runtime. The top of your Rakefile would look like:

require 'yard'
require_relative './yard_extensions'

If you're running the yardoc tool from the command line, there is a -e (--load) command-line switch to load a Ruby file before parsing source. In this case, you would use the command:

$ yardoc -e yard_extensions.rb 'lib/**/*.rb'

You can also create a plugin that is installed in your gem library and automatically loaded by YARD.