(UPDATED) Understanding how to parse per-field protobuf options in your protoc plugins

Daisuke Maki
6 min read · Dec 26, 2022

Protocol Buffers (“protobuf”) is undoubtedly a great tool for standardizing RPC calls and the tools and features built around them.

One thing, however, that I have always found problematic is the lack of examples and documentation, especially in light of the relatively complicated internal structure of protobuf.

Recently I had to confront this problem: namely I wanted to use a per-field protobuf option, and use it to generate related resources using a protoc plugin. Specifically, I was trying to parse a protobuf definition that looks like the following:

message MyMessage {
string field1 = 1 [ (myoption) = { optfield: "optvalue" } ];
}

Needless to say, I spent more time than I wanted trying to figure out how exactly to access the option values. This article aims to shed some light on how to do this, so that you do not have to spend as much time figuring it out for yourselves if you encounter the same problem.

The Goal

The goal for this article is to declare a protobuf message to be used as a per-field option, and to write a protoc plugin, which we will call protoc-gen-myoption, that parses the option and generates resources (which may or may not include code) from this metadata.

Note that in this article we are only concerned with writing protoc plugins using Go.

The following is an example of how we wanted to use per-field options:

// Message object that can be associated with a field of
// other messages as per-field option
message OptionMessage {
  string fieldA = 1;
  int64 fieldB = 2;
  // ... possibly more fields ...
}

// Register the message as a field option so protoc recognizes
// it while parsing
extend google.protobuf.FieldOptions {
  OptionMessage myoption = 123456;
}

// elsewhere...
message FooRequest {
  string field1 = 1 [ (myoption) = { fieldA: "bar" } ];
  // ... possibly more fields ...
}

Understanding the Pipeline

If you have ever worked with protoc, you know that you can specify one or more plugins to emit some artifact from the given .proto file(s). For example, to generate Go bindings, you would use the protoc-gen-go plugin, which is invoked automatically when you specify the --go_out=xxxx CLI option.

protoc --go_out=. # emit go code 
protoc --go_out=. --foo_out=. # emit go code, and also let `protoc-gen-foo` run too

One thing you need to adjust when using options is that you will have to separate your compilation phase into at least two steps: one to compile the language-specific bindings, and one to process the options.

Therefore, the following will not work:

protoc --go_out=. --myoption_out=.

Instead we want this, in two steps:

protoc --go_out=.
protoc --myoption_out=.

The reason why this has to be done will be explained later. For the time being please just accept that that’s the case.

Basic protoc Plugin

Below is skeleton code for a protoc plugin. It iterates through all the input files, their top-level messages, and the fields of those messages.

package main

import "google.golang.org/protobuf/compiler/protogen"

func main() {
	var options protogen.Options
	options.Run(func(gen *protogen.Plugin) error {
		for _, file := range gen.Files {
			for _, msg := range file.Messages {
				for _, field := range msg.Fields {
					_ = field // ... (code to access per-field options)
				}
			}
		}
		return nil
	})
}

It’s a simple hierarchy that you can easily navigate. At some point you will find a protogen.Field object stored in the field variable which has a per-field option associated with it.

Working With The Option Objects (Updated)

Accessing the option object is simple: you just need to pass the right arguments to the proto.GetExtension function:

import (
"google.golang.org/protobuf/proto"
...
)

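for _, field := range msg.Fields {
	// E_Myoption and OptionMessage are the names protoc-gen-go generates
	// for the myoption extension and the OptionMessage message.
	opts, ok := proto.GetExtension(field.Desc.Options(), pkg.E_Myoption).(*pkg.OptionMessage)
	if !ok || opts == nil {
		continue
	}

	// do something with these fields
	_ = opts.FieldA
	_ = opts.FieldB
}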

The first argument, field.Desc.Options(), returns a proto.Message object representing the options associated with the field, if any. (Personally I think this is the most crucial bit, and yet not a single piece of sample code teaches us that any field/message/etc.’s .Desc.Options() is suitable to be passed to proto.GetExtension.)

The second argument is generated as a result of compiling the .proto file that declares OptionMessage and the extension. When compiled, there will be a variable named pkg.E_OptionName. The pkg part is the package name you specified via option go_package = ..., and the variable name starts with an E_ for “extension” followed by the option name you registered in extend google.protobuf.FieldOptions { ... } with its first letter capitalized, so myoption becomes E_Myoption.
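To make the naming rule concrete, here is a rough sketch that derives the variable name from the option name. The helper extensionVarName is hypothetical and only approximates what protoc-gen-go does: it assumes a top-level extension whose name consists of lowercase letters and underscores, and it ignores the edge cases (digits, nested scopes) that the real generator handles.

```go
package main

import (
	"fmt"
	"strings"
)

// extensionVarName is a hypothetical helper, not part of any protobuf library.
// It approximates the generated identifier for a top-level extension:
// "E_" followed by the option name in camel case.
func extensionVarName(option string) string {
	var b strings.Builder
	b.WriteString("E_")
	upper := true // capitalize the first letter and any letter after '_'
	for _, r := range option {
		switch {
		case r == '_':
			upper = true // drop the underscore, capitalize what follows
		case upper:
			b.WriteString(strings.ToUpper(string(r)))
			upper = false
		default:
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	for _, name := range []string{"myoption", "my_option"} {
		fmt.Printf("%s -> pkg.%s\n", name, extensionVarName(name))
	}
}
```

Note that an option called myoption ends up as E_Myoption rather than E_MyOption, because there is no underscore to mark a word boundary.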

Using these, you will be able to access the option message associated with the target object.

The rest of this article talks about the “old” way that I was doing this. You do not need to know any of it to use extensions, but if you read it, you will get a sense of how the extension mechanism is internally implemented.

Working With The Option Objects (Convoluted Way)

UPDATE!! Everything below here works, but it is not the recommended way. Please read the section “Working With The Option Objects (Updated)” for the “right” usage.

Accessing the Option Object

Per-field options are stored in the descriptor for the field. You can call the Options() method on the descriptor to get to this object.

opts, ok := field.Desc.Options().(*descriptorpb.FieldOptions)
if !ok || opts == nil {
continue
}

Note that you need to convert the type of the option object to *descriptorpb.FieldOptions . This sort of conversion is an implementation detail of the Go protobuf plugin mechanism, so you will have to live with it.

The FieldOptions object unfortunately has no way of directly providing us with the actual message, though.

When the OptionMessage protobuf message was declared, you would think that you would be able to convert the opts variable above to a Go object of *OptionMessage type. But because of how protoc works internally, it does not contain the code from the generated language-specific bindings (if it did, it would create a circular dependency, and things would not be good).

Instead, the opts variable contains data in a format that conceptually resembles a multi-level map, whose slots each contain the data from one option, keyed by the associated descriptor. So you need to look up the data using the descriptor for the option you are looking for.

This descriptor becomes available through a variable that is declared when you compile the .proto file that declares OptionMessage and the extension. It will be available in the format pkg.E_OptionName, where pkg is the package name you specified via option go_package = ..., and the variable name starts with an E_ for “extension” followed by the option name you registered in extend google.protobuf.FieldOptions { ... } with its first letter capitalized, so myoption becomes E_Myoption.

opt := opts.ProtoReflect().Get(pkg.E_Myoption.TypeDescriptor()).Message()

The ProtoReflect() bit gives us a handle to manipulate/retrieve the data from the FieldOptions object. But the more important bit is the fact that pkg refers to the code generated by protoc for OptionMessage.

This generated code would not be available unless we went through the code generation in two phases. This is why we needed to separate the call to protoc in two phases earlier.
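Spelled out, the ordering looks roughly like the sketch below. It only prints the commands rather than running them. The file name foo.proto, the assumption that the plugin source lives in the current directory, and the middle go build step are mine, not something protoc prescribes. The --plugin flag tells protoc where to find the freshly built binary, which you can drop if protoc-gen-myoption is already on your PATH.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// buildSteps returns the commands in the order they need to run.
// It is a hypothetical helper; adjust the paths to your own layout.
func buildSteps(protoFile string) []*exec.Cmd {
	return []*exec.Cmd{
		// 1. Generate the Go bindings, including pkg.E_Myoption.
		exec.Command("protoc", "--go_out=.", protoFile),
		// 2. Build the plugin, which imports the code generated in step 1.
		exec.Command("go", "build", "-o", "protoc-gen-myoption", "."),
		// 3. Run the plugin against the same .proto file.
		exec.Command("protoc",
			"--plugin=protoc-gen-myoption=./protoc-gen-myoption",
			"--myoption_out=.", protoFile),
	}
}

func main() {
	// Dry run: print each command instead of executing it.
	for _, cmd := range buildSteps("foo.proto") {
		fmt.Println(strings.Join(cmd.Args, " "))
	}
}
```

To actually execute the steps you could call cmd.Run() on each one in turn and stop at the first error.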

Once the object is fetched through the Get() method, it further needs to be converted to its specific type, which is a protoreflect.Message. We are using Message() here because our option is a protobuf message. If the option were something else, you would have to convert it to the correct type instead, for example with List() for a repeated option.

Accessing The Option Fields

Now we have the object that contains the options! However we still can’t access the values as if the object was a normal Go struct.

The Message() method returns a protoreflect.Message object, but it’s still a glorified map, so you need to look up the stored field values using a key. The key is a protoreflect.FieldDescriptor object, which you need to fetch from somewhere.

Since we are using a message as the container for the options, you can convert field names into FieldDescriptors and use them as keys. To do this, you first need to get a protoreflect.FieldDescriptors (note the “s” at the end), and use its ByName() method to look up the corresponding protoreflect.FieldDescriptor object.

fields := opt.Descriptor().Fields() // extract the FieldDescriptors from Message
fd1 := fields.ByName("fieldA") // extract the field descriptor for fieldA

Then you can use this descriptor to finally look up the associated value:

val := opt.Get(fd1)

Unfortunately we are not done yet. The value you got from the final Get() call is of type protoreflect.Value, which, like reflect.Value, needs a type conversion to be usable. In our case fieldA is a string, so we can use the String() method:

stringValue := val.String()

And voila! Combining all of these, you get the plugin to do something interesting with the per-field options:

func main() {
	var options protogen.Options
	options.Run(func(gen *protogen.Plugin) error {
		for _, file := range gen.Files {
			for _, msg := range file.Messages {
				for _, field := range msg.Fields {
					opts, ok := field.Desc.Options().(*descriptorpb.FieldOptions)
					if !ok || opts == nil {
						continue
					}

					// pkg is the Go package generated from the .proto file that declares myoption
					opt := opts.ProtoReflect().Get(pkg.E_Myoption.TypeDescriptor()).Message()
					fields := opt.Descriptor().Fields()
					v1 := opt.Get(fields.ByName("fieldA")).String()
					v2 := opt.Get(fields.ByName("fieldB")).Int()
					// do something interesting using v1 and v2
					_, _ = v1, v2
				}
			}
		}
		return nil
	})
}

That’s it! Now you know how to write protoc plugins that can work with per-field options! Happy hacking!

Daisuke Maki

Go/perl hacker; author of peco; works @ Mercari; ex-mastermind of builderscon; Proud father of three boys;