(UPDATED) Understanding how to parse per-field protobuf options in your protoc plugins
Protocol Buffers (“protobuf”) is undoubtedly a great tool for standardizing RPC calls and the tooling built around them.
One thing that I have always found problematic, however, is the lack of examples and documentation, especially given the relatively complicated internal structure of protobuf.
Recently I had to confront this problem: I wanted to declare a per-field protobuf option and use it to generate related resources with a protoc plugin. Specifically, I was trying to parse a protobuf definition that looks like the following:
message MyMessage {
string field1 = 1 [ (myoption) = { optfield: "optvalue" } ];
}
Needless to say, I spent more time than I wanted trying to figure out how exactly to access the option values. This article aims to shed some light on how to do this, so that you do not have to spend so much time figuring it out for yourself when/if you encounter the same problem.
The Goal
The goal of this article is to declare a protobuf message to be used as a per-field option, and to write a protoc plugin, which we will call protoc-gen-myoption, that parses the option and generates resources (which may or may not include code) from this metadata.
Note that in this article we are only concerned with writing protoc plugins in Go.
The following is an example of how we wanted to use per-field options:
syntax = "proto3";

// required to extend google.protobuf.FieldOptions
import "google/protobuf/descriptor.proto";

// Message object that can be associated with a field of
// other messages as per-field option
message OptionMessage {
  string fieldA = 1;
  int64 fieldB = 2;
  // ... possibly more fields ...
}
// Register the message as a field option so protoc recognizes
// it while parsing
extend google.protobuf.FieldOptions {
  OptionMessage myoption = 123456;
}
// elsewhere...
message FooRequest {
  string field1 = 1 [ (myoption) = { fieldA: "bar" } ];
  // ... possibly more fields ...
}
Understanding the Pipeline
If you have ever worked with protoc, you know that you can specify one or more plugins to emit some artifact from the given .proto file(s). For example, to generate Go bindings, you would use the protoc-gen-go plugin, which is invoked automatically when you specify the --go_out=xxxx CLI option.
protoc --go_out=. # emit go code
protoc --go_out=. --foo_out=. # emit go code, and also let `protoc-gen-foo` run too
One thing you need to adjust when using options is that you will have to separate your compilation phase into at least two steps: one to compile the language-specific bindings, and one to process the options.
Therefore the following will not work:
protoc --go_out=. --myoption_out=.
Instead we want this, in two steps:
protoc --go_out=.
protoc --myoption_out=.
The reason why this has to be done will be explained later. For the time being please just accept that that’s the case.
Basic protoc Plugin
Below is skeleton code for a protoc plugin. It iterates through all the input files, messages, and fields.
package main

import "google.golang.org/protobuf/compiler/protogen"

func main() {
	var options protogen.Options
	options.Run(func(gen *protogen.Plugin) error {
		for _, file := range gen.Files {
			for _, msg := range file.Messages {
				for _, field := range msg.Fields {
					// ... (code to access per-field options)
					_ = field // placeholder so that the skeleton compiles
				}
			}
		}
		return nil
	})
}
It’s a simple hierarchy that you can easily navigate. At some point you will find a protogen.Field object stored in the field variable which has a per-field option associated with it.
Working With The Option Objects (Updated)
Accessing the option object is straightforward: you just need to pass the right arguments to the proto.GetExtension function:
import (
	"google.golang.org/protobuf/proto"
	...
)
for _, field := range msg.Fields {
	// pkg is the Go package generated from the .proto file that declares the option
	opts, ok := proto.GetExtension(field.Desc.Options(), pkg.E_Myoption).(*pkg.OptionMessage)
	if !ok || opts == nil {
		continue
	}
	_ = opts.FieldA // do something with these fields
	_ = opts.FieldB
}
The first argument, field.Desc.Options(), returns a proto.Message object representing the options associated with the field, if any. Personally I think this is the most crucial bit, and yet not a single piece of sample code teaches us that the .Desc.Options() of any field, message, etc. is suitable to be passed to proto.GetExtension.
The second argument is generated as a result of compiling the .proto file that declares OptionMessage. When compiled, there will be a variable named pkg.E_OptionName. The pkg part is the package name you specified via option go_package = ..., and the variable name starts with E_ for “extension”, followed by the option name you registered in extend google.protobuf.FieldOptions { ... } (but capitalized). For the myoption option used in this article, that is pkg.E_Myoption.
Using these, you will be able to access the option message associated with the target object.
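To make the naming rule concrete, here is a minimal sketch of how the variable name is derived from the option name. It is a simplification of what protoc-gen-go does: it only covers top-level extensions, and it assumes the option name is made of lowercase ASCII words separated by single underscores. The function name extensionVarName is made up for this illustration and is not part of any protobuf package.

```go
package main

import (
	"fmt"
	"strings"
)

// extensionVarName approximates the name of the Go variable generated for
// a top-level extension. Hypothetical helper, not part of the protobuf API.
// Assumes name consists of lowercase ASCII words joined by single underscores.
func extensionVarName(name string) string {
	var b strings.Builder
	b.WriteString("E_")
	for _, part := range strings.Split(name, "_") {
		if part == "" {
			continue
		}
		// Capitalize the first letter of each word and drop the underscore.
		b.WriteString(strings.ToUpper(part[:1]))
		b.WriteString(part[1:])
	}
	return b.String()
}

func main() {
	fmt.Println(extensionVarName("myoption"))  // the option used in this article
	fmt.Println(extensionVarName("my_option")) // a snake_case variant
}
```

Note that "myoption" has no underscore, so only its first letter is capitalized. If you want the variable to read E_MyOption, name the option my_option in the .proto file.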
The rest of this article talks about the “old” way that I was doing this. You do not need to know any of it to use extensions, but if you read it, you will get a sense of how the extension mechanism is internally implemented.
Working With The Option Objects (Convoluted Way)
UPDATE!! Everything below here still works, but it is not the recommended approach. Please read the section “Working With The Option Objects (Updated)” for the “right” usage.
Accessing the Option Object
Per-field options are stored in the descriptor for the field. You can call the Options() method on the descriptor to get to this object.
opts, ok := field.Desc.Options().(*descriptorpb.FieldOptions)
if !ok || opts == nil {
continue
}
Note that you need to convert the type of the option object to *descriptorpb.FieldOptions. This sort of conversion is an implementation detail of the Go protobuf plugin mechanism, so you will have to live with it.
The FieldOptions object unfortunately has no way of directly providing us with the actual message, though.
When the OptionMessage protobuf message was declared, you would think that you would be able to convert the opts variable above to a Go object of type *OptionMessage, but because of how protoc works internally, it does not contain the code from the generated language-specific bindings (if it did, it would create a circular dependency, and things would not be good).
Instead, the opts variable contains data in a format that conceptually resembles a multi-level map, each slot of which contains the data from an option, keyed by the associated descriptor. So you need to look up the data using the descriptor for the option you are looking for.
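The “map keyed by descriptor” idea can be pictured with a toy model. The sketch below uses only the standard library. The descriptor and message types, as well as the lookup function, are hypothetical stand-ins invented for this illustration and are not the real protoreflect types.

```go
package main

import "fmt"

// descriptor is a hypothetical stand-in for a protobuf descriptor.
// Its identity (the pointer), not its name, is what serves as the key.
type descriptor struct{ name string }

// message is a hypothetical stand-in for a reflective message:
// a map from descriptor to either a scalar value or a nested message.
type message map[*descriptor]any

// lookup walks the nested maps, using one descriptor per level.
func lookup(m message, path ...*descriptor) (any, bool) {
	var cur any = m
	for _, d := range path {
		next, ok := cur.(message)
		if !ok {
			return nil, false
		}
		if cur, ok = next[d]; !ok {
			return nil, false
		}
	}
	return cur, true
}

func main() {
	myoption := &descriptor{name: "myoption"}
	fieldA := &descriptor{name: "fieldA"}

	// Options attached to one field: level 1 is keyed by the option's
	// descriptor, level 2 by the descriptors of the option's own fields.
	opts := message{myoption: message{fieldA: "bar"}}

	v, ok := lookup(opts, myoption, fieldA)
	fmt.Println(v, ok)
}
```

The point of the model is that a name alone is not enough: you have to hold the right descriptor object to reach the data, which is why the rest of this section is mostly about obtaining descriptors.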
This descriptor becomes available through a variable that is declared when you compile the .proto file containing OptionMessage. It will be available in the form pkg.E_OptionName, where pkg is the package name you specified via option go_package = ..., and the variable name starts with E_ for “extension”, followed by the option name you registered in extend google.protobuf.FieldOptions { ... } (but capitalized).
opt := opts.ProtoReflect().Get(pkg.E_Myoption.TypeDescriptor()).Message()
The ProtoReflect() bit gives us a handle to manipulate/retrieve the data from the FieldOptions object. But the more important bit is the fact that pkg refers to the code generated by protoc for OptionMessage.
This generated code would not be available unless we went through the code generation in two phases. This is why we needed to split the call to protoc into two phases earlier.
Once the object is fetched through the Get() method, it further needs to be converted to its specific type, which is a protoreflect.Message. We are using Message() here because our option is a protobuf message. If the option were something else, you would have to convert it to its correct type, such as List.
Accessing The Option Fields
Now we have the object that contains the options! However, we still can’t access the values as if the object were a normal Go struct.
The Message() method returns a protoreflect.Message object, but it’s still a glorified map, so you need to look up the stored field values using a key. The key is a protoreflect.FieldDescriptor object, which you need to fetch from somewhere.
Since we are using a message as the container for the options, you can convert field names into FieldDescriptor objects and use them as keys. To do this, you first need to get a protoreflect.FieldDescriptors (note the “s” at the end), and use its ByName() method to look up the corresponding protoreflect.FieldDescriptor object.
fields := opt.Descriptor().Fields() // extract the FieldDescriptors from the Message
fd1 := fields.ByName("fieldA") // extract the field descriptor for fieldA
Then you can use this descriptor to finally look up the associated value:
val := opt.Get(fd1)
Unfortunately we are not done yet. The value you got from the final Get() call is of type protoreflect.Value, which, like reflect.Value, needs to be converted to a concrete type to be usable. In our case fieldA is a string, so we can use the String() method:
stringValue := val.String()
And voila! Combining all of these, you get the plugin to do something interesting with the per-field options:
import (
	"google.golang.org/protobuf/compiler/protogen"
	"google.golang.org/protobuf/types/descriptorpb"
	// ... plus the package generated for OptionMessage, referred to as pkg
)

func main() {
	var options protogen.Options
	options.Run(func(gen *protogen.Plugin) error {
		for _, file := range gen.Files {
			for _, msg := range file.Messages {
				for _, field := range msg.Fields {
					opts, ok := field.Desc.Options().(*descriptorpb.FieldOptions)
					if !ok || opts == nil {
						continue
					}
					// look up the option by its extension descriptor
					opt := opts.ProtoReflect().Get(pkg.E_Myoption.TypeDescriptor()).Message()
					fields := opt.Descriptor().Fields()
					v1 := opt.Get(fields.ByName("fieldA")).String()
					v2 := opt.Get(fields.ByName("fieldB")).Int()
					// do something interesting using v1 and v2
					_, _ = v1, v2
				}
			}
		}
		return nil
	})
}
That’s it! Now you know how to write protoc plugins that can work with per-field options! Happy hacking!