Implement lazily converted wrapper types phase one: strings#456

Draft
andrewparmet wants to merge 3 commits into open-toast:main from andrewparmet:caching-wrapper-types-v2

@andrewparmet (Collaborator) commented Feb 13, 2026

Begins replacing naive, eagerly converted wrapped fields with a deferred, lazy implementation. The runtime API introduced in this PR is not final: it will change in a follow-up that generalizes this pattern to all wrapper types. This PR is just the minimal proof of concept, covering string fields only.

A comparison of generated code:

Before:

  @GeneratedMessage("protokt.v1.testing.Test2")
  public class Test2 private constructor(
    @GeneratedProperty(1)
    public val `val`: Bytes,
    @GeneratedProperty(2)
    public val extra: String,
    public val unknownFields: UnknownFieldSet = UnknownFieldSet.empty()
  ) : AbstractMessage() {
    private val `$messageSize`: Int by lazy {
      // ...
      if (extra.isNotEmpty()) {
        result += sizeOf(18u) + sizeOf(extra)        // recomputes UTF-8 byte length
      }
      }
    }

    override fun serialize(writer: Writer) {
      if (extra.isNotEmpty()) {
        writer.writeTag(18u).write(extra)             // re-encodes String → UTF-8 bytes
      }
    }

    public class Builder {
      public fun build(): Test2 =
        Test2(`val`, extra, unknownFields)
    }

    public companion object Deserializer : AbstractDeserializer<Test2>() {
      override fun deserialize(reader: Reader): Test2 {
        var extra = ""                                // deserialized as String immediately
        // ...
        18u -> extra = reader.readString()            // UTF-8 decode on read
        0u -> return Test2(`val`, extra, UnknownFieldSet.from(unknownFields))
      }
    }
  }

After:

  @GeneratedMessage("protokt.v1.testing.Test2")
  public class Test2 private constructor(
    @GeneratedProperty(1)
    public val `val`: Bytes,
    private val _extra: LazyReference<Bytes, String>,  // holds raw bytes until needed
    public val unknownFields: UnknownFieldSet = UnknownFieldSet.empty()
  ) : AbstractMessage() {
    @GeneratedProperty(2)
    public val extra: String
      get() = _extra.value()                              // lazy decode on first access

    private val `$messageSize`: Int by lazy {
      // ...
      if (_extra.isNotDefault()) {
        result += sizeOf(18u) + _extra.sizeOf()           // O(1) from cached raw bytes length
      }
    }

    override fun serialize(writer: Writer) {
      if (_extra.isNotDefault()) {
        writer.writeTag(18u)
        _extra.writeTo(writer)                            // writes raw bytes directly, no re-encode
      }
    }

    public class Builder {
      public fun build(): Test2 =
        Test2(`val`, LazyReference(extra, StringConverter), unknownFields)
    }

    public companion object Deserializer : AbstractDeserializer<Test2>() {
      override fun deserialize(reader: Reader): Test2 {
        var extra: Bytes? = null                          // stays as raw bytes
        // ...
        18u -> extra = StringConverter.readValidatedBytes(reader)  // validated but not decoded
        0u -> return Test2(
          `val`,
          LazyReference(extra ?: Bytes.empty(), StringConverter),
          UnknownFieldSet.from(unknownFields)
        )
      }
    }
  }
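
The deserializer above keeps the field as validated bytes instead of decoding it. As a rough illustration of what "validated but not decoded" could look like, the sketch below checks a byte array for well-formed UTF-8 with the JDK's CharsetDecoder and hands the bytes back untouched. The function name validatedUtf8 is hypothetical and nothing here is taken from the PR's StringConverter.readValidatedBytes, which may well validate in place rather than allocate the temporary CharBuffer this version does.

```kotlin
import java.nio.ByteBuffer
import java.nio.charset.CharacterCodingException
import java.nio.charset.CodingErrorAction

// Hypothetical helper, not protokt's API: reject malformed UTF-8 up front,
// but return the original bytes so String decoding can be deferred.
fun validatedUtf8(bytes: ByteArray): ByteArray {
    val decoder = Charsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)       // throw instead of substituting U+FFFD
        .onUnmappableCharacter(CodingErrorAction.REPORT)
    try {
        // The decoded CharBuffer is discarded; only the success/failure matters here.
        decoder.decode(ByteBuffer.wrap(bytes))
    } catch (e: CharacterCodingException) {
        throw IllegalArgumentException("field is not valid UTF-8", e)
    }
    return bytes
}
```

Validating at read time keeps the failure at the same point as an eager reader.readString() would have had it, so a lazy getter never has to surface a decoding error later.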

Key differences:

  • Constructor: public val extra: String → private val _extra: LazyReference<Bytes, String>, exposed through a public extra getter
  • Deserializer: reader.readString() → StringConverter.readValidatedBytes(reader) (validates UTF-8 but keeps raw bytes)
  • Serialize: writer.write(extra) (re-encodes String→bytes) → _extra.writeTo(writer) (writes cached raw bytes)
  • Size: sizeOf(extra) (recomputes byte length) → _extra.sizeOf() (O(1) from cached bytes length)
  • Builder/equals/hashCode/toString/copy: unchanged public API surface
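
To make the differences above concrete, here is a minimal sketch of the idea behind the lazy reference, specialized to strings. The class LazyStringRef, its factory functions, and the decodeCount counter are all hypothetical names for illustration; the real LazyReference<Bytes, String> in this PR is generic over a converter and its internals are not shown in the description. The sketch is not thread-safe.

```kotlin
import java.io.ByteArrayOutputStream

// Hypothetical stand-in for LazyReference<Bytes, String>; not protokt's runtime API.
class LazyStringRef private constructor(
    private var wire: ByteArray?,  // UTF-8 bytes as read from the wire, if known
    private var decoded: String?   // decoded value, if known
) {
    // Counts real decodes so the laziness is observable in this sketch.
    var decodeCount: Int = 0
        private set

    // Decode on first access, then reuse the cached String.
    fun value(): String {
        decoded?.let { return it }
        val s = checkNotNull(wire).decodeToString()
        decoded = s
        decodeCount++
        return s
    }

    // Encode at most once when the reference was built from a String.
    private fun bytes(): ByteArray {
        wire?.let { return it }
        val b = checkNotNull(decoded).encodeToByteArray()
        wire = b
        return b
    }

    fun isNotDefault(): Boolean {
        val w = wire
        return if (w != null) w.isNotEmpty() else checkNotNull(decoded).isNotEmpty()
    }

    // Constant time once the bytes are cached.
    fun sizeOf(): Int = bytes().size

    // Writes the cached bytes as-is; no String round trip.
    fun writeTo(out: ByteArrayOutputStream) {
        out.write(bytes())
    }

    companion object {
        fun fromWire(bytes: ByteArray) = LazyStringRef(bytes, null)   // deserializer path
        fun fromValue(value: String) = LazyStringRef(null, value)     // builder path
    }
}
```

In a pass-through scenario, a reference created with fromWire can be sized and written without value() ever being called, so decodeCount stays at 0. That is the case the passThrough benchmarks below exercise.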

Benchmark Results: Baseline (main) vs Caching String Fields (times in ms; lower is better)

  ┌───────────────────┬───────────────┬──────────────┬─────────────────┬─────────────────┬──────────────────────────┐
  │     Benchmark     │ protobuf-java │ protokt main │ protokt caching │ caching vs main │ caching vs protobuf-java │
  ├───────────────────┼───────────────┼──────────────┼─────────────────┼─────────────────┼──────────────────────────┤
  │ deserializeLarge  │ 1438.7 ±10.0  │ 790.1 ±21.8  │ 793.4 ±37.2     │ +0.4%           │ -44.8%                   │
  ├───────────────────┼───────────────┼──────────────┼─────────────────┼─────────────────┼──────────────────────────┤
  │ deserializeMedium │ 3.020 ±0.066  │ 2.348 ±0.028 │ 2.041 ±0.067    │ -13.1%          │ -32.4%                   │
  ├───────────────────┼───────────────┼──────────────┼─────────────────┼─────────────────┼──────────────────────────┤
  │ deserializeSmall  │ 0.006 ±0.001  │ 0.004 ±0.001 │ 0.004 ±0.001    │ ~0%             │ -33.3%                   │
  ├───────────────────┼───────────────┼──────────────┼─────────────────┼─────────────────┼──────────────────────────┤
  │ passThroughLarge  │ 3248.6 ±118.8 │ 2695.7 ±68.6 │ 1968.5 ±90.8    │ -27.0%          │ -39.4%                   │
  ├───────────────────┼───────────────┼──────────────┼─────────────────┼─────────────────┼──────────────────────────┤
  │ passThroughMedium │ 4.723 ±0.148  │ 3.924 ±0.164 │ 3.332 ±0.093    │ -15.1%          │ -29.4%                   │
  ├───────────────────┼───────────────┼──────────────┼─────────────────┼─────────────────┼──────────────────────────┤
  │ passThroughSmall  │ 0.008 ±0.001  │ 0.007 ±0.001 │ 0.005 ±0.001    │ -28.6%          │ -37.5%                   │
  ├───────────────────┼───────────────┼──────────────┼─────────────────┼─────────────────┼──────────────────────────┤
  │ serializeLarge    │ 1220.0 ±3.2   │ 1358.5 ±68.9 │ 889.7 ±17.2     │ -34.5%          │ -27.1%                   │
  ├───────────────────┼───────────────┼──────────────┼─────────────────┼─────────────────┼──────────────────────────┤
  │ serializeMedium   │ 0.916 ±0.042  │ 1.038 ±0.159 │ 0.878 ±0.051    │ -15.4%          │ -4.1%                    │
  ├───────────────────┼───────────────┼──────────────┼─────────────────┼─────────────────┼──────────────────────────┤
  │ serializeSmall    │ 0.002 ±0.001  │ 0.003 ±0.001 │ 0.001 ±0.001    │ -66.7%          │ -50.0%                   │
  └───────────────────┴───────────────┴──────────────┴─────────────────┴─────────────────┴──────────────────────────┘


Key findings:

  • protokt deserialization on main is already ~1.8x faster than protobuf-java (790.1 ms vs 1438.7 ms for large); that is the existing baseline, and caching leaves it essentially unchanged for large messages
  • protokt with caching takes 39% less time than protobuf-java on large pass-through and 29% less on medium
  • protokt with caching now serializes large messages faster than protobuf-java (889.7 ms vs 1220.0 ms); on main it was slower (1358.5 ms)
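
The two comparison columns in the table appear to be plain relative changes in mean time, with negative values meaning the caching build took less time. A small sketch of that arithmetic, assuming the formula (candidate - baseline) / baseline; the function names are illustrative and not part of the benchmark harness:

```kotlin
import java.util.Locale

// Relative change of `candidate` against `baseline`, in percent.
// Negative means the candidate took less time.
fun percentChange(candidate: Double, baseline: Double): Double =
    (candidate - baseline) / baseline * 100.0

// Formats a change the way the table prints it, e.g. "-27.0%".
fun formatChange(change: Double): String =
    "%+.1f%%".format(Locale.ROOT, change)
```

For passThroughLarge, formatChange(percentChange(1968.5, 2695.7)) gives "-27.0%" against main and formatChange(percentChange(1968.5, 3248.6)) gives "-39.4%" against protobuf-java, matching the table. The ± ranges are not taken into account, so rows whose means sit within the measurement noise (the Small benchmarks in particular) should be read with care.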

@andrewparmet changed the title from "first cut at caching wrapper types" to "Implement lazily converted wrapper types" Feb 13, 2026
@andrewparmet changed the title from "Implement lazily converted wrapper types" to "Implement lazily converted wrapper types phase one: Strings" Feb 13, 2026
@andrewparmet changed the title from "Implement lazily converted wrapper types phase one: Strings" to "Implement lazily converted wrapper types phase one: strings" Feb 13, 2026