Message queues and event streams are foundational to modern distributed systems. They decouple producers and consumers, improve resilience, and allow independent scaling. But as systems grow, data contracts between services tend to rot—often silently.
Apache Avro addresses this problem by making schemas explicit, evolvable, and enforceable. This article walks through why Avro works well with message queues and provides a hands-on schema evolution example with Java code.
In this article, you will learn:
- why message queues need explicit data contracts
- how Avro's writer/reader schema model enables safe evolution
- how schema IDs and a schema registry fit together
- how to evolve a schema in practice, with Java examples
Unlike synchronous RPC, message queues:
- decouple producers and consumers in time
- let either side deploy and scale independently
- allow one event to fan out to many consumers

That looseness is dangerous without a contract.

Typical failure modes:
- a producer renames or removes a field and silently breaks downstream consumers
- a consumer assumes a field that the producer no longer sends
- ad hoc payloads drift until nobody knows the real contract

Avro shifts these failures from runtime surprises to schema-level incompatibilities.
Avro is a schema-based binary serialization format designed for distributed systems.
Key characteristics:
- compact binary encoding, with no field names or tags in the payload
- schemas defined as plain JSON documents
- data cannot be decoded without a schema, which keeps the contract explicit
- built-in schema resolution rules for evolving records

Critically, Avro separates the writer schema (used by the producer) from the reader schema (used by the consumer).

Producer Service
  → Avro serialize (writer schema)
  → Embed schema ID
  → Publish to queue

Consumer Service
  → Read schema ID
  → Fetch schema from registry
  → Avro deserialize (reader schema)

A schema registry (internal or third-party) is assumed. Without one, operational complexity increases dramatically.
Let’s walk through a realistic evolution scenario.
We start with a simple UserCreated event.
{
  "type": "record",
  "name": "UserCreated",
  "namespace": "com.example.events",
  "fields": [
    { "name": "userId", "type": "string" },
    { "name": "email", "type": "string" },
    { "name": "createdAt", "type": "long" }
  ]
}

This schema is registered and assigned schema ID = 1.
Schema schema = new Schema.Parser()
    .parse(new File("user_created_v1.avsc"));

GenericRecord record = new GenericData.Record(schema);
record.put("userId", "123");
record.put("email", "user@example.com");
record.put("createdAt", System.currentTimeMillis());

ByteArrayOutputStream out = new ByteArrayOutputStream();
DatumWriter<GenericRecord> writer =
    new GenericDatumWriter<>(schema);
BinaryEncoder encoder =
    EncoderFactory.get().binaryEncoder(out, null);
writer.write(record, encoder);
encoder.flush();

byte[] payload = out.toByteArray();
// publish payload + schema ID (1)

Schema readerSchema = new Schema.Parser()
    .parse(new File("user_created_v1.avsc"));
DatumReader<GenericRecord> reader =
    new GenericDatumReader<>(readerSchema);
BinaryDecoder decoder =
    DecoderFactory.get().binaryDecoder(payload, null);
GenericRecord record = reader.read(null, decoder);
String email = record.get("email").toString();

So far, nothing special.
Where Does user_created_v1.avsc Come From?

The user_created_v1.avsc file is not generated at runtime. It is a source-controlled Avro schema definition that acts as the canonical contract for the event.
In practice, teams treat .avsc files the same way they treat API definitions or database migrations.
An Avro schema is a plain JSON document.
The initial user_created_v1.avsc file is typically created manually by the team that owns the event.
Example:
{
  "type": "record",
  "name": "UserCreated",
  "namespace": "com.example.events",
  "fields": [
    { "name": "userId", "type": "string" },
    { "name": "email", "type": "string" },
    { "name": "createdAt", "type": "long" }
  ]
}

This file is saved as:
schemas/user_created.avsc
(Notice that we do not version the filename in production systems. Versioning is handled by the schema registry.)
The lifecycle usually looks like this:
1. A developer creates or edits the .avsc file.
2. The change is reviewed like any other API change.
3. CI validates the change for compatibility.
4. The schema is registered with the schema registry.

Most teams do not hand-register schemas.

Instead, the application (or the CI/CD pipeline) registers the schema automatically and receives a schema ID back.

Conceptually:

schema registry:
  register(subject="UserCreated", schema=user_created.avsc)
  → returns schema_id

The application then embeds that schema_id into each message.
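The examples below use a schemaRegistryClient object. This article is registry-agnostic, so assume a minimal client along these lines; this interface is hypothetical, not a specific vendor's API (real clients, such as Confluent's, offer much more):

import org.apache.avro.Schema;

// Hypothetical minimal registry client used by the examples below.
// Real clients add caching, authentication, and richer metadata.
public interface SchemaRegistryClient {

    // Registers the schema under a subject and returns its schema ID.
    // Registering an identical schema again returns the same ID.
    int register(String subject, Schema schema);

    // Fetches the schema that was registered under this ID.
    Schema getById(int schemaId);
}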
When an Avro schema is registered with a schema registry, it is assigned a numeric schema ID (typically an integer).
At runtime, producers do not guess or hardcode this ID — they retrieve it from the registry and embed it directly into the message payload (or headers).
There are two common patterns.
In this model, the message payload has two parts:
[ schema_id ][ avro_binary_data ]

+------------+----------------------+
| schema_id  | Avro-encoded payload |
+------------+----------------------+

schema_id is typically 4 bytes (int32). This pattern is used by many schema-registry implementations.
At startup (or on first use):
1. Load the .avsc file.
2. Register it with the registry to obtain the schema_id.
3. Cache the schema_id in memory.

At message send time:
1. Write the cached schema_id to the output buffer.
2. Write the Avro-encoded payload after it.

This example shows exactly how a producer embeds the schema ID.
Schema schema = new Schema.Parser()
    .parse(new File("schemas/user_created.avsc"));

// Step 1: register schema and get ID
int schemaId = schemaRegistryClient.register(
    "UserCreated",
    schema
);

In practice:
- registration happens once, not per message
- the returned schemaId is cached (see the caching sketch after the next example)

GenericRecord record = new GenericData.Record(schema);
record.put("userId", "123");
record.put("email", "user@example.com");
record.put("createdAt", System.currentTimeMillis());
ByteArrayOutputStream out = new ByteArrayOutputStream();

// Step 2: write schema ID first
DataOutputStream dataOut = new DataOutputStream(out);
dataOut.writeInt(schemaId);

// Step 3: write Avro payload
DatumWriter<GenericRecord> writer =
    new GenericDatumWriter<>(schema);
BinaryEncoder encoder =
    EncoderFactory.get().binaryEncoder(out, null);
writer.write(record, encoder);
encoder.flush();

byte[] message = out.toByteArray();
// publish message to queue

At this point, the payload contains both:
- the registry-assigned schema ID (4 bytes)
- the Avro-encoded record
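As noted above, registration should happen once per schema, not per message. A minimal caching sketch, built on the hypothetical client interface from earlier:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.avro.Schema;

// Resolves a schema ID once per subject and serves it from memory afterwards.
public final class CachedSchemaIds {

    private final SchemaRegistryClient registry;
    private final Map<String, Integer> idsBySubject = new ConcurrentHashMap<>();

    public CachedSchemaIds(SchemaRegistryClient registry) {
        this.registry = registry;
    }

    // First call per subject registers (or looks up) the schema;
    // subsequent calls never touch the registry.
    public int idFor(String subject, Schema schema) {
        return idsBySubject.computeIfAbsent(
            subject, s -> registry.register(s, schema));
    }
}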
Consumers reverse the process.
ByteArrayInputStream in = new ByteArrayInputStream(message);
DataInputStream dataIn = new DataInputStream(in);

// Step 1: read schema ID
int schemaId = dataIn.readInt();

// Step 2: fetch writer schema
Schema writerSchema =
    schemaRegistryClient.getById(schemaId);

// Step 3: deserialize using reader schema
Schema readerSchema = new Schema.Parser()
    .parse(new File("schemas/user_created.avsc"));
DatumReader<GenericRecord> reader =
    new GenericDatumReader<>(writerSchema, readerSchema);
BinaryDecoder decoder =
    DecoderFactory.get().binaryDecoder(in, null);
GenericRecord record = reader.read(null, decoder);

This is where schema evolution happens: the writer schema describes the bytes as they were produced, the reader schema describes what this consumer expects, and Avro resolves the differences between the two.
Some systems place the schema ID in message headers instead of the payload:
headers:
  schema-id = 42
payload:
  Avro binary data

This has trade-offs: the payload stays pure Avro, but the approach depends on every broker, client, and intermediary preserving headers.
Payload-embedded IDs are more portable.
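As an illustration only (the payload-embedded pattern above is broker-agnostic), here is how the ID could travel as a Kafka record header; the topic name and key are placeholders:

import java.nio.ByteBuffer;
import org.apache.kafka.clients.producer.ProducerRecord;

// avroBytes: the Avro-encoded payload WITHOUT a prepended schema ID
ProducerRecord<String, byte[]> producerRecord =
    new ProducerRecord<>("user-events", "123", avroBytes);

// The schema ID rides in a header instead of the payload
producerRecord.headers().add(
    "schema-id",
    ByteBuffer.allocate(4).putInt(schemaId).array());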
Schema IDs:
- are assigned by the registry, not chosen by producers
- are stable, so the same schema always resolves to the same ID
- cost only a few bytes per message

Best practice: resolve and cache schema IDs at startup or on first use, rather than calling the registry per message.

This approach keeps the registry off the hot path and avoids turning it into a runtime single point of failure.
The schema ID is the link between binary data and meaning.
When people say “the schema ID is embedded in the message”, what they mean is: The producer writes the registry-assigned schema ID into the message before the Avro payload, and the consumer reads it first to resolve the correct schema. Once you understand this, Avro’s evolution model becomes concrete rather than magical.
Avro supports generating schemas from Java classes, but we avoid this for event messaging.
Reasons:
- the wire contract becomes an accident of Java class structure
- an innocent-looking refactor can silently change the serialized format
- teams using other languages cannot review the contract in one place

Instead, we follow schema-first development: the .avsc file is authored and reviewed first, and code is written (or generated) against it.
Even though this article uses names like v1 and v2 for clarity, production repositories typically keep a single unversioned schema file; the registry tracks the version history.

In production, your code typically loads:

new Schema.Parser().parse(
    getClass().getResourceAsStream("/schemas/user_created.avsc")
);

That schema may already have multiple historical versions registered.

The .avsc file is:
- source-controlled
- reviewed like an API definition or a database migration
- the canonical contract for the event
Once teams adopt this mindset, schema evolution becomes predictable instead of risky.
Now we want to:
- add an optional displayName field
- without breaking existing consumers

{
  "type": "record",
  "name": "UserCreated",
  "namespace": "com.example.events",
  "fields": [
    { "name": "userId", "type": "string" },
    { "name": "email", "type": "string" },
    { "name": "displayName",
      "type": ["null", "string"],
      "default": null
    },
    { "name": "createdAt", "type": "long" }
  ]
}

This schema is backward compatible:
- the new field is optional, with a default of null
- a reader using this schema can still decode old data, filling in the default
- old consumers can simply ignore displayName

Registered as schema ID = 2.
record.put("displayName", "Jane Doe");
// other fields unchangedThat’s it.
No version checks. No feature flags. No branching logic.
The consumer still uses user_created_v1.avsc.
Avro automatically:
- decodes the message with the writer schema (v2)
- resolves it against the reader schema (v1)
- skips the unknown displayName field

No redeploy required.
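Concretely, the consumer's resolution step looks like this (a sketch; avroBytes stands for a v2-written payload after the schema ID has already been consumed):

// Writer schema: how the bytes were actually produced (v2)
Schema writerSchemaV2 = schemaRegistryClient.getById(2);

// Reader schema: what this consumer still expects (v1)
Schema readerSchemaV1 = new Schema.Parser()
    .parse(new File("user_created_v1.avsc"));

// Avro resolves v2 -> v1 and skips the unknown displayName field
DatumReader<GenericRecord> reader =
    new GenericDatumReader<>(writerSchemaV2, readerSchemaV1);
GenericRecord record = reader.read(null,
    DecoderFactory.get().binaryDecoder(avroBytes, null));
// record exposes userId, email, createdAt -- and no displayName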
Later, the consumer updates to v2.
Object displayName = record.get("displayName");
if (displayName != null) {
    log.info("Display name: {}", displayName.toString());
}

Old messages (written with v1):
- do not contain displayName
- resolve it to the default value (null)

This is backward compatibility working as intended: the new reader fills in defaults for old data.
Examples of dangerous changes:
- removing email without a default
- renaming userId without an alias
- changing a field's type, such as string → int

Schemas are APIs. Breaking changes must be treated as such.
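Avro can detect such breaking changes programmatically, which makes a CI gate straightforward. A sketch using Avro's built-in SchemaCompatibility check (the file paths are assumptions):

import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaCompatibility.SchemaCompatibilityType;
import org.apache.avro.SchemaCompatibility.SchemaPairCompatibility;

// Separate parsers: one parser cannot parse two schemas with the same full name
Schema previous = new Schema.Parser()
    .parse(new File("registered/user_created.avsc"));
Schema proposed = new Schema.Parser()
    .parse(new File("schemas/user_created.avsc"));

// Can a reader on the proposed schema still decode data written
// with the previous one? (This is the BACKWARD check.)
SchemaPairCompatibility result =
    SchemaCompatibility.checkReaderWriterCompatibility(proposed, previous);

if (result.getType() != SchemaCompatibilityType.COMPATIBLE) {
    throw new IllegalStateException(
        "Breaking schema change: " + result.getDescription());
}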
Avro supports two styles:
- SpecificRecord: classes generated from the schema at build time
- GenericRecord: dynamic, schema-driven field access at runtime
In messaging systems with independent deployment cycles, GenericRecord is often the safer default.
Set compatibility rules:
- at minimum, BACKWARD for event streams

Avoid:
- UserCreatedV1
- UserCreatedV2

Prefer:
- UserCreated (evolved schema)

Versioning is implicit in the registry.
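With a registry that speaks the Confluent-compatible REST API, the compatibility level can be set per subject. A sketch using java.net.http (the URL is a placeholder for your deployment):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

HttpRequest request = HttpRequest.newBuilder()
    .uri(URI.create("http://schema-registry:8081/config/UserCreated"))
    .header("Content-Type", "application/vnd.schemaregistry.v1+json")
    .PUT(HttpRequest.BodyPublishers.ofString(
        "{\"compatibility\": \"BACKWARD\"}"))
    .build();

HttpResponse<String> response = HttpClient.newHttpClient()
    .send(request, HttpResponse.BodyHandlers.ofString());
// From now on, the registry rejects any new version of UserCreated
// that is not backward compatible.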
Avro excels when:
- many services exchange messages across team boundaries
- schemas evolve over the lifetime of the system
- payload size and serialization speed matter

It is less suitable for:
- payloads you want to inspect by eye
- ad hoc, schemaless data
- environments where running a schema registry is not practical
Avro doesn’t just serialize data—it forces discipline.
By making schemas explicit and evolvable, it:
- turns silent runtime breakage into visible schema-level incompatibilities
- lets producers and consumers deploy independently
- makes data contracts reviewable, versioned artifacts
The real cost isn’t Avro itself—it’s failing to treat schemas as APIs.
A schema registry is a service that stores, versions, and enforces compatibility rules for schemas. You don’t need a single “official” one—several mature options exist, and teams choose based on ecosystem fit, operational model, and governance needs.
Below is an overview of commonly used schema registry tools.
Confluent Schema Registry

- Most widely used implementation
- Supports: Avro, Protobuf, JSON Schema
- Key features: per-subject compatibility rules, schema versioning, REST API
- Operational model: self-hosted alongside Kafka, or managed via Confluent Cloud
- Best fit: teams already in the Kafka ecosystem
Apicurio Registry

- Kafka-compatible, vendor-neutral alternative
- Supports: Avro, Protobuf, JSON Schema, among other artifact types
- Key features: Confluent-compatible API, open-source governance
- Best fit: teams that want an open-source registry without vendor lock-in
Karapace

- Lightweight, Kafka-compatible registry
- Supports: Avro, JSON Schema, Protobuf
- Key features: drop-in replacement for the Confluent Schema Registry API, simple to operate
- Best fit: teams that want a minimal, fully open-source registry
AWS Glue Schema Registry

- AWS-native schema registry
- Supports: Avro, Protobuf, JSON Schema
- Key features: integration with MSK, Kinesis, and Lambda; IAM-based access control
- Trade-offs: ties schema management to the AWS ecosystem
- Best fit: AWS-centric architectures
Azure Schema Registry (part of Azure Event Hubs)

- Azure-native schema registry
- Supports: Avro
- Key features: integration with Event Hubs and the Azure SDK serializers; access control via Azure identities
- Best fit: Azure-centric architectures
Some teams build internal schema registries, typically a small service backed by a database plus CI checks.

When this makes sense: unusual governance requirements, heavily locked-down infrastructure, or formats the off-the-shelf options do not cover.

When it doesn't: most other cases. Compatibility enforcement is subtle to get right, and mature open-source registries already exist.
Regardless of vendor, a production-ready registry should support:
- schema versioning per subject
- configurable compatibility enforcement (for example, BACKWARD)
- stable, unique schema IDs
- a reliable API for producers and consumers
- access control and an audit trail
If any of these are missing, schema evolution will become fragile.
A schema registry is not just a database of schemas — it is the enforcement point for data contracts.
The choice of tool matters less than:
- enforcing compatibility rules on every registration
- keeping .avsc files source-controlled and reviewed
- treating schemas as APIs with clear ownership
Once those are in place, Avro-based messaging scales cleanly across teams and services.