How to use Avro schema for serialization with Kafka
When it comes to streaming records of data from one microservice to another within a system, Kafka is usually the first choice. But in doing that we also need to keep in mind the performance of serializing the data sent over Kafka topics and ways to optimize it. So Avro came in as the first suggestion, since it is fast and robust, and the same Confluent tooling also supports JSON Schema and Protobuf for serialization.
Problems with the Kafka byte serializer
- Kafka itself only transports bytes from producers to consumers, so every service has to serialize and deserialize records by hand, which becomes time-consuming when record counts run into the thousands.
- No verification of the format and structure of the data is done on the consumer side unless it is implemented explicitly, as illustrated in the sketch after this list.
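As a rough illustration, the sketch below uses the plain StringSerializer to publish an arbitrary payload; the broker accepts it without any structural check, and the consumer only discovers the problem when it tries to parse the message. The topic name and broker address are assumptions for the example.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PlainByteProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Any bytes go through: neither the broker nor the serializer verifies that
            // this payload matches the structure the consumer expects.
            producer.send(new ProducerRecord<>("employee-topic", "emp-1",
                    "{\"employeeId\": \"not-a-number\"}"));
        }
    }
}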
What is Schema Registry and why it is needed
As we saw in the section above, sending data in raw byte format brings in issues of data verification and delay. For that reason, Confluent's Schema Registry comes in as a solution, with the following salient attributes:
- It provides a serving layer for the metadata.
- It provides a RESTful interface for storing and retrieving Avro schemas (a small example of calling it follows this list).
- It stores a versioned history of all schemas, provides multiple compatibility settings and allows evolution of schemas according to the configured compatibility setting.
- It provides serializers that plug into Kafka clients that handle schema storage and retrieval for Kafka messages that are sent in the Avro format.
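To give a feel for that RESTful interface, here is a minimal sketch, assuming the registry runs at the default local address http://localhost:8081 and that a subject named employee-topic-value already exists; both are placeholders for this example.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SchemaRegistryRestDemo {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // List every subject (schema name) the registry currently knows about.
        HttpRequest listSubjects = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8081/subjects"))
                .GET()
                .build();
        System.out.println(client.send(listSubjects, HttpResponse.BodyHandlers.ofString()).body());

        // Fetch the latest registered version of one subject's schema.
        HttpRequest latestVersion = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8081/subjects/employee-topic-value/versions/latest"))
                .GET()
                .build();
        System.out.println(client.send(latestVersion, HttpResponse.BodyHandlers.ofString()).body());
    }
}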
Data flow between Producer and Consumer using Schema Registry
Schema Registry provides a way to manage the Avro schemas that Kafka producers and consumers use to exchange data as records, as shown below. Avro also supports schema evolution, so producers and consumers in a microservices-based architecture can keep streaming data to each other even as the schema changes.
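As a concrete sketch of that flow, the configuration below wires Confluent's Avro serializer and deserializer into an ordinary Kafka producer and consumer. It assumes the kafka-avro-serializer dependency is on the classpath and that the registry runs at http://localhost:8081; the broker address and consumer group are placeholders.

import java.util.Properties;

public class AvroClientConfigs {
    // Producer side: KafkaAvroSerializer registers the schema (if it is new) and
    // prefixes every message with the schema id assigned by the registry.
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");
        return props;
    }

    // Consumer side: KafkaAvroDeserializer reads the schema id from the message and
    // fetches the matching schema from the registry to decode and validate the record.
    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "employee-consumer");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.put("schema.registry.url", "http://localhost:8081");
        props.put("specific.avro.reader", "true");   // return generated classes instead of GenericRecord
        return props;
    }
}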
Advantages of using Avro Schema
- An Avro schema specifies the structure, type and meaning of the data.
- Data can be encoded more efficiently when a schema is available.
- Data is fully typed.
- Data is serialized into a compact binary form (see the encoding sketch after this list).
- Schemas can evolve over time.
- A version history of schemas is stored in the registry.
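To make the compact-encoding point concrete, here is a minimal sketch using Avro's GenericDatumWriter and binary encoder. The inline schema is a trimmed-down, illustrative version of the EmployeeInfo schema defined later in this article, and the values are made up.

import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class AvroEncodingDemo {
    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"EmployeeInfo\",\"namespace\":\"com.domain.avro\","
                + "\"fields\":[{\"name\":\"employeeId\",\"type\":\"long\"},"
                + "{\"name\":\"employeeName\",\"type\":\"string\"}]}");

        GenericRecord employee = new GenericData.Record(schema);
        employee.put("employeeId", 101L);
        employee.put("employeeName", "Jane Doe");

        // Field names and types live in the schema, so only the values themselves are
        // written to the byte stream, which keeps the payload small and fully typed.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        writer.write(employee, encoder);
        encoder.flush();

        System.out.println("Encoded size in bytes: " + out.size());
    }
}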
Create a schema file in .avsc format
Below is a sample Avro schema for EmployeeInfo data containing the basic details of an employee; the schema is written in JSON and stored in a file with the .avsc extension.
{
  "type": "record",
  "name": "EmployeeInfo",
  "namespace": "com.domain.avro",
  "fields": [
    {
      "name": "employeeId",
      "type": "long"
    },
    {
      "name": "employeeName",
      "type": "string"
    },
    {
      "name": "employeeAddress",
      "type": ["null", "string"]
    },
    {
      "name": "employeeSal",
      "type": ["null", "int"]
    }
  ]
}
Maven plugin required to generate Java classes from Avro schemas
<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>${avro-maven-plugin.version}</version>
  <executions>
    <execution>
      <id>schemas</id>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
        <goal>protocol</goal>
        <goal>idl-protocol</goal>
      </goals>
      <configuration>
        <sourceDirectory>${project.basedir}/src/main/resources/</sourceDirectory>
        <outputDirectory>${project.build.directory}/generated-sources/</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
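Once this plugin has run during the generate-sources phase, the .avsc file above yields a generated EmployeeInfo class in the com.domain.avro package. The sketch below, which reuses the producer settings from the earlier configuration example and an assumed employee-topic, builds an instance with the generated builder and publishes it.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import com.domain.avro.EmployeeInfo;

public class EmployeeInfoProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");

        // EmployeeInfo is generated by avro-maven-plugin from the .avsc file above;
        // its builder enforces the field types declared in the schema.
        EmployeeInfo employee = EmployeeInfo.newBuilder()
                .setEmployeeId(101L)
                .setEmployeeName("Jane Doe")
                .setEmployeeAddress("221B Baker Street")
                .setEmployeeSal(50000)
                .build();

        try (KafkaProducer<String, EmployeeInfo> producer = new KafkaProducer<>(props)) {
            // KafkaAvroSerializer registers the EmployeeInfo schema with the Schema Registry
            // and sends the record in Avro binary format.
            producer.send(new ProducerRecord<>("employee-topic", "emp-101", employee));
        }
    }
}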
Conclusion: Confluent provides Schema Registry to manage Avro schemas for Kafka consumers and producers. As the data grows over time, the schema needs to evolve with it, which is handled through the registry's schema compatibility settings. This very interesting pattern, called schema evolution, is one of the several big advantages of Avro schemas and will be my next thing to explore.
Reference:
https://docs.confluent.io/platform/current/schema-registry/index.html