How to use Avro schema for serialization with Kafka

Problems with the Kafka byte serializer

  • Kafka accepts raw bytes as input and delivers them to consumers as-is, which can take a long time when records number in the thousands.
  • Unless the consumer validates it explicitly, nothing checks the format or structure of the data received through the Kafka serializer.

What is Schema Registry and why is it needed

As the section above shows, sending data as raw bytes causes problems with data verification and delay. Confluent’s Schema Registry addresses these problems with the following salient attributes:

  • It provides a serving layer for the metadata.
  • It provides a RESTful interface for storing and retrieving Avro schemas.
  • It stores a versioned history of all schemas, provides multiple compatibility settings and allows evolution of schemas according to the configured compatibility setting.
  • It provides serializers that plug into Kafka clients and handle schema storage and retrieval for Kafka messages sent in the Avro format.
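
The RESTful interface mentioned above can be exercised with any HTTP client. The sketch below only builds the requests for registering a schema and for fetching the latest version, using the JDK's java.net.http classes. The class name, the method names, the registry URL http://localhost:8081 and the subject employee-info-value are assumptions made for illustration, not something the registry prescribes.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class SchemaRegistryRequestSketch {

    // Schema Registry expects the Avro schema as an escaped string inside a JSON envelope.
    static String toRequestBody(String avroSchemaJson) {
        String escaped = avroSchemaJson
                .replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\r", "\\r")
                .replace("\n", "\\n")
                .replace("\t", "\\t");
        return "{\"schema\": \"" + escaped + "\"}";
    }

    // POST /subjects/{subject}/versions registers a new schema version under the subject.
    static HttpRequest registerRequest(String registryUrl, String subject, String avroSchemaJson) {
        return HttpRequest.newBuilder(URI.create(registryUrl + "/subjects/" + subject + "/versions"))
                .header("Content-Type", "application/vnd.schemaregistry.v1+json")
                .POST(HttpRequest.BodyPublishers.ofString(toRequestBody(avroSchemaJson)))
                .build();
    }

    // GET /subjects/{subject}/versions/latest retrieves the most recent version of the subject.
    static HttpRequest latestVersionRequest(String registryUrl, String subject) {
        return HttpRequest.newBuilder(URI.create(registryUrl + "/subjects/" + subject + "/versions/latest"))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical registry URL and subject name; adjust to your environment.
        String schema = "{\"type\": \"string\"}";
        HttpRequest register = registerRequest("http://localhost:8081", "employee-info-value", schema);
        System.out.println(register.method() + " " + register.uri());
        System.out.println(toRequestBody(schema));
    }
}
```

To actually send a request, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) against a running registry. A successful registration should return a small JSON document containing the ID assigned to the schema.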

Data flow between Producer and Consumer using Schema Registry

Schema Registry manages Avro schemas for Kafka producers and consumers, which exchange data as Avro records as shown below. Avro supports schema evolution, so producers and consumers in a microservices-based architecture can keep streaming data while their schemas change.

Data flow between Producer and Consumer
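
One detail of this flow is how the consumer learns which schema to fetch. Confluent's serializers prefix every Avro payload with a magic byte (0) followed by the 4-byte, big-endian schema ID returned by Schema Registry. The sketch below reproduces only that framing with plain JDK classes. The class and method names are hypothetical, and in a real application the Confluent serializer and deserializer do this for you.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class WireFormatSketch {

    static final byte MAGIC_BYTE = 0x0;
    static final int HEADER_SIZE = 1 + 4; // magic byte + 4-byte schema ID

    // Producer side: prepend the magic byte and the schema ID to the Avro-encoded payload.
    static byte[] frame(int schemaId, byte[] avroPayload) {
        return ByteBuffer.allocate(HEADER_SIZE + avroPayload.length)
                .put(MAGIC_BYTE)
                .putInt(schemaId) // ByteBuffer writes big-endian by default
                .put(avroPayload)
                .array();
    }

    // Consumer side: read the schema ID so the matching schema can be looked up.
    static int schemaIdOf(byte[] message) {
        ByteBuffer buffer = ByteBuffer.wrap(message);
        if (buffer.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("Unknown magic byte");
        }
        return buffer.getInt();
    }

    // Consumer side: everything after the header is the Avro binary payload.
    static byte[] payloadOf(byte[] message) {
        return Arrays.copyOfRange(message, HEADER_SIZE, message.length);
    }

    public static void main(String[] args) {
        byte[] payload = {0x02, 0x08}; // placeholder bytes standing in for an Avro record
        byte[] message = frame(42, payload); // 42 is a made-up schema ID
        System.out.println("schema id = " + schemaIdOf(message));
        System.out.println("payload   = " + Arrays.toString(payloadOf(message)));
    }
}
```

Because only the ID travels with each message, the full schema is fetched once per ID and can then be cached by the client.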

Advantages of using Avro Schema

  • An Avro schema specifies the structure, type and meaning of the data.
  • Data can be encoded more efficiently with a schema, because field names are not repeated in every record.
  • Data is fully typed.
  • Data is serialized in a compact binary format.
  • The schema can evolve over time.
  • Schema Registry stores the version history of each schema.

Skeleton of schema registry

Create schema file with .avsc format

Below is a sample Avro record schema, EmployeeInfo, describing the basic details of an employee. The file is stored in .avsc format.

{
  "type": "record",
  "name": "EmployeeInfo",
  "namespace": "com.domain.avro",
  "fields": [
    {
      "name": "employeeId",
      "type": "long"
    },
    {
      "name": "employeeName",
      "type": "string"
    },
    {
      "name": "employeeAddress",
      "type": ["null", "string"]
    },
    {
      "name": "employeeSal",
      "type": ["null", "int"]
    }
  ]
}
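
To see what this schema buys on the wire, the sketch below hand-encodes one EmployeeInfo record following the binary encoding rules of the Avro specification: zigzag variable-length integers, length-prefixed UTF-8 strings, and a branch index in front of each union value. It treats employeeSal as a nullable Avro int. The class name, method names and sample values are made up for illustration, and real code would use the Avro library rather than this hand-rolled encoder.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class EmployeeInfoEncodingSketch {

    // Avro writes int and long values as zigzag-encoded variable-length integers.
    static void writeLong(ByteArrayOutputStream out, long value) {
        long zigzag = (value << 1) ^ (value >> 63);
        while ((zigzag & ~0x7FL) != 0) {
            out.write((int) ((zigzag & 0x7F) | 0x80)); // low 7 bits, continuation bit set
            zigzag >>>= 7;
        }
        out.write((int) zigzag);
    }

    // Strings are written as a length (encoded like a long) followed by the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Fields are written in schema order; no field names appear in the output.
    static byte[] encode(long employeeId, String employeeName, String employeeAddress, Integer employeeSal) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, employeeId);
        writeString(out, employeeName);
        if (employeeAddress == null) {
            writeLong(out, 0); // union branch 0 = "null", which has no body
        } else {
            writeLong(out, 1); // union branch 1 = "string"
            writeString(out, employeeAddress);
        }
        if (employeeSal == null) {
            writeLong(out, 0); // union branch 0 = "null"
        } else {
            writeLong(out, 1); // union branch 1 = "int"
            writeLong(out, employeeSal.longValue());
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] avro = encode(101L, "Asha", null, 50000);
        String json = "{\"employeeId\":101,\"employeeName\":\"Asha\","
                + "\"employeeAddress\":null,\"employeeSal\":50000}";
        System.out.println("Avro binary: " + avro.length + " bytes");
        System.out.println("JSON text:   " + json.getBytes(StandardCharsets.UTF_8).length + " bytes");
    }
}
```

For these sample values the Avro encoding comes to 12 bytes, because the field names and types live in the schema and are not repeated in every record.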

Maven plugin required to generate Java classes from Avro schemas

<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>${avro-maven-plugin.version}</version>
  <executions>
    <execution>
      <id>schemas</id>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
        <goal>protocol</goal>
        <goal>idl-protocol</goal>
      </goals>
      <configuration>
        <sourceDirectory>${project.basedir}/src/main/resources/</sourceDirectory>
        <outputDirectory>${project.build.directory}/generated-sources/</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
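
Once the plugin has generated the EmployeeInfo class, a producer can be pointed at Schema Registry through its configuration. The sketch below only assembles the properties, so it runs without Kafka on the classpath. The class name, the broker address localhost:9092 and the registry URL http://localhost:8081 are assumptions. The serializer class names are the ones shipped with kafka-clients and Confluent's kafka-avro-serializer.

```java
import java.util.Properties;

final class AvroProducerConfigSketch {

    // Builds producer settings that route record values through Confluent's Avro serializer.
    static Properties producerProperties(String bootstrapServers, String schemaRegistryUrl) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Keys are plain strings in this sketch; values are Avro records.
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        // The Avro serializer uses this URL to register and look up schemas.
        props.put("schema.registry.url", schemaRegistryUrl);
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical local addresses; replace with your broker and registry.
        Properties props = producerProperties("localhost:9092", "http://localhost:8081");
        props.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

With kafka-clients and kafka-avro-serializer on the classpath, these properties can be passed to new KafkaProducer<String, EmployeeInfo>(props). A consumer would mirror this setup with io.confluent.kafka.serializers.KafkaAvroDeserializer and specific.avro.reader set to true.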

Conclusion: Confluent provides Schema Registry to manage Avro schemas for Kafka consumers and producers. Because data grows and changes over time, schemas need to evolve as well, and the schema compatibility feature makes this possible. This pattern, called schema evolution, is one of the big advantages of Avro schemas and will be the next thing I explore.

Reference:
https://docs.confluent.io/platform/current/schema-registry/index.html

Tech Lead from India. I write about Technology & Public speaking Dreamer. Thinker. Cloud Enthusiast. My Youtube channel link -https://tinyurl.com/yn5av8sc

Vidhita Kher
