schema-evolution-example


An example to demonstrate how schema evolution is handled between a client and a service in Avro IPC (Inter-Process Communication).


Design

The Avro specification defines how schemas may evolve in a backward-compatible way. If the change from an old version of a schema to a new version is backward compatible, the new version can be used to decode data that was encoded with the old version, and rules can be applied to upgrade or downgrade the data between versions.
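As a rough illustration of the compatibility rule, the sketch below checks the most common case: a field added in the new version must carry a default so that old data can still be decoded. It is a simplification, not the full resolution algorithm of the Avro specification: it only looks at top-level record fields and ignores type promotion and aliases. The schemas and function names are hypothetical and only loosely modelled on the employee example in this README.

```python
# Simplified sketch: schemas are plain dicts in Avro's JSON form.
# Only top-level record fields are compared (an assumption of this sketch).

def added_fields_without_default(old_schema, new_schema):
    """Return names of fields that exist only in new_schema and have no default."""
    old_names = {field["name"] for field in old_schema["fields"]}
    return [
        field["name"]
        for field in new_schema["fields"]
        if field["name"] not in old_names and "default" not in field
    ]


def can_read_old_data(old_schema, new_schema):
    """True if data written with old_schema can be decoded with new_schema."""
    return not added_fields_without_default(old_schema, new_schema)


# Hypothetical schemas for illustration only.
BASE_FIELDS = [
    {"name": "id", "type": "long"},
    {"name": "firstName", "type": "string"},
    {"name": "lastName", "type": "string"},
]
EMPLOYEE_V1 = {"type": "record", "name": "Employee", "fields": BASE_FIELDS}
# V2 adds a field with a default, so old data can still be read.
EMPLOYEE_V2 = {
    "type": "record",
    "name": "Employee",
    "fields": BASE_FIELDS
    + [{"name": "gender", "type": ["null", "string"], "default": None}],
}
# This variant adds a field without a default, which breaks old data.
EMPLOYEE_BROKEN = {
    "type": "record",
    "name": "Employee",
    "fields": BASE_FIELDS + [{"name": "gender", "type": "string"}],
}

print(can_read_old_data(EMPLOYEE_V1, EMPLOYEE_V2))      # True
print(can_read_old_data(EMPLOYEE_V1, EMPLOYEE_BROKEN))  # False
```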

This also applies to Avro IPC, in which requests and responses between a client and a service are encoded as Avro records. As long as the service can decode the request from the client and upgrade it to the service version, and the client can decode the response from the service and downgrade it to the client version, the two can communicate properly, even if the client is using an older version of the communication protocol.
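The upgrade and downgrade steps can be pictured with a small sketch that operates on already-decoded records (plain dicts). It assumes flat records and hypothetical field lists; a real Avro decoder applies these rules while reading the binary data, and handles nested types, unions and promotions that are left out here.

```python
def resolve(record, reader_fields):
    """Reshape a decoded record to the fields the reader expects.

    Fields unknown to the reader are dropped (downgrade); fields the
    writer did not send are filled from the reader's defaults (upgrade).
    """
    result = {}
    for field in reader_fields:
        name = field["name"]
        if name in record:
            result[name] = record[name]
        elif "default" in field:
            result[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return result


# Hypothetical field lists for an old client and a newer service.
CLIENT_FIELDS = [
    {"name": "firstName", "type": "string"},
    {"name": "lastName", "type": "string"},
]
SERVICE_FIELDS = CLIENT_FIELDS + [
    {"name": "gender", "type": ["null", "string"], "default": None},
]

# Service side: upgrade a request written by the old client.
upgraded = resolve({"firstName": "Thomas", "lastName": "Feng"}, SERVICE_FIELDS)
print(upgraded)  # {'firstName': 'Thomas', 'lastName': 'Feng', 'gender': None}

# Client side: downgrade a response written by the newer service.
downgraded = resolve(
    {"firstName": "Jackson", "lastName": "Wang", "gender": "MALE"}, CLIENT_FIELDS
)
print(downgraded)  # {'firstName': 'Jackson', 'lastName': 'Wang'}
```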

Data in ZooKeeper

The design of schema evolution in Avro IPC leverages the infrastructure underlying the Play Avro D2 module (an example of which can be found here). To recap, Avro D2 uses ZooKeeper as a service registry, which allows clients to dynamically discover (hence the acronym "D2") the addresses of a service. For schema evolution, ZooKeeper also serves as the schema repository, where different versions of a protocol can be stored and referenced concisely by the name of the protocol together with the MD5 checksum of the protocol schema.
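To make the addressing scheme concrete, here is a minimal sketch that derives the ZooKeeper paths from a protocol name and the text of its schema, following the /protocols/<protocol>/... layout used in the request flow. The function names are hypothetical, and the exact text that the module hashes (for example, how the JSON is normalized before hashing) is an assumption of this sketch. The upper-case hexadecimal form matches the sample path shown under manual testing.

```python
import hashlib


def protocol_md5(protocol_text):
    """MD5 of the protocol schema text, as 32 upper-case hex digits."""
    return hashlib.md5(protocol_text.encode("utf-8")).hexdigest().upper()


def version_path(protocol_name, protocol_text):
    """ZooKeeper node that holds one version of the protocol schema."""
    return f"/protocols/{protocol_name}/versions/{protocol_md5(protocol_text)}"


def servers_path(protocol_name):
    """Parent node under which servers register with sequential ids."""
    return f"/protocols/{protocol_name}/servers"


# Placeholder schema text; the real text comes from the Avro protocol.
name = "controllers.protocols.EmployeeRegistry"
text = '{"protocol": "EmployeeRegistry", "namespace": "controllers.protocols"}'
print(servers_path(name))
print(version_path(name, text))
```

Because the path contains only a name and a checksum, two peers that hold the same schema text compute the same path independently, without exchanging the schema itself.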

Request flow

The idea can be captured in this flow of operations:

  1. When the server starts, it registers its service(s) in the service registry in ZooKeeper for potential clients to dynamically discover those service(s). Technically, it stores the URI(s) of its service(s) in /protocols/<protocol>/servers/<id>, where <protocol> is the full name of the protocol for the service, and <id> is a sequential number generated by ZooKeeper. For each service, it also stores the schema of the current version of the protocol in /protocols/<protocol>/versions/<md5>, where <protocol> is the same full name of the protocol, and <md5> is the MD5 checksum of the protocol schema.
  2. Before a client makes a request to a service, it stores its own version of the protocol in /protocols/<protocol>/versions/<md5> in ZooKeeper. This version need not be the same as the version that the service is using.
  3. In the request that the client sends, the MD5 of the client version of the protocol is stored in the handshake request of the message.
  4. When the service receives the request, it reads the MD5 of the client protocol and compares it against the MD5 of its own protocol. If the two do not match, the client version of the protocol is loaded from ZooKeeper (and cached thereafter). The service then decodes the request with the client version of the protocol and converts it into its own version. The conversion succeeds if the two versions follow the rules of schema evolution defined in the Avro specification.
  5. The service processes the request and produces a result.
  6. The service sends back a response message to the client. The handshake response in the message contains the MD5 of the server-side protocol schema. Note that, unlike in the original Avro IPC protocol, the handshake response carries only the MD5 and not the server-side protocol itself. Because the protocol is already stored in ZooKeeper, there is no need to include it in the response and increase the payload.
  7. When the client receives the response, it decodes the response using the server-side protocol. If necessary, the client loads the protocol from ZooKeeper (and caches it thereafter).

Benefit

The benefit of this approach is that the client and the server may continue to use their own versions of the protocol indefinitely, assuming all new versions of the protocol are backward compatible with the client version. There is no need to synchronize the releases of the client and the server to ensure that they communicate properly. Moreover, the protocol itself is never sent in either the request or the response. Instead, the system relies on ZooKeeper as the schema repository to store all versions of the protocol. This reduces the payload of the request and response messages.
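The handshake described in the request flow can be simulated in a few lines. In the sketch below, an in-memory dictionary stands in for ZooKeeper, protocols are reduced to small hypothetical dicts, and the handshake messages are reduced to the single hash field each side sends; none of these classes belong to the module itself. It shows that only MD5 values travel in the messages and that each peer reads the other's protocol from the repository once, then serves it from its cache.

```python
import hashlib
import json


def full_name(protocol):
    return f"{protocol['namespace']}.{protocol['protocol']}"


class SchemaRegistry:
    """In-memory stand-in for the ZooKeeper schema repository."""

    def __init__(self):
        self._nodes = {}
        self.reads = 0  # counts lookups, to make caching visible

    def store(self, protocol):
        text = json.dumps(protocol, sort_keys=True)
        md5 = hashlib.md5(text.encode("utf-8")).hexdigest().upper()
        self._nodes[f"/protocols/{full_name(protocol)}/versions/{md5}"] = protocol
        return md5

    def load(self, name, md5):
        self.reads += 1
        return self._nodes[f"/protocols/{name}/versions/{md5}"]


class Peer:
    """A client or a service: stores its own protocol, caches remote ones."""

    def __init__(self, registry, protocol):
        self.registry = registry
        self.protocol = protocol
        self.md5 = registry.store(protocol)  # steps 1 and 2
        self._cache = {self.md5: protocol}

    def protocol_for(self, md5):
        # Steps 4 and 7: consult the repository only for an unknown MD5.
        if md5 not in self._cache:
            self._cache[md5] = self.registry.load(full_name(self.protocol), md5)
        return self._cache[md5]


# Hypothetical old and new versions of the same protocol.
OLD = {"namespace": "controllers.protocols", "protocol": "EmployeeRegistry",
       "messages": {"addEmployee": {}}}
NEW = {"namespace": "controllers.protocols", "protocol": "EmployeeRegistry",
       "messages": {"addEmployee": {}, "countEmployees": {}}}

registry = SchemaRegistry()
service = Peer(registry, NEW)
client = Peer(registry, OLD)

handshake_request = {"clientHash": client.md5}            # step 3
client_protocol = service.protocol_for(handshake_request["clientHash"])  # step 4
handshake_response = {"serverHash": service.md5}          # step 6
server_protocol = client.protocol_for(handshake_response["serverHash"])  # step 7

# A second exchange is answered from the caches.
service.protocol_for(client.md5)
client.protocol_for(service.md5)
print(registry.reads)  # 2
```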

Manual testing

Run with activator run.

When the application is started, it first creates a ZooKeeper server using a temporary directory as data storage. A server supporting protocol controllers.protocols.EmployeeRegistry is registered at /protocols/controllers.protocols.EmployeeRegistry/servers/0000000000. The server-side version of the protocol is stored at /protocols/controllers.protocols.EmployeeRegistry/versions/E8CCE83C6AA44C6FCB0873CF0621CB16.

Direct requests to the server

The server has an HTTP endpoint that accepts requests in JSON format, so one may issue the following commands to access its service directly.

Count the current employees.

$ curl -X POST -H "Content-Type: avro/json" http://localhost:9000/current/countEmployees
0

Add 3 employees.

$ curl -X POST -H "Content-Type: avro/json" -d '{"employee": {"firstName": "Thomas", "lastName": "Feng", "gender": "MALE", "dateOfBirth": {"year": 2000, "month": 1, "day": 1}}}' http://localhost:9000/current/addEmployee
1
$ curl -X POST -H "Content-Type: avro/json" -d '{"employee": {"firstName": "Jackson", "lastName": "Wang", "gender": "MALE", "dateOfBirth": {"year": 2001, "month": 5, "day": 15}}}' http://localhost:9000/current/addEmployee
2
$ curl -X POST -H "Content-Type: avro/json" -d '{"employee": {"firstName": "Christine", "lastName": "Lee", "gender": "FEMALE", "dateOfBirth": {"year": 2000, "month": 8, "day": 20}}}' http://localhost:9000/current/addEmployee
3

Count the current employees.

$ curl -X POST -H "Content-Type: avro/json" http://localhost:9000/current/countEmployees
3

Make employee 1 the manager of employees 2 and 3.

$ curl -X POST -H "Content-Type: avro/json" -d '{"managerId": 1, "employeeId": 2}' http://localhost:9000/current/makeManager
null
$ curl -X POST -H "Content-Type: avro/json" -d '{"managerId": 1, "employeeId": 3}' http://localhost:9000/current/makeManager
null

Get all the employees under a manager.

$ curl -X POST -H "Content-Type: avro/json" -d '{"managerId": 1}' http://localhost:9000/current/getEmployees
[{"id":2,"firstName":"Jackson","lastName":"Wang","gender":"MALE","dateOfBirth":{"year":2001,"month":5,"day":15}},{"id":3,"firstName":"Christine","lastName":"Lee","gender":"FEMALE","dateOfBirth":{"year":2000,"month":8,"day":20}}]

Get the manager of an employee.

$ curl -X POST -H "Content-Type: avro/json" -d '{"employeeId": 2}' http://localhost:9000/current/getManager
{"id":1,"firstName":"Thomas","lastName":"Feng","gender":"MALE","dateOfBirth":{"year":2000,"month":1,"day":1}}

Sending requests to the server through a legacy client

The server also has a different HTTP endpoint under /legacy that implements a legacy client communicating with an old version of the protocol. When the user accesses this endpoint, the request is first sent to the customized controller, which then invokes the client to send requests to the server.

Because the protocol that the client has is older than the server's, a schema evolution scenario is encountered. The server must upgrade requests from the client to the new version, and the client must downgrade responses from the server to the old version. The client does not take advantage of any feature added in the new version.

The old version does not have a method to count employees, so an attempt to count the current employees fails.

$ curl -i "http://localhost:9000/legacy/countEmployees"
HTTP/1.1 400 Bad Request
Content-Length: 0
Date: Tue, 22 Mar 2016 10:22:23 GMT

Add 3 employees. The requests must be sent in a customized way to the Play controller, as defined in the routes file.

$ curl "http://localhost:9000/legacy/addEmployee?firstName=Thomas&lastName=Feng"
1
$ curl "http://localhost:9000/legacy/addEmployee?firstName=Jackson&lastName=Wang"
2
$ curl "http://localhost:9000/legacy/addEmployee?firstName=Christine&lastName=Lee"
3

Make employee 1 the manager of employees 2 and 3.

$ curl "http://localhost:9000/legacy/makeManager?managerId=1&employeeId=2"
null
$ curl "http://localhost:9000/legacy/makeManager?managerId=1&employeeId=3"
null

Get all the employees under a manager. Because the old protocol does not define the extra fields gender and dateOfBirth, they are dropped when the response from the server is downgraded to the client version. Hence, the user does not see those fields in the response.

$ curl "http://localhost:9000/legacy/getEmployees?managerId=1"
[{"id": 2, "firstName": "Jackson", "lastName": "Wang"}, {"id": 3, "firstName": "Christine", "lastName": "Lee"}]

Get the manager of an employee.

$ curl "http://localhost:9000/legacy/getManager?employeeId=2"
{"id": 1, "firstName": "Thomas", "lastName": "Feng"}
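
The manual tests above can also be scripted. The following sketch uses only the Python standard library to build the same requests as the curl commands, for both the /current and the /legacy endpoints. The helper names are hypothetical; the URLs, the avro/json content type and the parameter names are taken from the commands above. Nothing is sent until send is called, and it assumes the application is running on localhost:9000.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:9000"


def current_request(method, params=None):
    """Build a POST to /current/<method> with an optional JSON body."""
    body = json.dumps(params).encode("utf-8") if params is not None else None
    return urllib.request.Request(
        f"{BASE_URL}/current/{method}",
        data=body,
        headers={"Content-Type": "avro/json"},
        method="POST",
    )


def legacy_request(method, **params):
    """Build a GET to /legacy/<method> with query-string parameters."""
    query = urllib.parse.urlencode(params)
    url = f"{BASE_URL}/legacy/{method}" + (f"?{query}" if query else "")
    return urllib.request.Request(url, method="GET")


def send(request):
    """Send the request and return the response body as text.

    urlopen raises urllib.error.HTTPError for error statuses such as
    the 400 returned by /legacy/countEmployees.
    """
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Example usage against a running server:
#   send(current_request("makeManager", {"managerId": 1, "employeeId": 2}))
#   send(legacy_request("getEmployees", managerId=1))
```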