Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jackson databind changes for draft v4 schema generator #838

Closed
wants to merge 6 commits into from

Conversation

zoliszel
Copy link

Pull request for TypeNameIdResolver, please let me know if you need anything more.


protected String idFromClass(Class<?> clazz)
{
if(clazz==null){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this actually work? Caller assumes value is not null by calling value.getClass()? Is this method called directly with null from some place?
Alternatively call above could pass null for null instance.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point albeit it is backward compatible :-) Previous impl:

    @Override
    public String idFromValue(Object value)
    {
        Class<?> cls = _typeFactory.constructType(value.getClass()).getRawClass();

Please let me know if you want this changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I am just trying to understand what is the intended change -- not saying it breaks anything that wasn't already broken. But I guess the answer is seen below... so never mind.

I am ok without changes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologise if I was not clear enough, let me try to explain.

As you are very well aware when Jackson generates JSON documents out of java objects the generated JSON document itself might not have enough type information to turn that JSON document back to a Java object. In that case an optional Typer can be provided which will enrich the JSON document with additional type information so the deserializer knows which class to instantiate.

Now from a schema generator point of view if the schema generator does not take this additional meta-data into consideration than the JSON document will no longer be valid against the generated schema. To solve this problem the schema generator will need to know exactly how Jackson will generate JSON. To avoid duplication I am allowing the same typer to be provided to the schema mapper as what the original mapper was using. Now when in schema generation mode I do not have access to any instances of the class for which schema is being generated, I purely need to operate on the class itself. Without this change I won't be able to extract the typeId used. The added test case shows how the generator will use this to decide what type of metadata is added and if the typeId known it will also restrict the allowed values to it.

Please let me know if you need further information.

Zoltan

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the need, from perspective that while traversal is based on serialization, in many aspects it really relates more to deserialization because there is no access to instances just types.
I am just trying to see how pieces fit together: looking at code I can see how it relates to JsonSerializer, which is the entrypoint. I didn't think visitors exposed information directly.

@zoliszel zoliszel changed the title Fixes FasterXML/jackson-databind#831 Jackson databind changes for draft v4 schema generator Jun 25, 2015
@zoliszel
Copy link
Author

Hi,

it looks like my follow-up commits are also part of this pull request (github newbie here...). These are all needed for the v4 generator, let me explain what are those:

I) Support for array typed JSON typeIDs
In the draft v4 spec the schema type "any" has been removed in favour of support for multiple types. For example a schema which validates any type of JSON document can be expressed as:

{
  "type": [
    "string",
    "number",
    "integer",
    "boolean",
    "object",
    "array",
    "null"
  ]
}

This is causing issues with the schema parser. The schema parser uses the type property as typeId but the current jackson parser assumes the typeId is a String and can't deal with arrays. To support it I had to change the AsPropertyTypeDeserializer to allow for String Arrays. My change also exposed a bug in the TokenBuffer. Unit tests for both features are attached.
(Please note I did not wanted to introduce an additional typeId field, having done that so would cause interopability issues)

II) Expose SerializerFactory on SerializerProvider
You were right, the SerializerFactory was not exposed to the Visitors - I was only able to get hold of the default typer which is not exactly what I want. I added in an accessor to solve this.

III) Unit test to ensure custom KeySerializer is indeed invoked when seria…
This is not a new feature but more of a regression test. When migrating to 2.5 I have noticed that custom Key serializers are no longer invoked. If you run the attached unit test against 2.4, it will pass, against 2.5 it will fail, but against 2.6.0-rc3 it will pass again. To be frank did not have time to understand the root of the regression but added in this test so it can be captured later on.

Regards,
Zoltan

@cowtowncoder
Copy link
Member

I'm sorry but I don't think I want to change type handling to support arrays of types, just to support JSON Schema. Jackson itself wouldn't have use for union types, as Java has no way of expressing or using such. JSON Schema does define those, which I think is a mistake, but it is what it is.

On SerializerFactory: I understand that some access may be needed, but usually it is actually accessed through SerializerProvider. There are some parts of initialization that (I think) provider handles, but factory does not, so it would be good if SerializerProvider was accessed instead.

I could check in some of the fixes separately; esp. one for TokenBuffer seems useful (wrt getCurrentName()).

@cowtowncoder
Copy link
Member

Looking at KeySerializer test: testing wrt KeySerializer part makes sense.
But it looks like value handling does NOT work; custom serializer is not getting called. I don't know if that is due to @JsonValue (which should not actually mask it) or something else.
I'll see if I can find what gives.

cowtowncoder added a commit that referenced this pull request Jun 25, 2015
@cowtowncoder
Copy link
Member

Turns out your test managed to reproduce #848, which I just fixed. Good catch!
(basically, custom serializer did not override one that'd be built for @JsonValue, or from @JsonDeserialize).
Fix will only be in 2.6.0, since while all unit tests pass I am bit hesitant to make the internal changes in older versions; lookup logic does not change so there's small but non-zero possibility of something breaking.

@zoliszel
Copy link
Author

I) Support for arrays of type
I would counter argue that union types can indeed be expressed with Java and Jackson :-) Consider the following use case:
Given the following mixins

    @JsonTypeName("Number")
    @JsonSubTypes({@JsonSubTypes.Type(Integer.class),
                                  @JsonSubTypes.Type(Double.class)})
    interface NumberType{

    }

    @JsonTypeName("Double")
    interface DoubleType{

    }

    @JsonTypeName("Integer")
    interface IntegerType{

    }

With the following mappings

   ObjectMapper mapper = new ObjectMapper();
   mapper.addMixIn(Number.class,UnionType.class);
   mapper.addMixIn(Double.class,DoubleType.class);
   mapper.addMixIn(Integer.class,IntegerType.class);
  ...

would basically define any Number instance to be either a number type or an integer type.
A v4 generator for the Number class could yield the following JSON schema:

{
  "type": [
    "number",
    "integer"
  ],
  "definitions": {
    "Integer": {
      "type": "integer"
    },
    "Double": {
      "type": "number"
    }
  },
  "anyOf": [
    {
      "$ref": "#/definitions/Double"
    },
    {
      "$ref": "#/definitions/Integer"
    }
  ]
}

(I understand number includes integer hence the above schema can be much simpler, but the point I am trying to make that usage of marker interfaces - topped with JSONSubType - will indeed yield JSON documents where the type of the document - and the corresponding java object - will depend on the actual message)

II) On SerializerFactory
OK, lets step back here for a bit. What I need to do is to get(or create) a typeSerializer for a given class when only a SerializerProvider is available. I did not find any API on the SerializerProvider which can do that for me hence the getter to the SerializerFactory which has the following method:

    /**
     * Method called to create a type information serializer for given base type,
     * if one is needed. If not needed (no polymorphic handling configured), should
     * return null.
     *
     * @param baseType Declared type to use as the base type for type information serializer
     * 
     * @return Type serializer to use for the base type, if one is needed; null if not.
     */
    public abstract TypeSerializer createTypeSerializer(SerializationConfig config,
            JavaType baseType)
        throws JsonMappingException;

Given these requirements what would be your recommendation?

III) Key Serializer
Thank you for looking into this. I am targeting 2.6.0 anyway given these changes needed but let me re-iterate that this is a breaking change between 2.4 and 2.5.

@cowtowncoder
Copy link
Member

I don't think that creates union type, at least to my understanding of union types, although I am actually not sure I fully follow the example (I am guessing you are actually thinking Jackson does more work than it really does here, wrt mix-in annotations).
But starting with what I mean by union type is combination of two unrelated types; unrelated meaning that they do not have common base type above Object.class.
Java does not have a type that corresponds to something like "Either String or array of Strings" .
I am ok with sort-of union types with incomplete set (that is, there is no binding for all possible subtypes).

But maybe that is bit irrelevant... what I am trying to figure out is just the question of what actual use for multiple type ids would be, for Java/JSON databinding, not including types that JSON Schema can express (and for which JSON may exist) but that do not have translation to Java. Like that "String or Array of Strings", which in Java world would just have to be more general "any java.lang.Object".

But as to TypeSerializer access. Interesting. I could have sword SerializerProvider exposed construction directly.
What is exposed, instead, is findTypedValueSerializer() which may construct TypeWrappedSerializer if a TypeSerializer is actually needed. Not pretty but possibly useful.
All other access is by SerializerProvider itself via SerializerFactory.

So.... it may be that more access needs to be given. Interestingly enough SerializerFactory is not even accessible via either provider or SerializationConfig. I think this hiding was intentional, when creation 2.0, since direct access was problematic back then.

Long story short: I am fine with more access to get TypeSerializer instances, if and as necessary.
I do not want breaking changes to signatures there, although additions are acceptable.
I would prefer something like findTypeSerializer(...) in SerializerProvider, over accessor to get SerializerFactory, but perhaps both are to be added just in case. I can add javadocs suggesting that accessor should only be called if caller knows what they are doing ("only use if nothing else works").

@cowtowncoder
Copy link
Member

FWTW, 2.6.0-rc3 is out as well, with the earlier fix I mentioned, and addition of test cases.

I have a feeling that I do not quite understand the part about "typeId as JSON array".
Access to SerializerFactory makes sense in some form; as do minor changes needed to avoid requiring instance. It might make sense to isolate those changes so that I can merge them more easily and incrementally.

@zoliszel
Copy link
Author

Hi,

I) SerializerFactory
Sure, if hiding the instance is a requirement I can make that change.

II) typeId as JSON Array
OK, spent a bit more time on this, here is what I came up with :)

    @JsonTypeName("UnionType")
    @JsonTypeInfo(use = JsonTypeInfo.Id.NAME,include= JsonTypeInfo.As.WRAPPER_OBJECT)
    @JsonSubTypes({@JsonSubTypes.Type(StringType.class),
                    @JsonSubTypes.Type(StringArrayType.class)})
    public static abstract class UnionType {

        @JsonCreator
        public static StringType createStringType(String string){
            return new StringType(string);
        }

        @JsonCreator
        public static StringArrayType createStringArrayType(String[] stringArray){
            return new StringArrayType(stringArray);
        }

    }

    @JsonTypeInfo(use = JsonTypeInfo.Id.NAME,include= JsonTypeInfo.As.WRAPPER_OBJECT)
    public static class StringType extends UnionType {

        @JsonIgnore
        private String myString;

        @JsonCreator
        public StringType(String myString) {
            this.myString = myString;
        }

        @JsonValue
        public String getMyString() {
            return myString;
        }
    }

    @JsonTypeInfo(use = JsonTypeInfo.Id.NAME,include= JsonTypeInfo.As.WRAPPER_OBJECT)
    public static class StringArrayType extends UnionType {
        @JsonCreator
        public StringArrayType(String[] myStringArray) {
            this.myStringArray = myStringArray;
        }

        @JsonIgnore
        private String[] myStringArray;

        @JsonValue
        public String[] myStringArray() {
            return myStringArray;
        }
    }
    @org.junit.Test
    public void testUnionType() throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        UnionType myArray = new StringArrayType(new String[]{"5", "6"});
        UnionType stringType = new StringType("4");
        System.out.println(mapper.writerFor(UnionType.class).writeValueAsString(myArray));
        System.out.println(mapper.writerFor(UnionType.class).writeValueAsString(stringType));

        UnionType array = mapper.readValue("{\"UnionType\" : [\"5\",\"6,\"]}", UnionType.class);
        UnionType string = mapper.readValue("{\"UnionType\" : \"5\"}", UnionType.class);
        System.out.println("Array is instance of StringArrayType " + (array instanceof StringArrayType));
        System.out.println("string is instance of StringType " + (string instanceof StringType));
    }

which yields

{"String;":["5","6"]}
{"Test$StringType":"4"}

Array is instance of StringArrayType true
string is instance of StringType true

Going through this example I seem to have uncovered a bug:
Jackson prints "String;" as type instead of UnionType - what I think is bug

Assuming the following example would print the - what i think - correct result:

{"UnionType":["5","6"]}
{"UnionType : "4"}

than the following schema would apply:

{
  "type": "object",
  "properties":{
      "UnionType" : {
              "type" : ["string","array"]
       }
   }
}

If you take away the encoded type information the case would be even more obvious albeit that would be a one way street - deserialization would fail ...

I do not want to argue for too long :) but this type id as array is a real blocker for me. If for not the above - which we can argue wether needs to be supported or not - the other use case I have - as I think I mentioned before - is the support for "any". The "any" keyword has been replaced by an array of types in v4 and my generator is basically stuck if it encounters Object or an interface - for which no type information is provided -. If supporting array of types by core jackson is not an option what would you recommend in here instead?

Can we do something like custom typeId parser for which I can support my own implementation? (unfortunately my understanding of jackson internals are somewhat limited, so any guidance here would be highly appreciated!)

Hope you understand the need here, once we have an agreement I'll create a new pull request with the changes discussed.

@cowtowncoder
Copy link
Member

Ok, sorry, I think we have mix up here: my earlier comments referred to Type Ids, for which only JSON String values are supported, and not arrays of Strings. Maybe I misread some part of examples, but it seemed like type id handler was expecting actual JSON Array values, and this would be problematic.

As to array types, Jackson supports them although there are some limitations due to the fact that there is no matching explicit Class for array types, and all annotation information must be related to value classes. But in your example this is not a problem as "array-ness" is hidden within POJO, so from Java/Jackson perspective this is not problematic.

Type id of "String;" is indeed peculiar, and since no explicit type name was specific (either in @JsonSubTypes entry, or explicitly with @JsonTypeName), should be "Test$StringArrayType" (i.e. 'simple' class name). Although typically you would probably assign it an explicit type name.

Now: as to how to find all the type names; I think this is not something that should be bolted on API that does translation of individual type id to/from class, but rather an accessor that would give out mapping, if known. Whether that should be accessible via JsonSerializer, TypeSerializer or TypeIdResolver I don't know; it probably depends on what is available.
... actually, looking at BeanPropertyWriter, it DOES have getTypeSerializer(), and TypeSerializer in turn has getTypeIdResolver() which returns TypeIdResolver. Currently TypeIdResolver does NOT expose mapping, but this could be changed. One challenge is that mapping may not be fully known, for example, if class name is used as id.
But assuming that support for just cases where "type name" is used (not class name), and that Jackson representation is expected to fully specify mapping (which is required for actual deserialization), it might be quite possible to expose mapping between classes and matching type names. In case of serialization, duplicates on type names is actually allowed, although schema generation code could further eliminate duplicates if it wants to.

Still: there may be additional problem here. Whereas Jackson (and typical JSON representation) is more interested in how JSON data binds to polymorphic types, JSON Schema seems to focus more on JSON representation. Problem here is that it is possible to have various relationships between Java type and JSON structural type: for example, some date/time types may be serialized as either JSON Strings, JSON (integer) Numbers, or as JSON Arrays; but still be represented by a single Java type.
I am not sure if there is a way to represent that; or whether that is an actual problem. Just thinking out aloud.

@zoliszel
Copy link
Author

"Ok, sorry, I think we have mix up here: my earlier comments referred to Type Ids, for which only JSON String values are supported, and not arrays of Strings. Maybe I misread some part of examples, but it seemed like type id handler was expecting actual JSON Array values, and this would be problematic."

No mix up here, this is indeed about supporting array of String Type IDs. To create the Java representation of JSON schema the implementation is using the "type" property to map between json documents and Java classes:

{"type":"string"} => StringSchema
{"type":"integer"} => IntegerSchema
{"type":"object"} => ObjectSchema

the problem comes when the resulting JSON document can have more than one type, hence the resulting JSON schema needs to have a type of array's:

{"type":["string","integer","object"]} => PolymorphicObjectSchema

To achive the above mapping I am using a custom TypeId resolver(the same as what came with the v3 schema). The problem here is that the Json Parser assumes that the type-id is a simple string, hence my type id resolver will receive "[" to resolve against with the actual content of the "type" property lost. Without changing AsPropertyTypeDeserializer to parse the entire "type" property I can not deserialize my PolymorhipObjectSchema.

One solution could be is to use a different property than "type" for the mapping, but in that case the v4 deserializer will not be able to parse legit schemas created by other applications.

There is a few use case to create json schemas like that:
I) Any => "type" property to contain all available types
II) Support for non - numeric numbers (Infinity,NaN,etc) => "type" would be ["number","string"]:

{
  "$ref": "#/definitions/___NUMBER_WTIH_NON_NUMERIC_VALUES___",
  "definitions": {
    "___NUMBER___": {
      "id": "urn:jsonschema:___NUMBER___",
      "type": "number"
    },
    "___NUMBER_WTIH_NON_NUMERIC_VALUES___": {
      "id": "urn:jsonschema:___NUMBER_WTIH_NON_NUMERIC_VALUES___",
      "type": [
        "string",
        "number"
      ],
      "oneOf": [
        {
          "$ref": "#/definitions/___NUMBER___"
        },
        {
          "$ref": "#/definitions/__ALLOWED_NON_NUMERIC_VALUES___"
        }
      ]
    },
    "__ALLOWED_NON_NUMERIC_VALUES___": {
      "id": "urn:jsonschema:__ALLOWED_NON_NUMERIC_VALUES___",
      "type": "string",
      "enum": [
        "INF",
        "-INF",
        "-Infinity",
        "Infinity",
        "NaN"
      ]
    }
  }
}

III) Java objects with multiple representations (think Joda time classes which can be serialized as strings or as longs)

@zoliszel
Copy link
Author

One more thing...

I have the v4 generator working and tested on our quite large and complex configuration code base (JSON-based). From a contribution point of view this typeId resolution is the last missing piece (together with the aggreed change on SerializerFactory) and once this issue is resolved I can contribute it back ;-)

@cowtowncoder
Copy link
Member

Jackson's type identifier is not intended to enumerate possible types that a property might have, but rather specific type that instance document has. So it sounds like use of type id is not the right way to go about JSON Schema generation: type id included in JSON, passed by type id resolvers, is the specific type that value is created from, and intended to be bound back as. It is not meant to be set of possible types that the property might have.
Or from other perspective: type ids only exist for document instances, and JSON Schema generation deals with type system, not with instances. Access can not and should not be done using this mechanism.

Question then is how to access this information. This should be accessible by resolvers, but not by trying to create virtual instances and asking for resolutions. Direct accessors are acceptable.

@zoliszel
Copy link
Author

Aha! Your comment made me found:

@JsonTypeResolver 

where I can pass in my custom TypeResolverBuilder which in turn can return my custom TypeDeserializer :-). The only problem is that AsPropertyTypeDeserializer defines the following function as final:

    @SuppressWarnings("resource")
    protected final Object _deserializeTypedForId(JsonParser jp, DeserializationContext ctxt, TokenBuffer tb) throws IOException

If you aggree on relaxing that I should have everything I need :) Please let me know.

@cowtowncoder
Copy link
Member

Sure, I'll go ahead and make it non-final right away.

@cowtowncoder
Copy link
Member

Done.

@zoliszel
Copy link
Author

Thank you!

I) Re Exposing SerializerFactory
From the SerializerFactory I am only using this method

   TypeSerializer typeSerializer = null;
        try {
            typeSerializer = this.provider.getSerializerFactory().createTypeSerializer(this.provider.getConfig(), originalType);
        } catch (JsonMappingException e) {
            //Can't get serializer, just return...
            return TypeInfo.NOT_AVAILABLE;
        }

Are you ok to add a method to SerializerProvider which just delegates to the underlying SerializerFactory? If yes, let me know and I create a separate pull request

II) V4 generator
As I mentioned before I have a fairly strong beta candidate for it. I have tested it against our configuration repo, yielding good results so far although formal QA is still ahead(meaning we might uncover some bugs). I think there would be value to share what I have now, would you agree? If yes what form would you like to receive it? I currently have it in a different package in the json-schema module but you mentioned It might requires a new module.

@cowtowncoder
Copy link
Member

@zoliszel Yes, I think ability to do, say, findTypeSerializer(), similar to existing findValueSerializer(), seems reasonable to me.

On v4; I think there are two main approaches that would make sense to me:

  1. Change existing generator to be a multi-Maven-module project, produce 2 jars
  2. Create a new project, keep repos as single-Maven-module projects

I do think that it's better to have separate jars for the two, as I assume that users would typically migrate from one to the other. But also make sure that Java packages used are differently named, to allow use of both concurrently, if necessary.

As the very first step, maybe you could just create a github repo for v4, work on that; and then we can see about merging projects. This way you would have full access from beginning and could release versions, regardless of how busy I am. Once projects are regrouped I will of course give you full access where necessary as well.

@zoliszel zoliszel closed this Jul 1, 2015
@zoliszel zoliszel deleted the master branch July 1, 2015 11:18
@zoliszel zoliszel restored the master branch July 1, 2015 11:19
@zoliszel zoliszel deleted the master branch July 1, 2015 11:21
cowtowncoder added a commit that referenced this pull request Jul 1, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants