Skip to content
This repository has been archived by the owner on Jan 20, 2025. It is now read-only.

Add support for com.google.common.collect.Table #73

Closed
wants to merge 1 commit into from
Closed

Add support for com.google.common.collect.Table #73

wants to merge 1 commit into from

Conversation

michaelhixson
Copy link
Contributor

This solves #11

The serialized form of tables mirrors the structure of table.rowMap().
For example, a table containing these values:

     a  b  c
  x  1  2  3
  y  4  5  6

should have a JSON form of:

{
  "x": { "a": 1, "b": 2, "c": 3 },
  "y": { "a": 4, "b": 5, "c": 6 }
}

Any table can be serialized.  Only four table types can be
deserialized:

 1. Table - uses an ImmutableTable
 2. ImmutableTable
 3. HashBasedTable
 4. ArrayTable

It is not necessarily the case that the serialized form of one table
type can be deserialized into another table type.  ArrayTable may
contain null values, which the other table types do not support.  (And
the null values cannot be omitted from a serialized ArrayTable if we
expect the deserialized ArrayTable to be equal.  The universe of row
and column keys in an ArrayTable is fixed.  If null values were
omitted during serialization, the knowledge of which row and column
keys are supposed to exist might be lost.)

All serialization and deserialization is accomplished through
StdDelegatingSerializer and StdDelegatingDeserializer.  Tables are
converted to and from Map instances before they are converted to or
from JSON:

  Table -> Map -> JSON -> Map -> Table

A more direct, non-delegating implentation might have been more
efficient, but this approach was simpler.

Deserialization uses LinkedHashMap for the Map implementation to
preserve cell ordering, in case the Table implementation also makes
guarantees about ordering.  (Only HashBasedTable does not.)
@michaelhixson
Copy link
Contributor Author

@cowtowncoder CLA sent

@stevenschlansker
Copy link
Contributor

This is a lovely PR overall, my only complaint is that especially for "data dense" structures like a Table reading the entire structure first into a Map<R, Map<>> and then converting it may cause an excessive amount of GC pressure, or failure since you need double the actual size of the Table free to actually deserialize it.

I wouldn't reject this PR on those grounds, just pointing it out, and maybe this is a good v1 and if someone finds it too inefficient it can be fixed up later.

@michaelhixson
Copy link
Contributor Author

Thanks, @stevenschlansker.

I agree that these Table deserializers are not ideal. Basically, I started off saying "I'm going to copy the Multimap deserializers", realized that I didn't understand a fair amount of the code there, and retreated to this less efficient delegating approach.

I think I still need to learn more about Jackson internals before I'd be comfortable authoring the direct approach myself. Where I am now, I could try, but I'd be too embarrassed to vouch for that code in a PR with my name on it.

@cowtowncoder
Copy link
Member

@michaelhixson I wonder if I somehow misplaced the CLA, since I don't see it via info at fasterxml dot com. Apologies if so. Could you re-send it? I am planning to merge this patch, I agree with Steven's assessment.

@michaelhixson
Copy link
Contributor Author

@cowtowncoder I just sent it again. I sent it to clas at that domain the first time, but I also included info this time.

@cowtowncoder
Copy link
Member

I received CLA, so that's all good.

But going through open pull requests, I realized that I had somehow missed #62 which is overlapping with this one. That is not necessarily problematic per se, but I would like to see if there are missing aspects. And one thing I think the other PR has is handling of polymorphic type information, which I think is missing (or partially implemented) in this PR.

Would it be possible to merge some of that in here? I can help with changes as necessary; but I recall this often being somewhat of an important area: first versions only handling simple types like Strings, and then bug reports being filed for POJOs and so on.

@michaelhixson
Copy link
Contributor Author

If the other PR is not using a delegating approach, maybe it would be better to accept that one?

I think mine respects type info in deserialization, but it's losing a bit in serialization. The fix I would make in mine would be in TableToMapConverter. Right now there is this:

@Override
public JavaType getOutputType(TypeFactory typeFactory)
{
  return typeFactory.constructRawMapType(Map.class);
}

If I replace that with the following code, I can verify that custom key serializers are applied (like the ComplexKey and module in the test code of the other PR):

@Override
public JavaType getOutputType(TypeFactory typeFactory)
{
  JavaType rowKeyType = tableType.containedTypeOrUnknown(0);
  JavaType columnKeyType = tableType.containedTypeOrUnknown(1);
  JavaType valueType = tableType.containedTypeOrUnknown(2);
  return typeFactory.constructMapType(
      Map.class,
      rowKeyType,
      typeFactory.constructMapType(
          Map.class,
          columnKeyType,
          valueType));
}

Let me know if I should go ahead with fixes. If you'd prefer the non-delegating approach, it would probably be better to close this and follow up on the other PR.

@cowtowncoder
Copy link
Member

I do not have strong preference, not having dug deep into code.

As to type handling: tableType does not necessarily contain type parameters (at least it will not in case of sub-classes), so it needs to actually call method "findTypeParameters(...)" in TypeFactory.
This needs to be changed in the other PR as well, but I like its idea of resolving type in TypeModifier since that simplifies handling slightly.

@stevenschlansker Given these 2 alternate PRs, what do you think?

@cowtowncoder cowtowncoder mentioned this pull request Sep 14, 2015
@cowtowncoder
Copy link
Member

Ok. I started work on this, first adding bit of type resolution support, and then merging serializer support from the "other" PR, as serialization is relatively simple that way.
I'll have to think of deser side, since both approaches have their benefit, so more to follow.
Once again thank you for submitting this PR, and apologies for slow & complicated merging here. :)

@perceptron8
Copy link

Hi! I know you know, but... doubles do not work well with equality checks. Tests are somewhat fragile now.

@cowtowncoder
Copy link
Member

@perceptron8 in general, no, but floating point numbers that do have exact representation (which is anything that can be expressed by 1/2^n and optional integral part... meaning numbers like 1.25, 2.0, -4.025 etc) should behave reliably. Especially since JDK specifies exact rules to/from-string conversions should work. So it would seem to me that tests just use such fp numbers (I didn't write tests, just merge).

Do you have specific case of failure on some platform, or is this just based on general principle of avoiding fp equality checks (which is definitely useful for arbitrary fp numbers)?

@perceptron8
Copy link

It's based on general principle. Not everyone must know about iee754, one day someone could break test without even noticing (for instance by introducing constant that can't be accurately represented) - and that's what I'm afraid. It won't be obvious at first glance.

Using something more "unambiguous" can prevent this sort of bugs easily. How about Table<String, Integer, Boolean>?

@cowtowncoder
Copy link
Member

@perceptron8 We are always open to PRs. I do not know why exactly original author set the test up that way, and have no attachment to test in that regard.

@perceptron8
Copy link

@cowtowncoder Could you tell me, how can I make a PR now? Do I need to go deeper? Should I branch author's branch and send him PR or wait until this is merged into FasterXML master and branch then?

@michaelhixson
Copy link
Contributor Author

@perceptron8 If you're talking about my tests, then yeah Boolean instead of Double would be fine. The tests were just about preserving type information (or not in case of raw types).

@perceptron8
Copy link

@michaelhixson Great! @cowtowncoder Never mind... :)
Thank you, guys!

@cowtowncoder
Copy link
Member

@perceptron8 Yeah no prob; typically you'd fork separately as this patch has been merged. And @michaelhixson pointed out correctly that there is a reason for choosing Double, as it is one of so-called "natural" types. Boolean, Long or String would also work; these are only such types.

@cowtowncoder
Copy link
Member

Also: if you don't want full PR I'd be fine with cut'n paste added to this pull request: I can make the changes. I'm just so busy that typically I do ask for help in patching things :)

@lrlinde
Copy link

lrlinde commented Aug 3, 2016

What happened to this PR? Is it going to be merged?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants