/**
* Copyright (c) Facebook, Inc. and its affiliates.
*
* This source code is licensed under the MIT license found in the
* LICENSE file in the root directory of this source tree.
*/
/*
This file uses a graph coloring algorithm to lay out vtables.
This general idea is known as "selector coloring" in the literature.
As in other languages, vtables hold method code pointers for fast method
dispatching. For example, suppose we have a call site that invokes method X
on an object O. Through the inheritance graph we can find all concrete
subclasses that O might be an instance of at runtime. Each of those
subclasses must have a code pointer for method X stored at the same offset
in its vtable, so the call site can do a simple load from the vtable to find it.
Each Skip object has only a single vtable, which must work no matter
which base class it is cast to. This is difficult in the multiple inheritance
case, because those different base classes may each have their own ideas
about how their vtables should be laid out in their subclasses. When a class
inherits from two base classes, those layouts may conflict.
Some languages, like C++, solve this by giving each object instance multiple
vtables, but that consumes memory in each instance. Skip uses a different
approach.
We call values that call sites want us to record in vtables "Requests".
Method pointers are just one kind of Request; we also store flags, "match"
code labels, etc. Vtables should be thought of as a general way to map types
to constant values, not just code pointers.
To solve this problem, we can view vtable slot assignment as a graph coloring
problem: the nodes are Requests, the edges indicate two Requests that
cannot occupy the same slot (because some vtable contains both
Requests), and the assigned colors are the slot indices.
A conventional graph coloring algorithm tries to minimize the total number of
colors used, which in our case would map to minimizing the size of the largest
vtable. That's not quite the problem we want to solve; we would rather minimize
the total size of all vtables. Furthermore, actually materializing such a
graph is a bit bulky because we have so many fully-connected subgraphs
(cliques). For example, a class that has 50 Requests will need a
fully-connected 50-node graph just to keep those Requests from occupying
the same slot.
Instead of creating an explicit graph we represent and solve the problem more
directly, by literally "building up" a vtable per class and greedily adding
each Request to the same slot in all vtables that need it:
- For each Request in decreasing order of "popularity"
- Examine all vtables that must contain that Request and find the
"cheapest" slot that is "compatible" with all of them.
- Update all vtables to record they have that Request in that slot.
Some definitions for the above:
- "Popularity" is defined as how many vtables contain each Request.
The intuition behind allocating slots for popular Requests first is that
placing a popular Request in a "high" slot could force a large number of
vtables to grow.
- A VTable slot is "compatible" with a Request if it is either unassigned,
or already contains the Request's value in that slot (because some other
Request wanted the same value).
- A slot is "cheapest" when it consumes the fewest empty slots across all
vtables. If no such slot exists and one or more vtables need to grow
beyond their theoretical minimum size, that solution is harshly penalized.
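A small illustrative example (the class and method names are hypothetical):
suppose classes Cat and Dog both provide speak() and reset(), while Cat alone
provides purr(). speak() and reset() each appear in two vtables, so they are
assigned first, say to slots 0 and 1 of both vtables. purr() appears only in
Cat's vtable, where slots 0 and 1 already hold different values, so its
cheapest compatible slot is slot 2. Cat's vtable then needs three slots and
Dog's two, which is the smallest possible total here.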
TODO: In the future we could investigate further improvements (today we simply
process Requests in decreasing order of how many vtables contain them):
- Using a greedy hill climbing postprocessing step that attempts to "swap"
Request slots to reduce total vtable size. The idea is that a vtable's size is
defined as the index of its highest used slot plus one. "Swapping" the slot
used for the last Request with an earlier "hole" in that vtable will
make that vtable smaller, but may increase or decrease the size of other
vtables. We would keep applying beneficial swaps until no
single swap improves things.
- Using a better heuristic search order, perhaps taking into account
something analogous to saturation degree ordering.
*/
module VTable;
// A bit in the vtable address that indicates the object owning that vtable
// is frozen. This allows an object to be frozen in place by changing its
// vtable pointer. Unfortunately this means that each vtable has to appear
// twice, once at an address with this bit clear and once with it set.
// See vtableLogicalToPhysicalBitOffset() for the gory details.
//
// This value is copied from the runtime and must stay in sync.
const kFrozenMask: Int = 256;
// Each VTableRequest gets a unique ID so that CallMethodBase etc. can
// later find out where it ended up.
value class VTableRequestID uses TypeSafeID<Int> {
const none: VTableRequestID = VTableRequestID(-1);
}
// A type -> Constant mapping, which serves as input to the
// selector coloring algorithm.
//
// This object typically corresponds to a method; the "mapping" field
// provides the code pointer to use for each class that provides that method.
// Ultimately at runtime method dispatching code will load the vtable from
// an object instance then load the code pointer from whatever slot is
// allocated for this VTableRequest.
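//
// For illustration (the class and method names here are hypothetical): a
// request for a method speak() implemented by Cat and Dog would carry
// mapping = [(catClassID, catSpeakCode), (dogClassID, dogSpeakCode)], and
// selector coloring guarantees that both code pointers end up at the same
// vtable offset, recorded under this request's VTableRequestID.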
class VTableRequest(
mapping: Array<(SClassID, Constant)>,
// TODO: Do we really need a name, and should it really affect equality?
name: String,
id: VTableRequestID,
) uses Hashable, Equality {
fun hash(): Int {
(this.mapping, this.name).hash()
}
fun ==(other: VTableRequest): Bool {
this.name == other.name && this.mapping == other.mapping
}
}
// An object that incrementally builds up a vtable by adding entries to it.
// Holes between these entries can be used by later values inserted into
// the vtable.
mutable private class VTableBuilder(
sclass: SClassID,
env: GlobalEnv,
// A possibly sparse map of which slots are occupied so far.
slots: mutable Vector<StaticImageField> = mutable Vector[],
// Theoretical lower bound on the bit size based on distinct values it holds.
mutable bitSizeLowerBound: Int = 0,
) {
mutable fun insert(value: Constant, bitOffset: Int): void {
scalarType = value.typ.getScalarType(this.env);
invariant(bitOffset % scalarType.bitAlignment == 0, "Buggy misalignment");
// Find the index at which to insert.
slots = this.slots;
index = slots.size();
while (index > 0 && slots[index - 1].bitOffset >= bitOffset) {
!index = index - 1
};
if (index == slots.size()) {
slots.push(StaticImageField(value, bitOffset))
} else if (slots[index].bitOffset != bitOffset) {
invariant(
bitOffset + scalarType.bitSize <= slots[index].bitOffset,
"Buggy overlap",
);
slots.insert(index, StaticImageField(value, bitOffset))
} else {
invariant(
slots[index].value == value,
"Selector coloring tried to " +
"put two nonequal values in the same vtable slot.",
)
}
}
}
// Advances through a VTableBuilder reporting valid bit offsets where
// a value could be stored.
mutable private class VTableBuilderIterator(
builder: mutable VTableBuilder,
value: Constant,
// The index of the first field that this might overlap with.
mutable index: Int = 0,
) {
// Advance to the first legal offset at or after 'begin'.
// Assumes "begin" is monotonically nondecreasing across calls.
//
// Returns:
// - the next legal bit offset >= begin
// - "cost" of the next choice (higher is worse
// - a flag indicating whether we had to grow the vtable to make room
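//
// For example, based on the cases below: if 'begin' already holds an identical
// value the result is (begin, 0, false); if 'begin' falls into dead space
// before the next occupied slot it is (begin, 0x10000, false); and once
// 'begin' lies past every existing slot, the grow flag is set whenever the
// new entry would end beyond bitSizeLowerBound.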
mutable fun next(begin: Int, align: Int): (Int, Int, Bool) {
env = this.builder.env;
end = begin + this.value.typ.getScalarType(env).bitSize;
if (this.index >= this.builder.slots.size()) {
// This is after all existing slots, so no overlap is possible.
(begin, 0x10000, end > this.builder.bitSizeLowerBound)
} else {
// See if this overlaps with some existing slot, either compatibly
// or incompatibly.
slot = this.builder.slots[this.index];
slotBegin = slot.bitOffset;
slotEnd = slotBegin + slot.value.typ.getScalarType(env).bitSize;
if (begin == slotBegin && this.value == slot.value) {
// Excellent, identical value already exists, zero-cost!
(begin, 0, false)
} else if (end <= slotBegin) {
// We found some dead space in between existing slots. Cheap.
(begin, 0x10000, false)
} else {
newBegin = if (begin < slotEnd) {
// At least partial overlap, skip over existing slot.
roundUp(slotEnd, align);
} else {
// Advance to the next index that might overlap and check again.
this.!index = this.index + 1;
begin
};
this.next(newBegin, align)
}
}
}
}
// Generate one initial empty VTableBuilder per type mentioned by any
// VTableRequest, returned as a mutable Map keyed by SClassID.
private fun createVTableBuilders(
requests: Array<VTableRequest>,
classesNeedingVTables: UnorderedSet<SClassID>,
env: GlobalEnv,
): mutable Map<SClassID, mutable VTableBuilder> {
// Create our result map of builders. (Any dummy entries are identifiable
// by SClassID::none and are skipped below.)
builders = mutable Map[];
// Track what must go into each VTable so we can lower-bound its size.
uniqueEntriesPerVTable = mutable UnorderedMap[];
// Intern table for SkipGcType structs.
gcTypes = mutable UnorderedSet[];
for (sc in classesNeedingVTables) {
_ = builders.getOrAdd(sc, () -> {
builder = mutable VTableBuilder(sc, env);
// Sanity check what we are being asked to create a VTable for.
sclass = env.getSClass(sc);
if (!sclass.kind.isKClass()) {
sclass.die(
"Trying to create VTable for class " +
`${sclass}, but it is not a concrete reference type.`,
)
};
// Seed the VTable with its SkipGcType, which all vtables must have.
// Note that it goes in the second possible pointer slot, not the first.
tentative = createGCType(sclass, env);
gct = gcTypes.maybeGet(tentative) match {
| Some(old) -> old
| None() ->
gcTypes.add(tentative);
tentative
};
builder.insert(gct, ptrBitSize);
// Lambdas and memoize thunks are callable from the runtime
// so they need to have their code pointer at a predictable location,
// vtable offset zero.
if (sclass.isLambda || sclass.isMemoizeThunk) {
// Only do this if we actually decided to compile a code pointer at all.
key = MethodKey("call", MethodSuperpositionID::empty);
sclass.methods.maybeGet(key) match {
| Some(value) if (env.sfuns.contains(value)) ->
code = ConstantFun{id => InstrID::none, typ => tNonGCPointer, value};
builder.insert(code, 0)
| _ -> void
}
};
// Create a Set tracking this builder's unique entries.
unique = mutable UnorderedSet[];
for (vs in builder.slots) unique.insert(vs.value);
uniqueEntriesPerVTable.set(sc, unique);
builder
})
};
// Compute every unique value that goes into each builder's vtable.
for (request in requests) {
for (m in request.mapping) {
(sc, value) = m;
_ = uniqueEntriesPerVTable[sc].maybeInsert(value)
}
};
// Sum the sizes of each vtable's contents as a rough lower bound on
// how small it could possibly be.
for (builder in builders) {
if (builder.sclass != SClassID::none) {
bound = 0;
for (value in uniqueEntriesPerVTable[builder.sclass]) {
!bound = bound + value.typ.getScalarType(env).bitSize
};
builder.!bitSizeLowerBound = roundUp(bound, 64);
}
};
builders
}
// Returns the best bit offset to use for 'request', taking into account which
// slots in the given builders' vtables are already in use.
private fun allocateBestSlot(
request: VTableRequest,
builders: mutable Map<SClassID, mutable VTableBuilder>,
env: GlobalEnv,
): Int {
invariant(!request.mapping.isEmpty(), "Empty vtable request");
// Create iterators to walk through all the vtables and find the best
// offset for this method.
iterators = mutable Vector[];
for (m in request.mapping) {
(sc, val) = m;
iterators.push(mutable VTableBuilderIterator(builders[sc], val))
};
align = request.mapping[0].i1.typ.getScalarType(env).bitAlignment;
if (targetIsWasm() || isEmbedded32()) {
!align = 2 * align;
};
bestBitOffset = -1;
bestCost = Int::max;
// TODO: Technically vtable offsets should start at -128 bytes, since
// that can be encoded as an 8-bit offset in x86, and that would maximize
// the number of 8-bit offsets we use, which would reduce machine code size.
bitOffset = 0;
// Should we keep looking even after we have a legal solution?
keepSearching = true;
while (bestBitOffset < 0 || keepSearching) {
first = true;
cost = 0;
foundImprovedSolution = iterators.all(it -> {
(newBitOffset, thisCost, hadToGrow) = it.next(bitOffset, align);
!cost = cost + thisCost;
if (hadToGrow) {
// This solution expands at least one VTable, which we really don't
// want to do, so stop searching for alternatives once we have any
// legal solution.
!keepSearching = false
};
if (newBitOffset == bitOffset || first) {
// This offset is consistent with all previous iterators.
if (cost < bestCost) {
// Still looking good.
!first = false;
!bitOffset = newBitOffset;
true
} else {
// This slot was not good enough, try the next one.
!bitOffset = newBitOffset + align;
false
}
} else {
// This VTable requires a different bit offset than some earlier
// vtable, so we'll start over checking the earlier vtables starting
// at this bit offset to see if they are OK with it too.
!bitOffset = newBitOffset;
false
}
});
if (foundImprovedSolution) {
// This bit offset was legal for every VTable.
!bestBitOffset = bitOffset;
!bestCost = cost;
if (cost == 0) {
// Stop searching if we happen to get a perfect answer (one that
// entirely overlaps existing entries).
!keepSearching = false
};
// Try the next possible offset.
!bitOffset = bitOffset + align
}
};
// Update the builders to note the newly-added value in each one.
for (it in iterators) it.builder.insert(it.value, bestBitOffset);
bestBitOffset
}
// This function maps logical vtable bit offsets to physical ones.
//
// Selector coloring allocates logical vtable offsets from a simple
// linear space. But the physical bit offsets for those are actually
// permuted, for two reasons:
//
// 1) On x86, signed 8-bit offsets assemble to smaller machine code, so we want
// to use both the 128 bytes before AND after the vtable pointer to reduce
// code size. So although logical byte offsets [0, 128) map to themselves,
// logical byte offsets [128, 256) are actually stored at [-128, 0), where
// they can still be reached via a signed 8-bit displacement. To keep
// the vtable layout a bit denser we store this range "backwards", so byte
// offset 128 is physical -8, 128+8 is -16, etc. Reversing is OK since no
// vtable entry can span an 8-byte boundary (if they ever do, we'll need
// to adjust this mapping).
//
// This trick is only applied to the first 256 logical bytes. After that,
// increasing logical offsets correspond to increasing physical offsets.
//
// 2) The runtime needs somewhere to stash a bit indicating whether an
// object instance is frozen or mutable, and it chooses to store that in
// each object's vtable pointer since it doesn't have anywhere else.
//
// What this means is that each vtable entry appears twice, once at
// an address with the "frozen" bit set to zero, and again at an
// address with that bit set to one. Each vtable is chunked into pairs
// of adjacent, identical 256-byte blocks (so no matter how you set
// that frozen bit, you'll load the same value).
//
// This "mirroring" means that logical offset 256 bytes cannot actually
// be physical 256, since that's the start of the first mirror.
// Instead it skips over that mirror and becomes 512. Similarly, 512 maps
// to 1024, to skip over the second mirror, etc.
//
// Note that vtables have no guaranteed alignment (beyond 8 bytes),
// but the object's mutable vtable pointer must of course start at an
// address where the "frozen" bit (kFrozenMask) is clear. Not every
// slot in the "mutable" vtable mirror need have kFrozenMask clear, just
// logical offset 0, the one the object instance points to.
//
// Note that we can and do pack small vtables together rather than
// allocating a full 512 bytes for each one.
//
// Putting it all together, the first logical 32 8-byte words A-Za-f are
// laid out and mirrored in this physical order, with the vtable pointing
// 128 bytes into the allocated vtable storage:
//
// fedcbaZY XWVUTSRQ ABCDEFGH IJKLMNOP fedcbaZY XWVUTSRQ ABCDEFGH IJKLMNOP
// ^
// |
// vtable
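//
// A rough worked check of the mapping above, in byte units, with physical
// offsets measured from the vtable pointer (the storage block itself starts
// 128 bytes before that): logical 0..127 map to themselves; logical 128 maps
// to -8 and logical 136 to -16 (the reversed range); logical 256 maps to 384,
// i.e. 512 bytes from the start of the storage block, skipping the first
// mirror; and logical 512 maps to 896, i.e. 1024 bytes from the start.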
fun vtableLogicalToPhysicalBitOffset(bo: Int): Int {
if (bo < 128 * 8) {
bo
} else if (bo < 256 * 8) {
128 * 8 - 64 - bo.and(-64) + bo.and(63);
} else {
mask = (kFrozenMask * 8) - 1;
(bo.and(mask.not()) * 2 + bo.and(mask)) - 128 * 8
}
}
// Vtable bit offsets are allowed to go as low as this.
// See vtableLogicalToPhysicalBitOffset for an explanation.
const kMinVTableBitOffset: Int = -128 * 8;
// Create fields for a struct containing the SkipGcType for the given SClass.
private fun createGCType(sc: SClass, env: GlobalEnv): ConstantStruct {
(
userByteSize,
numMaskWords,
tilesPerMask,
uninternedMetaSize,
internedMetaSize,
kind,
isAllDeepFrozen,
) = sc.arraySlot match {
| None() ->
// This is a non-Vector type.
layout = sc.getLayout();
userSize = if (layout.isEmpty()) {
0
} else {
// Figure out the object size.
last = layout[layout.size() - 1];
roundUp(last.bitOffset + last.typ.bitSize, 64) / 8
};
// We need this many words for our bit mask.
words = if (layout.any(x -> x.typ.isGCPointer())) {
(userSize / ptrByteSize + (ptrBitSize - 1)) / ptrBitSize
} else {
0
};
isAllDeepFrozen = sc.fields.all(x -> x.typ.isDeepFrozen());
(gcKindClass, internedMetaSize_) = if (sc.isMemoizeThunk) {
(
2, // kSkipGcKindInvocation
56,
) // FIXME: Bogus to hardcode this, see T21762773
} else {
(
0, // kSkipGcKindClass
3 * ptrByteSize,
) // sizeof(IObjMetadata)
};
(
userSize, // userByteSize
words, // numMaskWords
1, // tilesPerMask
ptrByteSize, // sizeof(RObjMetadata)
internedMetaSize_,
gcKindClass,
isAllDeepFrozen,
) // kSkipGcRefsHintFrozenRefs
| Some(slotInfo) ->
// This is a Vector type.
// Count how many pointers are in each array slot.
// They go at the beginning of each slot (in the case of a tuple etc.)
numPointers = slotInfo.types.foldl(
(acc, t) ->
if (t.getScalarType(env).isGCPointer()) {
acc + 1
} else {
acc
},
0,
);
userSize = slotInfo.bitSize / 8;
words = if (numPointers == 0) {
0
} else {
(userSize / ptrByteSize + (ptrBitSize - 1)) / ptrBitSize
};
tiles = if (words == 0) {
1
} else {
max(1, 64 / (userSize / ptrByteSize))
};
// Set the GC bits. All pointers are at the beginning of each slot,
// but we tile it out as many times as will fit in a 64-bit word
// so the inner loop of the GC can be tighter.
isAllDeepFrozen = slotInfo.types.all(x -> x.isDeepFrozen());
(
userSize, // userByteSize
words, // numMaskWords
tiles, // tilesPerMask
2 * ptrByteSize, // uninternedMetaSize
3 * ptrByteSize, // internedMetaSize
1, // kSkipGcKindArray
isAllDeepFrozen,
) // kSkipGcRefsHintFrozenRefs
};
// Assemble the fields of the constant struct declaring a SkipGcType.
!numMaskWords =
numMaskWords * 2 -
(if (isAllDeepFrozen && (numMaskWords > 0)) 1 else 0);
refsHint = 0;
if (numMaskWords != 0) {
!refsHint = refsHint.or(1) // kSkipGcRefsHintMixedRefs
};
if (isAllDeepFrozen) {
!refsHint = refsHint.or(2) // kSkipGcRefsHintAllFrozenRefs
};
if (!sc.canAliasMutableTypes) {
!refsHint = refsHint.or(4) // kSkipGcRefsHintNoDuplicateMutables
};
// Inserting names into GC info prevents SkipGcTypes from being shared
// between different classes, which prevents their vtables from being
// shared, which can bloat the binary. So we only include detailed
// information in non-release builds.
provideName = !kConfig.release;
fields = mutable Vector[];
fieldBitOffset = 0;
pushField = (f: Constant) -> {
fields.push(StaticImageField(f, fieldBitOffset));
!fieldBitOffset = fieldBitOffset + f.typ.getScalarType(env).bitSize
};
// m_refsHint
pushField(kByteConstants[refsHint]);
// m_kind
pushField(kByteConstants[kind]);
// m_tilesPerMask
pushField(kByteConstants[tilesPerMask]);
// m_hasName
pushField(kByteConstants[if (provideName) 1 else 0]);
// m_uninternedMetadataByteSize
pushField(
ConstantInt{
id => InstrID::none,
typ => tInt16,
value => uninternedMetaSize,
},
);
// m_internedMetadataByteSize
pushField(
ConstantInt{id => InstrID::none, typ => tInt16, value => internedMetaSize},
);
// m_userByteSize
pushField(
ConstantInt{id => InstrID::none, typ => tInt, value => userByteSize},
);
// m_onStateChange
pushField(
if (sc.isMemoizeThunk) {
ConstantFun{
id => InstrID::none,
typ => tNonGCPointer,
value => env.runtimeFunctions["Runtime.invocationOnStateChange"],
}
} else {
ConstantPointer{id => InstrID::none, typ => tNonGCPointer, value => 0}
},
);
if (ptrBitSize == 32) {
// Emit alignment padding before the mask words.
// TODO: Maybe use 32-bit mask words in this case.
pushField(ConstantInt{id => InstrID::none, typ => tInt32, value => 0})
};
// m_refMask
if (numMaskWords != 0) {
// Create a memory image of the flag words array.
flags = Array::mfill(numMaskWords, 0);
setBit = (bitOffset, isMutable) -> {
bitIndex = bitOffset / ptrBitSize;
wordIndex = 2 * (bitIndex / 64);
bit = 1.shl(bitIndex % 64);
flags.set(wordIndex, flags[wordIndex].or(bit));
if (isMutable) {
flags.set(wordIndex + 1, flags[wordIndex + 1].or(bit));
};
};
sc.arraySlot match {
| None() ->
for (x in sc.fields) {
if (x.typ.getScalarType(env).isGCPointer()) {
bitOffset = sc.getLayoutSlot(x.name, sc.pos).bitOffset;
setBit(bitOffset, !isAllDeepFrozen && !x.typ.isDeepFrozen())
}
}
| Some(slotInfo) ->
// Set the GC bits. All pointers are at the beginning of each slot,
// but we tile it out as many times as will fit in a 64-bit word
// so the inner loop of the GC can be tighter.
for (i in Range(0, tilesPerMask)) {
slotInfo.types.eachWithIndex((idx, x) -> {
scalarType = x.getScalarType(env);
if (scalarType.isGCPointer()) {
bitOffset = slotInfo.bitOffsets[idx];
setBit(
(slotInfo.bitSize * i) + bitOffset,
!isAllDeepFrozen && !x.isDeepFrozen(),
)
}
});
}
};
for (f in flags) {
pushField(ConstantInt{id => InstrID::none, typ => tInt, value => f})
};
};
if (provideName) {
// Append a 0-terminated name to the end of the struct.
name = UTF8String::make(
if (kConfig.mainConfig.useSpecializedNames) {
sc.toString()
} else if (sc.superposition == ClassSuperpositionID::empty) {
sc.gclassName.id
} else {
// Indicate that there were some Tparams, but we collapsed them.
sc.gclassName.id + "<...>"
},
);
for (c in name.utf8) pushField(kByteConstants[c.toInt()]);
pushField(kByteConstants[0]);
};
fv = fields.toArray();
ConstantStruct{
id => InstrID::none,
typ => tNonGCPointer,
values => fv,
cachedHash => ("ConstantStruct", fv).hash(),
byteAlignment => 8,
}
}
// For layout purposes we organize VTables into "slots" of this many bits
// each. Each slot may contain multiple values, if they fit.
//
// NOTE: This is unrelated to ptrBitSize, we always use 64-bit slots.
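// For concreteness: 1.shl(6) gives 64-bit slots, i.e. 8 bytes per slot.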
const kLog2VTableSlotBitSize: Int = 6;
const kVTableSlotBitSize: Int = 1.shl(kLog2VTableSlotBitSize);
const kVTableSlotByteSize: Int = kVTableSlotBitSize.shr(3);
// Populate an array of 32-bit masks, one per 256-byte vtable block
// (skipping mirrors) indicating which vtable offsets this vtable
// touches.
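//
// A hedged worked example of the mirror splicing below (kFrozenMask = 256, so
// bpb = 32): a slot at physical byte offset 0 has mirroringWordIndex 16 and
// keeps dense wordIndex 16 (bit 16 of buf[0]); a slot at physical byte -8
// keeps wordIndex 15; a slot at physical byte 384, the first real block after
// the first mirror, has mirroringWordIndex 64, which splices down to dense
// wordIndex 32 (bit 0 of buf[1]).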
private fun computeVTableMask(slots: Array<StaticImageField>): Array<Int> {
buf = mutable Vector[];
for (slot in slots) {
bitOffset = vtableLogicalToPhysicalBitOffset(slot.bitOffset);
// Which 64-bit vtable word contains this slot? (It can only be one.)
// Bias by kMinVTableBitOffset so indices start from 0.
mirroringWordIndex = (bitOffset - kMinVTableBitOffset).shr(
kLog2VTableSlotBitSize,
);
// Splice out the mirrored copies, to make the numbering system dense
// for purposes of this search.
bpb = kFrozenMask / 8;
high = mirroringWordIndex.shr(1).and(-bpb);
low = mirroringWordIndex.and(bpb - 1);
wordIndex = high.or(low);
// We pack in 32 bits per array entry.
bufIndex = wordIndex.shr(5);
if (bufIndex >= buf.size()) {
buf.resize(bufIndex + 1, 0)
};
// Set a bit indicating this word is in use by this vtable.
inUseMask = 1.shl(wordIndex.and(31));
buf.set(bufIndex, buf[bufIndex].or(inUseMask));
};
buf.toArray()
}
private fun isPlacementAllowed(
vtableMask: Array<Int>,
megaMask: mutable Vector<Int>,
shift: Int,
vi: Int,
mi: Int,
): Bool {
if (vi >= vtableMask.size()) {
// We have run past the end of this vtable's mask, so no conflict is possible.
true
} else {
m = megaMask[mi];
v = vtableMask[vi];
if (m.and(v.shl(shift)) != 0) {
// Conflict with existing megaVTable entry, so can't go here.
false
} else {
// So far so good, see if the next word is also compatible.
isPlacementAllowed(vtableMask, megaMask, shift, vi + 1, mi + 1)
}
}
}
// For debugging: convert an int to a binary string, but with LSB first.
private fun binaryToString(n: Int, numBits: Int = 64): String {
ret = "";
for (_ in Range(0, numBits)) {
!ret =
ret +
if (n.and(1) == 0) {
"0"
} else {
"1"
};
!n = n.ushr(1)
};
ret
}
// For debugging: checks that megaMask is consistent with the current
// contents of megaVTable. This is slow.
private fun checkMegamask(
megaMask: mutable Vector<Int>,
megaVTable: mutable Vector<StaticImageField>,
): void {
newMask = mutable Vector[];
for (slot in megaVTable) {
bo = slot.bitOffset / 8;
if (bo.and(kFrozenMask) == 0) {
o = (bo % kFrozenMask) + ((bo / (kFrozenMask * 2)) * kFrozenMask);
word = o / 8;
maskIndex = word / 32;
if (maskIndex >= newMask.size()) {
newMask.resize(maskIndex + 1, 0xFFFF00000000FFFF);
};
newMask.set(maskIndex, newMask[maskIndex].or(1.shl((word % 32) + 16)))
}
};
v1 = megaMask.toArray();
v2 = newMask.toArray();
if (v1 != v2) {
txt = n -> {
// Discard the high and low 16 guard bits.
bits = n.ushr(16).and(0xFFFFFFFF);
binaryToString(bits, 32);
};
invariant_violation(
"Masks differ:\n" +
"computed: " +
v1.map(txt) +
"\n" +
"scratch: " +
v2.map(txt),
)
}
}
// Places one VTable into the mega vtable and returns its BYTE offset.
private fun storeOneVTableInMegaVTable(
slots: Array<StaticImageField>,
megaMask: mutable Vector<Int>,
megaVTable: mutable Vector<StaticImageField>,
): Int {
debugMode = false;
if (debugMode) {
checkMegamask(megaMask, megaVTable)
};
baseMask = 0xFFFF00000000FFFF;
vtableMask = computeVTableMask(slots);
// This is the slot number (i.e. array entry, treating megaVTable as
// an array of pointers) where logical & physical vtable entry 0 should
// end up. Note that entries may precede it, as negative physical offsets
// are possible (see vtableLogicalToPhysicalBitOffset).
slotNumber = Int::min;
// Scan the mega vtable being built up, looking for holes where we can
// insert this vtable. To avoid O(n^2) behavior we limit ourselves to
// looking at the most recent 100 blocks.
megaBlockIndex = max(-1, megaMask.size() - 100);
while (slotNumber == Int::min) {
!megaBlockIndex = megaBlockIndex + 1;
// Make sure we have enough mask entries for our search.
minSize = megaBlockIndex + vtableMask.size();
if (minSize > megaMask.size()) {
megaMask.resize(minSize, baseMask)
};
if (megaMask[megaBlockIndex] != -1) {
// See if we can place the vtable starting anywhere in megaBlockIndex.
shift = 0;
while ({
ok = isPlacementAllowed(vtableMask, megaMask, shift, 0, megaBlockIndex);
if (ok) {
// Got the answer!
!slotNumber = megaBlockIndex * (32 * 2) + shift;
false
} else {
!shift = shift + 1;
shift < 32
}
}) void;
}
};
// OR in the new vtable slots we just reserved.
vtableMask.eachWithIndex((i, mask) -> {
old = megaMask[megaBlockIndex + i];
new = mask.shl(slotNumber.and(31));
invariant(old.and(new) == 0, "Unexpected overlap");
megaMask.set(megaBlockIndex + i, old.or(new))
});
// Record the new mega vtable entries we just created, including mirror.
for (slot in slots) {
bitOffset = (
vtableLogicalToPhysicalBitOffset(slot.bitOffset) +
slotNumber * kVTableSlotBitSize
);
// Mutable class entry.
megaVTable.push(slot with {bitOffset});
// Frozen class "mirror" entry, kFrozenMask bytes later.
megaVTable.push(slot with {bitOffset => bitOffset + kFrozenMask * 8});
};
if (debugMode) {
checkMegamask(megaMask, megaVTable)
};
slotNumber * kVTableSlotByteSize
}
// Final result describing how vtables got laid out.
class VTableInfo(
// Maps each VTableRequestID to the bit offset where that entry ended up
// in the vtable of each class in that VTableRequest.
vtableBitOffsets: Array<Int>,
// The single array containing all vtables interleaved & overlapped.
megaVTable: Array<StaticImageField>,
// Byte offsets where the VTable for each KClass ended up in the mega vtable.
// This points to the mutable variant, the frozen variant is kFrozenMask
// bytes later.
classByteOffsets: UnorderedMap<SClassID, Int>,
)
// Run an internal consistency check on a VTableInfo.
private fun verifyVTableInfo(
vti: VTableInfo,
requests: Array<VTableRequest>,
env: GlobalEnv,
): void {
// Map bit offsets to what is at that offset.
o2v = UnorderedMap::mcreate(vti.megaVTable.size());
for (vs in vti.megaVTable) {
o2v.add(vs.bitOffset, vs)
};
prevEnd = Int::min;
for (vs in vti.megaVTable) {
// Make sure entries are sorted and do not overlap.
bo = vs.bitOffset;
invariant(bo >= prevEnd, "vtable fields overlap");
scalarType = vs.value.typ.getScalarType(env);
invariant(bo % scalarType.bitAlignment == 0, "Misaligned");
!prevEnd = bo + scalarType.bitSize;
// Make sure each entry is mirrored.
invariant(o2v[bo] == vs, "Hash table mishap");
invariant(
o2v[bo.xor(kFrozenMask * 8)].value == vs.value,
"Value not mirrored in vtable",
);
};
// Make sure every request does in fact map to the value it wanted
// in the mega vtable.
for (r in requests) {
bo = vti.vtableBitOffsets[r.id.id];
for (m in r.mapping) {
(sc, value) = m;
invariant(
o2v[vti.classByteOffsets[sc] * 8 + bo].value == value,
"Class did not get requested vtable value.",
)
}
};
// Make sure every vtable starts at an address with the kFrozenMask bit clear.
// TODO: Catastrophically stupid that we need to clone a mutable version of
// this table to iterate through it.
for (byteOffset in vti.classByteOffsets.clone().values()) {
invariant(