Skip to content
This repository has been archived by the owner on Apr 10, 2024. It is now read-only.

New norm disk format for zap #27

Merged
merged 3 commits into from
Oct 1, 2020
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion build.go
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ import (
"github.com/couchbase/vellum"
)

const Version uint32 = 14
const Version uint32 = 15

const Type string = "zap"

Expand Down
2 changes: 1 addition & 1 deletion merge.go
Original file line number Diff line number Diff line change
Expand Up @@ -487,7 +487,7 @@ func mergeTermFreqNormLocs(fieldsMap map[string]uint16, term []byte, postItr *Po
newRoaring.Add(uint32(hitNewDocNum))

nextFreq := next.Frequency()
nextNorm := uint64(math.Float32bits(float32(next.Norm())))
nextNorm := uint64(next.Norm())

locs := next.Locations()

Expand Down
2 changes: 1 addition & 1 deletion new.go
Original file line number Diff line number Diff line change
Expand Up @@ -456,7 +456,7 @@ func (s *interim) processDocument(docNum uint64,
// now that it's been rolled up into fieldTFs, walk that
for fieldID, tfs := range fieldTFs {
dict := s.Dicts[fieldID]
norm := float32(1.0 / math.Sqrt(float64(fieldLens[fieldID])))
norm := math.Float32frombits(uint32(fieldLens[fieldID]))

for term, tf := range tfs {
pid := dict[term] - 1
Expand Down
5 changes: 2 additions & 3 deletions posting.go
Original file line number Diff line number Diff line change
Expand Up @@ -92,7 +92,7 @@ func under32Bits(x uint64) bool {

const DocNum1HitFinished = math.MaxUint64

var NormBits1Hit = uint64(math.Float32bits(float32(1)))
var NormBits1Hit = uint64(1)

// PostingsList is an in-memory representation of a postings list
type PostingsList struct {
Expand Down Expand Up @@ -478,8 +478,7 @@ func (i *PostingsIterator) nextAtOrAfter(atOrAfter uint64) (segment.Posting, err
if err != nil {
return nil, err
}

rv.norm = math.Float32frombits(uint32(normBits))
rv.norm = float32(normBits)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this still is incorrect. The normBit's contains the integer length, we cannot just stick it directly into a float32 here, as the upstream scoring code is expect the actual computed norm. This line needs to be:

rv.norm = float32(1.0 / math.Sqrt(float64(normBits)

The normBits has our field length as an integer. We put that into a float64, so that w can use math.Sqrt, compute the inverse, then stick that result in a float32.

I think if we do this, the higher-level scoring code will not need any changes.

Copy link
Contributor Author

@sreekanth-cb sreekanth-cb Aug 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mschoch , i think we need to sync..
One problem with the approach of decoding it here is that - while merging the merger takes the decoded norm from the postings and then it again has to encode it back to the fieldLen before persisting to disk. So if we keep doing this encode/decode within Zap, the precision will compromise for higher order fieldLengths (eg field length >2400).

I have done similar changes in the first commit.. But then we decided that the norm decoding logic will be moved out of zap.

Now the norm decoding logic is in bleve.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code functionally works fine with the corresponding bleve change. .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What corresponding change in bleve are you referring to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already mentioned it above -

@mschoch , Tried to move the norm decoding logic out of zap and added that in bleve here - blevesearch/bleve#1432


if i.includeLocs && hasLocs {
// prepare locations into reused slices, where we assume
Expand Down
2 changes: 1 addition & 1 deletion segment_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -216,7 +216,7 @@ func TestOpen(t *testing.T) {
if nextPosting.Number() != 0 {
t.Errorf("expected doc number 0, got %d", nextPosting.Number())
}
expectedNorm := float32(1.0 / math.Sqrt(float64(5)))
expectedNorm := float32(5)
if nextPosting.Norm() != float64(expectedNorm) {
t.Errorf("expected norm %f, got %f", expectedNorm, nextPosting.Norm())
}
Expand Down