
[FEATURE] Support RankNet in LTR #51

Closed
smacrakis opened this issue Oct 22, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@smacrakis

Is your feature request related to a problem?

The user currently uses RankNet for LTR in Solr.

What solution would you like?

Support RankNet in LTR.

What alternatives have you considered?

Do you have any additional context?

@sstults
Collaborator

sstults commented Oct 24, 2024

RankNet appears to be supported by the underlying library, so this might be a matter of documentation or demo code.

@JohannesDaniel
Collaborator

JohannesDaniel commented Nov 14, 2024

Just tested this and it works.

Step 1: Create a RankNet model with RankLib

Create a file (sample_data.txt) with sample training data in RankLib's LETOR-style format:

3 qid:1 1:1 2:1
2 qid:1 1:0 2:0
1 qid:1 1:0 2:1
1 qid:1 1:0 2:0
1 qid:2 1:0 2:0
2 qid:2 1:1 2:0
1 qid:2 1:0 2:0
1 qid:2 1:0 2:0
2 qid:3 1:0 2:0
3 qid:3 1:1 2:1
4 qid:3 1:1 2:0
1 qid:3 1:0 2:1
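
Each training line is "<label> qid:<query id> <feature id>:<value> ...". A minimal Python sketch of a parser for this format (a hypothetical helper for illustration, not part of RankLib):

```python
def parse_letor_line(line):
    """Parse one '<label> qid:<q> <feat>:<val> ...' training line."""
    parts = line.split()
    label = int(parts[0])
    qid = int(parts[1].split(":", 1)[1])
    features = {int(k): float(v)
                for k, v in (p.split(":", 1) for p in parts[2:])}
    return label, qid, features

# The first few lines of the sample file above:
for line in ["3 qid:1 1:1 2:1", "2 qid:1 1:0 2:0", "1 qid:1 1:0 2:1"]:
    print(parse_letor_line(line))
```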

Train the model with (ranker type 1 selects RankNet):

java -jar RankLib-2.18.jar \
-train sample_data.txt \
-ranker 1 \
-save ranknet_sample_data.txt

The output file has the following format:

## RankNet
## Epochs = 100
## No. of features = 2
## No. of hidden layers = 1
## Layer 1: 10 neurons
1 2
1
10
0 0 -0.013491530393429608 0.031183180961270988 0.06558792020112071 -0.006024092627087733 0.05729619574181734 -0.0017010373987742411 0.07684848696852313 -0.06570387602230028 0.04390491141617467 0.013371636736099578
0 1 -0.04795440514263621 0.06903752140115849 0.01945999419045321 -0.06690111197720977 0.0351856458777309 0.026563172489040512 0.02940313583043465 -0.07519826848527132 0.03312289317904028 0.08390625865525887
0 2 -0.0697021510755211 0.08600177285014865 -0.04542123990381912 0.025410473787168493 0.06674122726595544 0.06302119235218674 0.04571944422750241 -0.07556290182817473 0.09883834265878703 -0.08659214423993346 0.0984023297582032
1 0 0.07692335772052283
1 1 0.0017497311501703194
1 2 0.04040419284929422
1 3 -0.062296864911007305
1 4 0.0035554478427023708
1 5 -0.00921413528540745
1 6 -0.09280515494841188
1 7 0.08841552241684562
1 8 -0.01728258564667504
1 9 0.03611693722110545
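
To make the format above concrete: "1 2" names the feature ids, "1" is the number of hidden layers, "10" the neurons in that layer, rows starting "0 i" are input-node-i-to-hidden weights, and rows starting "1 j" are hidden-node-j-to-output weights. A rough sketch of how such a 2-feature, 10-neuron network scores a document, using the weights from the "0 0", "0 1", and "1 j" rows. The logistic hidden activations, linear output, and zero biases are simplifying assumptions for illustration, not taken from RankLib's source:

```python
import math

# Input-to-hidden weights, copied from the "0 0" and "0 1" rows above.
W_IN = [
    [-0.013491530393429608, 0.031183180961270988, 0.06558792020112071,
     -0.006024092627087733, 0.05729619574181734, -0.0017010373987742411,
     0.07684848696852313, -0.06570387602230028, 0.04390491141617467,
     0.013371636736099578],
    [-0.04795440514263621, 0.06903752140115849, 0.01945999419045321,
     -0.06690111197720977, 0.0351856458777309, 0.026563172489040512,
     0.02940313583043465, -0.07519826848527132, 0.03312289317904028,
     0.08390625865525887],
]
# Hidden-to-output weights, copied from the "1 j" rows above.
W_OUT = [0.07692335772052283, 0.0017497311501703194, 0.04040419284929422,
         -0.062296864911007305, 0.0035554478427023708, -0.00921413528540745,
         -0.09280515494841188, 0.08841552241684562, -0.01728258564667504,
         0.03611693722110545]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ranknet_score(features):
    # One hidden layer of 10 logistic units, then a linear output unit.
    hidden = [sigmoid(sum(f * W_IN[i][j] for i, f in enumerate(features)))
              for j in range(len(W_OUT))]
    return sum(h * w for h, w in zip(hidden, W_OUT))

print(ranknet_score([1.0, 1.0]))  # document matching both features
print(ranknet_score([0.0, 0.0]))  # document matching neither
```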

Step 2: Push the model to OpenSearch

Create the featureset:

POST _ltr/_featureset/movie_features
{
  "featureset" : {
      "name" : "movie_features",
      "features" : [
        {"name" : "1", "params" : ["keywords"], "template_language" : "mustache",
          "template" : {"match" : {"title" : "{{keywords}}"}}},
        {"name" : "2", "params" : ["keywords"], "template_language" : "mustache",
          "template" : {"match" : {"overview" : "{{keywords}}"}}}
      ]
    }
}

Create the model:

POST _ltr/_featureset/movie_features/_createmodel?pretty
{
  "model": {
    "name": "my_ranklib_model",
    "model": {
      "type": "model/ranklib",
      "definition": """## RankNet
## Epochs = 100
## No. of features = 2
## No. of hidden layers = 1
## Layer 1: 10 neurons
1 2
1
10
0 0 -0.013491530393429608 0.031183180961270988 0.06558792020112071 -0.006024092627087733 0.05729619574181734 -0.0017010373987742411 0.07684848696852313 -0.06570387602230028 0.04390491141617467 0.013371636736099578
0 1 -0.04795440514263621 0.06903752140115849 0.01945999419045321 -0.06690111197720977 0.0351856458777309 0.026563172489040512 0.02940313583043465 -0.07519826848527132 0.03312289317904028 0.08390625865525887
0 2 -0.0697021510755211 0.08600177285014865 -0.04542123990381912 0.025410473787168493 0.06674122726595544 0.06302119235218674 0.04571944422750241 -0.07556290182817473 0.09883834265878703 -0.08659214423993346 0.0984023297582032
1 0 0.07692335772052283
1 1 0.0017497311501703194
1 2 0.04040419284929422
1 3 -0.062296864911007305
1 4 0.0035554478427023708
1 5 -0.00921413528540745
1 6 -0.09280515494841188
1 7 0.08841552241684562
1 8 -0.01728258564667504
1 9 0.03611693722110545
"""
    }
  }
}
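
Note that the triple-quoted """...""" syntax above is a Dev Tools console convenience; a plain JSON client has to send the model definition as a single escaped string. A hypothetical Python sketch of building that body (only the header lines of the model are shown; the weight rows from ranknet_sample_data.txt would follow):

```python
import json

# Model definition as one newline-joined string; json.dumps escapes it for us.
model_definition = (
    "## RankNet\n"
    "## Epochs = 100\n"
    "## No. of features = 2\n"
    "## No. of hidden layers = 1\n"
    "## Layer 1: 10 neurons\n"
    "1 2\n"
    "1\n"
    "10\n"
    # ... weight rows from ranknet_sample_data.txt go here ...
)

payload = {
    "model": {
        "name": "my_ranklib_model",
        "model": {"type": "model/ranklib", "definition": model_definition},
    }
}
body = json.dumps(payload)
print(body)
```

POST this body to _ltr/_featureset/movie_features/_createmodel with a Content-Type: application/json header.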

Step 3: Rerank with the model

POST movies/_search
{
   "_source": {
    "includes": ["title", "overview"]
  },
  "query": {
    "multi_match": {
      "query": "rambo",
      "fields": ["title", "overview"]
    }
  },
  "rescore": {
    "query": {
      "rescore_query": {
        "sltr": {
          "params": {
            "keywords": "rambo"
          },
          "model": "my_ranklib_model"
        }
      }
    }
  }
}
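
The rescore step applies the model only to the top hits of the base query. A sketch of the same request built programmatically; window_size is added here to show how to widen the reranked window and is not in the original request:

```python
import json

# The "keywords" param feeds the mustache templates in the featureset.
search_body = {
    "_source": {"includes": ["title", "overview"]},
    "query": {
        "multi_match": {"query": "rambo", "fields": ["title", "overview"]}
    },
    "rescore": {
        "window_size": 100,  # rerank the top 100 hits per shard
        "query": {
            "rescore_query": {
                "sltr": {
                    "params": {"keywords": "rambo"},
                    "model": "my_ranklib_model",
                }
            }
        }
    },
}
print(json.dumps(search_body, indent=2))
```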

@JohannesDaniel
Collaborator

JohannesDaniel commented Nov 14, 2024

In general, we should consider adding a section to the documentation that describes the different supported models and explains how to train, format, or convert them.
