
Mismatch between MatMul op in FLOAT16 and PyTorch Linear op. #23272

Open
AyoubMDL opened this issue Jan 7, 2025 · 0 comments

AyoubMDL commented Jan 7, 2025

Describe the issue

I’ve noticed mismatches between the outputs of a PyTorch model and the corresponding ONNX model when running inference with ONNX Runtime. Specifically, I’m working with float16 precision, and the results differ between the two frameworks. While I’m aware that such mismatches can occur for float32, should I also expect similar discrepancies with float16 (perhaps because intermediate ops are computed in float32)? If so, what are the potential causes, and how can I resolve or minimize these differences?
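
To make the hypothesis about float32 intermediates concrete, here is a minimal sketch of the kind of reference I have in mind (illustration only, not a claim about what ONNX Runtime actually does internally):

import torch

# Sketch only: emulate a Linear (no bias) whose accumulation happens in float32
# and is rounded to float16 once at the end. This illustrates the hypothesis;
# it does not describe ONNX Runtime internals.
def linear_fp32_accum(x_fp16, weight_fp16):
    y = x_fp16.float() @ weight_fp16.float().t()  # compute in float32
    return y.to(torch.float16)                    # single rounding at the end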

Any insights or guidance on this matter would be greatly appreciated!

To reproduce

import numpy as np
import onnxruntime
import torch
import torch.nn as nn

class Dense(nn.Linear):
    def __init__(self, in_features, out_features):
        super().__init__(in_features=in_features, out_features=out_features,
                         bias=False, device="cpu", dtype=torch.float16)
        self.weight.requires_grad = False

    def forward(self, input):
        return super().forward(input)


def compare_outputs(pytorch_model, onnx_model_path, inputs):
    def _to_numpy(tensor):
        return tensor.cpu().numpy()

    # ONNXRuntime inference
    ort_session = onnxruntime.InferenceSession(onnx_model_path)
    ort_outputs = ort_session.run(None, {'x': _to_numpy(inputs)})

    # Torch inference
    pytorch_model.eval()
    torch_outputs = [_to_numpy(pytorch_model(inputs))]

    # This assertion fails: the outputs are not bit-identical
    np.testing.assert_array_equal(ort_outputs, torch_outputs)


def main():
    torch.manual_seed(0)

    # Create random float16 inputs spanning roughly the full float16 range
    size = (64, 256)
    x_rand_tensor = torch.rand(size, requires_grad=False, dtype=torch.float32)
    f16_min = torch.finfo(torch.float16).min + 1
    f16_max = torch.finfo(torch.float16).max - 1

    scale_factor = (f16_max - f16_min)
    offset = f16_min

    x = (x_rand_tensor * scale_factor + offset).to(torch.float16)

    # Create the model
    dense_model = Dense(256, 1024)

    onnx_model_path = "dense_model.onnx"

    torch.onnx.export(
        dense_model,
        x,
        onnx_model_path,
        opset_version=15,
        input_names=['x'],
        output_names=['output'],
    )

    print(f"[INFO] Model exported to {onnx_model_path}")
    compare_outputs(dense_model, onnx_model_path, x)


if __name__ == "__main__":
    main()
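
For completeness, the exact-equality check in compare_outputs can be relaxed to a small tolerance, e.g. by replacing the assert_array_equal line with something like the following (the rtol/atol values are arbitrary guesses on my part, not documented ONNX Runtime bounds):

    # Tolerance-based check instead of bit-exact equality; rtol/atol are guesses
    np.testing.assert_allclose(
        np.asarray(ort_outputs[0], dtype=np.float32),
        np.asarray(torch_outputs[0], dtype=np.float32),
        rtol=1e-3,
        atol=1e-2,
    )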

Urgency

No

Platform

Linux

OS Version

Ubuntu 22.04.3 LTS

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.21.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response
