1) What it is

  • ONNX = Open Neural Network Exchange
  • An open standard format for representing machine learning models.
  • Created by Facebook & Microsoft (2017), now supported by many frameworks.
  • Goal: make models portable across frameworks and runtimes.

Instead of being locked into PyTorch, TensorFlow, etc., you can export a model to ONNX and run it with any runtime that supports the format, such as ONNX Runtime.


2) Why ONNX matters

  • Interoperability: Train in one framework, deploy in another.
  • Performance: Optimized execution via ONNX Runtime (C/C++ backend, GPU, CPU, accelerators).
  • Standardization: Defines operators and computation graphs in a consistent way.

3) Supported Frameworks

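  • Export: PyTorch (built-in torch.onnx), TensorFlow and Keras (tf2onnx), scikit-learn (skl2onnx), XGBoost and LightGBM (onnxmltools), CatBoost (built-in ONNX export), Hugging Face Transformers (Optimum).
  • Import/Run: ONNX Runtime (including ONNX Runtime Mobile), TensorRT, OpenVINO, Windows ML, Core ML (through ONNX Runtime's Core ML execution provider).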

4) ONNX Runtime

  • High-performance inference engine for ONNX models.
  • Runs on many backends through "execution providers": CPU, CUDA, TensorRT, DirectML, OpenVINO, ROCm, and others.
  • Features: quantization, graph optimizations, mixed precision.
  • Used in production systems at Microsoft (Office, Bing, Azure).

5) Example Workflow

Train → Export → Deploy

Define and export (PyTorch; training is omitted for brevity, so the untrained model stands in for a trained one):

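import torch
import torch.onnx

# Dummy model: one linear layer mapping 10 input features to 2 outputs
class SimpleModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(10, 2)

    def forward(self, x):
        return self.fc(x)

model = SimpleModel()
model.eval()  # export in inference mode (affects layers such as dropout and batch norm)

# Example input used to trace the model; its shape fixes the graph's input shape
dummy_input = torch.randn(1, 10)

# Export to ONNX; dynamic_axes keeps the batch dimension variable
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"],
                  dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
                  opset_version=11)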

Deploy (ONNX Runtime):

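import onnxruntime as ort
import numpy as np

# Load the exported model; listing providers explicitly is required on GPU-enabled builds
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Key the input by the model's declared input name; feed a float32 batch of shape (1, 10)
input_name = session.get_inputs()[0].name
inputs = {input_name: np.random.randn(1, 10).astype(np.float32)}

# None = fetch every output; the result is a list of NumPy arrays
outputs = session.run(None, inputs)

print(outputs)  # a list holding one array of shape (1, 2)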

6) Applications

  • Cross-framework model exchange (e.g., train in PyTorch → serve from a C++, C#, or Java application via ONNX Runtime).
  • Edge deployment (IoT, mobile, embedded devices).
  • High-performance inference (optimized runtimes like TensorRT/OpenVINO).
  • Production AI services at scale.

Summary

  • ONNX = open standard for ML model exchange.
  • Lets you train in one framework and deploy on any ONNX-compatible runtime.
  • ONNX Runtime, one of several such runtimes, provides optimized inference across hardware.
  • Widely used in production ML pipelines for portability and efficiency.