Benchmark results

AI Inference Server Release Notes

The benchmark results presented here are for reference only. Future versions will extend them to additional types of Industrial Edge Devices and to additional reference models.

Models:

| Task | Model name | Input data | Model framework | Model accuracy | Model data precision | Accuracy (% of FP32 reference) |
| --- | --- | --- | --- | --- | --- | --- |
| Image classification | ResNet50 v1.5 (224x224) | ImageNet2012 | TensorFlow | 76.456% | fp32 | 99% |
| Object detection (small) | SSD MobileNet (300x300) | COCO | TensorFlow | mAP 0.23 | uint8 | 99% |
| Object detection (large) | SSD ResNet34 (1200x1200) | COCO | TensorFlow | | fp32 | |
| Image classification GPU | ResNet50 v1 (224x224) | ImageNet2012 | ONNX / TensorRT | | fp32 | |
| Object detection (small) GPU | SSD MobileNet v1 (300x300) | COCO | ONNX / TensorRT | | uint8 | |
| Object detection (large) GPU | SSD ResNet34 (1200x1200) | COCO | ONNX / TensorRT | | fp32 | |

Test scenarios:

| Scenario | Query generation | Duration | Samples / query |
| --- | --- | --- | --- |
| Single stream | LoadGen sends the next query as soon as the SUT completes the previous query | 1024 queries and 60 seconds | 1 |
| Multiple stream* | LoadGen sends the next query as soon as the SUT completes the previous query | 4096 queries and 240 seconds | 2 |

*The original definition of the multiple stream scenario is 270,336 queries and 600 seconds with 8 samples per query. The tests were performed with the reduced numbers given in the table because of a memory limitation of the built-in runtime module.
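The query generation rule in the scenarios above (send the next query as soon as the previous one completes) can be illustrated with a short sketch. This is not the LoadGen implementation. The function name `run_single_stream`, the callable `sut`, and the injectable `clock` are hypothetical, and the sketch assumes that the query count and the duration are both minimums that must be reached before the run ends.

```python
import time
from typing import Callable, List, Sequence


def run_single_stream(
    sut: Callable[[Sequence[object]], object],
    samples: Sequence[object],
    min_queries: int = 1024,
    min_duration_s: float = 60.0,
    samples_per_query: int = 1,
    clock: Callable[[], float] = time.perf_counter,
) -> List[float]:
    """Issue queries back to back and return the per-query latencies in ms."""
    if not samples:
        raise ValueError("at least one sample is required")
    latencies_ms: List[float] = []
    start = clock()
    index = 0
    # Assumption: the run ends only once BOTH minimums are satisfied.
    while len(latencies_ms) < min_queries or clock() - start < min_duration_s:
        # Build one query from consecutive samples, wrapping around the data set.
        query = [samples[(index + k) % len(samples)] for k in range(samples_per_query)]
        index += samples_per_query
        t0 = clock()
        sut(query)  # blocks until the system under test has completed the query
        latencies_ms.append((clock() - t0) * 1000.0)
    return latencies_ms


def fake_sut(query: Sequence[object]) -> None:
    """Stand-in for a real model: pretends each query takes about 1 ms."""
    time.sleep(0.001)


# Tiny demo run; the defaults above correspond to the single stream row.
latencies = run_single_stream(fake_sut, list(range(8)), min_queries=16, min_duration_s=0.05)
print(f"{len(latencies)} queries, mean latency {sum(latencies) / len(latencies):.2f} ms")
```

Calling the same function with `min_queries=4096`, `min_duration_s=240.0` and `samples_per_query=2` would mirror the reduced multiple stream settings, under the same assumptions.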

Hardware: IPC427E with Intel® Xeon® (4 cores), 16 GB RAM, 240 GB SSD, no GPU

| Task | Scenario | Time per inference (ms) | Max frequency (fps) | CPU consumption (%) | Memory consumption (MB) |
| --- | --- | --- | --- | --- | --- |
| Image classification | Single stream | 83.04 | 12.04 | 66.31 | 1119.03 |
| Object detection (small) | Single stream | 48.07 | 20.80 | 63.81 | 1155.46 |
| Object detection (large) | Single stream | 3062.28 | 0.33 | 87.62 | 1157.53 |
| Image classification | Multiple stream | 161.50 | n.a. | 81.75 | 1151.94 |
| Object detection (small) | Multiple stream | 84.27 | n.a. | 79.55 | 1092.54 |
| Object detection (large) | Multiple stream | 6372.60 | n.a. | 88.35 | 1125.80 |
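In the single stream rows, the maximum frequency appears to be the reciprocal of the time per inference. The sketch below shows that relationship; the function name `max_frequency_fps` is hypothetical, and the timings are the IPC427E single stream values from the table above.

```python
def max_frequency_fps(time_per_inference_ms: float) -> float:
    """Upper bound on sequential throughput for a given per-inference time."""
    if time_per_inference_ms <= 0:
        raise ValueError("time per inference must be positive")
    return 1000.0 / time_per_inference_ms


# Single stream rows for the IPC427E, taken from the table above.
for task, ms in [
    ("Image classification", 83.04),
    ("Object detection (small)", 48.07),
    ("Object detection (large)", 3062.28),
]:
    print(f"{task}: {max_frequency_fps(ms):.2f} fps")
```

For these three rows the computed values round to the published 12.04, 20.80 and 0.33 fps. For the other devices the published figure can differ from this calculation in the last digit, presumably because it was derived from unrounded timings.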

Hardware: IPC847E with Intel® Xeon® E-2278GE, 128 GB RAM, 960 GB SSD, no GPU

| Task | Scenario | Time per inference (ms) | Max frequency (fps) | CPU consumption (%) | Memory consumption (MB) |
| --- | --- | --- | --- | --- | --- |
| Image classification | Single stream | 36.83 | 27.15 | 52.45 | 1000.75 |
| Object detection (small) | Single stream | 18.79 | 53.21 | 51.77 | 1064.71 |
| Object detection (large) | Single stream | 1011.43 | 0.99 | 78.17 | 1065.33 |
| Image classification | Multiple stream | 57.66 | n.a. | 67.89 | 1124.01 |
| Object detection (small) | Multiple stream | 31.90 | n.a. | 67.89 | 1108.73 |
| Object detection (large) | Multiple stream | 1968.88 | n.a. | 105.17 | 1092.78 |

Hardware: IPC BX-59A with Intel® Core™ i9-13900E (24 cores), 32 GB RAM, 500 GB SSD, NVIDIA® L4 Tensor GPU

| Task | Scenario | Time per inference (ms) | Max frequency (fps) | CPU consumption (%) | Memory consumption (MB) |
| --- | --- | --- | --- | --- | --- |
| Image classification | Single stream | 44.47 | 22.48 | 44.52 | 966.12 |
| Object detection (small) | Single stream | 24.68 | 40.51 | 35.68 | 1105.76 |
| Object detection (large) | Single stream | 828.27 | 1.21 | 65.07 | 878.28 |
| Image classification | Multiple stream | 50.26 | n.a. | 60.22 | 1107.26 |
| Object detection (small) | Multiple stream | 30.33 | n.a. | 54.54 | 1106.01 |
| Object detection (large) | Multiple stream | 1321.83 | n.a. | 82.72 | 1072.42 |
| Image classification GPU | Single stream | 12.68 | 78.80 | 2.59 | 158.21 |
| Object detection (small) GPU | Single stream | 13.67 | 73.13 | 49.64 | 154.71 |
| Object detection (large) GPU | Single stream | 156.69 | 6.38 | 2.25 | 275.15 |
| Image classification GPU | Multiple stream | 20.80 | n.a. | 5.27 | 192.97 |
| Object detection (small) GPU | Multiple stream | 27.72 | n.a. | 51.49 | 188.06 |
| Object detection (large) GPU | Multiple stream | 338.13 | n.a. | 5.37 | 441.92 |

Vision performance

Test scenario: Low complexity use case

Use case characteristics:

  • The AI model only tells what is on the whole input image (image classification).

  • Low input resolution: 224 x 224 ≈ 0.05 Mpx.

  • Images are evaluated and postprocessed individually.

Example applications:

  • Identify what type of product is in the image.

  • Identify whether the product in the image is okay overall or not.

Basic performance setup:

  • Single camera

  • 1 Gigabit network

  • CPU based device

High performance setup:

  • n.a.

Test scenario: Medium complexity use case

Use case characteristics:

  • The AI model identifies multiple objects on the input image (object detection).

  • Medium input resolution: 640 x 640 ≈ 0.4 Mpx.

  • Images taken at the same time might be processed together.

Example applications:

  • Identify smaller defects on an object.

  • Identify faulty objects arriving in multiple rows on a conveyor belt.

Basic performance setup:

  • Single camera

  • 1 Gigabit network

  • CPU based device

High performance setup:

  • Multiple cameras

  • 10 Gigabit network

  • GPU based device

Test scenario: High complexity use case

Use case characteristics:

  • The AI model identifies multiple objects on the input image (object detection).

  • High input resolution: 2448 x 2024 ≈ 5 Mpx.

  • Images taken at the same time are processed together.

Example applications:

  • Identify very small defects on a large object, e.g. a car body part.

Basic performance setup:

  • n.a.

High performance setup:

  • Multiple cameras

  • 10 Gigabit network

  • GPU based device
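The approximate megapixel figures quoted for the three use cases follow from width times height. A minimal sketch of that arithmetic, with the hypothetical helper name `megapixels`:

```python
def megapixels(width_px: int, height_px: int) -> float:
    """Return the pixel count of an image in megapixels (1 Mpx = 1,000,000 px)."""
    return width_px * height_px / 1_000_000


# Input resolutions of the low, medium and high complexity use cases.
for name, (width, height) in {
    "Low complexity": (224, 224),
    "Medium complexity": (640, 640),
    "High complexity": (2448, 2024),
}.items():
    print(f"{name}: {width} x {height} = {megapixels(width, height):.2f} Mpx")
```

The results agree with the rounded values given in the use case descriptions (about 0.05, 0.4 and 5 Mpx).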

| Test scenario / Use case | Image resolution | Color format | Model | Hardware setup | Throughput | Latency | CPU and GPU load |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Low complexity | 224x224 ≈ 0.05 Mpx | RG8 | MobileNet with 224x224 input | 1-8 Basler ace 2 cameras, 10 Gigabit network, IPC BX-59A with NVIDIA L4 GPU | 160 FPS overall ≈ 9600 PPM | 6 ms avg / 34 ms max pipeline latency | < 20% |
| Medium complexity | 640x640 ≈ 0.4 Mpx | RG8 | YOLO v7 tiny with 640x640 input | 1-8 Basler ace 2 cameras, 10 Gigabit network, IPC BX-59A with NVIDIA L4 GPU | 80 FPS overall ≈ 4800 PPM | 13 ms avg / 43 ms max pipeline latency | < 20% |
| High complexity | 2448x2024 ≈ 5 Mpx | RG8 | Upscaled object detection with 2448x2024 input | 1-8 Basler ace 2 cameras, 10 Gigabit network, IPC BX-59A with NVIDIA L4 GPU | 5 FPS overall ≈ 300 PPM | 500 ms avg / 515 ms max pipeline latency | < 20% CPU, < 15% GPU |
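The PPM values in the throughput column are the FPS values multiplied by 60. The sketch below reproduces that conversion; the function name `per_minute` is hypothetical, and PPM is read here as a per-minute rate because the table does not spell out the abbreviation.

```python
def per_minute(frames_per_second: float) -> float:
    """Convert a per-second rate into a per-minute rate."""
    return frames_per_second * 60.0


# Overall throughput figures from the table above.
for use_case, fps in [
    ("Low complexity", 160),
    ("Medium complexity", 80),
    ("High complexity", 5),
]:
    print(f"{use_case}: {fps} FPS = {per_minute(fps):.0f} per minute")
```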

Notes:

Every use case is individual. The benchmarks were chosen to represent an average.

Do not compare E2E system performance with MLPerf benchmarks, which cover raw model inferencing only.

Tuning performance for a given use case means finding a viable balance between prediction speed and accuracy. Tuning levers include but are not limited to:

  • Hardware used

  • Image and model resolution

  • Choice of model architecture

  • Pre- and postprocessing code

  • Selection of training data

Throughput and load figures include:

  • Siemens Vision Connector receiving images from a real camera

  • AI Inference Server running the images through the given AI pipeline

  • Basic pre- and postprocessing in the pipeline

  • Base load of the Industrial Edge Device

The tests were executed on a prototype BX-59A. An official BX-59A Industrial Edge Device (IED) with L4 GPU and 10 Gigabit NIC will be measured soon.