Benchmark results

AI Inference Server Release Notes

Product: AI Inference Server
Product Version: 2.1.0
Language: en-US

The benchmark results presented here are for reference only; future versions will extend them to additional types of Industrial Edge Devices and reference models.

Models:

| Task | Model name | Input data | Model framework | Model accuracy | Model data precision | Accuracy (% of FP32 ref.) |
|---|---|---|---|---|---|---|
| Image classification | ResNet50 v1.5 (224x224) | ImageNet2012 | TensorFlow | 76.456% | fp32 | 99% |
| Object detection (small) | SSD MobileNet (300x300) | COCO | TensorFlow | mAP 0.23 | uint8 | 99% |
| Object detection (large) | SSD ResNet34 (1200x1200) | COCO | TensorFlow | | fp32 | |
| Image classification GPU | ResNet50 v1 (224x224) | ImageNet2012 | ONNX / TensorRT | | fp32 | |
| Object detection (small) GPU | SSD MobileNet v1 (300x300) | COCO | ONNX / TensorRT | | uint8 | |
| Object detection (large) GPU | SSD ResNet34 (1200x1200) | COCO | ONNX / TensorRT | | fp32 | |

Test Scenarios:

| Scenario | Query generation | Duration | Samples / query |
|---|---|---|---|
| Single stream | LoadGen sends the next query as soon as the SUT completes the previous one | 1024 queries and 60 seconds | 1 |
| Multiple stream* | LoadGen sends the next query as soon as the SUT completes the previous one | 4096 queries and 240 seconds | 2 |

*The original definition of the multiple stream scenario is 270,336 queries and 600 seconds with 8 samples per query. The tests were performed with the reduced numbers above due to memory limitations of the built-in runtime module.
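The single-stream query generation described above can be sketched as a simple closed loop: issue one sample, wait for the SUT to finish, record the latency, repeat. This is a minimal illustration, not the actual LoadGen implementation; `sut_infer` and `samples` are placeholders, and the loop assumes the scenario runs until both the query count and the duration targets are met.

```python
import time

def run_single_stream(sut_infer, samples, min_queries=1024, min_seconds=60.0):
    """Sketch of the single-stream scenario: the next query is issued only
    after the system under test (SUT) completes the previous one, with one
    sample per query. Runs until both targets (queries AND seconds) are met."""
    latencies = []
    start = time.monotonic()
    while len(latencies) < min_queries or time.monotonic() - start < min_seconds:
        t0 = time.monotonic()
        sut_infer(samples[len(latencies) % len(samples)])  # one sample per query
        latencies.append(time.monotonic() - t0)  # per-query latency in seconds
    return latencies
```

The multiple stream scenario differs only in issuing 2 samples per query and using the larger query/duration targets from the table.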

Hardware: IPC427E with Intel® Xeon® (4 cores), 16 GB RAM, 240 GB SSD, no GPU

| Task | Scenario | Time per inference (ms) | Max frequency (fps) | CPU consumption (%) | Memory consumption (MB) |
|---|---|---|---|---|---|
| Image classification | Single stream | 83.04 | 12.04 | 66.31 | 1119.03 |
| Object detection (small) | Single stream | 48.07 | 20.80 | 63.81 | 1155.46 |
| Object detection (large) | Single stream | 3062.28 | 0.33 | 87.62 | 1157.53 |
| Image classification | Multiple stream | 161.50 | n.a. | 81.75 | 1151.94 |
| Object detection (small) | Multiple stream | 84.27 | n.a. | 79.55 | 1092.54 |
| Object detection (large) | Multiple stream | 6372.60 | n.a. | 88.35 | 1125.80 |
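The "Max frequency" values in the single-stream rows are simply the reciprocal of the measured time per inference (fps = 1000 / ms). A quick arithmetic check against the IPC427E rows:

```python
# Max frequency (fps) = 1000 / time-per-inference (ms) for single-stream rows.
single_stream_ms = {
    "Image classification": 83.04,
    "Object detection (small)": 48.07,
    "Object detection (large)": 3062.28,
}
for task, ms in single_stream_ms.items():
    print(f"{task}: {1000.0 / ms:.2f} fps")  # matches the table column
```

The same relation holds for the other hardware tables below; it is not reported for multiple stream rows because queries there carry more than one sample.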

Hardware: IPC847E with Intel® Xeon® E-2278GE, 128 GB RAM, 960 GB SSD, no GPU

| Task | Scenario | Time per inference (ms) | Max frequency (fps) | CPU consumption (%) | Memory consumption (MB) |
|---|---|---|---|---|---|
| Image classification | Single stream | 36.83 | 27.15 | 52.45 | 1000.75 |
| Object detection (small) | Single stream | 18.79 | 53.21 | 51.77 | 1064.71 |
| Object detection (large) | Single stream | 1011.43 | 0.99 | 78.17 | 1065.33 |
| Image classification | Multiple stream | 57.66 | n.a. | 67.89 | 1124.01 |
| Object detection (small) | Multiple stream | 31.90 | n.a. | 67.89 | 1108.73 |
| Object detection (large) | Multiple stream | 1968.88 | n.a. | 105.17 | 1092.78 |

Hardware: IPC BX-59A with Intel® Core™ i9-13900E (24 cores), 32 GB RAM, 500 GB SSD, NVIDIA® L4 Tensor Core GPU

| Task | Scenario | Time per inference (ms) | Max frequency (fps) | CPU consumption (%) | Memory consumption (MB) |
|---|---|---|---|---|---|
| Image classification | Single stream | 44.47 | 22.48 | 44.52 | 966.12 |
| Object detection (small) | Single stream | 24.68 | 40.51 | 35.68 | 1105.76 |
| Object detection (large) | Single stream | 828.27 | 1.21 | 65.07 | 878.28 |
| Image classification | Multiple stream | 50.26 | n.a. | 60.22 | 1107.26 |
| Object detection (small) | Multiple stream | 30.33 | n.a. | 54.54 | 1106.01 |
| Object detection (large) | Multiple stream | 1321.83 | n.a. | 82.72 | 1072.42 |
| Image classification GPU | Single stream | 12.68 | 78.80 | 2.59 | 158.21 |
| Object detection (small) GPU | Single stream | 13.67 | 73.13 | 49.64 | 154.71 |
| Object detection (large) GPU | Single stream | 156.69 | 6.38 | 2.25 | 275.15 |
| Image classification GPU | Multiple stream | 20.80 | n.a. | 5.27 | 192.97 |
| Object detection (small) GPU | Multiple stream | 27.72 | n.a. | 51.49 | 188.06 |
| Object detection (large) GPU | Multiple stream | 338.13 | n.a. | 5.37 | 441.92 |

Vision performance

Test scenario: Low complexity use case

Use case characteristics:

  • The AI model only tells what is on the whole input image (image classification).

  • Low input resolution: 224 x 224 ≈ 0.05 Mpx.

  • Images are evaluated and post-processed individually.

Examples:

  • Identify what type of product is on the image.

  • Identify whether the product on the image is OK or not at a coarse level.

Basic performance setup:

  • Single camera

  • 1 Gigabit network

  • CPU-based device

High performance setup:

  • n.a.

Test scenario: Medium complexity use case

Use case characteristics:

  • The AI model identifies multiple objects on the input image (object detection).

  • Medium input resolution: 640 x 640 ≈ 0.4 Mpx.

  • Images taken at the same time might be processed together.

Examples:

  • Identify smaller defects on an object.

  • Identify faulty objects arriving in multiple rows on a conveyor belt.

Basic performance setup:

  • Single camera

  • 1 Gigabit network

  • CPU-based device

High performance setup:

  • Multiple cameras

  • 10 Gigabit network

  • GPU-based device

Test scenario: High complexity use case

Use case characteristics:

  • The AI model identifies multiple objects on the input image (object detection).

  • High input resolution: 2448 x 2024 ≈ 5 Mpx.

  • Images taken at the same time are processed together.
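The approximate megapixel figures quoted for the three use cases follow directly from width × height (rounded to roughly one significant digit in the text):

```python
# Mpx = width * height / 1e6, for the three benchmark resolutions.
for w, h in ((224, 224), (640, 640), (2448, 2024)):
    print(f"{w}x{h} = {w * h / 1e6:.2f} Mpx")
```

Pixel count, not edge length, is what drives transfer and preprocessing cost, which is why the high-complexity case is roughly 100x the low-complexity case despite only ~10x larger edges.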

Examples:

  • Identify very small defects on a large object, e.g. a car body part.

Basic performance setup:

  • n.a.

High performance setup:

  • Multiple cameras

  • 10 Gigabit network

  • GPU-based device

| Test scenario / Use case | Image resolution | Color format | Model | Hardware setup | Throughput | Latency | CPU and GPU load (%) |
|---|---|---|---|---|---|---|---|
| Low complexity | 224x224 ≈ 0.05 Mpx | RG8 | MobileNet with 224x224 input | 1-8 Basler ace 2 cameras, 10 Gigabit network, IPC BX-59A with NVIDIA L4 GPU | 160 FPS overall ≈ 9600 PPM | 6 ms avg / 34 ms max pipeline latency | < 20% |
| Medium complexity | 640x640 ≈ 0.4 Mpx | RG8 | YOLO v7 tiny with 640x640 input | 1-8 Basler ace 2 cameras, 10 Gigabit network, IPC BX-59A with NVIDIA L4 GPU | 80 FPS overall ≈ 4800 PPM | 13 ms avg / 43 ms max pipeline latency | < 20% |
| High complexity | 2448x2024 ≈ 5 Mpx | RG8 | Upscaled object detection with 2448x2024 input | 1-8 Basler ace 2 cameras, 10 Gigabit network, IPC BX-59A with NVIDIA L4 GPU | 5 FPS overall ≈ 300 PPM | 500 ms avg / 515 ms max pipeline latency | < 20% CPU, < 15% GPU |
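Throughput in the table is quoted both in frames per second and in parts per minute; the PPM figures are simply FPS × 60, assuming one inspected part per frame:

```python
# PPM (parts per minute) = FPS * 60, one part per frame assumed.
for fps in (160, 80, 5):
    print(f"{fps} FPS -> {fps * 60} PPM")
```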

Notes: Every use case is individual; these benchmarks were chosen to represent an average. Do not compare end-to-end system performance with MLPerf benchmarks, which cover raw model inference only. Tuning performance for a given use case means finding a viable balance between prediction speed and accuracy. Tuning levers include, but are not limited to:

  • Hardware used

  • Image and model resolution

  • Choice of model architecture

  • Pre- and postprocessing code

  • Selection of training data

Throughput and load figures include:

  • Siemens Vision Connector receiving images from a real camera

  • AI Inference Server running the images through the given AI pipeline

  • Basic pre- and postprocessing in the pipeline

  • Base load of the Industrial Edge device

The tests were executed on a prototype BX-59A. An official BX-59A Industrial Edge Device with an L4 GPU and a 10 Gigabit NIC will be measured soon.