The benchmark results presented here are for reference only and will be extended to additional types of Industrial Edge Devices and reference models in future versions.
Models:
Task | Model name | Input data | Model framework | Model accuracy | Model data precision | Accuracy (%FP32 ref) |
---|---|---|---|---|---|---|
Image classification | ResNet50 v1.5 (224x224) | ImageNet2012 | TensorFlow | 76.456% | fp32 | 99% |
Object detection (small) | SSD MobileNet (300x300) | COCO | TensorFlow | mAP 0.23 | uint8 | 99% |
Object detection (large) | SSD ResNet34 (1200x1200) | COCO | TensorFlow | | fp32 | |
Image classification GPU | ResNet50 v1 (224x224) | ImageNet2012 | ONNX / TensorRT | | fp32 | |
Object detection (small) GPU | SSD MobileNet v1 (300x300) | COCO | ONNX / TensorRT | | uint8 | |
Object detection (large) GPU | SSD ResNet34 (1200x1200) | COCO | ONNX / TensorRT | | fp32 | |
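The "Accuracy (%FP32 ref)" column states how much of the FP32 reference accuracy the deployed (possibly quantized) model retains. A minimal sketch of this ratio, with an illustrative helper name:

```python
# Illustrative helper (not part of the benchmark tooling): express a measured
# accuracy as a percentage of the FP32 reference accuracy, as in the
# "Accuracy (%FP32 ref)" column above.
def accuracy_vs_fp32_ref(measured_accuracy: float, fp32_reference: float) -> float:
    return 100.0 * measured_accuracy / fp32_reference

# Example: ResNet50 v1.5 has a 76.456% Top-1 FP32 reference; a deployed model
# reaching 75.69% Top-1 retains roughly 99% of that reference.
print(round(accuracy_vs_fp32_ref(75.69, 76.456), 2))  # ~99.0
```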
Test scenarios:
Scenario | Query generation | Duration | Samples / query |
---|---|---|---|
Single stream | LoadGen sends next query as soon as SUT completes the previous query | 1024 queries and 60 seconds | 1 |
Multiple stream* | LoadGen sends next query as soon as SUT completes the previous query | 4096 queries and 240 seconds | 2 |
*The multiple stream scenario is originally defined as 270,336 queries and 600 seconds with 8 samples per query. The tests were performed with the reduced numbers above due to a memory limitation of the built-in runtime module.
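The scenarios above follow the MLPerf Inference LoadGen definitions. As an illustration only, the sketch below shows how comparable settings could be expressed with the open-source mlperf_loadgen Python bindings; the field names and the reduced multiple stream parameters are assumptions about those bindings, not the configuration of the built-in runtime module.

```python
# Illustrative sketch: MLPerf LoadGen test settings comparable to the
# scenarios above, assuming the open-source mlperf_loadgen Python bindings.
import mlperf_loadgen as lg

def make_settings(single_stream: bool) -> lg.TestSettings:
    settings = lg.TestSettings()
    settings.mode = lg.TestMode.PerformanceOnly
    if single_stream:
        settings.scenario = lg.TestScenario.SingleStream
        settings.min_query_count = 1024          # "1024 queries and 60 seconds"
        settings.min_duration_ms = 60 * 1000
    else:
        settings.scenario = lg.TestScenario.MultiStream
        settings.min_query_count = 4096          # reduced from the official 270,336
        settings.min_duration_ms = 240 * 1000    # reduced from the official 600 s
        settings.multi_stream_samples_per_query = 2  # reduced from 8
    return settings
```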
Hardware: IPC427E with Intel® Xeon® (4 cores), 16 GB RAM, 240 GB SSD, no GPU
Task | Scenario | Time per inference (ms) | Max frequency (fps) | CPU consumption (%) | Memory consumption (MB) |
---|---|---|---|---|---|
Image classification | Single stream | 83.04 | 12.04 | 66.31% | 1119.03 MB |
Object detection (small) | Single stream | 48.07 | 20.80 | 63.81% | 1155.46 MB |
Object detection (large) | Single stream | 3062.28 | 0.33 | 87.62% | 1157.53 MB |
Image classification | Multiple stream | 161.50 | n.a. | 81.75% | 1151.94 MB |
Object detection (small) | Multiple stream | 84.27 | n.a. | 79.55% | 1092.54 MB |
Object detection (large) | Multiple stream | 6372.60 | n.a. | 88.35% | 1125.80 MB |
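For orientation, the "Max frequency (fps)" column is the reciprocal of the measured time per inference; it is not reported (n.a.) for the multiple stream scenario, where a query carries more than one sample. A minimal sketch of the conversion, using values from the table above:

```python
# Sketch: the reported maximum frequency follows directly from the measured
# single stream latency (time per inference in milliseconds).
def max_frequency_fps(time_per_inference_ms: float) -> float:
    return 1000.0 / time_per_inference_ms

print(round(max_frequency_fps(83.04), 2))  # ~12.04 fps (image classification)
print(round(max_frequency_fps(48.07), 2))  # ~20.80 fps (object detection, small)
```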
Hardware: IPC847E with Intel® Xeon® E-2278GE, 128 GB RAM, 960 GB SSD, no GPU
Task | Scenario | Time per inference (ms) | Max frequency (fps) | CPU consumption (%) | Memory consumption (MB) |
---|---|---|---|---|---|
Image classification | Single stream | 36.83 | 27.15 | 52.45% | 1000.75 MB |
Object detection (small) | Single stream | 18.79 | 53.21 | 51.77% | 1064.71 MB |
Object detection (large) | Single stream | 1011.43 | 0.99 | 78.17% | 1065.33 MB |
Image classification | Multiple stream | 57.66 | n.a. | 67.89% | 1124.01 MB |
Object detection (small) | Multiple stream | 31.90 | n.a. | 67.89% | 1108.73 MB |
Object detection (large) | Multiple stream | 1968.88 | n.a. | 105.17% | 1092.78 MB |
Hardware: IPC BX-59A with Intel® Core™ i9-13900E (24 cores), 32 GB RAM, 500 GB SSD, NVIDIA® L4 Tensor Core GPU
Task | Scenario | Time per inference (ms) | Max frequency (fps) | CPU consumption (%) | Memory consumption (MB) |
---|---|---|---|---|---|
Image classification | Single stream | 44.47 | 22.48 | 44.52% | 966.12 MB |
Object detection (small) | Single stream | 24.68 | 40.51 | 35.68% | 1105.76 MB |
Object detection (large) | Single stream | 828.27 | 1.21 | 65.07% | 878.28 MB |
Image classification | Multiple stream | 50.26 | n.a. | 60.22% | 1107.26 MB |
Object detection (small) | Multiple stream | 30.33 | n.a. | 54.54% | 1106.01 MB |
Object detection (large) | Multiple stream | 1321.83 | n.a. | 82.72% | 1072.42 MB |
Image classification GPU | Single stream | 12.68 | 78.80 | 2.59% | 158.21 MB |
Object detection (small) GPU | Single stream | 13.67 | 73.13 | 49.64% | 154.71 MB |
Object detection (large) GPU | Single stream | 156.69 | 6.38 | 2.25% | 275.15 MB |
Image classification GPU | Multiple stream | 20.80 | n.a. | 5.27% | 192.97 MB |
Object detection (small) GPU | Multiple stream | 27.72 | n.a. | 51.49% | 188.06 MB |
Object detection (large) GPU | Multiple stream | 338.13 | n.a. | 5.37% | 441.92 MB |
Vision performance
Test scenario: Low complexity use case
Use case characteristics:
The AI model only tells what is on the whole input image (image classification).
Low input resolution: 224 x 224 ≈ 0.05 Mpx.
Images are evaluated and post-processed individually.
Examples:
Identify what type of product is on the image.
Identify whether the product on the image is OK or not at a coarse level.
Basic performance setup:
Single camera
1 Gigabit network
CPU based device
High performance setup:
n.a.
Test scenario: Medium complexity use case
Use case characteristics:
The AI model identifies multiple objects on the input image (object detection).
Medium input resolution: 640 x 640 ≈ 0.4 Mpx.
Images taken at the same time might be processed together.
Examples:
Identify smaller defects on an object.
Identify faulty objects arriving in multiple rows on a conveyor belt.
Basic performance setup:
Single camera
1 Gigabit network
CPU based device
High performance setup:
Multiple cameras
10 Gigabit network
GPU based device
Test scenario: High complexity use case
Use case characteristics:
The AI model identifies multiple objects on the input image (object detection).
High input resolution: 2448 x 2024 ≈ 5 Mpx.
Images taken at the same time are processed together.
Examples:
Identify very small defects on a large object, e.g. a car body part.
Basic performance setup:
n.a.
High performance setup:
Multiple cameras
10 Gigabit network
GPU based device
Test scenario / Use case | Image resolution | Color format | Model | Hardware setup | Throughput | Latency | CPU and GPU load (%) |
---|---|---|---|---|---|---|---|
Low complexity | 224x224 ~ 0.05 Mpx | RG8 | MobileNet with 224x224 input | 1-8 Basler ace 2 cameras, 10 Gigabit network, IPC BX-59A with NVIDIA L4 GPU | 160 FPS overall ≈ 9600 PPM | 6 ms avg / 34 ms max pipeline latency | < 20% |
Medium complexity | 640x640 ~ 0.4 Mpx | RG8 | YOLO v7 tiny with 640x640 input | 1-8 Basler ace 2 cameras, 10 Gigabit network, IPC BX-59A with NVIDIA L4 GPU | 80 FPS overall ≈ 4800 PPM | 13 ms avg / 43 ms max pipeline latency | < 20% |
High complexity | 2448x2024 ~ 5 Mpx | RG8 | upscaled object detection with 2448x2024 input | 1-8 Basler ace 2 cameras, 10 Gigabit network, IPC BX-59A with NVIDIA L4 GPU | 5 FPS overall ≈ 300 PPM | 500 ms avg / 515 ms max pipeline latency | < 20% CPU, < 15% GPU |
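The resolution and throughput figures in the table follow from simple conversions: megapixels from the pixel count, and parts per minute (PPM) from overall frames per second. A minimal sketch:

```python
# Sketch of the conversions behind the resolution and throughput columns above.
def megapixels(width: int, height: int) -> float:
    return width * height / 1_000_000

def fps_to_ppm(fps: float) -> float:
    return fps * 60.0  # assumes one evaluated frame per part

print(round(megapixels(224, 224), 2))    # ~0.05 Mpx (low complexity)
print(round(megapixels(2448, 2024), 2))  # ~4.95 Mpx (high complexity)
print(fps_to_ppm(160))                   # 9600 PPM at 160 FPS overall
```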
Notes: Every use case is individual; the benchmarks were chosen to represent an average. Do not compare end-to-end (E2E) system performance with MLPerf benchmarks, which cover raw model inference only. Tuning performance for a given use case means finding a viable balance between prediction speed and accuracy. Tuning levers include, but are not limited to:
Hardware used
Image and model resolution
Choice of model architecture
Pre- and postprocessing code
Selection of training data
Throughput and load figures include:
Siemens Vision Connector receiving images from a real camera
AI Inference Server running the images through the given AI pipeline
Basic pre- and postprocessing in the pipeline (see the sketch after this list)
Base load of Industrial Edge device
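As an orientation for what "basic pre- and postprocessing" means here, the sketch below resizes a camera frame to the model input size and reduces a classifier output to a label index. It is a minimal illustration with assumed function names, not the actual pipeline code of the AI Inference Server.

```python
# Minimal pre-/postprocessing sketch (illustrative only, assuming OpenCV and
# NumPy are available); the real pipeline steps are configured per use case.
import cv2
import numpy as np

def preprocess(frame: np.ndarray, input_size=(224, 224)) -> np.ndarray:
    resized = cv2.resize(frame, input_size)          # scale to model input
    normalized = resized.astype(np.float32) / 255.0  # map pixels to [0, 1]
    return np.expand_dims(normalized, axis=0)        # add batch dimension

def postprocess(scores: np.ndarray) -> int:
    return int(np.argmax(scores, axis=-1)[0])        # predicted class index
```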
The tests were executed on a prototype BX-59A. An official BX-59A Industrial Edge Device with L4 GPU and 10 Gigabit NIC will be measured soon.