Custom metrics created with the AI SDK are automatically collected from the topic
/siemens/edge/aiinference/{model-name}/{model-version}/metrics/{component-name}/{metric-name}.
If you want to collect custom metrics, make sure that this topic is added to your configured Databus user and that the necessary credentials are also configured for the AI Model Monitor Agent.
The metrics are extended with labels identifying the running pipeline from which they originate.
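As a sketch of the topic pattern above, the helper below builds the collection topic for a custom metric from its four placeholder segments. The function name and the validation rules are illustrative, not part of the product API; the example model and metric names are hypothetical.

```python
def custom_metric_topic(model_name: str, model_version: str,
                        component_name: str, metric_name: str) -> str:
    """Build the Databus topic from which a custom metric is collected.

    Mirrors the documented pattern:
    /siemens/edge/aiinference/{model-name}/{model-version}/metrics/{component-name}/{metric-name}
    """
    segments = [model_name, model_version, component_name, metric_name]
    for seg in segments:
        # Reject empty segments and MQTT-reserved characters (illustrative check).
        if not seg or any(ch in seg for ch in "/#+"):
            raise ValueError(f"invalid topic segment: {seg!r}")
    return ("/siemens/edge/aiinference/"
            f"{model_name}/{model_version}/metrics/{component_name}/{metric_name}")

# Hypothetical example: a 'confidence' metric from version 2 of 'defect-detector'
topic = custom_metric_topic("defect-detector", "2", "classifier", "confidence")
```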
| **Metric name** | **Metric type** | **Metric origin** | **Metric description** |
|---|---|---|---|
| host\_box\_availability | gauge | Edge device | Device availability. Represented by: 0 - not available and 1 - available. |
| host\_box\_memory\_total | gauge | Edge device | Total available memory on the box in bytes. |
| host\_box\_memory\_used | gauge | Edge device | Used memory on the box in bytes. |
| host\_box\_cpu\_percentage | gauge | Edge device | Used CPU percentage. |
| host\_box\_edge\_uptime | gauge | Edge device | Uptime of the box in minutes. |
| host\_box\_running\_application\_count | gauge | Edge device | Number of running applications on the box. |
| host\_box\_max\_running\_application\_count | gauge | Edge device | Maximum number of applications allowed to run. |
| host\_box\_ApplicationName\_status | gauge | Edge device | Application availability. Represented by: 0 - not available and 1 - available. |
| pipeline\_global\_inputs | gauge | AI Inference Server | Number of pipeline inputs generated since the active pipeline started running. |
| pipeline\_global\_outputs | gauge | AI Inference Server | Number of pipeline outputs generated since the active pipeline started running. |
| pipeline\_NodeName\_exec\_min | gauge | AI Inference Server | Minimum node execution time. |
| pipeline\_NodeName\_exec\_max | gauge | AI Inference Server | Maximum node execution time. |
| pipeline\_NodeName\_exec\_avg | gauge | AI Inference Server | Average node execution time. |
| pipeline\_NodeName\_inputs | gauge | AI Inference Server | Number of node inputs. |
| pipeline\_NodeName\_outputs | gauge | AI Inference Server | Number of node outputs. |
| pipeline\_status | gauge | AI Inference Server | The status of the AI Inference Server pipeline, stored as a label value. The metric itself always returns 0, irrespective of the pipeline status. |
| pipeline\_StepName\_gpuruntime\_inference\_count | gauge | AI Inference Server | The cumulative count of successful inference requests made for this model \(DOES NOT include cache hits\). |
| pipeline\_StepName\_gpuruntime\_execution\_count | gauge | AI Inference Server | The cumulative count of successful inference executions performed for the model \(DOES NOT include cache hits\). |
| pipeline\_StepName\_gpuruntime\_success\_count | gauge | AI Inference Server | The cumulative count of all successful inference requests made for this model \(INCLUDING cache hits\). |
| pipeline\_StepName\_gpuruntime\_success\_duration | gauge | AI Inference Server | The cumulative duration for all successful inference requests in nanoseconds \(INCLUDING cache hits\). |
| pipeline\_StepName\_gpuruntime\_fail\_count | gauge | AI Inference Server | The cumulative count of all failed inference requests made for this model. |
| pipeline\_StepName\_gpuruntime\_fail\_duration | gauge | AI Inference Server | The cumulative duration for all failed inference requests in nanoseconds. |
| pipeline\_StepName\_gpuruntime\_queue\_count | gauge | AI Inference Server | The cumulative count of inference requests that waited in scheduling or in other queues \(INCLUDING cache hits\). |
| pipeline\_StepName\_gpuruntime\_queue\_duration | gauge | AI Inference Server | The cumulative duration that inference requests wait in scheduling \(or in other queues\) in nanoseconds \(INCLUDING cache hits\). |
| pipeline\_StepName\_gpuruntime\_compute\_input\_count | gauge | AI Inference Server | The cumulative count of the prepared tensor data input required by the model \(DOES NOT include cache hits\). |
| pipeline\_StepName\_gpuruntime\_compute\_input\_duration | gauge | AI Inference Server | The cumulative duration to prepare input tensor data as required by the model in nanoseconds \(DOES NOT include cache hits\). |
| pipeline\_StepName\_gpuruntime\_compute\_output\_count | gauge | AI Inference Server | The cumulative count of the extracted tensor data output produced by the model \(DOES NOT include cache hits\). |
| pipeline\_StepName\_gpuruntime\_compute\_output\_duration | gauge | AI Inference Server | The cumulative duration to extract output tensor data produced by the model in nanoseconds \(DOES NOT include cache hits\). |
| pipeline\_StepName\_gpuruntime\_cache\_hit\_count | gauge | AI Inference Server | The count of response cache hits. |
| pipeline\_StepName\_gpuruntime\_cache\_hit\_duration | gauge | AI Inference Server | The cumulative duration to look up and extract output tensor data from the Response Cache on a cache hit in nanoseconds. |
| pipeline\_StepName\_gpuruntime\_cache\_miss\_count | gauge | AI Inference Server | The count of response cache misses. |
| pipeline\_StepName\_gpuruntime\_cache\_miss\_duration | gauge | AI Inference Server | The cumulative duration to look up and insert output tensor data to the Response Cache on a cache miss in nanoseconds. |
| pipeline\_StepName\_gpuruntime\_batch\_compute\_input\_duration | gauge | AI Inference Server | The cumulative duration to prepare input tensor data as required by the model in nanoseconds with the given batch size. |
| pipeline\_StepName\_gpuruntime\_batch\_compute\_input\_count | gauge | AI Inference Server | The cumulative count of input tensor data with the given batch size. |
| pipeline\_StepName\_gpuruntime\_batch\_compute\_output\_duration | gauge | AI Inference Server | The cumulative duration to extract output tensor data as required by the model in nanoseconds with the given batch size. |
| pipeline\_StepName\_gpuruntime\_batch\_compute\_output\_count | gauge | AI Inference Server | The cumulative count of output tensor data with the given batch size. |
| pipeline\_StepName\_gpuruntime\_batch\_compute\_infer\_duration | gauge | AI Inference Server | The cumulative duration to execute the model in nanoseconds with the given batch size. |
| pipeline\_StepName\_gpuruntime\_batch\_compute\_infer\_count | gauge | AI Inference Server | The cumulative count of model executions with the given batch size. |
| is\_drift | gauge | Monitoring Node | Whether model drift is detected according to the collected buffer. Represented by: -1 - buffer is being collected or an error occurred, 0 - no model drift detected, 1 - model drift detected. |
| number\_of\_data\_type\_errors | gauge | Monitoring Node | Number of data type errors. The default threshold is set to 0. |
| number\_of\_missing\_properties | gauge | Monitoring Node | Number of missing properties. The range of this metric is between 0 and the number of properties. |
| ratio\_of\_categorical\_out\_of\_domain\_features | gauge | Monitoring Node | Ratio of categorical out of domain features. This metric is bounded between 0 and 1. |
| ratio\_of\_numerical\_out\_of\_domain\_features | gauge | Monitoring Node | Ratio of numerical out of domain features. This metric is bounded between 0 and 1. |
| ratio\_of\_numerical\_outlier\_features | gauge | Monitoring Node | Ratio of numerical outlier features. |
| <custom metric name> | gauge | AI Inference Server | Custom metric defined in AI SDK at model creation. |
| error\_metric\_created | gauge | AI Model Monitor Agent | Creation time of the error metric counter. |
| error\_metric\_total | counter | AI Model Monitor Agent | Number of errors that occurred during scraping. |
| scrape\_duration\_seconds | gauge | Prometheus | Duration of the scrape. |
| scrape\_samples\_post\_metric\_relabeling | gauge | Prometheus | Number of samples remaining after metric relabeling was applied. |
| scrape\_samples\_scraped | gauge | Prometheus | Number of samples the target exposed. |
| scrape\_series\_added | gauge | Prometheus | Approximate number of new series in this scrape. |
| up | gauge | Prometheus | Availability of the scrape target, as reported by Prometheus. Represented by: 0 - not available and 1 - available. |
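The `gpuruntime` metrics above come in cumulative `*_duration` / `*_count` pairs, so a per-request average over an interval is obtained from the deltas between two scrapes. The sketch below shows that arithmetic for any such pair; the function name and the sample values are illustrative.

```python
def avg_per_request_ns(duration_prev: int, duration_curr: int,
                       count_prev: int, count_curr: int):
    """Average per-request duration (ns) between two scrapes of a
    cumulative *_duration / *_count metric pair.

    Returns None when no new requests were observed in the interval.
    """
    delta_count = count_curr - count_prev
    if delta_count <= 0:
        return None
    return (duration_curr - duration_prev) / delta_count

# e.g. deltas of queue_duration / queue_count give the mean queue wait in ns:
# (3_600_000 - 1_200_000) / (22 - 10) = 200000.0
mean_queue_ns = avg_per_request_ns(1_200_000, 3_600_000, 10, 22)
```

The same formula applies to, for example, `success_duration` over `success_count` for mean end-to-end inference latency.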