
Real-time Metrics Endpoints

MakeHub provides access to real-time performance metrics for all supported AI models and providers. These metrics help you understand the current performance characteristics of different providers, which can be valuable for optimizing your routing decisions.

Metrics Endpoint

You can access real-time performance metrics through the /v1/metrics endpoint:

GET https://api.makehub.ai/v1/metrics

Query Parameters

model_id (string): Required. The model ID to get metrics for (e.g., openai/gpt-4o)
provider_id (string): Optional. Specific provider to get metrics for
n_last_minutes (integer): Optional. Number of minutes to consider for metrics calculation (default: 3000)

Response Format

The response contains metrics for each provider that supports the requested model. For each provider, you'll receive the following metrics:

avg_latency_3000min_ms: Average initial response latency in milliseconds (over the last 3000 minutes by default)
avg_throughput_3000min_tokens_per_second: Average throughput in tokens per second (over the last 3000 minutes by default)
last_latency_ms: Latency of the most recent request in milliseconds
last_throughput_tokens_per_second: Throughput of the most recent request in tokens per second
latency_variance_3000min_ms: Variance in latency measurements (indicates stability)
throughput_variance_3000min_tokens_per_second: Variance in throughput measurements (indicates stability)
dt_since_last_measurement_ms: Time since the last measurement was taken in milliseconds
rtt_from_makehub_ms: Round-trip time from MakeHub to the provider's API in milliseconds
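
For reference, the shape of one provider's entry can be captured with a small type definition. This is an illustrative sketch using Python's TypedDict, not part of any official MakeHub SDK; note that rtt_from_makehub_ms can be null, as in the sample response below.

from typing import Dict, Optional, TypedDict

class ProviderMetrics(TypedDict):
    """Illustrative shape of one provider's entry in the /v1/metrics response."""
    avg_latency_3000min_ms: float
    avg_throughput_3000min_tokens_per_second: float
    last_latency_ms: float
    last_throughput_tokens_per_second: float
    latency_variance_3000min_ms: float
    throughput_variance_3000min_tokens_per_second: float
    dt_since_last_measurement_ms: float
    rtt_from_makehub_ms: Optional[float]  # can be null, as in the sample response below

# The full response maps provider IDs to their metrics
MetricsResponse = Dict[str, ProviderMetrics]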

Example Usage

Python Example

import requests
import json
 
# Configuration
API_BASE_URL = "https://api.makehub.ai" 
BEARER_TOKEN = "your_makehub_api_key"
MODEL_ID = "openai/gpt-4o"
N_LAST_MINUTES = 10  # Get metrics for the last 10 minutes
 
def get_metrics(model_id, provider_id=None):
    """Simple call to the /v1/metrics endpoint"""
    # Build the URL
    url = f"{API_BASE_URL}/v1/metrics?model_id={model_id}"
    if provider_id:
        url += f"&provider_id={provider_id}"
    if N_LAST_MINUTES:
        url += f"&n_last_minutes={N_LAST_MINUTES}"
    
    # Headers with Bearer Token
    headers = {
        "Authorization": f"Bearer {BEARER_TOKEN}",
        "Content-Type": "application/json"
    }
    
    # Execute GET request
    response = requests.get(url, headers=headers)
    
    if response.status_code == 200:
        return response.json()
    else:
        return {"error": response.text}
 
if __name__ == "__main__":
    result = get_metrics(MODEL_ID)
    print(json.dumps(result, indent=2, ensure_ascii=False))
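
The example above builds the query string by hand. If you prefer, requests can assemble and URL-encode the parameters for you via its params argument. A minimal variant of the same call, reusing the imports and the API_BASE_URL and BEARER_TOKEN constants defined above:

def get_metrics_with_params(model_id, provider_id=None, n_last_minutes=None):
    """Variant of get_metrics() that lets requests build the query string."""
    params = {"model_id": model_id}
    if provider_id:
        params["provider_id"] = provider_id
    if n_last_minutes:
        params["n_last_minutes"] = n_last_minutes

    response = requests.get(
        f"{API_BASE_URL}/v1/metrics",
        headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
        params=params,
    )
    response.raise_for_status()
    return response.json()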

TypeScript Example

import axios from 'axios';
 
async function getMetrics(modelId: string, providerId?: string, nLastMinutes?: number) {
  // Build the URL with query parameters
  let url = `https://api.makehub.ai/v1/metrics?model_id=${modelId}`;
  if (providerId) {
    url += `&provider_id=${providerId}`;
  }
  if (nLastMinutes) {
    url += `&n_last_minutes=${nLastMinutes}`;
  }
 
  // Make the request
  try {
    const response = await axios.get(url, {
      headers: {
        'Authorization': `Bearer your_makehub_api_key`,
        'Content-Type': 'application/json'
      }
    });
    
    return response.data;
  } catch (error) {
    console.error('Error fetching metrics:', error);
    return { error: error instanceof Error ? error.message : String(error) };
  }
}
 
// Example usage
getMetrics('openai/gpt-4o', undefined, 10)
  .then(data => console.log(JSON.stringify(data, null, 2)));

cURL Example

curl "https://api.makehub.ai/v1/metrics?model_id=openai/gpt-4o&n_last_minutes=10" \
  -H "Authorization: Bearer your_makehub_api_key" \
  -H "Content-Type: application/json"

Sample Response

Here's an example of what the response might look like for the model openai/gpt-4o:

{
  "azure-aoai": {
    "avg_latency_3000min_ms": 992.87,
    "avg_throughput_3000min_tokens_per_second": 99.92,
    "dt_since_last_measurement_ms": 20737.39,
    "last_latency_ms": 524.75,
    "last_throughput_tokens_per_second": 74.07,
    "latency_variance_3000min_ms": 57459061.50,
    "rtt_from_makehub_ms": 70.34,
    "throughput_variance_3000min_tokens_per_second": 1418.82
  },
  "azure-eastus2": {
    "avg_latency_3000min_ms": 786.43,
    "avg_throughput_3000min_tokens_per_second": 93.21,
    "dt_since_last_measurement_ms": 19895.76,
    "last_latency_ms": 597.00,
    "last_throughput_tokens_per_second": 98.80,
    "latency_variance_3000min_ms": 13982901.08,
    "rtt_from_makehub_ms": null,
    "throughput_variance_3000min_tokens_per_second": 1510.09
  },
  "azure-francecentral": {
    "avg_latency_3000min_ms": 570.10,
    "avg_throughput_3000min_tokens_per_second": 88.45,
    "dt_since_last_measurement_ms": 19895.00,
    "last_latency_ms": 517.34,
    "last_throughput_tokens_per_second": 81.32,
    "latency_variance_3000min_ms": 114192.80,
    "rtt_from_makehub_ms": 85.38,
    "throughput_variance_3000min_tokens_per_second": 266.00
  },
  "azure-swedencentral": {
    "avg_latency_3000min_ms": 590.05,
    "avg_throughput_3000min_tokens_per_second": 101.40,
    "dt_since_last_measurement_ms": 19894.22,
    "last_latency_ms": 711.30,
    "last_throughput_tokens_per_second": 70.82,
    "latency_variance_3000min_ms": 478525.62,
    "rtt_from_makehub_ms": 97.18,
    "throughput_variance_3000min_tokens_per_second": 829.45
  },
  "openai": {
    "avg_latency_3000min_ms": 24571.52,
    "avg_throughput_3000min_tokens_per_second": 63.32,
    "dt_since_last_measurement_ms": 30.16,
    "last_latency_ms": 47029.82,
    "last_throughput_tokens_per_second": 38.62,
    "latency_variance_3000min_ms": 19706887504.74,
    "rtt_from_makehub_ms": 3.01,
    "throughput_variance_3000min_tokens_per_second": 2211.73
  }
  ...
}
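
When consuming a response like this, it helps to account for null fields (rtt_from_makehub_ms for azure-eastus2 above) and for how stale each measurement is (dt_since_last_measurement_ms). A small illustrative sketch that sorts providers by average latency while skipping incomplete or stale entries; max_staleness_ms is an arbitrary threshold chosen here, not an API concept:

def summarize_metrics(metrics, max_staleness_ms=60_000):
    """Sort providers by average latency, ignoring incomplete or stale entries."""
    rows = []
    for provider, m in metrics.items():
        if not isinstance(m, dict):
            continue  # skip error payloads
        if m.get("avg_latency_3000min_ms") is None:
            continue  # no usable latency data for this provider
        if m.get("dt_since_last_measurement_ms", 0) > max_staleness_ms:
            continue  # measurement too old to trust
        rows.append((
            provider,
            m["avg_latency_3000min_ms"],
            m.get("avg_throughput_3000min_tokens_per_second"),
        ))
    return sorted(rows, key=lambda row: row[1])

# e.g. summarize_metrics(get_metrics("openai/gpt-4o"))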

Using Metrics Data

Use these metrics to adjust your routing strategy dynamically based on current conditions. For example, the helper below picks the provider with the best most-recent latency or throughput (it reuses get_metrics() from the Python example above):

def select_best_provider(model_id, optimization_goal='latency'):
    metrics = get_metrics(model_id)

    best_provider = None
    best_value = float('inf') if optimization_goal == 'latency' else 0

    for provider, provider_metrics in metrics.items():
        # Skip error payloads or entries without a usable measurement
        if not isinstance(provider_metrics, dict):
            continue
        if optimization_goal == 'latency':
            current_value = provider_metrics.get('last_latency_ms')
            if current_value is not None and current_value < best_value:
                best_value = current_value
                best_provider = provider
        elif optimization_goal == 'throughput':
            current_value = provider_metrics.get('last_throughput_tokens_per_second')
            if current_value is not None and current_value > best_value:
                best_value = current_value
                best_provider = provider

    return best_provider
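
A possible usage sketch, with an arbitrary fallback provider when no usable metrics are returned (the model ID and fallback are examples, not MakeHub defaults):

best = select_best_provider("openai/gpt-4o", optimization_goal="throughput")
if best is None:
    best = "openai"  # arbitrary fallback when no provider has recent metrics
print(f"Routing the next request to: {best}")

Because last_latency_ms and last_throughput_tokens_per_second reflect only the most recent request, you may prefer the averaged metrics, or factor in the variance fields, when you want less noisy routing decisions.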