
Real-time Metrics Endpoints

MakeHub provides access to real-time performance metrics for all supported AI models and providers. These metrics help you understand the current performance characteristics of different providers, which can be valuable for optimizing your routing decisions.

Metrics Endpoint

You can access real-time performance metrics through the /v1/metrics endpoint:

GET https://api.makehub.ai/v1/metrics

Query Parameters

model_id (string): Required. The model ID to get metrics for (e.g., openai/gpt-4o)
provider_id (string): Optional. Specific provider to get metrics for
n_last_minutes (integer): Optional. Number of minutes to consider for metrics calculation (default: 3000)

Response Format

The response contains metrics for each provider that supports the requested model. For each provider, you'll receive the following metrics:

avg_latency_3000min_ms: Average initial response latency in milliseconds (over the last 3000 minutes by default)
avg_throughput_3000min_tokens_per_second: Average throughput in tokens per second (over the last 3000 minutes by default)
last_latency_ms: Latency of the most recent request in milliseconds
last_throughput_tokens_per_second: Throughput of the most recent request in tokens per second
latency_variance_3000min_ms: Variance in latency measurements (indicates stability)
throughput_variance_3000min_tokens_per_second: Variance in throughput measurements (indicates stability)
dt_since_last_measurement_ms: Time since the last measurement was taken in milliseconds
rtt_from_makehub_ms: Round-trip time from MakeHub to the provider's API in milliseconds
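
For reference, the shape of one provider's entry can be captured with a small type definition. This is an illustrative sketch using Python's TypedDict, not part of any official MakeHub SDK; note that rtt_from_makehub_ms can be null, as in the sample response below.

from typing import Dict, Optional, TypedDict

class ProviderMetrics(TypedDict):
    """Illustrative shape of one provider's entry in the /v1/metrics response."""
    avg_latency_3000min_ms: float
    avg_throughput_3000min_tokens_per_second: float
    last_latency_ms: float
    last_throughput_tokens_per_second: float
    latency_variance_3000min_ms: float
    throughput_variance_3000min_tokens_per_second: float
    dt_since_last_measurement_ms: float
    rtt_from_makehub_ms: Optional[float]  # can be null, as in the sample response below

# The full response maps provider IDs to their metrics
MetricsResponse = Dict[str, ProviderMetrics]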

Example Usage

Python Example

import requests
import json
 
# Configuration
API_BASE_URL = "https://api.makehub.ai" 
BEARER_TOKEN = "your_makehub_api_key"
MODEL_ID = "openai/gpt-4o"
N_LAST_MINUTES = 10  # Get metrics for the last 10 minutes
 
def get_metrics(model_id, provider_id=None):
    """Simple call to the /v1/metrics endpoint"""
    # Build the URL
    url = f"{API_BASE_URL}/v1/metrics?model_id={model_id}"
    if provider_id:
        url += f"&provider_id={provider_id}"
    if N_LAST_MINUTES:
        url += f"&n_last_minutes={N_LAST_MINUTES}"
    
    # Headers with Bearer Token
    headers = {
        "Authorization": f"Bearer {BEARER_TOKEN}",
        "Content-Type": "application/json"
    }
    
    # Execute GET request
    response = requests.get(url, headers=headers)
    
    if response.status_code == 200:
        return response.json()
    else:
        return {"error": response.text}
 
if __name__ == "__main__":
    result = get_metrics(MODEL_ID)
    print(json.dumps(result, indent=2, ensure_ascii=False))
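
The example above builds the query string by hand. If you prefer, requests can assemble and URL-encode the parameters for you via its params argument. A minimal variant of the same call, reusing the imports and the API_BASE_URL and BEARER_TOKEN constants defined above:

def get_metrics_with_params(model_id, provider_id=None, n_last_minutes=None):
    """Variant of get_metrics() that lets requests build the query string."""
    params = {"model_id": model_id}
    if provider_id:
        params["provider_id"] = provider_id
    if n_last_minutes:
        params["n_last_minutes"] = n_last_minutes

    response = requests.get(
        f"{API_BASE_URL}/v1/metrics",
        headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
        params=params,
    )
    response.raise_for_status()
    return response.json()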

TypeScript Example

import axios from 'axios';
 
async function getMetrics(modelId: string, providerId?: string, nLastMinutes?: number) {
  // Build the URL with query parameters
  let url = `https://api.makehub.ai/v1/metrics?model_id=${modelId}`;
  if (providerId) {
    url += `&provider_id=${providerId}`;
  }
  if (nLastMinutes) {
    url += `&n_last_minutes=${nLastMinutes}`;
  }
 
  // Make the request
  try {
    const response = await axios.get(url, {
      headers: {
        'Authorization': `Bearer your_makehub_api_key`,
        'Content-Type': 'application/json'
      }
    });
    
    return response.data;
  } catch (error) {
    console.error('Error fetching metrics:', error);
    return { error: error instanceof Error ? error.message : String(error) };
  }
}
 
// Example usage
getMetrics('openai/gpt-4o', undefined, 10)
  .then(data => console.log(JSON.stringify(data, null, 2)));

cURL Example

curl "https://api.makehub.ai/v1/metrics?model_id=openai/gpt-4o&n_last_minutes=10" \
  -H "Authorization: Bearer your_makehub_api_key" \
  -H "Content-Type: application/json"

Sample Response

Here's an example of what the response might look like for the model openai/gpt-4o:

{
  "azure-aoai": {
    "avg_latency_3000min_ms": 992.87,
    "avg_throughput_3000min_tokens_per_second": 99.92,
    "dt_since_last_measurement_ms": 20737.39,
    "last_latency_ms": 524.75,
    "last_throughput_tokens_per_second": 74.07,
    "latency_variance_3000min_ms": 57459061.50,
    "rtt_from_makehub_ms": 70.34,
    "throughput_variance_3000min_tokens_per_second": 1418.82
  },
  "azure-eastus2": {
    "avg_latency_3000min_ms": 786.43,
    "avg_throughput_3000min_tokens_per_second": 93.21,
    "dt_since_last_measurement_ms": 19895.76,
    "last_latency_ms": 597.00,
    "last_throughput_tokens_per_second": 98.80,
    "latency_variance_3000min_ms": 13982901.08,
    "rtt_from_makehub_ms": null,
    "throughput_variance_3000min_tokens_per_second": 1510.09
  },
  "azure-francecentral": {
    "avg_latency_3000min_ms": 570.10,
    "avg_throughput_3000min_tokens_per_second": 88.45,
    "dt_since_last_measurement_ms": 19895.00,
    "last_latency_ms": 517.34,
    "last_throughput_tokens_per_second": 81.32,
    "latency_variance_3000min_ms": 114192.80,
    "rtt_from_makehub_ms": 85.38,
    "throughput_variance_3000min_tokens_per_second": 266.00
  },
  "azure-swedencentral": {
    "avg_latency_3000min_ms": 590.05,
    "avg_throughput_3000min_tokens_per_second": 101.40,
    "dt_since_last_measurement_ms": 19894.22,
    "last_latency_ms": 711.30,
    "last_throughput_tokens_per_second": 70.82,
    "latency_variance_3000min_ms": 478525.62,
    "rtt_from_makehub_ms": 97.18,
    "throughput_variance_3000min_tokens_per_second": 829.45
  },
  "openai": {
    "avg_latency_3000min_ms": 24571.52,
    "avg_throughput_3000min_tokens_per_second": 63.32,
    "dt_since_last_measurement_ms": 30.16,
    "last_latency_ms": 47029.82,
    "last_throughput_tokens_per_second": 38.62,
    "latency_variance_3000min_ms": 19706887504.74,
    "rtt_from_makehub_ms": 3.01,
    "throughput_variance_3000min_tokens_per_second": 2211.73
  }
  ...
}
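
When consuming a response like this, it helps to account for null fields (rtt_from_makehub_ms for azure-eastus2 above) and for how stale each measurement is (dt_since_last_measurement_ms). A small illustrative sketch that sorts providers by average latency while skipping incomplete or stale entries; max_staleness_ms is an arbitrary threshold chosen here, not an API concept:

def summarize_metrics(metrics, max_staleness_ms=60_000):
    """Sort providers by average latency, ignoring incomplete or stale entries."""
    rows = []
    for provider, m in metrics.items():
        if not isinstance(m, dict):
            continue  # skip error payloads
        if m.get("avg_latency_3000min_ms") is None:
            continue  # no usable latency data for this provider
        if m.get("dt_since_last_measurement_ms", 0) > max_staleness_ms:
            continue  # measurement too old to trust
        rows.append((
            provider,
            m["avg_latency_3000min_ms"],
            m.get("avg_throughput_3000min_tokens_per_second"),
        ))
    return sorted(rows, key=lambda row: row[1])

# e.g. summarize_metrics(get_metrics("openai/gpt-4o"))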

Using Metrics Data

Use these metrics to adjust your routing strategy dynamically based on current conditions. For example, the helper below picks the provider with the best most-recent latency or throughput (it reuses get_metrics() from the Python example above):

def select_best_provider(model_id, optimization_goal='latency'):
    metrics = get_metrics(model_id)

    best_provider = None
    best_value = float('inf') if optimization_goal == 'latency' else 0

    for provider, provider_metrics in metrics.items():
        # Skip error payloads or entries without a usable measurement
        if not isinstance(provider_metrics, dict):
            continue
        if optimization_goal == 'latency':
            current_value = provider_metrics.get('last_latency_ms')
            if current_value is not None and current_value < best_value:
                best_value = current_value
                best_provider = provider
        elif optimization_goal == 'throughput':
            current_value = provider_metrics.get('last_throughput_tokens_per_second')
            if current_value is not None and current_value > best_value:
                best_value = current_value
                best_provider = provider

    return best_provider
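
A possible usage sketch, with an arbitrary fallback provider when no usable metrics are returned (the model ID and fallback are examples, not MakeHub defaults):

best = select_best_provider("openai/gpt-4o", optimization_goal="throughput")
if best is None:
    best = "openai"  # arbitrary fallback when no provider has recent metrics
print(f"Routing the next request to: {best}")

Because last_latency_ms and last_throughput_tokens_per_second reflect only the most recent request, you may prefer the averaged metrics, or factor in the variance fields, when you want less noisy routing decisions.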