How Routing Works
MakeHub's core functionality is its intelligent routing system that directs your requests to the most appropriate AI model provider based on various criteria. This page explains the routing logic in detail to help you understand how your requests are processed.
Routing Decision Process
When you send a request to MakeHub, our routing engine follows this decision-making process (sketched in code after the list):
- Model Validation: Verify that the requested model is supported
- Provider Filtering: Apply any provider restrictions specified in the request
- Performance Constraint Filtering: Apply any performance constraints (latency, throughput)
- Cost Optimization: If multiple providers remain eligible, select the most cost-effective option
- Fallback Mechanism: If the selected provider fails, try alternative providers
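To make these steps concrete, here is a minimal sketch of the pipeline in Python. The `Provider` record, its fields, and the `route` function are hypothetical illustrations of the logic, not MakeHub's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Provider:
    # Hypothetical record of what the router knows about each provider.
    name: str
    models: set[str]        # models this provider serves
    latency_ms: float       # current time to first token
    throughput_tps: float   # current tokens per second
    cost_per_token: float   # blended price per token (USD)

def route(providers: list[Provider], model: str,
          allowed: set[str] | None = None,
          max_latency: float | None = None,
          min_throughput: float | None = None) -> list[Provider]:
    """Return eligible providers, cheapest first (later entries are fallbacks)."""
    # Model validation: keep only providers that serve the requested model.
    eligible = [p for p in providers if model in p.models]
    # Provider filtering: honor any explicit provider restrictions.
    if allowed is not None:
        eligible = [p for p in eligible if p.name in allowed]
    # Performance constraint filtering: drop providers outside the bounds.
    if max_latency is not None:
        eligible = [p for p in eligible if p.latency_ms <= max_latency]
    if min_throughput is not None:
        eligible = [p for p in eligible if p.throughput_tps >= min_throughput]
    # Cost optimization: cheapest first; the rest form the fallback order.
    return sorted(eligible, key=lambda p: p.cost_per_token)
```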
Fallback Mechanism
If the initially selected provider fails to process the request (due to rate limits, downtime, or other errors), MakeHub will automatically attempt to route the request to alternative providers.
Important: MakeHub applies the fallback mechanism to all error codes except errors returned by the OpenAI API. Those errors are passed back to you unchanged, without redirecting the request to another provider.
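As a sketch, the fallback loop under that rule could look like the following; `call_provider` and `OpenAIAPIError` are hypothetical stand-ins for illustration only.

```python
class OpenAIAPIError(Exception):
    """Hypothetical stand-in for an error returned by the OpenAI API."""

def call_provider(provider, request):
    """Hypothetical stand-in for dispatching a request to one provider."""
    raise NotImplementedError

def complete_with_fallback(ranked_providers, request):
    """Try providers in routing order, falling back on failure."""
    last_error = None
    for provider in ranked_providers:
        try:
            return call_provider(provider, request)
        except OpenAIAPIError:
            raise                      # OpenAI API errors pass through unchanged
        except Exception as err:       # rate limits, downtime, other errors
            last_error = err           # move on to the next provider
    raise last_error or RuntimeError("no eligible providers")
```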
Real-time Performance Metrics
MakeHub continuously monitors the performance of all providers, collecting metrics such as:
- Latency: Average time to first token
- Throughput: Tokens generated per second
- Error Rates: Percentage of requests that result in errors
- Availability: Provider uptime and responsiveness
These metrics are updated in real time and inform every routing decision. You can access them through our Real-time Metrics Endpoints.
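As an illustration, polling such an endpoint could look like the snippet below. The base URL, path, and response fields are hypothetical; consult the Real-time Metrics Endpoints reference for the actual routes and schema.

```python
import requests

# Hypothetical URL and response shape, shown for illustration only.
resp = requests.get(
    "https://api.makehub.ai/v1/metrics",
    params={"model": "openai/gpt-4"},
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    timeout=10,
)
resp.raise_for_status()
for m in resp.json():
    print(m["provider"], m["latency_ms"], m["throughput_tps"], m["error_rate"])
```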
Cost Optimization
When multiple providers meet all other criteria, MakeHub will route to the most cost-effective option. Cost calculations are based on:
- Input Tokens: Cost per token for the prompt
- Output Tokens: Cost per token for the generated response
- Total Cost: Combined input and output costs
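With hypothetical per-million-token prices, the comparison reduces to simple arithmetic:

```python
# Hypothetical prices in USD per million tokens.
providers = {
    "provider_a": {"input": 10.00, "output": 30.00},
    "provider_b": {"input": 8.00, "output": 24.00},
}

prompt_tokens, completion_tokens = 1_500, 500

def total_cost(prices: dict[str, float]) -> float:
    return (prompt_tokens * prices["input"]
            + completion_tokens * prices["output"]) / 1_000_000

for name, prices in providers.items():
    print(f"{name}: ${total_cost(prices):.4f}")
# provider_a: $0.0300, provider_b: $0.0240 -> provider_b wins on cost
print("cheapest:", min(providers, key=lambda n: total_cost(providers[n])))
```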
Example Routing Scenarios
Scenario 1: Basic Routing
Request:
```json
{
  "model": "openai/gpt-4"
}
```
Routing Logic:
- Find all providers that offer GPT-4
- Select the cheapest provider
- If that provider fails, try the next cheapest provider
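Assuming MakeHub exposes an OpenAI-compatible chat completions endpoint, this scenario is an ordinary request; the base URL below is illustrative, so substitute the one from your MakeHub account.

```python
from openai import OpenAI

# Illustrative base URL; use the endpoint from your MakeHub dashboard.
client = OpenAI(base_url="https://api.makehub.ai/v1", api_key="YOUR_MAKEHUB_KEY")

response = client.chat.completions.create(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```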
Scenario 2: Performance-Optimized Routing
Request:
```json
{
  "model": "anthropic/claude-3-opus",
  "extra_query": {
    "max_latency": "best"
  }
}
```
Routing Logic:
- Find all providers that offer Claude 3 Opus
- Select the provider with the lowest current latency
- If that provider fails, there is no fallback (since we specifically requested the best latency)
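If you use the OpenAI Python SDK, the routing parameters can be attached per request through its `extra_query` option (base URL again illustrative):

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.makehub.ai/v1", api_key="YOUR_MAKEHUB_KEY")

response = client.chat.completions.create(
    model="anthropic/claude-3-opus",
    messages=[{"role": "user", "content": "Summarize our launch plan."}],
    extra_query={"max_latency": "best"},  # route to the lowest-latency provider
)
```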
Scenario 3: Complex Routing
Request:
```json
{
  "model": "mistral/mistral-large",
  "extra_query": {
    "providers": ["mistral", "openai"],
    "max_latency": 300,
    "min_throughput": 15
  }
}
```
Routing Logic:
- Find providers that offer Mistral Large
- Filter to only include Mistral and OpenAI
- Further filter to providers with latency ≤ 300ms and throughput ≥ 15 tokens/sec
- From the remaining providers, select the cheapest
- If the first provider fails, try the second provider (if any)
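The same pattern extends to the full constraint set; again a sketch, with an illustrative base URL:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.makehub.ai/v1", api_key="YOUR_MAKEHUB_KEY")

response = client.chat.completions.create(
    model="mistral/mistral-large",
    messages=[{"role": "user", "content": "Draft a product announcement."}],
    extra_query={
        "providers": ["mistral", "openai"],  # restrict the candidate pool
        "max_latency": 300,                  # milliseconds
        "min_throughput": 15,                # tokens per second
    },
)
```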
Conclusion
MakeHub's routing system is designed to optimize for cost, performance, and reliability while giving you the flexibility to specify your own constraints. By understanding how the routing works, you can make more informed decisions about how to configure your requests for optimal results.