A comprehensive benchmarking tool that tests how well different language models adhere to structured output formats across multiple providers (OpenAI, Anthropic, Google, Groq, and OpenRouter).

1 One-shot ...