AWS announces support for NVIDIA Inference Xfer Library (NIXL) with Elastic Fabric Adapter (EFA) to accelerate disaggregated large language model (LLM) inference on Amazon EC2. This integration enhances disaggregated inference serving through three key improvements: increased KV-cache throughput, reduced inter-token latency, and optimized KV-cache memory utilization.
NIXL with EFA enables high-throughput, low-latency KV-cache transfer between prefill and decode nodes, as well as efficient KV-cache movement across storage tiers. NIXL is interoperable with all EFA-enabled EC2 instances and integrates natively with frameworks including NVIDIA Dynamo, SGLang, and vLLM. Together, NIXL and EFA offer flexible integration with your EC2 instance and framework of choice, delivering performant disaggregated inference at scale.
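As an illustration of the framework integration, a disaggregated vLLM deployment can select the NIXL transfer backend through vLLM's `--kv-transfer-config` flag. The sketch below follows vLLM's published disaggregated-prefill examples; the model name and ports are illustrative placeholders, and the routing proxy that splits requests between prefill and decode workers is assumed, not shown.

```shell
# Hypothetical sketch: launch two vLLM workers that exchange KV-cache via
# NIXL (which uses EFA on EFA-enabled EC2 instances). A separate router
# (not shown) sends prefill requests to one worker and decode to the other.

# Prefill-side worker (placeholder model and port)
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --port 8100 \
  --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both"}'

# Decode-side worker (same connector config on a second node)
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --port 8200 \
  --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both"}'
```

Because NIXL abstracts the transport, the same connector configuration applies whether the KV-cache moves over EFA between nodes or between memory and storage tiers on a single host.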
AWS supports NIXL version 1.0.0 or higher with EFA installer version 1.47.0 or higher on all EFA-enabled EC2 instance types in all AWS Regions at no additional cost. For more information, visit the EFA documentation.
Categories: marketing:marchitecture/networking-and-content-delivery
Source: Amazon Web Services



