- Optimus: Warming Serverless ML Inference via Inter-Function Model Transformation
Proposes inter-function model transformation for serverless ML inference, which delves into models within containers at the finer granularity of operations, designs a set of in-container meta-operators for both CNN and transformer model transformation, and develops an efficient scheduling algorithm with linear complexity for a low-cost transformation strategy.
- chenhongyu2048/LLM-inference-optimization-paper - GitHub
For example, LLMSys-PaperList contains many excellent articles and keeps being updated (which I believe is the most important quality for a paper list). Awesome-LLM-Inference and Awesome_LLM_Accelerate-PaperList are also worth reading. Besides, awesome-AI-system works very well, and you can find other repositories in its contents. The blog "Large Transformer Model Inference Optimization" helps me a lot.
- A Survey on Inference Engines for Large Language Models …
Large language models (LLMs) are widely applied in chatbots, code generators, and search engines. Workloads such as chain-of-thought, complex reasoning, and agent services significantly increase the inference cost by invoking the model repeatedly. Optimization methods such as parallelism, compression, and caching have been adopted to reduce costs, but the diverse service requirements make it …
- Advancing Serverless Computing for Scalable AI Model …
From the 31 selected works, we classify them into ML-, DL-, and LLM-based inference. Subsequently, we further divide these works into 10 subcategories for detailed analysis … Statistics of 10 trending topics in ML-, DL-, and LLM-based inference … Deploying AI model inference systems with the serverless paradigm on the cloud.
- Vision-Language Models CheatSheet - Inferless
An all-in-one cheatsheet for vision-language models, including open-source models, inference toolkits, datasets, use cases, deployment strategies, optimization techniques, and ethical considerations for developers and organizations.
- ServerlessLLM: Low-Latency Serverless Inference for Large Language Models
This section offers a comprehensive evaluation of ServerlessLLM, covering three key aspects: (i) assessing the performance of our loading-optimized checkpoints and model manager, (ii) examining the efficiency and overheads associated with live migration for LLM inference, and (iii) evaluating ServerlessLLM against a large-scale serverless … (see the keep-warm sketch after this list).
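
The serverless entries above all target the same bottleneck: loading model weights into a fresh container dominates cold-start latency, so systems like Optimus and ServerlessLLM reuse or transform already-warm state instead of reloading from scratch. The sketch below is only a minimal illustration of that keep-warm idea in a generic Python function handler; it is not code from either system, and `load_model`, `handler`, and the simulated two-second load are hypothetical placeholders.

```python
import time

# Hypothetical stand-in for an expensive weight load (e.g., reading a
# multi-GB checkpoint from remote storage into device memory).
def load_model(model_id: str):
    time.sleep(2.0)  # simulate slow checkpoint loading
    return {"model_id": model_id, "weights": "..."}

# Module-level cache: in most FaaS runtimes, globals survive across
# invocations that land on the same warm container, so later requests
# served by this container skip the load entirely.
_MODEL_CACHE: dict[str, object] = {}

def handler(event: dict) -> dict:
    model_id = event["model_id"]
    model = _MODEL_CACHE.get(model_id)
    cold_start = model is None
    if cold_start:
        model = load_model(model_id)    # pay the cold-start cost once
        _MODEL_CACHE[model_id] = model  # reuse it on warm invocations
    # Run (mock) inference with whichever copy of the model we have.
    return {"cold_start": cold_start, "output": f"prediction from {model_id}"}

if __name__ == "__main__":
    for _ in range(2):
        t0 = time.time()
        result = handler({"model_id": "resnet50"})
        print(result, f"latency={time.time() - t0:.2f}s")
```

The real systems go well beyond this per-container cache: Optimus transforms an already-loaded model into the requested one through in-container operators, and ServerlessLLM optimizes the checkpoint-loading and placement path itself, but the shared goal is to avoid paying the full load on every invocation.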