-
FlashDecoding++: Faster Large Language Model Inference on GPUs
Paper • 2311.01282 • Published • 38 -
Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code
Paper • 2311.07989 • Published • 26 -
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
Paper • 2402.17193 • Published • 26 -
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 140
Collections
Discover the best community collections!
Collections including paper arxiv:2311.01282
-
S^{3}: Increasing GPU Utilization during Generative Inference for Higher Throughput
Paper • 2306.06000 • Published • 1 -
Fast Distributed Inference Serving for Large Language Models
Paper • 2305.05920 • Published • 1 -
Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline
Paper • 2305.13144 • Published • 1 -
Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference
Paper • 2303.06182 • Published • 2
-
When can transformers reason with abstract symbols?
Paper • 2310.09753 • Published • 3 -
In-Context Pretraining: Language Modeling Beyond Document Boundaries
Paper • 2310.10638 • Published • 30 -
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
Paper • 2310.09520 • Published • 11 -
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
Paper • 2309.08532 • Published • 53
-
Detecting Pretraining Data from Large Language Models
Paper • 2310.16789 • Published • 11 -
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
Paper • 2310.13671 • Published • 19 -
AutoMix: Automatically Mixing Language Models
Paper • 2310.12963 • Published • 14 -
An Emulator for Fine-Tuning Large Language Models using Small Language Models
Paper • 2310.12962 • Published • 13
-
Large Language Models for Compiler Optimization
Paper • 2309.07062 • Published • 25 -
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
Paper • 2310.17157 • Published • 14 -
FP8-LM: Training FP8 Large Language Models
Paper • 2310.18313 • Published • 33 -
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
Paper • 2310.19102 • Published • 11
-
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
Paper • 2309.04662 • Published • 26 -
Neurons in Large Language Models: Dead, N-gram, Positional
Paper • 2309.04827 • Published • 18 -
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
Paper • 2309.05516 • Published • 12 -
DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
Paper • 2309.03907 • Published • 12
-
FlashDecoding++: Faster Large Language Model Inference on GPUs
Paper • 2311.01282 • Published • 38 -
Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code
Paper • 2311.07989 • Published • 26 -
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
Paper • 2402.17193 • Published • 26 -
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 140
-
Detecting Pretraining Data from Large Language Models
Paper • 2310.16789 • Published • 11 -
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
Paper • 2310.13671 • Published • 19 -
AutoMix: Automatically Mixing Language Models
Paper • 2310.12963 • Published • 14 -
An Emulator for Fine-Tuning Large Language Models using Small Language Models
Paper • 2310.12962 • Published • 13
-
S^{3}: Increasing GPU Utilization during Generative Inference for Higher Throughput
Paper • 2306.06000 • Published • 1 -
Fast Distributed Inference Serving for Large Language Models
Paper • 2305.05920 • Published • 1 -
Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline
Paper • 2305.13144 • Published • 1 -
Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference
Paper • 2303.06182 • Published • 2
-
Large Language Models for Compiler Optimization
Paper • 2309.07062 • Published • 25 -
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
Paper • 2310.17157 • Published • 14 -
FP8-LM: Training FP8 Large Language Models
Paper • 2310.18313 • Published • 33 -
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
Paper • 2310.19102 • Published • 11
-
When can transformers reason with abstract symbols?
Paper • 2310.09753 • Published • 3 -
In-Context Pretraining: Language Modeling Beyond Document Boundaries
Paper • 2310.10638 • Published • 30 -
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
Paper • 2310.09520 • Published • 11 -
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
Paper • 2309.08532 • Published • 53
-
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
Paper • 2309.04662 • Published • 26 -
Neurons in Large Language Models: Dead, N-gram, Positional
Paper • 2309.04827 • Published • 18 -
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
Paper • 2309.05516 • Published • 12 -
DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
Paper • 2309.03907 • Published • 12