- Multilingual Instruction Tuning With Just a Pinch of Multilinguality
  Paper • 2401.01854 • Published • 11
- LLaMA Beyond English: An Empirical Study on Language Capability Transfer
  Paper • 2401.01055 • Published • 54
- LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
  Paper • 2401.01325 • Published • 27
- Improving Text Embeddings with Large Language Models
  Paper • 2401.00368 • Published • 84
Collections
Collections including paper arxiv:2403.19887
- AppAgent: Multimodal Agents as Smartphone Users
  Paper • 2312.13771 • Published • 54
- GPT4Tools
  🚀37
- Jamba: A Hybrid Transformer-Mamba Language Model
  Paper • 2403.19887 • Published • 113
- Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
  Paper • 2404.08801 • Published • 66

- havenhq/mamba-chat
  Updated • 75 • 101
- MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
  Paper • 2401.04081 • Published • 74
- VMamba: Visual State Space Model
  Paper • 2401.10166 • Published • 40
- Jamba: A Hybrid Transformer-Mamba Language Model
  Paper • 2403.19887 • Published • 113

- Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images
  Paper • 2308.16582 • Published • 12
- DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation
  Paper • 2310.13119 • Published • 13
- DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
  Paper • 2310.16818 • Published • 33
- Text-to-3D with classifier score distillation
  Paper • 2310.19415 • Published • 6

- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
  Paper • 2312.15166 • Published • 62
- PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
  Paper • 2312.12456 • Published • 45
- Cached Transformers: Improving Transformers with Differentiable Memory Cache
  Paper • 2312.12742 • Published • 13
- Mini-GPTs: Efficient Large Language Models through Contextual Pruning
  Paper • 2312.12682 • Published • 9

- Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
  Paper • 2312.09390 • Published • 33
- OneLLM: One Framework to Align All Modalities with Language
  Paper • 2312.03700 • Published • 24
- Generative Multimodal Models are In-Context Learners
  Paper • 2312.13286 • Published • 36
- The LLM Surgeon
  Paper • 2312.17244 • Published • 9

- Text-to-3D using Gaussian Splatting
  Paper • 2309.16585 • Published • 33
- FP8-LM: Training FP8 Large Language Models
  Paper • 2310.18313 • Published • 33
- Zephyr: Direct Distillation of LM Alignment
  Paper • 2310.16944 • Published • 124
- Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
  Paper • 2312.06585 • Published • 29