Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
- Website
- Community
- Solutions
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2403.19887

Multilingual Instruction Tuning With Just a Pinch of Multilinguality

Paper • 2401.01854 • Published Jan 3, 2024 • 11
LLaMA Beyond English: An Empirical Study on Language Capability Transfer

Paper • 2401.01055 • Published Jan 2, 2024 • 54
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

Paper • 2401.01325 • Published Jan 2, 2024 • 27
Improving Text Embeddings with Large Language Models

Paper • 2401.00368 • Published Dec 31, 2023 • 84

AppAgent: Multimodal Agents as Smartphone Users

Paper • 2312.13771 • Published Dec 21, 2023 • 54
Runtime error

Agents

37

GPT4Tools

🚀

37
Jamba: A Hybrid Transformer-Mamba Language Model

Paper • 2403.19887 • Published Mar 28, 2024 • 113
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Paper • 2404.08801 • Published Apr 12, 2024 • 66

havenhq/mamba-chat

Updated Dec 8, 2023 • 75 • 101
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

Paper • 2401.04081 • Published Jan 8, 2024 • 74
VMamba: Visual State Space Model

Paper • 2401.10166 • Published Jan 18, 2024 • 40
Jamba: A Hybrid Transformer-Mamba Language Model

Paper • 2403.19887 • Published Mar 28, 2024 • 113

any size diffusion

Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images

Paper • 2308.16582 • Published Aug 31, 2023 • 12
DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation

Paper • 2310.13119 • Published Oct 19, 2023 • 13
DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior

Paper • 2310.16818 • Published Oct 25, 2023 • 33
Text-to-3D with classifier score distillation

Paper • 2310.19415 • Published Oct 30, 2023 • 6

SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

Paper • 2312.15166 • Published Dec 23, 2023 • 62
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU

Paper • 2312.12456 • Published Dec 16, 2023 • 45
Cached Transformers: Improving Transformers with Differentiable Memory Cache

Paper • 2312.12742 • Published Dec 20, 2023 • 13
Mini-GPTs: Efficient Large Language Models through Contextual Pruning

Paper • 2312.12682 • Published Dec 20, 2023 • 9

Interesting Papers

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

Paper • 2312.09390 • Published Dec 14, 2023 • 33
OneLLM: One Framework to Align All Modalities with Language

Paper • 2312.03700 • Published Dec 6, 2023 • 24
Generative Multimodal Models are In-Context Learners

Paper • 2312.13286 • Published Dec 20, 2023 • 36
The LLM Surgeon

Paper • 2312.17244 • Published Dec 28, 2023 • 9

Text-to-3D using Gaussian Splatting

Paper • 2309.16585 • Published Sep 28, 2023 • 33
FP8-LM: Training FP8 Large Language Models

Paper • 2310.18313 • Published Oct 27, 2023 • 33
Zephyr: Direct Distillation of LM Alignment

Paper • 2310.16944 • Published Oct 25, 2023 • 124
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Paper • 2312.06585 • Published Dec 11, 2023 • 29

Multilingual Instruction Tuning With Just a Pinch of Multilinguality

Paper • 2401.01854 • Published Jan 3, 2024 • 11
LLaMA Beyond English: An Empirical Study on Language Capability Transfer

Paper • 2401.01055 • Published Jan 2, 2024 • 54
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

Paper • 2401.01325 • Published Jan 2, 2024 • 27
Improving Text Embeddings with Large Language Models

Paper • 2401.00368 • Published Dec 31, 2023 • 84

SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

Paper • 2312.15166 • Published Dec 23, 2023 • 62
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU

Paper • 2312.12456 • Published Dec 16, 2023 • 45
Cached Transformers: Improving Transformers with Differentiable Memory Cache

Paper • 2312.12742 • Published Dec 20, 2023 • 13
Mini-GPTs: Efficient Large Language Models through Contextual Pruning

Paper • 2312.12682 • Published Dec 20, 2023 • 9

AppAgent: Multimodal Agents as Smartphone Users

Paper • 2312.13771 • Published Dec 21, 2023 • 54
Runtime error

Agents

37

GPT4Tools

🚀

37
Jamba: A Hybrid Transformer-Mamba Language Model

Paper • 2403.19887 • Published Mar 28, 2024 • 113
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Paper • 2404.08801 • Published Apr 12, 2024 • 66

Interesting Papers

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

Paper • 2312.09390 • Published Dec 14, 2023 • 33
OneLLM: One Framework to Align All Modalities with Language

Paper • 2312.03700 • Published Dec 6, 2023 • 24
Generative Multimodal Models are In-Context Learners

Paper • 2312.13286 • Published Dec 20, 2023 • 36
The LLM Surgeon

Paper • 2312.17244 • Published Dec 28, 2023 • 9

havenhq/mamba-chat

Updated Dec 8, 2023 • 75 • 101
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

Paper • 2401.04081 • Published Jan 8, 2024 • 74
VMamba: Visual State Space Model

Paper • 2401.10166 • Published Jan 18, 2024 • 40
Jamba: A Hybrid Transformer-Mamba Language Model

Paper • 2403.19887 • Published Mar 28, 2024 • 113

Text-to-3D using Gaussian Splatting

Paper • 2309.16585 • Published Sep 28, 2023 • 33
FP8-LM: Training FP8 Large Language Models

Paper • 2310.18313 • Published Oct 27, 2023 • 33
Zephyr: Direct Distillation of LM Alignment

Paper • 2310.16944 • Published Oct 25, 2023 • 124
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Paper • 2312.06585 • Published Dec 11, 2023 • 29

any size diffusion

Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images

Paper • 2308.16582 • Published Aug 31, 2023 • 12
DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation

Paper • 2310.13119 • Published Oct 19, 2023 • 13
DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior

Paper • 2310.16818 • Published Oct 25, 2023 • 33
Text-to-3D with classifier score distillation

Paper • 2310.19415 • Published Oct 30, 2023 • 6

Previous
1
...
3
4
5
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs