Text Generation
Transformers
PyTorch
English
qovaryx
pretraining
from-scratch
random-init
scratch-base
decoder
causal-lm
causal-language-model
transformer
compact-llm
compact-ai
consumer-gpu
local-ai
on-device
edge-ai
16gb-vram
rtx-4090
rtx-5090
rtx-5070-ti
small-language-model
tiny-llm
nanogpt
pretraining-template
mtp
multi-token-prediction
gqa
grouped-query-attention
swiglu
ternary-swiglu
bitnet
Mixture of Experts
mixture-of-experts
sparse-moe
low-rank
flash-attention-2
blackwell
sm120
own-base
sovereign-ai
custom_code
Instructions to use tjarvis91/qovaryx-350m-scratch-base with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use tjarvis91/qovaryx-350m-scratch-base with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="tjarvis91/qovaryx-350m-scratch-base", trust_remote_code=True)# Load model directly from transformers import AutoModelForCausalLM model = AutoModelForCausalLM.from_pretrained("tjarvis91/qovaryx-350m-scratch-base", trust_remote_code=True, dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use tjarvis91/qovaryx-350m-scratch-base with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "tjarvis91/qovaryx-350m-scratch-base" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "tjarvis91/qovaryx-350m-scratch-base", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/tjarvis91/qovaryx-350m-scratch-base
- SGLang
How to use tjarvis91/qovaryx-350m-scratch-base with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "tjarvis91/qovaryx-350m-scratch-base" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "tjarvis91/qovaryx-350m-scratch-base", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "tjarvis91/qovaryx-350m-scratch-base" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "tjarvis91/qovaryx-350m-scratch-base", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use tjarvis91/qovaryx-350m-scratch-base with Docker Model Runner:
docker model run hf.co/tjarvis91/qovaryx-350m-scratch-base
| """HF PreTrainedModel wrapper around Qovaryx FinanceDecoder. | |
| Loads the random-init Qovaryx compact decoder via the standard HF | |
| AutoModelForCausalLM.from_pretrained API. trust_remote_code=True required. | |
| The underlying class is named FinanceDecoder for legacy reasons; the | |
| architecture is task-agnostic. | |
| """ | |
| from __future__ import annotations | |
| import torch | |
| from torch import nn | |
| from transformers import PreTrainedModel | |
| from transformers.modeling_outputs import CausalLMOutput | |
| from .configuration_qovaryx import QovaryxConfig | |
| from ._qovaryx_decoder import DecoderConfig, FinanceDecoder | |
| def _to_decoder_config(cfg: QovaryxConfig) -> DecoderConfig: | |
| return DecoderConfig( | |
| vocab_size=cfg.vocab_size, | |
| d_model=cfg.d_model, | |
| n_layer=cfg.n_layer, | |
| n_head=cfg.n_head, | |
| n_kv_head=cfg.n_kv_head, | |
| d_ff=cfg.d_ff, | |
| max_seq_len=cfg.max_seq_len, | |
| rope_base=cfg.rope_base, | |
| rms_eps=cfg.rms_eps, | |
| dropout=cfg.dropout, | |
| decision_head_classes=cfg.decision_head_classes, | |
| decision_head_dropout=cfg.decision_head_dropout, | |
| mtp_k=cfg.mtp_k, | |
| mtp_head_kind=cfg.mtp_head_kind, | |
| init_std=cfg.init_std, | |
| tie_word_embeddings=cfg.tie_word_embeddings, | |
| ffn_kind=cfg.ffn_kind, | |
| ffn_rank=cfg.ffn_rank, | |
| ffn_experts=cfg.ffn_experts, | |
| ffn_top_k=cfg.ffn_top_k, | |
| chart_patch_encoder_enabled=cfg.chart_patch_encoder_enabled, | |
| chart_image_size=cfg.chart_image_size, | |
| chart_patch_size=cfg.chart_patch_size, | |
| chart_channels=cfg.chart_channels, | |
| chart_embed_dropout=cfg.chart_embed_dropout, | |
| ) | |
| class QovaryxForCausalLM(PreTrainedModel): | |
| config_class = QovaryxConfig | |
| base_model_prefix = "qovaryx" | |
| supports_gradient_checkpointing = False | |
| def __init__(self, config: QovaryxConfig): | |
| super().__init__(config) | |
| self.decoder = FinanceDecoder(_to_decoder_config(config)) | |
| self.post_init() | |
| def _init_weights(self, module): | |
| # Already initialized by FinanceDecoder; leave alone. | |
| pass | |
| def get_input_embeddings(self): | |
| return self.decoder.embed | |
| def set_input_embeddings(self, value): | |
| self.decoder.embed = value | |
| def get_output_embeddings(self): | |
| return self.decoder.lm_head | |
| def forward( | |
| self, | |
| input_ids: torch.LongTensor, | |
| labels: torch.LongTensor = None, | |
| attention_mask=None, | |
| **kwargs, | |
| ): | |
| out = self.decoder(input_ids) | |
| logits = out.logits if hasattr(out, "logits") else out | |
| loss = None | |
| if labels is not None: | |
| shift_logits = logits[..., :-1, :].contiguous() | |
| shift_labels = labels[..., 1:].contiguous() | |
| loss = nn.functional.cross_entropy( | |
| shift_logits.view(-1, shift_logits.size(-1)), | |
| shift_labels.view(-1), | |
| ignore_index=-100, | |
| ) | |
| return CausalLMOutput(loss=loss, logits=logits) | |
| def prepare_inputs_for_generation(self, input_ids, **kwargs): | |
| return {"input_ids": input_ids} | |
| def can_generate(cls): | |
| return True | |