
Chemlactica-125m is a galactica-125m model continually pretrained on organic molecules. The additional pretraining uses 40B tokens covering 110M+ molecules from PubChem, together with their chemical properties (molecular weight, synthetic accessibility score, drug-likeness, etc.) and similarities (Tanimoto distance between ECFP fingerprints).

Example prompts:

</s>[START_SMILES]CC(=O)OC1=CC=CC=C1C(=O)O[END_SMILES][SAS] will attempt to predict the synthetic accessibility score of the given molecule.

</s>[SAS]2.25[/SAS][SIMILAR]0.62 CC(=O)OC1=CC=CC=C1C(=O)O[/SIMILAR][START_SMILES] will attempt to generate a molecule with an SAS score of 2.25 and a similarity score of 0.62 to the given molecule.
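The two example prompts can be assembled programmatically. The following is a minimal sketch, not code from the Chemlactica repository: the helper names `property_prompt` and `generation_prompt` are hypothetical, and the tag layout is copied directly from the two examples above.

```python
# Hypothetical helpers that reproduce the two example prompts from this card.

def property_prompt(smiles: str, tag: str) -> str:
    """Prompt asking the model to predict the property named by `tag`."""
    return f"</s>[START_SMILES]{smiles}[END_SMILES][{tag}]"


def generation_prompt(sas: float, similarity: float, reference_smiles: str) -> str:
    """Prompt asking for a molecule with a target SAS score and a target
    similarity to `reference_smiles`. Numbers are written with two decimals."""
    return (
        f"</s>[SAS]{sas:.2f}[/SAS]"
        f"[SIMILAR]{similarity:.2f} {reference_smiles}[/SIMILAR]"
        "[START_SMILES]"
    )


ASPIRIN = "CC(=O)OC1=CC=CC=C1C(=O)O"
print(property_prompt(ASPIRIN, "SAS"))
print(generation_prompt(2.25, 0.62, ASPIRIN))
```

The resulting strings can be passed as the prompt to whichever text-generation interface is used to run the model.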

The model can be wrapped into an optimization loop to traverse the chemical space with evolving prompts. See the code on GitHub.

A preprint describing the model, together with an optimization algorithm built on top of it that sets the state of the art on Practical Molecular Optimization and other benchmarks, is available on arXiv.

A few notes:

  • All queries should start with the </s> symbol.
  • All numbers are rounded to two decimal places.
  • All SMILES are canonicalized using RDKit.
  • Available tags: [CLOGP], [WEIGHT], [QED], [SAS], [TPSA], [RINGCOUNT], [SIMILAR]...
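The first three notes can be applied in code before a prompt is sent to the model. This is a rough sketch under stated assumptions: the function names are hypothetical, keeping trailing zeros (for example `0.60`) is a guess about the training format, and the card does not say which RDKit options were used for canonicalization, so the default `Chem.MolToSmiles` call here may not match the training data exactly.

```python
# Hypothetical input-normalization helpers based on the notes above.

def format_value(value: float) -> str:
    """Write a number with two decimals (assumes trailing zeros are kept)."""
    return f"{value:.2f}"


def with_start_token(prompt: str) -> str:
    """Ensure the query starts with the </s> symbol."""
    return prompt if prompt.startswith("</s>") else "</s>" + prompt


def canonical_smiles(smiles: str) -> str:
    """Canonicalize a SMILES string with RDKit's default settings.

    RDKit is imported lazily so the other helpers work without it.
    """
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"RDKit could not parse SMILES: {smiles!r}")
    return Chem.MolToSmiles(mol)


print(with_start_token(f"[QED]{format_value(0.6)}[/QED][START_SMILES]"))
```

It is worth comparing the canonical form produced this way with the SMILES style used in the example prompts before relying on it.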

The model is part of a three-model family: Chemlactica-125M, Chemlactica-1.3B and Chemma-2B.

We look forward to seeing the community use the model in new applications and contexts.
