
How to use yodyamahesa/paper-recommendation-bert-kd with sentence-transformers:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("yodyamahesa/paper-recommendation-bert-kd")

sentences = [
"the development of the food and beverage culinary industry is growing very rapidly. making food and beverage business owners, especially restaurants, have to make the right decision to stay in a very strong competition, restaurant owners must be ready to always innovate and remain to be able to meet consumer needs through products that can attract customers and determine strategies promotions that can boost sales. stored transaction data has information that can be extracted by data mining techniques, for example knowing the pattern of sales in purchases by consumers. information about sales patterns can be used by o! fish restaurants to create more potential promotional strategies to boost sales by referring to items (menus) that are often purchased together. . to be able to find out the purchase patterns by consumers simultaneously, knowing what products are often purchased simultaneously can be used data mining techniques using a priori algorithms. a priori algorithm is used to generate association rules. information about the association’s rules in purchasing items (menus) by consumers can be used by o! fish restaurants to create more potential promotional strategies to boost sales by referring to a combination of items that are often purchased simultaneously. later the results of this study are in the form of a website-based application to analyze purchasing patterns (item association rules) by consumers where the purchase pattern can be used as recommendations in determining the promotion development strategy for o! fish restaurants.",
"analysis of histopathology slides is a critical step for many diagnoses, and in particular in oncology where it defines the gold standard. in the case of digital histopathological analysis, highly trained pathologists must review vast whole-slide-images of extreme digital resolution (100,000^2 pixels) across multiple zoom levels in order to locate abnormal regions of cells, or in some cases single cells, out of millions. the application of deep learning to this problem is hampered not only by small sample sizes, as typical datasets contain only a few hundred samples, but also by the generation of ground-truth localized annotations for training interpretable classification and segmentation models. we propose a method for disease available during training. even without pixel-level annotations, we are able to demonstrate performance comparable with models trained with strong annotations on the camelyon-16 lymph node metastases detection challenge. we accomplish this through the use of pre-trained deep convolutional networks, feature embedding, as well as learning via top instances and negative evidence, a multiple instance learning technique fromatp the field of semantic segmentation and object detection.",
"this paper develops recommendations for selecting and connecting three single-phase transformers as a neutral-deriving transformer bank to ground a 480v low-voltage power system supplied from a delta-connected source transformer. assumptions made are that the system has no 277v loads and the bank is used solely as a source of ground fault current.",
"new normal atau normal baru merupakan suatu cara hidup baru atau cara baru dalam menjalankan aktivitas hidup di tengah situasi pandemi coronavirus disease 2019 (covid-19) yang sedang melanda dunia saat ini. hal ini diperlukan untuk menjawab masalah kehidupan selama pandemi, dan sebuah adaptasi manusia untuk mempertahankan hidupnya. menjawab tantangan di era new normal, di mana masyarakat dihantui perasaan kecemasan dan mulai timbul sikap apriori maka diperlukan hubungan interpersonal yang dapat menciptakan nilai-nilai bertoleransi, simpati, dan empati. dalam hal inilah diperlukan pemimpin yang mampu menggerakkan dan mengarahkan semua anggotanya dalam satu ritme yang selaras. pemimpin yang berani hadir terdepan memberikan teladan di tengah-tengah masyarakat. berani menindak dengan tegas setiap pelanggaran protokol kesehatan, dan tentunya senantiasa memperhatikan kesejahteraan masyarakat. berkenaan dengan hal tersebut, maka seorang pemimpin dapat mengaktualisasikan ajaran asta brata dalam kitab rāmāyaṇa untuk menjawab tantangan bagi pemimpin di era new normal ini."
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([4, 4])


SentenceTransformer based on google-bert/bert-base-multilingual-uncased

This is a sentence-transformers model finetuned from google-bert/bert-base-multilingual-uncased. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: google-bert/bert-base-multilingual-uncased
Maximum Sequence Length: 512 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
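Cosine similarity is the dot product of two embeddings divided by the product of their lengths, so scores fall in [-1, 1] and do not depend on vector magnitude. The plain-Python sketch below computes a pairwise cosine matrix, which is the kind of matrix `model.similarity(embeddings, embeddings)` returns for this model. The helper name `cosine_matrix` is hypothetical, and the toy 3-dimensional vectors stand in for real 768-dimensional embeddings.

```python
import math

def cosine_matrix(rows):
    """Pairwise cosine similarity between every pair of vectors in `rows`.

    Hypothetical helper for illustration; not part of sentence-transformers.
    """
    norms = [math.sqrt(sum(x * x for x in r)) for r in rows]
    return [
        [
            sum(a * b for a, b in zip(u, v)) / (nu * nv)
            for v, nv in zip(rows, norms)
        ]
        for u, nu in zip(rows, norms)
    ]

# Toy stand-ins for real 768-dimensional embeddings.
emb = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
sim = cosine_matrix(emb)
for row in sim:
    print([round(s, 3) for s in row])
```

The diagonal is 1 (each vector compared with itself), and orthogonal vectors score 0 regardless of their length.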

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
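The Pooling module above enables only `pooling_mode_mean_tokens`, so a sentence embedding is the average of BERT's token embeddings over the positions where the attention mask is 1; padding positions are ignored. A minimal plain-Python sketch of that masked mean follows. The `mean_pool` helper is hypothetical (the library does this on tensors inside its `Pooling` module), and 2-dimensional toy vectors replace the real 768-dimensional ones.

```python
def mean_pool(token_embeddings, attention_mask):
    """Masked mean: average the token vectors whose attention-mask value is 1.

    Hypothetical helper for illustration; not the library's implementation.
    """
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    kept = 0
    for vector, mask_value in zip(token_embeddings, attention_mask):
        if mask_value == 1:  # real token; padding positions have mask 0
            totals = [t + x for t, x in zip(totals, vector)]
            kept += 1
    # max(kept, 1) avoids dividing by zero if every position is padding
    return [t / max(kept, 1) for t in totals]

# Three toy token vectors; the last one is padding and must not affect the result.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```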

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("yodyamahesa/paper-recommendation-bert-kd")
# Run inference
sentences = [
    'pemrosesan bahasa alami berhubungan dengan cara manusia berkomunikasi dengan mesin menggunakan bahasa yang digunakan oleh manusia sesuai dengan lingkungan sekitarnya untuk memberikan respon atau perintah. pada penelitian ini, dirancang aplikasi yang mampu merekayasa bahasa alami sebagai suatu perintah yang dapat dimengerti komputer untuk melakukan operasi transformasi geometri dua dimensi. masukan bagi aplikasi berupa kalimat perintah akan melalui proses scanner , parser , dan translator yang merupakan bagian dari pemrosesan bahasa alami. keluaran dari pemrosesan bahasa alami ini menghasilkan solusi bagi kalimat masukan disertai dengan tampilan grafik dua dimensi untuk transformasi geometri. pada tahap scanner , kalimat perintah akan dibaca dan dikelompokkan ke dalam token-token. pada tahap parser , token-token akan dikelompokkan menjadi struktur sintak sesuai dengan aturan produksi yang telah disusun. pada tahap translator , struktur sintak akan diterjemahkan ke dalam operasi transformasi geometri yang bersesuaian. keluaran dari aplikasi akan memberikan solusi akhir berupa titik koordinat akhir hasil transformasi. sistem dibuat dengan mempertimbangkan beberapa kalimat perintah operasi transformasi dan kemudian dicari struktur kalimat yang memiliki kesamaan untuk kemudian disusun aturan produksi agar sistem mampu menangani kalimat masukan. hasil pengujian menunjukkan bahwa sistem mampu menangani beberapa kalimat masukan mengenai empat perintah operasi transformasi geometri dasar selama kalimat yang menjadi masukan memiliki struktur yang sesuai dengan aturan produksi yang disusun.',
    'bleeding as a major cause of high maternal mortality rate in indonesia begins with anemia.the prevalence of anemia was found different in other countries. adolescence is a vulnerable age group to anemic. anemia in adolescence girls will have an impact on reproductive health. the purposeof this study to determine the relationship between the intake of energy nutrients, iron proteinm and menstrual pattern with the incidence of anemia in adolescent girls in kebumen regency in 2016. this research is an analytic observational with case control design.the sample in this study of 120 respondents.the study was conducted in may-june 2016. data were collected by questionnaire instruments and semi quantitative- food frequency quotionare ( sq-ffq).data analysis included univariate analysis of the frequency distribution of research variables, bivariate analysis withchi_square test, and multivariate logistic regression analysis. the results showed that there was a significant relationship between energy intake of p = (0.047), protein p = (0,000), iron p = (0.002), menstrual pattern p = (0.001) with anemia incidence in adolescent girls. multivariate analysis of logistic regression showed the most dominant variable on the occurrence of anemia was protein nutrient intake of or 4.255 in ci (1,850-9,784). kebumen district health office needs to socialize school intensive nutrition program intensively and comprehensively to reduce the incidence of adolescent anemia. the activities of socialization and provision of iron supplementation should be carried out continuously with good evaluation after implementation. keywords: anemia, adolescence girl, nutrient intake, menstrual pattern',
    'the purpose of this research is to examine the capital effect and land area toward the coffee farmers income in lewa jadi village, bandar district, bener meriah regency. this research based on quantitative approach. the sampel used is the communities of coffee farmers who the coffee ground about 73 respondents. the primary data as the instrument of collecting data utilized in this study. the primary data is obtained by distributing questonnaires to the coffee farmers. the data analysis used in this study is multiple linear regression test, classic assumption test, multicoliniarity test, heterokedasticity test, determinant test, t test, f test with spss version 16. the result of study shows that the variable of capital has the possitive and significant effect toward the coffee farmers income in lewa jadi village, bandar district, bener meriah regency. next, the variable of land area has the possitive and significant effect too on the income of coffee farmers in lewa jadi village, bandar district, bener meriah regency. keywords: capital, land area, income. abstrak penelitian ini bertujuan untuk menguji pengaruh modal dan luas lahan terhadap pendapatan petani kopi di desa lewa jadi kecamatan bandar kabupaten bener meriah. metodelogi yang digunakan adalah pendekatan kuantitatif.sampel yang digunakan adalah masyarakat petani kopi yang mempunyai lahan kopi sebanyak 73 responden.instrumen pengumpulan data menggunakan data primer. data primer diperoleh dengan cara penyebaran angket (kuisioner) kepada petani kopi. analisis data yang digunakan pada penelitian ini adalah uji regresi linear berganda, uji asumsi klasik, uji multikoleniaritas, uji heterokedastisitas, uji determinan, uji t, uji f dengan bantuan spss versi 16.hasil penelitian yang dilakukan menunjukan bahwa variabel modal berpengaruh positif dan signifikan terhadap pendapatan petani kopi di desa lewa jadi kecamatan bandar kabupaten bener meriah. dan variabel luas lahan berpengaruh positif dan signifikan terhadap pendapatan petani kopi di desa lewa jadi kecamatan bandar kabupaten bener meriah. kata kunci: modal, luas lahan, pendapatan.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
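Given the model's paper-recommendation purpose, a typical use is to rank a corpus of abstracts against a query abstract by cosine similarity and keep the top k. The sketch below assumes the embeddings were already produced with `model.encode(...)`. `top_k_papers` is a hypothetical helper, and toy 2-dimensional vectors stand in for real embeddings. For real workloads, sentence-transformers also provides `util.semantic_search` for the same task.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def top_k_papers(query_emb, corpus_embs, k=2):
    """Return (corpus_index, score) pairs for the k most similar papers, best first.

    Hypothetical helper for illustration; not part of sentence-transformers.
    """
    scored = [(i, cosine(query_emb, emb)) for i, emb in enumerate(corpus_embs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy stand-ins: in practice these would come from model.encode(...).
query = [1.0, 0.0]
corpus = [[0.0, 1.0], [1.0, 0.1], [0.7, 0.7]]
recommendations = top_k_papers(query, corpus, k=2)
print(recommendations)
```

If the query paper is itself part of the corpus, drop its own index from the results, because it will always score 1.0 against itself.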

Training Details

Training Dataset

Unnamed Dataset

Size: 12,000 training samples
Columns: sentence1, sentence2, and score
Approximate statistics based on the first 1000 samples:
	sentence1	sentence2	score
type	string	string	float
details	min: 58 tokens mean: 316.15 tokens max: 512 tokens	min: 41 tokens mean: 321.14 tokens max: 512 tokens	min: 0.0 mean: 0.31 max: 1.0

Samples:

sentence1	sentence2	score
anak berkebutuhan khusus dapat ditemui pada beberapa sekolah, baik sekolah reguler maupun non reguler. terkadang keberadaan anak berkebutuhan khusus disekolah tidak disadari oleh guru, karena kurangnya kompetensi guru untuk mengenali anak berkebutuhan khusus. apabila hal ini dibiarkan, maka akan sulit untuk menangani anak berkebutuhan khusus, karena kebiasaan anak sudah sulit untuk diubah. melalui penelitian ini menerapkan sebuah pendekatan baru menggunakan metode business intelligence dengan model klasifikasi: algoritma c4.5 dan naive bayes, metode ini digunakan untuk membantu proses deteksi dini untuk mengenali anak berkebutuhan khusus. algoritma c4.5 digunakan untuk menciptakan pola, sehingga didapatkan atribut yang paling berpengaruh sampai yang tidak terlalu berpengaruh dari dataset. nilai auc(area under curve) dan akurasi sebagai model evaluasi. dan model perbandingan yang digunakan yaitu metode parametrik, paired t-test. jenis berkebutuhan khusus yang digunakan sebagai kategori ...	[relationship of slope steepness to soil water content, soil ph, and performances of gerga orange at lebong regency]. in lebong regency, gerga orange is commonly grown in hilly areas and many of the crop stands were found on steep sloped land. objective of this study was to determine the pattern of relationship of slope steepness to soil water content, soil ph, and the overall plant performances. soil samples were collected from the area below the canopy of 300 gerga orange trees differing in the slope steepness for for soil water content (swc) and soil ph. the observation of plant performances were also made from the same tree as used for the soil properties observations. the analysis of regression indicated that relationship of slope steepness to both the observed soil properties and plant performances could be represented by the linear models suggesting that all the observed variables were reduced along with the increasing slope steepness.	`0.021910585414216525`
penelitian ini bertujuan untuk memahami sentimen dan perilaku penggemar coldplay di twitter. data tweet dikumpulkan, diolah, dan diklasifikasikan menggunakan metode naïve bayes. hasil penelitian ini diharapkan memberikan manfaat bagi coldplay, peneliti, dan industri musik dalam memahami perilaku penggemar dan meningkatkan strategi pemasaran. dalam penelitian ini, digunakan metode naive bayes untuk menganalisis perilaku pendukung coldplay di twitter. data awal dikumpulkan dengan mengumpulkan tweet yang mengandung kata kunci yang relevan selama periode waktu tertentu. kemudian, dilakukan teknik pemrosesan lanjutan seperti eliminasi stopword, normalisasi kata, dan stemming. tweet-tweet tersebut diklasifikasikan menjadi dua kategori berdasarkan sentimennya: positif dan negatif. dengan kemampuan untuk mengelola data dalam jumlah besar, metode naive bayes melakukan klasifikasi sentimen dengan memprediksi kategori melalui perhitungan probabilitas berdasarkan teorema bayes dan asumsi independe...	dalam era digital saat ini, twitter (sekarang dikenal sebagai "x") menjadi platform yang sangat populer untuk berbagi pendapat dan pengalaman pengguna terhadap produk atau layanan, termasuk platform e-commerce seperti shopee. tujuan penelitian ini adalah untuk menemukan sentimen positif dan negatif dari tweet yang berkaitan dengan shopee. untuk menganalisis sentimen, penelitian ini menggunakan naive bayes classifier. langkah pertama dalam penelitian ini adalah pengumpulan dataset tweet yang terkait dengan shopee. kemudian, data tweet dilakukan pre-processing untuk membersihkan dan mengubah formatnya untuk analisis sentimen. setelah pre-processing, dataset dibagi menjadi dua bagian: data pelatihan dan data pengujian. model naive bayes classifier dilatih dengan menghitung kemungkinan setiap fitur (atau kata) muncul dalam setiap kategori sentimen. data pelatihan digunakan untuk melatih model ini. hasil penelitian ini menunjukkan bahwa metode naive bayes classifier dengan rasio pembagian d...	`0.7025633357384152`
this study investigates the phonetic realization of consonant length in hungarian. it is hypothesized that spectral structure differences between obstruents and sonorants may lead to distinct strategies in expressing quantity contrast. to test this hypothesis, intervocalic nasals (/n ɲ/) and plosives (/t k/) were analyzed in spontaneous speech from 20 monolingual hungarian-speaking adults. linear mixed-effects models and decision trees were applied to explore the effect of quantity, consonant type, and their interaction on various acoustic parameters, such as the durations of the target consonants and neighboring vowels, relative durations, and geminate-to-singleton ratio. our findings indicate that nasals require more robust adjustments compared to plosives in the realization of the consonant length contrast. this study contributes to the understanding of phonetic variation in hungarian and the distribution of geminates across languages.	we present two novel approaches to phonetic speech segmentation. one is based on acoustical clustering plus dynamic time warping and the other is based on a boundary specific correction by means of a decision tree. the use of objective or perceptual evaluations is discussed. the novel approaches clearly outperform the objective results of the baseline system based on hmm. they get results similar to agreement between manual segmentations. we show how phonetic features can be successfully used for boundary detection together with hmms. finally, the need for perceptual tests in order to evaluate segmentation systems is pointed out.	`0.49608579207351444`

Loss: CosineSimilarityLoss with these parameters:

{
    "loss_fct": "torch.nn.modules.loss.MSELoss"
}
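`CosineSimilarityLoss` embeds both sentences of a pair, takes the cosine similarity of the two embeddings, and compares it with the gold `score` using the configured `loss_fct`, here `MSELoss`. Over a batch of N pairs this is (1/N) Σ (cos(u_i, v_i) − score_i)². The plain-Python sketch below reproduces that arithmetic. `cosine_mse_loss` is a hypothetical name, the toy vectors are illustrative, and the library itself computes this on tensors with gradients.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def cosine_mse_loss(emb1, emb2, scores):
    """Mean squared error between pairwise cosine similarities and gold scores.

    Hypothetical helper mirroring the arithmetic, not the library code.
    """
    errors = [(cosine(u, v) - s) ** 2 for u, v, s in zip(emb1, emb2, scores)]
    return sum(errors) / len(errors)

# Pair 1: identical directions (cosine 1.0) with gold score 0.8 -> error 0.2^2
# Pair 2: orthogonal vectors (cosine 0.0) with gold score 0.1 -> error 0.1^2
emb1 = [[1.0, 0.0], [1.0, 0.0]]
emb2 = [[1.0, 0.0], [0.0, 1.0]]
scores = [0.8, 0.1]
loss = cosine_mse_loss(emb1, emb2, scores)
print(round(loss, 4))  # 0.025
```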

Evaluation Dataset

Unnamed Dataset

Size: 3,000 evaluation samples
Columns: sentence1, sentence2, and score
Approximate statistics based on the first 1000 samples:
	sentence1	sentence2	score
type	string	string	float
details	min: 45 tokens mean: 316.94 tokens max: 512 tokens	min: 57 tokens mean: 313.85 tokens max: 512 tokens	min: 0.0 mean: 0.33 max: 1.0

Samples:

sentence1	sentence2	score
kontes robot indonesia (kri) adalah kompetisi penggambaran, perencanaan, dan pembuatan rekayasa dalam bidang robotika. salah satu divisi yang dilombakan yaitu kontes robot sepak bola beroda (krsbi beroda). salah satu strategi pertandingan untuk memenangkan pertandingan yaitu saling umpan antar robot. jadi robot diharuskan dapat melakukan identifikasi mana kawannya. untuk melakukan tracking bola dan kawan dibutuhkan sebuah sistem pendeteksian objek. pada penelitian ini, akan dikembangkan sistem tracking bola dan pendeteksian robot kawan dengan berbasis deep learning. metode deep learning yang digunakan yaitu metode cnn (convolutional neural network). pada penelitian ini akan menggunakan kamera omnidirectional dan kamera webcam logitech yang masing-masing akan digunakan untuk proses deteksi objek bola dan kawan. pendeteksian objek yang dilakukan menggunakan algoritma yolo yang arsitekturnya terdiri dari 24 layer kovolusi, 4 layer max pooling, dan 2 layer fully connected. pendeteksian obj...	background : stunting is a state of height index according to age under -2 sd according to who standards. nutrition problems in farmers can occur due to poverty which is the root of nutrition problems. the purpose of this study was to determine the factors associated with the incidence of stunting in children aged 24-59 months from farming families in the gunung labu primary health care in kerinci regency. method :the design of this study was cross sectional. the total population in this study was 1,422 toddlers, while the sample in this study was 98 toddlers from farming families. analysis used the chi-square test and multiple logistic regression. result :this study found the prevalence of stunting in infants 32.34%. factors related to the incidence of stunting in infants were household level food security and mother's education level. the most dominant factor related to the incidence of stunting in infants was household-level food security (or = 4,722; 95% ci = 1,599-13,941). househo...	`0.0021461383970958117`
risk behavior detection based on blob and trajectory is presented in this paper for intelligent substation. the features of substation are extracted and clustered to blobs by k-means algorithm. in order to get the description of the blobs, the blobs are divided into danger area and safety area by conditional random fields (crf) model. target is detected by optical flow, and then the trajectory of the object is obtained. the blobs description and the trajectory are combined to feature vector. this feature vector is modeled by hidden markov model (hmm). and use these models to detection risk behavior. the experimental results show that risk behavior in intelligent substation can be described accurately.	two human gastric cancer xenograft lines (gc-yn and gc-sf) transplanted in nude mice were employed to evaluate and compare the anticancer effect of seven single anticancer agents and their various combinations. mitomycin c, cisplatin (briplatin) (cddp) and 5-fluorouracil (5-fu) were screened out to be effective against gc-yn and only epirubicin (farmorubicin) (epir) was effective against gc-sf. combinations of two of these ‘effective' agents revealed that fp (5-fu + cddp) is the most effective two-agent combination regimen against both lines, and some of those ‘ineffective' single agents showed synergistic effects against both lines when combined with 5-fu. moreover, three-agent combinations composed of fp and one of the other five agents were also evaluated to select out the most effective regimen. all the combinations showed higher inhibition on the tumor growth of gc-yn than fp regimen, and fp + adriamycin (adriacin) (adr) and fp + epir were more effective against gc-sf than fp. how...	`0.0008308212427001122`
it is a common observation nowadays that the personal information of user is difficult to manage, the material which is copied by the users to their personal system are often forgotten by the users. so when they require their information it becomes very difficult to find the relevant information from huge repository. we have introduced a method using which the activities of user for reading documents are captured from running process list and managed in a dataset along with accessing time, then frequent item set and associated weights are calculated for each document with other using apriori algorithm and confidence measure in conjunction with combined access time. when user searches a document, the document list appears using any conventional model of retrieval, we have used primary metadata including title, author, type for document searching. beside this, a visual interface is designed to display the list correlated document on the basis of users activities may help them to indentif...	the impact of on jogarbelini@hotmail.com abstract the aim of this investigation was to evaluate the impact of multiple freezing and thawing cycles on the physicochemical properties of nile tilapia fillets. for this purpose, 72 fresh nile tilapia fillets were packed and stored in a freezer at -18 °c. the frozen samples were submitted to five freeze-thaw cycles; in each cycle, the freezer was switched off during 14 hours. the consecutive freeze-thaw cycles resulted in a fillet’s total weight loss of 9.48%, with a quadratic regression (p < 0.0001) for thaw loss, with a greater loss percentage in cycle 3 (2.68%). ph values differed between the cycles (p < 0.0001), being observed an increment in this parameter only from cycle 4. the lipid oxidation remained constant in cycles 1, 2, 4 and 5, however in cycle 3 the lowest value (p < 0.0002) was observed. the luminosity, and intensity of the red and yellow colours increased linearly (p < 0.0001) as the cycles increased. thereby, the tilapia fi...	`0.010447352850014332`

Loss: CosineSimilarityLoss with these parameters:

{
    "loss_fct": "torch.nn.modules.loss.MSELoss"
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: epoch
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
num_train_epochs: 5
warmup_ratio: 0.1
bf16: True
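These settings account for the step counts in the Training Logs. With 12,000 training pairs, a batch size of 16, and no gradient accumulation, an epoch is 750 steps, and 5 epochs give 3,750 steps. `warmup_ratio: 0.1` then yields 375 warmup steps. The sketch below reproduces that arithmetic and a linear warmup-then-decay schedule for the learning rate of 5e-05 listed under All Hyperparameters. `linear_lr` is a hypothetical function written to match the shape of the `linear` scheduler type; it is not the library's code.

```python
import math

num_samples = 12_000
batch_size = 16          # per_device_train_batch_size, single device assumed
epochs = 5
warmup_ratio = 0.1
base_lr = 5e-05

# dataloader_drop_last is False, so a final partial batch would still count as a step.
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

def linear_lr(step):
    """Hypothetical schedule: ramp 0 -> base_lr during warmup, then decay linearly to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

print(steps_per_epoch, total_steps, warmup_steps)  # 750 3750 375
```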

All Hyperparameters


overwrite_output_dir: False
do_predict: False
eval_strategy: epoch
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 5
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	Validation Loss
1.0	750	0.0197	0.0086
2.0	1500	0.0053	0.0059
3.0	2250	0.0027	0.0042
4.0	3000	0.0014	0.0039
5.0	3750	0.0007	0.0036
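Assuming the validation loss is computed with the same `CosineSimilarityLoss` (an MSE between predicted cosine similarity and the gold score), its square root puts the number back on the 0 to 1 score scale. The final validation loss of 0.0036 then corresponds to a root-mean-square error of about 0.06. The snippet below converts the logged values. The numbers are copied from the table, and `val_mse` / `val_rmse` are illustrative names.

```python
import math

# Validation losses per epoch, copied from the Training Logs table.
val_mse = {1: 0.0086, 2: 0.0059, 3: 0.0042, 4: 0.0039, 5: 0.0036}

# Square root of an MSE is an RMSE in the same units as the similarity scores.
val_rmse = {epoch: math.sqrt(mse) for epoch, mse in val_mse.items()}
for epoch, rmse in val_rmse.items():
    print(f"epoch {epoch}: RMSE = {rmse:.3f}")
```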

Framework Versions

Python: 3.12.3
Sentence Transformers: 4.1.0
Transformers: 4.53.0
PyTorch: 2.6.0+cu126
Accelerate: 1.8.1
Datasets: 3.6.0
Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}


Safetensors
Model size: 0.2B params
Tensor type: F32
