How to use lochhonest/modernbert-finetuned-for-sas with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("lochhonest/modernbert-finetuned-for-sas")
sentences = [
"What is the meaning of the pattern code 128 in the table?",
"epevents\nThe following table may serve as a quick reference to select certain\npattern types of recognized events (i.e. away from CCD edges, bad pixels\netc.):\n\n ‘PATTERN‘ Meaning\n ----------- ---------------------------------------------------\n 0 singles\n 1 doubles in Y with Y(main)<Y(secondary)\n 2 doubles in X with X(main)<X(secondary)\n 3 doubles in Y with Y(main)>Y(secondary)\n 4 doubles in X with X(main)>X(secondary)\n 5–8 triples\n 9–12 quadruples\n 128 singles at CCD window (RAWX=1, RAWX=64, RAWY=200)\n or close to bad pixels\n 205 doubles at CCD window or bad pixels\n 206 triples at CCD window or bad pixels\n 207 quadruples at CCD window or bad pixels\n\nNote: as of version 6.30.4 PATTERN values of 128 have been changed to 0\n(i.e. $8^{\\rm th}$ bit is not set anymore for singles), and PATTERN\nvalues of 205 have been changed to 1–4 (i.e. $7^{\\rm th}$ and\n$8^{\\rm th}$ bit are not set anymore for doubles)!\n\nSecondary events of those valid doubles, triples, and quadruples\n($`PATTERN`=1..12$) have ${\\rm PATTERN(main)} + 64$ (as listed above),\ni.e. $7^{\\rm th}$ bit set.\n\n1. For the pattern codes in ‘PAT_ID‘ and ‘PATTERN‘ the following\n bit-wise storing is used:\n\n ‘PAT_ID‘ \n ---------- ------- -------------------------------------------\n bit value Meaning\n 16 32768 free for additional pattern related flag\n 15 16384 – \" –\n 14 8192 – \" –\n 13 4096 PAT_ORI first digit (x-coordinate)\n 12 2048 – \" –\n 11 1024 PAT_ORI second digit (y-coordinate)\n 10 512 – \" –\n 9 256 PAT_IND: 1, ..., < 512 (telemetry limit)\n ... ... 
– \" –\n 1 1 – \" –\n\n ‘PATTERN‘ \n ----------- ------- ------------------------- ----------------------------------\n bit value Meaning \n 8 128 sign of PAT_TYP \n 7 64 sign of PAT_IND \n 6 32 used to flag PAT_TYP > 4\n 5 16 if bit 6, then use next 5 bits\n 4 8 MOS code numbers 0 - 12 to store PAT_TYP - 5\n 3 4 a combination of hence, max storage: PAT_TYP = 36\n 2 2 PAT_TYP ≤ 4 and – \" –\n 1 1 PAT_ORI – \" –\n\n Note: as of version 6.30.4 PATTERN values of 128 have been changed\n to 0 (i.e. $8^{\\rm th}$ bit is not set anymore for singles), and\n PATTERN values of 205 have been changed to 1–4 (i.e. $7^{\\rm th}$\n and $8^{\\rm th}$ bit are not set anymore for doubles)!\n\n2. Creation of event quality flags in column ‘FLAG‘. Task: epevents\n makes use of the common MOS/pn event related flag code (see ) and\n uses the following bits (other flags are set by the Task: epframes\n task):\n\n ‘FLAG‘ \n -------- --------- -------------------------------------\n bit value Meaning (information)\n 1 0x2 ‘INVALID_PATTERN‘\n 2 0x4 ‘CLOSE_TO_CCD_WINDOW‘\n 5 0x20 ‘CLOSE_TO_ONBOARD_BADPIX‘\n 6 0x40 ‘CLOSE_TO_BRIGHTPIX‘ (not on-board)\n 8 0x100 ‘CLOSE_TO_DEADPIX‘ (not on-board)\n 16 0x10000 ‘OUT_OF_FOV‘ \n\n bit value Meaning (rejection)\n ------- ---------- ---------------------\n 19 0x80000 ‘COSMIC_RAY‘\n 21 0x200000 ‘ON_BADPIX‘\n 22 0x400000 ‘SECONDARY‘\n 23 0x800000 ‘TRAILING‘\n total 0xfa0000 EPN rejection mask\n",
"rgssources\nThe source data can come from several sources:\n\n- A source list from a previous run of Task: rgssources (note that\n from version 5.1, Task: rgssources is now compatible with all\n earlier source list formats).\n\n- The proposed target source.\n\n- The attitude of the spacecraft.\n\n- A source list output by either Task: emldetect or Task: eboxdetect.\n\n- A source position supplied on the command line by the user.\n\nThese are described individually below.\n",
"rgssources\n## Parameters\n\n \\label{rgssources:description:parameters}\n \n **filemode}\t{modify** (Optional): no\n(Type: \n Controls whether the task opens a previous source list for editing or creates a new one.\n }\n \\optparm{changeprime}\t{no}\t{boolean}\t{yes|no, Default: string}\t{modify|create, Range: \n Only active in `filemode`=`modify'. Unless this parameter is set, the previous prime source index number is retained.\n }\n \\optparm{changeattitude)\t{boolean}\t{yes|no}{\n Only active in `filemode`=`modify'. Unless this parameter is set, the previous attitude (stored in the header) is retained.\n }\n **srclist}\t{rgsset.ds** (Mandatory): yes\n(Type: \n The name of the rgs source list. If `filemode`=`create', the output is written to this file. If there is an existing file of this name, it will be overwritten unless SAS\\_CLOBBER is unset. If `filemode`=`modify', the task looks for an existing source list of this name and modifies it.\n }\n **instexpid}\t{}\t{string}\t{, Default: dataset}\t{, Range: \n This parameter contains information about both the instrument (that is, RGS1 or 2) and the exposure identifier (a letter S or U, indicating scheduled or unscheduled, followed by a three-digit numeric identifier. The `instexpid` string can be supplied in a number of different forms, but the two most useful are (i) as a six-character string comprising either R1 or R2 followed by the exposure identifier (an example: `R2S003'); (ii) the name of any of RGS-specific files in the ODF can also be used. This parameter is mandatory if `filemode`=`create', or in cases where the instrument and/or exposure can neither be read from the file header or deduced from its name.\n }\n \\optparm{writeobskwds)\t{boolean}\t{yes|no** (Optional): no\n(Type: yes}\t{boolean}\t{yes|no, Default: \n If this is set, the task attempts to write observation-specific keywords to the file header. 
The user must point the environment variable SAS\\_ODF to the ODF directory for this to succeed.\n }\n \\optparm{writeexpkwds, Range: \n If this is set, the task attempts to write exposure-specific keywords to the file header. For this to succeed, the user must point the environment variable SAS\\_ODF to the ODF directory, and the task must also be able to determine the exposure number, either via the `instexpid` parameter, or from the `EXPIDSTR` keyword in the file header, or (if neither are present) from the file name.\n }\n \\optparm{clobberonlabel)\t{boolean}\t{yes|no}{\n Labels in RGS source lists are required to be unique. Where a clash is detected between a source already in the list and a new candidate source, the task takes one of two actions, depending on the value of this parameter: if `yes', the candidate is discarded; if `no', the task halts with an error.\n }\n\n **primestyle}\t{label}\t{string** (Optional): \n If `primestyle\n(Type: \n Only active if \\param{changeprime`=yes and either `addusersource` or `userasprime`=no. It controls the way in which the prime source is specified. See the parameters `primelabel` and `primeindex`. (An additional possible value of `expression' is planned.)\n }\n \\optparm{primelabel}\t{PROPOSAL, Default: label|index|expr|brightest|auto, Range: string}\t{) is active and set to `label', this parameter gives the value of the `LABEL` column of the source that it is desired the `PRIMESRC` keyword should point to.\n }\n **primeindex}\t{1}\t{integer}\t{$0<$primeindex** (Optional): expmedian\n(Type: }\t{string}\t{, Default: \n If `primestyle` is active and set to `index', the `PRIMESRC` keyword is set to this value.\n }\n \\optparm{primeexpression, Range: \n This mode is not yet supported.\n }\n\n \\optparm{attstyle)\t{string}{mean|median|start|user|expmedian}{\n Controls the way the attitude is calculated. If `mean', the attitude is calculated from the mean of the values in the attitude history file. 
If `median', the median of these values is used. If the value is `start', the task uses the attitude at the start of the exposure as the reference attitude. A value of `expmedian' tells the task to use the median of the attitude during the exposure only, as calculated by Task: attfilter. The final value, `user', allows the user to input the numbers him/herself via the next three parameters.\n }\n **meanset}\t{atthk.dat** (Optional): \n The name of the attitude history file. This file is a necessary input in the case that `attstyle\n(Type: \n The name of the attitude history file. This file is a necessary input in the case that \\param{attstyle` is `mean'.\n }\n \\optparm{medianset}\t{atthk.dat, Default: dataset}\t{, Range: dataset}\t{) is `median'.\n }\n **attra}\t{0}\t{angle}\t\t{$0\\le$`attra`$\\le 360$** (Mandatory): attgti.ds:STDGTI\n(Type: \n Only active if `attstyle`=`user'. The declination of the attitude, in decimal degrees.\n }\n \\mandparm{attapos}\t{0}\t{angle}\t{$0\\le$`attapos`$\\le 360$, Default: \n Only active if `attstyle`=`user'. The right ascension of the attitude, in decimal degrees.\n }\n \\mandparm{attdec}\t{0}\t{angle}\t{$-90\\le$`attdec`$\\le 90$, Range: \n Only active if `attstyle`=`user'. The position angle of the attitude, in decimal degrees.\n }\n **expmediantable){table** (Optional): \n This should be set if the user wishes to add a source to the list with a position specified on the command line.\n \n(Type: \n The name of the table in the filtered attitude history file in which the exposure-median keywords can be found. This file is a necessary input in the case that `attstyle` is `expmedian'.\n }\n\n \\optparm{addusersource, Default: , Range: no}\t{boolean}\t{yes|no)\n **label}\t{USER}\t{string}\t{** (Optional): \n Only active if `addusersource\n(Type: \n Only active if \\param{addusersource`=yes. The brightness of the source in counts per second. 
It is anticipated that this parameter won't be used much, since this is not a quantity that is likely to be known in most circumstances. The default value of 0.0 is harmless.\n }\n \\optparm{userasprime}\t{no}\t{boolean}\t{yes|no, Default: \n Only active if `addusersource`=yes. This is written directly to the `LABEL` column of the output source list. The empty string is not permitted.\n }\n \\optparm{rate}\t{0.0}\t{real}\t\t{$0.0<$rate, Range: \n Only active if `addusersource`=yes. If `changeprime`=yes and `userasprime`=yes, then the attribute `PRIMESRC` is set to the index number of the user source.\n }\n \\optparm{process}\t{no}\t{boolean}\t{yes|no)=yes. This causes the value in the `PROCESS` column to be set to true for the user-added source.\n }\n **bkgexclude}\t{yes}\t{boolean}\t{yes|no** (Optional): \n Only active if `addusersource\n(Type: radec, Default: \n Only active if \\param{addusersource`=yes. This causes the value in the `BKG\\_EXCLUDE` column to be set to true for the user-added source.\n }\n \\optparm{positionstyle, Range: string}\t{radec|wrtatt)=yes. If `positionstyle`=`radec', then the position of the user-added source is expected via the parameters `ra` and `dec`. If on the other hand `positionstyle`=`wrtatt' (With Respect To ATTitude), then the position of the user-added source is expected via the parameters `deltadisp` and `deltaxdsp`.\n }\n **ra}\t\t{0}\t{angle}\t{$0\\le$`ra`$\\le 360$** (Mandatory): \n Only active if `addusersource\n(Type: \n Only active if \\param{addusersource`=yes and `positionstyle`=`radec'. The declination of the user-added source, in decimal degrees.\n }\n \\mandparm{deltaxdsp}\t{0.0}\t{real}\t\t{, Default: \n Only active if `addusersource`=yes and `positionstyle`=`radec'. The right ascension of the user-added source, in decimal degrees.\n }\n \\mandparm{dec}\t{0}\t{angle}\t{$-90\\le$`dec`$\\le 90$, Range: \n Only active if `addusersource`=yes and `positionstyle`=`wrtatt'. 
The displacement in arcminutes of the user-added source from the pointing direction, in the dispersion direction.\n }\n \\mandparm{deltadisp}\t{0.0}\t{real}\t\t{)=yes and `positionstyle`=`wrtatt'. The displacement in arcminutes of the user-added source from the pointing direction, in the cross-dispersion direction.\n }\n\n **withepicset}\t{no}\t{boolean}\t{yes|no** (Optional): string\n(Type: \n The name of a set containing a list of sources. Formats output by the tasks Task: emldetect and Task: eboxdetect are accepted.\n }\n \\optparm{epiclabelprefix, Default: \n If this is set, the task looks for the parameter `epicset`, giving the name of an EPIC source list.\n }\n \\optparm{epicset}\t{}\t{dataset}\t{, Range: EPIC)\t{}{\n This parameter gives the string which is used by the task as a prefix when constructing `LABEL` values for EPIC-derived sources. The other part of the `LABEL` is the number `ML\\_ID\\_SRC` or `BOX\\_ID\\_SRC`. The main purpose of this parameter is to allow several EPIC-derived source lists to be included in the one RGS list if desired, while retaining unique labels.\n }\n **doconfusion}\t{no}\t{boolean}\t{yes|no** (Optional): \n Active only if `withepicset\n(Type: 3.5,1.0,1.0, Default: \n Active only if \\param{withepicset`=true. This parameter causes the task to check the epic sources + proposal position for confusion in the EPIC field of view. It is mainly designed for use in the PCMS, to prevent automatic extraction of too many spectra for what is essentially the same object. The degree of confusion depends on the size of the PSF, which is a function of energy. Therefore, strictly speaking, it depends on the selection of the energy band of interest (`bandids`). At the moment, however, the a-priori energy of $(0.5+2)/2 = 1.25$~keV is unconditionally used for it, whatever `bandids` is.\n }\n \\optparm{instweights, Range: real list}\t{)=true. 
This parameter gives the list of weighting factors for EPIC instruments for the use of calculation of RATE, where the order is the normal ID\\_INST number (i.e., pn, MOS1 and 2). The resultant RATE in the output RGS source list is normalised to 1.0 in the list, namely in default, it is normalised to the RATE of MOS1 (or 2).\n }\n **flagepicsrcoutoffov** (Optional): \n If this is set, the task carries out filtering, where only those sources, the position of which corresponds to cross-dispersion angles on the RGS camera between $-$2.9 and +2.9 arcminutes from camera centre, are regarded as a good source. If `withepicset\n(Type: \n Active only if \\param{withepicset`. If this is set, the input EPIC sources falling outside the FOV (see the description of `enablefilter` for definition) are flagged and are not dropped from the output source list due to that reason. If not (default), either they are dropped from the source list (if `enablefilter`=true) or nothing is done. See the description of `enablefilter` for the summary of the behaviour.\n }\n \\optparm{enablefilter, Default: no}\t{boolean}\t{yes|no, Range: no}\t{boolean}\t{yes|no)=true, the filtering is made also for the input EPIC sources, and the those EPIC sources regarded as no-good are either dropped out of the output list (`flagepicsrcoutoffov`=false) or just flagged as OUTOFFOV (if `flagepicsrcoutoffov`=false) (see section~\\ref{rgssources:description:outputfiles} for the OUTOFFOV flag). 
Regardless of whether epic sources are added or not (`withepicset`), the task checks the positions of all sources if `enablefilter` is set and flags them as it is and warns about any that fall outside the FOV.\n \\begin{center}\n \\begin{tabular}{|l|cc|}\n \\multicolumn{3}{c}{When `enablefilter`=true}\\\\\n \\hline\n & EPIC sources & Anything else\\\\\n \\hline\n `flagepicsrcoutoffov` = true & Flagged & Flagged\\\\\n `flagepicsrcoutoffov` = false & Dropped & Flagged\\\\\n \\hline\n \\end{tabular}\n \\end{center}\n }\n **bandids** (Optional): yes\n(Type: integer list}\t{, Default: 2,3, Range: \n This parameter gives the list of energy bands accepted for the input EPIC source list. The RATE value of each source in the output RGS source list is the sum of the RATEs of the corresponding source for the energy bands specified with this parameter. For 1XMM-source-catalogue type ones, this list should be 2, whereas for 2XMM-source-catalogue type ones, this list should be 2, 3 (default). Although an arbitrary number of elements in the list is allowed, if it is more than 9, only the first 9 energy bands are stated in the `E\\_mBNDnn` header keyword and the rest is unstated (see section~\\ref{rgssources:description:outputfiles}) in the output list.\n }\n \\optparm{withboresightfudge)\t{boolean}\t{yes|no}{\n Flip the sign of the boresight euler\\%psi. {\\bf This parameter will be removed} after the boresight is fixed. \n }\n\n[INPUT FILES]\nrgssources\n1. 
EPIC sources set with a binary extension table named ‘SRCLIST‘\n (required only if ‘withepicset‘ = ‘yes’).\n\n The following columns need to be present in this table:\n\n - ‘RA‘: this value is copied into the RGS column of the same name.\n\n - ‘DEC‘: this value is copied into the RGS column of the same\n name.\n\n - ‘ML_ID_SRC‘ (if the source list was made by Task: emldetect) or\n ‘BOX_ID_SRC‘ (if the source list was made by Task: eboxdetect):\n this number is included in the ‘LABEL‘ value of the source in\n the RGS list.\n\n - ‘ID_BAND‘: this value is used in distinguishing the energy band\n in calculating RATE (see below).\n\n - ‘RATE‘: the sum of these values in the specified energy bands\n are written in the output RGS list. The energy band (ID) is\n listed in the above-mentioned ‘ID_BAND‘ column, whereas the\n energy band IDs are specified in ‘bandids‘ command-line\n parameter.\n\n2. RGS sources set as described in the ‘Output files’ section (required\n only if ‘filemode‘ = ‘modify’).\n\n3. The attitude history file created by Task: atthkgen (required only\n if ((‘filemode‘ = ‘modify’ and ‘changeattitude‘ = ‘yes’) or\n ‘filemode‘ = ‘create’) and ‘attstyle‘ = ‘mean’ or ‘median’.).\n\n4. The filtered attitude history file created by Task: attfilter\n (required only if ((‘filemode‘ = ‘modify’ and ‘changeattitude‘ =\n ‘yes’) or ‘filemode‘ = ‘create’) and ‘attstyle‘ = ‘expmedian’.).\n\n[OUTPUT FILES]\nrgssources\n1. RGS sources set with a binary extension table named ‘SRCLIST‘. 
The\n header has all the keywords mandatory for PPS products, in\n particular\n\n - ‘RA_PNT‘: The right ascension of the attitude in decimal\n degrees.\n\n - ‘DEC_PNT‘: The declination of the attitude in decimal degrees.\n\n - ‘PA_PNT‘: The position angle of the attitude in decimal degrees.\n\n The ‘SRCLIST‘ table has the following keywords:\n\n - ‘PRIMESRC‘: The ‘INDEX‘ value (see column description below) of\n the prime source.\n\n - ‘E_EXPRn‘: There are n ( ≤ 99) occurrences of this keyword, one\n for each EPIC source list added to the RGS list. The numbers ‘n‘\n are consecutive, starting at 1. The values of these keywords are\n taken from the ‘INSTRUME‘ header keyword in the input EPIC\n source list (that is, probably EPN, in most of the cases, which\n does not carry a lot of practical meaning, in fact), although it\n used to be the exposure IDs of the respective EPIC source files\n (in the old-style source lists).\n\n - ‘E_CONTn‘: Similar to the ‘E_EXPRn‘ keyword, but this records\n the value of the ‘CONTENT‘ keyword in the EPIC file header.\n\n - ‘E_mBNDn‘: Similar to the ‘E_EXPRn‘ keyword, but this records\n the value of either ‘ID_BAND‘ (in the input RGS source file,\n when ‘filemode‘=‘modify’) or ‘bandids‘, which is used to select\n the EPIC sources and to calculate the RATE value, transmitted\n into the output RGS source list. Note that this used to be\n ‘E_BANDn‘(=2) before Ver.6.0. 
If ‘filemode‘=‘modify’ and if the\n input RGS source list has ‘E_BANDn‘ keywords, then they will be\n preserved in the output RGS source list (i.e., both ‘E_BANDn‘\n and ‘E_mBNDn‘ keywords may appear).\n\n - ‘E_FILTn‘: Similar to the ‘E_EXPRn‘ keyword, but this records\n the value of the ‘FILTER‘ keyword in the EPIC file header.\n\n The ‘SRCLIST‘ table has the following columns:\n\n Column name: Data type: Description:\n ---------------- ------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n ‘INDEX‘ int16 Source index number. Each source has a unique value, which Task: rgssources never alters.\n ‘LABEL‘ string Label for the source. These values are also unique to each source. Only upper case is used. At present, label values can only be 20 characters or less in length. Trailing spaces are not allowed.\n ‘RA‘ real32 J2000 right ascension in decimal degrees.\n ‘DEC‘ real32 J2000 declination in decimal degrees.\n ‘RATE‘ real32 Counts per second.\n ‘DELTA_DISP‘ real32 Offset on the sky, in the dispersion direction, of the source with respect to the pointing direction. Given in arcminutes.\n ‘DELTA_XDSP‘ real32 Offset on the sky, in the cross-dispersion direction, of the source with respect to the pointing direction. 
Given in arcminutes.\n ‘FOV_PHI‘ real32 This and the next column give the polar coordinates of ‘DELTA_DISP‘ and ‘FOV_PHI‘. Units for both are decimal degrees. ‘FOV_PHI‘ is the angle of the source position from the -ve dispersion axis towards the +ve cross-dispersion axis.\n ‘FOV_R‘ real32 \n ‘CONFUSION‘ real32 This is a measure of how confused the source is with respect to the prime source. See subsection [confusion] for a description of how it is calculated. It is a dimensionless number.\n ‘PROCESS‘ bool This column is used by Task: rgsregions to flag those sources for which spectrum extraction regions should be calculated. This column is no longer set by Task: rgssources, though, so all values are written as false in principle. An exception is the case of ‘filemode‘=‘modify’; in that case the PROCESS column in the input RGS source list is in principle preserved. Another exception is the sources added by the user (‘addusersource‘=true), where the value of the command-line option ‘process‘ is written as it is in principle. In any case, if ‘filemode‘=‘modify’ and ‘changeattitude‘=true, all PROCESS values are forcibly written as false regardless of the value ‘process‘ or PROCESS in the input RGS source list.\n ‘BKG_EXCLUDE‘ bool This column is used by Task: rgsregions to flag those sources which should be excluded from the background spectrum extraction region. This column is no longer set by Task: rgssources, so all values are written as false.\n ‘FIXED_ON_SKY‘ bool This column flags those sources for which the positional information was derived from right ascension and declination. 
The only sources for which ‘FIXED_ON_SKY‘ is false are the attitude source and any user source supplied with ‘userstyle‘=‘wtatt’.\n\n Column name: Data type: Description:\n -------------- ------------ ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n ‘EPIC_FILE‘ int16 This gives the number of the ‘E_EXPRn‘, ‘E_CONTn‘, ‘E_mBNDn‘ (or ‘E_BANDn‘ before Ver.6.0) and ‘E_FILTn‘ keywords appropriate to the source if it has been derived from an EPIC source list. Eg, for ‘EPIC_FILE‘=3, the details of the original list from which this source came can be found from the keywords ‘E_EXPR3‘, ‘E_CONT3‘, ‘E_mBND3‘ and ‘E_FILT3‘.\n ‘FLAG‘ int32 If non-zero, something goes wrong in the source. It is a binary (bit-type) form of representation for each cause – see the following table for detail (n.b., The representation of this FLAG column is entirely different from that in the input EPIC source list). 
Note that some of the checks may be bypassed if requested (by command-line parameters); for example if ‘enablefilter‘=false and ‘flagepicsrcoutoffov‘=false, no check for OUTOFFOV is carried out.\n\n The following is the description for the ‘FLAG‘ column:\n\n Name Bit Description\n ------------ ----- ---------------------------------------------------------------\n OUTOFFOV 0 The source is out of field of view.\n CONFUSED 1 The source may be confused with other source(s).\n BADBAND[1] 2 The energy band used (hence RATE) may be wrong.\n WIDESRC 3 The source is greater than 90 degrees away from the pointing.\n\n Note that the RGS source list set is also used to store the spectrum\n extraction regions created by Task: rgsregions. These become\n invalidated if the attitude is altered; in this case Task:\n rgssources deletes them. See the algorithm (section\n [rgssources:description:algorithm]) for details of the circumstances\n under which this occurs.\n\n The RGS source list table is required to have 1 source whose\n position is taken from the observation proposal, and 1 source whose\n position is equal to the RGS attitude (stored in the dataset header\n keywords ‘RA_PNT‘, ‘DEC_PNT‘ and ‘PA_PNT‘). The ‘LABEL‘ values of\n these two sources are PROPOSAL and ONAXIS respectively.\n\n[1] Since Ver.6.0, this flag is not set by rgssources.\n\n[ABSTRACT] rgssources\nThe task constructs a list of sources that are to be processed by RGS\npipeline.\n[DESCRIPTION] rgssources\n[ATTITUDE PARAMETERS.] rgssources\n[CCF.] rgssources\nTo access this, the user should set SAS_CCF in the usual way.\n[ADDING FURTHER SOURCES.] rgssources\n[FUTURE DEVELOPMENTS] rgssources\n-\n[CAL USAGE] rgssources\n- CAL_setState\n\n- CAL_getMiscellaneousDataValue"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
This is a sentence-transformers model finetuned from nomic-ai/modernbert-embed-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
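To make the last two modules concrete, here is a minimal sketch of what mean pooling followed by normalization does to the per-token vectors. It is written in plain Python on toy 2-dimensional vectors rather than with the library's own classes; the helper names `mean_pool` and `l2_normalize` are illustrative and are not part of sentence-transformers.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    return [x / max(count, 1) for x in total]

def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

# Toy input: three token vectors of dimension 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)          # padding token does not contribute
sentence_vector = l2_normalize(pooled)    # unit-length sentence embedding
print(pooled, sentence_vector)
```

Because the final vector has unit length, the cosine similarity between two such embeddings reduces to a plain dot product.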
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("lochhonest/modernbert-finetuned-for-sas")
# Run inference
sentences = [
'In nearly all cases, how many source and background region spectra are supplied for the RGS?',
'RGS spectral products\n\nThis section describes the spectral data products to be generated from\npointed observations.\n\nSource and background region spectra and a background-subtracted source\nspectrum are supplied for the brightest point sources in the RGS (in\nnearly all cases this is just one source). Spectral response matrices\nare also supplied.\n',
"- This extension gives the good time intervals for the event list.\n\n- There is one extension per CCD in the relevant mode (IMAGING or\n TIMING) during the exposure.\n\n- The following keywords are present:\n\n HDUCLASS= 'OGIP ' / format conforms to OGIP standard\n HDUCLAS1= 'GTI ' / table contains Good Time Intervals\n HDUCLAS2= 'STANDARD' / standard Good Time Interval table\n\n- This extension contains the following columns:\n\n Name Type Description\n ------- ------------- --------------------------------\n START 8-byte REAL seconds (since reference time)\n STOP 8-byte REAL seconds (since reference time)\n",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
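For semantic search, the similarity scores are typically used to rank candidate passages against a query. The sketch below shows that ranking step on toy unit-length vectors standing in for `model.encode(...)` outputs; `rank_passages` is a hypothetical helper written for illustration, not a library function.

```python
def dot(a, b):
    """Dot product; equals cosine similarity when both vectors have unit length."""
    return sum(x * y for x, y in zip(a, b))

def rank_passages(query_vec, passage_vecs):
    """Return (passage_index, score) pairs sorted from most to least similar."""
    scores = [(i, dot(query_vec, p)) for i, p in enumerate(passage_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy unit-length vectors; in practice these would come from model.encode(...).
query = [1.0, 0.0]
passages = [[0.0, 1.0], [0.6, 0.8], [0.8, 0.6]]

ranking = rank_passages(query, passages)
best_index = ranking[0][0]
print(ranking)
```

With real embeddings, the first entry of `sentences` (the question) would play the role of `query` and the remaining entries would be the passages.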
Columns: anchor and positive

| | anchor | positive |
|---|---|---|
| type | string | string |

Samples:

| anchor | positive |
|---|---|
| What is the purpose of the document described in the preface? | Preface |
| What version of the document is described in the preface? | Preface |
| What is the main change in version 4.3 of the document? | Preface |
CachedMultipleNegativesRankingLoss with these parameters:
{
"scale": 20.0,
"similarity_fct": "get_similarity"
}
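As a rough illustration of what this loss optimizes, the sketch below computes a multiple-negatives ranking objective in plain Python: for each anchor, every positive in the batch is scored, the scores are multiplied by the scale of 20.0, and a cross-entropy is taken with the anchor's own positive as the target. It assumes cosine similarity as the similarity function and does not implement the gradient caching that the cached variant adds to reduce memory use; `mnr_loss` is an illustrative name, not the library's implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]
    and all other positives in the batch act as negatives."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        top = max(logits)  # subtract the max for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy batch of two pairs: well aligned vs. deliberately swapped positives.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = mnr_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
swapped = mnr_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)
```

The aligned batch gives a loss close to zero and the swapped batch a loss close to the scale value, which is why larger batches (more in-batch negatives) make the task harder and more informative.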
Columns: anchor and positive

| | anchor | positive |
|---|---|---|
| type | string | string |

Samples:

| anchor | positive |
|---|---|
| What is the purpose of the PPS cross-correlation products? | General cross-correlation products |
| What are the task parameters of rgssources? | rgssources |
| How many stars were used in the U-filter analysis for the G153 pointing to create the distortion map? | OM distortion |
CachedMultipleNegativesRankingLoss with these parameters:
{
"scale": 20.0,
"similarity_fct": "get_similarity"
}
Non-default hyperparameters:

eval_strategy: steps
per_device_train_batch_size: 16
per_device_eval_batch_size: 4
num_train_epochs: 2
lr_scheduler_type: constant
warmup_ratio: 0.1
bf16: True
batch_sampler: no_duplicates

All hyperparameters:

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 4
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 2
max_steps: -1
lr_scheduler_type: constant
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional

| Epoch | Step | Training Loss | Validation Loss |
|---|---|---|---|
| 0.2203 | 50 | 0.2209 | - |
| 0.4405 | 100 | 0.1635 | 0.0402 |
| 0.6608 | 150 | 0.1759 | - |
| 0.8811 | 200 | 0.1674 | 0.1307 |
| 1.1013 | 250 | 0.1134 | - |
| 1.3216 | 300 | 0.0809 | 0.0441 |
| 1.5419 | 350 | 0.0571 | - |
| 1.7621 | 400 | 0.077 | 0.0268 |
| 1.9824 | 450 | 0.0557 | - |
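The epoch and step columns of this log can be combined to estimate how many optimizer steps make up one epoch, and from that roughly how many training pairs were used. The sketch below does this arithmetic under two assumptions that the card does not state outright: training ran on a single device, and each step consumed one batch of 16 pairs (consistent with the listed gradient_accumulation_steps of 1).

```python
# (epoch, step) pairs copied from the training log above.
log = [
    (0.2203, 50), (0.4405, 100), (0.6608, 150),
    (0.8811, 200), (1.1013, 250), (1.3216, 300),
    (1.5419, 350), (1.7621, 400), (1.9824, 450),
]
batch_size = 16  # per_device_train_batch_size; single device is an assumption

# Each row gives an independent estimate of steps per epoch; average and round.
estimates = [step / epoch for epoch, step in log]
steps_per_epoch = round(sum(estimates) / len(estimates))

# Upper bound on the number of training pairs (the last batch may be partial).
approx_samples = steps_per_epoch * batch_size
print(steps_per_epoch, approx_samples)
```

Under these assumptions the estimate is an upper bound, since the final batch of an epoch may contain fewer than 16 pairs when dataloader_drop_last is False.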
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{gao2021scaling,
title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
year={2021},
eprint={2101.06983},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Base model
answerdotai/ModernBERT-base