DavidBord committed on
Commit
1878aa0
·
verified ·
1 Parent(s): eaa345b

Upload README.md with huggingface_hub

  1. README.md +87 -76
README.md CHANGED
@@ -1,51 +1,64 @@
1
- # nvidia/CUDA-Autocomplete Overview
2
-
3
- ## Description:
4
- NVIDIA CUDA Autocomplete is a fine-tuned version of Qwen/Qwen2.5-Coder-7B enhanced for CUDA code completion. The model takes as input two strings of code context: the prefix (code before the cursor) and the suffix (code after the cursor), and outputs a single line of code that logically continues the prefix. By analyzing the surrounding code structure, variable names, and CUDA-specific patterns, the model predicts the most likely next line of code, enabling intelligent autocomplete functionality for general programming and CUDA development in the Nsight Copilot extension for VSCode and Cursor.
5
-
6
- _This model is ready for commercial/non-commercial use._
7
-
8
-
9
- ### License/Terms of Use:
10
- [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/)
11
-
12
- ### Deployment Geography:
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
13
  Global
14
 
15
- ### Use Case:
16
- This model is intended to be used for code completion in the Nsight Copilot extension for VSCode / Cursor.
17
-
18
 
19
- ### Release Date:
20
- Huggingface : 05/28/2026
21
 
22
- ## Reference(s):
23
  [Qwen2.5-Coder paper](https://arxiv.org/abs/2409.12186)
24
- [Qwen2.5-Coder blog](https://qwenlm.github.io/blog/qwen2.5-coder-family/)
25
- [Qwen2.5-Coder GitHub repository](https://github.com/QwenLM/Qwen2.5-Coder)
26
 
27
- ## Model Architecture:
28
- **Architecture Type:** Transformer
29
- **Network Architecture:** Qwen2ForCausalLM
30
- **This model was developed based on Qwen/Qwen2.5-Coder-7B.**
31
- **Number of model parameters:** 7B (7*10^9)
32
 
33
 
34
- ## Computational Load (Internal Only: For NVIDIA Models Only)
35
- **Cumulative Compute:** 1.23 * 10^20 FLOPS
36
- **Estimated Energy and Emissions for Model Training:** 150.52 kWh
37
-
38
- ## Input:
39
  **Input Type(s):** Code
40
  **Input Format(s):** String of code (meant for prefix code and suffix code)
41
  **Input Parameters:** One-Dimensional (1D)
42
  **Other Properties Related to Input:**
43
  - **Context Window:** The model processes sequential code text with prefix and suffix context
44
  - **Encoding:** UTF-8 text encoding
45
- - **Input Structure:** Fill-in-the-middle (FIM) format with prefix and suffix tokens
46
-
47
 
48
- ## Output:
 
49
  **Output Type(s):** Code
50
  **Output Format:** String
51
  **Output Parameters:** One-Dimensional (1D)
@@ -53,60 +66,58 @@ Huggingface : 05/28/2026
53
  - **Output Length:** Single line of code completion
54
  - **Generation Method:** Autoregressive token-by-token generation
55
  - **Encoding:** UTF-8 text encoding
56
- - **Output Structure:** Sequential code text that continues from the input prefix
57
-
58
 
59
  Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
60
 
61
- ## Software Integration:
62
- **Runtime Engine(s):** vLLM
63
- **Supported Hardware Microarchitecture Compatibility:**
64
- * H100
65
- * DGX Spark
66
- **[Supported] Operating System(s):** Linux
 
 
67
 
68
- The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
69
 
 
 
70
 
71
- ## Model Version(s):
72
- v0.3
73
-
74
- ## Training, Testing, and Evaluation Datasets:
75
 
76
 
77
- ## Training Dataset:
78
 
79
- **Link:** Subset of
80
- 1) https://huggingface.co/datasets/bigcode/the-stack-v2
81
- 2) Synthetically generated CUDA data using OSS models like GPT-OSS 120B
82
- **Data Modality:** Text
83
- **Text Training Data Size:** ~700000 samples
84
- **Data Collection Method by dataset:** Hybrid: Automated, Synthetic
85
- **Labeling Method by dataset:** Not Applicable
86
- **Properties (Quantity, Dataset Descriptions, Sensor(s)):** ~700,000 samples. Text modality (source code). Content includes open-source CUDA and general programming code collected from permissive-licensed repositories, as well as machine-generated synthetic CUDA code produced by OSS models. Primarily English-language code with CUDA-specific constructs and APIs. No sensor data involved.
87
 
88
- ### Testing Dataset:
89
- **Link:** NVIDIA Internal Data.
90
- (Internal Only: Not To Be Published)
91
- **Benchmark Score:** ROUGE-L score on cuda-samples dataset is 77.45 %.
92
- **Data Collection Method by dataset:** Automated
93
- **Labeling Method by dataset:** Not Applicable
94
- **Properties (Quantity, Dataset Descriptions, Sensor(s)):** 2,156 samples. Text modality (source code). Content consists of internal proprietary CUDA and HPC library code (e.g., cuDNN, cuda-hpc) parsed from internal GitLab repositories. Code is CUDA-specific with domain-specific APIs and patterns. No sensor data involved.
95
 
96
- ### Evaluation Dataset:
97
- **Link:** Subset of https://huggingface.co/datasets/bigcode/the-stack-v2
98
- **Data Collection Method by dataset:** Automated
99
- **Labeling Method by dataset:** Not Applicable
100
- **Properties (Quantity, Dataset Descriptions, Sensor(s)):** ~33,000 samples. Each sample corresponds to a single source code file. Text modality (source code). Content includes open-source code collected from permissive-licensed repositories. CUDA and general programming code in English. No sensor data involved.
101
 
 
 
 
 
 
102
 
103
- ## Inference:
 
104
  **Acceleration Engine:** vLLM
105
- **Test Hardware:**
106
- * H100
107
- * DGX Spark
108
-
109
- ## Ethical Considerations:
110
- NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
111
- For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards.
112
- Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://app.intigriti.com/programs/nvidia/nvidiavdp/detail).
 
 
1
+ ---
2
+ license: other
3
+ license_name: nvidia-open-model-license
4
+ license_link: >-
5
+ https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
6
+ library_name: transformers
7
+ pipeline_tag: text-generation
8
+ tags:
9
+ - code
10
+ - cuda
11
+ - fill-in-the-middle
12
+ - nvidia
13
+ - pytorch
14
+ datasets:
15
+ - bigcode/the-stack-v2
16
+ base_model: Qwen/Qwen2.5-Coder-7B
17
+ ---
18
+ ## Model Overview
19
+ NVIDIA CUDA Autocomplete is a fine-tuned version of Qwen/Qwen2.5-Coder-7B enhanced for CUDA code completion. The model takes as input two strings of code context: the prefix (code before the cursor) and the suffix (code after the cursor), and outputs several lines of code that logically continue the prefix. By analyzing the surrounding code structure, variable names, and CUDA-specific patterns, the model predicts the most likely continuation, enabling intelligent autocomplete functionality for general programming and CUDA development in the Nsight Copilot extension for VSCode and Cursor.
20
+
21
+ _This model is ready for commercial/non-commercial use._
22
+
23
+
24
+ ### License/Terms of Use
25
+ Use of this model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
26
+
27
+ Additional Information: For Qwen2.5-Coder-7B, see the [Apache License, Version 2.0](https://huggingface.co/Qwen/Qwen2.5-Coder-7B/blob/main/LICENSE).
28
+
29
+ ### Deployment Geography
30
  Global
31
 
32
+ ### Use Case
33
+ This model is intended to be used for code completion in the Nsight Copilot extension for VSCode / Cursor.
34
+
35
 
36
+ ### Release Date
37
+ Hugging Face: 06/09/2026 via [https://huggingface.co/nvidia/CUDA-Autocomplete](https://huggingface.co/nvidia/CUDA-Autocomplete)
38
 
39
+ ## Reference(s)
40
  [Qwen2.5-Coder paper](https://arxiv.org/abs/2409.12186)
41
+ [Qwen2.5-Coder blog](https://qwenlm.github.io/blog/qwen2.5-coder-family/)
42
+ [Qwen2.5-Coder GitHub repository](https://github.com/QwenLM/Qwen2.5-Coder)
43
 
44
+ ## Model Architecture
45
+ **Architecture Type:** Transformer
46
+ **Network Architecture:** Qwen2ForCausalLM
47
+ **This model was developed based on Qwen/Qwen2.5-Coder-7B.**
48
+ **Number of model parameters:** 7B (7*10^9)
49
 
50
 
51
+ ## Input
 
 
 
 
52
  **Input Type(s):** Code
53
  **Input Format(s):** String of code (meant for prefix code and suffix code)
54
  **Input Parameters:** One-Dimensional (1D)
55
  **Other Properties Related to Input:**
56
  - **Context Window:** The model processes sequential code text with prefix and suffix context
57
  - **Encoding:** UTF-8 text encoding
58
+ - **Input Structure:** Fill-in-the-middle (FIM) format with prefix and suffix tokens
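
The fill-in-the-middle input structure described above can be sketched as follows. This is a minimal illustration, not the extension's actual prompt builder: it assumes the fine-tuned model keeps the FIM special tokens of its Qwen2.5-Coder base (`<|fim_prefix|>`, `<|fim_suffix|>`, `<|fim_middle|>`), which this README does not confirm, and `build_fim_prompt` is a hypothetical helper name.

```python
# Assumption: these token strings follow the Qwen2.5-Coder base model's FIM
# format; the fine-tuned model may use a different template.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Join the code before and after the cursor into one FIM prompt string.

    Hypothetical helper: the model is expected to generate the missing
    middle part after the final FIM_MIDDLE token.
    """
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# Code before the cursor (prefix) and after the cursor (suffix).
prefix = (
    "__global__ void add(const float* a, const float* b, float* c, int n) {\n"
    "    int i = "
)
suffix = "\n    if (i < n) c[i] = a[i] + b[i];\n}\n"

prompt = build_fim_prompt(prefix, suffix)
```

Both strings are passed through unchanged, so the caller decides how much surrounding code to include on each side of the cursor.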
 
59
 
60
+
61
+ ## Output
62
  **Output Type(s):** Code
63
  **Output Format:** String
64
  **Output Parameters:** One-Dimensional (1D)
 
66
  - **Output Length:** Single line of code completion
67
  - **Generation Method:** Autoregressive token-by-token generation
68
  - **Encoding:** UTF-8 text encoding
69
+ - **Output Structure:** Sequential code text that continues from the input prefix
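
Where a client wants to enforce the single-line output length listed above, one option is to cut the generated text at the first line break. The sketch below is an assumption about client-side post-processing, not part of the model or the Nsight Copilot extension; `first_line` is a hypothetical helper name.

```python
def first_line(completion: str) -> str:
    """Return the text up to the first line break of a raw completion.

    Hypothetical post-processing step: everything after the first newline
    is discarded, and a trailing carriage return is stripped.
    """
    return completion.split("\n", 1)[0].rstrip("\r")


# Example: a raw completion that runs past the end of the current line.
raw = "blockIdx.x * blockDim.x + threadIdx.x;\n    int j = 0;"
suggestion = first_line(raw)
```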
70
+
71
 
72
  Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
73
 
74
+ ## Software Integration
75
+ **Runtime Engine(s):** vLLM
76
+ **Supported Hardware Microarchitecture Compatibility:**
77
+ * H100
78
+ * DGX Spark
79
+ **[Supported] Operating System(s):** Linux
80
+
81
+ The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
82
 
 
83
 
84
+ ## Model Version(s)
85
+ v0.3.0
86
 
87
+ ## Training, Testing, and Evaluation Datasets
 
 
 
88
 
89
 
90
+ ### Training Dataset
91
 
92
+ * **Source:** Subset of [bigcode/the-stack-v2](https://huggingface.co/datasets/bigcode/the-stack-v2) and synthetically generated CUDA data produced with OSS models such as GPT-OSS 120B
93
+ * **Data Modality:** Text
94
+ * **Text Training Data Size:** ~700,000 samples
95
+ * **Data Collection Method by dataset:** Hybrid: Automated, Synthetic
96
+ * **Labeling Method by dataset:** Not Applicable
97
+ * **Properties (Quantity, Dataset Descriptions, Sensor(s)):** ~700,000 samples. Text modality (source code). Content includes open-source CUDA and general programming code collected from permissive-licensed repositories, as well as machine-generated synthetic CUDA code produced by OSS models. Primarily English-language code with CUDA-specific constructs and APIs. No sensor data involved.
 
 
98
 
99
+ ### Testing Dataset
100
+ * **Source:** NVIDIA Internal Data
101
+ * **Data Collection Method by dataset:** Automated
102
+ * **Labeling Method by dataset:** Not Applicable
103
+ * **Properties (Quantity, Dataset Descriptions, Sensor(s)):** 2,156 samples. Text modality (source code). Content consists of internal proprietary CUDA and HPC library code (e.g., cuDNN, cuda-hpc) parsed from internal GitLab repositories. Code is CUDA-specific with domain-specific APIs and patterns. No sensor data involved.
 
 
104
 
 
 
 
 
 
105
 
106
+ ### Evaluation Dataset
107
+ * **Source:** Subset of [bigcode/the-stack-v2](https://huggingface.co/datasets/bigcode/the-stack-v2)
108
+ * **Data Collection Method by dataset:** Automated
109
+ * **Labeling Method by dataset:** Not Applicable
110
+ * **Properties (Quantity, Dataset Descriptions, Sensor(s)):** ~33,000 samples. Each sample corresponds to a single source code file. Text modality (source code). Content includes open-source code collected from permissive-licensed repositories. CUDA and general programming code in English. No sensor data involved.
111
 
112
+
113
+ ## Inference
114
  **Acceleration Engine:** vLLM
115
+
116
+ **Test Hardware:**
117
+ * H100
118
+ * DGX Spark
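
Since vLLM is the listed acceleration engine, a completion request could look like the sketch below. It assumes the model is served through vLLM's OpenAI-compatible server and queried on its `/v1/completions` route. The sampling values, the newline stop sequence, and the helper name `build_completion_request` are illustrative assumptions, not settings published with this model. The example prompt also assumes the Qwen2.5-Coder FIM tokens.

```python
import json


def build_completion_request(prompt: str, model: str = "nvidia/CUDA-Autocomplete") -> dict:
    """Build the JSON body for a POST to an OpenAI-compatible /v1/completions route.

    Hypothetical helper: all sampling values below are illustrative.
    """
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": 64,    # assumption: a short budget for autocomplete
        "temperature": 0.0,  # greedy decoding for repeatable suggestions
        "stop": ["\n"],      # assumption: stop at the end of the current line
    }


# Assumption: FIM tokens follow the Qwen2.5-Coder base model's format.
payload = build_completion_request("<|fim_prefix|>int i = <|fim_suffix|>;<|fim_middle|>")
body = json.dumps(payload)
```

The resulting `body` string can be sent with any HTTP client to the server's address. Dropping the `stop` entry allows multi-line completions.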
119
+
120
+ ## Ethical Considerations
121
+ NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
122
+ For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards.
123
+ Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://app.intigriti.com/programs/nvidia/nvidiavdp/detail).