🤗 Hugging Face Dataset • 🤖 Hugging Face Model 70B • 🤖 Hugging Face Model 8B
👩‍🚀 Ask questions or discuss ideas on GitHub
📝 Check out SEMIKONG Tech Report
📕 Table of Contents
- 🤖 SEMIKONG is an open-source, industry-specific large language model (LLM) tailored to the semiconductor domain. It aims to address the unique challenges faced by the semiconductor industry, such as the physics and chemistry of semiconductor devices and processes, by incorporating domain-specific knowledge into the model.
- 🙌 Targeted as a bilingual language model and trained on a 3T multilingual corpus, the SEMIKONG series models are among the strongest LLMs worldwide, showing promise in language understanding, commonsense reasoning, reading comprehension, and more. For example,
  - SEMIKONG-8B / 70B-Instruct model .
  - SEMIKONG-8B / 70B model .
- 🙏 (Credits to Llama) Thanks to the Transformer and Llama open-source communities, which reduce the effort required to build from scratch and enable the use of the same tools within the AI ecosystem.
[ Back to top ⬆️ ]
- First industry-specific LLM for the semiconductor domain
- Trained on a comprehensive semiconductor-related text corpus
- Novel pre-training approach leveraging domain-specific knowledge
- Superior performance compared to general-purpose LLMs on industry-relevant benchmarks
- Serves as a valuable foundation for companies to build proprietary models tailored to their needs
SEMIKONG models come in multiple sizes and cater to different use cases. You can also fine-tune SEMIKONG models to meet your specific requirements.
If you want to deploy SEMIKONG models, make sure you meet the software and hardware requirements.
Model | Download |
---|---|
SEMIKONG-70B-Instruct | • 🤗 Hugging Face |
SEMIKONG-8B-Instruct | • 🤗 Hugging Face |
Model | Download |
---|---|
SEMIKONG-70B | • 🤗 Hugging Face |
SEMIKONG-8B | • 🤗 Hugging Face |
- For chat and base models
Model | Intro | Default context window | Pretrained tokens |
---|---|---|---|
70B series models | A powerful version of SEMIKONG suited to more complex tasks | 48k | 25T |
8B series models | An economical version of SEMIKONG able to handle general instruction following and chat about semiconductor manufacturing processes | 48k | 25T |
- For chat models

For chat model limitations, see the explanations below. ⬇️

The released chat model has undergone exclusive training using Supervised Fine-Tuning (SFT). Compared to other standard chat models, our model produces more diverse responses, making it suitable for various downstream tasks, such as creative scenarios. Furthermore, this diversity is expected to enhance the likelihood of generating higher quality responses, which will be advantageous for subsequent Reinforcement Learning (RL) training.

However, this higher diversity might amplify certain existing issues, including:

- Hallucination: This refers to the model generating factually incorrect or nonsensical information. With the model's responses being more varied, there's a higher chance of hallucinations that are not based on accurate data or logical reasoning.
- Non-determinism in re-generation: When attempting to regenerate or sample responses, inconsistencies in the outcomes may occur. The increased diversity can lead to varying results even under similar input conditions.
- Cumulative Error: This occurs when errors in the model's responses compound over time. As the model generates more diverse responses, the likelihood of small inaccuracies building up into larger errors increases, especially in complex tasks like extended reasoning, mathematical problem-solving, etc.

To achieve more coherent and consistent responses, it is advisable to adjust generation configuration parameters such as temperature, top_p, or top_k. These adjustments can help balance creativity and coherence in the model's outputs.
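The generation parameters named in this section (temperature, top_p, top_k) are standard keyword arguments of `generate` in Hugging Face Transformers. The sketch below shows one way to bundle more conservative settings. The helper name and the default values are illustrative assumptions, not settings recommended by the SEMIKONG authors.

```python
def coherent_generation_kwargs(temperature=0.3, top_p=0.85, top_k=40, max_new_tokens=256):
    """Build keyword arguments for `model.generate` that favor consistency.

    The defaults are illustrative assumptions, not tuned values.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive when sampling")
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    return {
        "do_sample": True,            # still sample, but from a narrowed distribution
        "temperature": temperature,   # values below 1 sharpen the token distribution
        "top_p": top_p,               # nucleus sampling: cumulative probability cutoff
        "top_k": top_k,               # consider only the k most likely tokens
        "max_new_tokens": max_new_tokens,
    }
```

With a loaded model, the result can be passed as `model.generate(input_ids, **coherent_generation_kwargs())`. Passing `do_sample=False` instead selects greedy decoding, which removes sampling variance altogether.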
[ Back to top ⬆️ ]
Getting up and running with SEMIKONG models is simple with multiple choices available.
Select one of the following paths to begin your journey with SEMIKONG!
If you prefer to deploy SEMIKONG models locally,
- 🙋♀️ and you have sufficient resources (for example, NVIDIA A100 40GB), you can choose one of the following methods:
If you prefer not to deploy SEMIKONG models locally, you can explore SEMIKONG's capabilities using any of the following options.
If you want to chat with SEMIKONG, you can use one of these online services, which offer a similar user experience:
- SEMIKONG-70B-Instruct (SEMIKONG official on Hugging Face)
[ Back to top ⬆️ ]
This tutorial guides you through every step of running SEMIKONG-8B-Instruct locally on an A100 (40G) and then performing inference.
- Make sure Python 3.10 or a later version is installed.
- If you want to run other SEMIKONG models, see software and hardware requirements.
To set up the environment and install the required packages, execute the following commands.
git clone https://github.com/aitomatic/semikong.git
cd semikong
pip install -r requirements.txt
You can download the weights and tokenizer of SEMIKONG models from the following sources:
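One possible route, assuming the weights are hosted on the Hugging Face Hub under the repository ID that appears in the loading example of this README, is `snapshot_download` from the `huggingface_hub` package. The helper names and the `models` folder below are hypothetical choices made for illustration.

```python
from pathlib import Path


def local_dir_for(repo_id: str, root: str = "models") -> Path:
    """Map a Hub repository ID such as 'org/name' to a local folder."""
    return Path(root) / repo_id.split("/")[-1]


def download_weights(repo_id: str, root: str = "models") -> Path:
    """Download every file of a Hub repository and return the local folder."""
    # Imported lazily so that local_dir_for works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_dir_for(repo_id, root)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target


# Example call (downloads several GB of weights):
# download_weights("pentagoniac/SEMIKONG-8B-Instruct")
```

The returned folder can then be used as `<your-model-path>` in the inference scripts.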
You can perform inference with SEMIKONG chat or base models as below.
- Create a file named quick_start.py and copy the following content to it.

      from transformers import AutoModelForCausalLM, AutoTokenizer

      model_path = '<your-model-path>'

      tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

      # Since transformers 4.35.0, the GPT-Q/AWQ model can be loaded using AutoModelForCausalLM.
      model = AutoModelForCausalLM.from_pretrained(
          model_path,
          device_map="auto",
          torch_dtype='auto'
      ).eval()

      # Prompt content: "hi"
      messages = [
          {"role": "user", "content": "hi"}
      ]

      input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors='pt')
      # Move the prompt to the model's device instead of assuming CUDA.
      output_ids = model.generate(input_ids.to(model.device))
      response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

      # Model response: "Hello! How can I assist you today?"
      print(response)

- Run quick_start.py.

      python quick_start.py

Then you can see an output similar to the one below. 🥳
Hello! How can I assist you today?
- SEMIKONG-8B

  Input

      from transformers import AutoModelForCausalLM, AutoTokenizer

      MODEL_DIR = "pentagoniac/SEMIKONG-8B"

      model = AutoModelForCausalLM.from_pretrained(MODEL_DIR, torch_dtype="auto")
      tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR, use_fast=False)

      input_text = "what is semiconductor ?"
      inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
      outputs = model.generate(**inputs, max_length=256)

      print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Output
Semiconductor is a ....
[ Back to top ⬆️ ]
TBA
[ Back to top ⬆️ ]
You can build a web UI demo for SEMIKONG chat models (note that SEMIKONG base models are not supported in this scenario).
Step 1: Prepare your environment.
Step 2: Download the SEMIKONG model.
Step 3: To start a web service locally, run the following command.
python demo/web_demo.py -c <your-model-path>
You can access the web UI by entering the address provided in the console into your browser.
[ Back to top ⬆️ ]
For the SEMIKONG-8B model, a node with 1 GPU that has more than 16 GB of GPU memory is recommended.
For the SEMIKONG-70B model, the zero-offload technique consumes a lot of CPU memory, so take care to limit the number of GPUs used for fine-tuning. Use CUDA_VISIBLE_DEVICES to limit the number of GPUs (as shown in scripts/run_sft_Yi_34b.sh).
A typical hardware setup for finetuning the 70B model is a node with 8 GPUs (limited to 4 in running by CUDA_VISIBLE_DEVICES=0,1,2,3), each with GPU memory larger than 80GB, and total CPU memory larger than 900GB.
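CUDA_VISIBLE_DEVICES is normally set in the shell that launches training. As a minimal sketch, the same restriction can be applied from Python, provided it happens before any CUDA library is initialized. The helper name `limit_visible_gpus` is hypothetical.

```python
import os


def limit_visible_gpus(gpu_ids):
    """Expose only the given GPU indices to CUDA libraries loaded afterwards."""
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Must run before torch (or any other CUDA library) initializes CUDA.
visible = limit_visible_gpus([0, 1, 2, 3])
print(f"CUDA_VISIBLE_DEVICES={visible}")
```

Inside the process, the four selected GPUs are then renumbered as devices 0 to 3.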
If you want to deploy SEMIKONG models, make sure you meet the software and hardware requirements.
Before using SEMIKONG quantized models, make sure you've installed the correct software listed below.
Model | Software |
---|---|
SEMIKONG 4-bit quantized models | AWQ and CUDA |
SEMIKONG 8-bit quantized models | GPTQ and CUDA |
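A quick way to check whether the software in the table above is present is to test for the corresponding Python modules. This sketch assumes the AWQ and GPTQ requirements are met by the AutoAWQ and AutoGPTQ packages, whose import names are `awq` and `auto_gptq`. Other implementations would need different module names.

```python
import importlib.util


def missing_modules(module_names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Assumed import names: AutoAWQ installs as `awq`, AutoGPTQ as `auto_gptq`.
REQUIRED = {
    "4-bit (AWQ)": ["torch", "awq"],
    "8-bit (GPTQ)": ["torch", "auto_gptq"],
}

for label, modules in REQUIRED.items():
    missing = missing_modules(modules)
    status = "ok" if not missing else "missing: " + ", ".join(missing)
    print(f"{label}: {status}")
```

This only confirms that the packages are importable. Whether a CUDA device is usable can be checked separately with `torch.cuda.is_available()`.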
Before deploying SEMIKONG in your environment, make sure your hardware meets the following requirements.
Model | Minimum VRAM | Recommended GPU Example |
---|---|---|
SEMIKONG-70B-Instruct | 170 GB | 3 x A100 (80 GB), 5 x A100 (40 GB) |
SEMIKONG-8B-Instruct | 16 GB | 1 x RTX 3060 (12 GB), 1 x RTX 4060 (8 GB) |
Model | Minimum VRAM | Recommended GPU Example |
---|---|---|
SEMIKONG-8B | 15 GB | 1 x RTX 3090 (24 GB), 1 x RTX 4090 (24 GB), 1 x A10 (24 GB), 1 x A30 (24 GB) |
SEMIKONG-70B | 200 GB | 4 x A800 (80 GB) |
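As a rough cross-check of the VRAM figures, the memory taken by the weights alone is the parameter count times the bytes per parameter. The sketch below assumes nominal parameter counts of 8 billion and 70 billion and counts 1 GB as 1e9 bytes. Activations, the KV cache, and framework overhead are not included, so real requirements are higher and the table values will not match exactly.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Nominal parameter counts (assumption); 16-bit is fp16/bf16, 8-bit and 4-bit are quantized.
for name, params in [("8B", 8e9), ("70B", 70e9)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: {weight_memory_gb(params, bits):.0f} GB")
```

Under these assumptions the 16-bit weights come to about 16 GB for the 8B model and about 140 GB for the 70B model.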
[ Back to top ⬆️ ]
SEMIKONG has a comprehensive ecosystem, offering a range of tools, services, and models to enrich your experiences and maximize productivity.
The SEMIKONG series models follow the same model architecture as Llama. By choosing SEMIKONG, you can leverage existing tools, libraries, and resources within the Llama ecosystem, eliminating the need to create new tools and enhancing development efficiency.
For example, the SEMIKONG series models are saved in the format of the Llama model. You can directly use LlamaForCausalLM and LlamaTokenizer to load the model. For more information, see Use the chat model.
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("pentagoniac/SEMIKONG-8B-Instruct", use_fast=False)
model = AutoModelForCausalLM.from_pretrained("pentagoniac/SEMIKONG-8B-Instruct", device_map="auto")
[ Back to top ⬆️ ]
💡 Tip
Feel free to create a PR and share the fantastic work you've built using the SEMIKONG series models.
To help others quickly understand your work, it is recommended to use the format of <model-name>: <model-intro> + <model-highlights>.
If you want to get up and running with SEMIKONG in a few minutes, you can use the following services built upon SEMIKONG.
- SEMIKONG-70B-Instruct: you can chat with SEMIKONG using one of the following platforms:
[ Back to top ⬆️ ]
For detailed capabilities of the SEMIKONG series model, see SEMIKONG: Technical Report.
@article{semikong2024,
title={SemiKong: Curating, Training, and Evaluating A Semiconductor Industry-Specific Large Language Model},
author={Christopher Nguyen and William Nguyen and Atsushi Suzuki and Daisuke Oku and Hong An Phan and Sang Dinh and Zooey Nguyen and Anh Ha and Shruti Raghavan and Huy Vo and Thang Nguyen and Lan Nguyen and Yoshikuni Hirayama},
journal={arXiv preprint arXiv:2411.13802},
year={2024}
}
The SEMIKONG-70B-Chat model demonstrates exceptional performance, ranking first among all existing open-source models on benchmarks including MMLU, CMMLU, BBH, GSM8k, and more.
Evaluation methods and challenges. ⬇️
[ Back to top ⬆️ ]
Everyone! 🙌 ✅
The code and weights of the SEMIKONG series models are distributed under the Apache 2.0 license, which means the SEMIKONG series models are free for personal usage, academic purposes, and commercial use.
[ Back to top ⬆️ ]
This project is the result of a collaborative effort involving multiple companies and individuals:
- Tokyo Electron: Atsushi Suzuki, Daisuke Oku
- FPT Software AIC: Huy Vo, Thang Nguyen, Lan Nguyen
- Aitomatic: Daniel Guttierez, Vinh Luong, Christopher Nguyen
- AI Alliance members and researchers
We would like to express our gratitude to the AI Alliance (https://thealliance.ai) for providing the impetus, resources, and platform for this work, and for collaboration in open science. We also extend our thanks to the member organizations of the AI Alliance, their researchers and engineers for their valuable contributions to this study, including:
- Noritaka Yokomori (Tokyo Electron)
- Anthony Annunziata (IBM Research)
- Sean Hughes (ServiceNow)
- Phong Nguyen (FPT Software, AI Center)
Their expertise, insights, and collaborative spirit have been instrumental in advancing our research.
[ Back to top ⬆️ ]
We use data compliance checking algorithms during training to ensure, to the best of our ability, the compliance of the trained model. Due to the complexity of the data and the diversity of language model usage scenarios, we cannot guarantee that the model will generate correct and reasonable output in all scenarios. Please be aware that there is still a risk of the model producing problematic outputs. We will not be responsible for any risks and issues resulting from misuse, misguidance, illegal usage, and related misinformation, as well as any associated data security concerns.
[ Back to top ⬆️ ]
The code and weights of the SEMIKONG series models are distributed under the Apache 2.0 license.
If you create derivative works based on this model, please include the following attribution in your derivative works:
This work is a derivative of [The SEMIKONG Series Model You Base On] by AI Alliance, used under the Apache 2.0 License.
[ Back to top ⬆️ ]