The Nemotron-4 340B series includes Base, Instruct, and Reward model weights. The Base model is pre-trained on a corpus of 9 trillion tokens covering more than 50 natural languages and more than 40 programming languages.
The Instruct model has undergone three alignment methods: supervised fine-tuning (SFT), Direct Preference Optimization (DPO), and Reward-aware Preference Optimization (RPO).
Throughout the alignment process, only about 20K human-annotated examples were used; a data generation pipeline synthesized over 98% of the data used for supervised fine-tuning and preference fine-tuning (DPO and RPO).
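For context, DPO trains the policy directly on preference pairs, using the log-probability ratio against a frozen reference model as an implicit reward; RPO, as described in the Nemotron-4 340B report, additionally takes the reward model's score gap into account. The PyTorch sketch below shows only the textbook DPO loss, not NVIDIA's implementation; the per-sequence log-probability inputs and the beta value are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Textbook DPO objective (a sketch, not Nemotron's code).

    Each argument is the summed log-probability of a full response under
    the policy or the frozen reference model; beta scales the implicit reward.
    """
    # Implicit rewards: how much more likely the policy makes each response
    # relative to the reference model.
    chosen_reward = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between the chosen and rejected responses.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```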
The models are released under the NVIDIA Open Model License, which permits commercial use and the free creation and distribution of derivative models.
Nemotron-4-340B-Instruct is a standard decoder-only Transformer, trained with a sequence length of 4,096 tokens and using Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE).
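To make those two architectural choices concrete, here is a minimal, self-contained PyTorch sketch of grouped-query attention with rotary embeddings applied to queries and keys. The tensor layout, the rotation base of 10000, and the omission of projections, dropout, and KV caching are simplifying assumptions; this illustrates the techniques, not Nemotron's actual code.

```python
import math
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotary Position Embeddings: rotate each channel pair by a
    position-dependent angle so attention scores encode relative position.
    x: (batch, heads, seq, head_dim), head_dim even."""
    seq, dim = x.shape[-2], x.shape[-1]
    inv_freq = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    angles = torch.arange(seq, dtype=torch.float32)[:, None] * inv_freq
    cos, sin = angles.cos(), angles.sin()          # each (seq, dim/2)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def grouped_query_attention(q: torch.Tensor, k: torch.Tensor,
                            v: torch.Tensor) -> torch.Tensor:
    """GQA: query heads are split into groups that share key/value heads,
    shrinking the KV cache relative to full multi-head attention.
    q: (batch, n_q_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim)."""
    group = q.shape[1] // k.shape[1]           # query heads per KV head
    k = k.repeat_interleave(group, dim=1)      # share each KV head across its group
    v = v.repeat_interleave(group, dim=1)
    q, k = apply_rope(q), apply_rope(k)        # RoPE on queries and keys only
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    seq = q.shape[-2]
    causal = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(causal, float("-inf"))  # decoder-only: no lookahead
    return torch.softmax(scores, dim=-1) @ v
```

As a hypothetical illustration of the grouping, a model with 96 query heads sharing 8 KV heads would have each KV head serve a group of 12 query heads, cutting KV-cache memory twelvefold versus standard multi-head attention.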
Training was conducted on a cluster of 768 DGX H100 nodes, each equipped with 8 H100 80GB SXM5 GPUs (6,144 GPUs in total).