DeepSeek V3

DeepSeek V3 is a large, open-weight Mixture-of-Experts (MoE) language model designed for efficient inference and strong reasoning performance. It has 671 billion total parameters but activates only about 37 billion per token (roughly 5.5% of the total), so each token draws on the capacity of a very large model at a fraction of the compute. Pretrained on 14.8 trillion tokens of multilingual text, V3 uses Multi-head Latent Attention (MLA), an attention variant carried over from DeepSeek V2 that compresses the key-value cache to reduce memory use during inference, particularly on long inputs. The model performs well at long-context comprehension, mathematical reasoning, coding, and creative writing. Because its per-token compute is much lower than that of a dense model of comparable size, DeepSeek V3 offers a competitive, accessible foundation for research and large-scale commercial deployment.
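The sparse activation described above can be illustrated with a small sketch. It is a generic top-k softmax router over toy experts, not DeepSeek V3's actual gating function, and the names `route_top_k`, `moe_forward`, and `active_fraction`, along with the eight-expert setup, are hypothetical choices made for illustration. Only the 671 billion and 37 billion figures come from the text.

```python
import math

# Figures taken from the description above (in billions of parameters).
TOTAL_PARAMS_B = 671
ACTIVE_PARAMS_B = 37


def active_fraction(total_b, active_b):
    """Share of the model's parameters that is used for a single token."""
    return active_b / total_b


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts for one token.

    Returns (expert_index, weight) pairs whose weights are renormalized
    to sum to 1. Assumption: plain softmax gating, chosen for simplicity.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_scores, k):
    """Run only the selected experts and mix their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))


if __name__ == "__main__":
    # Eight toy "experts": expert i simply multiplies its input by (i + 1).
    experts = [lambda x, s=s: s * x for s in range(1, 9)]
    scores = [0.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]

    print(route_top_k(scores, k=2))          # experts 1 and 2 are selected
    print(moe_forward(1.0, experts, scores, k=2))
    print(f"{active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B):.1%}")
```

Only two of the eight toy experts run for the token, which is the same idea that lets a 671-billion-parameter model spend about 37 billion parameters' worth of compute per token.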


Science & Technology