Weekly Newsletter: Sep 20th

Highlight: Qwen2.5 Family - A Comprehensive Release of AI Models

The Qwen team has unveiled their largest release ever, featuring a wide range of models for various applications.

Key Features:

Qwen2.5: Models ranging from 0.5B to 72B parameters
Qwen2.5-Coder: Specialized models for coding tasks (1.5B and 7B, with 32B announced as forthcoming)
Qwen2.5-Math: Models optimized for mathematical reasoning (1.5B, 7B, 72B)
Qwen2-VL-72B: Open-sourced multimodal model
Over 100 model variants, including quantized versions (GPTQ, AWQ, GGUF)
Competitive performance against proprietary models
Apache 2.0 license for most open-source models

The flagship instruction-tuned model, Qwen2.5-72B-Instruct, is competitive with proprietary models and outperforms most open-source models across the benchmarks reported by the Qwen team.
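To get a feel for why the quantized variants matter across the 0.5B to 72B range, here is a rough back-of-the-envelope sketch of the memory needed for the weights alone. It treats the sizes as nominal parameter counts and uses 4 bits per weight purely as an illustrative assumption; real GPTQ, AWQ, and GGUF files carry extra metadata (scales, zero points), and inference also needs memory for activations and the KV cache, none of which is counted here.

```python
def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB.

    Ignores quantization metadata, activations, and the KV cache.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Nominal sizes from the release; 16-bit vs. an assumed 4-bit quantization.
    for size in (0.5, 7, 72):
        fp16 = weight_memory_gib(size, 16)
        int4 = weight_memory_gib(size, 4)
        print(f"{size:>5}B  16-bit: {fp16:7.1f} GiB   4-bit: {int4:7.1f} GiB")
```

Under these assumptions, a 4-bit file takes about a quarter of the space of the 16-bit weights, which is often the difference between needing several accelerators and fitting on one.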

Models

1. Qwen2.5 Family

14B and 32B models outperform predecessor Qwen2-72B-Instruct
Compact 3B model achieves 68 on MMLU, surpassing Qwen1.5-14B
Qwen2.5-Coder shows competitive performance against larger code LLMs
Qwen2.5-Math supports both English and Chinese, with improved reasoning capabilities

2. Mistral AI’s Pixtral 12B

Natively multimodal model with 400M parameter vision encoder
Supports multiple images in 128k token context window
Achieves 52.5% on MMMU reasoning benchmark
Excels in instruction following, chart understanding, and image-to-code generation

3. NVIDIA’s NVLM 1.0

Frontier-class multimodal LLMs rivaling proprietary models
Novel architecture enhancing training efficiency and reasoning
1-D tile-tagging design for high-resolution image processing
Improved text-only performance after multimodal training
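The idea behind 1-D tile tagging can be sketched in a few lines: a high-resolution image is cut into tiles, the tiles are flattened into a single sequence, and each tile is preceded by a text tag that tells the model where it sits in that sequence. The snippet below is only an illustration of that idea, not NVIDIA's implementation. The function names, the 448-pixel tile size, and the exact tag strings are hypothetical, and steps such as aspect-ratio-aware resizing or a global thumbnail tile are left out.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def tile_boxes(width: int, height: int, tile: int) -> List[Box]:
    """Return row-major tile boxes that cover a width x height image."""
    boxes = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            boxes.append(
                (left, top, min(left + tile, width), min(top + tile, height))
            )
    return boxes


def tag_tiles(boxes: List[Box]) -> List[Tuple[str, Box]]:
    """Pair each tile with a 1-D tag giving its index in the flattened order.

    The tag format is a hypothetical placeholder for illustration.
    """
    return [(f"<tile_{i}>", box) for i, box in enumerate(boxes, start=1)]


if __name__ == "__main__":
    # A hypothetical 896 x 448 image split into 448-pixel tiles.
    for tag, box in tag_tiles(tile_boxes(896, 448, 448)):
        print(tag, box)
```

In a real pipeline, each tag would be inserted into the token sequence just before the visual tokens of its tile, so the language model sees the tile order as plain text.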

Research

1. GRIN: GRadient-INformed MoE

A new approach to Mixture-of-Experts (MoE) training that uses sparse gradient estimation for expert routing. The authors trained a top-2 16×3.8B MoE model that outperforms a 7B dense model and matches a 14B dense model.
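For readers new to MoE, "top-2" means that a router scores all experts for each token and only the two best-scoring experts are run. The sketch below shows plain top-2 routing in pure Python with toy scalar experts. It is a generic illustration with hypothetical function names, and it does not implement GRIN's gradient estimation: the hard top-2 selection here is exactly the non-differentiable step that makes routing difficult to train with ordinary backpropagation.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of router logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top2_route(router_logits: Sequence[float]) -> List[Tuple[int, float]]:
    """Return (expert_index, weight) for the two highest-scoring experts.

    The two weights are renormalized so that they sum to 1.
    """
    probs = softmax(router_logits)
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    norm = probs[top2[0]] + probs[top2[1]]
    return [(i, probs[i] / norm) for i in top2]


def moe_output(
    x: float,
    router_logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
) -> float:
    """Weighted sum of the outputs of the two selected experts only."""
    return sum(w * experts[i](x) for i, w in top2_route(router_logits))


if __name__ == "__main__":
    # 16 toy experts; expert k simply multiplies its input by k.
    experts = [lambda x, k=k: k * x for k in range(16)]
    logits = [0.0] * 16
    logits[3], logits[7] = 3.0, 2.0  # the router prefers experts 3 and 7
    print(top2_route(logits))
    print(moe_output(1.0, logits, experts))
```

Only two of the sixteen experts run per token, which is why such a model can hold far more parameters than it uses for any single forward pass.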

2. Preference Tuning Survey

Comprehensive overview of recent advancements in preference tuning and human feedback integration across language, speech, and vision tasks.

3. Promptriever

The first retrieval model that can be prompted like a language model. It achieves strong performance both on standard retrieval tasks and at following instructions. The authors curated a new instance-level instruction training set of 500k examples from MS MARCO.

Libraries

New high-performance AI inference stack built for production, utilizing Zig, OpenXLA, MLIR, and Bazel.

Good Reads

Talk by Hamel Husain and Emil Sedgh on improving LLM apps beyond MVP:

Systematic approach to consistently improve AI
Avoiding common traps
Resources for further learning
