AI Weekly Newsletter (New open models)

Highlights

  • OpenAI experiences significant leadership changes
  • Meta releases Llama 3 models with impressive benchmarks
  • Google unveils updated Gemini models with performance improvements and price reductions
  • AlphaChip transforms computer chip design using AI

News

OpenAI Leadership Changes

  • Top executives leaving OpenAI:
    • CTO Mira Murati
    • Chief Research Officer Bob McGrew
    • Research Leader Barret Zoph

Llama 3.2 Release

  • Meta releases new Llama 3.2 models:
    • 11B model comparable to, or slightly better than, Claude Haiku
    • 90B model comparable to, or slightly better than, GPT-4o-mini
    • New 1B and 3B models with 128k context, competing with Gemma 2 and Phi 3.5
    • Tight on-device collaborations with Qualcomm, MediaTek, and Arm
    • MMMU Benchmark Leaderboard

Llama 3.2 benchmark results (figures)

Molmo Models by AI2

  • Open-source multimodal models outperforming Llama 3.2 in vision tasks
  • 7B and 72B model sizes (plus 7B MoE with 1B active params)
  • Benchmarks above GPT-4V, Flash, etc.
  • Human preference ratings for the 72B model on par with top API models
  • Molmo Blog
  • Molmo Models on Hugging Face
  • Molmo Paper

Molmo comparison (figure)

PixMo Dataset

  • High-quality dataset for captioning and supervised fine-tuning
  • Created without using VLMs to generate data
  • Includes dense captioning and supervised fine-tuning data
  • Novel data collection methodology using spoken descriptions

Google’s Gemini 1.5 Updates

  • Release of Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002
  • Significant improvements:
    • 50% reduced price on 1.5 Pro for prompts <128K
    • 2x higher rate limits on 1.5 Flash and ~3x higher on 1.5 Pro
    • 2x faster output and 3x lower latency
    • Updated default filter settings
  • Performance improvements across various benchmarks
  • Google AI Studio
  • Gemini API Documentation
  • Cost vs. intelligence comparison: Gemini Cost vs Intelligence

AlphaChip

  • Google DeepMind's AlphaChip uses reinforcement learning to design chip layouts
  • Its AI-generated layouts have been used in recent generations of Google's TPUs

Research

Elo Uncovered: Robustness and Best Practices in Language Model Evaluation

  • Study on the suitability of the Elo rating system for evaluating Large Language Models
  • Explores reliability and transitivity axioms in LLM evaluation
  • Findings offer guidelines for enhancing the reliability of LLM evaluation methods
  • Paper Link
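For context, the Elo system the paper scrutinizes updates two ratings after each pairwise comparison. The sketch below shows the standard update rule; the K-factor of 32 and starting rating of 1000 are conventional illustrative choices, not values from the paper. Note that the result depends on match order, which is one reason robustness studies like this one are needed.

```python
def elo_expected(r_a: float, r_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Return updated ratings after one match (score_a: 1 win, 0.5 draw, 0 loss)."""
    e_a = elo_expected(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b + k * ((1 - score_a) - (1 - e_a))

# Two models start equal; model A wins one head-to-head comparison.
ra, rb = 1000.0, 1000.0
ra, rb = elo_update(ra, rb, 1.0)
print(round(ra), round(rb))  # 1016 984
```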

Enhancing Structured-Data Retrieval with GraphRAG

  • Introduces Structured-GraphRAG framework for improving information retrieval across structured datasets
  • Utilizes multiple knowledge graphs to capture complex relationships between entities
  • Case study on soccer data demonstrates improved query processing efficiency and reduced response times
  • Paper Link
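The core idea can be sketched without the framework itself: store structured records as (subject, relation, object) triples in a knowledge graph, then answer a query by retrieving the subgraph around the entities it mentions and handing those facts to an LLM as context. The class, helper names, and soccer facts below are illustrative assumptions, not the paper's actual implementation.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Toy knowledge graph: subject -> list of (relation, object) edges."""
    def __init__(self):
        self.edges = defaultdict(list)

    def add(self, subj: str, rel: str, obj: str) -> None:
        self.edges[subj].append((rel, obj))

    def neighbors(self, entity: str):
        """All facts directly attached to an entity."""
        return [(entity, r, o) for r, o in self.edges[entity]]

def retrieve(graph: KnowledgeGraph, query: str):
    """Pull facts for every known entity mentioned in the query; in a
    GraphRAG pipeline these triples would become LLM context."""
    facts = []
    for entity in graph.edges:
        if entity.lower() in query.lower():
            facts.extend(graph.neighbors(entity))
    return facts

kg = KnowledgeGraph()
kg.add("Messi", "plays_for", "Inter Miami")
kg.add("Messi", "position", "forward")
kg.add("Inter Miami", "league", "MLS")

print(retrieve(kg, "Which league does Messi play in?"))
```

Graph retrieval narrows the context to facts connected to the query's entities, which is how the framework reduces query processing time relative to scanning whole tables or documents.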

HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models

  • Comprehensive benchmark for evaluating LLMs’ performance in generating long text
  • Categorizes tasks based on Bloom’s Taxonomy
  • Proposes HelloEval, a human-aligned evaluation method
  • Experiments across 30 mainstream LLMs reveal limitations in long text generation capabilities
  • GitHub Repository

Libraries

crawl4ai

  • GitHub repository for easy web scraping with LLM-friendly output formats
  • Features:
    • Supports crawling multiple URLs simultaneously
    • Extracts media tags, links, and metadata
    • Custom hooks for authentication, headers, and page modifications
  • GitHub Repository
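The extraction step crawl4ai automates can be illustrated with the standard library alone. This sketch parses links, media tags, and metadata out of a static HTML string; it is not crawl4ai's API (the library additionally handles fetching, concurrent crawling, and LLM-friendly markdown output).

```python
from html.parser import HTMLParser

class LinkMediaExtractor(HTMLParser):
    """Collect links, media sources, and <meta> metadata from HTML."""
    def __init__(self):
        super().__init__()
        self.links, self.media, self.meta = [], [], {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag in ("img", "video", "audio") and "src" in attrs:
            self.media.append(attrs["src"])
        elif tag == "meta" and "name" in attrs and "content" in attrs:
            self.meta[attrs["name"]] = attrs["content"]

html = """
<html><head><meta name="description" content="demo page"></head>
<body><a href="https://example.com">link</a>
<img src="logo.png"></body></html>
"""
extractor = LinkMediaExtractor()
extractor.feed(html)
print(extractor.links, extractor.media, extractor.meta)
# ['https://example.com'] ['logo.png'] {'description': 'demo page'}
```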

Good Reads