Performance Guide

Design Philosophy

SiliconGhetto is built for performance-sensitive browser games. Every layer, from the ECS to the rendering pipeline, is designed to minimize overhead and maximize throughput in a WASM environment.

WASM Optimization

Build Configuration

Release builds use aggressive optimization:

# Cargo.toml
[profile.release]
opt-level = "z"      # Optimize for size (smaller download, faster load)
lto = true           # Link-time optimization across all crates
codegen-units = 1    # Single codegen unit for maximum optimization
strip = true         # Strip debug symbols

For development, use the default debug profile for fast compilation and better error messages.

wasm-opt

After wasm-pack builds, run wasm-opt for additional size reduction:

wasm-opt -Oz -o output.wasm input.wasm

Typical size reduction: 10-30% beyond Rust’s built-in optimizations.

WASM Size Budget

| Component | Target | Notes |
|---|---|---|
| Engine core | < 500KB | sg_core + sg_scene + sg_render |
| wgpu runtime | ~800KB | Fixed cost, well-optimized |
| Demo game | < 200KB | Game-specific code |
| Total | < 1.5MB | Before gzip (typically 400-600KB gzipped) |

Rendering Performance

Sprite Batching

The SpriteBatch system is the key to 2D rendering performance:

  • All sprites are collected into a single instance buffer each frame
  • One draw call renders all sprites regardless of count
  • GPU instancing reuses one quad's geometry for every sprite; per-sprite data is read from the instance buffer instead of being submitted through separate draw calls

Performance target: 10,000+ sprites at 60fps on mid-range hardware.
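
The collect-then-draw-once pattern above can be sketched on the CPU side as follows. This is a minimal illustration, not the engine's actual SpriteBatch: the `SpriteInstance` layout and the `InstanceStaging` type are assumptions made for the example.

```rust
use std::ops::Range;

/// Hypothetical per-sprite data; the engine's real instance layout may differ.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C)]
pub struct SpriteInstance {
    pub position: [f32; 2],
    pub size: [f32; 2],
    pub color: [f32; 4],
}

/// Hypothetical CPU-side staging buffer that is refilled every frame.
pub struct InstanceStaging {
    instances: Vec<SpriteInstance>,
}

impl InstanceStaging {
    /// Pre-allocate so that pushing sprites does not allocate in the hot loop.
    pub fn with_capacity(max_sprites: usize) -> Self {
        Self { instances: Vec::with_capacity(max_sprites) }
    }

    /// Drop last frame's sprites but keep the allocation for reuse.
    pub fn begin_frame(&mut self) {
        self.instances.clear();
    }

    /// Queue one sprite for this frame.
    pub fn push(&mut self, instance: SpriteInstance) {
        self.instances.push(instance);
    }

    /// Instance range covering every queued sprite, for one instanced draw.
    pub fn instance_range(&self) -> Range<u32> {
        0..self.instances.len() as u32
    }

    /// Current allocation size, in instances.
    pub fn capacity(&self) -> usize {
        self.instances.capacity()
    }
}
```

After the staging data is uploaded to the GPU instance buffer, the returned range would be passed as the instance range of a single instanced draw (for example, the `instances` argument of wgpu's `RenderPass::draw`).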

Draw Call Minimization

Each draw call has CPU overhead (command encoding, driver validation). Minimize them:

  • Batch sprites by texture/material
  • Use instanced rendering for repeated geometry
  • Avoid per-object state changes
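
Batching by texture can be sketched as a sort followed by a single pass that emits one instance range per distinct texture. The `TextureId`, `Sprite`, and `DrawRange` types below are hypothetical and exist only for this example.

```rust
use std::ops::Range;

/// Hypothetical texture handle.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct TextureId(pub u32);

/// Hypothetical sprite record; only the texture matters for batching.
#[derive(Clone, Copy, Debug)]
pub struct Sprite {
    pub texture: TextureId,
    pub x: f32,
    pub y: f32,
}

/// One draw call: bind `texture`, then draw the instances in `instances`.
#[derive(Debug, PartialEq, Eq)]
pub struct DrawRange {
    pub texture: TextureId,
    pub instances: Range<u32>,
}

/// Sorts sprites by texture and returns one draw range per distinct texture.
pub fn batch_by_texture(sprites: &mut [Sprite]) -> Vec<DrawRange> {
    // The sort is stable, so submission order is preserved within a texture.
    sprites.sort_by_key(|s| s.texture);

    let mut ranges: Vec<DrawRange> = Vec::new();
    for (i, sprite) in sprites.iter().enumerate() {
        let i = i as u32;
        // Extend the current range while the texture stays the same.
        if let Some(last) = ranges.last_mut() {
            if last.texture == sprite.texture {
                last.instances.end = i + 1;
                continue;
            }
        }
        // Texture changed (or first sprite): start a new range.
        ranges.push(DrawRange { texture: sprite.texture, instances: i..i + 1 });
    }
    ranges
}
```

Sorting purely by texture changes draw order across textures, which matters for overlapping alpha-blended sprites; a real renderer would likely sort by layer first and by texture within each layer.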

GPU Memory

  • Buffers are pre-allocated to expected maximum size
  • Instance buffers grow but never shrink during a session
  • Texture atlas packing reduces texture switches (future)
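
A grow-but-never-shrink policy can be expressed as a small capacity function. The power-of-two rounding below is an assumption chosen for the sketch; the text above only states that instance buffers grow and do not shrink.

```rust
/// Returns the capacity (in instances) the GPU buffer should have this frame.
/// Hypothetical policy: keep the current size if it fits, otherwise round the
/// requirement up to the next power of two so reallocations stay rare.
pub fn next_capacity(current: usize, needed: usize) -> usize {
    if needed <= current {
        current // Never shrink during a session.
    } else {
        needed.next_power_of_two()
    }
}
```

Because a wgpu buffer has a fixed size once created, growing in practice means creating a larger buffer and dropping the old one, which is why the policy tries to make that event infrequent.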

ECS Performance

Archetype Storage

bevy_ecs uses archetype-based storage where entities with identical component sets are stored contiguously in memory. This means:

  • Iteration over components is cache-friendly
  • Adding/removing components moves entities between archetypes (avoid doing this frequently)
  • Queries that fetch fewer components read less memory per entity and are generally faster

Query Optimization

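use bevy_ecs::prelude::*;
// The component types used here are assumed to be in scope.

// Fast: fetch only what the loop reads
fn movement(mut query: Query<(&mut Transform2D, &Velocity2D)>) {
    for (mut pos, vel) in &mut query {
        pos.position.x += vel.x;
    }
}

// Slower: fetches three components the loop never reads, and only matches
// entities that have all five. To restrict matches without fetching,
// use a With<T> filter instead.
fn movement_unfocused(mut query: Query<(&mut Transform2D, &Velocity2D, &Sprite, &Health, &Name)>) {
    for (mut pos, vel, _, _, _) in &mut query {
        pos.position.x += vel.x;
    }
}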

Fixed Timestep

The fixed-timestep loop prevents physics from running faster on high-refresh displays:

while game_time.consume_fixed_step() {
    fixed_update();  // Runs once per fixed 1/60 s step, regardless of display rate
}

Every step advances the simulation by the same amount, so gameplay behaves the same at any refresh rate, and 120Hz+ displays do not pay for extra simulation steps.
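
One common way to implement such a loop is a time accumulator. The sketch below is an assumption about how a method like consume_fixed_step could work, not the engine's actual game_time type; `FixedClock`, `advance`, and the five-step backlog cap are hypothetical.

```rust
/// Hypothetical accumulator-based clock; the engine's real type may differ.
pub struct FixedClock {
    step: f64,        // Seconds per fixed step.
    accumulator: f64, // Real time that has not been simulated yet.
    max_backlog: f64, // Upper bound on the accumulator.
}

impl FixedClock {
    pub fn new(hz: f64) -> Self {
        let step = 1.0 / hz;
        // Assumed cap: never queue more than five catch-up steps.
        Self { step, accumulator: 0.0, max_backlog: step * 5.0 }
    }

    /// Call once per rendered frame with the real elapsed time in seconds.
    pub fn advance(&mut self, frame_dt: f64) {
        // Clamp so a long stall (such as a backgrounded tab) cannot cause
        // a burst of catch-up steps when the game resumes.
        self.accumulator = (self.accumulator + frame_dt).min(self.max_backlog);
    }

    /// Returns true and consumes one step's worth of time if enough is banked.
    pub fn consume_fixed_step(&mut self) -> bool {
        if self.accumulator >= self.step {
            self.accumulator -= self.step;
            true
        } else {
            false
        }
    }
}
```

On a 120Hz display roughly every other frame runs zero fixed steps, while on a 30Hz display each frame runs two, so the simulation rate stays the same in both cases.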

Asset Loading

Lazy Loading

Load assets only when needed, not all at startup:

  • Game manifests list required assets
  • Assets are fetched in parallel
  • Progressive loading shows content as it arrives

Texture Compression

KTX2/Basis Universal textures are typically 5-10x smaller than PNG and transcode directly to GPU-native compressed formats:

| Format | Size (1024x1024 RGBA) | GPU Upload |
|---|---|---|
| PNG | ~4MB decoded (file size varies with content) | Decode + upload |
| Basis/ETC1S | ~200KB | Transcode + upload |
| Basis/UASTC | ~800KB | Fast transcode |
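
For reference, the ~4MB figure for a 1024x1024 RGBA image corresponds to its uncompressed size at 4 bytes per pixel. The arithmetic behind the size ratios can be checked with a short sketch; the helper names are made up for this example, and the compressed sizes are the approximate values quoted above.

```rust
/// Bytes for an uncompressed RGBA8 image (4 bytes per pixel).
pub fn rgba8_bytes(width: u64, height: u64) -> u64 {
    width * height * 4
}

/// How many times smaller `compressed_bytes` is than `raw_bytes`.
pub fn size_ratio(raw_bytes: u64, compressed_bytes: u64) -> f64 {
    raw_bytes as f64 / compressed_bytes as f64
}
```

With these helpers, `rgba8_bytes(1024, 1024)` is 4,194,304 bytes, which is about 20x the ~200KB ETC1S figure and about 5x the ~800KB UASTC figure.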

Caching

  • WASM binaries: Cache-Control: public, max-age=31536000, immutable (1 year; requires content-hashed filenames)
  • JS glue code: Cache-Control: public, max-age=31536000, immutable (1 year; requires content-hashed filenames)
  • HTML: Cache-Control: no-cache (always revalidate)
  • Assets: Cache-Control: public, max-age=2592000 (30 days)
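
If the files are served by custom code rather than a static host, the policy above could be encoded as a lookup on the file extension. This is a minimal sketch: the function name is hypothetical, and it assumes plain paths without query strings.

```rust
use std::path::Path;

/// Picks a Cache-Control value from the file extension, mirroring the list above.
pub fn cache_control_for(path: &str) -> &'static str {
    match Path::new(path).extension().and_then(|ext| ext.to_str()) {
        // Long-lived and immutable: only safe if the filename changes with the content.
        Some("wasm") | Some("js") => "public, max-age=31536000, immutable",
        // Always revalidate the entry point so new builds are picked up.
        Some("html") => "no-cache",
        // Everything else is treated as a game asset (30 days).
        _ => "public, max-age=2592000",
    }
}
```

Keeping the HTML on no-cache is what lets the long-lived entries work: the page is revalidated on each visit and can point at newly named WASM and JS files after a deploy.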

Profiling

Browser DevTools

  • Chrome Performance tab: CPU flame chart, GPU timing
  • Firefox Profiler: WASM call stacks, memory allocations
  • Safari Web Inspector: Canvas and GPU profiling

In-Engine Metrics

The PerfStats struct in sg_demo_shared tracks:

  • Frame time (ms)
  • FPS (rolling average)
  • Frame time variance
  • Entity count

The performance overlay displays these in real-time during demo execution.
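
The frame-time metrics listed above can be computed over a sliding window of recent frames. The following is a sketch under that assumption and is not the actual PerfStats struct from sg_demo_shared; `RollingFrameStats` and its methods are hypothetical.

```rust
use std::collections::VecDeque;

/// Hypothetical rolling frame statistics over the last `window` frames.
pub struct RollingFrameStats {
    window: usize,
    frame_times_ms: VecDeque<f64>,
}

impl RollingFrameStats {
    pub fn new(window: usize) -> Self {
        assert!(window > 0, "window must hold at least one frame");
        Self { window, frame_times_ms: VecDeque::with_capacity(window) }
    }

    /// Record one frame time, evicting the oldest sample when the window is full.
    pub fn record(&mut self, frame_time_ms: f64) {
        if self.frame_times_ms.len() == self.window {
            self.frame_times_ms.pop_front();
        }
        self.frame_times_ms.push_back(frame_time_ms);
    }

    /// Mean frame time in milliseconds (0.0 when no frames are recorded).
    pub fn mean_ms(&self) -> f64 {
        if self.frame_times_ms.is_empty() {
            return 0.0;
        }
        self.frame_times_ms.iter().sum::<f64>() / self.frame_times_ms.len() as f64
    }

    /// Rolling-average FPS derived from the mean frame time.
    pub fn fps(&self) -> f64 {
        let mean = self.mean_ms();
        if mean > 0.0 { 1000.0 / mean } else { 0.0 }
    }

    /// Population variance of frame time, in ms squared.
    pub fn variance_ms2(&self) -> f64 {
        if self.frame_times_ms.is_empty() {
            return 0.0;
        }
        let mean = self.mean_ms();
        let sum_sq: f64 = self.frame_times_ms.iter().map(|t| (t - mean).powi(2)).sum();
        sum_sq / self.frame_times_ms.len() as f64
    }
}
```

Variance is worth tracking alongside the average because two runs with the same mean FPS can feel very different if one of them has uneven frame pacing.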

stress-lab Demo

The stress-lab demo is purpose-built for benchmarking:

  • Auto-scales entity count until frame rate drops below target
  • Reports sustained FPS at various entity counts
  • Detects GPU thermal throttling
  • Tests quality tier transitions
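
The auto-scaling behavior in the first bullet can be sketched as a simple ramp: raise the entity count while the sustained frame rate meets the target, and stop at the first count that falls below it. This is an illustration only, not the stress-lab demo's actual logic; `EntityRamp` and the linear step are assumptions.

```rust
/// Hypothetical ramp controller for a stress test.
pub struct EntityRamp {
    count: usize,
    step: usize,
    target_fps: f64,
    finished: bool,
}

impl EntityRamp {
    pub fn new(start: usize, step: usize, target_fps: f64) -> Self {
        Self { count: start, step, target_fps, finished: false }
    }

    /// Feed the sustained FPS measured at the current entity count.
    /// Returns the next count to test, or None once the frame rate
    /// has dropped below the target.
    pub fn report(&mut self, sustained_fps: f64) -> Option<usize> {
        if self.finished {
            return None;
        }
        if sustained_fps < self.target_fps {
            self.finished = true;
            return None;
        }
        self.count += self.step;
        Some(self.count)
    }

    /// Entity count currently (or last) under test.
    pub fn count(&self) -> usize {
        self.count
    }
}
```

Once `report` returns None, `count()` holds the first entity count at which the target frame rate could not be sustained.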

Common Pitfalls

  1. Allocating in the hot loop: Pre-allocate buffers and reuse them each frame
  2. Excessive component changes: Adding/removing components triggers archetype migration
  3. Unbatched draw calls: Always batch sprites; never draw one at a time
  4. Synchronous asset loading: Use async loading to avoid blocking the game loop
  5. Debug builds in the browser: Always test performance with --release builds