Performance Guide

Design Philosophy

SiliconGhetto is built for performance-sensitive browser games. Every layer — from the ECS to the rendering pipeline — is designed to minimize overhead and maximize throughput in a WASM environment.

WASM Optimization

Build Configuration

Release builds optimize aggressively for binary size:

# Cargo.toml
[profile.release]
opt-level = "z"      # Optimize for size (smaller download, faster load)
lto = true           # Link-time optimization across all crates
codegen-units = 1    # Single codegen unit for maximum optimization
strip = true         # Strip symbols and debug info from the binary

For development, use the default debug profile for fast compilation and better error messages.

wasm-opt

After wasm-pack builds, run wasm-opt for additional size reduction:

wasm-opt -Oz -o output.wasm input.wasm

Typical size reduction: 10-30% beyond Rust’s built-in optimizations.

WASM Size Budget

Component	Target	Notes
Engine core	< 500KB	sg_core + sg_scene + sg_render
wgpu runtime	~800KB	Fixed cost, well-optimized
Demo game	< 200KB	Game-specific code
Total	< 1.5MB	Before gzip (typically 400-600KB gzipped)

Rendering Performance

Sprite Batching

The SpriteBatch system is the key to 2D rendering performance:

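All sprites are collected into a single instance buffer each frame
One instanced draw call renders every sprite in that buffer, regardless of count
With GPU instancing, every sprite is an instance of one shared quad: the vertex shader reads each sprite's data from the instance buffer, so no geometry is rebuilt or resubmitted per sprite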

Performance target: 10,000+ sprites at 60fps on mid-range hardware.
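
The collect-then-draw-once pattern can be sketched without any GPU API. This is a minimal illustration, not the engine's actual SpriteBatch: `SpriteInstance`, `SpriteBatchSketch`, and the field layout are hypothetical names chosen for the example. A real implementation would upload the collected instances to a GPU instance buffer and issue one instanced draw of a shared quad.

```rust
/// Per-sprite data that would live in the GPU instance buffer.
/// The field layout is an assumption made for this sketch.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SpriteInstance {
    pub position: [f32; 2],
    pub size: [f32; 2],
    pub uv_rect: [f32; 4], // x, y, width, height in texture space
    pub color: [f32; 4],
}

/// Collects sprites on the CPU and reuses one allocation across frames.
pub struct SpriteBatchSketch {
    instances: Vec<SpriteInstance>,
}

impl SpriteBatchSketch {
    /// Pre-allocates room for the expected maximum sprite count.
    pub fn with_capacity(max_sprites: usize) -> Self {
        Self { instances: Vec::with_capacity(max_sprites) }
    }

    /// Drops last frame's sprites but keeps the allocation.
    pub fn begin_frame(&mut self) {
        self.instances.clear();
    }

    /// Queues one sprite for this frame.
    pub fn push(&mut self, sprite: SpriteInstance) {
        self.instances.push(sprite);
    }

    /// Instance count to pass to a single instanced draw call.
    pub fn instance_count(&self) -> u32 {
        self.instances.len() as u32
    }

    /// Current CPU-side capacity, in sprites.
    pub fn capacity(&self) -> usize {
        self.instances.capacity()
    }
}
```

Because `begin_frame` clears the vector instead of replacing it, steady-state frames perform no heap allocation while collecting sprites.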

Draw Call Minimization

Each draw call carries CPU overhead (command encoding, driver validation). To keep the number of draw calls low:

Batch sprites by texture/material
Use instanced rendering for repeated geometry
Avoid per-object state changes
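
Batching by texture can be sketched as a sort followed by a single pass that merges runs of equal keys. The types below (`SpriteRef`, `DrawBatch`) and the function `build_batches` are hypothetical and only illustrate the grouping step; they are not part of the engine's API.

```rust
/// A sprite tagged with the texture it samples. `texture_id` is a
/// hypothetical handle; any ordered key would work.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SpriteRef {
    pub texture_id: u32,
    pub sprite_index: u32,
}

/// One draw call: a texture plus a contiguous range of sorted sprites.
#[derive(Debug, PartialEq, Eq)]
pub struct DrawBatch {
    pub texture_id: u32,
    pub first: usize,
    pub count: usize,
}

/// Sorts sprites by texture, then emits one batch per run of equal textures.
pub fn build_batches(sprites: &mut [SpriteRef]) -> Vec<DrawBatch> {
    // The sort is stable, so submission order is kept within each texture.
    sprites.sort_by_key(|s| s.texture_id);

    let mut batches: Vec<DrawBatch> = Vec::new();
    for (i, sprite) in sprites.iter().enumerate() {
        match batches.last_mut() {
            // Same texture as the previous sprite: extend the current batch.
            Some(batch) if batch.texture_id == sprite.texture_id => batch.count += 1,
            // First sprite, or a texture change: start a new batch.
            _ => batches.push(DrawBatch {
                texture_id: sprite.texture_id,
                first: i,
                count: 1,
            }),
        }
    }
    batches
}
```

The number of draw calls then equals the number of distinct textures rather than the number of sprites. Note that sorting by texture changes draw order across textures, so a renderer that depends on back-to-front blending would need to sort by layer first and by texture within each layer.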

GPU Memory

Buffers are pre-allocated to expected maximum size
Instance buffers grow but never shrink during a session
Texture atlas packing will reduce texture switches (planned, not yet implemented)
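
The "grow but never shrink" policy can be expressed as a small capacity tracker. `GrowOnlyCapacity` is a hypothetical helper written for this guide; it only decides when a reallocation is needed, and the power-of-two rounding is an assumption, not a documented engine behavior.

```rust
/// Tracks the capacity (in instances) of a GPU buffer that may grow
/// during a session but is never shrunk.
pub struct GrowOnlyCapacity {
    capacity: usize,
}

impl GrowOnlyCapacity {
    /// Starts at the expected maximum so typical sessions never reallocate.
    pub fn new(expected_max: usize) -> Self {
        Self { capacity: expected_max.max(1) }
    }

    /// Returns `Some(new_capacity)` when the buffer must be reallocated,
    /// or `None` when the existing buffer is already large enough.
    pub fn reserve(&mut self, needed: usize) -> Option<usize> {
        if needed <= self.capacity {
            return None; // never shrink, even when `needed` is small
        }
        // Round up to a power of two so repeated growth stays rare.
        self.capacity = needed.next_power_of_two();
        Some(self.capacity)
    }

    /// Current capacity in instances.
    pub fn capacity(&self) -> usize {
        self.capacity
    }
}
```

The caller would recreate the GPU buffer only when `reserve` returns `Some`, which trades a higher peak memory footprint for fewer reallocations in the frame loop.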

ECS Performance

Archetype Storage

bevy_ecs uses archetype-based storage where entities with identical component sets are stored contiguously in memory. This means:

Iteration over components is cache-friendly
Adding/removing components moves entities between archetypes (avoid doing this frequently)
Queries that fetch fewer components read less memory per entity, so request only what a system actually uses

Query Optimization

use bevy_ecs::prelude::*;

// Fast: fetch only the components the system reads or writes
fn movement(mut query: Query<(&mut Transform2D, &Velocity2D)>) {
    for (mut pos, vel) in &mut query {
        pos.position.x += vel.x;
    }
}

// Slower: fetches three components it never uses. It also changes behavior,
// because entities lacking Sprite, Health, or Name no longer match the query.
fn movement_overfetch(
    mut query: Query<(&mut Transform2D, &Velocity2D, &Sprite, &Health, &Name)>,
) {
    for (mut pos, vel, _, _, _) in &mut query {
        pos.position.x += vel.x;
    }
}

Fixed Timestep

The fixed-timestep loop prevents physics from running faster on high-refresh displays:

while game_time.consume_fixed_step() {
    fixed_update();  // Advances the simulation by one fixed 1/60 s step, whatever the display rate
}

Because every step uses the same delta time, simulation results do not depend on the frame rate, and 120Hz+ displays do not run extra simulation steps.
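
A common way to implement such a loop is a time accumulator. The sketch below is an assumption about how a type like `game_time` could work, not the engine's implementation: `GameTime`, `advance`, and the cap of five steps per frame are all hypothetical.

```rust
/// Accumulator-based fixed timestep (hypothetical stand-in type).
pub struct GameTime {
    accumulator: f64,         // unsimulated time in seconds
    fixed_dt: f64,            // length of one simulation step in seconds
    max_steps_per_frame: u32, // guard against runaway catch-up
    steps_this_frame: u32,
}

impl GameTime {
    pub fn new(hz: f64) -> Self {
        Self {
            accumulator: 0.0,
            fixed_dt: 1.0 / hz,
            max_steps_per_frame: 5,
            steps_this_frame: 0,
        }
    }

    /// Call once per rendered frame with the elapsed wall-clock seconds.
    pub fn advance(&mut self, frame_dt: f64) {
        self.accumulator += frame_dt;
        self.steps_this_frame = 0;
    }

    /// Returns true while a full fixed step is available this frame.
    pub fn consume_fixed_step(&mut self) -> bool {
        if self.accumulator < self.fixed_dt {
            return false; // not enough time banked for another step
        }
        if self.steps_this_frame >= self.max_steps_per_frame {
            // After a long stall, drop the backlog instead of spiraling.
            self.accumulator = 0.0;
            return false;
        }
        self.accumulator -= self.fixed_dt;
        self.steps_this_frame += 1;
        true
    }
}
```

On a 120Hz display roughly every other frame runs one step; on a 30Hz display each frame runs about two. The step cap trades accuracy for responsiveness after a tab has been backgrounded or the browser has stalled.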

Asset Loading

Lazy Loading

Load assets only when needed, not all at startup:

Game manifests list required assets
Assets are fetched in parallel
Progressive loading shows content as it arrives

Texture Compression

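KTX2/Basis Universal textures are roughly 5-20x smaller than the ~4MB of raw RGBA data that a 1024x1024 PNG decodes to, and they transcode directly to GPU-native compressed formats instead of expanding to full-size RGBA:

Format	Size (1024x1024 RGBA)	GPU Upload
PNG	~4MB decoded (the file on disk is smaller and content-dependent)	Decode + upload
Basis/ETC1S	~200KB	Transcode + upload
Basis/UASTC	~800KB	Fast transcode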
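
Download size is only half of the saving: after transcoding, the texture also stays compressed in GPU memory. The arithmetic can be sketched as follows. The block sizes are the standard ones for 4x4-block formats (8 bytes per block for BC1 and ETC1, 16 bytes per block for BC7 and ASTC 4x4); which target format a transcoder selects on a given device is an assumption outside this sketch, and the function names are hypothetical.

```rust
/// Bytes for one mip level stored as uncompressed RGBA8 (4 bytes per pixel).
pub fn uncompressed_rgba8_bytes(width: u32, height: u32) -> u64 {
    u64::from(width) * u64::from(height) * 4
}

/// Bytes for one mip level in a 4x4 block-compressed format.
/// `bytes_per_block` is 8 for BC1/ETC1 and 16 for BC7/ASTC 4x4.
pub fn block_compressed_bytes(width: u32, height: u32, bytes_per_block: u64) -> u64 {
    // Partial blocks at the edges still occupy a full 4x4 block.
    let blocks_x = (u64::from(width) + 3) / 4;
    let blocks_y = (u64::from(height) + 3) / 4;
    blocks_x * blocks_y * bytes_per_block
}
```

For a 1024x1024 texture this gives 4,194,304 bytes as RGBA8, 1,048,576 bytes at 16 bytes per block, and 524,288 bytes at 8 bytes per block, before mipmaps.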

Caching

WASM binaries: Cache-Control: public, max-age=31536000, immutable (safe only when filenames are content-hashed, so a new build gets a new URL)
JS glue code: Cache-Control: public, max-age=31536000, immutable (same requirement)
HTML: Cache-Control: no-cache (always revalidate)
Assets: Cache-Control: public, max-age=2592000
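
If the files are served by a Rust process, the policy above can be captured in one lookup. This is a sketch under stated assumptions: the function name `cache_control_for` and the extension-based matching are hypothetical, and the year-long `immutable` rule is only safe when the build emits content-hashed filenames.

```rust
use std::path::Path;

/// Picks a Cache-Control header value from the request path.
pub fn cache_control_for(path: &str) -> &'static str {
    match Path::new(path).extension().and_then(|ext| ext.to_str()) {
        // Long-lived and immutable: assumes content-hashed filenames.
        Some("wasm") | Some("js") => "public, max-age=31536000, immutable",
        // HTML (and extensionless routes such as "/") must always revalidate.
        Some("html") | None => "no-cache",
        // Everything else is treated as a game asset: cache for 30 days.
        Some(_) => "public, max-age=2592000",
    }
}
```

Keeping HTML on `no-cache` is what lets a new deployment take effect: the revalidated page references the new hashed WASM and JS URLs, so stale binaries are never requested.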

Profiling

Browser Profilers

CPU flame charts for game loop and WASM work
GPU timing and canvas profiling where the browser exposes it
Memory allocation views for engine and asset pressure

In-Engine Metrics

The PerfStats struct in sg_demo_shared tracks:

Frame time (ms)
FPS (rolling average)
Frame time variance
Entity count

The performance overlay displays these in real-time during demo execution.
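
The metrics listed above can be derived from a rolling window of frame times. The following is a minimal sketch, not the actual PerfStats from sg_demo_shared: `FrameStats`, its methods, and the window size are hypothetical.

```rust
use std::collections::VecDeque;

/// Rolling frame-time statistics over the most recent frames.
pub struct FrameStats {
    window: VecDeque<f64>, // recent frame times in milliseconds
    capacity: usize,
}

impl FrameStats {
    pub fn new(capacity: usize) -> Self {
        let capacity = capacity.max(1);
        Self { window: VecDeque::with_capacity(capacity), capacity }
    }

    /// Records one frame time, evicting the oldest sample when full.
    pub fn record(&mut self, frame_ms: f64) {
        if self.window.len() == self.capacity {
            self.window.pop_front();
        }
        self.window.push_back(frame_ms);
    }

    /// Mean frame time in milliseconds (0.0 when no samples exist).
    pub fn mean_ms(&self) -> f64 {
        if self.window.is_empty() {
            return 0.0;
        }
        self.window.iter().sum::<f64>() / self.window.len() as f64
    }

    /// FPS derived from the mean frame time, not the mean of per-frame FPS.
    pub fn fps(&self) -> f64 {
        let mean = self.mean_ms();
        if mean > 0.0 { 1000.0 / mean } else { 0.0 }
    }

    /// Population variance of frame time, in milliseconds squared.
    pub fn variance_ms2(&self) -> f64 {
        if self.window.is_empty() {
            return 0.0;
        }
        let mean = self.mean_ms();
        let sum_sq: f64 = self.window.iter().map(|t| (t - mean).powi(2)).sum();
        sum_sq / self.window.len() as f64
    }
}
```

Computing FPS from the mean frame time avoids overstating performance: averaging per-frame FPS values gives short frames more weight than they deserve. A high variance with a good mean usually indicates stutter that an FPS counter alone would hide.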

stress-lab Demo

The stress-lab demo is purpose-built for benchmarking:

Auto-scales entity count until frame rate drops below target
Reports sustained FPS at various entity counts
Detects GPU thermal throttling
Tests quality tier transitions
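
The auto-scaling behavior can be pictured as a simple ramp: raise the entity count after each measurement window until the sustained frame rate falls below the target. The sketch below is an assumption about the general approach, not the stress-lab implementation; `RampController`, its fields, and the linear step are hypothetical.

```rust
/// Ramps entity count until measured FPS falls below the target.
pub struct RampController {
    pub entity_count: u32,
    step: u32,
    target_fps: f64,
    /// (entity count, sustained FPS) for every completed window.
    pub sustained: Vec<(u32, f64)>,
    pub finished: bool,
}

impl RampController {
    pub fn new(start: u32, step: u32, target_fps: f64) -> Self {
        Self {
            entity_count: start,
            step,
            target_fps,
            sustained: Vec::new(),
            finished: false,
        }
    }

    /// Feed the sustained FPS measured at the current entity count.
    /// Returns the entity count to use for the next measurement window.
    pub fn report(&mut self, measured_fps: f64) -> u32 {
        if self.finished {
            return self.entity_count;
        }
        self.sustained.push((self.entity_count, measured_fps));
        if measured_fps < self.target_fps {
            self.finished = true; // the target was missed: stop ramping
        } else {
            self.entity_count += self.step;
        }
        self.entity_count
    }

    /// Highest entity count that still met the target, if any did.
    pub fn max_sustained(&self) -> Option<u32> {
        self.sustained
            .iter()
            .filter(|(_, fps)| *fps >= self.target_fps)
            .map(|(count, _)| *count)
            .max()
    }
}
```

Each value passed to `report` should come from a window long enough to smooth out single-frame spikes, for example the rolling average over a second or more.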

Common Pitfalls

Allocating in the hot loop: Pre-allocate buffers and reuse them each frame
Excessive component changes: Adding/removing components triggers archetype migration
Unbatched draw calls: Always batch sprites; never draw one at a time
Synchronous asset loading: Use async loading to avoid blocking the game loop
Debug builds in the browser: Always test performance with --release builds