Performance Guide
Design Philosophy
SiliconGhetto is built for performance-sensitive browser games. Every layer — from the ECS to the rendering pipeline — is designed to minimize overhead and maximize throughput in a WASM environment.
WASM Optimization
Build Configuration
Release builds use aggressive optimization:
# In Cargo.toml
[profile.release]
opt-level = "z" # Optimize for size (smaller download, faster load)
lto = true # Link-time optimization across all crates
codegen-units = 1 # Single codegen unit for maximum optimization
strip = true # Strip debug symbols
For development, use the default debug profile for fast compilation and better error messages.
wasm-opt
After wasm-pack builds, run wasm-opt for additional size reduction:
wasm-opt -Oz -o output.wasm input.wasm
Typical size reduction: 10-30% beyond Rust’s built-in optimizations.
WASM Size Budget
| Component | Target | Notes |
|---|---|---|
| Engine core | < 500KB | sg_core + sg_scene + sg_render |
| wgpu runtime | ~800KB | Fixed cost, well-optimized |
| Demo game | < 200KB | Game-specific code |
| Total | < 1.5MB | Before gzip (typically 400-600KB gzipped) |
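The budget above can be checked mechanically, for example in CI. The following is a minimal sketch, not part of the engine: `BudgetEntry`, `BUDGET`, `over_budget`, and `total_limit_kb` are hypothetical names, the limits are copied from the table, and the ~800KB wgpu figure is treated as a hard limit for simplicity.

```rust
/// One row of the size budget, in kilobytes (limits copied from the table above).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BudgetEntry {
    pub component: &'static str,
    pub limit_kb: u32,
}

/// Hypothetical component keys; the real build may name its artifacts differently.
pub const BUDGET: [BudgetEntry; 3] = [
    BudgetEntry { component: "engine_core", limit_kb: 500 },
    BudgetEntry { component: "wgpu_runtime", limit_kb: 800 },
    BudgetEntry { component: "demo_game", limit_kb: 200 },
];

/// Sum of all per-component limits, in kilobytes.
pub fn total_limit_kb() -> u32 {
    BUDGET.iter().map(|entry| entry.limit_kb).sum()
}

/// Returns a message for every measured component that exceeds its limit.
/// `measured` holds (component, size in KB) pairs from some size-measuring step.
pub fn over_budget(measured: &[(&str, u32)]) -> Vec<String> {
    let mut offenders = Vec::new();
    for (name, size_kb) in measured {
        if let Some(entry) = BUDGET.iter().find(|e| e.component == *name) {
            if *size_kb > entry.limit_kb {
                offenders.push(format!("{name}: {size_kb}KB > {}KB", entry.limit_kb));
            }
        }
    }
    offenders
}
```

The per-component limits sum to 1500KB, which matches the 1.5MB total in the table.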
Rendering Performance
Sprite Batching
The SpriteBatch system is the key to 2D rendering performance:
- All sprites are collected into a single instance buffer each frame
- One draw call renders all sprites regardless of count
- GPU instancing draws one shared quad once per sprite instance; per-sprite data is read from the instance buffer, so no per-sprite draw call is needed
Performance target: 10,000+ sprites at 60fps on mid-range hardware.
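The CPU side of this pattern can be sketched as follows. This is an illustration under assumptions, not the engine's actual `SpriteBatch`: the `SpriteInstance` field layout and the `SpriteBatchSketch` type are hypothetical, and the GPU upload and draw call are omitted.

```rust
/// Per-sprite data that would be copied into the GPU instance buffer.
/// The field layout is an assumption made for illustration.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SpriteInstance {
    pub position: [f32; 2],
    pub size: [f32; 2],
    pub uv_rect: [f32; 4],
    pub color: [f32; 4],
}

/// CPU-side staging area for a single instanced draw call.
pub struct SpriteBatchSketch {
    instances: Vec<SpriteInstance>,
}

impl SpriteBatchSketch {
    /// Pre-allocates room for the expected maximum number of sprites.
    pub fn with_capacity(max_sprites: usize) -> Self {
        Self { instances: Vec::with_capacity(max_sprites) }
    }

    /// Starts a new frame. `clear` keeps the allocation, so steady-state
    /// frames do not allocate.
    pub fn begin_frame(&mut self) {
        self.instances.clear();
    }

    /// Queues one sprite for this frame.
    pub fn push(&mut self, instance: SpriteInstance) {
        self.instances.push(instance);
    }

    /// The data to write into the instance buffer; its length is the
    /// instance count passed to the one draw call.
    pub fn instances(&self) -> &[SpriteInstance] {
        &self.instances
    }

    /// Current capacity of the staging vector, in sprites.
    pub fn capacity(&self) -> usize {
        self.instances.capacity()
    }
}
```

Reusing the same vector every frame is what keeps the collection step allocation-free once the buffer has reached its working size.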
Draw Call Minimization
Each draw call has CPU overhead (command encoding, driver validation). Minimize them:
- Batch sprites by texture/material
- Use instanced rendering for repeated geometry
- Avoid per-object state changes
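Batching by texture can be sketched as a sort followed by run detection. The types below (`DrawItem`, `Batch`) and the `build_batches` function are hypothetical and exist only to illustrate the idea; `texture_id` stands in for whatever texture or material handle the renderer uses.

```rust
/// A sprite tagged with the texture it samples. `texture_id` is a hypothetical handle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DrawItem {
    pub texture_id: u32,
    pub sprite_index: u32,
}

/// A contiguous run of items that share one texture, i.e. one draw call.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Batch {
    pub texture_id: u32,
    pub first: usize,
    pub count: usize,
}

/// Sorts items by texture, then emits one batch per run of equal texture ids.
pub fn build_batches(items: &mut [DrawItem]) -> Vec<Batch> {
    // `sort_by_key` is stable, so submission order is kept within a texture.
    items.sort_by_key(|item| item.texture_id);

    let mut batches: Vec<Batch> = Vec::new();
    for (index, item) in items.iter().enumerate() {
        let extends_last =
            matches!(batches.last(), Some(b) if b.texture_id == item.texture_id);
        if extends_last {
            if let Some(last) = batches.last_mut() {
                last.count += 1;
            }
        } else {
            batches.push(Batch { texture_id: item.texture_id, first: index, count: 1 });
        }
    }
    batches
}
```

With this scheme the number of draw calls equals the number of distinct textures in use, not the number of sprites. Sorting purely by texture ignores layering, so a renderer that depends on draw order would likely need to sort by layer first and by texture within each layer.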
GPU Memory
- Buffers are pre-allocated to expected maximum size
- Instance buffers grow but never shrink during a session
- Texture atlas packing reduces texture switches (future)
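The grow-but-never-shrink policy for instance buffers can be sketched as a small capacity tracker. `GrowOnlyCapacity` is a hypothetical helper, and the doubling strategy is an assumption: the guide only states that buffers grow and never shrink, not by how much.

```rust
/// Tracks the element capacity of a GPU buffer that grows on demand and
/// never shrinks. Doubling is an assumed growth strategy.
pub struct GrowOnlyCapacity {
    capacity: usize,
}

impl GrowOnlyCapacity {
    /// Starts at the expected maximum size (at least one element).
    pub fn new(initial: usize) -> Self {
        Self { capacity: initial.max(1) }
    }

    /// Ensures room for `required` elements. Returns `true` when the caller
    /// must reallocate the GPU buffer at the new capacity.
    pub fn reserve(&mut self, required: usize) -> bool {
        if required <= self.capacity {
            return false; // Fits already; smaller requests never shrink the buffer.
        }
        let mut new_capacity = self.capacity;
        while new_capacity < required {
            new_capacity = new_capacity.saturating_mul(2);
        }
        self.capacity = new_capacity;
        true
    }

    /// Current capacity in elements.
    pub fn capacity(&self) -> usize {
        self.capacity
    }
}
```

Growing geometrically keeps reallocations rare: a session that peaks at 3,000 sprites from an initial capacity of 1,024 reallocates once, to 4,096.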
ECS Performance
Archetype Storage
bevy_ecs uses archetype-based storage where entities with identical component sets are stored contiguously in memory. This means:
- Iteration over components is cache-friendly
- Adding/removing components moves entities between archetypes (avoid doing this frequently)
- Queries that request only the components they use do less work per entity
Query Optimization
// Fast: iterate only what you need
fn movement(mut query: Query<(&mut Transform2D, &Velocity2D)>) {
    for (mut pos, vel) in &mut query {
        pos.position.x += vel.x;
    }
}
// Slower: querying many components you don't fully use
// (this also skips entities that lack any of the extra components)
fn movement(mut query: Query<(&mut Transform2D, &Velocity2D, &Sprite, &Health, &Name)>) {
    for (mut pos, vel, _, _, _) in &mut query {
        pos.position.x += vel.x;
    }
}
Fixed Timestep
The fixed-timestep loop prevents physics from running faster on high-refresh displays:
while game_time.consume_fixed_step() {
    fixed_update(); // Runs at exactly 60Hz regardless of display rate
}
This keeps every simulation step the same length, which makes behavior reproducible across display rates, and prevents wasted computation on 120Hz+ displays.
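A common way to implement this kind of loop is a time accumulator. The sketch below is an assumption about how `consume_fixed_step` might work, not the engine's actual code: `FixedClock` and `advance` are hypothetical names, and the 0.25 second clamp on frame time is an added safeguard that the guide does not mention.

```rust
/// Hypothetical stand-in for the engine's game-time type.
pub struct FixedClock {
    step: f64,        // Seconds per fixed update, e.g. 1/60.
    accumulator: f64, // Unsimulated time carried over between frames.
}

impl FixedClock {
    /// Creates a clock that yields `hz` fixed steps per second of real time.
    pub fn new(hz: f64) -> Self {
        Self { step: 1.0 / hz, accumulator: 0.0 }
    }

    /// Call once per rendered frame with the elapsed wall time in seconds.
    pub fn advance(&mut self, frame_dt: f64) {
        // Assumed safeguard: clamp long stalls (e.g. a backgrounded tab) so the
        // loop does not try to catch up with hundreds of steps at once.
        self.accumulator += frame_dt.min(0.25);
    }

    /// Returns `true` and consumes one step's worth of time if enough has accumulated.
    pub fn consume_fixed_step(&mut self) -> bool {
        if self.accumulator >= self.step {
            self.accumulator -= self.step;
            true
        } else {
            false
        }
    }
}
```

On a 120Hz display every second frame accumulates enough time for one step, and on a 30Hz display each frame runs two steps, so the simulation advances 60 steps per second in both cases.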
Asset Loading
Lazy Loading
Load assets only when needed, not all at startup:
- Game manifests list required assets
- Assets are fetched in parallel
- Progressive loading shows content as it arrives
Texture Compression
KTX2/Basis Universal textures are 5-10x smaller than PNG and transcode directly to GPU-native compressed formats:
| Format | Size (1024x1024 RGBA) | GPU Upload |
|---|---|---|
| PNG | ~4MB | Decode + upload |
| Basis/ETC1S | ~200KB | Transcode + upload |
| Basis/UASTC | ~800KB | Fast transcode |
Caching
- WASM binaries: `Cache-Control: public, max-age=31536000, immutable`
- JS glue code: `Cache-Control: public, max-age=31536000, immutable`
- HTML: `Cache-Control: no-cache` (always revalidate)
- Assets: `Cache-Control: public, max-age=2592000`
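On a server that sets headers in code, this policy reduces to a lookup by file extension. The function below is a hypothetical sketch: `cache_control_for` is not part of the project, and treating every extension other than `wasm`, `js`, `html`, and `htm` as an asset is an assumption.

```rust
/// Picks a Cache-Control value from a path's extension, following the policy above.
/// Anything that is not WASM, JS, or HTML is treated as an asset (an assumption).
pub fn cache_control_for(path: &str) -> &'static str {
    // Text after the last '.', or "" when the path has no extension.
    let extension = path.rsplit_once('.').map(|(_, ext)| ext).unwrap_or("");
    match extension.to_ascii_lowercase().as_str() {
        "wasm" | "js" => "public, max-age=31536000, immutable",
        "html" | "htm" => "no-cache",
        _ => "public, max-age=2592000",
    }
}
```

A one-year `immutable` lifetime generally assumes that WASM and JS file names change whenever their contents change (for example via a content hash); otherwise clients could keep serving a stale build. Paths without an extension, such as `/`, fall into the asset branch here and would need explicit handling if they serve HTML.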
Profiling
Browser DevTools
- Chrome Performance tab: CPU flame chart, GPU timing
- Firefox Profiler: WASM call stacks, memory allocations
- Safari Web Inspector: Canvas and GPU profiling
In-Engine Metrics
The PerfStats struct in sg_demo_shared tracks:
- Frame time (ms)
- FPS (rolling average)
- Frame time variance
- Entity count
The performance overlay displays these in real-time during demo execution.
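Metrics of this kind are typically computed over a sliding window of recent frame times. The sketch below illustrates that approach and is not the actual `PerfStats` definition: `FrameStats`, its methods, and the use of population variance are all assumptions.

```rust
use std::collections::VecDeque;

/// Illustrative rolling frame statistics over the last `window` frames.
pub struct FrameStats {
    window: usize,
    frame_times_ms: VecDeque<f64>,
}

impl FrameStats {
    /// Creates a tracker that keeps at most `window` samples (at least one).
    pub fn new(window: usize) -> Self {
        let window = window.max(1);
        Self { window, frame_times_ms: VecDeque::with_capacity(window) }
    }

    /// Records one frame time, evicting the oldest sample when the window is full.
    pub fn record(&mut self, frame_time_ms: f64) {
        if self.frame_times_ms.len() == self.window {
            self.frame_times_ms.pop_front();
        }
        self.frame_times_ms.push_back(frame_time_ms);
    }

    /// Mean frame time in milliseconds over the window (0.0 when empty).
    pub fn mean_ms(&self) -> f64 {
        if self.frame_times_ms.is_empty() {
            return 0.0;
        }
        self.frame_times_ms.iter().sum::<f64>() / self.frame_times_ms.len() as f64
    }

    /// Rolling-average FPS derived from the mean frame time.
    pub fn fps(&self) -> f64 {
        let mean = self.mean_ms();
        if mean > 0.0 { 1000.0 / mean } else { 0.0 }
    }

    /// Population variance of frame time, in milliseconds squared.
    pub fn variance_ms2(&self) -> f64 {
        if self.frame_times_ms.is_empty() {
            return 0.0;
        }
        let mean = self.mean_ms();
        let squared_error: f64 =
            self.frame_times_ms.iter().map(|t| (t - mean).powi(2)).sum();
        squared_error / self.frame_times_ms.len() as f64
    }
}
```

Deriving FPS from the mean frame time, not from averaging per-frame FPS values, avoids overweighting fast frames.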
stress-lab Demo
The stress-lab demo is purpose-built for benchmarking:
- Auto-scales entity count until frame rate drops below target
- Reports sustained FPS at various entity counts
- Detects GPU thermal throttling
- Tests quality tier transitions
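The auto-scaling step can be sketched as a loop that raises the entity count until the measured frame rate misses the target. This is an illustration only, not the stress-lab implementation: `auto_scale` and the `measure_fps` callback are hypothetical, and doubling the count at each step is an assumed schedule.

```rust
/// Raises the entity count until measured FPS falls below `target_fps` or
/// `max` is exceeded. Returns every (entity_count, fps) sample taken,
/// including the first one that missed the target.
///
/// `measure_fps` is a hypothetical callback that runs the scene with the
/// given entity count and reports its sustained FPS.
pub fn auto_scale<F>(start: u32, max: u32, target_fps: f64, mut measure_fps: F) -> Vec<(u32, f64)>
where
    F: FnMut(u32) -> f64,
{
    let mut samples = Vec::new();
    let mut count = start.max(1);
    while count <= max {
        let fps = measure_fps(count);
        samples.push((count, fps));
        if fps < target_fps {
            break; // Frame rate dropped below target; stop scaling up.
        }
        // Assumed schedule: double the load each step, stopping on overflow.
        count = match count.checked_mul(2) {
            Some(next) => next,
            None => break,
        };
    }
    samples
}
```

The returned samples double as the "sustained FPS at various entity counts" report, with the last entry marking where the target was first missed.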
Common Pitfalls
- Allocating in the hot loop: Pre-allocate buffers and reuse them each frame
- Excessive component changes: Adding/removing components triggers archetype migration
- Unbatched draw calls: Always batch sprites; never draw one at a time
- Synchronous asset loading: Use async loading to avoid blocking the game loop
- Debug builds in the browser: Always test performance with `--release` builds