Technobabble warning. In this article I assume you’re fairly familiar with deferred rendering/shading. If not, google it. There are numerous articles and presentations about what it is and how it’s done in various games. That’s a lot more technobabble though.
Deferred rendering in Arcane Worlds
I use three ARGB8 textures for the G-buffer, so it’s 96 bits per pixel. There are also shadow buffer and light buffer textures. All of them are reused later in the anti-aliasing post-effect, so deferred rendering has the same video memory requirements as before.
I use the stencil buffer when rendering geometry, so I can draw the sky after the scene to save fillrate.
In the G-buffer I store:
- 24-bit depth, packed into 3 channels.
- World-space normal, simply stored in 3 channels.
- Water factor (8 bit). Currently it’s either 0 or 1, but it’ll be used later to blend water based on its depth.
- Gloss factor (8 bit). Currently only used for water foam, but it’ll be used later for glossy parts of objects and buildings.
- Surface color (24-bit RGB). That’s surface albedo, not used for water.
- Glow factor (8 bit). Not used for now, it’s reserved for lava and other glowing things (like monster eyes).
So, the world-space pixel position can be reconstructed from depth, and we have the normal and some other stuff.
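Here is a minimal sketch of packing a 24-bit depth into three 8-bit channels and reading it back. It's written in Python for illustration, not taken from the game's shaders. It assumes depth is already normalized to [0, 1]; the article doesn't say which encoding the game uses, and the names `pack_depth24` and `unpack_depth24` are hypothetical.

```python
DEPTH_MAX = (1 << 24) - 1  # largest 24-bit integer value


def pack_depth24(depth):
    """Quantize a depth in [0, 1] to 24 bits and split it into three bytes."""
    clamped = min(max(depth, 0.0), 1.0)
    q = round(clamped * DEPTH_MAX)
    # High, middle and low byte, one per 8-bit texture channel.
    return (q >> 16) & 0xFF, (q >> 8) & 0xFF, q & 0xFF


def unpack_depth24(channels):
    """Rebuild the normalized depth from the three 8-bit channels."""
    hi, mid, lo = channels
    q = (hi << 16) | (mid << 8) | lo
    return q / DEPTH_MAX
```

The round trip loses at most half a quantization step, which is about 3e-8 of the depth range.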
The huge sun in Arcane Worlds should cast very soft shadows that fade out quite fast with distance from the caster. This means the shadow density along the light ray is not monotonic, which is a problem with shadow maps. Also, I wanted sky shadows (ambient occlusion actually) because it becomes very important at night without direct sun/moon light.
So, after experimenting a bit with shadow maps, I decided to do screen-space shadows using deferred rendering.
With deferred rendering, you can apply any number of lights in screen space, as a post-processing effect. With world position and normal in each pixel, and knowing light properties (position, color etc) it’s easy to compute lighting and accumulate it in the light buffer. But the same can be done to compute shadows.
Knowing the caster and light properties, one can compute the amount of shadowing for that particular light in a particular pixel, and accumulate it in the shadow buffer. The simplest caster is a sphere: it’s defined by four numbers, 3 for position and 1 for radius. That fits nicely in a single float4 shader register.
This means I need to add special “shadow-casting spheres” to each object I model, but I’m fine with that since I was ready to make special shadow-casting meshes for thin objects anyway. And on the plus side, far LODs can use fewer, bigger spheres or no spheres at all. Filling an object’s shape with spheres is an approximation of course, but with very blurry shadows it works fine even with a few spheres per object. The stones in version 0.04 have 3 spheres per object.
Computing the sky shadow (aka ambient occlusion) from a sphere caster is simple and intuitive, while being mathematically accurate.
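As a rough illustration, the sketch below uses the standard closed form for how much of a uniform, cosine-weighted sky a sphere blocks: (r/d)² · cos θ, where d is the distance to the sphere center and θ is the angle between the surface normal and the direction to the sphere. This is exact only while the sphere stays fully above the surface's horizon. It's Python written for this article, not the game's shader, and the function names are hypothetical.

```python
import math


def sphere_sky_occlusion(point, normal, center, radius):
    """Fraction of sky light blocked by a sphere, seen from a surface point."""
    dp = [c - p for c, p in zip(center, point)]
    d2 = sum(x * x for x in dp)
    if d2 <= radius * radius:
        return 1.0  # assumption: a point inside the sphere is fully occluded
    d = math.sqrt(d2)
    # Cosine between the (unit) normal and the direction to the sphere.
    cos_theta = max(0.0, sum(a * b for a, b in zip(dp, normal)) / d)
    return min(1.0, radius * radius / d2 * cos_theta)


def sky_light_factor(point, normal, center, radius):
    """Light scaling factor: 1 means no shadow, 0 means full shadow."""
    return 1.0 - sphere_sky_occlusion(point, normal, center, radius)
```

For a sphere of radius 1 hovering 2 units directly above a flat surface point, the occlusion is 0.25, so the sky light is scaled by 0.75.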
Sun shadow is a bit more complex, but can be done with a variant of aperture lighting (see previous technobabble article). Looking from the surface point (the pixel being shadowed) you see two circles: the sun and the sphere caster. The amount of intersection of these circles defines the amount of shadowing. It can be computed approximately of course, since shadows are very blurry anyway.
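To make the two-circles idea concrete, here is a sketch that treats the sun and the caster as flat discs and computes how much of the sun disc the caster covers, using the textbook circle-circle intersection area. This is not the approximation the game uses (that one is cheaper); it's a reference version in Python. The names, and the choice to pass angular radii and angular separation in the same units, are assumptions made for illustration.

```python
import math


def disc_overlap_area(r1, r2, d):
    """Area of intersection of two discs with radii r1, r2, centers d apart."""
    if d >= r1 + r2:
        return 0.0  # discs don't touch
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2  # smaller disc fully inside
    # Clamp the cosines to guard against rounding just outside [-1, 1].
    c1 = max(-1.0, min(1.0, (d * d + r1 * r1 - r2 * r2) / (2 * d * r1)))
    c2 = max(-1.0, min(1.0, (d * d + r2 * r2 - r1 * r1) / (2 * d * r2)))
    lens = (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2)
    return (r1 * r1 * math.acos(c1) + r2 * r2 * math.acos(c2)
            - 0.5 * math.sqrt(max(0.0, lens)))


def sun_light_factor(sun_radius, caster_radius, separation):
    """Light scaling factor: 1 minus the covered fraction of the sun disc."""
    covered = disc_overlap_area(sun_radius, caster_radius, separation)
    return 1.0 - covered / (math.pi * sun_radius ** 2)
```

When the discs don't overlap the factor is 1 (no shadow), and when the caster disc swallows the sun disc it drops to 0.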
The accumulated shadow buffer texture is then used in the full-screen lighting pass to modulate sky and sun light respectively.
I store shadows as light scaling factors, so 1 is no shadow at all and 0 is full shadow (no light). Sun and sky (AO) shadows are stored in separate 8-bit channels, and I have 2 more channels free for 2 more lights. That’s sky, two suns/moons and one lightning strike (or other effect light) total, all with shadows. Should be enough for this game.
The shadow accumulation is done just by multiplying those factors, both in the shader and as render target blending.
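A tiny sketch of that accumulation for one pixel, in Python for illustration. The two-channel layout (sun, sky) and the function name are assumptions; on the GPU the same product comes from multiplicative blending into the render target.

```python
def accumulate_shadows(caster_factors, channels=2):
    """Multiply per-caster light factors into one shadow-buffer pixel.

    caster_factors: iterable of tuples, one factor per channel,
    where 1 means no shadow and 0 means full shadow.
    """
    pixel = [1.0] * channels  # the buffer starts fully lit
    for factors in caster_factors:
        for i in range(channels):
            pixel[i] *= factors[i]
    return tuple(pixel)
```

Two casters with factors (0.5, 0.8) and (0.5, 0.5) leave the pixel at roughly (0.25, 0.4). The order of casters doesn't matter, which is what makes this easy to do with blending.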
I started the implementation with a simple full-screen pass for each shadow sphere. When I got it working, I started optimizing the application of the shader.
The screen is divided into cells (16×9 currently, but that’s easy to change) and each cell has a list of data to pass to the shader (float4 for each shadow sphere). When a shadow sphere is “rendered” its affected volume is projected to the screen and its data is added to all cells that the projected area covers. Then the data accumulated in cells is passed to the shader.
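A minimal sketch of the binning step, in Python for illustration. It assumes the projected area has already been reduced to a bounding rectangle in normalized screen coordinates (0 to 1 on both axes); how the game projects the volume isn't covered here, and all names are hypothetical.

```python
import math

GRID_W, GRID_H = 16, 9  # cell grid size mentioned in the article


def cells_covered(min_x, min_y, max_x, max_y, grid_w=GRID_W, grid_h=GRID_H):
    """List the (column, row) cells touched by a normalized screen rectangle."""
    if max_x < 0.0 or max_y < 0.0 or min_x > 1.0 or min_y > 1.0:
        return []  # entirely off-screen
    x0 = max(0, math.floor(min_x * grid_w))
    x1 = min(grid_w - 1, math.floor(max_x * grid_w))
    y0 = max(0, math.floor(min_y * grid_h))
    y1 = min(grid_h - 1, math.floor(max_y * grid_h))
    return [(x, y) for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]


def add_sphere(cells, rect, sphere):
    """Append a sphere's float4 data to every cell its rectangle covers."""
    for cell in cells_covered(*rect):
        cells.setdefault(cell, []).append(sphere)
```

Usage: start with `cells = {}`, call `add_sphere` once per shadow sphere, then hand each cell's list to the shader.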
You can pass up to 10 float4 registers via interpolants in shader model 3, but I chose to pass 8 at most to simplify the implementation. In fact, I pass either 8, 4, 2 or 1 registers at once. First, I scan all cells and take data from those which have 8 or more entries in the list, storing the data and the cell coordinates in the dynamic vertex buffer. After this pass, every cell has 7 or fewer entries in the list, and I scan them again to pass data in packs of 4. Then it’s packs of 2, and finally 1 for an odd number of entries in the list.
The technique is known and is usually used in deferred rendering to apply many lights that each affect a small screen area. I’ll reuse it later to apply lights too, for spells, burning trees and the like.
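The pack splitting can be sketched like this, in Python for illustration. It assumes the per-cell lists live in a dictionary keyed by cell coordinates; the vertex buffer is replaced by plain lists, and the function name is hypothetical.

```python
def build_batches(cells, pack_sizes=(8, 4, 2, 1)):
    """Split each cell's entry list into packs of 8, 4, 2 and 1.

    Returns {pack_size: [(cell_coords, entries), ...]}, one list per
    shader variant. The input dictionary is left untouched.
    """
    remaining = {coords: list(entries) for coords, entries in cells.items()}
    batches = {size: [] for size in pack_sizes}
    for size in pack_sizes:
        for coords, entries in remaining.items():
            # Only the largest size can fire more than once per cell.
            while len(entries) >= size:
                batches[size].append((coords, entries[:size]))
                del entries[:size]
    return batches
```

A cell with 15 entries ends up in one pack of each size (8 + 4 + 2 + 1), so it's drawn four times in total.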
To limit the affected screen area, I have to limit the shadow range. For the sky shadow, 4 times the radius is fine without any extra tricks, but the sun shadows are longer. I found that a range of 8 times the radius is acceptable if the end of the shadow is faded smoothly. I use a sphere to approximate the affected volume, with its center shifted away from the light by 3 times the radius of the shadow sphere, and its radius set to 5 times the shadow sphere radius. That way, I capture a shadow range of 8x the radius away from the light and 2x the radius around the caster for the sky shadow, which is noticeable in daylight.
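The bounding sphere described above is simple enough to write down directly. This is a Python sketch for illustration; `to_light` is assumed to be a unit vector pointing from the scene toward the light, and the function name is hypothetical.

```python
def affected_volume(caster_center, caster_radius, to_light):
    """Bounding sphere of the region a shadow sphere can darken.

    The center is pushed 3 radii away from the light and the radius is
    5 radii, so the volume spans 8 radii down-light and 2 radii up-light.
    """
    center = tuple(c - 3.0 * caster_radius * l
                   for c, l in zip(caster_center, to_light))
    return center, 5.0 * caster_radius
```

For a caster of radius 2 at the origin with the light straight up the Y axis, the volume is centered at (0, -6, 0) with radius 10, so it reaches from y = -16 (8 radii away from the light) to y = 4 (2 radii toward it).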
Here comes the shader code:
#define MAX_RK 8
#define FADE_RANGE 4
#define MAX_RK2 (MAX_RK*MAX_RK)
#define FADE_RANGE2 (FADE_RANGE*FADE_RANGE)
float2 dp : POSITION0;
float2 org : POSITION1;
float4 data[NUMVEC] : TEXCOORD0;
float4 pos : POSITION;
float4 data[NUMVEC] : TEXCOORD0;
float2 tc : TEXCOORD9;
float3 wvec : TEXCOORD10;
float4 postm : register(c15);
float4 texsz : register(c16);
float4 viewx : register(c17);
float4 viewy : register(c18);
float4 viewz : register(c19);
VsOutput vsh(VsInput I)
O.pos=float4(pos, 1, 1);
float3 viewPos : register(c16);
float4 psh(VsOutput I) : COLOR
for (int i=0; i<NUMVEC; i++)
float d2=dot(dp, dp);
float dn=saturate(dot(dpn, norm));
// sky / ambient
float c=dot(dpn, sun_dir.xyz);
float ca=sqrt(saturate(1-sin2)), cb=sun_dir.w;
float cc=ca*cb, ss=sqrt((1-ca*ca)*(1-cb*cb));
float sun=smoothstep(cc-ss, maxc, c)*(1-ca)/(1-cb);
// fade with distance
return float4(shadow, 0, 0);