20091201

PixelJunk Shooter Youtubes



20091124

Battlefield: Bad Company 2

Been playing the PS3 beta because it is an awfully addictively fun game. Don't think I will ever want to go back to a game with a trivially static play area again, no matter how pretty the developer makes it. Baked global illumination, forget it, just not good enough any more. Nothing compares to the fun factor of being able to leave your mark, to be able to interact with your surroundings, and to do that in a tank!



The point of excellence in game design is when a game provides more than just the sum of canned content generated in production. Otherwise a game is simply an interactive movie: you play to see the next scene, and once you reach the end, there is nothing more to it. Sure, there is a market for the interactive novel, but games since the dawn of humans have been about enjoying the interaction between people in a system engineered to produce fun in the process. By this definition, the Bad Company 2 multi-player beta definitely captures the essence of what it is to be a game.



Other devs out there should be taking notes.

You've got greater than 1 Tflop to play with on high end GPUs in the PC space, over 100,000 times more processing power than many of us started programming with, and more to come in future years.

What are you going to do with that?

Many of you are simply going to increase your resolution and provide a prettier interactive novel. Others are going to push forward with something new, provide gamers with an interactive experience which brings renewed energy into the industry, and learn to wield the scaling of the parallel machine for something beyond just graphics!

20091120

Real Time Global Illumination Using Temporal Coherence

Martin Knecht has posted a video and his thesis on Real Time Global Illumination Using Temporal Coherence.

20091119

Time Flies and the Blog is Lonely

Too busy to write, but here are links to some interesting stuff:

Johan Andersson's Parallel Futures of a Game Engine
Direct To Video on Deferred Rendering in Frameranger
Sander van Rossen's Progress on Virtual Texturing
Aras's Post on Deferred Shadow Maps
Stochastic Progressive Photon Mapping by T. Hachisuka and H. W. Jensen
Fermi spotted on Twitter
SC09 Papers
State-Based Scripting in Uncharted 2: Among Thieves
Quilez's Mandelbulb

20091107

Demo Tube









20091101

Link Soup

meshula.net : Stone Soup - Thanks Nick for this post reminiscing about the exotic time before the common C compiler!

Real-time Parallel Hashing on the GPU - Neat paper, and poster at GTC. Builds upon cuckoo hashing: use N>=2 hash functions instead of one, so a lookup checks at most N places, and an insertion into a full bin evicts the current occupant and recursively re-inserts it using its other hash functions. With N=3 hash functions, cuckoo hashing can reach almost 90% hash table occupancy. The paper presents a parallel version: the first step uses a high level hash function to divide the input into bins sized to fit into the local store, followed by a parallel version of cuckoo hashing within each bin. Read the paper for more details.
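For anyone who has not played with cuckoo hashing, here is a minimal sequential sketch in Python. It is not the paper's parallel GPU algorithm, just the basic lookup/evict idea described above. The class name, the salted hash family, the random choice of victim, and the `max_kicks` limit are all assumptions made for illustration.

```python
import random


class CuckooTable:
    """Toy sequential cuckoo hash set using N hash functions (illustrative only)."""

    def __init__(self, capacity, num_hashes=3, max_kicks=500, seed=0):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.max_kicks = max_kicks
        self.rng = random.Random(seed)
        # Hypothetical hash family: one random salt per hash function.
        self.salts = [self.rng.getrandbits(32) for _ in range(num_hashes)]

    def _positions(self, key):
        # The N candidate slots for this key, one per hash function.
        return [hash((salt, key)) % self.capacity for salt in self.salts]

    def contains(self, key):
        # A lookup never checks more than N slots.
        return any(self.slots[p] == key for p in self._positions(key))

    def insert(self, key):
        if self.contains(key):
            return True
        current = key
        for _ in range(self.max_kicks):
            positions = self._positions(current)
            for p in positions:
                if self.slots[p] is None:
                    self.slots[p] = current
                    return True
            # All N candidates are full: evict one occupant and
            # try to re-insert it using its own candidate slots.
            victim = self.rng.choice(positions)
            self.slots[victim], current = current, self.slots[victim]
        # Gave up. Note that the key still held in `current` is dropped here;
        # a real table would rehash with new hash functions instead.
        return False


table = CuckooTable(capacity=1000)
inserted = all(table.insert(k) for k in range(500))
print(inserted, table.contains(42), table.contains(10**9))
```

The eviction loop is written iteratively rather than recursively so a long eviction chain cannot blow the Python stack; the behaviour is the same.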

Stochastic Progressive Photon Mapping - PPM extended to compute the average radiance over a region instead of a point.

Amortized Supersampling - Interesting, still problems with rapidly changing shading such as specular highlights, but great progress towards a complete solution.


Rudebox by Alcatraz




Jellyfish

Visited the Monterey Bay Aquarium the week before Halloween,



20091028

Random End of October

Looks like Insomniac updated their website: Link to R&D Page. Going to have to finish Uncharted 2 before I get onto A Crack in Time however. Been meaning to watch Nürburgring 24 Hour Race in 3D.

20091023

GPU Technology Conference

GPU Technology Conference Main Page | GTC Blog
Screen-casts plus audio recordings for the sessions are now starting to get posted. Look in the session catalog/calendar for the links.

20091020

1489% on 8-core machine

No, not 14.89x faster, but rather 14.89x slower!

False sharing is no fun, from Joe Duffy's Weblog. Really like this post because of the simplicity with which it shows the problem of sharing cache lines on the typical multi-core CPU.

If one were to assume that you'd always be exactly 14.89 times slower when sharing cache lines across 8 cores (clearly this would vary), and you wanted this slowdown to add only 6.25% (or 1/16) time overhead to your code, you would want to share cache lines only about 0.45% of the time.

Actual number here is not the point (and wouldn't be correct anyway), but rather that on the CPU, parallel performance is found by doing a majority of computation in effectively isolated memory regions.
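The 0.45% figure above falls out of a simple linear model, sketched here in Python. The model itself is an assumption: a fraction f of the work runs 14.89x slower and the rest runs at full speed, so total time is (1 - f) + 14.89 f. The function name is made up for illustration.

```python
def max_shared_fraction(slowdown, overhead):
    """Largest fraction f of work that may hit shared cache lines.

    Assumed model: total_time = (1 - f) * 1 + f * slowdown.
    Setting total_time = 1 + overhead gives f = overhead / (slowdown - 1).
    """
    return overhead / (slowdown - 1.0)


f = max_shared_fraction(slowdown=14.89, overhead=1.0 / 16.0)
print(f"{f * 100:.2f}%")  # prints 0.45%
```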

20091013

Naumachia

Found this thanks to a Twitter post from Jerome Liard (@blackjero) at Q-Games.



naumachia.aureasection.com

The "Killzone 2" effect (particle and transparent effects really make the engine) applied to 3D space combat. Impressive work from a team of just three developers!

20090930

Fermi

NVIDIA Fermi Architecture Page

Download the white paper for some architectural details.


Personal View

Yeah the architecture is awesome! For HPC, double precision fused multiply add at 1/2 the single precision rate, ECC memory support, and 40-bit address space support! For compute, configurable shared memory/L1 split, function and data pointer support via a unified address space, faster global atomics, unified L2 and more...

20090925

Last Night in Wisconsin

Yeah, last night in Wisconsin. Could talk about how I've actually physically moved out of Wisconsin and back to my Illinois Condo the weekend before September 11th when my lease was up, and how that has resulted in not having the ability to do any interesting CUDA and GPU computing programming for the past few weeks, but likely that would be too boring!


Prior Dormant Hobby

Had a 1969 911E which I started to build into a track car years ago and never finished. Sold off the car, trailer, and related stuff this Wednesday. Since I had this huge stalled project (full tear down and rebuild of the entire car), the racing bug was effectively squashed; the wife would have killed me if I bought another car without selling the old project. Actually I ended up doing just that! Made matters worse that the 911E body and trailer were in her parents' barn. Also bad that I took her WRX to the track a few times and destroyed the rotors, which I didn't fix for a year, which ultimately resulted in her taking the car in to get the brakes done locally. Kathryn, you really are the best wife ever! Yes, she reads this blog once in a while ... I'll get an email saying, "I actually understood that post" ;)

With the project car gone, and a very track friendly daily driver to replace it, the racing bug is biting back hard.



Found that video via this rx7club forum post while looking for someone who is using a G-Force Transmission dog box in a road racing track car.

Need to do the brakes, brake lines, harness, and more first, but a G-Force dog box conversion of the Tremec T-56 in my car is on the list long before adding more HP. Yes it is still going to be my daily driver, no the sound doesn't bother me, and yes one of my old daily drivers didn't have heat, air conditioning, or an interior (so safe to classify me as crazy with respect to what I drive).

Looks like Thunderhill Raceway is less than a 3 hour drive from where I will be living...

20090924

Function 09: Behind Elevated

Function 09 - Behind Elevated

PixelJunk Shooter: TGS 09 Gameplay!


More videos and images can be found on Gigazine

20090923

R800

R800 info is out, architecture looks similar to R700. Hard to get a clear answer on whether texture filtering has indeed moved completely to the SIMD units (EDIT: perhaps not, see new comments/edits). Also have not seen any definitive answers regarding triangle setup performance or any info on append/consume performance. HD 5870 specs appear to be as follows:

EDIT: hardware.fr tests show same triangle setup rate as R700. No parallel triangle setup, but perhaps double samples rasterized per triangle per clock. Very interesting indeed!

EDIT: ATI's Dave Baumann on B3D post: "Texture interpolators have been removed from the design and is done on the shader core. In general we are seeing this as a performance improvement - its also the reason why one of the Vantage feature test gets a disproportionate increase over the previous gen."

OVERALL
- 2.7 Tflops single precision.
- 544 Gflops double precision.
- 153.6 GB/sec bandwidth.
- 20 SIMD cores.

TEXTURE
- 272 Gtex/sec 32-bit pixel unfiltered samples.
- 68 Gtex/sec bilinear filtering.
- 1 TB/sec L1 texture cache.
- 16KB TEX L1 per SIMD core (320KB total).
- Texture units able to read compressed AA color buffers.
- Texture filtering done in SIMD units (according to AnandTech)???

COMPUTE
- 8KB extra compute L1 per SIMD core (160KB total).
- 32KB local data store (640KB total).
- 64KB global data store.

MC/RBE/ROP
- 435 GB/sec L2 cache bandwidth.
- 128KB L2 cache per memory controller (512KB total).
- 4 64-bit memory controllers (MCs).
- Render Back-Ends (RBEs) can process 32 pixel/clk.
- RBEs look to be divided across the 4 MCs.
- Fast color clears.

SIMD
- EDIT: vertex attribute interpolation done in SIMD units.
- EDIT: Support for full speed 24-bit integer math.
- Dedicated Sum of Absolute Differences instruction.
- Faster Dot Product instruction.
- Required support for DX11 instructions.
- Full speed denormals?
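As a sanity check, most of the headline numbers in the list above can be rebuilt from a handful of inputs. The Python sketch below does that. The SIMD core count, memory controller count and width, and per-unit cache sizes come from the list; the 850 MHz engine clock, 80 ALUs per SIMD core, 4 texture units per SIMD core, 4.8 Gbps memory data rate, and 1/5-rate double precision are assumptions taken from commonly published HD 5870 figures, not from this post.

```python
# Figures taken from the spec list above.
SIMD_CORES = 20
MEMORY_CONTROLLERS = 4
MC_WIDTH_BITS = 64

# Assumptions (not stated in the post): commonly published HD 5870 figures.
ENGINE_CLOCK_HZ = 850_000_000            # assumed 850 MHz engine clock
ALUS_PER_SIMD = 80                       # assumed 16 lanes x 5-wide units
TEX_UNITS_PER_SIMD = 4                   # assumed
MEM_BITS_PER_PIN_PER_SEC = 4_800_000_000 # assumed 4.8 Gbps GDDR5
DP_RATE_DIVISOR = 5                      # assumed DP runs at 1/5 the SP rate


def r800_estimates():
    """Rebuild the headline numbers from the inputs above."""
    alus = SIMD_CORES * ALUS_PER_SIMD
    sp_flops = alus * 2 * ENGINE_CLOCK_HZ  # one multiply-add counts as 2 flops
    bus_bytes = MEMORY_CONTROLLERS * MC_WIDTH_BITS // 8
    tex_units = SIMD_CORES * TEX_UNITS_PER_SIMD
    return {
        "sp_tflops": sp_flops / 1e12,
        "dp_gflops": sp_flops / DP_RATE_DIVISOR / 1e9,
        "mem_gb_per_sec": bus_bytes * MEM_BITS_PER_PIN_PER_SEC / 1e9,
        "bilinear_gtex_per_sec": tex_units * ENGINE_CLOCK_HZ / 1e9,
        "tex_l1_total_kb": SIMD_CORES * 16,
        "compute_l1_total_kb": SIMD_CORES * 8,
        "lds_total_kb": SIMD_CORES * 32,
        "l2_total_kb": MEMORY_CONTROLLERS * 128,
    }


for name, value in r800_estimates().items():
    print(f"{name}: {value}")
```

Under those assumptions the results line up with the list: about 2.7 Tflops single precision, 544 Gflops double precision, 153.6 GB/sec of memory bandwidth, and 68 Gtex/sec bilinear.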

20090921

Bad Industry Humor: To 12 MPix and Beyond!

Second post in the awfully-bad-industry-humor or would-have-been-better-off-sleeping category:

"At first we thought it would be a great idea, the new ATI GPU was to have something like 10x the bandwidth and 20x the compute power of the Xbox 360. So what was working at 720p on the 360 should work about the same on a six 1080p display array off the new GPU. Napkin math said peak capacity amortized to around 8 Kflops/pixel and 800 bytes/pixel bandwidth at 720p/30fps on the 360 and about the same targets on the 12 MPix (multi-monitor) output of the new GPU.

The project would be simple ... (just use nearly the same engine, add tessellation and up the texture quality) ... or so we thought.

Early on, decided to go with virtual texturing. With deferred shading, needed about 380 MB alone for the G-buffer (2xMSAA 16B/pixel). Another minimum of 1 GB would be needed for four layers of a compressed 16Kx16K virtual texture. This virtual texture would provide 21 four layer texels per screen pixel (a great ratio).

That is when the problems started.

Artists used to work with 2Kx2K source stamp textures at maximum size. Now they were required to work at 8Kx8K just to be safe. This required an upgrade of all the artists' machines to 64-bit Photoshop CS4 to support enough address space so they could have more than 16 layers active in Photoshop without swapping to disk (we use 16-bit per channel source images). Had to get rid of all the Mac machines because 64-bit wasn't supported on the Mac with Photoshop.

Old source material just would not cut it with the new 64 Mpix texture resolution requirement. We had to purchase a few 40 Mpix Phase One digital camera backs and special medium format camera gear just to gather good source photos (at a cost of about $30-40K US for each group of camera/back/lenses/etc). Also had to hire a special photographer to help train the artists on the new gear (teach details like how to choose the proper aperture for sharp photos, etc).

Furthermore our publisher would only allow us to use four dual-layer Blu-ray discs, which is a measly 200 GB of data. The art department was outraged: what is the point of having 1 to 2 TB desktop drives when the game can only use 200 GB of space? Had something to do with the publisher saying that 12x speed BR drives would require a little over a one hour install to the HDD, and that was the upper maximum time that the average user would stand when every 16 to 20 minutes they had to put in another BR disc to complete the install.

And don't get me started on the required upgrade to our in-house GI farm, storage solution, and network ... let's just say re-baking the lighting and virtual texture was almost measured in weeks not hours.

In the end the project got canned and we ran out of money, going for 10x the resolution and quality on 2x the budget of a console title just was not possible yet. We never should have bet on technology that was going to only sustain or reduce the bandwidth (or compute) to pixel ratio ... or worse yet, require hanging an array of monitors from the ceiling.

Later we heard of another post DX11 launch title going the opposite direction, doing something VGA retro, real-time photon mapping at 320x200 at 30Hz with motion blur and a real camera lens model. The graphics engine was called something like The Turing Engine because most viewers could not tell the difference between the real-time 320x200 rendering and a VHS tape of similar real life source video. Something like 2 Mflops/pixel and 100 KB/pixel of bandwidth at 30 fps, damn, why didn't I think of that!"


If for a second you took any of that seriously, you really need to fill up on coffee before hitting the early morning blog run!
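For the record, the napkin numbers in the story do roughly hold up. A small Python sketch of the arithmetic follows. The six 1080p displays, 2xMSAA at 16 B/pixel, the four-layer 16Kx16K virtual texture, the four dual-layer discs, and the 12x drive speed come from the story; the 1 byte per texel per compressed layer, 50 GB per dual-layer disc, and 4.5 MB/sec for a 1x drive are assumptions.

```python
def napkin_math():
    """Re-derive the story's sizes; byte-per-texel and disc figures are assumed."""
    screen_pixels = 6 * 1920 * 1080            # six 1080p displays
    gbuffer_bytes = screen_pixels * 2 * 16     # 2xMSAA, 16 bytes per sample

    vt_texels = (16 * 1024) ** 2               # one 16Kx16K layer
    vt_bytes = vt_texels * 4 * 1               # 4 layers, assumed 1 byte/texel each
    texels_per_pixel = vt_texels / screen_pixels

    disc_bytes = 50e9                          # assumed dual-layer disc capacity
    read_bytes_per_sec = 12 * 4.5e6            # 12x drive, assumed 4.5 MB/sec at 1x
    minutes_per_disc = disc_bytes / read_bytes_per_sec / 60

    return {
        "screen_mpix": screen_pixels / 1e6,
        "gbuffer_mib": gbuffer_bytes / 2**20,
        "virtual_texture_gib": vt_bytes / 2**30,
        "texels_per_pixel": texels_per_pixel,
        "install_minutes": 4 * minutes_per_disc,
        "minutes_per_disc": minutes_per_disc,
    }


for name, value in napkin_math().items():
    print(f"{name}: {value:.2f}")
```

That gives roughly 12.4 Mpix, about 380 MiB of G-buffer, 1 GiB of virtual texture, a bit over 21 texels per screen pixel, and an install of a little over an hour with a disc swap about every 15 minutes of pure read time.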