microsecond-accurate timing on Windows

mar
Posts: 2559
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: microsecond-accurate timing on Windows

Post by mar »

diep wrote: Oh you could ask Johan de Koning, didn't he work a few years for Ubisoft?

You realize they first have you sign 20 secrecy contracts before you can even deliver something for $100? In some cases they just talk, talk, talk and still don't give YOU the contract job, but throw it to the cheapest third-world nation.
Yes, I know that The King powered Chessmaster. I never worked for a big game company, so I can't say. Regarding the third world: it happened in the company I work for too; some people were fired because a significant amount of work was offloaded to a subsidiary in India, since it's much cheaper, so I guess this is quite common.
diep wrote: However, if you analyze the benchmark results of the big games yourself, you will see that the MAJORITY of games don't scale with the number of cores times GHz.

So that's the simplest form of proof you can find. And very conclusive evidence.

They are totally dependent upon the bandwidth to the RAM of the GPU and the bandwidth of the RAM of the CPU.

You can easily verify this yourself.

Also, if you open a book on graphics algorithms you'll figure out that it doesn't need to be like that. In the end it's not driving a trillion pixels, it's driving just a few displays with objects inside.

If you draw things in a simple manner, then you obviously need huge bandwidth. If you do it more cleverly, it'll work great on a GPU from 10 years ago.

Realize, however, that they focus on supporting all features in a manner that works for you. Optimization is simply not the focus, unlike in game tree search.

The luxury the game industry has when producing games is that those GPUs deliver effectively several teraflops each.

However, if you use a few displays of 2560 x 1600 or something like that, you'll realize that it could effectively make do with just a few gigaflops.

So the rest is inefficiency.
OK Vincent, even if we neglect the fact that the GPU can't do everything, you still have to put the objects somewhere. Assuming the vertex buffers are stored/cached in GPU RAM and reused, drawing an object reduces to one or more OGL/DX calls (oversimplified). The assets created by artists simply have to be stored somewhere. The best optimization is of course not to draw objects you can't see, and that's not easy at all. Sure, the GPU can lend a hand because it can do occlusion queries, which, if I understand properly, means representing an object with a few triangles and letting the GPU probe the z-buffer. Of course you can't wait for the query to complete, so while it probes you do other stuff. It's tricky, but it can be done, exploiting spatial and temporal coherence between consecutive frames.
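To illustrate the pattern, here is a minimal sketch of such a non-blocking occlusion query in plain OpenGL (ARB_occlusion_query / GL 1.5). drawBoundingBox() and drawObject() are placeholder names, not any engine's real API, and a real renderer would juggle many queries at once; this only shows the probe-now, check-later idea:

// occlusion_query_sketch.cpp -- assumes a valid GL context and a loader (e.g. GLEW)
#include <GL/glew.h>

void drawBoundingBox();   // placeholder: a few triangles approximating the object
void drawObject();        // placeholder: the real, expensive object

// 'query' was created once with glGenQueries(); 'visible' persists across frames
// and starts as true, so the object is drawn until the first result arrives.
void drawWithOcclusionQuery(GLuint query, bool &visible, bool queryIssuedBefore)
{
    // 1) fetch last frame's result only if it is already available -- never stall
    if (queryIssuedBefore)
    {
        GLuint available = 0;
        glGetQueryObjectuiv(query, GL_QUERY_RESULT_AVAILABLE, &available);
        if (available)
        {
            GLuint samples = 0;
            glGetQueryObjectuiv(query, GL_QUERY_RESULT, &samples);
            visible = (samples > 0);
        }
    }

    // 2) probe the current z-buffer with the cheap proxy, writing no color/depth
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_FALSE);
    glBeginQuery(GL_SAMPLES_PASSED, query);
    drawBoundingBox();
    glEndQuery(GL_SAMPLES_PASSED);
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_TRUE);

    // 3) draw the real object based on the last known answer (temporal coherence)
    if (visible)
        drawObject();
}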
You can also help determine visibility on the CPU side; frustum culling is the simplest way. For indoor games it's beneficial to use a very low-poly approximation of the solid structure of the world and portalize it. You can either precompute visibility (which has insane complexity and only works well if the number of portals is small) or you can simply render and clip through the visible portals.
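A sphere-vs-frustum test is about as simple as CPU culling gets, roughly like this (the Plane/Sphere structs are just assumed for illustration, with plane normals pointing into the frustum):

// frustum_cull_sketch.cpp -- cull objects whose bounding sphere lies fully
// outside any of the six frustum planes.
struct Plane  { float nx, ny, nz, d; };   // plane: nx*x + ny*y + nz*z + d = 0
struct Sphere { float x, y, z, radius; }; // object's bounding sphere

bool sphereVisible(const Plane planes[6], const Sphere &s)
{
    for (int i = 0; i < 6; i++)
    {
        // signed distance of the sphere center from the plane
        float dist = planes[i].nx * s.x + planes[i].ny * s.y +
                     planes[i].nz * s.z + planes[i].d;
        if (dist < -s.radius)
            return false;   // entirely outside this plane -> cull
    }
    return true;            // inside or intersecting -> draw (conservative)
}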
For outdoor games, the best thing you can do to help visibility on the CPU side is to use occluders. This doesn't help that much, though, because filtering through an occluder is costly, and if you have two occluders which overlap from the camera's point of view it's costly to merge them properly. So the trick is to combine and balance the approaches.
Another problem when rendering is fillrate (overdraw). Just render a bunch of translucent fullscreen quads at high resolution and tell me how many you need before you're down to 10 fps, no matter how fast the GPU is.
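Some back-of-envelope numbers to show why (everything here is made up for illustration, not measured on real hardware):

// overdraw_cost_sketch.cpp -- bytes per second of color writes for N translucent
// fullscreen layers; the numbers are illustrative only.
#include <cstdio>

int main()
{
    const double width  = 2560, height = 1600;  // pixels
    const double quads  = 50;                   // translucent fullscreen layers
    const double fps    = 10;
    const double bpp    = 4;                    // 32-bit color, writes only

    const double bytesPerSec = width * height * quads * fps * bpp;
    printf("%.1f GB/s of color writes alone\n", bytesPerSec / 1e9);  // ~8.2 GB/s
    // alpha blending also reads the destination, so the real traffic is roughly
    // double that, before any texturing or depth testing.
    return 0;
}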
Shadow volumes also waste fillrate, but AFAIK they use shadow maps today anyway. Still, the latter requires rendering (parts of) the scene to a render-target texture and then doing projective texturing on the world geometry.
So even if you use deferred shading, so that shaders are applied to each non-translucent pixel only once (and you have to do antialiasing manually in that case), you still don't render each pixel only once.
And there are lots of particle effects used in today's games, which means rendering lots of translucent objects. Sparks are no problem, but consider smoke: if you get close, almost every pixel has to be rendered multiple times. There's not much to do about it, except for particle trimming, which means approximating the particle texture with a 2D convex hull. That can save a lot of fillrate, but it depends on the shape of the smoke texture; you don't want to render fullscreen quads in that case.
So perhaps it might be worth doing a z-fill pass, because modern GPUs probably use a hierarchical z-buffer, which means they can reject some triangles without having to rasterize them and probe each z-buffer value. But I'm not sure about this one.
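The z-fill pass itself is easy to sketch in OpenGL terms, whether or not it pays off (drawOpaqueGeometry() is a placeholder for the engine's opaque draw list, not a real API):

// z_prepass_sketch.cpp -- assumes a valid GL context and a loader (e.g. GLEW)
#include <GL/glew.h>

void drawOpaqueGeometry();   // placeholder: the engine's opaque draw list

void renderOpaqueWithZPrepass()
{
    // pass 1: depth only, no color writes, cheapest possible shader
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_TRUE);
    glDepthFunc(GL_LESS);
    drawOpaqueGeometry();

    // pass 2: full shading; only the frontmost fragment per pixel survives,
    // so the expensive shader runs once per visible pixel
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_FALSE);               // depth is already laid down
    glDepthFunc(GL_EQUAL);               // GL_LEQUAL is a safer alternative
    drawOpaqueGeometry();

    // restore defaults for whatever comes next (e.g. the translucent pass)
    glDepthMask(GL_TRUE);
    glDepthFunc(GL_LESS);
}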
Another thing to keep in mind is to use as few state switches as possible (changing textures and so on).
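A common way to do that, sketched with an assumed DrawCall layout (not any particular engine's), is to sort the draw list by a packed state key so calls sharing a shader and texture end up next to each other:

// state_sort_sketch.cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct DrawCall
{
    uint16_t shaderId;
    uint16_t textureId;
    uint32_t meshId;

    uint64_t sortKey() const
    {
        // shader switches are usually the most expensive, so they go in the top bits
        return (uint64_t(shaderId) << 48) | (uint64_t(textureId) << 32) | meshId;
    }
};

void sortByState(std::vector<DrawCall> &calls)
{
    std::sort(calls.begin(), calls.end(),
              [](const DrawCall &a, const DrawCall &b) { return a.sortKey() < b.sortKey(); });
}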
Textures are another thing. I believe today's games can have individual textures larger than the framebuffer, so you definitely want to use texture atlases and pack smaller textures into bigger ones where possible. Also, to reduce bandwidth, texture compression is applied. Even simple S3TC (DXT) can have good quality at only 4 bits per texel, or 8 if an alpha channel is needed (compared to 24-bit uncompressed RGB that's a huge saving by itself, and it allows for larger textures). Of course this is lossy compression.
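To put rough numbers on it, ignoring mipmaps, for a single 2048x2048 texture:

// texture_size_sketch.cpp -- uncompressed 24-bit RGB vs. DXT1 (4 bpp) vs. DXT5 (8 bpp)
#include <cstdio>

int main()
{
    const double texels = 2048.0 * 2048.0;
    printf("RGB24: %.1f MB\n", texels * 3.0 / (1024 * 1024));  // 12.0 MB
    printf("DXT1 : %.1f MB\n", texels * 0.5 / (1024 * 1024));  //  2.0 MB
    printf("DXT5 : %.1f MB\n", texels * 1.0 / (1024 * 1024));  //  4.0 MB
    return 0;
}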
Or you can take the "megatexture" approach, where you have one huge texture (32k x 32k or so) covering the whole geometry. But then you have to be able to do realtime streaming and caching.
AFAIK Rage used that, and ET:QW used it for terrain. Of course this only works well if the camera doesn't move too fast, because the fetches are delayed and you can't wait for them, so surfaces appear blurry until the desired detail is loaded.
So it's not as simple as it seems. For a forward renderer it's beneficial to render non-translucent objects sorted front-to-back, because that fills the z-buffer nicely, while translucent/transparent objects have to be drawn back-to-front.
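Sketched with an assumed, simplified DrawItem structure, that sorting step could look like this:

// depth_sort_sketch.cpp -- opaque items front-to-back (helps early-z),
// translucent items back-to-front (needed for correct blending).
#include <algorithm>
#include <vector>

struct DrawItem
{
    float viewDepth;     // distance from the camera along the view axis
    bool  translucent;
    // ... material, mesh, etc.
};

void sortDrawLists(std::vector<DrawItem> &opaque, std::vector<DrawItem> &translucent)
{
    std::sort(opaque.begin(), opaque.end(),
              [](const DrawItem &a, const DrawItem &b) { return a.viewDepth < b.viewDepth; });
    std::sort(translucent.begin(), translucent.end(),
              [](const DrawItem &a, const DrawItem &b) { return a.viewDepth > b.viewDepth; });
}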
I can't imagine that big engines would not render in batches or that they would waste bandwidth/fillrate. Simply because the consoles have much worse GPUs than today's high-end PCs. I don't claim that everyone has everything optimized to the max. You can't afford to be inefficient on consoles. And doing PC-only titles would be suicide. There were cases where assets/texture quality had to be changed for console ports. But I believe that modern engines already support consoles directly.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: microsecond-accurate timing on Windows

Post by diep »

You make the mistake of assuming these guys actually use algorithms when there is no need to use one.

They simply do not use an algorithm until they get forced to use one.

So on the CPU you can simply do things without any algorithm until there is a huge bottleneck somewhere that causes your game to get slower.

And they're not going to solve it one second sooner.

That's how everything works on this planet.

Most software just barely works you know, usually just enough to power the hardware :)

If you look around in graphics, that happens everywhere. Even the huge data gathering done with the Adobe software stores things randomly. In which order?

The formats themselves don't keep things ordered you know...

This isn't a problem. Just a matter of good hardware :)

In this case it costs society several dozen billion more a year, but hey, someone makes cash on it, and if it does the job for you, why complain?
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: microsecond-accurate timing on Windows

Post by diep »

I cut'n pasted a line:

"I can't imagine that big engines would not render in batches or that they would waste bandwidth/fillrate. Simply because the consoles have much worse GPUs than todays highend in PCs. I don't claim that everyone has everything optimized to the max. You can't afford to be inefficient on consoles."

So in the same sentence you start with 'I can't imagine', then you jump to console hardware, and some of that console hardware is up to a factor of 1000 slower...

Yet some of them run some of the most popular games, which work both on Windows as well as on consoles.

Please retry that sentence.

How would you like it, as a gaming company, if a Windows product, which is easy to copy illegally, were 100x better than a 100x slower console? (where copying is also possible, yet a lot tougher, and in practice it happens less)
mar
Posts: 2559
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: microsecond-accurate timing on Windows

Post by mar »

Maybe you're too pessimistic in general, but I still believe that there are guys in the game industry who care about how fast their engine runs, simply because if it's faster you can add "more stuff". And it's always good to have a reserve.
On the other hand, I know real-life examples of how some developers work:
Like a guy saying: huh, a 256 MB block allocated here and there? Who cares? And then they ran into memory problems. Solution: let's do a 64-bit build.
Another example: I argued with a guy that a bit shift right by 8 is way faster than a division by 255 (not the same thing, but a good approximation). The bit shift was a factor of ~4 faster on an old ARM.
Of course the division by a constant was probably optimized into a multiply by something followed by a shift, which is still not bad. And that's neglecting the fact that ARMs up to v7, I believe, don't have an integer division instruction at all.
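Spelled out in code (my reconstruction of the idea, not the original): the shift really divides by 256, which underestimates x/255 by at most one for products of two bytes, while the classic shift-and-add replacement is exact on that range and still avoids the division:

// div255_sketch.cpp
#include <cassert>
#include <cstdint>

static inline uint32_t div255_approx(uint32_t x) { return x >> 8; }  // really x / 256

static inline uint32_t div255_exact(uint32_t x)   // floor(x / 255) without dividing
{
    return (x + 1 + (x >> 8)) >> 8;
}

int main()
{
    // check both over the range a product of two 8-bit values can take
    for (uint32_t a = 0; a <= 255; a++)
        for (uint32_t b = 0; b <= 255; b++)
        {
            const uint32_t x = a * b;
            assert(div255_exact(x) == x / 255);        // exact replacement
            assert(x / 255 - div255_approx(x) <= 1);   // shift is off by at most 1
        }
    return 0;
}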
So yes, I can imagine that many companies don't exactly develop in the most efficient way, so I basically agree with you. But they can't develop realtime apps that way; that was the whole point.
There are also deadlines, where a guy above who has no clue pushes to deliver something on time. If you have a deadline there's not much time left to write efficiently, but that's a different story.
mar
Posts: 2559
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: microsecond-accurate timing on Windows

Post by mar »

diep wrote: I cut'n pasted a line:

"I can't imagine that big engines would not render in batches or that they would waste bandwidth/fillrate. Simply because the consoles have much worse GPUs than today's high-end PCs. I don't claim that everyone has everything optimized to the max. You can't afford to be inefficient on consoles."

So in the same sentence you start with 'I can't imagine', then you jump to console hardware, and some of that console hardware is up to a factor of 1000 slower...

Yet some of them run some of the most popular games, which work both on Windows as well as on consoles.

Please retry that sentence.

How would you like it, as a gaming company, if a Windows product, which is easy to copy illegally, were 100x better than a 100x slower console? (where copying is also possible, yet a lot tougher, and in practice it happens less)
There's nothing wrong with that.
What I meant is that for many it's a must to support consoles if they want to survive, simply because of the reasons you just wrote.
And since consoles are much slower, how fast do you think an "inefficient engine" would run on a console? A bunch of "O(N^2)" here and there and you'd have a slideshow instead of a game.