Thursday, June 1, 2017

Porting GHEngine to VR Part 1: VR GUI Rendering

We're porting our in-house engine to VR!  Namely to Oculus, because they were generous enough to help us out with a development headset.  This version of the engine is in use for one released game on iOS, Android, Windows Store, and OSX.  It also has two games in development, including one for Steam.

2D on-screen GUIs have worked for all other platforms we've ever worked on, but they don't work in VR.  The images are too close to your face and cause a cross eyed view while looking around.  The answer is to move the GUI out into the world.  However, we don't want to have to make a whole new GUI system.  Our existing system for loading and interacting with menus is fully implemented, working, and took an immense amount of time.

We introduced a new concept to our GUI system that we're calling a canvas.  It's basically an output location for our 2D GUI system.  Each GUI widget gets assigned a canvas via data inheritance.
class GHGUICanvas
{
public:
    GHGUICanvas(bool is2d, const GHPoint3& pos, const GHPoint3& rot,
                const GHPoint3& scale, GHMDesc::BillboardType bt);

    bool is2d(void) const { return mIs2d; }
    void createGuiToWorld(const GHViewInfo& viewInfo,
                          GHTransform& outTrans) const;

private:
    bool mIs2d; // projected to screen or not.
    GHPoint3 mPos;
    GHPoint3 mRot;
    GHPoint3 mScale;
    GHMDesc::BillboardType mBillboardType;
};
We then moved the creation of the transform from GUI space (0 to 1) to screen space (-1 to 1 in D3D) out of the shader and onto the CPU by using a per-object shader callback.  If there's no canvas, or the canvas specifies 2D, we just pass a simple GUI-space to screen-space conversion transform.  Otherwise we multiply offset, scale, rotation, position, and WorldViewProj together in that order and pass the result to our GUI vertex shader.
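For the plain 2D case, the conversion is just a scale and an offset per axis.  Here is a minimal sketch of that mapping, assuming GUI space has its origin at the top-left with y pointing down (so y gets flipped).  Vec2 and guiToScreen are made-up names for illustration, not engine types.

```cpp
#include <cassert>

// Hypothetical 2D point type for this sketch (not the engine's GHPoint).
struct Vec2 { float x; float y; };

// Map a GUI-space coordinate in [0,1] to screen space in [-1,1].
// Assumption: GUI y points down and screen-space y points up, so y flips.
Vec2 guiToScreen(Vec2 gui)
{
    assert(gui.x >= 0.0f && gui.x <= 1.0f);
    assert(gui.y >= 0.0f && gui.y <= 1.0f);
    return Vec2{ gui.x * 2.0f - 1.0f, 1.0f - gui.y * 2.0f };
}
```

The center of the GUI (0.5, 0.5) lands on the center of the screen (0, 0).  The 3D canvas case replaces this with the full matrix chain described above.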

And the result:

Friday, August 7, 2015

All good programmers are lazy

I'm going to let you in on a secret: good programmers are lazy.  You want them to be lazy.  Don't hire a programmer that isn't lazy!  Everything I've learned about making programs better is about being a more lazy developer.

Variable and function names

float bob(float bob1) {
    const float bob2 = 1209381902312;
    float bob3 = bob1 * bob2;
    return bob3;
}
float bobble = bob(98123);
If you see this function call, it's immediately obvious what it's... wait, it really isn't.  You see it and have to go find the definition, read through it, and try to make sense of it.  This is why you use descriptive names for everything: to be lazy.
float createPrivateKey(float seed) {
    const float secretMultiplier = 1209381902312;
    float ret = seed * secretMultiplier;
    return ret;
}
float privateKey = createPrivateKey(98123);


Comments

Comments in code exist to let you be lazy.  You don't want to have to read through code every time you call a function.  If the name isn't descriptive enough, add a comment.  If there's a block of tricky code, add a comment.  Read a simple complete sentence instead of dozens of lines of code.

Code Reuse

You find yourself typing the same code more than once?  Why are you doing that?  You are supposed to be a lazy programmer!  Wrap that code in one place and call it wherever you need it.  Then for bonus laziness you can fix any bugs in one place and not 400.


Interfaces

The project spec changed and now you are using OpenGL instead of DirectX?  Time to change every file in the project.  It changed again and now you are using DirectX?  Go back and change every file in the project.

If you were a good lazy programmer you'd have an interface in place already.  Switching between the two should involve changing one line of code.
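As a rough sketch of what that looks like, here is a tiny renderer interface with two backends behind it.  Every name here (Renderer, OpenGLRenderer, DirectXRenderer, Backend, createRenderer) is made up for illustration and is not from any real engine.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical renderer interface; the rest of the code only sees this.
class Renderer
{
public:
    virtual ~Renderer() = default;
    virtual std::string name() const = 0;
};

class OpenGLRenderer : public Renderer
{
public:
    std::string name() const override { return "OpenGL"; }
};

class DirectXRenderer : public Renderer
{
public:
    std::string name() const override { return "DirectX"; }
};

enum class Backend { OpenGL, DirectX };

// The one place that knows which concrete renderer is in use.
std::unique_ptr<Renderer> createRenderer(Backend backend)
{
    if (backend == Backend::OpenGL) return std::make_unique<OpenGLRenderer>();
    assert(backend == Backend::DirectX);
    return std::make_unique<DirectXRenderer>();
}
```

Switching APIs then means changing the single Backend value passed to createRenderer, and nothing else in the project has to know.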

Spaghetti Code

You have a 10,000 line function and have to make a change in it.  Enjoy spending the next 3 hours reading through it all.  The lazy programmer next to you has their code broken into functions that do one thing each and can easily find the place to change.

Source Control

SVN and Git were invented to let you be lazy.  Make a bunch of changes and suddenly nothing works?  If you have no backups you could spend months trying to get back into a working state, or you could revert.  A bug showed up last Thursday?  You could look at the whole program, or you could look at the SVN logs to see what changed that day.


Asserts

A function can't be called with a value less than 0?  Make the function argument an unsigned int.  It can't be called with a value less than 10?  Add an assert!  If possible add a compile-time assert.  Now you don't have to spend hours debugging a crash because you passed an invalid argument.
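Here is a small sketch of both kinds of check.  The function and struct names (scaleCount, FixedBuffer) and the limit of 10 are invented for the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical function: the unsigned parameter documents that negative
// values make no sense, and the assert enforces the lower bound of 10.
unsigned int scaleCount(unsigned int count)
{
    assert(count >= 10 && "count must be at least 10");
    return count * 2;
}

// Compile-time check: a bad size fails the build instead of crashing later.
template <std::size_t N>
struct FixedBuffer
{
    static_assert(N >= 10, "FixedBuffer needs at least 10 entries");
    int values[N];
};
```

One caveat: a negative int passed to an unsigned parameter converts silently to a very large value, so it is worth turning on the compiler's sign-conversion warnings as well.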

Unit Tests

Unit tests exist to allow you to be lazy.  Does your change break the math library?  You could spend your time trying to make sure everything works, or you could hit compile and let the unit tests tell you what is broken.

Continuous Builds and Integration

You just made a bunch of changes and checked in.  Did you break anyone else?  Did you drive home and get a phone call that you broke it and have to drive back?  Also, your whole team wants to be lazy with you.  They do a get and find that they can't compile and have to go around and find who can fix it?  This is why continuous builds were invented.  Check in and wait for the green light.


Always ask how lazy you can be when programming.  Are you adding work down the line when a bug shows up or the project specification changes?  Are you adding work now that won't pay off later?  More importantly are you adding work for other lazy programmers to have to deal with?

Calling someone a hard working programmer is almost an insult.  No one with a clue is impressed by a programmer who works 14 hours a day accomplishing what a lazy programmer could do in 2 hours.

Thursday, August 6, 2015

Android JNI and pthreads

Here is a quick gotcha for using pthreads that make JNI calls on Android: the JNIEnv is not shared between threads.

This simple implementation of threads will crash on any JNI call.
struct ThreadArgs
{
    GHRunnable* mRunnable;
};

void* threadLaunch(void* arg)
{
    ThreadArgs* threadArgs = (ThreadArgs*)arg;
    threadArgs->mRunnable->run(); // assumes GHRunnable exposes run()
    delete threadArgs;
    return NULL;
}

void GHAndroidThread::runThread(GHRunnable& runnable)
{
    ThreadArgs* threadArgs = new ThreadArgs;
    threadArgs->mRunnable = &runnable;

    pthread_t myPThread;
    if (pthread_create(&myPThread, NULL, threadLaunch, (void*)(threadArgs)))
    {
        GHDebugMessage::outputString("Error creating pthread");
    }
}
 You need to call AttachCurrentThread and DetachCurrentThread to allow JNI access.
struct ThreadArgs
{
    GHRunnable* mRunnable;
    JavaVM* mJVM;
};

void* threadLaunch(void* arg)
{
    ThreadArgs* threadArgs = (ThreadArgs*)arg;
    JNIEnv* jniEnv;
    bool attachNeeded = true;

    // If GetEnv succeeds, this thread is already attached to the VM.
    int status = threadArgs->mJVM->GetEnv((void**)&jniEnv, JNI_VERSION_1_6);
    if (status == JNI_OK)
    {
        attachNeeded = false;
    }

    if (attachNeeded)
    {
        threadArgs->mJVM->AttachCurrentThread(&jniEnv, NULL);
    }

    threadArgs->mRunnable->run(); // assumes GHRunnable exposes run()

    // Only detach if this function did the attaching.
    if (attachNeeded)
    {
        threadArgs->mJVM->DetachCurrentThread();
    }

    delete threadArgs;
    return NULL;
}

void GHAndroidThread::runThread(GHRunnable& runnable)
{
    ThreadArgs* threadArgs = new ThreadArgs;
    threadArgs->mRunnable = &runnable;
    threadArgs->mJVM = mJNIMgr.getJVM();

    pthread_t myPThread;
    if (pthread_create(&myPThread, NULL, threadLaunch, (void*)(threadArgs)))
    {
        GHDebugMessage::outputString("Error creating pthread");
    }
}
 Lastly any JNI calls should probably include the following wrapper to get the correct env for the current thread.  I tried it without this code and didn't have problems but the docs do say it's a different JNIEnv per thread.  Maybe it would crash without this on a different Android version.

JNIEnv& GHJNIMgr::getJNIEnv(void)
{
    JNIEnv* env;
    int status = mJVM->GetEnv((void**)&env, JNI_VERSION_1_6);
    if (status < 0) {
        GHDebugMessage::outputString("Failed to get JNIEnv");
    }
    return *env;
}
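
The attach/detach pairing in the thread function is easy to get wrong if the function grows an early return.  One option is to wrap it in a scope guard.  This is only a sketch and not what our engine does: ScopedThreadAttach is a made-up name, and FakeVM/FakeEnv are stand-ins so the example compiles without jni.h.  The guard is templated so the same shape could be tried with JavaVM and JNIEnv on device.

```cpp
#include <cassert>

// Stand-ins for JavaVM/JNIEnv so this sketch builds off-device.
// Only the three calls used in the post are modeled.
struct FakeEnv {};
struct FakeVM
{
    bool attached = false;
    int attachCalls = 0;
    int detachCalls = 0;
    FakeEnv mEnv;

    int GetEnv(void** env, int /*version*/)
    {
        if (!attached) return -2; // same value as JNI_EDETACHED
        *env = &mEnv;
        return 0;                 // same value as JNI_OK
    }
    int AttachCurrentThread(FakeEnv** env, void* /*args*/)
    {
        attached = true; ++attachCalls; *env = &mEnv; return 0;
    }
    int DetachCurrentThread() { attached = false; ++detachCalls; return 0; }
};

// Attaches in the constructor only if needed, and detaches in the
// destructor only if this guard did the attaching.
template <typename VM, typename Env>
class ScopedThreadAttach
{
public:
    explicit ScopedThreadAttach(VM& vm) : mVM(vm)
    {
        // 0x00010006 is the value of JNI_VERSION_1_6.
        if (mVM.GetEnv(reinterpret_cast<void**>(&mEnv), 0x00010006) != 0)
        {
            int status = mVM.AttachCurrentThread(&mEnv, nullptr);
            assert(status == 0);
            (void)status;
            mAttached = true;
        }
    }
    ~ScopedThreadAttach() { if (mAttached) mVM.DetachCurrentThread(); }
    ScopedThreadAttach(const ScopedThreadAttach&) = delete;
    ScopedThreadAttach& operator=(const ScopedThreadAttach&) = delete;

    Env* env() const { return mEnv; }

private:
    VM& mVM;
    Env* mEnv = nullptr;
    bool mAttached = false;
};
```

Inside a thread function you would create the guard first, run the work, and let the destructor handle the detach on every exit path.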

Sunday, August 2, 2015

Updating Win32 and 8.1 apps to Windows 10

Win32 apps:
  1. Open your project in visual studio.
  2. Make new configurations ReleaseWin10 and DebugWin10 by using the configuration manager and copying values from the existing configs.  This will let you continue to compile on older versions of windows.
  3. Right click each project and go to properties.  Find the "Platform Toolset" and change to Visual Studio 2015.
  4. snprintf behavior is different from _snprintf.  If you had a project define snprintf=_snprintf, you will need to remove that define and make sure your code expects the different behavior.  If you have an 8 size buffer and call _snprintf(buf, 2, "aa") you will get "aa)(#(!@".  If you call snprintf(buf, 2, "aa") you will get "a\0)(#(!@".
  5. Compile and cross your fingers.
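
To see the standard snprintf behavior from step 4 for yourself, here is a small sketch.  The helper name truncatedWrite is made up, and the '#' fill is just a marker so untouched bytes are visible.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Fill an 8-byte buffer with '#', then let snprintf write `text` into it
// with a size limit of 2.  Returns snprintf's return value.
int truncatedWrite(char (&buf)[8], const char* text)
{
    assert(text != nullptr);
    std::memset(buf, '#', sizeof(buf));
    // Standard snprintf writes at most size-1 characters followed by a
    // terminating '\0', and returns the length it wanted to write.
    return std::snprintf(buf, 2, "%s", text);
}
```

With "aa" as the input, the buffer starts with 'a' then '\0', the rest is untouched, and the return value is the untruncated length.  Code that relied on _snprintf filling the whole size without a terminator needs to be checked.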

Windows 8 Store apps:

  1. Watch this awesome video:
  2. Check to see if you have the UAP/UWP tools installed by looking in "\Program Files (x86)\Windows Kits\10\Platforms\UAP".  If that folder doesn't exist then download the Visual Studio installer, run it, and click modify.  Under features make sure "Universal Windows App Development Tools" is checked.  Go on vacation and come back in 2 weeks once it is installed.
  3. Make copies of your windows 8 projects in a new folder called something like Win10Projects.  This way the update is non destructive and you have a reference for how things should look.
  4. Update MSAdvertisingXaml Version 8.1 if necessary.  Open the project and look for a References folder.  Remove that reference.
  5. Go here and follow the instructions
  6. If you had any snprintf=_snprintf defines, remove them.
  7. Remove any ApplicationSettings:SettingsPane stuff.  
    1. [SettingsPane may be altered or unavailable for releases after Windows 10. Instead of using a SettingsPane, integrate settings options into the app experience. For more info, see Guidelines for app settings.]
  8. Remove any deprecated items from the AppxManifest.xml.  They will have a blue underline.

Saturday, March 22, 2014

Android Texture Optimization

tldr: We increased the terrain texture resolution for Android Big Mountain Snowboarding from 256 to 1024 by making some changes.

Colt McAnlis gave a talk at GDC on shrinking your png use in Android games that inspired me to go over some of our assets and code.  Our current setup is to use high res hardware compressed textures like pvr4 or dxt on platforms that have consistency, and use lower res uncompressed textures on platforms that do not.  For Big Mountain Snowboarding this means 2048x2048 textures on iOS and Windows are 256x256 on Android.  Sorry Android people!

Part of the size difference is because BMS is still supporting phones that no one uses.  When we originally released the game, phones didn't have enough texture memory to support the larger uncompressed textures.  This appears to not be the case anymore since we have released 4 newer maps with higher res textures without hearing many complaints.

However, file size of the apk prevents us from increasing the size of all maps.  We can't make all 16 maps be 1024x1024 without going over the 50 meg Google Play limit unless we make some changes to how we handle textures.

So at the start of this experiment, our assets folder is 32 megs, and the compressed apk is 33 megs.  We have some room to make the apk bigger, but not enough to increase the terrain resolution.  The first thing I tried was export for web from photoshop.

Export for Web:
8-bit pngs are not going to work.  These are very large pixels in world space due to the way we're handling terrain, so the banding is going to be very obvious.  It's hard to tell in that small image but it's there!  Using save for web with 24 bit pngs was not a big enough savings to worry about.

PNGCrush (lossless):
Using the option -reduce for lossless compression, the largest texture was reduced from 1469KB to 1344KB.  It's the right direction, but not really enough for me to increase the texture resolution.  Using -reduce -brute brought it to 1231KB.  I thought this might be enough to do a (time consuming) full run on all the larger terrain png files to see what it does to the apk, and 2 textures later decided to look for better options.

Online compression site (lossy):
This is a site that provides free lossy compression of png files.  Here we are throwing away color information to favor a smaller disk size.  The site was able to reduce a 1.4 meg png down to 700k.
The difference is noticeable, which is unfortunate.  The unacceptable banding from the 8 bit version does not exist though!  Considering this amount of savings would let us increase the texture size of all maps I consider it worth it.  After a short meeting with our art director (me) and our tech director (me) where there was much arguing and throwing stuff, both departments agreed that it was a worthwhile change to make.

Doing this for all 4 of the large resolution maps brought our assets folder down to 21.9 megs (from 32), and our apk to 23 megs.

Parkinson's Law:
We now have approximately 27 megs to fill up!  Our older maps are using 256x256 textures on Android, and we have 12 of those.  A 1024x1024 mountain takes up about 2 megs of space, so it looks like we have room to bring all the old maps up to 2013.  Our pipeline lets us generate as much detail as we like but I can save time by starting with the OSX versions which are 2048x2048 and shrink them down.  With increased texture sizes our assets folder is 75 megs.  After compression the assets folder is 49 megs, which pushes our apk over the 50 meg limit.  Dropping the resolution of the two biggest maps brings us to 45 megs, but they are still higher resolution than the original game.

But what about the gpu?
None of this stuff matters on the gpu, where the textures will be uncompressed.  We've had some 1024x1024 maps on the market for a while without hearing about too many issues, so the phone quality of the average user has gone up a lot in 4 years.

I stuck in a breakpoint and all our png textures are loading as RGBA_8888.  Our big terrain textures would benefit a lot in memory size from being RGB_565.  The textures are 3 megs as uncompressed 32 bit and we have 4 of them per map.
     //bitmap = BitmapFactory.decodeStream(filestream);
     BitmapFactory.Options bmpOptions = new BitmapFactory.Options();
     bmpOptions.inPreferredConfig = Bitmap.Config.RGB_565;
     bitmap = BitmapFactory.decodeStream(filestream, null, bmpOptions);
 According to my breakpoints, this code loads in all non-alpha textures as 16 bit, which seems to look ok.  If it doesn't look ok for some specific textures we'll have to add a way to disable the code.
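
For anyone wondering what the 16 bit format actually keeps, here is a sketch of the bit packing and the resulting size math.  packRGB565 and textureBytes are names I made up for the example, and the code is C++ rather than the Java above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Pack 8-bit-per-channel RGB into one 16-bit RGB_565 value by keeping
// the top 5, 6 and 5 bits of red, green and blue.
std::uint16_t packRGB565(std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    return static_cast<std::uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Bytes needed for an uncompressed texture at a given bit depth.
std::size_t textureBytes(std::size_t width, std::size_t height, std::size_t bitsPerPixel)
{
    assert(bitsPerPixel % 8 == 0);
    return width * height * (bitsPerPixel / 8);
}
```

Dropping from 32 bits to 16 bits per pixel halves the memory for any texture size, at the cost of the low bits of each channel and the alpha channel.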

Sunday, December 15, 2013

HLSL instruction count optimization

I found a simple way to print out the number of instructions in a compiled shader, which is an easy way to optimize compared to looking at total framerate.  Instruction count isn't the whole story about how fast a shader will run, but it's one thing to look at.

#define INITGUID
#include "D3D11Shader.h"
#include "D3Dcompiler.h"
#include "GHDebugMessage.h"
After loading the shader buffer:
ID3D11ShaderReflection* pReflector = NULL;
D3DReflect(fileBuf, fileLen, IID_ID3D11ShaderReflection, (void**)&pReflector);
D3D11_SHADER_DESC shaderDesc;
pReflector->GetDesc(&shaderDesc); // fill shaderDesc before reading InstructionCount
GHDebugMessage::outputString("Shader %s instruction count %d", shaderName, shaderDesc.InstructionCount);
pReflector->Release();
We have a very unoptimized experimental shader.  I ran some instruction count measurements on some simple changes.

Starting instruction count: 272
Ending instruction count: 235
Instructions saved: 37

Adding lerp:

Shader sbterrainpixel.cso instruction count 270
(saves 2 instructions)

float4 color;
color.w = 1.0;
color.xyz = (cliffColor.xyz*blendColor.y) + (groundColor.xyz*blendColor.x);

float4 color = lerp(cliffColor, groundColor, blendColor.x);
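
The swap to lerp only gives the same picture if the two blend weights sum to 1, which is an assumption on my part here.  A quick C++ sketch of the math, with hlslLerp and manualBlend as made-up names (HLSL documents lerp(x, y, s) as x*(1-s) + y*s):

```cpp
#include <cassert>

// Scalar version of HLSL's lerp: x*(1-s) + y*s.
float hlslLerp(float x, float y, float s) { return x * (1.0f - s) + y * s; }

// The hand-written two-weight blend.
// Assumption: weightX + weightY is 1, otherwise the two forms differ.
float manualBlend(float cliff, float ground, float weightX, float weightY)
{
    assert(weightX + weightY > 0.999f && weightX + weightY < 1.001f);
    return cliff * weightY + ground * weightX;
}
```

With weights 0.25 and 0.75, both functions give the same blended value for the same inputs.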


No difference with or without the swizzle.

color.xyz *= blendColor.z;

color *= blendColor.z;

Vector Ops:

Shader sbterrainpixel.cso instruction count 262
(saves 1 instruction per function call)

float2 offsetUV = float2(offsetShadProj.x / offsetShadProj.w, 1.0f - offsetShadProj.y / offsetShadProj.w);
return tex.Sample(samp, offsetUV).x;

float2 offsetUV = offsetShadProj.xy / offsetShadProj.w;
offsetUV.y = 1.0 - offsetUV.y;
return tex.Sample(samp, offsetUV).x;

More Vector Ops:

Shader sbterrainpixel.cso instruction count 246
(saves 4 instructions per function call)

const float xPixelOffset = 0.0015;
const float yPixelOffset = 0.0015;
float4 offsetShadProj = shadowPos + float4(offSet.x * xPixelOffset,
offSet.y * yPixelOffset, 0.0, 0.0);

const float2 pixelOffset = float2(0.002, 0.002);
float2 multOffset = offSet.xy * pixelOffset;
float4 offsetShadProj = shadowPos + float4(multOffset.xy, 0.0, 0.0);

Again Vector Ops:

Shader sbterrainpixel.cso instruction count 245
(saves 1 instruction)

float4 trailColor = TrailTexture.Sample(TrailTextureSampler, float2(input.trailPos.x / input.trailPos.w, 1.0 - input.trailPos.y / input.trailPos.w));

float2 trailUV = input.trailPos.xy / input.trailPos.w;
trailUV.y = 1.0 - trailUV.y;
float4 trailColor = TrailTexture.Sample(TrailTextureSampler, trailUV);

Bad Code:

Shader sbterrainpixel.cso instruction count 239
(saves 6 instructions)

float shadInBorder = saturate(step(0.95, shadCenter.x) + step(0.95, shadCenter.y) +
step(shadCenter.x, 0.05) + step(shadCenter.y, 0.05));
// todo: more efficient.
color.xyz -= (color.xyz*(1.0-shadTot) * (1.0-shadInBorder));

// if we're on the border, come up with a value bigger than 1.
float shadInBorder = step(0.95, shadCenter.x) + step(0.95, shadCenter.y) +
step(shadCenter.x, 0.05) + step(shadCenter.y, 0.05);
// multiply color by the shadow value, unless we are on the border.
color.xyz *= saturate(shadTot + shadInBorder);
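
The two versions should produce the same color, assuming the shader subtracts from and multiplies the same color value.  Here is a scalar C++ sketch to check that.  shadeBefore, shadeAfter and this saturate are my own stand-ins, and borderSum plays the role of the sum of the four step() calls.

```cpp
#include <algorithm>
#include <cassert>

// Scalar stand-in for HLSL's saturate: clamp to [0,1].
float saturate(float v) { return std::min(1.0f, std::max(0.0f, v)); }

// "Before": subtract the shadowed portion unless we are on the border.
float shadeBefore(float color, float shadTot, float borderSum)
{
    const float inBorder = saturate(borderSum);
    return color - color * (1.0f - shadTot) * (1.0f - inBorder);
}

// "After": a single multiply by a saturated sum.
float shadeAfter(float color, float shadTot, float borderSum)
{
    assert(shadTot >= 0.0f && shadTot <= 1.0f);
    return color * saturate(shadTot + borderSum);
}
```

Off the border (borderSum of 0) both reduce to color times shadTot.  On the border (borderSum of 1 or more) both leave the color alone.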

More Bad Code:

Shader sbterrainpixel.cso instruction count 235
(saves 4 instructions)

float shadInWideBorder = saturate(step(0.95, wideShadowUV.x) + step(0.95, wideShadowUV.y) + step(wideShadowUV.x, 0.05) + step(wideShadowUV.y, 0.05));
color.xyz -= (color.xyz*(1.0 - wideshadTot)) * shadInBorder * (1.0-shadInWideBorder);

// apply wide shadow if we are not in the wide border and are in the short border.
float shadInWideBorder = step(0.95, wideShadowUV.x) + step(0.95, wideShadowUV.y) + step(wideShadowUV.x, 0.05) + step(wideShadowUV.y, 0.05);
color.xyz *= saturate(wideshadTot + shadInWideBorder + step(shadInBorder, 0.9));

Sunday, June 23, 2013

C++ Subsets

C++ is my language of choice.  There's really one big unavoidable reason for this: Outside of C and maybe HTML5/javascript it's the most portable language there is, and I like having the choice of using the extra features C++ provides over C.  Part of our business plan is to jump on new platforms early.  Since we don't have to support most of the community we can beat Unity to market by a month or two except when they get early access.  This leads to lots of sales we wouldn't otherwise get.

Outside of that simple business reason, I really like C++.  It's built on the foundation of being able to ignore any features you don't like.  It's also huge and not a lot of people understand everything inside it.  Many companies define their own individual subset of features they allow and don't allow.  Some of these have good reasons and others are just traditions.

This is not a post about what all companies or even you should use when dealing with C++.  It's just the current state of our own guidelines.  We currently work with iOS, OSX, Win8, Win7, WP8, Android NDK, and Blackberry.  We have also worked with Wii in the past, and have been looking at some more exclusive platforms.

I've been holding off on the C++11 features because not all platforms we work with had full support for it yet.  This has been changing rapidly.  It might be time to lift the veil on it and start using at least some limited features but I need to research the support first.

Operator Overloading:
Operator overloading is not banned and is used in our code, but carefully.  I'm not a big fan of operator overloading for two reasons.  Firstly it can be non-obvious what is happening in the code with a cursory glance, especially if implicit casting comes into play.  Secondly some operators can lead to extra allocations that don't need to happen.  I really don't like the + operator for this reason and prefer to use += just for the reason that if + exists it will be used in some places without caution.  If I could make the IDE pop up a warning every time + is used I'd be more likely to allow it.  Some of the C++11 features might make this a non-issue.

Templates:
I love templates.  Since I'm not working on embedded systems, code text size is more important to me than compiled code size.  Even so, Trick Shot Bowling's lib comes in under 1 meg.  We don't have vec2, vec3, vec4, mat16.  Instead we have GHPoint templated with type and count.  The amount of code we can share because of this is awesome.  I tend to think that companies avoiding templates due to compiled code size are often not aggressively shrinking their uncompiled code size and dependencies.
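
To give an idea of the shape, here is a minimal sketch of a point templated on type and count.  This is not GHPoint's actual interface.  The class name Point, its members, and the typedefs are all invented for the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a point templated on type and count.
template <typename T, std::size_t COUNT>
class Point
{
public:
    Point() : mCoords{} {} // zero-initialize every coordinate

    T& operator[](std::size_t i) { assert(i < COUNT); return mCoords[i]; }
    const T& operator[](std::size_t i) const { assert(i < COUNT); return mCoords[i]; }

    // One implementation serves every dimension and element type.
    Point& operator+=(const Point& other)
    {
        for (std::size_t i = 0; i < COUNT; ++i) mCoords[i] += other.mCoords[i];
        return *this;
    }

    T dot(const Point& other) const
    {
        T sum = T();
        for (std::size_t i = 0; i < COUNT; ++i) sum += mCoords[i] * other.mCoords[i];
        return sum;
    }

private:
    T mCoords[COUNT];
};

typedef Point<float, 2> Point2;
typedef Point<float, 3> Point3;
```

The add and dot product are written once and every size of point gets them, which is the kind of sharing I mean.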

Also, templates can be a great optimization for removing virtuals where needed.

Virtual Functions:
Most of your code is not called at a frequency where virtual functions matter.  Most of your objects are not instantiated enough for the virtual function table pointer size to matter.  The trick is to figure out which parts of your code are too high frequency, and avoid virtuals there.  We don't use virtual functions inside TransformNode or Point due to volume of objects.  If we were using a software renderer we wouldn't use virtuals on a per-pixel basis.  It's a fuzzy line of where in the engine to stop using virtuals.

STL:
We fully allow STL everywhere.  Part of the reason we can do this is we only have two programmers who both have a pretty good understanding of what STL is doing.  We are not likely to grow a vector of concrete objects a bunch of times by repeatedly pushing back without first ensuring the vector is big enough to hold everything.
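
A quick sketch of the push_back point: count how often the vector's capacity changes, with and without reserving first.  countReallocations is a made-up helper, and capacity changes are only a proxy for reallocations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Push `count` ints and report how many times the vector's capacity
// changed while doing so.
std::size_t countReallocations(std::size_t count, bool reserveFirst)
{
    std::vector<int> values;
    if (reserveFirst) values.reserve(count);

    std::size_t reallocations = 0;
    std::size_t lastCapacity = values.capacity();
    for (std::size_t i = 0; i < count; ++i)
    {
        values.push_back(static_cast<int>(i));
        if (values.capacity() != lastCapacity)
        {
            ++reallocations;
            lastCapacity = values.capacity();
        }
    }
    assert(values.size() == count);
    return reallocations;
}
```

With reserve the count is zero.  Without it the number depends on the implementation's growth policy, and each one is a copy or move of everything already in the vector.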

This has been a somewhat controversial subject among programmers in my career.  I have interviewed people who said we should ban STL and couldn't tell me why.  I always ask why when I encounter resistance to STL.  Some of the reasons are pretty sound.

1) STL is slow
I don't believe this has been true for many years.  Poor use of STL without understanding how/when it allocates is slow, or using the wrong container for the job is slow.  I don't expect people to achieve speed improvements by writing their own containers that conform to the same model as the STL containers.  There are always exceptions.

2) STL is not supported cross platform
This was true 3-4 years ago when the Android NDK first came out.  I don't currently know of any platforms that don't have good STL support.

3) You don't know what the implementation will do on different platforms.
This is partially true.  The STL spec provides some things you know to be true everywhere, and leaves others up to the individual compiler.  There's always the chance of a rogue implementation out there that isn't quite STL but conforms to the interface.

4) Dynamic allocations all the way down.
Map and set are really bad for causing memory fragmentation without spending a lot of effort on a custom allocator.  A misused vector can easily cause a ton of memory problems, such as often removing an item from the middle of a vector.  I have seen map replacements that stored entries of pairs in a vector instead of a tree which actually searched faster than map for under 1000 elements.  If I were to ban any part of STL it would be map/set, but they are currently flagged as use with caution, and use due to laziness but remove if it becomes an issue.
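
For reference, here is a sketch of the vector-of-pairs idea.  I'm assuming a sorted vector with binary search here.  The replacements I saw may well have worked differently, and FlatMap is a made-up name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical map replacement: entries live in one contiguous,
// key-sorted vector instead of separately allocated tree nodes.
template <typename K, typename V>
class FlatMap
{
public:
    typedef std::pair<K, V> Entry;

    void insert(const K& key, const V& value)
    {
        typename std::vector<Entry>::iterator it =
            std::lower_bound(mEntries.begin(), mEntries.end(), key, KeyLess());
        if (it != mEntries.end() && !(key < it->first))
        {
            it->second = value; // key already present: overwrite
        }
        else
        {
            mEntries.insert(it, Entry(key, value)); // keeps the vector sorted
        }
        assert(isSorted());
    }

    // Returns a pointer to the value, or nullptr if the key is absent.
    const V* find(const K& key) const
    {
        typename std::vector<Entry>::const_iterator it =
            std::lower_bound(mEntries.begin(), mEntries.end(), key, KeyLess());
        if (it != mEntries.end() && !(key < it->first)) return &it->second;
        return nullptr;
    }

    std::size_t size() const { return mEntries.size(); }

private:
    struct KeyLess
    {
        bool operator()(const Entry& entry, const K& key) const { return entry.first < key; }
    };

    bool isSorted() const
    {
        for (std::size_t i = 1; i < mEntries.size(); ++i)
        {
            if (!(mEntries[i - 1].first < mEntries[i].first)) return false;
        }
        return true;
    }

    std::vector<Entry> mEntries;
};
```

Inserts into the middle are still a vector insert, so this only pays off when lookups far outnumber inserts.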

5) We use a fixed memory layout and STL causes problems with that.
I can't argue with this reason.  Having a completely fixed memory layout has a lot of advantages and is pretty difficult to do overall.  I'm not sure this is really required for any modern platforms outside of the Wii with its tiny Mem1.  We have chosen not to go down this path for development speed reasons.

6) STL has a complicated/hard-to-control memory pattern.
This is another reason I can't argue with, and I've been told I should look at EASTL which is something I intend to do.

RTTI:
I freely admit that I don't use RTTI simply because it used to be slow.  I have no idea if it's still slow or not.

Exceptions:
Exceptions are banned partially for the "used to be slow" reason, and partially because I think the flow of control can become hard to understand.

Multiple Inheritance:
Multiple inheritance is currently banned partially for experimental reasons.  We looked at porting to WP7 using an automated C++ to C# converter with the old engine and were prevented from continuing due to our use of multiple inheritance.  The new engine instead uses inner classes of the type we would otherwise multiple inherit, such as MessageListener.  This does lead to extra boilerplate but overall feels cleaner and safer.
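
Roughly, the inner class pattern looks like the sketch below.  Only the name MessageListener comes from our code.  Its interface here, along with Widget and ScoreLabel, is invented for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical listener interface, named after the example above.
class MessageListener
{
public:
    virtual ~MessageListener() {}
    virtual void handleMessage(const std::string& message) = 0;
};

class Widget
{
public:
    virtual ~Widget() {}
};

// Instead of `class ScoreLabel : public Widget, public MessageListener`,
// the listener is an inner class that forwards to its owner.
class ScoreLabel : public Widget
{
public:
    ScoreLabel() : mListener(*this), mMessagesSeen(0) {}

    MessageListener& getListener() { return mListener; }
    int messagesSeen() const { return mMessagesSeen; }
    const std::string& lastMessage() const { return mLastMessage; }

private:
    class Listener : public MessageListener
    {
    public:
        explicit Listener(ScoreLabel& owner) : mOwner(owner) {}
        void handleMessage(const std::string& message) override
        {
            assert(!message.empty());
            mOwner.mLastMessage = message;
            ++mOwner.mMessagesSeen;
        }
    private:
        ScoreLabel& mOwner;
    };

    Listener mListener;
    std::string mLastMessage;
    int mMessagesSeen;
};
```

The boilerplate is the forwarding Listener class.  In exchange, ScoreLabel has a single base class and the listener can be registered and unregistered as its own object.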

Deep Inheritance Trees:
There's no explicit ban on having a long inheritance structure but it is not used.  Probably the deepest we go would be Interface->Concrete->Specialization.  We prefer the has-a model to the is-a model due to placing a huge emphasis on re-usable widgets.

Raw Pointers:
We use them willy nilly.  If we had to deal with less experienced programmers this might be revised to only allowed in certain parts of the codebase.  I can see how this could make our codebase dangerous because of potential confusion about who owns the pointer.  We use a templated ref count wrapper for objects that have shared ownership.

Yup.  We use void* as return values in our loading code combined with a lack of RTTI and this has caused us problems.  The caller of loading something from xml or the resource cache needs to know what to expect from the data, and the data has to match up with those expectations.  I'm not sure I could begin to justify this in a larger company environment.  It does give us an extremely powerful loading structure with a tiny interface though.