Sunday, December 15, 2013

HLSL instruction count optimization

I found a simple way to print out the number of instructions in a compiled shader, which is an easier number to optimize against than the total framerate.  Instruction count isn't the whole story about how fast a shader will run, but it's one thing to look at.

#define INITGUID
#include "D3D11Shader.h"
#include "D3Dcompiler.h"
#include "GHDebugMessage.h"
After loading the shader buffer:

ID3D11ShaderReflection* pReflector = NULL;
D3DReflect(fileBuf, fileLen, IID_ID3D11ShaderReflection, (void**)&pReflector);
D3D11_SHADER_DESC shaderDesc;
// GetDesc fills in shaderDesc; without this call InstructionCount is uninitialized.
pReflector->GetDesc(&shaderDesc);
GHDebugMessage::outputString("Shader %s instruction count %d", shaderName, shaderDesc.InstructionCount);
pReflector->Release();

We have a very unoptimized experimental shader.  I ran some instruction count measurements on some simple changes.

Starting instruction count: 272
Ending instruction count: 235
Instructions saved: 37

Adding lerp:

Shader sbterrainpixel.cso instruction count 270
(saves 2 instructions)

float4 color;
color.w = 1.0;
color.xyz = (cliffColor.xyz*blendColor.y) + (groundColor.xyz*blendColor.x);

float4 color = lerp(cliffColor, groundColor, blendColor.x);

No difference with or without the swizzle.

color.xyz *= blendColor.z;
color *= blendColor.z

Vector Ops:

Shader sbterrainpixel.cso instruction count 262
(saves 1 instruction per function call)

float2 offsetUV = float2(offsetShadProj.x / offsetShadProj.w, 1.0f - offsetShadProj.y / offsetShadProj.w);
return tex.Sample(samp, offsetUV).x;

float2 offsetUV = offsetShadProj.xy / offsetShadProj.w;
offsetUV.y = 1.0 - offsetUV.y;
return tex.Sample(samp, offsetUV).x;

More Vector Ops:

Shader sbterrainpixel.cso instruction count 246
(saves 4 instructions per function call)

const float xPixelOffset = 0.0015;
const float yPixelOffset = 0.0015;
float4 offsetShadProj = shadowPos + float4(offSet.x * xPixelOffset,
offSet.y * yPixelOffset, 0.0, 0.0);

const float2 pixelOffset = float2(0.002, 0.002);
float2 multOffset = offSet.xy * pixelOffset;
float4 offsetShadProj = shadowPos + float4(multOffset.xy, 0.0, 0.0);

Again Vector Ops:

Shader sbterrainpixel.cso instruction count 245
(saves 1 instruction)

float4 trailColor = TrailTexture.Sample(TrailTextureSampler, float2(input.trailPos.x / input.trailPos.w, 1.0 - input.trailPos.y / input.trailPos.w));

float2 trailUV = input.trailPos.xy / input.trailPos.w;
trailUV.y = 1.0 - trailUV.y;
float4 trailColor = TrailTexture.Sample(TrailTextureSampler, trailUV);

Bad Code:

Shader sbterrainpixel.cso instruction count 239
(saves 6 instructions)

float shadInBorder = saturate(step(0.95, shadCenter.x) + step(0.95, shadCenter.y) +
step(shadCenter.x, 0.05) + step(shadCenter.y, 0.05));
// todo: more efficient.
color.xyz -= (color.xyz*(1.0-shadTot) * (1.0-shadInBorder));

// if we're on the border, come up with a value bigger than 1.
float shadInBorder = step(0.95, shadCenter.x) + step(0.95, shadCenter.y) +
step(shadCenter.x, 0.05) + step(shadCenter.y, 0.05);
// multiply color by the shadow value, unless we are on the border.
color.xyz *= saturate(shadTot + shadInBorder);

More Bad Code:

Shader sbterrainpixel.cso instruction count 235
(saves 4 instructions)

float shadInWideBorder = saturate(step(0.95, wideShadowUV.x) + step(0.95, wideShadowUV.y) + step(wideShadowUV.x, 0.05) + step(wideShadowUV.y, 0.05));
color.xyz -= (color.xyz*(1.0 - wideshadTot)) * shadInBorder * (1.0-shadInWideBorder);

// apply wide shadow if we are not in the wide border and are in the short border.
float shadInWideBorder = step(0.95, wideShadowUV.x) + step(0.95, wideShadowUV.y) + step(wideShadowUV.x, 0.05) + step(wideShadowUV.y, 0.05);
color.xyz *= saturate(wideshadTot + shadInWideBorder + step(shadInBorder, 0.9));

Sunday, June 23, 2013

C++ Subsets

C++ is my language of choice.  There's really one big unavoidable reason for this: Outside of C and maybe HTML5/javascript it's the most portable language there is, and I like having the choice of using the extra features C++ provides over C.  Part of our business plan is to jump on new platforms early.  Since we don't have to support most of the community we can beat Unity to market by a month or two except when they get early access.  This leads to lots of sales we wouldn't otherwise get.

Outside of that simple business reason, I really like C++.  It's built on the foundation of being able to ignore any features you don't like.  It's also huge and not a lot of people understand everything inside it.  Many companies define their own individual subset of features they allow and don't allow.  Some of these have good reasons and others are just traditions.

This is not a post about what all companies or even you should use when dealing with C++.  It's just the current state of our own guidelines.  We currently work with iOS, OSX, Win8, Win7, WP8, Android NDK, and Blackberry.  We have also worked with Wii in the past, and have been looking at some more exclusive platforms.

I've been holding off on the C++11 features because not all of the platforms we work with have had full support for them.  This has been changing rapidly.  It might be time to lift the veil and start using at least some limited features, but I need to research the support first.

Operator Overloading:
Operator overloading is not banned and is used in our code, but carefully.  I'm not a big fan of operator overloading for two reasons.  Firstly it can be non-obvious what is happening in the code with a cursory glance, especially if implicit casting comes into play.  Secondly some operators can lead to extra allocations that don't need to happen.  I really don't like the + operator for this reason and prefer to use += just for the reason that if + exists it will be used in some places without caution.  If I could make the IDE pop up a warning every time + is used I'd be more likely to allow it.  Some of the C++11 features might make this a non-issue.
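
To make the allocation concern concrete, here is a minimal sketch.  CountingBuffer is a hypothetical type invented for this example (it is not from our engine), and it assumes a C++11 compiler so that returning by value can move instead of copy.  It counts every construction that has to allocate fresh storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical buffer type that counts constructions needing new storage.
struct CountingBuffer {
    static int sAllocations;
    std::vector<float> mData;

    explicit CountingBuffer(std::size_t n) : mData(n, 0.0f) { ++sAllocations; }
    // Copying allocates a second block of storage, so it is counted.
    CountingBuffer(const CountingBuffer& other) : mData(other.mData) { ++sAllocations; }
    // Moving (C++11) steals the existing storage, so it is not counted.
    CountingBuffer(CountingBuffer&&) = default;
    CountingBuffer& operator=(const CountingBuffer&) = default;
    CountingBuffer& operator=(CountingBuffer&&) = default;

    // In-place add: reuses this object's storage.
    CountingBuffer& operator+=(const CountingBuffer& rhs) {
        assert(mData.size() == rhs.mData.size());
        for (std::size_t i = 0; i < mData.size(); ++i) {
            mData[i] += rhs.mData[i];
        }
        return *this;
    }
};
int CountingBuffer::sAllocations = 0;

// Binary add: has to build a new object to hold the result.
CountingBuffer operator+(const CountingBuffer& lhs, const CountingBuffer& rhs) {
    CountingBuffer result(lhs);  // one counted copy per use of +
    result += rhs;
    return result;
}

int allocationsUsingPlus() {
    CountingBuffer a(16), b(16), c(16);
    CountingBuffer::sAllocations = 0;
    a = a + b + c;  // two uses of +, two temporaries
    return CountingBuffer::sAllocations;
}

int allocationsUsingPlusEquals() {
    CountingBuffer a(16), b(16), c(16);
    CountingBuffer::sAllocations = 0;
    a += b;
    a += c;
    return CountingBuffer::sAllocations;
}
```

In this sketch the + version makes one counted copy per operator, while the += version makes none.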

Templates:
I love templates.  Since I'm not working on embedded systems, code text size is more important to me than compiled code size.  Even so, Trick Shot Bowling's lib comes in under 1 meg.  We don't have vec2, vec3, vec4, mat16.  Instead we have GHPoint templated with type and count.  The amount of code we can share because of this is awesome.  I tend to think that companies avoiding templates due to compiled code size are often not aggressively shrinking their uncompiled code size and dependencies.
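
As a rough illustration of the idea, here is what a point templated on type and count could look like.  PointSketch and its typedefs are hypothetical names made up for this example; this is not the actual GHPoint code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of a point templated on element type and element count.
template <typename T, std::size_t COUNT>
class PointSketch {
public:
    PointSketch() {
        for (std::size_t i = 0; i < COUNT; ++i) mCoords[i] = T();
    }

    T& operator[](std::size_t i) { assert(i < COUNT); return mCoords[i]; }
    const T& operator[](std::size_t i) const { assert(i < COUNT); return mCoords[i]; }

    // Written once, shared by every size and element type.
    PointSketch& operator+=(const PointSketch& rhs) {
        for (std::size_t i = 0; i < COUNT; ++i) mCoords[i] += rhs.mCoords[i];
        return *this;
    }
    PointSketch& operator*=(T scalar) {
        for (std::size_t i = 0; i < COUNT; ++i) mCoords[i] *= scalar;
        return *this;
    }
    T dot(const PointSketch& rhs) const {
        T total = T();
        for (std::size_t i = 0; i < COUNT; ++i) total += mCoords[i] * rhs.mCoords[i];
        return total;
    }

private:
    T mCoords[COUNT];
};

// One template stands in for separate vec2/vec3/vec4/mat16 classes.
typedef PointSketch<float, 2> Point2f;
typedef PointSketch<float, 3> Point3f;
typedef PointSketch<float, 16> Matrix16f;
```

The loops have a compile-time bound, so an optimizing compiler is free to unroll them for small counts.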

Also, templates can be a great optimization for removing virtuals where needed.
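
A minimal sketch of that trade, using hypothetical names: the same loop written once against a virtual interface and once against a template parameter.  In the template version the call target is known at compile time, so there is no vtable lookup and the call can be inlined.

```cpp
#include <cassert>

// Runtime polymorphism: every apply() call goes through the vtable.
struct ScaleInterface {
    virtual ~ScaleInterface() {}
    virtual float apply(float value) const = 0;
};

struct VirtualDoubler : public ScaleInterface {
    virtual float apply(float value) const { return value * 2.0f; }
};

float sumVirtual(const ScaleInterface& op, const float* values, int count) {
    assert(values != 0 && count >= 0);
    float total = 0.0f;
    for (int i = 0; i < count; ++i) total += op.apply(values[i]);
    return total;
}

// Compile-time polymorphism: no base class, no virtual call.
struct StaticDoubler {
    float apply(float value) const { return value * 2.0f; }
};

template <typename Op>
float sumStatic(const Op& op, const float* values, int count) {
    assert(values != 0 && count >= 0);
    float total = 0.0f;
    for (int i = 0; i < count; ++i) total += op.apply(values[i]);
    return total;
}
```

The cost is that the template version is instantiated once per Op type, which is the compiled code size trade mentioned above.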

Virtual Functions:
Most of your code is not called at a frequency where virtual functions matter.  Most of your objects are not instantiated enough for the virtual function table pointer size to matter.  The trick is to figure out which parts of your code are too high frequency, and avoid virtuals there.  We don't use virtual functions inside TransformNode or Point due to volume of objects.  If we were using a software renderer we wouldn't use virtuals on a per-pixel basis.  It's a fuzzy line of where in the engine to stop using virtuals.

STL:
We fully allow STL everywhere.  Part of the reason we can do this is we only have two programmers who both have a pretty good understanding of what STL is doing.  We are not likely to grow a vector of concrete objects a bunch of times by repeatedly pushing back without first ensuring the vector is big enough to hold everything.
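
A small sketch of the push_back point.  The helper name is made up for this example; it counts how often the vector's capacity changes while it is being filled, with and without a reserve call up front.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pushes `count` ints and returns how many times the capacity changed,
// which is how many times the vector had to reallocate and copy.
std::size_t countReallocations(std::size_t count, bool reserveFirst) {
    std::vector<int> values;
    if (reserveFirst) {
        values.reserve(count);  // one allocation up front
    }
    std::size_t reallocations = 0;
    std::size_t lastCapacity = values.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        values.push_back(static_cast<int>(i));
        if (values.capacity() != lastCapacity) {
            ++reallocations;
            lastCapacity = values.capacity();
        }
    }
    assert(values.size() == count);
    return reallocations;
}
```

With reserve the count is zero.  Without it the exact number depends on the implementation's growth policy, which is one of the things the spec leaves up to the compiler vendor.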

This has been a somewhat controversial subject among programmers in my career.  I have interviewed people who said we should ban STL and couldn't tell me why.  I always ask why when I encounter resistance to STL.  Some of the reasons are pretty sound.

1) STL is slow
I don't believe this has been true for many years.  Poor use of STL without understanding how/when it allocates is slow, or using the wrong container for the job is slow.  I don't expect people to achieve speed improvements by writing their own containers that conform to the same model as the STL containers.  There are always exceptions.

2) STL is not supported cross platform
This was true 3-4 years ago when the Android NDK first came out.  I don't currently know of any platforms that don't have good STL support.

3) You don't know what the implementation will do on different platforms.
This is partially true.  The STL spec provides some things you know to be true everywhere, and leaves others up to the individual compiler.  There's always the chance of a rogue implementation out there that isn't quite STL but conforms to the interface.

4) Dynamic allocations all the way down.
Map and set are really bad for causing memory fragmentation unless you spend a lot of effort on a custom allocator.  A misused vector can also easily cause a ton of memory problems, for example by frequently removing items from its middle.  I have seen map replacements that stored the key/value pairs in a vector instead of a tree and actually searched faster than map for under 1000 elements.  If I were to ban any part of STL it would be map/set.  For now they are flagged as use with caution: fine to use out of laziness, but to be removed if they become an issue.
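
Here is a minimal sketch of that kind of vector-backed map.  I'm assuming the pairs are kept sorted by key and looked up with a binary search; the replacements I saw may have worked differently, and FlatMapSketch is a name invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical map replacement: key/value pairs in one contiguous vector.
template <typename Key, typename Value>
class FlatMapSketch {
public:
    typedef std::pair<Key, Value> Entry;

    // Inserts or overwrites, keeping mEntries sorted by key.
    void insert(const Key& key, const Value& value) {
        typename std::vector<Entry>::iterator it =
            std::lower_bound(mEntries.begin(), mEntries.end(), key, KeyLess());
        if (it != mEntries.end() && !(key < it->first)) {
            it->second = value;  // key already present
        } else {
            mEntries.insert(it, Entry(key, value));
        }
        assert(find(key) != 0);
    }

    // Binary search over contiguous memory; returns 0 when the key is absent.
    const Value* find(const Key& key) const {
        typename std::vector<Entry>::const_iterator it =
            std::lower_bound(mEntries.begin(), mEntries.end(), key, KeyLess());
        if (it == mEntries.end() || key < it->first) return 0;
        return &it->second;
    }

    std::size_t size() const { return mEntries.size(); }

private:
    // Compares an entry against a bare key, in either order.
    struct KeyLess {
        bool operator()(const Entry& entry, const Key& key) const { return entry.first < key; }
        bool operator()(const Key& key, const Entry& entry) const { return key < entry.first; }
    };

    std::vector<Entry> mEntries;
};
```

Everything lives in one allocation, which helps with the fragmentation complaint, but inserting into the middle shifts every later entry, so this suits data that is built once and then mostly read.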

5) We use a fixed memory layout and STL causes problems with that.
I can't argue with this reason.  Having a completely fixed memory layout has a lot of advantages and is pretty difficult to do overall.  I'm not sure this is really required for any modern platforms outside of the Wii with its tiny Mem1.  We have chosen not to go down this path for development speed reasons.

6) STL has a complicated/hard-to-control memory pattern.
This is another reason I can't argue with, and I've been told I should look at EASTL which is something I intend to do.

RTTI:
I freely admit that I don't use RTTI simply because it used to be slow.  I have no idea if it's still slow or not.

Exceptions:
Exceptions are banned partially for the "used to be slow" reason, and partially because I think the flow of control can become hard to understand.

Multiple Inheritance:
Multiple inheritance is currently banned partially for experimental reasons.  We looked at porting to WP7 using an automated C++ to C# converter with the old engine and were prevented from continuing due to our use of multiple inheritance.  The new engine instead uses inner classes of the type we would otherwise multiple inherit, such as MessageListener.  This does lead to extra boilerplate but overall feels cleaner and safer.
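
A minimal sketch of the inner class pattern.  MessageListenerSketch and ScoreKeeper are hypothetical stand-ins made up for this example, not our real MessageListener interface.

```cpp
#include <cassert>
#include <string>

// Stand-in for a listener interface we would otherwise multiple inherit.
class MessageListenerSketch {
public:
    virtual ~MessageListenerSketch() {}
    virtual void handleMessage(const std::string& message) = 0;
};

// The owner does not inherit the listener interface.  It holds an inner
// class that implements the interface and forwards to the owner.
// (The inner listener keeps a reference to its owner, so a real version
// would also need to disable or customize copying.)
class ScoreKeeper {
public:
    ScoreKeeper() : mScore(0), mListener(*this) {}

    MessageListenerSketch& getListener() { return mListener; }
    int getScore() const { return mScore; }

private:
    class Listener : public MessageListenerSketch {
    public:
        explicit Listener(ScoreKeeper& owner) : mOwner(owner) {}
        virtual void handleMessage(const std::string& message) {
            if (message == "score") ++mOwner.mScore;
        }
    private:
        ScoreKeeper& mOwner;
    };

    int mScore;
    Listener mListener;
};

// Usage: outside code only ever sees the listener interface.
inline void scoreKeeperDemo() {
    ScoreKeeper keeper;
    MessageListenerSketch& listener = keeper.getListener();
    listener.handleMessage("score");
    assert(keeper.getScore() == 1);
}
```

The extra boilerplate is the inner class and the accessor.  In exchange the owner is free to inherit from exactly one real base class.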

Deep Inheritance Trees:
There's no explicit ban on having a long inheritance structure but it is not used.  Probably the deepest we go would be Interface->Concrete->Specialization.  We prefer the has-a model to the is-a model due to placing a huge emphasis on re-usable widgets.

Raw Pointers:
We use them willy nilly.  If we had to deal with less experienced programmers this might be revised to allow them only in certain parts of the codebase.  I can see how this could make our codebase dangerous because of potential confusion about who owns the pointer.  We use a templated ref count wrapper for objects that have shared ownership.
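
For readers who haven't seen one, a templated ref count wrapper can be sketched like this.  I'm assuming an intrusive design where the object carries its own count; RefCountedSketch and RefPtrSketch are hypothetical names and this is not our actual wrapper.

```cpp
#include <cassert>

// Base class carrying the count.  Shared objects derive from this.
class RefCountedSketch {
public:
    RefCountedSketch() : mRefCount(0) {}
    virtual ~RefCountedSketch() {}

    void acquire() { ++mRefCount; }
    void release() {
        assert(mRefCount > 0);
        if (--mRefCount == 0) delete this;  // last owner gone
    }
    int getRefCount() const { return mRefCount; }

private:
    int mRefCount;
};

// Wrapper that acquires on construction/copy and releases on destruction.
template <typename T>
class RefPtrSketch {
public:
    explicit RefPtrSketch(T* obj = 0) : mObj(obj) {
        if (mObj) mObj->acquire();
    }
    RefPtrSketch(const RefPtrSketch& other) : mObj(other.mObj) {
        if (mObj) mObj->acquire();
    }
    ~RefPtrSketch() {
        if (mObj) mObj->release();
    }
    RefPtrSketch& operator=(const RefPtrSketch& other) {
        // Acquire before release so self-assignment cannot delete the object.
        if (other.mObj) other.mObj->acquire();
        if (mObj) mObj->release();
        mObj = other.mObj;
        return *this;
    }

    T* operator->() const { return mObj; }
    T* get() const { return mObj; }

private:
    T* mObj;
};
```

Anything with a single clear owner can stay a raw pointer.  The wrapper is only for objects where nobody can say up front who deletes it.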

Void Pointers:
Yup.  We use void* as return values in our loading code combined with a lack of RTTI and this has caused us problems.  The caller of loading something from xml or the resource cache needs to know what to expect from the data, and the data has to match up with those expectations.  I'm not sure I could begin to justify this in a larger company environment.  It does give us an extremely powerful loading structure with a tiny interface though.

Friday, May 17, 2013

Google Play Game Services for Android setup tutorial

  1. Your app must use Google APIs instead of Android SDK.
  2. You must use a signed release build for testing, with the same SHA entered into the google play console.
  3. You must import the google-play-services_lib project into eclipse.  Just grabbing the lib is not good enough.


Step 1: Get the google-play-services_lib
  • Use the Android SDK Manager to download "Google Play services".  
  • Right click in the Package Explorer window and select Import.
  • Under "Android" select "Existing Android Code Into Workspace". Click next.
  • Under "Root Directory" click browse and select [android-sdk-dir]/extras/google/google_play_services/libproject/google-play-services_lib.
  • Make sure "Copy projects into workspace" is selected.  You want a copy of the version downloaded from google.

Step 2: Link your project to the google-play-services_lib
  • Right click your project and select properties
  • Select "Android"
  • Make sure "Google APIs" is selected.
  • Under library click "Add" and select google-play-services_lib
  • Also set the lib project as a reference.  Right click your project, go to project references, and make sure google-play-services_lib is selected.  Otherwise it won't export with your retail build.

Step 3: Set up the google play console.
  • Go to and sign in
  • Click the control pad icon to go to the new Google Play Game Services page.
  • Click "Add a new game" and fill out the details.
  • Select your new game, and click on "Linked apps".
  • Add a new android app and link to your store entry.
  • Click the "Authorize your app now" button after saving.
  • Enter your SHA1 signing certificate.  You won't be able to change this later.  To find your key, export a signed retail apk from eclipse and sign with the key that you use for publishing.  At the end of the process it will tell you the SHA1.

Step 4: Add yourself as a tester
  • Go to the google play games console for your app and click testing.
  • Add your google account as a tester.
If you skip this step you will see this in the logcat console:
Unable to retrieve 1P application 1234567890 from network
Unable to load metadata for game

Step 5: Set your app id in the manifest.
  • Grab your app's id from the google play console.  If you select the controller icon and then select your game you will see something at the top that looks like YourAppName - 1234567890
  • In your project in eclipse open res/values/strings.xml.
  • Add: <string name="app_id">1234567890</string>
  • Open your manifest file, and add this inside your application node: <meta-data android:name="com.google.android.gms.games.APP_ID" android:value="@string/app_id" />

Step 6: At program launch, make sure the player has Google Play Services installed.
Call this function on launch, and then again in onResume.  The first call will send the player to the store for the package.  When the player comes back into your app, the onResume call picks up that it is now installed.

// determine if the user has google play services installed.
//  if not, try to install it.
// should be called in onResume so we know that it got installed.
   public void validateGooglePlayServices() {
        int checkGPServices = GooglePlayServicesUtil.isGooglePlayServicesAvailable(mActivity.getApplicationContext());
        if (checkGPServices != ConnectionResult.SUCCESS) {
            mServicesAvailable = false;
            Dialog gpsDialog = GooglePlayServicesUtil.getErrorDialog(checkGPServices, mActivity, 1);
            if (gpsDialog != null) {
                gpsDialog.show();
            }
        } else {
            mServicesAvailable = true;
        }
   }

Step 7: Create your GamesClient
private GamesClient mGamesClient = null;
private String mScopes[];
public void initClient() {
        if (!mServicesAvailable) return;
        Vector scopesVector = new Vector();
        mScopes = new String[scopesVector.size()];
        GamesClient.Builder gcBuilder = new GamesClient.Builder(mActivity, this, this);
        gcBuilder.setGravityForPopups(Gravity.TOP | Gravity.CENTER_HORIZONTAL);
        mGamesClient = gcBuilder.create();
}

Step 8: Following the official guide
The rest of the implementation is fairly straightforward.

Step 9: Testing
  • Export a signed release build
  • Uninstall your app from your device
  • Install your new apk using [android-sdk-dir]/platform-tools/adb install yourgame.apk
  • Launch the game, and watch the logcat window in eclipse.

Friday, January 18, 2013

Cross platform SDK cheat sheet


Build C++
Navigate to the directory that contains the jni/ folder (there may be multiple projects -- do them in order of dependency).
Type into Terminal:

Get Crash Log Info:
Type into Terminal:

~/src/android-sdk-macosx/platform-tools/adb pull /data/anr/traces.txt .
This will put crash info into traces.txt

Determine where a crash is in C++:
android-ndk-r7b/toolchains/arm-linux-androideabi-4.4.3/prebuilt/darwin-x86/bin/arm-linux-androideabi-addr2line -C -f -e 000708e6

Export a signed android apk:
Right click on the project in eclipse
Go to “Android tools”
Click export signed apk
Uninstall builds from device, and then use adb to install the signed apks for testing

Install a signed android apk:
~/src/android-sdk-macosx/platform-tools/adb install SWNookTab.apk


Create a pvr texture
Download PVRTexTool from
From the command line use:
/Applications/PVRTexTool/PVRTexToolCL/MacOS_x86/PVRTexTool -f OGLPVRTC4 -i alley1icon.png -m -yflip0 -o ../GHBowlingiOS/alley1icon.pvr


Record a video
Go to applications and launch quicktime player
Run the mac build and resize the window to whatever
Quicktime menu bar: file, new screen recording
Click the little down arrow next to the record button to make it use microphone
Click the record button and choose record part of the screen.
Drag over the app window
Click record.

Touch all files in a directory
find . -exec touch {} \;

Get into Sandbox mode in GameCenter

Submit a new binary
In dev studio, Product->archive.
When that build finishes, organizer shows up.
Click the new binary in organizer, and click distribute.
Select Mac App Store
Log in to itunes connect when prompted
Select the application.  You must have already set the app to ready for upload on the web site.

To fix “does not contain a single-bundle application” error
From Tim Swast on stackexchange: “Turns out it is an issue with dependent projects in XCode 4. If this happens to you, go through the Build Settings for all your dependent projects (e.g. static libraries) and make sure that the "Skip Install" option under "Deployment" is set to YES.”

To get a receipt during development:
1) Sign the app with a development provisioning profile (not retail)
2) Make the app exit(173); from main.
3) Run the app once from finder (not from xcode)


Make a DDS file
contrib/texconv.exe -f [format] -o [outputdirectory] [file]
for format, use BC3_UNORM for textures that have alpha, BC1_UNORM for opaque textures

Windows Phone

Add data file(s) to the project (the C# project)
-Make an empty folder in the project to put the file(s) into initially
-Select Add -> Existing Item on the empty folder
-Multiselect all files you wish to add.
-Next to the “Add” button, click the arrow, select “Add as Link”
-Multiselect all the files in the previously empty folder. Right click and select “Properties”
-Set “Copy to Output Directory” to “Copy if Newer” Close the properties window
-Drag the files to the actual folder that the game will be looking for them in (ex: SB in the SBPhone project)

Data file Protip for the WinPhone C# project editor: Let the editor help you pick the right version of the files.
If you are including files that are versioned across platforms, add the highest priority folder first. EG: Add files from the SBWinPhone directory, then the SBWin8 directory, then the SBMac directory, then the SBIphone directory.
When you drag from the dummy folder into the real folder, it will only copy items that do not already exist in the real folder. You can then remove the remaining items from the project.

Query memory use:
(use %llu with printf)
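
The platform call that returns the memory figure isn't shown here, so this is only a sketch of the formatting half of the note.  It assumes the value comes back as a 64-bit unsigned byte count, and formatMemoryUse is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats a 64-bit unsigned byte count.  %llu matches unsigned long long;
// %d or %u would misread the value on platforms where int is 32 bits.
std::string formatMemoryUse(unsigned long long bytes) {
    char buffer[64];
    const int written = std::snprintf(buffer, sizeof(buffer),
                                      "Memory use: %llu bytes", bytes);
    assert(written > 0 && written < static_cast<int>(sizeof(buffer)));
    (void)written;  // silence unused warning when asserts are compiled out
    return std::string(buffer);
}
```

If the platform type is some other 64-bit typedef, casting it to unsigned long long before passing it to printf keeps the format string and the argument in agreement.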


To install a pre-built binary:
You can deploy your signed bar file using the blackberry-deploy utility included in the bin folder.   Follow the instructions here:  Make sure the device is in developer mode.

./blackberry-deploy -installApp -password DEVICE_PASSWORD_HERE -device IP_ADDRESS_OF_DEVICE -package /Users/YOUR_USER_NAME/Downloads/Apps_for_the_Dev_Alpha/