Watch the webinar and learn:
- The low-hanging fruit: basic optimizations
- How to read compiled MSIL code
- The struct versus class debate
- Optimize for the garbage collector
- Writing directly into memory with unsafe pointers
- Use dynamic delegates to dramatically speed up reflection
How to Write Very Fast C# Code on Vimeo.
For source code of the examples, please email Mark at mark@mdfarragher.com
Video Content
- Throwing exceptions (5:09)
- Fast String Handling (8:30)
- Fast Arrays (14:26)
- Fast Loops (19:25)
- Fast Structs (23:00)
- Fast Memory Copy (28:09)
- Instantiation (31:51)
- Property Access (38:06)
- Q&A (45:32)
Webinar Transcript
Hello, everyone. I'm happy to be here. Let's get started. So I would like to talk about how to write very fast C# code. I'm going to show you a couple of tips and tricks to speed up your C# code. We're going to do lots of benchmarks to find out what kind of code is slow in C#, what kind of code is fast, and where the pitfalls are. Before we get started, there are a couple of offers that I'd like to bring to your attention. First of all, I have 10 courses on Udemy about C# programming. If you're interested, I've got a coupon called PostSharp15 that'll get you a 90% discount on any of my Udemy courses.
I've also copied all my courses to Teachable, to my own Teachable environment, and there I have a subscription model where you can get access to any course for $9 per month, and future courses are included. So if you're in the subscription, any new course that I produce, you'll automatically get access, and I produce a course roughly once a month, once every two months if it's a slow month. So you can expect new content every month. Last but not least, the source code that I'm using in this webinar, I will send it to you if you send me an email. So just email me at the email address at the bottom here, and I will reply with a zip file of the solution, and you can play around with this actual code, do some benchmark testing of your own.
Okay. Let's get started. So these are the topics I want to cover in this webinar. I'm going to show you the overhead of throwing an exception. I'm going to show you how to manipulate strings, so how to handle strings in C#. I'm going to take a look at arrays, at the different types of arrays in C# and how they match up in terms of performance. I'll show you the difference between a For and a Foreach loop and what that does to your performance. I'll show you structs versus classes. So I mean, you probably know that structs are slightly faster than classes, but how big is the difference? And is it worth the trouble of refactoring your code? I'll show you how to copy a block of memory using different techniques, and I'm saving the best for last. At the end, I'm going to show you how to instantiate classes and how to do basic reflection in an extremely fast manner. I'm going to show you a piece of code that emits custom CIL instructions. So we're basically compiling C# on the fly, creating a custom assembly to do super fast reflection.
So let's get started. I'm using Visual Studio community edition to run this code on OSX. I've written a console program to do the performance measurements, and I'm using a base class called PerformanceTest right here to run the different performance tests. I have three methods here, MeasureTestA, MeasureTestB, and MeasureTestC. So these are virtual methods, and in any derived class, I will put the test code in here. The format is always that the first one, Test A, is the baseline test. So this would be unmodified slow code, and then we have two methods, B and C, available to try out different kinds of optimizations. And then for the actual performance test down here, the performance test is fairly simple. Basically what I do is I go through these Test A, Test B, and Test C methods, and I repeat them a number of times. So you can see there's a constant here, default repetitions with a value of 10. I basically repeat the tests 10 times to average out the effects of the garbage collector because an ill-timed collection of the garbage collector can really slow down one of the tests. So if we just run it 10 times, then we average that effect.
To measure performance, you're supposed to use the Stopwatch class in C#. So when you're doing benchmarking, please don't use DateTime, always use Stopwatch because Stopwatch is a specialized class for measuring time extremely accurately. So you can see I always start by restarting a stopwatch, doing the test, stopping the stopwatch, and then I have the elapsed milliseconds right here, and I'm adding that to a variable, and then in the end, you can see it right here. I'm returning the total value divided by the number of repetitions. So we're doing 10 repetitions. So we have the total elapsed time for every method and then divided by the number of repetitions gives you the average execution time.
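The averaging harness described above can be sketched roughly as follows. This is a minimal sketch, not the webinar's actual PerformanceTest class: the Benchmark class and the AverageMilliseconds method are names made up for illustration. Only Stopwatch and its Restart, Stop, and Elapsed members come from the .NET base class library.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not the webinar's PerformanceTest base class.
public static class Benchmark
{
    // Runs the test the given number of times and returns the
    // average elapsed time per run in milliseconds.
    public static double AverageMilliseconds(Action test, int repetitions = 10)
    {
        if (test == null) throw new ArgumentNullException(nameof(test));
        if (repetitions <= 0) throw new ArgumentOutOfRangeException(nameof(repetitions));

        var stopwatch = new Stopwatch();
        double total = 0;

        for (int i = 0; i < repetitions; i++)
        {
            stopwatch.Restart();   // reset to zero and start timing
            test();
            stopwatch.Stop();
            total += stopwatch.Elapsed.TotalMilliseconds;
        }

        // Averaging smooths out one-off effects such as an ill-timed garbage collection.
        return total / repetitions;
    }
}
```

A call such as Benchmark.AverageMilliseconds(() => RunMyTest(), 10) would then give the average time for ten runs, where RunMyTest stands in for whatever code is being measured.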
Throwing exceptions
So let's get started. I have the program running right here. So it's a console program with eight tests pre-configured, and I can simply select a test and run it. So you can see the first one is exceptions. Let me go back to the code and show you the code. So the exception test is right here. So you can see it's just a class that's derived from PerformanceTest. So I get to implement these Measure A and Measure B methods. What I'm doing, check this out here in the constructor, what I'm doing is I'm filling a list of 1000 strings. So I've got a list of strings, 1000 strings, and every element in the list is a number, and the number has five digits. So I pick them from this list at random, and you can see I have an X here at the end. So what this code does is it mixes digits together to create random numbers, and 1 in 11 digits is going to be an X, which will make any number invalid.
So 9% of my list element population is going to be invalid and won't be able to be parsed. So what does my test do? See right here, Test A does a simple int.Parse and catches any format exception, and Test B does int.TryParse, and that's basically the only difference. So let me run the code, and we can see what happens. So I'm running Test 0, 100 iterations, and I include the baseline. So here we go. So now, it's doing the test and then doing it 10 times to average out any effects, and it'll show us the results in a little graph when it's done. There you go. So you can see that the slowdown of an exception in int.Parse is massive. You can see we're looking at an execution time of 1.094 seconds for the Parse function, and the TryParse is 9 milliseconds. So this is a massive difference in performance, and keep in mind, only 9% of the numbers were invalid. So if you have a much higher failure rate in your data, a much higher number of invalid fields in your data, exceptions are really going to slow down your code.
So the takeaway here is don't swallow exceptions in your code. Don't catch an exception, and don't do anything with it. If you're parsing a lot of data, make sure that you validate the data before you try to parse it, and don't do it the other way around where you first parse the data, and then catch the exception, and in the catch block, you recover your code. I mean, you can do that, but it will really slow down the execution of your code. Exceptions are super slow. They take roughly one microsecond to execute, and I mean, it's incredibly slow, and that's because they're intended for debugging purposes. They capture the stack trace. They capture the context of the executing thread, and they prepare all this debug information. So they're not supposed to be thrown in mission critical loops in your code. So first takeaway, don't catch exceptions. Avoid exception throwing as much as you can in mission critical code.
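The two approaches compared in this test can be sketched like this. It is an illustration only, not the webinar's test class: ParseDemo and its method names are hypothetical, while int.Parse, int.TryParse, and FormatException are the standard .NET members mentioned in the talk.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class, not the webinar's exception test.
public static class ParseDemo
{
    // Baseline: int.Parse throws a FormatException for every invalid entry,
    // and the cost of the throw is paid even though the catch block does nothing.
    public static int CountValidWithParse(IEnumerable<string> values)
    {
        int valid = 0;
        foreach (string value in values)
        {
            try
            {
                int.Parse(value);
                valid++;
            }
            catch (FormatException)
            {
                // Invalid entry, skipped.
            }
        }
        return valid;
    }

    // Optimized: int.TryParse reports failure through its return value,
    // so no exception is ever thrown.
    public static int CountValidWithTryParse(IEnumerable<string> values)
    {
        int valid = 0;
        foreach (string value in values)
        {
            if (int.TryParse(value, out _))
            {
                valid++;
            }
        }
        return valid;
    }
}
```

Both methods return the same count for the same input; only the cost of handling the invalid entries differs.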
Fast String Handling
Okay, so the next thing we're going to look at is string handling. So this is the classic example of C# performance optimization, and yet, I'm surprised by the number of people who don't realize this crucial difference. This is the code that does the test. The first, the baseline test simply builds a string. So we start with an empty string right here, and then, in a simple loop, I add one character to the string, and I do that 50,000 times. So in the end, I have a string of 50,000 characters, so that's 100 kilobytes on the heap. And that's it. The B test does the exact same thing, but it uses a StringBuilder. So instead of adding things together, it uses a StringBuilder, and it uses the Append method to append the character. The third test does build up a string incrementally, and it uses pointers.
So you can see here that I start with a character array of 50,000 characters. Then, I fix that character array on the heap, and I ask for a pointer to the block of memory, and then I declare my own pointer and initialize it with this initial value. So the pointer will initially point to the first element in the character array, and then in a simple loop, I use the pointer to directly write this character into heap memory. So this is a very interesting test because it will show us how much faster StringBuilder is compared to a regular string, and if it's worth your while to use pointers instead of using the StringBuilder to speed up the code even more.
So let's do the test and see what happens. So I'm running Test 1, the string test, 50,000 iterations with the baseline. Here we go. There you have it. So 553 milliseconds for the string, and the other two tests, they actually did run, but they are so fast that we can't see them on this resolution. So you're getting a hint already that the difference in performance is massive here. So let me run the test again. So now, I'm going to go to one million iterations, and now, I have to disable the baseline test because then, otherwise, we have to wait forever, and check this out. The StringBuilder takes two milliseconds, and the pointer operation is now so fast that we can't see it. So I'll do this again, and now, I'm going to do 100 million iterations, again, without the baseline test. Just wait for it, and here are the results. 452 milliseconds for this StringBuilder and 169 milliseconds for the pointer. So there is this massive performance difference between using strings and using the StringBuilder. If you want ultimate performance, you can use direct pointer operations on the heap, and that'll get you a three-fold, roughly, three-fold performance boost over using string builders.
So if you're wondering, why is the string so incredibly slow? It's because of this. Let me show it to you in a picture. When you append characters to a string, strings are immutable in .NET. So that means that any operation that modifies the string will create an entirely new string on the heap. So in the first loop iteration, we have an empty string, and then I add one character, so I get a string of a single character on the heap. Then, I add another character, and now, I have two strings on the heap, the original and the modified version. Then, I add a third character, and now, I have three strings on the heap, the original, the modified version, and again, the modified version, and so on and so on. So if I add 50,000 characters to a string, I end up with 50,000 disposed strings on the heap where each string is one character longer than the string before it. So it's a huge amount of data, and I'm flooding the heap with data, and I'm constantly doing this memory copy operation where the string is being copied to a new version and to a new version and to a new version.
So once we hit the higher end of the loop, like the high loop iterations, we're copying this block of 100 kilobytes on the heap, as like 100 K, and then 100 K plus 1 byte, and then 100 K plus 2 bytes, and so on and so on. So it's super inefficient. If you use a StringBuilder, it works the way you would expect. You have this buffer in memory that can hold any number of characters. You can declare a StringBuilder and specify the size, and then you simply write characters into specific locations in that buffer of memory. So naturally, the StringBuilder is much, much faster. Now, the fun part is that the StringBuilder, behind the scenes, the StringBuilder is actually using this code.
So internally, the StringBuilder fixes a block of memory and then writes directly to the memory using character pointers, and the only reason why we see a difference between these two blocks of code is because in Test B, we have the overhead of the append method, and that will slow down the code a bit. So going back to the results ... the StringBuilder, when you're modifying strings, always use a string builder because it's way faster. I mean, 50,000 iterations for the string and 100 million iterations for the string builder, I mean, that's pretty obvious. But if you want ultimate performance, use pointers directly. It'll give you another three-fold improvement over the StringBuilder.
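As a rough sketch of the comparison in this section, the code below builds the same string three ways. The class and method names are made up for illustration. The third variant is a safe stand-in for the pointer test: it writes into a preallocated char array by index rather than through a fixed pointer, so it compiles without the unsafe switch, and it should not be read as the webinar's pointer code. As a back-of-the-envelope figure, appending n characters one at a time by concatenation copies about n(n+1)/2 characters in total, which for 50,000 characters is roughly 1.25 billion characters, or about 2.5 gigabytes at two bytes per character.

```csharp
using System.Text;

// Hypothetical demo class, not the webinar's string test.
public static class StringBuildDemo
{
    // Baseline: every += allocates a brand new string and copies the old one into it.
    public static string Concatenate(int count)
    {
        string result = "";
        for (int i = 0; i < count; i++)
        {
            result += '*';
        }
        return result;
    }

    // StringBuilder appends into an internal buffer sized up front.
    public static string WithStringBuilder(int count)
    {
        var builder = new StringBuilder(count);
        for (int i = 0; i < count; i++)
        {
            builder.Append('*');
        }
        return builder.ToString();
    }

    // Safe stand-in for the pointer version: fill a char array in place,
    // then create the string once at the end.
    public static string WithCharBuffer(int count)
    {
        var buffer = new char[count];
        for (int i = 0; i < count; i++)
        {
            buffer[i] = '*';
        }
        return new string(buffer);
    }
}
```

All three return identical strings, so they can be swapped freely when measuring.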
Fast Arrays
Okay, moving on to arrays. Let me show you the array test. I have a very simple piece of code. I declare a three-dimensional array. So I have an array with three dimensions. Then, I have three nested For loops to fill the array, and in the innermost loop, all I do is increment every array element by one, so super simple. So that's Test A. Test B uses a one-dimensional array, and it's flattened. So I have a one-dimensional array that has the same size as the three-dimensional array, and I use this simple formula to calculate the index into this one-dimensional array using the I, J, and K variables, and then I do the same operation. Finally, I have a one-dimensional array. I use I, J, and K, but now, you can see, I simply have an index variable that starts at zero, then I increment it by one. So instead of using this formula with the multiplication and the addition, I just have a simple variable that incrementally goes through the entire array and initializes it.
So let's take a look at the performance of these arrays. So I'm running Test 2, 300 iterations with a baseline. Here we go. And here's the results. The three-dimensional array is slowest. The flattened, one-dimensional array is faster, and the array with incremental access is the fastest. Now, you might be wondering right now what's going on. Why is the three-dimensional array slower than a one-dimensional array with the exact same logic? I mean, if you think about it, this expression, the .NET framework needs to do this exact same calculation to find the memory location of this three-fold array index. So either I'm doing the formula, or the framework is doing the formula, but it's the same mathematical expression. So why do we see this difference? So to explain that I'm going to have to show you the intermediate language code of this compiled program, which I have right here.
So let's go to the array test, sorry, arrays test, plural. So here it is. Here's the array test class. This is the constructor. Let me just scroll down, and this is MeasureTestA. So the code to access the array element is right here. As you can see, this instruction loads a location, loads a local variable onto the stack. So zero, one, and two are the I, J, and K variables, and then, this call does the array indexing, and you can see, it's actually a method call. It's an instance call to the array class, and within that class, it calls a method called Address, which expects three parameters. So behind the scenes, the .NET framework implements a three-dimensional array as a class, and any interaction with that class goes through methods, but now, let's look at the one-dimensional array. So that's right here, and the operations are here. So this is the code to index into a one-dimensional array, and the thing I want you to notice is this instruction here called load element address.
Load element address indexes into a one-dimensional array and returns the address of that element. So to work with one-dimensional arrays, the .NET framework, the .NET runtime has a specialized CIL instruction. So load element address is optimized. It's specialized to work with one-dimensional arrays. So there's no method call. I don't have to go into a method and run some .NET framework code to get at the array element. It can all be done with CIL instructions. The only method calls in this block of code are these two, and they are only needed because my array dimensions are stored in this property. If I had used an array with a constant dimension, a constant size, this call wouldn't be there, and it would simply be a load instruction to load a constant value, and then, this entire block of code wouldn't have any method calls whatsoever. So the takeaway I want you to remember is that for one-dimensional arrays in .NET, the intermediate language has optimized instructions for dealing with them. So code that uses a one-dimensional array will always be faster than code that uses two, three, four, five, or six dimensional arrays because of the difference in implementation in the .NET runtime.
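The three array variants from this test might look roughly like the sketch below. The class and method names are hypothetical, and the index formula is the usual row-major one for a cube with equal dimensions, which is an assumption since the transcript does not spell the formula out.

```csharp
// Hypothetical demo class, not the webinar's arrays test.
public static class ArrayDemo
{
    // Test A: a true three-dimensional array, indexed with [i, j, k].
    public static int[,,] FillThreeDimensional(int size)
    {
        var cube = new int[size, size, size];
        for (int i = 0; i < size; i++)
            for (int j = 0; j < size; j++)
                for (int k = 0; k < size; k++)
                    cube[i, j, k]++;
        return cube;
    }

    // Test B: a flattened one-dimensional array with a computed index.
    public static int[] FillFlattened(int size)
    {
        var flat = new int[size * size * size];
        for (int i = 0; i < size; i++)
            for (int j = 0; j < size; j++)
                for (int k = 0; k < size; k++)
                    flat[(i * size * size) + (j * size) + k]++;
        return flat;
    }

    // Test C: the same flattened array, but the index is just a running counter.
    public static int[] FillIncremental(int size)
    {
        var flat = new int[size * size * size];
        int index = 0;
        for (int i = 0; i < size; i++)
            for (int j = 0; j < size; j++)
                for (int k = 0; k < size; k++)
                    flat[index++]++;
        return flat;
    }
}
```

All three leave every element set to one, so the only thing that differs between them is how each element is addressed.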
Fast Loops
Okay, back to the program. So the next test is a comparison of For and Foreach. So let me show you the code. The test is here. Here we go. So I create a list with one million elements. It's a list of integers, and I fill the list with random numbers, so super simple. One million integers, and every list element is just a random number, and then, this test uses a Foreach loop to loop through the list, and this test uses a normal For loop with an integer index variable, and that's it. So let's run those tests and see what happens. And here's the results. 273 milliseconds for the Foreach loop, and 112 for the For loop. So the For loop is roughly twice as fast as the Foreach loop. So to show you why that is happening, let me go back to the intermediate code. We can take a look at the compiled code and see how the loops have been implemented. So here's the test class. Let me scroll down. So that's the constructor, and here is MeasureTestA, and you can see that the Foreach loop uses an enumerator, a generic enumerator to loop through the list.
So the first thing you have to do is you have to call the list and call the GetEnumerator method, and that gets you the enumerator, and then, the enumerator itself has a Current property to access the current value of the current element you're looking at, and there's a MoveNext method that will move the enumerator to the next element in the list. MoveNext is a ... You can see it's a Boolean method. It returns bool, and this branch instruction basically jumps back to 12. So it loops this bit of code as long as the enumerator returns a value, and as soon as MoveNext returns false, we reach this point, and this leave instruction will exit the method. So you see that the Foreach loop is implemented by using an enumerator class and then repeatedly calling MoveNext and accessing the Current property to get the data. So that's a lot more overhead than a simple For loop, which is implemented here. It's this bit of code.
You can see that the For loop, it doesn't really use any classes at all. The only place where a class is being used, well, a method call is being used, is here when I access the elements in the list, but the loop itself is just this piece of code. So a For loop is going to be implemented with only a few CIL instructions, and it doesn't require any specialized classes. So that's why a For loop is much faster than a Foreach loop. Now, keep in mind, when you are looping through an array, there's no difference between the two because the C# compiler is very smart. If you use Foreach on an array, the compiler will generate a normal For loop behind the scenes. So you won't see any difference in performance, but for the more complex collection classes, you can see that there is a difference. It's about a factor of two.
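To make the two loop shapes concrete, here is a small sketch that sums a list both ways. LoopDemo and its methods are illustrative names, not the webinar's test class. How large the gap is will depend on the runtime version and the collection type, so it is worth measuring on your own setup rather than assuming the factor of two.

```csharp
using System.Collections.Generic;

// Hypothetical demo class, not the webinar's loop test.
public static class LoopDemo
{
    // Foreach: the compiler calls GetEnumerator, then MoveNext and Current
    // for every element.
    public static long SumWithForeach(List<int> values)
    {
        long sum = 0;
        foreach (int value in values)
        {
            sum += value;
        }
        return sum;
    }

    // For: a plain integer counter plus one indexer call per element.
    public static long SumWithFor(List<int> values)
    {
        long sum = 0;
        for (int i = 0; i < values.Count; i++)
        {
            sum += values[i];
        }
        return sum;
    }
}
```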
Fast Structs
Okay, so moving on, structs versus classes. Let me show you the code, right here. So what I've done is I have declared a simple class that contains an X and a Y field, two integer fields, and a constructor to initialize those two fields. I've done the same thing as a struct. So this is basically the exact same thing, but now, it's a ... Whoops. Here, it's actually a bug in my code. It's not a struct or a class. Sorry about that. Let me quickly fix that then. Let me see if this works. This is the demo effect. There's always something that goes wrong. Exit and restart. So I've defined a class with an X and a Y field, and I've defined a struct with an X and a Y field, and then the only thing my code does is it fills a list with either classes or structs. So the C test fills a list with structs. The B test fills a list with classes, and the A test fills a list with classes, but look at this. The class has a finalizer. So when this class gets disposed, the finalizer will be called by the garbage collector.
So let's run that code and see what happens. It's default iterations. So there is the difference. The class with the finalizer takes 246 milliseconds. The normal class takes 111 milliseconds, and the struct takes 6 milliseconds. So that's quite a big difference. The reason for that difference is because of the way that structs and classes are implemented on the heap. When I create a list of classes, this is what the memory will look like. The reference to the list will be on the stack, so it's right here. The list itself is right here on the heap. The list has a number of elements. Each element is an object reference. So that'll be eight bytes in size, and the reference points to an entirely different location on the heap where the class is stored.
So if you calculate the amount of memory for an eight-megabyte list, you would actually be looking at 32 megabytes of heap memory because you have to store the list on the heap, and you have to store all the different point classes for all the data. Now, when this gets garbage collected, there's going to be a load of objects on the heap, not just the list, but also all these individual point classes, and they all need to be garbage collected to be disposed. If you use a struct, the memory layout looks like this. So we still have the list reference on the stack pointing to the heap. The list is on the heap, but now, the data, the struct is in line in the list itself. This is the difference between a class and a struct. Structs are stored in line within their containing type. Whereas, classes are stored separately, and the containing type contains a reference.
So now, the entire struct, it's two integers. So the entire struct fits inside eight bytes, inside an eight-byte element. So now, the entire data structure is only eight megabytes, and it's only a single object on the heap. So when the garbage collector has to clean up the memory, it goes to the list, disposes the list, and it's done. That's all it needs to do. So in these kinds of scenarios, using structs is extremely lucrative if you have lists with a large amount of data. The data itself contains only a few fields, like X and Y or X, Y, Z coordinates, think points, vectors, things like that, and you use the data for a short amount of time, and then you don't need it anymore. So you only briefly need access to the data. If those three conditions are met, then structs are extremely lucrative to use.
And finally, the big slow down in the finalizer is because when your classes have a finalizer, the garbage collector needs to call the finalizer one after another to dispose your class, and it does so on a single thread. So if you have one million classes right here on the heap, the garbage collector has to call one million finalizers to get rid of all the data, and that's really going to slow down your code. So that's why you get this difference in performance.
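The three element types from this test can be sketched as below. The type and method names are made up for illustration. On the memory figures: on a 64-bit runtime each small class instance carries roughly 16 bytes of object overhead on top of its 8 bytes of data, so one million points come to about 24 megabytes of objects plus 8 megabytes of references, which is consistent with the 32 megabytes mentioned above. Treat that as an estimate, since exact sizes depend on the runtime.

```csharp
using System.Collections.Generic;

// Hypothetical types, not the webinar's struct test.

// Reference type with a finalizer: the garbage collector must run
// the finalizer for every instance before the memory is reclaimed.
public class PointClassWithFinalizer
{
    public int X;
    public int Y;
    public PointClassWithFinalizer(int x, int y) { X = x; Y = y; }
    ~PointClassWithFinalizer() { }
}

// Plain reference type: the list stores a reference, the data lives elsewhere on the heap.
public class PointClass
{
    public int X;
    public int Y;
    public PointClass(int x, int y) { X = x; Y = y; }
}

// Value type: the two integers are stored inline in the list's backing array.
public struct PointStruct
{
    public int X;
    public int Y;
    public PointStruct(int x, int y) { X = x; Y = y; }
}

public static class StructDemo
{
    public static List<PointClassWithFinalizer> FillWithFinalizableClasses(int count)
    {
        var list = new List<PointClassWithFinalizer>(count);
        for (int i = 0; i < count; i++) list.Add(new PointClassWithFinalizer(i, i));
        return list;
    }

    public static List<PointClass> FillWithClasses(int count)
    {
        var list = new List<PointClass>(count);
        for (int i = 0; i < count; i++) list.Add(new PointClass(i, i));
        return list;
    }

    public static List<PointStruct> FillWithStructs(int count)
    {
        var list = new List<PointStruct>(count);
        for (int i = 0; i < count; i++) list.Add(new PointStruct(i, i));
        return list;
    }
}
```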
Fast Memory Copy
Okay, moving on to copying blocks of data. So what we're going to do is we're going to take a byte array right here, a byte array of one million bytes, so one megabyte in size, and we're going to copy this entire array into another byte array. So we're just copying a block of memory. So the most straightforward way of doing that is simply using a loop and iterating ... right here, iterating through the loop and going through every byte in the array and manually copying it into the other array, and again, to slow down this test a bit, I repeat this whole thing 500 times. So I'm copying one megabyte of data 500 times. Now, in Test B, I do the same thing with pointers. So you can see I use the fixed keyword again to get two pointers to the first element in the buffer, and then, I use these source and destination pointer variables to do the copy like this.
So here, I'm using array indexing, and here, I'm using byte pointers. Finally, the last method, I use this CopyTo method, and array has a CopyTo method, which will quickly let you copy the array to another array. So now, we can see the difference, how slow is this manual process, how much faster is it with pointers, and is there any benefit in using the CopyTo method instead. So let's test that out, byte array copy, number 5, 500 iterations with a baseline. Wait for it, and there we go. So the direct copy operation takes 400 milliseconds. When we use pointers, it's only 388 milliseconds. So it's very interesting. Using pointers doesn't have that much benefit actually when we're working with bytes. The CIL implementation, so the intermediate language that the compiler produces is already so efficient that using pointers doesn't really have any added benefits, and this is perfectly in line with what I told you earlier, that the intermediate language runtime is optimized for one-dimensional arrays.
So when you're already working with one-dimensional arrays, you don't really need to optimize it further with pointers, but look at the CopyTo method, 32 milliseconds. That's massive. That's 10 times faster. So the reason for this is that the CopyTo method is incredibly optimized. It actually calls into the operating system, and it calls a low level function for copying a block of memory. So basically, the CopyTo method simply fixes those two blocks of memory on the heap, just like the fixed keyword, but then, it calls an OS function, and it says, "I've got this block of one megabyte. Could you please copy it to this other memory address," and then the operating system does it. Now, that is extremely fast. There's no way we can go even faster with C# code. I mean, you can't beat the operating system. So the takeaway here is one-dimensional arrays are already super fast, so you don't really need pointer operations there, but if you are simply copying a block of memory, you're not doing anything special with the array values, then the CopyTo method is the way to go because then you allow the operating system to basically just copy this entire block of memory, and that gives you maximum speed because you get a ten-fold speed improvement. So that's pretty cool.
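The manual loop and the CopyTo call compared here can be sketched as follows. CopyDemo and its method names are hypothetical. The pointer variant is left out so that the sketch compiles without the unsafe switch; Array.CopyTo is the standard .NET method the talk refers to.

```csharp
using System;

// Hypothetical demo class, not the webinar's memory copy test.
public static class CopyDemo
{
    // Baseline: copy one byte at a time with array indexing.
    public static void CopyWithLoop(byte[] source, byte[] destination)
    {
        for (int i = 0; i < source.Length; i++)
        {
            destination[i] = source[i];
        }
    }

    // Optimized: hand the whole block to the runtime's bulk copy routine.
    public static void CopyWithCopyTo(byte[] source, byte[] destination)
    {
        source.CopyTo(destination, 0);
    }
}
```

Both methods expect the destination to be at least as long as the source.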
Instantiation
Okay, the next performance test, now this one is very nice. I'm going to show you a very fast way to instantiate an object. So let me start with some simple code first. So here's my baseline. Instantiating an object means we construct an object. We construct an instance of a certain type, and often, when you have to use reflection, the type isn't known at compile time. The type that you want to create is only known at runtime. You often get this if you have configuration information in an XML file, and your code needs to dynamically adapt to whatever is in the configuration file, or if you use something like XAML, not actually XAML, but say, your own implementation, and you have a complicated data-binding expression, something that you write out as text. You bind one property to another property, and then you somehow need to turn that into executable code.
So these are all scenarios where your code has a string that contains a type, and you need to instantiate an object of that type. So the most straightforward way of doing that is simply using reflection. So I have my string right here. You see I'm going to create a string builder. So I take the string. I create a Type object. So I get the Type object of this type, and then I use this line of code. You've probably seen this a couple of times in other programs, Activator.CreateInstance, and that will instantiate an object of that type, and then this code here is just a sanity check. I look at the object, and I check if the type is actually a string builder. So if we see an exception, we know that the code is acting weird.
So that's one way of doing it. Now, the fastest way of doing it is like this. This is cheating because here, I'm actually constructing a string builder. So here, the type information is known at compile time. Obviously, this wouldn't be possible in a normal program, but I'm just adding it here for reference purposes so we can see the difference in performance. So this is compile time instantiation. This is runtime instantiation using reflection, and now, I'm going to show you a really cool trick, a way to quickly instantiate types using runtime instantiation. That's this bit of code here. Now, all the magic happens here in this GetConstructor method. So let's look at that. So here is GetConstructor, and what GetConstructor does is it creates dynamic methods.
Dynamic methods are super cool. They were introduced into .NET when LINQ expressions were introduced, and now, with LINQ, you can create a LINQ expression and turn it into an expression tree, and then at a later time, turn that expression tree into executable code and then run that code. Behind the scenes, that library uses dynamic methods to create new methods on the fly at runtime. So when we instantiate an object in intermediate code, it's super simple because you only need two CIL instructions. Actually, you only need one CIL instruction to do it. The instruction is called new object, and the only thing it needs is a reference to a constructor. So it's a single CIL instruction, and it will instantiate an object. So what this code does is it creates a new dynamic method, and then it uses intermediate language generator to fill this method with CIL instructions one after another.
So the first instruction that we inject into this method is simply new object. So this will call the constructor of the object that we're trying to create, and then the second instruction is Return, because we want to return out of the method, and that's it. So this DynamicMethod is returned right here. You can see I return it as a constructor delegate, which I've declared up here. So my constructor delegate is simply a delegate that describes a method without any parameters that returns an object. So let's run this code and see what happens. Instantiation, one million times, and with a baseline, so here's the difference. So using reflection takes 85 milliseconds, which is not bad, but using my DynamicMethod takes only 22 milliseconds. So it's really cool. It's four times faster, but now, look at this. Compiled code is 19 milliseconds. So there's almost no difference between constructing a dynamic method and letting the compiler construct the method for you, basically.
So this trick lets you use reflection-like techniques that let you use a form of dynamic programming to create objects at runtime of any type you want at the same performance level as compiled code. Now, this is super important because my career is 20 years long, and I have used reflection many, many times in my code projects to instantiate objects, and with this trick, I get almost native performance. So there's no need to use Activator.CreateInstance anymore. You can simply use a DynamicMethod. So the takeaway here is please be aware that you can do it this way. You don't need to use classic reflection to create new objects. You can use this neat trick to create your own methods, inject CIL instructions into that method, and let it do anything you want, and creating an object is super simple. You only need two CIL instructions. So this whole magic just happens in this block of code. So it's fairly compact. It's a drop-in replacement for Activator.CreateInstance, and it gives you a massive performance increase. So please be aware that this is possible.
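A minimal sketch of the technique described in this section is shown below. It follows the steps from the talk (a dynamic method containing a new object instruction and a return), but the FastActivator class name, the exact shape of GetConstructor, and the error handling are assumptions rather than the webinar's source code. It only handles types with a public parameterless constructor.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Delegate for a method without parameters that returns an object.
public delegate object ConstructorDelegate();

// Hypothetical helper, not the webinar's implementation.
public static class FastActivator
{
    public static ConstructorDelegate GetConstructor(string typeName)
    {
        // Classic reflection is used once, to look up the type and its constructor.
        Type type = Type.GetType(typeName, true);
        ConstructorInfo constructor = type.GetConstructor(Type.EmptyTypes);
        if (constructor == null)
        {
            throw new MissingMethodException(typeName, ".ctor");
        }

        // Build a tiny method at runtime: "newobj <constructor>; ret".
        var method = new DynamicMethod("Construct_" + type.Name, typeof(object), Type.EmptyTypes);
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Newobj, constructor);
        il.Emit(OpCodes.Ret);

        return (ConstructorDelegate)method.CreateDelegate(typeof(ConstructorDelegate));
    }
}
```

The delegate should be created once and reused, for example var create = FastActivator.GetConstructor("System.Text.StringBuilder"); followed by many calls to create(). Building the dynamic method is itself comparatively expensive, so the speedup only shows up when the delegate is cached.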
Property Access
Okay, now, the final performance benchmark I'm going to show you is property access because if you think about it, this DynamicMethod trick is super cool. We can inject any kind of CIL instructions into a new method, make it do anything we want. So could we access a property using a DynamicMethod? So let's find out. So my code is here. Let me go down to the Test methods first. So the first thing I do is I use classic reflection. So I'm creating a string builder right here. Then, I use classic reflection to get a PropertyInfo instance, and you can see I access the type of the string builder, and then I access the property called Length. So now, I have a PropertyInfo variable, and then to get the value of Length, all I need to do is this, pi.GetValue. That's it, and then here's a simple sanity check, my name, my full name. It's exactly 21 characters, so I'm checking that the value really is 21. If not, we'll see an exception.
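For readers following along without the video, the classic reflection access described here might look roughly like this. The class and method names are hypothetical, and the sample input is up to the caller.

```csharp
using System;
using System.Reflection;
using System.Text;

public static class ReflectionDemo
{
    // Reads StringBuilder.Length through a PropertyInfo instead of writing sb.Length.
    public static int GetLengthViaReflection(StringBuilder sb)
    {
        PropertyInfo pi = sb.GetType().GetProperty("Length");
        return (int)pi.GetValue(sb); // GetValue returns a boxed int, so unbox it
    }
}
```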
So this is classic reflection. Compile-time code would look like this. So to access the length of the string builder, I simply write sb.Length, and that's it. So compile-time code, this gives us maximum performance, but now, DynamicMethod. I have a method here called GetPropertyGetter, which will get me access to the Length getter method of this type. So now, Getter will point to the property, and it will point to the internal Get method of the property, and then to call it, I can simply do this. So let's see how that works. So here is roughly the same code again. You can see I have my DynamicMethod, which I ... Whoops. Wrong one. You can see I have my DynamicMethod here, which I instantiate. So I'm creating a GetValue method, and I'm injecting CIL instructions into it, and then here are the instructions I'm creating.
So the first thing I'm emitting is a CIL instruction called load argument 0. So what this will do is it will load the first argument onto the internal CIL execution stack. So the first argument would be this one. Here's my property, GetDelegate, and you can see it's a delegate for a method that accepts a single object parameter, and it returns an object return value. So load argument 0 will load this value. Then, what I'm emitting is a call, an instance call, to the Getter function, and the Getter is up here. It's the Get method of the property that I specified. So there's a tiny bit of classic reflection here to get to the method info of the Getter of the property, but from then on, I simply use that variable to emit the call instruction directly, and now, you might be aware that the .NET framework, it can transparently work with value types and reference types, but if you have a value type, and you return it as an object, you have to box it.
So that's what I'm doing here. I'm looking at this Getter method, and if the return type of the Getter is a value type, then I'm emitting a CIL box instruction to take this integer. I mean, we know that the length of the string builder is an integer, so it's going to be a value type. So I'm boxing it into an object, and then, here's the return, and that's it. So now, this is a super compact DynamicMethod with, well, four CIL instructions, load argument 0, call the Getter, box the value type, and return. So four CIL instructions to perform the access to the property, and then, to use this, all I need to do is this. I call this Getter delegate. I provide the string builder, and it returns me the length. So let's run the code and see what happens. So I'm doing the property access. We're doing ... What's this? Five million iterations and with a baseline. Wait for it, and that's the result.
So now, this is extreme, huh? The classic reflection takes 910 milliseconds. So it's fairly slow. The DynamicMethod takes 55 milliseconds, 55 milliseconds. So that's pretty awesome. That's 10 ... What is it? 20 times faster, more or less? 910 divided by 55, 16 times faster. So that's quite a speed improvement. Doing it in compiled code is only one millisecond. So that's extremely fast. You can see that the delegates that I'm using to run this dynamic code, calling the delegate, there's a little performance overhead associated with that. When we created the object, there wasn't that much difference between doing it at compile time or doing it with a dynamic delegate, but here, you can see that the compiler is able to very quickly access this Length property, whereas my DynamicMethod is 55 times slower, so it's a big difference.
But we're assuming this is a scenario where you can't do compile-time code. You don't know the type that you want to work with at compile time, so this option is basically out of the window. So your only choice is classic reflection or using a DynamicMethod, and this gives you a 16 times performance improvement. So the takeaway here, the thing that I want you to remember is using DynamicMethod is not that complicated. You can see my code is fairly compact. I've added the Setter as well. I'm not using this in my example, but when you download my code and use it, you can play around with the Setter as well, but you can see that creating a property Getter using dynamic CIL instructions is not that much code. It's just this bit, and the CIL instructions to do the work are just this section here.
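Since the code itself is only visible in the video, the following is a sketch of a property getter built along the lines described: load argument 0, call the getter, box if needed, return. The names GetPropertyGetter and GetDelegate come from the talk, but the signatures and the FastProperty class are assumptions. Two details are additions not mentioned in the talk: a Castclass instruction so the IL stays type-safe, and Callvirt instead of Call so that overridden properties and null instances behave as they would in normal C# code. The sketch only handles properties declared on reference types.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// A delegate for a method that accepts a single object and returns an object.
public delegate object GetDelegate(object instance);

public static class FastProperty
{
    // Assumption: 'type' is a reference type with a public instance property.
    public static GetDelegate GetPropertyGetter(Type type, string propertyName)
    {
        if (type.IsValueType)
            throw new NotSupportedException("This sketch only handles reference types.");

        // A tiny bit of classic reflection to find the getter's MethodInfo.
        PropertyInfo pi = type.GetProperty(propertyName, BindingFlags.Public | BindingFlags.Instance);
        MethodInfo getter = pi == null ? null : pi.GetGetMethod();
        if (getter == null)
            throw new ArgumentException("No public getter for " + propertyName + " on " + type.FullName);

        var method = new DynamicMethod("GetValue", typeof(object), new[] { typeof(object) });
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);              // push the instance argument
        il.Emit(OpCodes.Castclass, type);      // addition: cast object to the declaring type
        il.Emit(OpCodes.Callvirt, getter);     // invoke the property getter
        if (getter.ReturnType.IsValueType)
            il.Emit(OpCodes.Box, getter.ReturnType); // box value types such as int
        il.Emit(OpCodes.Ret);                  // return the (boxed) value

        return (GetDelegate)method.CreateDelegate(typeof(GetDelegate));
    }
}
```

As with the constructor delegate, the getter is built once and reused: something like `var getter = FastProperty.GetPropertyGetter(typeof(StringBuilder), "Length");` outside the loop, then `getter(sb)` inside it.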
So creating the DynamicMethod, it sounds intimidating, but it's not that complicated actually, and it will give your code a massive speed improvement if you use it to replace your classic reflection code. So please be aware that this option is on the table. And that brings me to the end of this webinar.
So the coupon to get any of my Udemy courses at a 90% discount is POSTSHARP15. So use that code in any Udemy course to get the discount. If you want to spend even less, then go to my training.mdfarragher.com website. So this is a Teachable environment. This is my own Teachable environment, and there, you can take a subscription for $9 a month, and it'll give you access to everything, and any courses that I create in the future will automatically get added to the subscription. So that means that roughly once a month, you can expect a new course from me, and you will be enrolled in that course automatically. Finally, if you want the source code that I've just shown you to play around with the code and create some dynamic methods of your own, then just send me an email at mark@mdfarragher.com, and I'll reply, and I'll put the source code in an attachment in the reply, and then, you can play around with it. So I've used Visual Studio community edition on OSX, but of course, the code will work in any Visual Studio edition, and I'm using .NET Core 1.1, but I'm not doing any weird stuff. So you can easily take the code and run it against the classic .NET framework. It will still work. So send me an email. I'll give you the source codes.
Q & A
Q: Which version of .NET are you using?
A: I'm using .NET Core 1.1. I'm using the default C# version, which is version 7.
Q: Is there a concern with using string.Format as opposed to string builder?
A: It depends on how we use it. String.Format, internally, of course, it uses a string builder. So the call to format itself will be pretty efficient, but of course, it returns a string. So it all depends on what you're doing with string.Format. If you take the outputs and you simply add it to another string, and you do that in a tight loop, and it's part of the mission critical part of your program, then you are going to see a performance hit, but honestly, I use string.Format all over the place in my own code, in logging code, in tracing code, output code, and it's all good. So don't worry about calling string.Format, but if you are looking at a mission critical loop in your code, then do consider removing it. One final thing: if you use these kind of strings, I forgot the name, but the ones that start with the dollar and have these embedded variables like right here, this is simply a syntactic sugar. It calls string.Format behind the scenes. So this is actually a string.Format call, and again, don't worry about it. Just use it wherever you like, but in tight loops, mission critical code, consider removing it.
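To make the equivalence concrete, here is a small sketch with hypothetical method names. Both methods produce the same string; with the C# 7 compiler discussed in this webinar, the interpolated form is translated into a string.Format call.

```csharp
public static class FormatDemo
{
    // Explicit string.Format call with positional placeholders.
    public static string WithFormat(string name, int count)
    {
        return string.Format("{0} has {1} items", name, count);
    }

    // The same message written with the dollar syntax (string interpolation).
    public static string WithInterpolation(string name, int count)
    {
        return $"{name} has {count} items";
    }
}
```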
Q: Is there any difference using DynamicMethod versus expression or lambda expression for reflection?
A: It's pretty much the same, but the DynamicMethod is slightly faster. I read a benchmark and they compared it to all these different ways of creating dynamic expressions, and the DynamicMethod was the fastest. It's only a slight difference, so if you prefer to use expressions instead, just go for it. If you want the maximum performance, then use DynamicMethod.
Q: Do you have advice for which of these methods you recommend we look at all the time versus looking for them in the hot path? For example, if something is called once at startup, do you strive for these optimizations in your code?
A: Yes. Absolutely. When you identify the hot path in your code, please do take out repetitive calls, repetitive instantiations, initializations, and move them outside of loops or outside of the hot path. That's basically step one in optimization.
Q: Looking at the GetConstructor method regarding string appending, how many strings does it need that the string builder is more efficient than a normal string addition?
A: I measured it. The answer is four. So I actually did these measurements. So if you do less than four string concatenations, the string is faster because of the overhead of actually creating the string builder, but if you do four or more, then the string builder is faster, and these two start to deviate really quickly. So again, don't religiously remove all normal string concatenations from your code because it makes your code a lot less readable. String builder's a nice class, but the append syntax is not very nice, basically. It's a lot less clear than simply using a plus sign to add two strings together. So three strings or less, absolutely no problem, use normal strings. More than three, use the string builder.
Q: When working with loops, is the performance loss the same when using LINQ queries? For example, does the Foreach expression have the same performance loss as a normal Foreach loop?
A: The fun thing about LINQ is that it always uses an enumerator. So if you use Foreach in LINQ, you get the enumerator code to incrementally step through the expression, and if you try to do it with a classic loop, you still get the enumerator code because LINQ is built on top of enumerators. The whole thing is one giant enumerator with nested methods on top of that. So with LINQ you will see the slow performance no matter what you do.
Q: Since DynamicMethod has been in .NET since the introduction of LINQ, why doesn't the optimized reflection code you demonstrated exist in the reflection library outright as an existing class if the implementation is the same regardless of the code you're pulling?
A: Honestly, I'm not sure. My hunch is that it has to do with backwards compatibility, but it's a good point. You could definitely rewrite the classic reflection code and make it much faster using this methodology. My hunch would be that the Activator.CreateInstance behaves slightly different from a DynamicMethod, and if they had tried to do this, it would break backwards compatibility.
Q: Can you please suggest any good tools to check performance issues in code?
A: I have Visual Studio Enterprise and a virtual machine. It comes with a performance testing tool. That one's pretty amazing. So I really like the tools that are bundled with Visual Studio, and in fact, those are the only tools I use. I mean, I've demoed all this code using Visual Studio on OSX because it's easier, because I don't want to run a performance benchmark inside a virtual machine, but when I'm doing my day-to-day programming, I'm using Visual Studio Enterprise in a Windows VM, and the tools in there are just great. So I would say start with those. I really don't have any other recommendations than the standard Visual Studio stuff.
Q: Which one is faster, TypeOf or GetType method?
A: I think TypeOf is faster, but it's a hunch. I'm going to have to check that.
Q: Can you tell us a bit about projects where you have used these optimizations?
A: In the past, I wrote this huge web library, ASP.NET library, where you could create web pages using a XAML-like syntax. So you could basically just map out your entire web page in HTML, but you could use special binding expressions inside the HTML. So I could put a text box in there and then bind the contents of the text box to a variable in my ASP.NET code. So this wasn't XAML. It was my own project, and I was kind of inspired by XAML, and the code to parse those data-binding expressions used these dynamic methods to speed it up. I started out with classic reflection to parse an expression and then access objects, access properties, and get values, and it was incredibly slow. So I rewrote the whole thing and used DynamicMethod. So any place where you are creating expressions based on text data, so not actually code, but something that's stored in a text file. It could be XML or a config file or anything. Any situation where that occurs, using DynamicMethod is really going to help you.
Q: What's the overhead of string interpolation in C# 7 versus string.Format?
A: So string interpolation is the dollar syntax, I think. It's exactly the same. So behind the scenes, string interpolation is string.Format. So you're not going to see any performance difference between the two.
Q: In the strings test, would string builder perform better if you passed capacity into the constructor?
A: Yes. That is an excellent question. Since I didn't initialize my string builder, it gets initialized on the heap with a default size, which I think is 16 bytes, I think, 16 characters, 32, or something like that. Every time when you hit the limit, so when you add characters, and the whole thing's full, it doubles in size, and of course, to double it in size, the framework has to instantiate a new buffer with twice the size, and then copy all the data over. So it's doing exactly the same thing that the string is doing, but the difference is it happens in doubling. It doesn't happen on every character addition. It only happens when the buffer is full, and it has to double. So string builder is logarithmically faster than string, but it's still going through this process of instantiating and copying data over. So if you instantiate the string builder at maximum size right from the start and then fill it with data, then you never have to expand the buffer. There is enough room in memory, and you're simply writing characters one-by-one directly into that area of memory, and that will give you the maximum performance. Great question. Good observation.
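A minimal sketch of the preallocation idea, assuming the final length is known up front. The class and method names are hypothetical; the capacity argument of the StringBuilder constructor is measured in characters.

```csharp
using System.Text;

public static class BuilderDemo
{
    // Builds a string of 'count' copies of 'c' without ever growing the buffer.
    public static string Repeat(char c, int count)
    {
        var sb = new StringBuilder(count); // reserve room for all characters up front
        for (int i = 0; i < count; i++)
        {
            sb.Append(c); // writes into the preallocated buffer
        }
        return sb.ToString();
    }
}
```

Whether this is measurably faster than the default constructor depends on how large the final string gets, so it is worth benchmarking on the actual workload.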
Q: Why 9% are exceptions?
A: Several viewers have pointed out that the 9% number I mention in the webinar is incorrect. Here is the correct calculation:
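I’m building numbers from individual digits. There are 11 digits, 0-9 and the letter ‘X’. So, the chance of a single digit being invalid is 1/11. A number consists of 5 digits, so, assuming each digit is picked independently, the chance of a number containing at least one invalid digit is 1 - (10/11)^5, which is about 38%. Simply multiplying, (1/11) * 5 = 45%, overestimates slightly because it counts numbers with more than one ‘X’ several times. The loop in my code will fail roughly 38% of the time and throw an Exception.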
Q: How to get mastery in reflection and dynamic code?
A: By practicing a lot. Write lots of code that uses reflection and dynamic emitting. Experiment, measure performance, see how far you can go optimizing your code. Play around and discover what works and what doesn’t. Plus: read lots of blog posts and articles.
Q: Why would it not be beneficial to use structs for all simple business objects? Is there a point of degradation or some limitation over a class? Is a struct usable with Entity Framework to represent database objects?
A: The .NET Runtime makes certain assumptions about structs and classes, specifically that structs will be very small (in terms of memory space) and have a short lifetime, and classes will either be small or large and have a long lifetime. Simply replacing all classes with structs in your code is dangerous because you will go against these assumptions. For example - if you change a long-living object to a struct, it will get boxed on the heap and your code will be even slower than when using classes. A struct also gets copied during each method call, so passing a very large struct to many different methods will slow down your code a lot.
The rule of thumb here is to always start with classes, and only use structs when it makes sense to do so.
The Entity Framework does not support structs.
Q: Can we use DynamicMethod trick on AOT platforms (via Mono)?
A: Nope. The ILGenerator class is missing, so you can’t emit your own CIL code into the dynamic method. Makes sense, right? It couldn’t possibly work with AOT.
Q: CIL stuff is really interesting. Perhaps worth mentioning that string interpolation and string.Format uses StringBuilder so you don't always need to explicitly use StringBuilder. Also, StringBuilder has a little overhead so for <4 strings something like str1 + str2 + str3 is faster - I think
A: Correct! String interpolation ($"yadday {yadda}") compiles to a String.Format call, so it’s exactly the same thing. I always use interpolation because it’s so much easier to type.
You’re also spot-on with the string versus StringBuilder comment. A StringBuilder has some overhead initializing, so it is actually slower for a small number of additions. The cutoff point is at 3 additions. For zero to three the string is faster, for four and more the StringBuilder is faster. For larger number of additions, they start to diverge very quickly.
In my logging and diagnostic code, I always use strings (string interpolation) because I usually stay below the 3-addition limit, and it makes my code so much easier to read.
Q: Hi, For Exceptions, what if TryParse is not there. For user-defined types instead of Primitive what needs to be done.
A: You need to do the same that TryParse is doing internally – scan the input data first, and only start parsing if the scan says it’s okay. Also make sure you return a parsing failure as a return value (i.e. a bool) instead of throwing a FormatException.
An easy way to scan is by using a precompiled regular expression to make sure the input data doesn’t contain any invalid characters. Regular expressions are super-fast.
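As an illustration of the scan-then-parse pattern, here is a sketch for a made-up user-defined type. ProductCode and its "two letters, dash, four digits" format are purely hypothetical; the point is the precompiled regular expression used as the scan and the bool return value instead of an exception.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public struct ProductCode
{
    // Precompiled once; \z anchors at the very end so a trailing newline is rejected.
    private static readonly Regex Pattern =
        new Regex(@"^[A-Z]{2}-[0-9]{4}\z", RegexOptions.Compiled | RegexOptions.CultureInvariant);

    public string Prefix { get; }
    public int Number { get; }

    private ProductCode(string prefix, int number)
    {
        Prefix = prefix;
        Number = number;
    }

    // Reports failure through the return value; never throws for bad input.
    public static bool TryParse(string input, out ProductCode result)
    {
        result = default(ProductCode);
        if (input == null || !Pattern.IsMatch(input))
            return false; // scan failed, so do not start parsing

        // The scan guarantees exactly four ASCII digits, so int.Parse cannot fail here.
        int number = int.Parse(input.Substring(3), CultureInfo.InvariantCulture);
        result = new ProductCode(input.Substring(0, 2), number);
        return true;
    }
}
```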
Q: Any comment about differences between copying arrays, lists, C# hash tables, etc. on the heap?
A: In terms of memory layout, there’s not that much difference between an array, a list, or a hashtable. All three use arrays internally to hold the data. A hashtable is optimized for key/value lookup, whereas list and array are intended for indexed access.
They all have a CopyTo method that attempts to block-copy all data in one go. If you’re storing value types, you will see great performance for all three.
Q: Are you going to review LINQ / Parallel performance someday?
A: That’s a great idea! Thanks for the suggestion. I have an existing course already that scratches the surface of LINQ versus PLINQ performance, but I’d love to go deeper.
Q: Nice talk. BTW StringBuilder may not be the fastest. It depends on the size etc. You have to calculate the GC allocations also. Best tool for that is BenchmarkDotNet with memory diagnoser on Windows! It is a fantastic tool. General rule of thumb: whatever you do you have to measure in order to see performance benefits.
A: Thanks for the suggestion. I’ll check out BenchmarkDotNet. And you’re right about the rule of thumb – you always have to do actual measurements, you can’t rely on just theoretical knowledge to optimize your code.
Q: Just a question on array.CopyTo(...) where Mark said that the memory copy was done out of process by the OS (in C libs guessing "memcpy"). In the profiling application during the webcast, array.CopyTo(..) executed in 32ms, whereas the copy via index and loops was >300ms, in other words, using array.CopyTo is an order of magnitude faster with OSX as the OS. Is the 10-fold difference "about" the same with .NET on Windows? Different OS different ratio?
A: Yes, the ratio is roughly the same. The speed of a memory copy is more or less the same for all operating systems, whereas you might see small differences in 1-dimensional array performance. I’ve noticed that .NET Core tends to be slightly faster than Mono in handling arrays, because it’s much better optimized.
Q: I measured. GetType() is 171 ms vs. typeof() at 6 ms in a test of a million iterations.
A: That’s because typeof() is processed at compile-time, whereas GetType() is processed at runtime.
Q: How do you keep yourself up to date on the latest and greatest technology?
A: I read lots of technical blogs, and when I’m preparing for a new course or webinar, I do a lot of research and write small test programs to experiment. And I probably have a talent for learning new stuff very quickly.
Q: Would you use some form of multi-dimensional converter to convert a single dimensional array back to a multi-dimension array or would you take another approach?
A: It depends on the use case. I usually just wrap a 1-dimensional array so from the outside it looks like the original multi-dimensional array. The disadvantage of converting the other way is that you’re slowing the code down again, so I am a bit hesitant to use any kind of converter.
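A minimal sketch of such a wrapper, with a hypothetical Grid class: the data lives in a 1-dimensional array, and a two-argument indexer makes it look like a 2-dimensional array from the outside.

```csharp
using System;

public sealed class Grid
{
    private readonly int[] _data; // row-major storage: one row after another

    public int Rows { get; }
    public int Columns { get; }

    public Grid(int rows, int columns)
    {
        if (rows < 0 || columns < 0)
            throw new ArgumentOutOfRangeException("rows and columns must not be negative");
        Rows = rows;
        Columns = columns;
        _data = new int[rows * columns];
    }

    // Maps (row, column) onto the flat array. Note: a column index outside
    // [0, Columns) is not checked here and could land in a neighbouring row.
    public int this[int row, int column]
    {
        get { return _data[row * Columns + column]; }
        set { _data[row * Columns + column] = value; }
    }
}
```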
Q: Do you have any advice for Parallel.ForEach?
A: Yeah, use it! Parallel.ForEach is great for parallelizing regular for or foreach loops. It is my first step in parallelizing code, and quite often it’s all I need to do.
Two years ago, I wrote an app that processes Sharepoint documents. I had a for-loop in my code that would process each document individually. I parallelized the code simply by replacing my for-loop with a Parallel.ForEach. This drop-in replacement to make code multi-threaded is really nice.
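The drop-in replacement could look roughly like the sketch below. The document processing is reduced to a hypothetical stand-in (summing text lengths); the one thing to watch when converting a loop is shared state, which is why the running total is updated with Interlocked.Add.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelDemo
{
    // Sequential version would be: foreach (var doc in documents) total += doc.Length;
    public static long SumOfLengths(IEnumerable<string> documents)
    {
        long total = 0;
        Parallel.ForEach(documents, doc =>
        {
            // Several threads run this body at once, so update the total atomically.
            Interlocked.Add(ref total, doc.Length);
        });
        return total; // Parallel.ForEach returns only after all items are processed
    }
}
```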
Q: Have you tried these performance tests on .NET Core?
A: Yeah. Everything I show you in the Webinar is running on .NET Core 1.1.
Q: Foreach loops do have a performance optimization over for loops in cases where the collection is already an enumeration or a function that yield returns?
A: No. Enumerations or methods with yield return cannot be indexed and they don’t have a well-defined upper limit, so there’s no benefit using a for-loop with them. If you do try to use a for loop, you’d have to manually access MoveNext() and Current, and this would be the exact same code the compiler produces when you use foreach.
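To illustrate, here is a sketch (hypothetical names) of a yield-return method consumed twice: once with foreach, and once by calling MoveNext() and Current by hand, which is approximately what the compiler generates for the foreach version.

```csharp
using System.Collections.Generic;

public static class EnumerationDemo
{
    // A method with yield return: no indexer and no Count to loop over.
    public static IEnumerable<int> Countdown(int from)
    {
        for (int i = from; i > 0; i--)
            yield return i;
    }

    public static int SumWithForeach(IEnumerable<int> source)
    {
        int sum = 0;
        foreach (int value in source)
            sum += value;
        return sum;
    }

    // Manual equivalent of the foreach loop above.
    public static int SumByHand(IEnumerable<int> source)
    {
        int sum = 0;
        using (IEnumerator<int> e = source.GetEnumerator())
        {
            while (e.MoveNext())
                sum += e.Current;
        }
        return sum;
    }
}
```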
Q: Is there any significant difference between pre- and post-increment operations. In C++ I am accustomed to always doing ++i in preference to i++ but I rarely see this being done by C# developers.
A: It works exactly the same as in C++, the difference between the two is the return value: i before increment or i after increment.
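A tiny sketch of the return-value difference, with hypothetical helper names. Each method returns the value of the increment expression together with the variable's value afterwards.

```csharp
public static class IncrementDemo
{
    // i++ evaluates to the value before the increment.
    public static (int returned, int after) Post(int i)
    {
        int returned = i++;
        return (returned, i);
    }

    // ++i evaluates to the value after the increment.
    public static (int returned, int after) Pre(int i)
    {
        int returned = ++i;
        return (returned, i);
    }
}
```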
Q: Does the performance benefits you described for structs vs classes get lost when comparing the performance of passing classes vs structures to other functions (excluding cases where structs are being passed by reference)?
A: Passing structs to functions will slow down your code, because structs are copied by value. For every method call the entire struct will be cloned in memory. When you’re using classes, only the reference to the object instance is copied into the method.
So yes, for large structs with lots of fields you’ll see a measurable slowdown when doing lots of method calls with struct parameters.
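The copy semantics can be seen in a small sketch (the Point3 struct and method names are hypothetical): a method that receives the struct by value works on its own copy, whereas a ref parameter works on the caller's instance and avoids the copy.

```csharp
public struct Point3
{
    public double X;
    public double Y;
    public double Z;
}

public static class StructDemo
{
    // Receives a copy of the whole struct; the caller's value is untouched.
    public static void MoveCopy(Point3 p)
    {
        p.X += 1;
    }

    // Receives a reference to the caller's struct; no copy is made.
    public static void MoveRef(ref Point3 p)
    {
        p.X += 1;
    }
}
```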
Q: What is the difference between the heap and the stack?
A: The stack is a highly-optimized block of memory intended for data with a very short lifetime, just for the duration of a single method call. Stack memory is created when you enter a method and gets cleaned up when you exit out of a method. The stack is also fairly small, usually around 1MB per thread by default. It’s optimized for a manageable number of small objects (thousands, not millions) with a very short lifetime.
The heap is a very large block of memory (multiple GBs) optimized for long-term storage. You can easily put millions of objects on the heap, and they can be either small or large. The heap has a special internal process for archiving long-lived data, and there’s a separate process called the Garbage Collector that cleans up objects that are no longer in use.
As a rule of thumb, the stack is slightly faster than the heap. It can also very quickly initialize new data by writing zeroes directly to memory (the heap calls the constructor of each object individually). The disadvantage of the stack is that it’s relatively small, and it assumes your data will be short-lived. The stack can also slow down if you have a very deep chain of nested method calls.
Q: Why and when we use reflection?
A: We use reflection when we want to dynamically access object fields or call object methods. With ‘dynamically’ I mean based on data that is not known during compile-time. For example, when we store database configuration data in a configuration file. The configuration file might say we need an OracleConnection or a SQLiteConnection. With reflection, we can read this configuration field and then dynamically instantiate the correct object.
Basically, any time an object type, property, field or method appears somewhere in text format, we’re going to need reflection to perform instantiation, access fields and properties, or execute a method call.
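A minimal sketch of that scenario, assuming the type name arrives as plain text (for example from a configuration file). The ConfiguredFactory class is hypothetical, and a real configuration would usually hold an assembly-qualified name for types outside the core library.

```csharp
using System;

public static class ConfiguredFactory
{
    // Turns a type name stored as text into a live object using classic reflection.
    public static object Create(string typeName)
    {
        // Throws if the name cannot be resolved to a type.
        Type type = Type.GetType(typeName, throwOnError: true);

        // Requires a public parameterless constructor on the resolved type.
        return Activator.CreateInstance(type);
    }
}
```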
Q: What does emit mean?
A: Emit means injecting a single CIL instruction into a dynamic method.
Q: What do you mean by baseline test?
A: A baseline test is a performance test of un-optimized code, to get a baseline performance value.
About the speaker, Mark Farragher
Mark Farragher is a blogger, investor, serial entrepreneur, and the author of 10 successful IT courses in the Udemy marketplace. His IT career spans 2 decades and he has worn many different hats over the years.
Mark started using C# and the .NET framework 15 years ago, and creates online courses that make complex C# programming topics easy to understand and accessible to anyone.