But it doesn’t have to be this way.
Yan Cui will show you a simple technique his team used at GameSys which allowed them to localize an entire story-driven, episodic MMORPG (with over 5000 items and 1500 quests) in under an hour of work and 50 lines of code, with the help of PostSharp.
Watch the webinar and learn:
- The common practices of localization
- The challenges and problems with these common practices
- How to rethink the localization problem as an automatable implementation pattern
- Pattern automation using PostSharp
You can find the slide deck here: http://www.slideshare.net/sharpcrafters/solving-localization-challenges-with-design-pattern-automation
- Six Sins of Traditional Approach to Localization (6:20)
- Automating Patterns with PostSharp (16:08)
- Q&A (20:50)
Hi, everyone. Good evening to those of you who are in the UK. My name is Yan Cui, and I often go by the online alias of The Burning Monk because I'm a massive fan of this '90s rock band called Rage Against the Machine.
I'm actually joined here by Alex from PostSharp as well.
Alex: Hi, everyone.
And before we start, a quick bit of housekeeping. If you have any questions, feel free to enter them in the questions box in the GoToWebinar control panel you've got. We'll try to answer as many of them as we can at the end of the session, and anything we can't cover, we will try to get back to you guys via email later on.
We're going to be talking about some of the work that I did while I was working for a gaming company called Gamesys in London. This was up until October last year, and one of the games I worked on was this MMORPG, or massive multi-player online RPG game, called Here Be Monsters. One interesting thing about Here Be Monsters was that it has lots of content, and when it comes to time to localize the whole game, we have some interesting challenges that we want to find a novel way of solving. Just from this simple screen, you can see there's a couple pieces of text. The name of the character, the dialogue that you say, as well as some UI control here that says, “out of bait.” One of these need to be localized for the entire game. And as I mentioned earlier, this game is full of content. In fact, in terms of text, we have more text than the first three Harry Potter books combined. And there are many different screens in the game, one of which is what we call the almanac. Or, you can think of this as an in-game Wikipedia of some sort, where you can find information about different items or monsters in the game. Here is an example of the almanac page for Santa's Gnome, which is only available during Christmas.
So anyway, there's a couple information about the monster itself. A name, description, some type, et cetera, et cetera. Those all need to be localized, as well as all the UI elements of labels or for bait. The text on the bottom et cetera, et cetera. So even for a very simple screen like this, there's actually a lot of different places where you need to apply localization.
A few years back, Atlus, who makes very popular RPG games, very niche games as well, they did a post explaining why localization is such a painful process that can sometimes take four to six months, and he touched on many different aspects that's involved in the localization process. And you can see from his list that I hit, by the estimation, programming alone takes between one and one and a half months with the traditional approach of localization. Therefore, each of your platforms, the client, they will ingest a gettext file, which contains a bunch of localization in a very plain text format like this. You've got, basically, a key value of pairs of what the original text is, as well as what the localized text should be.
Yan, so what is this pure file format? Is this the standard for localization, or do you have some tools for that?
Yep. So the gettext file is industry standard for localization. Otherwise, I don't know of any standard tooling for translators. The translator that we were working with, they had internal tools to help their translators, or work more effectively with the gettext file format. And for different languages, there are also libraries available for you to be able to consume those gettext files. We'll look at one for .NET later on.
Alex: Okay. Yeah, thanks.
Once you know you've consumed those translation files, you'll need to then substitute all the text that you have with the localized versions of those texts. You've got buttons that display some text. This is just do the code, it's not taking from our real code base, but to just give you an idea of where you need to apply localization to labels and buttons and so on, as well as your domain objects. So where you've got the domain object that represents a monster, or the names and descriptions, et cetera, et cetera, will need to be localized.
Once you've done that, you do your data binding. Then, assuming you haven't lost anything while you've localized, this screen shows you all information about the centers known. I think this is in Brazilian. Portuguese? I can't read any of it, so I don't know how accurate translations are, but at least you can see the localization has been applied to all the different places. So you pat yourself on the back for a job well done, probably go get a beer with your colleagues, and then you realize, oh wait; what happens if we make some changes or we add new, additional types to our domain? You're gonna have to keep doing this work over and over, each with a platform that we support.
And then to add salt to that injury, look at how much time Atlus reckons we tend to spend on QA. The reason why it takes so much time is because there's a massive scope they need to test. And not only to spot check and make sure the translations is not drastically wrong, but there's also loads of bugs that can creep in and do in client integration work, have people miss out on a particular screen, whether some buttons was left in English instead. And you've gotta do these for every single platform, and because you're doing releases so frequently, or at least I hope you should be, that means you have to put on repeated effort to test localization whenever you make any kind of change on the client, really; which broadens your scope for your regular testing. And it's putting a lot more pressure on your QA teams.
6 Sins of Traditional Approach to Localization
Here's pretty much a laundry list of the problem that I tend to find with the traditional approach to localization. A lot of up-front effort, which is development, and we have the team doing more work as you introduce more domain times and extend your game. It's also hard to test, and it's prone to repressions. You see, it's normal to feel doom and gloom whenever localization's been mentioned in the company, and it's because it's a pain. Once it's there, it just doesn't go away. Which is why, when it comes to time for us to implement localization in our game, we decided to think outside of the box and see whether or not we can do it better in a way that would be easier and more attainable for our team.
To give you a bit of background on what sort of hours of pipeline at a time, we built a custom CMS; content management system. We internally call it TNT. It's really just a very thin layer on top of Git where all the information about game design data about the monsters, about different quests, locations; they're all stored as JSON files, and we built in some integration with the Git flow, finding strategies so that we can apply the same Git flow. Finding strategy that we use for developers already, and get our testers to do the same thing.
By the way, why did you choose to build a custom CMS instead of using something else, or benefit to Git and Git 12 in this case?
Right. We decided to build a custom CMS because we also wanted to bake into the CMS some basic validations that applies to our particular domain. And the reason we have a pro-lay on top of Git, is because we want to have a source control for all our game data. Well, we do that for all our source code, and the game data is really part of your source code for your game, which can't exist without the validator, the things that makes up the content of the game.
And Git flow is just a way so that we allow our game designers to work in tandem with each other, and have a well understood process of how to merge things, and how to release things when those things get back to masters. So that when you look at a master branch, you know you're looking at exactly what has to be deployed for production and so on.
Well, we had a team of game designers who work on different branches of stories. One person may be working on a storyline, their storyline next week, whilst another person may be working on a storyline they're screwing up in a month's time, and you want them to be able to work in parallel without stepping on each other's toes. Git flow comes into the play as part of the mechanism for allowing them to do that.
Does that answer your question?
Yeah. That seems to work well, yes. Good idea. Thanks.
So inside TNT, you have some very simple UI controls that the game designers can do cherry-pick, as well as they do merges of different branches. Once they're happy with the game design for the world they've done, they can then publish to a particular environment so that they can test it out in that environment and see whether the quest is more interesting. Or, put in all the mechanics so that their hopes are in place.
Right, so at this point the custom CMS would package all the JSON files and send it to a publisher survey's, which you would perform deeper validation against the game moves. For example, if you've got an item, there's a high level. Then the pair should be at a particular quest, and it shouldn't be able to give out the item as a reward. And also, we do quite a few pre-computation and auto transform the format from the original JSON into a more suitable format to be consumed by different client platforms.
Now, all of that then gets pushed to S3 and from the TNT as a publisher, I press a button. Now I can see everything's happened. And as you can see from the logs, we do this an awful lot, which is also one of the reasons why we decided to invest the effort into building two links, such as the CMS, so that we can constantly integrate upon our game design, not just as much as our code.
Once the publisher has done his job, we will solidify the game specs. We call them into S3, into version folders, and as a game designer, we'll just press a button to publish my work. I'll get an email with a link at the top so that I can click that and load up the web version of the game with just the changes from my branch so that if someone else is using the same test environment to test their changes, I won't be stepping on their toes.
You may notice that there's a link here to run economic report. This refers to some other piece of work that we've done to use a graph database to help us understand how different aspects of the game connects with each other. So an item could be used in a recipe to make another item, which can then be used to catch a monster, who then drops a loop, which can then be used in another quest, or so on and so forth. So our domain is very highly connected, and to understand all the knock on effects or making even small changes like upping the price of water, you'll have a huge amount of knock on effect that goes through the entire economy of the game. So, we use a database to automate a lot of those validation and auto-balancing.
You can also see, down the email there, you can see results of the game rule validations, and we report back to the game designer. And then at this point, all the specs are ready. The server, the flash client, as well as the iPad client will be able to consume those data in different formats, and you'll be able to load up a game and test out the changes.
Another question here; so why do you need to produce different formats for all those different platforms? Is that a requirement?
Right. So for example, on the server application, we don't really care about the name of a monster, or the description of a monster. It helps to reduce the size of the file and how long it takes to load it, as well as the memory footprint of several application. So we ship those client only information from the server spec, and we also precalculate a bunch of secret values - coefficients and things like that - into the client and into the server spec, but not make them available in the client spec.
Of course, the client specs are public, so anyone who's a bit more tech savvy will be able to download the spec, and then work out its format and understand some secret values that we have embedded into our domain. They'll be able to cheat in the game, essentially.
Alex: Uh-huh, okay.
And also the flash, because it's all web based. They prefer to load the whole file as one big zip, whereas for a iPad, they prefer to have smaller file sizes. But many of them -
Alex: Okay, that makes sense.
So at this point, we thought, “well, if we do localization, what about if we bundle it into our publishing process so that by the time all the files has been generated, they'll only be localized on the client?” We wouldn’t have to do some of the things we saw earlier, where you have to apply localization to your domain objects all the time. You can then publish your localized versions of game specs to language specific folders. Notice that, as you mentioned earlier, the server doesn't care about most of our text being localized, so we're actually gonna need to apply that same sort of path to the server spec.
So with that, you remove the duplicated effort you need to do on each of the client platforms. At the same time, you reduce the number of the things that can change. They're automatically changed with each release, because they have an automated process of doing this, so there's fewer things that you test. We don't need to touch all these things, but we still have this problem of having to spend a large amount of effort up front. All the things that you were doing on the client before, now has to be done by something else. In this case, the server team, which have to, like I said before, ingest a gettext file to know all the translations and then need to check the domain objects for string field improperties that need to be localized, apply localizations when you transform those domain objects into DTOs. And then again, do the same thing for multiple languages if you're localizing for different targets.
Automating Patterns with PostSharp
But notice that the step two and three is actually just a Patient pattern that can actually be automated to help fire proof yourself against future changes or as you add more domain objects, we should get localization for free.
And in .NET, you can consume a gettext file and gain admission from that using the second language package. And obviously, because we're here, I'm going to be talking about the implementation patterns and how to automate them with Posharp.
So for those of you who are not familiar with Posharp, you can buy different aspects, which will then apply the post compilation modification to your codes so that you can bake additional logic and behavior into your code. So here, what I've got is a very simple aspect which is applied only to fields or properties of type strength, so that when you code a setter on those properties of fields, this bit of code will run. And as part of that, we check against localization; so, local stress … context object to see whether or not we are in localization. If not, we move on.
So I can see here that actually localization context is doing all the translation work apparently. How do you set it up, or how do you initialize it? Because here, you see just that you call translate.
Yep. So as I mentioned, we use just the gettext translation file. Imagine when the custom CMS, the TNT, calls the service with a big zip file, the publisher will then unpackage that. As part of that package, you will find those PO files. For each of those files, the publisher would load it with a second language, and then create a local context. And within that context, it then transforms the domain objects created from those JSONS, and into DTOs.
So when the DTO transformation is happening, and you're creating new DTO objects and setting the string values for its fields and properties, this code will kick in. And because it's called inside a localization context, you will contain the information that we have loaded from the gettext file. So that next line, this guy, all he's doing is checking for the gettext file. “Do we have a match for the string that you're trying to localize?” If there is, then we will use that localized string instead. So what we're doing is here, is that we're proceeding with calling the setter as if you've called the setter with the localized string instead of the original string.
Does that make sense?
Alex: Yeah, that's perfect.
So with this, we can then just multicast all of our DTO objects, which have the convention of having the suffix of VO for legacy reasons. And through this one line of code, plus the 30 we just saw, it pretty much covers over 90% of the localization work we had to do. And as we create new domain objects and new types, those types will be localized automatically without us having to do additional work.
So with that, we can eliminate the whole up front environment cost, because the whole thing took me less than an hour to implement. And because we're multicasting attributes to all DTO types, it means any time we add a new DTO type in the future, you will be localized automatically by default.
Again, you have more automations, so there's fewer chance for regression to kick in, because people are not changing things and having to constantly implement new things by hand. Still, you can have things that are regressed, but it's just, in my experience anyway, is far less likely. And since we implemented the localization this way, we actually didn't have any localization related regressions and bugs at all, which is pretty cool for us for not a lot of work. The combined effect of all of these changes is just far less pressure on your QA team to test the changes that you are making to the game; new quest line, new storylines, as well as UI changes and the server changes, and localization as well. So, they can better focus their time and effort on testing things that have actually changed and are likely to cause problems.
“Okay, well,” you may ask, “well, how do I exclude the DTO types from the localization process?” Fortunately, we have a built in mechanism for doing that, where you can just use the attribute property on particular types. In this case, I know the leaderboard player DTO only have the IDs which are called division profile ID and the name of the user, none of which should be localized. And therefore, we can simply exclude this guy from the whole localization process.
Then, you may ask, “well, but then where do you get these gettext files from,” which is a great question. Well, as I mentioned earlier, we actually store those gettext files as part of TNT so that when we call a publisher from there, you include the localization files as well. And to get those files into TNT, there's actually a page in the tool so that the game designer can go in there and make the changes once they're happy with all the contents. At that point, we say, “okay, now let's localize all the new quest lines that we've just created.”
And there's a button that we click, which will then take the existing localization file, because we don't want to localize the same text if you haven't changed. We actually use comments to basically put a unique identifier for each of your text, so that we can identify that when a particular by-log or name, or description or whatever, has changed, so that we will reset the entry in the gettext file. And then for the new gettext file, we then send you over to the translators, who, with their tools, will be able to pick out the new strings that they need to translate. They only charge us for the new strings that they have to translate and not everything else that we send them.
Once they send it back, the translated PO file, we upload it into TNT. When we do the next publish, you will have all the organization for the new content. When we release the new content, there is a bit of time where the English version is the head of the Brazilian Portuguese version. So if the player is up to the latest quest, then chances are they will end up playing a single game in English instead of translated the version of the game.
With that, that's everything I've got, and thank you very much for listening.
About the speaker, Yan Cui
Yan Cui is a Server Architect Developer at Yubl and a regular speaker at code camps and conferences around the world, including NDC, QCon, Code Mesh or Build Stuff. Day-to-day, Yan has worked primarily in a mixture of C# and F#, but has also built some components in Erlang. He's a passionate coder and takes great pride in writing clean, well structured code.