IIRC: User Scripting in Second Life

I’ve had some positive feedback from the last post I wrote, so I thought I’d write up some information about how User Scripting functioned in Second Life from the time when I worked on it. This post covers the background of scripting, the LSL language and a bit about how the legacy LSL2 VM works.

One of the unique features of Second Life is the way users can create content. Using the Editor in the Viewer (the name for the client), they can place and combine basic shapes (known as Prims) to make more complex objects. Prims can be nested inside another prim or joined together with static or physical joints. They can also be attached to the user’s Avatar (guess what the most popular “attachment” was?).

To give an object behaviour, Prims can have any number of scripts placed inside them. A script is effectively an independent program that can execute concurrently with all the other scripts. There is a large library of hundreds of functions they can call to interact with the world, other objects and avatars. A script can do a wide variety of things: for example, it can react to being clicked, give users items or even send an email.

Users write scripts in LSL, a custom language with first-class concepts for states and event handlers. A script can consist of variables, function definitions, and one or more named states. A state is a collection of event handlers; only one state can be active at any one time.

The Second Life viewer contained a compiler which converted scripts into program bytecode for its bespoke instruction set and layout.

default
{
  state_entry()
  {
    llSay(0, "Hello, Avatar!");
  }

  touch_start(integer total_number)
  {
    llSay(0, "Touched.");
  }
}

Above is the default script, created when you add a new script to an object. When created or reset, it prints “Hello, Avatar!” to chat in the region; when clicked, it prints “Touched.”.

LSL has a small set of data types: integer, float, string, key (a UUID), vector (a 3D floating-point vector), rotation (a quaternion) and list (a heterogeneous list).
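
To illustrate, here is a throwaway script declaring one local variable of each type (the names and values are mine, just for illustration):

default
{
  state_entry()
  {
    integer count = 42;
    float ratio = 0.5;
    string name = "Avatar";
    key id = NULL_KEY;
    vector position = <128.0, 128.0, 25.0>;
    rotation spin = ZERO_ROTATION;
    list mixed = [count, ratio, name, position];   // lists can hold any mix of types

    llSay(0, name + " is at " + (string)position);
  }
}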

The approach of treating each script as an independent program led to many regions containing thousands of individual scripts. To give the effect of concurrency (in the single-threaded Simulator), each script would get a timeslice of the main loop to execute a number of instructions before control passed to the next.

Individual scripts within the original LSL VM were quite limited: each was allocated only 16KB for program code, stack and heap. This fixed program size made managing memory limits simple; the heap and stack grew towards each other in the address space, and when they collided the script threw an out-of-memory exception.

With LSL2, when a Prim moved between regions, the executing scripts had to be migrated to the destination server. Migration of scripts between servers was relatively simple, because all of a script’s state was stored in a contiguous block of memory and the running form was identical to the serialised form. That block could simply be sent over the wire, and execution would continue the next time the script was scheduled; no explicit serialisation was required. Unfortunately, this simple VM design led to poor performance: the code was interpreted, there was no JIT compiler, and no native instructions were ever generated.

In an attempt to rate-limit particular actions that scripts could perform, some library calls would cause the script to sleep for a given time. For example, calling llSay (the function to print some chat text) would cause the script to yield control for 0.1 seconds. Users worked around this limitation pretty easily: they would create a Prim containing multiple scripts and farm the rate-limited work out to them using message passing. There was no limit to the number of scripts a user could have, so they effectively had unlimited calls to the restricted function.
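
As a rough sketch of that workaround (the WORKER_ID scheme and request format are invented for illustration): a prim would hold several copies of a worker script like the one below, each with a different id, and a controller script would cycle through the ids with llMessageLinked, so any forced sleep only ever stalled one worker at a time.

integer WORKER_ID = 1;   // edited to be unique in each copy

default
{
  link_message(integer sender, integer num, string text, key id)
  {
    if (num == WORKER_ID)   // request addressed to this worker?
    {
      llSay(0, text);       // the forced sleep only blocks this one script
    }
  }
}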

For later features, these sleeps were replaced with per-Sim rate limits. The rate limit was fixed for scripts attached to an avatar and proportional to the quantity of land owned within a region: own 10% of the land, get 10% of the budget. The same style of limit was applied to the number of Prims within a region. This meant that the rate limit was now, at least somewhat, tied to actual resource usage.

Users applied similar techniques to give their objects more storage. Again, by placing multiple scripts in a single object, they could distribute the values to be stored across those scripts. To retrieve a value, they would broadcast a message to all the scripts and the relevant one would respond.
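
A sketch of how such a “storage cell” script might look (the slot name and request codes are invented): each copy holds one value and only answers requests addressed to its slot.

string SLOT = "cell-1";   // edited to be unique in each copy
string gValue;

default
{
  link_message(integer sender, integer num, string text, key id)
  {
    if ((string)id != SLOT) return;   // request is for a different cell

    if (num == 1)         // invented code: store, text holds the value
    {
      gValue = text;
    }
    else if (num == 2)    // invented code: fetch
    {
      llMessageLinked(LINK_THIS, 3, gValue, id);   // reply with the stored value
    }
  }
}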

On the Sim, scheduling of scripts was effectively round-robin; however, there were some differences in the way events were scheduled. Users discovered and exploited these differences to craft scripts that got more CPU. For example, they would create a loop using the colour system: add a handler for the colour-changed event and then, within that handler, change the colour again. This short-circuited the scheduling and allowed the script to jump the queue.
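
A rough sketch of the kind of loop involved (the work inside the handler is a placeholder):

default
{
  state_entry()
  {
    llSetColor(<1.0, 0.0, 0.0>, ALL_SIDES);   // kick the loop off
  }

  changed(integer change)
  {
    if (change & CHANGED_COLOR)
    {
      // ... do the real work here ...
      // Changing the colour again queues another changed event,
      // jumping ahead of the normal round-robin schedule.
      llSetColor(<llFrand(1.0), llFrand(1.0), llFrand(1.0)>, ALL_SIDES);
    }
  }
}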

Like anything in SL, users created scripts within the Viewer. There was an editor window with basic autocomplete into which users could write or paste code. The code was then compiled on the client and uploaded like any other asset. The upload service was a simple, Apache-hosted CGI Perl script. It did no validation of the bytecode, and users would occasionally try to exploit this by uploading garbage (or deliberately malicious) bytecode.

Users wanted to be able to use a mainstream programming language: the professionals wanted to hire programmers with existing language experience, and the amateurs wanted a transferable skill. The language and runtime also often got in the way of adding new functionality; the limited set of data types made some methods very inefficient.

Second Life is fundamentally about user-created content, and User Scripting was one of the main tools for creating it. It suffered many serious flaws and limitations, including the problems I’ve described above. We wanted better performance, fewer limitations and new programming languages. To that end, we replaced the legacy VM with Mono and the compiler backend with one that could generate CIL instructions. However, this wasn’t a simple swap; we had to solve many complex problems. That implementation is going to be the topic of my next post.

IIRC: Persistence in Second Life

In a colleague’s recent presentation, he mentioned Second Life in the context of persistence in virtual worlds. As I used to work at Linden Lab, I thought I’d follow up with some more information and notes about how it actually worked. This stuff isn’t secret: they published it on their wiki, mentioned it in office hours, and I’ve seen it covered in other presentations too. I worked on the small team responsible for the Simulator.

The Simulator, or Sim, was responsible for all simulation of a 256m x 256m region and for all the connections of the players within it. The state of a Sim was periodically saved to disk and uploaded by another process to a SAN (often called the asset database). The state was also saved to disk in the event of a crash. Upon restarting, the Sim would attempt to use that crash save; if it could not, it loaded the last normal save from the SAN. This meant there could be up to a 15-minute window between a change to the state of a Sim and that change becoming persistent.

This gap could be exploited by players, who would take an item into their inventory and then deliberately crash the Sim using various exploits. When the Sim rolled back, the item was still in place in the Sim while the copy remained in their inventory, duplicating it.

To mitigate this, we managed to fix all of the reported causes of crashes. Discovering and fixing those bugs took years, and it was a constant battle to keep on top of them. We had great reporting tools and stats on call stacks, and there was an army of support people who could manually replace lost items. New features inevitably introduced new opportunities for exploits, and old exploits were still discovered from time to time. So although this mostly mitigated the problem, it did not solve it. Sadly, even if a Sim never crashed, you could not be sure that your transaction would be durable; Sims died for other reasons too. For example, they were killed if their performance degraded too much, and occasionally there were accidental cable pulls, network problems and power outages.

The Sim state file could get pretty large (at least 100MB); it contained the full serialised representation of the whole hierarchy of entities (known as Prims in SL jargon) within the region. This was unlike the inventory database, which just held URLs to the items within it. This was a legacy from a time when everything was a file.

The Second Life Sim was effectively single-threaded; it had a game loop with time slices for message handling, physics and scripts. IIRC, it wrote the state out by forking the process to do the write. If we had attempted to write the entire Sim state each time a modification was made, it could have been a problem for performance, with the potential for users to mount DoS attacks.

Users were not the only things that could modify Sim state; scripted items could spawn objects or even modify themselves. That’s why saving was done at a limited rate. Serialised Sim state was not the only form of persistence: Second Life had a rich data model, including user representations, land ownership, classified adverts and many other areas.

One of the largest databases was the residents’ (the name for players) inventory, which was stored in a sharded set of MySQL databases. The items contained in an inventory were serialised and stored in a file on the SAN or in S3, much like the Sim region data; the database held a URL to the resource that represented each item. Some residents had huge inventories with hundreds of thousands of items. The inventory DB was so large (and, combined with the operational decision to use commodity hardware) that it needed to be sharded. The sharding strategy was to bucket users based on a hash of their UUID.

Having a centralised store for inventory was essential; most users had inventories far too big to be migrated around the world. It also had operational advantages: the Dev/Ops team were well versed in maintaining and optimising MySQL. Unfortunately, by sharding like this, you lose the ability to do transactions across users, as they are no longer part of one database.

Back when Second Life was still growing, the main architectural aim was scalability; reliability was secondary. The database had historically been a point of failure, and significant effort was put into partitioning it so that it would not be again. The architectural strategy was to migrate all clients of the DB to loosely-coupled REST web services.

REST web services are a proven, scalable technology; they are what the Internet is built on. Provided the services are stateless, they scale well and can often exploit caching effectively. The web technologies used (specifically LAMPy) were well known to Dev/Ops; they made scaling a deployment issue.

A secondary goal of this initiative was to allow a federated virtual world: something that would let other companies and individuals host regions while their users continued to use their SL identity. We got part way through this before I left, but I don’t think it was ever completed, since the growth of SL stopped.

Second Life went the long, hard way round to achieve durable transactions. This was partly due to a more general issue: the Simulator was a hard-to-change monolith. That caused many other problems; it made architectural changes hard. Importantly, it wasn’t difficult to change because the individual classes and files were badly coded. The Sim simply had too many highly coupled parts; a change in one part could affect something seemingly unrelated. It suffered from the ball-of-mud anti-pattern; it wasn’t originally badly designed, but it grew organically and lacked structure.

I spent significant time introducing seams to produce sensible, workable sub-systems. Persistence is a hard thing to get right; Second Life’s approach had to change several times to support its scale. That said, the state of the art in distributed databases (and even in conventional databases, including MySQL) has progressed significantly since that time. Getting it right, and engineering in the transactional and persistence qualities you need from the start, will save you significant effort.

Gitflow-style releases with TeamCity versioning

On the project I’m currently working on, we’re using the Gitflow branching model, and we use TeamCity for CI. Using Gitflow with SemVer means that you specify the version number each time you release, giving it a specific meaning based on the changes within that release.

Previously, when using SemVer, I’ve just used the pre-release tag to identify builds from the build server, preferably with something that ties a version to a specific revision. This is fine, but there is some duplication: Gitflow calls for you to tag the release revision with a label, and TeamCity has a build number that you can specify. The two overlap, and I’d rather not have to type the number twice; I want to make the release process as simple as possible.

There is a meta-runner for TeamCity that uses GitVersion to set the version number. This might provide the functionality you need, but unfortunately there were two problems for me. The first is that our build agent accounts cannot use Chocolatey, and the meta-runner attempts to use it to install GitVersion. The second is to do with the limited checkout that TeamCity does; it doesn’t fetch all the tags. GitVersion attempts to use a full checkout to get the branches, and I’d rather not do that, as TeamCity has its own style of checkout that I don’t want to work against.

The first thing we need to do is get the version from the tag. As I’ve already mentioned with GitVersion, you can’t always get the tag with TeamCity’s limited checkout, so the approach I’ve taken uses Artifact Dependencies instead.

Gitflow assumes one releasable thing per repository, or at least only one version number. At the moment I’m using a major version of zero, so I’m not tracking API-breaking changes.

In TeamCity, create a build configuration with a VCS trigger for your master branch (the branch from which your releases are built). To do this, add the trigger with a branch filter of +:refs/heads/master

Then add a PowerShell script build step that executes the following:

# Get the most recent v-prefixed tag reachable from this commit (e.g. v1.2.3)
$TagVersion = git describe --tags --match "v*"
# Report it to TeamCity as the build number
Write "##teamcity[buildNumber '$TagVersion']"
# Strip the leading "v" to get the bare version and write it to a file
$Version = "$TagVersion".TrimStart("v")
Write "$Version" > library.version
# Bump the minor version and reset the patch to get the next snapshot version
$parts = $Version.Split(".")
$parts[1] = [int]$parts[1] + 1
$parts[2] = "0"
$SnapshotVersion = $parts -join "."
Write "$SnapshotVersion" > library-next-minor.version

This generates two files: one containing the current version from the tag and another containing the next minor version. Add these files (library.version and library-next-minor.version, renamed as appropriate) as artifacts. Other Build Configurations that produce versioned outputs can then consume these artifacts.

In your build configuration, use an Artifact Dependency on the latest successful build of the master branch. To make TeamCity update the build number, add the following build step:

# Which branch and commit are we building?
$branch = git rev-parse --abbrev-ref HEAD
$hash = git rev-parse --short HEAD

# Version files pulled in via the artifact dependency on the master build
$thisVersion = Get-Content corelibrary.version
$nextVersion = Get-Content corelibrary-next-minor.version

if ($branch -eq "master") {
    # Releases use the version exactly as tagged
    $version = "$thisVersion"
} elseif ($branch -eq "develop") {
    # Develop builds are snapshots of the next minor version
    $version = "$nextVersion-$hash-SNAPSHOT"
} else {
    # Other branches include the branch name as well
    $upper = $branch.Replace("/", "-").ToUpper()
    $version = "$nextVersion-$hash-$upper-SNAPSHOT"
}

# Tell TeamCity to use this as the build number
Write "##teamcity[buildNumber '$version']"

If you need to access this later from your build scripts, you can get it from the BUILD_NUMBER environment variable.

My Build System Manifesto

In my current role I’ve inherited responsibility for the build system for the products my company is working on. The role has a pretty broad remit; I’m involved in lots of areas, including how our modules are laid out, how code is generated, packaging, deployment and even how the build supports the software architecture.

For context, the system I have inherited is quite complex: it’s a mixed Scala and C# (for Unity3d) project, with heaps of code generation tangled into the build. There is custom dependency management built on top of Maven/Ivy, Unity3d has its own build system, and the build is used both for the components and for the games themselves.

The previous approach was to create a monolithic build system, with all of the components required to build games and libraries packaged up with the bottom layers of the software. Instead of a monolithic build, we should adopt an approach more like the Unix philosophy: provide tools that people can use to extend their own build, instead of attempting to replace their entire process.

I’d like to rewrite this, but I don’t expect it to be easy; it’s a moving target, with many developers still extending the system. While thinking about this problem, I wrote a list describing what I think makes a good build system. By sharing it, I wanted to get the other developers on board with the direction I’m taking, so that we don’t generate any more technical debt.

Hopefully this list is not controversial, so I haven’t attempted to justify each item; I’m hoping the value of each is obvious. They should serve as preferences or guidelines rather than rules; in some cases there may be technical reasons why they’re not achievable.

  • I want to be able to check out the source that I need from a single repository. I want this to build and test with a single command. I want to go from nothing to a working development environment in very few steps.
  • I don’t want to have to open several IDEs; using an IDE should be optional, with the command line as an alternative. If I do have to open another IDE, the build should be self-contained, and I shouldn’t have to open the same IDE twice either.
  • I only want to check out the code that is relevant to me; if it’s not in the products I work on, I don’t need it. I want the area I work on to be “small”.
  • I want to continuously integrate with my team. I want to be able to branch and develop with the same abilities as master/develop.
  • Working across components should be rare, but when I need to do it, I want to be able to with minimum fuss, all on my local machine; that is, with very few extra commands and no manual copying.
  • I want to be able to work on sub-projects as if they were top-level projects. They should be self-contained and independent; that is, I can build, test and publish each with single commands, and their dependencies should be built for me. It’s just a convenience that a sub-project is grouped together with others.
  • I want to be able to leverage my existing skills on each platform I develop for. For example, if I’m using C#, I want to use Visual Studio and MSBuild. I don’t want to have to learn skills that are not transferable, and I don’t want to learn a new platform unless it’s essential to what I’m doing.
  • I want to be able to continuously deploy/integrate. I don’t want to have to deliver to other teams. I want to be able to add value myself. That is, teams should work on products, not components.

I wrote the list in a format a bit like user stories, where the user (the “I”) is a developer.
Later, it struck me that the list reads like a manifesto; it is, after all, a declaration of my intentions.

I believe that the build is something critical to the success of a project. A build process done wrong can dramatically harm developer productivity.

This list captures some of the most important points of an effective build system. I haven’t attempted to capture everything; the points come from the context of where I am, and there are some “obvious” qualities, such as reliability and reproducibility, that I haven’t documented here because we already have them.

It does, however, capture some guidelines and qualities that I often see overlooked. I might go into detail on each of the points in future posts.

Hard to understand systems

Admitting something is hard to understand is often very difficult for software engineers, because no matter how complex a system is, if you put enough effort in you will eventually gain insight.

The best systems (not just software) are often easy to understand. If a system is easy to understand, you can reason about it more easily: you can track down bugs, add features and optimise more effectively. Easy-to-understand systems are often a collection of well-defined components that can be treated as black boxes; you do not have to understand the details to understand the system as a whole.

There are notable exceptions; for example, the Linux kernel and its driver space are notoriously difficult to understand. That complexity was introduced over time: although the kernel is monolithic, it started out with manageable levels of complexity and only later reached a critical mass, and now there are many enterprises and individuals invested in its success. That said, not attempting to simplify is very risky; how do we know how complicated is too complicated?

Whilst “hard to understand” can be subjective, it’s rare that easy-to-understand systems just emerge; you need to put effort into making them so. There are many tools we can use to reason about a system better, and abstraction underlies most of them: by hiding extraneous detail, we make systems easier to understand.

Layers constrain the dependencies between components. Using layers makes an architecture easier to understand because it limits connectivity, and each connection between components has a complexity cost. That said, not all architectures need to be layered: some systems are simple enough not to need them, and some require a structure that is inherently complex and cannot be fitted into layers (luckily, these are the exception). Layers themselves have a complexity cost and should be used sparingly.

To help our understanding and communication, we need to develop a common language for talking about the system (a.k.a. a ubiquitous language); well-defined terms will help us, and naming is important. For example, what is an Entity? It’s one of the building blocks of the system I work on, and the last time I asked this question I did not get very consistent replies from the team. It’s fine to use the term Entity for something more specific than its real-world meaning, but it should still be consistent with that meaning. For instance, I’ve seen many classes named Cache, in different codebases, that don’t fit the real-world definition.

The principles behind the Agile manifesto say that the best architectures come from self-organising teams. This means that the best architectures are not dictated to the team; the team has to do the architecture itself. In some self-organising teams a senior member will naturally fill the role of architect; in others, all team members will take collective responsibility for the architecture. I believe architecture is evolutionary and that you discover it along the way, in the process of building your system.