A new era for Cocoa with Love

I haven’t written a Cocoa with Love article in 4 and a half years. What’s been happening? Is there anything new in Cocoa?

“Everything”, you say? Excellent! I should have plenty of new code to share.

But not in this article.

Previously on Cocoa with Love…

Cocoa with Love started as a place for me to write down some fun hacks I used for doing mischievous things in Cocoa – mostly abusing the Objective-C runtime. It’s fun to do things generally considered undocumented or impossible, and Objective-C’s highly introspectable runtime lets a devious programmer do a lot by hijacking other people’s code and subverting it. Of course, these sorts of actions are little more than cute tricks with few real use cases (they’re too risky, too high maintenance and there’s usually a non-devious way to achieve the same effect). I branched out in a number of different directions and even tried writing some articles about designing better apps, but I primarily demonstrated a “business as usual” approach to app development and I’m not really happy with the overall result.

I’ve looked back through most of my old articles and many contain code that no longer compiles or code that’s completely superseded by newer APIs. Still more contain out-of-date information or worse: offer advice and opinions I no longer endorse.

I’m not planning to take the old articles down – people do still read them – but I’m not going to update or maintain them either. I’ve put a prominent “Reader beware” disclaimer on all of them to highlight that they should be viewed with appropriate caution. They’re doomed to become less relevant as Cocoa development continues to move towards Swift and away from Objective-C and that’s fine by me. I’d rather take the opportunity afforded by Swift and start fresh.

A new theme

Switching from Objective-C to Swift will be an obvious change as I return to Cocoa with Love but it’s not the only change. I’ve wanted to return to Cocoa with Love since before Swift’s announcement (it’s a long delayed 2014 New Year’s Resolution of mine) to try writing articles that loosely fit into a broad, overarching theme: writing maintainable Cocoa apps.

I realize that as a topic, “writing maintainable Cocoa apps” sounds like a good choice for boring people into a coma. Despite the banal connotations, maintenance is truly the hardest problem in app development and minimizing maintenance costs is a good way to make everything else in programming feel better.

The hardest problem in app development is not "how do I implement this" or "how can I make this run faster". These are problems that programmers will necessarily have to address but they're not persistent; we solve them and move on. Maintainability is a pervasive and persistent problem; there's no single solution and it doesn't simply go away.

The importance of maintenance comes from the fact that an app is never “done”. In most modern desktop and mobile spaces, update cycles for operating systems, devices – even device categories – are annual or faster. There is always a change that requires a corresponding update. At the same time, your app has its own update cycle because you want to advance its feature set. Sadly, the fundamental architecture of modern user-applications does not gracefully endure multiple refreshes. A typical codebase will grow progressively harder to maintain until, after 2 to 3 years, you’re barely maintaining existing features, let alone adding new ones. If you can afford to double your developer resources, you can completely rewrite or thoroughly refactor; otherwise, the only option is app abandonment.

This situation is horrible: we invest months or years into projects and throw that work away on a regular basis.

A significant portion of this problem is due to the same code maintenance issues that affect all programming. Any programming project – not just user-applications – can suffer loss of momentum over successive releases due to haphazard implementation, poor original designs, difficulties refactoring code and other forms of “code rot”.

But the manifestation of maintenance difficulties affecting user-apps is particularly acute. Even fair implementations of adequately designed apps can rapidly grow unreasonably difficult to add features or update for new technologies and OS versions. User-applications, as a programming domain, have a high concentration of traits that most programming tries to avoid for the exact reason that they are difficult to maintain. Most prominent of these traits are:

  • apps are inherently stateful
  • controllers naturally subsume other components increasing coupling and fragility
  • efforts to combat coupling lead to excessive layers of abstraction rather than cleaner code
  • apps include many timing dependent behaviors including callbacks, asynchrony, animation and user events
  • apps are heavily reliant on OS provided APIs, taking many things out of your direct control
  • apps are difficult to test automatically

The cumulative result is that app programming, while not a technically difficult field, tends to create fragile webs of code that are highly interdependent along a number of different axes leading to programs that are easy enough to design initially but surprisingly time-consuming and expensive to maintain and update.

We need to write code that is resilient against regular code maintenance issues and at the same time, address the problems that undermine app maintenance in particular. Unless we do, every program we write is just another soon-to-be-abandoned app.

How do we improve “maintainability”?

If you’re savvy enough to be reading this blog, then you’re doubtless aware that there are plenty of ideas around – technologies, frameworks, design patterns and programming techniques – that aim to limit the various negative traits that I listed as contributing to maintenance problems for user-applications.

  • Some are applications of basic programming theory
  • Some are lessons to avoid misusing development tools
  • Some are as simple as better coding practices
  • Some try to limit problems through evolved design patterns
  • Some try to detect problems sooner through better testing approaches
  • Some involve changing conventional thinking about errors, data, time or state
  • Some involve pure functional, dataflow, declarative or other programming paradigms
  • Some are frameworks that encapsulate a difficult trait throughout your program
  • Some are frameworks that entirely replace the native application frameworks with alternatives

It’s interesting to consider, though, that despite the number of techniques claiming to limit maintenance problems, none is universally considered essential for app development. Swift language notwithstanding, mainstream Cocoa app development is not dramatically different to NeXTSTEP development 20 years ago. Which leaves open the question: how many of these techniques – if any – actually succeed at their goals?

Conclusion

My aim with the “Swift era” of Cocoa with Love will be to look at different approaches to make app programming suck less. If we program with fewer runtime errors and our new features are easier to add, we’ll have more fun – even if “maintainability” sounds like a dull word.


Partial functions in Swift, Part 1: Avoidance

For my first proper article since returning to Cocoa with Love, I want to talk about “partial functions” (functions with preconditions).

It’s an unusual topic for an app programming blog since, outside of API design or Design by Contract, preconditions are not widely discussed. This isn’t because our functions are precondition-free. Instead, it’s because we tend to test applications in such a narrow way that we never consider the entire range of values that can be passed to our functions. Our functions may have lots of implicit preconditions (including dependencies on broader program state) that we’ve never considered or that we never document (so are easy to violate during subsequent changes).

Ultimately, consideration of preconditions and how to avoid partial functions is the consideration of whether our programs work reliably across the whole range of possible scenarios.

Background: type requirements versus runtime expectations

Every function has two categories of requirements:

  1. Type requirements: a function must accept arguments and return a result as specified by its type signature. The compiler enforces the type requirements, ensuring both caller and function meet the requirements.
  2. Runtime expectations: a description of what the function will achieve by its conclusion. Ensuring runtime expectations are met is the role of the function’s programmer (and testing).

What happens when these two categories of requirements conflict?

Let’s consider a function that converts an Int to a Bool (those are the type requirements). Rather than use the C rule of 0 is false and anything else is true, this function will be strict: 0 is false and only 1 may be converted to true (that’s the runtime expectation).

func toBool(x: Int) -> Bool {
    if x == 0 {
        return false
    } else if x == 1 {
        return true
    }
}

This function satisfies the runtime expectations: 0 becomes false and 1 becomes true but the compiler will highlight the closing } character with the error message:

Missing return in a function expected to return ‘Bool’

The compiler knows that we haven’t handled every possible value of x and the function can skip over the two if conditions and reach the end without returning a value (a violation of type requirements).

We could change the function to handle every possible value of x:

func toBool(x: Int) -> Bool {
    if x == 0 {
        return false
    }
    return true
}

but now we’re converting values like -1 to true in violation of the runtime expectations.

This is a conflict between type requirements and runtime expectations.

Background: preconditions

The conflict occurs because the runtime expectations imply an additional requirement that is not part of the type requirements. We call this additional requirement a precondition. In our simple toBool example, the precondition is that the value of x must be 0 or 1.

This article mostly discusses preconditions on parameters (since it is easier in simple examples). However many preconditions depend on broader program state. For example, if you need to initialize a module before invoking its methods, that's a precondition. If you need to start a server before making requests, that's a precondition. If you're only allowed to set a value once on an object, that's a precondition.
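
As a quick sketch of a state-based precondition (the type and method names here are hypothetical, not taken from any real API), consider a connection object that must be started before it can send:

class Connection {
    private var started = false

    func start() {
        started = true
    }

    func send(message: String) {
        // The implicit "call start() first" requirement, made explicit:
        precondition(started, "send(_:) called before start()")
        // ... transmit the message ...
    }
}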

That’s simple enough to say but there’s a problem: the precondition isn’t known to the compiler so it is possible to accidentally violate the precondition at runtime. What should a function do if the precondition is not met?

The only safe option is to trigger a fatal error (abort the program).

This might not sound “safe” but it’s the only approach that prevents something potentially worse. If a function fails to meet runtime expectations and actually returns, this means anything dependent on the function is now in an indeterminate state. Once the program is in an indeterminate state, any branch could go the wrong way, any action could be wrong. Maybe toBool was trying to answer the question “Do you want to delete everything on your hard disk?” Maybe toBool was trying to determine if the program should exit a loop but now it’s stuck in the loop, allocating more memory until the whole computer grinds to a halt.

We also want to abort the program because it draws attention to the exact location where the programming error occurred – rather than forcing us to look at subsequent symptoms and try to trace the symptoms back to where the program went awry. A fatal error simplifies debugging and ensures that when an error occurs, it’s likely to be caught and reported.

Background: preconditions in Swift

So we need to enforce the precondition by triggering a fatal error if it is not satisfied. Swift has a function named precondition which, unsurprisingly, does exactly that: tests a condition and triggers a fatal error if the condition is false. Our function then becomes:

func toBool(x: Int) -> Bool {
    precondition(x == 0 || x == 1, "This function can only convert 0 or 1 to Bool")
    if x == 0 {
        return false
    }
    /* x == 1 */
    return true
}

Terminology note: I’ll refer to the precondition function for most of this article (since its role is unambiguous) but numerous other functions may be similarly used to directly (or indirectly, through a child function) trigger a fatal error, including assert, assertionFailure, precondition, preconditionFailure, fatalError or other standard library functions that use “trap” intrinsics like Builtin.int_trap or Builtin.condfail.

Now, all of this might seem a little obstinate. I’ve deliberately chosen type requirements and runtime expectations that conflict and I’ve refused to change either, forcing the use of precondition. You might think no one would ever design a function this way or that you’d never use a function like this.

The reality is that nearly every Swift program uses this type of function indirectly through Swift standard library functions that contain similar precondition checks. Most common in Swift are the Array subscript operator (which has a precondition that the index be in-bounds), the force-unwrap operator ! on the Swift Optional type (which has a precondition that self be non-nil), any use of the ImplicitlyUnwrappedOptional type (which similarly has a precondition that self be non-nil) and the default integer operators and conversions (which trigger fatal errors on overflow).
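
To make that concrete, here are sketches (my own, with illustrative values) of the kinds of innocuous-looking expressions that trap at runtime:

let array = [1, 2, 3]
let element = array[3]       // out-of-bounds subscript: fatal error at runtime

let optional: Int? = nil
let value = optional!        // force-unwrapping nil: fatal error at runtime

func addOne(x: Int) -> Int {
    return x + 1             // traps if x == Int.max (arithmetic overflow)
}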

There’s another kind of precondition you may have in your code: functions that may misbehave, non-fatally, when certain implicit, unchecked requirements are not met. This is an extremely difficult point to keep under control but you need to consider whether any of your functions have implicit, unchecked requirements and add precondition checks to document the requirements and ensure that you don’t accidentally violate them in the future.

Partial functions

A precondition is used to enforce the requirement that a function may be invoked only for a partial subset of the total set of values that would be valid according to the function’s type signature. That leads us to the mathematical term “partial function”.

The next paragraph is going to be math jargon. It’s important to use the correct terminology. It’s not so bad; hold your breath if you’d like.

In mathematics, a partial function is a function constructed to map values from a domain (set of possible input values) to a codomain (set of possible output values) where the function is undefined (no appropriate mapping exists) for one or more values in the input domain. The subset of inputs values where the partial function is actually defined is called the domain of definition. Functions that are defined for all possible inputs are called total functions.

A simple example of a partial function in mathematics is division. A mathematical function that divides 5 by any real number:

f: ℝ → ℝ  where  f: x ↦ 5/x

is undefined for x = 0 because there is no sensible way to divide by zero in typical mathematics.

If implemented in Swift, due to the “undefined” case, we use precondition to enforce the requirement that the function may be invoked only within the “domain of definition”:

func divideFiveBy(x: Real) -> Real {
    precondition(x != 0)
    return 5 / x
}

Hidden partial functions

Now there isn’t a Real type in the Swift standard library. We do have Double but Swift’s Double doesn’t actually work this way (see “Change the behavior” below). However, Swift does work this way if we swap Int for Real:

func divideFiveBy(x: Int) -> Int {
    return 5 / x
}

Where did the precondition go? It’s still there. We don’t need to write precondition because it’s part of the / operator. The infix / operator for Int uses “checked” division in Swift (implemented in the standard library as _overflowChecked) so it will trigger a fatal error if it is invoked with 0 as its second argument. This occurs because the Int type in Swift, as with ℝ (reals) in the math example above, has no sensible way to handle division by zero.

The following is another example of a partial function because it may trigger a fatal error based on the value of parameter someArrayIndex:

func someArrayFunction(someArrayIndex: Int) -> Element {
    return myArray[someArrayIndex]
}

And so is this, since it may trigger a fatal error based on the state of self:

struct someStructWithAnOptionalMember {
    var optionalSomeType: SomeType?
    func accessor() -> SomeType {
        return optionalSomeType!
    }
}

The problem with partial functions

How did I know that the infix / operator for Int uses “checked” division in Swift and will cause a fatal error if it is invoked with 0 as its second argument?

The only way to know is to check the documentation. The Swift Programming Language describes the requirements of division as:

arithmetic operators in Swift do not overflow by default. Overflow behavior is trapped and reported as an error.

It’s up to the reader to either know (or experiment and find out) that this means that the division operator will write a failure message to standard out and abort the program if you ever pass 0 as the second argument to integer division.

This makes partial functions terrifyingly dependent on documentation and testing (two areas worryingly prone to lapses):

  1. A partial function’s requirements must be clearly documented
  2. Users of the function must read and understand the documentation
  3. Tests must exercise a wide range to confirm usage remains correctly inside required bounds in all cases

The biggest risk with partial functions is misbehavior in a deployed build. A corollary is that they don't cause as many problems for testing code. In tests we want to fail early and often. Use of Array subscript operators, Optional force-unwrap ! and other convenient-but-severe partial functions in testing code is okay.

Let’s assume points 1 and 2 are satisfied (or at least noticed during debugging). We still need to satisfy point 3.

Unfortunately, debugging and testing are unlikely to check all scenarios in a non-trivial program. Debugging and testing excel at validating specific scenarios but unless your tests are extremely thorough, it’s likely your users will be able to get one of your functions into a state you never tested. If your program uses partial functions, this leaves you exposed to potential runtime failure.

Avoid partial functions and use total functions instead

Partial functions should be avoided because:

  • they have requirements that the compiler cannot verify
  • they can pass your testing but still cause fatal errors after deployment if different data is encountered

Let me be clear: it is not the checking of preconditions that should be avoided. Absolutely, if your function has preconditions, you should check them immediately or face rendering your program “indeterminate”.

The problem is the existence of preconditions.

A function is partial if it has preconditions. We want to design our functions as “total functions” that have no preconditions. This means that we need a sensible result for every possible input value.

Let’s revisit the mathematical function that divides 5 by any real number. Previously we defined it as a partial function that was undefined for x = 0. Let’s write it as a total function:

f: X → ℝ  where  X = { x ∈ ℝ : x ≠ 0 },  f: x ↦ 5/x

To explain this in a more programmer-friendly way: we’ve changed the type signature of the function. Instead of accepting any “Real” as an input, I’ve defined a new type X that can be any value in the Reals except zero. Now, the function is defined for every possible value in X and the function is a total function.

A rough equivalent in Swift would be:

struct NonZeroInt {
    let value: Int
    init?(fromInt: Int) {
        guard fromInt != 0 else { return nil }
        value = fromInt
    }
}

func divideFiveBy(x: NonZeroInt) -> Int {
    return 5 / x.value
}

The runtime requirement in divideFiveBy is gone and instead we have a new type, NonZeroInt that satisfies the requirement at compile-time.

You might be able to see why I mentioned, above, that it’s important to think about precondition as subtracting values from the total set of valid values according to the type signature. We can avoid a precondition by defining a new type where those precondition-excluded-values are avoided by design in the new type.

Failable construction, non-failable usage

It's uncommon to see the term "partial function" used in imperative languages like Swift but it's a common term in functional languages like Haskell. Unsurprisingly, Haskell has pages on avoiding partial functions, too.

In the previous example, we created a new type, NonZeroInt, but the constructor for this new type can fail (return a nil instead of a value). In some sense, we’ve simply taken the burden of ensuring correctness from the call location of divideFiveBy and put it somewhere else. However, this change has helped for two reasons:

  1. the compiler will ensure that we check the Optional result of NonZeroInt(fromInt:)
  2. we’re validating the value at its construction, not when it is used

The first point stops the function being a partial function but the second point is just as interesting.

Ideally, we shouldn’t construct NonZeroInt from an Int immediately before passing it into divideFiveBy; instead, we should never have the Int at all: the NonZeroInt should be constructed at the source. Maybe the source is a settings file, maybe the source is user-input, maybe the source is a network connection; in any case, as soon as the value comes into existence, we immediately know if it’s valid or invalid. In the invalid case, we can report the input as the source of the problem. This is a huge improvement over carrying an invalid 0 value Int for an unknown time until it is finally passed to the function divideFiveBy which has no idea about the origin of its parameters.
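
As a sketch of what validating at the source might look like (readUserInput and reportError are hypothetical placeholders, not real APIs):

// Hypothetical input source; the point is that validation happens immediately,
// at the moment the value enters the program.
func readDivisorFromUser() -> NonZeroInt? {
    let rawInput = readUserInput()                 // hypothetical: returns an Int
    guard let divisor = NonZeroInt(fromInt: rawInput) else {
        reportError("The divisor may not be zero") // hypothetical error reporting
        return nil
    }
    return divisor
}

// Everything downstream of construction is a total function.
if let divisor = readDivisorFromUser() {
    let result = divideFiveBy(divisor)
}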

Think about the path of data through your program as a pipeline: if your data won’t fit through the whole pipeline, reject it at the start rather than letting it cause problems in the middle. Ideally, construction should be the only scenario that can fail and every use case should be a “total function”.

Other approaches for avoiding partial functions

Avoiding a partial function involves making the type requirements and the runtime expectations agree.

Defining a new, more specific type that encapsulates the runtime expectations’ complete requirements for the data is the conceptually best approach to addressing the problem. As I explained, it pushes any checks on data back to the construction point which is the best place to handle error conditions.

However, there are plenty of cases where it’s not the most practical option:

  • it may be algorithmically difficult to determine the constraints for the data ahead-of-time
  • you might not have access at construction-time to state information required to check validity
  • maybe you don’t have control over the design of earlier stages in the data pipeline
  • maybe you construct data in a lot of places but only use it in one place so it’s simpler to change the usage location instead of the construction location

Fortunately, there are plenty of other options.

Change the return type

The simplest solution to making any partial function into a total function is to change the type signature to include room in the return type to communicate a failure condition. Instead of needing to trigger a fatal error, we can communicate the condition back to the caller and the caller can choose how to handle the result.

Swift’s Optional is ideal for this as we can show with our toBool function:

func toBool(x: Int) -> Bool? {
    switch x {
    case 0: return false
    case 1: return true
    default: return nil
    }
}

An example of this in the Swift standard library is the subscript operator on Dictionary. Unlike the subscript on Array, the Dictionary version returns an Element?. This means that you are allowed to look up a key that doesn’t exist.

I personally like to use the following extension on CollectionType to allow this type of Optional returning access for Array or other CollectionTypes:

extension CollectionType {
    /// Returns the element at the specified index iff it is within bounds, otherwise nil.
    public func at(index: Index) -> Generator.Element? {
        return indices.contains(index) ? self[index] : nil
    }
}

The name at in this case is borrowed from a function in C++ that accesses the value if it exists or throws an exception if it does not. A Swift throws function would be closer to the C++ implementation but returning an Optional is more in line with the pattern established by Dictionary and is syntactically tighter for this use-case.
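
Usage then stays total at the call site; something like:

let values = [10, 20, 30]

if let value = values.at(5) {
    print(value)
} else {
    print("index out of bounds – handled without a fatal error")
}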

However, using Swift’s error handling mechanism is also a valid way of making a function total, if you’d prefer it:

enum ArithmeticError: ErrorType {
    case DivideByZero
}

func divideFiveBy(x: Int) throws -> Int {
    switch x {
    case 0: throw ArithmeticError.DivideByZero
    default: return 5 / x
    }
}

In Objective-C, exceptions were (usually) used to indicate unrecoverable situations (i.e. partial functions). However, Swift errors are meant to be caught; in fact they must be caught. Therefore throwing an error in Swift is really just offering a different return type – semantically similar to returning an Optional despite the syntactic differences.
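
Calling the throwing version might then look like this:

do {
    let result = try divideFiveBy(0)
    print(result)
} catch ArithmeticError.DivideByZero {
    print("zero denominator handled without a fatal error")
} catch {
    // no other errors are thrown by divideFiveBy
}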

Change the behavior

Depending on context, it might make sense to change the runtime expectations so that every input is mapped to a valid output.

As I mentioned in the beginning, if we had used the C language’s definition of a Bool (anything that isn’t zero is true) then our toBool function would never have needed a precondition.

We can also change the behavior of the divideFiveBy function to do something not-entirely-accurate but which may be valid, depending on expected usage:

func divideFiveBy(x: Int) -> Int {
    switch x {
    case 0: return Int.max
    default: return 5 / x
    }
}

This mirrors the Swift standard library’s division operator for Double:

func divideFiveBy(x: Double) -> Double {
    return 5 / x
}

Unlike the version from Int to Int, this function is a total function, not a partial function.

The / operator for Double will return Double.infinity (IEEE 754 “positive infinity”) if invoked with x == 0 which isn’t really true in a mathematical sense but is sufficient that you can work out what happened. Of course, the problem with this type of behavior change is that it might obscure the “basically an error” status of the result (for example: you should be handling the zero denominator rather than attempting to scale a drawing by “+infinity”).
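
If you take this route, callers may still want to detect the sentinel value explicitly, e.g. (a small sketch of my own):

let scale = divideFiveBy(0)   // the Double version above
if scale.isInfinite {
    // handle the zero denominator here rather than scaling a drawing by "+infinity"
}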

Keep dependent components together

A common reason for partial functions is that you’re using two pieces of data that need to agree with each other (like an Array and a subscript index) but you create and store them separately so they are not naturally kept in agreement – they might even be created separately and could be out of agreement from their construction.

We can avoid preconditions on separate data being in sync by holding the required data in a single data structure that ensures the requirement.

The following alternative approach to indexes into an Array ensures the index remains valid at all times by keeping a reference to the Array and preventing transformations to the index that would make it invalid.

enum AlwaysValidArrayIndexError: ErrorType {
    case NoAcceptableIndex
}

struct AlwaysValidArrayIndex<T> {
    // Store the index
    var index: Int
    
    // Together with the array
    let array: Array<T>
    
    // Construction gives the first index in the array (or throws if the array is empty)
    init(firstIndexInArray a: Array<T>) throws {
        guard !a.isEmpty else { throw AlwaysValidArrayIndexError.NoAcceptableIndex }
        array = a
        index = array.startIndex
    }
    
    // Only allow the index to be advanced if there's somewhere to advance
    mutating func advance() throws {
        guard index + 1 < array.count else { throw AlwaysValidArrayIndexError.NoAcceptableIndex }
        index += 1
    }
    
    // We can dereference using the index alone since the array is held internally
    func elementAtIndex() -> T {
        return array[index]
    }
}
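
Usage might look something like this:

do {
    var cursor = try AlwaysValidArrayIndex(firstIndexInArray: ["a", "b", "c"])
    print(cursor.elementAtIndex())   // "a"
    try cursor.advance()
    print(cursor.elementAtIndex())   // "b"
} catch {
    // the array was empty or we tried to advance past the last element
}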

This might seem like a strange thing to do but it’s similar to how Swift String indexes advance safely. With a String.CharacterView.Index, you’re required to construct the index from a String’s startIndex and the index stores the string’s internal _StringCore storage with it so that it can iterate correctly over the Unicode grapheme clusters, keeping the index valid.

Minor aside/complaint about how Swift String indexes work

Sadly, despite storing the _StringCore internally and therefore being able to ensure validity at all times, String indexes let you hit fatal errors by advancing past the end (rather than a more graceful nil) and even worse: don’t themselves access characters but instead need to be passed back into the subscript on a String. This second problem lets the String.CharacterView.Index and String be out-of-sync again (since you can use an index from one String on a different String), leading to potential fatal errors (for out-of-range accesses) or invalid Unicode as produced by this example where an index from “Unrelated string” is used to access an invalid offset in an Emoji string:

print("👿👿"["Unrelated string".startIndex.advancedBy(1)])

// Output will not be the 'n' in "Unrelated" or the second "Imp" Emoji.
// Instead we get the Unicode invalid character marker '�'.

I hope these problems are addressed in future changes to the Swift standard library (even a basic precondition failure when using indexes with the wrong string would be preferable).

Change the design

A final way to avoid partial functions is to avoid design patterns where they are common. This means: use the library functions and features that are total functions. If we restrict our programming to total functions then our functions are more likely to be total functions too.

Easy examples include using for x in, map and filter to perform most of the needed work on an Array without using the subscript. Similarly as an alternative to Optional’s force unwrap, you can always use if let, switch and flatMap and avoid any potential fatal errors.
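
For example, the following stays entirely within total functions – no subscripts and no force unwraps (a small sketch of my own):

let numbers = [3, 1, 4, 1, 5]

// Instead of subscripting inside a manually managed index loop:
for number in numbers {
    print(number)
}
let doubled = numbers.map { $0 * 2 }
let odds = numbers.filter { $0 % 2 == 1 }

// Instead of force unwrapping an Optional:
if let firstLarge = numbers.filter({ $0 > 3 }).first {
    print(firstLarge)
}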

What are the reasons for writing partial functions?

I’ve used a lot of words to say “partial functions are bad”. I’ve also shown multiple ways to avoid them.

Why do partial functions exist at all? There’s a few reasons. I don’t agree with them all.

Aesthetics

The biggest reason for partial functions is aesthetics: the interface designer didn’t really want to define a new type, return an Optional or declare a function as throws.

To illustrate this claim, there’s a number of partial functions in the Swift standard library that are designed to look like traditional operators from C while transparently adding safety checks where memory unsafe behavior could have occurred in C. This includes Array subscripts, ImplicitlyUnwrappedOptional and overflowable arithmetic; these are designed to look like their C equivalents while applying runtime checks internally. There’s a historical/social expectation: people expect an array index to return a non-Optional. People expect that they can forcibly unwrap Optional if they want. People don’t want the syntactic overhead of dealing with overflows for most arithmetic.

The choice to use precondition rather than return an Optional (or another alternative) is risky and crash prone but that’s how humans work sometimes.

Internal functions with simple conditions

Preconditions involving multiple values being in-sync or methods on an object being invoked in a given order take additional work to avoid. For our internal functions – where we are the only people who need to learn and obey any preconditions – the amount of work to avoid the precondition might not be worth the effort, particularly if the precondition is simple and obvious and we’re sure we won’t accidentally violate it.

Just make certain to use precondition to explicitly check, rather than run the risk of accidentally violating the precondition later.
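
For example, a hypothetical internal helper where the precondition is simple enough that avoiding it isn’t worth the effort, but checking it explicitly still is:

// Hypothetical internal function: the ordering requirement is obvious but still checked.
func clamp(value: Int, lowerBound: Int, upperBound: Int) -> Int {
    precondition(lowerBound <= upperBound, "lowerBound must not exceed upperBound")
    return min(max(value, lowerBound), upperBound)
}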

Method overrides

If an overrideable method is required by the superclass to do something (e.g. invoke super), we often need to rely on precondition or other similar tests to ensure the requirement occurs.

This is really a limitation of how object-oriented programming composes interfaces: subclasses are fully in control and the superclass only receives control when the subclass yields it. If the superclass wants to place requirements on the subclass, it can only do so by checking the requirement after the fact (a “postcondition”, but technically still implemented using precondition).

Effectively unreachable code paths

The actual conditions required to reach some code paths are so convoluted that they’re basically unreachable. This sometimes occurs when checking the results of functions: we feel obliged to check all error results but we might not be able to design a test case to actually reach the failure path. Rather than write a recovery attempter that we can’t test, we may place a preconditionFailure or a fatalError in the path to confirm our belief that the branch is unreachable.

Examples where this is appropriate include handling memory allocation failure return paths from certain C functions. On a modern system, a memory allocation failure is usually impossible (the OS will kill the process before malloc fails) so writing code to test and handle this situation is a poor use of our time.
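
A sketch of what that might look like in practice (my own example, wrapping a C allocation whose failure path we don’t expect to ever take):

import Darwin

func makeScratchBuffer(byteCount: Int) -> UnsafeMutablePointer<Void> {
    let buffer = malloc(byteCount)
    guard buffer != nil else {
        // Effectively unreachable on a modern OS; document the belief with a fatal
        // check rather than writing an untestable recovery path.
        preconditionFailure("malloc(\(byteCount)) failed")
    }
    return buffer
}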

Forced correctness

In some cases interface designers want haphazard users of their functions to see failures. There’s an argument that programming defensively against careless users of your function encourages poor programming and prevents users understanding what they’re doing wrong; instead, we should force bad programmers to confront their mistakes.

I think this argument is more valid in languages like C where a returned int error condition is frequently ignored by the user, so a fatal error is more attention grabbing. In Swift, I think this approach is inappropriate. Users can’t ignore Optional or throws in Swift and will learn their mistakes from a returned “invalid argument” ErrorType just as well as they’d learn from a precondition failure – in fact, better, since users might not even be aware that a precondition failure is possible, whereas the syntactic overhead of throws is unavoidable, so a never-before-seen failure is still likely to be handled correctly at runtime.

Truly haphazard programmers are likely to handle Optional and throws results by using force-unwrap or try! in Swift so they’re going to see fatal errors anyway.

Logic tests

The assert function is commonly used to test “soft” postconditions (where a false result is not a critical failure) and other program logic.

If you’re unaware, assert works like precondition in Debug builds (compiled with ‘-Onone’) but does nothing in Release builds (compiled with ‘-O’). This split behavior complicates the practical implications but ultimately assert is still used to fatally test conditions in Debug builds so its usage in a function is still equivalent to a partial function.

Ultimately, assert treads a weird line between a precondition that isn’t properly tested in Release builds (leaving you open to indeterminate behavior) and a logic test that should be in your testing code and not in your regular code.

I personally think assert is a good idea only when a precondition is too computationally onerous to check in Release builds. In all other cases, you should be using precondition (because you really do want the condition to be true) or you should move the testing into your test code (because it’s not truly a precondition and you’re just validating behavior in a specific case).
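
For example, a check that’s too expensive to leave in Release builds (my own sketch):

// The sortedness check is O(n log n) on every call, so it only runs in Debug builds.
func searchSorted(haystack: [Int], needle: Int) -> Int? {
    assert(haystack.sort() == haystack, "haystack must already be sorted")
    return haystack.indexOf(needle)   // stand-in for a real binary search
}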

Conclusion

A function with one or more preconditions is a partial function (valid for only a subset of the values implied by the type signature). Each precondition represents a potential programmer error you can make when using the partial function. Unlike type requirements (where programmer errors get caught at compile time), precondition programmer errors manifest as fatal errors at runtime.

Fatal errors are obviously bad and you can avoid them by avoiding partial functions.

Do not avoid a partial function by failing to check preconditions. Boldface for a reason: that’s worse than a crash. Failing to check preconditions results in indeterminate behavior that can let misbehavior propagate, potentially leading to “worst case” scenarios. It also impedes debugging and allows misbehaviors to persist rather than being quickly caught. If your function has requirements, check them!

Instead, we can avoid partial functions by fixing our design to eliminate preconditions.

Preconditions are required only because there are values permitted by the type requirements that cannot meet the runtime expectations. If you fix the type requirements (choose input types where every value can meet the runtime expectations) or change the runtime expectations (to handle every value in the type requirements) there’s no need for preconditions.

This can be as simple as returning an Optional instead of a simple value. Or defining an input type that validates requirements on construction (returning nil if the requirements can’t be met). Given Swift’s syntactically efficient conditional unwrapping operators and error handling capabilities, the cost of handling lots of these types of conditionals is quite low so partial functions should be a very rare thing.

Despite all this, partial functions do exist. There are some narrow cases where they are necessary and some other situations where they’re commonly used. And due to their use in the Swift standard library, almost all Swift programs use partial functions in some form so you need to be aware of them.

For these reasons, you might even decide to create partial functions of your own. In the next post, I’ll look at testing partial functions by catching precondition failures so you can ensure any partial functions you create will trigger fatal errors correctly as expected.

Partial functions in Swift, Part 2: Catching precondition failures

In the previous post, I discussed “partial functions” and advised against them. As stated in that article though, there are situations where partial functions are necessary or expected. If you’re going to write a partial function, you need to test it and that means testing a precondition failure occurs when the requirement is violated.

One problem: precondition failures crash the xctest harness making testing annoying. In this article, I’m going to show a Mach exception handler that catches these crashes and rewrites the application’s state as though an Objective-C exception was raised instead, making precondition failures testable.

Background

On ARM, a fatal error is implemented as a brk instruction which triggers an EXC_BREAKPOINT instead. We'll only test iOS code in the simulator, which is x86-64, so we won't worry about EXC_BREAKPOINT.

A precondition failure is implemented in the Swift standard library as a Builtin.int_trap() which is ultimately compiled as a ud2 instruction on i386/x86-64 platforms. This instruction exists for the sole purpose of triggering an “invalid opcode” which will be caught by the operating system, leading to a Mach EXC_BAD_INSTRUCTION exception at runtime.

Using only Swift language and standard library features, there’s no way to recover from a precondition failure.

Traditional approaches to this type of situation involve running the code in a child process and monitoring crashes from the parent or using build configurations to swap the fatal error with something more catchable.

Mach exception handlers provide a different approach. With a Mach exception handler the operating system gives us a chance to respond to the Mach exception and we can use that chance to rewrite our application’s history as though the ud2 instruction never happened.

Tests first

What we need is a function named catchBadInstruction that satisfies the following test:

func testCatchBadInstruction() {
#if arch(x86_64)
    // Test catching an assertion failure
    var reachedPoint1 = false
    var reachedPoint2 = false
    let exception1: BadInstructionException? = catchBadInstruction {
        // Must invoke this block
        reachedPoint1 = true
        
        // Fatal error raised
        precondition(false, "EXC_BAD_INSTRUCTION raised here")
        
        // Exception must be thrown so that this point is never reached
        reachedPoint2 = true
    }
    // We must get a valid BadInstructionException
    XCTAssert(exception1 != nil)
    XCTAssert(reachedPoint1)
    XCTAssert(!reachedPoint2)
    
    // Test without catching an assertion failure
    var reachedPoint3 = false
    let exception2: BadInstructionException? = catchBadInstruction {
        // Must invoke this block
        reachedPoint3 = true
    }
    // We must not get a BadInstructionException without an assertion
    XCTAssert(reachedPoint3)
    XCTAssert(exception2 == nil)
#endif
}

The catchBadInstruction function runs the closure passed to it. If any Mach EXC_BAD_INSTRUCTION exceptions occur, this function catches the Mach exception, creates an instance of BadInstructionException (a subclass of NSException), raises that exception at the point where the EXC_BAD_INSTRUCTION occurred and then catches the BadInstructionException outside the child closure. The BadInstructionException raised, if any, is returned.

This catchBadInstruction function will catch any of the Swift fatal error aborts, including assert, assertionFailure, precondition, preconditionFailure, fatalError. By catching these failures, we can test that functions required to raise assertions for particular combinations of inputs are being applied correctly.

This code will also catch other unrelated sources of Mach EXC_BAD_INSTRUCTION exceptions but they’re extremely rare unless your binary is corrupted (not a serious possibility in the testing scenarios to which this code should be limited).

A list of serious caveats

I have tagged this article with the tag ‘hacks’. I intend this tag to communicate that the code in this post does some clever things but the result is well outside the bounds of what safe, maintainable programs should do. Let me be clear: there’s no good reason to run this code in your deployment builds. This code is intended for testing, exclusively.

One of the simplest things this code does is also the least usable in a deployed program: throwing and catching an Objective-C exception over your Swift code. Even in Objective-C, exceptions are usually unsafe unless you’re extremely careful. The situation is worse in Swift since we can no longer ask the compiler to generate exception-safe automatic reference counting. A few memory leaks will almost certainly occur and other code may misbehave due to interrupted side effects or partial construction. It’s up to you to minimize these problems if they surround precondition failures that you need to test. Under test conditions, this is usually a manageable problem.

Installing multiple Mach exception handlers may create complications. I’ve not tested multiple, nested or otherwise conflicting Mach exception handlers and I’m not convinced the exception handler will play well with other handlers installed on the same thread. This is all outside the “testing, exclusively” use-case so just don’t do it.

There’s also something you might have noticed in the test code: it will only run on arch(x86_64). The code will run in the iOS/watchOS/tvOS simulators and natively on the Mac but it will not run on iOS/watchOS/tvOS devices. This is because the API for catching Mach exceptions is not public in these SDKs. Landon Fuller mentioned in 2013 that he had filed a radar with Apple requesting the required interfaces on iOS but nothing has come of it. I can only assume this isn’t going to change.

If you’re using open source Swift on Linux, Mach exceptions and the Objective-C runtime aren’t available. The code in this post will not work. You could probably do something similar with a POSIX SIGILL signal handler and setjmp/longjmp.

Trying to write a Mach exception handler

Writing this code was considerably harder for me than it ideally should have been. For whatever reason, Apple don’t document Mach exception handling. They don’t conceal its existence and the “mach_exc.defs” file is public API on OS X but there’s nothing in the Xcode documentation reference, man pages or on Apple’s website beyond the definitions file itself.

Making matters worse, the examples and documentation you can easily find on third-party web sites are usually for the 32-bit version of Mach exceptions which uses slightly different functions and requires slightly different parameters. When you do find examples of 64-bit Mach exception handling (like that in plcrashreporter, lldb or gdb) it’s normally catching exceptions for the whole program rather than a specific thread or for catching from another process.

I eventually got through it with a basic trial-and-error approach but it was really slow going due to the ease of writing code that appeared to succeed but did nothing useful because the mach_port_t was configured using flags only valid in 32-bit.

Processing Mach exception messages uses technology that seems completely out-of-place in the modern world. Maybe that shouldn't be surprising, given a few of the files in this particular time-capsule are listed as "Author: Avadis Tevanian, Jr., Date: 1985".

There’s no simple C interface for processing Mach messages. Instead, you get a “MiG” (Mach Interface Generator) file and you’re expected to generate a C interface from that. Interface generators are normally used when it’s possible to generate interfaces for multiple languages. Okay, so can I generate a Swift interface? No, you can only generate a C interface. So why do I need to generate the interface at all? Why isn’t the implementation in a library with a basic C interface provided? I don’t know.

Then the generated interface expects to call into C functions in your code with specific type signatures. Here we run into a Swift limitation: Swift (as of version 2.1) can’t expose a function matching a C type signature. You can pass around @convention(c) pointers to your Swift functions but you can’t publicly expose headers to those same functions. The autogenerated “[ProductName]-Swift.h” file for letting Objective-C call into Swift only exposes your public Objective-C classes (free functions are not exposed). The end result is that it’s just easiest to call into Swift via Objective-C.

I wanted to write as much of the Mach message handling in Swift as possible but I’ve had to implement the actual callback interface in an Objective-C file and call the Swift handler function from there. There’s also some Objective-C code used to catch the exceptions.

The Mach exception handler: rewriting history

There’s a lot of different parts to the code but the core of it happens inside the Mach exception handler. The exception handler gives us the “state” (registers and stack) for the thread where the exception occurs. The thread is suspended at this point, so we’re allowed to play around with the state. This is what we do:

// Read the old thread state
var state = UnsafePointer<x86_thread_state64_t>(old_state).memory

// 1. Decrement the stack pointer
state.__rsp -= __uint64_t(sizeof(Int))

// 2. Save the old Instruction Pointer to the stack.
UnsafeMutablePointer<__uint64_t>(bitPattern: UInt(state.__rsp)).memory = state.__rip

// 3. Set the Instruction Pointer to the new function's address
var f: @convention(c) () -> () = raiseBadInstructionException
withUnsafePointer(&f) { state.__rip = UnsafePointer<__uint64_t>($0).memory }

// Write the new thread state
UnsafeMutablePointer<x86_thread_state64_t>(new_state).memory = state
new_stateCnt.memory = x86_THREAD_STATE64_COUNT

The three numbered steps are the equivalent of an assembly language call instruction. We’ve changed the state of the thread to look like the last code run was not the ud2 instruction that raised the EXC_BAD_INSTRUCTION but was instead a call to our raiseBadInstructionException function. Therefore, when the thread resumes it will run:

private func raiseBadInstructionException() {
    BadInstructionException().raise()
}

which is a straightforward throw of an NSException subclass.

Setting up a Mach exception handler

The other code I wanted to highlight was the setup of the Mach exception handler. There are two reasons for this:

  1. Documentation and useful examples for the required functions were really difficult to find, so I’d like to publish it here for visibility.
  2. This was some of the first Swift 2 code I ever wrote and I went crazy with Swift’s defer, try, guard, throw and catch; I’m not sure if the result is brilliant or ridiculous but at least there’s no possibility of a goto fail error here.

I’ve commented each step in the code so you should just be able to read the comments to see what the code does. Pay close attention to the order that the steps are numbered, remember: defer statements are executed in the reverse order to their setup.

Here goes:

func catchBadInstruction(block: () -> ()) -> BadInstructionException? {
    var context = exceptionContext()
    var result: BadInstructionException? = nil
    do {
        var handlerThread: pthread_t = nil
        defer {
            // 8. Wait for the thread to terminate *if* we actually made it to the creation point
            // The mach port should be destroyed *before* calling pthread_join to avoid a deadlock.
            if handlerThread != nil {
                pthread_join(handlerThread, nil)
            }
        }
        
        try kernCheck {
            // 1. Create the mach port
            mach_port_allocate(mach_task_self_, MACH_PORT_RIGHT_RECEIVE, &context.currentExceptionPort)
        }
        defer {
            // 7. Cleanup the mach port
            mach_port_destroy(mach_task_self_, context.currentExceptionPort)
        }
        
        try kernCheck {
            // 2. Configure the mach port
            mach_port_insert_right(mach_task_self_, context.currentExceptionPort, context.currentExceptionPort, MACH_MSG_TYPE_MAKE_SEND)
        }
        
        try kernCheck {
            // 3. Apply the mach port as the handler for this thread
            thread_swap_exception_ports(mach_thread_self(), EXC_MASK_BAD_INSTRUCTION, context.currentExceptionPort, Int32(bitPattern: UInt32(EXCEPTION_STATE) | MACH_EXCEPTION_CODES), x86_THREAD_STATE64, &context.masks.value.0, &context.count, &context.ports.value.0, &context.behaviors.value.0, &context.flavors.value.0)
        }
        defer {
            // 6. Unapply the mach port
            thread_swap_exception_ports(mach_thread_self(), EXC_MASK_BAD_INSTRUCTION, 0, EXCEPTION_DEFAULT, THREAD_STATE_NONE, &context.masks.value.0, &context.count, &context.ports.value.0, &context.behaviors.value.0, &context.flavors.value.0)
        }
        
        try withUnsafeMutablePointer(&context) { c throws in
            // 4. Create the thread
            guard pthread_create(&handlerThread, nil, machMessageHandler, c) == 0 else {
                throw PthreadError.Any
            }
            // 5. Run the block
            result = BadInstructionException.catchException(block)
        }
    } catch {
        // Should never be reached but this is testing code, don't try to recover, just abort
        fatalError("Mach port error: \(error)")
    }
    return result
}

The kernCheck function is just a little helper to grab the result code from a Mach function and if it’s an error, convert to a Swift ErrorType and throw. It’s equivalent to the sort of macro that might be used for this type of error code checking in C.
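
As a rough sketch of its shape (my own reconstruction; the actual helper in CwlPreconditionTesting may differ):

// Sketch only: convert a non-KERN_SUCCESS result code into a thrown ErrorType.
enum MachCallError: ErrorType {
    case Failed(kern_return_t)
}

func kernCheck(@noescape f: () -> kern_return_t) throws {
    let result = f()
    guard result == KERN_SUCCESS else {
        throw MachCallError.Failed(result)
    }
}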

The catchBadInstruction function sets everything up but it’s the machMessageHandler function (spawned by the pthread_create call at step 4) that sits around and waits to see if a Mach message will be received. It looks like this:

private func machMessageHandler(arg: UnsafeMutablePointer<Void>) -> UnsafeMutablePointer<Void> {
    let context = UnsafeMutablePointer<ExceptionContext>(arg).memory
    var request = request_mach_exception_raise_t()
    var reply = reply_mach_exception_raise_state_t()
    do {
        // Request the next mach message from the port
        request.Head.msgh_local_port = context.currentExceptionPort
        request.Head.msgh_size = UInt32(sizeofValue(request))
        try kernCheck { withUnsafeMutablePointer(&request) {
            mach_msg(UnsafeMutablePointer<mach_msg_header_t>($0), MACH_RCV_MSG | MACH_RCV_INTERRUPT, 0, request.Head.msgh_size, context.currentExceptionPort, 0, UInt32(MACH_PORT_NULL))
        } }
        
        // Prepare the reply structure
        reply.Head.msgh_bits = MACH_MSGH_BITS(MACH_MSGH_BITS_REMOTE(request.Head.msgh_bits), 0)
        reply.Head.msgh_local_port = UInt32(MACH_PORT_NULL)
        reply.Head.msgh_remote_port = request.Head.msgh_remote_port
        reply.Head.msgh_size = UInt32(sizeofValue(reply))
        reply.NDR = NDR_record
        
        // Use the MiG generated server to invoke our handler for the request and fill in
        // the rest of the reply structure
        guard withUnsafeMutablePointers(&request, &reply, {
            mach_exc_server(UnsafeMutablePointer<mach_msg_header_t>($0), UnsafeMutablePointer<mach_msg_header_t>($1))
        }) != 0 else { throw MachExcServer.Any }
        
        // Send the reply
        try kernCheck { withUnsafeMutablePointer(&reply) {
            mach_msg(UnsafeMutablePointer<mach_msg_header_t>($0), MACH_SEND_MSG, reply.Head.msgh_size, 0, UInt32(MACH_PORT_NULL), 0, UInt32(MACH_PORT_NULL))
        } }
    } catch let error as NSError where (error.domain == NSMachErrorDomain && (error.code == Int(MACH_RCV_PORT_CHANGED) || error.code == Int(MACH_RCV_INVALID_NAME))) {
        // Port was already closed before we started or closed while we were listening.
        // This means the block completed without raising an EXC_BAD_INSTRUCTION. Not a problem.
    } catch {
        // Should never be reached but this is testing code, don't try to recover, just abort
        fatalError("Mach message error: \(error)")
    }
    return nil
}

Usage

The project containing this code is available on github: mattgallagher/CwlPreconditionTesting.

The Readme.md file contains some additional usage instructions but the short version is:

  1. git clone https://github.com/mattgallagher/CwlPreconditionTesting.git
  2. drag the “CwlPreconditionTesting.xcodeproj” file into your project’s file tree in Xcode
  3. go to your testing target’s Build Phase settings and under “Target Dependencies” press the “+” button and select the relevant “CwlPreconditionTesting” target (“_iOS” or “_OSX”, depending on your testing target’s SDK)
  4. write import CwlPreconditionTesting at the top of any test file where you want to use catchBadInstruction (Swift should handle the linkage automatically when you do this)
  5. use the catchBadInstruction function as shown in the CwlCatchBadInstructionTests.swift tests file

Conclusion

CwlPreconditionTesting.catchBadInstruction can catch Swift precondition failures so we can accurately test partial functions. I think it’s likely that the Mach exception handler shown in this article contains the highest percentage of Swift used in any Mach exception handler ever written (although I doubt there’s much Swift competition in this area).

This post completes my “Return to Cocoa with Love and Be Completely Self-Contradictory” trilogy.

I’m being flippant, of course, since this omits the context: the respective problem domains. In these three articles, the respective problem domains are:

  1. deployed apps
  2. programming interface design
  3. programming interface testing

I hope that it’s apparent that key statements in each of these articles have very different relative importance within different problem domains in programming. I also hope it’s apparent that economics and other considerations will always get in the way of good app implementations and good API designs. Sometimes we have to cover our asses with good testing.

Use it or lose it: why safe C is sometimes unsafe Swift

Swift is a memory safe language so that means no undefined memory bugs, right? Not quite. Swift often needs to interact with C and C remains memory unsafe. To further complicate matters, C and Swift have different models of how memory works, leading to some subtle situations that would have been totally safe in C becoming unsafe in Swift.

In this article, I’ll look at a class of memory safety bug that occurred multiple times while I was writing the previous article. This particular bug occurs only in Release builds and can occur even when your code has no occurrence of the word “unsafe” anywhere in it.

An example

The following example shows an Objective-C test that creates a C string from an Objective-C string, copies the C string using strcpy and converts back to Objective-C, testing that the result is the same as the original.

- (void)testProblem {
    NSString *source = @"Hi, all!";
    const char *cs = [source cStringUsingEncoding:NSUTF8StringEncoding];
    char string1[10] = {0};
    char string2[10] = {cs[0], cs[1], cs[2], cs[3], cs[4], cs[5], cs[6], cs[7], 0, 0};
    strcpy(&string1[0], &string2[0]);
    NSString *destination = [NSString stringWithUTF8String:string1];
    XCTAssert([destination isEqualToString:source]);
}

There are some minor quirks in how I’ve written this but it is totally valid C/Objective-C and the XCTAssert check succeeds. Let’s recreate it as accurately as possible in Swift:

func testProblem() {
    typealias TenCChars = (CChar, CChar, CChar, CChar, CChar, CChar, CChar, CChar, CChar, CChar)
    let source = "Hi, all!"
    let cs = source.cStringUsingEncoding(NSUTF8StringEncoding)!
    var string1 = TenCChars(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
    var string2 = TenCChars(cs[0], cs[1], cs[2], cs[3], cs[4], cs[5], cs[6], cs[7], 0, 0)
    strcpy(&string1.0, &string2.0)
    let destination = String.fromCString(&string1.0)!
    XCTAssert(destination == source)
}

Swift is doing exactly what the Objective-C code does. The word “unsafe” does not appear anywhere in the code. The code compiles and works correctly in Debug. In Release, there is a memory safety problem and the XCTAssert fails. The problem is particularly subtle: change the order of string1 and string2 and you’ll still have memory unsafety but the test will pass.

Can you see the problem?

C’s memory model

The source of the problem is a difference in memory model between C and Swift.

In C’s memory model:

  1. every variable has an address
  2. every field in a structure can be reached by offsetting a pointer from another field in the structure

Despite being part of C’s memory model, these points are not necessarily true – particularly after optimization. Variables in registers don’t have addresses. Variables, fields and whole structures may be omitted if they are unused.

But the optimizer in C is required to maintain the illusion of the memory model. Maintaining this illusion requires something very specific:

If you take the address of any field of a struct then the compiler must ensure the entire struct – and any parent struct that contains it – is allocated in an addressable location and fields may not be reordered or omitted.

So when we write:

strcpy(&string1[0], &string2[0]);

in C, we are forcing the compiler to ensure that the entire string1 and string2 arrays are addressable and complete. They cannot be registers, they cannot be truncated, they cannot be eliminated.

Maintaining the C memory model has a cost – the optimizer becomes limited in what it can do. Potentially unnecessary stack allocations have a performance overhead relative to register allocations or elimination.

Swift’s memory model

In Swift’s memory model:

  1. variables don’t necessarily have addresses except inside a scope that creates an address
  2. a pointer to one field of a tuple or struct offers zero knowledge about siblings or parents

You can create an UnsafeMutablePointer to a variable in Swift but Swift may need to copy that variable to the stack specifically for the UnsafeMutablePointer creation and may immediately remove it from the stack afterwards.
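As an aside, the safe pattern for any ad hoc pointer use is to keep every dereference inside the scope that created the pointer. A trivial sketch (not code from the project in this article):

var value = 42

// The pointer is only guaranteed to be valid for the duration of the closure;
// storing it and dereferencing it after the closure returns is not safe.
withUnsafeMutablePointer(&value) { pointer in
    pointer.memory += 1
}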

Specifically for this article:

you cannot create a pointer to a field within a tuple or struct and then use that pointer to access sibling or parent fields in the same tuple or struct.

That is the simple answer to what’s gone wrong in the Swift version: we tried to use a pointer to the first element of a tuple to read and write to the whole tuple. Creating a pointer to the first element of a larger structure and using that as a proxy for the whole structure is common in C and C++ but it’s simply not allowed in Swift.

A destructuring optimization pass

The optimization pass that causes the problem in this article is the SILSROA pass in the Swift compiler, applied during the runSILOptimizationPasses function in Passes.cpp. This SROA (scalar replacement of aggregates) optimization destructures structs and tuples that are only used for their component fields (never used in their entirety). This lets the separate fields of the struct be moved around and optimized separately – as though they were separately declared variables.

For the example in this article, after destructuring the tuple the Swift compiler realizes that – according to the rules of the Swift memory model – only the zeroth field of the string1 and string2 tuples is ever read, so the initialization of the remaining fields is marked as a “dead store” and the dead fields 1-9 are omitted from the function entirely (never allocated on the stack).

The result is that strcpy, when it advances the pointer past the first element, overwrites whatever is on the stack past string1. I’ve organized the variables so that the source of the copy (string2) is immediately overwritten – causing an obvious and testable problem – but depending on how your variables on the stack are organized, you might not overwrite anything important, making this a very difficult bug to spot.

The fix

The solution isn’t particularly difficult: if we need the entire structure then we must use the entire structure. To do this, replace the line:

strcpy(&string1.0, &string2.0)

with

withUnsafeMutablePointers(&string1, &string2) {
    strcpy(UnsafeMutablePointer<CChar>($0), UnsafePointer<CChar>($1))
}

Swift can infer the <CChar> generic parameter; I’ve just included it for clarity.

This code makes its pointers from the entire string1 and string2 tuples (not just the zeroth elements). This guarantees the entire tuple remains intact. Once inside the withUnsafeMutablePointers scope, we can recast the pointer to UnsafeMutablePointer<CChar> to satisfy the strcpy function signature.

How does this relate to the previous article?

Let’s look at the real-world examples from the previous article where this bug caused problems. The following code snippets from CwlCatchBadInstruction.swift all show examples of needing to create pointers carefully to avoid this problem.

guard withUnsafeMutablePointers(&request, &reply, { mach_exc_server(UnsafeMutablePointer($0), UnsafeMutablePointer($1)) }) != 0 else {
    throw MachExcServer.Any
}

This was the example that first drew my attention to the problem. In this example, request and reply are large structures for receiving and storing Mach exception messages but the mach_exc_server function only takes a pointer to their first field. In C, you’d simply pass &request.Head, &reply.Head but in Swift, we need to correctly scope the pointer to the entire structure or every field past Head will be invalid; the reply won’t be created correctly and the Mach exception handler stops doing its job.

try kernCheck {
    withUnsafeMutablePointers(&context.masks, &context.ports, &context.behaviors) { (m, p, b) in
        withUnsafeMutablePointer(&context.flavors) {
            // 3. Apply the mach port as the handler for this thread
            thread_swap_exception_ports(
                mach_thread_self(),
                EXC_MASK_BAD_INSTRUCTION,
                context.currentExceptionPort,
                Int32(bitPattern: UInt32(EXCEPTION_STATE) | MACH_EXCEPTION_CODES),
                x86_THREAD_STATE64,
                UnsafeMutablePointer<exception_mask_t>(m),
                &context.count,
                UnsafeMutablePointer<mach_port_t>(p),
                UnsafeMutablePointer<exception_behavior_t>(b),
                UnsafeMutablePointer<thread_state_flavor_t>($0)
            )
        }
    }
}

In this great, ugly behemoth, 4 generic structs wrapping 14-element tuples (context.masks, context.ports, context.behaviors and context.flavors) need to be passed to the function. In the original code that I posted for the previous article, I passed these tuples to the function by creating pointers to the zeroth elements (e.g. &context.masks.value.0). The code worked without problem (since the unspecialized nature of the generic structs prevented the SROA optimization) but the code was likely to fail if I ever turned on whole-module optimization (which could specialize the generic structs). Oops.

Since there’s no withUnsafeMutablePointers function that creates 4 pointers, we need to create the pointers in two nested layers. And Swift’s type inference refuses to infer the types of the UnsafeMutablePointer generic parameters so they’ve all been written in full. Yuck.

Conclusion

This article is really just an aside, commenting on an interesting quirk from the previous article. The lesson that we need to properly scope all pointer access in Swift is a straightforward one but it’s important to understand that it can be very subtle; practically invisible. Any unsafe pointer – including an implicitly created one – can result in unsafe memory usage if the user of the pointer tries to access memory outside the scope of the pointer’s creation.

Tracking tasks with stack traces in Swift


Debugging tasks with complex or asynchronous control flow can be difficult since it renders useless one of the most important debugging tools we have: the debugger stack trace. To overcome this, we can have our tasks capture their own stack traces in debug builds to aid debugging by reconstructing the path that long-running task objects take through the program.

This article will also share my own custom code for creating and symbolicating stack traces that’s a little faster and more flexible than the existing code for doing this in Swift.

Introduction

One of the best ways to analyze a program for the source of an error is a debugger stack trace.

Imagine we create a task object in function (1), then we pass the task object into another function (2), which then passes the task object into a third function (3) which represents the end of the task. In this example, the path the object takes through the program and the call stack are the same thing:

a linear, three node graph

If, by the end of function (3), we don’t have the expected result, we can set a breakpoint at the end of function (3) and the stack trace in the debugger will show functions (2) and (1) ahead of function (3) on the stack. We’ll know the path our task took through functions between creation and completion. If the task is processed purely as a series of nested calls, then the debugger stack trace at the deepest point will capture the entire history of the task and reconstructing the control flow is as simple as reading this stack trace.

Unfortunately, many tasks are not so simple. Objects are not just passed into functions but also returned from functions, eliminating their source frames from the call stack in the process. Objects are also passed between threads, eliminating all prior steps in their lifetime from the call stack.

Imagine our simple three step path for the object instead looked like this:

a non-linear, three node graph

In this diagram, function (2) calls function (1) where the task object is created. Function (1) then returns the task object to function (2) which then asynchronously passes the object to function (3).

Function (1) is lost from the stack trace as soon as it returns to function (2) then function (2) is also lost from the stack trace when function (3) is invoked asynchronously. When we find a problem at the end of function (3), we’ve lost the history of control flow for the object through our program.

How can we maintain enough information about our computation to perform useful retrospective analysis at step (3) under these circumstances?

We could use a log file

Traditionally, we address this type of control flow tracking problem with a log file. We log the construction of the task object, we log when we hand the object from one thread to another and we log its conclusion. By reading through the log file, we can see what events occurred and when.

The biggest problem with a log file though is controlling the amount of information it contains. A log file is usually program-wide and records events from all tasks in a single location. If multiple tasks are occurring simultaneously, they will be interleaved in the log file. If you have many tasks in your program, the log can become unreasonably difficult to read.

Another problem with log files is that they’re only as helpful as the information you choose to write – they don’t implicitly gather information, and writing a genuinely useful log message takes a non-zero amount of time and effort.

Finally, logging only captures information from the current scope. This makes logging information far less helpful in reusable code where you’re more likely to be interested in the call site than the reusable callee.

A journal of stack traces instead

Instead of using log files to track asynchronous tasks, I prefer to use a structure that I call a “task journal”. It’s not a complicated data structure. Here’s an example:

#if DEBUG
var taskJournal: [[UInt]] = [callStackReturnAddresses()]
#endif

It’s an array of stack traces, where a stack trace is simply an array of return addresses (UInt) that we can use to reconstruct functions and offsets within those functions at a later time.

Relative to a log file for tracking control flow, this task journal has the following advantages:

  1. it stays out of the way until you need it (it’s not blasted into a common location)
  2. it’s unique for the task (remains readable, even when multiple tasks are occurring at once)
  3. it captures information about callers, making it useful in reusable classes

The initial stack trace (as captured by callStackReturnAddresses) is added on initialization of the taskJournal and when your task reaches an interesting sequence point (conditional branch, interface boundary or other handover point), you append a stack trace.
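For example, an asynchronous handover point might record itself like this (a sketch only – Task, taskJournal and workQueue are illustrative names rather than code from a real project):

func enqueue(task: Task) {
    #if DEBUG
    // Record the call stack at the point where the task changes hands.
    task.taskJournal.append(callStackReturnAddresses())
    #endif
    dispatch_async(workQueue) {
        task.run()
    }
}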

Log files have many other purposes outside of tracking control flow; I'm not proposing stack traces as a replacement for log files. Log files remain the best solution for tracking progress across an aggregation of multiple tasks. Log files can also be persistent and user-readable, allowing them to be useful outside of simple debug analysis.

Coming from log files, it’s natural to think you might need to include a message with the stack trace but the message is usually “I am here”, something that is already implicit in the stack trace.

This doesn’t mean you shouldn’t store additional metadata with tasks – information like “this task was created with parameters X, Y and Z” or “task received error message E” is very helpful – but it should usually be stored in a custom format for your task context. The key purpose of the task journal is to have metadata that doesn’t require any custom formatting, structures or messages – you just add sequence points when you want.

Capturing the stack trace

Now, I’ve written callStackReturnAddresses in the examples above to capture the current stack trace. There isn’t a free function with that name in Swift or Cocoa.

There is a class function with that name on NSThread, so we could write:

var taskJournal: [[UInt]] = [NSThread.callStackReturnAddresses()]

Unfortunately, neither NSThread, nor any other class in Foundation, exposes a way to easily convert addresses to symbols at a later time when needed.

There is an alternative function on NSThread:

let trace = NSThread.callStackSymbols()

which immediately converts the return addresses to symbols but that’s really too time consuming to attach to our main task machinery (we only want to convert to symbols after an error occurs, not on our critical path where we may need to capture hundreds or thousands of stack traces per second).

For performance reasons then, we need to stick to basic return addresses and we’ll need to convert them to symbols ourselves with the C function dladdr. At a minimum: NSThread.callStackReturnAddresses isn’t going to be a complete solution.

But I have another issue with the NSThread implementation: you can’t skip over the top couple of stack frames and you can’t limit the number of captured frames. With NSThread.callStackReturnAddresses, you must always capture the whole stack and then crop the result down to the information you actually need – which forces reallocations and wastes time.

We’d have a bit more flexibility if we could use the C function backtrace but in Swift, we can’t. The backtrace function comes from the “execinfo.h” header and is implemented as part of libSystem on OS X and iOS (which all Swift programs link against) but for whatever reason, the “execinfo.h” contents are not exposed to Swift programs.

Rolling my own

While the functionality of NSThread.callStackReturnAddresses is probably enough for most people, I was dissatisfied enough that I bothered to re-implement it in Swift with the features I wanted:

/// Traverses the frames on current stack and gathers the return addresses for traversed stack frames as an array of UInt.
/// - parameter skip: number of stack frames to skip over before copying return addresses to the result array.
/// - parameter maximumAddresses: limit on the number of return addresses to return (default is `Int.max`)
/// - returns: The array of return addresses on the current stack within the skip/maximumAddresses bounds.
@inline(never)
public func callStackReturnAddresses(skip skip: UInt = 0, maximumAddresses: Int = Int.max) -> [UInt]

To support the function, I also wrote:

/// When applied to the output of callStackReturnAddresses, produces identical output to the execinfo function "backtrace_symbols" or NSThread.callStackSymbols
/// - parameter addresses: an array of memory addresses, generally as produced by `callStackReturnAddresses`
/// - returns: an array of formatted, symbolicated stack frame descriptions.
public func symbolsForCallStackAddresses(addresses: [UInt]) -> [String]

which can take the [UInt] and produce call stack symbols in an identical format to NSThread.callStackSymbols() but in an on-demand fashion.
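That means the journal stays as cheap arrays of UInt for the task’s whole life and is only symbolicated in the failure path, where the cost doesn’t matter. A sketch of this usage:

var taskJournal: [[UInt]] = [callStackReturnAddresses()]

// ...much later, only when something has gone wrong...
for trace in taskJournal {
    print(symbolsForCallStackAddresses(trace).joinWithSeparator("\n"))
}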

A simple example using these functions

I have a class that I sometimes use called DeferredWork. This object wraps a closure that is deferred to run at a later date. The class has a strict usage requirement: the work must be run at some point before the DeferredWork object is deleted but it is not automatically run; it must be done manually.

A simplified version of the class looks like this (NOTE: this code assumes DEBUG is defined in debug builds):

class DeferredWork {
    let work: () -> Void

    #if DEBUG
    // The task journal captures the stack trace on construction
    var taskJournal: [[UInt]] = [callStackReturnAddresses()]
    var didRun = false
    #endif

    init(work: () -> Void) {
        self.work = work
    }

    func run() {
        work()
        #if DEBUG
        didRun = true
        #endif
    }

    func recordHandover() {
        #if DEBUG
        // We can append additional stack traces whenever we wish.
        // For DeferredWork, this is intended to be done on ownership changes.
        taskJournal.append(callStackReturnAddresses())
        #endif
    }

    #if DEBUG
    deinit {
        if !didRun {
            // A "didn't run" condition at this point constitutes a failure.
            // We symbolicate and format the stack traces and trigger a fatal error.
            let traces = taskJournal.map { symbolsForCallStackAddresses($0).joinWithSeparator("\n") }
            preconditionFailure("Failed to perform work deferred at location:\n" + traces.joinWithSeparator("\n\nWith handover at:\n"))
        }
    }
    #endif
}

To explain how this works: when the DeferredWork class is created, it takes ownership of the work closure and agrees to run this closure at a later time. The location of the construction is saved as the first element in the taskJournal.

If the DeferredWork is handed over to another owner, the recordHandover function can be manually invoked, recording the change in ownership.

Ultimately, if there is a problem – for this class a “problem” is a failure to invoke run before deinit – we know where the work closure came from, we know the chain of custody that brought it to the current location and we can rapidly track down the precise location where the failure to run occurred.

Let’s look at how this solves the problematic three-step control flow I showed above:

a non-linear, three node graph

DeferredWork.init is function (1). The recordHandover function should be called immediately before the asynchronous call at the end of function (2). The deinit function is invoked at the end of function (3) when the DeferredWork object falls out of scope. If the run function is not called at any time during function (2) or function (3) then we can immediately see the path the object took through our program and decide where we’ve made our mistake.

Implementation of callStackReturnAddresses

As with all code, I’d prefer to implement everything in Swift but Swift lacks an equivalent to the gcc/clang builtin function __builtin_frame_address which is critical for locating the current stack frame. I’ve written a C function named frame_address that returns __builtin_frame_address(1) and exposed this function to Swift in the bridging header.

Using this function, the internals of callStackReturnAddresses are very simple. I store the result from frame_address in a struct named StackFrame (which is nothing more than a few functions wrapped around a uintptr_t that stores the result from frame_address):

let frame = StackFrame(address: frame_address())

Traversing through a stack is as simple as dereferencing the frame pointer (since the value at the frame pointer is the address of the previous frame):

let nextFrameAddress = UnsafeMutablePointer<uintptr_t>(bitPattern: frame.address).memory

Getting return addresses from each frame is also easy since it is stored immediately after the frame pointer in each frame:

let returnAddress = UnsafeMutablePointer<uintptr_t>(bitPattern: frame.address).advancedBy(1).memory

So we just do this repeatedly through the stack until we exceed the stack bounds.
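Putting those pieces together, the core of the function looks roughly like the following sketch (it assumes frame_address() returns the current frame pointer as a uintptr_t; the real CwlUtils implementation also checks the traversal against the thread’s stack bounds and other details):

public func callStackReturnAddresses(skip skip: UInt = 0, maximumAddresses: Int = Int.max) -> [UInt] {
    var addresses = [UInt]()
    var frameAddress = frame_address()
    var remainingToSkip = skip
    while frameAddress != 0 && addresses.count < maximumAddresses {
        // The return address is stored immediately after the frame pointer.
        let returnAddress = UnsafeMutablePointer<uintptr_t>(bitPattern: frameAddress).advancedBy(1).memory
        if returnAddress == 0 { break }
        if remainingToSkip > 0 {
            remainingToSkip -= 1
        } else {
            addresses.append(UInt(returnAddress))
        }
        // The value at the frame pointer is the address of the previous frame.
        frameAddress = UnsafeMutablePointer<uintptr_t>(bitPattern: frameAddress).memory
    }
    return addresses
}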

The resulting callStackReturnAddresses function is about twice as fast as NSThread.callStackReturnAddresses on the basic tests I’ve tried (roughly 2 million invocations per second per core on my Mac versus 1 million invocations per second for NSThread.callStackReturnAddresses). It’s easily fast enough to gather lots of data in Debug builds – even for fairly intensive computational paths.

Usage

The project containing this code is available on github: mattgallagher/CwlUtils.

  1. In a subdirectory of your project’s directory, run git clone https://github.com/mattgallagher/CwlUtils.git
  2. Drag the “CwlUtils.xcodeproj” file into your own project’s file tree in Xcode
  3. Click on your project in the file tree to access project settings and click on the target to which you want to add CwlUtils.
  4. Click on the “Build Phases” tab and under “Target Dependencies” click “+” and add the CwlUtils_OSX or CwlUtils_iOS target as appropriate for your target’s platform.
  5. Still on “Build Phases”, if you don’t already have a “Copy Files” build phase with a “Destination: Frameworks”, add one. In this build phase, add “CwlUtils.framework”. NOTE: there will be two frameworks in the list with the same name (one is OS X and the other is iOS). The “CwlUtils.framework” will appear above the corresponding CwlUtils OS X or iOS testing target.

Note about step (1): it is not required to create the checkout inside your project’s directory but if you check the code out in a shared location and then open it in multiple parent projects simultaneously, Xcode will complain – it’s usually easier to create a new copy inside each of your projects.

Packaging note

This and the next few code posts I write are going to be part of a single framework, CwlUtils. To be blunt, I’m not really happy with this monolithic framework approach. Ideally, much of the code I’m hoping to produce would be separate, isolated classes that would be pulled together, on-demand, by a dependency manager. However, the existing third-party solutions for dependency management in Swift don’t work elegantly in this scenario (not entirely their fault since static linking and transparent Xcode integration are not possible).

I’ll definitely revisit this when the official Swift Package Manager is shipped as part of Xcode but at the moment, a dependency-free, monolithic framework is it. If you’re only interested in a small piece of code that I’ve written, you’ll need to extract it for yourself.

Conclusion

I’ve talked about the ease and benefits of recording stack traces for ongoing tasks to aid debugging. Tracking down why a result is incorrect after asynchronous invocations or other complex control flows can be completely impractical without this information so it’s a good technique to keep in mind.

Gathering system information in Swift with sysctl


In the previous article, I looked at gathering stack traces to record what your own process is doing. In debug analysis, though, information about what a process has done is only half the picture: we often need to know about the environment in which the process ran to understand why the process has behaved a certain way.

In this article, I’ll look at gathering a narrow set of basic information about the host system for the purpose of debug analysis. System information can be obtained through a number of different APIs, each with their own advantages, disadvantages and idiosyncrasies but I’ll be focussing on a core function available across OS X and iOS: sysctl. The function itself is cumbersome and full of classic C quirks so I’ll also share a Swift wrapper for sysctl to make it slightly less irksome.

Introduction

As with the previous article, this article concerns debug analysis. Specifically, analyzing information about what has happened after a problem occurs to try and determine what led to the problem. This time, instead of capturing information about our own actions, I want to look at capturing information about the host system.

To illustrate what I mean, let’s look at what’s in a typical Mac OS X diagnostic report. We can look at any of the “.diag” files in the “/Library/Logs/DiagnosticReports” folder. A typical diagnostic report on my computer contains:

  1. The date
  2. Name and version information for the program that was running when the report was created
  3. Specific details about what error or condition triggered the report
  4. A stack trace for the program that was running
  5. The following information about the host computer…
OS Version:      Mac OS X 10.11.3 (Build 15D21)
Architecture:    x86_64
Hardware model:  MacPro4,1
Active cpus:     8

This is the host information I want to gather.

NSProcessInfo and UIDevice

Let’s look and see if we can gather this information from any common Cocoa location.

The Foundation singleton NSProcessInfo.processInfo() has properties operatingSystemVersionString and activeProcessorCount which could give:

OS Version:      Version 10.11.3 (Build 15D21)
Active cpus:     8
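For example, gathering those two values takes just a couple of lines:

let info = NSProcessInfo.processInfo()
print("OS Version:      \(info.operatingSystemVersionString)")
print("Active cpus:     \(info.activeProcessorCount)")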

The iOS-only singleton UIDevice.currentDevice() also has systemName and model which would let you amend that to:

OS Version:      iPhone OS Version 9.2.1 (Build 13D20)
Hardware model:  iPhone
Active cpus:     2

Unfortunately though, “iPhone” is not a very helpful model description (this ran on my iPhone 6s which has a true model name of “iPhone8,1”).

The “Hardware model”, as listed in common diagnostic reports, is not available through Objective-C/Swift APIs. For this information, we need to look elsewhere.

uname

Cocoa classes don’t really help us get the hardware model. Instead, let’s turn to a C function named uname. Calling the uname function fills in a struct named utsname with the following values:

sysname = "Darwin"
nodename = "Matt-Gallaghers-iPhone"
release = "15.0.0"
version = "Darwin Kernel Version 15.0.0: Wed Dec  9 22:19:38 PST 2015; root:xnu-3248.31.3~2/RELEASE_ARM64_S8000"
machine = "iPhone8,1"

We have the full model name. We can combine this with information from NSProcessInfo and we have all the basic information we need, right?

Let’s try the same thing on a “MacPro4,1” running Mac OS X…

sysname = "Darwin"
nodename = "MacPro.local"
release = "15.3.0"
version = "Darwin Kernel Version 15.3.0: Thu Dec 10 18:40:58 PST 2015; root:xnu-3248.30.4~1/RELEASE_X86_64"
machine = "x86_64"

The model name is gone, replaced instead by the CPU family. So we can’t get the “Hardware model” on the Mac using uname.

Looking for the source

Why is uname inconsistent between platforms? What’s happening?

Let’s look at where uname gets its information and see what’s going on. We can view the source code for uname on opensource.apple.com.

The machine field is filled in by the following code:

mib[0] = CTL_HW;
mib[1] = HW_MACHINE;
len = sizeof(name->machine);
if (sysctl(mib, 2, &name->machine, &len, NULL, 0) == -1)
    rval = -1;

So uname isn’t the source of the information. The value actually comes from another function named sysctl.

Of course, sysctl isn’t the source either. Following that rabbit hole all the way down gives:

  1. sysctl gets its information from different OID handlers
  2. sysctl_hw_generic handles the information for most of the CTL_HW OIDs, including HW_MACHINE.
  3. PEGetMachineName handles the HW_MACHINE OID.
  4. Depending on CPU, one of the IOPlatformExpert::getMachineName implementations (essentially a driver for the CPU) will return the machine name.

The value is hardcoded into the getMachineName function so this is the true source, although it’s largely irrelevant to us since the sysctl API remains the final layer that we can easily access.

Does this answer why uname is inconsistent between OS X and iOS? Let’s have a look at the output from sysctl on these platforms:

              OS X         iOS
HW_MACHINE    x86_64       iPhone8,1
HW_MODEL      MacPro4,1    N71mAP

"sysctl" results for selected "CTL_HW" subkeys on iOS and OS X

That “N71mAP” value is the iPhone’s CPU model – not completely the same as “x86_64” for an Intel W3520 but similar. So it looks like the inconsistency is due to the HW_MACHINE and HW_MODEL results from sysctl getting swapped around – without access to the source code, I don’t know if this is a mistake or a deliberate decision (it looks like an accidental mixup) but in any case, the iOS behavior has remained steady since the iOS platform was released.

With this knowledge, we can finally get a “Hardware model” value by using sysctl to get the HW_MODEL for OS X and HW_MACHINE for iOS systems.
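As a preview of what that direct call looks like (a rough sketch that ignores return codes; the wrapper described below handles errors properly), querying hw.model on OS X goes like this:

var mib: [Int32] = [CTL_HW, HW_MODEL]
var length = 0

// First call reports the required buffer size, second call fills the buffer.
sysctl(&mib, u_int(mib.count), nil, &length, nil, 0)
var buffer = [CChar](count: length, repeatedValue: 0)
sysctl(&mib, u_int(mib.count), &buffer, &length, nil, 0)

let model = String.fromCString(buffer) ?? ""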

What else can sysctl do?

For my own purposes, I rarely go much deeper; basic machine and model information is enough to satisfy the diagnostic information needs for which I employ sysctl. However, the HW_MODEL and HW_MACHINE values that I’ve focussed on in this article are only a small fraction of a huge range of values you can get from sysctl.

You can see almost all of these values on OS X by running sysctl -A on the command line. More than 1000 keys and values will be shown.

I say “almost” though because a few keys are not shown. Curiously, there’s a handful of values that are hidden from this list by default, including HW_MODEL and HW_MACHINE. To get the full list of values, you can download the source to the sysctl command line tool and on line 992, change the final 0 argument passed to show_var to 1. Running the result gives a few dozen extra values you can query.

On iOS, there are a couple hundred fewer sysctl values available (806 on my iPhone versus 1098 on my Mac) with many of the missing values omitted from the hardware – CTL_HW – section. While this is annoying, fortunately most of the relevant traits of an iOS system (CPU type, capabilities and clock rate) are locked to the model so it’s not a major catastrophe. In any case, be wary of the fact that sysctl on iOS may return errors (specifically, a POSIX error 2) for many values that are valid on OS X.

Improving sysctl’s interface with a nested set of wrappers

The sysctl function itself is not incredibly complicated but it is a little ugly in Swift:

public func sysctl(name: UnsafeMutablePointer<Int32>, namelen: u_int, oldp: UnsafeMutablePointer<Void>,
    oldlenp: UnsafeMutablePointer<Int>, newp: UnsafeMutablePointer<Void>, newlenp: Int) -> Int32

You pass a C array of Int32 which uniquely identifies the value you’re after and you pass a buffer via oldp that’s oldlenp long and the value will be written there (I’m going to completely ignore using sysctl to set values since it’s very rare to do that in an app).

The reason why sysctl feels so cumbersome in Swift is:

  • Creating an array of Int32 and passing that by pointer for the first parameter is a nuisance in Swift
  • You basically need to call sysctl twice: once with oldp equal to nil to get the size required for the result buffer and then a second time with a properly allocated buffer.
  • The result is returned as an untyped buffer of bytes which you then need to interpret correctly.
  • There are a few different ways in which failure can occur and we want to reduce these different ways to idiomatic Swift errors or preconditions.

For these reasons, I use a wrapper around sysctl which has the following interface:

public func sysctl(levels: [Int32]) throws -> [Int8]

This lets you write let modelAsArrayOfChar = try sysctl([CTL_HW, HW_MODEL]) to get the hardware model as an array of Int8.
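A rough sketch of how such a wrapper can be built on top of the raw call (the shipping CwlSysctl code handles error codes and buffer sizing more carefully than this):

import Darwin
import Foundation

public func sysctl(levels: [Int32]) throws -> [Int8] {
    var levels = levels
    var size = 0

    // First call reports the required buffer size.
    if Darwin.sysctl(&levels, u_int(levels.count), nil, &size, nil, 0) != 0 {
        throw NSError(domain: NSPOSIXErrorDomain, code: Int(errno), userInfo: nil)
    }

    // Second call fills the buffer.
    var buffer = [Int8](count: size, repeatedValue: 0)
    if Darwin.sysctl(&levels, u_int(levels.count), &buffer, &size, nil, 0) != 0 {
        throw NSError(domain: NSPOSIXErrorDomain, code: Int(errno), userInfo: nil)
    }

    return buffer
}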

Of course, an [Int8] isn’t particularly useful so I call this function from inside subsequent functions that further refine the process:

public func sysctlString(levels: Int32...) throws -> String

This function lets you pass in the levels as a comma-separated list and converts the result to a regular Swift String so we can get the model as a string in a single line: let modelString = try sysctlString(CTL_HW, HW_MODEL).

An alternative overload lets you use the sysctl names instead of Int32 identifiers:

public func sysctlString(name: String) throws -> String

for which we’d write let modelString = try sysctlString("hw.model").

There’s still more that we can do: we can eliminate the try entirely. Risking fatal errors should always be kept to a minimum but for core sysctl values, the error code path is effectively unreachable (see ‘Effectively unreachable code paths’ in Partial Functions in Swift, Part 1) so forcing “no error” with try! is a valid approach for these core values.

This then leads to the final wrapper around these functions, a static struct exposing the core values, without error handling:

public struct Sysctl {
    public static var model: String
}

For these convenience vars on this Sysctl struct, I’ve taken the opportunity to swap HW_MACHINE and HW_MODEL results on iOS so that its behavior is more in line with OS X. The result is that you can use Sysctl.model to get the “Hardware model” on either OS X or iOS.
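Here’s a sketch of how that convenience property might look on top of the earlier wrappers (platform handling simplified; the real CwlSysctl implementation differs in its details):

public struct Sysctl {
    public static var model: String {
        #if os(iOS)
            // Swapped relative to uname: on iOS, hw.machine holds the model name.
            return try! sysctlString("hw.machine")
        #else
            return try! sysctlString("hw.model")
        #endif
    }
}

With that in place, the “Hardware model” line of a diagnostic report is a one-liner: print("Hardware model:  \(Sysctl.model)").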

Usage

The project containing this sysctl wrapper is available on github: mattgallagher/CwlUtils.

The CwlSysctl.swift file is fully self-contained so you can just copy the file, if that’s all you need. Otherwise, the ReadMe.md file for the project contains detailed information on cloning the whole repository and adding the framework it produces to your own projects.

Conclusion

The sysctl function is a fundamental kernel/machine information tool on OS X and iOS. There’s more information about network interfaces in SystemConfiguration and there’s more about attached hardware devices in the IOKit registry, but these locations don’t hold all of the same information that sysctl offers (and IOKit isn’t available on iOS anyway).

Despite the fundamental nature of this function, its clumsy interface often leaves it as an API of last resort behind Foundation classes and simpler C functions. I hope I’ve shown that if you’re interested in a description of the host’s “Hardware model”, it should be first choice.

Errors: unexpected, composite, non-pure, external.


The opening sentence of “The Swift Programming Language” chapter on “Error Handling” briefly refers to “error conditions” but beyond that, there’s no definition of what an error is, the conditions that would give rise to one or why it would require special handling.

It’s not a problem unique to “The Swift Programming Language”; search for “error” on the web and you’ll realize it is difficult to find a clear definition of this kind of error that isn’t a discussion about error representations. Error representations are just value types so they’re not ultimately very interesting. They may be returned along separate return paths to non-errors or they may be combined with non-error types into composite entities like “sum” types – but they are still just values.

To better manage errors and test the functions that involve them, we need to look at what makes errors (not just their representations) unique. I’ll give a universal definition of an “error” and look at an increasing series of complications common to errors (“unexpected”, “composite”, “non-pure”, “external”) that make them much worse than typical values and look at the implications this has on testing and subsequent handling.

Figure 1: a pure function

To begin, let’s look at a function that doesn’t produce an error or require any error handling.

Figure 1: a pure function

This is the structure of a basic “pure” function. If you’re unfamiliar with functional programming terminology:

A pure function is one whose behavior depends only on its input arguments (the only data it reads must be read from its arguments) and doesn’t change program state (only temporary local variables and the “return” value may be written)

The function pre-processes arguments (the “prepare” step in the diagram), maps the input values onto output values in accordance with expected behavior (the “evaluate” step) and then packages and sends the output to the caller (the “return” step).

An example in Swift:

func multiplyU32toU64(a: UInt32, b: UInt32) -> UInt64 {
    let (x, y) = (UInt64(a), UInt64(b))
    let result = x * y
    return result
}

In this function, the first line is the “prepare” step (processing the arguments into the desired format), the second line is the “evaluate” step (performs the logic of the function) and the third line is the “return” step (sending the results of the “evaluate” step to the caller).

By design, this function has the same semantics (logical behavior) for all possible inputs. Due to the increase from 32 to 64 bits, there is no possibility of overflow and there is no other way to trigger a special valued output in this function; any two UInt32 values will produce a UInt64 that is the multiplicative product of the two.

Figure 2: a piecewise function

Figure 2: a piecewise function

This diagram represents another “pure” function. In some respects, it is simply a “zoom in” on the previous diagram – the “prepare” step has expanded into “check” and “partition” steps. The “evaluate” step has expanded into two possible “evaluate” steps (“A” and “B”) but otherwise, it’s the same overall structure.

The difference is that now, we have two different control-flow paths for the function, based on an analysis of the input.

An example in Swift:

func multiplyU32toU32(a: UInt32, b: UInt32) -> UInt32? {
    let result: UInt32?
    if log2(Double(a)) + log2(Double(b)) < 32 {
        result = a * b
    } else {
        result = nil
    }
    return result
}

This version of the multiply function guards against overflow by pre-checking the inputs, returning nil if the multiplication would overflow and only performing the multiplication if it is safe.

The two different control-flow paths are reflected in the result which is a “sum” type – an Optional which may be either nil or a UInt32.

A sum type is a composite or algebraic type where instances must conform to one type from a set of possible types. In Swift, this is usually implemented as an enum, like Optional.

Errors are expectation failures

The multiplyU32toU32 function’s behavioral expectation is that it will apply a multiplication and return the result. However, the function has a code path and a return type that bypass that functionality entirely.

Depending on the value of input arguments, this function may fail to meet behavioral expectations. This leads us to the missing definition of an error:

An error represents a failure to meet expectations (of arguments, state or other input) where those expectations are a predicate to meeting behavioral expectations (of the statement, function or program).

There are different kinds of error. I’ve previously discussed fatal errors which are failures of the programmer to meet expectations where the program chooses to abort rather than allow the behavioral expectations to fail.

In this article, we’re looking at non-fatal errors. They still represent an expectation failure that prevents the typical behavioral expectations of the function being met; however, instead of an abort, the function avoids returning its expected result and instead returns a different value (one that doesn’t depend on the failed expectation).

Implications of errors

It’s a seemingly minor note but we don’t generally expect we will fail to meet our expectations.

The multiplyU32toU32 function has a similar structure to the following function:

func absoluteValue(a: Int) -> Int {
    let result: Int
    if a < 0 {
        result = -a
    } else {
        result = a
    }
    return result
}

The difference is that with absoluteValue, we expect we’ll need to test both paths equally.

When we call something an “error”, we’re expecting to avoid the error wherever possible. For this reason, there’s an instinct to leave the error path relatively untested – we don’t expect it to occur as often. Even if this asymmetry of usage is true, it creates a higher risk for the less tested path.

It should be clear that testing the internals of both multiplyU32toU32 and absoluteValue should follow the same pattern: both branches should be equally exercised by tests. The only reason you should apply less rigorous testing to a path is if its implementation is much simpler.

Usage frequency should not be used to dictate testing thoroughness (unless the expected frequency is “never”, in which case: maybe the path shouldn’t exist at all).
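For example, a test of multiplyU32toU32 should exercise the nil path with the same care as the expected path. A sketch of such an XCTestCase method (names are illustrative):

func testMultiplyU32toU32() {
    // Expected path: small values multiply normally.
    XCTAssert(multiplyU32toU32(3, b: 4) == 12)

    // Error path: guaranteed overflow returns nil.
    XCTAssert(multiplyU32toU32(UInt32.max, b: UInt32.max) == nil)
}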

Impact of a composite type on the caller

Any composite type – like a sum type or an ad hoc equivalent that uses mutually exclusive separate values or even single scalars where different ranges are reserved for communicating different behaviors – requires that the caller separate the different components and handle each appropriately. This requires either control flow constructs or functions that are aware of the nature of the composite.

Returning a sum type increases the complexity for the caller. Efficient language constructs and functions that are aware-by-design of the sum type will mitigate some of this complexity increase but a sum type will never be as simple as a continuous scalar type.

Swift has numerous language and library features for handling “sum” types (e.g. switch and if let) but this handling has a syntactic and mental overhead compared to non-sum type handling. There are other approaches that further reduce syntactic overhead by enclosing the actual handling inside the function (e.g. conditional unwrapping and flatMap) but even when the syntax is efficient, we must still consider the multiple behaviors that may be involved.
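To make that concrete, here’s what the caller-side handling of multiplyU32toU32’s Optional result looks like (width, height, depth, use and handleOverflow are placeholder names):

if let product = multiplyU32toU32(width, b: height) {
    use(product)
} else {
    handleOverflow()
}

// Or, pushing the handling inside a combinator:
let volume = multiplyU32toU32(width, b: height).flatMap { multiplyU32toU32($0, b: depth) }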

Figure 3: a non-pure function

We’ve looked at “pure errors” but the majority of errors in imperative programming results from “non-pure” inputs.

To see what that means, let’s start with an error-free, non-pure function:

Figure 3: a non-pure function

Structurally, it’s not much different to a pure function. The “prepare” is now a “fetch” stage – implying an action to bring data in from outside – but the purpose is the same: get the required arguments for the function. The difference is in the details:

func secondsSince(previous: NSDate) -> Double {
    let current = NSDate()
    let result = current.timeIntervalSinceDate(previous)
    return result
}

Like the multiplyU32toU64 function under “Figure 1”, this function is an operation on two values. In this case however, while one of the operands (previous) is a function parameter, the other (current) pulls its value from the system clock inside the NSDate constructor. Since the clock is not a parameter to the function, this function is “non-pure”.

Non-pure statements

Technically a function is only non-pure by extension. Non-pure is really the property of a single statement. Most of the secondsSince function is pure and it is only the fetching of the current date:

let current = NSDate()

that is non-pure. The diagram for “non-pure” could have been just the “fetch” step and nothing more. However, “non-pure” as a concept is not particularly interesting, it is the effect that non-pure dependencies have on our program that we want to consider – which is why it’s important to always look at what follows any “fetch” or “send”.

Impact of non-pure functions on testing

Non-pure statements have implications for thread-safety, repeatability and determinism but in many cases, the biggest difficulty with non-pure functions is that they’re difficult to test.

In typical testing, we invoke a function with known arguments and test the result to ensure the function worked. With non-pure functions, controlling the arguments doesn’t control the function.

In the “non-pure” function diagram, how do we test the “evaluate” step is working correctly if we don’t know the exact value constructed during the “fetch” step?

We either need access to the other dependencies (sometimes singletons or other global state within our program can be pre-configured) or we need to be loose enough with our testing that guesses about the result will be true. In both cases, it adds complexity and reduces the robustness of testing.

Alternately, we can move dependencies to the other side of the function interface (a process called “dependency injection”) which can stop the function being non-pure but adds interface complexity and (since it adds an additional layer of indirection) can impede performance.
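As a small sketch of that dependency injection approach (not code from the article), the clock can be passed as a defaulted parameter so production call sites stay unchanged while a test can supply a fixed date:

func secondsSince(previous: NSDate, current: NSDate = NSDate()) -> Double {
    return current.timeIntervalSinceDate(previous)
}

// In a test, both dates are now controlled:
let earlier = NSDate()
let elapsed = secondsSince(earlier, current: earlier.dateByAddingTimeInterval(5))
// elapsed is 5.0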

Figure 4: a non-pure error

I’ve looked at “pure errors” and “non-pure functions”; no surprises that the next step is a non-pure error:

Figure 4: a non-pure error

In most respects, it is a combination of a “pure error” and a “non-pure function”. There’s a “sum” return type, two paths through the function and a dependency that isn’t part of the invocation arguments.

Here’s a simple example:

var settings = Dictionary<String, String>()

func formattedName() -> String? {
    let result: String?
    if let firstName = settings["firstName"], secondName = settings["secondName"] {
        result = "\(secondName), \(firstName)"
    } else {
        result = nil
    }
    return result
}

Like the multiplyU32toU32 function, above, which checked an expectation and returned an error value of nil when the expectation was not met, this function checks that the “firstName” and “secondName” keys are present in the settings and returns nil when they’re not.

This example isn’t very intimidating. The dependency is local (just a global variable in the same file) so its access can be viewed and controlled easily. It’s unlikely to return nil in surprising ways.

External state

The most difficult to manage errors are those that depend on “external” state.

External state, in this case, is state that doesn’t reside within your own process. It may mutate independently of your program, it may have behavior that is not fully declared and even when the behavior is broadly understood, the full complexity may be far beyond what a simple app wants to manage.

Consider the following function:

func sizeOfFile(path: String) throws -> Int {
    let attributes = try NSFileManager.defaultManager().attributesOfItemAtPath(path)
    let result = (attributes[NSFileSize] as? Int) ?? 0
    return result
}

This function attempts to get the size of a file at the specified path. It’s not a complicated idea. Looking at this function, consider the following questions:

  • How many ways can an attempt to get the attributes of a file fail?
  • The attributesOfItemAtPath documentation simply states that attributes will be present – does it mean all attributes?
  • Is it possible that NSFileSize will be missing from the attributes?
  • Is returning 0 a sensible result (correct interpretation when NSFileSize is missing) or would a missing NSFileSize convey a different meaning?

Unless you have access to the Foundation source code and the source code of the filesystem, you might not be able to answer any of these questions. Errors due to external state are a nightmare because there aren’t clear answers. You can guess about likely errors and scenarios but it’s almost impossible to know.

Conclusion

It shouldn’t be as difficult to find a definition for “error” as it is. It’s not a complicated concept: an error is a failed expectation, of an input or another dependency, that leads to a failure to deliver on behavioral expectations.

It’s the common traits of an error that make them so difficult to manage:

  • composite (requiring handling of multiple possibilities)
  • unexpected (skewing testing and design considerations)
  • non-pure (dramatically increasing difficulty of testing)
  • external (preventing control or clear definition)

I’ve attempted to keep the contents of this article as simple as possible, focussing on definitions and the implicit difficulties but avoiding significant discussion about error handling or how to manage difficulties.

I hope to refer back to this article in the future as I go through the different approaches used to manage errors, test error-prone functions, handle non-pure functions, handle externalities, reduce and hide complexity in error handling and use higher level abstractions to work with composite types in the same way as normal types.

Breaking Swift with reference counted structs


In Swift, a class type is allocated on the heap, uses reference counting to track its lifetime and can perform cleanup behaviors when it is deleted. By contrast, a struct is not separately allocated on the heap, does not use reference counting and cannot perform cleanup behaviors.

Right?

In reality, all of these traits – “heap”, “reference counting” and “cleanup behaviors” – can be true of struct types too. Be careful though: out-of-character behaviors are a good way to cause problems. I’m going to show how a struct can end up with some of the traits you might associate with a class and show how this can be a source of memory leaks, errant behavior and compiler crashes.

WARNING: this post presents a number of anti-patterns (things you really shouldn’t do). The purpose of this article is to highlight some subtle dangers with structs and closure capture. The best way to avoid these dangers is to steer well clear of them unless you’re comfortable that you understand the risks.

Class fields in a struct

While a struct doesn’t usually have any deinit behavior, values of struct type are required (like all other values in Swift) to correctly maintain reference counts for their contents. Any reference counted field within the struct must have its reference count correctly incremented and decremented as it is added to or removed from the struct, or when the struct is deleted.

We can exploit the fact that reference counted fields are decremented when a struct falls out of scope to attach behaviors to the struct as though it had a deinit method. To do this, we can use an OnDelete class:

public final class OnDelete {
    var closure: () -> Void
    public init(_ c: () -> Void) {
        closure = c
    }
    deinit {
        closure()
    }
}

and then use the OnDelete class as follows:

struct DeletionLogger {
    let od = OnDelete { print("DeletionLogger deleted") }
}

let dl = DeletionLogger()
print("Not deleted, yet")
withExtendedLifetime(dl) {}

which will output:

Not deleted, yet
DeletionLogger deleted

When the DeletionLogger is deleted (after the completion of the withExtendedLifetime function which keeps it alive past the preceding print statement), then the OnDelete closure is run.

Trying to access a struct from a closure

So far, there’s nothing too strange. An OnDelete object can perform a function at cleanup time for a struct, a little like a deinit method. But while it might appear to mimic the deinit behavior of a class, an OnDelete closure is unable to do the most important thing a deinit method can do: operate on the fields of the struct.

Despite some obvious reasons why it’s a bad idea, let’s try to access the struct anyway and see what goes wrong. We’ll use a simple struct that contains an Int value and we’ll try to output the value of the Int when the OnDelete closure runs.

struct Counter {
    let count = 0
    let od = OnDelete { print("Counter value is \(count)") }
}

We can’t do this (error: Instance member 'count' cannot be used on type 'Counter'). That’s not so strange though: we wouldn’t be allowed to do that, even on a class, since you’re not allowed to access other fields from an initializer like that.

Let’s initialize the struct properly and then try to capture one of its fields.

struct Counter {
    let count = 0
    var od: OnDelete? = nil
    init() {
        od = OnDelete { print("Counter value is \(self.count)") }
    }
}

The compiler throws a segmentation fault in Swift 2.2 and a fatal error in Swift Development Snapshot 2016-03-24.

Excellent! I’m having fun already.

Of course, I could avoid all compiler problems by doing this:

struct Counter {
    var count: Int
    let od: OnDelete
    init() {
        let c = 0
        count = c
        od = OnDelete { print("Counter value is \(c)") }
    }
}

or the seldom-seen capture list which, in this case, is equivalent:

struct Counter {
    var count = 0
    let od: OnDelete?
    init() {
        od = OnDelete { [count] in print("Counter value is \(count)") }
    }
}

but neither of these options actually lets us access the struct itself; both options capture an immutable copy of the count field but we want access to the up-to-date mutable count.

struct Counter {
    var count = 0
    var od: OnDelete?
    init() {
        od = OnDelete { print("Counter value is \(self.count)") }
    }
}

Hooray! That’s better. Everything is mutable and shared. We’ve captured the count variable and there are no compiler crashes.

We should ship this code since it clearly works, doesn’t it?

Completely loopy

It clearly doesn’t work. If we run the code the same way as before:

let c = Counter()
print("Not deleted, yet")
withExtendedLifetime(c) {}

the only output we get is:

Not deleted, yet

The OnDelete closure is not getting invoked. Why?

Looking at the SIL (Swift Intermediate Language, as returned by swiftc -emit-sil), it’s clear that capturing self in the OnDelete closure prevents self from being optimized to the stack. This means that instead of using alloc_stack, the self variable is allocated using alloc_box:

%1 = alloc_box $Counter, var, name "self", argno 1 // users: %2, %20, %22, %29

and the OnDelete closure retains this alloc_box.

Why is this a problem? It’s a reference counted loop:

  • closure retains the boxed version of Counter
  • the boxed version of Counter retains OnDelete
  • OnDelete retains closure

With this loop created, our OnDelete object is never deallocated and never invokes its closure.

Can we break the loop?

If Counter was a class, we would capture it using a [weak self] closure and avoid the reference counted loop that way. However, since Counter is a struct, attempting to do that is an error. No luck there.

Can we break the loop manually, after construction, by setting the od field to nil?

var c = Counter()
c.od = nil

Nope. Still doesn’t work. Why not?

When the Counter.init function returns, the alloc_box it creates is copied to the stack. This means that the version OnDelete has retained is different from the version we can access. The version OnDelete has is now inaccessible.

We’ve created an unbreakable loop.

As Joe Groff highlights in this thread on Twitter, Swift evolution change SE-0035 should prevent this problem by limiting inout capture (the kind of capture used in the Counter.init method) to @noescape closures (which would prevent capture by OnDelete’s escaping closure).

Copies bad, shared references good?

So the problem is that a different copy of self is returned by the Counter.init method than the version we capture during the method. What we need is to make the returned and retained versions the same.

Let’s avoid doing anything in an init method and set things up in a static function instead.

struct Counter {
    var count = 0
    var od: OnDelete? = nil
    static func construct() -> Counter {
        var c = Counter()
        c.od = OnDelete { print("Value loop break is \(c.count)") }
        return c
    }
}

var c = Counter.construct()
c.count += 1
c.od = nil

Nope: we still have the same problem. We’ve got a captured version of Counter, permanently embedded in OnDelete, that’s different to the returned version.

Let’s change that static method…

struct Counter {
    var count = 0
    var od: OnDelete? = nil
    static func construct() -> () -> () {
        var c = Counter()
        c.od = OnDelete { print("Value loop break is \(c.count)") }
        return {
            c.count += 1
            c.od = nil
        }
    }
}

var loopBreaker = Counter.construct()
loopBreaker()

The output is now:

Value loop break is 1

This finally works, and we can see the state change from the loopBreaker closure is correctly affecting the result printed in the OnDelete closure.

Now that we’re no longer returning the Counter instance, we’ve stopped making a separate copy of it. There is only one copy of the Counter instance and that’s the alloc_box version shared by the two closures. We have a referenced counted struct on the heap and an OnDelete method that can access the fields of the struct at cleanup time.

Some perspective

The code technically “works” but the result is a mess. We have a reference counted loop that we need to manually break, we can only access the Counter type through closures set up in the construct function and for a single underlying instance we now have four heap allocations (the closure in OnDelete, the OnDelete object itself, the boxed allocation of the c variable and the loopBreaker closure).

If you haven’t realized by now… this has all been a big waste of time.

We could just have made the Counter a class in the first place, keeping the number of heap allocations to 1.

class Counter {
    var count = 0
    deinit {
        print("Counter value is \(count)")
    }
}

Long story short: if you need access to the same mutable data from different scopes, a struct probably isn’t a great choice.

Conclusion

Closure capture is something we just write and assume the compiler will do what is required. However, capturing mutable values has a few subtly different semantics that need to be understood to avoid problems. This is complicated by a couple of minor design issues that we’re still waiting on Swift 3 to fix.

Remember to consider the possibility of reference counted loops when capturing struct values with class fields. You can’t weakly capture a struct so if a reference counted loop occurs, you’ll need to break the loop another way.

In any case, most of this article has looked at a completely stupid idea: trying to make a struct capture itself. Don’t do that. Like other reference counted structures, captures should form an acyclic graph. If you find yourself trying to create loops, it’s probably a sign that you should be using class types with weak links from child to parent.
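For illustration, here is a minimal sketch of that acyclic shape (the type names are hypothetical): strong references only point downwards, from parent to child, and the back-reference is weak, so no retain cycle can form.

// Hypothetical parent/child types showing the acyclic shape described above
class Parent {
    var children: [Child] = []
    func add(child: Child) {
        // The parent owns the child strongly...
        children.append(child)
        // ...and the child refers back to the parent weakly
        child.parent = self
    }
}

class Child {
    weak var parent: Parent?
}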

Finally, there are some good reasons to use an OnDelete class (I’ll be using one in the next article) but don’t start thinking it works like a deinit method – it’s predominantly for side effects (state outside the scope to which it’s attached).


Indent with tabs or spaces? I wish I didn't need to know.


The always entertaining Erica Sadun wrote an article on her blog yesterday about code indentation styles, titled “Swift Style: Are you a 4-denter?”. There was a flurry of replies on Twitter as others playfully chimed in.

While I love trolling and human misery as much as the next person, I’m always saddened when I see these discussions – not because I have a deeply vested interest in tabs over spaces or one indentation width over another – but because the existence of the argument reminds me of the fact that after more than 50 years of indented code, nothing has been done to fix the technology limitations that cause these debates.

Ignoring hypothetical technology changes to eliminate the problem, there does exist a best-practice approach on the topic of code formatting that maximizes maintainability. But I’ll give you a hint: if you’re trying to fix things on the left edge of the line, you’re looking in the wrong spot.

A quick rhetorical question…

Have a look at the following code:

func helloWorld() {
    print("Hello world")
}

Question: when I typed this text, did I type Unix linefeeds \n, classic Mac carriage returns \r or Windows carriage return linefeeds \r\n?

Obviously, I didn’t type anything of the sort. I hit the “return” key and my text editor did whatever was required to take me to the start of the next line.

Here is how I feel about the encoding of layout commands:

The encoding of layout commands is the responsibility of the text editor program, not the user. The fact that users are perpetually embroiled in debates about the underlying encoding is a failure between text editing programs and text formats to negotiate parameters without user intervention.

Document formats could eliminate the debate entirely

If you’re given a text document without any context, detecting the document type and the character encoding are unsolvable problems. You can use heuristics to guess but these fail regularly.

Consider an XML document declaration:

<?xml version="1.0" encoding="UTF-8"?>

XML documents clearly state their type and encoding, eliminating ambiguity and making the unsolvable problem of determining document type and text encoding into a trivially easy task.

Now, imagine if it was standard for Swift documents to start with a comment like this:

/*swift tabwidth="4" prefertabs="true"*/

This shouldn’t be such a strange thing to see: something equivalent is probably buried in your .vimrc or Xcode settings. The problem is that this information is attached to your editor, not the document. This information needs to be carried along with the document.

If common code editors supported this type of “document declaration” comment, there wouldn’t need to be a tabs versus spaces debate. The mere presence of this line would be enough for a text editor to correctly interpret any combination of tab and space characters at the start of a line and ensure any newly added indentations are correctly encoded.

Tabs versus spaces

But like I said before: in more than 50 years, no one in a position to fix the problem has done anything. I don’t anticipate change.

So given the existing constraints what do we do to keep editing and maintenance costs to a minimum? Tabs or spaces?

From a code maintenance point-of-view, it doesn’t really matter: you will need to vigilantly enforce your standard, no matter what standard you choose.

Both spaces and tabs require that all editors of a document coordinate on which standard they’re going to use. But tabs and spaces are both invisible by default which leads to mistakes. That’s the reason why hostility exists: it’s not because one is superior to another, it’s because anyone who uses a different standard to you is a potential looming threat to your coding standard.

Frankly, compilers should detect inconsistent indentation styles and emit warnings. That would help far more than any debate over choice of standard.
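To make the suggestion concrete, here’s a minimal sketch of the kind of check I mean – not a real compiler feature, just a naive scan over each line’s leading whitespace (the function name and warning messages are my own invention):

import Foundation

// Flag lines whose indent mixes tabs and spaces, and files that mix indent styles
func indentationWarnings(source: String) -> [String] {
    var warnings: [String] = []
    var fileUsesTabs = false
    var fileUsesSpaces = false
    for (number, line) in source.componentsSeparatedByString("\n").enumerate() {
        var lineUsesTabs = false
        var lineUsesSpaces = false
        for scalar in line.unicodeScalars {
            if scalar == "\t" {
                lineUsesTabs = true
            } else if scalar == " " {
                lineUsesSpaces = true
            } else {
                // Only the leading whitespace is considered an indent
                break
            }
        }
        if lineUsesTabs && lineUsesSpaces {
            warnings.append("line \(number + 1): indent mixes tabs and spaces")
        }
        fileUsesTabs = fileUsesTabs || lineUsesTabs
        fileUsesSpaces = fileUsesSpaces || lineUsesSpaces
    }
    if fileUsesTabs && fileUsesSpaces {
        warnings.append("file mixes tab-indented and space-indented lines")
    }
    return warnings
}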

But detecting indentation is complicated by files where lines don’t always start on a whole number of indents…

Non-indent formatting

There’s a related issue to indentation that requires separate discussion: horizontal alignment of text in ways other than indentation.

Non-indent formatting is one of the biggest reasons why many people choose spaces over tabs. For a range of reasons, tabs are more prone to problems when used as part of non-indent formatting.

Here’s my take on the issue though: you should never use non-indent formatting in your code, regardless of your choice of tabs or spaces for indentation encoding. This advice has little to do with tabs versus spaces and more to do with the belief that time spent formatting code is time wasted and that it creates an unnecessary code maintenance headache.

I use the following rule and I suggest it for all people free to choose their own coding standards:

Indent code but never horizontally align code in any other way. After indentation, never put 2 spaces together (except inside a comment block or string literal) and never use a tab character at all.

Your editor may be set up to “smart indent” or otherwise facilitate certain kinds of non-indent formatting but I assure you, there’s still a non-zero effort involved in maintaining the aesthetic. It’s much harder to coordinate this type of styling across a team than tabs/spaces settings. And it simply doesn’t help readability enough to be worth the effort.

Hard wrapping and parameter alignment

Let’s look at some code to see what I’m talking about. Here’s some code from Swift’s GenEnum.cpp:

void storeExtraInhabitant(IRGenFunction &IGF,
                          llvm::Value *index,
                          Address dest,
                          SILType T) const override {
  auto &C = IGF.IGM.getLLVMContext();
  auto payloadTy = llvm::IntegerType::get(C,
                  cast<FixedTypeInfo>(TI)->getFixedSize().getValueInBits());
  dest = IGF.Builder.CreateBitCast(dest, payloadTy->getPointerTo());
  index = IGF.Builder.CreateZExtOrTrunc(index, payloadTy);
  index = IGF.Builder.CreateAdd(index,
                      llvm::ConstantInt::get(payloadTy,
                                           ElementsWithNoPayload.size()));
  IGF.Builder.CreateStore(index, dest);
}

Clearly, this file is authored with a standard that requires that parameters be horizontally aligned when they’re placed on new lines. Is it really time well spent? The alignment plays poorly with the 80 character line width and many of the aligned parameters aren’t actually aligned – they just hang in the middle of the line for no apparent reason until you realize they’ve bumped up against the right margin of the window and been pushed partially leftwards.

Let’s look at this same code with no formatting other than indentation:

void storeExtraInhabitant(IRGenFunction &IGF, llvm::Value *index, Address dest, SILType T) const override
{
   auto &C = IGF.IGM.getLLVMContext();
   auto payloadTy = llvm::IntegerType::get(C, cast<FixedTypeInfo>(TI)->getFixedSize().getValueInBits());
   dest = IGF.Builder.CreateBitCast(dest, payloadTy->getPointerTo());
   index = IGF.Builder.CreateZExtOrTrunc(index, payloadTy);
   index = IGF.Builder.CreateAdd(index, llvm::ConstantInt::get(payloadTy, ElementsWithNoPayload.size()));
   IGF.Builder.CreateStore(index, dest);
}

I’ve cheated slightly: I moved the opening brace to its own line and increased the indentation to 3 spaces. Both changes are to offset the increased density from eliminating non-indent formatting.

But despite my scurrilous cheating ways, I think there’s an objective truth here: the extensive formatting effort in the first example doesn’t make it substantially easier to read. In fact, I personally find the more consistent indentation in the second example (the lack of formatting) makes it easier to read.

Even if you find the second example harder to read, you need to ask yourself: how much easier? Because the second example comes with a very big advantage: in Xcode (or Vim and other editors), soft line wrapping with indentation can be automatic and completely eliminates any need to ever hard format code.

Relying on soft wrapping in your editor can take a little acclimatization (many developers have hard-wrapping techniques drilled into them) but this one approach alone can save dozens of hours of time per year as every edit, re-edit or refactor to a hard-wrapped line wastes a few seconds of your time.

Column formatting

Maybe it was cruel to use C++ as an example: it’s ugly from the beginning. Let’s start with something that looks pretty. The following code is from ffmpeg’s mpjpegdec.c:

static const AVClass mpjpeg_demuxer_class = {
    .class_name     = "MPJPEG demuxer",
    .item_name      = av_default_item_name,
    .option         = mpjpeg_options,
    .version        = LIBAVUTIL_VERSION_INT,
};

Two columns! It looks pretty! It’s contextually appropriate for the declarative structure!

But are you really saving time? Separating the two columns helps you locate the right column more quickly but that’s not really very important.

static const AVClass mpjpeg_demuxer_class = {
    .class_name = "MPJPEG demuxer",
    .item_name = av_default_item_name,
    .option = mpjpeg_options,
    .version = LIBAVUTIL_VERSION_INT,
};

Removing formatting has made the code uglier.

But you can still find the keys at a glance, with almost the same speed. The right column doesn’t pop out and it doesn’t look like a pretty formatted table but there was no time wasted in lining up columns and no need for formatting maintenance.

Is your time really well spent by aligning columns?

Conclusion

We shouldn’t need to care about tabs versus spaces – it should be coordinated between the document and the editor with no possibility for mistakes.

Sadly, that’s not how things work.

Compilers should emit warnings about inconsistent indentation used within a given file so we aren’t perpetually forced to police invisible characters.

Sadly, that’s not how things work, either.

Due to the failures of our tools to address the issue, we’re stuck with ongoing work to enforce whatever indentation standard we choose.

You can look at my code on Github and discover how I typically encode my indents. I do have a standard and I do keep to it but unless you choose to push changes to one of my projects, you shouldn’t care.

A far more interesting example to follow is how I handle whitespace elsewhere on the line: a \t character will never appear anywhere except in an indent and two or more space characters will only appear in indents, comment blocks or string literals. This avoids a whole host of problems and, combined with the fact that I use soft-wrapping where possible, saves a noticeable amount of time when editing and refactoring code.

Presenting unanticipated errors to users


The best approach for presenting error conditions to the user is to integrate feedback into the user interface about the condition that the error indicates. This could be changing an icon, colors, adding status text or otherwise changing the state of a user interface element.

This “best approach” requires that we understand the error condition and plan for it in advance.

What if we’re still implementing the app and haven’t had time to exhaustively discover possible error conditions and plan for them? What about technically possible error conditions that we cannot trigger in our own testing, making planning impractical? What about error conditions that should never occur if everything is well behaved but are still semantically possible?

We need a base level of error handling and reporting to the user. This base level must be very low overhead (so we can use it without much thought during development) and while it must report an error to the user, it is primarily intended to gather diagnostic information so the source of the error condition can be understood, facilitating fixes and other maintenance.

Introduction

First, some quick terminology:

  1. an error condition is a failed conditional check that results in a function skipping its usual functionality and instead returning a nominated error value.
  2. an error is a value used to report that an error condition occurred and normal functionality was skipped
  3. error handling is code that looks for errors and performs different actions based on the presence of those errors
  4. error reporting communicates an error result from a user task to the user

In my previous article, “Errors: unexpected, composite, non-pure, external” I focussed on the first two points and discussed how, from the perspective of the function that creates the error, the error always represents an “unexpected” condition.

In this article, I’m focussing on the latter two points. It’s important to realize that, from the perspective of handling and reporting code, errors might not be entirely unexpected.

Certainly in some cases, an “error” result from one function may represent the preferred result for the receiver (an expected error). In other cases, error handling may deal with errors by choosing a different path that satisfies requirements another way, so the error is never communicated to the user (error recovery). In most other scenarios, even if the error is not “preferred”, then at least we know how to handle the error (an anticipated error) and specialized feedback is presented to the user in a way that is aesthetically appropriate.

In most programs of reasonable complexity, there are likely to be paths through the program – even if they are rare or theoretical – where an error receives no custom handling. This means: no custom text, no custom code paths based on the error type, just bulk acknowledgement of an unanticipated error and an abort of the task in progress.

Reliable, maintainable programming requires that we always have an approach for errors, even these unanticipated errors, so that the error is routed through our error handling and reported to the user. Given that this type of error potentially represents programmer oversight, it’s important that this error include helpful diagnostic information so fixes or other maintenance can occur if required.

Mediocre error handling and reporting

Let’s look at a basic user task. In this case, an @IBAction method on a view controller (as triggered by a button press). The user action then starts a processing task which may trigger an error condition.

// An action on a view controller, triggered by a button press
@IBAction func someUserAction1(sender: AnyObject) {
    someProcessingTask1("/invalid/path")
}

// A processing task in the data model
func someProcessingTask1(path: String) {
    do {
        let data = try NSData(contentsOfFile: path, options: .DataReadingMappedIfSafe)
        processData(data)
    } catch let error as NSError {
        showAlert(error)
    }
}

// A utility function (part of neither data model nor user interface)
func showAlert(error: NSError) {
#if os(OSX)
    NSAlert(error: error).runModal()
#else
    let alert = UIAlertController(title: error.localizedDescription, message: error.localizedFailureReason, preferredStyle: UIAlertControllerStyle.Alert)
    alert.addAction(UIAlertAction(title: NSLocalizedString("OK", comment: ""), style: UIAlertActionStyle.Default, handler: nil))
    UIApplication.sharedApplication().windows[0].rootViewController!.presentViewController(alert, animated: true, completion: nil)
#endif
}
I'm ignoring situations where you have a clear understanding of the error conditions and the risks and can argue that a try! or a deliberate "no-op" catch is valid. This article is about handling situations where there is a degree of uncertainty so both of these options are excluded.

If you look about, you’ll see roughly this pattern repeated in many projects. It’s the result of not wanting to think about error handling but feeling obliged to put something into the catch block.

At least this approach does something in the event of an error. That makes this approach a step up from the typical Objective-C error handling (pass nil for the NSError ** parameter and ignore the problem entirely) or an empty catch block (which is the equivalent in Swift).

But despite appearing to handle the error, this approach can have some potentially serious problems.

Non-unique error information

On its own, the error dialog produced by the previous code may be helpful or it may be useless.

[Screenshot: the default error dialog presented at the site where the error occurred]

If a user reports that they’re seeing this error dialog in your program, do you have enough information to work out what’s happening and potentially fix the problem?

This error message tells us a given file couldn’t be found but it doesn’t tell us why we’re trying to open a non-existent file. Did we process the path incorrectly? Have we missed a step before getting here? How did the program get to this point? What has gone wrong to trigger this event?

If the only information we have is the error message it’s difficult to properly diagnose many situations. We have a problem that needs to be fixed but we either need to rely on intuition to find the problem or we need to reproduce the problem again in the debugger. The information we are given in this error message alone is insufficient.

For a “file not found” error like this, the default error message is more helpful than usual since the exact missing file is named (although the full path is omitted). Other errors typically have far more opaque default error messages that might be used generically across a range of different circumstances. Errors like “The file could not be played.”, “The operation couldn’t be completed.” are so broad as to be useless. And if you’re unlucky enough to get a POSIX error code, it could have been generated by lots of different functions for lots of different reasons – they are not unique.

Always propagate errors

But the amount of information reported isn’t the biggest problem. The biggest problem is presenting the error at the site where it occurs, rather than the site that triggered the overarching task. We need to propagate the error back to its origin in the someUserAction1 function, rather than trying to handle it in the middle of the task.

Without error propagation:

  • the user-interface may get stuck in a mid-task state
  • earlier stages in the task can’t attempt a retry or recover
  • we’re forcing the task’s presentation of the error, rather than giving the user interface controller that triggered the action the chance to present errors in a more integrated way

In Objective-C, propagating errors was really annoying (layer after layer of NSError** parameters). I personally avoided this where possible – storing the error in a state structure somewhere and instead propagating BOOL through the stack. This approach had numerous problems (shared state is a maintenance headache) but Objective-C’s idiomatic error handling was just miserable.

Swift’s throws keyword is one of the best features of the language. You may need to paint a lot of functions with the throws attribute (especially if you need to throw from deep inside a hierarchy) but it makes your interfaces honest.

What do we want?

Let’s look at a better approach…

// An action on a view controller, triggered by a button press
@IBAction func someUserAction2(sender: AnyObject) {
    do {
        try someProcessingTask2("/invalid/path")
    } catch let e as NSError {
        self.presentError(e)
    }
}

// A processing task in the data model
func someProcessingTask2(path: String) throws {
    try rethrowUnanticipated {
        let data = try NSData(contentsOfFile: path, options: .DataReadingMappedIfSafe)
        processData(data)
    }
}

Why is this better?

Error propagation improvements

The most significant difference is that the function responsible for starting the user action – someUserAction2 – is now the function responsible for presenting feedback. The highest priority in a user interface is to give feedback on the user’s actions; this control flow lets the user action that triggers the task be responsible for the display of the result.

Moving the presentation to the view controller in this way removes all view code from the model. In the first example, the data model was performing a view action (presenting user-interface feedback). This is a theoretical win for separation of concerns.

The presentError function is part of Cocoa on Mac OS X but isn’t normally available on iOS. I’ve provided an implementation on UIViewController to make this work. Even if you don’t choose to use presentError, it remains a good idea to pass your errors through a relevant controller associated with your view hierarchy. This gives you the ability to use custom presentation for errors at a later time by overriding the presentation method.
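For reference, a minimal sketch of such an extension might look like the following. This is not the CwlUtils implementation, just the general shape of the idea using a UIAlertController:

import UIKit

// A sketch only: loosely mirrors OS X's presentError behavior on iOS
extension UIViewController {
    func presentError(error: NSError) {
        let alert = UIAlertController(title: error.localizedDescription, message: error.localizedRecoverySuggestion, preferredStyle: .Alert)
        alert.addAction(UIAlertAction(title: NSLocalizedString("OK", comment: ""), style: .Default, handler: nil))
        presentViewController(alert, animated: true, completion: nil)
    }
}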

Diagnostic improvements

Now, I haven’t just let the error thrown by NSData propagate directly. That would be possible and it would work but the error dialog would be the same “default” error I showed in the “mediocre” example, above. In a situation where you know the cause of the error and you know that the localizedDescription for this error fully describes it to the user, this type of simple error reporting may be sufficient. However, this article focusses on errors that we haven’t anticipated, where we don’t know if the localizedDescription will be helpful at all.

We want more information to ensure easy problem diagnosis. The rethrowUnanticipated wrapper function adds an UnanticipatedErrorRecoveryAttempter to the userInfo dictionary of the error and the dialog becomes:

[Screenshot: the presentError dialog with the unanticipated error recovery attempter attached, showing “OK” and “Copy details” buttons]

and clicking the “Copy details” button places the following text on the clipboard:

CwlUtils_OSXHarness/1, x86_64/MacPro4,1, Version 10.11.4 (Build 15E65), en, fr

The file “path” couldn’t be opened because there is no such file.
The error occurred at line 61 of the CwlUtils_OSXHarness/AppDelegate.swift file in the program's code.

NSCocoaErrorDomain: 260. [NSFilePath: /invalid/path, NSLocalizedDescription: The file “path” couldn’t be opened because there is no such file., NSUnderlyingError: Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"]

1   CwlUtils_OSXHarness                 0x000000010000241c _TFC19CwlUtils_OSXHarness11AppDelegate19someProcessingTask2fzSST_ + 412
2   CwlUtils_OSXHarness                 0x00000001000025c2 _TFC19CwlUtils_OSXHarness11AppDelegate15someUserAction2fPs9AnyObject_T_ + 114
3   CwlUtils_OSXHarness                 0x0000000100002716 _TToFC19CwlUtils_OSXHarness11AppDelegate15someUserAction2fPs9AnyObject_T_ + 54
4   libsystem_trace.dylib               0x00007fff9caa107a _os_activity_initiate + 75
5   AppKit                              0x00007fff8c9ace89 -[NSApplication sendAction:to:from:] + 460
6   AppKit                              0x00007fff8c9befde -[NSControl sendAction:to:] + 86
7   AppKit                              0x00007fff8c9bef08 __26-[NSCell _sendActionFrom:]_block_invoke + 131
8   libsystem_trace.dylib               0x00007fff9caa107a _os_activity_initiate + 75
9   AppKit                              0x00007fff8c9bee65 -[NSCell _sendActionFrom:] + 144
10  libsystem_trace.dylib               0x00007fff9caa107a _os_activity_initiate + 75
11  AppKit                              0x00007fff8c9bd48a -[NSCell trackMouse:inRect:ofView:untilMouseUp:] + 2693
12  AppKit                              0x00007fff8ca05fd0 -[NSButtonCell trackMouse:inRect:ofView:untilMouseUp:] + 744
13  AppKit                              0x00007fff8c9bbbb4 -[NSControl mouseDown:] + 669
14  AppKit                              0x00007fff8cf10469 -[NSWindow _handleMouseDownEvent:isDelayedEvent:] + 6322
15  AppKit                              0x00007fff8cf1144d -[NSWindow _reallySendEvent:isDelayedEvent:] + 212
16  AppKit                              0x00007fff8c95063d -[NSWindow sendEvent:] + 517
17  AppKit                              0x00007fff8c8d0b3c -[NSApplication sendEvent:] + 2540
18  AppKit                              0x00007fff8c737ef6 -[NSApplication run] + 796
19  AppKit                              0x00007fff8c70146c NSApplicationMain + 1176
20  CwlUtils_OSXHarness                 0x0000000100002984 main + 84
21  libdyld.dylib                       0x00007fff982a45ad start + 1
22  ???                                 0x0000000000000003 0x0 + 3

How is it done?

No significant work happens in the rethrowUnanticipated function. It’s just a convenience wrapper that looks like this:

public func rethrowUnanticipated<T>(@noescape f: () throws -> T) throws -> T {
    do {
        return try f()
    } catch {
        throw error.withUnanticipatedErrorRecoveryAttempter()
    }
}

In situations where you want to selectively handle different error types or create your own errors, you would call withUnanticipatedErrorRecoveryAttempter on your error directly instead of using this convenience wrapper.
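As a rough sketch of what that direct usage could look like (someProcessingTask3 and the specific “file not found” check are purely illustrative; processData is the same placeholder used earlier):

// Handle the specific failures you understand; tag and rethrow everything else
func someProcessingTask3(path: String) throws {
    do {
        let data = try NSData(contentsOfFile: path, options: .DataReadingMappedIfSafe)
        processData(data)
    } catch let e as NSError where e.domain == NSCocoaErrorDomain && e.code == NSFileReadNoSuchFileError {
        // An anticipated condition we know how to handle: fall back to empty data
        processData(NSData())
    } catch {
        // Anything else is unanticipated: attach diagnostics and rethrow
        throw error.withUnanticipatedErrorRecoveryAttempter()
    }
}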

In any case, it’s withUnanticipatedErrorRecoveryAttempter that’s important. It converts the ErrorType to an NSError (if it wasn’t one already) and adds keys to the userInfo dictionary so that the error can participate in Cocoa’s error recovery system.

The mechanics of presentError and NSRecoveryAttempterErrorKey are fairly straightforward and you can read about them in Apple’s ‘Recover From Errors’ documentation. Obviously, in this case, we’re not strictly “recovering” from an error, we’re just attaching diagnostic information.

Let’s look then at how we attach all this information.

extension ErrorType {
    public func withUnanticipatedErrorRecoveryAttempter(file: String = #file, line: Int = #line) -> NSError {
        // We want to preserve the "userInfo" dictionary, so we avoid "self as NSError"
        // if we can (since it creates a new NSError that doesn't preserve the userInfo).
        // Instead, we cast *via* NSObject.
        let e = ((self as? NSObject) as? NSError) ?? (self as NSError)
        var userInfo: [NSObject: AnyObject] = e.userInfo
        
        // Move any existing NSLocalizedRecoverySuggestionErrorKey to a new key (we want
        // to replace it but don't want to lose potentially useful information)
        if let previousSuggestion = userInfo[NSLocalizedRecoverySuggestionErrorKey] {
            userInfo[UnanticipatedErrorRecoveryAttempter.PreviousRecoverySuggestionKey] = previousSuggestion
        }
        
        // Attach a new NSLocalizedRecoverySuggestionErrorKey and our recovery attempter
        // and options
        let directory = ((file as NSString).stringByDeletingLastPathComponent as NSString).lastPathComponent
        let filename = (file as NSString).lastPathComponent
        let suggestion = NSString(format: NSLocalizedString("The error occurred at line %ld of the %@/%@ file in the program's code.", comment: ""), line, directory, filename)
        userInfo[NSLocalizedRecoverySuggestionErrorKey] = suggestion
        userInfo[NSLocalizedRecoveryOptionsErrorKey] = UnanticipatedErrorRecoveryAttempter.localizedRecoveryOptions()
        userInfo[NSRecoveryAttempterErrorKey] = UnanticipatedErrorRecoveryAttempter()
        
        // Attach the call stack
        userInfo[UnanticipatedErrorRecoveryAttempter.ReturnAddressesKey] = callStackReturnAddresses()
        
        return NSError(domain: e.domain, code: e.code, userInfo: userInfo)
    }
}

Getting a little mileage out of my own code, this uses callStackReturnAddresses and symbolsForCallStackAddresses from Tracing tasks with stack traces in Swift.

When the error this function returns is passed to presentError it will use the NSRecoveryAttempterErrorKey added to the userInfo dictionary.

This UnanticipatedErrorRecoveryAttempter provides implementations for the NSErrorRecoveryAttempting methods and uses them to put an OK and a Copy details button in the presentError dialog. When the Copy details button is pressed, it generates a string to put on the generalPasteboard using the following function:

private func extendedErrorInformation(error: NSError) -> String {
    var userInfo = error.userInfo
    
    // Fetch and format diagnostic information for display
    let callStackSymbols = (userInfo[ErrorTypeReturnAddressesKey] as? [UInt]).map {
        symbolsForCallStackAddresses($0).joinWithSeparator("\n")
    } ?? NSLocalizedString("(Call stack unavailable)", comment: "")
    let localizedDescription = error.localizedDescription
    let localizedRecoverySuggestion = error.localizedRecoverySuggestion ?? ""
    let applicationName = (NSBundle.mainBundle().infoDictionary?[kCFBundleNameKey as String] as? String) ?? NSProcessInfo.processInfo().processName
    let applicationVersion = (NSBundle.mainBundle().infoDictionary?[kCFBundleVersionKey as String] as? String) ?? NSLocalizedString("(App version unavailable)", comment: "")
    let locales = NSLocale.preferredLanguages().joinWithSeparator(", ")
    let machineInfo = "\(Sysctl.machine)/\(Sysctl.model), \(NSProcessInfo.processInfo().operatingSystemVersionString)"
    
    // Remove already handled keys from the userInfo.
    userInfo.removeValueForKey(NSLocalizedDescriptionKey)
    userInfo.removeValueForKey(NSLocalizedRecoverySuggestionErrorKey)
    userInfo.removeValueForKey(NSLocalizedRecoveryOptionsErrorKey)
    userInfo.removeValueForKey(NSRecoveryAttempterErrorKey)
    userInfo.removeValueForKey(ErrorTypeRecoveryAttemptCheckKey)
    userInfo.removeValueForKey(ErrorTypeReturnAddressesKey)
    
    return "\(applicationName)/\(applicationVersion), \(machineInfo), \(locales)\n\n\(localizedDescription)\n\(localizedRecoverySuggestion)\n\n\(error.domain): \(error.code). \(userInfo)\n\n\(callStackSymbols)"
}

This includes “machine” and “model” information, generated as described in my earlier article Gathering system information in Swift with sysctl.

Usage

The code presented in this article is part of the CwlUnanticipatedError.swift file in my CwlUtils project on Github.

The ReadMe.md file for the project contains detailed information on cloning the whole repository and adding the framework it produces to your own projects.

If you want to play with the example error handling code used in this article, it is part of the “CwlUtils_OSXHarness” and “CwlUtils_iOSHarness” targets which each produce a basic app with some buttons to trigger errors.

Conclusion

This article discussed two key points:

  1. The importance of propagating errors all the way back to their origin
  2. Using an error that embeds an NSErrorRecoveryAttempting implementation for diagnostic purposes

Combined with Swift’s error handling, Cocoa’s error reporting capabilities and a few wrapper functions, these two techniques provide a solid base level of error management that you can start to use even at a hastily implemented prototype stage, and iteratively replace with better error handling and reporting as needed.

In a fully-tested program, this type of error reporting is not what the user should see. The presence of line numbers and the mention of the “program’s code” is a cue to any mildly savvy user that this dialog is a diagnostic tool, not a deliberate feature. While the user may be able to use the information to resolve the problem on their own, the wording hints that they should send the information behind the “Copy details” button through appropriate support channels if the problem persists.

Comparing Swift to C++ for parsing


I rewrote the C++ implementation of Swift’s Demangle.cpp in Swift. My reimplementation is completely standalone and can be dropped directly into any Swift project.

On its own though, that isn’t necessarily very interesting. Demangling Swift names isn’t a very common task – most Swift/Cocoa reflection functions will return an already demangled name. You usually have to go through C functions to get a mangled name in the first place.

To make it interesting, I turned the exercise into an opportunity to compare C++ and Swift – to see if Swift could be used as a C++ replacement for relatively low level tasks like a recursive descent parser. Would C++ do things with preprocessor macros or templates and metaprogramming that can’t be recreated in Swift, leading to cumbersome workarounds? Would Swift turn out to have critical performance problems?

And if I could clear all the technical hurdles, could I improve on the C++ implementation by making it more “Swifty”? What would idiomatic Swift look like for this type of work?

As an aside to the whole language comparison, the ScalarScanner that I implemented as part of the parser is available separately and might be useful if you’re looking for an alternative to using NSScanner, NSRegularExpression or other tools poorly suited to character-by-character parsing.

A specific style of C++

I’m going to compare my opinion of what idiomatic Swift is to the style of C++ found in Demangle.cpp from the Swift project.

I’m going to refer to things that C++ does but I really mean: the style of C++ used in Demangle.cpp. Of course, there are lots of different ways of writing C++ and the C++ on display in Demangle.cpp is a very narrow subset. Demangle.cpp uses a particular, minimalist, conservative style. I will discuss some possible pragmatic reasons why Demangle.cpp is so minimalist but for now, please keep in mind: I’m not saying C++ can’t do things differently but that a mainstream C++ compiler project from experienced, professional developers chooses to do things this way.

A Swift UnicodeScalar scanner

Starting at the beginning, the most important tool for writing a recursive descent parser is a scanner (character, string and other token reader) for traversing the input data.

Demangle.cpp defines its own scanner class named NameSource for traversing data provided by an LLVM StringRef.

class NameSource {
  StringRef Text;
public:
  NameSource(StringRef text);

  bool hasAtLeast(size_t len);
  bool isEmpty();
  explicit operator bool();
  char peek();
  char next();
  bool nextIf(char c);
  bool nextIf(StringRef str);
  StringRef slice(size_t len);
  StringRef str();
  void advanceOffset(size_t len);
  StringRef getString();
  bool readUntil(char c, std::string &result);
};

We need something with similar functionality. The obvious choice for this in Cocoa is usually NSScanner (or a subclass) but for a number of reasons, NSScanner isn’t really the right tool for this type of job. In particular: it doesn’t read single characters, it doesn’t peek, you can’t inline Objective-C method invocations and it doesn’t use Swift error handling.

So I wrote ScalarScanner to behave like a Swift version of NameSource:

public struct ScalarScanner<C: CollectionType, T where C.Generator.Element == UnicodeScalar, C.Index: BidirectionalIndexType, C.Index: Comparable> {
    let scalars: C
    var index: C.Index
    
    /// Entirely for user use
    public var context: T
    
    public init(scalars: C, context: T)
    public init(string: String, context: T)
    
    public mutating func matchString(value: String) throws
    public mutating func matchScalar(value: UnicodeScalar) throws
    public mutating func readUntil(scalar: UnicodeScalar) throws -> String
    public mutating func skip(count: Int = default) throws
    public mutating func backtrack(count: Int = default) throws
    public mutating func remainder() -> String
    public mutating func conditionalString(value: String) -> Bool
    public mutating func conditionalScalar(value: UnicodeScalar) -> Bool
    public func requirePeek() throws -> UnicodeScalar
    public func peek(ahead: Int = default) -> UnicodeScalar?
    public mutating func readScalar() throws -> UnicodeScalar
    public mutating func readInt() throws -> Int
    public mutating func readScalars(count: Int) throws -> String
    public func unexpectedError() -> CwlDemangle.ScalarScannerError
    
    // Any `NSCharacterSet`-like functionality could be applied using these
    public mutating func skipWhileTrue(@noescape f: UnicodeScalar -> Bool)
    public mutating func readWhileTrue(@noescape f: UnicodeScalar -> Bool) -> String
}

From the first line, we can see a difference between the Swift and C++ versions. Swift has a standard set of protocols and constraints for defining data providers, so it makes sense to use them. C++ could use a template parameter to define the data provider but since C++ lacks an equivalent to Swift’s protocol constraints and lacks a corresponding set of standard behaviors, the multiple constraints for the data provider would be a confusing black box thrust upon any user of NameSource – likely manifesting as weird errors in internal headers if any requirements were not met.

Unless there’s a strong need for abstraction in C++, it’s not worth the effort. Meanwhile, in Swift, there’s no reason to make collections, sequences and other common providers concrete.

This has immediate reuse benefits for Swift. This ScalarScanner can read equally from an Array<UnicodeScalar> or from a String.UnicodeScalarView and is completely reusable in any Swift project. Meanwhile NameSource is completely dependent on LLVM’s StringRef and would need to be rewritten to be used with a different data provider.
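As a quick usage sketch based on the interface above (the context parameter is whatever per-parse state the caller wants to carry; the demangler uses a [SwiftName] array, so that’s what’s shown here):

// Construct a scanner over the scalars of a sample mangled string and consume its prefix
var scanner = ScalarScanner(scalars: Array("_TtSi".unicodeScalars), context: [SwiftName]())
do {
    try scanner.matchString("_Tt")
    let kind = try scanner.readScalar()    // "S"
    print(kind)
} catch {
    print("parse failed: \(error)")
}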

Translation is boring

I translated about 4200 lines of C++ into about 2800 lines of Swift in roughly 5 hours. All that typing, copying, regex replacing and glancing between code files is as mind numbing as you’d expect.

The answer to my question about whether there would be any difficulties is: there were no major hurdles at all. Demangle.cpp is a very clear, simple, easy to translate file.

Later, because I’m a glutton for punishment, I spent another few hours refactoring to make things a little more “Swifty” and get the line count down under 2100.

Biggest difference between Swift and C++: error handling

The initial implementation – even before any refactoring to make the code more “Swifty” – included Swift error handling instead of the C++ approach of returning a nullptr instead of a NodePointer on error. Swift error handling is such an obvious inclusion that it doesn’t require a second pass or a rethink, even during a relatively mindless translation task.

You can see how simple it is to switch from nullptr results to Swift error handling from the first line of the parser:

C++:

if (!Mangled.nextIf("_T"))
  return nullptr;

Swift:

try scanner.matchString("_T")

The difference between these two approaches is profound.

Most importantly, there are significant benefits for reliability. When Tony Hoare called null his “billion dollar mistake”, he was referring to the likelihood of bugs when a potentially null result is not checked for null. Looking at the Git history for Demangle.cpp reveals this happening on more than one occasion – and this doesn’t include bugs that may have been fixed prior to committing. Using nullptr to indicate failure has a real-world maintenance cost.

Also of potential benefit for maintenance: the Swift approach includes contextual information. All errors thrown by the parser include the index where the parse failed and brief information about the type of token that couldn’t be read. If something goes wrong, there’s at least some information about why the failure occurred.

But the most visible difference in adopting Swift error handling is a significant reduction in code size. Switching to Swift error handling immediately eliminated 149 return nullptr early exit lines from the C++ version. Furthermore, Swift can happily exit from a function in the middle of an expression when a parse attempt fails instead of needing to break expressions into multiple pieces either side of early exits.

For example, the following C++:

#define DEMANGLE_CHILD_OR_RETURN(PARENT, CHILD_KIND) do { \
   auto _node = demangle##CHILD_KIND();                  \
   if (!_node) return nullptr;                           \
   (PARENT)->addChild(std::move(_node));                 \
} while (false)

auto metaclass = NodeFactory::create(Node::Kind::Metaclass);
DEMANGLE_CHILD_OR_RETURN(metaclass, Type);
return metaclass;

Is reduced down to:

return SwiftName(kind: .Metaclass, children: [try demangleType(&scanner)])

Swift is happy to run demangleType and exit before proceeding with construction of the Array or SwiftName.

C++ exceptions versus Swift error handling

C++ could have used exceptions to improve syntactic efficiency and achieve some of the same effects as Swift error handling. Why rely on error prone nullptr to communicate results in Demangle.cpp?

Many large C++ projects – Swift included – are compiled with C++ exceptions entirely disabled. Why deliberately remove a potentially useful feature from the language? The Swift developers answer this question when considering error handling options for Swift:

C++ aspires to making out-of-memory a recoverable condition, and so allocation can throw […] Since constructors are called pervasively and implicitly, it makes sense for the default rule to be that all functions can throw […] Different error sites occur with a different set of cleanups active, and there are a large number of such sites. In fact, prior to C++11, compilers were forced to assume by default that destructor calls could throw, so cleanups actually created more error sites. This all adds up to a significant code-size penalty for exceptions, even in projects which don’t directly use them and which have no interest in recovering from out-of-memory conditions.

C++ exceptions bloat the entire project – even if you only use them sparingly. The code size increase alone can start to slow your program. Then, if you actually throw an exception, the stack unwinding is between 4 and 100 times slower than simply returning a value from the function (depending on compiler and host details). And you have to maintain rigorous RAII discipline or risk leaking memory. And you have to be careful to avoid exceptions in destructors.

Exceptions are not well-loved.

Conditionals and switch statements

The majority of any recursive descent parser is conditional logic based on the token encountered. Accordingly, the C++ parser is filled with functions that look like this:

if (Mangled.nextIf('M')) {
  if (Mangled.nextIf('P')) {
    auto pattern = NodeFactory::create(Node::Kind::GenericTypeMetadataPattern);
    DEMANGLE_CHILD_OR_RETURN(pattern, Type);
    return pattern;
  }
  if (Mangled.nextIf('a')) {
    auto accessor = NodeFactory::create(Node::Kind::TypeMetadataAccessFunction);
    DEMANGLE_CHILD_OR_RETURN(accessor, Type);
    return accessor;
  }
  // ... and so on
}

The control flow here shows two tiers of conditionals, checking for 'M' at the top level and 'P' and 'a' at the second.

With Swift switch statements, we can check both tiers of conditionals simultaneously:

switch (try scanner.readScalar(), try scanner.readScalar()) {
case ("M", "P"):
    return SwiftName(kind: .GenericTypeMetadataPattern, children: [try demangleType(&scanner)])
case ("M", "a"):
    return SwiftName(kind: .TypeMetadataAccessFunction, children: [try demangleType(&scanner)])
// ... and so on
}

Technically, C++ could pack two chars into a uint16_t and switch on that, just like I have done in Swift. With macros, you could probably make the case label vaguely readable. The point is really that this wouldn’t be idiomatic C++ so you’d need to weigh the syntactic benefit against the familiarity shock.

Let’s look at some non-idiomatic C++ and see how that usually goes. The following excerpt from Demangle.cpp is an implementation of a two tier conditional construct in C++:

enum class ImplConventionContext { Callee, Parameter, Result };

StringRef demangleImplConvention(ImplConventionContext ctxt) {
#define CASE(CHAR, FOR_CALLEE, FOR_PARAMETER, FOR_RESULT)            \
  if (Mangled.nextIf(CHAR)) {                                      \
    switch (ctxt) {                                                \
    case ImplConventionContext::Callee: return (FOR_CALLEE);       \
    case ImplConventionContext::Parameter: return (FOR_PARAMETER); \
    case ImplConventionContext::Result: return (FOR_RESULT);       \
    }                                                              \
    unreachable("bad context");                               \
  }
  auto Nothing = StringRef();
  CASE('a', Nothing, Nothing, "@autoreleased")
  CASE('d', "@callee_unowned", "@unowned", "@unowned")
  CASE('D', Nothing, Nothing, "@unowned_inner_pointer")
  CASE('g', "@callee_guaranteed", "@guaranteed", Nothing)
  CASE('e', Nothing, "@deallocating", Nothing)
  CASE('i', Nothing, "@in", "@out")
  CASE('l', Nothing, "@inout", Nothing)
  CASE('o', "@callee_owned", "@owned", "@owned")
  return Nothing;
#undef CASE
}

This was the trickiest construct from Demangle.cpp to translate into Swift because it’s so unconventional. It was an effort to work out, given possible values of ctxt on input, what were the possible results from the function.

That’s the result of doing something non-idiomatic in any language: it can take a little time to get your head around what it’s trying to do. Basically, it tries to consume the char in the left column of each CASE and if it succeeds, it will return the result from the second, third or fourth column, as determined by the ctxt parameter passed into the function.

Now, the demangleImplConvention function is used from two locations: demangleImplCalleeConvention and demangleImplParameterOrResult. The former always invokes demangleImplConvention with ctxt equal to ImplConventionContext::Callee and the latter goes through the following logic to determine the ctxt parameter:

auto getContext = [](Node::Kind kind) -> ImplConventionContext {
  if (kind == Node::Kind::ImplParameter)
    return ImplConventionContext::Parameter;
  else if (kind == Node::Kind::ImplResult
           || kind == Node::Kind::ImplErrorResult)
    return ImplConventionContext::Result;
  else
    unreachable("unexpected node kind");
};

auto convention = demangleImplConvention(getContext(kind));
if (convention.empty()) return nullptr;

The relevant point is that the ImplConventionContext enum used as input to demangleImplConvention is largely pointless. The getContext closure maps Node::Kind::ImplParameter onto ImplConventionContext::Parameter and Node::Kind::ImplResult/Node::Kind::ImplErrorResult onto ImplConventionContext::Result. We might as well eliminate the ImplConventionContext type entirely and use Node::Kind instead, at which point both of the previous two C++ code blocks reduce to the following Swift switch statement:

func demangleImplConvention(inout scanner: ScalarScanner<[SwiftName]>, kind: SwiftName.Kind) throws -> String {
    switch (try scanner.readScalar(), (kind == .ImplErrorResult ? .ImplResult : kind)) {
    case ("a", .ImplResult): return "@autoreleased"
    case ("d", .ImplConvention): return "@callee_unowned"
    case ("d", .ImplParameter): fallthrough
    case ("d", .ImplResult): return "@unowned"
    case ("D", .ImplResult): return "@unowned_inner_pointer"
    case ("g", .ImplConvention): return "@callee_guaranteed"
    case ("g", .ImplParameter): return "@guaranteed"
    case ("e", .ImplParameter): return "@deallocating"
    case ("i", .ImplParameter): return "@in"
    case ("i", .ImplResult): return "@out"
    case ("l", .ImplParameter): return "@inout"
    case ("o", .ImplConvention): return "@callee_owned"
    case ("o", .ImplParameter): fallthrough
    case ("o", .ImplResult): return "@owned"
    case ("t", _): return "@convention(thin)"
    default: throw scanner.unexpectedError()
    }
}

This shows the advantage of a two value switch being idiomatic in Swift: compared to the strange series of #define, if and switch constructions in C++, this is clear, simple and readable. If you have a look, you’ll notice that the different columns from the C++ statement have become different rows in this example – it’s a two dimensional switch but the elements are still presented in a single column. (The appearance of a “t” case is not an accident; it’s an additional case rolled in from the demangleImplCalleeConvention call site.)

When we use a simple switch like this, it’s not just clearer to the programmer. It’s also clearer to the compiler. In the Swift case, you’d get warned by the compiler if you accidentally had two duplicate cases. You’re not going to get a warning about this in the C++ code… which is how the “D” case was accidentally labelled “d” for 2 years in the Swift compiler until I created a pull request to fix the problem while writing this article.

Aggregate processing

Thus far, I’ve been looking at code changes in the parser but Demangle.cpp actually includes two different parts:

  • the parser (which reads a mangled string into a tree of nodes) and
  • the printer (that serializes the node tree back to an unmangled string).

Swift error handling and switch statements were a big help in the parser but they don’t offer much to the printer. The print function is already a single massive switch statement and it doesn’t need to return any error results.

Let’s look at a typical case from the NodePrinter::print function in C++:

case Node::Kind::TupleElement:
  if (pointer->getNumChildren() == 1) {
    NodePointer type = pointer->getChild(0);
    print(type);
  } else if (pointer->getNumChildren() == 2) {
    NodePointer id = pointer->getChild(0);
    NodePointer type = pointer->getChild(1);
    print(id);
    print(type);
  }
  return;

On my first pass, I wasn’t able to think of anything helpful and the SwiftName.print I wrote ended up looking very similar to the C++.

case .TupleElement:
    if children.count == 1 {
        children[0].print(&output)
    } else if children.count == 2 {
        children[0].print(&output)
        children[1].print(&output)
    }

In my code, the SwiftName prints itself, rather than requiring a separate NodePrinter class, so the actual output stream is passed as the first parameter (named output here and it’s a Swift.OutputStreamType). Other than that difference, the code is structurally identical.

How do you make this more “Swifty”?

Finally, I concluded that idiomatic Swift implies processing collections – in this case, the children array – in a different way. Idiomatic Swift collection processing implies:

  • Avoid accessing elements by index.
  • Act on the collection unconditionally (conditionals should be inside your operators, not around them).
  • Don’t loop over or manually enumerate collection contents (act declaratively on the whole collection).

That led me to this:

case .TupleElement:
    output.write(children.slice(0..<2)) { output, child in
        child.print(&output)
    }

Unlike the typical slice subscript in Swift, e.g. children[0..<2], the version I’m using limits the slice range to the valid indices of the collection. This means there’s no need to check the size of the collection before creating the slice, which enables unconditional behavior and reduces the possibility of programming errors leading to fatal errors.
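If you’re wondering what that bounds-limited slice might look like, here’s a rough sketch of the idea (the real version is in the CwlDemangle sources; this is just an approximation):

// Clamp the requested range to the array's valid indices so out-of-range requests
// return a shorter (possibly empty) slice instead of trapping
extension Array {
    func slice(range: Range<Int>) -> ArraySlice<Element> {
        let start = Swift.max(0, Swift.min(range.startIndex, count))
        let end = Swift.max(start, Swift.min(range.endIndex, count))
        return self[start..<end]
    }
}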

Obviously, this isn’t the typical write function on OutputStreamType but is instead a new overload which iterates over a sequence and runs a render closure to actually write to the OutputStreamType – in this case, that means recursing into the next level of SwiftName.print.

Now, it might not be clear why I’ve chosen to implement this as a function on OutputStreamType instead of simply running a forEach over the children. The answer is that this approach allows the OutputStreamType to insert prefixes, separators, suffixes or other labels into the stream between iterations of the loop, making comma separated lists a simple task.
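To show the shape of that overload, here’s a hedged sketch. The parameter names are guesses at the real CwlDemangle.swift version, and the labels variant used below is a separate overload not shown here:

// A sketch only: iterate a sequence, writing optional prefix/separator/suffix text
// around a render closure that does the per-element output
extension OutputStreamType {
    mutating func write<S: SequenceType>(sequence: S, prefix: String? = nil, separator: String? = nil, suffix: String? = nil, render: (inout Self, S.Generator.Element) -> ()) {
        if let p = prefix { write(p) }
        var first = true
        for element in sequence {
            if let s = separator where !first { write(s) }
            render(&self, element)
            first = false
        }
        if let s = suffix { write(s) }
    }
}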

For example, the following C++:

case Node::Kind::ProtocolConformance: {
  NodePointer child0 = pointer->getChild(0);
  NodePointer child1 = pointer->getChild(1);
  NodePointer child2 = pointer->getChild(2);
  print(child0);
  Printer << " : ";
  print(child1);
  Printer << " in ";
  print(child2);
  return;
}

becomes:

case .ProtocolConformance:
    output.write(children.slice(0..<3), labels: [nil, " : ", " in "]) { $1.print(&$0) }

Not all cases in the print function are this simple. In Demangle.cpp, many involve iteration over part of the children, terminated by different conditions, where rendering isn’t a simple recursive call to printing the child within.

An example of one of the trickier cases is DependentGenericSignature:

case Node::Kind::DependentGenericSignature: {
  Printer << '<';
  unsigned depth = 0;
  unsigned numChildren = pointer->getNumChildren();
  for (; depth < numChildren
         && pointer->getChild(depth)->getKind() == Node::Kind::DependentGenericParamCount;
       ++depth) {
    if (depth != 0)
      Printer << "><";
    unsigned count = pointer->getChild(depth)->getIndex();
    for (unsigned index = 0; index < count; ++index) {
      if (index != 0)
        Printer << ", ";
      Printer << archetypeName(index, depth);
    }
  }
  if (depth != numChildren) {
    Printer << " where ";
    for (unsigned i = depth; i < numChildren; ++i) {
      if (i > depth)
        Printer << ", ";
      print(pointer->getChild(i));
    }
  }
  Printer << '>';
  return;
}

The logic here is difficult to dramatically simplify but if the operators you use to act on your collections aren’t composable (in particular, able to wrap recursively around each other), then you probably haven’t designed them correctly.

The nested loops over children from the C++ example are expressed as nested calls to OutputStreamType.write and the loop condition on the outer loop is expressed as a filter on the input sequence.

case .DependentGenericSignature:
    let filteredChildren = children.lazy.filter { $0.kind == .DependentGenericParamCount }.enumerate()
    var lastDepth = 0
    output.write(filteredChildren, prefix: "<", separator: "><") { o, t in
        lastDepth = t.index
        o.write(0..<t.element.indexFromContents(), separator: ", ") {
            $0.write(archetypeName($1, UInt32(t.index)))
        }
    }
    let prefix = (lastDepth + 1 < children.endIndex) ? " where " : ""
    let s = children[(lastDepth + 1)..<children.endIndex]
    output.write(s, prefix: prefix, separator: ", ", suffix: ">") { $1.print(&$0) }

No loops, no conditionals (except for a ternary operator), no direct accesses by index and the collection is processed by aggregate statements (in two parts but that’s fine).

Why does pragmatic C++ avoid abstraction?

The abstractions I used to improve the printing could easily be implemented in C++. But despite the standard library including a <functional> header (it’s included in Demangle.cpp although I’m not exactly sure where it’s used), C++ is not typically used in a functional fashion.

Part of the problem, as I’ve already discussed, is that a lack of protocols and constraints in C++ makes actions on different kinds of collection difficult. Accidentally providing a type that doesn’t meet the requirements of a template parameter and getting an esoteric error buried deep inside a header you didn’t write is frustrating and slow to resolve.

But there’s more to it than that. If you look through large codebases like llvm and Swift, you’ll see that they are very spartan. While Demangle.cpp contains a few minor C++11 niceties, in general, most of the code could have been written in the late 1990s. Most types have no template parameters. Most iteration is done using manual indexes. There’s no “map”, “reduce”, “filter” or other aggregate processing. And there’s no significant library of reusable functionality either – Demangle.cpp is part of a large compiler project but it has to implement its own token scanner.

Why does “best practice” in C++ often involve writing C++ like it’s C?

I’m often reminded of one of C++’s founding principles: you should only pay for those features that you use. The unfortunate corollary is that you’re forced to pay for everything you use.

C++’s compilation model requires substantial header includes and the more features you use, the slower and more painful compilation becomes. Templates must appear in the include path, rather than a separate compilation unit, adding significant complexity for every feature.

The end result is that large C++ projects work better when each file keeps to itself and minimizes includes.

I hope these problems don’t end up affecting Swift. In general, it feels like there’s less resistance in Swift towards using language features and abstractions because less baggage is brought along. No need for headers or declaring before usage. Swift uses easier-to-reason-about generics and protocols rather than templates, and Swift thankfully uses nothing at all rather than metaprogramming. Additionally, the fact that reference counting, algebraic types and optionals are all part of the language, rather than bolted on using library features, makes abstractions cleaner, less leaky and simpler.

A quick comparison of performance

I don’t want to over emphasise the importance of performance here, since I’m fairly sure Demangle.cpp wasn’t written with performance as a primary consideration and neither was my Swift implementation. But I was curious anyway and I’m sure other people would be, also.

I extracted the full set of mangled strings from Apple’s manglings.txt test file and ran both a parse and print to string 10,000 times in a loop. For Apple’s Demangle.cpp version (C++), the loop was:

for (llvm::StringRef name : names) {
  for (int i = 0; i < 10000; ++i) {
    swift::Demangle::NodePointer pointer = swift::demangle_wrappers::demangleSymbolAsNode(name);
    std::string string = swift::Demangle::nodeToString(pointer, options);
  }
}

For my own CwlDemangle.swift, the loop was:

// NOTE: `input` here is `Array<UnicodeScalar>`, not `String`, to avoid conversion costs inside the loop.
for input in inputs {
    for _ in 0..<10_000 {
        _ = (try? demangleSwiftName(input).description) ?? input.reduce("") { return $0 + String($1) }
    }
}

I built the C++ code with the Swift project’s build script with the “ReleaseAssert” settings (which is -O2, I think) and Swift at -O.

The C++ version took 21.202s and the Swift version took 17.110s (Swift was about 20% faster).

Please don’t take this to mean that Swift is 20% faster than C++. That’s not what these numbers mean at all. In fact, at a glance it appears that all these numbers show is that the C++ version used a separately allocated pointer for each NodePointer but the corresponding SwiftName in the Swift version is just a plain struct. Yes, the C++ needs the separate allocation to avoid a copy-constructor on Node when used with std::vector but an optimized copy constructor (or an alternative to std::vector) could be easily used if performance was a serious issue.

Let’s take a quick look at the top 10 “Time Profiler” results with “Invert call tree” selected:

C++

    Running Time    Self (ms)  Symbol Name
3766.0ms   17.2%    3766.0     szone_free_definite_size
1830.0ms    8.4%    1830.0     szone_malloc_should_clear
1741.0ms    7.9%    1741.0     tiny_malloc_from_free_list
1607.0ms    7.3%    1607.0     std::__1::__shared_weak_count::__release_shared()
1422.0ms    6.5%    1422.0     _os_lock_spin_lock
1232.0ms    5.6%    1232.0     tiny_free_list_add_ptr
1147.0ms    5.2%    1147.0     szone_size
1079.0ms    4.9%    1079.0     tiny_free_list_remove_ptr
876.0ms     4.0%     876.0     std::__1::__shared_weak_count::__add_shared()
679.0ms     3.1%     679.0     get_tiny_free_size

Swift

    Running Time    Self (ms)  Symbol Name
2115.0ms   12.3%    2115.0     _swift_release_(swift::HeapObject*)
1557.0ms    9.0%    1557.0     _swift_retain_(swift::HeapObject*)
1159.0ms    6.7%    1159.0     szone_free_definite_size
771.0ms     4.5%     771.0     szone_malloc_should_clear
770.0ms     4.5%     770.0     tiny_free_list_add_ptr
769.0ms     4.4%     769.0     _StringCore._claimCapacity(Int, minElementWidth : Int) -> (Int, COpaquePointer)
716.0ms     4.1%     716.0     tiny_malloc_from_free_list
645.0ms     3.7%     645.0     szone_size
569.0ms     3.3%     569.0     _os_lock_spin_lock
568.0ms     3.3%     568.0     _os_lock_handoff_lock

Bluntly: neither case is particularly optimized. These numbers represent roughly 100,000 names parsed and printed per second but optimized versions of these parsers would be an order of magnitude faster and these memory allocation, deallocation and reference counting calls would barely feature.

Despite C++ coming in slower, I would expect it to be easier to eliminate reference counting and dynamic string and array storage from C++. The reason is that these features are largely opt-in in C++ but opt-out in Swift. A couple of days optimizing both codebases would probably see the performance lead swing the other way, as the time spent in _swift_release_ and _swift_retain_ would be much harder to eliminate (not impossible, just harder) than the other calls in these lists.

In any case, I think it’s obvious that for parsers of this nature, C++ and Swift are “similar enough” in performance.

Some minor points against Swift

The reason retains and releases are difficult to eliminate in Swift is that there’s no “unique pointer” (all reference types are shared pointers). But retains and releases are usually triggered implicitly depending on the @owned or @guaranteed nature of function parameters (attributes you might not know about unless you’ve read Swift’s Calling Convention carefully). Dancing around implicit rules like this can vary from difficult to impossible. It would be good to get some help in Instruments for locating lines of code that trigger retains and releases and reporting why they occur (the variables and the rules involved), since reading the assembly and trying to connect it back to the Swift code is slow and error-prone.

Changing topic from performance to safety, my implementation includes the methods: Array.at and Array.slice. These extensions to Array include non-fatal bounds checks on Array accesses. With Swift’s emphasis on safety in other cases, I’m surprised that this functionality is not part of the standard library. I feel like I need to include these in every project.
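To give a rough idea of what I mean, here is a sketch of the concept rather than the exact code from the project (the error type is invented here purely for illustration):

enum BoundsError: Error { case outOfRange }

extension Array {
    // Throwing, non-trapping element access
    func at(_ index: Int) throws -> Element {
        guard indices.contains(index) else { throw BoundsError.outOfRange }
        return self[index]
    }

    // Throwing, non-trapping subrange access
    func slice(_ range: Range<Int>) throws -> ArraySlice<Element> {
        guard range.lowerBound >= startIndex && range.upperBound <= endIndex else {
            throw BoundsError.outOfRange
        }
        return self[range]
    }
}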

Another place where the Swift standard library could add a little more functionality is with UnicodeScalar. Converting between ASCII numerals stored in a UnicodeScalar and UInt32 is a real pain. Being explicit is fine but since UnicodeScalar is a UInt32 under the hood, it would be nice if the two could be used more naturally together for integer arithmetic so I don’t feel like I need to unwrap and rewrap the UnicodeScalar twice in every expression.
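For example, something as simple as turning an ASCII digit into its numeric value involves this kind of dance (illustrative code, not taken from the project):

let digit: UnicodeScalar = "7"
let zero: UnicodeScalar = "0"

// Unwrap both scalars to UInt32 before any arithmetic is possible
let value = digit.value - zero.value   // 7

// Rewrap to get back to a UnicodeScalar (a failable initializer in Swift 3 and later)
let scalarAgain = UnicodeScalar(zero.value + value)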

It would also be nice to have an ASCIILiteralType in Swift (or something else for constructing UInt8 from a string literal). The absence of such a type is the biggest reason I opted to use UnicodeScalar as my underlying element type, even though the code only ever processes UTF8 code units and is therefore wasting 25 bits of storage out of every 32. It doesn’t obviously impact performance here but if the source data was multiple kilobytes instead of a few dozen bytes, it might cause a bigger problem.

Finally, despite having an OptionSetType for managing bitfields, defining their values is still a nuisance in Swift. It would be nice to have an enum that could constrain values to powers of two by default (while allowing certain cases to have non-power-of-two values for multi-value masks) and could work together with OptionSetType to provide values/names for bits in the set.
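For reference, this is roughly what the status quo looks like (a generic example using the Swift 3 name OptionSet, formerly OptionSetType; the option names here are invented for illustration). Every power of two is written out by hand and multi-value masks are composed manually:

struct DemangleOptions: OptionSet {
    let rawValue: Int

    static let synthesizeSugarOnTypes = DemangleOptions(rawValue: 1 << 0)
    static let displayDebuggerGeneratedModule = DemangleOptions(rawValue: 1 << 1)
    static let qualifyEntities = DemangleOptions(rawValue: 1 << 2)

    // A multi-value mask still has to be spelled out as a combination
    static let defaultOptions: DemangleOptions = [.synthesizeSugarOnTypes, .qualifyEntities]
}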

Usage

The ScalarScanner presented in this article is contained in the CwlScalarScanner.swift file in my CwlUtils project on Github.

The implementation is fully self-contained, so you can just download the “CwlScalarScanner.swift” file on its own and add it to an existing project, if you wish.

Otherwise, the ReadMe.md file for the project contains detailed information on cloning the whole repository and adding the framework it produces to your own projects.

The CwlDemangle implementation presented in this article is contained in the CwlDemangle.swift file in my CwlDemangle project on Github.
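Calling it looks roughly like this (a sketch, assuming the Array<UnicodeScalar> entry point used in the benchmark loop above):

let mangled = "_TFC19CwlUtils_OSXHarness11AppDelegate15someUserAction2fPs9AnyObject_T_"
let scalars = Array(mangled.unicodeScalars)
let readable = (try? demangleSwiftName(scalars).description) ?? mangled
print(readable)
// CwlUtils_OSXHarness.AppDelegate.someUserAction2 (Swift.AnyObject) -> ()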

LICENSE NOTE: I usually release my code under an ISC-style license (an Expat/MIT-style license equivalent which I prefer for its simple, readable phrasing) but since the CwlDemangle.swift code is derived from Apple’s Demangle.cpp, I have released it under the same Apache-License 2.0 with runtime exception.

The implementation is fully self-contained, so you can just copy the file if you wish. The whole repository contains full tests for the demangle implementation (tests for ScalarScanner are in the CwlUtils repository, above).

Conclusion

Swift worked pretty well for writing a recursive descent parser. I expected performance gotchas in Swift compared to C++ but it worked out better than expected. I expected some C++ constructs, abstractions or library features that wouldn’t translate easily to Swift but it turns out such things are generally avoided in C++.

By adding Swift error handling, some multi-dimensional switch statements and some more functional abstractions over collections, I was able to produce code in Swift that passes all the same tests as the C++ version, has less possibility of unchecked nullptr crashes and array index issues, produces significantly better errors on parse failures, is less than half the number of lines and is nearly 20% faster.

None of this should be seen as an attack on the Swift developers or their code quality. Quite the opposite: Demangle.cpp’s conservative, simple style is very easy to read, was very easy to adapt and has a universality to it that would make it legible to developers familiar with any C-influenced language. The code sometimes looks more like C than C++ but there are pragmatic reasons why that might result in a better experience in large codebases. I also suspect my more densely packed, hastily written Swift code would be a little harder for outsiders to follow.

And, of course, it’s the Swift developers who developed the tools in Swift that I’m using to iteratively improve upon the tools that are used to develop Swift.

Um, what?

Appendix: why reimplement Demangle.cpp?

In general, no Swift API will give you a mangled name but if you interrogate your process through C APIs (or anything that isn’t Swift or Cocoa), you’ll see names that look like this:

_TFC19CwlUtils_OSXHarness11AppDelegate15someUserAction2fPs9AnyObject_T_   

This particular name is taken from the stack trace shown in the previous article. This name ultimately comes from the C dladdr function (called as part of the symbolsForCallStackAddresses function I introduced in Tracking tasks with stack traces in Swift) which is not Swift name aware.

The demangled form of the name is:

CwlUtils_OSXHarness.AppDelegate.someUserAction2 (Swift.AnyObject) -> ()

The demangled name is far more readable. Where possible, I’d prefer to read the demangled name.

It’s easily possible to demangle this on the command-line. Assuming Xcode is installed, then the following will do it:

MacPro:~ matt$ xcrun swift-demangle _TFC19CwlUtils_OSXHarness11AppDelegate15someUserAction2fPs9AnyObject_T_
_TFC19CwlUtils_OSXHarness11AppDelegate15someUserAction2fPs9AnyObject_T_ ---> CwlUtils_OSXHarness.AppDelegate.someUserAction2 (Swift.AnyObject) -> ()

but this requires that a recent version of Xcode is installed on the target machine and you want to launch a whole process just to demangle a single name – neither are necessarily true on an end-user device.

Now, the Swift standard library kinda sorta contains a function to perform name demangling: the non-public stdlib_demangleName. Search your compiled programs and this identifier likely appears in all of them. Unfortunately, being non-public, it’s off limits. Why? My guess is that you’re not supposed to encounter a mangled name if you’re acting in an idiomatically Swift way and the Swift developers are worried about backwards/forwards compatibility issues as the Swift mangled name grammar changes from version to version.

In any case, if we want name demangling, we need to do it ourselves. How complicated could it be?

The full grammar for mangled Swift names appears at the bottom of The Swift ABI document. It’s roughly 10 screens in my browser just for the BNF grammar – it’s not exactly complex but it’s sizeable.

Faced with the prospect of implementing a 2000 line parser and serializer from scratch, I balked and instead decided to translate the Swift Demangle.cpp from C++ into Swift. This required a lot of typing but much less thinking than coding from scratch.

Why not simply use the Swift C++ version with a C wrapper, just like stdlib_demangleName does in the Swift standard library? For a few reasons, but mostly: I wanted to see what it was like to write a parser in Swift.

Random number generators in Swift


What’s the best general purpose random number generating algorithm available?

In this article I’ll present a RandomGenerator protocol and use it to implement 8 different random number generating algorithms. Implementations will include wrappers around Mac/iOS built-in algorithms, my own Swift implementations of some popular algorithms and some corresponding C implementations for comparison. The implementations of the RandomGenerator protocol all seed from /dev/random by default, can generate data of arbitrary size (although I’ll focus on 64-bit integer generation) and offer conversion to Double (preserving up to 52 bits of randomness in the significand).

As an aside, my Swift implementation of the Mersenne Twister ended up 20% faster than the official mt19937-64.c implementation. Curious to understand what I had done, I ended up “fixing” the C version to be just as fast as the Swift version. Yes, it’s true: with a little tuning, C can be just as fast as Swift.

Welcome to C with love.

Use case

My primary use for random numbers is in fuzz testing; deliberately sending mixed and garbled inputs to my functions (to look for data handling errors) or running large numbers of threads with different timing offsets and data sizes (to look for timing or thread-safety bugs).

It’s difficult to know exactly what performance or quality I require from random numbers in this scenario – most “good quality” options would probably suffice – but what I do require is:

  • no shared global data (since I run multiple tests in parallel)
  • the ability to set the initial seed (since I want to reproduce bugs when I find them)

Historically, I’ve used a C implementation of the Mersenne Twister. I don’t have any particular problems with it but the algorithm is nearing its 20th birthday so I was curious to see what else was around.

Built-in sources of randomness on Mac and iOS

The C standard library on the Mac and iOS contains a few different functions for random number generation:

  1. rand()
  2. random()
  3. [d|e|j|l|m]rand48() et al
  4. arc4random()
  5. /dev/random

/dev/random

If you need cryptographically secure random numbers, the only option you should consider is /dev/random.

On some OSes, /dev/random can block waiting for random bits to accumulate from hardware entropy sources (so you may be encouraged to use /dev/urandom instead) but on Mac and iOS, both /dev/random and /dev/urandom are identical and use the Yarrow algorithm in conjunction with bits accumulated from hardware entropy sources to generate non-blocking data.

The problem is that reading from /dev/random is slow. In my testing, it is between 100 and 1000 times slower than other generators and uses additional system resources on top of the userspace resources of typical random number generators.

The end result is that /dev/random, while useful for setting initial values, is a poor choice for a general-use random number generator.
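For completeness, here is roughly what reading a seed from /dev/random looks like (a simplified sketch; the DevRandom implementation in CwlUtils handles errors and buffering more carefully):

import Darwin

func seedFromDevRandom() -> UInt64 {
    var result: UInt64 = 0
    let fd = open("/dev/random", O_RDONLY)
    precondition(fd >= 0, "unable to open /dev/random")
    defer { close(fd) }
    withUnsafeMutableBytes(of: &result) { buffer in
        _ = read(fd, buffer.baseAddress, buffer.count)
    }
    return result
}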

Linear congruential generators

The rand, random and the various rand48 functions are all variations of a linear congruential generator. This means that each invocation mutates the internal state according to the following equation:

X_{n+1} = (a * X_n + c) mod m

In the case of rand and random, a is 16807, c is 0 and m is 2^31 - 1. Looks pretty simple, doesn’t it? It is. These algorithms don’t give a good distribution, don’t have a very long period and can get worse with poor seed choices.
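As a sketch of just how small these generators are, here is a minimal Swift version of that recurrence using the constants quoted above (illustrative only, not one of the implementations listed later):

struct MinimalLCG {
    private var state: UInt64

    init(seed: UInt64) {
        // A zero state would get stuck at zero forever, so nudge it
        state = seed % 2147483647
        if state == 0 { state = 1 }
    }

    // X_{n+1} = (a * X_n + c) mod m, with a = 16807, c = 0, m = 2^31 - 1
    mutating func next() -> UInt32 {
        state = (16807 &* state) % 2147483647
        return UInt32(state)
    }
}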

Even ignoring quality considerations, I consider all of these functions – except jrand48 - to be useless due to the following problems:

  • they provide just 31 bits of randomness (can’t simply run them twice to get 64-bits)
  • they use shared global state (can’t be used independently from multiple threads)

Even with jrand48, the algorithm needs to be run twice to generate a single 64-bit integer. This causes it to be roughly half the speed it should ideally be.
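Combining two jrand48 results into a single 64-bit value looks something like this (a sketch; jrand48 returns a signed 32-bit result in a C long, so each call contributes 32 bits):

import Darwin

var seed: [UInt16] = [0x1234, 0x5678, 0x9abc]   // arbitrary example seed

func jrand64(_ seed: inout [UInt16]) -> UInt64 {
    let high = UInt64(UInt32(bitPattern: Int32(jrand48(&seed))))
    let low = UInt64(UInt32(bitPattern: Int32(jrand48(&seed))))
    return (high << 32) | low
}

let value = jrand64(&seed)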

The “allegedly” RC4 generator

The standard “good” random number generator I see used in Swift is arc4random. It is generally high quality but it is slow due to the use of large amounts of state, locks on the global data and periodic mixing of additional entropy from /dev/random.

The “arc4” in the name is because the algorithm is “allegedly” compatible with the RC4 algorithm developed by RSA Labs. Like the official RC4, arc4random was originally intended for use as a cryptographic random number generator. While the arc4random implementation in Mac/iOS doesn’t suffer the same problems that made the RC4 implementation in WEP vulnerable to trivial attacks, the output of arc4random should be treated as no longer cryptographically secure.

What does that mean?

arc4random provides good quality but is about 5 times slower than algorithms of equivalent quality. It uses global state and can’t be directly seeded for debugging purposes or other situations requiring repeatability.

Other general-purpose random number generators

I’m going to look at some high quality, simple, fast random number generators, implement them in Swift and see how they compare.

Linear feedback shift register

A linear-feedback shift register (LFSR) is just a series of shift and XOR operations. Wikipedia has a very clear animated GIF of the operation generating a random sequence from a 4-bit number.

By carefully selecting the feedback points, the shift amounts and combining multiple registers, you can get very long cycles, good distribution and low predictability. Researcher Pierre L’Ecuyer gives some tables of values for “Tausworthe” style linear-feedback shift register generators in his paper Tables of Maximally-Equidistributed Combined LFSR Generators.
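To illustrate the basic building block, here is a single step of a toy 16-bit Galois LFSR (a textbook example with a maximal-length tap mask, not one of the Tausworthe generators used in this article):

func lfsrStep(_ state: UInt16) -> UInt16 {
    let lsb = state & 1
    var next = state >> 1
    if lsb == 1 {
        next ^= 0xB400   // feedback taps at bits 16, 14, 13 and 11
    }
    return next
}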

In the code I present as part of this article, I give two variants, Lfsr176 and Lfsr258 – with periods approximately equal to 2^176 and 2^258 respectively.

Mersenne Twister

The Mersenne Twister was a huge advancement when it was introduced by M Matsumoto and T Nishimura in 1997 and it remains the random generator that newer non-cryptographic generators are compared against (the algorithm isn’t cryptographic because you can observe just 624 values and from that point, predict the sequence).

So the Mersenne Twister fails the “unpredictable” test but it is well tested, has a good distribution, very long period and is fairly fast. But there are some caveats.

In a tight loop, the Mersenne Twister is within a factor of 2 of the fastest algorithms tested. However, the standard Mersenne Twister uses 2496 bytes of internal storage. That might not seem like a lot of space on a modern computer but it is big enough to put additional burden on your L1 cache.

On the quality front: the Mersenne Twister has some known problems with entering “zero” states (situations where its internal state contains a large number of zeros and the generator gets “stuck”).

Well equidistributed long-period linear

The “Well equidistributed long-period linear” algorithm (WELL) comes from F. Panneton, P. L’Ecuyer, and M. Matsumoto – two out of three of these names have already appeared in this article (it’s a small academic cabal, I guess). Like the Mersenne Twister, it is based on linear recurrences modulo 2 over a finite binary field. The WELL algorithm handles poor seeds and states better than the Mersenne Twister (“escaping states with a large number of zeros”). It is otherwise of similar quality to the Mersenne Twister and uses significantly less memory.

Sadly, only 32-bit variants of the algorithm exist. This means it needs to run twice to produce the 64-bit values that I’m using for performance comparison. The end result is that the implementation I used is about twice as slow as equivalent quality algorithms. The much smaller 64 byte internal state of the WELL algorithm – versus the 2496 byte state of Mersenne Twister – might mean that the real difference is somewhat less, depending on your program, although it’s difficult to tell.

Xoroshiro

There has been a battle in the last three years between Melissa O’Neill, creator of PCG, and Sebastiano Vigna, creator of xorshift+ and xorshift*. Both authors have relied on automated random number quality tests to experiment with different variations on themes to find heavily refined versions of their respective approaches – O’Neill applying block ciphers on top of linear congruential generators and Vigna performing variations on xorshift.

The latest release from Sebastiano Vigna (in collaboration with David Blackman) is Xoroshiro – a specialized linear-feedback shift register. Xoroshiro claims to be the fastest algorithm to fare well on the TestU01 set of quality tests for random number generators and claims to provide a better statistical distribution on the PracRand tests than previous xorshift algorithms.

While the period of the generator is fairly low (2^128), it’s easily high enough for any common purpose.
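The core update step is tiny. Here it is transcribed from the public domain xoroshiro128+ reference code (rotation and shift constants 55, 14 and 36), shown only to give a sense of how little state and work is involved:

struct Xoroshiro128Plus {
    var state: (UInt64, UInt64)   // must never be (0, 0)

    mutating func next() -> UInt64 {
        let (s0, s1) = state
        let result = s0 &+ s1
        let t = s1 ^ s0
        state = (rotl(s0, 55) ^ t ^ (t << 14), rotl(t, 36))
        return result
    }

    private func rotl(_ x: UInt64, _ k: UInt64) -> UInt64 {
        return (x << k) | (x >> (64 - k))
    }
}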

Implementations

I defined the following protocol and provided implementations of the protocol for each of the algorithms:

public protocol RandomGenerator {
    init()

    mutating func randomize(buffer: UnsafeMutablePointer<Void>, size: Int)

    mutating func random64() -> UInt64
    mutating func random32() -> UInt32
    mutating func random64(max: UInt64) -> UInt64
    mutating func random32(max: UInt32) -> UInt32

    /// Generates a double with a random 52 bit significand on the half open range [0, 1)
    mutating func randomHalfOpen() -> Double

    /// Generates a double with a random 52 bit significand on the closed range [0, 1]
    mutating func randomClosed() -> Double

    /// Generates a double with a random 51 bit significand on the open range (0, 1)
    mutating func randomOpen() -> Double
}

and a derived protocol:

public protocol RandomWordGenerator: RandomGenerator {
    associatedtype WordType
    mutating func randomWord() -> WordType
}

The advantage with these protocols: a generator need only implement randomize(_, size:) or randomWord() and all the remaining functions are automatically provided (although I’ve implemented optimized versions of random64() in all cases to ensure a fair comparison).
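As an example of how those defaults can be provided, a randomHalfOpen() implementation for 64-bit word generators might look like this (a sketch of the approach, keeping 52 bits of randomness in the significand as described above):

extension RandomWordGenerator where WordType == UInt64 {
    public mutating func randomHalfOpen() -> Double {
        // Use the top 52 bits of a 64-bit word and scale by 2^-52 to land in [0, 1)
        return Double(randomWord() >> 12) * (1.0 / 4503599627370496.0)
    }
}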

The implementations are:

  • Arc4Random – 64-bit integers generated with arc4random_buf
  • DevRandom – data read from /dev/random
  • JRand48 – 64-bit integers generated by two invocations of jrand48
  • Lfsr176 – a 3 register linear-feedback shift register with period 2^176
  • Lfsr258 – a 5 register linear-feedback shift register with period 2^258
  • MersenneTwister – a Swift implementation of MT19937_64
  • MT19937_64 – the Mersenne Twister as generated by mt19937-64.c by Takuji Nishimura and Makoto Matsumoto.
  • WellRng512 – 64-bit integers generated by two invocations of the WELL algorithm
  • Xoroshiro – 64-bit integers generated by this custom xor/shift generator
  • xoroshiro128plus – the original C implementation of Xoroshiro by David Blackman and Sebastiano Vigna

and

  • ConstantNonRandom – baseline that returns a constant 64-bit number

Performance

All algorithms were used to generate 100 million 64-bit UInt64 values. CwlRandom.swift was statically linked with the testing bundle (rather than dynamically linked through the CwlUtils.framework) for performance reasons. The MT19937_64 and xoroshiro128plus implementations were implemented in the same file as the tests so the extra function call layer for these tests would be inlined away.

These are the timing results:

Generator           Seconds
DevRandom           123.50
Arc4Random          4.651
WellRng512          1.350
JRand48             1.107
Lfsr258             0.872
MT19937_64          0.643
MersenneTwister     0.517
Lfsr176             0.511
xoroshiro128plus    0.352
Xoroshiro           0.345
ConstantNonRandom   0.221

Time taken to generate 100 million 64-bit values

On my 2.67Ghz Nehalem Mac Pro, performance ranges from 300 million per second for Xoroshiro to 813 thousand per second for DevRandom.

MersenneTwister in C versus Swift

So the Swift implementation of MersenneTwister ended up 20% faster than the mt19937-64.c C implementation. Hooray, Swift is the fastest!

Why?

The Mersenne Twister is made up of two parts:

  1. The “xor-shift-mask” steps performed on every iteration
  2. The “twist” steps performed every 312 iterations

The “xor-shift-mask” has very little wiggle room. Here’s the C implementation:

x = ctx->state[ctx->index++];
x ^= (x >> 29) & 0x5555555555555555ULL;
x ^= (x << 17) & 0x71D67FFFEDA60000ULL;
x ^= (x << 37) & 0xFFF7EEE000000000ULL;
x ^= (x >> 43);
return x;

There is a minor performance issue with this code but there’s no significant room for refactoring here. The real difference between my Swift code and the C is in the “twist” steps.

A simple Swift implementation of “twist” would look like this:

for i in 0..<n {
    let x = (state[i] & upperMask) | (state[(i + 1) % n] & lowerMask)
    let xA = (x & 1 == 0) ? (x >> 1) : ((x >> 1) ^ 0xB5026F5AA96619E9)
    state[i] = state[(i + (n / 2)) % n] ^ xA
}

This code would work but unfortunately, % is not a fast operation and at 100 million per second speeds, the ternary conditional operator ?: is also too slow (don’t worry about the n / 2 part, that’s a constant and is optimized away). Avoiding these requires restructuring the loop so that they’re not required.

The mt19937-64.c implementation breaks the loop apart into two halves and follows up with an epilogue to handle the final position:

for (i = 0; i < (n / 2); i++) {
    x = (ctx->state[i] & upperMask) | (ctx->state[i + 1] & lowerMask);
    ctx->state[i] = ctx->state[i + (n / 2)] ^ (x >> 1) ^ mag01[x & 1];
}
for (; i < n - 1; i++) {
    x = (ctx->state[i] & upperMask) | (ctx->state[i + 1] & lowerMask);
    ctx->state[i] = ctx->state[i - (n / 2)] ^ (x >> 1) ^ mag01[x & 1];
}
x = (ctx->state[n - 1] & upperMask) | (ctx->state[0] & lowerMask);
ctx->state[n - 1] = ctx->state[(n / 2) - 1] ^ (x >> 1) ^ mag01[x & 1];

(I’ve renamed the variables and reformatted this to look more like my Swift code, to aid comparison.)

This code contains no division or modulo (remember: the n / 2 is a constant and optimized away) and no conditionals or ternary operators (except the loops themselves). But there’s a weird mag01 array (which is 0 at index 0 and 0xB5026F5AA96619E9 at index 1) and worse: the ctx->state is fully iterated multiple times since the ctx->state[i + (n / 2)] and ctx->state[i - (n / 2)] accesses each walk the values from the other loop.

I took a different approach to optimize the “twist” code for my implementation. My code walks both halves of the loop simultaneously, using two different indexes, offset by n / 2:

let (a, mid, stateMid) = (0xB5026F5AA96619E9, n / 2, state[n / 2])
var (i, j) = (0, mid)
repeat {
    let x1 = (state[i] & upperMask) | (state[i &+ 1] & lowerMask)
    state[i] = state[i &+ mid] ^ (x1 >> 1) ^ ((state[i &+ 1] & 1) &* a)
    let x2 = (state[j] & upperMask) | (state[j &+ 1] & lowerMask)
    state[j] = state[j &- mid] ^ (x2 >> 1) ^ ((state[j &+ 1] & 1) &* a)
    (i, j) = (i &+ 1, j &+ 1)
} while i != mid &- 1
let x3 = (state[mid &- 1] & upperMask) | (stateMid & lowerMask)
state[mid &- 1] = state[n &- 1] ^ (x3 >> 1) ^ ((stateMid & 1) &* a)
let x4 = (state[n &- 1] & upperMask) | (state[0] & lowerMask)
state[n &- 1] = state[mid &- 1] ^ (x4 >> 1) ^ ((state[0] & 1) &* a)

The epilogue needs to handle two indexes but the state array is only traversed once, we’re hitting half as many loop conditions and a simple multiply by 0 or 1 (optimized to bitwise arithmetic) is used to eliminate the ternary operator conditional.

If there were no other elements at play, this alone would improve performance 7% versus the mt19937-64.c implementation.

But there’s another advantage: with these two iterations (using the i index and the j index) side-by-side in the same loop, the compiler can automatically optimize to SIMD instructions to give us an extra 10% boost.

There’s another 3% performance difference but to understand that, we’ll need to make the C and Swift code more similar.

What happens if the C code does the same thing?

Swift is faster but it’s not an apples-to-apples comparison. Is it possible to make a true comparison? Where both C and Swift follow the same logic?

Let’s try changing the C code to:

for (i = 0, j = mid; i != mid - 1; i++, j++) {
    x = (ctx->state[i] & upperMask) | (ctx->state[i + 1] & lowerMask);
    ctx->state[i] = ctx->state[i + mid] ^ (x >> 1) ^ ((ctx->state[i + 1] & 1) * a);
    y = (ctx->state[j] & upperMask) | (ctx->state[j + 1] & lowerMask);
    ctx->state[j] = ctx->state[j - mid] ^ (y >> 1) ^ ((ctx->state[j + 1] & 1) * a);
}
x = (ctx->state[mid - 1] & upperMask) | (stateMid & lowerMask);
ctx->state[mid - 1] = ctx->state[n - 1] ^ (x >> 1) ^ ((stateMid & 1) * a);
y = (ctx->state[n - 1] & upperMask) | (ctx->state[0] & lowerMask);
ctx->state[n - 1] = ctx->state[mid - 1] ^ (y >> 1) ^ ((ctx->state[0] & 1) * a);

As expected, this brings the C to within 3% of Swift.

So what’s the cause of the remaining difference?

At this point, the only differences are a few int types in C that are Int in Swift. We need to move the C code to a more consistent 64-bit everywhere with unsigned long long instead of int.

The remaining difference? The use of the postincrement operator I showed in the first line of the C “xor-shift-mask” code. We need to change:

x = ctx->state[ctx->index++];

to

x = ctx->state[ctx->index];
ctx->index = ctx->index + 1;

It might seem like this shouldn’t make any difference but it does affect the generated assembly and appears to result in a 1-2% difference.

Both the C and Swift are now the same. I don’t just mean “they take the same time”, I mean they are compiled to literally the same instructions.

Hooray, C is also the fastest!

Usage

The project containing these RandomGenerator implementations is available on github: mattgallagher/CwlUtils.

The CwlRandom.swift file is fully self-contained so you can just copy the file, if that’s all you need.

Otherwise, the ReadMe.md file for the project contains detailed information on cloning the whole repository and adding the framework it produces to your own projects.

License note: the mt19937-64.c file is “Copyright (C) 2004, Makoto Matsumoto and Takuji Nishimura” and contains its own BSD 3-clause license; however, this file is not built into the CwlUtils.framework, it is used only by the testing bundle for validation and performance testing.

Conclusion

It looks like Xoroshiro is the best general purpose algorithm currently available. Low memory (just 128 bits of storage), extremely high performance (1.2 nanoseconds per 64-bit number, after subtracting baseline overheads) and very well distributed (beating other algorithms on a range of automated tests). Mersenne Twister might still be a better choice for highly conservative projects unwilling to switch to such a new algorithm, but the current generation of statistically tested algorithms brings a baseline of assurance from the outset that previous generations lacked.

Of course, if you only need a few random numbers in your program and you don’t really care about multithreading or repeatability then these alternative algorithms are unnecessary – there’s no problem with the built-in arc4random or even reading from /dev/random (you could generate hundreds of numbers from /dev/random per second without overheads reaching 1%). However, the RandomGenerator protocol I’ve presented makes it easier to seed and generate 64-bit values or Double from these sources too.

Getting C-level performance in Swift for numerical algorithms is quirky but not particularly difficult. If you limit yourself to value types (no classes or existentials), use unsafe pointers and tuples instead of arrays, use the overflow-discarding operators &+/&-/&* instead of the normal +/-/*, and use while or repeat/while for your loops, then Swift and clang C will generally compile to identical instructions.

It’s not as though C is maximally performant without a little contortion. Using int types for indexes on 64-bit systems should be avoided and so should common idioms like inline use of the ++ postincrement operator.

Mutexes and closure capture in Swift


I want to briefly talk about the absence of threading and thread synchronization language features in Swift. I’ll discuss the “concurrency” proposal for Swift’s future and how, until this feature appears, Swift threading will involve traditional mutexes and shared mutable state.

Using a mutex in Swift isn’t particularly difficult but I’m going to use the topic to highlight a subtle performance nuisance in Swift: dynamic heap allocation during closure capture. We want our mutex to be fast but passing a closure to execute inside a mutex can reduce the performance by a factor of 10 due to memory allocation overhead. I’ll look at a few different ways to solve the problem.

An absence of threading in Swift

When Swift was first announced in June 2014, I felt there were two obvious omissions from the language:

  • error handling
  • threading and thread synchronization

Error handling was addressed in Swift 2 and was one of the key features of that release.

Threading remains largely ignored by Swift. Instead of language features for threading, Swift includes the Dispatch module (libdispatch, aka Grand Central Dispatch) on all platforms and implicitly suggests: use Dispatch instead of expecting the language to help.

Delegating responsibility to a bundled library seems particularly strange compared to other modern languages like Go and Rust that have made threading primitives and strict thread safety (respectively) core features of their languages. Even Objective-C’s @synchronized and atomic properties seem like a generous offering compared to Swift’s nothing.

What’s the reasoning behind this apparent omission in Swift?

Future “concurrency” in Swift

The answer is, somewhat tersely, discussed in the “Concurrency” proposal in the Swift repository.

This proposal appears to describe a situation where, like in Cyclone or Rust, references can’t be shared between threads. Whether or not the result is anything like those languages, it appears that the plan is for Swift to eliminate shared memory between threads except for types that implement Copyable and are passed through strictly governed channels (called Streams in the proposal). There’s also going to be a form of coroutine (called Tasks in the proposal) which appear to behave like pausable/resumable asynchronous dispatch blocks.

The proposal then goes on to claim that most common threading language features (Go-like channels, .NET-like async/await, Erlang-style actors) can then be implemented in libraries on top of the Stream/Task/Copyable primitives.

It all sounds great but when is Swift’s concurrency expected? Swift 4? Swift 5? Not soon.

So it doesn’t help us right now. In fact, it kind of gets in the way.

Impact of future features on the current library

The problem right now is that Swift is avoiding simple concurrency primitives in the language or thread-safe versions of language features on the grounds that they would be replaced or obviated by future features.

You can find explicit examples of this occurring by watching the Swift-Evolution mailing list.

There are plenty of common features that don’t even get as far as the mailing list, including language syntax for mutexes, synchronized functions, spawning threads and everything else “threading” related that requires a library to achieve in Swift.

Trying to find a decent mutex

In short: if we want multi-threaded behavior, we need to build it ourselves using pre-existing threading and mutex features.

The common advice for mutexes in Swift appears to be: use dispatch_sync on a serial dispatch_queue_t.

I love libdispatch but, in most cases, using dispatch_sync as a mutex is in the slowest tier of solutions to the problem – more than an order of magnitude slower than other options. The reason it is so slow is similar to the problem I’ll bump into in the next section but with dispatch_sync, we have no way of working around the problem.

The next most commonly suggested option I see is objc_sync_enter/objc_sync_exit. While faster (2-3 times) than libdispatch, it’s still a little slower than ideal (because it is always a re-entrant mutex) and relies on the Objective-C runtime (so it’s limited to Apple platforms).
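For reference, this is roughly how objc_sync_enter/objc_sync_exit is usually wrapped into a scoped function (a sketch; as noted, it depends on the Objective-C runtime):

import Foundation

func synchronized<R>(_ object: AnyObject, _ body: () throws -> R) rethrows -> R {
    objc_sync_enter(object)
    defer { objc_sync_exit(object) }
    return try body()
}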

The fastest option for a mutex is OSSpinLock – more than 20 times faster than dispatch_sync. It has some limitations typical of a spin-lock (high CPU usage if multiple threads actually try to enter simultaneously) but it’s the serious problems on iOS that make it totally unusable outside the Mac.

All these problems leave us with pthread_mutex_lock/pthread_mutex_unlock as the only reasonably performant, portable option.

Closure capture pitfalls and mutexes

Like most things in plain C, pthread_mutex_t has a pretty clunky interface, so it helps to use a Swift wrapper around it (particularly for construction and automatic cleanup). Additionally, it’s helpful to have a “scoped” mutex – one which accepts a function and runs that function inside the mutex, ensuring a balanced “lock” and “unlock” either side of the function.
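To make the later examples concrete, the construction and cleanup side of such a wrapper might look roughly like this (a sketch that assumes a class named PThreadMutex, matching the wrapper introduced below; the full version lives in CwlUtils):

import Darwin

public final class PThreadMutex {
    public var unsafeMutex = pthread_mutex_t()

    public init() {
        pthread_mutex_init(&unsafeMutex, nil)
    }

    deinit {
        pthread_mutex_destroy(&unsafeMutex)
    }
}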

I’ll call my wrapper PThreadMutex. Here’s an implementation of a simple scoped mutex function:

public func slowsync(@noescape f: () -> Void) {
    pthread_mutex_lock(&unsafeMutex)
    f()
    pthread_mutex_unlock(&unsafeMutex)
}

This was supposed to be high performance but it’s not. Can you see why?

It might not be obvious but in most cases, this function will be more than 10 times slower than it needs to be (3.043 seconds for 10 million invocations, versus an ideal 0.263 seconds). The reason is that in most cases, the closure passed into this slowsync function will need to capture something from the surrounding scope, specifically: whatever mutable state is protected by the mutex.

// Capture the `mutableState` variable in the closure
mutex.slowsync { doSomething(mutableState) }

This might seem innocent enough (after all, capturing is what closures do). But if the slowsync function can’t be inlined into its caller (because it’s in another module, or it’s in another file and whole module optimization is turned off) then the capture will involve a heap allocation.

When you’re working with code that needs to run anywhere up to a few thousand times a second (you know, 99.9% of code), memory allocation probably isn’t a big deal. Allocate memory freely and everything will work out fine. But when you’re going into the hundreds of thousands or the millions of times per second – something that we often expect of our threading primitives – memory allocation is a bad thing.

We don’t want heap allocation so let’s see if we can avoid it by passing the relevant parameter into the closure, instead of capturing it.

WARNING: the next few code examples get increasingly goofy and I don’t suggest doing this in most cases. I’m doing this to demonstrate a problem. Please read through to the section titled “A different approach” to see what I actually use in practice.

public func sync_2<T>(inout param: T, @noescape f: (inout T) -> Void) {
    pthread_mutex_lock(&unsafeMutex)
    f(&param)
    pthread_mutex_unlock(&unsafeMutex)
}

That’s better… now the function runs at full speed (0.282 seconds for the 10 million invocation test).

We’ve only solved the problem with values passed in to the function. There’s a similar problem with returning a result. The following function:

public func sync_3<T, R>(inout param: T, @noescape f: (inout T) -> R) -> R {
    pthread_mutex_lock(&unsafeMutex)
    let r = f(&param)
    pthread_mutex_unlock(&unsafeMutex)
    return r
}

is back to the same, sluggish speed of the original, even when the closure captures nothing (at 1.371 seconds, it’s actually worse). This closure is performing a heap allocation to handle its result.

We can fix this by making the result an inout parameter too.

public func sync_4<T, U>(inout param1: T, inout _ param2: U, @noescape f: (inout T, inout U) -> Void) -> Void {
    pthread_mutex_lock(&unsafeMutex)
    f(&param1, &param2)
    pthread_mutex_unlock(&unsafeMutex)
}

and invoke like this:

// Assuming `mutableState` and `result` are valid, mutable values in the current scope
mutex.sync_4(&mutableState, &result) { $1 = doSomething($0) }

We’re back to full speed, or close enough (0.307 seconds for 10 million invocations).

A different approach

The last few examples were hideous.

Instead of having a single, elegant sync function that can handle everything, we would need a dozen or more permutations of the function with all of:

  • Returning Void or a generic result
  • Zero or more inout parameters up to an arity of at least 2
  • A “constant” parameter (for parameters that can’t be inout captured, like self)

The whole setup starts getting silly.

Also, the experience is ugly to the caller. One of the advantages with closure capture is that items inside the closure have the same names inside and outside the closure. When we avoid closure capture and instead try to pass all values in as parameters, we’re forced to either rename all our variables or shadow names – neither of which helps comprehension – and we still run the risk of accidentally capturing a variable and losing all efficiency anyway. In addition, returning values by using an inout parameter purely as an “out” parameter is a serious indication that something is wrong.

All of this ugliness is because Swift can’t inline between compilation units and can’t optimize our closure to the stack.

We can solve this problem another way. We can take the original slowsync function and just copy and paste it into every file:

private func sync<R>(mutex: PThreadMutex, @noescape f: () throws -> R) rethrows -> R {
    pthread_mutex_lock(&mutex.unsafeMutex)
    defer { pthread_mutex_unlock(&mutex.unsafeMutex) }
    return try f()
}

This almost works. The heap allocation overhead is gone, bringing the time taken from 3.043 seconds down to 0.41 seconds. But we still haven’t reached the baseline 0.263 seconds of calling pthread_mutex_lock/pthread_mutex_unlock manually. What’s going wrong now?

It turns out that despite being a private function – where Swift can fully inline the function – Swift is not eliminating redundant retains and releases on the mutex.

We can force the point by making the function an extension on PThreadMutex, rather than a free function:

extension PThreadMutex {
    private func sync<R>(@noescape f: () throws -> R) rethrows -> R {
        pthread_mutex_lock(&unsafeMutex)
        defer { pthread_mutex_unlock(&unsafeMutex) }
        return try f()
    }
}

This forces Swift to treat the self parameter as @guaranteed, eliminating retain/release overhead and we’re finally down to the baseline 0.264 seconds.

Usage

The project containing this PThreadMutex implementation is available on github: mattgallagher/CwlUtils.

The CwlPThread.swift file is fully self-contained so you can just copy the file, if that’s all you need.

Otherwise, the ReadMe.md file for the project contains detailed information on cloning the whole repository and adding the framework it produces to your own projects.

Conclusion

Trying to get maximum performance from a scoped mutex isn’t straightforward in the current version of Swift. Without inlining between compilation units or optimization of capturing closures to the stack, the only option we have to get maximum performance from a scoped mutex function is to manually copy the helper function into every file that uses it – and that function must be a method, not just a free function.

It would be nice if this functionality was part of the standard library (since the standard library is always inlined) but with Swift avoiding action on the topic of concurrency for at least another year (possibly more), it probably won’t happen.

Appendix: performance numbers

I ran a simple loop, 10 million times, entering a mutex, incrementing a counter, and leaving the mutex. The PThreadMutex class is compiled as part of a separate dynamic framework to the test code.

These are the timing results:

Mutex variant                                  Seconds
PThreadMutex.slowsync (capturing closure)      3.043
dispatch_sync                                  2.330
PThreadMutex.sync_3 (returning result)         1.371
objc_sync_enter                                0.869
sync(PThreadMutex) (function in same file)     0.374
PThreadMutex.sync_4 (dual inout params)        0.307
PThreadMutex.sync_2 (single inout param)       0.282
PThreadMutex.sync (copied to the same file)    0.264
manual pthread_mutex_lock/unlock               0.263
OSSpinLockLock                                 0.092

The test code used is part of the linked CwlUtils project but is surrounded by #if TEST_ADDITIONAL_SYNC_FUNCTIONS and is disabled by default.

Parsing whitespace in an Xcode extension


I took a break from converting projects to Swift 3 this weekend and instead played around with writing a new “Source Editor Command” extension for Xcode 8. The result is an extension that detects and corrects whitespace issues. It’s a topic I’ve touched on before because I’m frustrated by the continuous burden of correcting whitespace in my projects. I wanted to see if an Xcode source extension could reduce my annoyance with this issue.

The fun part of this exercise for me was writing a pushdown automaton parser where all of the parser logic was expressed through the case patterns of a single Swift switch statement. Accordingly, most of this article is not really about Xcode Source Editor Command extensions or whitespace issues but is actually about design choices related to large, complex switch statements in Swift: choosing data types to switch upon, designing case patterns to handle a deep decision tree in a single switch statement, when to use custom pattern matching and when to use where clauses.

Reformatting source files in Xcode

Code edited in Xcode is particularly prone to whitespace problems. Xcode file templates hard code indentation, ignoring your project settings. Code snippets also hard code indents and additionally ignore the indentation of the current scope, leading to code completion that falls into the left margin until you dig it out again:

func applicationDidBecomeActive(_ application: UIApplication) {
for <#item#> in <#items#> {
    <#code#>
}
}

Beyond indentation problems, there are different Swift community opinions about whether some : characters should have spaces on both sides or just the right side, about whether case labels should be indented, about whether #if contents should be indented and more. Additionally, there’s always the possibility of typos inserting whitespace where it doesn’t belong or deleting whitespace you’d prefer to keep.

Xcode does offer “Re-Indent” for correcting the indentation of your file (menu “Editor → Structure → Re-Indent”, or Ctrl-I, by default). In general, it’s fairly good. I often wonder why it isn’t run automatically over code created from templates or snippets to apply the user’s indentation preferences.

It does have four limitations:

  1. It only fixes indentation problems (not other whitespace issues).
  2. It can’t be run in a “pre-flight” mode to see if or what it would change.
  3. It always marks the file as “dirty” so it’s difficult to tell what, if anything, changed.
  4. It’s not open source so I can’t customize it to my personal preferences.

With the new “Source Editor Command” extensions possible in Xcode 8, I wanted to see if I could produce something that enforces whitespace rules as I’d like them to be enforced in my own projects.

A pushdown automaton for whitespace parsing

I’ve written recursive descent parsers on Cocoa with Love before but they’re a lot of work (particularly if you need to refactor the grammar) and in this case, it’s probably not the best option.

The most important task in whitespace parsing is tracking the number of scopes entered and exited – we need this to determine the indent level. This involves pushing “scope” items on and off of a parse stack as they are encountered and counting the number of “scope” items on the stack to determine the indent level.

In a recursive descent parser, the parse stack is the call stack. This is efficient in most cases but since we can’t simply count items on the call stack (without preventing inlining and doing other unsafe things), we need to replicate the parse stack in an array as well as on the call stack, which feels redundant.

Instead, I’m going to create a pushdown automaton, using an updated version of my ScalarScanner from Comparing Swift to C++ for parsing as part of the underlying tokenizer.

In most respects, a pushdown automaton is really just an extension of a finite automaton. You’ve probably seen diagrams for finite automatons, they look like this:

[Diagram: a non-linear, three node graph]

This diagram shows a finite automaton with two states: it’s either parsing “body” text or it’s parsing a “comment”. If it’s in the “body” and it hits a “/*” then it will transition to “comment” and if it’s in the “comment” and it hits a “*/” it will transition back. Any other token will cause the parser to spin in place.

The problem with this type of automaton is that it can’t handle nesting. If there’s a comment inside a comment (valid in Swift) this parser will see the inner “*/” and will immediately return to “body” state incorrectly treating the remainder of the outer comment as “body”. You could add additional states for a finite number of comment nesting level states but it quickly gets complicated dealing with large numbers of states and it will only ever handle a finite nesting level rather than true unbounded nesting levels.

Let’s look instead at a pushdown automaton:

[Diagram: a non-linear, three node graph]

The key distinction versus a plain automaton is that each arrow may also push and pop values on a stack and any arrow may fork along multiple paths based on the contents of the stack. Unlike the finite automaton, the stack gives the pushdown automaton an unlimited number of states, so it can represent unlimited nesting – in this case, of multiline comments.

The nice thing about Swift for modelling automata is that the logic for the states, arrows and stack can all be handled by a single switch statement, resulting in code that very closely models the structure of the automaton. Here’s the main loop of the pushdown automaton in the whitespace parser, showing the cases in the previous diagram:

while let token = nextToken(&scanner) {
    switch (state, token, stack) {
    case (.body, .slashStar, _): arrow(to: .multiComment, push: .comment)
    case (.body, _, _): break
    case (.multiComment, .starSlash, UniqueScope(.comment)): arrow(to: .body, pop: .comment)
    case (.multiComment, .starSlash, _): arrow(to: .multiComment, pop: .comment)
    case (.multiComment, .slashStar, _): arrow(to: .multiComment, push: .comment)
    case (.multiComment, _, _): break
    }
}

This code tests the stack before popping, unlike the diagram, so it tests for a unique value in the stack, not an empty stack, but otherwise this code contains the same logic as the previous diagram.

The full parser is about 120 lines long but it is simply more states like this and more tokens.

Custom matching patterns in Swift

When a .starSlash is encountered in the .multiComment state, the code implements the following logic:

  1. if exactly 1 comment is on the stack
    • pop the comment
    • return to body
  2. otherwise
    • pop the comment
    • stay at multiline comment.

This logic involves evaluating the condition “if exactly 1 comment is on the stack” which is implemented with the UniqueScope(.comment) pattern in the code. It’s fairly unusual to see this type of construction in Swift but it really makes the code clearer and more concise in this situation so I wanted to talk about it a little more.

The stack is an Array<Scope>. By default, there’s no pattern matching for Array– not even basic equality matching. For example, the following code won’t compile:

switch ["a", "b"] {
case ["a", "b"]: print("Can't match arrays")
default: break
}

Pattern matching is automatically defined for types that implement Equatable but (due to type system limitations), Array does not implement Equatable (despite having a == operator definition).

In any case, we’re not interested in “equality” style matching here. Instead we want to match against arrays that contain a single instance of a given Scope. For this purpose, I’ve used a type named UniqueScope. This type is basically a predicate pattern and is defined as follows:

struct UniqueScope {
    let scope: Scope
    init(_ scope: Scope) {
        self.scope = scope
    }
}

func ~=(test: UniqueScope, array: Array<Scope>) -> Bool {
    return array.reduce(0) { $1 == test.scope ? $0 + 1 : $0 } == 1
}

The UniqueScope type is clearly very simple. Its only purpose is to offer the associated ~= implementation which will be used when evaluating the case expression. Technically, the following two switch cases are equivalent:

switch stack {
case UniqueScope(.comment):
    print("Just one .comment in the stack")
case _ where UniqueScope(.comment) ~= stack:
    print("Just one .comment in the stack")

Creating better case patterns

The first case in the previous code sample isn’t just more syntactically efficient, it’s a vastly better conceptual design. The second case should be considered an anti-pattern: you should never use a where clause to do what should be done in the case pattern.

When to where clause

The only scenario where a where clause is likely to be the best option is in testing properties unwrapped by the case pattern:

switch anOptionalInt {
case .some(let value) where value > 1:
    print("Value is greater than 1")

You could avoid the where clause here by writing another custom pattern matching predicate:

switch anOptionalInt {
case MatchValue(greaterThan: 1):
    print("Value is greater than 1")

which is syntactically a little tighter but unless you needed to apply this pattern dozens of times, the effort of defining the type is probably more than you’re saving. Additionally, using the standard syntax of the language has familiarity advantages to new readers.

External conditions

What about situations where the where clause tests a value that is completely unrelated to the switch expression or unwrapped values therein?

For example, in a previous version of the parser, I had written:

switch (state, token, stack) {
case (.body, .space, _) where lengthOfToken > 1:
    // ... flag multiple spaces as "unexpected" in body text ...

There’s no technical problem with this code but there is a conceptual problem. The case chosen by the switch statement is now dependent on properties outside the switch expression. This is conceptually dishonest.

More honest might be adding lengthOfToken to the tuple in the switch expression:

switch (state, token, stack, lengthOfToken) {
case (.body, .space, _, 1...Int.max):
    // ... flag multiple spaces as "unexpected" in body text ...

which is fine but this switch statement contains nearly 100 cases and this field will be a mere placeholder for all other cases. It’s arguably better conceptually but it’s worse syntactically.

Compose program state into fewer variables

Ultimately I made the decision that the design was wrong; that .space where length == 1 and .space where length > 1 were different tokens. This eliminated the need to test the length, since that test would already be implicit when matching on .multiSpace.

switch (state, token, stack) {
case (.body, .multiSpace, _):
    // ... flag multiple spaces as "unexpected" in body text ...

If you compose your values correctly on construction, then you’re effectively hoisting conditionals from later to earlier in your data pipeline – which can simplify your program if the later code is larger, more complex or runs more often.
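In other words, the tokenizer makes the decision once, at construction time, roughly like this (a hypothetical fragment, not the article’s actual nextToken implementation):

enum Token {
    case space
    case multiSpace
    // ... other tokens ...
}

// The length test happens here, once, instead of in every case pattern that matches spaces
func spaceToken(length: Int) -> Token {
    return length > 1 ? .multiSpace : .space
}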

Compromise

I’ve given a bunch of rules here but ultimately, sometimes you have to make judgement calls, trading syntactic efficiency for conceptual honesty. The final case I want to show is where the indentation level is validated for the line:

case (.indentEnded, ValidIndent(self, column), _):
    // ... proceed normally ...
case (.indentEnded, _, _):
    // ... flag invalid indent ...

The ValidIndent predicate pattern type is constructed with self and the current column and it is matched against the last read token. What this means is that it is essentially given the whole function and parser state to evaluate its result.

The behavior of this case pattern isn’t conceptually pure. This case is actually dependent on the stack (i.e. the third column of the switch, which sits as an empty placeholder in the case pattern) via the self argument, even though it is nominally only matching against the token. However, without this pattern matching, the 8 tokens and 2 stack configurations this pattern conceals would need to be pulled back into the main switch statement and then doubled (once for a valid indent and once for an invalid indent).

I opted for a little conceptual muddiness over case replication.

Case lists?

One option I didn’t really look at closely was case lists. All of my code takes this form:

switch expression {
case a: fallthrough
case b: fallthrough
case c: fallthrough
case d: print("Matched a to d")
}

but this type of switch can be written as:

switch expression {
case a, b, c, d: print("Matched a to d")
}

With short case patterns, this is an improvement. I’m not sure it would really help with the large case patterns involved when matching against three-element tuples nor would it make a very long switch easier to read.

Usage

The code for the Xcode Source Editor Command extension and whitespace parser discussed in this article is available as part of the CwlWhitespace project on Github. This project is written for Swift 3 and requires Xcode 8 to build.

The application is the delivery mechanism. Run the application once and the extension will be installed. (If running Xcode 8 on El Capitan, you need to enable Xcode extensions. See the Xcode 8 release notes for more.) To uninstall, drag the application to the Trash.

The extension itself has two commands, available from the “Whitespace Policing” submenu at the bottom of the “Editor” menu in Xcode when editing a source file:

  • Detect problems
  • Correct problems

(There are some quirks in Xcode 8 beta 1 and sometimes the “Whitespace Policing” submenu will appear greyed out. To fix the problem, close all your projects and then quit Xcode. Upon reopening Xcode, wait on the “Welcome to Xcode” splash screen for about 5 seconds before opening your project.)

The first command uses multiple selections to select every text range in your file that it believes is violating a whitespace rule. If a line contains a zero-length problem (missing whitespace or missing indent) then the whole line will be selected.

The second command edits whitespace problems to the expected values and selects the changed regions in your editor.

The tests in “CwlWhitespaceTaggingTests.swift” are the only documentation about what whitespace is permitted and what is disallowed.

WARNING: this is a program that – if you use the “Correct Problems” command – may deliberately delete characters from your source file. The only tests I have performed are the tests in the “CwlWhitespaceTests” bundle. There could be bugs outside the tested behavior so be prepared. Pay attention to what the extension has done to your file and if unhappy, use Xcode’s “Undo” (which should roll back any changes made by the extension). Please keep your files in a version control system and also use a backup system to protect your data.

Conclusion

I mostly wrote this Xcode Source Extension for myself. Without an authoritative “swift-format” to enforce common rules, these types of code formatting choices end up being personal rather than universal. However, the code is open source so you’re welcome to hack at it and use it as the basis for something that enforces your own preferences if you wish.

I think the core parser involved is pretty good. To this point it has endured tweaks and additions quite easily (although if it gets much bigger it might make sense to split it across different functions for each state). I’ve only focussed on a few key features in the current version and I haven’t included any rules for vertical whitespace, operators or a hundred other little areas in Swift where whitespace can occur. It’s possible that it might be better to leave it a little minimalist – I’ll need to see how I feel after using it for a while.

It was interesting to iterate my design of the switch statement and try to decide what constitutes a good case statement and what might be considered bad. The power of Swift’s switch statement remains one of my favorite features of the language but you still have to use it in a way that makes conceptual sense – particularly when it’s over 100 lines long and you can’t necessarily read all parts of it at once.

There’s a tokenizer (also known as a lexer) in the code too (the nextToken and readNext functions in the “CwlWhitespaceTagging.swift” file). I didn’t really discuss it because I’m not totally happy with how it turned out. If you have any thoughts on writing a more concise, more extensible, high performance tokenizer, let me know.

Exponential time complexity in the Swift type checker


This article will look at issues surrounding a Swift compiler error that repeatedly forces me to rewrite my code:

error: expression was too complex to be solved in reasonable time; consider breaking up the expression into distinct sub-expressions

I’ll look at examples that trigger this error and talk about negative effects beyond compiler errors that are caused by the same underlying issue. I’ll look at why this occurs in the compiler and how you can work around the problem in the short term.

I’ll also present a theoretical change to the compiler that would eliminate this problem permanently by altering the algorithm involved to be linear time complexity instead of exponential time complexity, without otherwise changing the external behavior.

Errors compiling otherwise valid code

The following line will give an error if you try to compile it in Swift 3:

let a: Double = -(1 + 2) + -(3 + 4) + 5

This line is valid, unambiguous Swift syntax. It should compile and ultimately optimize to a constant value.

But the line doesn’t get past the Swift type checker. Instead, it emits an error that the expression is too complex to solve. It doesn’t look complex, does it? It’s five integer literals, four addition operators, two negation operators and a binding to a Double type.

How can an expression containing just 12 entities be “too complex”?

There are many other expressions that will cause the same problem. Most include some literals, some of the basic arithmetic operators and possibly some heavily overloaded constructors. Each of the following expressions will fail with the same error:

let b = String(1) + String(2) + String(3) + String(4)
let c = 1 * sqrt(2.0) * 3 * 4 * 5 * 6 * 7
let d = ["1" + "2"].reduce("3") { "4" + String($0) + String($1) }
let e: [(Double) -> String] = [{ v in String(v + v) + "1" }, { v in String(-v) + "2" }, { v in String(Int(v)) + "3" }]

All of these are completely valid Swift syntax and the expected types for every term in every expression should be obvious to any Swift programmer but in each case, the Swift type checker fails.

Needlessly long compile times

Errors aren’t the only consequence of this problem. Try compiling the following line:

let x = { String("\($0)" + "") + String("\($0)" + "") }(0)

This single line does compile without error but it takes around 4 seconds to compile in Swift 2.3 and a whopping 15 seconds to compile in Swift 3 on my computer. The compiler spends almost all of this time in the Swift type checker.

Now, you probably don’t have many lines that take this long to compile but it’s virtually guaranteed that any non-trivial Swift project is taking at least a little longer than necessary to compile due to the same complexity problem that causes the “expression was too complex to be solved in reasonable time” error.

Unexpected behaviors

I want to highlight a quirk in the Swift type checker: the type checker will choose to resolve overloaded operators to non-generic overloads whenever possible. The code comments for the path that handles this specific behavior in the compiler note that this behavior exists as an optimization to avoid performance issues – the same performance issues responsible for the “expression was too complex” error.

To see what effects this has, let’s look at the following code:

let x = -(1)

This code doesn’t compile. Instead, we get an error “Ambiguous use of operator ‘-‘”.

This shouldn’t really be ambiguous; the compiler should realize we’re using an integer literal, treat the 1 as an Int and select the following overload from the standard library:

prefix public func -<T: SignedNumber>(x: T) -> T

However, Swift considers only the non-generic overloads; in this case, the Float, Double and Float80 implementations, all of which are equally imperfect (non-preferred creation from an integer literal). The compiler can’t choose one so it bails out with the error.

This particular optimization is applied only to operators, leading to the following inconsistency:

func f(_ x: Float) -> Float { return x }
func f<I: Integer>(_ x: I) -> I { return x }
let x = f(1)

prefix operator %% {}
prefix func %%(_ x: Float) -> Float { return x }
prefix func %%<I: Integer>(_ x: I) -> I { return x }
let y = %%1

This code defines two function names (f and a custom operator prefix %%). Each of these function names has two overloads, (Float) -> Float and <I: Integer>(I) -> I.

Calling f(1) selects the <I: Integer>(I) -> I implementation and x is an Int. This is exactly what you’d expect.

Calling %%1 selects the (Float) -> Float implementation and y is a Float, contrary to expectations. The code has chosen to convert 1 to a Float, against the preference for Int – despite the fact that Int would also work – because the compiler bails out before it considers the generic overload of the function. It’s not really a semantically consistent choice; it’s the result of a compromise in the compiler to avoid the “expression was too complex to be solved” error and its associated performance problems.

Working around the problem in our code

In general, Swift’s complexity problem won’t be an issue unless you’re using two or more of the following features in a single expression:

  • overloaded functions (including operators)
  • literals
  • closures without explicit types
  • expressions where Swift’s default “every integer literal is an Int and every float literal is a Double” choice is wrong

If you don’t typically combine these features in your code, then you’re unlikely to see the “expression was too complex” error. However, if you are using these features, it isn’t always straightforward to suddenly stop. Mathematics code, large functional-style expressions and declarative code are easier to write with these features and often require a complete rethink to avoid them.

You may prefer to give the compiler a little nudge so that it will accept your code without major changes. There are a few different approaches that can help.

The compiler error suggests “breaking up the expression into distinct sub-expressions”:

let x_1: Double = -(1 + 2)
let x_2: Double = -(3 + 4)
let x: Double = x_1 + x_2 + 5

Okay, technically that works but it’s really annoying – especially on small expressions where it only hurts legibility.

Another option is to reduce the number of function and operator overloads that the compiler must consider, by adding typecasts.

let x: Double = -(1 + 2) as Double + -(3 + 4) as Double + 5

This will prevent (Float) -> Float or (Float80) -> Float80 being explored as one of the possible overloads for the negation operator, effectively reducing a system with 6 unknown functions to a system with just 4.

A note about this approach though: unlike other languages, Double(x) is not equivalent to x as Double in Swift. The constructor works more like another function and since it has multiple overloads on its parameter, it actually introduces another overloaded function into the search space (albeit at a different location in the expression). While the previous example will solve if you introduce Double around the parentheses (since the way the graph is rearranged favors the type checker), there are some cases where a similar approach can actually make things worse (see the String examples near the top of the article). Ultimately, the as operator is the only way to cast without inserting further complexity. Fortunately, as binds tighter than most binary operators so it can be used without parentheses in most cases.

Another approach is to use a uniquely named custom function:

let x: Double = myCustomDoubleNegation(1 + 2) + myCustomDoubleNegation(3 + 4) + 5

This eliminates any need to resolve function overloads entirely since it has no overloads. However, it’s pretty ugly in this case, adding a large amount of visual weight to an otherwise lightweight expression.

A final approach: in many cases you can replace free functions and operators with methods:

let x: Double = (1 + 2).negated() + (3 + 4).negated() + 5

This works because methods normally have fewer overloads than the common arithmetic operators and type inference with the . operator is often narrower than via free functions.
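
If the type you’re working with doesn’t already provide a suitable method, you can add one yourself. A minimal sketch (the method name below is my own invention, not something from the standard library):

// A hypothetical helper with a single, unambiguous signature. Because the call
// resolves directly on Double, the type checker never explores the overload
// sets of the operator it replaces.
extension Double {
    func negatedValue() -> Double {
        return -self
    }
}

let x: Double = (1 + 2).negatedValue() + (3 + 4).negatedValue() + 5

Defining your own uniquely named method gives the same effect as the .negated() example above in situations where no such method exists.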

Swift’s constraints system solver

The “expression was too complex” error is emitted by the “Sema” (semantic analysis) stage in the Swift compiler. Semantic analysis is responsible for resolving types in a system, ensuring that they all agree with each other and building a well-typed expression where all types are explicit.

More specifically, the error is triggered by the constraints system solver (CSSolver.cpp) within the semantic analysis stage. The constraints system is a graph of types and functions from a Swift expression and the edges in this graph are constraints between nodes. The constraints system solver processes the constraints from the constraints system graph until an explicit type or function is determined for each node.

All of that is really abstract so let’s look at a concrete example.

let a = 1 + 2

The constraints system representation of this expression looks something like this:

a simple constraints graph

The labels starting with “T” (for “type”) in the diagram come from the constraints system debug logging and they are used to represent types or function overloads that need to be resolved. In this graph, the nodes have the following constraints:

  1. T1 conforms to ExpressibleByIntegerLiteral
  2. T2 conforms to ExpressibleByIntegerLiteral
  3. T0 is a function that takes (T1, T2) and returns T3
  4. T0 is one of 28 implementations in the Swift standard library named infix +
  5. T3 is convertible to T4

NOTE: If you’re more familiar with Swift 2 terminology, ExpressibleByIntegerLiteral was previously named IntegerLiteralConvertible.

To resolve this system, the constraints system solver starts with the “smallest disjunction”. Disjunctions are constraints that constrain a value to be “one of” a set (essentially, a logical OR). In this case, there’s exactly one disjunction: the T0 function overload in constraint 4. The solver picks the first implementation of infix + in its list: the infix + implementation with type signature (Int, Int) -> Int.

Since this is the only disjunction in the graph, the solver then resolves type constraints. Types T1, T2 and T3 are Int due to constraint 3. T4 is also Int due to constraint 5.

Since T1 and T2 are Int (which is the “preferred” match for an ExpressibleByIntegerLiteral), no other overloads to the infix + function are considered; the constraints system solver can ignore all other possibilities and use this as the solution. We have explicit types for all nodes and we know the selected overload for each function.

Getting more complex, quickly

Thus far, there’s nothing strange and you might not expect that Swift would start failing with expressions just a few times bigger. But let’s make just two changes: wrap the 2 in parentheses and apply a negation operator (prefix -), and specify a type of Double for the result.

let a: Double = 1 + -(2)

This gives a constraints system graph that looks like this:

a slightly more complex constraints graph

with the constraints:

  1. T1 conforms to ExpressibleByIntegerLiteral
  2. T3 conforms to ExpressibleByIntegerLiteral
  3. T2 is a function that takes (T3) and returns T4
  4. T2 is one of 6 implementations in the Swift standard library named prefix -
  5. T0 is a function that takes (T1, T4) and returns T5
  6. T0 is one of 28 implementations in the Swift standard library named infix +
  7. T5 is convertible to Double

Just 2 more constraints in the system. Let’s look at how the constraints system solver handles this example.

The first step to resolve this system is: choose the smallest disjunction. This time that’s constraint 4: “T2 is one of 6 implementations in the Swift standard library named prefix -”. The solver sets T2 to the overload with signature (Float) -> Float.

The next step is to apply the first step again: choose the next smallest disjunction. That’s constraint 6: “T0 is one of 28 implementations in the Swift standard library named infix +”. The solver sets T0 to the overload with signature (Int, Int) -> Int.

The final step is to use the type constraints to resolve all types.

However, there’s a problem: the first choice of (Float) -> Float for T2 and the choice of (Int, Int) -> Int for T0 don’t agree with each other so constraint 5 (“T0 is a function that takes (T1, T4) and returns T5”) fails. This solution is invalid and the solver must backtrack and try the next choice for T0.

Ultimately, the solver will go through all of the implementations of infix + and none will satisfy both constraint 5 and constraint 7 (“T5 is convertible to Double”).

So the constraints system solver will need to backtrack further and try the next overload for T2, (Double) -> Double. Eventually, this will find a solution in the overloads for T0.

However, since Double is not the preferred match for an ExpressibleByIntegerLiteral, the constraints system solver will need to backtrack and choose the next overload for T2 and run another full search through the possible values for T0.

There are 6 total possibilities for T2 but the final 3 implementations are rejected as an optimization (since they are generic implementations and will therefore never be preferred over the explicit Double solution).

This specific “optimization” in the constraints solver is the cause of the quirky overload selection behaviors I showed in the Unexpected behaviors section, above.

Despite this optimization, we’ve already gone from solving the constraints system with our first guess to needing 76 different guesses before reaching a solution. If we had another overloaded function to resolve, the number of guesses would get much, much bigger. For example, if we had another infix + operator in the expression, e.g. let a: Double = 0 + 1 + -(2), the search space would require 1,190 guesses.

Searching for the solution in this way is a clear case of exponential time complexity. The search space between the disjunction sets here is called a “Cartesian product” and for n disjunctions in the graph, this algorithm will search an n-dimensional Cartesian product (an exponential space).

In my testing, 6 disjunctions in a single expression is usually enough to exceed Swift’s “expression was too complex” limits.
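
To build intuition for how quickly this search grows, here’s a deliberately simplified toy model of the backtracking (my own sketch for illustration; it bears no relation to the real code in CSSolver.cpp): each overloaded function contributes a set of candidate signatures, and the solver tries every combination, backtracking whenever adjacent choices disagree.

// A toy model of a backtracking search over function overloads. This is a
// sketch for intuition only; the real constraint solver is far more involved.
struct ToyOverload {
    let input: String
    let output: String
}

// Count how many overload choices are visited before (and including) finding
// one where each function's output feeds the next function's input and the
// final output matches the required type.
func countGuesses(_ functions: [[ToyOverload]], producing finalType: String, from inputType: String) -> (guesses: Int, solved: Bool) {
    var guesses = 0

    func search(_ index: Int, _ currentType: String) -> Bool {
        if index == functions.count { return currentType == finalType }
        for overload in functions[index] {
            guesses += 1
            if overload.input == currentType && search(index + 1, overload.output) {
                return true
            }
        }
        return false
    }

    let solved = search(0, inputType)
    return (guesses, solved)
}

// Two functions with three overloads each: up to 3 × 3 combinations explored.
let negate = [ToyOverload(input: "Float", output: "Float"),
              ToyOverload(input: "Double", output: "Double"),
              ToyOverload(input: "Int", output: "Int")]
let add = negate // same overload set, for brevity
print(countGuesses([negate, add], producing: "Double", from: "Double"))

Add one more overloaded function to the list and the worst-case number of guesses multiplies by the size of its overload set – the exponential blow-up in miniature.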

Linearizing the constraints solver

A better solution, in the long term, is to fix the problem in the compiler.

The reason why the constraints system solver has an exponential time complexity for resolving function overloads is because (unless it can shortcut around the problem) Swift searches the entire n-dimensional “Cartesian product” of all n disjunction sets caused by function overloads in the graph to determine if one of the values is a solution.

To avoid the n-dimensional Cartesian product, we need an approach where disjunctions are solved independently, rather than dependently.

Here is where I need to include a big warning:

UNPROVEN CLAIMS WARNING: The following is a theoretical discussion about how I would improve the resolution of function overloads in the Swift constraint system solver. I have not written a prototype to prove my claims. It remains possible that I’ve overlooked something that invalidates my reasoning.

Premise

We want to satisfy the following two goals:

  1. Constraints on a node should not depend upon, or reference, the result from other nodes.
  2. The disjunction from a preceding function overload should be intersected with the disjunction from a succeeding function, flattening the two constraints to a single disjunction.

The first goal can be achieved by propagating constraints along the full path from their origins. Since Swift constraints are bidirectional, the constraint path for each node starts at all of the leaves of the expression, traverses via the trunk and then traverses back along a linear path to the node. By following this path, we can compose constraints that – instead of referencing or depending upon types from other nodes – simply incorporate the same constraint logic as other nodes.

The second goal supports the first by reducing the propagated accumulation of constraints down to a single simple constraint. The most important intersection type for function overloads is an intersection of possible outputs from one overloaded function with possible input parameters of another overloaded function. This operation should be calculable by a type check of the intersecting parameter over a 2-dimensional Cartesian product. For other combinations of constraint intersection, a true mathematical intersection might be difficult but it’s not necessary to follow strict type rules; we need only reproduce the behavior of Swift’s type inference, which often resorts to greedy type selection in complicated scenarios.
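
As a rough illustration of the kind of intersection described above (my own sketch, not compiler code), here’s how one function’s overload set could be pruned against another’s by type checking only the connecting parameter – a 2-dimensional pass over the two sets rather than a search of their full combined product:

// A sketch of intersecting two overload sets across their connecting parameter.
// A prefix-negation overload survives only if some infix-+ overload can accept
// its output as the right-hand parameter, and vice versa.
struct UnaryOverload { let input: String; let output: String }
struct BinaryOverload { let lhs: String; let rhs: String; let output: String }

func intersect(_ negations: [UnaryOverload], _ additions: [BinaryOverload])
    -> (negations: [UnaryOverload], additions: [BinaryOverload]) {
    let prunedNegations = negations.filter { n in additions.contains { $0.rhs == n.output } }
    let prunedAdditions = additions.filter { a in negations.contains { $0.output == a.rhs } }
    return (prunedNegations, prunedAdditions)
}

let negations = [UnaryOverload(input: "Double", output: "Double"),
                 UnaryOverload(input: "Float", output: "Float")]
let additions = [BinaryOverload(lhs: "Int", rhs: "Int", output: "Int"),
                 BinaryOverload(lhs: "Double", rhs: "Double", output: "Double")]

// Only the Double overloads survive; the Int and Float overloads can never
// connect to each other.
print(intersect(negations, additions))

Whatever survives each pruning pass becomes a single, smaller disjunction, which is the “flattening” described in the second goal.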

Re-solving the previous example

To see how these goals would be used in the constraints system solver, let’s revisit the previous constraints graph:

let a: Double = 1 + -(2)

a slightly more complex constraints graph

and begin with the same set of constraints:

  1. T1 conforms to ExpressibleByIntegerLiteral
  2. T3 conforms to ExpressibleByIntegerLiteral
  3. T2 is a function that takes (T3) and returns T4
  4. T2 is one of 6 implementations in the Swift standard library named prefix -
  5. T0 is a function that takes (T1, T4) and returns T5
  6. T0 is one of 28 implementations in the Swift standard library named infix +
  7. T5 is convertible to Double

Propagating constraints from right to left

We start by traversing from right to left (from the leaves of the expression tree down to the trunk).

  • Propagating the constraint on T3 to T2 adds the constraint “T2’s input must be convertible from a type that is ExpressibleByIntegerLiteral with Int preferred due to literal rules”. Intersecting this new constraint with the existing T2 constraints discards the new constraint, since all input parameters in T2’s existing possibility space already fulfill it and the preference of ExpressibleByIntegerLiteral for the Int type on the input is overruled by the preference for specific operator overloads over generic operator overloads (which makes the Double, Float or Float80 overloads of the prefix - function preferred).
  • Propagating T2 to T4 adds the constraint “T4 must be one of the 6 types output from prefix -, with Double, Float or Float80 preferred”.
  • Propagating T4 to T0 adds the constraint “T0’s second parameter must be convertible from one of the 6 types output from prefix -, with Double, Float or Float80 preferred”. Intersecting this with the constraints already on T0 leaves the constraint “T0 is one of 6 implementations in the Swift standard library named infix + where the right input is one of the types output from prefix -, with Double, Float or Float80 preferred”.
  • Propagating T1 to T0 has no additional effect (since all choices in the existing constraint on T0 already fulfill this new constraint and the preference of ExpressibleByIntegerLiteral for the Int type is cancelled out by the preference for Double, Float or Float80).
  • Propagating T0 to T5 adds the constraint “T5 is one of 6 values returned from the infix + operator where the second parameter is one of the types output from prefix -, with Double, Float or Float80 preferred”. Intersecting this with the constraint already on T5 leaves “T5 is Double”.

The half-propagated constraints are now:

  1. T1 conforms to ExpressibleByIntegerLiteral with Int preferred due to literal rules
  2. T3 conforms to ExpressibleByIntegerLiteral with Int preferred due to literal rules
  3. T2 is a function that takes (T3) and returns T4
  4. T2 is one of 6 implementations in the Swift standard library named prefix - with Double, Float or Float80 preferred due to the preference for specific operator overloads over generic operator overloads
  5. T4 must be one of the 6 types output from prefix -, with Double, Float or Float80 preferred due to the preference for specific operator overloads over generic operator overloads
  6. T0 is a function that takes (T1, T4) and returns T5
  7. T0 is one of the 6 implementations in the Swift standard library named infix + where the second parameter is one of the 6 types output from prefix -, with Double, Float and Float80 preferred due to the preference for specific operator overloads over generic operator overloads
  8. T5 is Double

Propagating constraints from left to right

Now we go left to right (from the trunk, up to the leaves).

  • Starting at T5: its constraint is now “T5 is Double”. Propagating this to T0 creates a new constraint on T0 that its result must be convertible to Double. The intersection of this new constraint and the existing constraint on T0 immediately eliminates all possible overloads for the infix + operator except (Double, Double) -> Double.
  • Propagating from T0 to T1, via the first parameter of this overload, creates a new constraint for T1 that it must be convertible to Double. Intersecting this new constraint with the previous “T1 conforms to ExpressibleByIntegerLiteral” results in “T1 is Double”.
  • Propagating from T0 to T4, via the second parameter of the selected infix + overload, creates a new constraint for T4 that it must be convertible to Double. Intersecting this new constraint with the existing constraint on T4 results in “T4 is Double”.
  • Propagating from T4 to T2 creates a new constraint on T2 that it must return a type that is convertible to Double. Intersecting this new constraint with the previous constraints on T2 immediately eliminates all overloads except (Double) -> Double.
  • Propagating from T2 to T3 creates a new constraint for T3 that it must be convertible to Double. Intersecting this new constraint with the previous “T3 conforms to ExpressibleByIntegerLiteral” results in “T3 is Double”.

The fully-propagated constraints are now:

  1. T1 is Double
  2. T3 is Double
  3. T2 is the (Double) -> Double overload of prefix -
  4. T0 is the (Double, Double) -> Double overload of infix +
  5. T5 is Double

And the constraints system is solved.

How would performance compare?

The purpose of my suggested algorithm is to improve the resolution of function overloads, so I’ll call the number of function overloads n. I’ll use m for the average number of overloads per function (the “disjunction” size).

As I previously stated, Swift’s algorithm solves this by searching an n-dimensional Cartesian product of rank m. This gives a time complexity of O(m^n), which is exponential.

My suggested algorithm resolves n function overloads by searching (n-1) separate 2-dimensional Cartesian products of rank m. This gives a time complexity proportional to m^2·n. Assuming m is independent of n (true when we can calculate the intersection between constraints, false otherwise), this makes the upper bound O(n): linear complexity.
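
To put some illustrative numbers on that (these are not measurements, just arithmetic on the formulas above): with m = 10 overloads per function and n = 5 overloaded functions in an expression, m^n is 100,000 combinations while m^2·n is just 500.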

Linear complexity will always beat exponential complexity for large values of n but it’s important to know what constitutes “large” and what effect other factors might have for smaller values. In this case, 3 is already a “large” value. As I previously stated, Swift’s constraint system solver would require 1,190 guesses to solve the expression let a: Double = 0 + 1 + -(2). My suggested algorithm would require a search of just 336 possibilities and would have significantly lower overheads per possibility than Swift’s current approach.

I’m making an interesting assertion there: I’m claiming that my suggested algorithm would have lower overheads per possible solution. Let’s look at that in more detail for n=2 with our previous let a: Double = 1 + -(2) example. Theoretically, both Swift’s algorithm and my suggested algorithm will search the same 2-dimensional Cartesian product between prefix - and infix + – a space that contains 168 possible solutions.

Swift’s current algorithm searches just 76 possible entries out of a total possible 168 in the 2-dimensional Cartesian product space between the overloads of prefix - and infix +. But in doing so, Swift’s algorithm performs 567 calls to ConstraintSystem::matchTypes, of which 546 are related solely to function overload guesses. Swift performs a large number of type checks per guess.

My suggested algorithm would search the entire 168 possible entries in the 2-dimensional Cartesian product space (I haven’t included any shortcuts or optimizations at this point) but since it only checks the intersecting parameters at each search location (it doesn’t check unrelated type constraints), I estimate that it would require just 222 total calls to ConstraintSystem::matchTypes for the entire solution.

Determining performance for an unwritten algorithm involves a lot of guesswork, so it’s difficult to know with any degree of certainty, but it’s possible that my algorithm would perform equal or better for any value of n.

Is this type of improvement coming soon to Swift?

I’d love to be able to say: “I’ve done all the work, surprise! Here it is and it works great!” But that’s simply not possible. Due to the scale of the problem (the constraints system is tens of thousands of lines and the solver logic alone is a couple of thousand lines) this is not a practical problem for an outsider to tackle.

Is the Swift development team working to linearize the constraint system solver for overloaded functions? I don’t think so.

Posts like “[swift-dev] A type-checking performance case study” indicate that the Swift developers believe resolving function overloads in the type checker is inherently exponential. Rather than redesigning the type checker to eliminate exponential complexity, they are redesigning the standard library to try to skirt around the issue.

Either:

  • I’ve made an embarrassing error and I should quietly delete the previous 2 sections from this article like they never happened.

Or

  • I’m correct and the constraints system solver should be quickly improved so we can stop hiding in fear from the Exponential Boogeyman.

On the plus side: the constraints system solver (theoretically) isn’t an exposed part of the language so a major change to the constraints system solver – if it occurred – could be rolled out on a minor release of Swift, rather than waiting for a major release.

Conclusion

In my usage of Swift, the “expression was too complex to be solved in reasonable time” error is the most common error I see that isn’t a simple programmer error. If you write a lot of large functional-style expressions or mathematics code, you’ll likely see it on a regular basis.

The exponential time complexity in the Swift compiler’s constraints system solver can also lead to significant compilation time overhead. While I don’t have numbers on what percentage of typical Swift compilation times are spent in this part of the compiler, there are certainly degenerate cases where the majority of a long compilation time is spent in the constraints system solver.

The problem can be worked around in our own code but we shouldn’t need to do this. The problem should be fixed in the compiler by improving the constraints system solver to run with a linear time complexity with respect to the size of the system. I’m fairly sure it’s possible and I’ve laid out a way I think it could be done.

Until any change occurs in this area, the continued existence of this error will be a bothersome reminder that the Swift compiler remains a work in progress.


Design patterns for safe timer usage


Timers can be a surprisingly tricky tool to use correctly.

Deferred invocations and single fire timers are simple enough to get working but, depending on how they’re used, they range from an unmaintainable anti-pattern that should never be used to a construct highly prone to subtle ordering problems between control and handler contexts.

Join me for a look at bugs and potential maintenance issues involving timers.

NOTE: code in this article will demonstrate single fire timers with the Swift 3 version of the Dispatch API but the same principles are applicable to other periodic timers, other languages and other asynchronous timer APIs to varying extents.

Purpose of a timer

The problems with timers often start before any code is written.

Timers have a conceptual problem: their interface makes them look like their purpose is to delay a function to some time in the future. Technically, delaying a function is what they do but it is never their purpose.

The true purpose of a timer is to perform end-of-lifetime operations for a temporary resource. Session timers delete the session when they elapse. Timeouts close idle connections. User interface timers delete view elements or reset view state. Timers for calendar events move the event from pending to elapsed.

Occasionally, you might see timers that look like a delay without an underlying temporary resource. The worst of these are delays in the hope that the delayed function might be invoked after some precondition occurs. Hoping that independent code will complete within a specific time period is the worst kind of coupling (and is almost always ignoring a notification that could trigger it properly).

But even in this undesirable delay-only scenario, the delay state is itself a temporary resource. All states should be clearly represented as values in your data – allowing composability, testing and debugging of the state – and this type of state is no exception.

I am stressing this purpose of timers since it leads to the following expectations:

  1. a timer should always be closely tied to an associated temporary resource
  2. changes to either the timer or its associated temporary resource must resolve synchronously with the other (even when they don’t always occur synchronously)

Most problems around timers involve failure to meet one of these expectations.

Deferred invocations

Using libdispatch, the simplest form of timer is DispatchQueue.after. This is a form of “deferred invocation” that simply delays a function but returns no reference and therefore offers no possibility for cancellation.

A basic after invocation might look something like this:

DispatchQueue.global().after(when: DispatchTime.now() + DispatchTimeInterval.seconds(10)) {
    // Some deferred code
}

Deferred invocations are sometimes useful for quickly probing and testing scenarios during debug investigations but they are simply too prone to problems to be safely used in a deployed program.

Let’s look at the most obvious situation where a deferred invocation will cause problems:

class Parent {
    let queue = DispatchQueue(label: "")
    var temporaryChild: Child? = nil

    func createChild() {
        queue.sync {
            // Construct a new, temporary value
            temporaryChild = Child()

            // Schedule cleanup after 10 seconds
            let t = DispatchTime.now() + DispatchTimeInterval.seconds(10)
            DispatchQueue.global().after(when: t) { [weak self] in
                guard let s = self else { return }

                // Delete the value when invoked
                s.queue.sync { s.temporaryChild = nil }
            }
        }
    }
}

When the temporaryChild is created, a deferred invocation is scheduled to remove it after 10.0 seconds but this deferred invocation does not share the same lifetime as the temporaryChild.

It should be easy to see how this goes wrong: call createChild twice and the first deferred invocation will delete the second temporaryChild.
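
To make the failure concrete, here’s a hypothetical call sequence (the timings are illustrative):

// Hypothetical usage of the Parent class above.
let parent = Parent()
parent.createChild()    // t = 0s: first child created, cleanup scheduled for t = 10s

// ... 9 seconds pass ...

parent.createChild()    // t = 9s: second child replaces the first

// t = 10s: the *first* deferred invocation fires and sets temporaryChild to nil,
// deleting the *second* child after only 1 second instead of its intended 10.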

I consider after to be unusable in deployed code due to its potential for causing maintenance problems; you can make it work but the result is highly fragile. Small changes to code outside the immediate scope of the timer can break its behavior. Worse: when it breaks, it might continue to look like it works and might pass your automated testing unless you hit the exact timing pattern required to cause problems.

Don’t use deferred invocations outside of debug investigations.

Cancellable timer

A cancellable timer is not much more difficult than a deferred invocation.

public extension DispatchSource {
    public class func timer(interval: DispatchTimeInterval, handler: () -> Void) -> DispatchSourceTimer {
        let result = DispatchSource.timer(queue: DispatchQueue.global())
        result.setEventHandler(handler: handler)
        result.scheduleOneshot(deadline: DispatchTime.now() + interval)
        result.resume()
        return result
    }
}

The returned DispatchSourceTimer will automatically cancel itself when released, so we immediately have a much safer design.

class Parent {
    let queue = DispatchQueue(label: "")
    var temporaryChild: (child: Child, timer: DispatchSourceTimer)? = nil

    func createChild() {
        queue.sync {
            // Construct a new child
            let c = Child()

            // Schedule deletion
            let t = DispatchSource.timer(interval: .seconds(10)) { [weak self] in
                guard let s = self else { return }

                // Delete the child when invoked
                s.queue.sync { s.temporaryChild = nil }
            }

            // Tie the child and timer together
            temporaryChild = (c, t)
        }
    }
}

The lifetime of the timer is tied to the lifetime of the resource that it manipulates and the previous problem is solved.

But we still have a critical flaw in this code.

Ignoring cancelled timers

In all the Parent examples, access to the temporaryChild was protected by using queue.sync as a mutex. However, there’s an important lesson here about mutexes: the mutex alone is not enough to make the code thread safe.

Consider the following order of events:

  1. A child is created using createChild()
  2. 10 seconds later, the handler is invoked on the DispatchQueue.global() concurrent queue
  3. The handler starts but does not yet enter s.queue.sync
  4. While that is happening, the createChild() function is called again, entering the queue, creating a new child and new timer and exiting the queue.
  5. The handler from step 3 – which was associated with the old, already deleted child – finally enters s.queue.sync and deletes the new child.

A previous timer has deleted the new child. Oops.

We’re back to the problem where the timer is not correctly tied to the appropriate child. Any scenario where handler control or execution occurs outside the mutex can create a mismatch between the mutex’s version of sequential and the timer’s version of sequential. Since we only care about the mutex’s version of sequential, we need to ignore timer handlers that are not the most recent timer handler applied under the mutex. This involves changing the timer’s construction so that the handler takes a parameter that we can use to distinguish out-of-date timers.

One way this is sometimes done is to pass a reference to the timer itself into the handler function. This requires re-writing the previous DispatchSource.timer function:

public extension DispatchSource {
    // Similar to before but we pass an instance of the timer to the handler function
    public class func timer(interval: DispatchTimeInterval, handler: (DispatchSource) -> Void) -> DispatchSourceTimer {
        let result = DispatchSource.timer(queue: DispatchQueue.global())

        // Some minor juggling with the timer instance to avoid creating a retain cycle
        let res = result as! DispatchSource
        result.setEventHandler { [weak res] in
            guard let r = res else { return }
            handler(r)
        }
        result.scheduleOneshot(deadline: DispatchTime.now() + interval)
        result.resume()
        return result
    }
}

and then you can use the new timer construction like this:

class Parent {
    let queue = DispatchQueue(label: "")
    var temporaryChild: (child: Child, timer: DispatchSourceTimer)? = nil

    func createChild() {
        queue.sync {
            // Construct a new child
            let c = Child()

            // Schedule deletion
            let t = DispatchSource.timer(interval: .seconds(10)) { [weak self] (t: DispatchSource) in
                guard let s = self else { return }
                s.queue.sync {
                    // Verify the identity of the timer
                    guard let childTimer = s.temporaryChild?.timer, t === (childTimer as AnyObject) else { return }
                    s.temporaryChild = nil
                }
            }

            // Tie the child and timer together
            temporaryChild = (c, t)
        }
    }
}

Our handler function now verifies it is still the “current” timer and aborts if it isn’t.

A timer with generation count

The code now mostly works but there’s a situation it doesn’t handle: rescheduled timers.

A rescheduled timer is one where we needed to extend the deadline for the timer. An example is an idle timer (e.g. a sleep timer or a timeout timer). For an idle timer, each new activity should reset the timer to its full duration.

The problem with rescheduling is that it sets a new deadline for the timer but the underlying timer instance remains the same. If a handler is in the middle of invocation while we’re changing the deadline, the handler invocation for the old deadline will still succeed since it has the same timer identity.

To ignore cancelled timers and rescheduled timers, we can instead use a “generation” count. A generation count is just an arbitrary Int parameter, passed to the DispatchSource.timer on construction and when rescheduled. This value is then passed through to the handler when invoked. As before with the timer’s identity, we can verify the generation count but it has the added advantage that we can change the value on rescheduling, not just creation.

It’s very flexible and effective but it adds an additional layer of complexity at each point so the code size is almost double that of the original cancellable timer example:

public extension DispatchSource {
    // Similar to before but we pass a user-supplied Int to the handler function
    public class func timer(interval: DispatchTimeInterval, parameter: Int, handler: (parameter: Int) -> Void) -> DispatchSourceTimer {
        let result = DispatchSource.timer(queue: DispatchQueue.global())
        result.scheduleOneshot(interval: interval, parameter: parameter, handler: handler)
        result.resume()
        return result
    }
}

public extension DispatchSourceTimer {
    // An overload of scheduleOneshot that updates the handler function with a new
    // user-supplied Int when it changes the expiry deadline
    public func scheduleOneshot(interval: DispatchTimeInterval, parameter: Int, handler: (parameter: Int) -> Void) {
        suspend()
        setEventHandler { handler(parameter: parameter) }
        scheduleOneshot(deadline: DispatchTime.now() + interval)
        resume()
    }
}

class Parent {
    let queue = DispatchQueue(label: "")
    var generation: Int = 0
    var temporaryChild: (child: Child, timer: DispatchSourceTimer)? = nil

    func createChild() {
        queue.sync {
            // Construct a new child
            let c = Child()

            // Schedule deletion
            let t = DispatchSource.timer(interval: .seconds(10), parameter: generation) { [weak self] p in
                guard let s = self else { return }
                s.timerHandler(parameter: p)
            }

            // Tie the child and timer together
            temporaryChild = (c, t)

            // Increment the generation
            generation += 1
        }
    }

    func resetChildTimer() {
        queue.sync {
            guard temporaryChild != nil else { return }

            // Reschedule the timer
            self.temporaryChild?.timer.scheduleOneshot(interval: .seconds(10), parameter: generation) { [weak self] p in
                guard let s = self else { return }
                s.timerHandler(parameter: p)
            }

            // Increment the generation
            generation += 1
        }
    }

    // Since we're changing the handler each time, it helps to have a shared
    // handler function
    func timerHandler(parameter: Int) {
        queue.sync {
            guard parameter == generation else { return }
            temporaryChild = nil
        }
    }
}

A single queue, synchronized timer

Our simple handler now contains a lot of code and a significant amount of this exists purely so we can ignore invalid results. When available, a better option is to prevent invalid results from occurring at all by ensuring that the timer is scheduled on the same context used as a mutex around the timer and its associated temporary resource.

A DispatchSourceTimer offers a way to do this by ensuring that the timer is scheduled on the same queue that we use as a mutex around our data. For this, let’s redo the DispatchSource.timer function again:

public extension DispatchSource {
    // Similar to before but the scheduling queue is passed as a parameter
    public class func timer(interval: DispatchTimeInterval, queue: DispatchQueue, handler: () -> Void) -> DispatchSourceTimer {
        // Use the specified queue
        let result = DispatchSource.timer(queue: queue)
        result.setEventHandler(handler: handler)

        // Unlike the previous example, no specialized scheduleOneshot required
        result.scheduleOneshot(deadline: DispatchTime.now() + interval)
        result.resume()
        return result
    }
}

and the Parent class can now be dramatically simplified:

class Parent {
    let queue = DispatchQueue(label: "")
    var temporaryChild: (child: Child, timer: DispatchSourceTimer)? = nil

    func createChild() {
        queue.sync {
            let t = DispatchSource.timer(interval: .seconds(10), queue: queue) { [weak self] in
                self?.temporaryChild = nil
            }
            temporaryChild = (Child(), t)
        }
    }

    func resetChildTimer() {
        queue.sync {
            temporaryChild?.timer.scheduleOneshot(deadline: DispatchTime.now() + DispatchTimeInterval.seconds(10))
        }
    }
}

It’s dramatically cleaner and simpler than the previous example, while equally thread safe.

This timer usage pattern isn’t always possible – in those cases, the previous “generation count” approach should be used instead. This includes cases where you might choose to use a different type of mutex around your data (possibly a faster mutex as I discussed in Mutexes and closure capture in Swift). In other APIs, it might not be possible to use a scheduling queue as a synchronous mutex (an example is boost::asio in C++ where the io_service::strand class used to serialize jobs can’t be invoked in a guaranteed synchronous manner).

External requirements

The problem with both the “generation count” and the “single-queue synchronized” patterns for using a timer is that they both have external requirements.

What do I mean by an external requirement? I mean that these design patterns have requirements that are not part of any function parameter. Specifically, both require a mutex around the timer and mutations to its associated temporary resource or they risk falling out of synchronization.

Ideally, we would have an interface that avoids any external requirements or preconditions – if you fulfill the type requirements of the interface, then your usage of the interface is valid.

In narrow scenarios, this is possible. The most straightforward approach is to wrap the value, the timer and the mutex in a single interface that ensures the requirements are met. For example:

public class TimerLimited<T> {
    var possibleValue: T?
    let timer: DispatchSourceTimer
    let queue: DispatchQueue

    public init(value: T, interval: DispatchTimeInterval) {
        self.possibleValue = value
        self.queue = DispatchQueue(label: "")
        self.timer = DispatchSource.timer(queue: queue)
        self.timer.setEventHandler(handler: { [weak self] in
            self?.possibleValue = nil
        })
        self.timer.scheduleOneshot(deadline: DispatchTime.now() + interval)
        self.timer.resume()
    }

    public var value: T? {
        var result: T? = nil
        queue.sync { result = possibleValue }
        return result
    }

    public func resetTimer(interval: DispatchTimeInterval) {
        queue.sync {
            timer.scheduleOneshot(deadline: DispatchTime.now() + interval)
        }
    }
}

The problem with this is that it limits the actual action that can be performed at the end of the timer: in this case, all it does is sets an Optional to nil. In most cases, that’s simply not useful enough. Changes over time usually require a notification to be broadcast and possibly some kind of refresh or reprocessing operation so that other objects in memory can adjust to the new value. This change propagation might need to occur under the same mutex or under separate mutexes in a way that avoids deadlocks.

While you could make the possibleValue member an OnDelete struct (like I described in Breaking Swift with reference counted structs) and then use the OnDelete handler to perform any kind of action when this occurs, this is just reverting back to behaving like a bare timer. You would have another arbitrary layer of abstraction around the underlying timer but the end result is a timer that triggers a simple handler when it fires.

To handle a series of cascading change propagations, moving in and out of locks while remaining thread safe would require sweeping changes throughout the whole program. In that scenario, there are ways to hide timers within the interface of the larger framework. How that’s done ends up being specific to the change propagation framework.

Without a thread safe change propagation framework, the best option is simply to endure the external requirement on timer usage since it allows you to perform change propagation from your Parent object as appropriate.

Usage

The “generation count” and “single queue synchronized” DispatchSource.timer implementations from this article are available in CwlDispatch.swift, part of the swift3-prerelease branch of mattgallagher/CwlUtils. This branch will be merged into master when Swift 3 becomes final.

The CwlDispatch.swift file is fully self-contained so you can just copy the file, if that’s all you need.

Conclusion

There’s a popular design principle which states: “You ain’t gonna need it”, implying that you should focus solely on your current requirements and you shouldn’t worry about future problems if your code works in the present. There’s some value in the principle but when dealing with problems that are difficult to test, a different level of caution and future proofing is required.

Timers have a nasty tendency to look like they’re working but then break when barely related (or even unrelated) code changes slightly. Since automated testing tends to follow a narrow range of timing patterns, it may fail to uncover timing bugs and you can end up with serious issues in your program without any tests failing. It’s best to take a few simple steps to ensure your timers are safe under a range of usage modalities from the outset – even if you don’t think you need cancellation or rescheduling for your timers.

For every timer:

  • Clearly identify the associated temporary resource and ensure the timer and the resource remain in sync.
  • Make the timer cancellable and tie its lifetime to that of the associated temporary resource.
  • Ensure that handler invocations from cancelled or rescheduled timers are either impossible or have no effect.

You should obey these requirements even when you don’t think you need cancellation or rescheduling.

I showed two different ways that these requirements can be satisfied: a “generation count” pattern and a “single queue synchronized” pattern for timer usage.

The latter is the more syntactically efficient of the two and involves the following steps:

  1. Store the timer and its associated temporary resource together in a compound value.
  2. Use a DispatchQueue as a mutex around the timer and its associated temporary resource
  3. Schedule the timer on the same DispatchQueue

The alternative “generation count” pattern avoids the requirement on DispatchQueue as a mutex and avoids any constraint on the queue on which the timer is scheduled. However, it still requires some kind of mutex and adds the additional requirement of tracking the generation count. It also tends to be significantly more verbose.

Sadly, both patterns represent an ongoing nuisance since both have an external requirement on a mutex in the surrounding scope – something that is difficult to confirm with a precondition or other check.

Designing thread safe code involving timers in an asynchronous environment without any external requirements would require a more opinionated approach to change management throughout your program. This is definitely a topic I’ll revisit in the future.

Values and errors, part 1: 'Result' in Swift


Swift’s error handling has a major limitation: it can only be used to pass errors up the stack.

If you want to handle errors across asynchronous boundaries or store value/error results for later processing, then Swift error handling won’t help. The best alternative is a common pattern called a Result type: it stores a value/error “sum” type (either one or the other) and can be used between any two arbitrary execution contexts.

It’s an incredibly simple type but since the handling of value/error results sits at a critical position in many types of operation, it provides an interesting look into the capabilities and priorities of a programming language. The Result type is so useful that it was almost included in the Swift standard library, and even its rejection reveals something about the philosophies underpinning Swift’s design.

In this article, I’ll discuss the Result type in Swift as well as common variations in implementation and approach used for this data type. I’ll also look at why the type was rejected for inclusion in the standard library and what effect that rejection is likely to have.

A tagged union of success and failure

I’ve previously talked about how errors are inherently “composite”: they represent a combination of multiple data paths or potential data values that are brought together to produce a single result that reflects the path taken and the state encountered. Within a larger operation, the results from multiple steps are composed to produce the final output.

If you’re familiar with Swift, then you know that Swift’s inbuilt error handling deals with the different paths associated with “success” and “failure” states by decorating functions with the throws keyword which allows them to have two separate exit paths: a return path for the normal value and a throws path for an Error type.

I’m sure you already know what Swift error handling looks like but here’s an example so I can refer back to it later:

// A simple function that returns the time since boot, if it is even,
// otherwise throws an error
func evenTimeValue() throws -> UInt64 {
    switch mach_absolute_time() {
    case let t where t % 2 == 0: return t
    default: throw TimeError.expectedEvenGotOdd
    }
}

enum TimeError: Error {
    case expectedEvenGotOdd
}

// Calling the function and handling the error
do {
    print(try evenTimeValue())
} catch {
    print(error)
}

The evenTimeValue result may return a UInt64 or it may throw a TimeError.expectedEvenGotOdd error. This composite result is immediately decomposed in the do block by splitting into two paths, the print(try evenTimeValue()) path and the print(error) path.

The Result type

Swift’s error handling works to return value/error results to callers on the stack but it won’t pass value/error results in any other way. Examples of scenarios you might want to handle where Swift’s error handling won’t help include:

  • results passed between threads
  • results asynchronously delivered to the current thread
  • results retained for any duration
  • results passed into a function rather than out of a function.

Many of these alternate scenarios fall under the banner of “continuation passing style” (a design pattern where, instead of directly returning a result, functions invoke a provided “handler” function and pass the result into it). Work with Swift for long enough and you’re likely to use a continuation passing style eventually. Depending on the nature of your work, you might even need to store errors and other results.

The obvious candidate in these other scenarios is a Result type. Where Swift’s error handling encapsulates the composite nature of error handling by using “value” and “error” return paths from a function, a Result type embeds the “value” or “error” directly into a composite data type:

enum Result<Value> {
    case success(Value)
    case failure(Error)
}

Most Result implementations also offer map and flatMap methods, conversion to an optional value/error and conversion to/from a Swift throws function.
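
The exact set of helpers varies between implementations, but a typical sketch of those conveniences – using common naming conventions rather than any particular library’s API – looks something like this:

extension Result {
    // Capture the result of a throwing function as a Result.
    init(_ body: () throws -> Value) {
        do { self = .success(try body()) }
        catch { self = .failure(error) }
    }

    // Convert back to Swift error handling: return the value or throw the error.
    func unwrap() throws -> Value {
        switch self {
        case .success(let value): return value
        case .failure(let error): throw error
        }
    }

    // The success value, or nil if this is a failure.
    var value: Value? {
        if case .success(let value) = self { return value }
        return nil
    }
}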

That’s about it; it’s a type that is better served by a narrow implementation.

A Result example

Imagine our evenTimeValue() function was computationally intensive and we wanted to invoke it outside the main queue. We might then need a callback function to report the result:

// A version of `evenTimeValue` that returns a `Result` instead of throwing
func evenTimeValue() -> Result<UInt64> {
    switch mach_absolute_time() {
    case let t where t % 2 == 0: return .success(t)
    default: return .failure(TimeError.expectedEvenGotOdd)
    }
}

// An async wrapper around `evenTimeValue` that invokes a callback when complete
func asyncEvenTime(callback: (Result<UInt64>) -> Void) {
    DispatchQueue.global().async {
        callback(evenTimeValue())
    }
}

// This is equivalent to the do/catch block labelled "Calling the
// function and handling the error" in the previous example
asyncEvenTime { timeResult in
    switch timeResult {
    case .success(let value): print(value)
    case .failure(let error): print(error)
    }
}

As in the Swift error handling example, the asyncEvenTime function may generate a UInt64 or it may generate a TimeError.expectedEvenGotOdd error but in this case, the value or error is wrapped in a .success or .failure case of the Result<UInt64> and passed into the callback function. This enum is manually unwrapped and pattern matched by the switch statement, splitting into two paths, the print(value) path and the print(error) path.

Without language integration, the compiler doesn’t force us to handle the timeResult. Otherwise, the effect is very similar: Result handling and Swift error handling process the same data flow in very similar ways.

Using Result as a monad

Some people view a Result type as a functional programming construct that should be manipulated using flatMap calls. The flatMap function looks like this:

extension Result {
    func flatMap<U>(_ transform: (Value) -> Result<U>) -> Result<U> {
        switch self {
        case .success(let val): return transform(val)
        case .failure(let e): return .failure(e)
        }
    }
}

The intent of flatMap is to avoid unwrapping the Result in your own code. Instead, you let the flatMap unwrap the Result and if it happens to contain a .success, the flatMap function will invoke your code to process the .success value appropriately and pass it to the next stage in the processing pipeline. Otherwise, the flatMap function will short-circuit past your processing function and instead pass the existing .failure error along to the next stage in the processing pipeline.

Types manipulated exclusively with flatMap (or functions implemented on top of flatMap) are called “monads”. By never accessing the contents directly and instead interacting through the “black box” of the flatMap function, your program avoids being dependent on the state of the value inside the monad. Since avoiding dependency on state is a key aim of functional programming, monads end up being a key pattern in functional programming.

It’s important to note though that Swift is not a functional programming language and I didn’t use flatMap in the asyncEvenTime example. The Result type was merely used as data transport with any logic applied either before wrapping the Result or after unwrapping at the end.

There are certainly situations where you might choose to use Result as a monad (I show an example in the Comparing Result and Swift error handling section, below) but any such usage is not required. I personally think it’s appropriate to consider unwrapping with a switch statement as a first option and consider more abstract functional operators as a second option, only when they constitute a clear simplification.

Specifying an error parameter

Some implementations of Result use a generic parameter for the error:

enum Result<Value, E: Error> {
    case success(Value)
    case failure(E)
}

There are some problems though with strongly typing errors like this in Swift. On the Swift Evolution mailing list, John McCall offers some comments on the subject.

Basically, this approach would be fine if we could define a type as:

let result: Result<Value, FileError | NetworkError>

where the error is a “structural sum type” (a type that is either FileError or NetworkError) but we can’t do this in Swift at the moment.

Instead, we would need to manually define an enum each time:

enum ErrorFromMyFunction: Error {
    case file(FileError)
    case network(NetworkError)
}

That might not seem too bad but this then requires we manually wrap and unwrap error types as they occur inside our interface to get them into the correct container enum, since a manually constructed enum can’t be constructed from an unrelated error enum using flatMap or other composing functions.
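
Here’s a sketch of the boilerplate this creates. The FileError and NetworkError enums and the readFile/send functions are hypothetical stand-ins, and the example assumes the two-parameter Result and the ErrorFromMyFunction enum above:

import Foundation

// Hypothetical error types and operations, for illustration only.
enum FileError: Error { case notFound }
enum NetworkError: Error { case timeout }

func readFile() -> Result<Data, FileError> { return .failure(.notFound) }
func send(_ data: Data) -> Result<Data, NetworkError> { return .failure(.timeout) }

// Every error produced inside this function must be manually wrapped into the
// function's own error enum; flatMap can't compose these stages because the
// error types don't line up.
func loadAndSend() -> Result<Data, ErrorFromMyFunction> {
    switch readFile() {
    case .failure(let fileError):
        return .failure(.file(fileError))
    case .success(let data):
        switch send(data) {
        case .failure(let networkError):
            return .failure(.network(networkError))
        case .success(let response):
            return .success(response)
        }
    }
}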

Frankly, until Swift supports structural sum types (and there is no guarantee that it ever will), this can potentially involve a lot of manual work propagating errors to communicate a small amount of additional type information that the interface user will promptly ignore by treating all errors identically (bail out on any error).

Comparing Result and Swift error handling

I’ve shown how you can use a Result for asynchronous callbacks but it’s worth considering how a Result would compare to Swift’s error handling if they were both used in the same “function return” scenario.

Consider a function that invokes the previous evenTimeValue function and adds a previously obtained UInt64 value:

// Using Swift error handling:
func addToEvenTime(_ previous: UInt64) throws -> UInt64 {
    return try previous + evenTimeValue()
}

// Using a `Result` return type:
func addToEvenTime(_ previous: UInt64) -> Result<UInt64> {
    return evenTimeValue().map { previous + $0 }
}

I’m using map in the Result implementation to avoid unwrapping and rewrapping (map is a flatMap where the output from transform is always wrapped in a .success). Meanwhile, Swift’s error handling doesn’t require handling of wrapped values.
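For reference, here’s a minimal sketch of that map, written on top of the flatMap shown earlier:

extension Result {
    // Transform the success value, automatically re-wrapping the output in `.success`
    func map<U>(_ transform: (Value) -> U) -> Result<U> {
        return flatMap { .success(transform($0)) }
    }
}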

Now, let’s look at how Swift error handling and Result handling compare when chaining three calls to addToEvenTime together:

// Using Swift error handling:
func sumOfThreeEvenTimes() throws -> UInt64 {
    return try addToEvenTime(addToEvenTime(addToEvenTime(0)))
}

// Using a `Result` return type:
func sumOfThreeEvenTimes() -> Result<UInt64> {
    return addToEvenTime(0).flatMap(addToEvenTime).flatMap(addToEvenTime)
}

The comparison between these two approaches provides a good insight into Swift’s design philosophy. The effect of Swift’s error handling over successive throwing calls is equivalent to the monadic flatMap over multiple Result-generating functions. However, Swift avoids making abstract mathematical concepts like map and flatMap a required part of the core language and instead makes the code look like a simple, linear sequence of actions.

As a counterpoint, Result is not really much more complicated, despite lacking any language integration. If you find a situation where Swift’s error handling is not practical, then switching to Result instead is relatively simple. If you use asynchronous workflows and other data-flow scenarios, then you might find Result is practically required.

In the standard library

Multiple people have suggested, via the Swift Evolution mailing list, that the Swift standard library should incorporate Result. At one point in time, the Swift development team themselves suggested a Result type might be added to the standard library to handle cases that Swift’s built-in error handling couldn’t cover (see ErrorHandling.rst in the docs directory of the Swift repository).

John McCall explains the Swift standard library team’s verdict as follows:

We considered it, had some specifics worked out, and then decided to put it on hold. Part of our reasoning was that it seemed more like an implementation detail of the async / CPS-conversion features we’d like to provide than an independently valuable feature, given that we don’t want to encourage people to write library interfaces using functional-style error handling instead of throws.

Ultimately, while a Result type is useful in Swift, the Swift team would rather avoid directly endorsing alternatives to the throws approach since it is not their first preference and they ultimately hope to extend throws-style handling to other scenarios.

The “async / CPS-conversion features” hinted at are the potential future Swift “Concurrency” features that I’ve mentioned previously. Sadly though, no features will be delivered in this area until after Swift 4.

Implications of no Result type in the standard library

What are the implications of omitting a type from a standard library?

If a commonly used type is neither part of the standard library nor sourced from a single common repository, this results in two common problems:

  1. Bloated code size due to replication
  2. Interoperability between multiple independent implementations

Since a Result type is mostly just an enum definition, it may add some runtime type information to the executable but it won’t add to the actual code size. Methods that operate on the Result type do have measurable size but the most complicated extension you’re likely to need, flatMap, is just five lines. Even with a broad range of helper functions, a Result implementation should be less than 100 lines. It certainly isn’t a big code overhead on your project.

Interoperability between multiple independent implementations is a bigger concern but again, unlikely to become a major headache. The biggest reason for this is that any two implementations will always have a path through which they can be converted: Swift’s throws error handling.

Provided any two implementations contain the following two functions:

extension Result {
    // Construct a `Result` from a Swift `throws` error handling function
    public init(_ capturing: () throws -> Value) {
        do {
            self = .success(try capturing())
        } catch {
            self = .failure(error)
        }
    }

    // Convert the `Result` back to typical Swift `throws` error handling
    public func unwrap() throws -> Value {
        switch self {
        case .success(let v): return v
        case .failure(let e): throw e
        }
    }
}

then any two different definitions of Result from different modules could be converted as follows:

let firstResult: Module1.Result<Int> = someModule1Function()
let secondResult = Module2.Result<Int> { try firstResult.unwrap() }

Conclusion and usage

A Result implementation can be found in the CwlResult.swift file of the CwlUtils repository.

The CwlResult.swift file has no dependencies and you can just use the file alone, if you wish. Of course, the implementation of a Result type is so mind-numbingly simple that you might not even need to use someone else’s code – it’s just a two case enum, after all.

Swift’s error handling doesn’t cover all error passing scenarios. Disappointing but not a disaster. If you need to handle value/error results in your code outside of passing results to the caller, there’s very little friction involved in switching to Result handling instead – they can both end up producing a very similar outcome.

I would absolutely prefer to see Swift’s error handling extended so it covers a wider range of common scenarios but I’ve been using a Result type to handle error passing in Swift since Swift’s first public betas and I’m not worried about the prospect of continuing to do so.

Aside 1: Why then does the standard library include Optional?

It’s interesting to consider that, where Result is rejected from the standard library in favor of special language features and syntax, Swift contains a very similar type, Optional, which looks like this:

enum Optional<Wrapped> {
    case some(Wrapped)
    case none
}

Both Optional and Result can be used to encapsulate the result of a function that may produce a result or fail. Both types can be processed via map and flatMap to handle the success case while short-circuiting the failure case.

In many ways, a Result is a more powerful Optional. In the “not a value” case, a Result carries metadata about why that state occurred.
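As a quick sketch of that relationship, an Optional can always be promoted to the single-parameter Result used in this article by attaching an explanatory error to the nil case (the toResult name is my own, not a standard API):

extension Optional {
    // Convert `nil` into a `.failure` carrying the supplied error
    func toResult(orError error: Error) -> Result<Wrapped> {
        switch self {
        case .some(let value): return .success(value)
        case .none: return .failure(error)
        }
    }
}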

Interestingly, despite being less powerful, an Optional is more useful because it is simpler. An Optional represents a basic toggle so it is well suited to representing basic boolean state (connected/disconnected, constructed/deleted, enabled/disabled, available/unavailable). Meanwhile Result is really constrained – by virtue of requiring Error metadata in failure cases – to being the output of an actual data flow.

Aside 2: Either types

Another type similar to Result<Value>, suggested for the Swift standard library and ultimately rejected, was a biased Either<Left, Right> type that looks a little like this:

enum Either<Left, Right> {
    case left(Left)
    case right(Right)
}

If you consider a fully typed Result<Value, ErrorType> like this:

enum Result<Value, ErrorType> {
    case success(Value)
    case failure(ErrorType)
}

then a left-biased Either type can be considered a more general form of the same type.
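To illustrate, here’s a hedged sketch of reading such a left-biased Either as the Result from earlier in this article (the asResult property is my own naming, not a proposed API):

extension Either where Right: Error {
    // Interpret the `left` case as success and the `right` case as failure
    var asResult: Result<Left> {
        switch self {
        case .left(let value): return .success(value)
        case .right(let error): return .failure(error)
        }
    }
}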

The discussion on this topic revealed that the proposers of an Either type were most interested in capturing the “shape” of different potential abstract operations. I understand the intent, but capturing the “shape” of operations is difficult enough in single parameter, strictly functional languages like Haskell. In multi-parameter imperative languages like Swift, the number of possible operations grows with each additional parameter and, once side effects are involved, it becomes immediately unmanageable.

It’s usually just easier to unwrap the enum when you need to operate on its contents, rather than relying on a large library of abstract and inefficient transformations.

Values and errors, part 2: eight languages compared


Inspired by the “Survey” section of the ErrorHandlingRationale document from the Swift repository, I wanted to take a concrete look at the languages studied by the Swift developers when designing Swift’s approach to error handling. By comparing these to Swift, I’ll try to better understand what balance the Swift developers sought on the topics of feature complexity, syntactic efficiency, abstraction, information signalling and safety.

Learning about languages via error handling

The “Survey” section of the ErrorHandlingRationale briefly discusses error propagation in C, C++, Objective-C, Java, C#, Haskell, Rust, Go and common scripting languages. Since I personally maintain professional projects in 5 of those languages (and I’ve written projects in most of the rest) the topic makes me reflect on the approaches I use for error handling in each language; whether I’m writing the best code I can, given the constraints and expected idioms of each language.

So I wanted to take a concrete look at error handling in the languages mentioned and see if I could learn more about those languages or Swift by directly comparing error handling approaches. Following the handling styles discussed in the previous article, I will consider the following two cases:

  1. Passing a value or recoverable error result from a function to its caller
  2. Invoking a callback function, passing that value or error as an argument

Unlike the ErrorHandlingRationale document:

  • I will consider only common use recoverable errors
  • I won’t be considering scope cleanup issues
  • I’m not going to look at Go or any scripting languages since I don’t use them often enough to be insightful

Swift

I’m going to start with a reference implementation in Swift. Using a conversion from throws to a Result type, as described in the previous article, the example looks like this:

// A function with a synchronous error result (Swift error handling)
func evenTimeValue() throws -> UInt64 {
    switch mach_absolute_time() {
    case let t where t % 2 == 0: return t
    default: throw TimeError.expectedEvenGotOdd
    }
}

enum TimeError: Error {
    case expectedEvenGotOdd
}

// A continuation passing style wrapper (Swift error handling to Result handling)
func cpsEvenTime(callback: (Result<UInt64>) -> Void) {
    callback(Result { try evenTimeValue() })
}

// A function call and final error handling
cpsEvenTime { timeResult in
    switch timeResult {
    case .success(let value): print(value)
    case .failure(let error): print(error)
    }
}

Scorecard: 16 lines (excluding blank lines and comments), 2 switch statements, 2 closures, 7 function calls. The compiler enforces the try and, while it is possible to ignore the timeResult, it is not possible to use it without pattern matching to unwrap it.

C

Typical implementations use an int return to indicate “success” (often zero) or “failure” (often a non-zero error code) and return their results through an out-pointer:

// Synchronous error result (int error code, result by out-pointer)
int evenTimeValue(unsigned long long *result) {
    unsigned long long t = mach_absolute_time();
    if (t % 2 == 0) {
        *result = t;
        return 0;
    } else {
        return 1;
    }
}

// Continuation passing style wrapper (code and value passed into function pointer)
void cpsEvenTime(void (*callback)(int, unsigned long long)) {
    unsigned long long t;
    int code = evenTimeValue(&t);
    callback(code, t);
}

// C has no anonymous functions (lambdas/closures) so the handler is a declared function
void handleResult(int code, unsigned long long t) {
    if (code == 0) {
        printf("%lld\n", t);
    } else {
        printf("failed\n");
    }
}

// Invoking and handling
int main(int argc, char **argv) {
    cpsEvenTime(handleResult);
    return 0;
}

Scorecard: 25 lines (excluding blank lines and comments), 2 if/else statements, 1 function pointer, 1 out-pointer, 6 function calls. It is possible to omit any check of the code and get an uninitialized t value.

What does this tell us about C?

C aims to be simple by offering a minimal set of language features and relying on the programmer to understand problems like undefined behavior and avoid them. Good programming in C requires a simple and methodical approach. It’s possible to write abstractions in C but generally, programmers work in C because they want a raw pointer and the freedom to process things themselves, one byte at a time if necessary.

The reality is that C could get a little closer to the spirit of the Swift implementation through its union type, which could encapsulate a value or error disjunction in a single data type – but a manually tagged union is clumsy and adds layers of complexity with no additional safety. Ultimately, layers of abstraction like this conflict with typical C design philosophy and are usually avoided. There’s a whole world of preprocessor hacks that attempt to make design patterns like this manageable in C but that’s really just one leaky abstraction on top of another.

What does this tell us about Swift?

Like C, Swift aspires to be simple, but Swift and C disagree on what simple is. Swift aims for simplicity by offering clear tools to solve common problems and minimizing the possibility of mistakes.

Objective-C

In principle, Objective-C inherits C’s error handling, which is to say that it doesn’t have any language-supported error handling. By convention though, Objective-C uses a slightly different style: returning an object pointer on success and nil on failure, plus an optional NSError ** out parameter with details.

// It wouldn't be Objective-C without some boilerplate
@interface TimeSource: NSObject
@end

// And some file-scoped definitions
NSString *const TimeSourceErrorDomain = @"TimeSourceError";
NSInteger TimeSourceExpectedEvenGotOdd = 0;

@implementation TimeSource

// Synchronous error result (nil value means error, errorOut may be NULL)
+ (NSNumber *)evenTimeValueWithError:(NSError **)errorOut {
    uint64_t t = mach_absolute_time();
    if (t % 2 == 0) {
        return @(t);
    } else {
        if (errorOut) {
            *errorOut = [NSError errorWithDomain:TimeSourceErrorDomain code:TimeSourceExpectedEvenGotOdd userInfo:nil];
        }
        return nil;
    }
}

// Continuation passing style wrapper (code and value passed into block parameter)
+ (void)cpsEvenTime:(void (^)(NSNumber *, NSError *))callback {
    NSError *e;
    NSNumber *t = [self evenTimeValueWithError:&e];
    callback(t, e);
}

@end

// Invoking and handling the Result
int main(int argc, char **argv) {
    [TimeSource cpsEvenTime:^(NSNumber *t, NSError *e) {
        if (t) {
            printf("%s\n", [[t description] UTF8String]);
        } else {
            printf("%s\n", [[e localizedDescription] UTF8String]);
        }
    }];
    return 0;
}

Scorecard: 32 lines (excluding blank lines, comments and unwrapping), 2 if/else statements and an if statement, 1 closure, 1 out-pointer, 4 function calls, 7 method invocations. It is possible to omit any check of the t value and get Objective-C’s no-op when messaging nil.

I’m fond of Objective-C but this example emphasises a lot of its worst aspects. It’s verbose, full of structural boilerplate, over-reliant on heap allocations and, while the “nil messaging is a no-op” pattern (inherent in the NSNumber * result) reduces crashes in Objective-C relative to C, it doesn’t guarantee correct behavior and may make problems harder to track down.

What does this tell us about Objective-C?

Objective-C tries to handle most problems as object-oriented design problems but when that doesn’t work, it falls back to plain C. Composite value/error result handling falls outside basic object-oriented design so Objective-C offers little over C except some slightly more consistent standard patterns. However, some of those standard patterns in Objective-C involve some running around – this is a prime example.

What does this tell us about Swift?

Objective-C’s standard approach to error handling was objectively mediocre. It was never obviously a focus of the language. While Swift inherits many ideas from Objective-C, error handling is not among them.

When interoperating with Objective-C, all of this work may still occur but it’s such straightforward boilerplate that Swift can do it automatically.
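Roughly, from the Swift side, the TimeSource method above arrives with the trailing NSError ** parameter converted to throws. This is a sketch of the usual bridging behavior, not code from the article’s repository:

// Objective-C: + (NSNumber *)evenTimeValueWithError:(NSError **)errorOut;
// Imported as: class func evenTimeValue() throws -> NSNumber
do {
    let value = try TimeSource.evenTimeValue()
    print(value)
} catch {
    print(error)
}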

C++

This is tricky because there’s no universally used approach for error handling in C++.

Many of C++’s features are controversial. I’ve previously discussed why many C++ programmers disable exceptions. Other programmers use standard libraries that don’t include shared pointers or compilers that don’t support lambdas. Others avoid templates for compile-time performance reasons or because of historical bugs in the template implementations of some compilers. Even where templates are used, many programmers avoid large common libraries like “boost” due to poor compilation performance and lack of modularity. The result is that there are hundreds of little sub-domains within C++ built around a specific subset of features, written as though all the other features of the language don’t exist. Because they might not.

For handling disjunctions over value or error results, the following all exist:

  • Typical C-style error handling (result code plus out-pointer).
  • C++ exceptions
  • Tagged unions using boost::variant (or std::variant in C++17)

I’m going to use the latter two (since it’s what I do in most of my larger C++ projects):

#include <boost/variant.hpp>

// Synchronous error result (throws exception on error)
unsigned long long evenTimeValue() {
    uint64_t mach_absolute_time(void);
    unsigned long long t = mach_absolute_time();
    if (t % 2 == 0) {
        return t;
    } else {
        throw std::runtime_error("Expected even time, got odd");
    }
}

// Continuation passing style wrapper (error string or value in a tagged union)
void cpsEvenTime(const std::function<void(boost::variant<std::string, unsigned long long>)> &callback) {
    try {
        callback(evenTimeValue());
    } catch (std::exception &e) {
        callback(e.what());
    }
}

// Invoking and handling the Result
int main(int argc, char **argv) {
    cpsEvenTime([](boost::variant<std::string, unsigned long long> result) {
        if (boost::get<unsigned long long>(&result)) {
            printf("%lld\n", boost::get<unsigned long long>(result));
        } else {
            printf("%s\n", boost::get<std::string>(result).c_str());
        }
    });
    return 0;
}

Scorecard: 27 lines (ignoring blank lines and comments), 2 if/else statements, a try/catch, 1 potential throw, 1 lambda, 4 function calls, 13 function calls. It is possible to omit the try/catch and have the exception propagate to a surrounding scope. You could also accidentally ignore the result or misapply the boost::get and trigger a runtime abort.

What does this tell us about C++?

C++ incorporates many distinct philosophies. C-style programming is a common denominator for most – but not all. There’s usually a few tools to solve any problem but there’s no guarantee any of them will play well with your own style of C++.

C++ is typesafe and usually avoids pointers of any kind but that doesn’t mean it’s completely robust: you can forget to catch an exception, or misapply boost::get and trigger another exception or an unwanted abort – there are lots of potential ways to trigger runtime aborts (it’s difficult to avoid partial functions).

Additionally, despite C++’s reputation for being a kitchen sink of every language feature, I needed to use boost::variant in the example, rather than a standard library feature or an actual inbuilt language feature, to get a basic tagged union. C++17 will finally add this to the standard library but it’s still not as good as adding it to the language itself (it will still be prone to runtime, rather than compile-time, type checking).

What does this tell us about Swift?

Given that all of the Swift compiler developers are themselves C++ developers, it is interesting that Swift has turned out almost, but not quite, entirely unlike C++. Swift has opted to avoid C++-style exceptions. Swift’s error handling offers similar control flow in error handling scenarios but you can’t “forget” to handle Swift errors in the same way and they don’t have the controversial aspects of stack unwinding that C++ exceptions have. Additional sugar like defer also helps to avoid the obsessive RAII design that dogs exception-safe C++ code.
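As a minimal sketch of what I mean by defer standing in for RAII-style cleanup (the file path here is purely illustrative):

import Foundation

func readSomeBytes() throws -> Data {
    let handle = try FileHandle(forReadingFrom: URL(fileURLWithPath: "/tmp/example.txt"))
    defer { handle.closeFile() }   // runs on every exit path, including a thrown error
    return handle.readData(ofLength: 64)
}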

Haskell

Errors in Haskell are typically represented as an Either, Maybe or an IO monad type. In Haskell, the flatMap operation from Swift is called bind and it is typically invoked using the >>= operator (or implicitly handled using do notation).

Since Haskell is less likely to be well understood by readers of Cocoa with Love, I’ve thoroughly commented this code so if you can’t read the Haskell, you can at least read about what is happening here.

import Data.Time.Clock.POSIX
import Control.Exception

-- ** A function with an IO monad wrapped Integer result **
-- Please excuse everything to the left of `getPOSIXTime` here; it's just to
-- generate a time value as an integer.
-- The actual test for even/odd occurs in the `evenOrFail` helper function
evenTimeValue = round . (* 1000000) <$> getPOSIXTime >>= evenOrFail

-- ** Pattern matching helper function with an IO monad wrapped Integer result **
-- Tests t for odd or even, triggering an error if it is odd
-- This function is intended to be used with a `>>=` operator since it
-- takes a bare value and returns an `IO` monad wrapped value
evenOrFail t
   | mod t 2 == 0 = return t
   | otherwise = fail "Expected even"

-- ** Continuation passing style wrapper around `evenTimeValue` (empty IO monad result) **
-- Passing the result from `evenTimeValue` into the callback `c` is effortless
-- since errors from functions and errors as parameters have the same monad
-- representation in Haskell
cpsEvenTime c = c evenTimeValue

-- ** Callback function (empty IO monad result) **
-- Processing the value occurs with another `>>=` operator. Processing
-- the error requires a `catch` and redirect to the `onError` handler
callback t = (t >>= print) `catch` onError

-- ** Error handling helper function (empty IO monad result) **
-- Separate error handling function
onError :: IOError -> IO ()
onError e = print ("Error: " ++ show (e))

-- ** Invoke everything at the top level (has an implicit empty IO monad result) **
main = cpsEvenTime callback

Scorecard: 11 lines (ignoring blank lines and comments), 1 pattern match, 2 bind operators, 1 catch, 6 different symbolic operators. It is possible to accidentally omit the catch and have the error bubble, like a fatal exception, all the way to the top level.

What does this tell us about Haskell?

Ignoring the bulk added by the comments, this code is quite compact. It’s not exactly simpler or doing less than the Swift example; rather, Haskell avoids braces and makes heavy use of symbolic operators so it crams a lot of work into a small space.

Haskell’s error handling is fundamentally anchored in monads. Often these are “one-way” monads (where you can never pattern match and unwrap, you can only bind or catch to process the possible cases).

Haskell has many good aspects when it comes to code correctness and error handling:

  • memory safe programming language
  • fewer partial functions than common imperative languages, so you’re less likely to see exceptions/aborts
  • no force unwrapping or other cheating

However, Haskell’s approach to error handling does have some limitations. Specifically, monadic handling makes it very easy to “bind” (>>=) to get the “success” result and totally ignore what happens in an error case. Monadic handling encourages the ignoring of errors. If this code had omitted the catch handling, the error would have propagated inside the IO monad all the way to the output of the main function.

There’s also an almost total lack of signalling. Unless you look for the bind operator, do notation or the catch, return or fail functions, it’s difficult to know where IO or other monads are involved. Haskell’s pervasive type inferencing is often a hindrance here: only one of these functions is required to actually specify a type signature.

What does this tell us about Swift?

Swift’s do and Result handling have a lot in common with Haskell’s do and monads. The difference is that Swift avoids wrapping your data types and therefore allows you to manage the data flow without any wrappers or abstractions.

While downplaying abstractions, Swift plays up control flow and signalling of potential actions with keywords like try, signalling the existence of error handling in a statement where the equivalent handling in Haskell may be much subtler to detect.

Java

Java includes the closest parallel to Swift’s inbuilt error handling with its “checked exceptions”. Like Swift’s throws, the possibility of throwing is part of a function’s signature and may not be ignored by the caller.

import java.io.IOException;
import java.util.function.BiConsumer;

public class TimeSource {
    // A function with declared "checked" exception
    public static long evenTimeValue() throws IOException {
        long t = System.currentTimeMillis();
        if (t % 2 == 0) {
            return t;
        } else {
            throw new IOException("Expected even, got odd");
        }
    }

    // A continuation passing style wrapper (passing value and error arguments)
    public static void cpsEvenTime(BiConsumer<IOException, Long> callback) {
        try {
            callback.accept(null, new Long(evenTimeValue()));
        } catch (IOException e) {
            callback.accept(e, null);
        }
    }

    // Invoking and handling the result
    public static void main(String[] args) {
        TimeSource.cpsEvenTime((error, time) -> {
            if (error != null) {
                System.out.println(error.getMessage());
            } else {
                System.out.println(time);
            }
        });
    }
}

Scorecard: 28 lines (ignoring blank lines and comments), 2 if/else statements, a try/catch, 1 potential throw, 1 lambda, 10 methods. It is possible to forget to check the error for null and use an invalid value.

What does this tell us about Java?

In this example, Java’s use of throws, throw and catch closely resembles Swift’s matching keywords. However, while methods that throw “checked exceptions” are always marked, sites that use these methods are not marked – if the IOException had been generated inside a nested function call, there would be no mention of it anywhere in this source code.

Negatives of the approach for common error handling include:

  1. In Java, the exception types thrown must be fully enumerated – a difference that is sometimes an advantage and sometimes a burden. I’ll look at the problems this causes in the C# section, below.
  2. Java’s Exception class captures a full stack trace and other information which, along with allocation overhead and exception handler overhead, makes it 60 times slower than returning an error code
  3. The name “exception” leads to reasonable questions about whether it should be regularly used or used only in exceptional circumstances.

Ignoring poor performance, the biggest problem affecting Java’s checked exceptions is the existence of “unchecked exceptions”. While checked exceptions are usable for common errors, unchecked exceptions are absolutely for exceptional circumstances (programmer errors which should lead to an abort). Despite the fact that they should be a rarer choice, unchecked exceptions are actually easier to use, since they don’t need to be declared.

Ultimately, declaration issues, performance issues and confusion with programmer errors lead to a significantly compromised language feature.

What does this tell us about Swift?

While Swift has copied some of the syntactic elements of checked exceptions, Swift is very careful to avoid calling its errors “exceptions” since they are different in important ways. Most significantly, Swift’s errors are as performant as returning an error code and have no overlap with technology intended for fatal errors.

Good performance and clear separation of concerns are important goals of Swift.

C#

I doubt it would be new information to anyone here but C#’s syntax is very similar to Java to the point where you can’t always immediately tell one from the other.

using System;

namespace TimeSourceApplication {
    class TimeSource {
        // A function that might throw an exception
        public static long evenTimeValue() {
            long t = System.DateTime.Now.Millisecond;
            if (t % 2 == 0) {
                return t;
            } else {
                throw new System.Exception("Expected even, got odd.");
            }
        }

        // A continuation passing style wrapper (passing value and error arguments)
        public static void cpsEvenTime(Action<System.Exception, long> callback) {
            try {
                callback(null, evenTimeValue());
            } catch (System.Exception e) {
                callback(e, 0);
            }
        }

        // Invoking and handling the result
        public static void Main(string[] args) {
            TimeSource.cpsEvenTime((error, time) => {
                if (error != null) {
                    System.Console.WriteLine(error.Message);
                } else {
                    System.Console.WriteLine(time);
                }
            });
        }
    }
}

Scorecard: 29 lines (ignoring blank lines and comments), 2 if/else statements, a try/catch, 1 potential throw, 1 lambda, 9 methods (including a getter). It is possible to forget to check the error for null and use an invalid value.

What does this tell us about C#?

On the topic of error handling, C# explicitly rejected Java’s checked exceptions in favor of unchecked exceptions.

Anders Hejlsberg, creator of C#, spoke about why checked exceptions were not included in C# in an interview in 2003:

Adding a new exception to a throws clause in a new version breaks client code. It’s like adding a method to an interface. After you publish an interface, it is for all practical purposes immutable, because any implementation of it might have the methods that you want to add in the next version. So you’ve got to create a new interface instead. Similarly with exceptions, you would either have to create a whole new method called foo2 that throws more exceptions, or you would have to catch exception D in the new foo, and transform the D into an A, B, or C.

and he further states:

The trouble begins when you start building big systems where you’re talking to four or five different subsystems. Each subsystem throws four to ten exceptions. Now, each time you walk up the ladder of aggregation, you have this exponential hierarchy below you of exceptions you have to deal with. You end up having to declare 40 exceptions that you might throw. And once you aggregate that with another subsystem you’ve got 80 exceptions in your throws clause. It just balloons out of control.

The problem, as he sees it, is that Java requires you to explicitly enumerate every kind of exception that can be thrown. This is in contrast to Swift, which simply says “an error may be thrown” and makes matching error types a runtime concern.

Swift addresses Anders Hejlsberg’s concerns by similarly removing the need to declare exception types but Swift keeps the explicit declaration of throws.
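A minimal sketch of that runtime matching, reusing evenTimeValue and TimeError from the Swift section above: the function only declares throws, yet the caller can still match concrete error types if it chooses.

do {
    let time = try evenTimeValue()
    print(time)
} catch let error as TimeError {
    // matched a specific error type at runtime
    print("time error: \(error)")
} catch {
    // everything else
    print("other error: \(error)")
}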

What does this tell us about Swift?

The interesting point here then is that Swift has taken an approach closer to checked exceptions than C#’s unchecked exceptions. Swift wants it to be easier to accurately reason about code and to keep propagations from accidentally escaping.

Rust

Rust and Swift are both products of modern programming language design and include similar aims (no null pointers, no undefined behavior and no memory unsafety outside of clearly identified “unsafe”). Each has borrowed ideas from the same sources and each has borrowed from the other. The two are very different when it comes to the level of programmer involvement in managing ideas like reference ownership, but even here there’s evidence that Swift may move a little closer to Rust in the future.

// A function with a synchronous error (Result return type)
fn even_time_value() -> Result<u64, TimeSourceError> {
    let elapsed = std::time::UNIX_EPOCH.elapsed().map_err(TimeSourceError::SystemTimeError);
    return elapsed.and_then(|value| match value.as_secs() {
        t if t % 2 == 0 => Ok(t),
        _ => Err(TimeSourceError::ExpectedEvenGotOdd),
    });
}

// An error type that includes cases for our generated error and a possible underlying
// system error
#[derive(Debug)]
enum TimeSourceError {
    ExpectedEvenGotOdd,
    SystemTimeError(std::time::SystemTimeError)
}

// A continuation passing style wrapper (Result passed as argument)
fn cps_even_time_value<F>(callback: F) where F: Fn(Result<u64, TimeSourceError>) {
    callback(even_time_value());
}

// Invoking and handling the Result
fn main() {
    cps_even_time_value(|r| match r {
        Ok(v) => println!("{:?}", v),
        Err(e) => println!("{:?}", e)
    });
}

Scorecard: 25 lines (excluding blank lines and comments), 2 match statements, 2 closures, 2 macros, 5 function calls. The compiler enforces the rule that the Result must be used in any scope and, if a match statement is used to unwrap it, forces handling of both the Ok and Err cases.

What does this tell us about Rust?

Rust prides itself on being precise and opinionated. Precise reference ownership semantics, precise error typing, even debug printing is unavailable unless specifically requested (in this case, with the derive(Debug) attribute). This is also the only example where I’ve altered the naming scheme to match the language since evenTimeValue gave a compiler warning: “function evenTimeValue should have a snake case name such as even_time_value”.

In accordance with the focus on being precise, Rust includes no automatic error propagation. Instead, you are merely forbidden (via the compiler attribute #[must_use]) from completely ignoring the Result. Beyond that, propagation is manual.

Rust’s error handling is complicated by the fact that its Result type doesn’t function as a simple monad and requires a map_err followed by and_then (instead of just a flatMap) to convert the error before transforming the value. This can be seen on the let elapsed line where UNIX_EPOCH, which returns a std::time::SystemTimeError, must be converted to a TimeSourceError before we can transform and return it from the function.

What does this tell us about Swift?

Swift is less focussed on precision and control than Rust. Swift is happy to have errors fall under a single broad protocol (rather than precisely enumerating all possible sub-error types) to allow a smoother experience. Swift is also happy to include automatic behaviors like propagation, provided there is clear signalling of possible behaviors (e.g. the try keyword and a surrounding do scope or throws function).

Conclusion

Swift’s error handling is probably my favorite feature in the language because of the improvements it offers over alternatives while remaining familiar, simple to use and easy to reason about.

However, I’m not exclusively a Swift programmer. I continue to program in other languages so I feel like there are also lessons from this investigation that I could apply back to some of the other languages I use.

In particular, I feel like I should experiment with Result types in C++, Java and C# to avoid problems with exceptions or the problems with using separate error codes and values. The generics/templates and lambdas in these languages are certainly capable of cleanly expressing this pattern. I do already use boost::variant in my C++ code but it’s prone to accidental boost::get misuse and lacks transformations for composing errors – a dedicated Result type would be better.

For the other languages in this post, while I prefer Swift’s error handling, you could argue that the error handling approaches match the philosophy of the respective languages. Rust’s Result type is safe, explicit and avoids compiler inserted sugar – in accordance with that language’s focus on safety, precision and control. Haskell’s error handling – and the possibilities afforded by monads and higher kinded types – is practically a religion for its adherents. Even C’s approach (language does nothing, language user coordinates everything) could be considered appropriate for the language.

Updating Cocoa with Love for Swift 3


Swift 3 was officially released this week. That’s exciting, right?

Maybe it will be, eventually. In the short term, it means everything old is broken. I’ve gone back through all Cocoa with Love Swift articles and updated them all to Swift 3 and iOS 10/macOS 10.12 – both the article text and the code on github.

Hooray for tedious housekeeping! This is what it’s like to be a code owner.

Introduction

This article is my response to the question of the week: “how much work is involved in migrating to Swift 3?”

I don’t precisely know how long I’ve spent on Cocoa with Love updates. Over successive Swift 3 betas, I’ve spent about a week updating the code in my own Swift projects (including Cocoa with Love and a half dozen other projects). The real answer is that the time taken for any given project will depend on which APIs you’re using.

AppKit, UIKit and Foundation APIs that have simply been renamed are pervasive (30% of all lines changed is entirely possible) but Xcode will largely auto-convert them to the new names so the amount of work involved is low.

Renaming your own functions to match the new Swift naming conventions is a harder problem. It’s not mandatory at all but there’s potentially a lot of work involved here if, like me, you find traditional Objective-C naming is a difficult habit to break.

There have also been a few major APIs (Dispatch, pointers, strings) where the changes are more complex than a simple renaming. This may involve redesigning and can take a while. Whether or not this ends up a large part of your time really depends on how you previously used these APIs.

Then, of course, Swift 3 is hitting at the same time as iOS 10 and macOS Sierra. Moving to Xcode 8 will usually involve solving both compatibility problems at once.

The rest of this article is a look at the main changes I made to each article to bring Cocoa with Love up to date. If you’d prefer to look at the code diffs, you can look at the “Cwl*” projects on my github page and compare the “master” branches (which are now merged with the “swift3-prelease” branches) with the “swift2.3” branches (I won’t be updating the “swift2.3” branches but they’re there if you need them).

A new era for Cocoa with Love

Off to an easy start: nothing to update in this one (it was really just a meta post and contained no code).

Partial functions in Swift, Part 1: Avoidance

Some minor changes to the article but the biggest changes were voluntary so that the function parameters would be more aligned with Swift API Design Guidelines: Argument Labels.

As an example of this, I changed the following function:

func divideFiveBy(x: NonZeroInt) -> Int

to the more Swift 3 styled:

func divideFive(by x: NonZeroInt) -> Int

in accordance with the idea that the preposition “by” should be used as the parameter name, rather than part of the function name (and, of course, the Swift 3 change that first parameter names are included, by default).

There were a few minor changes to collection indexes due to SE-0065 A New Model for Collections and Indices but nothing major.

Sadly, despite the collection index changes Swift introduced, there remains a failure to check that String.CharacterView.Index is advanced using the correct String.CharacterView, so the totally invalid:

let characterView1 = "👿👿".characters
let invalidIndex = "Unrelated string".characters.index(after: characterView1.startIndex)
print(characterView1[invalidIndex])

still runs without a precondition failure and still produces invalid UTF garbage.

Partial functions in Swift, Part 2: Catching precondition failures

Since this article involved a lot of C-style pointer manipulation, it was significantly affected by the SE-0107 UnsafeRawPointer API changes.

The need to pass unsafe pointers through withMemoryRebound(to:capacity:) to recast them had a huge impact on this code. Initially, the code was significantly worse, until I created specialized pointer conversion extension functions on the underlying data structures. The result is much nicer than the original Swift 2 code but only because I was kicked into refactoring by an uncomfortable syntax change. I’d prefer to see a helper function that combines withUnsafeMutablePointer and withMemoryRebound(to:capacity:) into a single step, in future.
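Something like the following sketch is what I have in mind – the name and exact shape are my own invention, not a standard library API:

func withReboundUnsafeMutablePointer<T, U, R>(to value: inout T, as type: U.Type, capacity: Int, _ body: (UnsafeMutablePointer<U>) throws -> R) rethrows -> R {
    // Combine the two standard calls into a single scoped helper
    return try withUnsafeMutablePointer(to: &value) { pointer in
        try pointer.withMemoryRebound(to: type, capacity: capacity) { try body($0) }
    }
}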

Use it or lose it: why safe C is sometimes unsafe Swift

As a follow up to the previous article, it’s unsurprising that the biggest changes here involved the same SE-0107 UnsafeRawPointer API changes.

Tracking tasks with stack traces in Swift

In addition to the Swift renaming and pointer changes that every article so far has required, this one also saw changes from SE-0016 Add initializers to Int and UInt to convert from UnsafePointer and UnsafeMutablePointer.

The biggest difficulty here was that the automatic Xcode Swift 2 to 3 converter doesn’t seem to do a good job with String.fromCString to String(validatingUTF8:) conversions, especially when the parameter in Swift 2 was a potentially nil UnsafePointer<CChar> which, in Swift 3, is an UnsafePointer<CChar>? and now needs to be unwrapped before being passed as a parameter. It’s not a difficult change to make in code but when you don’t even know why the conversion tool has made a mess of a particular line (because you can’t remember what that line was supposed to do), these things can stop you in your tracks for a few minutes.
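For reference, the shape of the change looks roughly like this (cString stands in for an UnsafePointer<CChar>? received from a C API):

// Swift 2:
//     let name = String.fromCString(cString)
// Swift 3: the optional pointer must be unwrapped before the failable initializer
let name = cString.flatMap { String(validatingUTF8: $0) }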

Gathering system information in Swift with sysctl

More renaming, more pointer shenanigans. No surprises there.

The case names for enums are all lowercase now. It’s a change that I fully understand but it’s amazing how automatic these things become with repetition. I still keep writing new code with uppercase case names and need to go back and change them.

Errors: unexpected, composite, non-pure, external.

I included the following code sample in the Swift 2 version of this article:

func secondsSince(previous: NSDate) -> Double {
    let current = NSDate()
    let result = current.timeIntervalSinceDate(previous)
    return result
}

This is replaced by:

func secondsSince(previous: Date) -> Double {
    let current = Date()
    let result = current.timeIntervalSince(date: previous)
    return result
}

I want to highlight this example because it points out something that is non-obvious: the “Great Swift Renaming” is not just a renaming. Date is an example of a type affected by SE-0069 Mutability and Foundation Value Types. In reality, the whole type has changed. Where NSDate was an Objective-C wrapper around CFDate, the new type is a Swift wrapper, obeying slightly different semantics. You can see the new implementation in the Date.swift file in the Swift repository.

Breaking Swift with reference counted structs

So, this one’s a complete write-off.

I’ve added the following note to the top of this article:

OUT OF DATE AS OF SWIFT 3: While it is still possible to have a heap allocated, reference counted struct with cleanup behaviors in Swift 3, it is no longer possible to capture a reference to self in a struct method as described in this article. The problems detailed in section 2 of this article have been fixed in Swift 3 and the remainder of this article is merely a historical artefact. Don’t worry: you’re better off as a result.

Indent with tabs or spaces? I wish I didn’t need to know.

This article contained just 3 lines of Swift and didn’t require any updates.

Presenting unanticipated errors to users

Lots of Cocoa renaming. Roughly half of all lines involved at least one change as part of this renaming. However, it was quick and easy: I barely remember doing it.

Comparing Swift to C++ for parsing

The most widespread changes were voluntary changes to bring function naming into line with the Swift API guidelines. Most prominently, I changed:

func matchString(_ string: String)

to:

func match(string: String)

There were also plenty of changes from uppercase case names to lowercase case names.

As a file with heavy use of UnicodeScalar there were some impacts from SE-0130 Replace repeating Character and UnicodeScalar forms of String.init.

The change that took the most time was updating the parser itself. Swift 3 adds new parsable elements to the mangled name grammar (including raw pointers and nested generics) and numerous items have changed name or representation (protocol<> becomes Any.Type, protocol<A, B> becomes A & B, ErrorProtocol becomes Error).

Random number generators in Swift

This article required a number of updates but for quite different reasons to the other articles so far.

Swift 3 removed jrand48 so I removed the implementation involving it entirely from the article. Not a big loss but I think it’s strange that Swift is choosing to wholly remove functions, not merely deprecate them.

Second: performance numbers changed across the board so I updated the table in the article. Some tests ran a couple percent faster in Swift 3 compared to Swift 2.3 (although I’m not convinced that the change in /dev/random or arc4random performance had anything to do with Swift) and some tests definitely ran slower.

Generator                                        Swift 2.3 (seconds)   Swift 3 (seconds)
DevRandom                                        123.501               13.250
Arc4Random                                       4.651                 4.359
Lfsr258                                          0.872                 0.717
MT19937_64 (before my changes)                   0.643                 0.631
MersenneTwister (and MT19937_64 after changes)   0.517                 0.533
Lfsr176                                          0.511                 0.498
xoroshiro128plus                                 0.352                 0.367
Xoroshiro                                        0.347                 0.362
ConstantNonRandom                                0.221                 0.211

What I did notice is that Swift 3 is much less aggressive about inlining. Where I had previously inlined the Mersenne Twister code into a withUnsafeMutablePointer closure as follows:

withUnsafeMutablePointer(&state_internal) { ptr in
    let state = UnsafeMutablePointer<UInt64>(ptr)
    // Mersenne twister "twist" code using `state` here
}

I needed to rewrite this with the “twist” code out-of-line to prevent significant performance problems. i.e.:

let state = withUnsafeMutablePointer(to: &state_internal) {
    $0.withMemoryRebound(to: UInt64.self, capacity: MersenneTwister.stateCount) { $0 }
}
// Mersenne twister "twist" code using `state` here

This is horribly memory unsafe, since we need to leak the state pointer outside the scope where it is correctly bound. Unfortunately, without this change, performance dropped to 0.622 seconds (more than 16% slower).

I don’t know if these inlining changes were a deliberate choice (inlining is a tradeoff, after all) or an accidental regression. In any case, as a closure-heavy language, I wish Swift offered the ability to force closure inlining.

Even after fixing the inlining issues, my MersenneTwister slowed from 0.517 seconds to 0.533 seconds. Curiously, the C implementation that I modified to follow the same steps had exactly the same performance change on my computer. I don’t know what the underlying cause would be there but it might be due to changes in LLVM.

Mutexes and closure capture in Swift

This article was significantly impacted by SE-0088 Modernize libdispatch for Swift 3 naming conventions. Specifically, the dispatch_sync function has become DispatchQueue.sync.
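For reference, the shape of that change looks roughly like this (the queue label is illustrative):

import Dispatch

// Swift 2: dispatch_sync(queue) { ... }
// Swift 3:
let queue = DispatchQueue(label: "com.example.mutex")
queue.sync {
    // critical section
}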

iOS 10 and macOS 10.12 have added a new locking primitive: os_unfair_lock_t. This fills the gap left by the problematic OSSpinLock so I’ve added mention of it to the article.

I’ve updated all the performance numbers for Swift 3 although they’re largely the same.

Parsing whitespace in an Xcode extension

This was the first article I wrote using Swift 3 – since it only works with Xcode 8.

However, I’ve heavily updated the code for this article since I first wrote it. Not to fix Swift issues but to fix limitations in the parser itself. What was originally a 600 line parser is now over 900 lines, of which only a couple hundred remain from the original implementation. The approach is largely the same but it’s had some significant revisions to avoid problems policing whitespace across a wider range of code.

Exponential time complexity in the Swift type checker

I really wish I could say anything in this article had changed but the compiler issues described all remain exactly as is.

The Swift developers have indicated to me that “rewriting the Swift type checker is on their To-Do list” but that list is very long so I have no idea when we’ll see progress in this area.

Design patterns for safe timer usage

The fourth Swift 3 beta, containing the new Dispatch API, was released the day after I wrote this article. I radically overhauled the article immediately to use the new API. However, those changes were made in haste: the parameter names I chose for my functions didn’t use the same nouns as similar concepts in Apple’s own APIs and the parameters were nonsensically ordered. I’ve gone back and made these changes so it all feels a little more coherent.

Values and errors part 1 & 2

Both of these articles were written after Swift 3 beta 6 (the last major breaking change) and they remain fully functional in Xcode 8.

Conclusion

One article is a complete write-off. Five articles required no changes at all. Twelve articles required updates ranging from a couple minor fixes to hundreds of fixes to bring them up-to-date for Swift 3. Over on github, all my “swift3-prerelease” branches are merged into “master”.

Looking at performance, Swift 3’s performance numbers aren’t dramatically changed for the few cases Cocoa with Love has tested. There are a number of cases that ended up slightly faster but function inlining seems to be reduced in numerous places for reasons I can’t totally understand or cleanly reproduce in simple test cases, and this significantly hurts cases where previously inlined code falls back to capturing closures.

Cocoa API renaming is the most prominent change in Swift 3 but it is mindless and simple – Xcode will do 75% or more of these automatically. Even if Swift 2 code gets included after the initial Swift 3 conversion, fixing is rarely more than “Next Issue” (Command-Apostrophe) followed by “Select” (Return) to choose the Xcode suggested fix.

A bigger burden is the desired (but not required) updating of your own APIs to follow similar guidelines. There’s no help here. As verbose as Objective-C’s naming conventions were, I used them for so long (even in other languages) that I think I’ll require more practice before Swift’s will be instinctive for me.

The most difficult mandatory problem when converting to Swift 3 is when your code uses pointers, string constructors, Dispatch or another API that Swift has rethought, rather than renamed. You can’t ignore these problems, Xcode might not be able to suggest an automatic solution, and you’ll need to simply rewrite whole blocks of code. Best to get your tests in order before you start.

Let’s hope the goal of source compatibility for Swift 4 and beyond is realized.

Optimizing a copy-on-write double-ended queue in Swift


I implemented a double-ended queue type (deque) in Swift to satisfy a particular need in one of my programs. The algorithm itself isn’t very interesting and it’s not what I’ll be discussing.

Instead, I’ll look at implementing a copy-on-write type in Swift and optimizing the code to satisfy my core aim of running at least as fast as Array when used as a first-in-first-out (FIFO) queue for queue lengths around 20 items – without requiring any initial capacity to achieve this performance.

It turns out to be a tricky optimization task due to problems with the functions used to access the copy-on-write buffer. Join me for a look at optimizing a Swift copy-on-write collection.

Introduction

The Swift standard library doesn’t have many built-in container types. It was Swift 1.2 before it introduced a basic Set type. While there are a number of utility storage types, like CollectionOfOne, Swift offers Array, Dictionary and Set as the only general purpose collection types.

Swift lacks any type ideally suited as a FIFO queue. In other standard libraries, this is commonly handled by a double-ended queue or Deque type.

You won’t find a double-ended queue in the Foundation library either but the CFArray/NSMutableArray addresses this deficiency by degenerating into a binary heap under queue-like operations. This lets you use an NSMutableArray as a double-ended queue. A million years ago (in 2005) the Ridiculous Fish blog wrote about this in an article elaborately titled: “Array”. It’s a fascinating read, if you haven’t read it before. However, elements in an NSMutableArray need to be heap allocated and the container itself is not copy-on-write – neither of which seem ideal in Swift so I didn’t want to use this option.

There is an existing, commonly used Deque implementation which appears to be solid and well maintained. However, I was predominantly interested in queues of length 0 to 20 with an initial capacity of zero and, under these conditions, I found this implementation performed slower than Array. I chose not to use this option because I wanted a better focus on the low end.

I could try to graft a Swift copy-on-write wrapper onto a C++ std::deque but ultimately, I chose not to take this option either – although I’d still like to try this in future as I’m curious to work out what would be involved in using C++ to store Swift data (I assume C++ would function as little more than byte-level storage).

Ultimately, I found myself in a position where I would need to write the type myself. I wanted to satisfy the following aims:

  1. Copy-on-write
  2. No heap allocation until the first element is added
  3. Automatic growing and downsizing of storage (down to a minimum capacity)
  4. As fast as Array (or faster) for pushing 10 elements then popping them all in a FIFO fashion, considerably faster for 50.
  5. Implement all of RandomAccessCollection, RangeReplaceableCollection, ExpressibleByArrayLiteral and CustomDebugStringConvertible

Since it is the easiest type of double-ended queue to implement, I’ll be implementing a “circular-buffer” style double-ended queue – one where the first element is allowed to have an “offset” from the front of the storage and successive pops from one end and pushes to the other will cause initialized values in the queue to offset within the storage until they wrap around at the end.

Figure 1: a ring-buffer showing the indices of four initialized elements in a storage of capacity six


Figure 2: the valid indices after removing the first three elements and appending three more


An Array would continually move the values so index 0 was always at the start of the buffer. By comparison, the circular-buffer allows the index 0 element to be offset within the buffer. The caveat is that the storage might not be contiguous, it may overrun the end of the storage and wrap around to the beginning again.

This is not usually the most optimal approach to a double-ended queue (that would involve a more complex split-buffer or heap storage) but we’ll see how it goes.

Copy-on-write in Swift

In Swift, idiomatic collection types are usually “copy-on-write”. This means that assigning a collection to a new variable merely increases the reference count on the original, but a subsequent attempt to mutate a multiply referenced collection causes the mutated reference to copy itself to a unique location and mutate that unique, rather than shared, version.

This is how Array, Dictionary and Set behave.

For this to work, the underlying storage of the collection must be a reference counted class type. To avoid a redundant allocation (one for the reference counted class type and one for the raw storage buffer) we use the create function on ManagedBufferPointer<Header, Element> to allocate a class type (usually a subclass of ManagedBuffer<Header, Element>) and a raw storage buffer in a single allocation.

Combined with the isKnownUniquelyReferenced function (which tests whether any class instance is singly or multiply referenced), this gives us all the tools required.

Overview of the Deque design

A rough description of a Deque type using a ManagedBuffer<Header, Element> would then look like this:

public struct Deque<T>: RandomAccessCollection, RangeReplaceableCollection, ExpressibleByArrayLiteral, CustomDebugStringConvertible {
    public typealias Index = Int
    public typealias Indices = CountableRange<Int>
    public typealias Element = T

    var buffer: ManagedBuffer<DequeHeader, T>? = nil

    public init()
    public init(arrayLiteral: T...)

    public var debugDescription: String
    public subscript(_ at: Index) -> T
    public var startIndex: Index
    public var endIndex: Index

    public mutating func replaceSubrange<C>(_ subrange: Range<Int>, with newElements: C) where C: Collection, C.Iterator.Element == T {
        if isKnownUniquelyReferenced(&buffer) && /* values fit in existing storage */ {
            // Mutate in place
        } else {
            // Use the `create` method on ManagedBuffer
        }
    }
}

struct DequeHeader {
    var offset: Int
    var count: Int
    var capacity: Int
}

That’s the entire set of required methods, properties and typealiases. You get quite a bit for free from the Collection protocols that all ultimately gets channeled through the subscript or the replaceSubrange function. It gets a lot more difficult if the Index isn’t Int but fortunately that’s not the case here.
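To make the allocation step concrete, here’s a minimal sketch of creating the storage in a single step with ManagedBuffer.create, using the DequeHeader above (the zeroed header values and the function name are illustrative):

func makeBuffer<T>(capacity: Int) -> ManagedBuffer<DequeHeader, T> {
    // Allocates the class instance and `capacity` elements of raw storage together
    return ManagedBuffer<DequeHeader, T>.create(minimumCapacity: capacity) { _ in
        DequeHeader(offset: 0, count: 0, capacity: capacity)
    }
}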

Storing values safely in an uninitialized buffer

The first tricky hurdle to overcome is safely maintaining reference counts for data stored in an unsafe buffer.

In typical Swift programming, automatic reference counting lets us ignore reference counts. This is not typical Swift programming. When the buffer is allocated, it is “uninitialized” (doesn’t contain a normal value). If we were to try and assign values into the buffer normally:

buffer.withUnsafeMutablePointerToElements { pointer in
    pointer[index] = newValue
}

Swift’s automatic reference counting would try to release the existing value at pointer[index] before assigning newValue to that location. This is a problem: when uninitialized, the location could contain garbage data and attempting to release its contents could cause an EXC_BAD_ACCESS or other crash or misbehavior.

The way around this problem is to manually manage all memory in the buffer. This means copying values into the buffer with initialize:

buffer.withUnsafeMutablePointerToElements { pointer in
    pointer.advanced(by: index).initialize(to: newValue)
}

or, if we’re moving values from one buffer to another (leaving the old locations uninitialized):

oldBuffer.withUnsafeMutablePointerToElements { oldPointer in
    newBuffer.withUnsafeMutablePointerToElements { newPointer in
        newPointer.advanced(by: newIndex).moveInitialize(from: oldPointer.advanced(by: oldIndex), count: moveCount)
    }
}

and when we’re done, we need to release the values:

buffer.withUnsafeMutablePointerToElements { pointer in
    pointer.advanced(by: offset).deinitialize(count: valueCount)
}

Limitations of ManagedBuffer

So we’re responsible for manually maintaining reference counts – including manually releasing values before the buffer is destroyed.

In most cases, that means performing the deinitialize in a deinit method, and that in turn means we can’t rely on the default ManagedBuffer class alone: while it deinitializes the Header, it does nothing with the Element buffer.

At a minimum, we need to subclass ManagedBuffer and add a deinitialize for any values that are still in the buffer at deinit time.
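
As a sketch of that minimum, the subclass could look something like the following. This simplified version assumes the initialized region is contiguous; the real implementation must also handle the wrapped case:

final class SimpleDequeBuffer<T>: ManagedBuffer<DequeHeader, T> {
    deinit {
        // Manually deinitialize whatever elements are still alive so their
        // reference counts are balanced before the memory is deallocated.
        withUnsafeMutablePointers { header, elements in
            let count = header.pointee.count
            elements.advanced(by: header.pointee.offset).deinitialize(count: count)
        }
    }
}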

Here’s one I prepared earlier

I went ahead and implemented the entire Deque based on the information I’ve discussed so far. The implementation is a little tedious but isn’t very difficult. It’s about 350 lines (including comments and blank lines). You can download this initial version here. Later in this article, I will refer back to this version as the “conservative” version.

While not required, I included optimized versions of the following functions:

  • append(_:)
  • insert(_:at:)
  • remove(at:)

to improve performance for commonly expected operations.
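
As an example of what such an optimization buys, here is a sketch of the kind of fast path an optimized append(_:) can take when the buffer is uniquely referenced and has spare capacity, using the ManagedBuffer-based layout from the overview. The details are mine and the real CwlDeque code differs, but the shape is the same:

public mutating func append(_ newElement: T) {
    if isKnownUniquelyReferenced(&buffer), let existing = buffer {
        let appended: Bool = existing.withUnsafeMutablePointers { header, elements in
            guard header.pointee.count < header.pointee.capacity else { return false }

            // Write directly into the next free slot, wrapping around if needed.
            var index = header.pointee.offset + header.pointee.count
            if index >= header.pointee.capacity {
                index -= header.pointee.capacity
            }
            elements.advanced(by: index).initialize(to: newElement)
            header.pointee.count += 1
            return true
        }
        if appended { return }
    }

    // Slow path: copy or grow via the general replaceSubrange implementation.
    replaceSubrange(endIndex..<endIndex, with: CollectionOfOne(newElement))
}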

The implementation also includes a runtime-specified minimumCapacity so that the queue can dynamically grow and shrink but won’t shrink below the minimum (the default value of this is zero and falling back to zero capacity causes the entire buffer object to be deallocated).
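
Usage would look something like the following; the exact initializer label is an assumption on my part rather than something quoted from the code:

// Construct a queue that keeps capacity for at least 20 elements once allocated
// (hypothetical initializer label).
var pending = Deque<Int>(minimumCapacity: 20)
pending.append(1)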

Since the key metric I’m using to gauge success or failure is how this compares to Array for small queues, I’ll be evaluating performance with the following test:

let outerCount = 100_000
let innerCount = 20
var accumulator = 0
for _ in 1...outerCount {
    var deque = Deque<Int>()
    for i in 1...innerCount {
        deque.append(i)
        accumulator ^= (deque.last ?? 0)
    }
    for _ in 1...innerCount {
        accumulator ^= (deque.first ?? 0)
        deque.remove(at: 0)
    }
}
XCTAssert(accumulator == 0)

This test case appends 20 elements sequentially to the end of a new queue and then removes them sequentially from the front. This process is then repeated 100,000 times.

I’ll compare against Array. The only difference between the two tests is that the var deque = Deque<Int>() line becomes var deque = Array<Int>() for the Array version.

                                  Time taken (seconds)
Initial Deque implementation      16.217
Array                             0.214

That’s not good. What’s the problem?

Problem number 1: Specialization needed

The first difficulty is that, relative to third-party code, the standard library cheats. Where my Deque implementation was compiled into a separate module and therefore received no generic specialization, the Swift standard library is always compiled as though it were part of the calling module – so it can always take advantage of generic specialization.

This is causing the following accessors into the buffer:

buffer.withUnsafeMutablePointers { header, elements in
    elements.advanced(by: header.pointee.offset).initialize(to: newValue)
}

to use capturing closures, with heap allocation overhead, and this is responsible for most of the execution time.

This is a problem I’ve discussed previously. The solution is similar: ensure that the Deque and the test case are compiled as part of the same module and ensure that “Whole Module Optimization” is enabled.

                                             Time taken (seconds)
Initial Deque compiled in the same module    0.589
Array                                        0.204

Okay, that’s more than an order of magnitude better but it’s still not great. What’s the problem now?

Problem number 2: ManagedBuffer doesn’t inline properly

It turns out that despite being fully specialized, Swift is still not inlining some of the closures. For reasons I can’t quite understand, Swift is refusing to inline the withUnsafeMutablePointers and related functions.

I therefore implemented my own versions of these ManagedBuffer functions on a custom DequeBuffer class and stopped making DequeBuffer a subclass of ManagedBuffer:

final class DequeBuffer<T> {
    class func create(capacity: Int, count: Int) -> DequeBuffer<T> {
        let p = ManagedBufferPointer<DequeHeader, T>(bufferClass: self, minimumCapacity: capacity) { buffer, capacityFunction in
            DequeHeader(offset: 0, count: count, capacity: capacity)
        }
        return unsafeDowncast(p.buffer, to: DequeBuffer<T>.self)
    }

    func withUnsafeMutablePointers<R>(_ body: (UnsafeMutablePointer<DequeHeader>, UnsafeMutablePointer<T>) throws -> R) rethrows -> R {
        return try ManagedBufferPointer<DequeHeader, T>(unsafeBufferObject: self).withUnsafeMutablePointers(body)
    }

    func withUnsafeMutablePointerToElements<R>(_ body: (UnsafeMutablePointer<T>) throws -> R) rethrows -> R {
        return try ManagedBufferPointer<DequeHeader, T>(unsafeBufferObject: self).withUnsafeMutablePointerToElements(body)
    }

    func withUnsafeMutablePointerToHeader<R>(_ body: (UnsafeMutablePointer<DequeHeader>) throws -> R) rethrows -> R {
        return try ManagedBufferPointer<DequeHeader, T>(unsafeBufferObject: self).withUnsafeMutablePointerToHeader(body)
    }

    deinit {
        withUnsafeMutablePointers { header, body in
            Deque<T>.deinitialize(range: 0..<header.pointee.count, header: header, body: body)
        }
    }
}

                                  Time taken (seconds)
Deque with custom buffer class    0.354
Array                             0.207

Okay, but we’re still nearly a factor of 2 slower than Array.

Problem number 3: Underlying closures still won’t behave

Bypassing ManagedBuffer’s withUnsafeMutablePointers function with my own implementation improved things but now profiling reports the underlying ManagedBufferPointer.withUnsafeMutablePointers function as the biggest time consumer.

Out of frustration, I dug into the Swift standard library source code to work out exactly what the ManagedBufferPointer.withUnsafeMutablePointers function does so I could bypass it.

The following unsafe accessors and helpers on my DequeBuffer type are the result:

static var headerOffset: Int {
    return Int(roundUp(UInt(MemoryLayout<HeapObject>.size), toAlignment: MemoryLayout<DequeHeader>.alignment))
}

static var elementOffset: Int {
    return Int(roundUp(UInt(headerOffset) + UInt(MemoryLayout<DequeHeader>.size), toAlignment: MemoryLayout<T>.alignment))
}

var unsafeElements: UnsafeMutablePointer<T> {
    return Unmanaged<DequeBuffer<T>>.passUnretained(self).toOpaque().advanced(by: DequeBuffer<T>.elementOffset).assumingMemoryBound(to: T.self)
}

var unsafeHeader: UnsafeMutablePointer<DequeHeader> {
    return Unmanaged<DequeBuffer<T>>.passUnretained(self).toOpaque().advanced(by: DequeBuffer<T>.headerOffset).assumingMemoryBound(to: DequeHeader.self)
}

I then use these accessors inside my withUnsafeMutablePointers implementations instead of calling down into ManagedBufferPointer.
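
Concretely, the forwarding functions from the previous listing then become direct calls through these accessors. A sketch (the real code may differ in detail):

func withUnsafeMutablePointers<R>(_ body: (UnsafeMutablePointer<DequeHeader>, UnsafeMutablePointer<T>) throws -> R) rethrows -> R {
    return try body(unsafeHeader, unsafeElements)
}

func withUnsafeMutablePointerToElements<R>(_ body: (UnsafeMutablePointer<T>) throws -> R) rethrows -> R {
    return try body(unsafeElements)
}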

As you can see, all I’m really doing is advancing the self pointer to the correct locations for the Header and Elements in the buffer allocated by ManagedBufferPointer, which is the same work that ManagedBufferPointer itself would perform. My code cannot run the defer { _fixLifetime(_nativeBuffer) } call that ManagedBufferPointer.withUnsafeMutablePointers runs but my understanding is that the lifetime guarantee is implied by invoking an instance method on self, so the buffer should stay alive for the duration.
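
If you wanted that lifetime guarantee to be explicit rather than implied, one option (my own suggestion, not something the CwlDeque code does) is to mirror ManagedBufferPointer’s behaviour with a deferred _fixLifetime on self:

func withUnsafeMutablePointerToHeader<R>(_ body: (UnsafeMutablePointer<DequeHeader>) throws -> R) rethrows -> R {
    // Ensure `self` (and therefore the allocation) stays alive until after
    // `body` has finished using the raw pointer.
    defer { _fixLifetime(self) }
    return try body(unsafeHeader)
}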

                                  Time taken (seconds)
Final Deque implementation        0.147
Array                             0.205

We’re finally faster than Swift’s builtin Array type in this test case.

Isn’t this totally unsafe?

I’m offsetting a pointer into the middle of a buffer I didn’t allocate and assuming the layout within the buffer won’t change in future.

Isn’t this a hideous hack? Doesn’t this go against the “maintainable apps” aim that I declared for Cocoa with Love in “A new era for Cocoa with Love”?

Yes and yes.

It’s disappointing but the reality is that if we implement the Deque using ManagedBufferPointer to allocate the underlying storage, we’re forced to choose between an “optimized” version that delivers reasonable performance and a “conservative” version that is so slow it isn’t really competitive with Array until queue lengths reach 80 to 200 (depending on initial capacity).

Frankly, the performance problems make the conservative version practically useless. If you’re paranoid about memory safety, then you’d be better off simply using an Array or rewriting without ManagedBufferPointer.

I’ve chosen to go with the optimized version and guard against any potential unsafety with assert statements, run when a DequeBuffer is constructed, that confirm the values returned from my accessors match the results that ManagedBufferPointer would produce. This should catch any future change in layout that would leave the pointers invalid.
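
A sketch of the kind of check involved, as it might appear at the end of DequeBuffer<T>.create after a new buffer (here called newBuffer) has been constructed; names and placement are illustrative:

// Confirm that the hand-computed offsets agree with what ManagedBufferPointer
// itself reports for this buffer object.
let verify = ManagedBufferPointer<DequeHeader, T>(unsafeBufferObject: newBuffer)
verify.withUnsafeMutablePointers { header, elements in
    assert(header == newBuffer.unsafeHeader, "DequeBuffer header offset is incorrect")
    assert(elements == newBuffer.unsafeElements, "DequeBuffer element offset is incorrect")
}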

Since the external interface of the class is identical in both cases, it should be trivial to revert to the conservative version in the event that Swift fixes the performance problems involved here.

Comparing with std::deque

Beating Swift’s Array isn’t really a giant accomplishment: Array isn’t designed for high performance in this scenario. Let’s instead compare with something designed for use as a FIFO queue.

Here’s how Deque compares with libc++’s std::deque:

                                  Time taken (seconds)
Final Deque implementation        0.147
std::deque                        0.037

Ouch. How did we get beaten again?

While std::deque’s “split-buffer” design is more efficient than our “circular-buffer” (extending the capacity doesn’t require moving the existing contents into a newly allocated buffer), that’s not the source of the problem here. The source of the problem is that std::deque immediately allocates enough memory for the whole queue up front. If our implementation sets a minimum capacity of 20 on construction, we get a lot closer in performance:

                                            Time taken (seconds)
Final Deque with minimum capacity of 20     0.043
std::deque                                  0.036

What’s the cause of the remaining difference?

Compiling Swift without safety checks (which disables precondition checks) gets the Deque version down to 0.039 seconds. I suspect that I could probably rewrite a few conditional unwraps as force unwraps (which don’t need to be checked when safety checks are disabled) and get the performance close to parity.

Frankly though, I have no desire to run with Swift’s safety checks disabled or to remove the copy-on-write behavior (which also incurs a cost relative to the C++ version). For my purposes, I consider the results in either of the last two tables “good enough” and I’m going to leave it there.

Usage

The project containing this Deque implementation is available on github: mattgallagher/CwlUtils.

The CwlDeque.swift file is fully self-contained so you can just copy the file, if that’s all you need. Otherwise, the ReadMe.md file for the project contains detailed information on cloning the whole repository and adding the framework it produces to your own projects.

As mentioned earlier, the repository also contains a “conservative” version of CwlDeque.swift that omits the unsafe pointer offsets.

Conclusion

This was partly fun, partly an exercise in frustration.

It was fun to implement a basic copy-on-write, circular-buffer deque in Swift. Needing to carefully initialize and deinitialize to manually manage reference counts seems like a delightful throwback to pre-ARC Objective-C but it’s straightforward once you realize that it’s required.

The frustration came as I tried to understand the cause of performance problems. While I know a little about the Swift compiler internals, optimizing code like this is still largely a black-box exercise for me: looking at profiling results and haphazardly trying things on problem areas, hoping the situation will improve somehow. I don’t really know why ManagedBuffer and ManagedBufferPointer refused to inline correctly or if I could have fixed the problem a different way. I do know that I shouldn’t have needed to do anything – the conservative and optimized versions should have had identical performance.

The final solution delivers excellent performance but I’m not happy with the pointer arithmetic it required. With the assert statements double checking the pointer arithmetic, it should remain correct but ultimately, I look forward to reverting to the “conservative” version of the code once Swift addresses the underlying cause of the performance problems.
