Somo and Air bp have a proud history of collaboration. We’ve enjoyed working with them on many projects, including an augmented reality app for their learning centre which led them to describe us as the best digital agency they’d worked with in 25 years. We, in turn, appreciate their innovative approach to energy and their willingness to embrace new technologies to pursue that.
We were very intrigued, then, to be approached by them with a technical challenge in 2015: could we use computer vision to help ground crew refuel light aircraft, by scanning fuel decals that identify the correct fuel to use?
For context, though turbines have been the dominant power plant for airliners since the 1950s, light aircraft may also use piston engines, and these use different kinds of fuel; using the wrong one (though extremely rare) can be catastrophic.
This was an exciting challenge, with an opportunity to do some real good; we duly built a prototype, handed this over to Air bp – along with extensive notes to allow them to progress to full self-sufficiency and take this product forward – and celebrated a job well done.
That was five years ago. Since then, Air bp has grown this prototype into an application used at 169 airports in 19 countries on four continents, fuelling 800–1000 aircraft a day. This impressive success not only won bp’s superlatively prestigious helios safety award last year, but was just declared overall winner at the Real IT Awards, picking up awards for “Excellence in Application Modernisation” and “Product Innovation”.
We thought this would be a very good time to remember exactly how we came to build this prototype, the challenges on the way, and how we met them – and put Air bp on the road to making an excellent safety record into, we hope, a flawless one.
Combustion engines are tricky things. Hundreds, and sometimes thousands, of moving parts must all mesh precisely, with microscopic spaces between them. These parts must be lubricated and cooled. A precise balance of fuel and air produces just the right amount of heat to run the engine. The wrong fuel ruins this balance, making the engine too cold – in which case it will stop – or too hot, which can cause components to warp, crack and seize, and lubricants and coolants to boil or burn.
However, the engine must also keep the fuel happy. All fuels have a flashpoint, the lowest temperature at which their vapour will ignite; if the engine gets the fuel too hot too soon, it will burn prematurely; if the engine doesn’t get the fuel hot enough, it won’t vaporise and mix with the air, and will not burn cleanly, or at all.
Gasoline engines use a volatile, light fuel that burns explosively when ignited by a spark, pushing each piston through its power stroke. Turbines and certain approved diesel engines, by contrast, use a heavier fuel that burns more slowly and continuously, but which has a higher energy density. Mix these up and engines will, at best, stop working, and at worst, be destroyed. Of course, total engine failure during flight has worse consequences than engine maintenance headaches, particularly during take-off, when aircraft may lack the height and speed to glide to safety.
For this reason, it is very important to know which aircraft use turbine or diesel engines, and which use gasoline engines. Unfortunately, some aircraft models are built in variants with different engine types. For example, one of these helicopters uses a gasoline engine, and one uses a turbine. See if you can guess which is which.
Of course, there are some safety systems to prevent misfuelling. The simplest is that different fuels are delivered through different-shaped nozzles; unfortunately some aircraft have nonstandard fillers, defeating this. Aircraft are also labelled with their fuel type to allow ground crew to work out which bowser to trundle over. The overwhelming majority of the time, this works, but very rarely – around one time in two million – misfuelling does still occur.
Our job was to get this number even lower.
Our first approach was an app that quite literally reads the text of the fuel decal and interprets it, much as a person would.
Optical Character Recognition, or OCR, is a technology with a surprisingly long history: the first devices with any claim to achieve this were patented in 1914. Fonts designed specifically for machines to read were defined in 1968 (one by the legendary Adrian Frutiger). Around the same time, systems capable of reading any character in any style began to be developed, notably by Ray Kurzweil, and applications like mail sorting started to use this new technology. Deep learning has recently augmented its effectiveness still further.
However, it is a technology that poses many challenges, all of which must be met before it can be applied.
The first challenge is one familiar to anyone who has seen a face in a piece of toast, or in the shape of mountains on Mars – a phenomenon known as pareidolia. Faces are so important to us – allowing us, among other things, to recognise friends and guess how they are feeling – that we see them everywhere.
It will come as no surprise that a system designed to recognise any letter, even the most distorted, will see text in rivets, windows, random decorations and fluff on the camera lens.
The second challenge is rather subtler. When a human reads, they use context – formatting, spacing, remembered rules about capital letters, remembered information from previous text – to help interpret the letters correctly. So, for instance, consider this text:
There are three near-identical vertical bars here. We know that the first is an upper-case ‘I’ and the second two are lower-case ‘l’s, because we understand the conventions of writing and the name “Ilkley” is familiar (particularly if you’ve ever been on a moor without a hat). By default, a machine knows none of this, and will struggle to tell which letters are which.
Remove the context, and a human would struggle too. For example, I lied: the vertical bar near the centre is actually an upper case ‘I’, but the human mind interprets it as a lower case ‘l’ because that’s what we expect to see. In fact the lower case ‘l’ is slightly taller and narrower, which can be seen – barely – with a horizontal bar behind the text to give the height of the capital ‘I’, showing the lower-case ‘l’ and ‘k’ protruding above the bar:
Along with these intrinsic issues, we found that the algorithm needed a very high-contrast image to work successfully, so we compressed the input image to pure black and white. We also unfortunately found that the algorithm was quite slow to run, meaning that for practical use, we needed to reduce the resolution of the image – which also, unfortunately, reduced its detail.
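The preprocessing just described – compressing the image to pure black and white and reducing its resolution – can be sketched in a few lines. This is an illustrative reconstruction, not the prototype’s actual code; the threshold value and target width are assumptions.

```python
import numpy as np

def preprocess_for_ocr(image, threshold=128, max_width=640):
    """Binarise a greyscale image and downscale it for a slow OCR pass.

    `threshold` and `max_width` are illustrative values, not the ones
    used in the original prototype.
    """
    # Reduce resolution by simple striding so the OCR pass runs faster
    # (a production pipeline would use proper area resampling).
    if image.shape[1] > max_width:
        step = int(np.ceil(image.shape[1] / max_width))
        image = image[::step, ::step]
    # Compress to pure black and white to maximise contrast.
    return np.where(image >= threshold, 255, 0).astype(np.uint8)
```

The trade-off described above is visible in the code: a larger `step` speeds up recognition but discards detail the recogniser might have needed.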
These and a number of other challenges meant that, while successful scans were possible using OCR, they were not guaranteed. Here are some example images, along with what the algorithm thought the text was:
As these testing examples show, OCR proved a useful stepping stone, but cumbersome and too unreliable to improve an already excellent safety process. We needed something better.
In Somo’s history, we’ve produced a number of augmented reality products (including, as mentioned above, one for Air bp); the nature of augmented reality means it must be able to detect the exact positions of objects in real-time. The library we had used most at that time uses a form of feature tracking that can produce a positive identification in a small number of frames, and then track an object through subsequent frames. The first step is known as point cloud registration, and can be achieved by a number of methods, but essentially finds a geometric transformation that will superimpose the features the computer knows about on the ones it can see:
The tracking algorithm is called the Kanade-Lucas-Tomasi algorithm (after its inventors), or KLT for short.
This system is designed to discriminate and detect a fixed set of flat images using a feature representation of those images. The features, in this case, are high-contrast corners. By detecting all the high-contrast corners in a video stream, the KLT algorithm is able to work out what image is present and where it is positioned. Here, for example, is a feature representation of an AVGAS decal:
Each point is a feature the algorithm can use to find and track this decal in a video stream.
We created feature representations of a number of fuel decals and found this method to be extremely good, with many of the challenges of OCR gone – but replaced with new challenges to surmount.
As we’ve already seen, text often relies on a good deal of context to interpret it, and shapes from one part of a piece of text can often be repeated in another. Although significantly better than OCR, the image detector would occasionally pick up features common to two decals, and mistake one for the other.
A good way to think about this is to examine the classic signs used for toilets in public buildings. They are designed to be geometrically simple for ease of legibility at a distance, and both figures have common features, but also differences. However, if (as once happened to me) someone’s head is right in front of the door, and you can only see the top of the figure, it is very hard to tell which toilet you’re looking at. On the other hand, if you can only see the bottom, you have all the information you need.
This revealed the solution we would use to solve this problem. We looked at the decals differentially, keeping only those features that were unique to a given fuel type. Doing this dramatically improved our reliability.
Several decals presented similar challenges; multiple refinements of this type were needed to get our reliability into the range we wanted, but we were able to bring our error rate down to zero on the few thousand test examples performed in the field.
Since precision was such a high priority for this project, we tuned our image recognition system aggressively for low error rates. As such, sometimes it wouldn’t manage to recognise a decal at all. In these circumstances, therefore, we had an automatic fallback mechanism, allowing the device to switch to OCR and attempt a recognition from there. A slight tweak to the system – allowing it to attempt to read the text multiple times – improved that to the point where one system or another could usually handle any decal.
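The overall recognition flow – feature matching first, OCR with retries as the fallback – can be sketched as follows. The function names, return conventions and retry count here are illustrative assumptions, not the prototype’s actual API.

```python
def recognise_decal(frames, feature_match, ocr_read, max_ocr_attempts=3):
    """Try the feature-based matcher first; if it cannot produce a
    confident match, fall back to OCR, attempting the read on several
    frames before giving up. Both recognisers return None on failure.
    """
    match = feature_match(frames)
    if match is not None:
        return match, "features"
    # Fallback: attempt to read the text on up to max_ocr_attempts frames.
    for frame in frames[:max_ocr_attempts]:
        text = ocr_read(frame)
        if text is not None:
            return text, "ocr"
    return None, "unrecognised"
```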
Once we had a recognition, the app would show the operator the camera image alongside a reference graphic of the decal it had recognised. This meant that if the match was incorrect, it would be extremely obvious; in that case, the operator was to report the mismatch.
Without analysing misfuelling incidents directly, it is hard to say what causes them, but we conjecture that even the step of forcing ground crew to compare a photo of the real fuel decal with an example graphic side by side could improve safety, simply by focussing attention on the fuel type.
Computer vision in 2020 is a very different beast to computer vision in 2015. Deep learning is widely available and easy to apply on a device. Today, we would almost certainly try to train a deep convolutional network as an image recogniser.
Not only that, but several companies, notably including Google, have released powerful image processing algorithms, methods for adapting them to specific uses (such as AutoML Vision) and optical character recognition services, many of which can run offline on portable devices. We would now have a wealth of techniques to try.
Interestingly, a common technique for improving accuracy with deep networks is called boosting. Boosting essentially works like this: a network is trained on some data set, then its performance on that set is measured. It will normally get the wrong answer for a small fraction of the training data. These wrong answers are then boosted – given a much higher priority – so that the network specifically trains on data it found hard to interpret the first time round.
What’s interesting about this is that almost all our improvements to the more conventional forms of computer vision amount to a form of boosting: we looked for things that could confuse the system, then found ways to eliminate those.
Finally, image processing can now be achieved with much greater speed and power using high-performance processors designed specifically for graphics – GPUs. OpenGL is an API for drawing 3D graphics, but its shader programs have been used with significant success to specify general computational processes to run on GPUs. In addition, most of the major machine learning frameworks have GPU optimisations available. We would certainly be able to achieve better OCR performance with GPU power.
Somo is an accelerator. We not only do innovative work for our clients; we aim to help them work innovatively for themselves, carrying on the work we begin.
In this exact way, Air bp has carried on the work we began for them five years ago. They modified the application to work with a wider range of devices, and to integrate directly with their own data systems; they added functionality to allow the app to be aware, not only of the fuel decals on an aircraft, but of the type of fuel in a bowser standing ready for fuelling. Most importantly, they designed a safety procedure for ground crew to follow, a rigorous “enhanced three-way cross check”, coordinated through the app.
In June 2016, they performed the first live fuelling trials; in October 2017, Larnaca airport in Cyprus became the first live adopter of the new flow. Today, as noted above, this process reduces the misfuelling risk for 800–1000 aircraft daily.
Air bp are rightly proud of their achievement. Winning the Real IT award is a fitting tribute to such an imaginative solution to a difficult safety problem. Somo is proud to have been there at the beginning, accelerating Air bp towards their goal.
It takes a lot of energy and skill to take flight. We’re delighted to have been able to offer a little of both.