A grain of sand
A story about sand - and data
When the word data is mentioned, which happens more and more often, it is usually associated with very large and ever-growing quantities. This invites poetic comparisons with the number of stars in the universe or other large numbers. All of this is quite abstract and difficult to imagine. A more pragmatic comparison is to think of sand. This story will tell you why.
Selling sand in the Sahara
There used to be (perhaps there still is) an expression about a particularly gifted salesperson, that she “could sell sand in the Sahara”. The underlying assumption, of course, is that if you can sell a product to someone who already has plenty of it, then you must possess certain skills of persuasion.
I stopped using that expression many years ago, when I started working for a paint company. We manufactured and sold specialized protective coatings for steel structures in industrial plants all over the world. When a new plant had been built, it would stand fresh and shiny with bright colours and look like the couple of hundred million dollars it had cost. This would be the festive moment when the executive management made speeches, an invited dignitary cut the ribbon and the plant was officially inaugurated.
Because the new plant looks so beautiful and colourful, most people assume that appearance is why the plant is painted, that this is what the paint does. It certainly does that, but the invisible and most important role of the protective coating is to protect the steel against corrosion, which requires several layers (usually three) of different types of paint. The anti-corrosive layer is applied directly to the steel. It provides the anti-corrosive properties and, quite importantly, it sticks to the steel. It will not stick properly, however, unless the steel surface is clean and has the right roughness. Before any paint can be applied, the steel must be sandblasted, which essentially consists of blowing sand at high pressure over the steel surface. (There is much more to this than my description seems to indicate. You can read more about steel surface preparation here.)
Which brings us to the sand.
Sand has properties. Some is very clean. Some contains impurities. Some is round. Some is sharp. The photo below shows a closeup of sand from Hawaii. Sand from other locations will be different. This means that some sand is perfect for sandblasting, and some sand is terrible. If impurities get blasted into the steel, they may cause the very corrosion the paint was supposed to prevent.
Sand is not just sand!
At this point it becomes clear that sand is not just sand. It is essential to have the right kind of sand for the task you want to do. Which is why it is actually possible to sell sand in the Sahara.
Why it may be useful to know about paint and sandblasting when you work with algorithms and data
One of the sentences we often hear from companies is “we have plenty of data”. This is most likely very true. Most companies have vast amounts of data. It is however somewhat like looking at the Sahara and saying “there is a lot of sand”.
Like sand, data has properties.
Like sand, data has properties. It can be clean. It can be impure. It can even be missing. This has consequences for what you can expect to be able to do with your data. Throwing machine learning tools (think sandblaster) at impure or missing data will lead to bad results. This is why the biggest effort in working with tools such as machine learning lies in the boring, but extremely important and valuable, work of cleaning and enriching the data. Maybe even setting up new data collection methods to get the missing data. Then comes the work of integrating the applications into the existing infrastructure (think layers of protective coating), finished off with the user interface (think the glossy, colourful top layer). This effort is invisible at the time of the grand opening of the new AI algorithm. All that is visible is the beautiful user interface that makes everything look shiny and new. This is what people see and remember.
I should note that neither sandblasting nor data cleaning is really boring to the skilled people applying their talents to the task. It simply rarely reaches the board rooms, as it is not considered exciting to talk about.
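For readers who like to see things in code, here is a minimal sketch of what "inspecting the sand" can look like before any machine learning tool is applied. It is only an illustration: the records, the field names, the `profile` function and the valid temperature range are all hypothetical and invented for this example. It simply counts, per field, how many values are missing and how many fall outside a plausible range (impure).

```python
from collections import Counter

# Hypothetical sample records, invented for illustration only.
# None or an empty string stands for a missing value.
records = [
    {"plant": "A", "coating_layers": 3, "steel_temp_c": 21.5},
    {"plant": "B", "coating_layers": None, "steel_temp_c": 19.0},
    {"plant": "", "coating_layers": 3, "steel_temp_c": -999.0},  # -999 is a sentinel, not a real reading
    {"plant": "D", "coating_layers": 2, "steel_temp_c": 22.1},
]


def profile(rows, valid_range):
    """Count missing and out-of-range (impure) values per field."""
    missing = Counter()
    impure = Counter()
    for row in rows:
        for field, value in row.items():
            if value is None or value == "":
                missing[field] += 1
            elif field in valid_range:
                low, high = valid_range[field]
                if not (low <= value <= high):
                    impure[field] += 1
    return dict(missing), dict(impure)


# Assumed plausible range for the hypothetical temperature field.
missing, impure = profile(records, {"steel_temp_c": (-50.0, 60.0)})
print("missing:", missing)
print("impure:", impure)
```

A report like this does not clean anything by itself, but it tells you what kind of sand you have before you start blasting.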
Data and algorithms are very abstract concepts with very real consequences.
Data and algorithms are very abstract concepts with very real consequences. The point of telling the story about painting and sandblasting steel is to give a more hands-on image of the process of making algorithms. If you are unfamiliar with painting steel, then think about the last time you redecorated a room. Moving all the furniture, repairing damage to the walls, and preparing and cleaning the surfaces are all boring but necessary preparations, and only when you apply the final layer of paint do you feel the progress. Then think about the time you cut corners and skipped some or all of the boring preparations. What was the result?
Chasing Fata Morganas?
The comparison with the Sahara is meaningful from another point of view. Looking at the desert, it may be overwhelming to decide where to start without a map and a guide. Without a strategy for how to cross the desert and what to look for, you may find yourself chasing Fata Morganas and getting lost. The same applies when embarking on a project to extract value from the plentiful data. To decide where to start, it helps to know what you want to achieve and then to analyze the possible paths to get there. The answer may not be what is immediately visible.
Sand is not just sand. Data is not just data. A glossy top-layer applied to a poorly prepared surface will fall off and need to be repaired. A fancy algorithm applied to poorly prepared data will lead to equally bad results. Walking across endless deserts or working with enormous amounts of data without a plan can be fatal.
There is huge value to be extracted from data and from applying the right tools from the AI toolbox. Even though it is very abstract, it is not magic. Just like applying paint to steel is not magic. It is knowledge, skills and hard work. And having the right kind of sand.
And that is why, even in the Sahara, it can make sense to buy sand.