WiFi Fingerprinting and Entropy


There is a lot of activity in the indoor location/navigation space. In fact, I should highlight and underline the ‘A LOT’ in that sentence. The IP landscape is big, and utterly uncharted by actual legal challenges (ed: expect that to change once revenue begins to appear for indoor nav apps). There are many heterogeneous approaches to fingerprinting algorithms, and no dominant consensus has developed regarding the best underlying technology. Many R&D groups are focusing on WiFi, but there is a profusion of geomagnetic, low-power Bluetooth, proprietary network, GNSS and other technologies, pursued by teams even more diverse than the underlying algorithmic approaches.

Who will dominate this quiet, but furious, R&D arms race? Well, I think I am beginning to see where the real underlying problems are. No matter how good your radio technology, or how many access points you can place (strategically) on-site; no matter how much you filter your data; no matter how many SLAM measurements you take, there is entropy. Really, really serious entropy. Great masses of unquantified variation that arise from a host of factors: the radio access points, the building materials, the constantly changing radio dynamics of the indoor environment, but MOST importantly, the many variations in materials, sensor quality, driver algorithms and gremlins that reside in each client device.

We have gathered many, many millions of data points from all imaginable devices in various indoor locations with varying degrees of WiFi coverage, and one factor stands out as something the underlying platform can control: the degree to which you normalize for device variance. Take, for example, a high-quality device, with excellent sensors, from which we grab raw signal data at the kernel level (…bypassing 3rd party APIs and drivers, please) and process it with our best-in-class DSP voodoo. We measure signal strength throughout a large indoor area, sampling at sub-second intervals across all WiFi channels (our client hops channels and reads fingerprint data in fraction-of-a-second intervals), and what do we see? A plot of sample count vs. signal strength that is clearly Gaussian. But let’s take another device: an HTC or Samsung phone. Run identical tests, and what do we see? A great lumpy mess of a graph that might be somewhat Gaussian if read in the right light, after a few glasses of wine. Why? Device characteristics: driver performance, materials, antenna gain, etc., etc., etc.
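To put a number on "clearly Gaussian" versus "great lumpy mess", one simple option is to look at the skewness and excess kurtosis of the readings from each device; both are close to zero for a Gaussian. The sketch below is only an illustration, not our production pipeline: the function name `shape_summary` and the sample readings are made up for the example.

```python
import statistics


def shape_summary(rssi_dbm):
    """Summarise the shape of a list of RSSI readings (in dBm).

    Returns the mean, population standard deviation, skewness and
    excess kurtosis. For a Gaussian sample, skewness and excess
    kurtosis are both close to 0.
    """
    n = len(rssi_dbm)
    if n < 2:
        raise ValueError("need at least two readings")
    mean = statistics.fmean(rssi_dbm)
    sd = statistics.pstdev(rssi_dbm)
    if sd == 0:
        raise ValueError("all readings are identical")
    # Standardise each reading, then take the 3rd and 4th moments.
    z = [(x - mean) / sd for x in rssi_dbm]
    skewness = sum(v ** 3 for v in z) / n
    excess_kurtosis = sum(v ** 4 for v in z) / n - 3.0
    return {
        "mean": mean,
        "stdev": sd,
        "skewness": skewness,
        "excess_kurtosis": excess_kurtosis,
    }


if __name__ == "__main__":
    # Hypothetical readings: one tidy, symmetric set and one two-humped set.
    tidy = [-62, -61, -60, -60, -60, -59, -58]
    lumpy = [-70] * 50 + [-50] * 50
    print(shape_summary(tidy))
    print(shape_summary(lumpy))
```

A strongly negative excess kurtosis, as in the two-humped set, is one hint that a device is reporting a multi-modal distribution rather than a single bell curve. With real data you would want many more samples and probably a proper normality test on top of this.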

One clear factor that will improve location accuracy is normalization for device properties. Unfortunately, the solution to that problem involves the crunching of massive data sets, across many devices and many indoor locations. And that is probably the reason our new Director of R&D and Algorithms comes directly from the ATLAS project at CERN. Please welcome Thomas to the team. ‘Antenna gain’ is about to replace tau leptons as his new best friend.
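For the curious, the most basic form of device normalization can be sketched in a few lines: rescale each device model's readings by that model's own mean and spread, so that a phone with a weak antenna and a phone with a strong one produce comparable numbers. This is a generic per-device z-score, not a description of our actual algorithm, and it assumes every device model was sampled over roughly the same set of locations. The names `fit_device_stats` and `normalize` and the device labels are hypothetical.

```python
import statistics
from collections import defaultdict


def fit_device_stats(readings):
    """Compute per-model RSSI statistics.

    readings: iterable of (device_model, rssi_dbm) pairs.
    Returns {device_model: (mean, stdev)}.
    """
    by_model = defaultdict(list)
    for model, rssi in readings:
        by_model[model].append(rssi)
    stats = {}
    for model, values in by_model.items():
        sd = statistics.pstdev(values)
        # Guard against a zero spread (e.g. a single reading).
        stats[model] = (statistics.fmean(values), sd if sd > 0 else 1.0)
    return stats


def normalize(model, rssi_dbm, stats):
    """Express a reading in units of that model's own standard deviation."""
    mean, sd = stats[model]
    return (rssi_dbm - mean) / sd


if __name__ == "__main__":
    # Hypothetical data: "phone_b" reads lower and with a compressed range.
    readings = [
        ("phone_a", -60), ("phone_a", -50), ("phone_a", -40),
        ("phone_b", -75), ("phone_b", -70), ("phone_b", -65),
    ]
    stats = fit_device_stats(readings)
    print(normalize("phone_a", -40, stats))
    print(normalize("phone_b", -65, stats))
```

In this toy data the strongest reading from each phone maps to the same normalized value, even though the raw dBm figures differ by 25. Real devices are not this well behaved, which is exactly why the problem needs large data sets.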