It’s not data gathering when it’s been cultivated

Throughout the long arc of human history, foraging has been an essential skill. It’s what’s allowed us to make good use of our environment, finding gems out there like pears, blueberries, and – yes – even honey. That last one shows the lengths we’ll go to for a rare treat. When you look at societies that still teach, prize, and value the learned art of foraging, you’ll see people improvise ways to scale branchless trees for the simplest taste of something sweet.

All of this is learned and practiced. It’s passed down from generation to generation through stories, play, and ritual. It’s tested through sports and rites that celebrate the winners who emerge – those confident in their skills and safe enough not to kill the others, and so ready to join the adults in this trusted pursuit.

Data gathering, however, is a misnomer.

Oh, don’t get me wrong – it sure feels like gathering when you’re doing it. You search high and low through what seems like an endless sea of leaves to find just one tiny bit of information. You may even feel triumphant with the find. The similarities end here.

You see, gathering typically occurs in unaltered environments. Nature maintains the ecosystem. Gatherers respect the ecosystems at play, such as the cycles of bees, and failure to do so can cause lasting harm. In turn, nature plays by established and known rules.

Open data is cultivated. It’s a farm. It may not feel kept from this vantage point, what with all the thorns, branches, and other things, but the sign out front tells the truth: it’s already been cut off the branch. You are looking through pears past their prime that were long ago cut. In the wild, these fall to the ground and start to rot.

Cultivated data allowed to rot.

You see, the vast majority of the time we go to gather data, we’re not going to where data grows wild, but to cultivated farms that house data. Nature adheres to rules. Sure, poisonous berries and fruit may look like other fruit, but they’re rarely mixed in with that exact fruit. There are also usually other clues, such as…pssst…tomatoes don’t grow that close to the ground. No, they’re usually higher so the deer eat them first.

Horsenettle, AKA data that WAS typed up, scanned, and THEN put into a PDF. This is not data.

When we cultivate things, we put them in buckets to try to pawn them off on others. In some farms, Anjou pears and Bartlett pears end up in the same bucket. Unhappy people (don’t look this way) spend too much time trying to split them apart, while those who didn’t pass their fruit-picking tests end up bringing home the wrong pear. In the worst of places, people grab horsenettle (above) and try to pass it off as some type of heirloom tomato. These, kids, are not the places to shop.

Also rarely an effective strategy…column names tend to change over time.

So, why do we settle for this with our data?

For years, we lived in a world where we could safely design in isolation and the impacts were small. Things have gotten bigger and now overlap far more. The horsenettle and the tomatoes have gotten too close for comfort. We need better ways to catalogue our data, both internally and externally. The better the data is maintained internally, the easier it is to design and maintain an architecture for sharing it, such as what Cincinnati has done and what numerous others are starting to do to share their data. This is a start, but far from a long-term solution.

We need systems that follow logical, simple-to-maintain patterns that support the work of the experts ultimately generating the data. We also need Open Data entities that make it easier to get to the data in an appropriate way (note: not data sheds with critter names, but APIs and curation that tracks name changes). We need clear referral paths to track down farms that mix pears and attempt to peddle horsenettle as viable fruit. It’s not, kids – it will kill you. And, long term, so will bad data.
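For the curious, here is one way "curation that tracks name changes" might look in practice. This is only a minimal sketch: the rename log, the column names, and the helper functions are all made up for illustration and don't come from any real open data portal.

```python
# Hypothetical rename log: each old column name points to the name that replaced it.
# Chains are allowed (variety -> pear_type -> pear_variety).
RENAMES = {
    "variety": "pear_type",
    "pear_type": "pear_variety",
    "PearType": "pear_variety",
    "picked": "harvest_date",
}


def canonical_name(column, renames=RENAMES):
    """Follow the rename log until we reach the column's current name."""
    seen = set()
    # The 'seen' set guards against an accidental loop in the log.
    while column in renames and column not in seen:
        seen.add(column)
        column = renames[column]
    return column


def normalize_row(row, renames=RENAMES):
    """Return a copy of a record with every key mapped to its current name."""
    return {canonical_name(key, renames): value for key, value in row.items()}


old_export = {"variety": "Anjou", "picked": "2019-09-01", "orchard": "A"}
print(normalize_row(old_export))
```

A log like this lets an old export and a new one land in the same bucket on purpose, rather than by accident. Columns that were never renamed (like `orchard` here) pass through untouched.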

Here’s to organized data and good health.

Sincerely yours without a drop of solanine,

This grocery mart-foraging analyst