My industry’s favorite new buzzword, “harmonized!”, can describe many diverse, aspirational attributes of data systems. What does this new wonder-term mean, and to whom?

Photo by Danil Shostak on Unsplash

Across the span from data engineering, data applications, data science, and data analytics to the industry verticals my company focuses on (Healthcare and Life Sciences), there are bazillions of acronyms, jargon terms, and buzzwords.

These code phrases are often quite useful for:


Photo by Kari Shea on Unsplash

You know how sometimes you get sick after an intense period of Adulting, and someone inevitably says “it’s just your body telling you to slow down”?

After 7 years of startup hustle, on Saturday my 2-year-old laptop told me to slow down.

It happened out of nowhere: my trusty machine, which had been happily crunching code just a few hours before, presented me with what folks here in Brussels (presumably) call l’écran noir.

Hm.

Turned it off, then on again. Nope. Tried again.

Panic. Please, not this, not now.

Over the next few hours, no web-searched solution or…


Photo by Amy-Leigh Barnard on Unsplash

The era of the monolithic data warehouse/data lake is coming to an end — long live the decentralized data mesh!

Oh, do not despair! All those person-years spent cleaning, transferring, and loading data into your centralized systems haven’t been in vain. With data mesh, you don’t have to start again from scratch with new technology; that is, you don’t have to replace your RDBMS, Snowflake, or Databricks with a new vendor or open-source solution. A data mesh will simply use your existing databases, warehouses, and lakes as nodes in its greater, decentralized network of data products.
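To make the "existing systems as nodes" idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `DataProduct` and `DataMesh` classes, the field names, and the example product names are hypothetical and are not taken from any data mesh product or from the original post. It only shows that registering a node describes a system you already run, without moving or replacing it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class DataProduct:
    """A hypothetical mesh node: a pointer to a system that already exists."""
    name: str
    backend: str  # e.g. "rdbms", "snowflake", "databricks"
    owner: str    # the team that keeps operating the underlying system


@dataclass
class DataMesh:
    """A hypothetical registry of data products, keyed by product name."""
    products: Dict[str, DataProduct] = field(default_factory=dict)

    def register(self, product: DataProduct) -> None:
        # Registration only records metadata; no data is copied or migrated.
        if product.name in self.products:
            raise ValueError(f"duplicate data product: {product.name}")
        self.products[product.name] = product

    def find_by_backend(self, backend: str) -> List[str]:
        # Return the names of all products served by the given backend.
        return sorted(p.name for p in self.products.values() if p.backend == backend)


def build_example_mesh() -> DataMesh:
    """Build a mesh from three made-up, pre-existing systems."""
    mesh = DataMesh()
    mesh.register(DataProduct("patients", "rdbms", "clinical-team"))
    mesh.register(DataProduct("claims", "snowflake", "finance-team"))
    mesh.register(DataProduct("genomics", "databricks", "research-team"))
    return mesh
```

Calling `build_example_mesh().find_by_backend("snowflake")` looks up the one product registered against that backend. A real mesh would also need discovery, access control, and a query interface, none of which this sketch attempts.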

If this is your…


Photo by Patrick Tomasso on Unsplash

Data Science —
Answering questions with data
Is presumed to be an art,
Or at least a high-tech craft,
Producing exponential value and driving innovation.

Organizations know
Answering questions with data
Needs to be faster,
Automated.
They design a plan —
A centralized data lake with dashboards!
But it takes too long to build.
It goes over budget.
Centralized data doesn’t scale
And dashboards aren’t specific enough to be useful.

To this day,
Eighty percent of questions are answered the slow way.
It’s a human-scale process:
Emails, meetings, queries, modeling & analysis,
Waiting, waiting for weeks…


Using the most successful, scalable pattern in software history to solve the worst problems in Data Science and Analytics.

Photo by Greg Rakozy on Unsplash (https://unsplash.com/@grakozy)

Data, data, everywhere…

…brimming with immense potential value for discovery in science, business and society.

Unfortunately, the actual utility of most collected data for value discovery and extraction is greatly diminished. Like drinking salty seawater, the cost outweighs the benefit. Why is this?


https://commons.wikimedia.org/wiki/File:Punishment_sisyph.jpg

An ever-growing list of anti-patterns and symptoms, in no particular order.

I think about this mostly from the SELECT-side, so I’m sure there’s a fair amount missing on the INSERT/UPDATE-side, and also from the NoSQL perspective.



You should be able to read this straight through, even though terms are presented in alphabetical order. Alternatively, you can jump around to specific terms of interest.

Terms in bold (←except this one) are all defined in this glossary. I’m going to figure out how to turn them into anchor links later.

API-Driven Design

What if a user’s Data Experience in software were primarily driven by server-defined functionality — instead of being driven by front-end functionality?

This would turn a front-end application into a simple browser of server content — which seems feature-weak — that is, if you only have one type…


Example output from the Java UMAP library in Tag.bio, embedding ~11,000 TCGA Pan Cancer Atlas tumor samples using gene expression dimensions.

Developed in collaboration between San Francisco, California-based Tag.bio and New Zealand-based Real Time Genomics, the umap-java library is a port of the original Uniform Manifold Approximation and Projection (UMAP) Python implementation by Leland McInnes.

The open source project is available here on GitHub, and is released for use under a BSD-3 License.

On a personal note, I’d like to offer heartfelt thanks to everyone who contributed to this work so far.


Vending machine, automated choice and delivery.
Data analysis systems should be systematic, like vending machines. Your question or request goes in, and a Useful Data Artifact comes out.

For starters, I’d like to acknowledge Josh Dunn for first using the term “Useful Data Artifacts” in a conversation over lunch at the Boston Seaport a few weeks back. I’d been using the term “Data Artifacts” for some time, but what’s the value of one, if it’s not useful?

What is a Useful Data Artifact?

First and foremost, a Useful Data Artifact is an actual digital thing. It is not an idea, a thought, a realization, or an insight. It’s not in your brain; it’s a structured data object, created when you or an algorithm does something with data.
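As a rough illustration of "a structured data object, created when you or an algorithm does something with data", here is a small Python sketch of the vending-machine idea: a question and some rows go in, and a serializable object comes out. The `DataArtifact` class, its fields, and the `average_by_group` function are hypothetical names chosen for this example; they are not the post's technical definition.

```python
import json
from dataclasses import asdict, dataclass
from statistics import mean
from typing import Dict, List


@dataclass(frozen=True)
class DataArtifact:
    """A hypothetical artifact: the question, how it was answered, and the result."""
    question: str
    method: str
    result: Dict[str, float]


def average_by_group(question: str, rows: List[dict],
                     group_key: str, value_key: str) -> DataArtifact:
    """Answer a question by averaging value_key within each group_key."""
    groups: Dict[str, List[float]] = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[value_key])
    # Sort group names so the artifact is identical across repeated runs.
    result = {name: mean(values) for name, values in sorted(groups.items())}
    return DataArtifact(question=question, method="average_by_group", result=result)


def to_json(artifact: DataArtifact) -> str:
    """Serialize the artifact so it can be stored, shared, or compared later."""
    return json.dumps(asdict(artifact), sort_keys=True)
```

The point of the sketch is that the output is a concrete object you can save and hand to someone else, not an impression formed while staring at a dashboard.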

More technically — a Useful Data…



I’m a co-founder at a San Francisco-based startup, but I spend most of my time working from Brussels, so I find myself on both sides of this issue.

Jesse Paquette

Full-stack programmer, computational biologist, and pick-up soccer addict, located in Brussels and San Francisco. https://www.linkedin.com/in/jessepaquette/
