You know how sometimes you get sick after an intense period of Adulting, and someone inevitably says “it’s just your body telling you to slow down”?
After 7 years of startup hustle, on Saturday my 2-year-old laptop told me to slow down.
It happened out of nowhere. My trusty machine, which had been happily crunching code just a few hours before, presented me with what folks here in Brussels (presumably) call l’écran noir.
Turned it off, then on again. Nope. Tried again.
Panic. Please, not this, not now.
Over the next few hours, no web-searched solution or secret-key-combo-into-safe-mode helped. The computer seemed to start up just fine, but the screen stayed black. …
The era of the monolithic data warehouse/data lake is coming to an end — long live the decentralized data mesh!
Oh, do not despair! All those person-years spent cleaning, transferring, and loading data into your centralized systems haven’t been in vain. With data mesh, you don’t have to start again from scratch with new technology; that is, you don’t have to replace your RDBMS, Snowflake, or Databricks with a new vendor or open-source solution. A data mesh will simply utilize your existing databases, warehouses and lakes as nodes in its greater, decentralized network of data products.
Data Science —
Answering questions with data —
Is presumed to be an art,
Or at least a high-tech craft,
Producing exponential value and driving innovation.
Answering questions with data
Needs to be faster,
They design a plan —
A centralized data lake with dashboards!
But it takes too long to build.
It goes over budget.
Centralized data doesn’t scale —
And dashboards aren’t specific enough to be useful.
To this day,
Eighty percent of questions are answered the slow way.
It’s a human-scale process —
Emails, meetings, queries, modeling & analysis —
Waiting, waiting for weeks
For the bottleneck —
Artisan data specialists handcrafting answers. …
…brimming with immense potential value for discovery in science, business and society.
Unfortunately, the actual utility of most collected data is greatly diminished for value discovery/extraction purposes — like drinking salty seawater, the cost/benefit is a net loss. Why is this?
An ever-growing list of anti-patterns and symptoms, in no particular order.
I think about this mostly from the SELECT-side, so I’m sure there’s a fair amount missing on the INSERT/UPDATE-side, and also from the NoSQL perspective.
You should be able to read this straight through, even though terms are presented in alphabetical order. Alternatively, you can jump around to specific terms of interest.
Terms in bold (←except this one) are all defined in this glossary. I’m going to figure out how to turn them into anchor links later.
What if a user’s Data Experience in software were primarily driven by server-defined functionality — instead of being driven by front-end functionality?
This would turn a front-end application into a simple browser of server content — which seems feature-weak — that is, if you only have one type of data application server to connect to. …
Developed in collaboration between San Francisco, California-based Tag.bio and New Zealand-based Real Time Genomics, the umap-java library represents a port of the original Uniform Manifold Approximation and Projection (UMAP) Python implementation by Leland McInnes.
On a personal note, I’d like to offer heartfelt thanks to everyone who contributed to this work so far.
For starters, I’d like to acknowledge Josh Dunn for first using the term “Useful Data Artifacts” in a conversation over lunch at the Boston Seaport a few weeks back. I’d been using the term “Data Artifacts” for some time, but what’s the value of one, if it’s not useful?
First and foremost, a Useful Data Artifact is an actual digital thing. It is not an idea, a thought, a realization, or an insight. It’s not in your brain; it’s a structured data object, created when you or an algorithm does something with data.
More technically — a Useful Data Artifact is a nonrandom subset or derivative digital product of a data source, created by an intelligent agent (human or software) after performing a function on the data source. …
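To make that definition a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the `DataArtifact` class, the `make_artifact` helper, the `high_responders` function and the toy rows are all made up, and none of it is taken from any real platform. It just shows one way an agent could apply a function to a data source and keep the result as a structured object.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass(frozen=True)
class DataArtifact:
    """Hypothetical container: the derived data plus a record of how it was made."""
    source: str    # name of the data source the artifact came from
    agent: str     # who or what produced it (human or software)
    function: str  # name of the function applied to the source
    result: Any    # the nonrandom subset or derivative itself


def make_artifact(source: str, rows: List[Row], agent: str,
                  fn: Callable[[List[Row]], Any]) -> DataArtifact:
    """Apply fn to the rows and wrap the output as a structured object."""
    return DataArtifact(source=source, agent=agent,
                        function=fn.__name__, result=fn(rows))


def high_responders(rows: List[Row]) -> List[Row]:
    """Example function: keep only rows with a response of at least 0.5."""
    return [r for r in rows if r["response"] >= 0.5]


# Toy, made-up data standing in for a real data source.
rows = [
    {"patient": "p1", "response": 0.9},
    {"patient": "p2", "response": 0.2},
    {"patient": "p3", "response": 0.7},
]

artifact = make_artifact("trial_table", rows, "analyst", high_responders)
print(artifact.function, [r["patient"] for r in artifact.result])
```

The point of the sketch is that the artifact is a thing you can store, share and inspect later, including where it came from, rather than a conclusion that lives only in someone's head.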
I’m a co-founder at a San Francisco-based startup, but I spend most of my time working from Brussels, so I find myself on both sides of this issue.
I’m the technical co-founder of Tag.bio, and the software platform we’ve built is my baby. So naturally, I’m biased—this 10-point overview is intended to clarify my declarations of technological awesomeness with more objectivity.
Disclaimer — this is intended to be as concise as possible, so it’s chock-full of technical jargon. If that suits you, please continue.