Dr. Padhi brings to Tag.bio a wealth of experience in cloud computing architecture and infrastructure for data research and discovery, as well as over 16 years of experience in building products, strategies and solutions for research communities. Most notably, he has led various scientific and computing projects to solve many challenging problems in the scientific world.
Across the span that ranges from data engineering, data applications, data science, and data analytics to the industry verticals on which my company focuses (Healthcare and Life Sciences), there are bazillions of acronyms, jargon terms, and buzzwords.
These code phrases are often quite useful for:
You know how sometimes you get sick after an intense period of Adulting, and someone inevitably says “it’s just your body telling you to slow down”?
After 7 years of startup hustle, on Saturday my 2-year-old laptop told me to slow down.
It happened out of nowhere: my trusty machine, which had been happily crunching code just a few hours before, presented me with what folks here in Brussels (presumably) call l’écran noir.
Turned it off, then on again. Nope. Tried again.
Panic. Please, not this, not now.
Over the next few hours, no web-searched solution or…
The era of the monolithic data warehouse/data lake is coming to an end — long live the decentralized data mesh!
Oh, do not despair! All those person-years spent cleaning, transferring, and loading data into your centralized systems haven’t been in vain. With data mesh, you don’t have to start again from scratch with new technology; that is, you don’t have to replace your RDBMS, Snowflake, or Databricks with a new vendor or open-source solution. A data mesh will simply utilize your existing databases, warehouses, and lakes as nodes in its greater, decentralized network of data products.
Data Science —
Answering questions with data —
Is presumed to be an art,
Or at least a high-tech craft,
Producing exponential value and driving innovation.
Answering questions with data
Needs to be faster,
They design a plan —
A centralized data lake with dashboards!
But it takes too long to build.
It goes over budget.
Centralized data doesn’t scale —
And dashboards aren’t specific enough to be useful.
To this day,
Eighty percent of questions are answered the slow way.
It’s a human-scale process —
Emails, meetings, queries, modeling & analysis —
Waiting, waiting for weeks…
…brimming with immense potential value for discovery in science, business and society.
Unfortunately, the actual utility of most collected data is greatly diminished for value discovery/extraction purposes — like drinking salty seawater, the cost/benefit is a net loss. Why is this?
An ever-growing list of anti-patterns and symptoms, in no particular order.
I think about this mostly from the SELECT-side, so I’m sure there’s a fair amount missing on the INSERT/UPDATE-side, and also from the NoSQL perspective.
You should be able to read this straight through, even though terms are presented in alphabetical order. Alternatively, you can jump around to specific terms of interest.
Terms in bold (←except this one) are all defined in this glossary. I’m going to figure out how to turn them into anchor links later.
What if a user’s Data Experience in software were primarily driven by server-defined functionality — instead of being driven by front-end functionality?
This would turn a front-end application into a simple browser of server content — which seems feature-weak — that is, if you only have one type…
Developed as a collaboration between San Francisco, California-based Tag.bio and New Zealand-based Real Time Genomics, the umap-java library is a port of the original Uniform Manifold Approximation and Projection (UMAP) Python implementation by Leland McInnes.
On a personal note, I’d like to offer heartfelt thanks to everyone who contributed to this work so far.
For starters, I’d like to acknowledge Josh Dunn for first using the term “Useful Data Artifacts” in a conversation over lunch at the Boston Seaport a few weeks back. I’d been using the term “Data Artifacts” for some time, but what’s the value of one, if it’s not useful?
First and foremost, a Useful Data Artifact is an actual digital thing. It is not an idea, a thought, a realization, or an insight. It’s not in your brain; it’s a structured data object, created when you or an algorithm does something with data.
More technically — a Useful Data…