A JSON configuration for my recently-developed microbiome data product. Note the file names.

Transcend the constraints of English parlance for better organization and faster interpretation of your work.

Preface

Do you remember when you learned how to format calendar dates for alphanumeric sorting? You either had dates in the month-first US format, e.g. 09-11-1991, or in the day-first European format, e.g. 11-09-1991, and you realized that sorting didn’t work. Date sorting needs to happen first by year, then by month, and finally by day-of-month, e.g. 1991-09-11.
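To make the sorting problem concrete, here is a minimal Python sketch. The sample dates and variable names are made up for illustration; it only assumes month-first input strings separated by plain hyphens.

```python
from datetime import datetime

# Hypothetical sample dates in the month-first US format.
us_dates = ["09-11-1991", "01-15-2001", "12-31-1989"]

# Plain string sorting compares the month first, so the years come out scrambled.
naive_order = sorted(us_dates)

# Re-encode each date as year-month-day; now string order is chronological order.
iso_order = sorted(
    datetime.strptime(d, "%m-%d-%Y").strftime("%Y-%m-%d") for d in us_dates
)

print(naive_order)
print(iso_order)
```

The same idea applies to file names: put the most significant part first and plain alphanumeric sorting does the rest.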

Or — do you remember when you learned better practices around clearer naming for variables, methods, and classes? …


Djehouty, CC BY-SA 4.0, via Wikimedia Commons

What do we mean when we use these trendy terms? Why do they matter? And aren’t these just new names for things that already existed?

Let’s begin with why these terms matter

I’m presuming you’ll agree:

  • Developers should be able to build novel, useful, scalable, secure software with only a minimal amount of clean, maintainable code.
  • Minimizing the complexity and scope of code enables faster deployment of software applications which are easier to maintain and improve.
  • Minimizing the complexity and scope of code enables a larger, more diverse, more domain-specialized pool of developers to design, implement, maintain, and improve software.
  • All of this results in more software, and more useful software, eating the world, faster.


I’m pleased to announce that Sanjay Padhi, PhD — formerly Head of AWS Research and leader in scientific computing at CERN — has joined Tag.bio as Chief Technologist and Executive Vice President.

Dr. Padhi brings to Tag.bio a wealth of experience in cloud computing architecture and infrastructure for data research and discovery, as well as over 16 years of experience in building products, strategies and solutions for research communities. Most notably, he has led various scientific and computing projects to solve many challenging problems in the scientific world.


My industry’s favorite new buzzword (harmonized!) can describe many diverse, aspirational attributes of data systems. What does this new wonder-term mean, and to whom?

Photo by Danil Shostak on Unsplash

Across the span that ranges from data engineering, data applications, data science, and data analytics to the industry verticals on which my company focuses (Healthcare and Life Sciences), there are bazillions of acronyms, jargon terms, and buzzwords.

These code phrases are often quite useful for:

  • Naming things — e.g. SQL.
  • Saving time while speaking or typing — e.g. FAIR data.
  • Describing complexity through abstraction — e.g. the Abstract Factory Pattern.
  • Making explicit distinctions between similar, complex things — e.g. ELT vs ETL.
  • Segregating people into inside-group and outside-group — i.e. if you understand the term, you’re part…


Photo by Kari Shea on Unsplash

You know how sometimes you get sick after an intense period of Adulting, and someone inevitably says “it’s just your body telling you to slow down”?

After 7 years of startup hustle, on Saturday my 2-year-old laptop told me to slow down.

It happened out of nowhere: my trusty machine, which had been happily crunching code just a few hours before, presented me with what folks here in Brussels (presumably) call l’écran noir.

Hm.

Turned it off, then on again. Nope. Tried again.

Panic. Please, not this, not now.

Over the next few hours, no web-searched solution or…


Photo by Amy-Leigh Barnard on Unsplash

The era of the monolithic data warehouse/data lake is coming to an end — long live the decentralized data mesh!

Oh, do not despair! All those person-years spent cleaning, transferring, and loading data into your centralized systems haven’t been in vain. With data mesh, you don’t have to start again from scratch with new technology; i.e., you don’t have to replace your RDBMS, Snowflake, or Databricks with a new vendor or open-source solution. A data mesh will simply utilize your existing databases, warehouses and lakes as nodes in its greater, decentralized network of data products.

If this is your…


Photo by Patrick Tomasso on Unsplash

Data Science —
Answering questions with data
Is presumed to be an art,
Or at least a high-tech craft,
Producing exponential value and driving innovation.

Organizations know
Answering questions with data
Needs to be faster,
Automated.
They design a plan —
A centralized data lake with dashboards!
But it takes too long to build.
It goes over budget.
Centralized data doesn’t scale
And dashboards aren’t specific enough to be useful.

To this day,
Eighty percent of questions are answered the slow way.
It’s a human-scale process —
Emails, meetings, queries, modeling & analysis —
Waiting, waiting for weeks
For the bottleneck…


Using the most successful, scalable pattern in software history to solve the worst problems in Data Science and Analytics.

Photo by Greg Rakozy on Unsplash (https://unsplash.com/@grakozy)

Data, data, everywhere…

…brimming with immense potential value for discovery in science, business and society.

Unfortunately, the actual utility of most collected data is greatly diminished for value discovery/extraction purposes — like drinking salty seawater, the cost/benefit is a net loss. Why is this?

  1. Collected data suffers from extreme variety — in encoding formats, access methods, and bespoke, domain-specific schemas. Even if it were possible to reformat and restructure the entire universe of data sources as standard, SQL-compatible tables, there would still remain the impossible task of joining and deciphering the vast diversity of table/column schemas that would arise.
  2. Collected data suffers from…


https://commons.wikimedia.org/wiki/File:Punishment_sisyph.jpg

An ever-growing list of anti-patterns and symptoms, in no particular order.

I think about this mostly from the SELECT-side, so I’m sure there’s a fair amount missing on the INSERT/UPDATE-side, and also from the NoSQL perspective.

  1. Entities are referenced across your codebase via unique keys generated within the database. H/T — Nicklas Millard.
  2. You have SQL queries in front-end code.
  3. Users can see table or column names in the front-end.
  4. You have to mitigate risk of SQL injection from API calls.
  5. You have an extensive authentication/authorization model implemented within the database.
  6. Your database has an authentication/authorization model that is aware…
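As one illustration of the SQL injection symptom in the list above, here is a minimal Python sketch using the standard-library sqlite3 module. The `samples` table, its columns, and both function names are hypothetical, invented only for this example; the point is the contrast between building SQL from caller input and binding that input as a parameter.

```python
import sqlite3


def find_samples_unsafe(conn, owner):
    # Anti-pattern: caller-supplied text is spliced directly into the SQL string.
    query = f"SELECT id, name FROM samples WHERE owner = '{owner}'"
    return conn.execute(query).fetchall()


def find_samples_safe(conn, owner):
    # The value is bound as a parameter, so it is never parsed as SQL.
    query = "SELECT id, name FROM samples WHERE owner = ?"
    return conn.execute(query, (owner,)).fetchall()


# Hypothetical in-memory table used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, name TEXT, owner TEXT)")
conn.executemany(
    "INSERT INTO samples (name, owner) VALUES (?, ?)",
    [("gut-01", "alice"), ("gut-02", "bob")],
)

malicious = "alice' OR '1'='1"
leaked = find_samples_unsafe(conn, malicious)   # matches every row
blocked = find_samples_safe(conn, malicious)    # matches no rows
```

Parameter binding only mitigates the risk; it does not address the underlying symptom, which is API input reaching SQL at all.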


You should be able to read this straight through, even though terms are presented in alphabetical order. Alternatively, you can jump around to specific terms of interest.

Terms in bold (←except this one) are all defined in this glossary. I’m going to figure out how to turn them into anchor links later.

API-Driven Design

What if a user’s Data Experience in software were primarily driven by server-defined functionality — instead of being driven by front-end functionality?

This would turn a front-end application into a simple browser of server content — which seems feature-weak — that is, if you only have one type…

Jesse Paquette

Full-stack programmer, computational biologist, and pick-up soccer addict, located in Brussels and San Francisco. https://www.linkedin.com/in/jessepaquette/
