Building a TypeScript monorepo with Bazel | Dataform
Guide

Building a TypeScript monorepo with Bazel

A short introduction to managing multiple TypeScript NPM packages with Bazel inside a monorepo.

Engineering

Lewis Hemens on June 23, 2019

At Dataform we maintain a handful of NPM packages and a documentation site in one single monorepo and we do it all with Bazel.

I'm going to quickly talk through monorepos and Bazel, then dive deep into the interesting parts of our monorepo with some real code examples, covering:

  • Bazel TypeScript basics
  • Managing multiple packages in a single monorepo
  • Building and publishing NPM packages with Bazel

Our project is open-source, so you can view all the code, or clone and build it with Bazel at: https://github.com/dataform-co/dataform

All your code in one repo

a.k.a. the monorepo, so hot right now.

I used to work at Google so I may be biased, but there is a huge amount of value in having all your code in one single repository. You end up spending a lot less time on repetitive tasks such as updating git submodules, pushing new packages, and running bash scripts: the kind of things that distract you from the important task at hand.

With a single code base, it becomes very easy to re-use code and libraries between different projects, but you need a good build system to make it work.

Bazel

_{Fast, Correct} - Choose two_ - https://bazel.build

Ever tried to clone and compile an open-source repo just to spend 30 minutes wrangling with missing or broken system dependencies, mismatched versions and a myriad of bash scripts that just don't work? Yeah...

Bazel is a build system. It's highly opinionated and tricky to master, but leaves you with an extremely fast, hermetic, and reproducible build process once adopted.

Bazel is still fairly young, but the ecosystem is evolving extremely quickly. It's also built on solid foundations: inside Google it is known as Blaze, where it helps to power Google's one colossal monorepo (literally all the code).

The problem

We maintain several NPM packages with interdependencies. Our goals here are:

  • To manage all packages in a single repository
  • For our builds to be fast and reliable
  • To test changes to multiple packages at the same time
  • To have an easy way to manage versions across all these packages
  • To write as few bash scripts as possible

A basic Bazel TS rule

Here's an example of how you build a TypeScript library with Bazel. Our simplest package in the repo is @dataform/core and we'll use this as an example for most of the post.

The folder looks like a normal TS package, except for the BUILD file. Here's the part of that file that actually compiles the TypeScript:

ts_library(
    name = "core",
    srcs = glob(["**/*.ts"]),
    module_name = "@dataform/core",
    deps = [
        "//protos",
        "@npm//@types/moo",
        "@npm//@types/node",
        "@npm//moo",
        "@npm//protobufjs",
    ],
)

This rule in the BUILD file tells Bazel:

  • The library is called core
  • It should include all .ts files within this folder
  • Its (node) module name is @dataform/core
  • It has one internal dependency //protos
  • It has a few NPM dependencies, just like a package.json file

To build this TS library you can run:

bazel build core

Note: ts_library and other Node-related rules are not part of Bazel itself but are imported from a separate ruleset. You can read more about them here: https://github.com/bazelbuild/rules_nodejs.

Dependencies between packages

If you've worked with an NPM-based monorepo before, you've probably used a tool like Lerna.

Lerna makes it easy to link packages locally so you can test changes across multiple NPM packages. It also makes it easier to manage versioning between them. We want that.

Bazel builds and links packages without going anywhere near an actual NPM package. In our @dataform/core example above, the ts_library rule depends on //protos which is just another ts_library rule.

ts_library(
    name = "protos",
    srcs = glob(["index.ts"]),
    module_name = "@dataform/protos",
    deps = [
        ...
    ],
)

The ts_library rule does some magic to make sure that built packages are available under the module_name attribute provided, which matches the name of the NPM package they will be published as.

So in our @dataform/core package, we can import from the //protos package whose module_name is @dataform/protos like this:

import { dataform } from "@dataform/protos";

When we publish to NPM, these imports will resolve correctly too as the module names match the package names.
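To make the idea concrete, here is a minimal sketch of the kind of lookup that happens when an import uses a module_name. It is not how ts_library is implemented; the moduleRoots table and the resolveWorkspaceImport function are hypothetical names used only for illustration, and the "index" entry point is an assumption based on the packages shown in this post.

```typescript
// Hypothetical table from each ts_library's module_name to its folder in the
// workspace. In the real build the ts_library rule takes care of this mapping.
const moduleRoots: Record<string, string> = {
  "@dataform/core": "core",
  "@dataform/protos": "protos",
};

// Resolve an import specifier to a path inside the monorepo. Returns undefined
// when the specifier is not one of our packages (e.g. a plain NPM dependency).
function resolveWorkspaceImport(specifier: string): string | undefined {
  for (const moduleName of Object.keys(moduleRoots)) {
    const root = moduleRoots[moduleName];
    if (specifier === moduleName) {
      // Bare package import: assume the package entry point is index.ts.
      return `${root}/index`;
    }
    const prefix = moduleName + "/";
    if (specifier.indexOf(prefix) === 0) {
      // Deep import: keep the path that follows the module name.
      return root + specifier.slice(moduleName.length);
    }
  }
  return undefined;
}

console.log(resolveWorkspaceImport("@dataform/protos"));
console.log(resolveWorkspaceImport("moo"));
```

Because the module names are the same as the published package names, the same import statement works both inside the monorepo and for someone who installs the packages from NPM.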

Managing multiple packages

Lerna also helps you manage multiple package.json files, updating all versions together and publishing them. We would like a way to do the same thing in Bazel.

To generate package.json files, we built a small tool in our monorepo that Bazel invokes to assemble each one from layers of JSON templates and string substitutions.

For the @dataform/core package, we have a core.package.json file that looks like this:

{
  "name": "@dataform/core",
  "description": "Dataform core API.",
  "main": "index.js",
  "types": "index.d.ts",
  "dependencies": {
    "@dataform/protos": "$DF_VERSION",
    "moo": "^0.5.0",
    "protobufjs": "^6.8.8"
  }
}

Any extra info (licenses, homepage, etc.) is inherited from the base common.package.json, so we don't have to keep several files in sync.

The special string $DF_VERSION gets replaced with a global constant defined as part of the Bazel build system in version.bzl.
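As a rough illustration of layering plus substitution, here is a minimal sketch in TypeScript. It is not the actual tool from our repo: mergeLayers and substitute are hypothetical helpers, and the contents of the common layer (including the license, homepage and version values) are made up for the example.

```typescript
type Json = { [key: string]: unknown };

function isPlainObject(value: unknown): value is Json {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Merge layers left to right: later layers override earlier ones, and nested
// objects (such as "dependencies") are merged key by key.
function mergeLayers(layers: Json[]): Json {
  const result: Json = {};
  for (const layer of layers) {
    for (const key of Object.keys(layer)) {
      const existing = result[key];
      const incoming = layer[key];
      result[key] =
        isPlainObject(existing) && isPlainObject(incoming)
          ? mergeLayers([existing, incoming])
          : incoming;
    }
  }
  return result;
}

// Replace "$NAME" placeholders inside every string value, recursively.
function substitute(value: unknown, substitutions: Record<string, string>): unknown {
  if (typeof value === "string") {
    let text = value;
    for (const placeholder of Object.keys(substitutions)) {
      text = text.split("$" + placeholder).join(substitutions[placeholder]);
    }
    return text;
  }
  if (Array.isArray(value)) {
    return value.map((item) => substitute(item, substitutions));
  }
  if (isPlainObject(value)) {
    const replaced: Json = {};
    for (const key of Object.keys(value)) {
      replaced[key] = substitute(value[key], substitutions);
    }
    return replaced;
  }
  return value;
}

// Made-up stand-in for common.package.json (its real contents are not shown here).
const commonLayer: Json = {
  version: "$DF_VERSION",
  license: "EXAMPLE-LICENSE",
  homepage: "https://example.com",
};
// Trimmed-down version of the core.package.json template above.
const coreLayer: Json = {
  name: "@dataform/core",
  dependencies: { "@dataform/protos": "$DF_VERSION", moo: "^0.5.0" },
};

// "1.0.0" is a placeholder version, standing in for the constant in version.bzl.
const generatedPackageJson = substitute(mergeLayers([commonLayer, coreLayer]), {
  DF_VERSION: "1.0.0",
});
console.log(JSON.stringify(generatedPackageJson, null, 2));
```

Substituting inside string values, rather than in the serialized JSON text, keeps the output valid JSON even if a substituted value contains quotes.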

Building NPM packages

To evaluate these JSON templates, we wrote a Bazel macro to invoke our tool above and we invoke it in the @dataform/core BUILD file like so:

load("//tools/npm:package.bzl", "dataform_npm_package")

dataform_npm_package(
    name = "package",
    package_layers = [
        "//:common.package.json",
        "core.package.json",
    ],
    deps = [":core"],
)

This custom Bazel macro both generates a final package.json from the two templates listed and creates an output dist folder with the compiled TypeScript that's ready to be published.

To see the output of this and the final generated package.json, you can run the following command:

bazel build core:package

Bazel tells us it has put the package in the folder bazel-bin/core/package. This works much like a dist folder: it contains the .js and .d.ts files as well as the final package.json, and it is ready to publish!

Note: We haven't fully automated this step yet, and it's still necessary to make sure that the dependencies and package name in the BUILD file match those in the package.json template, but it's certainly feasible to automate this entirely.
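One way such a check could look is sketched below. This is an assumption about how the automation might work, not something that exists in our repo: findDepMismatches, npmNameForLabel and the internalModules table are hypothetical, and the sketch assumes that @types packages are type-only and never belong in the published dependencies.

```typescript
interface DepMismatches {
  missingFromPackageJson: string[];
  missingFromBuild: string[];
}

// Convert a Bazel dep label into the NPM package name expected in package.json,
// or undefined when the label has no runtime counterpart.
function npmNameForLabel(
  label: string,
  internalModules: Record<string, string>
): string | undefined {
  const npmPrefix = "@npm//";
  if (label.indexOf(npmPrefix) === 0) {
    const packageName = label.slice(npmPrefix.length);
    // Assumption: type declaration packages are not runtime dependencies.
    return packageName.indexOf("@types/") === 0 ? undefined : packageName;
  }
  // Internal labels map to the module_name of the ts_library they point at.
  return internalModules[label];
}

// Compare the deps of a BUILD rule with the dependency names of a package.json.
function findDepMismatches(
  buildDeps: string[],
  packageJsonDeps: string[],
  internalModules: Record<string, string>
): DepMismatches {
  const expected: string[] = [];
  for (const label of buildDeps) {
    const npmName = npmNameForLabel(label, internalModules);
    if (npmName !== undefined) {
      expected.push(npmName);
    }
  }
  return {
    missingFromPackageJson: expected.filter((dep) => packageJsonDeps.indexOf(dep) === -1),
    missingFromBuild: packageJsonDeps.filter((dep) => expected.indexOf(dep) === -1),
  };
}

// The deps of the core ts_library rule and of core.package.json from this post.
const coreMismatches = findDepMismatches(
  ["//protos", "@npm//@types/moo", "@npm//@types/node", "@npm//moo", "@npm//protobufjs"],
  ["@dataform/protos", "moo", "protobufjs"],
  { "//protos": "@dataform/protos" }
);
// Both lists are empty when the BUILD file and the template agree.
console.log(coreMismatches);
```

A check like this could run as a test, so a forgotten dependency fails the build instead of surfacing after publishing.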

Publishing NPM packages

Publishing is easy at this point, and the rules_nodejs libraries have this built in. To publish a package we can run:

bazel run //core:package.publish

We still have a bash script to do this for all packages, but all it does is invoke Bazel commands:

#!/bin/bash
set -e

# Test all the things.

bazel test //...

# Publish all the things.

bazel run //api:package.publish
bazel run //core:package.publish
bazel run //cli:package.publish
bazel run //crossdb:package.publish
bazel run //protos:package.publish

Conclusion

This is by no means a complete solution yet, and in reality will require you to learn quite a lot about Bazel to get it working on your own project. For anyone trying, hopefully our repo can serve as a good reference!

Despite that, I hope that this demonstrates that using Bazel is a great solution to managing complex projects and many Node / TypeScript packages inside a single repository. With a few small extra Bazel rules you can build a TypeScript monorepo that is lightning fast and will scale as your project does.

If you found this post interesting and would be interested in a Bazel TypeScript starter pack repo, reach out and we'll see what we can do!
