
This proposal was merged into wicg/webpackage repo.

Last update: Mar 2021

This is a strawperson proposal at a very early stage. It aims to load multiple resources efficiently using a Content-Addressable Bundle, which can link to other bundles.

Authors

Participate

Introduction

There is an existing proposal that aims to address the above issues, called Subresource loading with Web Bundles. It is a new way to load a large number of resources efficiently, using a format that allows multiple resources to be bundled, such as Web Bundles.

With the Subresource loading with Web Bundles proposal, developers write:

<link
  rel="webbundle"
  href="https://example.com/dir/subresources.wbn"
  resources="https://example.com/dir/a.js https://example.com/dir/b.js https://example.com/dir/c.png"
/>

to tell the browser that the subresources listed in the resources attribute can be found within the https://example.com/dir/subresources.wbn bundle. The browser fetches the bundle and loads those resources from it.

However, there is currently no mechanism to fetch only part of a bundle's content. The browser fetches the entire bundle even if only one of its resources has been updated on the server side.

This explainer proposes a new approach on top of Subresource loading with Web Bundles, aiming for more flexibility in how resources are grouped into bundles. It lets web developers split code into multiple bundles, which can then be loaded on demand or in parallel, and it provides a new capability to express dependencies between bundles, which a browser can use to improve loading performance.

Goals

Non-Goals

Web Bundle format

TODO(hayato): Define a format using CDDL. For now, this section explains a non-normative conceptual format.

A bundle's URL should be <main-resource-url>.<hash>.wbn.
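As a minimal sketch of this naming convention, the JavaScript below builds a bundle URL from a main-resource URL and a hash, and splits one back apart. The function names are hypothetical, and the hash is treated as an opaque string containing no "." or "/", since the proposal has not chosen a hash function.

```javascript
// Sketch of the bundle naming convention "<main-resource-url>.<hash>.wbn".
// The hash is an opaque string here: the proposal has not chosen a hash function yet.

// Builds a bundle URL from its main-resource URL and hash.
function bundleUrlFor(mainResourceUrl, hash) {
  return `${mainResourceUrl}.${hash}.wbn`;
}

// Splits a bundle URL into { mainResourceUrl, hash }, or returns null if the URL
// doesn't end with ".<hash>.wbn". Assumes the hash contains no "." or "/".
function parseBundleUrl(bundleUrl) {
  const match = /^(.+)\.([^./]+)\.wbn$/.exec(bundleUrl);
  if (match === null) return null;
  return { mainResourceUrl: match[1], hash: match[2] };
}

// bundleUrlFor("https://example.com/app/index.js", "abcd")
//   returns "https://example.com/app/index.js.abcd.wbn"
// parseBundleUrl("https://cdn.example.com/date-util.js.f1f2.wbn")
//   returns { mainResourceUrl: "https://cdn.example.com/date-util.js", hash: "f1f2" }
```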

This proposal extends the WebBundle Format with a new version number. The bundle should have the following fields:

  1. bundle's hash: abcd

    Note: This proposal has not decided which hash function to use. This section writes a hash as just 4 characters; a real hash would be much longer, like the 40 characters of a Git commit ID.

  2. bundle's main-resource URL: https://example.com/app/index.js

    Note: The current WebBundle format also defines a main-resource URL. In this proposal, the main-resource URL must also be part of the bundle's URL, "<main-resource-url>.<hash>.wbn".

  3. Resources section, which is conceptually as follows:

URL                                            type      response headers           body    hash
./index.js                                     inline    <encoded response header>  <body>  b0b0
./foo.js                                       inline    <encoded response header>  <body>  c0c0
./bar.js                                       inline    <encoded response header>  <body>  d0d0
https://cdn.example.com/date-util.js.f1f2.wbn  external  N/A                        N/A     f1f2

A hash must be calculated as follows:

We call a bundle that satisfies the above requirements a Content Addressable Bundle, or CAB for short.

These requirements imply that the dependency graph of bundles forms a DAG (directed acyclic graph): a bundle's resources section records the hashes of the bundles it links to, so if the graph had a cycle, the hashes couldn't be calculated.
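To illustrate why a cycle rules out hashing, the sketch below computes an order in which bundle hashes could be calculated: a depth-first post-order over the bundle graph, so that every linked bundle comes before the bundle linking to it. It assumes, as the resources section suggests, that a bundle's hash depends on the hashes of the bundles it links to; hashComputationOrder and its input map are hypothetical.

```javascript
// Sketch: order in which bundle hashes can be computed, assuming (as the resources
// section suggests) that a bundle's index records the hash of every bundle it links to.
// `links` is a hypothetical Map from a bundle's main-resource URL to the main-resource
// URLs of the bundles it links to. Throws if the graph has a cycle.
function hashComputationOrder(links) {
  const order = [];
  const state = new Map(); // "visiting" while on the DFS stack, "done" once finished

  function visit(node, path) {
    if (state.get(node) === "done") return;
    if (state.get(node) === "visiting") {
      // The node depends, directly or indirectly, on itself: no hash can be computed.
      throw new Error(`cycle: ${[...path, node].join(" -> ")}`);
    }
    state.set(node, "visiting");
    for (const dep of links.get(node) || []) visit(dep, [...path, node]);
    state.set(node, "done");
    order.push(node); // every bundle it links to is already earlier in `order`
  }

  for (const node of links.keys()) visit(node, []);
  return order;
}
```

For the example bundles below (index.js links to date-util.js, which links to common-util.js), the order is common-util.js, date-util.js, index.js. A graph where a links to b and b links to a throws instead.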

To distinguish CABs from existing Web Bundles that are not CABs, we may call the former immutable bundles and the latter mutable bundles where the difference needs to be clear. Unless otherwise noted, a bundle in this proposal means a CAB, i.e. an immutable bundle.

Web APIs

Declarative form

Note: The declarative syntax is tentative. For the purpose of explanation, we borrow the <link>-based API from the Subresource loading with Web Bundles proposal.

Example

The page (https://example.com/app/index.html):

<link rel=webbundle
      href=https://example.com/app/index.js.abcd.wbn
      resources="index.js" >

<script type="module" src="index.js"></script>

The bundle (https://example.com/app/index.js.abcd.wbn):

URL                                            type      response headers           body    hash
./index.js                                     inline    <encoded response header>  <body>  b0b0
./foo.js                                       inline    <encoded response header>  <body>  c0c0
./bar.js                                       inline    <encoded response header>  <body>  d0d0
https://cdn.example.com/date-util.js.f1f2.wbn  external  N/A                        N/A     f1f2

Another bundle, https://cdn.example.com/date-util.js.f1f2.wbn, which is linked from index.js.abcd.wbn:

URL                                              type      response headers           body    hash
./date-util.js                                   inline    <encoded response header>  <body>  0808
./string-util.js                                 inline    <encoded response header>  <body>  0909
https://cdn.example.com/common-util.js.3b3b.wbn  external  N/A                        N/A     3b3b

For illustration purposes, the dependency graph of resources is:

toppage
└── index.js
    ├── foo.js
    │   └── https://cdn.example.com/date-util.js
    │       ├── string-util.js
    │       └── https://cdn.example.com/common-util.js
    │           └── ...
    └── bar.js

These resources are bundled into three bundles, as follows:

toppage
└── https://example.com/app/index.js.abcd.wbn
    ├── index.js (main-resource)
    ├── foo.js
    ├── bar.js
    └── https://cdn.example.com/date-util.js.f1f2.wbn
        ├── date-util.js (main-resource)
        ├── string-util.js
        └── https://cdn.example.com/common-util.js.3b3b.wbn
            ├── common-util.js (main-resource)
            └── ...

How to Load

  1. When the HTML page is parsed, a browser records that index.js can be loaded from the bundle, https://example.com/app/index.js.abcd.wbn.

  2. The browser starts to fetch https://example.com/app/index.js.abcd.wbn.

    1. (This happens asynchronously.) The browser must parse the index section of the fetched bundle and record the resources' URLs, foo.js, bar.js, and https://cdn.example.com/date-util.js, as resources which must be loaded from the bundle.

    Note that a relative URL is resolved against the bundle's URL. For an external resource, we use a filename convention: if the URL ends with ".<hash>.wbn", the browser removes this suffix and then records the URL.

  3. The browser sees the <script type="module" src="index.js"></script> tag. Since index.js is already recorded, the browser must load it from the bundle.

  4. Suppose that index.js, which is an inline resource in the bundle, has the following content:

    import * as foo from "./foo.js";
    import * as bar from "./bar.js";
    ...
    

    The browser must load ./foo.js and ./bar.js from the bundle since these resources are already recorded.

  5. Suppose that foo.js, which is an inline resource in the bundle, has the following content:

    import * as date from "https://cdn.example.com/date-util.js";
    ...
    

    The browser knows https://cdn.example.com/date-util.js is an external resource and it should be loaded from https://cdn.example.com/date-util.js.f1f2.wbn. The browser must start to fetch https://cdn.example.com/date-util.js.f1f2.wbn.

  6. The browser must parse the index section of the bundle, date-util.js.f1f2.wbn, and record its resources' URLs, as it did for index.js.abcd.wbn.

  7. After recording them, the browser loads the main resource, date-util.js, from the bundle. If date-util.js depends on ./string-util.js, string-util.js is also loaded from the bundle.

  8. Continue processing...
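The bookkeeping in these steps can be sketched as follows. This is not a browser API: registry, recordIndex, and resolve are hypothetical names, and index entries are modeled as plain { url, type } objects mirroring the conceptual resources section. Unrecorded URLs are assumed to fall back to a regular network fetch.

```javascript
// Sketch of the bookkeeping in the loading steps. `registry` is a hypothetical Map a
// browser might keep: resource URL -> how to load it. Index entries are modeled as
// { url, type } objects, with type "inline" or "external" as in the resources section.

const HASH_SUFFIX = /\.[^./]+\.wbn$/; // the ".<hash>.wbn" filename convention

// Steps 2.1 and 6: record every resource listed in a bundle's index section.
function recordIndex(registry, bundleUrl, entries) {
  for (const { url, type } of entries) {
    // Relative URLs are resolved against the bundle's URL.
    const absolute = new URL(url, bundleUrl).href;
    if (type === "inline") {
      registry.set(absolute, { action: "load-from-bundle", bundleUrl });
    } else {
      // External bundle: record the resource URL with ".<hash>.wbn" removed.
      registry.set(absolute.replace(HASH_SUFFIX, ""), {
        action: "fetch-bundle",
        bundleUrl: absolute,
      });
    }
  }
}

// Steps 3 to 5 and 7: decide how to load a requested URL. Unrecorded URLs fall back to
// a regular fetch (an assumption, matching undeclared external resources in the FAQ).
function resolve(registry, resourceUrl) {
  return registry.get(resourceUrl) || { action: "network" };
}
```

For example, after recording index.js.abcd.wbn, resolve(registry, "https://cdn.example.com/date-util.js") returns { action: "fetch-bundle", bundleUrl: "https://cdn.example.com/date-util.js.f1f2.wbn" }. Once that bundle's index is recorded too, the same URL resolves to "load-from-bundle", because its inline entry ./date-util.js overwrites the external one.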

Notes:

In the previous example, we use a <link>-based API to declare a starting node and its index information as a bootstrap. However, if a browser supports the Navigate to a bundle feature, we don't need such a declarative form in HTML.

For example, consider a bundle, https://example.com/app/index.html.0101.wbn, whose main resource is an HTML file:

URL                                            type      response headers           body    hash
./index.html (main-resource)                   inline    <encoded response header>  <body>  0101
./index.js                                     inline    <encoded response header>  <body>  b0b0
./foo.js                                       inline    <encoded response header>  <body>  c0c0
./bar.js                                       inline    <encoded response header>  <body>  d0d0
https://cdn.example.com/date-util.js.f1f2.wbn  external  N/A                        N/A     f1f2

If a browser supports entering https://example.com/app/index.html.0101.wbn directly in its address bar, ./index.html doesn't have to declare its resources: the browser learns them from the bundle's index section before it starts parsing index.html.

Cache strategy

It's up to browsers how to cache bundles. This explainer doesn't define any formal procedure; however, there are several possible approaches:

Key scenarios

TODO(hayato): [Description of the end-user scenario]

Considered alternatives

Resource Bundles

TODO(hayato): Mention scopes.

FAQ

What happens if a bundle doesn't declare a dependency to a resource which is used from its inline resources?

Nothing is wrong if you know what you are doing.

However, this proposal strongly suggests that a bundle should be as self-contained as possible: include every resource as an inline resource, or declare external dependencies explicitly in the bundle, in order to achieve reproducibility.

In the above example, https://example.com/app/index.js.abcd.wbn declares a dependency to the specific version of date-util.js with https://cdn.example.com/date-util.js.f1f2.wbn.

It's totally fine to use an external file directly without any declaration. For example, index.js might use an external module, hello.js, as follows:

import * as hello from "https://cdn.example.com/hello.js";

However, reproducibility is then no longer guaranteed. If https://cdn.example.com/hello.js changes its semantics, a web site that uses index.js.abcd.wbn might break. This can be especially problematic if the external dependency's content is out of your control.

If you can control an undeclared external resource's API stability somehow, this might not be a big issue. For example, index.js can hard-code the specific version of hello.js, such as:

import * as hello from "https://cdn.example.com/hello@4.12.js";

However, this proposal suggests avoiding this kind of hard-coding: such dependency information should be specified outside of source code as much as possible.

We can see similar patterns in many other package management systems, such as Node.js (package.json), Rust (Cargo.toml), and Deno (deps.ts).

This proposal aims to support this pattern by making it possible to declare dependencies inside of a bundle.

Any recommendations for bundle's granularity? How should we group resources into bundles?

That's entirely up to you. There are various factors and trade-offs to consider.

For example, if you want to make initial loading faster on a cold start, you probably want to keep the bootstrap bundle small: let it include only the minimum inline resources required for first contentful paint.

toppage
└── https://example.com/app/index.js.abcd.wbn
    ├── index.js (main-resource, which should be small)
    ├── support-file-for-initial-loading.js (which should be small)
    ├── ...
    ├── https://example.com/lazy-load-1.js.0088.wbn (the size might be big)
    │   └── ...
    ├── https://example.com/lazy-load-2.js.0089.wbn (the size might be big)
    │   └── ...
    └── https://example.com/lazy-load-3.js.008a.wbn (the size might be big)
        └── ...

As another example, if you have:

In this case, you might want to group these resources separately in order to reduce the total bytes transferred each day.

For example, a site may serve the following bundles today:

toppage
└── https://example.com/app/index.js.abcd.wbn
    ├── index.js (main-resource)
    ├── https://example.com/daily-hot-contents.00a0.wbn
    │   └── ...
    └── https://example.com/cold-contents.js.00a1.wbn
        └── ...

If daily-hot-contents is updated the next day, the site re-packages its resources as follows:

toppage
└── https://example.com/app/index.js.c0e3.wbn (changed)
    ├── index.js (main-resource)
    ├── https://example.com/daily-hot-contents.00c0.wbn (changed)
    │   └── ...
    └── https://example.com/cold-contents.js.00a1.wbn (didn't change)
        └── ...

Then, the total bytes transferred on the next day are nearly the sum of index.js (re-packaged in index.js.c0e3.wbn) and https://example.com/daily-hot-contents.00c0.wbn, because the browser can probably load cold-contents.js.00a1.wbn, whose hash didn't change, from its cache.
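This reasoning can be sketched as a walk over today's bundle graph that skips bundle URLs already in the cache. It is a minimal sketch assuming a cache keyed by bundle URL with no evictions; since a CAB's URL embeds its hash, a cached URL can be treated as unchanged content. bundlesToDownload and its input map are hypothetical.

```javascript
// Sketch: which bundles must be downloaded, given the bundle URLs already cached.
// Assumes a cache keyed by bundle URL with no evictions. Because a CAB's URL contains
// its hash, a cached URL can be treated as unchanged content.
// `links` is a hypothetical Map: bundle URL -> URLs of the bundles it links to.
function bundlesToDownload(links, rootUrl, cachedUrls) {
  const result = [];
  const seen = new Set();
  const stack = [rootUrl];
  while (stack.length > 0) {
    const url = stack.pop();
    if (seen.has(url)) continue; // a bundle may be linked from several places
    seen.add(url);
    if (!cachedUrls.has(url)) result.push(url);
    for (const child of links.get(url) || []) stack.push(child);
  }
  return result;
}
```

With yesterday's three bundles cached, the walk over today's graph returns only index.js.c0e3.wbn and daily-hot-contents.00c0.wbn; cold-contents.js.00a1.wbn is skipped because its URL, and therefore its hash, is unchanged.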

I would like to split code into bundles for various reasons, but I'm afraid this causes too many round-trips if bundles are nested and discovered one by one. Is there any good way to avoid it?

A possible user-land workaround is to declare indirectly dependent bundles as well as directly dependent ones.

In the example case, the top page may declare:

<!-- This is mandatory -->
<link rel=webbundle href=https://example.com/app/index.js.abcd.wbn resources="index.js" >
<!-- The following are optional, as hints for a browser to prefetch -->
<link rel=webbundle href=https://cdn.example.com/date-util.js.f1f2.wbn>
<link rel=webbundle href=https://cdn.example.com/common-util.js.3b3b.wbn>

The second and third link elements are not mandatory because they are not direct dependencies of the page, but declaring them should let a browser prefetch these bundles in parallel instead of fetching them on demand.
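A build step could generate these hints from the bundle graph. The sketch below, with a hypothetical function name and input map, emits the mandatory <link> for the top-level bundle followed by one hint <link> per transitively linked bundle, in breadth-first order; for this example it reproduces the markup above.

```javascript
// Sketch of the user-land workaround: emit the mandatory <link> for the top-level
// bundle, then an optional prefetch-hint <link> for every bundle reachable from it.
// `links` is a hypothetical Map: bundle URL -> URLs of the bundles it links to.
function linkTags(links, rootUrl, rootResources) {
  const tags = [`<link rel=webbundle href=${rootUrl} resources="${rootResources.join(" ")}">`];
  const seen = new Set([rootUrl]);
  const queue = [rootUrl];
  while (queue.length > 0) {
    const url = queue.shift(); // breadth-first: nearer bundles are hinted first
    for (const child of links.get(url) || []) {
      if (seen.has(child)) continue;
      seen.add(child);
      tags.push(`<link rel=webbundle href=${child}>`);
      queue.push(child);
    }
  }
  return tags.join("\n");
}
```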

How can we create CAB? Is there any tool?

Nothing yet.

In the future, we hope that JavaScript bundlers will provide an option to use Content Addressable Bundles as their output format.

It would also be nice if CDNs supported the CAB format for serving their contents, so that we can link to their CABs from our application bundles.

How can mutable bundles and immutable bundles (CABs) interact with each other?

TODO(hayato): We might want to backport a declarative dependency capability to mutable bundles as well as CAB.

We might want to let them refer to each other, with some restrictions:

We'll explore this problem space, and update this section.

Stakeholder Feedback / Opposition

Not yet.

References & acknowledgements