1
    2
    3
    4
    5
    6
    7
    8
    9
   10
   11
   12
   13
   14
   15
   16
   17
   18
   19
   20
   21
   22
   23
   24
   25
   26
   27
   28
   29
   30
   31
   32
   33
   34
   35
   36
   37
   38
   39
   40
   41
   42
   43
   44
   45
   46
   47
   48
   49
   50
   51
   52
   53
   54
   55
   56
   57
   58
   59
   60
   61
   62
   63
   64
   65
   66
   67
   68
   69
   70
   71
   72
   73
   74
   75
   76
   77
   78
   79
   80
   81
   82
   83
   84
   85
   86
   87
   88
   89
   90
   91
   92
   93
   94
   95
   96
   97
   98
   99
  100
  101
  102
  103
  104
  105
  106
  107
  108
  109
  110
  111
  112
  113
  114
  115
  116
  117
  118
  119
  120
  121
  122
  123
  124
  125
  126
  127
  128
  129
  130
  131
  132
  133
  134
  135
  136
  137
  138
  139
  140
  141
  142
  143
  144
  145
  146
  147
  148
  149
  150
  151
  152
  153
  154
  155
  156
  157
  158
  159
  160
  161
  162
  163
  164
  165
  166
  167
  168
  169
  170
  171
  172
  173
  174
  175
  176
  177
  178
  179
  180
  181
  182
  183
  184
  185
  186
  187
  188
  189
  190
  191
  192
  193
  194
  195
  196
  197
  198
  199

content / browser / cache_storage / README.md [blame]

# Architecture (as of July 29th 2016)
This document describes the browser-process implementation of the [Cache
Storage specification](
https://w3c.github.io/ServiceWorker/#cachestorage-interface).

As of June 2018, Chrome components can use the Cache Storage interface via
`CacheStorageManager` to store Request/Response key-value pairs. The concept of
`CacheStorageOwner` was added to distinguish and isolate the different
components.

## Major Classes and Ownership
### Ownership
Where '=>' represents ownership, '->' is a reference, and '~>' is a weak
reference.

##### `CacheStorageContextImpl`->`CacheStorageManager`=>`CacheStorage`=>`CacheStorageCache`
* A `CacheStorageManager` can own multiple `CacheStorage` objects.
* A `CacheStorage` can own multiple `CacheStorageCache` objects.

##### `StoragePartitionImpl`->`CacheStorageContextImpl`
* `StoragePartitionImpl` effectively owns the `CacheStorageContextImpl` in the
  sense that it calls `CacheStorageContextImpl::Shutdown()` on deletion which
  resets its `CacheStorageManager`.

##### `RenderProcessHost`->`CacheStorageDispatcherHost`->`CacheStorageContextImpl`

##### `CacheStorageDispatcherHost`=>`CacheStorageCacheHandle`~>`CacheStorageCache`
* The `CacheStorageDispatcherHost` holds onto handles for:
  * JavaScript references to cache objects

##### `CacheStorageDispatcherHost`=>`CacheStorageHandle`~>`CacheStorage`
* The `CacheStorageDispatcherHost` holds onto handles for:
  * JavaScript references to caches

##### `CacheStorageCacheDataHandle`=>`CacheStorageCacheHandle`~>`CacheStorageCache`
* `CacheStorageCacheDataHandle` is the blob data handle for a response body
  and it holds a `CacheStorageCacheHandle`.  It streams from the
  `disk_cache::Entry` response stream. It's necessary that the
  `disk_cache::Backend` (owned by `CacheStorageCache`) stays open so long as
  one of its `disk_cache::Entry`s is reachable. Otherwise, a new backend might
  open and clobber the entry.

##### `CacheStorageCache`=>`CacheStorageCacheHandle`~>`CacheStorageCache`
* The `CacheStorageCache` will hold a self-reference while executing an
  operation.  This self-reference is dropped between subsequent operations,
  so shutdown is possible when there are no external references even if there
  are more operations in the scheduler queue.

### CacheStorageDispatcherHost
1. Receives IPC messages from a render process and creates the appropriate
   `CacheStorageManager` or `CacheStorageCache` operation.
2. For each operation, holds a `CacheStorageCacheHandle` to keep the cache
   alive since the operation is asynchronous.
3. For each cache reference held by the render process, holds a
   `CacheStorageCacheHandle`.
4. For each CacheStorage reference held by the renderer process, holds a
   `CacheStorageHandle`.  This is used to inform the CacheStorage about
   whether its externally used so it can keep warmed cache objects alive
   to mitigate rapid opening/closing/opening churn.

### CacheStorageManager
1. Forwards calls to the appropriate `CacheStorage` for a given origin-owner
   pair, loading `CacheStorage`s on demand.
2. Handles `QuotaManager` and `BrowsingData` calls.

### CacheStorage
1. Manages the caches for a single origin-owner pair.
2. Handles creation/deletion of caches and updates the index on disk
   accordingly.
3. Manages operations that span multiple caches (e.g., `CacheStorage::Match`).
4. Backend-specific information is handled by `CacheStorage::CacheLoader`

### CacheStorageCache
1. Creates or opens a net::disk_cache (either `SimpleCache` or `MemoryCache`)
   on initialization.
2. Handles add/put/delete/match/keys calls.
3. Owned by `CacheStorage` and deleted either when `CacheStorage` deletes or
   when the last `CacheStorageCacheHandle` for the cache is gone.

### CacheStorageIndex
1. Manages an ordered collection of metadata
   (CacheStorageIndex::CacheStorageMetadata) for each CacheStorageCache owned
   by a given CacheStorage instance.
2. Is serialized by CacheStorage::CacheLoader (WriteIndex/LoadIndex) as a
   Protobuf file.

### CacheStorageCacheHandle
1. Holds a weak reference to a `CacheStorageCache`.
2. When the last `CacheStorageCacheHandle` to a `CacheStorageCache` is
   deleted, so to is the `CacheStorageCache`.
3. The `CacheStorageCache` may be deleted before the `CacheStorageCacheHandle`
   (on `CacheStorage` destruction), so it must be checked for validity before
   use.

### CacheStorageHandle
1. Holds a weak reference to a `CacheStorage`.
2. When the last `CacheStorageHandle` to a `CacheStorage` is
   deleted, internal state is cleaned up.  The `CacheStorage` object is not
   deleted, however.
3. The `CacheStorage` may be deleted before the `CacheStorageHandle`
   (on browser shutdown), so it must be checked for validity before use.

## Directory Structure
$PROFILE/Service Worker/CacheStorage/`origin`/`cache`/

Where `origin` is a hash of the origin and `cache` is a GUID generated at the
cache's creation time.

The reason a random directory is used for a cache is so that a cache can be
doomed and still used by old references while another cache with the same name
is created.

### Directory Contents
`CacheStorage` creates its own index file (index.txt), which contains a
mapping of cache names to its path on disk. On `CacheStorage` initialization,
directories not in the index are deleted.

Each `CacheStorageCache` has a `disk_cache::Backend` backend, which writes in
the `CacheStorageCache`'s directory.

## Layout of the disk_cache::Backend
A cache is represented by a `disk_cache::Backend`. The Request/Response pairs
referred to in the specification are stored as `disk_cache::Entry`s.  Each
`disk_cache::Entry` has three streams: one for storing a protobuf with the
request/response metadata (e.g., the headers, the request URL, and opacity
information), another for storing the response body, and a final stream for
storing any additional data (e.g., compiled JavaScript).

The entries are keyed by full URL. This has a few ramifications:
 1. Multiple vary responses for a single request URL are not supported.
 2. Operations that may require scanning multiple URLs (e.g., `ignoreSearch`)
    must scan every entry in the cache.

*The above could be fixed by changes to the backend or by introducing indirect
entries in the cache. The indirect entries would be for the query-stripped
request URL. It would point to entries to each query request/response pair and
for each vary request/response pair.*

## Threads
* CacheStorage classes live on the IO thread. Exceptions include:
  * `CacheStorageContextImpl` which is created on UI but otherwise runs and is
   deleted on IO.
  * `CacheStorageDispatcherHost` which is created on UI but otherwise runs and
    is deleted on IO.
* Index file manipulation and directory creation/deletion occurs on a
  `SequencedTaskRunner` assigned at `CacheStorageContextImpl` creation.
* The `disk_cache::Backend` lives on the IO thread and uses its own worker
  pool to implement async operations.

## Asynchronous Idioms in CacheStorage and CacheStorageCache
1. All async methods should asynchronously run their callbacks.
2. The async methods often include several asynchronous steps. Each step
   passes a continuation callback on to the next. The continuation includes
   all of the necessary state for the operation.
3. Callbacks are guaranteed to run so long as the object
   (`CacheStorageCacheCache` or `CacheStorage`) is still alive. Once the
   object is deleted, the callbacks are dropped. We don't worry about dropped
   callbacks on shutdown. If deleting prior to shutdown, one should `Close()`
   a `CacheStorage` or `CacheStorageCache` to ensure that all operations have
   completed before deleting it.

### Scheduling Operations
Operations are scheduled in a sequential scheduler (`CacheStorageScheduler`).
Each `CacheStorage` and `CacheStorageCache` has its own scheduler. If an
operation freezes, then the scheduler is frozen. If a `CacheStorage` call winds
up calling something from every `CacheStorageCache` (e.g.,
`CacheStorage::Match`), then one frozen `CacheStorageCache` can freeze the
`CacheStorage` as well. This has happened in the past (`Cache::Put` called
`QuotaManager` to determine how much room was available, which in turn called
`Cache::Size`). Be careful to avoid situations in which one operation triggers
a dependency on another operation from the same scheduler.

At the end of an operation, the scheduler needs to be kicked to start the next
operation. The idiom for this in CacheStorage/ is to wrap the operation's
callback with a function that will run the callback as well as advance the
scheduler. So long as the operation runs its wrapped callback the scheduler
will advance.

## Opaque Resource Size Obfuscation
Applications can cache cross-origin resources as per
[Cross-Origin Resources and CORS](https://www.w3.org/TR/service-workers-1/#cross-origin-resources).
Opaque responses are also cached, but in order to prevent "leaking" the size
of opaque responses their sizes are obfuscated. Random padding is added to the
actual size making it difficult for an attacker to ascertain the actual resource
size via quota APIs.

When Chromium starts, a new random padding key is generated and used
for all new caches created. This key is used by each cache to calculate padding
for opaque resources. Each cache's key is persisted to disk in the cache index file

Each cache maintains the total padding for all opaque resources within the
cache. This padding is added to the actual resource size when reporting sizes
to the quota manager.

The padding algorithm version is also written to each cache allowing for it
to be changed at a future date. CacheStorage will use the persisted key and
padding from the cache's index unless the padding algorithm has been changed,
one of values is missing, or deemed to be incorrect. In this situation the cache
is enumerated and the padding recalculated during open.