Derived Predicates

Glean supports predicates that are defined in terms of a query. There are two types of derived predicates, "stored" and "on demand".

Stored derived predicates

For example:

predicate OutTarget.1 :
    {
        file : src.File,
        target : Target,
    }
    stored {F,T} where TargetOut {T,F}

This is a schema for a predicate OutTarget with a key type as usual. But unlike a regular predicate, facts of this predicate are not generated by an indexer; instead, they are generated by the query given on the final line.

The keyword stored tells Glean that the facts for this predicate will be stored in the database. Omitting the stored keyword indicates that you want the facts to be generated on demand; more about this in On-demand derived predicates below.

You can read the query as

There is a fact OutTarget {F,T} for every fact TargetOut {T,F}

The query can be any Angle query; the syntax is described in the Angle Guide. The only requirement is that the values produced by the query must match the key type of the predicate being defined.

Why is this useful? Well, the predicate TargetOut is defined like this:

predicate TargetOut.1 :
    {
        target : Target,
        file : src.File,
    }

This is a mapping from Target to File (see Efficient matching of facts). If we want the reverse mapping, from File to Target, we need a predicate with the fields in the other order, which is exactly what OutTarget is. But it would be laborious to write actual code to generate and store these facts in the database, so Glean allows us to define OutTarget directly in terms of a query, and it will automatically compute the facts of OutTarget and store them in the database.
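To make the idea concrete, here is a minimal Python sketch of what the derivation computes. It is only a model of the semantics, not how Glean implements it: the facts are plain tuples, the target and file names are made up, and the helper derive_out_target is hypothetical.

```python
from collections import defaultdict

# Toy TargetOut facts as (target, file) pairs; the values are made up.
target_out = {
    ("//app:main", "out/main.o"),
    ("//app:main", "out/util.o"),
    ("//lib:util", "out/util.o"),
}

def derive_out_target(facts):
    """Model of `stored {F,T} where TargetOut {T,F}`: one output fact per input fact."""
    return {(f, t) for (t, f) in facts}

out_target = derive_out_target(target_out)

# With the file first, finding the targets for a file is a keyed lookup.
by_file = defaultdict(set)
for f, t in out_target:
    by_file[f].add(t)

print(sorted(by_file["out/util.o"]))
```

The derived set has exactly one fact for every TargetOut fact, with the fields swapped, which is what the "for every fact" reading of the query describes.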

When do the facts get computed and stored?

Using the glean command-line tool, you direct the server to compute and store the facts for a predicate like this:

glean derive --service <write-server> buck.OutTarget

Replace <write-server> with the name of the write service you're using, and buck.OutTarget with the name of the predicate you want to derive.

This may take some time, depending on how many facts need to be computed and stored.

note

Remember to do this before using glean finish to mark the database as finished.

Deriving multiple predicates

You can derive multiple predicates together:

glean derive --service <write-server> <predicate> <predicate> ...

But note that these predicates must be independent; they cannot depend on each other. If you have derived predicates that depend on each other, you have to issue separate glean derive commands to derive the predicates in bottom-up dependency order.
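If you have many derived predicates, the bottom-up order can be worked out mechanically. The following is a rough sketch using Python's standard graphlib module; the predicate names and the dependency table are hypothetical, and the script only prints the commands rather than running them. Each printed command contains predicates that do not depend on each other.

```python
from graphlib import TopologicalSorter

def derive_batches(deps):
    """Group derived predicates into batches in bottom-up dependency order.

    `deps` maps each derived predicate to the derived predicates it depends on.
    Predicates within one batch do not depend on each other.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are cyclic
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches

# Hypothetical derived predicates and their dependencies.
deps = {
    "example.A": set(),
    "example.B": set(),
    "example.C": {"example.A", "example.B"},
    "example.D": {"example.C"},
}

for batch in derive_batches(deps):
    print("glean derive --service <write-server> " + " ".join(batch))
```

Here example.A and example.B can be derived together, while example.C and example.D each need their own later command.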

On-demand derived predicates

The other type of derived predicate is one where the facts are not stored in the database, but are computed on-demand when there is a query for the predicate.

This is useful for a few reasons:

We can support backwards compatibility by defining old predicates in terms of new ones, or forwards compatibility by doing the reverse.
We can define queries that extract data and bundle it in a way that's convenient and efficient for the client. This allows clients to avoid fetching more data than they need, for example.
Most importantly, we can encapsulate complex queries by defining them in the schema as derived predicates, even building up libraries representing whole abstraction layers over the raw data. Clients can then use the higher-level abstraction regardless of where they're querying from or what language they're using. For a great example of using this in practice, see the codemarkup schema that we use to provide a language-neutral abstraction over language-specific schemas.

For example, in the cxx schema we have a lot of different kinds of declarations. Clients often want to search for a declaration by name, but each of the different declaration kinds has the name in a different place, so this ends up being quite a complicated query from the client side. Using a derived predicate we can easily capture this complexity in one place so that it can be reused by all clients that want to search for declarations by name:

predicate DeclarationWithName :
    {
        name : string,
        decl : Declaration
    }
    {Str, Decl} where
      N = Name Str;
      Decl =
        (Declaration (record_ R) where
          R = RecordDeclaration { name = { name = N }}) |
        (Declaration (function_ F) where
          F = FunctionDeclaration { name = { name = { name = N }}}) |
        # and so on, for all declaration types

Using this predicate requires no magic on the part of the client: they query for the cxx1.DeclarationWithName predicate in exactly the same way as they would for any other predicate, and the Glean query server returns the appropriate facts.
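As a loose analogy, the Python sketch below models an on-demand predicate as a generator: nothing is stored, and results are computed only when someone asks. The fact shapes imitate the nesting of the name field in the two declaration kinds shown above, but the data and the function declaration_with_name are invented for illustration and say nothing about how Glean evaluates the query.

```python
# Toy facts: each declaration kind nests its name differently, mirroring the
# RecordDeclaration and FunctionDeclaration patterns in the schema above.
records = [
    {"name": {"name": "Widget"}},
    {"name": {"name": "Gadget"}},
]
functions = [
    {"name": {"name": {"name": "Widget"}}},
    {"name": {"name": {"name": "run"}}},
]

def declaration_with_name(name):
    """Yield (name, declaration) pairs lazily, one branch per declaration kind."""
    for r in records:
        if r["name"]["name"] == name:
            yield (name, ("record_", r))
    for f in functions:
        if f["name"]["name"]["name"] == name:
            yield (name, ("function_", f))

matches = list(declaration_with_name("Widget"))
print([kind for _, (kind, _) in matches])
```

The caller only supplies a name; the knowledge of where each declaration kind keeps its name lives in one place.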

Derived predicates for schema migration

One important use case for derived predicates is to make it possible to change the schema without breaking things.

Essentially the idea is that a derived predicate can define the old predicate in terms of the new predicate, providing backwards-compatibility to clients that are expecting to query for the old predicate. Additionally, we might define the new predicate in terms of the old predicate, for forwards-compatibility to allow new clients to work with old data.

Let's work through an example to illustrate the process. Suppose your schema is like this:

schema lang.1 {

predicate Declaration :
    {
         name : string,
         source : src.Range,
     }
}

Now suppose we want to add documentation to the declarations that we indexed. We define a new version of the schema, lang.2, with a new Declaration predicate:

schema lang.2 : lang.1 {

predicate Declaration :
    {
        name : string,
        source : src.Range,
        doc : string
    }
}

Now, we proceed to make our changes:

Update the schema
Modify the indexer to generate facts of the new predicate lang.Declaration.2

At this point, any DBs generated by the new indexer will have lang.Declaration.2 facts, and not lang.Declaration.1. Existing clients that query for the old facts will get no results. We can probably recompile those clients to pick up the new lang.Declaration.2 facts, but that would be a tricky migration: the new client won't work on the old DBs, and the old client won't work on the new DBs.

To make this migration smoother, we can add a derived predicate:

schema lang.2 : lang.1 {

predicate Declaration :
    {
        name : string,
        source : src.Range,
        doc : string
    }

 derive lang.Declaration.1
    { name = N, source = S } where
        lang.Declaration.2 { name = N, source = S }
}

The derive lang.Declaration.1 declaration is just like adding an on-demand derived predicate to predicate Declaration in the lang.1 schema, but we have to declare it as part of the lang.2 schema because it needs to refer to lang.Declaration.2.

This derived predicate takes effect as follows:

It does not apply to old DBs that contain lang.Declaration.1 but not lang.Declaration.2.
It does apply to new DBs created with the new lang.Declaration.2 schema. So after the schema change, we can only create facts of lang.Declaration.2, not the old predicate.

So clients that query for lang.Declaration.1 will continue to work both with old DBs containing lang.Declaration.1 and new DBs that contain lang.Declaration.2, and we can migrate them to use the new schema at our leisure.

Default derived predicates

There's one extra feature that can be used to make the schema migration even smoother.

Recall that with the derive declaration in the previous section, we had to synchronise the schema update with the rollout of the new version of the indexer that generates the new facts. It's possible to decouple the two by making one tweak:

 derive lang.Declaration.1 default
    # ... same as before

The addition of the default keyword to the declaration has the following effect:

A default derived predicate only takes effect when the DB is complete (i.e. read-only) and contains no facts of the predicate.

This allows us to update the schema but still generate facts of the old predicate. The derived predicate will only kick in when we update the indexer to generate the new facts.

What's more, we can use this technique to provide forwards compatibility too:

 derive lang.Declaration.2 default
    { name = N, source = S, doc = "" } where
        lang.Declaration.1 { name = N, source = S }

Since this is a default derivation, it will take effect when there are no facts of the new predicate. So we can update clients to work with the new version of the predicate, and they will continue to work on old DBs, albeit with empty strings for the new doc field, because the old DBs don't contain that data.
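The rule for default derivations can be modelled in a few lines of Python. This is a sketch of the rule as stated above, not of Glean's implementation: the DB class, the query function, and the use of plain strings for src.Range values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class DB:
    complete: bool                               # True once the DB is finished (read-only)
    facts: dict = field(default_factory=dict)    # predicate name -> list of stored facts

def derive_v1(db):
    # Backwards compatibility: old predicate from new facts, dropping doc.
    return [{"name": f["name"], "source": f["source"]}
            for f in db.facts.get("lang.Declaration.2", [])]

def derive_v2(db):
    # Forwards compatibility: new predicate from old facts, with an empty doc.
    return [{"name": f["name"], "source": f["source"], "doc": ""}
            for f in db.facts.get("lang.Declaration.1", [])]

DEFAULT_DERIVATIONS = {
    "lang.Declaration.1": derive_v1,
    "lang.Declaration.2": derive_v2,
}

def query(db, predicate):
    """Return stored facts, unless the default derivation applies."""
    stored = db.facts.get(predicate, [])
    derivation = DEFAULT_DERIVATIONS.get(predicate)
    # A default derivation applies only to a complete DB with no stored facts.
    if derivation is not None and db.complete and not stored:
        return derivation(db)
    return stored

old_db = DB(complete=True, facts={
    "lang.Declaration.1": [{"name": "f", "source": "a.c:1"}],
})
new_db = DB(complete=True, facts={
    "lang.Declaration.2": [{"name": "g", "source": "b.c:2", "doc": "does g"}],
})

print(query(old_db, "lang.Declaration.2"))
print(query(new_db, "lang.Declaration.1"))
```

In this model a new client querying lang.Declaration.2 on the old DB gets facts with an empty doc field, and an old client querying lang.Declaration.1 on the new DB gets facts without it, while a DB that stores facts of the queried predicate always returns those stored facts.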

How do I write and test a derived predicate?

There are two testing workflows in the following sections, depending on whether you want to test an on-demand or a stored derived predicate.

When you're done, to make the derived predicate available see Schema Workflow.

Testing an on-demand derived predicate

There's a process for iterating and testing derived predicates using the shell with a local database. Follow these steps:

Obtain the DB you want to test with. Let's assume you put it in ~/local/gleandb.

Start the shell with the local DB and schema:

glean shell --db-root ~/local/gleandb --schema glean/schema/source

Select your DB with the :db command.

Make edits to the local schema source files in glean/schema/source. There's no need to run glean/schema/sync; you can pick up the changes immediately in the shell:

:reload

Test your derived predicate using queries in the shell, use :reload to pick up new changes, and repeat as necessary.

The :timeout command can be used to change the default query timeout while iterating.

If you run into performance issues, try the techniques in Debugging Queries.

Testing a stored derived predicate

If you're adding a new stored derived predicate to the schema, the workflow is as follows.

First, obtain the base DB with the facts you want to derive from. Let's say you've put it in ~/local/gleandb, and the DB is called base/0.

Make your modifications to the schema in glean/schema/source.

Stack a new empty DB on top of the base DB:

glean create --db-root ~/local/gleandb --db stacked/0 \
    --stacked base/0 \
    --schema glean/schema/source \
    --update-schema-for-stacked

The flag --update-schema-for-stacked is important: it tells Glean that you want to use the current schema for the stacked DB and not the schema in the base DB. Without the flag, the changes you've made to the schema won't be visible in the stacked DB.

note

You're only allowed to add things to the schema in the stacked DB, not change things. Changes to existing predicates and types will be rejected with an error message. This is because a DB stack has a single schema to describe the data it contains; we can extend the schema when stacking new DBs, but we can't modify the schema that describes the data in the rest of the stack.

If you need to test changes to an existing predicate, copy the predicate and give it a new name to test it, and then fold the changes back into the original when you've finished testing.

Now, you can derive your new predicate:

glean derive --db-root ~/local/gleandb --db stacked/0 my.new.Predicate

and inspect the results in the shell:

glean shell --db-root ~/local/gleandb --db stacked/0
