Introduction

Just want to get started and read the docs later? Jump to Building Glean.

Overview

Glean is a system for working with facts about source code. It is designed for collecting and storing detailed information about code structure, and for providing access to that data to power tools and experiences, from online IDE features to offline code analysis.

For example, Glean can return results for queries that your IDE might need to perform, such as:

Where is the definition of this method?
Where are all the callers of this function?
Who inherits from this class?
What are all the declarations in this file?

Moreover, Glean can return results for these kinds of queries accurately and efficiently on a large-scale codebase.
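To make the shape of these queries concrete, here is a minimal sketch in plain Python. It is not Glean's API and it is not Angle; the record types (`Definition`, `Call`, `Inherits`), the sample data, and the helper functions are all hypothetical, chosen only to show how each of the questions above reduces to a lookup over stored facts.

```python
from dataclasses import dataclass


# Hypothetical fact types for illustration only; real Glean schemas differ.
@dataclass(frozen=True)
class Definition:
    name: str
    file: str
    line: int


@dataclass(frozen=True)
class Call:
    caller: str
    callee: str


@dataclass(frozen=True)
class Inherits:
    child: str
    parent: str


# A tiny, made-up set of facts about an imaginary codebase.
DEFINITIONS = [
    Definition("Shape", "shapes.py", 1),
    Definition("Shape.area", "shapes.py", 2),
    Definition("Circle", "shapes.py", 6),
    Definition("Circle.area", "shapes.py", 7),
    Definition("report", "main.py", 3),
]
CALLS = [Call("report", "Shape.area"), Call("Circle.area", "Shape.area")]
INHERITS = [Inherits("Circle", "Shape")]


def definition_of(name):
    """Where is the definition of this method?"""
    return [(d.file, d.line) for d in DEFINITIONS if d.name == name]


def callers_of(name):
    """Where are all the callers of this function?"""
    return sorted(c.caller for c in CALLS if c.callee == name)


def subclasses_of(name):
    """Who inherits from this class?"""
    return sorted(i.child for i in INHERITS if i.parent == name)


def declarations_in(file):
    """What are all the declarations in this file?"""
    return sorted(d.name for d in DEFINITIONS if d.file == file)
```

A linear scan like this would not scale to a large codebase; the point is only that each question is a query over facts rather than a fresh pass over the source.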

But Glean isn't limited to storing particular kinds of data, or answering particular queries. Glean comes with indexers and schemas for some languages which support queries like the examples above, but you can also define your own schemas and store whatever data you like, perhaps augmenting the data that existing indexers collect. So, for example, you could store test coverage data or profiling data.

Glean's powerful query language means that you can build tools around complex queries of the underlying data. For example, you could search for dead code, write code linters, API migration tools or refactoring tools, all by using Glean queries instead of a compiler API to inspect the code structure.
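As a rough illustration of the dead-code idea, the sketch below finds definitions that nothing refers to. It is plain Python rather than an Angle query, and the function name `unreferenced`, the data layout, and the sample names are assumptions made for this example. It is deliberately non-transitive: code that is only referenced by other dead code is not reported.

```python
def unreferenced(definitions, references, entry_points):
    """Return definitions that are never the target of a reference.

    definitions:  iterable of definition names
    references:   iterable of (source, target) name pairs
    entry_points: names that are reachable from outside (e.g. a main function)
    """
    referenced = {target for _source, target in references}
    return sorted(set(definitions) - referenced - set(entry_points))


# Made-up data: 'legacy_export' is defined but nothing refers to it.
definitions = ["main", "parse", "render", "legacy_export"]
references = [
    ("main", "parse"),
    ("main", "render"),
    ("legacy_export", "render"),
]
entry_points = ["main"]

candidates = unreferenced(definitions, references, entry_points)
```

The result is a list of candidates for review, not a proof that the code is dead: references that the indexed data does not capture (reflection, generated code) would be missed.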

Components

Glean consists of the following:

An efficient storage backend built on RocksDB, for storing facts. Facts are immutable terms described by user-defined schemas, and form a DAG. Facts are automatically de-duplicated by the storage backend. Think of it as being able to store and query the AST of your code, efficiently and with full type-safety¹.
A query engine implementing our declarative query language Angle. Angle is a logic language with similarities to Datalog, but with extensions that make it suitable for building complex queries over Glean data². As in Datalog, you can define rules in Angle that let Glean derive new facts automatically.
A server that manages multiple databases on disk, and serves requests from clients to create, write, and query databases. The server currently uses Thrift, but there's no reason there couldn't also be servers exposing other protocols in the future. The server is designed to be deployed at scale, serving replicated databases to large numbers of clients.
An interactive shell where you can type queries and explore the data.
A command-line tool for creating, writing, and querying databases, either directly or by connecting to the server.
Several example schemas for common programming languages, and indexers for some of those. Note that Glean doesn't force all the data into a single schema; there can be arbitrary amounts of language-specific detail in the schema for each language. Language-neutral abstractions can be built by deriving facts using Angle.
Glass, a language-agnostic symbol server. Glass is a server layer on top of Glean that exposes an API for performing common language-independent queries over Glean data, such as listing the symbols in a source file. Glass can be used as the basis for language tools; in fact Glass is used to power our Glean-based LSP server.
A generic LSP server, built using Glean and Glass. This can be used to browse a large codebase in VS Code³: index the code using Glean, and connect VS Code to the LSP server to provide common code-navigation features such as go-to-definition, find-references, and symbol search.
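The storage properties described above (immutable facts, references between facts, automatic de-duplication) and the idea of deriving results from stored facts can be sketched in a few lines of Python. This is a toy in-memory model under stated assumptions, not how Glean's RocksDB backend works: the `FactStore` class, its methods, and the `Class` and `Parent` predicates are hypothetical names used only for illustration.

```python
class FactStore:
    """Toy append-only fact store with de-duplication (illustrative only)."""

    def __init__(self):
        self._ids = {}    # (predicate, key) -> fact id
        self._facts = []  # fact id -> (predicate, key)

    def add(self, predicate, key):
        """Store a fact and return its id; an identical fact reuses its id."""
        entry = (predicate, key)
        if entry in self._ids:
            return self._ids[entry]
        fact_id = len(self._facts)
        self._ids[entry] = fact_id
        self._facts.append(entry)
        return fact_id

    def get(self, fact_id):
        """Return the (predicate, key) pair stored under fact_id."""
        return self._facts[fact_id]

    def query(self, predicate):
        """Return (fact id, key) pairs for every fact of one predicate."""
        return [(i, k) for i, (p, k) in enumerate(self._facts) if p == predicate]

    def __len__(self):
        return len(self._facts)


def children_of(store, parent_name):
    """A non-recursive derived query: names of direct children of a class."""
    names = []
    for _fact_id, (child_id, parent_id) in store.query("Parent"):
        _pred, (name, _line) = store.get(parent_id)
        if name == parent_name:
            _pred, (child_name, _line) = store.get(child_id)
            names.append(child_name)
    return sorted(names)


store = FactStore()
pet = store.add("Class", ("Pet", 10))
fish = store.add("Class", ("Fish", 30))
goldfish = store.add("Class", ("Goldfish", 40))

# Facts refer to other facts by id. An id only exists once its fact has been
# added, so references always point at earlier facts and the result is a DAG.
first = store.add("Parent", (fish, pet))
store.add("Parent", (goldfish, fish))
again = store.add("Parent", (fish, pet))  # duplicate: same id, nothing stored
```

Keys are plain tuples so that they are hashable and cannot be modified after insertion, which stands in for the immutability of facts.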

Footnotes

¹ While we could in principle store the full AST, for efficiency reasons we typically store only the parts we need for the clients we want to support. Usually that means things like the locations of definitions and cross-references, but not expressions.
² If you're familiar with Datalog, it's worth noting that Angle is currently limited to non-recursive queries.
³ Or any IDE that supports LSP.
