# Decoders

LMQL supports various decoding algorithms, which are used to generate text from the token distribution of a language model. The decoding algorithm in use is specified right at the beginning of a query, e.g. `argmax`. Here, we provide a brief overview of the currently supported decoders.

LMQL also includes a library for array-based decoding, `dclib`, which can be used to implement custom decoders. More information on this will be provided in the future. The implementation of the available decoding procedures is located in `src/lmql/runtime/dclib/decoders.py` of the LMQL repository.

In general, all LMQL decoding algorithms are model-agnostic and can be used with any LMQL-supported inference backend. For more information on the supported inference backends, see the [Models](./models.md) chapter.

## Specifying The Decoding Algorithm

Depending on the context, LMQL offers two ways to specify the decoding algorithm to use. 

**Queries with Decoding Clause**: The first option is to specify the decoding algorithm and its parameters as part of the query itself. This is particularly useful if your choice of decoder is relevant to the program's behavior and should be part of your program.

```{lmql}

name::specify-decoder
beam(n=2)
    "This is a query with a specified decoder: [RESPONSE]"
from
    "openai/text-ada-001"
```

**Specifying the Decoding Algorithm Externally**: The second option is to specify the decoding algorithm and parameters externally, i.e. separately from the actual program code:

```python
import lmql

@lmql.query(model="openai/text-davinci-003", decoder="sample", temperature=1.8)
def tell_a_joke():
    '''lmql
    """A list of good dad jokes. A indicates the punchline:
    Q:[JOKE]
    A:[PUNCHLINE]""" where STOPS_AT(JOKE, "?") and STOPS_AT(PUNCHLINE, "\n")
    '''

tell_a_joke() # uses the decoder specified in @lmql.query(...)
tell_a_joke(decoder="beam", n=2) # uses a beam search decoder with n=2
```

This is only possible when using LMQL from a Python program. For more information, see the chapter on how to specify the [model to use for decoding](models.md).

## Supported Decoding Algorithms

In general, the very first keyword of an LMQL query specifies the decoding algorithm to use. The following decoder keywords are available:

### `argmax`

The `argmax` decoder is the simplest decoder available in LMQL. It greedily selects the most likely token at each step of the decoding process. It has no additional parameters. Since `argmax` decoding is deterministic, one can only generate a single sequence at a time.
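To illustrate the greedy selection described above, here is a minimal, self-contained sketch in plain Python. It does not use LMQL or `dclib`; `TOY_MODEL` and `greedy_decode` are hypothetical names introduced only for this illustration, and the "model" is just a lookup table of next-token probabilities.

```python
# Hypothetical toy "model": maps the last token to a distribution over
# next tokens. This is an illustration only, not an LMQL API.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "a": {"dog": 0.7, "cat": 0.3},
    "cat": {"</s>": 0.9, "the": 0.1},
    "dog": {"</s>": 0.8, "a": 0.2},
}


def greedy_decode(model, start="<s>", eos="</s>", max_len=10):
    """Repeatedly append the single most likely next token."""
    seq = [start]
    while seq[-1] != eos and len(seq) < max_len:
        dist = model[seq[-1]]
        # argmax over the next-token distribution
        seq.append(max(dist, key=dist.get))
    return seq


result = greedy_decode(TOY_MODEL)
print(result)
```

Because every step takes the single highest-probability token, repeated calls on the same model always return the same sequence.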

### `sample(n: int, temperature: float)`

The `sample` decoder samples `n` sequences in parallel from the model. The `temperature` parameter controls the randomness of the sampling process. Higher values of `temperature` lead to more random samples, while lower values concentrate the samples on the most likely tokens. A temperature value of `0.0` is equivalent to the `argmax` decoder.
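The effect of `temperature` can be sketched with a temperature-scaled softmax over raw token scores. This is a generic illustration in plain Python, not LMQL's implementation; the helper names `apply_temperature` and `sample_token` and the example logits are assumptions made for this sketch.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Convert raw scores into probabilities, scaled by temperature."""
    if temperature == 0.0:
        # Limit case: all probability mass on the best token (argmax).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = apply_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three tokens
cold = apply_temperature(logits, 0.5)  # sharper distribution
warm = apply_temperature(logits, 2.0)  # flatter distribution
print(cold, warm)
print(sample_token(logits, 1.0, random.Random(0)))
```

A lower temperature puts more probability on the top-scoring token, while a higher temperature spreads probability more evenly across tokens.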

### `beam(n: int)`

A simple beam search decoder. The `n` parameter controls the beam size. The beam search decoder is deterministic, so it will generate the same `n` sequences every time. The result of a `beam` query is a list of `n` sequences, sorted by their likelihood.
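As a rough sketch of the idea, the following plain-Python beam search keeps the `n` highest-scoring partial sequences at each step. It is not the `dclib` implementation; `TOY_MODEL` and `beam_search` are hypothetical names, and the toy probabilities are chosen so that beam search finds a more likely sequence than greedy decoding would.

```python
import math

# Hypothetical toy "model": maps the last token to a distribution over
# next tokens. This is an illustration only, not an LMQL API.
TOY_MODEL = {
    "<s>": {"a": 0.55, "the": 0.45},
    "a": {"cat": 0.4, "dog": 0.35, "</s>": 0.25},
    "the": {"cat": 0.9, "dog": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}


def beam_search(model, n, start="<s>", eos="</s>", max_len=10):
    """Return up to n (log_prob, sequence) pairs, most likely first."""
    beams = [(0.0, [start])]
    for _step in range(max_len - 1):
        if all(seq[-1] == eos for _score, seq in beams):
            break  # every hypothesis is finished
        candidates = []
        for score, seq in beams:
            if seq[-1] == eos:
                candidates.append((score, seq))  # keep finished hypotheses
                continue
            for token, p in model[seq[-1]].items():
                candidates.append((score + math.log(p), seq + [token]))
        # keep only the n best hypotheses by cumulative log-probability
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:n]
    return beams


beams = beam_search(TOY_MODEL, n=2)
for score, seq in beams:
    print(round(math.exp(score), 3), seq)
```

With `n=1` this procedure reduces to greedy decoding and commits to `a` in the first step. With `n=2` it also keeps `the` alive, which leads to the overall more likely sequence.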

### `beam_sample(n: int, temperature: float)`

A beam search decoder that samples from the beam at each step. The `n` parameter controls the beam size, while the `temperature` parameter controls the randomness of the sampling process. The result of a `beam_sample` query is a list of `n` sequences, sorted by their likelihood.

## Novel Decoders

LMQL also implements a number of novel decoders. These decoders are experimental and may not work as expected. They are also not guaranteed to be stable across different LMQL versions. More documentation on these decoders will be provided in the future.

### `var(b: int, n: int)`

An experimental implementation of variable-level beam search.

### `beam_var(n: int)`

An experimental implementation of a beam search procedure that groups by currently-decoded variable and applies adjusted length penalties.

## Inspecting Decoding Trees

LMQL also provides a way to inspect the decoding trees generated by the decoders. To do so, execute the query in the Playground IDE and click on the `Advanced Mode` button in the top right corner of the Playground. This opens a new pane where you can navigate and inspect the LMQL decoding tree.

Among other things, this view allows you to track the decoding process, active hypotheses and interpreter state, including the current evaluation result of the `where` clause. For an example, see the [translation example](https://lmql.ai/playground/#translation) included in the Playground IDE (make sure to enable `Advanced Mode`).


## Other Decoding Parameters

* `max_len: int` - The maximum length of the generated sequence. If not specified, the default value of `max_len` is `512`. Note that if the maximum length is reached, the LMQL runtime will throw an error if the query has not yet come to a valid result according to the provided `where` clause.

* `openai_chunksize: int` - The chunk size parameter for OpenAI's `Completion` API. If not specified, the default value of `openai_chunksize` is `32`. See also the description of this parameter in the [Models](./models.md#configuring-speculative-openai-api-use) chapter.