Log Output Options

Jobs in Logpush now have a new key, output_options, which replaces logpull_options and allows for more flexible formatting. You can modify output_options via the API.

Replace logpull_options

Previously, Logpush jobs could be customized by specifying the list of fields, sampling rate, and timestamp format in logpull_options as URL-encoded parameters. For example:

{    "id": 146,    "dataset": "http_requests",    "enabled": false,    "name": "<DOMAIN_NAME>",    "logpull_options": "fields=ClientIP,EdgeStartTimestamp,RayID&sample=0.1&timestamps=rfc3339",    "destination_conf": "s3://<BUCKET_PATH>?region=us-west-2"
}

We have replaced this with output_options as it is used for both Logpull and Logpush.

{    "id": 146,    "dataset": "http_requests",    "enabled": false,    "name": "<DOMAIN_NAME>",    "output_options": {        "field_names": ["ClientIP", "EdgeStartTimestamp", "RayID"],        "sample_rate": 0.1,        "timestamp_format": "rfc3339"    },    "destination_conf": "s3://<BUCKET_PATH>?region=us-west-2"
}

Output types

By default Logpush outputs each record as a single line of JSON (also known as ndjson).

With output_options you can switch to CSV or single JSON object, further customize prefixes, suffixes, delimiters, or provide your own record template (in a stripped-down version of Go text/template syntax).

The output_options object has the following settings:

field_names: array of strings. For the moment, there is no option to add all fields at once, you need to specify the fields names.
output_type: string to specify output type, such as ndjson or csv (default ndjson). This sets default values for the rest of the settings depending on the chosen output type. Some formatting rules (like string quoting) are different between output types.
batch_prefix: string to be prepended before each batch.
batch_suffix: string to be appended after each batch.
record_prefix: string to be prepended before each record.
record_suffix: string to be appended after each record.
record_template: string to use as template for each record instead of the default comma-separated list. All fields used in the template must be present in field_names as well, otherwise they will end up as null. Format as a Go text/template without any standard functions (like conditionals, loops, sub-templates, etc.). The template can only consist of these three types of tokens:
- Action: this is either a {{ .Field }} or a {{ "constant text" }}.
- Text: this is just constant text in-between the {{ actions }}.
- Comment: the {{/* comments */}} are silently dropped.
record_delimiter: string to be inserted in-between the records as separator.
field_delimiter: string to join fields. Will be ignored when record_template is set.
timestamp_format: string to specify format for timestamps, such as unixnano, unix, or rfc3339. Default unixnano.
sample_rate: floating number to specify sampling rate (default 1.0: no sampling). Sampling is applied on top of filtering, and regardless of the current sample_interval of the data.
CVE-2021-44228: bool, default false. If set to true, will cause all occurrences of ${ in the generated files to be replaced with x{.

Examples

Specifying field_names and output_type will result in the remaining options being configured as below for the specified output_type:

ndjson

Default output_options for ndjson

{    "record_prefix": "{",    "record_suffix": "}\n",    "field_delimiter": ","
}

Example output_options

"output_options": {    "field_names": ["ClientIP", "EdgeStartTimestamp", "RayID"],    "output_type": "ndjson"
}

Example output

{"ClientIP":"89.163.242.206","EdgeStartTimestamp":1506702504433000200,"RayID":"3a6050bcbe121a87"}
{"ClientIP":"89.163.242.207","EdgeStartTimestamp":1506702504433000300,"RayID":"3a6050bcbe121a88"}
{"ClientIP":"89.163.242.208","EdgeStartTimestamp":1506702504433000400,"RayID":"3a6050bcbe121a89"}

ndjson with different field names:

Example output_options

"output_options": {    "field_names": ["ClientIP", "EdgeStartTimestamp", "RayID"],    "output_type": "ndjson",    "record_template": "\"client-ip\":{{.ClientIP}},\"timestamp\":{{.EdgeStartTimestamp}},\"ray-id\":{{.RayID}}"
}

Example output

{"client-ip":"89.163.242.206","timestamp":1506702504433000200,"ray-id":"3a6050bcbe121a87"}
{"client-ip":"89.163.242.207","timestamp":1506702504433000300,"ray-id":"3a6050bcbe121a88"}
{"client-ip":"89.163.242.208","timestamp":1506702504433000400,"ray-id":"3a6050bcbe121a89"}

Literal with double curly-braces ({{}}), that is, "double{{curly}}braces", can be inserted following go text/template convention, that is, "{{double{{curly}}braces}}".

csv

Default output_options for CSV

{    "record_suffix": "\n",    "field_delimiter": ","
}

Example output_options

"output_options": {    "field_names": ["ClientIP", "EdgeStartTimestamp", "RayID"],    "output_type": "csv"
}

Example output

"89.163.242.206",1506702504433000200,"3a6050bcbe121a87"
"89.163.242.207",1506702504433000300,"3a6050bcbe121a88"
"89.163.242.208",1506702504433000400,"3a6050bcbe121a89"

csv/json variants

Based on above, other formats similar to csv or json are also supported:

csv with header:

Example output_options

"output_options": {    "field_names": ["ClientIP", "EdgeStartTimestamp", "RayID"],    "output_type": "csv",    "batch_prefix": "ClientIP,EdgeStartTimestamp,RayID\n"
}

Example output

ClientIP,EdgeStartTimestamp,RayID
"89.163.242.206",1506702504433000200,"3a6050bcbe121a87"
"89.163.242.207",1506702504433000300,"3a6050bcbe121a88"
"89.163.242.208",1506702504433000400,"3a6050bcbe121a89"

tsv with header:

Example output_options

"output_options": {    "field_names": ["ClientIP", "EdgeStartTimestamp", "RayID"],    "output_type": "csv",    "batch_prefix": "ClientIP\tEdgeStartTimestamp\tRayID\n",    "field_delimiter": "\t"
}

Example output

ClientIP EdgeStartTimestamp  RayID
"89.163.242.206"    1506702504433000200 "3a6050bcbe121a87"
"89.163.242.207"    1506702504433000300 "3a6050bcbe121a88"
"89.163.242.208"    1506702504433000400 "3a6050bcbe121a89"

json with nested object:

Example output_options

"output_options": {    "field_names": ["ClientIP", "EdgeStartTimestamp", "RayID"],    "output_type": "ndjson",    "batch_prefix": "{\"events\":[",    "batch_suffix": "\n]}\n",    "record_prefix": "\n  {\"info\":{",    "record_suffix": "}}",    "record_delimiter": ","
}

Example output

{"events":[  {"info":{"ClientIP":"89.163.242.206","EdgeStartTimestamp":1506702504433000200,"RayID":"3a6050bcbe121a87"}},  {"info":{"ClientIP":"89.163.242.207","EdgeStartTimestamp":1506702504433000300,"RayID":"3a6050bcbe121a88"}},  {"info":{"ClientIP":"89.163.242.208","EdgeStartTimestamp":1506702504433000400,"RayID":"3a6050bcbe121a89"}}
]}

How to migrate

In order to migrate your jobs from using logpull_options to the new output_options, take these steps:

Change the &fields=ClientIP,EdgeStartTimestamp,RayID parameter to an array in output_options.field_names.
Change the &sample=0.1 parameter to output_options.sample_rate.
Change the &timestamps=rfc3339 parameter to output_options.timestamp_format.
Change the &CVE-2021-44228=true parameter to output_options.CVE-2021-44228.

For example, if logpull_options are fields=ClientIP,EdgeStartTimestamp,RayID&sample=0.1&timestamps=rfc3339&CVE-2021-44228=true, the output_options would be:

"output_options": {    "field_names": ["ClientIP", "EdgeStartTimestamp", "RayID"],    "sample_rate": 0.1,    "timestamp_format": "rfc3339",    "CVE-2021-44228": true
}

Log Output Options

​​ Replace logpull_options

​​ Output types

​​ Examples

​​ ndjson

​​ csv

​​ csv/json variants

​​ How to migrate

Replace logpull_options

Output types

Examples

ndjson

csv

csv/json variants

How to migrate