[azure logs] add routing integration to use only one azure-eventhub input (#11984)

Switch the integration package from the one-input-per-data-stream model to the one-input model.

One input per data stream model:

![image](https://github.com/user-attachments/assets/ce60cd18-80fc-4805-aebf-322dcfd5f374)

One input model:

![image](https://github.com/user-attachments/assets/b8ae0dbd-6ca6-4caa-ad5c-3c4a6f573dee)

In the one-input model, there is only one azure-eventhub input running and sending events to the `events` data stream. In the `events` data stream, the ingest pipeline performs these tasks:

- discover and set the `event.dataset` field using the `category` field in the event.
- use the `event.dataset` field to reroute the event to the target data stream.

The discovery process uses the following logic:

- if the event doesn't have a category, it sets `event.dataset` to `azure.eventhub` (the generic integration)
- if the event does have a category, it sets `event.dataset` to `azure.platformlogs` (it's probably an Azure log)
- if the event category is supported, it sets `event.dataset` to a specific one such as `azure.activitylogs` or `azure.signinlogs`.

After the discovery step, the routing rules use the `event.dataset` value to forward the events to the best available target data stream.
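
The routing step can be sketched the same way. The snippet below is a simplified model under stated assumptions: it relies on the standard `<type>-<dataset>-<namespace>` data stream naming scheme, takes the list of target datasets from `routing_rules.yml` in this commit, and assumes an event that matches no rule stays in the source data stream. The names `ROUTABLE_DATASETS` and `target_data_stream` are hypothetical.

```python
from typing import Optional

# Target datasets listed in routing_rules.yml in this commit.
ROUTABLE_DATASETS = frozenset({
    "azure.activitylogs",
    "azure.application_gateway",
    "azure.auditlogs",
    "azure.eventhub",
    "azure.firewall_logs",
    "azure.graphactivitylogs",
    "azure.identity_protection",
    "azure.platformlogs",
    "azure.provisioning",
    "azure.signinlogs",
    "azure.springcloudlogs",
})


def target_data_stream(
    event_dataset: str,
    namespace: Optional[str] = None,
    source_dataset: str = "azure.events",
) -> str:
    """Return the data stream an event would land in (hypothetical helper)."""
    # A rule matches only when event.dataset equals its target dataset;
    # otherwise the event is assumed to stay in the source data stream.
    dataset = event_dataset if event_dataset in ROUTABLE_DATASETS else source_dataset
    # Namespace candidates are tried in order: the event's own namespace,
    # then the literal "default".
    return f"logs-{dataset}-{namespace or 'default'}"
```

With these assumptions, `target_data_stream("azure.signinlogs", "prod")` gives `logs-azure.signinlogs-prod`, and an event whose dataset is still `azure.events` remains in `logs-azure.events-prod`.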
zmoog authored Dec 9, 2024
1 parent d19f5c2 commit f99850b
Showing 12 changed files with 1,417 additions and 11 deletions.
1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -81,6 +81,7 @@
/packages/azure/data_stream/activitylogs @elastic/obs-infraobs-integrations
/packages/azure/data_stream/auditlogs @elastic/obs-infraobs-integrations
/packages/azure/data_stream/eventhub @elastic/obs-ds-hosted-services
/packages/azure/data_stream/events @elastic/obs-ds-hosted-services
/packages/azure/data_stream/identity_protection @elastic/obs-infraobs-integrations
/packages/azure/data_stream/platformlogs @elastic/obs-infraobs-integrations
/packages/azure/data_stream/provisioning @elastic/obs-infraobs-integrations
504 changes: 504 additions & 0 deletions packages/azure/_dev/build/docs/events.md

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions packages/azure/changelog.yml
@@ -1,3 +1,8 @@
- version: "1.20.0"
  changes:
    - description: Add Azure Logs integration v2 (preview)
      type: enhancement
      link: https://github.com/elastic/integrations/pull/11984
- version: "1.19.4"
  changes:
    - description: Fix destination.geo.region_name mapping.
46 changes: 46 additions & 0 deletions packages/azure/data_stream/events/agent/stream/stream.yml.hbs
@@ -0,0 +1,46 @@
{{#if connection_string}}
connection_string: {{connection_string}}
{{/if}}
{{#if storage_account_container }}
storage_account_container: {{storage_account_container}}
{{else}}
{{#if eventhub}}
storage_account_container: filebeat-events-{{eventhub}}
{{/if}}
{{/if}}
{{#if eventhub}}
eventhub: {{eventhub}}
{{/if}}
{{#if consumer_group}}
consumer_group: {{consumer_group}}
{{/if}}
{{#if storage_account}}
storage_account: {{storage_account}}
{{/if}}
{{#if storage_account_key}}
storage_account_key: {{storage_account_key}}
{{/if}}
{{#if resource_manager_endpoint}}
resource_manager_endpoint: {{resource_manager_endpoint}}
{{/if}}
tags:
{{#if preserve_original_event}}
- preserve_original_event
{{/if}}
{{#each tags as |tag i|}}
- {{tag}}
{{/each}}
{{#contains "forwarded" tags}}
publisher_pipeline.disable_host: true
{{/contains}}
{{#if processors}}
processors:
{{processors}}
{{/if}}
sanitize_options:
{{#if sanitize_newlines}}
- NEW_LINES
{{/if}}
{{#if sanitize_singlequotes}}
- SINGLE_QUOTES
{{/if}}
@@ -0,0 +1,96 @@
---
description: Pipeline for parsing Azure logs.
processors:
  - set:
      field: ecs.version
      value: '8.11.0'
  # TODO: we can remove this processor when https://github.com/elastic/beats/issues/40561
  # is fixed and released.
  - rename:
      field: azure
      target_field: azure-eventhub
      if: 'ctx.azure?.eventhub != null'
      ignore_missing: true
  - set:
      field: event.kind
      value: event

  #
  # Set `event.dataset` value based on the event category.
  # ------------------------------------------------------
  #
  # In the `routing_rules.yml` file, we use the
  # `event.dataset` value to route the event to
  # the appropriate data stream.
  #

  - json:
      field: message
      target_field: tmp_json
      description: 'Parses the message field as JSON and stores it in a temporary field to identify the event dataset.'

  # Defaults to azure.events if the `category` field is not present.
  - set:
      field: event.dataset
      value: azure.events
      description: 'Sets the default event dataset.'

  # Sets the event dataset based on the `category` field.
  - set:
      field: event.dataset
      value: azure.platformlogs
      if: 'ctx.tmp_json?.category != null'
      description: 'If the event has a category field, we consider it a platform log.'
  - set:
      field: event.dataset
      value: azure.activitylogs
      if: 'ctx.tmp_json?.category == "Administrative" || ctx.tmp_json?.category == "Security" || ctx.tmp_json?.category == "ServiceHealth" || ctx.tmp_json?.category == "Alert" || ctx.tmp_json?.category == "Recommendation" || ctx.tmp_json?.category == "Policy" || ctx.tmp_json?.category == "Autoscale" || ctx.tmp_json?.category == "ResourceHealth"'
  - set:
      field: event.dataset
      value: azure.application_gateway
      if: 'ctx.tmp_json?.category == "ApplicationGatewayFirewallLog" || ctx.tmp_json?.category == "ApplicationGatewayAccessLog"'
  - set:
      field: event.dataset
      value: azure.auditlogs
      if: 'ctx.tmp_json?.category == "AuditLogs"'
  - set:
      field: event.dataset
      value: azure.firewall_logs
      if: 'ctx.tmp_json?.category == "AzureFirewallApplicationRule" || ctx.tmp_json?.category == "AzureFirewallNetworkRule" || ctx.tmp_json?.category == "AzureFirewallDnsProxy" || ctx.tmp_json?.category == "AZFWApplicationRule" || ctx.tmp_json?.category == "AZFWNetworkRule" || ctx.tmp_json?.category == "AZFWNatRule" || ctx.tmp_json?.category == "AZFWDnsQuery"'
  - set:
      field: event.dataset
      value: azure.graphactivitylogs
      if: 'ctx.tmp_json?.category == "MicrosoftGraphActivityLogs"'
  - set:
      field: event.dataset
      value: azure.identity_protection
      if: 'ctx.tmp_json?.category == "RiskyUsers" || ctx.tmp_json?.category == "UserRiskEvents"'
  - set:
      field: event.dataset
      value: azure.provisioning
      if: 'ctx.tmp_json?.category == "ProvisioningLogs"'
  - set:
      field: event.dataset
      value: azure.signinlogs
      if: 'ctx.tmp_json?.category == "SignInLogs" || ctx.tmp_json?.category == "NonInteractiveUserSignInLogs" || ctx.tmp_json?.category == "ServicePrincipalSignInLogs" || ctx.tmp_json?.category == "ManagedIdentitySignInLogs"'
  - set:
      field: event.dataset
      value: azure.springcloudlogs
      if: 'ctx.tmp_json?.category == "ApplicationConsole" || ctx.tmp_json?.category == "SystemLogs" || ctx.tmp_json?.category == "IngressLogs" || ctx.tmp_json?.category == "BuildLogs" || ctx.tmp_json?.category == "ContainerEventLogs"'
      description: 'Azure Spring Apps log categories (refs: https://learn.microsoft.com/en-us/azure/azure-monitor/reference/supported-logs/microsoft-appplatform-spring-logs)'

  # Remove the temporary field used to identify the event dataset.
  - remove:
      field: tmp_json
      ignore_missing: true
      description: 'Removes the temporary field used to identify the event dataset.'

#
#
# Error handling
#

on_failure:
  - set:
      field: error.message
      value: '{{ _ingest.on_failure_message }}'
16 changes: 16 additions & 0 deletions packages/azure/data_stream/events/fields/base-fields.yml
@@ -0,0 +1,16 @@
- name: '@timestamp'
  type: date
  description: Event timestamp.
- name: data_stream.type
  type: constant_keyword
  description: Data stream type.
- name: data_stream.dataset
  type: constant_keyword
  description: Data stream dataset name.
- name: data_stream.namespace
  type: constant_keyword
  description: Data stream namespace.
- name: event.module
  type: constant_keyword
  description: Event module
  value: azure
27 changes: 27 additions & 0 deletions packages/azure/data_stream/events/fields/fields.yml
@@ -0,0 +1,27 @@
- name: azure-eventhub
  type: group
  fields:
    - name: eventhub
      type: keyword
      description: |
        Event hub name
    - name: offset
      type: long
      description: |
        Offset
    - name: enqueued_time
      type: keyword
      description: |
        The enqueued time
    - name: partition_id
      type: keyword
      description: |
        Partition ID
    - name: consumer_group
      type: keyword
      description: |
        Consumer group
    - name: sequence_number
      type: long
      description: |-
        Sequence number
42 changes: 42 additions & 0 deletions packages/azure/data_stream/events/fields/package-fields.yml
@@ -0,0 +1,42 @@
- name: azure
  type: group
  fields:
    - name: subscription_id
      type: keyword
      description: |
        Azure subscription ID
    - name: correlation_id
      type: keyword
      description: |
        Correlation ID
    - name: tenant_id
      type: keyword
      description: |
        Tenant ID
    - name: resource
      type: group
      fields:
        - name: id
          type: keyword
          description: |
            Resource ID
        - name: group
          type: keyword
          description: |
            Resource group
        - name: provider
          type: keyword
          description: |
            Resource type/namespace
        - name: namespace
          type: keyword
          description: |
            Resource type/namespace
        - name: name
          type: keyword
          description: |
            Name
        - name: authorization_rule
          type: keyword
          description: |
            Authorization rule
86 changes: 86 additions & 0 deletions packages/azure/data_stream/events/manifest.yml
@@ -0,0 +1,86 @@
type: logs
title: Azure Logs (v2 preview)
dataset: azure.events
streams:
  - input: "azure-eventhub"
    enabled: false
    template_path: "stream.yml.hbs"
    title: Collect Azure logs from Event Hub
    description: |
      Collect all the supported (see list below) Azure logs from Event Hub to a target data stream.

      ✨ **New in version 1.20.0+:** by enabling this integration, you can collect all the logs from the following Azure services and route them to the appropriate data stream:

      - Microsoft Entra ID logs:
          - Audit
          - Identity Protection
          - Provisioning
          - Sign-in
      - Platform logs
      - Activity logs
      - Microsoft Graph Activity Logs
      - Spring Apps logs
      - Firewall logs
      - Application Gateway logs

      **You MUST turn off the v1 integrations** when you enable this v2 integration. If you run both integrations simultaneously, you will see duplicate logs in your data stream.

      If you need to collect raw events from Azure Event Hub, we recommend using the [Custom Azure Logs integration](https://www.elastic.co/docs/current/integrations/azure_logs) which provides more flexibility.

      To learn more about the efficiency and routing enhancements introduced in version 1.20.0, please read the [Azure Logs (v2 preview)](https://www.elastic.co/docs/current/integrations/azure/events) documentation.
    vars:
      - name: preserve_original_event
        required: true
        show_user: true
        title: Preserve original event
        description: Preserves a raw copy of the original event, added to the field `event.original`
        type: bool
        multi: false
        default: false
      - name: storage_account_container
        type: text
        title: Storage Account Container
        multi: false
        required: false
        show_user: false
        description: >
          The storage account container where the integration stores the checkpoint data for the consumer group. It is an advanced option to use with extreme care. You MUST use a dedicated storage account container for each Azure log type (activity, sign-in, audit logs, and others). DO NOT REUSE the same container name for more than one Azure log type. See [Container Names](https://docs.microsoft.com/en-us/rest/api/storageservices/naming-and-referencing-containers--blobs--and-metadata#container-names) for details on naming rules from Microsoft. The integration generates a default container name if not specified.
      - name: tags
        type: text
        title: Tags
        multi: true
        required: true
        show_user: false
        default:
          - azure-eventhub
          - forwarded
      - name: processors
        type: yaml
        title: Processors
        multi: false
        required: false
        show_user: false
        description: >
          Processors are used to reduce the number of fields in the exported event or to enhance the event with metadata. This executes in the agent before the logs are parsed. See [Processors](https://www.elastic.co/guide/en/beats/filebeat/current/filtering-and-enhancing-data.html) for details.
      - name: sanitize_newlines
        type: bool
        title: Sanitizes New Lines
        description: Removes new lines in logs to ensure proper formatting of JSON data and avoid parsing issues during processing.
        multi: false
        required: false
        show_user: false
        default: false
      - name: sanitize_singlequotes
        required: true
        show_user: false
        title: Sanitizes Single Quotes
        description: Replaces single quotes with double quotes (single quotes inside double quotes are omitted) in logs to ensure proper formatting of JSON data and avoid parsing issues during processing.
        type: bool
        multi: false
        default: false
# Ensures agents have permissions to write data to `logs-*-*`
elasticsearch:
  dynamic_dataset: true
  dynamic_namespace: true
57 changes: 57 additions & 0 deletions packages/azure/data_stream/events/routing_rules.yml
@@ -0,0 +1,57 @@
- source_dataset: azure.events
  rules:
    - target_dataset: azure.activitylogs
      if: ctx.event?.dataset == 'azure.activitylogs'
      namespace:
        - "{{data_stream.namespace}}"
        - default
    - target_dataset: azure.application_gateway
      if: ctx.event?.dataset == 'azure.application_gateway'
      namespace:
        - "{{data_stream.namespace}}"
        - default
    - target_dataset: azure.auditlogs
      if: ctx.event?.dataset == 'azure.auditlogs'
      namespace:
        - "{{data_stream.namespace}}"
        - default
    - target_dataset: azure.eventhub
      if: ctx.event?.dataset == 'azure.eventhub'
      namespace:
        - "{{data_stream.namespace}}"
        - default
    - target_dataset: azure.firewall_logs
      if: ctx.event?.dataset == 'azure.firewall_logs'
      namespace:
        - "{{data_stream.namespace}}"
        - default
    - target_dataset: azure.graphactivitylogs
      if: ctx.event?.dataset == 'azure.graphactivitylogs'
      namespace:
        - "{{data_stream.namespace}}"
        - default
    - target_dataset: azure.identity_protection
      if: ctx.event?.dataset == 'azure.identity_protection'
      namespace:
        - "{{data_stream.namespace}}"
        - default
    - target_dataset: azure.platformlogs
      if: ctx.event?.dataset == 'azure.platformlogs'
      namespace:
        - "{{data_stream.namespace}}"
        - default
    - target_dataset: azure.provisioning
      if: ctx.event?.dataset == 'azure.provisioning'
      namespace:
        - "{{data_stream.namespace}}"
        - default
    - target_dataset: azure.signinlogs
      if: ctx.event?.dataset == 'azure.signinlogs'
      namespace:
        - "{{data_stream.namespace}}"
        - default
    - target_dataset: azure.springcloudlogs
      if: ctx.event?.dataset == 'azure.springcloudlogs'
      namespace:
        - "{{data_stream.namespace}}"
        - default