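Mastering Logtail Log Collection with Terraform - Alibaba Cloud (云淘科技)

This article describes best practices for automating Logtail log collection with Terraform.

Target audience

Developers and operations engineers.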

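Background

Infrastructure as code (IaC) tools manage infrastructure through configuration files instead of a graphical user interface. The idea behind IaC is to treat infrastructure such as cloud hosts, networks, and storage as software. Infrastructure then becomes programmable and can be versioned, reused, and debugged, and it can even be rolled back when something goes wrong. Practices from the software development lifecycle, such as code review, automated testing, and CI/CD, can likewise be applied to infrastructure management.

Concepts

Terraform is an open-source infrastructure automation and orchestration tool created by HashiCorp. Its slogan is "Write, Plan, and Create Infrastructure as Code". The Terraform command-line interface (CLI) provides a simple mechanism for deploying configuration files to Alibaba Cloud or any other supported cloud and keeping them under version control. Logtail is the log collection agent of Alibaba Cloud Simple Log Service (SLS). It collects logs from servers on Alibaba Cloud ECS, Alibaba Cloud ACK, self-managed IDCs, other cloud providers, and so on. Alibaba Cloud, as the third-largest cloud service provider, is well covered: terraform-alicloud-provider already supports many of its products, including SLS, OSS, SLB, and RDS.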

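Implementation

Prerequisites

Terraform is installed. For installation instructions, see the official Terraform documentation.

Scenario 1: Build new SLS resources with Terraform

Step 1: Create the Terraform configuration template

Create a Terraform working directory and define the SLS Logtail collection configuration. The configuration consists of two parts:

  • provider: basic settings for the Alibaba Cloud provider. See the provider documentation for details.
    • Authentication, which supports several methods such as static credentials and ECS RAM roles. This article uses static credentials.
    • Region, which determines where the resources defined below are created.
  • resource
    • alicloud_log_project: creates the project. See the resource documentation for more options.
    • alicloud_log_store: creates the Logstore. See the resource documentation for more options.
    • alicloud_log_store_index: enables indexing. See the resource documentation for more options.
    • alicloud_log_machine_group: creates the machine group. See the resource documentation for more options.
    • alicloud_logtail_config: creates the Logtail collection configuration. See the resource documentation for more options.
    • alicloud_logtail_attachment: attaches the collection configuration to the machine group. See the resource documentation for more options.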
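The example configuration below sets only the region, so the static credentials have to come from somewhere else. A minimal sketch of a fuller provider setup, assuming Terraform 0.13 or later (needed for `required_providers` with a `source` address) and the registry address `aliyun/alicloud`. The keys are read from the `ALICLOUD_ACCESS_KEY` and `ALICLOUD_SECRET_KEY` environment variables instead of being written into the file; the commented-out `ecs_role_name` line shows the ECS RAM role alternative, and the role name in it is hypothetical.

```hcl
terraform {
  required_providers {
    alicloud = {
      # Registry address of the Alibaba Cloud provider
      source = "aliyun/alicloud"
    }
  }
}

provider "alicloud" {
  # Region in which all SLS resources of this configuration are created
  region = "cn-zhangjiakou"

  # Static credentials: access_key/secret_key are deliberately not set here.
  # Export ALICLOUD_ACCESS_KEY and ALICLOUD_SECRET_KEY in the shell instead,
  # so that the keys never end up in version control.

  # ECS RAM role alternative, for Terraform running on an ECS instance that
  # has this role attached (hypothetical role name):
  # ecs_role_name = "terraform-sls-role"
}
```

Keeping credentials out of terraform.tf means the same file can be committed and reviewed like any other code, which is the point of the IaC approach described above.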

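The following example uses a machine group identified by IP address to collect files from a specific directory on the matching machine.

$ mkdir learn-terraform-sls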
$ cd learn-terraform-sls
$ vi terraform.tf

provider "alicloud" {
  region = "cn-zhangjiakou"
}

resource "alicloud_log_project" "test" {
  name        = "tf-test-project-zhangjiakou"
  description = "create by terraform"
}

# Logstore: 7-day retention, 3 shards, automatic splitting up to 60 shards
resource "alicloud_log_store" "test" {
  project               = alicloud_log_project.test.name
  name                  = "tf-test-logstore"
  retention_period      = 7
  shard_count           = 3
  auto_split            = true
  max_split_shard_count = 60
  append_meta           = true
}

# Full-text index; token lists the delimiter characters
resource "alicloud_log_store_index" "test" {
  project  = alicloud_log_project.test.name
  logstore = alicloud_log_store.test.name
  full_text {
    case_sensitive = true
    token          = " #$%^*\r\n\t"
  }
}

# Machine group that identifies the server by its IP address
resource "alicloud_log_machine_group" "test" {
  project       = alicloud_log_project.test.name
  name          = "tf-log-machine-group"
  topic         = "terraform"
  identify_type = "ip"
  identify_list = ["172.26.51.68"]
}

# Collect /root/tmp/access.log (GBK-encoded) in JSON mode
resource "alicloud_logtail_config" "test" {
  project      = alicloud_log_project.test.name
  logstore     = alicloud_log_store.test.name
  input_type   = "file"
  log_sample   = "test"
  name         = "tf-log-config"
  output_type  = "LogService"
  input_detail = <<DEFINITION
{
  "logPath": "/root/tmp",
  "filePattern": "access.log",
  "logType": "json_log",
  "topicFormat": "default",
  "discardUnmatch": false,
  "enableRawLog": false,
  "fileEncoding": "gbk",
  "maxDepth": 10
}
DEFINITION
}

# Apply the collection configuration to the machine group
resource "alicloud_logtail_attachment" "test" {
  project             = alicloud_log_project.test.name
  logtail_config_name = alicloud_logtail_config.test.name
  machine_group_name  = alicloud_log_machine_group.test.name
}
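Because `input_detail` is simply a JSON string, it can also be built with Terraform's built-in `jsonencode` function instead of a heredoc. JSON syntax mistakes then surface at plan time, and the file mirrors how the plan output in Step 3 renders this field. A sketch with the same keys and values as the heredoc version; it declares the same address `alicloud_logtail_config.test`, so use it in place of that block, not next to it. That the provider treats both forms as identical is an assumption worth confirming with terraform plan.

```hcl
resource "alicloud_logtail_config" "test" {
  project     = alicloud_log_project.test.name
  logstore    = alicloud_log_store.test.name
  input_type  = "file"
  log_sample  = "test"
  name        = "tf-log-config"
  output_type = "LogService"

  # jsonencode turns this HCL object into the JSON string Logtail expects
  input_detail = jsonencode({
    logPath        = "/root/tmp"  # directory to watch
    filePattern    = "access.log" # file name pattern inside logPath
    logType        = "json_log"   # JSON mode
    topicFormat    = "default"
    discardUnmatch = false
    enableRawLog   = false
    fileEncoding   = "gbk"
    maxDepth       = 10 # maximum directory depth to monitor
  })
}
```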

Step 2: Initialize the working directory

Run terraform init to initialize the working directory and install the alicloud provider.

$ terraform init
Initializing the backend...
Initializing provider plugins...
Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

Run terraform fmt to format terraform.tf. It prints the names of the files it reformatted.

$ terraform fmt
terraform.tf

Run terraform validate to check that the configuration template is valid.

$ terraform validate
Success! The configuration is valid.

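Step 3: Create the infrastructure

Run terraform apply to create the resources declared in the configuration. Before any change is actually made, Terraform prints the full execution plan, which describes every change that will take place. In the run below, the plan adds 6 resources.

$ terraform apply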
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # alicloud_log_machine_group.test will be created
  + resource "alicloud_log_machine_group" "test" {
      + id            = (known after apply)
      + identify_list = [
          + "172.26.51.68",
        ]
      + identify_type = "ip"
      + name          = "tf-log-machine-group"
      + project       = "tf-test-project-zhangjiakou"
      + topic         = "terraform"
    }

  # alicloud_log_project.test will be created
  + resource "alicloud_log_project" "test" {
      + description = "create by terraform"
      + id          = (known after apply)
      + name        = "tf-test-project-zhangjiakou"
    }

  # alicloud_log_store.test will be created
  + resource "alicloud_log_store" "test" {
      + append_meta           = true
      + auto_split            = true
      + enable_web_tracking   = false
      + id                    = (known after apply)
      + max_split_shard_count = 60
      + name                  = "tf-test-logstore"
      + project               = "tf-test-project-zhangjiakou"
      + retention_period      = 7
      + shard_count           = 3
      + shards                = (known after apply)
    }

  # alicloud_log_store_index.test will be created
  + resource "alicloud_log_store_index" "test" {
      + id       = (known after apply)
      + logstore = "tf-test-logstore"
      + project  = "tf-test-project-zhangjiakou"

      + full_text {
          + case_sensitive  = true
          + include_chinese = false
          + token           = " #$%^*\r\n\t"
        }
    }

  # alicloud_logtail_attachment.test will be created
  + resource "alicloud_logtail_attachment" "test" {
      + id                  = (known after apply)
      + logtail_config_name = "tf-log-config"
      + machine_group_name  = "tf-log-machine-group"
      + project             = "tf-test-project-zhangjiakou"
    }

  # alicloud_logtail_config.test will be created
  + resource "alicloud_logtail_config" "test" {
      + id           = (known after apply)
      + input_detail = jsonencode(
            {
              + discardUnmatch = false
              + enableRawLog   = false
              + fileEncoding   = "gbk"
              + filePattern    = "access.log"
              + logPath        = "/root/tmp"
              + logType        = "json_log"
              + maxDepth       = 10
              + topicFormat    = "default"
            }
        )
      + input_type   = "file"
      + log_sample   = "test"
      + logstore     = "tf-test-logstore"
      + name         = "tf-log-config"
      + output_type  = "LogService"
      + project      = "tf-test-project-zhangjiakou"
    }

Plan: 6 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value:

After checking the plan, enter yes to carry out the changes:

alicloud_log_project.test: Creating...
alicloud_log_project.test: Creation complete after 1s [id=tf-test-project-zhangjiakou]
alicloud_log_machine_group.test: Creating...
alicloud_log_store.test: Creating...
alicloud_log_machine_group.test: Creation complete after 0s [id=tf-test-project-zhangjiakou:tf-log-machine-group]
alicloud_log_store.test: Still creating... [10s elapsed]
alicloud_log_store.test: Still creating... [20s elapsed]
alicloud_log_store.test: Still creating... [30s elapsed]
alicloud_log_store.test: Still creating... [40s elapsed]
alicloud_log_store.test: Still creating... [50s elapsed]
alicloud_log_store.test: Still creating... [1m0s elapsed]
alicloud_log_store.test: Creation complete after 1m0s [id=tf-test-project-zhangjiakou:tf-test-logstore]
alicloud_logtail_config.test: Creating...
alicloud_log_store_index.test: Creating...
alicloud_logtail_config.test: Creation complete after 1s [id=tf-test-project-zhangjiakou:tf-test-logstore:tf-log-config]
alicloud_logtail_attachment.test: Creating...
alicloud_logtail_attachment.test: Creation complete after 0s [id=tf-test-project-zhangjiakou:tf-log-config:tf-log-machine-group]
alicloud_log_store_index.test: Creation complete after 1s [id=tf-test-project-zhangjiakou:tf-test-logstore]

Use terraform show to inspect the state of the deployed configuration.

$ terraform show
# alicloud_log_machine_group.test:
resource "alicloud_log_machine_group" "test" {
    id            = "tf-test-project-zhangjiakou:tf-log-machine-group"
    identify_list = [
        "172.26.51.68",
    ]
    identify_type = "ip"
    name          = "tf-log-machine-group"
    project       = "tf-test-project-zhangjiakou"
    topic         = "terraform"
}

# alicloud_log_project.test:
resource "alicloud_log_project" "test" {
    description = "create by terraform"
    id          = "tf-test-project-zhangjiakou"
    name        = "tf-test-project-zhangjiakou"
}

# alicloud_log_store.test:
resource "alicloud_log_store" "test" {
    append_meta           = true
    auto_split            = true
    enable_web_tracking   = false
    id                    = "tf-test-project-zhangjiakou:tf-test-logstore"
    max_split_shard_count = 60
    name                  = "tf-test-logstore"
    project               = "tf-test-project-zhangjiakou"
    retention_period      = 7
    shard_count           = 3
    shards                = [
        {
            begin_key = "00000000000000000000000000000000"
            end_key   = "55000000000000000000000000000000"
            id        = 0
            status    = "readwrite"
        },
        {
            begin_key = "55000000000000000000000000000000"
            end_key   = "aa000000000000000000000000000000"
            id        = 1
            status    = "readwrite"
        },
        {
            begin_key = "aa000000000000000000000000000000"
            end_key   = "ffffffffffffffffffffffffffffffff"
            id        = 2
            status    = "readwrite"
        },
    ]
}

# alicloud_log_store_index.test:
resource "alicloud_log_store_index" "test" {
    id       = "tf-test-project-zhangjiakou:tf-test-logstore"
    logstore = "tf-test-logstore"
    project  = "tf-test-project-zhangjiakou"

    full_text {
        case_sensitive  = true
        include_chinese = false
        token           = " #$%^*\r\n\t"
    }
}

# alicloud_logtail_attachment.test:
resource "alicloud_logtail_attachment" "test" {
    id                  = "tf-test-project-zhangjiakou:tf-log-config:tf-log-machine-group"
    logtail_config_name = "tf-log-config"
    machine_group_name  = "tf-log-machine-group"
    project             = "tf-test-project-zhangjiakou"
}

# alicloud_logtail_config.test:
resource "alicloud_logtail_config" "test" {
    id           = "tf-test-project-zhangjiakou:tf-test-logstore:tf-log-config"
    input_detail = jsonencode(
        {
            discardUnmatch = false
            enableRawLog   = false
            fileEncoding   = "gbk"
            filePattern    = "access.log"
            logPath        = "/root/tmp"
            logType        = "json_log"
            maxDepth       = 10
            topicFormat    = "default"
        }
    )
    input_type   = "file"
    log_sample   = "test"
    logstore     = "tf-test-logstore"
    name         = "tf-log-config"
    output_type  = "LogService"
    project      = "tf-test-project-zhangjiakou"
}

Logs from the machine are now being collected normally.
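Rather than scanning the full terraform show dump, you can declare output values for the identifiers you care about and print them with terraform output. A minimal sketch; the output names are hypothetical, and the ID formats noted in the descriptions are the ones visible in the terraform show output above.

```hcl
# Hypothetical output names; run `terraform output` after apply to print them.
output "sls_project" {
  description = "Name of the SLS project"
  value       = alicloud_log_project.test.name
}

output "logstore_id" {
  description = "Logstore ID, in the form project:logstore"
  value       = alicloud_log_store.test.id
}

output "logtail_attachment_id" {
  description = "Attachment ID, in the form project:config:machine_group"
  value       = alicloud_logtail_attachment.test.id
}
```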

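Scenario 2: Modify SLS resources with Terraform

Infrastructure rarely stays fixed; it changes as the business evolves. Terraform can manage such changes: you only edit the configuration template, and Terraform builds an execution plan that modifies just the affected parts to reach the desired state. For example, as the business grows, the way SLS is used changes too: (1) the 7-day TTL of the Logstore turns out to be too short and must be raised to 30 days; (2) the IP-based machine group is hard to scale and must be switched to a user-defined identifier.

Step 1: Modify the terraform.tf configuration template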

provider "alicloud" {
  region = "cn-zhangjiakou"
}

resource "alicloud_log_project" "test" {
  name        = "tf-test-project-zhangjiakou"
  description = "create by terraform"
}

resource "alicloud_log_store" "test" {
  project               = alicloud_log_project.test.name
  name                  = "tf-test-logstore"
  retention_period      = 30 # changed from 7 to 30 days
  shard_count           = 3
  auto_split            = true
  max_split_shard_count = 60
  append_meta           = true
}

resource "alicloud_log_store_index" "test" {
  project  = alicloud_log_project.test.name
  logstore = alicloud_log_store.test.name
  full_text {
    case_sensitive = true
    token          = " #$%^*\r\n\t"
  }
}

# Changed from an IP-based to a user-defined identifier machine group
resource "alicloud_log_machine_group" "test" {
  project       = alicloud_log_project.test.name
  name          = "tf-log-machine-group"
  topic         = "terraform"
  identify_type = "userdefined"
  identify_list = ["user_defined_id_zhangjiakou_test"]
}

resource "alicloud_logtail_config" "test" {
  project      = alicloud_log_project.test.name
  logstore     = alicloud_log_store.test.name
  input_type   = "file"
  log_sample   = "test"
  name         = "tf-log-config"
  output_type  = "LogService"
  input_detail = <<DEFINITION
{
  "logPath": "/root/tmp",
  "filePattern": "access.log",
  "logType": "json_log",
  "topicFormat": "default",
  "discardUnmatch": false,
  "enableRawLog": false,
  "fileEncoding": "gbk",
  "maxDepth": 10
}
DEFINITION
}

resource "alicloud_logtail_attachment" "test" {
  project             = alicloud_log_project.test.name
  logtail_config_name = alicloud_logtail_config.test.name
  machine_group_name  = alicloud_log_machine_group.test.name
}
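Since these are exactly the kinds of values that keep changing, they can be lifted into input variables so that later adjustments do not touch the resource blocks. A sketch under the assumption that the variable names `retention_days` and `machine_group_ids` (both hypothetical) fit your conventions; the two resource blocks replace the corresponding ones in terraform.tf rather than being added next to them.

```hcl
# Hypothetical variable names for the two values changed in this scenario
variable "retention_days" {
  description = "Logstore data retention (TTL) in days"
  type        = number
  default     = 30
}

variable "machine_group_ids" {
  description = "User-defined identifiers of the machines to collect from"
  type        = list(string)
  default     = ["user_defined_id_zhangjiakou_test"]
}

resource "alicloud_log_store" "test" {
  project               = alicloud_log_project.test.name
  name                  = "tf-test-logstore"
  retention_period      = var.retention_days
  shard_count           = 3
  auto_split            = true
  max_split_shard_count = 60
  append_meta           = true
}

resource "alicloud_log_machine_group" "test" {
  project       = alicloud_log_project.test.name
  name          = "tf-log-machine-group"
  topic         = "terraform"
  identify_type = "userdefined"
  identify_list = var.machine_group_ids
}
```

A one-off override then needs no file edit, for example `terraform apply -var="retention_days=60"`, and terraform plan still shows the resulting in-place update before anything changes.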

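Step 2: Apply the configuration change

Run terraform apply to make the new configuration take effect. Terraform detects the configuration changes and generates an execution plan with 2 in-place updates.

$ terraform apply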
alicloud_log_project.test: Refreshing state... [id=tf-test-project-zhangjiakou]
alicloud_log_machine_group.test: Refreshing state... [id=tf-test-project-zhangjiakou:tf-log-machine-group]
alicloud_log_store.test: Refreshing state... [id=tf-test-project-zhangjiakou:tf-test-logstore]
alicloud_logtail_config.test: Refreshing state... [id=tf-test-project-zhangjiakou:tf-test-logstore:tf-log-config]
alicloud_log_store_index.test: Refreshing state... [id=tf-test-project-zhangjiakou:tf-test-logstore]
alicloud_logtail_attachment.test: Refreshing state... [id=tf-test-project-zhangjiakou:tf-log-config:tf-log-machine-group]

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # alicloud_log_machine_group.test will be updated in-place
  ~ resource "alicloud_log_machine_group" "test" {
        id            = "tf-test-project-zhangjiakou:tf-log-machine-group"
      ~ identify_list = [
          - "172.26.51.68",
          + "user_defined_id_zhangjiakou_test",
        ]
      ~ identify_type = "ip" -> "userdefined"
        name          = "tf-log-machine-group"
        project       = "tf-test-project-zhangjiakou"
        topic         = "terraform"
    }

  # alicloud_log_store.test will be updated in-place
  ~ resource "alicloud_log_store" "test" {
        append_meta           = true
        auto_split            = true
        enable_web_tracking   = false
        id                    = "tf-test-project-zhangjiakou:tf-test-logstore"
        max_split_shard_count = 60
        name                  = "tf-test-logstore"
        project               = "tf-test-project-zhangjiakou"
      ~ retention_period      = 7 -> 30
        shard_count           = 3
        shards                = [
            {
                begin_key = "00000000000000000000000000000000"
                end_key   = "55000000000000000000000000000000"
                id        = 0
                status    = "readwrite"
            },
            {
                begin_key = "55000000000000000000000000000000"
                end_key   = "aa000000000000000000000000000000"
                id        = 1
                status    = "readwrite"
            },
            {
                begin_key = "aa000000000000000000000000000000"
                end_key   = "ffffffffffffffffffffffffffffffff"
                id        = 2
                status    = "readwrite"
            },
        ]
    }

Plan: 0 to add, 2 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

alicloud_log_machine_group.test: Modifying... [id=tf-test-project-zhangjiakou:tf-log-machine-group]
alicloud_log_store.test: Modifying... [id=tf-test-project-zhangjiakou:tf-test-logstore]
alicloud_log_machine_group.test: Modifications complete after 1s [id=tf-test-project-zhangjiakou:tf-log-machine-group]
alicloud_log_store.test: Modifications complete after 1s [id=tf-test-project-zhangjiakou:tf-test-logstore]

Apply complete! Resources: 0 added, 2 changed, 0 destroyed.

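Scenario 3: Handle console changes made while Terraform manages the resources

Even when SLS resources are managed with Terraform over the long term, some configuration changes may still occasionally be made through the console (or through the API or other channels). Terraform does not automatically notice such changes. There are two cases:

  • Case 1: the console or API changes an attribute of a resource that Terraform already manages. Example: the TTL of the tf-test-logstore Logstore created by Terraform above is 30 days, and someone changes it to 10 in the console. Running terraform apply now resets the TTL to 30.
  • Case 2: the console or API changes resources that Terraform does not manage. Example: a new collection configuration named console_manual is created under tf-test-logstore in the console. Because it is outside the scope of the Terraform configuration template, terraform apply does not affect it at all.

How do you keep Terraform management consistent in both cases? For case 1, update the attribute in the configuration template; for case 2, use the terraform import command to bring the new resources under Terraform management. The steps below show how to synchronize both kinds of changes. Although these mechanisms make synchronization possible, the whole process, and identifying what changed in particular, is fairly tedious. Once resources are managed by Terraform, it is therefore strongly recommended to keep Terraform as the single source of their configuration and avoid unnecessary management overhead.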
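If a particular attribute is deliberately owned by the console rather than by Terraform, Terraform's standard `lifecycle` meta-argument offers a narrower option than full synchronization: `ignore_changes` tells Terraform to stop planning updates for the listed attributes. A sketch applied to the TTL from case 1; this is a general Terraform mechanism, not part of the workflow in this article, and it trades away drift detection for that attribute.

```hcl
resource "alicloud_log_store" "test" {
  project               = alicloud_log_project.test.name
  name                  = "tf-test-logstore"
  retention_period      = 30 # only used when the Logstore is first created
  shard_count           = 3
  auto_split            = true
  max_split_shard_count = 60
  append_meta           = true

  lifecycle {
    # Keep whatever TTL is currently set in the console: terraform apply no
    # longer resets retention_period when it differs from the value above.
    ignore_changes = [retention_period]
  }
}
```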

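Step 1: Identify the resources involved

  • Change to an existing resource: the TTL of Logstore tf-test-logstore was changed to 10.
    • Resource: alicloud_log_store.test
  • New resources: a collection configuration named console_manual was created under Logstore tf-test-logstore and attached to the machine group.
    • New resource: alicloud_logtail_config.console_manual_config
    • New resource: alicloud_logtail_attachment.console_manual_attachment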

For the change to the existing resource, edit the corresponding attribute in terraform.tf directly. The new resources have too many settings to write out by hand, so for now only declare them in terraform.tf. The changes to terraform.tf are as follows:

 resource "alicloud_log_store" "test" {
   # ... other attributes unchanged
-  retention_period = 30
+  retention_period = 10
 }

+resource "alicloud_logtail_config" "console_manual_config" {
+  project  = alicloud_log_project.test.name
+  logstore = alicloud_log_store.test.name
+  name     = "console_manual"
+}

+resource "alicloud_logtail_attachment" "console_manual_attachment" {
+  project             = alicloud_log_project.test.name
+  logtail_config_name = alicloud_logtail_config.console_manual_config.name
+  machine_group_name  = alicloud_log_machine_group.test.name
+}

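Step 2: Import the resources

Run terraform import for each newly managed resource. Command format: terraform import <resource_type>.<resource_name> <resource_id>

# Import the new collection configuration console_manual
## The resource ID has the form project:logstore:name
terraform import alicloud_logtail_config.console_manual_config tf-test-project-zhangjiakou:tf-test-logstore:console_manual

# Import the binding between console_manual and the machine group
## The resource ID has the form project:logtail_config_name:machine_group_name
terraform import alicloud_logtail_attachment.console_manual_attachment tf-test-project-zhangjiakou:console_manual:tf-log-machine-group

terraform.tfstate now contains the latest state of the imported resources. Use this state to fill in their declarations in the terraform.tf template, then run terraform plan to verify that nothing is missing; this may take a few iterations.

# Resource state added to terraform.tfstate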
{
  "mode": "managed",
  "type": "alicloud_logtail_config",
  "name": "console_manual_config",
  "provider": "provider.alicloud",
  "instances": [
    {
      "schema_version": 0,
      "attributes": {
        "id": "tf-test-project-zhangjiakou:tf-test-logstore:console_manual",
        "input_detail": "{\"adjustTimezone\":false,\"advanced\":{\"force_multiconfig\":false,\"tail_size_kb\":1024},\"delayAlarmBytes\":0,\"delaySkipBytes\":0,\"discardNonUtf8\":false,\"discardUnmatch\":false,\"dockerExcludeEnv\":{},\"dockerExcludeLabel\":{},\"dockerFile\":false,\"dockerIncludeEnv\":{},\"dockerIncludeLabel\":{},\"enableRawLog\":false,\"enableTag\":false,\"fileEncoding\":\"utf8\",\"filePattern\":\"test.log\",\"filterKey\":[],\"filterRegex\":[],\"key\":[\"content\"],\"localStorage\":true,\"logBeginRegex\":\".*\",\"logPath\":\"/root/tmp\",\"logTimezone\":\"\",\"logType\":\"common_reg_log\",\"maxDepth\":10,\"maxSendRate\":-1,\"mergeType\":\"topic\",\"preserve\":true,\"preserveDepth\":1,\"priority\":0,\"regex\":\"(.*)\",\"sendRateExpire\":0,\"sensitive_keys\":[],\"shardHashKey\":[],\"tailExisted\":false,\"timeFormat\":\"\",\"topicFormat\":\"none\"}",
        "input_type": "file",
        "log_sample": "",
        "logstore": "tf-test-logstore",
        "name": "console_manual",
        "output_type": "LogService",
        "project": "tf-test-project-zhangjiakou"
      }
    }
  ]
},
{
  "mode": "managed",
  "type": "alicloud_logtail_attachment",
  "name": "console_manual_attachment",
  "provider": "provider.alicloud",
  "instances": [
    {
      "schema_version": 0,
      "attributes": {
        "id": "tf-test-project-zhangjiakou:console_manual:tf-log-machine-group",
        "logtail_config_name": "console_manual",
        "machine_group_name": "tf-log-machine-group",
        "project": "tf-test-project-zhangjiakou"
      }
    }
  ]
}

Once terraform plan reports no changes, Terraform has finished synchronizing the configuration.

$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

alicloud_log_project.test: Refreshing state... [id=tf-test-project-zhangjiakou]
alicloud_log_machine_group.test: Refreshing state... [id=tf-test-project-zhangjiakou:tf-log-machine-group]
alicloud_log_store.test: Refreshing state... [id=tf-test-project-zhangjiakou:tf-test-logstore]
alicloud_log_store_index.test: Refreshing state... [id=tf-test-project-zhangjiakou:tf-test-logstore]
alicloud_logtail_config.test: Refreshing state... [id=tf-test-project-zhangjiakou:tf-test-logstore:tf-log-config]
alicloud_logtail_config.console_manual_config: Refreshing state... [id=tf-test-project-zhangjiakou:tf-test-logstore:console_manual]
alicloud_logtail_attachment.console_manual_attachment: Refreshing state... [id=tf-test-project-zhangjiakou:console_manual:tf-log-machine-group]
alicloud_logtail_attachment.test: Refreshing state... [id=tf-test-project-zhangjiakou:tf-log-config:tf-log-machine-group]

------------------------------------------------------------------------

No changes. Infrastructure is up-to-date.

This means that Terraform did not detect any differences between your
configuration and real physical resources that exist. As a result, no
actions need to be performed.
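The article does not show the final declaration of console_manual_config. As a sketch, this is what it might look like once backfilled from the terraform.tfstate excerpt in Step 2: every key of input_detail is copied from the imported state and rebuilt with `jsonencode`, and it replaces the stub declared in Step 1. Treat it as an assumption: depending on the provider version, terraform plan may still report differences, which is the iteration described in Step 2.

```hcl
# Hypothetical backfilled declaration; all values come from the imported state.
resource "alicloud_logtail_config" "console_manual_config" {
  project     = alicloud_log_project.test.name
  logstore    = alicloud_log_store.test.name
  name        = "console_manual"
  input_type  = "file"
  output_type = "LogService"
  log_sample  = ""

  input_detail = jsonencode({
    adjustTimezone     = false
    advanced           = { force_multiconfig = false, tail_size_kb = 1024 }
    delayAlarmBytes    = 0
    delaySkipBytes     = 0
    discardNonUtf8     = false
    discardUnmatch     = false
    dockerExcludeEnv   = {}
    dockerExcludeLabel = {}
    dockerFile         = false
    dockerIncludeEnv   = {}
    dockerIncludeLabel = {}
    enableRawLog       = false
    enableTag          = false
    fileEncoding       = "utf8"
    filePattern        = "test.log"
    filterKey          = []
    filterRegex        = []
    key                = ["content"] # the single regex group is stored as "content"
    localStorage       = true
    logBeginRegex      = ".*" # every line starts a new log entry
    logPath            = "/root/tmp"
    logTimezone        = ""
    logType            = "common_reg_log" # regex-based parsing
    maxDepth           = 10
    maxSendRate        = -1
    mergeType          = "topic"
    preserve           = true
    preserveDepth      = 1
    priority           = 0
    regex              = "(.*)" # capture the whole line
    sendRateExpire     = 0
    sensitive_keys     = []
    shardHashKey       = []
    tailExisted        = false
    timeFormat         = ""
    topicFormat        = "none"
  })
}
```

The attachment declared in Step 1 already contains all of its attributes (project, config name, machine group name), so it needs no backfilling.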

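Scenario 4: Bring existing SLS resources under Terraform management

Teams that have long created and managed SLS resources through the console, the Alibaba Cloud CLI, Resource Orchestration Service, or direct API calls face the same problem when they first adopt Terraform: the existing resources have to be imported into Terraform. The general steps are:

  • Identify the project that Terraform should take over.
  • Inventory all resources in the project, including Logstores, Logstore indexes, Logtail collection configurations, machine groups, and the attachments between machine groups and collection configurations.
  • Import the existing resources with terraform import so that Terraform manages them from then on.

The procedure is similar to Scenario 3 (console changes made while Terraform manages the resources), so it is not repeated here.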
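With many resources to take over, running terraform import once per resource gets tedious. Terraform 1.5 and later also accepts declarative `import` blocks, which terraform plan previews and terraform apply executes. A sketch assuming Terraform 1.5+; the IDs follow the formats shown by terraform show in Scenario 1, and the resource names `existing` and the project/Logstore/machine group names are placeholders to replace with your own.

```hcl
# Requires Terraform 1.5+. ID formats match those shown by `terraform show`.
import {
  to = alicloud_log_project.existing
  id = "tf-test-project-zhangjiakou" # project name
}

import {
  to = alicloud_log_store.existing
  id = "tf-test-project-zhangjiakou:tf-test-logstore" # project:logstore
}

import {
  to = alicloud_log_store_index.existing
  id = "tf-test-project-zhangjiakou:tf-test-logstore" # project:logstore
}

import {
  to = alicloud_log_machine_group.existing
  id = "tf-test-project-zhangjiakou:tf-log-machine-group" # project:machine_group
}
```

For import targets that do not have a resource block yet, `terraform plan -generate-config-out=generated.tf` writes draft blocks filled from the live resources. The generated file is a starting point to review and tidy (for example, replacing literal project names with references) before running terraform apply.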
