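Spring Boot 3 Monitoring, Observability, and Performance Tuning

How do you know an application is healthy once it is live? When responses slow down at peak load, where is the bottleneck? Spring Boot Actuator, Micrometer, Prometheus, and Grafana together form a complete observability stack. This article walks through the whole path, from basic monitoring to production-grade performance tuning.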

Spring Boot Actuator

Actuator provides a set of ready-to-use endpoints, exposed over HTTP or JMX, for monitoring and managing the application.

Adding the Dependency

xml

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

Exposing Endpoints

yaml

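# application.yml
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus,loggers
      base-path: /actuator  # default prefix
  endpoint:
    health:
      show-details: when_authorized  # production: details visible to authorized users only
      probes:
        enabled: true  # enable the Kubernetes probe endpoints
  health:
    db:
      enabled: true        # database health check
    redis:
      enabled: true        # Redis health check
    diskspace:
      enabled: true        # disk space health check
  prometheus:
    metrics:
      export:
        enabled: true      # Spring Boot 3 key (2.x used management.metrics.export.prometheus.enabled)
                           # requires io.micrometer:micrometer-registry-prometheus on the classpath
  metrics:
    tags:
      application: ${spring.application.name}  # common tag, useful for aggregation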

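Core Endpoints

Endpoint	Description
`/actuator/health`	Health check (Kubernetes probes: `/actuator/health/liveness`, `/actuator/health/readiness`)
`/actuator/info`	Custom application information
`/actuator/metrics`	Lists metric names; append a name to drill in, e.g. `/actuator/metrics/http.server.requests` (filter with `?tag=key:value`)
`/actuator/prometheus`	Metrics in Prometheus exposition format
`/actuator/env`	Environment properties (read-only in plain Spring Boot)
`/actuator/beans`	All Spring beans
`/actuator/configprops`	All `@ConfigurationProperties` values
`/actuator/threaddump`	Thread dump
`/actuator/heapdump`	Heap dump (available only when the endpoint is exposed; use with care)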

Custom Health Indicator

java

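import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.mail.javamail.JavaMailSenderImpl;
import org.springframework.stereotype.Component;

@Component
public class EmailHealthIndicator implements HealthIndicator {

    // JavaMailSenderImpl (not the JavaMailSender interface) exposes the connection test
    private final JavaMailSenderImpl mailSender;

    public EmailHealthIndicator(JavaMailSenderImpl mailSender) {
        this.mailSender = mailSender;
    }

    @Override
    public Health health() {
        try {
            // Opens and closes an SMTP connection with the sender's configured
            // host, port, and credentials instead of hard-coding them here
            mailSender.testConnection();
            return Health.up().withDetail("smtp", "connected").build();
        } catch (Exception e) {
            // withException records the exception class and message as the "error" detail
            return Health.down().withException(e).build();
        }
    }
}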
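
A custom indicator matters because its status feeds into the overall result of `/actuator/health`. The sketch below is a plain-JDK illustration of that aggregation idea, not Spring's implementation: it assumes the severity order DOWN, OUT_OF_SERVICE, UP, UNKNOWN and returns the most severe status present. The class name `HealthAggregationSketch` and the string-based statuses are hypothetical.

```java
import java.util.List;

/** Plain-JDK illustration of health status aggregation; not Spring's own code. */
class HealthAggregationSketch {

    // Assumed severity order, most severe first.
    static final List<String> ORDER = List.of("DOWN", "OUT_OF_SERVICE", "UP", "UNKNOWN");

    /** Returns the most severe status among the components, or UNKNOWN if there are none. */
    static String aggregate(List<String> componentStatuses) {
        for (String candidate : ORDER) {
            if (componentStatuses.contains(candidate)) {
                return candidate;
            }
        }
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        // db and diskSpace report UP, the email indicator reports DOWN.
        System.out.println(aggregate(List.of("UP", "UP", "DOWN"))); // prints DOWN
    }
}
```

Under this assumption a single failing SMTP check is enough to turn the overall status to DOWN, so an indicator for a non-critical dependency may be better kept out of the readiness group.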

Micrometer Metrics

Micrometer is the metrics facade used by Spring Boot 3.x. Its core meter types include Timer, Counter, Gauge, and DistributionSummary.

Custom Metrics (Timer, Counter, Gauge)

java

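import io.micrometer.core.annotation.Timed;
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Service;

@Service
public class OrderService {

    private final Timer paymentTimer;
    private final Counter errorCounter;

    // Meters are built in the constructor, where the injected registry is available.
    // (Field initializers run before the constructor body and cannot use it.)
    public OrderService(MeterRegistry meterRegistry, OrderQueue orderQueue) {
        this.paymentTimer = Timer.builder("payment.process")
                .description("Payment processing time")
                .tag("type", "alipay")
                .publishPercentiles(0.5, 0.9, 0.95, 0.99)
                .publishPercentileHistogram()
                .register(meterRegistry);

        this.errorCounter = Counter.builder("order.error")
                .description("Total order errors")
                .tag("type", "payment_declined")
                .register(meterRegistry);

        // Option 4: Gauge, for values that can go down as well as up
        Gauge.builder("order.queue.size", orderQueue, OrderQueue::size)
                .description("Current order queue length")
                .register(meterRegistry);
    }

    // Option 1: the @Timed annotation (most concise, recommended).
    // On service methods it only takes effect when a TimedAspect bean is registered.
    @Timed(
        value = "order.place",
        description = "Time taken to place an order",
        percentiles = {0.5, 0.9, 0.99},  // P50/P90/P99
        histogram = true                 // publish histogram buckets
    )
    public OrderResult placeOrder(Order order) {
        return processOrder(order);  // application logic, not shown
    }

    // Option 2: Timer.builder (more flexible); the timer is built in the constructor
    public PaymentResult processPayment(Payment payment) {
        return paymentTimer.record(() -> {
            // actual payment logic
            return doPay(payment);
        });
    }

    // Option 3: Counter
    public void onPaymentDeclined(Order order) {
        errorCounter.increment();
    }
}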
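
The percentile settings above (P50/P90/P99) are easier to reason about with a concrete calculation. The sketch below computes percentiles over raw latency samples with the nearest-rank method. It only illustrates what the published numbers mean: Micrometer computes its percentiles from approximations rather than by sorting every sample, and the class name `PercentileSketch` and the sample values are made up for the example.

```java
import java.util.Arrays;

/** Nearest-rank percentile over raw samples; an illustration, not Micrometer's algorithm. */
class PercentileSketch {

    /** Returns the sample at the given percentile (1..100) using the nearest-rank method. */
    static long percentile(long[] samplesMillis, int percent) {
        if (samplesMillis.length == 0 || percent < 1 || percent > 100) {
            throw new IllegalArgumentException("need samples and a percent in 1..100");
        }
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        // rank = ceil(percent / 100 * n), computed in integer arithmetic
        int rank = (percent * sorted.length + 99) / 100;
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        long[] latencies = {12, 15, 11, 240, 14, 13, 18, 16, 17, 900};
        System.out.println("P50=" + percentile(latencies, 50)); // prints P50=15
        System.out.println("P90=" + percentile(latencies, 90)); // prints P90=240
        System.out.println("P99=" + percentile(latencies, 99)); // prints P99=900
    }
}
```

The median stays at 15 ms while P90 and P99 expose the two slow calls, which is why tail percentiles are usually more useful for alerting than averages.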

Using @Timed on Controllers

java

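// In Spring Boot 3, HTTP requests are already recorded as http.server.requests.
// @Timed itself needs Micrometer's annotation aspect support (a TimedAspect bean,
// or management.observations.annotations.enabled=true on Spring Boot 3.2+).
@RestController
// longTask = true asks for a LongTaskTimer, which tracks in-flight duration
@Timed(value = "http.request", description = "HTTP request time", longTask = true)
public class UserController {

    private final UserService userService;

    public UserController(UserService userService) {
        this.userService = userService;
    }

    @GetMapping("/{id}")
    @Timed(value = "user.get", description = "Fetch user details")
    public User getUser(@PathVariable Long id) {
        return userService.findById(id);
    }
}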

Prometheus + Grafana Integration

Starting Prometheus + Grafana with docker-compose

yaml

# prometheus.yml
# Global settings
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'blog-api'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['app:8080']
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        regex: '(.+):\d+'
        replacement: '${1}'
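
The relabel rule above rewrites the `instance` label from `host:port` to just the host. As a rough illustration of what the regex and the `${1}` replacement do, here is the same expression applied in plain Java. The class name `RelabelSketch` is hypothetical, and `matches()` is used because Prometheus anchors relabel regexes at both ends.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Applies the relabel regex from prometheus.yml to a scrape address. */
class RelabelSketch {

    // Same expression as in the relabel rule: capture everything before ":<port>".
    private static final Pattern HOST_PORT = Pattern.compile("(.+):\\d+");

    /** Returns the address without its port, or the address unchanged if the regex does not match. */
    static String instanceLabel(String address) {
        Matcher m = HOST_PORT.matcher(address);
        return m.matches() ? m.group(1) : address;
    }

    public static void main(String[] args) {
        System.out.println(instanceLabel("app:8080")); // prints app
    }
}
```

Dropping the port keeps the label stable when the port changes, at the cost of merging series if several instances ever run on the same host.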

yaml

# docker-compose.yml (additional services)
services:
  prometheus:
    image: prom/prometheus:v2.54.1
    ports:
      - "9090:9090"
    volumes:
      - ./docker/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=30d'

  grafana:
    image: grafana/grafana:11.4.0
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - ./docker/grafana/provisioning:/etc/grafana/provisioning
      - grafana-data:/var/lib/grafana

volumes:
  prometheus-data:
  grafana-data:

Importing Grafana Dashboards

The dashboard catalog on grafana.com hosts community dashboards for JVM (Micrometer) and Spring Boot:

Dashboard ID 4701: JVM Metrics (Micrometer)
Dashboard ID 12900: Spring Boot 2.1+ Statistics

To import:

bash

# Automated import through the Grafana API
curl -X POST "http://admin:${GRAFANA_PASSWORD}@localhost:3000/api/dashboards/db" \
  -H "Content-Type: application/json" \
  -d '{"dashboard": {"id": null, "title": "Spring Boot", ...}}'

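Startup Diagnostics

The Startup Endpoint

Spring Boot 3.x can record startup steps when the application is created with a buffering recorder, for example `application.setApplicationStartup(new BufferingApplicationStartup(2048))` in `main`. With that in place, expose the `startup` endpoint in application.yml:

yaml

management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus,loggers,startup

Endpoint: GET /actuator/startup returns the startup timeline as JSON, including the time spent on each recorded step such as bean instantiation.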

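Common Causes of Slow Startup

java

// Case 1: many beans query the database while initializing
// Fix: mark non-critical beans @Lazy so they are created on first use
@Lazy
@Autowired
private HeavyReportGenerator reportGenerator;

yaml

# Case 2: the data source connection times out
# Fix: check that the database URL is correct and the host is reachable
spring:
  datasource:
    hikari:
      connection-timeout: 10000  # default is 30 s; a shorter value surfaces startup failures sooner
---
# Case 3: the Redis connection fails
# Fix: configure retry or a fallback; bound the timeout so the failure surfaces quickly
spring:
  data:
    redis:
      timeout: 5s
      lettuce:
        pool:
          max-active: 8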

JVM Tuning

Common JVM Flags

bash

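# Suggested baseline flags (JDK 17+).
# JVM options must come before -jar; anything after app.jar is passed to the application.
#   -XX:+UseG1GC                      G1 collector (default on server-class machines since JDK 9)
#   -XX:MaxGCPauseMillis=200          GC pause time target
#   -XX:+HeapDumpOnOutOfMemoryError   write a heap dump on OutOfMemoryError
#   -Xms256m / -Xmx512m               minimum / maximum heap (size for your workload)
# In containers, replace -Xms/-Xmx with -XX:MaxRAMPercentage=75.0 (75% of the
# container memory limit); an explicit -Xmx overrides it. -XX:+UseContainerSupport
# is on by default since JDK 10.
java \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=200 \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/var/log/app.hprof \
  -Xms256m \
  -Xmx512m \
  -jar app.jar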

Memory Configuration in Containers

yaml

# Kubernetes deployment.yaml
resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"

bash

# Pair the limit with JVM flags (MaxRAMPercentage instead of a fixed -Xmx)
java -XX:MaxRAMPercentage=75.0 -XX:InitialRAMPercentage=50.0 -jar app.jar
# With the 1Gi limit: initial heap 512 MiB (1Gi * 0.50), maximum heap 768 MiB (1Gi * 0.75)
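
The percentage arithmetic is easy to get wrong, so here is a small sketch that works it out for the 1Gi limit and also prints what the running JVM resolved. The class name `ContainerHeapSketch` is hypothetical, and the computed values are approximate targets: the JVM may round heap sizes to its own alignment.

```java
/** Works out heap sizes from a container memory limit and a RAM percentage. */
class ContainerHeapSketch {

    static final long MIB = 1024L * 1024L;

    /** Heap size in bytes for a memory limit and a percentage such as 75.0. */
    static long heapBytes(long limitBytes, double ramPercentage) {
        return (long) (limitBytes * ramPercentage / 100.0);
    }

    public static void main(String[] args) {
        long limit = 1024 * MIB; // the 1Gi limit from the deployment
        System.out.println("initial = " + heapBytes(limit, 50.0) / MIB + " MiB"); // prints initial = 512 MiB
        System.out.println("max     = " + heapBytes(limit, 75.0) / MIB + " MiB"); // prints max     = 768 MiB
        // What this JVM actually resolved; run it inside the container to compare.
        System.out.println("runtime max = " + Runtime.getRuntime().maxMemory() / MIB + " MiB");
    }
}
```

The remaining 25% is not spare: metaspace, thread stacks, direct buffers, and the code cache all live outside the heap and count against the same container limit.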

G1GC Tuning Flags

bash

# Example for a busy, latency-sensitive service (a tighter pause target trades away some throughput)
java -XX:+UseG1GC \
     -XX:G1HeapRegionSize=8m \
     -XX:MaxGCPauseMillis=100 \
     -XX:G1ReservePercent=15 \
     -XX:InitiatingHeapOccupancyPercent=45 \
     -jar app.jar
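
Before and after changing GC flags it helps to look at how often the collectors actually run. The following sketch reads the standard `GarbageCollectorMXBean` counters from inside the JVM; the class name `GcStatsSketch` is hypothetical, and in a Spring Boot application the same data is already published through the `jvm.gc.*` metrics.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Reads per-collector GC counts and accumulated times through the platform MXBeans. */
class GcStatsSketch {

    /** Sums collection counts across collectors, skipping collectors that report -1 (undefined). */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        System.out.println("total collections = " + totalCollections());
    }
}
```

Comparing these counters over a fixed load test window gives a simple baseline for judging whether a flag change helped.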

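HikariCP Connection Pool Tuning

HikariCP has been the default connection pool since Spring Boot 2.x; it is lightweight and fast.

Parameter Reference

Parameter	Default	Tuning advice
`maximum-pool-size`	10	Starting point: `CPU cores × 2` + effective disk spindle count; an oversized pool increases contention on the database
`minimum-idle`	same as `maximum-pool-size`	Leave equal to the maximum (fixed-size pool) for services with steady peak load
`connection-timeout`	30000 ms	Maximum wait when borrowing a connection from the pool
`idle-timeout`	600000 ms (10 min)	How long a connection above `minimum-idle` may sit idle before it is retired; only applies when `minimum-idle` is lower than `maximum-pool-size`
`max-lifetime`	1800000 ms (30 min)	Maximum connection lifetime (keep it slightly shorter than the database's connection timeout)
`cachePrepStmts`	false	MySQL driver property, set through `data-source-properties`; enabling the prepared statement cache is recommended for MySQL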
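
The sizing rule of thumb for `maximum-pool-size` can be written down as a one-line calculation. The sketch below is only a starting point under the stated formula (cores × 2 plus effective spindle count); the class name `PoolSizeSketch` and the choice of one spindle are assumptions for the example, and load testing should have the final word.

```java
/** Rule-of-thumb pool size: (CPU cores * 2) + effective spindle count. */
class PoolSizeSketch {

    static int suggestedPoolSize(int cpuCores, int effectiveSpindles) {
        if (cpuCores < 1 || effectiveSpindles < 0) {
            throw new IllegalArgumentException("cores must be >= 1 and spindles >= 0");
        }
        return cpuCores * 2 + effectiveSpindles;
    }

    public static void main(String[] args) {
        // Cores of the machine running this code; the formula refers to the database server's cores.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("4 cores, 1 spindle -> " + suggestedPoolSize(4, 1)); // prints 4 cores, 1 spindle -> 9
        System.out.println("this machine       -> " + suggestedPoolSize(cores, 1));
    }
}
```

Note that the result is a budget for the whole database: with several application instances, the per-instance `maximum-pool-size` should be that number divided across them.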

Production Configuration Example

yaml

spring:
  datasource:
    hikari:
      pool-name: BlogHikariPool
      maximum-pool-size: 20
      minimum-idle: 5
      connection-timeout: 30000
      idle-timeout: 600000
      max-lifetime: 1800000
      leak-detection-threshold: 60000  # connection leak detection (1 minute)
      # connection-test-query: SELECT 1  # not needed for MySQL: JDBC4 drivers are validated via Connection.isValid()
      auto-commit: true

      # MySQL-specific driver properties (passed through to the JDBC driver)
      data-source-properties:
        cachePrepStmts: true

Connection Pool Monitoring

java

// Expose HikariPoolMXBean for runtime statistics.
// Note: getHikariPoolMXBean() may return null if the pool has not been started yet.
@Bean
public HikariPoolMXBean hikariPoolMXBean(DataSource dataSource) {
    return ((HikariDataSource) dataSource).getHikariPoolMXBean();
}

// The Prometheus endpoint exposes these metrics automatically:
// hikaricp_connections_active{pool="BlogHikariPool"}
// hikaricp_connections_idle{pool="BlogHikariPool"}
// hikaricp_connections_pending{pool="BlogHikariPool"}
// hikaricp_connections_max{pool="BlogHikariPool"}
// hikaricp_connections_timeout_total{pool="BlogHikariPool"}

Performance Tuning Checklist

plaintext

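□ Slow database queries: enable JPA show-sql and the database slow query log; add the indexes they point to
□ N+1 queries: preload associations with @EntityGraph or JOIN FETCH
□ Pool exhaustion: check that the maximum pool size covers the required concurrency
□ Memory leaks: run with -XX:+HeapDumpOnOutOfMemoryError and analyze the heap dump in MAT
□ Frequent GC: inspect GC logs with -Xlog:gc* (JDK 9+; replaces -XX:+PrintGCDetails), then adjust heap size and GC flags
□ Lock contention: capture a thread dump with jstack and look for BLOCKED threads
□ Slow cold start: evaluate whether GraalVM Native Image fits
□ HikariCP connection leaks: set leak-detection-threshold and monitor the active connection count
□ OSIV enabled: set spring.jpa.open-in-view=false in production
□ Reflection / dynamic proxy overuse: trace hot methods at runtime with Arthas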
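
For the lock contention item, the same information jstack shows can also be read from inside the JVM. The sketch below lists BLOCKED threads through the standard `ThreadMXBean`; the class name `BlockedThreadsSketch` and the demo thread are hypothetical, and the `main` method deliberately creates a blocked thread so there is something to report.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;

/** Lists threads that are BLOCKED on a monitor, together with the thread holding it. */
class BlockedThreadsSketch {

    static List<String> blockedThreads() {
        List<String> result = new ArrayList<>();
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                result.add(info.getThreadName() + " is blocked by " + info.getLockOwnerName());
            }
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();
        Thread waiter = new Thread(() -> { synchronized (lock) { } }, "demo-waiter");
        synchronized (lock) {
            waiter.start();
            // Wait until the second thread is stuck on the monitor this thread holds.
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.sleep(10);
            }
            blockedThreads().forEach(System.out::println);
        }
        waiter.join();
    }
}
```

A thread that shows up as BLOCKED across several consecutive dumps, always behind the same owner, is the usual sign of a contended lock worth restructuring.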

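Summary

Spring Boot Actuator provides operational endpoints for health checks, metrics, and configuration inspection; expose only what you need and restrict access to it
Micrometer is the metrics abstraction layer; the @Timed annotation (with @Counted for counters) and the Timer, Counter, and Gauge builders keep instrumentation lightweight
Prometheus + Grafana is the standard cloud-native observability stack, and Actuator's /actuator/prometheus endpoint plugs straight into it
HikariCP tuning centers on sensible maximum-pool-size and connection-timeout values; leak-detection-threshold helps catch connection leaks early
In containers, prefer MaxRAMPercentage over a fixed -Xmx; combined with G1GC it covers most scenarios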
