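A common scenario in big data analytics is time-series data. In short, once a data point is produced it is never modified, and as time passes every point in time gets a new state value. The volume of such data is often staggering, for example raw monitoring data from sensors: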
https://lizhiyong.blog.csdn.net/article/details/114898620
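A single, simple accelerometer produces about 3.1 billion records a year! If manufacturing sensor data is not preprocessed by PLCs or other lower-level controllers and goes straight to the edge computing gateway, the load is huge even over MQTT!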
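To get a feel for the yearly volume mentioned above, here is a rough back-of-the-envelope calculation. The 100 Hz sampling rate is purely an assumption for illustration (the text does not state one); it is picked only because it lands on the same order of magnitude, a few billion readings per sensor per year.

```python
# Back-of-the-envelope estimate of the yearly sample count for one sensor.
# ASSUMPTION: 100 samples per second; the real rate depends on the sensor.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 in a non-leap year


def samples_per_year(sample_rate_hz: int) -> int:
    """Number of readings one sensor emits in a non-leap year."""
    return sample_rate_hz * SECONDS_PER_YEAR


if __name__ == "__main__":
    total = samples_per_year(100)
    print(f"{total:,} samples/year")  # 3,153,600,000 samples/year
```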
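Raw server monitoring data is similar, for example from the widely used Prometheus and Zabbix. With many clusters there are many monitored items, and monitoring may also be deployed inside virtualized containers and VMs, so the data volume gets very large. Tens of billions of records per hour is nothing unusual!
But storing all of that raw monitoring data is frighteningly expensive, and its information density is extremely low. Most of the time we only care about the full set of recent hot data, for online model training. Having a person look through thousands of records per second is not realistic either. In practice, a simple second-level or minute-level aggregation covers most analysis scenarios, and cold data older than one day has little timeliness left.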
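The second-level or minute-level statistics described above boil down to a rollup: each raw point is truncated to the start of its time bucket and only the aggregates are kept. Here is a minimal sketch in plain Python. It is not how Druid implements pre-aggregation; the `(timestamp, metric, value)` record layout and the function name `rollup` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def rollup(points, bucket_seconds=60):
    """Aggregate (epoch_seconds, metric, value) points per time bucket.

    Returns {(bucket_start_epoch, metric): {"count", "sum", "min", "max"}}.
    """
    buckets = defaultdict(lambda: {"count": 0, "sum": 0.0,
                                   "min": float("inf"), "max": float("-inf")})
    for ts, metric, value in points:
        # Truncate the timestamp to the start of its bucket.
        bucket_start = ts - (ts % bucket_seconds)
        agg = buckets[(bucket_start, metric)]
        agg["count"] += 1
        agg["sum"] += value
        agg["min"] = min(agg["min"], value)
        agg["max"] = max(agg["max"], value)
    return dict(buckets)


if __name__ == "__main__":
    raw = [
        (1739460000, "cpu", 0.50),
        (1739460001, "cpu", 0.70),
        (1739460059, "cpu", 0.60),
        (1739460060, "cpu", 0.90),  # falls into the next minute
    ]
    for (start, metric), agg in sorted(rollup(raw).items()):
        when = datetime.fromtimestamp(start, tz=timezone.utc).isoformat()
        print(when, metric, agg)
```

Four raw rows collapse into two stored rows here. With thousands of points per second per metric, the same idea cuts storage by orders of magnitude, at the price of losing the individual readings.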
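For this kind of scenario you can use a database built for high throughput and pre-aggregation. After stress-testing Apache Druid, ClickHouse and Kylin, I chose the first one, Apache Druid. Leave specialized work to specialized components!
Business developers who do not work on the engine internals or on customizing it should mostly care about the API, the features and the usage, and should not spend much effort on deployment. I have already set up Docker Desktop earlier: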
https://lizhiyong.blog.csdn.net/article/details/145580868
Today let's set up the latest Apache Druid on Win10 and take it for a spin.
Official site:
https://druid.apache.org/
Note that this is not Alibaba's Druid database connection pool!
As of 2025-02-13, the latest Apache Druid release is 32.0.0.
See the official docs:
https://druid.apache.org/docs/latest/tutorials/docker
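The official docs provide a tutorial that orchestrates the containers with docker-compose.yml. As a real-time component, Druid needs plenty of memory! Still, starting 7 containers (Zookeeper + PostgreSQL + 5 Druid services), each using up to 7 GB of memory, is no big deal: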
https://raw.githubusercontent.com/apache/druid/32.0.0/distribution/docker/docker-compose.yml
Download this file:
    version: "2.2"

    volumes:
      metadata_data: {}
      middle_var: {}
      historical_var: {}
      broker_var: {}
      coordinator_var: {}
      router_var: {}
      druid_shared: {}

    services:
      postgres:
        container_name: postgres
        image: postgres:latest
        ports:
          - "5432:5432"
        volumes:
          - metadata_data:/var/lib/postgresql/data
        environment:
          - POSTGRES_PASSWORD=FoolishPassword
          - POSTGRES_USER=druid
          - POSTGRES_DB=druid

      # Need 3.5 or later for container nodes
      zookeeper:
        container_name: zookeeper
        image: zookeeper:3.5.10
        ports:
          - "2181:2181"
        environment:
          - ZOO_MY_ID=1

      coordinator:
        image: apache/druid:32.0.0
        container_name: coordinator
        volumes:
          - druid_shared:/opt/shared
          - coordinator_var:/opt/druid/var
        depends_on:
          - zookeeper
          - postgres
        ports:
          - "8081:8081"
        command:
          - coordinator
        env_file:
          - environment

      broker:
        image: apache/druid:32.0.0
        container_name: broker
        volumes:
          - broker_var:/opt/druid/var
        depends_on:
          - zookeeper
          - postgres
          - coordinator
        ports:
          - "8082:8082"
        command:
          - broker
        env_file:
          - environment

      historical:
        image: apache/druid:32.0.0
        container_name: historical
        volumes:
          - druid_shared:/opt/shared
          - historical_var:/opt/druid/var
        depends_on:
          - zookeeper
          - postgres
          - coordinator
        ports:
          - "8083:8083"
        command:
          - historical
        env_file:
          - environment

      middlemanager:
        image: apache/druid:32.0.0
        container_name: middlemanager
        volumes:
          - druid_shared:/opt/shared
          - middle_var:/opt/druid/var
        depends_on:
          - zookeeper
          - postgres
          - coordinator
        ports:
          - "8091:8091"
          - "8100-8105:8100-8105"
        command:
          - middleManager
        env_file:
          - environment

      router:
        image: apache/druid:32.0.0
        container_name: router
        volumes:
          - router_var:/opt/druid/var
        depends_on:
          - zookeeper
          - postgres
          - coordinator
        ports:
          - "3012:8888" # changed to 3012 here so it does not hog a useful host port
        command:
          - router
        env_file:
          - environment
See another page on the official site:
https://druid.apache.org/docs/latest/configuration/
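For a playground setup you can leave these runtime settings unchanged for now. Since everything runs in containers, redeploying later is very easy!
You also need: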
https://raw.githubusercontent.com/apache/druid/32.0.0/distribution/docker/environment
to create the second configuration file:
    # Java tuning
    #DRUID_XMX=1g
    #DRUID_XMS=1g
    #DRUID_MAXNEWSIZE=250m
    #DRUID_NEWSIZE=250m
    #DRUID_MAXDIRECTMEMORYSIZE=6172m
    DRUID_SINGLE_NODE_CONF=micro-quickstart

    druid_emitter_logging_logLevel=debug

    druid_extensions_loadList=["druid-histogram", "druid-datasketches", "druid-lookups-cached-global", "postgresql-metadata-storage", "druid-multi-stage-query"]

    druid_zk_service_host=zookeeper

    druid_metadata_storage_host=
    druid_metadata_storage_type=postgresql
    druid_metadata_storage_connector_connectURI=jdbc:postgresql://postgres:5432/druid
    druid_metadata_storage_connector_user=druid
    druid_metadata_storage_connector_password=FoolishPassword

    druid_indexer_runner_javaOptsArray=["-server", "-Xmx1g", "-Xms1g", "-XX:MaxDirectMemorySize=3g", "-Duser.timezone=UTC", "-Dfile.encoding=UTF-8", "-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager"]
    druid_indexer_fork_property_druid_processing_buffer_sizeBytes=256MiB

    druid_storage_type=local
    druid_storage_storageDirectory=/opt/shared/segments
    druid_indexer_logs_type=file
    druid_indexer_logs_directory=/opt/shared/indexing-logs

    druid_processing_numThreads=2
    druid_processing_numMergeBuffers=2

    DRUID_LOG4J=<?xml version="1.0" encoding="UTF-8" ?><Configuration status="WARN"><Appenders><Console name="Console" target="SYSTEM_OUT"><PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/></Console></Appenders><Loggers><Root level="info"><AppenderRef ref="Console"/></Root><Logger name="org.apache.druid.jetty.RequestLog" additivity="false" level="DEBUG"><AppenderRef ref="Console"/></Logger></Loggers></Configuration>
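As far as I understand the official Docker image, entries in this file whose names start with lower-case `druid_` become Druid runtime properties, with the underscores replaced by dots (so `druid_zk_service_host` ends up as `druid.zk.service.host`). The upper-case `DRUID_*` entries tune the JVM and the launcher instead. The sketch below only mimics that naming rule, to preview which properties a file would yield. `to_properties` is a hypothetical helper, not part of Druid, and it ignores any special cases the real entrypoint script may handle.

```python
def to_properties(env_text):
    """Map druid_* environment entries to dotted property names."""
    props = {}
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blanks, comments and entries without a value separator.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Only lower-case druid_ keys are treated as runtime properties.
        if key.startswith("druid_"):
            props[key.replace("_", ".")] = value
    return props


if __name__ == "__main__":
    sample = """\
# Java tuning
DRUID_SINGLE_NODE_CONF=micro-quickstart
druid_zk_service_host=zookeeper
druid_metadata_storage_type=postgresql
druid_metadata_storage_connector_connectURI=jdbc:postgresql://postgres:5432/druid
"""
    for name, value in to_properties(sample).items():
        print(f"{name}={value}")
```

Reading the file this way makes it easier to look up each setting on the configuration reference page, which lists properties in their dotted form.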
Small as they are, the deployment files have everything they need!
    PS C:\Users\zhiyong> cd E:\dockerData\volume\druid1
    PS E:\dockerData\volume\druid1> ls


        Directory: E:\dockerData\volume\druid1


    Mode                 LastWriteTime         Length Name
    ----                 -------------         ------ ----
    -a----        2025-02-13     23:26           2980 docker-compose.yml
    -a----        2025-02-13     23:33           1576 environment


    PS E:\dockerData\volume\druid1> docker compose up -d
    time="2025-02-13T23:34:39+08:00" level=warning msg="E:\\dockerData\\volume\\druid1\\docker-compose.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion"
    [+] Running 72/15
     ✔ router Pulled                             230.7s
     ✔ coordinator Pulled                        230.7s
     ✔ postgres Pulled                           181.0s
     ✔ historical Pulled                         230.7s
     ✔ broker Pulled                             230.7s
     ✔ middlemanager Pulled                      230.7s
     ✔ zookeeper Pulled                           85.7s
    [+] Running 15/15
     ✔ Network druid1_default            Created   0.1s
     ✔ Volume "druid1_druid_shared"      Created   0.0s
     ✔ Volume "druid1_historical_var"    Created   0.0s
     ✔ Volume "druid1_middle_var"        Created   0.0s
     ✔ Volume "druid1_router_var"        Created   0.0s
     ✔ Volume "druid1_metadata_data"     Created   0.0s
     ✔ Volume "druid1_coordinator_var"   Created   0.0s
     ✔ Volume "druid1_broker_var"        Created   0.0s
     ✔ Container postgres                Started   2.4s
     ✔ Container zookeeper               Started   2.4s
     ✔ Container coordinator             Started   1.6s
     ✔ Container router                  Started   2.5s
     ✔ Container broker                  Started   2.3s
     ✔ Container historical              Started   2.5s
     ✔ Container middlemanager           Started   2.8s
    PS E:\dockerData\volume\druid1>
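Once the images have been pulled, the containers come up quickly.
Well, well... it also exposes the ports of the other components on the host.
So I got a PG and a Zookeeper **for free** as well!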
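Since the compose file publishes every service's port on the host, a quick TCP probe shows which ones are actually reachable. This is a small sketch using only the standard library. The port list is read off the compose file above (with the router remapped to 3012), and `is_port_open` and `probe_all` are hypothetical helper names.

```python
import socket

# Host ports published by the compose file (router remapped to 3012).
PUBLISHED_PORTS = {
    "postgres": 5432,
    "zookeeper": 2181,
    "coordinator": 8081,
    "broker": 8082,
    "historical": 8083,
    "middlemanager": 8091,
    "router": 3012,
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out or unresolvable: treat as not reachable.
        return False


def probe_all(host="localhost", ports=PUBLISHED_PORTS):
    """Return {service_name: bool} for every published port."""
    return {name: is_port_open(host, port) for name, port in ports.items()}
```

Calling `probe_all()` after `docker compose up -d` should report `True` for all seven services once they have finished starting. An open port only proves that something is listening, not that the service behind it is healthy.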
Verification
http://localhost:3012/unified-console.html#
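Besides opening the console in a browser, the same check can be scripted. Druid processes expose a `/status/health` HTTP endpoint that returns `true` once the process is up. The sketch below polls it on the router, assuming the 3012 port mapping used in this post. `wait_until_healthy` and its `fetch` parameter are hypothetical names; `fetch` exists only so the function can be exercised without a running cluster.

```python
import json
import time
import urllib.request


def http_get(url, timeout=3.0):
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def wait_until_healthy(base_url, attempts=30, delay=2.0, fetch=http_get):
    """Poll <base_url>/status/health until it returns JSON true."""
    url = base_url.rstrip("/") + "/status/health"
    for _ in range(attempts):
        try:
            if json.loads(fetch(url)) is True:
                return True
        except (OSError, ValueError):
            # Not reachable yet, or the body was not valid JSON; retry.
            pass
        time.sleep(delay)
    return False
```

For example, `wait_until_healthy("http://localhost:3012")` can be run right after `docker compose up -d` to block until the router answers, which is handy because the Druid services take a while to start after their containers report "Started".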
Very nice, we now have the latest Apache Druid 32.0.0 up and running!
Please credit the source when reposting: https://lizhiyong.blog.csdn.net/article/details/145622903