广告位联系
返回顶部
分享到

PostgreSQL索引失效会发生什么?

PostgreSQL 来源:互联网 作者:佚名 发布时间:2022-09-22 08:59:46 人浏览
摘要

前段时间碰到个奇怪的索引失效的问题,实际情况类似下面这样: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 bill=# begin; BEGIN bill=*# create index idx_t1 on t1(id); CREATE INDEX bill=*# explain select * f

前段时间碰到个奇怪的索引失效的问题,实际情况类似下面这样:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

bill=# begin;

BEGIN

bill=*# create index idx_t1 on t1(id);

CREATE INDEX

bill=*# explain select * from t1 where id = 1;

                     QUERY PLAN

----------------------------------------------------

 Seq Scan on t1  (cost=0.00..25.88 rows=6 width=36)

   Filter: (id = 1)

(2 rows)

 

bill=*# end;

COMMIT

bill=# explain select * from t1 where id = 1;

                             QUERY PLAN

---------------------------------------------------------------------

 Bitmap Heap Scan on t1  (cost=1.50..7.01 rows=6 width=36)

   Recheck Cond: (id = 1)

   ->  Bitmap Index Scan on idx_t1  (cost=0.00..1.50 rows=6 width=0)

         Index Cond: (id = 1)

(4 rows)

很显然的问题就是,我在事务中创建了索引,却没办法使用。但是当事务提交了后便可以正常使用了,这是什么情况呢?

这个其实和pg_index中indcheckxmin属性有关,关于这个字段的解释如下:

If true, queries must not use the index until the xmin of this pg_index row is below their TransactionXmin event horizon, because the table may contain broken HOT chains with incompatible rows that they can see

经检查也确实如此:

1

2

3

4

5

bill=*# select indcheckxmin from pg_index where indexrelid = 'idx_t1'::regclass;

 indcheckxmin

--------------

 t

(1 row)

那么问题来了,什么情况下创建索引时会将索引的该属性设置为true呢?

从前面官方文档对于该字段的解释,如果表中包含broken HOT chains 则会为true,那什么是broken HOT chains ?似乎和HOT机制有关。那是不是只有存在broken HOT chains 才会设置为true呢?

这里就不卖关子了,直接给出结论,然后我们再去一一验证。

经测试发现,以下两种情况会导致索引的indcheckxmin设置为true:

  • 当前事务中表上存在broken HOT chains,即官方文档中所说;
  • 当old_snapshot_threshold被设置时。

场景一:broken HOT chains

这种情况,只要在当前事务中表中存在HOT更新的行时就会存在。那么什么时候会进行HOT更新呢?两个前提:

  • 新的元组和旧元组必须在同一个page中;
  • 索引字段不能进行更新。

既然如此,实际中常见的两种情况就是:

  • 对表上最后一个page进行更新;
  • 表设置了fillfactor,即每个page上有预留的空闲空间。

例子:

表中插入10条数据,自然只有1个page:

1

2

bill=# insert into t1 select generate_series(1,10),md5(random()::text);

INSERT 0 10

进行更新:

1

2

bill=# update t1 set info = 'bill' where id = 10;

UPDATE 1

查看发现的确是HOT更新:

关于t_infomask2字段的解释这里就不再赘述。

接下来我们创建索引:

可以发现indcheckxmin被设置为true,在当前事务中索引不可用。

经过验证,在index_build阶段,判断到BrokenHotChain,便将indcheckxmin修改为true。

具体的修改代码如下:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

/*此时indexInfo->ii_BrokenHotChain已被修改为true */

if ((indexInfo->ii_BrokenHotChain || EarlyPruningEnabled(heapRelation)) &&

        !isreindex &&

        !indexInfo->ii_Concurrent)

    {

        Oid         indexId = RelationGetRelid(indexRelation);

        Relation    pg_index;

        HeapTuple   indexTuple;

        Form_pg_index indexForm;

        pg_index = table_open(IndexRelationId, RowExclusiveLock);

        indexTuple = SearchSysCacheCopy1(INDEXRELID,

                                         ObjectIdGetDatum(indexId));

        if (!HeapTupleIsValid(indexTuple))

            elog(ERROR, "cache lookup failed for index %u", indexId);

        indexForm = (Form_pg_index) GETSTRUCT(indexTuple);

        /* If it's a new index, indcheckxmin shouldn't be set ... */

        Assert(!indexForm->indcheckxmin);

/*将indcheckxmin修改为true */

        indexForm->indcheckxmin = true;

        CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);

        heap_freetuple(indexTuple);

        table_close(pg_index, RowExclusiveLock);

    }

同样我们也可以验证得知,的确是因为brokenhotchains导致的indcheckxmin被设置为true。

场景二:old_snapshot_threshold

先来看例子:

最简单的场景,完全的一张空表,在事务中创建索引indcheckxmin就会被设置为true,果然索引也是不可用。

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

bill=# drop table t1;

DROP TABLE

bill=# create table t1(id int,info text);

CREATE TABLE

bill=# begin;

BEGIN

bill=*# create index idx_t1 on t1(id);

CREATE INDEX

bill=*# select indcheckxmin from pg_index where indexrelid = 'idx_t1'::regclass;

 indcheckxmin

--------------

 t

(1 row)

 

bill=*# explain select * from t1 where id = 1;

                     QUERY PLAN

----------------------------------------------------

 Seq Scan on t1  (cost=0.00..25.88 rows=6 width=36)

   Filter: (id = 1)

(2 rows)

那么为什么old_snapshot_threshold会产生这样的影响呢?

经过跟踪发现,当开启该参数时,在事务中创建索引的snapshotdata结构如下:

(SnapshotData) $6 = {
  snapshot_type = SNAPSHOT_MVCC
  xmin = 856
  xmax = 856
  xip = 0x00007fd55c804fc0
  xcnt = 0
  subxip = 0x00007fd55ad5d000
  subxcnt = 0
  suboverflowed = false
  takenDuringRecovery = false
  copied = false
  curcid = 1
  speculativeToken = 0
  vistest = NULL
  active_count = 0
  regd_count = 0
  ph_node = {
    first_child = NULL
    next_sibling = NULL
    prev_or_parent = NULL
  }
  whenTaken = 691752041261069
  lsn = 208079736
}

而禁用该参数呢?

(SnapshotData) $7 = {
  snapshot_type = SNAPSHOT_MVCC
  xmin = 828
  xmax = 828
  xip = 0x00007fad31704780
  xcnt = 0
  subxip = 0x00007fad3155d000
  subxcnt = 0
  suboverflowed = false
  takenDuringRecovery = false
  copied = false
  curcid = 1
  speculativeToken = 0
  active_count = 0
  regd_count = 0
  ph_node = {
    first_child = NULL
    next_sibling = NULL
    prev_or_parent = NULL
  }
  whenTaken = 0
  lsn = 0
}

可以看到,区别在于不使用该参数时,创建snapshotdata不会设置whenTaken和lsn,那么这两个参数是干嘛的呢?

先来看看snapshotdata的结构:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

typedef struct SnapshotData

{

    SnapshotType snapshot_type; /* type of snapshot */

    /*

     * The remaining fields are used only for MVCC snapshots, and are normally

     * just zeroes in special snapshots.  (But xmin and xmax are used

     * specially by HeapTupleSatisfiesDirty, and xmin is used specially by

     * HeapTupleSatisfiesNonVacuumable.)

     *

     * An MVCC snapshot can never see the effects of XIDs >= xmax. It can see

     * the effects of all older XIDs except those listed in the snapshot. xmin

     * is stored as an optimization to avoid needing to search the XID arrays

     * for most tuples.

     */

    TransactionId xmin;         /* all XID < xmin are visible to me */

    TransactionId xmax;         /* all XID >= xmax are invisible to me */

    /*

     * For normal MVCC snapshot this contains the all xact IDs that are in

     * progress, unless the snapshot was taken during recovery in which case

     * it's empty. For historic MVCC snapshots, the meaning is inverted, i.e.

     * it contains *committed* transactions between xmin and xmax.

     *

     * note: all ids in xip[] satisfy xmin <= xip[i] < xmax

     */

    TransactionId *xip;

    uint32      xcnt;           /* # of xact ids in xip[] */

    /*

     * For non-historic MVCC snapshots, this contains subxact IDs that are in

     * progress (and other transactions that are in progress if taken during

     * recovery). For historic snapshot it contains *all* xids assigned to the

     * replayed transaction, including the toplevel xid.

     *

     * note: all ids in subxip[] are >= xmin, but we don't bother filtering

     * out any that are >= xmax

     */

    TransactionId *subxip;

    int32       subxcnt;        /* # of xact ids in subxip[] */

    bool        suboverflowed;  /* has the subxip array overflowed? */

    bool        takenDuringRecovery;    /* recovery-shaped snapshot? */

    bool        copied;         /* false if it's a static snapshot */

    CommandId   curcid;         /* in my xact, CID < curcid are visible */

    /*

     * An extra return value for HeapTupleSatisfiesDirty, not used in MVCC

     * snapshots.

     */

    uint32      speculativeToken;

    /*

     * For SNAPSHOT_NON_VACUUMABLE (and hopefully more in the future) this is

     * used to determine whether row could be vacuumed.

     */

    struct GlobalVisState *vistest;

    /*

     * Book-keeping information, used by the snapshot manager

     */

    uint32      active_count;   /* refcount on ActiveSnapshot stack */

    uint32      regd_count;     /* refcount on RegisteredSnapshots */

    pairingheap_node ph_node;   /* link in the RegisteredSnapshots heap */

    TimestampTz whenTaken;      /* timestamp when snapshot was taken */

    XLogRecPtr  lsn;            /* position in the WAL stream when taken */

    /*

     * The transaction completion count at the time GetSnapshotData() built

     * this snapshot. Allows to avoid re-computing static snapshots when no

     * transactions completed since the last GetSnapshotData().

     */

    uint64      snapXactCompletionCount;

} SnapshotData;

如上所示,TimestampTz表示snapshot何时产生的,为什么启用old_snapshot_threshold时会设置该值呢?

因为该值正是用来判断快照是否过旧的:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

/*

 * Implement slower/larger portions of TestForOldSnapshot

 *

 * Smaller/faster portions are put inline, but the entire set of logic is too

 * big for that.

 */

void

TestForOldSnapshot_impl(Snapshot snapshot, Relation relation)

{

        if (RelationAllowsEarlyPruning(relation)

                && (snapshot)->whenTaken < GetOldSnapshotThresholdTimestamp())

                ereport(ERROR,

                                (errcode(ERRCODE_SNAPSHOT_TOO_OLD),

                                 errmsg("snapshot too old")));

}

这样我们也比较好理解为什么设置了该参数时创建的索引在当前事务中不可用:

因为我们不设置该参数时,在事务中创建索引是可以保证MVCC的一致性,那么索引便是安全可用的。

而使用参数时,由于TimestampTz被设置,数据库会对其进行判断该行数据是否已经过期,如果过期了那便会被清理掉,这样对于索引来说便是不安全的,没法保证数据的一致性,对于不是hot-safe的索引,自然要将其indcheckxmin设置为true,防止在事务中创建索引后数据实际已经过期被删除的情况。

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

/*

    * At this moment we are sure that there are no transactions with the

    * table open for write that don't have this new index in their list of

    * indexes.  We have waited out all the existing transactions and any new

    * transaction will have the new index in its list, but the index is still

    * marked as "not-ready-for-inserts".  The index is consulted while

    * deciding HOT-safety though.  This arrangement ensures that no new HOT

    * chains can be created where the new tuple and the old tuple in the

    * chain have different index keys.

    *

    * We now take a new snapshot, and build the index using all tuples that

    * are visible in this snapshot.  We can be sure that any HOT updates to

    * these tuples will be compatible with the index, since any updates made

    * by transactions that didn't know about the index are now committed or

    * rolled back.  Thus, each visible tuple is either the end of its

    * HOT-chain or the extension of the chain is HOT-safe for this index.

    */

总结

当pg_index的indcheckxmin字段被设置为true时,直到此pg_index行的xmin低于查询的TransactionXmin视界之前,查询都不能使用此索引。

而产生这种现象主要有两种情况:

1. 表上在当前事务中存在broken HOT chains;

2. old_snapshot_threshold被设置时。


版权声明 : 本文内容来源于互联网或用户自行发布贡献,该文观点仅代表原作者本人。本站仅提供信息存储空间服务和不拥有所有权,不承担相关法律责任。如发现本站有涉嫌抄袭侵权, 违法违规的内容, 请发送邮件至2530232025#qq.cn(#换@)举报,一经查实,本站将立刻删除。
原文链接 : https://foucus.blog.csdn.net/article/details/121701242
相关文章
  • Postgresql删除数据库表中重复数据的几种方法
    一直使用Postgresql数据库,有一张表是这样的: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 DROP TABLE IF EXISTS public.devicedata; CREATE TABLE public.devicedata ( Id varchar(20
  • PostgreSQL HOT与PHOT有哪些区别

    PostgreSQL HOT与PHOT有哪些区别
    1、HOT概述 PostgreSQL中,由于其多版本的特性,当我们进行数据更新时,实际上并不是直接修改元数据,而是通过新插入一行数据来进行间接
  • PostgreSQL索引失效会发生什么?

    PostgreSQL索引失效会发生什么?
    前段时间碰到个奇怪的索引失效的问题,实际情况类似下面这样: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 bill=# begin; BEGIN bill=*# create inde
  • PostgreSQL pg_filenode.map文件介绍

    PostgreSQL pg_filenode.map文件介绍
    今天在网上看到有人问误删pg_filenode.map该如何恢复或者重建,解决这个问题前我们先来了解下pg_filenode.map文件。 对于PostgreSQL中的每张表在磁
  • PostgreSQL limit的神奇作用介绍
    最近碰到这样一个SQL引发的性能问题,SQL内容大致如下: 1 2 3 4 5 6 7 SELECT * FROM t1 WHERE id = 999 AND (case $1 WHEN true THEN info = $2 ELSE info = $3 end) l
  • PostgreSql生产级别数据库安装要注意事项

    PostgreSql生产级别数据库安装要注意事项
    我让公司的小伙伴写一个生产级别的PostgreSQL的安装文档,结果他和我说:不是用一个命令就能安装好么?还用写文档么?。我知道他想说的
  • PostgreSQL12.5中分区表的一些操作介绍
    1、创建一个有DEFAULT的分区表 1、先创建主表 1 2 3 4 5 6 7 8 9 10 11 12 13 14 create table tbl_log ( id serial, create_time timestamp(0) without time zone, remark char
  • Postgres中UPDATE更新语句源码分析
    PG中UPDATE源码分析 本文主要描述SQL中UPDATE语句的源码分析,代码为PG13.3版本。 整体流程分析 以update dtea set id = 1;这条最简单的Update语句进行
  • 在Centos8-stream安装PostgreSQL13的教程

    在Centos8-stream安装PostgreSQL13的教程
    一、安装postgresql13-server 1 2 yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm yum install -y postgresql13-
  • PostgreSQL自动更新时间戳实例介绍
    什么是PostgreSQL时间戳数据类型? 在PostgreSQL中,下一个数据类型是 TIMESTAMP ,它可以存储 TIME 和 DATE 值。但它不支持任何 时区数据。这意味
  • 本站所有内容来源于互联网或用户自行发布,本站仅提供信息存储空间服务,不拥有版权,不承担法律责任。如有侵犯您的权益,请您联系站长处理!
  • Copyright © 2017-2022 F11.CN All Rights Reserved. F11站长开发者网 版权所有 | 苏ICP备2022031554号-1 | 51LA统计