博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
PostreSQL崩溃试验全记录
阅读量:5838 次
发布时间:2019-06-18

本文共 9726 字,大约阅读时间需要 32 分钟。

由于客户提到,运行某些大运算量批处理时,PostgreSQL突发性使用大量资源的问题,

进行了如下的调查,发现确实发生了崩溃。PostgreSQL 需要资源控制方案啊。

现在正在考虑是否必须要用 os 级别的限制方案:

过程如下:

测试环境:

内存:大约1024MB

postgresql.conf 设定:

    使用缺省值:checkpoint_segments =3  shard_buffers=32MB

    这些是有意为之,就是想看看数据量大、shared_buffer小的状况:

建立表(一条记录大约1024字节):

    postgres=# create table test01(id integer, val char(1024)); 

向表中插入大量数据(总共插入2400MB)

   postgres=# insert into test01 values(generate_series(1,2457600),repeat( chr(int4(random()*26)+65),1024));

插入操作会花费一些时间,此时用ps 命令看PostgreSQL个进程对内存使用状况:‘

 

[root@server ~]# ps aux | grep postroot      3180  0.0  0.0 105296   712 pts/1    S    16:31   0:00 su - postgrespostgres  3181  0.0  0.0  70304   676 pts/1    S+   16:31   0:00 -bashpostgres  3219  0.0  0.2 113644  2864 pts/1    S    16:32   0:00 /usr/local/pgsql/bin/postgres -D /gao/datapostgres  3221  0.4  3.0 113724 35252 ?        Ss   16:32   0:01 postgres: writer process                  postgres  3222  0.2  0.1 113644  1616 ?        Ds   16:32   0:00 postgres: wal writer process              postgres  3223  0.0  0.0 114380  1148 ?        Ss   16:32   0:00 postgres: autovacuum launcher process     postgres  3224  0.0  0.0  73332   472 ?        Ss   16:32   0:00 postgres: stats collector process         root      3252  0.0  0.0 105296   712 pts/2    S    16:32   0:00 su - postgrespostgres  3253  0.0  0.0  70304   676 pts/2    S    16:32   0:00 -bashpostgres  3285  0.0  0.0  83488   740 pts/2    S+   16:32   0:00 ./psqlpostgres  3286 14.8 80.2 2598332 924308 ?      Ds   16:32   0:35 postgres: postgres postgres [local] INSERTroot      3333  0.0  0.0  65424   812 pts/3    S+   16:36   0:00 grep post

可以发现,INSERT操作正在消耗80%以上的内存。

再等片刻,发现psql端传来反馈:

WARNING:  terminating connection because of crash of another server processDETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.HINT:  In a moment you should be able to reconnect to the database and repeat your command.The connection to the server was lost. Attempting reset: Failed.!>

此时看看Log,可以看到:background writer (3321) 已经被干掉了,所有连接被重置。

LOG:  autovacuum launcher startedLOG:  database system is ready to accept connectionsLOG:  checkpoints are occurring too frequently (2 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (1 second apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (2 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (1 second apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (2 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (1 second apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (2 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (4 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (10 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (9 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (7 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (9 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (7 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (5 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (5 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (10 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (11 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (15 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (23 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (7 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (8 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (13 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (7 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (8 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (8 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (8 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (4 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (5 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  checkpoints are occurring too frequently (9 seconds apart)HINT:  Consider increasing the configuration parameter "checkpoint_segments".LOG:  background writer process (PID 3221) was terminated by signal 9: KilledLOG:  terminating any other active server processesWARNING:  terminating connection because of crash of another server processDETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.HINT:  In a moment you should be able to reconnect to the database and repeat your command.WARNING:  terminating connection because of crash of another server processDETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.HINT:  In a moment you should be able to reconnect to the database and repeat your command.WARNING:  terminating connection because of crash of another server processDETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.HINT:  In a moment you should be able to reconnect to the database and repeat your command.WARNING:  terminating connection because of crash of another server processDETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.HINT:  In a moment you should be able to reconnect to the database and repeat your command.LOG:  all server processes terminated; reinitializingFATAL:  the database system is in recovery modeLOG:  database system was interrupted; last known up at 2013-08-30 16:36:42 CSTLOG:  database system was not properly shut down; automatic recovery in progressLOG:  consistent recovery state reached at 0/B7657BD0LOG:  redo starts at 0/B60FE2B8LOG:  unexpected pageaddr 0/B044C000 in log file 0, segment 184, offset 4505600LOG:  redo done at 0/B844B940LOG:  autovacuum launcher startedLOG:  database system is ready to accept connections

各个进程都重新生成了:

[root@server ~]# ps aux | grep postroot      3180  0.0  0.0 105296   660 pts/1    S    16:31   0:00 su - postgrespostgres  3181  0.0  0.0  70304   628 pts/1    S+   16:31   0:00 -bashpostgres  3219  0.0  0.5 113644  6276 pts/1    S    16:32   0:00 /usr/local/pgsql/bin/postgres -D /gao/dataroot      3252  0.0  0.0 105296   660 pts/2    S    16:32   0:00 su - postgrespostgres  3253  0.0  0.0  70304   628 pts/2    S    16:32   0:00 -bashpostgres  3285  0.0  0.0  83488  1144 pts/2    S+   16:32   0:00 ./psqlpostgres  3348  0.0  0.0 113644   984 ?        Ss   16:40   0:00 postgres: writer process                  postgres  3349  0.0  0.0 113644   732 ?        Ss   16:40   0:00 postgres: wal writer process              postgres  3350  0.0  0.1 114336  1756 ?        Ss   16:40   0:00 postgres: autovacuum launcher process     postgres  3351  0.0  0.0  73332   580 ?        Ss   16:40   0:00 postgres: stats collector process         root      3361  0.0  0.0  65420   796 pts/3    R+   16:44   0:00 grep post[root@server ~]#

回到psql端再看,发现连接已经失效了:

!> \Invalid command \. Try \? for help.!> \dt;You are currently not connected to a database.!>

根据向社区提问,据说是因为OS级别的OOM错误发生,所以杀死了Postmaster进程。

总之,此种情况表明,如果没有对资源消费总量的控制,那么突发性的用户访问会导致崩溃的。

本文转自健哥的数据花园博客园博客,原文链接:http://www.cnblogs.com/gaojian/p/3291938.html,如需转载请自行联系原作者

你可能感兴趣的文章
怎么防止重复发送Ajax
查看>>
ubuntu 下安装 mysql
查看>>
关于k-means聚类算法的matlab实现
查看>>
Git分支2
查看>>
一键安装Gitlab后的备份、迁移与恢复
查看>>
因为本人工作繁忙,精力有限,本博客停止更新。有兴趣的博友可以关注我在CSDN上的主博客...
查看>>
三元表达式,推导式,递归,匿名函数,内置函数
查看>>
SQL server查看触发器是否被禁用
查看>>
[C++基础]在构造函数内部调用构造函数
查看>>
跟随我在oracle学习php(8)
查看>>
Spring 3.1.0 Hibernate 3.0 Eclipse Spring WEB例子
查看>>
使用Unicode写文本文件:一个简单类的示例
查看>>
UVA-10212 The Last Non-zero Digit. 分解质因子+容斥定理
查看>>
求两个集合的交集,并集,差集
查看>>
[Laravel] Laravel的基本数据库操作部分
查看>>
Kotlin的语法糖(一)基础篇
查看>>
OkHttp源码分析
查看>>
让你的app体验更丝滑的11种方法!冲击手机应用榜单Top3指日可待
查看>>
windows kernel exploitation基础教程
查看>>
NS_OPTIONS枚举的用法
查看>>