
Thursday, February 26, 2015

Why you might need statement_cost_limit

Here's a commonplace ops crisis: the developers push a new dashboard display widget for user homepages on your application.  It seems to work fine in testing, so they push it out to production ... not realizing that for some large subset of users dissimilar from your tests, the generated query triggers a sequential scan on the second-largest table in the database.  Suddenly your database servers are paralyzed with load, and you have to shut down the whole site and back out the changes.

Wouldn't it be nice if you could just tell the database server "don't run expensive queries for the 'web' user"?  Well, thanks to my colleague Andrew Dunstan, who wrote plan_filter with support from Twitch.TV, now you can.

Sort of.  Let me explain.

PostgreSQL has had statement_timeout for a while, which can be set on a per-user basis (or other places) to prevent application errors from running queries for hours.  However, this doesn't really solve the "overload" issue, because the query runs for that length of time, gobbling resources until it's terminated.  What you really want to do is return an error immediately if a query is going to be too costly.
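
For example, a per-user timeout can be attached to a role like this:

    -- cancel any query from the web user that runs longer than 30 seconds
    ALTER USER web SET statement_timeout = '30s';

But again, the runaway query still burns CPU and I/O for those 30 seconds before it's cancelled.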

plan_filter is a loadable module which allows you to set a limit on the cost of queries you can execute.  It works, as far as we know, with all versions of Postgres starting at 9.0 (we've tested 9.1, 9.3 and 9.4). 

Let me show you.  First, you have to load the module in postgresql.conf:

    shared_preload_libraries = 'plan_filter'

Then you alter the "web" user to have a strict limit:

    ALTER USER web SET plan_filter.statement_cost_limit = 200000.0;

Then try some brain-dead query as that user, like a blanket select from the 100m-row "edges" graph table:

    \c - web
    SELECT * FROM edges;

    STATEMENT:  select * from edges;
    ERROR:  plan cost limit exceeded
    HINT:  The plan for your query shows that it would probably
    have an excessive run time. This may be due to a logic error
    in the SQL, or it may be just a very costly query. Rewrite
    your query or increase the configuration parameter
    "plan_filter.statement_cost_limit".

Obviously, your application needs to handle this error gracefully, especially since you'll likely get it for hundreds or thousands of queries at once if you're sending bad queries due to a code change. But a bunch of errors is definitely better than having to restart your whole app cluster.   It's comparatively easy to just display a broken widget icon.

So why did I say "sort of", and why aren't we submitting this as a feature for PostgreSQL 9.5?

Well, there are some issues with limiting by plan cost.  The first is that if you can't run the query due to the cost limit, you also can't run an EXPLAIN to see why the query is so costly in the first place.  You'd need to set plan_filter.statement_cost_limit = 0 in your session to get the plan.
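
That workaround looks like this (assuming, as above, that the setting is changeable per-session):

    -- in the restricted user's session:
    SET plan_filter.statement_cost_limit = 0;
    EXPLAIN SELECT * FROM edges;
    RESET plan_filter.statement_cost_limit;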

The second, and much bigger, issue is that plan cost estimates are just that: estimates.  They don't necessarily reflect how long the query will actually take.  Also, unless you do a lot of cost tuning, costs do not scale consistently between very different queries.  Worst of all, some types of queries, especially those with LIMIT clauses, can return a cost in the plan which is much higher than the real cost, because the planner expects to abort the query early.

So you're looking at a strong potential for false positives with statement_cost_limit.  This means that you need to both set the limit very high (like 5000000) and work your way down, and test this on your staging cluster to make sure that you're not bouncing lots of legitimate queries.  Overall, statement_cost_limit is mainly useful to DBAs who know their query workloads really well.
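
In practice that "work your way down" approach looks something like the following (the specific numbers here are illustrative, not recommendations):

    -- start generous, then ratchet down while watching for false positives
    ALTER USER web SET plan_filter.statement_cost_limit = 5000000.0;
    -- later, once staging shows no legitimate queries bouncing:
    ALTER USER web SET plan_filter.statement_cost_limit = 1000000.0;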

That means it's not ready for core Postgres (assuming it ever is).  Fortunately, PostgreSQL is extensible so you can use it right now while you wait for it to eventually become a feature, or to be supplanted by a better mechanism of resource control.

Tuesday, February 17, 2015

Spring/Summer 2015 Conference Schedule

What follows is my conference travel schedule through the early summer.  I'm posting it so that local PUGs will know when I'm going to be nearby, in case you want me to come talk to your members.  Also, so folks can find me at conference booths everywhere.

This list is also for anyone who was unaware of the amount of Postgres content available this year at conferences everywhere.
  • SCALE, Los Angeles, this week: 2-day Postgres track, booth.  Use code "SPEAK" for a small discount if you still haven't registered.  I'm speaking on 9.4 (Friday), and PostgreSQL on AWS (Sunday).
  • March 10, Burlingame, CA: pgDay SF 2015 Running the event, and a lightning talk.
  • March 25-27, NYC, NY: pgConf NYC: speaking on PostgreSQL on PAAS: a comparison of all the big ones.
  • April 25-26, Bellingham, WA: LinuxFest NorthWest, tentatively.  Talks haven't been chosen yet.  If I go, I'll also be working a booth no doubt.  I understand there are plans to have a bunch of Postgres stuff at this event.
  • June 16-20, Ottawa, Canada: pgCon of course.
  • July 20-24, Portland, OR: OSCON (tentatively, talks not selected).  Postgres talk of some sort, and probably booth duty.
Now you know.

Sunday, February 15, 2015

Running with scissors mode

DBAs Running with Scissors

Based on some comments in my post about "in-memory" databases, I realized that my post about running Postgres without disk sync was no longer available on Database Soup.  So I'm reposting the instructions here.

This blog post version has corrections and clarifications thanks to Andres Freund, since it first went up yesterday.  Thanks, Andres.

Running PostgreSQL this way was christened "running with scissors mode" by Gavin Roy, because you're operating completely without crash-safety; if something happens to the server, even a power fluctuation, your database contents are untrustworthy and may be corrupt.  However, it can be a useful way to run Postgres for extra, read-only replicas used strictly for load-balancing, or if what you're loading into Postgres is completely disposable/replaceable.

Note that these settings do not, in fact, disable all disk writes.  What they do instead is minimize disk writes, and make all disk writes asynchronous, depending entirely on the OS's own memory swapping and dirty block flushing for any disk writes.  This is what you want; you don't want the database to halt because, for example, you simply ran out of space in memory.

So, without further ado, here's the settings:

    work_mem =  (RAM - DBsize - shared_buffers)*2 / max_connections
    temp_buffers = (RAM - DBsize - shared_buffers)*4 / max_connections
    temp_file_limit = 0

On the one hand, we want to set work_mem high in order to avoid on-disk sorts. On the other hand, having RAM pinned for sorts push the database cache out of memory would be counterproductive.  As such, you want to set work_mem to use the available RAM you don't need for database caching.  The above assumes that max_connections is set to something sensible for the number of connections you actually need.  You should really be using pgbouncer as well with this setup.


Set temp_file_limit = 0 to cause queries to be cancelled instead of doing disk sorts.
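
As a worked example with illustrative numbers (64GB of RAM, a 20GB database, 8GB of shared_buffers, and max_connections = 200), the formulas above come out to roughly:

    # (64GB - 20GB - 8GB) * 2 / 200 connections ≈ 360MB
    work_mem = 360MB
    # (64GB - 20GB - 8GB) * 4 / 200 connections ≈ 720MB
    temp_buffers = 720MB
    temp_file_limit = 0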

     bgwriter_lru_maxpages = 0
     wal_level = minimal
     fsync = off
     synchronous_commit = off
     full_page_writes = off
     wal_log_hints = off
     wal_buffers = 64MB


Here we're minimizing the amount of writing we do to the transaction log, and making said writing completely asynchronous.  We're also disabling background writing.

     checkpoint_segments = 8 to 64
     checkpoint_timeout = 60min
     checkpoint_completion_target = 0.9

Checkpoint segments is a bit trickier.  On the one hand, you want it to be large enough that it's not cycling a lot and triggering extra disk flushes. On the other hand, you want all the segments to stay cached in RAM.  So something moderate, 256MB to 2GB of total WAL (each segment is 16MB), depending on how much RAM you have.  Don't set it to more than 1/32nd of RAM.  Ideally, we'd be able to disable checkpoints entirely, but currently there's no reasonable way to do that.

   stats_temp_directory = '/pgramdisk/stats_tmp'

You will also want to move the stats file to a ramdisk so that it's not being written out.  This is a good optimization in general, even outside of running with scissors mode.


Finally, we need to ensure that PostgreSQL will not restart if the system crashes; at that point, you assume your database is corrupt and proceed to recover it from another source.  The first part of doing this is to disable any autostart in your init system configuration.  Secondly, after starting up Postgres, add a line like this to the beginning of postgresql.conf:

     DO_NOT_RESTART=True

The exact parameter you use doesn't matter; what matters is that it's not a recognized parameter, so that Postgres will error out instead of restarting.  This does mean extra steps when you want to manually restart this node for configuration changes, but there isn't a really good way around that.

Next up, we'll run some performance tests to see how much this benefits us.

Continued in Part II.

Photo of "DBAs running with scissors" taken by Gavin Roy.

Friday, February 13, 2015

"In-memory" is not a feature, it's a bug

So, I'm hearing again about the latest generation of "in-memory databases". Apparently Gartner even has a category for them now.  Let me define an in-memory database for you:

     An in-memory database is one which lacks the capability of spilling to disk.

As far as I know from my reading of the industry literature, nobody has demonstrated any useful way in which data should be stored differently if it never spills to disk.   While the talented engineers of several database products have focused on other performance optimizations to the exclusion of making disk access work, that's not an optimization of the database; it's an optimization of engineer time.   The exact same database, with disk access capabilities, would be automatically superior to its predecessor, because users would now have more options.

PostgreSQL can be an "in-memory" database too, if you simply turn all of the disk storage features off.  This is known as "running with scissors" mode, and people do it for useful effect on public clouds with disposable replicas.

So an "in-memory" database is a database with a major limitation.  It's not a feature, any more than an incapability of supporting SQL access is a feature.  Let's define databases by their useful features, not by what they lack, please.

Besides which, with the new types of persistent memory and fast random access storage coming down the pipe in a couple years, there soon won't be any difference between disk and memory anyway.

Thursday, February 12, 2015

Tree Join Tables: preventing cycles

Searching Google, I was surprised to find that there were few solutions published for a common issue: preventing users from creating a cycle when you create a self-join table.  So here's one solution, which will be "good enough" for most people, but has some caveats (see below).

First, the setup: we have a table of items.  Items can be in one or more collections.  Each item can itself be a collection, allowing users to create collections of collections.  So the first thing we need is a self-join table on the "adjacency list" model:
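
(The items table itself isn't shown here; a minimal hypothetical version, just for following along, might be:)

    create table items (
        id serial primary key,
        label text not null
    );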

    create table collections (
        collection_id int not null references items(id) on delete cascade,
        item_id int not null references items(id) on delete cascade,
        constraint collections_pk primary key ( collection_id, item_id )
    );
    create index on collections(item_id);

So the first part of preventing cycles is to prevent the simplest cycle, where a collection collects itself.  That can be done with a constraint:

     alter table collections add constraint
     no_self_join check ( collection_id <> item_id )

Now comes the tough part, preventing cycles of more than one, two, or N collections in a chain.  This requires us to look down a chain of possible collections and make sure that each inserted tuple doesn't complete a loop.  Fortunately, WITH RECURSIVE works for this provided we do it in a BEFORE trigger.  If we did it in an AFTER trigger, the trigger itself would cycle, which would be no good.

    CREATE OR REPLACE FUNCTION collections_prevent_cycle ()
    returns trigger
    language plpgsql
    as $f$
    BEGIN
        -- select recursively, looking for all child items of the new collection
        -- and making sure that they don't include the new collection
        IF EXISTS ( WITH recursive colitem as (
                select collection_id, item_id
                from collections
                where collection_id = NEW.item_id
                UNION ALL
                select colitem.collection_id, collections.item_id
                from collections
                join colitem on colitem.item_id = collections.collection_id
            )
            SELECT collection_id from colitem
            WHERE item_id = NEW.collection_id
            LIMIT 1 ) THEN
                RAISE EXCEPTION 'You may not create a cycle of collections.';
        END IF;
       
        RETURN NEW;
    END; $f$;

    CREATE TRIGGER collections_prevent_cycle
    BEFORE INSERT OR UPDATE ON collections
    FOR EACH ROW EXECUTE PROCEDURE collections_prevent_cycle();
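
To see the trigger in action, assuming items 1 through 3 already exist:

    insert into collections values (1, 2);  -- collection 1 collects item 2
    insert into collections values (2, 3);  -- collection 2 collects item 3
    insert into collections values (3, 1);  -- ERROR: You may not create a cycle of collections.

The third insert fails because the recursive scan starting from item 1 reaches item 3, which would close the loop.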

As I said, this solution will be "good enough" for a variety of uses.  However, it has some defects:

Concurrency: It is vulnerable to concurrency failure.  That is, if two users simultaneously insert "A collects B" and "B collects A", this trigger would not prevent it.  The alternative is locking the entire table on each commit, which is also problematic.

Cost: we're running a pretty expensive recursive query with every insert.  For applications where the tree table is write-heavy, this will decrease throughput significantly.

So my challenge to you is this: come up with a better solution for this, which solves either the concurrency or cost problem without making the other problem worse.

P.S.: this blog has reached half a million views.  Thanks, readers!

Friday, February 6, 2015

A statement on recent conference events

The PostgreSQL user group in Moscow is currently conducting their first-ever PostgreSQL-themed conference, which has been a tremendous success.  Unfortunately, the venue booked by the conference chose to include inappropriate dancers as part of their entertainment package. The conference organizers and the Russian PostgreSQL community were not aware of the nature of the entertainment supplied ahead of time.

The PostgreSQL Core Team believes there is no place for inappropriate or discriminatory behaviour at PostgreSQL conferences and tries to ensure that all our conferences are suitable for anyone to attend. As PostgreSQL is an Open Source project with volunteer contributors and a federated organizational structure, we do not have supervisory control over how individual conferences are organized, which means that sometimes they do not benefit from general community experience.

The Russian conference organizers are expected to comment on this unforeseen incident once the conference is concluded. The international community will be working with them to make sure that this mistake is not repeated.

Josh Berkus
On Behalf of the PostgreSQL Core Team
and the PostgreSQL Global Development Project

Thursday, February 5, 2015

Some notes on today's update release

We released a security and cumulative bugfix release to all supported versions today.  That means it's update time.  What follows is my personal advice on the update.

For the first time in a while, we have a bunch of "low-risk" security fixes in this release, but no "critical" security fixes.  The reason I put those terms in quotes is that it doesn't matter how critical the fixes are in general; it matters how critical they are to you.  So you should definitely read over the release notes and the CVE notices to check how they affect you.

All five of the security holes patched require prior authentication.  Four of the five have not been proven to have an actual privilege escalation vector; they may be only denial-of-service attacks.  And the fifth security issue only affects you if you are using per-column privileges for columns with constraints on them.  That's why I regard these issues as relatively "low-risk".

There are also some important fixes to performance and replication for versions 9.4 and 9.3, so users of those versions should apply the update soon.  For other users, unless you live in the Fiji Islands or other places affected by timezone changes, you can probably wait for your next scheduled maintenance window.  You do have scheduled maintenance windows, yes?

Other people who might care to apply this update sooner rather than later include:
  • Users who have already had issues with autovacuum
  • People using the new logical decoding
  • Users who have a single archive which is shared between master and replicas.
  • Folks who create a bunch of tablespaces.
  • Developers who use tsquery, xpath(), and/or complex regular expression searches
  • JSONB users.
  • Norwegians who use Postgres on Windows
  • Users who have reported bugs with explicit locking and deadlocking in the last few months.
Again, though, read the release notes.  Because it's always possible that we fixed a bug that already affects you.