sql-server – 为什么LEN()函数严重低估了SQL Server 2014中的基
我有一个带有字符串列的表和一个检查具有一定长度的行的谓词.在SQL Server 2014中,无论我检查的长度如何,我都会看到1行的估计值.这产生了非常糟糕的计划,因为实际上有数千甚至数百万行,SQL Server正在选择将此表放在嵌套循环的外侧. 是否有SQL Server 2014的基数估计值为1.0003,而SQL Server 2012估计为31,622行?有一个很好的解决方法吗? 以下是该问题的简短复制: -- Create a table with 1MM rows of dummy data CREATE TABLE #customers (cust_nbr VARCHAR(10) NOT NULL) GO INSERT INTO #customers WITH (TABLOCK) (cust_nbr) SELECT TOP 1000000 CONVERT(VARCHAR(10),ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) AS cust_nbr FROM master..spt_values v1 CROSS JOIN master..spt_values v2 GO -- Looking for string of a certain length. -- While both CEs yield fairly poor estimates,the 2012 CE is much -- more conservative (higher estimate) and therefore much more likely -- to yield an okay plan rather than a drastically understimated loop join. -- 2012: 31,622 rows estimated,900K rows actual -- 2014: 1 row estimated,900K rows actual SELECT COUNT(*) FROM #customers WHERE LEN(cust_nbr) = 6 OPTION (QUERYTRACEON 9481) -- Optionally,use 2012 CE GO Here is a more complete script showing additional tests 我也读过whitepaper on the SQL Server 2014 Cardinality Estimator,但没有找到任何澄清情况的东西. 解决方法对于遗留CE,我看到估计是行的3.16228% – 这是用于column = literal谓词的“幻数”启发式(还有其他基于谓词构造的启发式方法 – 但LEN包裹在列的周围遗留的CE结果与此猜测框架相匹配).你可以在Joe Sack的 Selectivity Guesses in absence of Statistics和Ian Jose的 Constant-Constant Comparison Estimation的帖子上看到这个例子.-- Legacy CE: 31622.8 rows SELECT COUNT(*) FROM #customers WHERE LEN(cust_nbr) = 6 OPTION ( QUERYTRACEON 9481); -- Legacy CE GO 现在,对于新的CE行为,它看起来现在对优化器可见(这意味着我们可以使用统计信息).我查看了下面的计算器输出,你可以查看相关的自动生成统计数据作为指针: -- New CE: 1.00007 rows SELECT COUNT(*) FROM #customers WHERE LEN(cust_nbr) = 6 OPTION ( QUERYTRACEON 2312 ); -- New CE GO -- View New CE behavior with 2363 (for supported option use XEvents) SELECT COUNT(*) FROM #customers WHERE LEN(cust_nbr) = 6 OPTION (QUERYTRACEON 2312,QUERYTRACEON 2363,QUERYTRACEON 3604,RECOMPILE); -- New CE GO /* Loaded histogram for column QCOL: [tempdb].[dbo].[#customers].cust_nbr from stats with id 2 Using ambient cardinality 1e+006 to combine distinct counts: 999927 Combined distinct count: 999927 Selectivity: 1.00007e-006 Stats collection generated: CStCollFilter(ID=2,CARD=1.00007) CStCollBaseTable(ID=1,CARD=1e+006 TBL: #customers) End selectivity computation */ EXEC tempdb..sp_helpstats '#customers'; --Check out AVG_RANGE_ROWS values (for example - plenty of ~ 1) DBCC SHOW_STATISTICS('tempdb..#customers','_WA_Sys_00000001_B0368087'); --That's my Stats name yours is subject to change 遗憾的是,逻辑依赖于对不同值的数量的估计,而不是针对LEN函数的影响进行调整. 可能的解决方法 通过将LEN重写为LIKE,您可以在两种CE模型下获得基于trie的估计: SELECT COUNT_BIG(*) FROM #customers AS C WHERE C.cust_nbr LIKE REPLICATE('_',6); 有关使用跟踪标志的信息: > 2363:显示大量信息,包括正在加载的统计信息.> 3604:将DBCC命令的输出打印到消息选项卡. (编辑:温州站长网) 【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容! |