Pluggable Shuffle

https://issues.apache.org/jira/browse/SPARK-2044

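Skew join: the join key values are unevenly distributed. For example, when joining states with transaction records, if the transactions are concentrated in one state, most of the transaction data is sent to a single reducer. See skew join.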
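The effect of a skewed key on reducer load can be illustrated with a small sketch. The data, the reducer count, and the helper names (`reducer_for`, `reducer_load`) are assumptions made up for illustration, and the crc32-based partitioner is a stand-in, not Spark's or Hive's actual partitioner.

```python
import zlib
from collections import Counter

def reducer_for(key: str, num_reducers: int) -> int:
    # Deterministic hash partitioner (crc32 is used because Python's
    # built-in hash() for strings is salted per process).
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def reducer_load(keys, num_reducers):
    # Count how many records each reducer receives.
    return Counter(reducer_for(k, num_reducers) for k in keys)

# Hypothetical transactions keyed by state: 90 in "CA", 5 each in "NY" and "TX".
transactions = ["CA"] * 90 + ["NY"] * 5 + ["TX"] * 5

load = reducer_load(transactions, 4)
for reducer, count in sorted(load.items()):
    print(f"reducer {reducer}: {count} records")
```

Whatever reducer "CA" hashes to receives at least 90 of the 100 records, because every record with the same key must go to the same reducer.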

Hive's solution
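Split the query into two. For example, select A.id from A join B on A.id = B.id becomes:

    select A.id from A join B on A.id = B.id where A.id <> 1
    select A.id from A join B on A.id = B.id where A.id = 1 and B.id = 1

Drawback: tables A and B are each scanned twice. If A is skewed but B has only a few rows for the skewed key (id = 1 here), B's rows for that key can be loaded into memory.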
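The two-query split described above can be sketched in plain Python. This is a minimal illustration under assumptions, not Hive's implementation: tables are modeled as lists of ids, and `join_ids`, `split_skew_join`, and the sample data are hypothetical names and values.

```python
from collections import Counter

def join_ids(a_ids, b_ids):
    # Plain equi-join on id: emit A.id once per matching (A, B) row pair.
    b_counts = Counter(b_ids)
    out = []
    for a in a_ids:
        out.extend([a] * b_counts[a])
    return out

def split_skew_join(a_ids, b_ids, skewed_key):
    # Query 1: ... where A.id <> skewed_key (regular join on the other keys).
    rest = join_ids([a for a in a_ids if a != skewed_key], b_ids)
    # Query 2: ... where A.id = skewed_key and B.id = skewed_key.
    # B's rows for the skewed key are few, so they are held in memory
    # (here just a count) while A's skewed rows stream past.
    b_hot = sum(1 for b in b_ids if b == skewed_key)
    hot = []
    for a in a_ids:
        if a == skewed_key:
            hot.extend([a] * b_hot)
    return rest + hot

# Hypothetical data: key 1 is heavily skewed in A.
a_ids = [1] * 6 + [2, 3]
b_ids = [1, 2, 2, 4]

result = split_skew_join(a_ids, b_ids, skewed_key=1)
```

Both lists are walked twice in `split_skew_join`, which mirrors the double-scan drawback; the combined result contains the same rows as the single unsplit join.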
