DWS层建设实战-2
发布于 2022-07-14
Day08_DWS层建设实战-商品主题统计宽表
Section titled “Day08_DWS层建设实战-商品主题统计宽表”知识点01:课程内容大纲与学习目标
Section titled “知识点01:课程内容大纲与学习目标”#课程内容大纲 1、DWS层构建 商品主题统计宽表的实现 核心:表关系梳理、字段抽取、结果合并 用户主题统计宽表的实现思路
#学习目标 掌握主题需求的分析 掌握表关系梳理与字段抽取知识点02:DWS层搭建—商品主题宽表
Section titled “知识点02:DWS层搭建—商品主题宽表”- 主题需求
- 指标
下单次数、下单件数、下单金额被支付次数、被支付件数、被支付金额被退款次数、被退款件数、被退款金额被加入购物车次数、被加入购物车件数被收藏次数好评数、中评数、差评数
--总共15个指标- 维度
日期(day)+商品- 本主题建表操作
注意:建表操作需要在hive中执行,presto不支持hive的建表语法。
知识点03:DWS层搭建—商品主题宽表—需求分析
Section titled “知识点03:DWS层搭建—商品主题宽表—需求分析”- 确定指标字段与表的关系
当需求提出指标和维度之后,我们需要做的就是确定通过哪些表能够提供支撑。
思考:是不是意味着数仓分层之后,上一层的表只能查询下一层的,能不能跨层?
答案是不一定的。
#维度 dwb_order_detail 时间、商品
#下单次数、下单件数、下单金额dwb_order_detail order_id: 下单次数相当于下了多少订单(有多少个包含这个商品的订单) ----> 下单次数 buy_num : 下单件数相当于下了多少商品 ----> 下单件数 total_price(无运费)/goods_amount(有运费) 每个商品下单金额指的是订单金额还是商品金额?应该是商品金额(订单中可能会包含其他商品) ----> 下单金额
#被支付次数、被支付件数、被支付金额dwb_order_detail #支付状态的判断 order_state: 只要不是1和7 就是已经支付状态的订单 is_pay: 这个字段也可以 0表示未支付,1表示已支付。#推荐使用这个字段来判断 #次数 件数 金额 order_id ----> 被支付次数 buy_num ----> 被支付件数 total_price ----> 被支付金额
#被退款次数、被退款件数、被退款金额dwb_order_detail #退款的判断 refund_id: 退款单号 is not null的就表明有退款 #次数 件数 金额 order_id ----> 被退款次数 buy_num ----> 被退款件数 total_price ----> 被退款金额
#被加入购物车次数、被加入购物车件数yp_dwd.fact_shop_cart(能够提供购物车相关信息的只有这张表) id: 次数 buy_num: 件数
#被收藏次数yp_dwd.fact_goods_collect id: 次数
#好评数、中评数、差评数yp_dwd.fact_goods_evaluation_detail geval_scores_goods:商品评分0-10分
#如何判断 好 中 差 (完全业务指定) 得分: >= 9 好 得分: >6 <9 中 得分:<= 6 差概况起来,计算商品主题宽表,需要参与计算的表有:
yp_dwb.dwb_order_detail 订单明细宽表
yp_dwd.fact_shop_cart 购物车表
yp_dwd.fact_goods_collect 商品收藏表
yp_dwd.fact_goods_evaluation_detail 商品评价表
知识点04:DWS层搭建—商品主题宽表—step1—下单、支付、退款统计
Section titled “知识点04:DWS层搭建—商品主题宽表—step1—下单、支付、退款统计”- 大前提:使用row_number对数据进行去重
基于dwb_order_detail表根据商品(goods_id)进行统计各个指标的时候;
为了避免同一笔订单下有多个重复的商品出现(正常来说重复的应该合并在一起了);
应该使用row_number对order_id和goods_id进行去重。
- 下单次数、件数、金额统计
基于上述的order_base进行查询
- 支付次数、件数、金额统计
计算支付相关指标之前,可以先使用is_pay进行订单状态过滤
然后基于过滤后的数据进行统计
可以继续使用CTE引导
- 退款次数、件数、金额统计
可以先使用refund_id is not null查询出退款订单,然后进行统计
知识点05:DWS层搭建—商品主题宽表—step2—购物车、收藏统计
Section titled “知识点05:DWS层搭建—商品主题宽表—step2—购物车、收藏统计”思考:为什么下面这两个查询需要考虑拉链的状态?上面的下单、支付等统计为什么不需要?
- 购物车次数、件数统计
-- 购物车次数、件数cart_count as ( select substring(create_time, 1, 10) dt, goods_id sku_id, count(id) cart_count, sum(buy_num) cart_num from yp_dwd.fact_shop_cart where end_date = '9999-99-99' group by substring(create_time, 1, 10), goods_id),- 收藏次数统计
-- 收藏次数favor_count as ( select substring(c.create_time, 1, 10) dt, goods_id sku_id, count(c.id) favor_count from yp_dwd.fact_goods_collect c where end_date='9999-99-99' group by substring(c.create_time, 1, 10), goods_id),知识点06:DWS层搭建—商品主题宽表—step3—好评、中评、差评
Section titled “知识点06:DWS层搭建—商品主题宽表—step3—好评、中评、差评”- 好评、中评、差评次数
根据评分来判断好评、中评、差评
老师上课SQL
Section titled “老师上课SQL”insert into yp_dws.dws_sku_daycountselect * from (with t1 as (select -- 维度字段 dt, goods_id , goods_name , --指标字段 order_id , buy_num , total_price , -- 判断字段: order_state , is_pay , refund_id, refund_state, --去重处理 row_number() over(partition by order_id,goods_id) as rn1 from yp_dwb.dwb_order_detail),t2 as( select dt, goods_id, goods_name, count(distinct order_id) as order_count, -- 下单次数 sum(buy_num) as order_num, -- 下单件数, sum(total_price) as order_amount -- 下单金额 from t1 where rn1 = 1 group by dt,goods_id ,goods_name),
t3 as ( select dt, goods_id, goods_name, count(distinct order_id) as payment_count, -- 被支付次数 sum(buy_num) as payment_num, -- 被支付件数, sum(total_price) as payment_amount -- 被支付金额 from t1 where rn1 = 1 and order_state not in(1,7) and is_pay = 1 group by dt,goods_id ,goods_name),t4 as ( select dt, goods_id, goods_name, count(distinct order_id) as refund_count , -- 被退款次数 sum(buy_num) as refund_num , -- 被退款件数, sum(total_price) as refund_amount -- 被退款金额 from t1 where rn1 = 1 and refund_id is not null group by dt,goods_id ,goods_name),t5 as ( select substr(sc.create_time,1,10)as dt, od.goods_id, goods_name, count(id) as cart_count , -- 被加入购物车次数 sum(sc.buy_num) as cart_num -- 被加入购物车件数, from yp_dwd.fact_shop_cart sc left join yp_dwb.dwb_order_detail od on sc.goods_id = od.goods_id group by substr(sc.create_time,1,10), od.goods_id ,goods_name),
t6 as ( select substr(gc.create_time,1,10) as dt, od.goods_id, goods_name, count(id) as favor_count -- 收藏次数 from yp_dwd.fact_goods_collect gc left join yp_dwb.dwb_order_detail od on gc.goods_id = od.goods_id group by substr(gc.create_time,1,10), od.goods_id,goods_name),
t7 as ( select substr(ged.create_time,1,10) as dt, od.goods_id, goods_name, count(if(geval_scores_goods >= 9,1,null)) as evaluation_good_count, -- 好评数 count(if(geval_scores_goods >= 6 and geval_scores_goods < 9,1,null)) as evaluation_mid_count, -- 好评数 count(if(geval_scores_goods < 6,1,null)) as evaluation_bad_count -- 好评数 from yp_dwd.fact_goods_evaluation_detail ged left join yp_dwb.dwb_order_detail od on ged.goods_id = od.goods_id group by substr(ged.create_time,1,10), od.goods_id,goods_name),t8 as ( select coalesce(t2.dt, t3.dt, t4.dt, t5.dt, t6.dt, t7.dt) as dt, coalesce(t2.goods_id, t3.goods_id, t4.goods_id, t5.goods_id, t6.goods_id, t7.goods_id) as sku_id, coalesce(t2.goods_name, t3.goods_name, t4.goods_name, t5.goods_name, t6.goods_name, t7.goods_name) as sku_name, coalesce(t2.order_count, 0) as order_count, coalesce(t2.order_num, 0) as order_num, coalesce(t2.order_amount, 0) as order_amount, coalesce(t3.payment_count, 0) as payment_count, coalesce(t3.payment_num, 0) as payment_num, coalesce(t3.payment_amount, 0) as payment_amount, coalesce(t4.refund_count, 0) as refund_count, coalesce(t4.refund_num, 0) as refund_num, coalesce(t4.refund_amount, 0) as refund_amount, coalesce(t5.cart_count, 0) as cart_count, coalesce(t5.cart_num, 0) as cart_num, coalesce(t6.favor_count, 0) as favor_count, coalesce(t7.evaluation_good_count, 0) as evaluation_good_count, coalesce(t7.evaluation_mid_count, 0) as evaluation_mid_count, coalesce(t7.evaluation_bad_count, 0) as evaluation_bad_count from t2 full join t3 on t2.dt = t3.dt and t2.goods_id = t3.goods_id full join t4 on t3.dt = t4.dt and t3.goods_id = t4.goods_id full join t5 on t4.dt = t5.dt and t4.goods_id = t5.goods_id full join t6 on t5.dt = t6.dt and t5.goods_id = t6.goods_id full join t7 on t6.dt = t7.dt and t6.goods_id = t7.goods_id)select dt, sku_id, sku_name, sum(order_count), sum(order_num), sum(order_amount), sum(payment_count), sum(payment_num), sum(payment_amount), sum(refund_count), sum(refund_num), sum(refund_amount), sum(cart_count), sum(cart_num), sum(favor_count), sum(evaluation_good_count), sum(evaluation_mid_count), sum(evaluation_bad_count)from t8group by dt,sku_id,sku_name)insert into hive.yp_dws.dws_sku_daycountselect * from (with order_t1 as ( select -- 维度字段 dt, goods_id , goods_name , --指标字段 order_id , buy_num , total_price , -- 判断字段: order_state , is_pay , refund_id, refund_state, --去重处理 row_number() over(partition by order_id,goods_id) as rn1 from hive.yp_dwb.dwb_order_detail),-- 下单次数、下单件数、下单金额order_t2 as( select dt, goods_id, goods_name, count(order_id) as order_count, sum(buy_num) as order_num, sum(total_price) as order_amount from order_t1 where rn1 = 1 group by dt,goods_id,goods_name),-- 被支付次数、被支付件数、被支付金额payment_t3 as ( select dt, goods_id, goods_name, count(order_id) as payment_count, sum(buy_num) as payment_num, sum(total_price) as payment_amount -- 过滤出已经支付的订单数据: order_state 不能为 1和7 并且 is_pay =1 from order_t1 where rn1 = 1 and order_state not in(1,7) and is_pay = 1 group by dt,goods_id,goods_name),-- 被退款次数、被退款件数、被退款金额refund_t4 as ( select dt, goods_id, goods_name, count(order_id) as refund_count, sum(buy_num) as refund_num, sum(total_price) as refund_amount -- 过滤出已经退款的订单数据: refund_id is not null 并且 refund_state = 5 from order_t1 where rn1 = 1 and refund_id is not null and refund_state = 5 group by dt,goods_id,goods_name),-- 被加入购物车次数、被加入购物车件数cart_t5 as ( select substr(sc.create_time ,1,10) as dt, sc.goods_id , g.goods_name , count(sc.id) as cart_count, sum(sc.buy_num) as cart_num from hive.yp_dwd.fact_shop_cart sc left join hive.yp_dwb.dwb_goods_detail g on sc.goods_id = g.id group by substr(sc.create_time ,1,10) ,sc.goods_id ,g.goods_name),-- 被收藏次数favor_t6 as( select substr(gc.create_time,1,10) as dt, gc.goods_id , g.goods_name , count(gc.id) as favor_count from hive.yp_dwd.fact_goods_collect gc left join hive.yp_dwb.dwb_goods_detail g on gc.goods_id = g.id group by substr(gc.create_time,1,10), gc.goods_id ,g.goods_name),-- 好评, 中评 和 差评evaluation_t7 as ( select substr(ed.create_time,1,10) as dt, ed.goods_id , g.goods_name, count( if(ed.geval_scores_goods is null or ed.geval_scores_goods >8 , ed.id, null) ) as evaluation_good_count, count( if(ed.geval_scores_goods is not null and ed.geval_scores_goods between 6 and 8, ed.id, null) ) as evaluation_mid_count, count( if(ed.geval_scores_goods is not null and ed.geval_scores_goods < 6, ed.id, null) ) as evaluation_bad_count
from hive.yp_dwd.fact_goods_evaluation_detail ed left join hive.yp_dwb.dwb_goods_detail g on ed.goods_id = g.id group by substr(ed.create_time,1,10), ed.goods_id ,g.goods_name),t8 as( select coalesce(t2.dt,t3.dt,t4.dt,t5.dt,t6.dt,t7.dt) as dt, coalesce(t2.goods_id,t3.goods_id,t4.goods_id,t5.goods_id,t6.goods_id,t7.goods_id) as sku_id, coalesce(t2.goods_name,t3.goods_name,t4.goods_name,t5.goods_name,t6.goods_name,t7.goods_name) as sku_name,
coalesce(t2.order_count,0) as order_count, coalesce(t2.order_num,0) as order_num, coalesce(t2.order_amount,0) as order_amount,
coalesce(t3.payment_count,0) as payment_count, coalesce(t3.payment_num,0) as payment_num, coalesce(t3.payment_amount,0) as payment_amount,
coalesce(t4.refund_count,0) as refund_count, coalesce(t4.refund_num,0) as refund_num, coalesce(t4.refund_amount,0) as refund_amount,
coalesce(t5.cart_count,0) as cart_count, coalesce(t5.cart_num,0) as cart_num,
coalesce(t6.favor_count,0) as favor_count,
coalesce(t7.evaluation_good_count,0) as evaluation_good_count, coalesce(t7.evaluation_mid_count,0) as evaluation_mid_count, coalesce(t7.evaluation_bad_count,0) as evaluation_bad_count
from order_t2 t2 full join payment_t3 t3 on t2.dt = t3.dt and t2.goods_id = t3.goods_id full join refund_t4 t4 on t2.dt = t4.dt and t2.goods_id = t4.goods_id full join cart_t5 t5 on t2.dt = t5.dt and t2.goods_id = t5.goods_id full join favor_t6 t6 on t2.dt = t6.dt and t2.goods_id = t6.goods_id full join evaluation_t7 t7 on t2.dt = t7.dt and t2.goods_id = t7.goods_id)select dt, sku_id, sku_name, sum(order_count) as order_count, sum(order_num) as order_num, sum(order_amount) as order_amount, sum(payment_count) as payment_count, sum(payment_num) as payment_num, sum(payment_amount) as payment_amount, sum(refund_count) as refund_count, sum(refund_num) as refund_num, sum(refund_amount) as refund_amount, sum(cart_count) as cart_count, sum(cart_num) as cart_num, sum(favor_count) as favor_count, sum(evaluation_good_count) as evaluation_good_count, sum(evaluation_mid_count) as evaluation_mid_count, sum(evaluation_bad_count) as evaluation_bad_count
from t8 group by dt,sku_id,sku_name)DWS层搭建—用户主题宽表—需求分析
Section titled “DWS层搭建—用户主题宽表—需求分析”- 主题需求
- 指标
登录次数、收藏店铺数、收藏商品数、加入购物车次数、加入购物车金额、下单次数、下单金额、支付次数、支付金额- 维度
用户、日期- 建表操作
注意:建表操作需要在hive中执行,presto不支持hive的建表语法。
- 确定字段与表关系
#登录次数yp_dwd.fact_user_login id login_user
#收藏店铺数、收藏商品数yp_dwd.fact_store_collect id user_id
#加入购物车次数、加入购物车金额yp_dwd.fact_shop_cartyp_dwb.dwb_goods_detail #因为购物车中没有金额,因此需要和商品详情表进行关联 goods_promotion_price: 商品促销价格(售价)
#下单次数、下单金额yp_dwd.fact_shop_orderyp_dwd.fact_shop_order_address_detail #通过订单主副表可以提供
#支付次数、支付金额yp_dwd.fact_trade_record- 最终实现合并的方式
- 使用full join合并
用户主题的统计宽表,作为项目练习由大家自己完成。
所用到的技术点之前已经完成铺垫完毕。
----------------------------------课堂代码----------------------------------insert into yp_dws.dws_user_daycountselect * from(-- 登录次数with login_count as ( select login_user as user_id, count(id) as login_count, dt from yp_dwd.fact_user_login group by login_user, dt),-- 店铺收藏数store_collect_count as ( select user_id, count(id) as store_collect_count, substring(create_time, 1, 10) as dt from yp_dwd.fact_store_collect where end_date='9999-99-99' group by user_id, substring(create_time, 1, 10)),-- 商品收藏数goods_collect_count as ( select user_id, count(id) as goods_collect_count, substring(create_time, 1, 10) as dt from yp_dwd.fact_goods_collect where end_date='9999-99-99' group by user_id, substring(create_time, 1, 10)),
-- 加入购物车次数和金额cart_count_amount as ( select count(cart.id) as cart_count, sum(coalesce(g.goods_promotion_price,0)) as cart_amount, buyer_id as user_id, substring(cart.create_time, 1, 10) as dt from yp_dwd.fact_shop_cart cart, yp_dwb.dwb_goods_detail g where cart.end_date='9999-99-99' and cart.goods_id=g.id group by buyer_id, substring(cart.create_time, 1, 10)),-- 订单宽表,为后面的下单次数和金额准备数据order_base as ( select buyer_id, create_date, order_id, order_amount, row_number() over(partition by orderd.order_id) rn from yp_dwb.dwb_order_detail orderd where orderd.is_valid=1 ), -- 下单次数和金额 order_count_amount as ( select o.buyer_id user_id, substring(o.create_date,1,10) dt, count(o.order_id) order_count, sum(coalesce(o.order_amount,0)) order_amount from order_base o where o.rn=1 group by o.buyer_id, substring(o.create_date,1,10) ), -- 支付次数和金额payment_count_amount as ( select count(id) as payment_count, sum(coalesce(order_amount,0)) as payment_amount, create_user user_id, substring(create_time, 1, 10) as dt from yp_dwd.fact_shop_order_address_detail where is_valid = 1 and pay_time is not null and end_date='9999-99-99' group by create_user, substring(create_time, 1, 10)),
fulljoin as ( select coalesce(lc.dt, scc.dt, gcc.dt, cc.dt, oc.dt, pc.dt) dt , coalesce(lc.user_id, scc.user_id, gcc.user_id, cc.user_id, oc.user_id, pc.user_id) user_id , coalesce(login_count,0) login_count , coalesce(store_collect_count,0) store_collect_count , coalesce(goods_collect_count,0) goods_collect_count , coalesce(cart_count,0) cart_count , coalesce(cart_amount,0) cart_amount , coalesce(order_count,0) order_count , coalesce(order_amount,0) order_amount , coalesce(payment_count,0) payment_count , coalesce(payment_amount,0) payment_amount from login_count lc full join store_collect_count scc on lc.dt=scc.dt and lc.user_id=scc.user_id full join goods_collect_count gcc on lc.dt=gcc.dt and lc.user_id=gcc.user_id full join cart_count_amount cc on lc.dt=cc.dt and lc.user_id=cc.user_id full join order_count_amount oc on lc.dt=oc.dt and lc.user_id=oc.user_id full join payment_count_amount pc on lc.dt=pc.dt and lc.user_id=pc.user_id)
selectdt,user_id,-- 登录次数sum(coalesce(login_count,0)) login_count,-- 店铺收藏数sum(coalesce(store_collect_count,0)) store_collect_count,-- 商品收藏数sum(coalesce(goods_collect_count,0)) goods_collect_count,-- 加入购物车次数和金额sum(coalesce(cart_count,0)) cart_count,sum(coalesce(cart_amount,0)) cart_amount,-- 下单次数和金额sum(coalesce(order_count,0)) order_count,sum(coalesce(order_amount,0)) order_amount,-- 支付次数和金额sum(coalesce(payment_count,0)) payment_count,sum(coalesce(payment_amount,0)) payment_amountfrom fulljoingroup by dt, user_id)----------------------------------参考代码----------------------------------insert into yp_dws.dws_user_daycount-- 登录次数with login_count as ( select login_user as user_id, count(id) as login_count, dt from yp_dwd.fact_user_login group by login_user, dt),-- 店铺收藏数store_collect_count as ( select user_id, count(id) as store_collect_count, substring(create_time, 1, 10) as dt from yp_dwd.fact_store_collect where end_date='9999-99-99' group by user_id, substring(create_time, 1, 10)),-- 商品收藏数goods_collect_count as ( select user_id, count(id) as goods_collect_count, substring(create_time, 1, 10) as dt from yp_dwd.fact_goods_collect where end_date='9999-99-99' group by user_id, substring(create_time, 1, 10)),
-- 加入购物车次数和金额cart_count_amount as ( select count(cart.id) as cart_count, sum(coalesce(g.goods_promotion_price,0)) as cart_amount, buyer_id as user_id, substring(cart.create_time, 1, 10) as dt from yp_dwd.fact_shop_cart cart, yp_dwb.dwb_goods_detail g where cart.end_date='9999-99-99' and cart.goods_id=g.id group by buyer_id, substring(cart.create_time, 1, 10)),-- 订单宽表,为后面的下单次数和金额准备数据order_base as ( select buyer_id, create_date, order_id, order_amount, row_number() over(partition by orderd.order_id) rn from yp_dwb.dwb_order_detail orderd where orderd.is_valid=1 ), -- 下单次数和金额 order_count_amount as ( select o.buyer_id user_id, substring(o.create_date,1,10) dt, count(o.order_id) order_count, sum(coalesce(o.order_amount,0)) order_amount from order_base o where o.rn=1 group by o.buyer_id, substring(o.create_date,1,10) ), -- 支付次数和金额payment_count_amount as ( select count(id) as payment_count, sum(coalesce(order_amount,0)) as payment_amount, create_user user_id, substring(create_time, 1, 10) as dt from yp_dwd.fact_shop_order_address_detail where is_valid = 1 and pay_time is not null and end_date='9999-99-99' group by create_user, substring(create_time, 1, 10)),
fulljoin as ( select coalesce(lc.dt, scc.dt, gcc.dt, cc.dt, oc.dt, pc.dt) dt , coalesce(lc.user_id, scc.user_id, gcc.user_id, cc.user_id, oc.user_id, pc.user_id) user_id , coalesce(login_count,0) login_count , coalesce(store_collect_count,0) store_collect_count , coalesce(goods_collect_count,0) goods_collect_count , coalesce(cart_count,0) cart_count , coalesce(cart_amount,0) cart_amount , coalesce(order_count,0) order_count , coalesce(order_amount,0) order_amount , coalesce(payment_count,0) payment_count , coalesce(payment_amount,0) payment_amount from login_count lc full join store_collect_count scc on lc.dt=scc.dt and lc.user_id=scc.user_id full join goods_collect_count gcc on lc.dt=gcc.dt and lc.user_id=gcc.user_id full join cart_count_amount cc on lc.dt=cc.dt and lc.user_id=cc.user_id full join order_count_amount oc on lc.dt=oc.dt and lc.user_id=oc.user_id full join payment_count_amount pc on lc.dt=pc.dt and lc.user_id=pc.user_id)
selectdt,user_id,-- 登录次数sum(coalesce(login_count,0)) login_count,-- 店铺收藏数sum(coalesce(store_collect_count,0)) store_collect_count,-- 商品收藏数sum(coalesce(goods_collect_count,0)) goods_collect_count,-- 加入购物车次数和金额sum(coalesce(cart_count,0)) cart_count,sum(coalesce(cart_amount,0)) cart_amount,-- 下单次数和金额sum(coalesce(order_count,0)) order_count,sum(coalesce(order_amount,0)) order_amount,-- 支付次数和金额sum(coalesce(payment_count,0)) payment_count,sum(coalesce(payment_amount,0)) payment_amountfrom fulljoingroup by dt, user_id ; -- order by dt, user_id ; 发布于 2022-07-14