USA LaoTu: the No. 1 platform for switching to a high-paying data career

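If you have the will to change and are ready to act, come here and you will change! Thousands of bright Chinese professionals have taken this shortcut into high-paying jobs at big companies! => Student Messages => Topic started by: jingzzsaccount on April 07, 2013, 10:11:31 AM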

Title: Timeline of building a datawarehouse
Post by: jingzzsaccount on April 07, 2013, 10:11:31 AM
Hi Laotulaoshi and experienced BI classmates,

I started work 4 weeks ago. I like my coworkers and I like my work. I have spent the past 4 weeks familiarizing myself with my company's database. Now they want me to build OLAP for them, and I need to come up with a timeline next week. The problem is that their OLTP data are really dirty: there are a lot of bad data (e.g., duplicate records, lots of typos, missing values, and some attributes with inconsistent referential integrity), and I am not sure how much time is needed to clean the OLTP data. Also, I am not sure how long it will take to gather all the business requirements from the different users. Should data cleaning and meetings with business users be conducted in parallel or sequentially? The company also has a lot of server problems; there have been many disruptions in access to programs and files at work, which makes it even harder to predict how long it will take to complete the first cube.
I would appreciate it if any of you could share your experience.

Thanks in advance,
Jing 
Title: Re: Timeline of building a datawarehouse
Post by: guoz100 on April 08, 2013, 04:26:36 AM
Should data cleaning and meetings with business users be conducted in parallel or sequentially?

Well, that depends on the priority, and the priority is decided by your boss. If a user has a reporting project with an upcoming deadline, then it is more important to meet with that user, clean the data for his report first, and finish the project before the deadline. So the first thing I would do is collect all the deadlines from the different users and projects, then give the deadline list to my boss. Let him know that there is a lot of dirty data in the database and that we need to clean those tables first to produce the reports. Let him know how long I think it will take to clean the dirty data. (You can estimate the time by running some tests.) Then let him decide what should be done first, and ask him to decide whether you should do it sequentially or in parallel.

If there are constant connection disruptions on the server side, then you will need more time than you estimated to clean the dirty data, and you need to let your boss know that too. He needs to know that it is not your fault and not due to your efficiency. Then give him an updated time estimate for completing the work. Ask for an extension if there is no way to avoid it.

Basically, don't panic. So far I don't see anything that is your fault.
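
One way to run the kind of test mentioned above is to profile a small sample of the tables and count the problems before quoting a cleaning time. The sketch below is only an illustration in Python: the sample rows, the field names, and the helpers profile and orphans are all made up, and a real check would read from the actual OLTP tables.

```python
from collections import Counter

# Hypothetical sample rows; in practice these would be pulled from the OLTP tables.
customers = [
    {"id": 1, "name": "Ann Lee", "city": "Boston"},
    {"id": 2, "name": "Bob Wu", "city": None},       # missing value
    {"id": 3, "name": "Ann Lee", "city": "Boston"},  # duplicate of id 1
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 9},  # orphan: no customer with id 9
]


def profile(rows, key_fields):
    """Count duplicate records (judged by key_fields) and rows with missing values."""
    seen = Counter(tuple(r[f] for f in key_fields) for r in rows)
    duplicates = sum(n - 1 for n in seen.values() if n > 1)
    missing = sum(1 for r in rows if any(v is None or v == "" for v in r.values()))
    return {"rows": len(rows), "duplicates": duplicates, "missing": missing}


def orphans(child_rows, fk, parent_rows, pk):
    """Count child rows whose foreign key has no matching parent key."""
    parent_keys = {r[pk] for r in parent_rows}
    return sum(1 for r in child_rows if r[fk] not in parent_keys)


report = profile(customers, ["name", "city"])
report["orphan_orders"] = orphans(orders, "customer_id", customers, "id")
print(report)
```

Counts like these do not give a time estimate by themselves, but cleaning one sample table and timing it gives a rate that can be scaled to the rest.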
Title: My research. Re: Timeline of building a datawarehouse
Post by: dandan2 on April 08, 2013, 07:31:18 AM
I recommend that Jingzz read the following "DW in 4 Steps" link:
http://dwjunkie.wordpress.com/2011/06/07/a-data-warehouse-in-4-steps/
It has many useful points, for example:
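1) Dimension Modeling is like Mind Mapping: arrange a WORKSHOP with the business department users first!!! (This is not something one phone call or one meeting can settle. It takes a WORKSHOP: working with the users thoroughly, on multiple layers, until you understand all their requirements, i.e., everything they will want to analyze in the future.) So it is no exaggeration to say BI = MBA + DBA! Don't look at the source system (don't let it constrain you); focus on "how do I want it to be?"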
2) Star schema generation: here comes the ERD! A good point: build a PROTOTYPE for the business users so they can look at their data warehouse at an early stage (keep them involved!). That way, reporting to your boss will go smoothly.
3) Data MAPPING (in contrast to the Mind Mapping of step 1): be sure to consult the Kimball Group's DDD Worksheet (Detailed Dimensional Design Worksheet). It gives you everything you have thought of and everything you haven't (each sheet is one table; see attached example).
This step is the most difficult because different names can mean the same thing; data types differ; data sit at different levels of aggregation; some data need to be calculated; some data may not be in the source system... (List all the difficulties you can think of and ask the others on your team roughly how long each one would take; then you will get an estimate of the time frame. Side note: even when consultants quote a client's job, they need to follow the company timeline sheet, so check with your company/director/coworkers to see if they have a timeline idea, or better yet documentation, for the steps that you come up with.)
4) Build your LEGO: cubes and reports. Steps 3 and 4 can run in parallel.
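
The estimating approach in step 3 (list each difficulty, get a rough duration for each, add them up) can be sketched in a few lines of Python. Everything here is an assumption for illustration: the task names, the day counts, the 25% buffer for server disruptions, and the helper estimate_days are made up and are not from the linked article.

```python
# All task names and day counts below are hypothetical placeholders.
sequential_tasks = {
    "workshops with business users": 10,
    "star schema and prototype": 5,
}
# Steps 3 and 4 can overlap, so only the longer of the two adds to the timeline.
parallel_tasks = {
    "data mapping and cleaning": 15,
    "cubes and reports": 8,
}
DISRUPTION_BUFFER = 0.25  # assumed share of extra time lost to server outages


def estimate_days(sequential, parallel, buffer):
    """Return (base, padded) day counts for the whole project."""
    base = sum(sequential.values()) + max(parallel.values(), default=0)
    return base, base * (1 + buffer)


base, padded = estimate_days(sequential_tasks, parallel_tasks, DISRUPTION_BUFFER)
print(f"base: {base} days, with buffer: {padded:.1f} days")
```

Treating steps 3 and 4 as fully overlapping is optimistic when one person does both; in that case it is safer to move them into the sequential list.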

Title: Re: Timeline of building a datawarehouse
Post by: jingzzsaccount on April 08, 2013, 05:41:00 PM
Thanks Dandan and Guoz for your suggestions!

DanDan- thanks for the link, you are very resourceful! I will study the topic more carefully.

Happy Monday!
Jing
Title: Re: Timeline of building a datawarehouse
Post by: s2012 on May 22, 2014, 08:18:26 PM
Where can I download the "detailed dimensional design worksheet" provided by the Kimball Group?

Thanks!
Title: Re: Timeline of building a datawarehouse
Post by: Sweetnothing on May 22, 2014, 08:36:53 PM
http://www.kimballgroup.com/data-warehouse-business-intelligence-resources/books/data-warehouse-dw-lifecycle-toolkit/

chapter 7