Oct 26, 2009

The project wiki - a cost reduction tool

Some readers will be familiar with TED: Ideas Worth Spreading, a series of talks on just about everything worthwhile in Technology, Entertainment and Design. I recently revisited Yochai Benkler's 2005 talk on the new open-source economics, in which he explains how collaborative projects like Wikipedia and Linux represent the next stage of human organisation.

The principle discussed is that it is more productive for a large number of people to work collaboratively on producing content than it is for a single individual to do so.

In Business Intelligence projects there are a number of formal documents that need to be produced to satisfy constraints external to the project (e.g. company business processes). Data Warehousing projects often fail to satisfy two masters: firstly they fail to record information and knowledge pertinent to the ongoing success of the project, and secondly they produce too many 'formal' documents that consume the time of key individuals but are of little value to the project itself.

Data Management & Warehousing, my company, uses something called Project Services that combines Trac, SVN and the ideas of the Data Warehouse Documentation Roadmap to provide a Wiki, Version Control, Ticketing and a Project/Team Website. Bringing these together and using them optimally is a micro-example of the concepts in Yochai Benkler's talk.

A data warehouse team can quickly and efficiently build wiki pages that record much, if not all, of the information that is internal to the project. This can include the business definition dictionary, various pieces of the requirements, architecture, analysis, standards and definitions, etc. What is more, in this environment it is possible to have links between wiki pages and documents and between wiki pages and the source code itself. This means that users of the system can quickly and intuitively navigate through information and, where there is missing or inaccurate information, they can fix it themselves. In the case of Project Services there is also a built-in ticketing system for tracking tasks, risks, issues and enhancements, which completes the project management and governance aspects. Because this approach makes things easier for users, they are more willing to contribute to the overall solution and to follow the required processes.

Implementing and using a tool such as Project Services is key to creating a BI-on-Rails solution because, whilst strict version control and issue management can be enforced, it also allows agile processes and strong internal communications to work within the team.

Oct 22, 2009

Oh what a tangled web we weave, when first we practise to deceive!

The deception in our case is the accuracy of the data and the return on investment of the data warehouse solution. Too often projects make compromises in the implementation of a business intelligence solution that create massive cost and user dissatisfaction downstream and lead to the failure of BI projects.

For example:

* In project management, setting unrealistic or undeliverable targets, opting for tactical solutions that soon overtake the main project, and failing to communicate to senior management and business users the issues that will affect delivery timescales and costs.

* In the requirements phase believing that you know better than the user and not getting their sign-off that they understand and agree with the requirements.

* In the analysis phase the failure to correctly identify the master sources of information and to do sufficient work to understand how the data is stored in the source system and what data is needed to meet the requirements.

* In the design phase the expediency of allowing the developers to 'get on with their job' by not thoroughly validating the design and checking the sources to ensure that the design meets criteria such as timeliness, reliability and accuracy.

* In the build phase by coding to the bare minimum to get the job done and not to a standard that ensures it will run successfully time after time.

* In the testing phase by testing only the standard data and volume data, rather than also testing all the data boundary conditions that might occur.

* In the data quality aspects of the programme where data is fixed on the way in to the system or after it has been loaded because it is 'too difficult' to fix it back at the source system.

These are only a few examples but each of them makes the final solution more difficult (a tangled web) and therefore more costly to operate and maintain. If you are in a project where doing the right thing is compromised then remember that the project will have to pay for these 'efficiencies' later. Project managers should always strive to improve any project lifecycle and deliver on time. Too often, however, it becomes expedient to bury the truth and deliver anything to meet an arbitrary date that, in the long term, leaves the business user unsatisfied.

For more on this subject have a look at:

* Black Swans and White Elephants - RoI in Business Intelligence

* Getting to NO

* Technical Debt

By the way the quote is often mistakenly attributed to William Shakespeare but was written by Sir Walter Scott (1771 - 1832) in his poem Marmion - another data quality problem!

Oct 14, 2009

BI Convention over Configuration

The convention over configuration discussion when designing a data warehouse often leads to fanatical discussions by technical people over the 'best' approach to do things.

Convention can be used to define the standard way to design a data model (see Process Neutral Data Modelling) or to implement an ETL transformation (for example a reducing sets approach) amongst other things, but inevitably someone will claim to be able to make an individual element 'better' by configuration.

For example if we have an ETL mapping and always approach handling data change in a certain way then unless we are really sure of the benefits we should stick to the convention. In our example a DBA offers to make a single ETL mapping faster. The DBA can halve the time - so what is the harm?

Firstly if the new algorithm is as complete but just faster then there is no harm. In fact I would encourage you to update your conventions and re-factor your code so that all of the mappings exploit the algorithm.

But what if the faster algorithm works because it omits some element? Maybe it doesn't need to check all the columns, perhaps there is an assumption of referential integrity, or it joins two steps together where other tables need the intermediate results.

This is where the cost creeps back in. In the future the omitted column is needed, referential integrity no longer holds, the intermediate step is required, etc. At best this causes unnecessary rework. At worst it causes silent corruption of the data that causes a crash or takes months to unravel. It also means that, because the code does not conform to the standard, the person fixing it has to work through the code to understand it first and, assuming they understand it perfectly, only then can they fix it - all of this taking more resource time and cost.
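
To make the risk concrete, here is a minimal Python sketch. It is not taken from any real ETL tool: the customer row, its column names and both functions are invented for illustration. It compares a conventional change check that looks at every column with a 'faster' bespoke check that only looks at one.

```python
# Hypothetical illustration: the column names and both functions are invented.

def changed_by_convention(old_row, new_row):
    """Conventional check: a row has changed if any column differs."""
    return any(old_row[col] != new_row[col] for col in old_row)


def changed_bespoke(old_row, new_row, watched=("name",)):
    """'Faster' bespoke check: only compares the watched columns."""
    return any(old_row[col] != new_row[col] for col in watched)


old = {"customer_id": 1, "name": "Acme Ltd", "postcode": "AB1 2CD"}
new = {"customer_id": 1, "name": "Acme Ltd", "postcode": "ZY9 8XW"}

print(changed_by_convention(old, new))  # True: the postcode change is detected
print(changed_bespoke(old, new))        # False: the postcode change is silently missed
```

The bespoke version does less work per row, but the day the postcode starts to matter it drops changes without raising any error, which is exactly the kind of silent corruption described above.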

The convention over configuration option does not mean that there is only one way to do something, nor does it mean that it is optimal for the individual case - it means that there is a defined set of algorithms for a given process and that these are collectively optimal for the solution - a case of looking at the whole picture rather than individual elements. The process neutral data modelling technique will often create a data model with over 1000 tables, but there are only 10 named approaches to the data modelling issue. How many ETL algorithms will you need? My guess is that it is in the same order of magnitude, about 100:1 against the total number of transformations. What is more, the number of ETL transformation algorithms required drops dramatically if the data model is uniform because it is easy to describe a standard ETL algorithm for a standard data model.

Returning to the start of this article - your DBA can do this mapping better? But in what sense? Is it more cost-effective, more maintainable? Easier to understand, or cheaper to maintain? The decision to go the bespoke route for this one case should balance these factors and their costs against the necessity of the performance gain.

Experience tells me that in 99.9% of cases bespoke is not worthwhile in the long term.

Oct 13, 2009

We've got a new puppy - we've called him ETL!

Monday ...
Our house is full of optimism today - we've been out and spent an absolute fortune on a pedigree pup. We were really impressed by the salesman at the shop. He offered something to each family member. Mum (our family manager and my wife) was sold on the cost efficiency. We won't need to install an alarm system to protect our house (we live at Data Quality Villa, on Single Version Of The Truth Avenue) and he can be taught to fetch and carry just about anything we need. Our daughter and oldest child Elizabeth-Ann, known as Busy-Ann to her friends, liked the way it was easy to train him to do exactly what she wanted. My son Dev is just grateful to have a new plaything. He's been talking about going on long hot summer days hacking through the countryside - so that's him out of my hair. And me? Thomas Archibald (T.Arch. to my friends), well I'm pleased that all the research I did on the web beforehand has paid off - the salesman told me I had paid the right price for the right pup. I must admit that I enjoyed all those trips off to Dog Shows to evaluate different breeds and meet other owners - who would have thought you would have to travel to so many exotic places to see different dogs?

Tuesday ...
So Mum's a little bit upset this morning - I don't know why - I told her there would be a learning curve with this breed before he did anything - but that it would be worth it in the end. She said I was just looking for the latest accessory, but a dog is not just an accessory, it's for life! Anyway Dev is cleaning up the little accidents in the utility room (you know, where all the machines are kept) and says that he is starting to get ETL to do what he wanted. He's mentioned that he has started to learn the same language as ETL - I don't know why they expected it to be as easy as dragging a toy or dropping a bone but there you go.

Wednesday ...
ETL is settling in well - he's taken over an entire corner in the utility room and when he lies in his bed his paws and ears hang over the edge looking cute. Mum was moaning again about the little accidents. She says it's taking too much time but I don't see the problem - Dev and his girlfriend DeBrA clean them up so Mum really doesn't have anything to worry about. It was quite funny when I got home tonight: Dev had all his mates around, DeBrA was there along with the twins Si Admin and Nat Admin eating cold pizza, and they were all chatting about how ETL just wasn't very quick at learning new things or at fetching things when he has learnt them. The good news is that ETL is finally fetching something - he brought me my fluffy slippers when I got home tonight.

Thursday ...
Mum is still unhappy about the wasted time looking after ETL - so I popped in to the shop where I got him and asked for some advice. They sent me to the local vets - who can cure anything, the salesman said. I popped in on the way home and met a couple of the firm's partners. 'SI Partners' is a big company with 'global reach' and a 'local presence' so I was delighted that they said they could help. I've had to take out an insurance policy with lots of small print about exemptions and things that aren't covered by the standard contract but that's normal for these sorts of things. One of the partners is coming round tomorrow to do a review and let us know what we need to do.

Friday ...
Mum's still unhappy (how did we know that was coming?) but the partner came around early and set our minds to rest. We're missing the obvious, we didn't get the accessories - there is a complete range of toys, bells and whistles made especially for this breed! We got a leash to keep him under control when we bought him but we haven't used it (according to the partner that is why ETL doesn't have any discipline) and we didn't think the extras would be needed but there you go, the only downside is that the extras cost almost as much as the dog did. The partner is also sending around a couple of practice nurses to help keep an eye on things. It would all have been expensive but for the call-off insurance (he said something about the extras and the nursing not being covered but we will sort that out).

Saturday ...
Still not got ETL trained. Busy-Ann is unhappy because a simple change seems to cause so many problems. ETL could fetch my old slippers but doesn't seem to understand that they are now redundant as I got some new ones for my birthday and so I really want him to fetch my new ones. The old ones were furry and he seemed to like those but he doesn't get on with these new leather ones and has chewed the corners off the toes. It's pretty bad news as fetching the slippers is the only thing we've got him to do so far and I felt the toes were an integral part of the slipper!

Sunday ...
Didn't sleep too well last night. There was a lot of noise in the middle of the night and I went downstairs to find Dev howling at the moon in the garden. I asked him what he was up to and he said that since he had learnt ETL's language he found it easier to bark himself! Eventually I persuaded him to go to bed. I overslept because of the disturbance and when I got downstairs I found a note from my wife to say she had left and taken Dev with her. She said that since we got the dog our house had become the dirtiest on the street, not good for a house called 'Data Quality Villa'. She was going somewhere where they would listen to her and appreciate Dev for his new-found skills. I don't know what the divorce settlement will be like but it is bound to be expensive. Busy-Ann has stayed with me but she won't have anything to do with ETL. Oh! the dog, that's right, that's why you are reading this - he's fine, but you would be too if you had round the clock nursing from SI practice, he's spread out too and has taken over the whole of the utility room. If I didn't know better I would say all those extras and no exercise is making him a little bloated and he is still fetching the wrong slippers.

Monday ...
New week, fresh start. I got a new girlfriend - you might think it is too soon but I was lonely - she's not permanent or anything and coincidentally she's got a son called Dev too. They've moved in and are helping out with ETL. Between the nurses and Dev 2 (our little joke - he's the second Dev we've had in the house you see) ETL has at least got round to bringing me my new slippers! I've asked a few friends about their dogs. Most of them have pedigree dogs too, some have tried several breeds over the years, most have had about the same results. It just goes to show that it was nothing we did that was wrong, it's just how it is with dogs. There was one guy who had a simple mongrel which he called Michelle Script - a strange name if you ask me - but he was delighted. He said he started with lower expectations, looked after the dog well and got lots of love and devotion back - but I said that I couldn't be seen with a mongrel, what would my friends at the Analyst Club think? I've decided to have a look at other breeds again, I'll let you know how I get on.

Oct 11, 2009

Version Control on Rails

In my consultancy assignments, I usually work in large organisations that have outsourced IT departments and very formal processes for everything. One aspect of this is version control.

In such an organisation every document contains a long table near the beginning with a row for every change, who made the change and when it was made and of course there will be a version number for every change. Usually the version number has at least 2 parts.

Also, the file name will have the version number in it, such as: "Some Formal Document v1.3a.doc".

Traditionally (over the last 15 years or so) the files are stored in a Windows share. There will also be copies on local drives on various individuals' PCs. How often do we find different people have different versions and they all believe them to be the latest?

Now, many organisations are waking up to the benefits of version control systems. However, they usually do two stupid things. The first is that they buy a version control system and the second is that they use their expensive version control system as if it were a Windows share.

You don't need to buy a version control system. You almost certainly don't need to buy a "Software Configuration Management" system, nor do you need to buy an "Application Lifecycle Management" system. The best version control systems are free and they are called Subversion and CVS. Using these tools is easy and simple. Discipline is necessary to get the best out of them, but you don't need a massive amount of training. If you are really keen, you could read Pragmatic Version Control using Subversion. (You are really keen, aren't you?)

While you're reading the book you could minimise your chances of going off the rails by following the simple recommendations listed below:

  • Let the version control system do the work!
  • Remember that the version in the version control system is the master version; any file on your local disk is not the master until you check it in to the version control system.
  • Do not change the name of a file from one version to another; all versions of one file go in the same folder, under the same name (then you can see the change history).
  • If you do need to change the name of a file or folder, do it in the version control system client, not on the working file on your local disk. (If you do it in the version control system it will change the working file for you and everything will stay in sync.)
  • Never, never, never put a version number, or date, or time in a filename - this is a version control system, not a Windows share!
  • Check in atomic units - one component per file - do not check in zip files or tar files.
  • The version control system version information is trusted over any version information recorded inside a file - so do not record it inside the file.

So what has this got to do with Rails? Well, there is the DRY principle, to start with:

Every piece of knowledge in the system should be expressed in just one place.

The version control system keeps a log of your versions, so don't keep an incomplete and out-of-date one in the document - delete that big table from the front of your document - you really do not need it. And do not put the version number in the document or in the file name. This is just repeating yourself and if you do that you will end up with many conflicting versions of the truth.
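
The file-naming rule is easy to check mechanically. The following is a minimal Python sketch under stated assumptions: the regular expression is my own guess at what a version number or date in a file name looks like (for example "v1.3a" or "2009-10-11"), and the function names and sample file names are invented for illustration. It is not part of Subversion or CVS.

```python
import re

# Assumed pattern: "v" followed by digits (v2, v1.3a, V10.1) or a YYYY-MM-DD date.
# The lookarounds stop ordinary words that merely contain "v" from matching.
VERSION_IN_NAME = re.compile(
    r"(?<![A-Za-z0-9])v\d+(?:\.\d+)*[a-z]?(?![A-Za-z0-9])|\d{4}-\d{2}-\d{2}",
    re.IGNORECASE,
)


def has_version_in_name(filename):
    """Return True if the file name appears to carry a version number or date."""
    return VERSION_IN_NAME.search(filename) is not None


def offending_files(filenames):
    """Return the file names that break the 'no version in the name' rule."""
    return [name for name in filenames if has_version_in_name(name)]


names = [
    "Some Formal Document v1.3a.doc",
    "Some Formal Document.doc",
    "Design_v2.doc",
    "Report 2009-10-11.xls",
]
for name in offending_files(names):
    print("Rename before check-in:", name)
```

A check like this could be run over a working copy before check-in; the pattern will need tuning to whatever naming habits your organisation has actually picked up.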

Every time someone works with the wrong version of a document, they waste company resources. Don't tolerate it.

Oct 8, 2009

Agile development of a BI application

Wandering around the web I found a presentation by Soren Burkhart of Hawaii Business Consulting, LLC, given at the Rails Conference in Portland, Oregon (May 2007), that describes a classic agile approach to developing a Business Intelligence application for the Office of Foreign Assets Control ("OFAC") of the US Department of the Treasury.

It demonstrates the collecting of data from sources and manipulating it into a presentation format. Along the way he interacts with the customer to determine look and feel and builds test cases for now and to allow re-testing when the sources inevitably change. A great little demo of fast and maintainable (rather than quick and dirty) BI reporting development - all in just 45 minutes.

It is an enclosure to this article or you can download it from http://www.hawaiibcllc.com/BIWR.pdf

Can Business Intelligence Be Agile?


The agile manifesto and its underlying principles set out four key values for developing software solutions:

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

If an organisation is willing to accept these values for software development, then how much more important should adoption of them be for the development of business intelligence solutions upon which critical business decisions are made?

Furthermore business intelligence projects do not see themselves as 'developing software' because they use tools to generate ETL and tools to generate reports. Management teams also shy away from the idea that their business intelligence teams are developing software because they have long since moved away from in-house development to implementing packages. This also goes some way to explaining why management are so keen to buy Business Intelligence packages.

The reality is that ALL business intelligence projects are producing a massive amount of software. This software mainly consists of interfaces between systems (ETL) and interfaces with users (the reporting tools). Because the system is not aligned to a single business process, individuals use the system in ad-hoc rather than structured ways. The BI system is also the recipient of information from all the source systems and therefore is constantly subjected to change that is outside its control.

However the organisation's business intelligence system is built, the reality is that there will be millions of lines of code, massive amounts of change and demanding users doing things in unexpected ways.

The question that should be asked is not 'Can Business Intelligence Be Agile?' but 'Developing Business Intelligence solutions requires an agile process - can my company organise itself to successfully use agile methods to develop the required solution?'

