Re: Normalization and AVG across columns
- From: Bernard Peek <bap@xxxxxxxxxx>
- Date: Wed, 1 Mar 2006 18:16:12 +0000
In message <1141227436.798705.126830@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>, brianc@xxxxxxxxxxxxxx writes
I have a table that holds the employee id, a job name and then 5
columns (Proposal Phase Rating, Preconstruction Phase Rating, etc.)
that hold a number rating 1-5. I've read that this design may not be
considered "normalized" due to the fact that I will need to get an
average (Job Average) across the columns. My question(s) is:
This is a bit of database theory that covers normalisation (I'm in the UK, I use UK spelling.) Forget that you are using Access for a while, because Access 12 might have new features.
You've done the difficult part and realised that if you store the average value in that table then it won't be normalised. The next thing to decide is whether that really matters to you. Sometimes it's OK to have denormalised data, as long as you recognise that and are prepared for the consequences.
Normalisation can cause problems in any of the fundamental data operations; add, modify, delete. If you understand what can happen, and are prepared for it, denormalisation is OK. It usually requires some extra programming to keep the dependent data synchronised.
In this particular instance a delete operation doesn't require any special action because you can delete the individual and average data simultaneously. They are all held in the same record, there's no risk.
Adding a record has a risk, you need to be sure that when the record is created the average field is filled with the correct value. You also need to consider pathological conditions. What is the average if one of the contributing values is null? Is this an error? Should you use the average of the values you have, or return a null average? It's your data, you need to answer these questions before you commit to a database design.
Amending a record has similar risks. If you amend one of the contributing figures you have to be sure that the average is recalculated and stored before anyone else can read the wrong data. Just as when adding a record you need to consider what to do with null data.
How you resolve these issues depends on the database implementation and the performance requirements.
You could store only the contributing figures and calculate the average on the fly each time. You keep the database normalised but at the cost of having to repeat the same calculation each time you read the data. If the data is written and never modified this is wasteful. (If your database supports them you could create a view (a Query in Access) and include the average as a calculated field.)
If your database supports them you could use triggers to recalculate the average and store it whenever a record is added. If it doesn't (The Jet database in Access doesn't) you can write some middleware code and have your applications talk to the middleware instead of talking direct to the database.
1) Should I break this table down differently, having 1 column for
EmpID, 1 column for Job Name, 1 column for "Rating Type" and 1 column
for "Rating Score"? If I do alter the table to reflect that changes in
the previous sentence, doesn't that provide for extra data entries (ie
Job name and EmpID have to be entered in each time a phases score is
entered rather than each time a new job is entered).
2) Unrelated to the questions above, I have a table with more than 40
columns called Emp_Main. This table holds 1 record for each employee
and includes emergency contact info, benefit info, etc. There will be
less than 300 records in this table. Is it better for me to break the
table down so that Contact Info, Benefit Info, etc are their own table?
Is it better to have 1 large table or many small ones?
If you have sensitive data like emergency contact data then you need to keep it safe. That's probably easier if you have fewer tables. It's also easier to extract the data your management need to have at home in case the office burns down.
Database design is partly driven by the theoretical (conceptual & logical) structure of the data. You need that before you decide what to store. It's also driven by practicality, your choice of database software and the purpose of the data. There are usually several different ways of building the database from a logical structure. Different structures optimise different things, database designers should know what needs to be optimised in any particular situation.
--
Bernard Peek
London, UK. DBA, Manager, Trainer & Author.
.
- References:
- Normalization and AVG across columns
- From: brianc
- Normalization and AVG across columns
- Prev by Date: Re: chart object drives me nut
- Next by Date: Re: Opening folder when the folder name has a comma in it.
- Previous by thread: Re: Normalization and AVG across columns
- Next by thread: Delete Posts
- Index(es):
Relevant Pages
|