Need help with evaluation/testing (probably simple)



I want to do an evaluation of a test series conducted to find out whether a
newly developed computer system can perform a given task better than a
human. However, I am somewhat at a loss as to what kind of statistical
evaluations are possible in this case. The test series had the following
properties:

(1) the procedure in question can be performed by a human (TS1)
or a computer (TS2)
(2) the result of the procedure can be expressed numerically (real value)
(3) we set a goal (= desired result value) which was to be realized in
the test series
(4) the procedure was performed by the computer and by the human 5 times
each, trying to reach the given goal, which remained the same for all
experiments
(5) the results of the 10 experiments were measured
(6) the deviation between the goal and a specific result is called error,
which may be positive or negative
(7) the error determines how a result is rated: the smaller the
magnitude of the error, the better the result

Now, on to my questions. Please forgive my lack of statistical knowledge:

(1) Many tests talk about a 'population'. What exactly would be the
population in this case?

(2) For evaluation, I thought about giving the following characteristics
for both test series (TS1 and TS2):

- the mean error
- the standard deviation of the error
- the mean of the absolute error (to avoid cancelling out)
- the standard deviation of the absolute error
(to see whether the error is systematic)

Is this choice reasonable?
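
To make the four characteristics concrete, here is a minimal sketch of how they could be computed per test series. The error values are made-up placeholders, not measurements from the actual experiments, and `summarize` is a hypothetical helper name. The standard deviations use the sample (n - 1) form, which is an assumption.

```python
import statistics


def summarize(errors):
    """Return the four summary characteristics for one test series.

    `errors` is a list of signed errors (result minus goal).
    Sample standard deviations (n - 1 denominator) are used.
    """
    abs_errors = [abs(e) for e in errors]
    return {
        "mean_error": statistics.mean(errors),
        "sd_error": statistics.stdev(errors),
        "mean_abs_error": statistics.mean(abs_errors),
        "sd_abs_error": statistics.stdev(abs_errors),
    }


# Placeholder values only, NOT real data from the test series.
ts1_errors = [0.8, -1.1, 0.4, -0.6, 1.3]   # human (TS1), hypothetical
ts2_errors = [0.2, -0.3, 0.1, 0.4, -0.2]   # computer (TS2), hypothetical

for name, errors in (("TS1", ts1_errors), ("TS2", ts2_errors)):
    print(name, summarize(errors))
```

A mean error near zero together with a large mean absolute error would point to scatter around the goal, while a mean error far from zero would point to a systematic offset.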

(3) I realize that a sample size of 5 per series is very small, but is
there a viable way of testing whether the results of (TS2) are 'better'
than those of (TS1)? I read about the Mann-Whitney test; would this be
applicable in this case?
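
To show what such a test would involve, here is a minimal sketch of a one-sided Mann-Whitney comparison with an exact permutation p-value, written in plain Python. It assumes that 'better' means 'smaller absolute error' and that the test is applied to the absolute errors; both are assumptions, not something fixed by the setup above. The function names and the data are hypothetical placeholders.

```python
from itertools import combinations


def u_statistic(x, y):
    """Count pairs (a in x, b in y) with a < b; ties count as 1/2."""
    return sum(1.0 if a < b else 0.5 if a == b else 0.0
               for a in x for b in y)


def exact_one_sided_test(x, y):
    """Return (U, p) where p = P(U >= observed) under the null hypothesis
    that the group labels are exchangeable.

    All ways of splitting the pooled values into groups of the original
    sizes are enumerated, which is feasible for very small samples
    (5 + 5 values give 252 splits).
    """
    pooled = list(x) + list(y)
    observed = u_statistic(x, y)
    at_least_as_extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if u_statistic(gx, gy) >= observed:
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / total


# Placeholder absolute errors only, NOT real data from the test series.
ts1_abs = [0.8, 1.1, 0.4, 0.6, 1.3]   # human (TS1), hypothetical
ts2_abs = [0.2, 0.3, 0.1, 0.4, 0.2]   # computer (TS2), hypothetical

# Large U means the computer's absolute errors tend to be smaller.
u, p = exact_one_sided_test(ts2_abs, ts1_abs)
print("U =", u, "one-sided p =", p)
```

With 5 results per series, the smallest one-sided p-value this procedure can produce is 1/252 (about 0.004), reached only when every result in one series beats every result in the other, so the test has little power at this sample size.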

thx
Charly
