Re: Lazy summer project: build a basketball app.



Bob Badour wrote:
kvnkrkptrck@xxxxxxxxx wrote:

Bob Badour wrote:

kvnkrkptrck@xxxxxxxxx wrote:


All, it's been a long time since I've done a from scratch database
design. What follows is a shameless appeal for anyone with some
database design expertise, a few extra minutes, and desire to help out
- to offer any criticisms/advice they see fit. I'll conclude the
opening paragraph with an apology to those without said expertise, few
extra minutes, or desire to help; for the time wasted on reading the
opening paragraph (and a suggestion to read no further/waste no further
time).

My dad is a huge D3 college basketball fan, and during the season, he
attempts to keep daily tabs on the entire division (hundreds of teams).
Last year, he approached me for help. He'd been using dozens of Excel
sheets to track the games, and was having tremendous difficulties
keeping up. Data entry was a nightmare - all games needed to be
modified in two places (once for each team). Due to lack of data
integrity, his sheets were riddled with errors/typos. And reporting
was even worse - he spent hours each weekend manually sorting sheets,
summing functions, etc., in order to answer such simple questions as
"Which team has the best record", and "Next weekend, are there any
undefeated teams playing each other".

Anyway, at the time, I helped him out by writing some ugly Excel macros
for data synchronization (allowing him to enter a score once and have
it posted in both places) - but it was conditioned on his promise to
let me design a relational database for the next season.

So here I sit - it's been a long time since I've done a from-scratch
relational design, but it's a slow afternoon, so I figured I'd start
some ER planning.

I start with a Team entity, with each team having a division attribute
(1, 2, or 3), a conference attribute, a college name, a shorthand name
/ abbreviation, and a city/state (pk will be division, conference,
college name). I can hand this entity over to my dad right now, and he
can start populating it. But I'm getting a bit stuck on the next step.


At first, I thought, I'll need a simple Game relation. But I think
this will need to be a super relation of sorts - because certain
attributes of tournament games are different than non-tournament games
(e.g. existence of a home/away team; existence of the name of the
tournament; a tournament round number; etc.).

Another odd characteristic of the Game relation is that I can't come up
with a decent natural key while retaining symmetry of opponents (team A
vs. team B == team B vs. team A).
If I recall correctly, in a game between some team1 and team2, both
(Game Date, Team1) and (Game Date, Team2) are viable candidates - no
team will ever play two games on the same day. But I'd have to break
the symmetry in order to get a natural key (I think).

So, I'm now leaning towards introducing an artificial Game entity with
a surrogate Game_id, e.g. <Game_id(pk), scheduled date>, and also
introducing a Opponent relation <Game ID (pk / fk on Game), Team (pk /
fk on Team), Points scored, ...>. I'd have to enforce some
application-level constraints (e.g. each game_id must exist exactly two
times in Opponent (if at all)), but it does keep a nice symmetry.

Anyway, I've got ample time to think this one through - and my thought
is "if I build this right the first time, I can design some front-ends
for data loading, some views for data extraction, etc..., and my dad
might have a nice toy to play with for years to come".

So, having said that - I'm open to any design advice or criticisms
anyone may have to offer. My best idea right now:
Team <division, conference, college, a shorthand name, city, state>
Game <Game_id (pk), Date>
TourneyGame <Game_id (pk - fk on Game.Game_id), Tournament_name,
round_number>
NonTourneyGame <Game_id (pk - fk on Game.Game_id)>
TourneyOpponent <Game_id (pk - fk on TourneyGame.game_id), Team_fk
(pk), Points_Scored>
NonTourneyOppenent <Game_id (pk - fk on NonTourneyGame.game_id),
Team_fk (pk), Home_Away_Ind, Points_scored>

What about forfeits? Or does college basketball not have forfeits?


I don't think forfeits are an issue - although games can certainly be
cancelled or rescheduled due to inclemental weather (one more argument,
I suppose, for avoiding Game_Date as part of a key - if a game between
A and B changes from the 1st to the 2nd of Sep, is it a different game?
I think not.). I'll ask to be sure.


Do all teams have a home court? If so, when a tournament game is played
at the home court of one of the teams, is that team not a home team? Is
a team ever considered the home team when playing at a different court?

Does college basketball have anything like a pre-season or exhibition game?


In his spread***, he used "A", "H" or "T" for this column... I don't
see a single instance of "AT" or "HT", so my guess is that he'd
consider a team as neither home nor away, even if it was playing in a
tournament hosted on its own court. I'll ask him, though, if he'd like
the ability to specify a team in a tournament as "home" if it is
hosting the tournament and away otherwise.

I would not be so hasty to assume the spread*** reflects what your
father wants to enter. It could be he set that up for entry because he
did not have an easy way to derive it. If you can derive it, you will
save him having to enter it.


What are the requirements your father has?

The requirements are inferred from the types of facts he included on
his spreadsheets from last season. My idea was to build a basic model,
asking questions as I go along. But my "high-order" goal is to deliver
something that lets him track the same facts he'd tracked last season,
but taking advantage of a relational foundation so I can quickly and
easily build on it data entry forms, queries, reports, etc.

I wion't get into how long it took me to write the stupid error-prone
score synchronization Excel macro, but all I could think was, "This
should *not* be this hard"...

Perhaps you should question your father on the requirements he tried to
meet with his spread*** before making all kinds of tradeoffs and
compromises. You might be able to compromise less and deliver more.

In other words, put the horse before the carriage. Check. I'll
actually be visiting him this weekend, so I suppose I'll plan to bring
a notebook. My current understanding of the situation is that the
spreadsheets haved served as a fact base to help him
1) make insightful comments in discussions with other D3 fans (online
or other local enthusiasts)
2) identify "interesting" upcoming match-ups (between undefeated teams,
or between two conference leaders, etc.)
3) identify upcoming local matchups (which he may go to see in person)
4) monitor teams that he has some "link" to (knows the coach, knows
someone who went there, used to coach there, etc.)

Each year, he has made his spreadsheets a little more elaborate than
the prior year - but last year was the first time he attempted to track
the entire division. I'm not sure if there is more information about
games/schools/etc. that he'd care to track, but that will be a big part
of what I ask him over the weekend. I know that he obtains most of his
schedule information from schools via their websites, about 2 - 4 weeks
before the season, in a variety of formats, and enters them in manually
(using one spread*** per conference)... he is able to eliminate
duplicates within a conference, but all interconference games are
tracked in two places. These schedules, I believe, either list a
specific opponent (and exact date, home/away, etc.), or the name of a
tournament (and the start date of the tournament). I think he entered
tournament games with a "null" opponent, and then filled in opponents
after the fact (as well as inserting the second and third round games).


"Conference games" may just be inferred from the fact that the two
teams are in the same conference.... I'll have to ask. I'll also ask
if he would like to have special tournament tracking flexibility - at
the very least, it's an interesting challenge on its own right to
represent a tournament in a way that captures: "Who played who in
which round".

Druing the season, I believe there is a website that does a fair job of
collecting the scores (d3hoops.com or some such), and he gets the
majority of scores by cutting/pasting scores from there into his
spread***. As far as I know, the site only posts teams, scores, and
overtimes; but I can check on whether more information is available.
Once I have a data model in place, I can write a little parse/load
script to allow for easier data loading.

Anyway, good pointers, I'll report back when I get some better spec's
(based on questions like, "is there any information you'd like to
store, but don't"; "how do you want to handle data entry? if by loading
a file, how should missing info, conflicting info, duplicate entries,
typos, etc be handled". Of course, a lot of what he'll tell me will
probably have more to do with front end, but I'll try to get as much
feel for the underlying data models and constraints as I can.

.


Quantcast