Advertisement
If you have a new account but are having problems posting or verifying your account, please email us on hello@boards.ie for help. Thanks :)
Hello all! Please ensure that you are posting a new thread or question in the appropriate forum. The Feedback forum is overwhelmed with questions that are having to be moved elsewhere. If you need help to verify your account contact hello@boards.ie

Modeling a Sports Statistics Database

Options
  • 01-04-2014 3:48pm
    #1
    Registered Users Posts: 400 ✭✭


    Hi All,

    I've been a software engineer for many years now but my work has almost entirely been frontend based. As such, I've actually had very little dealings with databases. The few experiences that I have had with them I've merely been querying an existing DB.

    Recently I've decided to create an Android application for recording the statistics of a sports team. I'm trying to keep it pretty generic but Soccer and GAA are the real candidates for use.

    As part of this I'm trying to create a suitable database model to organize this information but to be honest I'm a little lost and I'm hoping some of the more learned posters can have a look and maybe give me some pointers.

    So far I've determined the following elements

    * Player (player name)
    * Position (position name)
    * Team (team name)
    * Match (a reference to the teamId, opposition name, number of players, date)
    * Statistic (stat name e.g. 'Scored Goal')

    The relationship model that I've come up with is currently looking like this:

    oPIOMQM.png
    (Primary keys in bold)

    The 'join' tables (I believe that's what they're called) are suppose to represent the following associations:

    * PlayerTeam (The association between a player and a team)
    * MatchPlayer (The selection of a player for a match. The is_active flag I have included would be used for substitutes)
    * PlayerStatistic (The instance of some stat by a player, in a particular match and the time it occurred at)

    Again, this is effectively my first time modelling a DB so if anyone could provide me with some feedback/suggestions/pointers etc. I would be very grateful.

    Thanks

    Additional information:
    As said before the application will be Android (Java) based and the database will be SQLite but I don't think that makes too much of a difference here.

    As it might help the understanding of how I wish to use the model. The basic use of the application would follow the following flow:

    * User opens app for first time
    ** User enters Team name
    ** User enters Player names
    * User starts a new Match
    ** User enters Opposition Name
    ** Selects starting Players and their Position
    * Match commences
    ** User records some Statistics
    ** User identifies substitutions
    ** User can review Statistic records 'in-play'
    * User finishes the Match
    * User can review the finalised Statistic records in various formats


Comments

  • Registered Users Posts: 203 ✭✭shakedown


    For a first time DB model, that's pretty good! I think that model would work with all your use cases, so why dont you start with it and adjust if required.


  • Registered Users Posts: 1,311 ✭✭✭Procasinator


    Just free-flowing thoughts written down:
    You mention GAA and Soccer, so I assume this a multi-sport app? At the moment, there is no simple way to distinguish which is which in your model.

    I imagine storing venue and hence home/away might be an important stat.

    Should you be storing league/competition/tournament?

    You may want to split player_name into first and last name, as referring to players by last name alone could be a common case.

    Is the idea of the app to follow one team in each sport, or multiple? If multiple, opponent_name may be better served as a foreign key to a team instead of just a varchar.

    Are there any stats that apply to the team, rather than the player? You can probably derive most from the player stats, but maybe not. Should you record win/lose/draw for a match (even if you can determine it via stats, it may make sense to have it there too).

    Do you want to record points in the tornament?


  • Registered Users Posts: 400 ✭✭irishbuzz


    Thanks for the replies.
    You mention GAA and Soccer, so I assume this a multi-sport app? At the moment, there is no simple way to distinguish which is which in your model.

    A good suggestion I will include a Sports table with each Team referencing a sport_id. I think it probably also makes sense to move the number_of_players field from the Match table into here.
    I imagine storing venue and hence home/away might be an important stat.

    I could forsee this being useful. I suppose there is also the option of a neutral venue so a bit flag won't quite suffice. I'm not sure it makes sense to add a Venue table, maybe that might be overkill? I could probably track it as a new Integer field on the match table with Home=0 / Away=1 and Neutral=2, I suppose. Not entirely sure what would be the best approach?
    Should you be storing league/competition/tournament?

    Agreed. Competition table added
    You may want to split player_name into first and last name, as referring to players by last name alone could be a common case.

    I don't see the immediate need for this but I can imagine if the need ever did arise it would be difficult to alter the existing data so I'm also going to adopt this now.
    Is the idea of the app to follow one team in each sport, or multiple? If multiple, opponent_name may be better served as a foreign key to a team instead of just a varchar.

    Yes I could see why a user might want to see the history off stats against a particular team.
    Are there any stats that apply to the team, rather than the player? You can probably derive most from the player stats, but maybe not. Should you record win/lose/draw for a match (even if you can determine it via stats, it may make sense to have it there too).

    I don't think there's any team level stats to be recorded. I guess if I was going to go down that road I would add a TeamStatistic table and add them in there. I think that since that would be a distinct table it probably would be easy enough to add whenever the requirement arrises.
    Do you want to record points in the tornament?

    No for now only match statistics are to be recorded
    shakedown wrote: »
    For a first time DB model, that's pretty good! I think that model would work with all your use cases, so why dont you start with it and adjust if required.

    Thanks for the encouragement. I guess since I haven't really worked with DBs too much before I'm a bit unsure how to go about adapting the model when I discover new requirements and complications (especially if I have data recorded). For this reason I was hoping to nail down the model as much as possible before I start throwing data in there.

    I've updated my model to reflect some of the changes as suggested:

    g6QYN53.png

    I appreciate all the suggestions. Can anyone tell me what they think the best idea would be for distinguishing data based on the match period (ie. first half, second half)? Again I was thinking of simply adding a bit flag but I wonder if I should also cover additional 'period' options (e.g. First half of extra time, 3rd and 4th quarters in other sports).


  • Registered Users Posts: 11,979 ✭✭✭✭Giblet


    It looks like you could use a value field on Statistic
    IE: NumberOfYellowCards : 2


  • Registered Users Posts: 1,311 ✭✭✭Procasinator


    Giblet wrote: »
    It looks like you could use a value field on Statistic
    IE: NumberOfYellowCards : 2

    For that example, you could just record them as individual records. An append-only model as such.
    If you want to record the time of each instance, this would be more flexible.

    But yeah, if there is a statistic can have an arbitrary value associated with it, some model should be put in place. Whether or not a simple numerical type covers all possibilities is another matter.

    Another thing you might want to think about is if statistics can be linked to other statistics. For instance, could the relationship between a player's assist and the actual goal scorer be important in your app?


  • Advertisement
  • Registered Users Posts: 11,979 ✭✭✭✭Giblet


    It's looking like a basic EAV model, but with regards to periods, this will be difficult, as statistics may occur during a match, or not. Games injured? Who knows!

    Maybe enumerate the periods of a match (for a specific match type) and attach the timing of these events to a match based on time.
    MatchId | Period | TimeStamp

    Where Period could be "KickOff", "HalfTime" etc.

    These could also be stored against the Statistic, but again, it might not be the case that they occur during a match, or more details may be required.


  • Moderators, Arts Moderators Posts: 35,366 Mod ✭✭✭✭pickarooney


    Not sure what the purpose of the TeamPlayer table is. The information for this is just the link between player and team.

    Have home and away in the Match table to represent the teams.

    Is the Statistic table necessary? Can you not put all the stats in the player statistic table? You'll need to store different types of values in there (percentages, integers) so you'll most likely need one column for each within that table, including some empty/dummy ones for expansion.

    I think you might be better off making each sport use a separate database as
    1) there'll never be a need to consult/compare statistics between sports
    2) you'll keep the tables smaller rather than having a load of unused statistic fields in each database.
    3) If you want to add e.g. hurling or rugby later it will be a lot less messy to create a new model again based off this one than expand all the existing databases and potentially slow down the app for people who have been running it with just one sport's features.

    You'll need some kind of Competition table to collate stats from leagues, cups, European competitions, internationals...

    A formation table, maybe linked to the position table could be useful. Remember to make it impossible to include any more than 11/15 players on the pitch at a time. There are lots of fiddly validation checks to do on this in case of substitution rules, injuries, cards, minutes played...

    On your last question, assuming you have the user entering the data in real time, allow for a 'minute' field that registers the current time in the game (from which the 1st/2nd half will be implied) when the user taps the relevant stat button.

    A big thing, I think, will be sharing the data. If stats geeks are entering in every detail as they watch the match, realistically they'll only be able to do one player or at best one team at a time, so for example if John is monitoring Suarez and Liam is keeping tabs on Gerrard, the stats they've both entered should be mergeable into a general Liverpool table.


  • Registered Users Posts: 400 ✭✭irishbuzz


    Yeah my plan was to create a record for each instance of a statistic.

    I wonder though if were to record substitutions would it make sense to record them as a statistic (A 'taken off' record for one player and a 'brought on' record for the sub). In this case it might make sense to have a field that would track the associated player with it. Although I guess I could always link the two as they would be occurring at the same time?..... My brain is a little fried at the moment from a busy day in the office at the moment to resolve this :)


  • Registered Users Posts: 11,979 ✭✭✭✭Giblet


    Yeah you have a few many to many tables which might be easier to handle de-normalised.


  • Moderators, Arts Moderators Posts: 35,366 Mod ✭✭✭✭pickarooney


    irishbuzz wrote: »
    Yeah my plan was to create a record for each instance of a statistic.

    I wonder though if were to record substitutions would it make sense to record them as a statistic (A 'taken off' record for one player and a 'brought on' record for the sub). In this case it might make sense to have a field that would track the associated player with it. Although I guess I could always link the two as they would be occurring at the same time?..... My brain is a little fried at the moment from a busy day in the office at the moment to resolve this :)

    I'd say the statistic 'substitution' is not particularly interesting on its own. However, if you have a 'time-on' and 'time-off' field in the MatchPlayer table you can extapolate the number and time of substitutions per match and thus per team over the course of the season.

    I wonder do you even need separate MatchPlayer and PlayerStatistic tables or if the former couldn't better cover both.


  • Advertisement
Advertisement