Abstract:
‘Dark horse’ is a popular term in sports. People care about dark horses because they bear directly on pre-match predictions. In this project, we focus on tennis tournaments and look for patterns in dark-horse results from several angles.
Definition:
Dark horse: a person (such as a player), animal, or thing that competes in a race or other contest and is not expected to win
Data:
1. Source:
For female matches: linked phrase
For male matches: linked phrase
2. Description: The data include all tennis match results from 2014 (both male and female).
Tournament = Name of tournament (including sponsor if relevant)
Series = Name of ATP tennis series (Grand Slam, Masters, International or International Gold)
Tier = Tier (tournament ranking) of WTA tennis series.
Court = Type of court (outdoors or indoors)
Surface = Type of surface (clay, hard, carpet or grass)
Winner = Match winner
Loser = Match loser
WRank = ATP Entry ranking of the match winner as of the start of the tournament
LRank = ATP Entry ranking of the match loser as of the start of the tournament
AvgW = Average odds of match winner (as shown by Oddsportal.com)
AvgL = Average odds of match loser (as shown by Oddsportal.com)
Analysis:
1. Extract all the matches in which a ‘dark horse’ occurred.
2. In terms of players, calculate:
dark win rate: percentage of becoming a 'dark horse'
dark lose rate: percentage of being defeated by a 'dark horse'
pre adv rate: percentage of having advantages before match
stability: percentage of being consistent with the prior prediction
3. In terms of tournaments, calculate:
dark horse rate: percentage of having 'dark horse' in the tournament
4. Compare the male and female data.
Visualization:
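Steps 1 and 2 of the analysis can be sketched in Python as follows. This is only a minimal illustration, not the project's actual code. It assumes a match counts as a 'dark horse' match when the winner had the higher average odds (AvgW > AvgL), and that every rate is taken over all matches a player played. The source does not state either rule. The field names Winner, Loser, AvgW and AvgL come from the data description; the player names A, B and C and the odds values are made up.

```python
from collections import defaultdict


def is_dark_horse(match):
    # Assumed rule: the winner was the pre-match underdog, i.e. had the
    # higher average odds. Equal odds are treated as "no dark horse".
    return float(match["AvgW"]) > float(match["AvgL"])


def player_rates(matches):
    # Per-player counters: matches played, wins as underdog,
    # losses as favourite, and matches entered as favourite.
    counts = defaultdict(
        lambda: {"played": 0, "dark_win": 0, "dark_lose": 0, "favoured": 0}
    )
    for m in matches:
        winner, loser = counts[m["Winner"]], counts[m["Loser"]]
        winner["played"] += 1
        loser["played"] += 1
        if is_dark_horse(m):
            winner["dark_win"] += 1
            loser["dark_lose"] += 1
            loser["favoured"] += 1   # the loser had been the favourite
        else:
            winner["favoured"] += 1  # the favourite won as predicted

    rates = {}
    for name, c in counts.items():
        n = c["played"]
        rates[name] = {
            "dark_win_rate": c["dark_win"] / n,
            "dark_lose_rate": c["dark_lose"] / n,
            "pre_adv_rate": c["favoured"] / n,
            # Share of matches whose result agreed with the odds.
            "stability": 1 - (c["dark_win"] + c["dark_lose"]) / n,
        }
    return rates


# Hypothetical sample rows; real rows would come from the 2014 data files.
matches = [
    {"Winner": "A", "Loser": "B", "AvgW": "1.20", "AvgL": "4.50"},
    {"Winner": "B", "Loser": "A", "AvgW": "3.80", "AvgL": "1.25"},
    {"Winner": "A", "Loser": "C", "AvgW": "1.50", "AvgL": "2.60"},
    {"Winner": "C", "Loser": "B", "AvgW": "2.10", "AvgL": "1.70"},
]
rates = player_rates(matches)
dark_horse_matches = [m for m in matches if is_dark_horse(m)]
```

Under these assumptions, stability equals one minus the sum of dark win rate and dark lose rate, so a player who is always the favourite and always wins has a stability of 1.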
Part 1: Use scatter plots to analyze different players and then classify them based on the results:
Conclusions from the graphs:
From the first graph, we should pay more attention to players who lie in the top right of the graph.
From the second graph, players are classified clearly by their stability and pre adv rate.
Part 2: Use boxplots to compare the dark horse rate of tournaments in terms of level, surface, and type
Description of the graph:
The level of the tournament: GS (Grand Slam) > SE (Season Final) > PM (Premier Mandatory) > PR (Premier) > IT (International)
Conclusions from the graph:
1. As the level of the tournament increases, the dark horse rate decreases.
2. The dark horse rate in female International tournaments is especially high.
3. Compared with other surfaces and court types, the dark horse rate on grass and indoor courts is relatively higher.
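The per-tournament dark horse rate behind this comparison could be computed roughly as sketched below. This is an illustration only. It assumes a dark-horse match is one where AvgW > AvgL, and that the rate is the share of a tournament's matches that are dark-horse matches. The source does not define either. The tournament names "Event X" and "Event Y" and the odds values are made up. The grouped lists are the kind of input a boxplot per surface would take.

```python
from collections import defaultdict


def tournament_dark_horse_rates(matches):
    # Share of matches in each tournament won by the pre-match underdog
    # (assumed rule: winner's average odds higher than the loser's).
    totals = defaultdict(int)
    upsets = defaultdict(int)
    for m in matches:
        key = (m["Tournament"], m["Surface"])
        totals[key] += 1
        if float(m["AvgW"]) > float(m["AvgL"]):
            upsets[key] += 1
    return {key: upsets[key] / totals[key] for key in totals}


def rates_by_surface(tournament_rates):
    # Collect the tournament rates into one list per surface.
    groups = defaultdict(list)
    for (_tournament, surface), rate in tournament_rates.items():
        groups[surface].append(rate)
    return dict(groups)


# Hypothetical sample rows using the Tournament, Surface, AvgW, AvgL fields.
sample = [
    {"Tournament": "Event X", "Surface": "Clay", "AvgW": "1.30", "AvgL": "3.40"},
    {"Tournament": "Event X", "Surface": "Clay", "AvgW": "2.90", "AvgL": "1.40"},
    {"Tournament": "Event Y", "Surface": "Grass", "AvgW": "2.20", "AvgL": "1.65"},
    {"Tournament": "Event Y", "Surface": "Grass", "AvgW": "3.10", "AvgL": "1.35"},
]
per_tournament = tournament_dark_horse_rates(sample)
per_surface = rates_by_surface(per_tournament)
```

The same grouping could be keyed on Series or Tier instead of Surface to compare tournament levels.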
Part 3: Compare the top 100 male and female players
Conclusions from the graph:
1. Male top 100 players are more stable than female top 100 players.
2. The dominance of the top 100 players is similar for males and females.
Part 4: Use a histogram to compare the stability of the top 100 female players