3rd Degree


More Attendance Data

June 16th, 2009 . 10:28 am . By: Buzz Carrick

The Attendance Depths post and Not So Fast analysis of last week drew some interesting responses from people, several of whom are much better at stats than I am.  Rather than put them in separate posts, I’ll lump a couple of them together for everyone to see.  There is certainly some interesting stuff in here.

First, Aaron Brock ran a linear regression on the first 11 years of FC Dallas attendance numbers to try to predict attendance, the single dependent variable, from independent variables drawn from historic data.  For independent variables, he used record (wins, losses and ties, each as a separate variable), standing in the conference, whether or not we made the playoffs, and the move to PHP (having an SSS).

Here’s what he found…

I didn’t find anything statistically significant (as you rightly predicted, but I think you were also right that this may be due to the small sample size).  I got a regression output that gave an R square of .5152, meaning that a little more than half of the variation in attendance figures is explained by the independent variables.  Excel also gives a set of coefficients from which you can derive a formula to predict attendance; those outputs were:

Coefficients
Intercept 19688.5
Wins -218.912
Losses -284.335
Ties -116.888
Standing -688.467
Playoffs 1413.097
Soccer-Spec Stadium 1448.202

I thought it was interesting that the intercept term was pretty close to the capacity of PHP (I think it’s within 1,000).  The wins column has a negative effect on attendance (which can’t be right, but might be fixed with a bigger sample), but taking the difference of the effects of a win vs. a loss, it does show that you would lose about 65 people (the gap between the two coefficients) for every game below .500 you finish.  You lose 688 people for every position below 1st in the standings, but you get a bump of 1,413 people for making the playoffs, and there’s an increase of 1,448 people since we have a soccer-specific stadium, which I thought was very interesting.  I would really like to see how these figures look league-wide.

But, if you use this logic, and extrapolate FCD’s current form through to the end of the season, 2-6-4 translates into 5-15-10 (ouch), 6th position, miss the playoffs. Plugging that into the formula, it estimates a season-end average attendance of 10,477.  Pretty far north of what we’re at now, but I don’t think it’s out of reach.
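
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the formula those coefficients imply. The coefficients are copied from Aaron's output; the function name and the true/false handling of the playoff and stadium variables are my own assumptions about how he coded them.

```python
# Coefficients copied from the Excel regression output quoted above.
COEFFICIENTS = {
    "intercept": 19688.5,
    "wins": -218.912,
    "losses": -284.335,
    "ties": -116.888,
    "standing": -688.467,
    "playoffs": 1413.097,
    "soccer_specific_stadium": 1448.202,
}


def predict_attendance(wins, losses, ties, standing, made_playoffs, has_sss):
    """Apply the fitted linear formula to one season.

    Assumption: playoffs and stadium are 0/1 dummy variables, and standing
    is the conference position (1 = first place).
    """
    c = COEFFICIENTS
    return (
        c["intercept"]
        + c["wins"] * wins
        + c["losses"] * losses
        + c["ties"] * ties
        + c["standing"] * standing
        + c["playoffs"] * (1 if made_playoffs else 0)
        + c["soccer_specific_stadium"] * (1 if has_sss else 0)
    )


# The extrapolated 2009 season: 5-15-10, 6th place, no playoffs, playing at PHP.
projected = predict_attendance(5, 15, 10, 6, made_playoffs=False, has_sss=True)
print(round(projected))  # 10477
```

Plugging in the extrapolated 5-15-10 season reproduces the 10,477 estimate, which suggests this is at least close to how the formula was applied.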

If you follow your theory that attendance is based on the previous season’s record, the R square improves slightly to .5614, and the model gives a 2009 estimated attendance of 10,031, a little closer to where it is now.

Again, none of this is statistically significant due to the small sample size, but it gives you a rough idea of what the attendance picture might look like.

Aaron’s data shows what I had previously thought was true with my less in-depth method.  Having a winning season, i.e. getting into the playoffs, is a big factor.  Interesting that his data also shows the stadium helping, when I wasn’t sure based on my superficial analysis that it did that much.  I still think what I see as a negative location factor keeps the positive stadium factor in check.

Then I heard from Tornado Fan again, who tried to take into account the Dallas attendance numbers relative to the league-wide attendance numbers.  That’s something I mentioned as likely a more telling statistic when I did my very simple breakdown of the FCD numbers.

1. Since the sample size is so small, and significant events therefore have a large impact on the data, I eliminated the three seasons that had factors that significantly influenced the attendance numbers beyond the control of the FO (for the good and bad). Those seasons are the first year of the team (96), Southlake (03), and the first full year at PHP (06).

2. In order to keep things more consistent across all the years of MLS’ history, I decided to use a different measuring stick. Instead of the average attendance I switched to the team’s average attendance in comparison with the league’s average. That would make it easier to compare apples to apples through history of MLS, and it would take into account the Adu and Beckham effects which impacted attendance throughout the league (Beckham didn’t play here, but PHP still had a near sell out for that game). It also evaluates the FO performance relative to how the “state of the league” was doing during that particular time period.

I’ve included the data below, but I found that there have been four “attendance eras” so far:

97-98, when the team averaged around 30% below the league average. Then 99 to 02, when things steadied and the team averaged about 13% below the league average (range: 5 to 17). Then 04 and 05 (which included some games at PHP), when attendance really fell, to an average of 34% below. Then the 07-present era, in which the team has averaged 24% below the league average.

Year   Team Avg   MLS Avg   % vs. League   GM

Early Years

97      9,678     14,619    -34%           Hicks
98     10,948     14,312    -24%           Hicks

Peak Years

99     12,211     14,282    -15%           Hicks
00     13,102     13,756     -5%           Swift
01     12,574     14,961    -16%           Swift
02     13,122     15,822    -17%           Swift

Southlake Recovery

04      9,088     15,599    -42%           Elliott
05     11,189     15,108    -26%           Elliott

Current Era

07     15,145     16,770    -10%           Hitchcock
08     13,024     16,460    -21%           Hitchcock
09      8,770     14,906    -42%           Hitchcock
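
The relative measure is easy to recompute. This is a rough Python sketch using the team and league averages from the table above; the function and variable names are mine, and because it works from unrounded ratios, a row can come out a point different from the table.

```python
from collections import defaultdict

# (season, FC Dallas average, MLS average, era label), copied from the table above.
SEASONS = [
    ("97", 9678, 14619, "Early Years"),
    ("98", 10948, 14312, "Early Years"),
    ("99", 12211, 14282, "Peak Years"),
    ("00", 13102, 13756, "Peak Years"),
    ("01", 12574, 14961, "Peak Years"),
    ("02", 13122, 15822, "Peak Years"),
    ("04", 9088, 15599, "Southlake Recovery"),
    ("05", 11189, 15108, "Southlake Recovery"),
    ("07", 15145, 16770, "Current Era"),
    ("08", 13024, 16460, "Current Era"),
    ("09", 8770, 14906, "Current Era"),
]


def pct_vs_league(team_avg, league_avg):
    """Percent difference from the league average (negative means below)."""
    return (team_avg / league_avg - 1.0) * 100.0


# Group the season-by-season percentages by era, then average each group.
by_era = defaultdict(list)
for season, team_avg, league_avg, era in SEASONS:
    by_era[era].append(pct_vs_league(team_avg, league_avg))

era_averages = {era: sum(vals) / len(vals) for era, vals in by_era.items()}

for era, avg in era_averages.items():
    print(f"{era}: {avg:.1f}% vs. league average")
```

The era averages come out close to the roughly 30%, 13%, 34% and 24% figures quoted above.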

Within those eras, the team performance has varied. I ultimately think that three factors ALL influence attendance greatly:

1. Major event, such as a change of stadium venue or start of a franchise
2. Team performance
3. FO performance

Just because a team wins, it doesn’t mean the FO will be able to take advantage of it. And just as important, a good FO will counter somewhat the effects of a bad season. Unfortunately, Dallas has neither at the moment. But the “steady” seasons from 99-02, I think, show that the performance of the team can go up and down, but a good FO can keep things somewhat in check. I think it also shows how during the years MLS was deciding on who to contract (00-01), they saw Dallas as having been able to get stable (and thus more financially healthy) relative to the overall attendances around the league.

In conclusion, I think it’s clear what we probably all thought from the beginning: many factors influence attendance.  Both of these breakdowns do confirm what I said before: winning is clearly a positive factor and losing is a negative one.  A good front office, however, as FCD had in the Swift era, can overcome that and keep attendance fairly level (Swift was VP of Marketing and Broadcasting in 99 under Hicks).  That’s something the current regime is clearly NOT able to do.

I’d still like to figure out how to prove that winning helps the next season’s numbers, I would imagine if in no other way at least in season tickets.  I’m not sure how to statistically prove that beyond what I did in Not So Fast.

Oh, almost forgot.  The data also shows HSG should pay whatever it takes to re-hire Andy Swift.  Or if he won’t do it again, perhaps the man they should have hired a couple years ago, former VP of Sales and Sponsorships under Swift and Elliott, John Alper.  You know, the guy who landed Pizza Hut and the other eight(?) primary sponsors FC Dallas had when PHP opened.






18 Comments

  1. Comment by Pegasus on June 16, 2009 10:38 AM

    Isn’t it amazing how much money they are willing to lose or not make to prove they were right? Andy for club president and Alper for GM. Oscar for coach. Attendance would jump immediately.

  2. Comment by Barefoot on June 16, 2009 12:15 PM

    First off, Aaron’s regression seems to use all three of wins, losses and ties. This presents a multicollinearity problem, and would render the regression invalid.

    Second, an R-squared of .51 is actually very good, and he should try to replicate this using only wins and losses to eliminate the multicollinearity problem.

    Third, to do this right, you should get a dataset of all games in all seasons for all teams with attendance data as the dependent variable. I would use the following independent vars:
    A. Home team points/available points at game time
    B. Visiting team points/available points at game time
    C. Home team points/available points in prior season
    D. Visiting team points/available points in prior season
    E. Soccer Specific Stadium (1 or 0)
    F. Home team’s market size (you could use census bureau MSA data for this).
    G. Home Team years in league

    Done this way, you will have adequate sample size.

    Also, by only looking at home/visitor variables instead of individual teams, you also cancel out the “Beckham Factor.”

    Ideally, I would also like some measure of the home team’s marketing budget over the last 12 months, but I’m confident THAT isn’t available.
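
One possible shape for a row of that game-level dataset, sketched in Python. This is only an illustration: the field names are hypothetical, the example values are placeholders rather than real data, and nothing here runs the regression itself.

```python
from dataclasses import dataclass


@dataclass
class GameRow:
    """One home game; field names are hypothetical, lettered as in the list above."""
    attendance: int                # dependent variable
    home_points_pct: float         # A. home points / available points at game time
    away_points_pct: float         # B. visiting points / available points at game time
    home_prior_points_pct: float   # C. home points / available points, prior season
    away_prior_points_pct: float   # D. visiting points / available points, prior season
    soccer_specific_stadium: bool  # E. 1 or 0
    home_market_size: int          # F. e.g. MSA population from Census data
    home_years_in_league: int      # G.

    def features(self):
        """Independent variables as a numeric row, with a leading 1.0 for the intercept."""
        return [
            1.0,
            self.home_points_pct,
            self.away_points_pct,
            self.home_prior_points_pct,
            self.away_prior_points_pct,
            1.0 if self.soccer_specific_stadium else 0.0,
            float(self.home_market_size),
            float(self.home_years_in_league),
        ]


# Placeholder values only, to show the shape of a row (not real figures).
example = GameRow(
    attendance=10000,
    home_points_pct=0.50,
    away_points_pct=0.50,
    home_prior_points_pct=0.50,
    away_prior_points_pct=0.50,
    soccer_specific_stadium=True,
    home_market_size=1000000,
    home_years_in_league=10,
)
print(example.features())
```

Stacking one such row per game would give the design matrix for whatever stats package ends up running the regression.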

  3. Comment by Barefoot on June 16, 2009 12:22 PM

    PS. You could run that regression for the league as a whole and then run it for each home team only and compare the results to get a sense of market/front office factors.

    Also, if someone wants to build the dataset, I’ll be happy to run the regression (I have access to some nice stats packages) but I lack the time to build the dataset myself.

  4. Comment by Barefoot on June 16, 2009 12:33 PM

    PPS. Okay now I’m thinking about this too much…
    You could also use Census data to determine the Hispanic/immigrant population in the market.
    Anyone doing this should use a points % or winning % as an IVar instead of # of wins, as it makes the interpretation easier.
    One should probably try some functional transformations on the points % IVar, because it probably isn’t a linear relationship but more likely logarithmic (that is, the effect of a winning record decreases as the winning % increases).

    Okay, Buzz, I put my real email on this post in case you want to get in touch about this.

  5. Comment by historian on June 16, 2009 12:38 PM

    Can’t ties just be counted as half wins? (0.5).

    That’s how winning percentage is figured out.

  6. Comment by DallasSoccer on June 16, 2009 1:00 PM

    I’d be interested in seeing how yearly profit margins and the budget spent towards marketing compares with these numbers.

  7. Comment by Buzz Carrick on June 16, 2009 1:33 PM

    Barefoot, I’ll be in touch.

    DallasSoccer, I don’t think we can get either the profit numbers (beyond whether there is a profit at all) or the marketing numbers.

  8. Comment by Aaron on June 16, 2009 1:36 PM

    You’re absolutely right, Barefoot, that would make it a much better dataset; I was just working with what Buzz had. But you can’t take draws out of the data set. I guess you could divide results into “wins” and “not wins”, but you can’t eliminate them entirely. Something like 40% of games in MLS this season have been draws, so that takes out a huge percentage of observations.

  9. Comment by historian on June 16, 2009 2:25 PM

    Can draws not count as half wins and half losses? Does it not work that way?

  10. Comment by timkvfp on June 16, 2009 3:32 PM

    I can provide demographic data by state, MSA/MD, county or census tract level that is based on 2000 census data. I can provide breakdowns by the government race categories, gender, age, etc.

    Just let me know what you need.

  11. Comment by historian on June 16, 2009 5:10 PM

    If you’re going to look at budget spent towards marketing, you probably also have to factor in budget applied to the sales staff, including the number of people on the sales team.

  12. Comment by HRM on June 16, 2009 7:54 PM

    Buzz, if you don’t mind, I’m going to re-post what I did on the other thread and modify it a little, addressing at least one of the points made above.

    For the regression I did, I first calculated the team’s “success%”, which I define as % of total possible points. For example, in 2009, FCD’s 2 wins and 4 draws out of 12 total games is 10 points out of a possible 36, so their success% is 28 percent.

    This method addresses the multicollinearity issue. Another way to address it is to use 1 for a win and 0.5 for a draw, as was suggested above.
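
A quick Python sketch of the two single-number measures mentioned here, assuming the standard 3 points for a win and 1 for a draw. The function names are made up for illustration.

```python
def success_pct(wins, draws, games_played, points_per_win=3):
    """Share of total possible points earned (3 for a win, 1 for a draw)."""
    points = points_per_win * wins + draws
    return points / (points_per_win * games_played)


def half_win_pct(wins, draws, games_played):
    """Alternative measure: a win counts 1 and a draw counts 0.5."""
    return (wins + 0.5 * draws) / games_played


# FCD in 2009 so far: 2 wins and 4 draws from 12 games.
print(f"{success_pct(2, 4, 12):.1%}")   # 27.8%
print(f"{half_win_pct(2, 4, 12):.1%}")  # 33.3%
```

Either way the record collapses into one variable, so wins, losses and ties no longer appear side by side in the model.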

    The regression result I got was this:

    This year’s success% x 16,477 +
    last year’s success% x 8,285 +
    1,930 for years at PHP.

    For simplicity of explanation, I did this regression without an intercept. That’s fine, but it gives you an unreliable R-squared. But I did a similar regression with an intercept. I got similar results, with an R-squared of 0.87, which is very good.

    Example using 2006:
    0.5417 x 16,477 + 0.5000 x 8,285 + 1,930 = 14,998. That’s very close to the actual result of 14,892.

    So what this says is that a 10% improvement in the success rate (equivalent to about 3 wins) should produce an increase in attendance of about 1,600. Last year’s success rate has a smaller, but still significant effect–10% better success adds about 800 fans per game.
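
Here is the no-intercept formula as a small Python sketch, in case anyone wants to plug in other seasons. The coefficients are the ones quoted above; treating the PHP term as a flat bump added in any season played at PHP is my reading of the 2006 example, so take that as an assumption.

```python
# Coefficients from the no-intercept regression quoted above.
THIS_YEAR_COEF = 16477
LAST_YEAR_COEF = 8285
PHP_BONUS = 1930  # assumed to be added once for any season played at PHP


def hrm_prediction(this_year_success, last_year_success, at_php):
    """Predicted average attendance from success% (as a fraction, e.g. 0.5417)."""
    bonus = PHP_BONUS if at_php else 0
    return (
        THIS_YEAR_COEF * this_year_success
        + LAST_YEAR_COEF * last_year_success
        + bonus
    )


# The 2006 example worked above.
print(round(hrm_prediction(0.5417, 0.5000, at_php=True)))  # 14998

# Effect of a 10-point improvement in each success rate, on its own.
print(round(THIS_YEAR_COEF * 0.10))  # 1648
print(round(LAST_YEAR_COEF * 0.10))  # 828
```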

    This method reaches a similar conclusion as others have–there are two years where the actual results are significantly better than the model’s prediction. In 2001 and 2002 the results were better than predicted by 1,211 and 970. Both years were GM’d by Swift.

    Again, there are so few data points that it’s a little risky to draw conclusions.

  13. Comment by Barefoot on June 16, 2009 8:06 PM

    Aaron, my point was that using wins, losses and ties creates a problem of multicollinearity because if you know the value of any 2 of those, you must (by definition) know the value of the 3rd. Mathematically, if you include all 3, then you get multiple solutions of Y for a given X. (Please don’t ask me to write a proof of this, I’m a long time out of grad school)

    I’m not saying you shouldn’t take ties into account, only that you have to do so in a way that doesn’t give you any two or more variables that can solve for each other.

    HRM’s success rate is a good example of this.
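
The problem can be shown without any stats package. The Python sketch below assumes every season has the same number of games (30 here), which is a simplification since season lengths have varied. The win-loss-tie records are made up for illustration, and only the intercept and the three record coefficients from Aaron's output are used. Under that assumption, adding any constant k to all three record coefficients and subtracting k times 30 from the intercept gives exactly the same predictions, so the data cannot pin down the individual coefficients.

```python
GAMES = 30  # assumed fixed season length, so wins + losses + ties == GAMES

# Illustrative (made-up) win-loss-tie records, each summing to GAMES.
records = [(5, 15, 10), (12, 10, 8), (15, 9, 6), (9, 12, 9)]

# Intercept and record coefficients from Aaron's output; other variables dropped.
original = (19688.5, -218.912, -284.335, -116.888)


def predict(intercept, b_wins, b_losses, b_ties, record):
    """Linear prediction from a (wins, losses, ties) record."""
    wins, losses, ties = record
    return intercept + b_wins * wins + b_losses * losses + b_ties * ties


def shift(coeffs, k, games=GAMES):
    """Add k to each record coefficient and absorb k * games into the intercept."""
    intercept, b_wins, b_losses, b_ties = coeffs
    return (intercept - k * games, b_wins + k, b_losses + k, b_ties + k)


shifted = shift(original, 300.0)

for record in records:
    a = predict(*original, record)
    b = predict(*shifted, record)
    print(record, round(a, 3), round(b, 3))  # the two predictions match

print(shifted)  # the wins coefficient is now positive
```

That is one way to read the negative wins coefficient in the original output: when all three record variables are in the model, the sign of any single one of them does not mean much on its own.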

  14. Comment by Barefoot on June 16, 2009 8:13 PM

    PS. Aaron, on reflection I admit I DID say that you should take ties out, I just hadn’t thought it through.

  15. Comment by historian on June 16, 2009 8:26 PM

    Why can’t ties count as half wins and half losses? That still gives you x and y.

  16. Comment by Barefoot on June 18, 2009 8:52 AM

    historian: you can do that as long as you don’t use BOTH wins+ties/2 and losses+ties/2 in your model.

    HRM’s success rate is akin to using wins+ties/3.

  17. Comment by Wacko4Burn on June 18, 2009 11:04 AM

    I admire the expert analysis of the stats gurus here, but with all due respect you are just trying to find out who is responsible for more suckage: the players, coaches or front office. But it’s really all three. And if it’s all three then that can only point back to the one who hired them, the owner. The only redeeming hope FCD fans have is that the ownership will do for FCD what they did in Columbus.

    Look to Columbus for your model for getting out of this. A coach looking to redeem himself (Sigi) got an exceptional foreign player looking to revive his career (Schelotto) and surrounded him with hard-working American pieces/parts and some young talent who didn’t know any better in a simple system. That’s exactly what Seattle did too.

  18. Comment by Scott on June 18, 2009 1:01 PM

    I think FCD’s biggest problem has been arrogance. I worked for the local PDL team around the time the move to Southlake came about. I said it was crazy because the Hispanic fan base wouldn’t drive that far to see them play – much less the rest of the Dallas fans. A person in the PDL team’s management remarked, “Hispanics do have cars, right?” We know how that worked out: the stadium sucked and you basically doubled the ticket price by adding the long drive.

    Then the move to Frisco was more of the same, plus gas was $3.75 a gallon at the time they moved up there.

    Don’t get me started on “Hoops” and firing Clarke – who got you to the playoffs with overachieving teams but just couldn’t win once he got there.


© Copyright 1996-2009 A. Buzz Carrick, All Rights Reserved. This website is an unofficial and independently-operated source of soccer news and information and is not directly affiliated with or endorsed by any team, league or their owners. Logos and other promotional materials are property of their respective owners. For FC Dallas' official team site, visit www.fcdallas.net. Interested in contributing to our effort? Then contact us at buzz@3rddegree.net

    Founded October 1997

    Volume Fourteen
