UTAStats - Feb 2009

Yeah, I agree - if the system has way too much data going through it, just keep pug stats and league match stats; surely this will dramatically reduce the amount of data?
 
How easy/difficult (granted it's more likely to be difficult) is it going to be to re-hash how the stats are accumulated? I would be happy to dedicate some time to trying to re-hash it, in the name of learning new stuff which may in some way be useful to me later, and obv of some use to the community.
 
In its current untouched state, the MySQL database is over 3 GB in size (approx 45 million records); the UTAstats site is around 1 GB with archived log data.

Bacon was pretty much the 'optimal' system to run it on, but due to the temporary tables created in the twice-hourly update + static cache routine, it was really hitting the box hard.

doh would be able to give the exact specs of that, but it was something like a dual Xeon 3.0 with 4 GB RAM and 15k RPM SAS or SCSI drives.

doh has said 'colourful words' in the past about how bad utstats was and how much it was killing the server - I'm sure he'll be able to recall some exact instances + utilisation figures.

How easy/difficult (granted it's more likely to be difficult) is it going to be to re-hash how the stats are accumulated? I would be happy to dedicate some time to trying to re-hash it, in the name of learning new stuff which may in some way be useful to me later, and obv of some use to the community.

Hm, probably on the more difficult side of things if you've not had much exposure to php/mysql.

UT Server will log per-map (round) data to a file similar to that attached; on a given schedule, these files are then downloaded from every server to a central location. The logs are run through a cleansing function to remove any crap data, then every bit of useful log data is dumped directly into MySQL temp tables where a number of grouping and sorting routines are run.
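To give a rough idea of what that cleansing/import step boils down to, it's something along these lines (purely illustrative - the event whitelist, temp-table layout, credentials and file path here are all made up, and the real importer does a fair bit more):
Code:
<?php
// Illustrative only: read an ngLog-style tab-separated file, drop anything
// that isn't a whitelisted event, and push the rest into a MySQL temp table.
// Event names, credentials and paths are placeholders, not the real ones.

$keep = array('kill', 'teamkill', 'suicide', 'player_connect', 'player_disconnect');

$db = new mysqli('localhost', 'league', 'password', 'league_stats');
$db->query("CREATE TEMPORARY TABLE tmp_nglog (
                line_time DOUBLE,
                event     VARCHAR(32),
                data      TEXT
            )");

$insert = $db->prepare("INSERT INTO tmp_nglog (line_time, event, data) VALUES (?, ?, ?)");

foreach (file('/path/to/downloaded/Unreal.ngLog.txt') as $line) {
    $fields = explode("\t", rtrim($line, "\r\n"));

    // 'Crap data': blank lines, truncated lines, events we don't care about.
    if (count($fields) < 2 || !is_numeric($fields[0]) || !in_array($fields[1], $keep)) {
        continue;
    }

    $time  = (float) $fields[0];
    $event = $fields[1];
    $rest  = implode("\t", array_slice($fields, 2));

    $insert->bind_param('dss', $time, $event, $rest);
    $insert->execute();
}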

The manipulated temporary data is then rolled up into the various utstats tables (killsmatrix, player, pinfo, match, weaponstats, smartass_objstats), and then rankings are calculated. This whole process takes about 5-10 minutes.

Then, a couple of the really heavy queries (totals, home stats) are run through a cache routine and static html files are generated, which can also take a good 5-10 minutes.
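The cache step itself is conceptually dead simple - roughly this (again only a sketch: it borrows the table names you'll see in the chat log below, but the connection details and output path are placeholders, and the real routine covers far more pages):
Code:
<?php
// Sketch of the static-cache idea: run one of the heavy "totals" queries once,
// write the result out as plain HTML, and let the site serve the file.

$db = new mysqli('localhost', 'league', 'password', 'league_stats');

$result = $db->query(
    "SELECT g.name AS gamename,
            SUM(p.frags) AS frags,
            SUM(p.kills) AS kills,
            SUM(p.suicides) AS suicides,
            COUNT(DISTINCT p.matchid) AS matches
     FROM uts_player AS p
     JOIN uts_games  AS g ON p.gid = g.id
     GROUP BY g.name
     ORDER BY g.name"
);

$html = "<table>\n<tr><th>Gametype</th><th>Frags</th><th>Kills</th><th>Suicides</th><th>Matches</th></tr>\n";
while ($row = $result->fetch_assoc()) {
    $html .= sprintf(
        "<tr><td>%s</td><td>%d</td><td>%d</td><td>%d</td><td>%d</td></tr>\n",
        htmlspecialchars($row['gamename']),
        $row['frags'], $row['kills'], $row['suicides'], $row['matches']
    );
}
$html .= "</table>\n";

// Write to a temp file first so a visitor never gets a half-written page.
file_put_contents('cache/totals.html.tmp', $html);
rename('cache/totals.html.tmp', 'cache/totals.html');

The site then just serves the static file, so visitors (and crawlers) never touch MySQL for those pages.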

So that's anywhere from 15 to 20 minutes of solid crunching, twice an hour all day every day.

Of course, if you reduce the frequency, then the amount of data (and processing time) per run increases accordingly.

An excerpt from conversations we had last year about the crap queries, where we had to build in fail-safes to stop utstats from overloading the server:
Code:
[19:21] [@]doh> gah, fucking yahoo crawler is indexing utstats
[19:22] [@]doh> generating some huge queires
[19:44] [+]Timo_Maas> :/
[19:46] [@]doh> | 83710 | league | localhost | league_stats | Query | 44 | Copying to tmp table | SELECT pi.name AS name, pi.country AS country, p.pid, COUNT(p.id) AS games, SUM(p.gamescore) as game |
[19:46] [@]doh> 44 secs.. useless
[19:47] [+]Timo_Maas> hm, that's not one of the usual queries
[19:47] [@]doh> well it was doing something like writing sort index before
[19:47] [&]FragMe> 44 sec o_O
[19:48] [&]FragMe> EXPLAIN on that sucker
[19:48] [@]doh> so i guess that yahoo bot is doing stuff
[19:48] [@]doh> that utstats is nasty shit

[19:52] [@]doh> | 86840 | league | localhost | league_stats | Query | 111 | Creating sort index | SELECT g.id AS gid, g.name AS gamename, SUM(p.frags) AS frags, SUM(p.kills) AS kills, SUM(p.suicides |
[19:52] [@]doh> mmm nice
[19:52] [+]Timo_Maas> i'm adding some script_name columns into those queries to show where they're coming from
[19:52] [@]doh> 130 secs and counting
[19:53] [@]doh> theres that sort index again
[19:53] [@]doh> takes ages
[19:53] [+]Timo_Maas> paste
[19:53] [@]doh> above
[19:53] [@]doh> took like 140 secs

[19:57] [@]doh> # Query_time: 143 Lock_time: 0 Rows_sent: 16 Rows_examined: 11017280
[19:57] [@]doh> use league_stats;
[19:57] [@]doh> SELECT g.id AS gid, g.name AS gamename, SUM(p.frags) AS frags, SUM(p.kills) AS kills, SUM(p.suicides) AS suicides, SU$
[19:57] [@]doh> FROM uts_player AS p, uts_games AS g WHERE p.gid = g.id GROUP BY gamename ORDER BY gamename ASC;
[19:57] [@]doh> thats the full one
[19:57] [@]doh> oops cut off
[19:58] [@]doh> # Query_time: 143 Lock_time: 0 Rows_sent: 16 Rows_examined: 11017280
[19:58] [@]doh> use league_stats;
[19:58] [@]doh> SELECT g.id AS gid, g.name AS gamename, SUM(p.frags) AS frags, SUM(p.kills) AS kills, SUM(p.suicides) AS suicides, SUM(p.teamkills) AS teamkills, COUNT(DISTINCT p.matchid) AS matchcount
[19:58] [@]doh> FROM uts_player AS p, uts_games AS g WHERE p.gid = g.id GROUP BY gamename ORDER BY gamename ASC;

[20:52] [+]Timo_Maas> <!-- loadavg: 3.54 2.58 1.87 1/141 5744
[20:52] [@]doh> its 5/10/15 mins avg
[20:52] [@]doh> and we just need the 5 min avg
[20:52] [@]doh> dunno what the last is tbh
[20:52] [@]doh> do you know FragMe ?
[20:53] [@]doh> /proc/loadavg
[20:53] [@]doh> lol the last one is the number of times the file was queried

[20:54] [+]Timo_Maas> what am i capping it on?
[20:54] [+]Timo_Maas> 3?
[20:54] [+]Timo_Maas> 4?
[20:54] [@]doh> 5
[20:54] [+]Timo_Maas> k
[20:54] [@]doh> i think forums are a bit higher then that, so utstats stop before forums do
[20:54] [@]doh> yeah
[20:54] [+]Timo_Maas> be funny if this renders utstats inoperable
[20:54] [+]Timo_Maas> :P
[20:55] [@]doh> well a loadavg of 5 is pretty high
[20:55] [@]doh> it only goes above that when all processors are busy
[20:55] [@]doh> which is bad
[20:55] [&]FragMe> some heavy disk IO can cause a 6 pretty easy
[20:55] [@]doh> yeah
[20:56] [@]doh> but it never happens on this box
[20:56] [+]Timo_Maas> right, done
[20:56] [@]doh> mostly mysql that uses the disk
[20:56] [+]Timo_Maas> we now have loadavg checking in utstats :P
[20:56] [@]doh> :D
[20:56] [@]doh> i can sleep a bit more safe then
[20:56] [@]doh> should probably test it
[20:57] [+]Timo_Maas> yeh that's what i'm trying now
[20:57] [@]doh> well i'm peaking it now
[20:57] [@]doh> just leave it at 5
[20:58] [@]doh> look in top where its going
[20:58] [@]doh> hitting 4 now
[20:58] [@]doh> its closed now
[20:58] [@]doh> it seems
[20:58] [@]doh> before 5
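
The check itself is nothing fancy - something along these lines would do it (just a sketch of the idea, not the actual code):
Code:
<?php
// Sketch of a loadavg guard: read the 1-minute average from /proc/loadavg and
// refuse to run any heavy stats queries if the box is already busy.

define('UTSTATS_LOAD_CAP', 5.0); // the cap agreed on above

function current_loadavg()
{
    $raw = @file_get_contents('/proc/loadavg'); // e.g. "3.54 2.58 1.87 1/141 5744"
    if ($raw === false) {
        return 0.0; // can't read it, so fail open
    }
    $fields = explode(' ', trim($raw));
    return (float) $fields[0]; // first field = 1-minute average
}

if (current_loadavg() > UTSTATS_LOAD_CAP) {
    header('HTTP/1.1 503 Service Temporarily Unavailable');
    exit('UTStats is taking a breather - server load is too high, try again in a few minutes.');
}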
 

Attachments

  • Unreal.ngLog.2009.02.07.20.49.53.8888.zip
    18.9 KB
:eek: private luv for timo going public!

Anything with enough memory to cache the entire db, and enough spare processor capacity not to grind to a halt, would do.
The old bacon was a dual-socket Nocona-type Xeon with hyperthreading - 4 gigs of mem running without PAE, so some 3.5 GB available - 2 random 10k RPM SCSI disks; nothing to brag about, and you don't want it running from the disks anyway.
Bacon was sorta struggling with it; I reckon a Core 2 type processor would do a lot better.
 
Surely it would be possible to have the stats stop updating from, say, dog std, customs, etc., so it only takes info when matches are running (i.e. pug and league matches).

Personally I think pug stats are useful, but public games seem entirely pointless, so why not just completely disable stats reporting unless match mode is enabled on the server? I suspect this would require some effort though. The stats system torp mentioned for CTF stats doesn't seem to suffer from such massive load times though... are you sure there's no server-side mod which can filter out the crap before it's sent to the stats server? Guess that would put strain on the game server though :\
 
Wow, sounds rather complicated in there Timo - I'll have to look at it further and get back to you if I feel up to the challenge. Sadly I reckon it'll be a no for the foreseeable future. Got other stuff to learn as well :eek:
 
The stats for that ctf de site are very basic; translated into AS it would pretty much be kills, sprees and objectives - not exactly a huge amount of stats, especially as there is a total of 700 maps played.
If all league servers are used and each match has, say, a 7-3 average score, you'd have 140 maps played a night in assault! That will soon mount up, especially considering that's not including publics or pugs!

The fact that the whole match is recorded - objectives, kills, etc. - is probably why assault stats grind to a halt compared to that CTF one.
 
The stats system torp mentioned for CTF stats doesn't seem to suffer from such massive load times though... are you sure there's no server-side mod which can filter out the crap before it's sent to the stats server? Guess that would put strain on the game server though :\

They also seemed to have logged less than 1000 matches? Bit of a difference from over a million rounds (this was archived down to half a million when taking away data older than 2007). /edit: What MyM said ^ ;)

The log data isn't an issue as it takes fractions of a second to discard what we don't need.

Think the general consensus, as per my earlier post, is that we'll ditch public stats, or better still, just keep high-level info (number of maps, players, kills per server) for publics, and detailed info + full stats for matches.
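Off the top of my head, the public/match split in the importer would be something like this - purely a sketch, nothing is written yet, and the server list, summary table and sample values are all made up:
Code:
<?php
// Purely a sketch - none of this exists yet. Public servers would be listed
// somewhere like this, and their logs would only feed a tiny summary table
// (uts_public_summary is a made-up name) instead of the full per-player import.

$public_servers = array('public1.example.net:7777', 'public2.example.net:7777');

// In the real importer these would be read out of the log file itself.
$server  = 'public1.example.net:7777';
$mapname = 'AS-Frigate';
$players = 14;
$kills   = 312;

$db = new mysqli('localhost', 'league', 'password', 'league_stats');

if (in_array($server, $public_servers)) {
    // High-level info only: one row per map played, no per-player detail.
    $stmt = $db->prepare(
        "INSERT INTO uts_public_summary (server, mapname, players, kills, played_at)
         VALUES (?, ?, ?, ?, NOW())"
    );
    $stmt->bind_param('ssii', $server, $mapname, $players, $kills);
    $stmt->execute();
} else {
    // League/pug match: carry on with the existing full import as normal.
}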

Still... Need to find some time to get stuck into this as it's tough to just spend 5 minutes here and there without re-thinking it each time.

:eek: private luv for timo going public!
It's OK - I snipped out the good bits :naughty:
 
Forgive my ignorance on this subject.

Afaik, the CTF stats that are older than 2 weeks are not accessible and only newly generated statistics are visible. I think there are more fields than the ones MyM pointed out, though obviously not to the degree that UTA has. Supporting 3 servers, 2 of which are full for the majority of the day, can't be too dissimilar, can it? If it is, then ignore me :D

Surely it's worth querying the people behind it though? Maybe they have a solution.
 
That's a ton of webspace just for UTA Stats and a ton more of processing. If there's a chance we can cut out all publics/pugs and just keep minimal league matches, I may be able to find some hosting, but I'm not entirely sure it would have nearly the processing power needed.
 
booohooo no utstats. worst news ever :rolleyes:

if it needs too much power, just cut it out. Not that utstats are important anyway
 
If Bart makes too many posts in this thread, just cut him out. Not that Bart is important anyway.
 
They also seemed to have logged less than 1000 matches? Bit of a difference from over a million rounds (this was archived down to half a million when taking away data older than 2007). /edit: What MyM said ^ ;)

The log data isn't an issue as it takes fractions of a second to discard what we don't need.

Think the general consensus, as per my earlier post, is that we'll ditch public stats, or better still, just keep high-level info (number of maps, players, kills per server) for publics, and detailed info + full stats for matches.

Still... Need to find some time to get stuck into this as it's tough to just spend 5 minutes here and there without re-thinking it each time.

It's OK - I snipped out the good bits :naughty:

Yeah, sounds good imo if it helps. People are only really interested in pug and match stats anyway I think; maybe there are some guys out there who still like public stats though, not sure.
 
CTF stats are reset on the 1st of each month, and I think it takes the #1 from the previous month in a couple of fields, such as most caps / most flag kills, and puts them into the history book ^_^
 
I haven't seen much of the code, but I believe that it's pretty inefficient.

There are a few main jobs that I see:
  1. Data gathering
  2. Data crunching into stats
  3. Stats viewing/searching
1) Isn't a problem; storage is cheap and we can hold many GB of raw data.

2) Crunching is intensive, and gets tougher with each new stat that is added. However, this doesn't need to be done in real time; a better approach would be to do it in smaller batches that cause less overhead on the box, rather than running it all in one job. More intelligent algorithms and a better approach to doing this would work, but that involves a lot of some clever person's time.

3) The efficiency of the UTStats code is another killer, since the queries to the database can easily overload the server or lock MySQL. With proper "crunching" of the raw data into views, the stats would be static pages with all the intensive work already done, and the views cached by the webserver or similar - to allow lots of concurrent requests.
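To make that concrete, the sort of thing I mean is below - a sketch only, where uts_gametype_totals is a made-up summary table and the query just reuses the column names from the slow query quoted earlier:
Code:
<?php
// Sketch of pre-crunching into a summary table. The heavy GROUP BY still runs,
// but once per import batch rather than once per visitor (or per crawler hit).

$db = new mysqli('localhost', 'league', 'password', 'league_stats');

$db->query(
    "CREATE TABLE IF NOT EXISTS uts_gametype_totals (
         gid        INT PRIMARY KEY,
         gamename   VARCHAR(64),
         frags      BIGINT,
         kills      BIGINT,
         suicides   BIGINT,
         matchcount INT
     )"
);

// Rebuild the totals from scratch; this could be made incremental later on.
$db->query("TRUNCATE uts_gametype_totals");
$db->query(
    "INSERT INTO uts_gametype_totals (gid, gamename, frags, kills, suicides, matchcount)
     SELECT g.id, g.name, SUM(p.frags), SUM(p.kills), SUM(p.suicides),
            COUNT(DISTINCT p.matchid)
     FROM uts_player AS p
     JOIN uts_games  AS g ON p.gid = g.id
     GROUP BY g.id, g.name"
);

The 'totals' page then reads a handful of rows instead of grouping 11 million, and a crawler can hammer it all day without MySQL noticing.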

I think from my limited knowledge of this, there are 2 options:
  1. Rewrite of the UT Stats system to be able to scale better. Cheap but difficult/time consuming.
  2. Keep the UT Stats code as it is and throw more money/hardware resources at it. Expensive but easier.
 
plse

plse

plse

Keep the stats alive! I need the stats. Stats own. Stats are naise. Stats give me a meaning in life

If the huge amount of data is the problem, why not delete the old data and have the server reset all stats every 4 months? I mostly care about league matches anyway --> who got most kills, who got most obj, who's using hammers, etc. Pug and public stuff is not important. Neither is the freakin table on who had the most points last year. UTstats for me is just naise to have a closer look at the past match in order to 1) determine a man of the match :D (mainly me of course) and 2) learn from it for future matches.


Just my humble opinion.

PS. I'm no good with coding and shit, but thanks at least for keeping the stats alive for as long as you have. Gg admins!