'tighten' the rating sys?

Guest

automatic checking

Post by Guest »

Couldn't we use the results of logged-in users with specified ranks for an automatic checking of these? Take a few problems for each percentage, and look at which logged-in users have solved them correctly. Then put those users in a list from strongest to weakest, and check that place in the list. For example, if 44 out of 131 logged-in users with specified rank solved the problem correctly, take the strength of the 44th user as the difficulty of the problem (or adapt to get another percentage than 50% to solve the problem correctly at its level). Doing this with a number of problems on all applicable percentages will get a percentage-to-grade scale that is more than just a random guess.
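
A rough sketch of that lookup in Python, just to make the idea concrete. The function name and the numeric "strength" scale (higher means stronger) are assumptions for illustration, not anything the site actually has:

```python
def estimate_problem_strength(user_strengths, solved_flags):
    """Estimate a problem's difficulty from users with a specified rank.

    user_strengths: numeric strength of each user who tried the problem
                    (assumed scale: higher number = stronger player).
    solved_flags:   parallel list of booleans, True if that user solved it.
    Returns the strength of the n-th strongest user, where n is the number
    of users who solved the problem, or None if nobody solved it.
    """
    if len(user_strengths) != len(solved_flags) or not user_strengths:
        raise ValueError("need one solved flag per user, and at least one user")
    n_solved = sum(1 for solved in solved_flags if solved)
    if n_solved == 0:
        return None  # no basis for an estimate
    ranked = sorted(user_strengths, reverse=True)  # strongest first
    return ranked[n_solved - 1]  # e.g. the 44th user if 44 solved it
```

With 131 users and 44 correct solutions this returns the strength of the 44th user in the list, as in the example above.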


{Posted by }
admin

Post by admin »

[quote]
Couldn't we use the results of logged-in users with specified ranks for an automatic checking of these? Take a few problems for each percentage, and look at which logged-in users have solved them correctly. Then put those users in a list from strongest to weakest, and check that place in the list. For example, if 44 out of 131 logged-in users with specified rank solved the problem correctly, take the strength of the 44th user as the difficulty of the problem (or adapt to get another percentage than 50% to solve the problem correctly at its level). Doing this with a number of problems on all applicable percentages will get a percentage-to-grade scale that is more than just a random guess.

[/quote]

i've thought about this kind of thing, going off of reported user ratings, but it invites all kinds of problems with getting users to enter accurate ratings. that's why i went with a closed system.

adum


{Posted by admin}
tristesse

Post by tristesse »


Hi.

First off, excuse my temerity.

That out of the way, in reply to the two last posts and only the two last posts:

Have each registered user enter a rating and where it is from if she knows it, or "unknown" if she does not.

Convert this to an internal linear scale (like Elo or the old -100 to 100; anything where small adjustments could be made). This mapping could be worked out by a dozen people here who are certain of their rank in various systems (Euro, KGS, DGS, IGS, AGA, Japan, etc.), giving a crude linear scale. The entry point shouldn't matter that much in the long run anyway.
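
Something like this, as a minimal sketch. The scale (1 kyu = 0, 1 dan = 1, one unit per stone) and the per-system offset table are my assumptions; the zeros are placeholders that the dozen volunteers would have to fill in:

```python
import re

# Hypothetical per-system corrections in stones. The zeros are placeholders,
# not measured differences between rating systems.
SYSTEM_OFFSET = {"Euro": 0.0, "KGS": 0.0, "DGS": 0.0, "IGS": 0.0, "AGA": 0.0}

def rank_to_linear(rank, system=None):
    """Map '5k', '12 kyu', '2d', '1 dan' to a linear scale in stones.

    Assumed scale: 1 kyu -> 0, 2 kyu -> -1, ..., 1 dan -> 1, 2 dan -> 2.
    Returns None for an "unknown" (or missing) rank.
    """
    if rank is None or rank.strip().lower() == "unknown":
        return None
    match = re.fullmatch(r"\s*(\d+)\s*(k|kyu|d|dan)\s*", rank.lower())
    if match is None or int(match.group(1)) < 1:
        raise ValueError(f"unrecognised rank: {rank!r}")
    number = int(match.group(1))
    base = float(number) if match.group(2).startswith("d") else float(1 - number)
    return base + SYSTEM_OFFSET.get(system, 0.0)
```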

Adjust both user and problem rating slightly after attempts.

E.g. a 1 kyu attempts a 1 dan problem:

She succeeds: her rating is adjusted up by a tiny fraction, and the problem's rating is adjusted down by a tiny fraction (this possibly based on number of rated attempts).

She fails: rating is adjusted down by an even tinier (approaching the infinitesimal as the difference between the user's and the problem's rating increases) amount. (This is necessary, otherwise a user could simply attempt only problems above their level and never decrease.) Problem is likewise adjusted up by a mote. (This is also necessary, as most of a problem's valiant challengers will be slightly below that problem's level because of this thing called "ambition".)

Likewise for a 1 dan user and a 1 kyu problem, only reversed.

Where the user and problem differ by more than, say, 5 stones (converted to the linear scale), it does not affect either rating.
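
To make the adjustment rule concrete, here is a sketch using an Elo-style expected score. That formula is my substitution for "tiny fraction", and the constants K, SCALE and MAX_GAP are made-up values on a scale measured in stones:

```python
K = 0.1        # assumed maximum adjustment per attempt, in stones
SCALE = 2.0    # assumed: a 2-stone gap means roughly 10:1 odds
MAX_GAP = 5.0  # beyond this many stones, neither rating changes

def expected_success(user, problem):
    """Probability that the user solves the problem (logistic curve)."""
    return 1.0 / (1.0 + 10 ** ((problem - user) / SCALE))

def update_ratings(user, problem, solved):
    """Return (new_user_rating, new_problem_rating) after one attempt."""
    if abs(user - problem) > MAX_GAP:
        return user, problem  # too far apart to tell us anything
    outcome = 1.0 if solved else 0.0
    delta = K * (outcome - expected_success(user, problem))
    # The user moves by delta and the problem moves the opposite way.
    return user + delta, problem - delta
```

With these numbers a 1 kyu who solves a 1 dan problem gains about 0.076 stones, but loses only about 0.024 on a failure, and the loss shrinks further as the gap widens.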

Users with "unknown" could get a rating after, say, they have completed a run of 20 random problems of varied difficulty, simply interpolating their rating by what they solved...

OR all "unknown" users could start out at a nominal 0.00 (or whatever the system-wide average is) and one's adjustment after fail/success is linear with how many rated problems one has completed. If 10, then the adjustment is severe, if 500, then the adjustment is itsy-bitsy teensy-weensy. (But if this, then there must be a minimum, so that after, say, 500, the adjustments do not decrease anymore.)
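
A sketch of that shrinking adjustment with a floor. The three constants are invented for illustration; only the shape (linear decrease, then flat after 500) comes from the paragraph above:

```python
K_MAX = 1.0         # assumed step size for a brand-new user, in stones
K_MIN = 0.05        # assumed floor so ratings never freeze completely
SETTLE_AFTER = 500  # number of rated problems after which the step stops shrinking

def step_size(rated_problems_completed):
    """Adjustment size that shrinks linearly with experience, down to K_MIN."""
    if rated_problems_completed < 0:
        raise ValueError("count cannot be negative")
    if rated_problems_completed >= SETTLE_AFTER:
        return K_MIN
    fraction_done = rated_problems_completed / SETTLE_AFTER
    return K_MAX - (K_MAX - K_MIN) * fraction_done
```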

NB: there should be two different ratings associated with each problem, one for "normal run" and one for "time trial". I.e. problems that require a lot of reading will be rated higher when they must be solved quickly, because the stronger player's experience allows him to forego or speed up the reading, but with unlimited time stubborn double-digit kyus might also adamantly work their way to the solution.

OR just use time trial results to adjust rating, period. (Because some users might solve problems fast even with a logical search set.) Actually, I'm in favor of this, but then problems' ratings might be found too low for the stubborn time-employing thinkers' tastes...

On the worry about inaccurate ratings: users have no reason to "cheat" or sabotage and enter idiotic ratings. In fact, you could add features such as "get problems suitable for me" etc. which would make it even more pointless to do such a thing.

Site-wide ratings should adjust dynamically for problems as well as users... In theory.

Finally, an idea I consider solid: hide users' goproblem.com rank from them! Then there can be no sense of competition, and it might cull people's obsession with their personal position in the big graph of go players... Besides, users can get a rank graph kick on ten other go playing sites like KGS and DGS, so what's the point. "Get problems suitable for me" will still work fine, so there is no reason to complain, and the problems' ratings might also be hidden in favor of difficulty levels like "problems way above my level", "problems below my level" etc...

This would be in the name of convenience and service, eroding the obsession over relative strength. :)

Anyway, just some thoughts. I know, it sounds like a complete server-side rewrite, and I, for one, would be willing to do whatever I can to help with programming and such. I have some experience with such things from various programming languages (though not Java yet), so anything I can do... This is such a great site, and I love it.




{Posted by tristesse}
tristesse

Post by tristesse »


Further note on this,

There are a ton of problems that are rated 30 kyu now, that are obviously not 30 kyu level. E.g. crane in the nest (involving a throw-in, snapback, and finally atari-connect!), and some damezumari patterns etc.

30 kyus are players who don't know ladders or false eyes or anything like that. They're the guys that try to kill all your stones by surrounding them with one contact play after another. How are they supposed to be able to read snapback and shortage of liberties patterns??

(I think the problem is that a lot of us have grave trouble with remembering when we were at this level, or that we would like to deny that we ever were... ;))

I have seen Graded Go Problems for Beginners Vol. 1 (30-25 kyu) and the problems there are of the type "capture some stones" when 2 stones are already in atari. The "advanced" problems involve ladders and straight-forward semeais. The problems where Black has a 3-point single eye and it's White's turn to kill -- those are problems for 30 kyus!

This might not be a problem for the rest of us yet, but it is probably happening on the 20, 18, 15, etc kyu levels too. If the general drift continues I think miai problems will hit 30 kyu next (simple nets and short ladders have already fallen). :P

And now for the point of this long, rambly post, to throw yet another idea into the mix: If the dynamic rating stuff (where problems are basically treated as virtual opponents on a game server) is too much work, or the "get problems suitable for me" is too mystic, maybe have a set of anchor problems (possibly taken from Graded Go Problems or similar books?) "fixed" at a certain kyu/dan level and the other kyu/dan levels are adjusted to make it as linear as mathematically possible.
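
The anchor idea could look roughly like this: straight-line interpolation between the fixed anchors. The "internal score" axis and the rank scale (1 kyu = 0, 1 dan = 1, 30 kyu = -29) are assumptions for the sketch, and any anchor values you feed it would have to come from real graded problems:

```python
from bisect import bisect_right

def rank_from_anchors(score, anchors):
    """Interpolate a rank for an internal difficulty score.

    anchors: (internal_score, fixed_rank) pairs for the anchor problems.
    Scores outside the anchored range are extrapolated along the nearest
    segment.
    """
    points = sorted(anchors)
    scores = [s for s, _ in points]
    if len(points) < 2 or len(set(scores)) != len(scores):
        raise ValueError("need at least two anchors with distinct scores")
    # Pick the segment containing the score, clamped to the first/last one.
    i = min(max(bisect_right(scores, score), 1), len(points) - 1)
    (s0, r0), (s1, r1) = points[i - 1], points[i]
    return r0 + (r1 - r0) * (score - s0) / (s1 - s0)
```

For example, with hypothetical anchors at scores 10, 50 and 90 fixed to 30 kyu, 10 kyu and 1 dan, a problem scoring 30 lands halfway between the first two, at 20 kyu.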



{Posted by tristesse}
Yxven

Post by Yxven »

I think people like me are what throws off the rating system. I've been using this site since I started playing Go. My rating history goes from -65.9 to 21. I recently created a new account to "reset" the previously tried/solved indicators, and that account ranks from 49.6 - 54.4.



{Posted by Yxven}
LCZLAPINSKI

Post by LCZLAPINSKI »

[quote]

Further note on this,

There are a ton of problems that are rated 30 kyu now, that are obviously not 30 kyu level. E.g. crane in the nest (involving a throw-in, snapback, and finally atari-connect!), and some damezumari patterns etc.

30 kyus are players who don't know ladders or false eyes or anything like that. They're the guys that try to kill all your stones by surrounding them with one contact play after another. How are they supposed to be able to read snapback and shortage of liberties patterns??

(I think the problem is that a lot of us have grave trouble with remembering when we were at this level, or that we would like to deny that we ever were... ;))

I have seen Graded Go Problems for Beginners Vol. 1 (30-25 kyu) and the problems there are of the type "capture some stones" when 2 stones are already in atari. The "advanced" problems involve ladders and straight-forward semeais. The problems where Black has a 3-point single eye and it's White's turn to kill -- those are problems for 30 kyus!

This might not be a problem for the rest of us yet, but it is probably happening on the 20, 18, 15, etc kyu levels too. If the general drift continues I think miai problems will hit 30 kyu next (simple nets and short ladders have already fallen). :P

And now for the point of this long, rambly post, to throw yet another idea into the mix: If the dynamic rating stuff (where problems are basically treated as virtual opponents on a game server) is too much work, or the "get problems suitable for me" is too mystic, maybe have a set of anchor problems (possibly taken from Graded Go Problems or similar books?) "fixed" at a certain kyu/dan level and the other kyu/dan levels are adjusted to make it as linear as mathematically possible.


[/quote]
1. I think part of the problem is that most of the users on the site are stronger than 30k, especially if they continue to use the site.
2. When players start using the site they may not even have a rating. If a player puts in 30k and then doesn't update his rating as it changes, this would have a big effect.
3. Using only the results of rated players could improve the accuracy of the ratings some. Players' ratings change too. Also, players usually don't update their rating every time it changes, since ratings can fluctuate frequently.
4. There is also likely to be variation between various rating systems.
5. The low end problems tend to be rated too low.
6. The high end problems tend to be rated too high.
7. There is also the problem of how quickly if at all a problem gets fixed after someone reports corrections. Missing correct paths or paths marked correct which should be wrong can affect the problem rating a lot.
8. The problem ratings have more to do with ratings compared to other problems than with player ratings.
9. Ideally the problem ratings would have an internal number (something similar to chess's Elo rating system). Then the number would be adjusted based on the probability of the problem solver solving it. A 1 dan solving a 5k problem should have little effect on the problem rating.
10. Don't use player ratings which are outside the real range for that system if known.
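
For points 3 and 10, a filter along these lines might do as a first pass. The system names and ranges in the table are placeholders I made up (on a linear scale where 1k = 0 and 1d = 1), not the real limits of any server:

```python
# Placeholder ranges per rating system as (lowest, highest) on a linear
# scale. Illustrative only; the real limits would have to be looked up.
VALID_RANGE = {"SystemA": (-29.0, 9.0), "SystemB": (-19.0, 7.0)}

def usable_reports(reports):
    """Keep only self-reported ranks that are worth using.

    reports: iterable of (user, system, linear_rank) tuples, where
    linear_rank is None for users without a rating.
    Drops unrated users, and ranks outside the known range of their
    system. Systems with no known range are kept as they are.
    """
    kept = []
    for user, system, rank in reports:
        if rank is None:
            continue  # point 3: only rated players count
        bounds = VALID_RANGE.get(system)
        if bounds is not None and not (bounds[0] <= rank <= bounds[1]):
            continue  # point 10: outside the real range for that system
        kept.append((user, system, rank))
    return kept
```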


{Posted by LCZLAPINSKI}