This paper:
On the Average Similarity Degree between Solutions of
Random k-SAT and Random CSPs
by
Ke Xu and Wei Li
claim that other papers which I'm not bothering to read have established the bounds of the phase transition factor at 3.145 to 4.602. What a pain in the ass. I am going to use 4.24 for now to get a emperical data to extrapolate the runtime complexity of my algorithm in the average case. This means, though, that in all this I am going to have to play with 2 variables when I eventually try to be thorough.
Lets see...for a problem with 64 variables...there are 64-choose-3, or 41,664 possible clauses.
How many problems are there which contain 4.24n or 271 clauses? So...41664 choose 271:
2072331229310765793377672466034020417636278749137269154995246883568702527021961119330338444155265251631068024493999124533647240936284157676458720837336901229107673520873559441276388455770161315227519330469504702375894948548792917160195998669181535033528022164704312214959828265810771630039781426010241744128982277487082137761437258200076747922608614240395600874853517856787979122329993359135296560199897886389786705354539414437639886905745049531129463397610857195300866636502701516169406583772872284149712200483699807594302353726301355389836944568555083818470495508270586692266585517132868094261950528874724887787195704874995733241769392124448648812676324946824216985434343034624394856095456349420769250043520
Here is a more manageable number:
2.whatever * 10 ^ 708
And here is the fun part. That number is wrong. I forgot about negated literals. Those are difficult to add in, but I think we can just multiply by 8. Damn I just got a craving to play descent 3. Anyway. So...
333,312 possible clauses. Damn! Did I do this right?
2.4... * 10 ^ 953
Obviously saying that every variable must be represented will reduce that space. Just thinking about calculating that is giving me a headache.
One variable missing
So you really have n = 63. 63 choose 3 = 39711. (64 3)-(63 3) = 1953. But there is one for each variable! So its 64*((64 3)-(63 3)) = 124992 possible clauses that will be omitted...but not necessarily at the same time. Shit.
If one variable is missing, you're down 1953 clauses and have only 331359. How many problems exist for that variable missing? (331359 choose 271) = 4.97... * 10^952. And there are n of those cases. And that is just when only one variable is missing.
Oh I did this wrong. If you multiply 64 * (331359 choose 271) you get 3.18 * 10 ^ 954. That is greater than the total number of possible problems I previously calculated.
It is probably because I failed to account for the times when more than one variable may be missing...I'm double counting most of the instances.
Also, I think I failed to account for the negated literals of the missing variable. Yeah. The math in this section is basically wrong.
Back to formula
So, now, actually, I just want to know how many there are.
((n choose 3) choose 4.24n)
((32 choose 3) choose 135) = 2.58105...*10^267.
((16 choose 3) choose 67) = hey it has a name!
6 octovigintillio. Who bothers naming 10^80-ish? Does that really come up a lot?
Anyway:
6022078479523219491126506063987820512839398152713862982670624142131867189974969798288000
I dont think that will fit in a long. Wait!!! We can cheat more. We can lower the factor as well. Lets use 3.1 instead.
12 variables, and 3.1*12=37 clauses leads to:
1370125002560566889654596797962961206718960
Oh, yeah, a tredecillian.
I can generate one million of those problems in 281s. Ok thats not happening, unless my constraints for problem generation reeeeally make a difference.
Oh shit, I did the math wrong again. It needs to be this formula:
(((n choose 3)*8) choose 4.24n)
Keep forgetting about those damn negated literals.
In Summary
1) I am bad at math.
2) Every time I think about testing one of these algorithms it always devolves into trying to calculate the total possible number of problems you could have for a given number of variables.
3) I really need to get over the fact that a thousand or a million sample problems does not constitute a thorough representation of the problem space.
4) Lets go with the following generation parameters for n, m and count:
n= 32,48,64,80,96,112,128
m=4.24n
count=1000
Problems of size 125 take hours to complete, but I can in theory manage 500 or 1000 many ec2 instances. Actually I could do more like 200 instances...idk. Anyway. This should be enough datapoints to see if the curve looks polynomial.
Least Squares Calculator
Subscribe to:
Post Comments (Atom)

0 comments:
Post a Comment