Wednesday, June 26, 2013

[coding] Unsatisfied

I have a large project at work that is a big deal.  My manager is on vacation, and the senior PM who was filling in for the PM who left the company is about to go on vacation.  The one tester on the project has a habit of staring into space blankly whenever I explain something.  It will shortly become the wild west here.

In addition, I am attempting to get fit.  First I ran until my legs hurt constantly, then I biked until my legs hurt constantly, and then I started lifting until everything hurt constantly.  I've been working with a personal trainer and trying to figure out just how hard I can push myself without breaking everything.

In addition, this side business is consuming my life.  It is so much more work that I ever want to do, but I have to do it because I committed and because I want to make money and quit my day job.  Oh and my business partner is super emotional and I have to learn people skills now and also some people we almost took on as partners tried to screw me out of the business.

In addition, I am single again and breathing requires ten times the amount of effort it used to.  In addition, I have less sleep now due to what I assume is mild loneliness.  And now I have to spent a shit ton of time stalking old romantic interests on facebook, destroying my ability to attract mates by giving myself a haircut, and going on dates with girls from PlentyOfFish and NOT OkCupid based on a profile I am too lazy to update because I don't want to bother updating until I work out for a few months and don't look skinny and also buy all new clothes or something.

Also I'm in a kickball league and actively trying to continue my transformation into an extrovert.  It is a challenge.  Oh and kicking a ball is fun too.

Also I am addicted to reading /r/fatpeoplestories on reddit.  This is closely related, and probably a large cause of, the second paragraph of this post.

Therefore, what is it do you think I am spending my time on?  What would be wisest?  Yeah.  3-SAT.  Fucking hell.  When I die my tombstone will probably read "dedicated most of his life in the service of an ultimately flawed belief concerning a mathematical thingy no one cares about."  Anyway.

I was in the shower a week or a few days ago, thinking about my 3-SAT algorithm, and wondering how I could find an instance of 3-SAT that breaks it.  Unfortunately, as if part of some kind of weird addiction, trying to think about how to break it ultimately leads to just thinking about how it has to work.


Then I thought:  why not just convince other people to break it for me?  The world is mostly full of two kinds of people:

1) people who have never heard of 3-SAT
2) people who think P != NP

Its the second group of people that roll their eyes everyone another dumbass tries to announce an algorithm that they think is polynomial time for all instances of 3-SAT.  I think those people can help me out.  I have dedicated hundreds of hours into becoming an expert in a single, crazy approach that makes no sense to other people.  I have basically zero knowledge of breaking 3-SAT algorithms.  However, all of these skeptics in category 2 could fill in that gap for me.  They just need to be inspired to want to help instead of just rolling their eyes and relegating me to the computer science equivalent of someone who claims bigfoot exists but can't prove it.

My first idea:  a contest.  Just throw up a website with an upload button, and promise $X to the first person who can break my algorithm.  Obviously we would need some limits on submission rates per IP, and on problem size and so on to save computing costs.  I can write up some clues (that they should obviously take with a grain of salt), such as saying its a clause-finding algorithm and probably weaker for lower c/v ratios, and make the algorithm print out a number from 0 to 2*(# variables) which indicates how much difficulty it had with an instance, allowing people to fine tune whatever generators they are using to make it more difficult.

Is this a stupid idea?  Probably.  Should I instead be spending my time reading the literature and actually learning how other people are trying to solve this?  Probably.  Am I actually going to do that?  Probably not.


Also, I still have the 1.001 billion 3-SAT instances on which I spent nearly $1000 to generate with Azure instances.

So...all I need is some kind of cheap storage, queue and workflow system that can handle billions of 3-SAT instances.  And this time around I'm not paying for 20 virtual machines to do the processing.

I do have one of those cheap-ass web hosting services, where they promise you "unlimited" shit and cancel your account if your usage doesn't follow their idea of what you should use it for.  I'm already paying for it for other reasons, so I might as well re-use it here.

Also, I need to test drive CloudMine to see if it will work for the skin care app.

Therefore, I sat around for a few hours dreaming this up:

Heterogeneous, cheap-ass 3-SAT storage system for billions of instances


Cleanup

0.  remove duplicates, 2-clauses, 1-clauses, and shuffle so there are no missing variables
1.  swap variables so highest count first (optional)
2.  swap literals so highest count first (optional)
3.  sort literals in clauses by variable # (optional)
4.  sort clauses (optional)


Sharding

Sharding is a fancy ass word for storing things in multiple databases when one database isn't large enough.  We shard on:
1.  v = number of variables
2.  c = number of clauses
3.  counts.  Counts is an descending-ordered list of the counts of each variable.  In addition to listing just the counts of each variable, we also list the higher of the counts of the positive and negative literals.
4.  guid hash.  if we still need more sharding after all that (i.e. if we have millions of instances with the same histogram of variable and literal counts) then just use a hash of the guid that we use to identify each instance.  This is lame though.

5.  Craziness:  if you sorted and ordered the clauses and literals well enough (see Cleanup), you could actually just create a single array of all the literal values for every instance and use that to construct a tree-based index where the nodes of the tree are the actual values in the list.  This wouldn't actually remove all duplicates (duplicate detection should be NP-Complete) however  it would eliminate most of the waste that comes from our poor representation of 3-SAT instances.

Instance GUID

probably something like:  

v.c.vcount0.vliteralcount0.vcount1.vliteralcount1....date.host.randomNumber

v is # variables, c is # clauses, then see #3 in previous section, and host is the host that generated it.


Android App

I should make an android app that generates 3-SAT instances, so that when I am bored standing in the lunch line I can generate an instance according to some parameters, upload it, and look at the results.  Normally I use Duolingo to learn spanish when I'm bored, but the cafeteria is too loud for that.


The Data Store

Obviously all of these tables probably need tombstone fields to avoid ever deleting a row.  Also obviously these tables may be in different databases.

instance_shards
This table only exists in one place

shard_version (increment when new sharding scheme)
v_min
c_min
varcount_min = null (string)
instance_id_hash = null

instances
Many tables exist.  Consult instance_shards to know which one.

instance_id (guid)
v
c
literal_counts
instance_content_location  (url pointing to the actual cnf file)
execution_results_location  (database table with execution details - null when not solved yet)
tag_id (tells you about where this instance came from)


execution_results
Many tables exist.  Might have sort of a 1:1 relationship with instances...not sure.

execution_id (auto increment)
instance_id
result = Unsat, Sat, Fail (0,1,2)
difficulty = 0...2v
details_location = null (reserved for future use)

Details might include algorithm version, host, how many jumps were fixed by filters, etc.


instance_contents
first version:  just a database table
second version:  maybe cloudmine
or s3
or a fileserver at my apartment
the main computer of the U.S.S. Enterprise
carrier pigeons

queue_table
Also, I'll need  queue table when I get around to implementing a portal for people to submit their attempts to break the algorithm.


No comments:

Post a Comment