Colin Champion, 13 Feb 2023

procedure : results : software : index

This page describes an improvement on my earlier semi-Bayesian election solver. The difference is that the voter distribution is now a mixture of 3 Gaussians rather than 3 discs. This was rather harder to program but the results are more satisfying.

The spatial model is as follows: An infinite number of voters satisfy a distribution which is a mixture of 3 components, each of which is an offset standard bivariate Gaussian. The Gaussian means themselves come from a zero-mean standard bivariate Gaussian. 4 candidates are drawn randomly from the voter distribution.

The parameters for each election therefore comprise 3 Gaussian means and 4 candidate locations, each of which has 2 degrees of freedom; but the entire system is invariant under translation and rotation, so the model has only 11 degrees of freedom.

Given the parameters of an election we can compute the proportion of ballots which will rank the candidates in each of 24 possible orderings. These 24 numbers are passed to two voting functions, Condorcet/Minimax and the spatical solver; and each voting method reports the candidate it thinks should be elected (the Condorcet/Minimax function also provides Condorcet bounds). The accuracies of each method are recorded over 500 simulated elections.

The advantage of this model over a mixture of circular discs lies in the absence of unpopulated regions. Each of the 10 rankings of the candidates is adopted by voters in a certain region of the plane (this region being a “polytope”). If the voter distribution is a mixture of discs then some of these regions will be unpopulated, and the implied equation becomes uninformative.

In principle this difficulty disappears under a GMM; in practice, it’s just concealed. The hillclimb minimises the sum of squared errors; each error is the difference between the proportion of voters in the region supporting a given ranking and the proportion of ballots cast that way. If a region is sparsely populated then the absolute error will be small even if the proportional error is significant, so the statistic is dominated by the heavily populated regions.

Good and Tideman ruminate on the possibility of hillclimbing on a sum of squared proportional errors. This would bring difficulties of its own, since it would be necessary to integrate the Gaussian distribution over a polytope to within a given proportional tolerance, and this is numerically difficult in the tails of the distribution.

There’s nothing much to say. Exact results cannot be obtained. Instead an increasingly good approximation is reached by tightening the various tolerances and increasing the number of random starts for the hillclimb. My theoretical prediction is that when the hillclimb is made sufficiently accurate the Bayesian solver will outperform all possible Condorcet methods, and that as it is improved further, it should approach 100% accuracy.

I performed a series of increasingly expensive runs and eventually reached a point at which the Bayesian solver beat the Condorcet upper bound by a convincing margin. At this point each hillclimb was given 60 random starts and a tolerance of 5×10–4 (the absolute tolerance for the Gaussian integrals over polytopes was 3×10–6). After simulating 500 elections I found that all Condorcet methods gave 92·5% accuracy and the Bayesian solver gave 98·1%. Each simulated election took 5' 30" on my laptop.

spatialtest.c : spatialsolver.c

The software is distributed over a number of pages, as shown by the download.

spatialtest.c spatialsolver.c 
gaussint: gaussint.c 
utils: ranf.c random.h memory.h quadrate.c quadrate.h rtlnorm.c log1plus.c besseli.c
hillclimbing: newuoa.c mdplib.c linmax.c 

The call is:

spatialtest ntrials nstarts trialstart

allowing you to specify the number of trials to perform (default 1000) and the number of random starts (default 100). The third argument allows you to skip the random seeds associated with the first trialstart trials. Thus you might perform 200 trials by running

spatialtest 200 20

but you could perform exactly the same tests in two batches by running

spatialtest 100 20
spatialtest 100 20 100

I compile with the following line:

g++ -o spatialtest -O -g spatialsolver.c spatialtest.c newuoa.c mdplib.c ranf.c gaussint.c rtlnorm.c besseli.c log1plus.c quadrate.c linmax.c

• radians     • minimax     • main

#include "memory.h"
#include <math.h>
#include "random.h"

static double radians() { return 2 * 3.1415926535897932384626433832795029 ; }

static double radians(double x,double qlo) 
{ static double pi=radians()/2 ; 
  while(x>=qlo+2*pi) x -= 2*pi*(int)((x-qlo)/(2*pi)) ; 
  while(x<qlo) x += 2*pi*(int)((qlo+2*pi-x)/(2*pi)) ; 
  return x ; 
}
static double radians(double x) { return radians(x,0) ; } 

int factorial(int m) ;                // this is in condorlib
void decompress(int r,int *x,int m) ;
void genelection(xy *v,int nv,xy *c,int nc) ;
int getlosses(xy *v,int nv,xy *c,int nc,double *x) ;
int spatialsolver(int **,double *,int ,int ,double *,int ) ;
void getvotes(xy *v,int nv,xy *c,int nc,int **b,double *freq) ;
static double pi=radians()/2 ; 

/* -------------------------------- minimax --------------------------------- */

int minimax(int **bal,double *freq,int nbal,int m)
{ int i,j,t,balno,*b,imax ;     
  double v,**r=matrix(m,m),*score=vector(m) ; 

  for(balno=0;balno<nbal;balno++) for(v=freq[balno],b=bal[balno],i=0;i<m-1;i++)
    for(j=i+1;j<m;j++) r[b[i]][b[j]] += v ; 
  for(i=0;i<m-1;i++) for(j=i+1;j<m;j++) 
  { v = r[i][j] - r[j][i] ; r[i][j] = v ; r[j][i] = -v ; }

  for(i=0;i<m;i++) for(score[i]=1<<24,j=0;j<m;j++) // arbitrary large number
    if(i!=j&&r[i][j]<score[i]) score[i] = r[i][j] ; 
  for(v=imax=t=0;t<m;t++) if(t==0||score[t]>v) { imax = t ; v = score[t] ; }
  freematrix(r) ; free(score) ; 
  if(v<0) return imax | (1<<24) ; else return imax ; 
}
/* ---------------------------------- main ---------------------------------- */

int main(int argc,char **argv)
{ int ntests=argc<2?1000:atoi(argv[1]),nstart=argc<3?100:atoi(argv[2]) ;
  int i,j,nv=3,nc=4,N=factorial(nc),testno,rightful,winner,ndi ; 
  int **b=imatrix(N,nc),tally[]={0,0,0,0},itestno=argc<4?0:atoi(argv[3]) ; 
  double q,*freq=vector(N),*loss=vector(nc),*bayesloss=vector(nc) ; 
  xy *V=xyvector(nv),*C=xyvector(nc) ;
  char *algs[]={"condorcet lb","minimax","condorcet ub","spatial solver"} ;

  for(i=0;i<N;i++) decompress(i,b[i],nc) ; 

  for(testno=itestno;testno<ntests+itestno;testno++)
  { ranset(testno) ; 
    genelection(V,nv,C,nc) ; 
    getvotes(V,nv,C,nc,b,freq) ; 
    for(ndi=i=0;i<N;i++) if(freq[i]) ndi += 1 ; 
    rightful = getlosses(V,nv,C,nc,loss) ; 
    winner = minimax(b,freq,N,nc) ; 
    if(winner==rightful) tally[0] += 1 ;               // condorcet lb
    if((winner&0xffff)==rightful) tally[1] += 1 ;      // minimax
    if(winner==rightful||(winner>>24)) tally[2] += 1 ; // condorcet ub
    winner = spatialsolver(b,freq,nv,nc,bayesloss,nstart) ; 
    if(winner==rightful) tally[3] += 1 ; 

    for(q=i=0;i<nc;i++) 
    { q += (loss[i]-bayesloss[i])*(loss[i]-bayesloss[i]) ;
      printf("(%.3f:%.3f) ",loss[i],bayesloss[i]) ; 
    }
    printf(" [%.3f] | %d",sqrt(q/nc),rightful) ; 
    if(winner!=rightful) printf("(%d)\n",winner) ; else printf("\n") ; 
    if(testno%10==9) 
    { printf("%d: ",testno+1) ; 
      for(i=0;i<4;i++) 
        printf("%s=%.1f%%%s",algs[i],
               (100*tally[i])/(testno+1.0-itestno),(i==3?"\n":"; ")) ;
    }
  }
}

• euclid2     • euclid     • decompress     • L12     • factorial     • flatten     • unflatten     • genelection     • getvotes     • getlosses     • validity     • spatialsolver

#include "memory.h"
#include <math.h>
#include "random.h"

static double euclid2(xy A,xy B) 
{ return (A.x-B.x)*(A.x-B.x) + (A.y-B.y)*(A.y-B.y) ; }
static double euclid(xy A,xy B) { return sqrt(euclid2(A,B)) ; }
static double euclid2(xy A) { return euclid2(A,xy()) ; }
static double euclid(xy A) { return euclid(A,xy()) ; }

double besseli(int n,double x) ;
double uoamin(double (*func)(double*),double *x,int n,
              double xinc,double tol,int maxiter) ;
int uoacond() ; 
double gaussint(double **lt,double *rhs,int nlt) ;
xy gaussint(double **lt,double *rhs,int nlt,double tol,int maxeval) ;

void decompress(int r,int *x,int m) // inverse of compress
{ int i,j,k,buf[25] ;
  div_t res ; 
  buf[0] = x[m-1] = 0 ;
  for(i=1;i<m;i++) 
  { res = div(r,i+1) ; r = res.quot ; x[m-i-1] = res.rem ; buf[i] = i ; }
  for(i=0;i<m;i++) 
  { k = buf[x[i]] ; for(j=x[i];j<m-i-1;j++) buf[j] = buf[j+1] ; x[i] = k ; }
}
static double L12(double x)
{ return exp(x/2) * ( (1-x)*besseli(0,-x/2) - x*besseli(1,-x/2) ) ; }
int factorial(int m) { int i,t ; for(t=i=1;i<=m;i++) t *= i ; return t ; }

// it's nice to store election parameters as meaningful structures, but the
// hillclimb needs a flat array of doubles

static void flatten(xy *V,int nv,xy *C,int nc,double *x) 
{ int i,k ; 
  for(k=i=0;i<nv-1;i++) { x[k++] = V[i].x ; x[k++] = V[i].y ; }
  for(i=0;i<nc;i++) { x[k++] = C[i].x ; x[k++] = C[i].y ; }
}
static void unflatten(double *x,xy *V,int nv,xy *C,int nc) 
{ int i,k ; 
  for(k=i=0;i<nv-1;i++) { V[i].x = x[k++] ; V[i].y = x[k++] ; }
  for(V[nv-1]=xy(),i=0;i<nv-1;i++) 
  { V[nv-1].x -= V[i].x ; V[nv-1].y -= V[i].y ; }
  for(i=0;i<nc;i++) { C[i].x = x[k++] ; C[i].y =  x[k++] ; }
}
/* ------------------------------- genelection ------------------------------ */

// this generates the parms for an election, namely the locations in space of
// the nc candidates and of the centres of the 3 voter gaussians (the latter 
// being adjusted so that the overall voter mean is 0).

void genelection(xy *v,int nv,xy *c,int nc)
{ int vno,cno ; 
  double h,theta,x,y ; 
  for(y=x=vno=0;vno<nv;vno++) 
  { v[vno] = xy(gaussv(),gaussv()) ; x += v[vno].x ; y += v[vno].y ; }
  for(vno=0;vno<nv;vno++) v[vno] = xy(v[vno].x-x/nv,v[vno].y-y/nv) ;
  for(cno=0;cno<nc;cno++)
  { vno = rani(nv) ; c[cno] = xy(gaussv()+v[vno].x,gaussv()+v[vno].y) ; }
}
/* -------------------------------- getvotes -------------------------------- */

// this computes the proportion of ballots cast in each of the 24 possible ways
// from the election parameters.

void getvotes(xy *v,int nv,xy *c,int nc,int **b,double *freq)
{ int bno,cno,vno,eqno,N=factorial(nc) ;
  double q,x,y,**lt=matrix(nc-1,2),*rhs=vector(nc-1) ;
  xy C0,C1 ; 

  for(bno=0;bno<N;bno++) 
    for(decompress(bno,b[bno],nc),freq[bno]=vno=0;vno<nv;vno++)
  { for(x=v[vno].x,y=v[vno].y,eqno=0;eqno<nc-1;eqno++)
    { C0 = xy(c[b[bno][eqno]].x-x,c[b[bno][eqno]].y-y) ; 
      C1 = xy(c[b[bno][eqno+1]].x-x,c[b[bno][eqno+1]].y-y) ; 
      lt[eqno][0] = 2 * (C1.x-C0.x) ; 
      lt[eqno][1] = 2 * (C1.y-C0.y) ; 
      rhs[eqno] = C1.x*C1.x + C1.y*C1.y - C0.x*C0.x - C0.y*C0.y ; 
    }
    freq[bno] += gaussint(lt,rhs,nc-1,-3e-6,20000).x ; 
  }
  freematrix(lt) ; free(rhs) ; 
  for(q=bno=0;bno<N;bno++) q += freq[bno] ; 
  for(bno=0;bno<N;bno++) freq[bno] /= q ; 
}
/* -------------------------------- getlosses ------------------------------- */

// this computes the mean voter distance from each of the nc candidates.
// see https://en.wikipedia.org/wiki/Rice_distribution

int getlosses(xy *v,int nv,xy *c,int nc,double *x)
{ int cno,vno,i,winner ;
  static double pi = 3.1415926535897932384626433832795029 ; 
  for(cno=0;cno<nc;cno++) for(x[cno]=vno=0;vno<nv;vno++) 
    x[cno] += sqrt(pi/2) * L12(-euclid2(c[cno],v[vno])/2) / nv ;
  for(i=0;i<nc;i++) if(i==0||x[i]<x[winner]) winner = i ; 
  return winner ; 
}
/* --------------------------------- validity ------------------------------- */

static int **truebal,NV,NC ; 
static double *truefreq ; 

// this computes the mean square discrepancy between the ballot frequencies 
// implied by the election parameters in x (in flattened form) and the 
// observed frequences in the static array truefreq. It divides the result by 
// the proby of the candidate locations.

static double validity(double *x)
{ int i,j,N=factorial(NC),**b=imatrix(N,NC) ; 
  double q,qmin,qdist,r,*freq=vector(N) ;
  xy *V=xyvector(NV),*C=xyvector(NC) ;
  unflatten(x,V,NV,C,NC) ; 
  getvotes(V,NV,C,NC,b,freq) ; 
  for(q=i=0;i<N;i++) 
  { for(j=0;j<NC;j++) if(b[i][j]!=truebal[i][j])
      throw up("faulty ballot no %d",i) ; 
    q += (freq[i]-truefreq[i]) * (freq[i]-truefreq[i]) ;
  }
  free(freq,V,C) ; freeimatrix(b) ; 
  return q ; 
}
/* ------------------------------- spatialsolver ---------------------------- */

int spatialsolver(int **bal,double *freq,int nv,int nc,double *loss,int nstart)
{ int i,winner,tryno,nbal=factorial(nc),n=2*(nc+nv-1) ; 
  double *x=vector(n),q,qmin ;
  xy *V=xyvector(nv),*Vmin=xyvector(nv),*C=xyvector(nc),*Cmin=xyvector(nc) ;

  truefreq = freq ; 
  truebal = bal ; 
  NC = nc ; NV = nv ; 
  ranset(1<<24) ; // for reproducible results

  for(tryno=0;tryno<nstart;tryno++)
  { genelection(V,nv,C,nc) ; 
    flatten(V,nv,C,nc,x) ; 
    q = uoamin(validity,x,n,3e-2,5e-4,100000) ;
    if(tryno==0||q<qmin) { qmin = q ; unflatten(x,Vmin,nv,Cmin,nc) ; }
  }
  winner = getlosses(Vmin,nv,Cmin,nc,loss) ;
  free(x,V,Vmin,C,Cmin) ;
  return winner ; 
}