Wednesday, November 7, 2012

Solving Travelling Salesman Problem with JgraphT and JGAP libraries



Recently, I came across a real life problem of TSP. In this problem, user is assigned a list of cities to visit, but is not required to come back. At end point, he is again assigned to visit new set of cities.

This turns out to be a Hamiltonian path problem and not Hamiltonian cycle. I evaluated 2 different libraries for it namely JgraphtT and JGAP.

JgraphT seems to use MST-prim based approximation algorithm, where Cost of tour in worst case is 2*optimalCost. It computes in polynomial time and is very fast. To solve Hamiltonian path problem, you can add an extra vertex say city 0. Next add edges from all vertices to city 0 with weight 0 (Remember input graph has to be a complete graph). After computation remove the city 0 and all the corresponding edges.
Code for this is: 

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

import org.jgrapht.alg.HamiltonianCycle;
import org.jgrapht.graph.DefaultWeightedEdge;
import org.jgrapht.graph.SimpleWeightedGraph;


/**
 * Using jgrpaht library
 * 
 * @author pratyush
 * 
 */
public class HamiltonianCycleHelper {

 private SimpleWeightedGraph<Long, DefaultWeightedEdge> g = new SimpleWeightedGraph<Long, DefaultWeightedEdge>(
   DefaultWeightedEdge.class);

 public void addVertex(Long id) {
  g.addVertex(id);
 }

 public void addVertex(List<Long> ids) {
  for (Long id : ids) {
   g.addVertex(id);
  }
 }

 public void addEdge(Long source, Long destination, Long weight) {
  DefaultWeightedEdge edge = g.addEdge(source, destination);
  g.setEdgeWeight(edge, weight);
 }

 public List<Long> execute() {
  Set<Long> vertices = g.vertexSet();
  addVertex(0l);
  System.out.println(vertices);
  for (Long v : vertices) {
   if (v.longValue() == 0)
    continue;
   DefaultWeightedEdge edge = g.addEdge(0l, v);
   g.setEdgeWeight(edge, 0);
  }
  List<Long> output = HamiltonianCycle
    .getApproximateOptimalForCompleteGraph(g);
  output.remove(Long.valueOf(0l));
  return output;
 }

 public static void main(String args[]) {
  List<Long> vertices = new ArrayList<Long>();
  vertices.add(1l);
  vertices.add(2l);
  vertices.add(3l);
  HamiltonianCycleHelper h = new HamiltonianCycleHelper();
  h.addVertex(vertices);
  h.addEdge(1l, 2l, 1l);
  h.addEdge(1l, 3l, 5l);
  h.addEdge(2l, 3l, 3l);
  List<Long> output = h.execute();
  System.out.println(output);
 }
}

JgraphT results were good, but I wanted worst case cost to be better. Even though, I don’t know about genetic algorithms, I tried out JGAP library examples and it worked pretty well. If you check the example source code, you can modify the SalesmanFitnessFunction.java by commenting the return path:

protected double evaluate(final IChromosome a_subject) {
  double s = 0;
  Gene[] genes = a_subject.getGenes();
  for (int i = 0; i < genes.length - 1; i++) {
   s += m_salesman.distance(genes[i], genes[i + 1]);
  }
  // add cost of coming back:
  //s += m_salesman.distance(genes[genes.length - 1], genes[0]);
  return Integer.MAX_VALUE / 2 - s;
}
Also, modify TravellingSalesman.java, instead of just
TravellingSalesman t = new TravellingSalesman();
IChromosome optimal = t.findOptimalPath(null);
Try something like:
public List<Long> compute() throws Exception {
  IChromosome tempChromosome=null;
  int maxEvolution = 12;
  int maxPopulation = 12;
  int steps = 10;
  List<Long> output = new ArrayList<Long>();
  double currentFitness = 0;
  //Give few tries to find avg best path
  while (maxEvolution <= 512) {
   Configuration.reset();
   setMaxEvolution(maxEvolution);
   setPopulationSize(maxPopulation);
   tempChromosome = findOptimalPath(null);
   if (tempChromosome.getFitnessValue() > currentFitness) {
    currentFitness = tempChromosome.getFitnessValue();
    output = new ArrayList<Long>();
    int geneVal;
    Gene g[] = tempChromosome.getGenes();
    for (int i = 0; i < g.length; i++) {
     IntegerGene geneA = (IntegerGene) g[i];
     geneVal = geneA.intValue();
     output.add(geneVal);
    }
   }
   maxEvolution += steps;
   maxPopulation += steps;   
  }
  return output;
 }
Increasing the chromosome population makes computation slow, but results much better. Happy TSP solving !

Thursday, August 2, 2012

Hacking nested IN and NOT IN queries in hive

If you are not using nested select, then IN and NOT IN queries work fine.

But if you are looking for nested select, here is a alternate way to solve it :

For IN queries :

[SQL]    select id,name from user where id in (select id from students)
[HIVE]   select user.id,user.name from user JOIN (select id from students) tmp on (tmp.id = user.id)

For NOT IN queries :
[SQL]   select id,name from user where id not in (select id from students)
[HIVE]  select user.id,user.name from user  LEFT OUTER JOIN students ON (students.id = user.id) where students.id is null

Above will work only when students.id is supposed to be not null. Essentially idea is to have no rows from second table and all rows from first table in left outer join. So, accordingly one can modify where clause. Also in case of multiple joins, you might encounter duplicate rows in result. In that use,
[HIVE] select distinct user.id,user.name from user  LEFT OUTER JOIN students ON (students.id = user.id) where students.id is null

Distinct will involve one more step  of map reduce in HIVE.


Monday, May 7, 2012

PHP script to migrate mysql table to mongo collection


This is a naive PHP script I wrote to migrate data from mysql to mongo DB. I have tested transferring 10 million rows with no problem. Mongo and mysql DB credentials are not handled in this code. Even though code doesnot handle many mongo exceptions, it is good enough for normal use cases. Anybody is welcome to refine the code.
<?
// Usage ./mysqltomongo.php mysqldb mysqltable mongodb mongocollection
function mongo_connect($db,$collection) {
 $m = new Mongo(); //handle username,ports and passwords
 $mydb = $m->$db;
 if(!isset($collection)) return $mydb;
 else {
  $mycollection = $mydb->$collection;
  return $mycollection;
 }
}


$mysqldb=$argv[1];
$mysqltablename=$argv[2];
$mongodb=$argv[3];
$mongocollection=$argv[4];

$link = mysql_connect(...); //give proper info
if (!$link) {
    die('Could not connect: ' . mysql_error());
}

mysql_select_db($mysqldb);
$result = mysql_query("SELECT * from ".$mysqltablename);

$fields = mysql_num_fields($result);
$coltypes=array();
for ($i=0; $i < $fields; $i++) {
    $coltypes[mysql_field_name($result, $i)]=mysql_field_type($result, $i);
}

$mdb = mongo_connect($mongodb);
$mdb->dropCollection($mongocollection);
$collection= mongo_connect($mongodb,$mongocollection);

while ($row = mysql_fetch_array($result, MYSQL_ASSOC)) {
 $newrow=transform($row,$coltypes);
 try {
         $collection->insert($newrow);
 }
 catch(MongoException $e) {
  print_r($e);
 }
}

function transform($row,$coltypes) {
 $val;
 $ret=array();
 foreach($row as $k=>$v) {
        $val=utf8_encode($v);
        if($coltypes[$k]=="real") $val=floatval($val);
        else if($coltypes[$k]=="int") $val=intval($val);
        $ret[$k]=$val;
 }
 return $ret;
}

mysql_close();
?>

Code for Single, Double and Triple Exponential Forecasting

Recently I got interested in analyzing trends in time series and explored few forecasting techniques.

A good read to start are :
  • http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc4.htm
  • http://en.wikipedia.org/wiki/Moving_average
  • http://en.wikipedia.org/wiki/Exponential_smoothing
First thing I did was to plot SMA and EMA over existing data with varying windows sizes and alpha values. This gives a initial idea of how your time series data is fluctuating with time. I used a awesome tool FLOT to visualize. I didn't get time to explore awesome library, but you may try http://www.r-project.org/

Next step after visualizing, is to find trends and forecast. Before forecasting future data points, I suggest forecast over current data set and try to get close fit as possible. Some of my observations from studying all the 3 smoothing methods are :
1) Single exponential should be used to when there are hardly any trends. Forecast value is almost equivalent to last data point. Only gain you get out of this method is to gain insight to alpha value by minimizing Mean Squared Error.
2) Double exponential method gives you better forecast values when there is some trend...say continuously going up or down. Forecasts are much better than single exponential. Playing with gamma value is fun.
3) Triple exponential method is good with trend plus season. By carefully iterating over period,alpha,beta,gamma, and each time minimizing MSE, one can get pretty close forecasts.

Here are my codes used :

 /**
  * http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc431.htm
  * http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc432.htm
  * @param data - input data
  * @param alpha - good value between 0.1-0.9
  * @param numForecasts - ahead forecasts
  * @return
  */
 public static double[] singleExponentialForecast(double[] data, double alpha, int numForecasts) {
  double[] y = new double[data.length + numForecasts];
  y[0] = 0;
  y[1] = data[0];
  int i = 2;
  for (i = 2; i < data.length; i++) {
   y[i] = alpha * data[i - 1] + (1 - alpha) * y[i - 1];
  }

  for (int j = 0; j < numForecasts; j++, i++) {
   y[i] = alpha * data[data.length - 1] + (1 - alpha) * y[i - 1];
  }
  return y;
 }

 /**
  * http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc433.htm
  * http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc434.htm
  * @param data
  * @param alpha
  * @param gamma
  * @param initializationMethod
  * @param numForecasts
  * @return
  */
 public static double[] doubleExponentialForecast(double[] data, double alpha, double gamma, int initializationMethod, int numForecasts) {
  double[] y = new double[data.length + numForecasts];
  double[] s = new double[data.length];
  double[] b = new double[data.length];
  s[0] = y[0] = data[0];
  
  if(initializationMethod==0) {
   b[0] = data[1]-data[0];
  } else if(initializationMethod==1 && data.length>4) {
   b[0] = (data[3] - data[0]) / 3;
  } else if(initializationMethod==2) {
   b[0] = (data[data.length - 1] - data[0])/(data.length - 1);
  }
  
  int i = 1;
  y[1] = s[0] + b[0];
  for (i = 1; i < data.length; i++) {
   s[i] = alpha * data[i] + (1 - alpha) * (s[i - 1]+b[i - 1]);
   b[i] = gamma * (s[i] - s[i - 1]) + (1-gamma) * b[i-1];
   y[i+1] = s[i] + b[i];
  }

  for (int j = 0; j < numForecasts ; j++, i++) {
   y[i] = s[data.length-1] + (j+1) * b[data.length-1];
  }
  
  return y;
 }
 
 public static double TSAError(double[] data, double[] forecast) {
  double mad = 0.0;
  double mse = 0.0;
  double diff = 0.0;

  for (int i = 0; i < data.length; i++) {
   diff = data[i] - forecast[i];
   mad += Math.abs(diff);
   mse += Math.pow(Math.abs(diff), 2.0);
  }
  
  return mse/data.length;
 }

Triple Exponential code can be found at http://n-chandra.blogspot.in/2011/04/holt-winters-triple-exponential.html
PS: Let me know if code has problems

Monday, April 9, 2012

How to add pgp public keys for mongo respository

While installing mongo db, if you are trying to get pgp key using
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10

you might receive following error :

Executing: gpg --ignore-time-conflict --no-options --no-default-keyring --secret-keyring /etc/apt/secring.gpg --trustdb-name /etc/apt/trustdb.gpg --keyring /etc/apt/trusted.gpg --primary-keyring /etc/apt/trusted.gpg --keyserver keyserver.ubuntu.com --recv 7F0CEB10
gpg: requesting key 7F0CEB10 from hkp server keyserver.ubuntu.com
gpgkeys: HTTP fetch error 7: couldn't connect to host
gpg: no valid OpenPGP data found.
gpg: Total number processed: 0

You can fix this by storing pgp public key written in http://www.mongodb.org/display/DOCS/Ubuntu+and+Debian+packages into a file say mongo.key and then execute "sudo apt-key add mongo.key" and then proceed with "sudo apt-get update"

Saturday, April 7, 2012

Store ajax json response into javascript variable using jquery

If you wish to make an ajax call, and assign response json into a variable directly...here is a small hack :
function getJson(url) {
 return JSON.parse($.ajax({
     type: 'GET',
     url: url,
     dataType: 'json',
     global: false,
     async:false,
     success: function(data) {
         return data;
     }
 }).responseText);
}

var myJsonObj = getJson('myjsonurl');

Wednesday, March 21, 2012

Umarshalling JSON and XML

If you want to unmarshall json or xml inputstream to Java object, here are the functions :
public static <T> T unmarshalXML(InputStream is, Class<T> c)
   throws JAXBException {
  JAXBContext jc = JAXBContext.newInstance(c);
  Unmarshaller u = jc.createUnmarshaller();
  T response = (T) u.unmarshal(is);
  return response;
 }

 public static <T> T unmarshalJSON(InputStream is, Class<T> c)
   throws JAXBException, IOException, JSONException, XMLStreamException {
  JAXBContext jc = JAXBContext.newInstance(c);
  Unmarshaller u = jc.createUnmarshaller();
  String sJson = IOUtils.toString(is);
  JSONObject obj = new JSONObject(sJson);
  Configuration config = new Configuration();
  MappedNamespaceConvention con = new MappedNamespaceConvention(config);
  XMLStreamReader xmlStreamReader = new MappedXMLStreamReader(obj, con);
  T response = (T) u.unmarshal(xmlStreamReader);
  return response;
 }