Wednesday, November 7, 2012

Solving Travelling Salesman Problem with JgraphT and JGAP libraries



Recently, I came across a real life problem of TSP. In this problem, user is assigned a list of cities to visit, but is not required to come back. At end point, he is again assigned to visit new set of cities.

This turns out to be a Hamiltonian path problem and not Hamiltonian cycle. I evaluated 2 different libraries for it namely JgraphtT and JGAP.

JgraphT seems to use MST-prim based approximation algorithm, where Cost of tour in worst case is 2*optimalCost. It computes in polynomial time and is very fast. To solve Hamiltonian path problem, you can add an extra vertex say city 0. Next add edges from all vertices to city 0 with weight 0 (Remember input graph has to be a complete graph). After computation remove the city 0 and all the corresponding edges.
Code for this is: 

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

import org.jgrapht.alg.HamiltonianCycle;
import org.jgrapht.graph.DefaultWeightedEdge;
import org.jgrapht.graph.SimpleWeightedGraph;


/**
 * Using jgrpaht library
 * 
 * @author pratyush
 * 
 */
public class HamiltonianCycleHelper {

 private SimpleWeightedGraph<Long, DefaultWeightedEdge> g = new SimpleWeightedGraph<Long, DefaultWeightedEdge>(
   DefaultWeightedEdge.class);

 public void addVertex(Long id) {
  g.addVertex(id);
 }

 public void addVertex(List<Long> ids) {
  for (Long id : ids) {
   g.addVertex(id);
  }
 }

 public void addEdge(Long source, Long destination, Long weight) {
  DefaultWeightedEdge edge = g.addEdge(source, destination);
  g.setEdgeWeight(edge, weight);
 }

 public List<Long> execute() {
  Set<Long> vertices = g.vertexSet();
  addVertex(0l);
  System.out.println(vertices);
  for (Long v : vertices) {
   if (v.longValue() == 0)
    continue;
   DefaultWeightedEdge edge = g.addEdge(0l, v);
   g.setEdgeWeight(edge, 0);
  }
  List<Long> output = HamiltonianCycle
    .getApproximateOptimalForCompleteGraph(g);
  output.remove(Long.valueOf(0l));
  return output;
 }

 public static void main(String args[]) {
  List<Long> vertices = new ArrayList<Long>();
  vertices.add(1l);
  vertices.add(2l);
  vertices.add(3l);
  HamiltonianCycleHelper h = new HamiltonianCycleHelper();
  h.addVertex(vertices);
  h.addEdge(1l, 2l, 1l);
  h.addEdge(1l, 3l, 5l);
  h.addEdge(2l, 3l, 3l);
  List<Long> output = h.execute();
  System.out.println(output);
 }
}

JgraphT results were good, but I wanted worst case cost to be better. Even though, I don’t know about genetic algorithms, I tried out JGAP library examples and it worked pretty well. If you check the example source code, you can modify the SalesmanFitnessFunction.java by commenting the return path:

protected double evaluate(final IChromosome a_subject) {
  double s = 0;
  Gene[] genes = a_subject.getGenes();
  for (int i = 0; i < genes.length - 1; i++) {
   s += m_salesman.distance(genes[i], genes[i + 1]);
  }
  // add cost of coming back:
  //s += m_salesman.distance(genes[genes.length - 1], genes[0]);
  return Integer.MAX_VALUE / 2 - s;
}
Also, modify TravellingSalesman.java, instead of just
TravellingSalesman t = new TravellingSalesman();
IChromosome optimal = t.findOptimalPath(null);
Try something like:
public List<Long> compute() throws Exception {
  IChromosome tempChromosome=null;
  int maxEvolution = 12;
  int maxPopulation = 12;
  int steps = 10;
  List<Long> output = new ArrayList<Long>();
  double currentFitness = 0;
  //Give few tries to find avg best path
  while (maxEvolution <= 512) {
   Configuration.reset();
   setMaxEvolution(maxEvolution);
   setPopulationSize(maxPopulation);
   tempChromosome = findOptimalPath(null);
   if (tempChromosome.getFitnessValue() > currentFitness) {
    currentFitness = tempChromosome.getFitnessValue();
    output = new ArrayList<Long>();
    int geneVal;
    Gene g[] = tempChromosome.getGenes();
    for (int i = 0; i < g.length; i++) {
     IntegerGene geneA = (IntegerGene) g[i];
     geneVal = geneA.intValue();
     output.add(geneVal);
    }
   }
   maxEvolution += steps;
   maxPopulation += steps;   
  }
  return output;
 }
Increasing the chromosome population makes computation slow, but results much better. Happy TSP solving !

Thursday, August 2, 2012

Hacking nested IN and NOT IN queries in hive

If you are not using nested select, then IN and NOT IN queries work fine.

But if you are looking for nested select, here is a alternate way to solve it :

For IN queries :

[SQL]    select id,name from user where id in (select id from students)
[HIVE]   select user.id,user.name from user JOIN (select id from students) tmp on (tmp.id = user.id)

For NOT IN queries :
[SQL]   select id,name from user where id not in (select id from students)
[HIVE]  select user.id,user.name from user  LEFT OUTER JOIN students ON (students.id = user.id) where students.id is null

Above will work only when students.id is supposed to be not null. Essentially idea is to have no rows from second table and all rows from first table in left outer join. So, accordingly one can modify where clause. Also in case of multiple joins, you might encounter duplicate rows in result. In that use,
[HIVE] select distinct user.id,user.name from user  LEFT OUTER JOIN students ON (students.id = user.id) where students.id is null

Distinct will involve one more step  of map reduce in HIVE.


Monday, May 7, 2012

PHP script to migrate mysql table to mongo collection


This is a naive PHP script I wrote to migrate data from mysql to mongo DB. I have tested transferring 10 million rows with no problem. Mongo and mysql DB credentials are not handled in this code. Even though code doesnot handle many mongo exceptions, it is good enough for normal use cases. Anybody is welcome to refine the code.
<?
// Usage ./mysqltomongo.php mysqldb mysqltable mongodb mongocollection
function mongo_connect($db,$collection) {
 $m = new Mongo(); //handle username,ports and passwords
 $mydb = $m->$db;
 if(!isset($collection)) return $mydb;
 else {
  $mycollection = $mydb->$collection;
  return $mycollection;
 }
}


$mysqldb=$argv[1];
$mysqltablename=$argv[2];
$mongodb=$argv[3];
$mongocollection=$argv[4];

$link = mysql_connect(...); //give proper info
if (!$link) {
    die('Could not connect: ' . mysql_error());
}

mysql_select_db($mysqldb);
$result = mysql_query("SELECT * from ".$mysqltablename);

$fields = mysql_num_fields($result);
$coltypes=array();
for ($i=0; $i < $fields; $i++) {
    $coltypes[mysql_field_name($result, $i)]=mysql_field_type($result, $i);
}

$mdb = mongo_connect($mongodb);
$mdb->dropCollection($mongocollection);
$collection= mongo_connect($mongodb,$mongocollection);

while ($row = mysql_fetch_array($result, MYSQL_ASSOC)) {
 $newrow=transform($row,$coltypes);
 try {
         $collection->insert($newrow);
 }
 catch(MongoException $e) {
  print_r($e);
 }
}

function transform($row,$coltypes) {
 $val;
 $ret=array();
 foreach($row as $k=>$v) {
        $val=utf8_encode($v);
        if($coltypes[$k]=="real") $val=floatval($val);
        else if($coltypes[$k]=="int") $val=intval($val);
        $ret[$k]=$val;
 }
 return $ret;
}

mysql_close();
?>

Code for Single, Double and Triple Exponential Forecasting

Recently I got interested in analyzing trends in time series and explored few forecasting techniques.

A good read to start are :
  • http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc4.htm
  • http://en.wikipedia.org/wiki/Moving_average
  • http://en.wikipedia.org/wiki/Exponential_smoothing
First thing I did was to plot SMA and EMA over existing data with varying windows sizes and alpha values. This gives a initial idea of how your time series data is fluctuating with time. I used a awesome tool FLOT to visualize. I didn't get time to explore awesome library, but you may try http://www.r-project.org/

Next step after visualizing, is to find trends and forecast. Before forecasting future data points, I suggest forecast over current data set and try to get close fit as possible. Some of my observations from studying all the 3 smoothing methods are :
1) Single exponential should be used to when there are hardly any trends. Forecast value is almost equivalent to last data point. Only gain you get out of this method is to gain insight to alpha value by minimizing Mean Squared Error.
2) Double exponential method gives you better forecast values when there is some trend...say continuously going up or down. Forecasts are much better than single exponential. Playing with gamma value is fun.
3) Triple exponential method is good with trend plus season. By carefully iterating over period,alpha,beta,gamma, and each time minimizing MSE, one can get pretty close forecasts.

Here are my codes used :

 /**
  * http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc431.htm
  * http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc432.htm
  * @param data - input data
  * @param alpha - good value between 0.1-0.9
  * @param numForecasts - ahead forecasts
  * @return
  */
 public static double[] singleExponentialForecast(double[] data, double alpha, int numForecasts) {
  double[] y = new double[data.length + numForecasts];
  y[0] = 0;
  y[1] = data[0];
  int i = 2;
  for (i = 2; i < data.length; i++) {
   y[i] = alpha * data[i - 1] + (1 - alpha) * y[i - 1];
  }

  for (int j = 0; j < numForecasts; j++, i++) {
   y[i] = alpha * data[data.length - 1] + (1 - alpha) * y[i - 1];
  }
  return y;
 }

 /**
  * http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc433.htm
  * http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc434.htm
  * @param data
  * @param alpha
  * @param gamma
  * @param initializationMethod
  * @param numForecasts
  * @return
  */
 public static double[] doubleExponentialForecast(double[] data, double alpha, double gamma, int initializationMethod, int numForecasts) {
  double[] y = new double[data.length + numForecasts];
  double[] s = new double[data.length];
  double[] b = new double[data.length];
  s[0] = y[0] = data[0];
  
  if(initializationMethod==0) {
   b[0] = data[1]-data[0];
  } else if(initializationMethod==1 && data.length>4) {
   b[0] = (data[3] - data[0]) / 3;
  } else if(initializationMethod==2) {
   b[0] = (data[data.length - 1] - data[0])/(data.length - 1);
  }
  
  int i = 1;
  y[1] = s[0] + b[0];
  for (i = 1; i < data.length; i++) {
   s[i] = alpha * data[i] + (1 - alpha) * (s[i - 1]+b[i - 1]);
   b[i] = gamma * (s[i] - s[i - 1]) + (1-gamma) * b[i-1];
   y[i+1] = s[i] + b[i];
  }

  for (int j = 0; j < numForecasts ; j++, i++) {
   y[i] = s[data.length-1] + (j+1) * b[data.length-1];
  }
  
  return y;
 }
 
 public static double TSAError(double[] data, double[] forecast) {
  double mad = 0.0;
  double mse = 0.0;
  double diff = 0.0;

  for (int i = 0; i < data.length; i++) {
   diff = data[i] - forecast[i];
   mad += Math.abs(diff);
   mse += Math.pow(Math.abs(diff), 2.0);
  }
  
  return mse/data.length;
 }

Triple Exponential code can be found at http://n-chandra.blogspot.in/2011/04/holt-winters-triple-exponential.html
PS: Let me know if code has problems

Monday, April 9, 2012

How to add pgp public keys for mongo respository

While installing mongo db, if you are trying to get pgp key using
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10

you might receive following error :

Executing: gpg --ignore-time-conflict --no-options --no-default-keyring --secret-keyring /etc/apt/secring.gpg --trustdb-name /etc/apt/trustdb.gpg --keyring /etc/apt/trusted.gpg --primary-keyring /etc/apt/trusted.gpg --keyserver keyserver.ubuntu.com --recv 7F0CEB10
gpg: requesting key 7F0CEB10 from hkp server keyserver.ubuntu.com
gpgkeys: HTTP fetch error 7: couldn't connect to host
gpg: no valid OpenPGP data found.
gpg: Total number processed: 0

You can fix this by storing pgp public key written in http://www.mongodb.org/display/DOCS/Ubuntu+and+Debian+packages into a file say mongo.key and then execute "sudo apt-key add mongo.key" and then proceed with "sudo apt-get update"

Saturday, April 7, 2012

Store ajax json response into javascript variable using jquery

If you wish to make an ajax call, and assign response json into a variable directly...here is a small hack :
function getJson(url) {
 return JSON.parse($.ajax({
     type: 'GET',
     url: url,
     dataType: 'json',
     global: false,
     async:false,
     success: function(data) {
         return data;
     }
 }).responseText);
}

var myJsonObj = getJson('myjsonurl');

Wednesday, March 21, 2012

Umarshalling JSON and XML

If you want to unmarshall json or xml inputstream to Java object, here are the functions :
public static <T> T unmarshalXML(InputStream is, Class<T> c)
   throws JAXBException {
  JAXBContext jc = JAXBContext.newInstance(c);
  Unmarshaller u = jc.createUnmarshaller();
  T response = (T) u.unmarshal(is);
  return response;
 }

 public static <T> T unmarshalJSON(InputStream is, Class<T> c)
   throws JAXBException, IOException, JSONException, XMLStreamException {
  JAXBContext jc = JAXBContext.newInstance(c);
  Unmarshaller u = jc.createUnmarshaller();
  String sJson = IOUtils.toString(is);
  JSONObject obj = new JSONObject(sJson);
  Configuration config = new Configuration();
  MappedNamespaceConvention con = new MappedNamespaceConvention(config);
  XMLStreamReader xmlStreamReader = new MappedXMLStreamReader(obj, con);
  T response = (T) u.unmarshal(xmlStreamReader);
  return response;
 }

Friday, March 16, 2012

Installing ReviewBoard plugin in Eclipse IDE


ReviewBoard (www.reviewboard.org/) is a nice code review web tool. ereviewboard (http://marketplace.eclipse.org/content/ereviewboard) is the corresponding plugin for Eclipse IDE. Below are the steps mentioned to integrate it with Eclipse IDE :

  • Goto Help->Install New software
  • Use URL http://rombert.github.com/ereviewboard/update/ and install "Mylyn Reviews Connector: ReviewBoard" and "Mylyn Reviews Connector: Review Board Subeclipse integration" (there are 3 connectors available, choose the appropriate one...here i am assuming subversion)
  • If subeclipse is not installed, use url http://subclipse.tigris.org/update_1.6.x
  • After installation, Open the task repositories view by navigating to Window -> Show View -> Other -> Mylyn -> Task Repositories
  • Click the "Add Task Repository" button located in the view's toolbar.
  • Select reviewboard and enter server as your reviewboard weburl (ex : http://192.168.x.x), username, password (Save password) and finish.
  • Now, Right-click on a Project and select Team -> Create Review Request will post to reviewboard (if you dont see such a option trying restarting eclipse)

Some helpful links :
http://help.eclipse.org/galileo/index.jsp?topic=/org.eclipse.mylyn.help.ui/userguide/Task-Repositories.html
https://github.com/rombert/ereviewboard/wiki/Subclipse-integration

Remarks :
This tool can only post diff's of 1 project to reviewboard. Multiple project diff are not supported

Listing all entities in a JPA

Sometimes it may be a usecase scenario to find whether a particular class is an entity managed by persistence context. If you have entityManager or entityManagerFactory you can easily do that :
 Metamodel meta = entityManagerFactory.getMetamodel();
 // or
 Metamodel meta = entityManager.getEntityManagerFactory().getMetamodel();

 // to iterate over all classes
 for (EntityType<?> e : meta.getEntities()) {
  // get entity class
  Class c = e.getJavaType();
  // get entity name as string
  String entityName = e.getName(); //or c.getName()
 }

 // test a particular class is entity
 // will throw java.lang.IllegalArgumentException if not an entity
 meta.entity(inputClass);

Curious Case in MYSQL : Lock wait timeout exceeded on INSERT

Sounds strange, that how can a insert be locked or timed out . I had a innodb table with very frequent inserts, updated and deletes. After every few minutes, one of the inserts got timed out (50 sec is default value for innodb_lock_wait_timeout). My understanding was that timeouts happen when some other thread/transaction holds a exclusive record lock (select .. from update) for a long time. So how can a non-existent new row be already locked. I do not have a proper answer to this.

What solved my problem was dropping index and foreign key mapping, which were luckily irrelevant. Random guess is that innodb locks a range of index on insert. If you have an answer do let me know !