Monday, May 7, 2012

PHP script to migrate mysql table to mongo collection


This is a naive PHP script I wrote to migrate data from mysql to mongo DB. I have tested transferring 10 million rows with no problem. Mongo and mysql DB credentials are not handled in this code. Even though code doesnot handle many mongo exceptions, it is good enough for normal use cases. Anybody is welcome to refine the code.
<?
// Usage ./mysqltomongo.php mysqldb mysqltable mongodb mongocollection
function mongo_connect($db,$collection) {
 $m = new Mongo(); //handle username,ports and passwords
 $mydb = $m->$db;
 if(!isset($collection)) return $mydb;
 else {
  $mycollection = $mydb->$collection;
  return $mycollection;
 }
}


$mysqldb=$argv[1];
$mysqltablename=$argv[2];
$mongodb=$argv[3];
$mongocollection=$argv[4];

$link = mysql_connect(...); //give proper info
if (!$link) {
    die('Could not connect: ' . mysql_error());
}

mysql_select_db($mysqldb);
$result = mysql_query("SELECT * from ".$mysqltablename);

$fields = mysql_num_fields($result);
$coltypes=array();
for ($i=0; $i < $fields; $i++) {
    $coltypes[mysql_field_name($result, $i)]=mysql_field_type($result, $i);
}

$mdb = mongo_connect($mongodb);
$mdb->dropCollection($mongocollection);
$collection= mongo_connect($mongodb,$mongocollection);

while ($row = mysql_fetch_array($result, MYSQL_ASSOC)) {
 $newrow=transform($row,$coltypes);
 try {
         $collection->insert($newrow);
 }
 catch(MongoException $e) {
  print_r($e);
 }
}

function transform($row,$coltypes) {
 $val;
 $ret=array();
 foreach($row as $k=>$v) {
        $val=utf8_encode($v);
        if($coltypes[$k]=="real") $val=floatval($val);
        else if($coltypes[$k]=="int") $val=intval($val);
        $ret[$k]=$val;
 }
 return $ret;
}

mysql_close();
?>

Code for Single, Double and Triple Exponential Forecasting

Recently I got interested in analyzing trends in time series and explored few forecasting techniques.

A good read to start are :
  • http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc4.htm
  • http://en.wikipedia.org/wiki/Moving_average
  • http://en.wikipedia.org/wiki/Exponential_smoothing
First thing I did was to plot SMA and EMA over existing data with varying windows sizes and alpha values. This gives a initial idea of how your time series data is fluctuating with time. I used a awesome tool FLOT to visualize. I didn't get time to explore awesome library, but you may try http://www.r-project.org/

Next step after visualizing, is to find trends and forecast. Before forecasting future data points, I suggest forecast over current data set and try to get close fit as possible. Some of my observations from studying all the 3 smoothing methods are :
1) Single exponential should be used to when there are hardly any trends. Forecast value is almost equivalent to last data point. Only gain you get out of this method is to gain insight to alpha value by minimizing Mean Squared Error.
2) Double exponential method gives you better forecast values when there is some trend...say continuously going up or down. Forecasts are much better than single exponential. Playing with gamma value is fun.
3) Triple exponential method is good with trend plus season. By carefully iterating over period,alpha,beta,gamma, and each time minimizing MSE, one can get pretty close forecasts.

Here are my codes used :

 /**
  * http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc431.htm
  * http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc432.htm
  * @param data - input data
  * @param alpha - good value between 0.1-0.9
  * @param numForecasts - ahead forecasts
  * @return
  */
 public static double[] singleExponentialForecast(double[] data, double alpha, int numForecasts) {
  double[] y = new double[data.length + numForecasts];
  y[0] = 0;
  y[1] = data[0];
  int i = 2;
  for (i = 2; i < data.length; i++) {
   y[i] = alpha * data[i - 1] + (1 - alpha) * y[i - 1];
  }

  for (int j = 0; j < numForecasts; j++, i++) {
   y[i] = alpha * data[data.length - 1] + (1 - alpha) * y[i - 1];
  }
  return y;
 }

 /**
  * http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc433.htm
  * http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc434.htm
  * @param data
  * @param alpha
  * @param gamma
  * @param initializationMethod
  * @param numForecasts
  * @return
  */
 public static double[] doubleExponentialForecast(double[] data, double alpha, double gamma, int initializationMethod, int numForecasts) {
  double[] y = new double[data.length + numForecasts];
  double[] s = new double[data.length];
  double[] b = new double[data.length];
  s[0] = y[0] = data[0];
  
  if(initializationMethod==0) {
   b[0] = data[1]-data[0];
  } else if(initializationMethod==1 && data.length>4) {
   b[0] = (data[3] - data[0]) / 3;
  } else if(initializationMethod==2) {
   b[0] = (data[data.length - 1] - data[0])/(data.length - 1);
  }
  
  int i = 1;
  y[1] = s[0] + b[0];
  for (i = 1; i < data.length; i++) {
   s[i] = alpha * data[i] + (1 - alpha) * (s[i - 1]+b[i - 1]);
   b[i] = gamma * (s[i] - s[i - 1]) + (1-gamma) * b[i-1];
   y[i+1] = s[i] + b[i];
  }

  for (int j = 0; j < numForecasts ; j++, i++) {
   y[i] = s[data.length-1] + (j+1) * b[data.length-1];
  }
  
  return y;
 }
 
 public static double TSAError(double[] data, double[] forecast) {
  double mad = 0.0;
  double mse = 0.0;
  double diff = 0.0;

  for (int i = 0; i < data.length; i++) {
   diff = data[i] - forecast[i];
   mad += Math.abs(diff);
   mse += Math.pow(Math.abs(diff), 2.0);
  }
  
  return mse/data.length;
 }

Triple Exponential code can be found at http://n-chandra.blogspot.in/2011/04/holt-winters-triple-exponential.html
PS: Let me know if code has problems