Tuesday, July 5, 2011

Modeling Tree and DAG in MySQL Vs Inmemory implementation

Few days back I was stuck with a problem of modeling a directed acyclic graph(DAG) in database.

I started looking into various approaches of storing trees first. If it is a case of tree with multiple children, but only single parent, then we can choose to use nested set model. A comprehensive description in mysql is discussed here. I have personally implemented it and works well.

But, if you want to model a DAG kind of structure with multiple parents, then it gets difficult. A good article on this can be found here. It discusses about various operations on nodes and links. Operations like delete link gets pretty complicated.

So, I thought may be a in memory implementation would be a better alternative. I went ahead and wrote a inmemory thread implementation using DFS, which gives all the parents in the path till given node(maybe through multiple paths), and all the children in subtree on a given node.

Here is a brief outline of it :

//datastructure to store childrens and parents
public class Node {
 HashSet<Long> childrens = new HashSet<Long>(); 
 HashSet<Long> parents = new HashSet<Long>(); //parents in path
 public HashSet<Long> getChildrens() {
  return childrens;
 }
 public void setChildrens(HashSet<Long> childrens) {
  this.childrens = childrens;
 }
 public HashSet<Long> getParents() {
  return parents;
 }
 public void setParents(HashSet<Long> parents) {
  this.parents = parents;
 }
}
//adjacency list (Node id, neighbour id list)
private HashMap<Long,List<Long>> adjList= new HashMap<Long, List<Long>>();
//all the nodes hashed on node id
private HashMap<Long,Node> data = new HashMap<Long, Node>();

public void init() {
   //find nodes which have no inlink from adjacency list 
   HashSet<Long> parents = new HashSet<Long>(); //who are roots
   for(Long k : data.keySet()) {
      //add dummy nodes in adjacency list for nodes having no links
      if(!adjList.containsKey(k)) {
       adjList.put(k,new ArrayList<Long>()); 
      }
      boolean found = false;
      for(Long p : adjList.keySet()) {
         if(adjList.get(p).contains(k)) {
            found=true; break;
         }
      }
      if(!found) parents.add(k);
   }

   //start dfs
   HashSet<Long> parentNull = new HashSet<Long>();
   for(Long u : parents) {
      dfs(u,parentNull);
   }
}

/**
* @param node 
* current node
* @param parents
* set of all parents of given node
* @return all subtree nodes as children
*/
HashSet<Long> dfs(Long node, HashSet<Long> parents) {
   //cycle exists
   if(parents.contains(node)) {
      throw new Exception("Contains cycle");
   }
   //add parents to current node
   data.get(node).getParents().addAll(parents);
   //construct new set of parent for passing to children
   HashSet<Long> newParent = new HashSet<Long>(data.get(node).getParents());
   newParent.add(node);
   //do a dfs to all its children
   for(Long neighbours : adjList.get(node)) {
      HashSet<Long> childrens = dfs(neighbours,newParent);
      data.get(node).getChildrens().addAll(childrens); 
   }
   //it has subtree..so return all children
   if(adjList.get(node).size()>0) {
      return data.get(node).getChildrens();
   }
   else {
      //add only current node
      HashSet<Long> ret = new HashSet<Long>();
      ret.add(node);
      return ret;
   }
}

No comments:

Post a Comment