Multiversion - Resolving Merge Conflicts in Drupal 8 - First half completed

Submitted by rakesh on Wed, 06/22/2016 - 13:35

Google Summer of Code is halfway through and it’s now time for mid term evaluations. What that means is, we will be judged on the basis of our efforts and targets achieved and our mentors would decide if we pass of fail. If someone fails the mid term evaluation, he is removed from the program immediately.

Let us sum up all work we’ve done in the GSoC-coding phase 1.

Project: Solving content conflicts with merge algorithms in Drupal 8.

Proposed Solution: Merge the updated content in the new array or throw an Exception if there is a conflict. The real challenge was to detect the merge conflicts and to merge otherwise. We decided to use 3-way merge algorithm to compare the updated nodes with a parent node. Out of all parent nodes, the one closest to them(LCA) was our best option to compare them with as it’d have all the recent updations. We needed Libraries to find the closes common parent or Lowest common ancestor and then compare it to the updated nodes to find if there was a conflict or not.

We needed one more library to merge the updated content if there is not conflict.

Deliverables before mid term evaluations:

  1. Library to find LCA from a Directed acyclic graph.

  2. Library to perform a recursive 3-way merge algorithm.

Week 1: Library to find Lowest common ancestor(LCA) in a Directed acyclic graph.

LCA: The nearest common parent to 2 nodes. For ex:



                                                                                  /        \

                                                                                2            3

                                                                              /    \         /     

                                                                            4       5     6


LCA of (2,3) is 1 and (4,5) is 2 in the graph created above. LCA(4,5) = 2 because it is a common parent of both as well as closest to these nodes than other common parents.

Similarly LCA(5,6) = 1

Approach: The approach we used to find LCA is Breadth First Traversal in opposite Direction. We traversed from node1 to root storing all the elements encountered in an array. We, then traversed from node2 to root doing same thing. Thanks to the clue/graph and graphp libraries. We used these libraries to create graphs and traverse the graph using BFS in the reverse direction. The first element we get from the intersection of these array elements will be the LCA.

Week 2: Writing tests to assert the correct functionality.

Once we had the working prototype ready to find the LCA, we needed tests to make sure that the code works well in all the cases. We created multiple graphs based on their complexity such as Simple graph, a bit complex graph (Few Multiple parents), very complex graph (Many multiple parents). The visual representation of the graphs is available in the source code.

Then we implemented these graphs in our code and ran multiple tests on them to make sure that they were returning the correct LCA. It was a very crucial part of our project and we had to make sure that no wrong LCA were returned as it’d have caused data loss and probably would have resulted in unnecessary merge conflicts. We wrote tests for:

  • Single Parent in simple graph.

  • Multiple Parents in simple graph.

  • Single Parent in Complex graph.

  • Multiple Parents in Complex graph.

Week 3: Writing Library to perform recursive 3-Way Merge algorithm

After the completion of the LCA library, we were all set to start with the library to perform a recursive 3-way merge algorithm.

3-way merge algorithm: It turns a manual conflict into an automatic resolution. Part of the magic here relies on the VCS locating the original version of the entity (Base entity). This original version is better known as the "Lowest Common Ancestor (LCA)". The VCS then passes the common ancestor and the two contributors to the three-way merge tool that will use all three to calculate the result. The tool compares the local and remote entities with the LCA (base entity) and updates the base entity according to the modified entity. For a clear understanding, please read the article I wrote for 3rd week of GSoC.

Approach: It was a very complicated task and we had to make sure it was working fine for all edge cases including exceptions(Yes, there were many). To implement this library, I got a very clear understanding of how it works.

There were multiple cases I had to take care of while implementing the library. Some of them are:

  • Lines are added in Remote.

  • Lines are added in Local.

  • Lines are added in both Remote and Local.

  • Lines are removed in Remote.

  • Lines are removed in Local.

  • Lines are removed in both Remote and Local.

  • Lines are modified in Remote.

  • Lines are modified in Local.

  • Lines are modified in both Remote and Local.

  • Pre-existing Lines are modified and new lines are added in Remote.

  • Pre-existing Lines are modified and new lines are added in Local.

  • Pre-existing Lines are modified and new lines are added in both Remote and Local.

  • Pre-existing Lines are modified and some lines are removed from Remote.

  • Pre-existing Lines are modified and some lines are removed from Local.

  • Pre-existing Lines are modified and lines are removed from either Remote or Local.

  • Lines are added and removed at the same time.

The basic approach used for most of the cases is to count the number of lines and then we used:

  1. A For loop to run as many times as the minimum lines were there. For example: If ancestor array has a value at some key which has 3 lines, remote has a value with 4 lines an local array has a value with 2 lines, then the first loop will run for 2 lines as the minimum number of lines is 2. This loop will compare all 3 arrays and store the changes on the same key=>value in new array which would be returned after a successful merge.

  2. Second FOR loop  which ran for as many times as the the remaining lines in the second array with largest number of lines. In our case, ancestor array has 3 lines and 2 of which were audited in first FOR Loop and the 3rd line would be compared in this second FOR Loop. So In this scenario, second For loop would run for 1 line only.  It would compare the 3rd line of ancestor from the remote array to make sure they both are same. If they are same, it means only Local node has the 3rd line updated(as it was removed in local), so it’d be removed from final revision as well otherwise, it’d return a conflict Exception.

  3. Third For loop would run to just add new lines into the new revision. In our case, the line number 4 was only in remote and hence, it’d be added in the new array.

In most of the cases, only 1 or 2 “For loops” would run. All the 3 “For Loops” would run in the case where lines have been added as well as removed in remote, local or ancestor.

Week 4: Writing Tests, Custom Exception and Updated documentation.

At the end of the 3rd week, I thought I was done with almost everything until I started testing. With every new test case, I discovered a new edge case where the code was not working as expected and this is the part where I actually learnt about the real importance of testing.

The challenge was the ensure the code readability. With every edge case, the length and complexity of the code kept increasing. After +1,179 // −32 LOC, We were finally done with our as many test cases as possible and the code with it’s proper documentation.

We Covered the following cases in our tests:

  • Simple Merge in same number of lines in all 3 arrays.

  • Merge with same number of lines in 2 arrays (Addition or removal in 3rd).

  • Merge with different number of lines in all 3 arrays (Addition or removal in all 3 arrays).

  • Catching Merge Conflicts.

  • Key removal from a array(Remote or Local).

Other than writings tests this week, we created our own exception class (ConflictException) which extends the core Exception class. We also updated the documentation, made code readable and updated Readme file with the features and the examples.

All the code I have written has been merged into the core libraries from my Github repositories. The code is ready to be used, modifications and feedback. Please feel free to create a pull request with any issues, suggestions or patch for a bug.

A huge thanks to my mentors Dick Olsson, Tim Millwood and Andrei Jechiu for their guidance. They are always available to answer to my silly queries in an appropriate manner. They keep telling me about all those little mistakes I repeat and make sure that I don't repeat them multiple times. I must say they are equally inclined towards me learning more as well as the completion of this project. Never thought this summer would be this much fun. Learning curve is now going up exponentially.