Last week - GSoC16 - Detected and Solved Merge Conflicts in Drupal8.

Submitted by rakesh on Wed, 08/17/2016 - 23:34

Woah!! We're in last week of Google Summer of Code. Hard to believe that the journey we started from 22nd April is about to come to an end. Soon I’ll be publishing a post summarizing all my experience, development and other great achievements during this time. It was a really awesome journey but I’m really happy that the bond we share with the community and open source is not tied to GSoC only. The thought that I would be able to contribute after GSoC as well gives me  a really good motivation. There are many things that I learned during this time and would be happy if I could ever teach any of it to someone. My mentors were the best part of this journey. I’m really happy I got mentors like them and they played a huge role in keeping this journey interesting and really enjoying. I feel really very lucky to have 3 mentors and each of them is really a master in their fields. On the top of that, what I really liked about them is their behaviour, nature and the respect they give me irrespective of the fact that I make mistakes almost all the time and they have to find the mistakes in my code every single week in every task. I always feel embarrassing but they always tell me that they wouldn’t have signed up for being a mentor if they had any problem with this. My words can not define a bit of their awesomeness. I’d really recommend you guys to do at least a single (even small) project under their guidance and write to me if you don’t fall in love with them. You can contact them or IRC or through their drupal profiles (Dick Olsson, Andrei Jechiu and Tim Millwood) No matter how busy they are, they will definitely make some time for you and you’ll often see them working almost every time. No matter what time you message them, they are like always online to solve your issues.

Coming back to the Project details,

Project Details: The task is to solve the conflicts which may occur when two users try to push the same content with modifications in same piece of content/entity on their respective end. So far, we've a way to detect conflicts but not any to solve them.

Solution we're working on: When a node is updated, a revision is created. Multiple revisions forms a graph with actual entity as root node. In the image below, you can see a graph with multiple nodes. All these nodes are revision and root is actual entity.

We detect conflicts in 2 steps:

  1. Find Lowest common ancestor of 2 revisions.

  2. Implement recursive 3-way merge algorithm on them.

I will start by explaining the 2nd step and It’d pretty much cover the 1st one itself. Recursive 3-way merge Algorithm takes 3 parameters as input which represents 3 entities (Local, remote and base) and compares two of them (Local and Remote) with base entitiy.

It then makes sure that none of the content which is changed in Local/Remote is changed in Remote/Local as well. For example:  If some changes are made in Local on line number 19, there should not be any changes on line number 19 of remote and It should be identical to line number 19 of base entity. There can be changes on any other line of remote entity but not those which are changed/modified in local. Vice Versace is true as well. If it is made sure that no changes in local and remote overlaps, there’d not be any merge conflicts otherwise there would be conflicts.

Let’s take another example: In the image below, we see that line 51 is changed by both the deveopers and content is different from the base entity. In this scenario, the system can not decide on it’s own if we want the loops to run 10 times as per Remote Entity or 20 times as per local entity or just 5 times as Base entity says. So we’ll let users manually decide what they want. In our case, developer wanted it to run for 25 times. ;)

Explaining how merge conflicts occurs
Explaining how merge conflicts occurs
                                                                                                                                                              Image by : Dr. Drobbs (For informational use only)
 

For further information on how it all works, you can also have a look at the 3rd week blog post at my blog.


Now coming back to the first step: “Finding LCA”, the base entity we use in recursive 3-way merge is the LCA of two revisions. It is considered as base as it contains all the latest changes which are reckoned to be in the two revisions we will be comparing. So, we find a common parent of both the revisions which is closest to both of them and compare these revisions with that parent only. That parent or LCA acts as base entity.

Example: Below is a graph of revisions with revision ID and status. As we can see there are multiple revisions. Now let’s suppose someone (Developer A) starts editing revision with revision_id 4-8B10006… and at the very same time, another person( Developer B) starts editing 4-82460F0… . When both the developers tries to save their edits, we don’t know which changes should be kept in the final revision ( new revision which would be created after the merge). So for that purpose, we need to develop a system which could compare the revisions content and make sure that there is no conflict.

As discussed above, first we would find Lowest common ancestor of both the revisions developers have modified. As we can see Revision 3-6F7F875…  is closest to both of them, it is the Lowest Common Ancestor. Relaxedws/lca library does this task of finding LCA of two revisions. Now we have three important parts we can use from the graph:

  • Revision 1 (4-8B10006…) which was modified by developer A.

  • Revision 2 (4-82460F0…)  which was modified by developer B.

  • Lowest Common Ancestor of both the revisions (3-6F7F875…) .                                         

Revisions Graph created by workspace module
Revision Graph created by workspace module.

The next step is to find if there are any merge conflicts or not. For that purpose, we will compare contents of revision 4-8B10006… with revision 3-6F7F875… and inspect all the changes made in revisions 4-8B10006…. We’ll do same for revision 4-82460F0…  as well. Once we have compared both revisions with their LCA (revision 3-6F7F875…), we will compare the changes made in both the revisions. If the changes contains some common content in both the revisions, we know there is a merge conflict as system can not decided which changes are more important. So at that time, we’ll have to show a 3 pane window to the developer and let him/her decide whether he wants. He would be shown content of all three parts i.e, revision1, revision2 and LCA. The manually chosen content will then create a new revision. The tree pane window would look like this.

3 pane window to resolve conflicts manually.
3 Pane window to resolve Merge conflicts manually (Containing content from Local, Base and Remote Entities).
                                                                                                                                                              Image by : CodeMirror Demo (For informational use only)

And once the conflicts and either resolver or there wasn't any conflict, the graph would look something like:

                                                                                            3-6F7F875…

                                                                                         /                  \

Just a prototype:                                           4-8B10006…    4-82460F0…

                                                                                        \                   /

                                                                                        5-XXXXXX......

Progress so far: We have created two php libraries. One to find LCA from a directed acyclic graph and other one to perform a recursive 3-way merge algorithm. We have also created a module named "Conflict" which would find LCA and perform the merge algorithm on Drupal entities. The Conflict modules doesn’t use the libraries we created in first phase of GSoC as by default Drupal 8 uses linear graph of Revisions. For example:

                                                                      1-----> 2 ------> 3 ------> 4 ------> 5

Where “1” is the actual nodes and rest all numbers are to be considered as its revisions. Now, finding Lowest common ancestor in such linear graph is easy. For example, LCA(2,3) is 1 and LCA(4,5) is 3. Also LCA(1,3) is 1. So we just checked if either one of nodes weren’t root, we returned the parent of the revision which was first created otherwise we returned the root ( Actual entity). Similarly for merge method, we just found the revision last created out of 3 and returned it as result the last created revision is the latest one in linear graph. For ex: Merge(3,4,5) would return 5 as result. This module doesn't use these libraries we've created.

However, those websites which uses Multiversion module will not use such approach as the graph of revision would be pretty different as defined above.

Now we are writing code in Multiversion module which would use these library. We'd be able to detect LCA from complex graphs and resolve merge conflicts using Multiversion module once we've integrated these APIs with the module. Of course these features would be available to sites using Multiversion module. At the moment, we have successfully integrated LCA Library with Multiversion module.

Work we did last week: We finished our work with ComplexLcaResolver.php which implements relaxedws/lca Library and returns lowest common ancestor from a directed acyclic graph. After we got that merged, we were all set for developing ComplexMergeResolver.php which would use relaxedws/merge to merge 2 arrays into one after comparing them with a third array. The whole merging process follows 3-way merge algorithm. You can read it on Dr. Drobss website for more information about it. You can also read this blog if you want to have a clear picture of how we are using it in Drupal 8 and Multiversion.

Work we did this week: We started our work with ComplexMergeResolver and have successfully implemented the functionality. We have successfully converted entity objects into arrays using serialization API and then we have used relaxedws/merge  library to detect any merge conflicts and merge them otherwise. The functionality is implemented and now we are writing tests to ensure it’s correct working. I’m really happy that we’re on the verge of completing our last development task. All the code we've written so far can be found here and the tests we've written are availiable here.

Aim for the next week: AAhhhhhhh!!!! Unfortunately, this is the last week. The only Aim for next week would be to plan how can I keep contributing to what I have created so far and other parts of this module.

I’d really like to thanks Matthew Lechleider. He talks to us once every week but has taught us one of the most important things i.e, to think thoroughly and define our tasks in very simple language. As a developer, we know what we have developed but most of the times, we’re not really able to make non-devs understand what our code is doing and what is the actual project. From the very starting where we started writing proposals to the very end, he used to tell me to “explain it in detail”. All of a sudden, it all makes sense now. Thanks a lot Drupal Community. Thanks a lot. You've given me an experience I'd never forget.