GSoC Final Submission

Submitted by rakesh on Tue, 08/23/2016 - 14:29

GSoC is almost over and I’m now submitting the project which I started with my awesome mentors Dick Olsson, Andrei Jechiu and Tim Millwood nearly 4 months ago. I knew it’d be a really great journey but I had no idea it’d be this awesome. Everything related to GSoC was perfect. The organisation, my mentors, the project itself was enough challenging as well as interesting at the same time. Weekly blogs, daily updates and communication with mentors over Slack made sure we’re on track. First part was easy but I struggled a bit in second half but my mentors were always right here to help me with all the issues I had. I used to feel a bit embarrassed as I was not getting things done as easily as any other student could have but they never felt the same way. They always had a smile at their face and a solution for me to work on. I’m really glad I got such supportive mentors.  Thanks a lot Drupal.

Drupal makes me happy


Project Description: The task was to solve the conflicts which may occur when two users try to push the same content with modifications in same piece of content/entity on their respective end. Initially it looked like a very difficult task and I was really afraid when I chose the project but when Dick Olsson told me that this is a very interesting project and something like this has not been done in Drupal like this from a long time, it may also get a lot of attention as well as support from other people in community as well, I was more than just happy in going with this project. It took me some time to actually understand how we would solve the problem and to be very honest, I had no clear idea for the community bonding period about if I still understand it clearly or not. I decided to go on step by step and knew what I had to do at one time. Once we started coding, I could see the future and the parts where the code I was writing at that moment would be used. Soon, I could see a very clear picture of our project and the path we had to follow to make it a success. All thanks to my mentors again for their constant guidance.

How we solved it: I started by understand what actually happens at the backend of Drupal When a node is updated. I learnt that a revision is created every time a node is updated. Multiple revisions forms a graph with actual entity as root node. In the image below, you can see a graph with multiple nodes. All these nodes are revision and root is actual entity. In simpler words, when we create a new entity, we are starting a graph and when we update that entity, we are adding a node in that graph. It should be noted that even when we remove an entity, we’re adding a node in the graph and not removing.

The complete solution was depending on two steps:

  1. Find Lowest common ancestor of 2 revisions.

  2. Implement recursive 3-way merge algorithm on them.

I will start by explaining the 2nd step and It’d pretty much cover the 1st one itself. Recursive 3-way merge Algorithm takes 3 parameters as input which are 3 entities (local, remote and base) and compares two of them (local and remote) with base entity.

It then makes sure that none of the content which is changed in Local/Remote is changed in Remote/Local as well. For example:  If some changes are made in Local on line number 19, there should not be any changes on line number 19 of remote and It should be identical to line number 19 of base entity. There can be changes on any other line of remote entity but not those which are changed/modified in local. Vice versa is true as well. If it is made sure that no changes in local and remote overlaps, there’d not be any merge conflicts otherwise there would be conflicts.

Let’s take another example: In the image below, we see that line 51 is changed by both the developers and content is different from the base entity. In this scenario, the system can not decide on it’s own if we want the loops to run 10 times as per Remote Entity or 20 times as per local entity or just 5 times as Base entity says. So we’ll let users manually decide what they want. In our case, developer wanted it to run for 25 times.

Figure 1 : Explaining how merge conflicts occurs
Figure 1 : Explaining how merge conflicts occurs
                                                                                                                                                                     IMAGE BY : DR. DROBBS (FOR INFORMATIONAL USE ONLY)

For further information on how it all works, you can also have a look at the 3rd week blog post at my blog.

Now coming back to the first step: “Finding LCA”, the base entity we use in recursive 3-way merge is the LCA of two revisions. It is considered as base as it contains all the latest changes which are reckoned to be in the two revisions we will be comparing. So, we find a common parent of both the revisions which is closest to both of them and compare these revisions with that parent only. That parent or LCA acts as base entity.

Example: Below is a graph of revisions with revision ID and status. As we can see there are multiple revisions. Now let’s suppose someone (Developer A) starts editing revision with revision_id 4-8B10006… and at the very same time, another person (Developer B) starts editing 4-82460F0…. When both the developers try to save their edits, we don’t know which changes should be kept in the final revision (new revision which would be created after the merge). So for that purpose, we need to develop a system which could compare the revisions content and make sure that there is no conflict.

As discussed above, first we would find Lowest common ancestor of both the revisions developers have modified. As we can see Revision 3-6F7F875…  is closest to both of them, it is the Lowest Common Ancestor. Relaxedws/lca library does this task of finding LCA of two revisions. Now we have three important parts we can use from the graph:

  • Revision 1 (4-8B10006…) which was modified by developer A.

  • Revision 2 (4-82460F0…)  which was modified by developer B.

  • Lowest Common Ancestor of both the revisions (3-6F7F875…). 

Figure 2: Revision Graph created by workspace module.
Figure 2: Revision Graph created by workspace module.

 The next step is to find if there are any merge conflicts or not. For that purpose, we will compare contents of revision 4-8B10006… with revision 3-6F7F875… and inspect all the changes made in revisions 4-8B10006…. We’ll do the same for revision 4-82460F0….. Once we have compared both revisions with their LCA (revision 3-6F7F875…), we will compare the changes made in both revisions. If the changes contains some common content in both revisions, we know there is a merge conflict as the system can not decide which changes are more important. So at that time, we’ll have to show a 3 pane window to the developer and let him/her decide whether he wants. He would be shown content of all three parts i.e, revision 1, revision 2 and LCA. The manually chosen content will then create a new revision. The tree pane window would look like this.

Figure 3 : 3-Pane window to resolve merge conflicts manually (containing content from Local, Base and Remote entities).
Figure 3 : 3-Pane window to resolve merge conflicts manually (containing content from Local, Base and Remote entities).
                                                                                                                                                       IMAGE BY : CODEMIRROR DEMO (FOR INFORMATIONAL USE ONLY)

And once the conflicts and either resolved or there wasn't any conflict, the graph would look something like the following image:          

Expected Graph after conficts are resolved.
Expected Graph after conflicts are resolved.

Initially the biggest concern was to get my system set up and running properly. I had to install Drupal 8 locally and get an IDE with a good debugger to start working. As per my mentor’s suggestion, I used PhpStorm IDE from JetBrains and Xdebug for debugging purposes. Also, my mentors kept telling me to learn how to test the code I will write as they wanted to “Write tests first, code later” approach. It was one of the most challenging tasks as I had no exposure to testing and was totally new to it. So, with a little knowledge of everything, I started my journey and I’m really happy we got it going till the end.

Once the problem was clear, we started coding our way into it. We divided the whole process in sub tasks and completed one at a time which in the end was integrated together for a complete success of project.

Let’s take a look at the all the subtasks and how we implemented them.

Task 1: First task was to create a PHP Library which could return Lowest Common Ancestor from a Directed Acyclic Graph.

Implementation: We used clue/graph library and ran a Breadth First Search on the graph for both the revisions and then found the common nodes and returned the one closest to both the nodes.

Code: The code is pushed over github and packagist as well.

List of all my commits : Commits made by Rakesh Verma in Relaxedws/lca Library.

 A Pull commit may contain multiple commits.

Task 2: Once we’re done with the relaxedws/lca library, we needed another library which could use the LCA returned by relaxedws/lca for implementing recursive 3-way merge algorithm as described above. For this purpose, we created relaxedws/merge library which takes 3 arrays as input and merge them according to 3-way merge algorithm.

Implementation: We implemented all the algorithm by ourselves for this library. We converted the contents of file into array and then compared each line of these array with array from other revision. If there was any change in values of the arrays at same index, we made sure it doesn’t exist in 3rd array as well otherwise we returned a custom exception alerting user about merge conflict.

List of all commits: Commits made by Rakesh Verma in Relaxedws/merge Library.

Task 3: After the successful completion of both these libraries, we’re ready to implement these libraries in Multiversion module but then we realized we need one module which won’t use these libraries but use simple terminology to find LCA and perform merge algorithm in Drupal as not all Drupal sites would use Multiversion and hence these libraries won’t be used by all.

Implementation: As we discussed earlier how revisions are created in a linear graph by default in drupal, we created a module named “Conflict” which would take 2 node id as input and would return the parent of the node_id which was created first.

For ex: LCA(4,5) would be 3 and LCA(12,16) would be 11 and LCA(1,8) would be 1 as it’s the root node and it doesn’t have any parent.

Similarly it takes 3 node ids to perform merge and return the last created node id out of those 3 as the last created node is the latest one and it contains all the latest changes in linear graph. So, Merge(3,4,5) would return contents of node with id = 5 and Merge(4,6,9) would return the contents of node with id = 9.

We created a pluggable pattern with services in the container which could be used by other drupal modules as well. The services we created in the conflict module were used in Multiversion module as well to detect and solve merge issues in the complex graphs where a node can have multiple branches (nonlinear graph).

Code: The code is available at Drupaldeploy/drupal-conflict.

List of commits made by me can is available here: Commits made by Rakesh Verma.

Task 4: After implementing the simple LCA resolver and simple Merge resolver in Conflict module, we’re ready to start integrating the relaxedws/lca library into the Multiversion module. All the websites which uses Multiversion module can use these libraries to find Lowest common ancestor from the graph of revisions and use that as an input to merge library.

The feature is already completed and merged in Multiversion module.

List of all the commits made by me is present : Commits made by Rakesh Verma to integrate relaxedws/lca with multiversion module.

Task 5: Now we’re left with the integration of relaxedws/merge library with the multiversion module. We’ve implemented the functionality and writing tests for it. Some tests are failing because revision ID is different for each revision and hence, merge library is returning an exception because the revision ID is changed in all three revisions and library can not decide on it’s own which results are to be kept in final revision. We’re trying to solve this issue.

Implementation: We created a method to convert the tree returned by multiversion module into a graph and later, we used that graph and it’s contents in PHP libraries we created in first phase. We integrated those libraries with Multiversion module and accessed them via services defined in Conflict module.

Code: All the commits made by me can be seen here: Link to pull request for integrating relaxedws/merge into multiversion module.

Task 6: The last task of the project is to integrate Codemirror with the multiversion module so that if in any case, there is a merge conflict, A user would be shown all 3 nodes (Local, remote and Base) and would be allowed to decide what changes are to be kept in final revision. This is where Codemirror comes in. This is the last part of the project and we ran out of time.

Weekly Breakdown: The work we did weekly is mentioned below:

  • Community Bonding period:  Familiarize myself with the organizations, mentors and the codebase.

  • Coding Phase Week 1: Finished the PHP library to find LCA. (Blog Post)

  • Week 2: Wrote tests for the PHP library. (Blog Post)

  • Week 3: Finished the code for PHP library to perform recursive 3-way merge (Blog Post)

  • Week 4: Wrote tests for the PHP library to ensure it performs the algorithms correctly (Blog post)

  • Week 5: Started code to create module to find LCA in linear graph (Blog post).

  • Week 6: Wrote tests for module and Finished the LCA part of the Conflict module (Blog Post).

  • Week 7: Implemented Simple Merge Resolver in Conflict Module (Blog Post).

  • Week 8: Extensive tests written for Simple Merge Resolver (Blog Post).

  • Week 9: Finished the Conflict module (Blog Post).

  • Week 10: Started Integration of PHP library which finds LCA from a graph into Multiversion module. (Blog Post)

  • Week 11: Wrote tests to ensure integration has been done correctly in multiversion module (Blog post).

  • Week 12: Finished Integration of PHP library to perform recursive 3-way merge with multiversion module (Blog Post).

  • Week 13: Wrote Tests for the integration part of the PHP library with multiversion Module.

  • Week 14: Fixing Bugs, added documentation and fixed other issues.

Future Work:  I’d keep contributing to the organisation as much as possible and I’d finish the tasks which I couldn’t complete in given time. I’ll also try to mentor students in Google Code In. This was such a great experience and learning curve grew exponentially for me. I would be really fortunate if I could experience something like this ever again or if I could get another chance to work full time with my mentors.

Conclusion: Participating in Google Summer of Code has brought me closer to the Drupal community members which comprises of an amazing group of developers. The experience in working with them taught me essential soft skills like effective communication, extensive testing and much more, than just writing code. Not only I learnt about full scale development but also about writing Industrial level code as well which I believe I couldn’t have learnt on my own. Every single time I tried to write code with as few mistakes as possible but there were always some. In order to not make new mistakes, I sometimes made same mistakes multiple times but my mentors (specially Andrei) always informed me about these mistakes and why I shouldn’t repeat them. Writing tests was one of the major learning. I never thought they were this important. After writing code, when I thought now the work is done, it was always the start. Solving errors took twice the time than writing code. Initially when my mentor Dick told me that ideally writing code takes 1/3rd of the total time, I really didn’t take him very seriously until I experienced it myself. Other than this, I learnt the importance of having a good and working debugger. A few hours tasks could take more than a week if you don’t have a debugger working fine at your end as you just simply can not identify the problems sometimes with your code. No matter how fine code you’ve written or how obvious it seems that it should work, there will always be an issue at your end if tests are failing and only a debugger can help you detecting the issue. There were many other little things which I got to learn like working under given time and not sticking to a wrong method if that’s not working rather than spending time with it to make it work. These all things will help me a lot shaping my future as a software developer. I will be forever indebted to them for providing such a nurturing environment.

I would sincerely thank my mentors Dick Olsson, Andrei Jechiu and Tim Millwood, My C0-mentor Matthew Lechleider and whole Drupal community for the support over IRC. A big thank to for giving us a chance to learn about drupal through their website for free. Thanks a lot Google for providing us the opportunity and thanks Drupal for choosing me . I am really grateful.