January 21st, 2016
Most modern software developers have copied and pasted some code, either from the Web or from another project, into their own projects. Programmers also rely on shared libraries, JARs or DLLs to build solutions. We live in an era of open source: if a developer needs a class, a JAR, a DLL or simply a snippet of code, they can often use code that others have already written and shared as open source. This saves both time and hassle.
Software is open source if it has:
Let’s look at what open source is with the following easy-to-understand scenario:
Aunt Rosy bakes awesome cupcakes. She shared her cupcakes and their recipe (named “Rosy’s Cupcakes”) with all of her family members. Jack and Jill loved the cupcakes and made them themselves using Aunt Rosy’s recipe. Their friends loved them too, and the recipe became an instant hit. One friend, Suzy, liked the cupcakes but wanted to add her personal touch, so she added nuts to the mix and shared the result with her friends. The results were so good that she asked Aunt Rosy to add nuts to the original recipe as well. Aunt Rosy agreed it was a good idea, and her original recipe became “Rosy’s Cupcakes with Nuts.”
Maria, who got the recipe from Jill, added chocolate chips, and her daughters loved them. So, like Suzy, she asked Aunt Rosy to add chocolate chips to the original recipe. But Rosy hates chocolate and didn’t want anything chocolatey in her recipe, so she rejected Maria’s request.
Over time, Rosy’s recipe spread far and wide. Many people suggested adding other ingredients to the original recipe; Rosy accepted some of these suggestions and rejected others. Because Rosy had rejected her suggestion, Maria decided to create her own recipe using Rosy’s as a base. She then shared the new recipe with her friends and family.
In this example, Aunt Rosy is the “author.” Because she also decides which of the suggested changes make it into her recipe, she is the “maintainer” as well. A proposed change to the original recipe is called a “patch,” and sending a patch back so that the original author/maintainer accepts it into the original (as happened with Suzy’s nuts) is called “upstreaming.” Suzy becomes a “contributor.”
Creating a new, independent version based on the original, as Maria did, is called a “fork.”
As you can see, open source gives developers and hackers a platform to contribute and showcase their work. Because many people use the code, test it and contribute fixes back, the quality of the product improves. That is a large part of why many popular open source products are so robust.
Let’s say that when sharing her recipe, Aunt Rosy also included a few rules about how it could be shared. Those rules constitute the recipe’s “license.” In open source terms, the rules a developer defines for how their code may be used, modified and shared are called a license. For software to count as open source in the official sense, its license should be one approved by the OSI (Open Source Initiative). The OSI has approved about 70 different licenses.
Copyleft licenses, such as the GNU General Public License (GPL), require that if you distribute work that modifies or incorporates copyleft-licensed code, you distribute it under the same license and make its source code available, so improvements flow back to the community. Permissive licenses do not have this limitation: code under a permissive license can be modified, distributed and shared under other licenses, including proprietary ones. You just need to provide attribution to the original author and keep the copyright notice and disclaimer. Examples of permissive licenses include the Apache License 2.0 and the MIT License.
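To make the copyleft/permissive distinction concrete, here is a minimal sketch that sorts a few licenses, identified by their SPDX short identifiers, into the two broad groups described above and applies the rough rule of thumb about relicensing derivative work. The mapping, the `classify` and `may_relicense_derivative` functions, and the `LicenseKind` enum are all hypothetical and cover only a tiny illustrative subset; real license compliance depends on the full license text and, ideally, legal advice.

```python
from enum import Enum


class LicenseKind(Enum):
    COPYLEFT = "copyleft"
    PERMISSIVE = "permissive"
    UNKNOWN = "unknown"


# Hypothetical, illustrative subset of SPDX identifiers mapped to a broad
# category. Anything not listed here is treated as UNKNOWN.
LICENSE_KINDS = {
    "GPL-2.0-only": LicenseKind.COPYLEFT,
    "GPL-3.0-only": LicenseKind.COPYLEFT,
    "AGPL-3.0-only": LicenseKind.COPYLEFT,
    "Apache-2.0": LicenseKind.PERMISSIVE,
    "MIT": LicenseKind.PERMISSIVE,
    "BSD-3-Clause": LicenseKind.PERMISSIVE,
}


def classify(spdx_id: str) -> LicenseKind:
    """Return the broad category for an SPDX identifier, or UNKNOWN."""
    return LICENSE_KINDS.get(spdx_id, LicenseKind.UNKNOWN)


def may_relicense_derivative(spdx_id: str) -> bool:
    """Rule of thumb only: a derivative of permissive code may be released
    under a different license (keeping the required notices), while a
    derivative of copyleft code must stay under the same license.
    Unknown licenses are treated conservatively and return False."""
    return classify(spdx_id) is LicenseKind.PERMISSIVE


if __name__ == "__main__":
    for lic in ["GPL-3.0-only", "Apache-2.0", "EPL-2.0"]:
        print(lic, classify(lic).value, may_relicense_derivative(lic))
```

Returning `False` for unrecognized identifiers is a deliberate design choice: when in doubt, it is safer to assume a license carries obligations than to assume it does not.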
Licenses can be broadly divided into three types:
Additionally, from a governance perspective, open source projects can be broadly grouped into three models:
The final governance model is what could be called the “pure democratic model”: all contributors and members are peers who make decisions collectively. This model helps build a stronger community.
In the big data ecosystem, many frameworks and tools are open source. For example, Apache projects such as MapReduce, Hive, Solr and Spark are licensed under the permissive Apache License 2.0, as is Elasticsearch (ES). The table below lists a few, along with their licenses:
| Product | License |
|---|---|
| Apache MapReduce | Apache 2.0 |
| MongoDB | GNU AGPL + Apache |
| Cloudera Impala | Apache 2.0 |
| Pivotal Greenplum | Apache 2.0 |
| JUnit | Eclipse Public License |
Open source gives your firm a way to leverage the refined intellectual output of a broad community. Whether you plan to author, use, contribute to or fork an open source project, I hope you now have a good grasp of how the open source process works and of the various licenses that exist.