diff --git a/MAIN.html b/MAIN.html new file mode 100644 index 0000000..100c8dd --- /dev/null +++ b/MAIN.html @@ -0,0 +1,9568 @@ + + + + +
+ + + + + + + + +PLEASE NOTE that an audio version of this is available to download via Soundcloud + and YouTube.
+Welcome to Module 5 of the Open Science MOOC: Open Research Software and Open + Source.
+This module has been developed in the open + through collaboration by an international team of Open + Source aficionados. Everything you see here has been developed in the open through interactive feedback + and collaboration from the wider community. It comprises a series of videos, infographics, text-based reading, + and practical tasks for you to sink your teeth into.
+Don’t forget you can join in the discussions over at our open Slack channel. Please do introduce + yourself at #module5opensource, and tell us a bit about who you are, your background, and how you ended up + here!
+This module is designed primarily for computational researchers at the graduate and undergraduate level, as + well as budding data scientists and any other researcher who uses analytical code or software. In a modern + day research environment, this covers pretty much anyone who uses a computer for their work.
+++“An article about computational result is advertising, not scholarship. The actual scholarship is the + full software environment, code and data, that produced the result.” - J. Buckheit and D. L. Donoho, 1995. +
+
Software and technology underpin much of modern research, which is now almost inevitably computational in + one way or another - search engines, social networking platforms, analytical software, and digital + publishing. With this, there is an ever-increasing demand for more sophisticated Open Source Software, + matched by an increasing willingness for researchers to openly collaborate on new tools.
+The power of Open Source is that it lowers the barriers to collaboration and adoption, thereby + allowing ideas and technology to spread more rapidly. This Module will introduce the tools + required for transforming software into something that can be openly accessed and re-used by others.
++ Image by Patrick Hochstenbach (CC0 1.0 Universal) +
+Learn the characteristics of open software; understand the ethical, legal, economic, and + research impact arguments for and against Open Source Software, and further understand the + quality requirements of open code.
+Be able to turn code made for personal use into open code which is accessible by others.
+Use software (tools) that utilizes open content and encourages wider collaboration.
+Virtually all modern scientific research workflows rely on a range of software tools, either operating on + different datasets, with different parameters, and applied iteratively in various ways (data science) or + operating on different inputs and using models and methods to predict some output state (computational + science). Open Source Software (OSS) is computer software in which the full source code is available under a + specific license that enables other users to access, view, modify, and redistribute that code for any purpose. + Because OSS requires such a license, it typically remains free of charge by default. This explicit licensing + is also what differentiates OSS from free software. Re-using OSS for analysis, simulation and visualisation + for research is also typically easier and more flexible compared to proprietary software. Often, whether we + know it or not, we are already using OSS as part of our own research workflows.
+OSS fits into the broader scheme of Open Science as it helps to make the full research environment, including + the software that produced the research results, fully accessible and re-usable. As such, it forms a necessary + component of best practices (Jiménez + et al., 2017) and of the repeatability and reproducibility of research (both personally and by others), along + with other components, such as sharing data (Stodden, + 2010).
+In some cases, sharing of source code can even be a condition for the acceptance of associated research + manuscripts (Shamir + et al., 2013). It is also generally perceived to increase research impact (Vandewalle, + 2012).
+Some of the common advantages for developers include:
+Increased developer loyalty and empowerment;
+Lower costs of services and marketing;
+Increased branding of services and products;
+Production of high quality software at lower expense;
+Flexibility and rapid innovation;
+Customisation and modular integration;
+Increased reliability and independence; and
+Based on open standards available to everyone.
+As such, the main advantages for researchers (users) include lower costs, increased + transparency, increased security and stability, no vendor ‘lock in’ with + increased user control, and overall higher quality. Furthermore, sharing OSS + allows researchers to receive credit for their efforts, for example through direct software citation (Smith + et al., 2016).
+Commonly used OSS include the Mozilla Firefox internet + browser and the LibreOffice full office suite. LibreOffice is + similar to the popular Microsoft Office, including a word processor, spreadsheet manager, and slide + presentation software, but is completely free and Open Source.
+Some regard the OSS movement as a counter-movement to neoliberalism and privatisation, through + defiance of regulations and norms in the construction and re-use of information, and a potential + transformation of modern-day capitalism through making software abundantly available with minimal effort. See + The free/open source software movement: Resistance or + change? by Panayiota Georgopoulou for more on this topic.
+The Open Source Initiative, one of the pioneers of OSS, offers the + following definition:
+Don’t worry, you don’t need to memorise all of this, but it’s good to know the principles that OSS is + coming from.
+Free Redistribution: The license shall not restrict any party from selling or giving + away the software as a component of an aggregate software distribution containing programs from several + different sources. The license shall not require a royalty or other fee for such sale.
+Source Code: The program must include source code, and must allow distribution in source + code as well as compiled form. Where some form of a product is not distributed with source code, there + must be a well-publicized means of obtaining the source code for no more than a reasonable reproduction + cost, preferably downloading via the Internet without charge. The source code must be the preferred form + in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. + Intermediate forms such as the output of a preprocessor or translator are not allowed.
+Derived Works: The license must allow modifications and derived works, and must allow + them to be distributed under the same terms as the license of the original software.
+Integrity of The Author’s Source Code: The license may restrict source-code from being + distributed in modified form only if the license allows the distribution of “patch files” with the source + code for the purpose of modifying the program at build time. The license must explicitly permit + distribution of software built from modified source code. The license may require derived works to carry a + different name or version number from the original software.
+No Discrimination Against Persons or Groups: The license must not discriminate against + any person or group of persons.
+No Discrimination Against Fields of Endeavour: The license must not restrict anyone from + making use of the program in a specific field of endeavour. For example, it may not restrict the program + from being used in a business, or from being used for genetic research.
+Distribution of License: The rights attached to the program must apply to all to whom + the program is redistributed without the need for execution of an additional license by those parties.
+License Must Not Be Specific to a Product: The rights attached to the program must not + depend on the program’s being part of a particular software distribution. If the program is extracted from + that distribution and used or distributed within the terms of the program’s license, all parties to whom + the program is redistributed should have the same rights as those that are granted in conjunction with the + original software distribution.
+License Must Not Restrict Other Software: The license must not place restrictions on + other software that is distributed along with the licensed software. For example, the license must not + insist that all other programs distributed on the same medium must be open-source software.
+License Must Be Technology-Neutral: No provision of the license may be predicated on any + individual technology or style of interface.
+Now, this all might be a little complex to remember. However, it can be summarised as making software as + re-usable as possible for future works, while also being freely available.
+There are a number of existing platforms and tools that support OSS and collaboration. The Open Science Training Handbook provides a + check-list to use for evaluating the ‘openness’ of existing research software, based on the Open Source + Definition above:
+[ ] Is the software available to download and install?
+[ ] Can the software easily be installed on different platforms?
+[ ] Does the software have conditions on the use?
+[ ] Is the source code available for inspection?
+[ ] Is the full history of the source code available for inspection through a publicly available version + history?
+[ ] Are the dependencies of the software (hardware and software) described properly? Do these + dependencies require only a reasonably minimal amount of effort to obtain and use?
+Check, check, check, done! Simples.
+There are two main camps within the free software community: The free software movement, and + the OSS movement. Both have differing ideologies based on user liberties and the practical + applications of software. Often, the term ‘FLOSS’ is used to reconcile these two political camps, and means + ‘Free/Libre and Open Source Software’; Libre being French and Spanish for ‘free’ in the context of freedom. +
+The core principle of re-use is what separates OSS from ‘Free Software’. Free and Open Source Software (FOSS) + is an inclusive term to describe software that can be classified as both free and Open Source. A good example + of FOSS is the Ubuntu Linux operating system.
+The big difference between free software and OSS is that the former must distribute updated versions under + the same license as the original, whereas newer versions of OSS can be distributed under different licenses. + FOSS combines the best of both worlds.
+These definitions have now become widely adopted, both by international governments, as well as some large + organisations such as the Mozilla Foundation and the + Wikimedia Foundation. Major organisations in the FLOSS + space include the UK’s Software Sustainability Institute, who + produce valuable resources such as their recent Software Deposit Guidance for + Researchers.
+A typical open source project has the following types of formal roles:
+Typically, roles are made public through either the README file, a Contributors file, or a
+ separate team page for the project.
Virtual environments and machines are becoming increasingly popular as high-powered research workflow + enablers, and many of these are built upon OSS (e.g., operating systems, programming languages, and data + processing frameworks). Popular services include Google Cloud + and Amazon Web Services, which also assist with database storage and + content delivery, as well as computational power. InsideDNA is a computing + platform for reproducible research in bioinformatics, genomics and the life sciences.
+As mentioned above, LibreOffice provides an Open Source alternative to Microsoft + Office. The two are almost completely compatible, just with different default file formats. For citation + managers, Zotero is the most popular Open Source alternative to + proprietary platforms such as Mendeley or EndNote.
+Zotero uses the BibTeX (pronounced ‘bib-tech’) format, based on LaTeX + (pronounced ‘lay-tech’), and has browser plugins to make citation management simple. By integrating this with + other software such as LibreOffice, it is now possible to have a fully Open Source research workflow in many + cases.
+++Did you know that this entire project was built as an open and collaborative community effort on GitHub?
+
GitHub is a popular hosting site for both software and non-software + content (often called ‘notebooks’), with added capabilities for version control, project management and + tracking, and storage services. GitHub is built on top of the OSS Git, + which enables users to work remotely to maintain, share, and collaborate on research software and other + non-software based projects.
+Version control is essentially a process that takes snapshots of the files in a repository, and tracks + modifications to them. It records when the changes were made, what they were, and who made them. If several + people are working on one file at once, any overlapping changes are detected, and must be resolved prior to + continuing. This provides a much more streamlined and automated process than manually saving and recording + changes as projects develop. It also avoids the inevitable lists of confusingly named file versions…
+
+
+
+ GitHub helps us to avoid, er, sub-optimal file naming conventions (source: XKCD) +
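+To make the idea of snapshots concrete, here is a minimal Python sketch of the kind of information a version control system records: who made a change, when, and what changed. It is a toy model and not how Git is actually implemented. The `Snapshot` and `Repository` names, the file name, and the authors are all invented for illustration, and the detection of overlapping changes is left out.

```python
import difflib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Snapshot:
    """One recorded state of a file: who changed it, when, and what it contained."""
    author: str
    message: str
    content: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class Repository:
    """Hypothetical toy repository tracking the history of a single text file."""

    def __init__(self, filename: str, initial_content: str, author: str) -> None:
        self.filename = filename
        self.history: List[Snapshot] = [Snapshot(author, "Initial version", initial_content)]

    def commit(self, author: str, message: str, new_content: str) -> Snapshot:
        """Store a new snapshot on top of the existing history."""
        snapshot = Snapshot(author, message, new_content)
        self.history.append(snapshot)
        return snapshot

    def changes(self, older: int, newer: int) -> List[str]:
        """Return the lines removed ('- ') or added ('+ ') between two snapshots."""
        diff = difflib.ndiff(
            self.history[older].content.splitlines(),
            self.history[newer].content.splitlines(),
        )
        # ndiff also emits unchanged lines and '? ' hint lines; keep only real changes.
        return [line for line in diff if line.startswith(("- ", "+ "))]


repo = Repository("analysis.py", "x = 1\nprint(x)", author="alice")
repo.commit("bob", "Change the starting value", "x = 2\nprint(x)")

for number, snap in enumerate(repo.history):
    print(number, snap.author, snap.timestamp.isoformat(), snap.message)
print(repo.changes(0, 1))
```

+Because every snapshot is kept, there is no need for file names such as `analysis_final_v2.py`: the history itself answers the question of which version came first and who changed it.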
+One of the more popular and useful functions of GitHub is the issue + tracker, which is used to organise OSS development. The above link takes you to the issue tracker for + the development of this module! If you think there is something here that can be improved, or that you want to + comment on, you can add or contribute to an issue there!
+Other similar project hosting services include BitBucket, GitLab, and Launchpad. If the + recent acquisition of GitHub by Microsoft is a bit off-putting to you, these are great alternatives.
+However, we also know that GitHub can have quite a steep learning curve. That is why the first practical + task for this MOOC will teach you how to set up your first GitHub project repository!
+GO + TO TASK 1: Building your first GitHub repository
+Especially in scientific research, Open Source Software usage and development has become practically the + norm. There are a number of reasons for this beyond those that apply to the general acceptance of OSS by, for + example, consumers, industry, or government. Among these reasons are:
+Increasingly, algorithms implemented in analysis software form an integral part of the methods described + in scholarly publications. As such, it is completely at odds with rigorous peer review if these algorithm + implementations are closed to outsiders.
+Scientific collaboration more often than not spans multiple institutions and distributed research + networks where secrecy and command hierarchy are not maintained in a way that is ‘necessary’ for closed + source development.
+Many computational analyses are run in virtualized environments (such as institutional, national, or + international ‘cloud’ infrastructures) and hosted on multi-user servers. Closed-source, commercial + software often disallows such usage.
+OSS development often relies on volunteers. In a time of budgetary constraints for scientific research, + this is a clear advantage.
+For these and other reasons, Open Source tools are very commonly used in scientific research. This includes + usage in fields where many researchers are amateur developers themselves and rely on tools such as R for statistical analysis and scripting, which, in the last decade, + has almost completely displaced commercial software for statistical analysis such as SPSS or JMP in a lot of + fields. In fields such as bioinformatics, that involve a lot of file handling of the outputs of DNA sequencing + platforms, general purpose scripting languages such as Python and + commonly used libraries built on top of it (such as biopython) have become + a vital part of the toolkit of many researchers.
+
+
+
+ Python +
+Tools such as R and Python are essentially software for writing software. Although programming is an + increasingly common activity among researchers, of course not every scientist does this. One step + away from programming is the chaining together of the inputs and outputs of various analysis tools in longer + workflows. As an example from genomics, a very common workflow is to start out with high-throughput sequencing + reads and then i) do basic quality control checks; ii) map the reads against a reference genome; iii) identify + the points where the new data are at variance with the reference. These steps are routinely executed as a + workflow where a different Open Source executable is run in a Linux command-line environment for each of the + three steps. Although this is arguably not quite open source software development, it does involve the usage + and production of open source artifacts (such as Linux shell scripts) for which the principles that we discuss + in this module are applicable.
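+The three-step genomics workflow described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, assuming tiny in-memory sequences: real pipelines call dedicated Open Source executables for each step, and the function names, the quality threshold, and the example reads below are all invented for this sketch.

```python
from typing import Dict, List, Tuple

Read = Tuple[str, List[int]]  # (bases, per-base quality scores)


def quality_control(reads: List[Read], min_mean_quality: float = 20.0) -> List[str]:
    """Step i: keep only reads whose mean quality passes a (made-up) threshold."""
    return [
        bases for bases, quals in reads
        if sum(quals) / len(quals) >= min_mean_quality
    ]


def map_read(read: str, reference: str) -> int:
    """Step ii: place a read at the reference offset with the fewest mismatches."""
    def mismatches(offset: int) -> int:
        window = reference[offset:offset + len(read)]
        return sum(a != b for a, b in zip(read, window))

    offsets = range(len(reference) - len(read) + 1)
    return min(offsets, key=mismatches)


def call_variants(read: str, offset: int, reference: str) -> Dict[int, Tuple[str, str]]:
    """Step iii: report positions where the read disagrees with the reference."""
    return {
        offset + i: (reference[offset + i], base)
        for i, base in enumerate(read)
        if base != reference[offset + i]
    }


def run_workflow(reads: List[Read], reference: str) -> Dict[int, Tuple[str, str]]:
    """Chain the three steps: the output of each becomes the input of the next."""
    variants: Dict[int, Tuple[str, str]] = {}
    for read in quality_control(reads):
        offset = map_read(read, reference)
        variants.update(call_variants(read, offset, reference))
    return variants


reference = "ACGTACGGTCAGTTCA"
reads = [
    ("ACGGTCTG", [30] * 8),  # good quality, one mismatch against the reference
    ("TTTTTTTT", [5] * 8),   # poor quality, removed by quality control
]
print(run_workflow(reads, reference))  # {10: ('A', 'T')}
```

+The point of the sketch is the chaining, not the biology: each step is a separate, inspectable unit, which is what makes a shared workflow script re-usable by others.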
+
+
+
+ R +
+Lastly, OSS is also used in scientific research for reasons that more closely mirror those that drive the
+ adoption of OSS in wider society, namely that it is cheap. For example, individuals or organizations might
+ decide to switch from Microsoft Office to LibreOffice for manuscript writing or spreadsheet processing because
+ the latter is free (both as in ‘free
+ beer’ and ‘free speech’). Likewise, the choice to switch from ArcGIS to QGIS for the analysis of geographic information might be prompted
+ simply by cost considerations.
I’m using X [e.g. Matlab, STATA, Excel] and I want to transition to something more open. What are the + next steps?
+Even if you are using proprietary software, you can usually still share your source code/documents etc. + The best first step is sharing whatever you can.
+Great! I can put them in my new GitHub repo.
+If that’s enough for you for now, great! If not, for most pieces of proprietary software there are Open Source + equivalents. Have a go with one and see what you think.
+| Closed | Open |
|---|---|
| Matlab | Python, Julia |
| STATA/SPSS | R |
| MS Office | LibreOffice |
| Mathematica | JupyterLab |
| Test out your new Pull Request (PR) skills … | … by adding your own example here |
Cool! But if I make the switch, will I be stuck taking ages to learn a new tool, + without support, or with buggy software?
+Good question! The answer is: it depends. The best thing to do is find someone who’s made the switch before + and learn from their experience. Or just do a Google search! Some OSS is much better than its closed + counterparts, some isn’t, so it’s worth choosing carefully.
+The most likely person who might want to re-use your software in the future is…you! So while sharing is + always better than not sharing, you can make your own life, and that of others, much easier through + appropriate documentation. Documentation can include several things, such as helpful comments and + annotations in the code that explain why a particular action was performed, rather than only what the + code does.
+One of the most critical aspects of this is including an informative README file, which
 accompanies almost every OSS project; some projects even have more than one. It can be good practice to include
 one such file in every directory, listing the files, a table of contents, and the purpose of
 the directory. The README file is typically just plain text or markdown (again, like all of
 the ones for the MOOC!), and can include critical information on how to install and run the software, its
 dependencies and requirements, as well as tutorials or examples.
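+As a rough illustration of the "one README per directory" practice, the following Python sketch builds a small markdown README that states the purpose of a directory and lists its files. The function names, the `README.md` file name, and the example directory are assumptions made for this sketch; a real README would also be edited by hand to add installation notes and examples.

```python
import tempfile
from pathlib import Path


def build_readme(directory: Path, purpose: str) -> str:
    """Return README text: the directory's purpose followed by a list of its files."""
    lines = [f"# {directory.name}", "", purpose, "", "## Files", ""]
    for path in sorted(directory.iterdir()):
        # Skip sub-directories and the README itself.
        if path.is_file() and path.name != "README.md":
            lines.append(f"- {path.name}")
    return "\n".join(lines) + "\n"


def write_readme(directory: Path, purpose: str) -> Path:
    """Write (or overwrite) README.md inside the given directory."""
    readme = directory / "README.md"
    readme.write_text(build_readme(directory, purpose), encoding="utf-8")
    return readme


# Demonstration in a throwaway directory, so nothing on disk is touched.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "analysis"
    project.mkdir()
    (project / "clean_data.py").write_text("# cleaning script\n", encoding="utf-8")
    (project / "plot_results.py").write_text("# plotting script\n", encoding="utf-8")
    readme = write_readme(project, "Scripts for cleaning and plotting the survey data.")
    print(readme.read_text(encoding="utf-8"))
```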
++Did you know… The term README is sometimes playfully ascribed to the famous + scene in Lewis Carroll’s Alice’s Adventures In Wonderland in which Alice confronts magic munchies labeled + with “Eat Me” and “Drink Me”. Potent.
The purpose here is to provide sufficient information to maximise the re-use and reproducibility of the + computational environment, such that someone with no experience with the project can easily access and re-use + the software (Sandve + et al., 2013). By lowering the barriers to entry, you increase the chances of others being able to + re-use your work, which is one of the ultimate goals of OSS (Ince + et al., 2012).
+An extension of this that can help to make things even easier for future re-use is ‘container’ technology. + Containers are like an ecosystem frozen in time, where the code, the data, and any other dependencies are all + perfectly preserved, packaged and saved in their present functioning versions. This means that anyone in the + future can come in and run the analyses again. As such, they are generally good for re-use, but this + can come at the sacrifice of modification or understanding by others, as often a lot of details can be hidden + within the source code and its dependencies. Common examples of container implementation in research include + Rocker + (a Docker container for the R language), Binder, and + Code Ocean.
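+A container freezes the whole environment. A much lighter, and far less complete, step in the same direction is simply to write down which versions an analysis ran with. The Python sketch below does that using only the standard library; it is not a substitute for a container, and the function name and the package names passed to it are just examples.

```python
import json
import platform
import sys
from importlib import metadata


def describe_environment(packages):
    """Collect interpreter, operating system, and package versions into a dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": sys.implementation.name,
        "system": platform.system(),
        "machine": platform.machine(),
        "packages": versions,
    }


# The package names here are examples; list whatever your analysis imports.
report = describe_environment(["pip", "numpy"])
print(json.dumps(report, indent=2, sort_keys=True))
```

+Saving this output next to the results gives a future reader a starting point for rebuilding the environment, even when no container image is available.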
+Sustainable software is good software.
+The 10 simple rules for making computational research more reproducible, based on Sandve + et al., (2013), are:
+
+
+
+ Infographic adapted from Sandve et al., (2013). Feel free to download this and print it out to keep handy + during your research! +
+If you follow these steps, along with the processes in Task + 1 and Task + 2, you should be fine!
+An Open Source license is a type of license designed specifically for software and code that makes it explicit + what the legal conditions for sharing and re-use are. As mentioned above, the addition + of a suitable license is what differentiates publicly shared software from OSS. For example, the widely used + MATLAB is proprietary software, and Octave is an openly licensed alternative programming + language.
+There are currently more than 1,400 unique Open Source licenses, a complexity that makes it difficult to + understand how the legal implications differ between licenses.
+Some of the more common licenses include:
+You don’t need to know all the legal nitty gritty behind all of these, but it is good to at least know what + options are available to you.
+There are two ways in which contributions to a project become licensed:
+Thankfully, the process of selecting an Open Source license is relatively trivial, thanks to user-friendly + tools such as Choose A License. Each of these licenses allows other + users to use, copy, distribute, and build upon your work, often while ensuring that the creators are + appropriately recognised for their work. Here, the key is selecting an appropriate license for your work, + depending on what you want, or do not want, others to do with it.
+Citations provide one of the most important interactions in scholarly research, forming the basis of our + referencing and metrics systems. Typically, this is performed thanks to the assistance of a permanent unique + identifier such as a Digital Object + Identifier (DOI). A DOI is a persistent identifier, implemented in the Handle System, that meets a common standard, + depending on the purpose, such as for identifying academic information. Such identification is critical for + tracking the genealogy and provenance of research, for reproducibility, as well as for giving appropriate + credit to those who have created the software. Importantly, software should be considered a legitimate output + from scholarly research, and citation is becoming an increasingly common way to indicate that.
+In 2016, Smith + et al. wrote a research paper about the principles of software citation as part of the FORCE11 + Software Citation Working Group. In the same way that you would want to cite software that you have used as + part of good research practices, it is important to make your research easily citable too. When citing any + software used for your own research, you should include at minimum:
+The six principles of software citation by Smith + et al., (2016) are provided here:
+Importance: Software should be considered a legitimate and citable product of research. + Software citations should be accorded the same importance in the scholarly record as citations of other + research products, such as publications and data; they should be included in the metadata of the citing + work, for example in the reference list of a journal article, and should not be omitted or separated. + Software should be cited on the same basis as any other research product such as a paper or a book, that + is, authors should cite the appropriate set of software products just as they cite the appropriate set of + papers.
+Credit and attribution: Software citations should facilitate giving scholarly credit and + normative, legal attribution to all contributors to the software, recognizing that a single style or + mechanism of attribution may not be applicable to all software.
+Unique identification: A software citation should include a method for identification + that is machine actionable, globally unique, interoperable, and recognized by at least a community of the + corresponding domain experts, and preferably by general public researchers.
+Persistence: Unique identifiers and metadata describing the software and its disposition + should persist - even beyond the lifespan of the software they describe.
+Accessibility: Software citations should facilitate access to the software itself and to + its associated metadata, documentation, data, and other materials necessary for both humans and machines + to make informed use of the referenced software.
+Specificity: Software citations should facilitate identification of, and access to, the + specific version of software that was used. Software identification should be as specific as necessary, + such as using version numbers, revision numbers, or variants such as platforms.
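+To show how the principles of unique identification, specificity, and credit might translate into practice, here is a small Python sketch of a citation record. The choice of fields is an assumption made for this example rather than a list taken from Smith et al. (2016), the output layout is not a recognised citation style, and the names, project title, and DOI are placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SoftwareCitation:
    """Hypothetical record of the details needed to point at one software release."""
    authors: List[str]  # credit and attribution
    title: str
    version: str        # specificity: the exact release that was used
    year: int
    identifier: str     # unique identification, e.g. a DOI for that release

    def missing_fields(self) -> List[str]:
        """Name any fields left empty, so gaps are caught before publication."""
        return [name for name, value in vars(self).items() if not value]

    def as_text(self) -> str:
        """Render a plain-text reference; the layout is illustrative, not a standard."""
        names = ", ".join(self.authors)
        return f"{names} ({self.year}). {self.title} (version {self.version}). {self.identifier}"


citation = SoftwareCitation(
    authors=["Doe, J.", "Roe, R."],                        # placeholder names
    title="example-analysis-toolkit",                      # placeholder project
    version="1.2.0",
    year=2018,
    identifier="https://doi.org/10.5281/zenodo.0000000",   # placeholder DOI
)
print(citation.as_text())
print(citation.missing_fields())
```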
+Note: For instructions on ‘how to make your software citable’ see the section Using GitHub and Zenodo below and Task + 2: Linking GitHub and Zenodo.
+GitHub is a popular tool for project management, content storage, and version control. + Note that GitHub itself is not OSS. However, Git, the tool which it is based on, is. Git is designed to help + manage the source code files, and the updates to them, for a software-related project. However, it can also be + extended to other non-software projects; for example, this MOOC!
+However, getting research onto GitHub is just the first step. It is equally important to make it persistent + and re-usable, which is why having a Digital Object Identifier (DOI) associated with it can be useful. The + simplest way to do this is through a service called Zenodo, which is a free + and open source multi-disciplinary repository created by OpenAIRE and CERN, and can be used to assign a DOI to + individual GitHub repositories. There is a GitHub + Guide that explains the details, which involve linking GitHub repositories directly through to Zenodo so + that when developers create formal releases for their software, Zenodo archives a copy of that version of + the software.
+There’s nothing special about using Zenodo for creating DOIs, other than that it is free of charge; + other general repositories can also be used, such as DataCite DOI + Fabrica, or your own institutional repositories such as Caltech’s. +
+A lot of researchers might typically be afraid of sharing code which is incomplete, buggy, or imperfect. + However, in the OSS community, such a practice of sharing ‘raw’ code is fairly commonplace. Sharing code + openly enables others to re-use and improve it, as well as to engage in a deeper way with any research + associated with it. This is one of the fundamental aspects of peer-collaboration, perhaps best exemplified by + the traditional process of research manuscript peer review.
+Task 2 will guide you through the process of linking a GitHub repository to Zenodo for archiving.
+++Did you know… All content produced for this MOOC is available as part of a community in Zenodo?
+
GO + TO TASK 2: Linking GitHub and Zenodo
+Often, OSS is developed in a public, decentralised, collaborative manner between multiple contributors. The + purpose of this is to enhance the diversity and scope of a project and its design, in order to become more + beneficial and sustainable. Such an approach was famously likened to a ‘bazaar’ model by Eric Raymond, an + early OSS proponent. One of the major guiding principles of this is that of peer production, + which relies on self-organised communities to regulate the development of content, co-ordinated towards a + shared goal or outcome.
+OSS projects rely heavily on volunteer collaboration, which often entails a constant flux of newcomers in + order to become productive and sustainable (Steinmacher + et al., 2014). Creating the right social atmosphere for a project, and a welcoming engagement + environment, are often critical to successful collaborations in OSS.
+Hopefully now you have come to see the importance of software as a cornerstone of modern science, and the + role that OSS plays in this.
+The learning outcomes from this should be:
+You will now be able to define the characteristics of OSS, and some of the ethical, legal, economic and + research impact arguments for and against it.
+Based on community standards, you will now be able to describe the quality requirements of sharing and + re-using open code.
+You will now be able to use a range of research tools that utilise OSS.
+You will now be able to transform code designed for your personal use into code that is accessible and + re-usable by others.
+Software developers will be able to make their software citable, and software users will know how to cite + the software they use.
+BONUS TASK
+If you have completed Task + 1 and Task + 2, we also have a BONUS TASK for you, if you want to take your skills a step further. + Task + 3 will take you a step deeper into integrating Git into a typical research workflow by showing you how + to integrate it with R Studio. It is recommended that you have completed the first 2 tasks before proceeding + with this one.
+However, your Open Source journey does not stop here! This was just the beginning, and there are some + incredible resources out there if you would like to do or learn more:
+If you feel particularly inspired by this, you can endorse the Science Code Manifesto, which is based on the five + principles of code, copyright, citation, credit, and curation.
+To launch and develop your own project, the Open Source Guides + program offers a range of practical guides and skills to help launch and advance your OSS projects.
+For a detailed look at OSS-based research workflows, the Open Science, Open Data, Open Source hand-guide by + Pedro L. Fernandes and Rutger A. Vos is one of the top resources online.
+More formalised journal venues also exist for software-based articles, including The Journal of Open Research Software and The Journal of Open Source Software. A list of such venues is also available.
+The PLOS Open Source Toolkit provides a + global forum for Open Source hardware and software research and applications.
+NumFOCUS is a nonprofit organization that supports and promotes + world-class, innovative, open source scientific software. Some of the projects they sponsor include:
+IPython and Jupyter Notebook + initiatives.
+rOpenSci, which promotes the open source R statistical environment + for transparent and reproducible research.
+To gain more hands on experience with OSS, the Software + Carpentry community holds regular workshops to improve lab-based computing skills (Wilson + et al., 2017).
+These references here are just the beginning. They include some of the most useful general overviews of + the Open Source landscape in research. However, if you want to be find something more specific to your own + research field, then that path is there for you to explore!
+The Future of Research in Free/Open Source Software Development (Scacchi, 2010).
+The Scientific Method in Practice: Reproducibility in the Computational Sciences (Stodden, 2010).
+The case for open computer programs (Ince et al., 2012).
+Current issues and research trends on open-source software communities (Martinez-Torres and Diaz-Fernandez, 2013).
+Ten simple rules for reproducible computational research (Sandve et al., 2013).
+A systematic literature review on the barriers faced by newcomers to open source software projects (Steinmacher et al., 2014).
+Knowledge sharing in open source software communities: motivations and management (Iskoujina and Roberts, 2015).
+Software citation principles (Smith et al., 2016).
+An introduction to Rocker: Docker containers for R (Boettiger and Eddelbuettel, 2017).
+Good enough practices in scientific computing (Wilson et al., 2017).
+Four simple recommendations to encourage best practices in research software (Jiménez et al., 2017).
+Know a way this content can be improved?
+Time to take your new GitHub skills for a test-run! All content development primarily happens here. + If you have a suggested improvement to the content, layout, or anything else, you can make it and then it + will automatically become part of the MOOC content after verification from a moderator!
+ +This is to provide a structured check list to track content development.
+Status traffic light scheme:
+Green: All looks good
+Orange: Issues that can impact launch date
+Red: Launch date in danger
| Design Phase | Delivered | Status badge | Deadline | Comments |
|---|---|---|---|---|
| Initiate and plan | | | | |
| Kick off | Yep | | 2018/05/10 | Sprint success! |
| Define target group | Yep | | 2018/05/31 | Sprint success! |
| Refine learning objectives/outcomes | Yep | | 2018/05/31 | Sprint success! |
| Design course outline | Yep | | 2018/05/31 | Sprint success! |
| Design project plan and timeline | Yep | | 2018/06/31 | |
| Identify promotion channels | Yep | | 2018/06/31 | |
| Design and scripting | | | | |
| Identify key resources | Yep | | 2018/06/31 | Sprint success! |
| Design learner activities | Yep | | 2018/06/31 | 3/3 completed |
| Find existing key resources | Yep | | 2018/06/31 | Sprint success! |
| Write audio/video scripts | In prep | | 2018/08/31 | 6/6 completed |
| Review all learning resources | In prep | | 2018/11/31 | |
| Finalise all scripts | In prep | | 2018/11/31 | |
| Copyright strategy | Yep | | 2018/08/31 | |
| Recording and editing | | | | |
| Record on location/in studio | | | | |
| Edit all audio/visual material | | | | |
| Internal reviewing | | | | |
| Cross-check and review content | In prep | | 2018/08/31 | Continuous process |
| Checks from Steering Committee | In prep | | 2018/08/31 | Continuous process |
| External testing and review | | | | |
| All reviewing conducted via GitHub | In prep | | 2018/08/31 | Continuous process |
| Existing channels from communications strategy | | | | |
| Internal reviewing and finalisation | | | | |
| Cross-review and check content | | | | |
| Final checks from Steering Committee | | | | |
| Implementation | | | | |
| Agreement on platform | In prep | | 2018/08/31 | |
| Module logo designed | Yep | | 2018/08/31 | |
| Module description and introduction | Yes | | 2018/07/31 | |
| Team member and guest lecturer agreements | Yes | | 2018/07/31 | |
| Team member and guest lecturer profiles | Yes | | 2018/07/31 | |
| Course readings acquired | Yes | | 2018/07/31 | |
| Port content to selected platform | | | | |
| All content deposited in Zenodo | Yep | | 2018/08/31 | Second release completed |
| Promotion | | | | |
| Content and communication calendar/strategy/timeline | In progress | | | |
| Identify relevant channels (mailing lists, social media and hashtags, organisations, individuals, websites, conferences) | Yes | | 2018/07/31 | |
| Images for use in social media | Yep | | 2018/07/31 | |
| Course title marketing check | Yes | | 2018/07/31 | |
| Launch | | | | |
| Publicity start | Yes | | Dec 2018 | |
| Open and free for all, continuous, self-paced learning, 100% online | Yes | | Dec 2018 | Continuous, self-paced |
| Soft launch | Yes | | Dec 2018 | |
| Course launch | Yes | | | |
| Monitoring of learner experiences and reactions | In progress | | Jan 2019 | |
| Prepare to provide additional information if required | Pending | | | |
| Reviewing and optimisation | | | | |
| Collate and review learner feedback at regular intervals | In prep | | | |
| Track any new information during course duration | In prep | | | |
| Prepare evaluation report | Pending | | | |
| Evaluation meeting | Pending | | | |
| Optimise content where relevant | Pending | | | |
This task is designed for students and researchers who want to create their first Open Source project (software + or non-software) on GitHub. GitHub is a place for you to come and play and experiment with new research + workflows, and is really just the beginning to help set the stage for your own pathways and ideas.
+Don’t forget you can join in the discussions over at our open Slack channel. Please do introduce + yourself at #module5opensource, and tell us a bit about who you are, your background, and how you ended up here! +
+PLEASE NOTE that a screen recording for this task is also available via YouTube.
+Estimated time to complete: 30-45 minutes.
+Estimated time saving once complete: Unimaginable.
+Figure: The workflow for Task 1. Keep this handy as you work through the task!
+A ‘repository’ is really just a fancy name for a project on GitHub. GitHub is a place online where you can + manage projects, store files, and openly collaborate with others. This is all achieved by using version + control to track projects as they progress. As such, GitHub is a powerful tool for both software and + non-software projects.
+One of the most important things to consider at this early stage is to think about how you want the wider + community to interact with your project. As you are working in the open, you want to make sure others feel + comfortable in accessing, viewing, and engaging with your work. Setting up a repository in a way that lowers + the barriers to entry, and the fear of being an ‘outsider’ is the first step towards maintaining a successful + project.
+Figure: Octocat, GitHub’s little mascot
+To set up a GitHub profile, simply head to the main page and click Sign + Up for GitHub. Here, you can create your personal account, with a username, email, and password as + standard.
+Figure: Sign up for GitHub
+The next step is to set up a personal plan. For now, simply select the ‘Unlimited public repositories for + free’ plan, unless you are concerned about privacy, in which case select the private plan. If you intend to + set up a project for an organisation, you can select that option too.
+This is possibly the most confusing and off-putting aspect of GitHub. Here are some of the most commonly + used terms and their definitions:
+Whew! Don’t worry about memorising all of these for now. Like any new skill, familiarity comes + with experience.
+You can probably see how some of these are fairly similar to things like save, copy, paste - standard + workflow operations, but adapted for a software management process. There are a few more too, but + this should do for getting started.
+If you are interested, most of these terms come from the underlying Git + system. Git was built to allow developers to manage different versions of source code in a distributed + manner, which is great. It has lots of features and the ability to do lots of complex stuff, written by a very + clever guy. However, the user interface was not designed with new + users in mind, so it can be hard to learn.
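+To make a few of these terms more concrete, here is a minimal command-line sketch, assuming the glossary above covers commit, branch, and merge. It works entirely inside a scratch folder, and the project, file, and branch names are placeholders rather than anything this module requires.

```shell
#!/bin/sh
# Sketch only: directory, file, and branch names are placeholders.
set -e

workdir="$(mktemp -d)"          # scratch folder so nothing real is touched
cd "$workdir"

git init -q demo-project        # repository: a project tracked by Git
cd demo-project
git config user.name "Example Researcher"        # identity for this repo only
git config user.email "researcher@example.org"

echo "# Demo project" > README.md
git add README.md               # stage: choose what goes into the snapshot
git commit -q -m "Add README"   # commit: a saved snapshot with a message

main_branch="$(git symbolic-ref --short HEAD)"   # name of the default branch
git checkout -q -b add-notes    # branch: a parallel line of work
echo "Some notes" > notes.md
git add notes.md
git commit -q -m "Add notes"

git checkout -q "$main_branch"
git merge -q add-notes          # merge: bring the branch back into the main line

commit_count="$(git rev-list --count HEAD)"
echo "Commits on $main_branch: $commit_count"
```

+On GitHub, the merge step is usually proposed and discussed as a pull request rather than run directly like this.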
+Figure: Unbeatable guide to using Git. (Source: XKCD)
+On your GitHub profile, click ‘Create new repository’. The first step is to choose a name, which acts as the brand for your project. Ideally, it should be memorable and give some indication of what the project does.
+Figure: Create a new repository
+Make sure not to duplicate names, infringe upon other trademarks, or name it anything that could be + considered to be offensive.
+Any GitHub repository needs four key elements to get started and to begin developing a welcoming community: an Open Source license, a README file, contributing guidelines, and a code of conduct.
+These are critical aspects and best practices of any project. They help users to understand their legal rights, what is expected of them, and the purpose of the project, and they improve the overall user experience.
+All four of these files should be kept in the root directory of your project repository. It is convention to use the markdown file format (.md) for most of these files (though the license file is most often plain text, .txt), and to capitalise all file names. Instead of spaces in file names, use underscores (_).
So you should end up with a foundational file selection like this:
+LICENSE.md
+README.md
+CONTRIBUTING.md
+CODE_OF_CONDUCT.md
+Figure: The basic repository structure
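+As a rough sketch, the same four files could also be created from a command line before uploading them to GitHub. The folder name my-project is a placeholder, and the files are left empty here on the assumption that you will fill them in during the steps that follow.

```shell
#!/bin/sh
# Sketch only: "my-project" is a hypothetical folder name.
set -e

project_dir="$(mktemp -d)/my-project"   # scratch location for this example
mkdir -p "$project_dir"
cd "$project_dir"

# Create empty placeholders for the four foundational files.
for name in LICENSE README CONTRIBUTING CODE_OF_CONDUCT; do
    touch "$name.md"
done

file_count="$(ls | wc -l)"
echo "Created $file_count files in $project_dir"
```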
+Choosing an appropriate license is what differentiates an Open Source repository from software that is merely publicly available. While you are not obliged to choose a license, without one the default copyright rules apply and others have no clear permission to re-use your work. Adding a license means that others can modify, share, re-use, and build upon your project within a legal framework.
+To start with, you want to check Choose A License to find a + license that best suits your intentions for the repository.
+The three primary ones to choose from are:
+Thankfully, when you start a new repository on GitHub, you are given the option to select an existing + license from a drop-down menu. You should always (with very few exceptions) use an existing license, since + this is what potential users and contributors will see before they choose to use or contribute to your + software.
+Figure: Choosing an example license
+If the drop-down menu does not include the license you want, you can add one manually. To do this, simply click ‘Create new file’ in the repository, and copy and paste the full text of an existing license into it. Name the file something like LICENSE.txt or LICENSE.md to make it clear, and keep it in the main repository folder (i.e., the root). Make sure to add a clear commit message, and you’re done!
++Helping hand: This MOOC uses a different combination of licenses for code content and + non-code content. Here you can find an example of the MIT + License that we apply for all code and software generated as part of the MOOC production.
+
When you initialise your new repository, there should be an option to do so with a README
+ file. Just like Alice in Wonderland, these do exactly what they say - provide key information about the
+ project. These are typically the first thing outside contributors will see when they come to your
+ repository, so making them informative and welcoming is key.
+Figure: Part of the README file for this module
+The file will originally be in markdown (.md) format. This is a lightweight markup language
+ with a plain text format. To learn some basic markdown, see this cheatsheet. But for now,
+ we can just use plain text.
There are several things you will want to include in your README file:
++Pro-tip: Later on as your project develops, you might want to add FAQs based on + community feedback, or a tutorial to help users understand how your project works.
+
Remember that not everyone coming to your project will be an expert, or understand what it is you are doing
+ and why. Having a well-documented README file will enhance the user experience for people with
+ a range of prior knowledge.
When the README file is included in the root directory, GitHub will automatically display this
+ on the homepage of your repository. This means it is the first thing people will often see, so make it
+ count!
+Helping hand: Here, you can find the README file used for this MOOC module. This includes information on the status, rationale, learning outcomes, development team, key documents, and license to help. You can copy and adapt this structure for your own projects as needed.
Contributing guidelines are designed to communicate to potential contributors a short guide on how to + engage with your project and community. You want to make sure to be welcoming, and indicate that you are + eager for participants to engage with your project. Whenever a participant opens a new pull request or + creates a new issue, they will see a link to your contribution file.
+Figure: Part of the CONTRIBUTING guidelines for this module
Sticking with the all caps file names, the next step is to create a CONTRIBUTING file. Click
+ ‘Create new file’, and make sure to save it in markdown format as before. This file will tell other users
+ how they can engage with and participate in your project. This is the first step towards establishing a
+ community around your project, so make it engaging, concise, and informative.
The CONTRIBUTING file should include information on:
++Pro-tip: Consider starting off with a short thank you note for people taking the time to + consider contributing - they have clicked on the file to learn more after all! If there are other methods + of recognition that you have in mind, make sure to include them in here too.
+
Here, you are essentially trying to encourage people to volunteer their time to advance your project. Make + sure to be welcoming and friendly, and be precise about how people can engage. When writing this, make sure + to think about it from the user perspective - how can you make their life easier when submitting pull + requests and opening issues to make the whole project run more smoothly.
+Helping hand: The Contributing guidelines for this MOOC module include some very specific things: an introduction to using Git and GitHub, tips for getting started, contact information, how to alter the content and report issues, a link to the README file, and information on the preferred content and code styles. Feel free to copy and adapt this for your own project as needed.
A code of conduct is important for setting the ground rules for expected behaviour and participation for + project contributors, and is an easily referenced document for showing that your project team takes + constructive dialogues seriously. Therefore, it is a critical element for creating and maintaining a healthy + community that engages in a constructive and productive manner within a positive social atmosphere.
+A code of conduct not only provides expectations of behaviour, but also describes who those expectations apply to, when they apply, what to do should a violation of the code occur, and what actions will follow. As such, points of contact need to be made clear in the code of conduct. Typically, this should be a private channel, such as an email address.
+++Pro-tip: In case a violation needs to be reported about the person who receives those + reports, make sure to include an option to contact a secondary party.
+
To add a code of conduct, you can create your own from scratch by adding a new markdown file, or use
+ existing templates such as the Contributor Covenant. Name
+ your file CODE_OF_CONDUCT.md, and make sure it is visible in the README file.
++Helping hand: This MOOC also has a Code + of Conduct based on the Contributor Covenant. As you can see, it includes information on expected + standards of behaviour, responsibilities of those in the community, and enforcement of the CoC including + contact details. Feel free again to re-use and adapt this to your project as you see fit.
+Figure: Part of the CODE OF CONDUCT file for this module, based on the Contributor Covenant
+Making sure to enforce the code of conduct is important, as it shows that you not only value the code, but also respect the influence that it has on your community. It is important to treat each member of the community with the respect, courtesy, and importance that they deserve. Should a violation occur, or should someone repeatedly violate the code, it is best to refer to the Open Source Guide to see how to enforce the code of conduct.
+If you want to make your code citable, you should store the metadata needed for a citation from the start, by creating a codemeta.json file (https://codemeta.github.io) or a CITATION.cff file (https://citation-file-format.github.io). Both allow tooling to create citation information automatically, rather than asking you to type it into a form later.
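+As an illustration only, a minimal CITATION.cff could be written from the command line like this. The title, author, version, and date are placeholders, and the field names are assumed to follow version 1.2.0 of the Citation File Format, so check the current specification before relying on them.

```shell
#!/bin/sh
# Sketch only: every value below is a placeholder to replace with your own.
set -e

repo_dir="$(mktemp -d)"   # stand-in for the root of your repository
cd "$repo_dir"

# Write a minimal citation file into the repository root.
cat > CITATION.cff <<'EOF'
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "My Research Software"
authors:
  - family-names: "Researcher"
    given-names: "Example"
version: "1.0.0"
date-released: 2019-01-31
EOF

cat CITATION.cff
```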
If you’re interested, cite.research-software.org provides + further background information about software citation in academia.
+Issues are not necessarily problems with a project, but also suggestions for improvement, things to develop + in the future, and comments and feedback about the project to work through. They can be openly shared and + discussed with contributors as needed, sort of like a forum.
+If you are a project lead, it is important to maintain a list of issues that make it clear to contributors + what aspects of the project need attention. It is also important to engage with as many issues as possible + from others in a positive manner, to show that you take their contributions seriously.
+Key elements for issues include:
+Figure: The issue tracker for the Open Scholarship Strategy project
+Within issues it is possible to use @ mentions to notify other contributors about the issue, and to get the right people engaged in an effective manner. GitHub has an internal system of notifications, just like Facebook or Twitter, and can also send emails to people who are mentioned in the issue tracker. This can all be customised for individuals within the user settings.
+So now you are ready to launch your project, begin advertising it, and start getting contributions! Before continuing, make sure that you have:
+A LICENSE file that is an exact copy of an Open Source license
+README, CONTRIBUTING, and CODE_OF_CONDUCT files
+CONGRATULATIONS!
+You have now launched an Open Source research project! Hopefully, from here on out, your work will act to + benefit the wider community, forge new collaborations, and create new and fantastic opportunities for you all. + Try and think about ways in which these skills can be applied to future projects, and how they might also have + helped with some in the past.
+From now on, it is all up to you! Some advice is to:
+ +This task is designed for students and researchers who want to create and re-use GitHub-based + projects/repositories in the academic literature.
+Estimated time to complete: 45-60 minutes.
+Figure: The workflow for Task 2. Keep this handy as you work through the task!
+Although the integration of GitHub and Zenodo makes it much easier to work with these tools nowadays (January 2019), it is important to stress that there are alternatives to GitHub (GitLab, Bitbucket, …) and alternatives to Zenodo (other repositories might be better suited to your community, so ask your colleagues). For instance, you can work with GitLab and manually upload each new version to your university repository to get a DOI. The principles (working with an online version control system, and archiving major versions in a repository that provides a persistent unique identifier) can be applied in different workflows.
+++Pro-tip: Make sure to include a LICENSE and README file in your repository. This will + indicate to people the purpose of the project, and how they can engage with it in the future.
+
You can find out how to set up a GitHub repository in Task 1: Building a GitHub repository, which is also part of ‘Module 5: Open Research Software and Open Source’.
+Once on your GitHub project listings page at github.com head to the + ‘Repositories’ tab. Select which repository you would like to archive, and open it up.
+Now head over to zenodo.org. Zenodo is a platform where you can permanently archive your code and other project elements. Zenodo assigns each archived project a Digital Object Identifier (DOI), which also helps to make the work more citable. This is different to GitHub, which acts as a place where the actual work on a project takes place, rather than long-term archiving of it. On GitHub, content can be modified, deleted, rewritten, and irreversibly changed, which makes it unreliable for long-lasting referencing purposes. Zenodo offers more security and permanence for research outputs.
+Figure: Sign up for Zenodo
+If you already have a Zenodo account, this is easy. If not, follow the steps to create one. You can even log in using your GitHub account or ORCID profile to make things simpler, as Zenodo has built-in integration for both. This might be easier than creating yet another research account and profile.
+If you have got this far, this means that Zenodo is now authorised to configure the repository webhooks that + it needs to archive the repository and issue it a DOI. To do this, on the Zenodo website navigate to the GitHub repository listing page and simply click the + ‘on’ button next to your repository.
+Figure: Enable individual GitHub repositories to be preserved in Zenodo
+Now you have set up a new webhook between Zenodo and your repository. In GitHub, click on the settings for your repository, and the Webhooks tab in the left-hand side menu. This should display the new webhook pointing to Zenodo. Note, it may take a little time for the webhook listing to show up.
+Figure: Check that webhooks are enabled for your GitHub repository. Example here using the Open Scholarship Strategy
+The first time you archive a repository is known as the ‘first release’. Each time you create a new version + of that repository and archive it, you create a new release. This can be tracked in the ‘releases’ tab for + your repository on GitHub (top center).
+Figure: Check that the repository first release was successful. Example here using the Open Scholarship Strategy
+For the first archived version of your repository, click ‘Create a new release’ back in GitHub. Fill in the form and give some details as to what the release entails. For the first release, it is common practice to call it v1.0.0.
+Figure: Create a new release. Example here using the Open Scholarship Strategy, for which a first release already exists
+Finally, click ‘publish release’, and your archive will be published and versioned on GitHub.
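+If you prefer the command line, the version label can also be created as a Git tag before drafting the release. This is a sketch using a throwaway repository, and it assumes your real project has a remote named origin. A pushed tag is not by itself a GitHub release, so the release still needs to be published on GitHub for the Zenodo webhook to pick it up.

```shell
#!/bin/sh
# Sketch only: the repository here is a throwaway stand-in for your project.
set -e

repo_dir="$(mktemp -d)"
cd "$repo_dir"
git init -q .
git config user.name "Example Researcher"        # identity for this repo only
git config user.email "researcher@example.org"
echo "# Demo project" > README.md
git add README.md
git commit -q -m "Add README"

# Create an annotated tag that marks the current commit as version 1.0.0.
git tag -a v1.0.0 -m "First release"

# With a GitHub remote configured, the tag would then be shared with:
#   git push origin v1.0.0

latest_tag="$(git describe --tags --abbrev=0)"
echo "Latest tag: $latest_tag"
```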
+To view your release on Zenodo you need to visit the Upload tab. To + finish the archiving a few more details are needed on Zenodo.
+Figure: Check the new release has been uploaded. Example here shown using the Open Scholarship Strategy
+This is sometimes referred to as DOI ‘minting’, and requires a couple of extra bits of information about the repository on Zenodo. On Zenodo click the Upload tab in the main menu, and your newly uploaded repository should be there. Scroll down the page and fill in the extra information as needed (required fields are marked with a red asterisk), and then click ‘Publish’.
+Note: Only after this extra information has been added will your DOI become live. It may + also take a short time for the DOI to become active. Example DOI shown below (for the Open Scholarship + Strategy again).
+Pro-tip: Copy the URL for the DOI into the README file for your GitHub repo to make cross-linking even easier, and to present a clear, highlighted DOI badge that users can see and use. If you use the ‘concept DOI’ that Zenodo provides alongside the individual release DOIs, you only need to do this once, as it always points to the latest release.
+
The GitHub/Zenodo integration will now assign a DOI to each version/release of a project repository. This + enables users to refer to and cite specific versions of projects. Also, the list of authors for the citation + is automatically determined by the GitHub user account names used by the repository - this means no-one gets + left out. Author details can be edited later on Zenodo. DOIs used in Zenodo are registered through the DataCite service.
+++Pro-tip: Create a ‘human-readable’ version of this citation in your project’s README file. + This will be helpful to researchers who might not be familiar with using DOIs to create citations, and make + it easier for others to cite your software and acknowledge your work. An example of this could be: Jon + Tennant. (2018, July 30). Foundations for Open Scholarship Strategy Development: First formal release + (Version 1.2). Zenodo. http://doi.org/10.5281/zenodo.1323437
+
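+Both pro-tips above can be sketched as a few lines appended to a README. The DOI is the example used in this task, and the badge address follows the pattern Zenodo appears to use for its DOI badges, which is an assumption to check against the badge shown on your own Zenodo record.

```shell
#!/bin/sh
# Sketch only: the DOI is the example from this task; swap in your own.
set -e

repo_dir="$(mktemp -d)"   # stand-in for your local repository folder
cd "$repo_dir"
echo "# My project" > README.md

doi="10.5281/zenodo.1323437"

# Append a DOI badge (a markdown image link) and a plain-text citation.
cat >> README.md <<EOF

[![DOI](https://zenodo.org/badge/DOI/$doi.svg)](https://doi.org/$doi)

## How to cite

Jon Tennant. (2018, July 30). Foundations for Open Scholarship Strategy Development: First formal release (Version 1.2). Zenodo. http://doi.org/$doi
EOF

cat README.md
```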
CONGRATULATIONS!!
+Your GitHub repository is now archived in Zenodo, and with a DOI that can be versioned to reflect updates to + the repository version through time. You should be able to see details of this on the GitHub Zenodo page for + your repository. This also means that your archived projects can get picked up by other indexing services and + search engines that use DOIs too.
+Providing a long-term archive and a DOI for your work is required for others to be able to properly cite it, + as this provides basic citation metadata. For Open Science, it is important to be able to cite the software + that you use in your research, and this integrated workflow enables that to happen, in line with best + practices for research citation. Furthermore, this practice is important in elevating the standard of software + (and related projects) to that of the standard of other research outputs.
+++Pro-tip: Is your research funded by an EU grant? Now you can directly connect your + archived project to your grant by updating the grant section of the metadata on the project’s Zenodo record. + This massively helps to increase its discoverability!
+
So now you have a sustainably archived GitHub repository in Zenodo that is ready to be re-used and cited! + Before continuing, make sure that you have:
+Making your code citable - GitHub Guides. +
+This task is designed for students and researchers who want to implement a system of version control within a standard R-based workflow. This can be applied to a range of software development, data analysis and project management tasks. Your future research self will thank you for the convenience.
+Estimated time to complete: 30 minutes
+Estimated time saving once complete: Virtually infinite
+NOTE A video guide version of this task is now available on YouTube.
+Congratulations on making it this far! If you’re reading this, you’ve survived pull requests, web-hooks, and can probably even tell us what the F in FOSS stands for (not Frustration…). Hopefully, you have overcome any scepticism or reluctance towards the benefits of GitHub and Open Source Software, and are ready to take the next step.
+Before starting this Task, please make sure you have already completed Task + 1 and Task + 2, so that you are more familiar with GitHub and some standard Open Source practices.
+This task will teach you how to integrate the version control software, Git, with the popular coding + environment, RStudio. And yes, it is Git as in gif or God, not Jit as in the wrong way of pronouncing things. +
+If you are one of those researchers who keeps code spread across multiple hard-drives that are waiting to break, Dropbox, Google Drive, or any other non-specialist software, this task is just for you. If you have ever experienced the mind-numbing process of having multiple ‘final’ versions of a paper bouncing between different co-authors, this is also for you.
+All of us are guilty of these sorts of things once in a while, but there are ways to do it that are better + for you, future you, and those who might benefit from your work.
+So, what is Git, and how is it different to GitHub? Git is a version control system, which enables you to + save and track time-stamped copies of your work throughout the development process. It also works with + non-code items too, like this MOOC, the majority of which was written in markdown in RStudio, and integrated + with a Git/GitHub workflow.
+This is important, as all research goes through changes, and sometimes we want to know what those changes were. Did you delete some text that you now think is important? Version control will save that for you. Did your code work perfectly in the past, but is now buggy beyond belief? Version control. It’s a great way to avoid that chaotic state where you have multiple copies of the same file, without needing a stupid and annoying file naming convention. FINAL_Revised_2.2_supervisor_edits_ver1.7_scream.txt will be a thing of the past.
GitHub is the platform that allows you to seamlessly share code from your workspace (e.g., laptop) and host it in an online space. So, it is sort of like the public interface to Git. The advantages of Git/GitHub are:
+While this was primarily designed for source code, it should be instantly obvious how this becomes a + powerful tool for virtually all research workflows.
+RStudio is a popular coding environment for researchers who use the statistical programming language, R. It + comes with a text editor, so you don’t have to install another and switch between. It also includes a + graphical user interface (GUI) to Git and GitHub, which we will be using here.
+Isn’t it nice when brilliant Open Source tools integrate seamlessly like that. This should help to make + your daily use of Git much simpler.
+If at any point you need to install new packages for R, simply use the following command:
+install.packages("PACKAGE NAME", dependencies = TRUE)
Replacing PACKAGE NAME with the, er, package name. Some examples you can play with that might
+ come in useful include knitr, devtools or ggplot2.
+Pro-tip: To update all of your R packages in one go, simply execute the following code:
+update.packages(ask = FALSE, checkBuilt = TRUE)
For now, just choose all the usual default options for each install. The exact steps will differ depending on your operating system (e.g., Mac, Windows, Linux). For the rest of this task, we’re going to stick with doing things the easy-ish Windows way (but also provide some instructions for using the command line).
+For users of Debian-based Linux distributions (e.g., Debian or Ubuntu), simply use the following command to install Git:
+sudo apt-get install git
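Once the install has finished, a quick check from the shell should confirm that Git can be found. This is only a small sketch, and the version number it prints will depend on your system.

```shell
# Check that the git executable is on your path, then print its version.
if command -v git >/dev/null 2>&1; then
  git --version
else
  echo "git not found: check the installation steps above"
fi
```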
For Mac users, use this link, or purchase a new laptop with a different operating system.
+If you want, you can also download the local version of GitHub and use it through the simple GUI. It’s available on Windows, Mac, and Linux, and can make your life a little easier, especially if you want to use a different platform to RStudio.
+++Pro-tip: When installing Git, you will see the option ‘Use Git Bash as shell for Git projects?’. Git Bash is the place where you can use the command line to access Git from outside of RStudio. It’s a powerful beast. Try the following two commands to get started:
+
git config --global user.name 'YOUR USERNAME'
+ git config --global user.email 'YOUR EMAIL'
Right, that’s the easy bit done. Next, go into RStudio, and in the tabs at the top go to Tools > Global Options > Git/SVN. SVN is just another version control system like Git, and we don’t need to worry about that here.
+In the place where it says Git executable, add the path to the git.exe file that you just installed in the previous step. Make sure the box that says Enable version control interface for RStudio projects is ticked. This ties version control to future projects in RStudio, providing a really powerful additional dimension to collaborative or solo work.
+[Figure: The Global Options window inside RStudio]
+Next, hit the button in this window that says Create RSA Key. This creates a pair of keys that is used for authentication between different systems, and saves you from having to type in your password over and over. A new window will pop up with the public key, which you want to copy to your clipboard.
+Head over to GitHub, go to your profile settings, and the SSH and GPG keys tab. Click New SSH + key. Here, paste in the key from RStudio, and call it something imaginative like ‘RStudio’.
+[Figure: Inside GitHub where you will want to enter the key you just generated in RStudio]
+OK, now hold on to your butts, we’re going into the command line. Don’t worry if you’ve never used the shell + before because it’s quite similar to using R, or any other coding system. The main difference here though is + that instead of calling functions like in R, you call commands.
+So back in RStudio, go to Tools > Shell, and it will open up a command prompt window. If + you already played with the Git Bash above, you should have done this step already. Enter the following two + commands:
+git config --global user.name 'YOUR USERNAME'
+ git config --global user.email 'YOUR EMAIL'
Hopefully it goes without saying that you should substitute in your own GitHub username and email here. You can access this at any point just by finding the ‘Shell’ within Windows. Or, if you right click on any folder on your Desktop that is linked to a GitHub repo, you can open up the Shell instantly and Bash away.
+What this stage has done is tell Git, which is software that runs on your desktop, who you are, so that the commits you make can be matched to your account on GitHub, which is a repository website.
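If you want to check that the two settings were stored, you can read them back. The sketch below is an illustration only: the name and email are placeholders, and it points HOME at a throwaway folder (an assumption made purely so the example cannot touch your real settings).

```shell
# Assumption: a throwaway HOME keeps this sketch away from your real Git settings.
# Leave these two lines out if you want to configure Git for real.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

# 'Example User' and 'user@example.org' are placeholders for your own details.
git config --global user.name 'Example User'
git config --global user.email 'user@example.org'

# Read the values back to confirm that Git has stored them.
git config --global --get user.name
git config --global --get user.email
```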
+Restart RStudio. Whew, that was tough. Next.
+OK, hold your breath, we’re going to pause here just to learn some basic Git commands. Some of the key ones you could do with learning are:
+Add: This is where you submit files to the staging area before they are committed.
+Commit: This is like ‘saving’ your work by creating a new version or snapshot.
+Push: This is how you send files from your local project to the online repository.
+Pull: This is how you get files from your online repository to your local project.
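To see how these four commands fit together, here is a minimal sketch that can be run offline. It assumes a local ‘bare’ repository standing in for GitHub, and the folder names, file contents and identity are all made up for the example.

```shell
# Assumption: a local bare repository plays the part of GitHub.
demo="$(mktemp -d)"
git init --quiet --bare "$demo/online.git"
git init --quiet "$demo/project"
cd "$demo/project"
git config user.name 'Example User'        # placeholder identity
git config user.email 'user@example.org'
git remote add origin "$demo/online.git"

echo 'My first project' > README.md
git add README.md                     # Add: stage the file
git commit --quiet -m 'Add a README'  # Commit: record a snapshot
git push --quiet -u origin HEAD       # Push: send the commit to the remote
git pull --quiet                      # Pull: fetch and merge anything new
git log --oneline                     # one line per commit
```

Against a real GitHub repository only the remote URL changes; the order of the commands stays the same.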
+Back in RStudio, type the following into the Terminal, or into a new Shell:
+git add .
It won’t actually do anything for now, but in the future will add all files in your current working directory
+ (that’s what the . does) to staging ready for a commit.
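To watch staging happen, you can compare the output of git status before and after the add. This is a small sketch in a scratch repository; the two file names are hypothetical.

```shell
# Assumption: a scratch repository containing two made-up files.
scratch="$(mktemp -d)"
git init --quiet "$scratch"
cd "$scratch"
echo 'x <- 1' > analysis.R
echo 'Some notes' > notes.txt

git status --short   # '??' marks files that Git is not tracking yet
git add .            # stage everything in the current directory
git status --short   # 'A' marks files staged for the next commit
```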
Now, in Task 1, you should have learned how to build your very first GitHub repository. If you haven’t done + that, we can wait here while you go and do that. If you have already, or have an existing GitHub repository, + we can move on.
+So, you should have a repository on GitHub, complete with a README file, a LICENSE
+ file and some other bits and bobs.
What we are going to do now is integrate that repository with RStudio. Steady now.
+What you just did was tell RStudio to associate a new project in R with a specific repository on GitHub.
+If you still haven’t built your first repository on GitHub, we can do something slightly different here. In
+ RStudio, click New project and then New Directory. Call it what you want and change the
+ directory as needed, make sure to tick Create a git repository, and then click Create
+ Project. This creates an .Rproj file, which you can manage in the usual way through
+ RStudio, including adding README.md and LICENSE.md files as discussed in Task 1.
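For the curious, roughly the same thing can be done from the Shell. This is a sketch under assumptions: ‘my-project’ is a hypothetical name, the folder lives in a temporary location, and RStudio will also create its own project files, which are not shown here.

```shell
# Assumption: 'my-project' is a made-up project name inside a temporary folder.
parent="$(mktemp -d)"
mkdir "$parent/my-project"
cd "$parent/my-project"

git init --quiet             # roughly what ticking 'Create a git repository' does
touch README.md LICENSE.md   # empty placeholders to fill in later
git status --short           # both files are listed as untracked ('??')
```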
Remember that README file we created a while back? Well, it’s time to write it. Thinking back to
+ Task 1, there were some specific things that we said make a good README file. Do you remember
+ what any of them were? Just to refresh your memory, these were:
So, in RStudio, open that file and try adding just a bit of information about your project. If you are doing this for an actual project, try and make it useful. If you are just tinkering for now, you can add what you want.
+Remember that your README file is in markdown (.md) format. For a refresher on some of the
+ simple syntax markdown uses, check this handy cheatsheet.
+[Figure: Screenshot of what this module looks like in markdown, during development. Meta.]
+OK, so now you should have a nicely edited README file. Now we are going to ‘commit’ this to the
+ project using Git. This is basically the equivalent of saving this version of your project, with a record of
+ what changes were made. Successive commits produce a history that can be examined at a later time, allowing
+ you to work with confidence.
There are a few ways of doing this.
+Let’s just stick with the second option for now. This Git pane shows you which files have been changed and + includes buttons for the most important Git commands we saw earlier.
+Select the README file in the Git window, which should show up automatically if you have made
+ any edits to it. This adds that file to the ‘staging’ area, which is sort of like the pre-saving space for
+ your work. Click ‘Commit’ and a new window should pop up.
Here, you have a chance to review your changes, and write a nice commit message. Type in something brief, but + informative about the changes that you have made in this version or snapshot of your work. You want this to be + enough information so that if you or someone else looks back on it, you’ll know why you made this commit and + the changes associated with it. These are like safety nets for your project in case you need to fall back for + some reason.
+++Pro-tip: Here, you will see a list of all the changes you have made since your last commit. Older removed lines are in red, and newly added lines are in green. Double check these to make sure that the edits you have made are the ones you intended to make. This is really helpful for spotting typos, stray edits, and any other little mistakes you might have accidentally introduced. Safety first.
+
Note: If you are colour-blind and can’t see which lines have been added or removed, you can use the line numbers in the two columns on the left of the window as a guide. Here, the number in the first column identifies the older version, and the number in the second column identifies the new version.
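The same review can be done from the Shell without relying on colour at all, because Git marks removed lines with a leading ‘-’ and added lines with a leading ‘+’. A minimal sketch, using a scratch repository and made-up file contents:

```shell
# Assumption: a scratch repository with a placeholder identity and README.
repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"
git config user.name 'Example User'
git config user.email 'user@example.org'
printf 'Title\nOld line\n' > README.md
git add README.md
git commit --quiet -m 'Add README'

printf 'Title\nNew line\n' > README.md   # edit the file after the commit

# Removed lines start with '-', added lines start with '+'.
git diff --no-color
```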
+Now when you click ‘Commit’, another window will pop up, telling you how many files you have changed and the + number of lines within that file you have changed. Close that little window down.
+Click the Push button in the top right of the new window. A new window will pop up now. What this is
+ doing is synchronising the changes in your local repository, here the edited README file, with the
+ online version of the project on GitHub.
To do this from the Shell, use the following command:
+git push -u origin master
Sometimes you will be prompted here to add your username and password from GitHub, which you should do if asked.
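One thing to watch for: GitHub has used main rather than master as the default branch name for new repositories since late 2020, so the branch name in the push command may need adjusting. The sketch below looks the name up instead of hard-coding it. It assumes a local bare repository in place of GitHub and a placeholder identity.

```shell
# Assumption: a local bare repository plays the part of GitHub.
tmp="$(mktemp -d)"
git init --quiet --bare "$tmp/online.git"
git init --quiet "$tmp/local"
cd "$tmp/local"
git config user.name 'Example User'
git config user.email 'user@example.org'
git remote add origin "$tmp/online.git"
echo 'Hello' > README.md
git add README.md
git commit --quiet -m 'Add README'

branch="$(git symbolic-ref --short HEAD)"   # e.g. master or main
git push --quiet -u origin "$branch"        # -u links the local branch to the remote one
git status --short --branch                 # first line shows the linked remote branch
```

After the first push with -u, a plain git push or git pull on that branch should know where to go.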
+Close that window down, and the next one. Go to your project on GitHub, refresh, and check that the
+ README file is still there in all its newly edited glory. You should see the commit message you
+ made next to the file too.
OPTIONAL ADVANCED/AWESOME STEP
+Alright, so you just pushed some content to your first repo, awesome! Now let’s put it into practice for a + real project. Like, the one you are participating in right now. Let’s try this out:
+Go to the repository for this project on GitHub
+Fork the repository to your own GitHub account. The URL for this should be:
+ https://github.com/OpenScienceMOOC/Module-5-Open-Research-Software-and-Open-Source.git
Head into RStudio, go to File > New Project, choose Version Control, select + Git, and then paste the forked repository URL found in your copy of the repository. You now have + your own versioned copy of this whole module. Neat. Save this somewhere on your local machine.
+Now, you need to tell Git that a different version of this project exists. Open up the Shell,
+ and enter the command:
+ git remote add upstream https://github.com/OpenScienceMOOC/Module-5-Open-Research-Software-and-Open-Source
+
What you just did was name the original repository upstream, just to keep things simple for
+ now. Now, create a new branch to document your changes independently of the main
+ branch. Enter the command: git checkout -b proposed-changes master
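If you would like to rehearse these two steps before touching the real project, the sketch below does the same thing with local folders. Everything in it is an assumption for illustration: ‘original’ stands in for the MOOC repository, ‘fork.git’ for your fork on GitHub, and ‘my-copy’ for the project you opened in RStudio.

```shell
# Assumption: local folders stand in for the original project and your fork.
area="$(mktemp -d)"
git init --quiet "$area/original"
cd "$area/original"
git config user.name 'Example User'
git config user.email 'user@example.org'
echo 'Module text' > Task_3.md
git add Task_3.md
git commit --quiet -m 'Add task'
git clone --quiet --bare "$area/original" "$area/fork.git"   # the "fork"
git clone --quiet "$area/fork.git" "$area/my-copy"           # your local copy
cd "$area/my-copy"

git remote add upstream "$area/original"   # name the original project 'upstream'
git remote -v                              # lists both origin and upstream
git checkout --quiet -b proposed-changes   # new branch, starting from the current one
git branch                                 # '*' marks the branch you are on
```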
You just created a new branch called proposed-changes where you can now edit all of the
+ content and files to your heart’s delight. Hopefully, the structure of this project is simple enough for
+ you to navigate around. All of the raw files for the MOOC can be found in the
+ content_development folder, and this is Task_3.md.
If you scroll to the bottom of Task_3.md, you should see a place where you can edit in your
+ name and affiliation. Add these in, and then go through the commit procedure detailed above. If you see
+ anything else that needs editing, feel free to fix that too!
Now, you want to push the changes to your fork on GitHub. Use the following command in your
+ Shell: git push origin proposed-changes
Go back to GitHub and find your fork. Click the little green button, and create a pull request. This is essentially a request for the maintainers to review your changes and integrate them into the original repository for this MOOC project.
+The owners in charge of the MOOC project will now get a notification of this and review it. If it all went okay, they will confirm it, and your name will now appear for all eternity as someone who completed this advanced task.
+Have a cup of tea, coffee, or wine to celebrate!
+CONGRATULATIONS
+You just integrated Git with RStudio, and made your first change to a version controlled project. Your life will now never be the same, and your research workflow will probably be more rapid, agile, and collaborative than ever. Good luck going back to Word.
+The great thing is that this doesn’t have to just be used for code. You can use it for plain text, markdown, + html, and, well, R code. The possibilities are limitless - what you have just learned is a new form of openly + collaborative project management that works for an enormous range of tasks.
+From now on, it is all up to you! Some advice is to:
+Make frequent commits. Treat Git like your puppy, in that it requires constant and special attention. + Just a pat on the head every now and then is enough to keep it satisfied, but it’ll be happiest with + sustained servicing.
+The best way to do this is to make a commit each time you work on a specific problem. For example, + writing a paragraph, running an analysis, or fixing a bug.
+Push often. Don’t let those commits build up, otherwise you run a greater risk of getting into merge conflicts. Seeing as these can be the stuff of nightmares, just make sure to push often.
+Pull often. If others are working remotely on the same project, you will want to stay up to date with + their changes. Make sure to frequently pull in their changes from GitHub to make sure you are all in sync. +
+Experiment and explore! This task really only scratches the surface, and there are many different + functions, tools, and ways this can be used. Really, it is up to you to find out how to use this + information to improve your research workflow, and ultimately collaborate on better, more open and + reliable research!
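The push-often and pull-often advice is easiest to appreciate with two people on one project. Here is a minimal sketch, assuming a local bare repository in place of GitHub and two hypothetical collaborators called Alice and Bob.

```shell
# Assumption: a local bare repository plays GitHub; Alice and Bob are made up.
team="$(mktemp -d)"
git init --quiet --bare "$team/online.git"
git init --quiet "$team/alice"
cd "$team/alice"
git config user.name 'Alice Example'
git config user.email 'alice@example.org'
git remote add origin "$team/online.git"
echo 'Draft 1' > notes.txt
git add notes.txt
git commit --quiet -m 'Start notes'
git push --quiet -u origin HEAD

git clone --quiet "$team/online.git" "$team/bob"   # Bob joins the project

echo 'Draft 2' >> notes.txt                        # Alice keeps working
git commit --quiet -am 'Extend notes'
git push --quiet                                   # ...and pushes straight away

cd "$team/bob"
git pull --quiet --ff-only                         # Bob catches up with Alice
git log --oneline                                  # now shows both commits
```

Because Bob pulled before making any edits of his own, there is nothing to merge by hand, which is the situation the advice above is trying to keep you in.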
+To learn more about issues, branches, merge conflicts, pull requests, and other advanced aspects of using + Git and RStudio, check out this awesome guide by Hadley + Wickham.
+Know a way this content can be improved?
+Time to take your new GitHub skills for a test-run! All content development primarily happens here. + If you have a suggested improvement to the content, layout, or anything else, you can make it and then it will + automatically become part of the MOOC content after verification from a moderator!
+PLEASE NOTE that an audio version of this is available to download via Soundcloud and YouTube.
-Welcome to Module 5 of the Open Science MOOC: Open Research Software and Open Source.
-This module has been developed in the open through collaboration by an international team of Open Source aficionados. Everything you see here has been developed in the open through interactive feedback and collaboration from the wider community. It comprises a series of videos, infographics, text-based reading, and practical tasks for you to sink your teeth into.
-Don’t forget you can join in the discussions over at our open Slack channel. Please do introduce yourself at #module5opensource, and tell us a bit about who you are, your background, and how you ended up here!
-This module is designed primarily for computational researchers at the graduate and undergraduate level, as well as budding data scientists and any other researcher who uses analytical code or software. In a modern-day research environment, this covers pretty much anyone who uses a computer for their work.
---“An article about computational result is advertising, not scholarship. The actual scholarship is the full software environment, code and data, that produced the result.” - J. Buckheit and D. L. Donoho, 1995.
-
Software and technology underpin much of modern research, which is now almost inevitably computational in one way or another - search engines, social networking platforms, analytical software, and digital publishing. With this, there is an ever-increasing demand for more sophisticated Open Source Software, matched by an increasing willingness for researchers to openly collaborate on new tools.
-The power of Open Source is in that it lowers the barriers to collaboration and adoption, therefore allowing ideas and technology to spread more rapidly. This Module will introduce the necessary tools required for transforming software into something that can be openly accessed and re-used by others.
--Image by Patrick Hochstenbach (CC0 1.0 Universal) -
-Learn the characteristics of open software; understand the ethical, legal, economic, and research impact arguments for and against Open Source Software, and further understand the quality requirements of open code.
Be able to turn code made for personal use into open code which is accessible by others.
Use software (tools) that utilizes open content and encourages wider collaboration.
Virtually all modern scientific research workflows rely on a range of software tools, either operating on different datasets, with different parameters, and applied iteratively in various ways (data science) or operating on different inputs and using models and methods to predict some output state (computational science). Open Source Software (OSS) is computer software in which the full source code is available under a specific license that enables other users to access, view, modify, and redistribute that code for any purpose. Because OSS requires such a license, it typically remains free of charge by default. This explicit licensing is also what differentiates OSS from free software. Re-using OSS for analysis, simulation and visualisation for research is also typically easier and more flexible compared to proprietary software. Often, whether we know it or not, we are already using OSS as part of our own research workflows.
-OSS fits into the broader scheme of Open Science as it helps to make the full research environment, including the software that produced the research results, fully accessible and re-usable. As such, it forms a necessary component for the best practices (Jiménez et al., 2018) and repeatability and reproducibility of research (both personally and by others), along with other components, such as sharing data (Stodden, 2010).
-In some cases, sharing of source code can even be a condition for the acceptance of associated research manuscripts (Shamir et al., 2013). It is also generally perceived to increase research impact (Vandewalle, 2012).
-Some of the common advantages for developers include:
-Increased developer loyalty and empowerment;
Lower costs of services and marketing;
Increased branding of services and products;
Production of high quality software at lower expense;
Flexibility and rapid innovation;
Customisation and modular integration;
Increased reliability and independence; and
Based on open standards available to everyone.
As such, the main advantages for researchers (users) include lower costs, increased transparency, increased security and stability, no vendor ‘lock in’ with increased user control, and overall higher quality. Furthermore, sharing OSS allows researchers to receive credit for their efforts, for example through direct software citation (Smith et al., 2016).
-Commonly used OSS include the Mozilla Firefox internet browser and the LibreOffice full office suite. LibreOffice is similar to the popular Microsoft Office, including a word processor, spreadsheet manager, and slide presentation software, but is completely free and Open Source.
-Some regard the OSS movement to represent a counter-movement to neoliberalism and privatisation, through defiance of regulations and norms in the construction and re-use of information, and a potential transformation of modern-day capitalism through making software abundantly available with minimal effort. See The free/open source software movement: Resistance or change? by Panayiota Georgopoulou for more on this topic.
-The Open Source Initiative, one of the pioneers of OSS, offers the following definition:
-Don’t worry, you don’t need to memorise all of this, but it’s good to know the principles that OSS is coming from.
-Free Redistribution: The license shall not restrict any party from selling or giving away the software as a component of an aggregate software distribution containing programs from several different sources. The license shall not require a royalty or other fee for such sale.
Source Code: The program must include source code, and must allow distribution in source code as well as compiled form. Where some form of a product is not distributed with source code, there must be a well-publicized means of obtaining the source code for no more than a reasonable reproduction cost, preferably downloading via the Internet without charge. The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor or translator are not allowed.
Derived Works: The license must allow modifications and derived works, and must allow them to be distributed under the same terms as the license of the original software.
Integrity of The Author’s Source Code: The license may restrict source-code from being distributed in modified form only if the license allows the distribution of “patch files” with the source code for the purpose of modifying the program at build time. The license must explicitly permit distribution of software built from modified source code. The license may require derived works to carry a different name or version number from the original software.
No Discrimination Against Persons or Groups: The license must not discriminate against any person or group of persons.
No Discrimination Against Fields of Endeavour: The license must not restrict anyone from making use of the program in a specific field of endeavour. For example, it may not restrict the program from being used in a business, or from being used for genetic research.
Distribution of License: The rights attached to the program must apply to all to whom the program is redistributed without the need for execution of an additional license by those parties.
License Must Not Be Specific to a Product: The rights attached to the program must not depend on the program’s being part of a particular software distribution. If the program is extracted from that distribution and used or distributed within the terms of the program’s license, all parties to whom the program is redistributed should have the same rights as those that are granted in conjunction with the original software distribution.
License Must Not Restrict Other Software: The license must not place restrictions on other software that is distributed along with the licensed software. For example, the license must not insist that all other programs distributed on the same medium must be open-source software.
License Must Be Technology-Neutral: No provision of the license may be predicated on any individual technology or style of interface.
Now, this all might be a little complex to remember. However, it can be summarised as making software as re-usable as possible for future works, while also being freely available.
-There are a number of existing platforms and tools that support OSS and collaboration. The Open Science Training Handbook provides a check-list to use for evaluating the ‘openness’ of existing research software, based on the Open Source Definition above:
-[ ] Is the software available to download and install?
[ ] Can the software easily be installed on different platforms?
[ ] Does the software have conditions on the use?
[ ] Is the source code available for inspection?
[ ] Is the full history of the source code available for inspection through a publicly available version history?
[ ] Are the dependencies of the software (hardware and software) described properly? Do these dependencies require only a reasonably minimal amount of effort to obtain and use?
Check, check, check, done! Simples.
-There are two main camps within the free software community: The free software movement, and the OSS movement. Both have differing ideologies based on user liberties and the practical applications of software. Often, the term ‘FLOSS’ is used to reconcile these two political camps, and means ‘Free/Libre and Open Source Software’; Libre being French and Spanish for ‘free’ in the context of freedom.
-The core principle of re-use is what separates OSS from ‘Free Software’. Free and Open Source Software (FOSS) is an inclusive term to describe software that can be classified as both free and Open Source. A good example of FOSS is the Ubuntu Linux operating system.
-The big difference between free software and OSS is that the former must distribute updated versions under the same license as the original, whereas newer versions of OSS can be distributed under different licenses. FOSS combines the best of both worlds.
-These definitions have now become widely adopted, both by international governments, as well as some large organisations such as the Mozilla Foundation and the Wikimedia Foundation. Major organisations in the FLOSS space include the UK’s Software Sustainability Institute, who produce valuable resources such as their recent Software Deposit Guidance for Researchers.
-A typical open source project has the following types of formal roles:
-Typically, roles are made public through either the README file, a Contributors file, or a separate team page for the project.
Virtual environments and machines are becoming increasingly popular as high-powered research workflow enablers, and many of these are built upon OSS (e.g., operating systems, programming languages, and data processing frameworks). Popular services include Google Cloud and Amazon Web Services, which also assist with database storage and content delivery, as well as computational power. InsideDNA is a computing platform for reproducible research in bioinformatics, genomics and the life sciences.
-As mentioned above, LibreOffice provides an Open Source alternative to Microsoft Office. The two are almost completely compatible, just with different default file formats. For citation managers, Zotero is the most popular Open Source alternative to proprietary platforms such as Mendeley or EndNote.
-Zotero uses the BibTeX (pronounced ‘bib-tech’) format, based on LaTeX (pronounced ‘lay-tech’), and has browser plugins to make citation management simple. By integrating this with other software such as LibreOffice, it is now possible to have a fully Open Source research workflow in many cases.
--Did you know that this entire project was built as an open and collaborative community effort on GitHub?
-
GitHub is a popular hosting site for both software and non-software content (often called ‘notebooks’), with added capabilities for version control, project management and tracking, and storage services. GitHub is built on top of the OSS Git, which enables users to work remotely to maintain, share, and collaborate on research software and other non-software based projects.
-Version control is essentially a process that takes snapshots of the files in a repository, and tracks modifications to them. It records when the changes were made, what they were, and who made them. If several people are working on one file at once, any overlapping changes are detected, and must be resolved prior to continuing. This provides a much more streamlined and automated process than manually saving and recording changes as projects develop. It also avoids the inevitable lists of confusingly named file versions…
-[Figure: GitHub helps us to avoid, er, sub-optimal file naming conventions (source: XKCD)]
-One of the more popular and useful functions of GitHub is the issue tracker, which is used to organise OSS development. The above link takes you to the issue tracker for the development of this module! If you think there is something here that can be improved, or that you want to comment on, anyone can add or contribute to an issue there!
-Other similar project hosting services include BitBucket, GitLab, and Launchpad. If the recent acquisition of GitHub by Microsoft is a bit off-putting to you, these are great alternatives.
-However, we also know that GitHub can have quite a steep learning curve, which is why the first practical task for this MOOC will teach you how to set up your first GitHub project repository!
-GO TO TASK 1: Building your first GitHub repository
-Especially in scientific research, Open Source Software usage and development has become practically the norm. There are a number of reasons for this beyond those that apply to the general acceptance of OSS by, for example, consumers, industry, or government. Among these reasons are:
-Increasingly, algorithms implemented in analysis software form an integral part of the methods described in scholarly publications. As such, it is completely at odds with rigorous peer review if these algorithm implementations are closed to outsiders.
Scientific collaboration more often than not spans multiple institutions and distributed research networks where secrecy and command hierarchy is not maintained in a way that is ‘necessary’ for closed source development.
Many computational analyses are run in virtualized environments (such as institutional, national, or international ‘cloud’ infrastructures) and hosted on multi-user servers. Closed-source, commercial software often disallows such usage.
OSS development often relies on volunteers. In a time of budgetary constraints for scientific research, this is a clear advantage.
For these and other reasons, Open Source tools are very commonly used in scientific research. This includes usage in fields where many researchers are amateur developers themselves and rely on tools such as R for statistical analysis and scripting, which, in the last decade, has almost completely displaced commercial software for statistical analysis such as SPSS or JMP in a lot of fields. In fields such as bioinformatics, that involve a lot of file handling of the outputs of DNA sequencing platforms, general purpose scripting languages such as Python and commonly used libraries built on top of it (such as biopython) have become a vital part of the toolkit of many researchers.
-[Figure: Python]
-Tools such as R and Python are essentially software for writing software. Although programming is an increasingly common activity among researchers, of course not every scientist does this. One step away from programming is the chaining together of the inputs and outputs of various analysis tools in longer workflows. As an example from genomics, a very common workflow is to start out with high-throughput sequencing reads and then i) do basic quality control checks; ii) map the reads against a reference genome; iii) identify the points where the new data are at variance with the reference. These steps are routinely executed as a workflow where a different Open Source executable is run in a Linux command-line environment for each of the three steps. Although this is arguably not quite open source software development, it does involve the usage and production of open source artifacts (such as Linux shell scripts) for which the principles that we discuss in this module are applicable.
-[Figure: R]
-Lastly, OSS is also used in scientific research for reasons that more closely mirror those that drive the adoption of OSS in wider society, namely that it is cheap. For example, individuals or organizations might decide to switch from Microsoft Office to LibreOffice for manuscript writing or spreadsheet processing because the latter is free (both as in ‘free beer’ and ‘free speech’). Likewise, the choice to switch from ArcGIS to QGIS for the analysis of geographic information might be prompted simply by cost considerations.
I’m using X (e.g., Matlab, STATA, Excel) and I want to transition to something more open. What are the next steps?
-Even if you are using proprietary software, you can usually still share your source code/documents etc. The best first step is sharing whatever you can.
-Great! I can put them in my new GitHub repo.
-If that’s enough for you for now, great! If not, there are Open Source equivalents for most pieces of proprietary software. Have a go with one and see what you think.
-| Closed | -Open | -
|---|---|
| Matlab | -Python, Julia | -
| STATA/SPSS | -R | -
| MS Office | -LibreOffice | -
| Mathematica | -JupyterLab | -
| Test out your new Pull Request -PR- Skills … | -… by adding your own example here | -
Cool! But if I make the switch, will I be stuck taking ages to learn a new tool, without support, or with buggy software?
-Good question! The answer is it depends. The best thing to do is find someone who’s made the switch before and learn from their experience. Or just do a Google search! Some OSS is much better than its closed counterparts, some isn’t, so it’s worth choosing carefully.
-The most likely person who might want to re-use your software in the future is…you! So while sharing is always better than not sharing, you can make your own life, and that of others, much easier through appropriate documentation. Documentation can include several things, such as including helpful comments and annotations in the code that help to explain why a particular action was performed, rather than what it is intended to achieve.
-One of the most critical aspects of this is including an informative README file, which accompanies almost every OSS project; sometimes a project has more than one. It can be good practice to include one such file in every directory, containing a list of files, a table of contents, and the purpose of the directory. The README file is typically just plain text or Markdown (such as all of the ones for this MOOC!), and can include critical information on how to install and run the software, its dependencies and requirements, as well as tutorials or examples.
--Did you know… The term
-README is sometimes playfully ascribed to the famous scene in Lewis Carroll’s Alice’s Adventures In Wonderland in which Alice confronts magic munchies labeled with “Eat Me” and “Drink Me”. Potent.
The purpose here is to provide sufficient information to maximise the re-use and reproducibility of the computational environment, such that someone with no experience with the project can easily access and re-use the software (Sandve et al., 2013). By lowering the barriers to entry, you increase the chances of others being able to re-use your work, which is one of the ultimate goals of OSS (Ince et al., 2012).
-An extension of this that can help to make things even easier for future re-use is ‘container’ technology. Containers are like an ecosystem frozen in time, where the code, the data, and any other dependencies are all preserved, packaged and saved in their present functioning versions. This means that anyone can come in at any point in the future and run the analyses again. As such, they are generally good for re-use, but this can come at the sacrifice of modification or understanding by others, as often a lot of details can be hidden within the source code and its dependencies. Common examples of container implementation in research include Rocker (a Docker container for the R language), Binder, and Code Ocean.
-Sustainable software is good software.
-The 10 simple rules for making computational research more reproducible, based on Sandve et al., (2013), are:
-
-
-
-Infographic adapted from Sandve et al., (2013). Feel free to download this and print it out to keep handy during your research! -
-If you follow these steps, along with the processes in Task 1 and Task 2, you should be fine!
-An Open Source license is a type of license designed specifically for software and code that make it explicit what the legal conditions for sharing and re-use are. As mentioned above, the addition of a suitable license is what differentiates publicly shared software from OSS. For example, the widely used MATLAB is proprietary software, and Octave is an openly licensed alternative programming language.
-There are currently more than 1,400 unique Open Source licenses, a complexity born from the difficulty in understanding the differences between the legal implications across different licenses.
-Some of the more common licenses include:
-You don’t need to know all the legal nitty-gritty behind all of these, but it is good to at least know what options are available to you.
-There are two ways in which contributions to a project become licensed:
-Thankfully, the process of selecting an Open Source license is relatively trivial, thanks to user-friendly tools such as Choose A License. Each of these licenses allows other users to use, copy, distribute, and build upon your work, often while ensuring that the creators are appropriately recognised for their work. Here, the key is selecting an appropriate license for your work, depending on what you want, or do not want, others to do with it.
-Citations provide one of the most important interactions in scholarly research, forming the basis of our referencing and metrics systems. Typically, this is performed thanks to the assistance of a permanent unique identifier such as a Digital Object Identifier (DOI). A DOI is a persistent identifier, implemented in the Handle System, that follows a common standard and is used, among other purposes, for identifying academic information. Such identification is critical for tracking the genealogy and provenance of research, for reproducibility, as well as for giving appropriate credit to those who have created the software. Importantly, software should be considered a legitimate output from scholarly research, and citation is becoming an increasingly common way to indicate that.
-Smith et al. (2016) wrote a paper about the principles of software citation as part of the FORCE11 Software Citation Working Group. In the same way that you would cite software that you have used as part of good research practice, it is important to make your own work easily citable too. When citing any software used for your own research, you should include at minimum:
-The six principles of software citation by Smith et al., (2016) are provided here:
-Importance: Software should be considered a legitimate and citable product of research. Software citations should be accorded the same importance in the scholarly record as citations of other research products, such as publications and data; they should be included in the metadata of the citing work, for example in the reference list of a journal article, and should not be omitted or separated. Software should be cited on the same basis as any other research product such as a paper or a book, that is, authors should cite the appropriate set of software products just as they cite the appropriate set of papers.
Credit and attribution: Software citations should facilitate giving scholarly credit and normative, legal attribution to all contributors to the software, recognizing that a single style or mechanism of attribution may not be applicable to all software.
Unique identification: A software citation should include a method for identification that is machine actionable, globally unique, interoperable, and recognized by at least a community of the corresponding domain experts, and preferably by general public researchers.
Persistence: Unique identifiers and metadata describing the software and its disposition should persist - even beyond the lifespan of the software they describe.
Accessibility: Software citations should facilitate access to the software itself and to its associated metadata, documentation, data, and other materials necessary for both humans and machines to make informed use of the referenced software.
Specificity: Software citations should facilitate identification of, and access to, the specific version of software that was used. Software identification should be as specific as necessary, such as using version numbers, revision numbers, or variants such as platforms.
Note: For instructions on ‘how to make your software citable’ see the section Using GitHub and Zenodo below and Task 2: Linking GitHub and Zenodo.
-GitHub is a popular tool for project management, content storage, and version control. Note that GitHub itself is not OSS. However, Git, the tool which it is based on, is. Git is designed to help manage the source code files, and the updates to them, for a software-related project. However, it can also be extended to other non-software projects; for example, this MOOC!
-However, getting research onto GitHub is just the first step. It is equally important to make it persistent and re-usable, which is why having a Digital Object Identifier (DOI) associated with it can be useful. The simplest way to do this is through a service called Zenodo, which is a free and open source multi-disciplinary repository created by OpenAIRE and CERN, and can be used to assign a DOI to individual GitHub repositories. There is a GitHub Guide that explains the details, which involve linking GitHub repositories directly through to Zenodo so that when developers create formal releases for their software, Zenodo archives that version of the software.
-There’s nothing special about using Zenodo for creating DOIs, other than that it’s free of cost; other general repositories can also be used, such as DataCite DOI Fabrica, or your own institutional repositories such as Caltech’s.
-A lot of researchers might typically be afraid of sharing code which is incomplete, buggy, or imperfect. However, in the OSS community, such a practice of sharing ‘raw’ code is fairly commonplace. Sharing code openly enables others to re-use and improve it, as well as to engage in a deeper way with any research associated with it. This is one of the fundamental aspects of peer-collaboration, perhaps best exemplified by the traditional process of research manuscript peer review.
-Task 2 will guide you through the process of linking a GitHub repository to Zenodo for archiving.
---Did you know… All content produced for this MOOC is available as part of a community in Zenodo?
-
GO TO TASK 2: Linking GitHub and Zenodo
-Often, OSS is developed in a public, decentralised, collaborative manner between multiple contributors. The purpose of this is to enhance the diversity and scope of a project and its design, in order to become more beneficial and sustainable. Such an approach was famously likened to a ‘bazaar’ model by Eric Raymond, an early OSS proponent. One of the major guiding principles of this is that of peer production, which relies on self-organised communities to regulate the development of content, co-ordinated towards a shared goal or outcome.
-OSS projects rely heavily on volunteer collaboration, which often entails a constant flux of newcomers in order to become productive and sustainable (Steinmacher et al., 2014). Creating the right social atmosphere for a project, and a welcoming engagement environment, are often critical to successful collaborations in OSS.
-Hopefully now you have come to see the importance of software as a cornerstone of modern science, and the importance that OSS plays in this.
-The learning outcomes from this should be:
-You will now be able to define the characteristics of OSS, and some of the ethical, legal, economic and research impact arguments for and against it.
Based on community standards, you will now be able to describe the quality requirements of sharing and re-using open code.
You will now be able to use a range of research tools that utilise OSS.
You will now be able to transform code designed for your own personal use into code that is accessible and re-usable by others.
Software developers will be able to make their software citable, and software users will know how to cite the software they use.
BONUS TASK
-If you have completed Task 1 and Task 2, we also have a BONUS TASK for you, if you want to take your skills a step further. Task 3 will take you a step deeper into integrating Git into a typical research workflow by showing you how to integrate it with R Studio. It is recommended that you have completed the first 2 tasks before proceeding with this one.
-However, your Open Source journey does not stop here! This was just the beginning, and there are some incredible resources out there if you would like to do or learn more:
-If you feel particularly inspired by this, you can endorse the Science Code Manifesto, which is based on the five principles of code, copyright, citation, credit, and curation.
To launch and develop your own project, the Open Source Guides program offers a range of practical guides and skills to help launch and advance your OSS projects.
For a detailed look at OSS-based research workflows, the Open Science, Open Data, Open Source hand-guide by Pedro L. Fernandes and Rutger A. Vos is one of the top resources online.
More formalised journal venues also exist for software-based articles, including The Journal of Open Research Software and The Journal of Open Source Software. A list of such venues is also available.
The PLOS Open Source Toolkit provides a global forum for Open Source hardware and software research and applications.
NumFOCUS is a nonprofit organization that supports and promotes world-class, innovative, open source scientific software. Some of the projects it sponsors include:
-IPython and Jupyter Notebook initiatives.
rOpenSci, which promotes the open source R statistical environment for transparent and reproducible research.
To gain more hands on experience with OSS, the Software Carpentry community holds regular workshops to improve lab-based computing skills (Wilson et al., 2017).
These references are just the beginning. They include some of the most useful general overviews of the Open Source landscape in research. However, if you want to find something more specific to your own research field, then that path is there for you to explore!
-The Future of Research in Free/Open Source Software Development (Scacchi, 2010).
The Scientific Method in Practice: Reproducibility in the Computational Sciences (Stodden, 2010).
The case for open computer programs (Ince et al., 2012).
Current issues and research trends on open-source software communities (Martinez-Torres and Diaz-Fernandez, 2013).
Ten simple rules for reproducible computational research (Sandve et al., 2013).
A systematic literature review on the barriers faced by newcomers to open source software projects (Steinmacher et al., 2014).
Knowledge sharing in open source software communities: motivations and management (Iskoujina and Roberts, 2015).
Software citation principles (Smith et al., 2016).
An introduction to Rocker: Docker containers for R (Boettiger and Eddelbuettel, 2017).
Good enough practices in scientific computing (Wilson et al., 2017).
Four simple recommendations to encourage best practices in research software (Jiménez et al., 2017).
Know a way this content can be improved?
-Time to take your new GitHub skills for a test-run! All content development primarily happens here. If you have a suggested improvement to the content, layout, or anything else, you can make it and then it will automatically become part of the MOOC content after verification from a moderator!
- -PLEASE NOTE that an audio version of this is available to download via Soundcloud + and YouTube.
+Welcome to Module 5 of the Open Science MOOC: Open Research Software and Open + Source.
+This module has been developed in the open + through collaboration by an international team of Open + Source aficionados. Everything you see here has been developed in the open through interactive feedback + and collaboration from the wider community. It comprises a series of videos, infographics, text-based reading, + and practical tasks for you to sink your teeth into.
+Don’t forget you can join in the discussions over at our open Slack channel. Please do introduce + yourself at #module5opensource, and tell us a bit about who you are, your background, and how you ended up + here!
+This module is designed primarily for computational researchers at the graduate and undergraduate level, as + well as budding data scientists and any other researcher who uses analytical code or software. In a modern + day research environment, this covers pretty much anyone who uses a computer for their work.
+++“An article about computational result is advertising, not scholarship. The actual scholarship is the + full software environment, code and data, that produced the result.” - J. Buckheit and D. L. Donoho, 1995. +
+
Software and technology underpin much of modern research, which is now almost inevitably computational in + one way or another - search engines, social networking platforms, analytical software, and digital + publishing. With this, there is an ever-increasing demand for more sophisticated Open Source Software, + matched by an increasing willingness for researchers to openly collaborate on new tools.
+The power of Open Source is in that it lowers the barriers to collaboration and adoption, therefore + allowing ideas and technology to spread more rapidly. This Module will introduce the necessary tools + required for transforming software into something that can be openly accessed and re-used by others.
++ Image by Patrick Hochstenbach (CC0 1.0 Universal) +
+Learn the characteristics of open software; understand the ethical, legal, economic, and + research impact arguments for and against Open Source Software, and further understand the + quality requirements of open code.
+Be able to turn code made for personal use into open code which is accessible by others.
+Use software (tools) that utilizes open content and encourages wider collaboration.
+Virtually all modern scientific research workflows rely on a range of software tools, either operating on + different datasets, with different parameters, and applied iteratively in various ways (data science) or + operating on different inputs and using models and methods to predict some output state (computational + science). Open Source Software (OSS) is computer software in which the full source code is available under a + specific license that enables other users to access, view, modify, and redistribute that code for any purpose. + Because OSS requires such a license, it typically remains free of charge by default. This explicit licensing + is also what differentiates OSS from free software. Re-using OSS for analysis, simulation and visualisation + for research is also typically easier and more flexible compared to proprietary software. Often, whether we + know it or not, we are already using OSS as part of our own research workflows.
+OSS fits into the broader scheme of Open Science as it helps to make the full research environment, including + the software that produced the research results, fully accessible and re-usable. As such, it forms a necessary + component for the best practices (Jiménez + et al., 2018) and repeatability and reproducibility of research (both personally and by others), along + with other components, such as sharing data (Stodden, + 2010).
+In some cases, sharing of source code can even be conditional for the acceptance of associated research + manuscripts (Shamir + et al., 2013). It is also generally perceived to increase research impact (Vandwalle, + 2012).
+Some of the common advantages for developers include:
+Increased developer loyalty and empowerment;
+Lower costs of services and marketing;
+Increased branding of services and products;
+Production of high quality software at lower expense;
+Flexibility and rapid innovation;
+Customisation and modular integration;
+Increased reliability and independence; and
+Based on open standards available to everyone.
+As such, the main advantages for researchers (users) include lower costs, increased + transparency, increased security and stability, no vendor ‘lock in’ with + increased user control, and overall higher quality. Furthermore, sharing OSS + allows researchers to receive credit for their efforts, for example through direct software citation (Smith + et al., 2016).
+Commonly used OSS include the Mozilla Firefox internet + browser and the LibreOffice full office suite. LibreOffice is + similar to the popular Microsoft Office, including a word processor, spreadsheet manager, and slide + presentation software, but is completely free and Open Source.
+Some regard the OSS movement to represent a counter-movement to neoliberalism and privatisation, through + defiance of regulations and norms in the construction and re-use of information, and a potential + transformation of modern-day capitalism through making software abundantly available with minimal effort. See + The free/open source software movement: Resistance or + change? by Panayiota Georgopoulou for more on this topic.
+The Open Source Initiative, one of the pioneers of OSS, offers the + following definition:
+Don’t worry, you don’t need to memorise all of this, but it’s good to know the principles that OSS is + coming from.
+Free Redistribution: The license shall not restrict any party from selling or giving + away the software as a component of an aggregate software distribution containing programs from several + different sources. The license shall not require a royalty or other fee for such sale.
+Source Code: The program must include source code, and must allow distribution in source + code as well as compiled form. Where some form of a product is not distributed with source code, there + must be a well-publicized means of obtaining the source code for no more than a reasonable reproduction + cost preferably, downloading via the Internet without charge. The source code must be the preferred form + in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. + Intermediate forms such as the output of a preprocessor or translator are not allowed.
+Derived Works: The license must allow modifications and derived works, and must allow + them to be distributed under the same terms as the license of the original software.
+Integrity of The Author’s Source Code: The license may restrict source-code from being + distributed in modified form only if the license allows the distribution of “patch files” with the source + code for the purpose of modifying the program at build time. The license must explicitly permit + distribution of software built from modified source code. The license may require derived works to carry a + different name or version number from the original software.
+No Discrimination Against Persons or Groups: The license must not discriminate against + any person or group of persons.
+No Discrimination Against Fields of Endeavour: The license must not restrict anyone from + making use of the program in a specific field of endeavour. For example, it may not restrict the program + from being used in a business, or from being used for genetic research.
+Distribution of License: The rights attached to the program must apply to all to whom + the program is redistributed without the need for execution of an additional license by those parties.
+License Must Not Be Specific to a Product: The rights attached to the program must not + depend on the program’s being part of a particular software distribution. If the program is extracted from + that distribution and used or distributed within the terms of the program’s license, all parties to whom + the program is redistributed should have the same rights as those that are granted in conjunction with the + original software distribution.
+License Must Not Restrict Other Software: The license must not place restrictions on + other software that is distributed along with the licensed software. For example, the license must not + insist that all other programs distributed on the same medium must be open-source software.
+License Must Be Technology-Neutral: No provision of the license may be predicated on any + individual technology or style of interface.
+Now, this all might be a little complex to remember. However, it can be summarised as making software as + re-usable as possible for future works, while also being freely available.
+There are a number of existing platforms and tools that support OSS and collaboration. The Open Science Training Handbook provides a + check-list to use for evaluating the ‘openness’ of existing research software, based on the Open Source + Definition above:
+[ ] Is the software available to download and install?
+[ ] Can the software easily be installed on different platforms?
+[ ] Does the software have conditions on the use?
+[ ] Is the source code available for inspection?
+[ ] Is the full history of the source code available for inspection through a publicly available version + history?
+[ ] Are the dependencies of the software (hardware and software) described properly? Do these + dependencies require only a reasonably minimal amount of effort to obtain and use?
+Check, check, check, done! Simples.
+There are two main camps within the free software community: The free software movement, and + the OSS movement. Both have differing ideologies based on user liberties and the practical + applications of software. Often, the term ‘FLOSS’ is used to reconcile these two political camps, and means + ‘Free/Libre and Open Source Software’; Libre being French and Spanish for ‘free’ in the context of freedom. +
+The core principle of re-use is what separates OSS from ‘Free Software’. Free and Open Source Software (FOSS) + is an inclusive term to describe software that can be classified as both free and Open Source. A good example + of FOSS is the Ubuntu Linux operating system.
+The big difference between free software and OSS is that the former must distribute updated versions under + the same license as the original, whereas newer versions of OSS can be distributed under different licenses. + FOSS combines the best of both worlds.
+These definitions have now become widely adopted, both by international governments, as well as some large + organisations such as the Mozilla Foundation and the + Wikimedia Foundation. Major organisations in the FLOSS + space include the UK’s Software Sustainability Institute, who + produce valuable resources such as their recent Software Deposit Guidance for + Researchers.
+A typical open source project has the following types of formal roles:
+Typically, roles are made public through either the README file, a Contributors file, or a
+ separate team page for the project.
Virtual environments and machines are becoming increasingly popular as high-powered research workflow + enablers, and many of these are built upon OSS (e.g., operating systems, programming languages, and data + processing frameworks). Popular services include Google Cloud + and Amazon Web Services, which also assist with database storage and + content delivery, as well as computational power. InsideDNA is a computing + platform for reproducible research in bioinformatics, genomics and the life sciences.
+As mentioned above, LibreOffice provides an Open Source alternative to Microsoft + Office. The two are almost completely compatible, just with different default file formats. For citation + managers, Zotero is the most popular Open Source alternative to + proprietary platforms such as Mendeley or EndNote.
+Zotero can import and export the BibTeX (pronounced ‘bib-tech’) format, which is used with LaTeX + (pronounced ‘lay-tech’), and has browser plugins to make citation management simple. By integrating this with + other software such as LibreOffice, it is now possible to have a fully Open Source research workflow in many + cases.
+++Did you know that this entire project was built as an open and collaborative community effort on GitHub?
+
GitHub is a popular hosting site for both software and non-software + content (often called ‘notebooks’), with added capabilities for version control, project management and + tracking, and storage services. GitHub is built on top of the OSS Git, + which enables users to work remotely to maintain, share, and collaborate on research software and other + non-software based projects.
+Version control is essentially a process that takes snapshots of the files in a repository, and tracks + modifications to them. It records when the changes were made, what they were, and who made them. If several + people are working on one file at once, any overlapping changes are detected, and must be resolved prior to + continuing. This provides a much more streamlined and automated process than manually saving and recording + changes as projects develop. It also avoids the inevitable lists of confusingly named file versions…
+
+
+
+ GitHub helps us to avoid, er, sub-optimal file naming conventions (source: XKCD) +
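The snapshot idea behind version control can be illustrated with a minimal Python sketch. This is not how Git itself is implemented; the function names (`snapshot`, `diff_snapshots`, `demo`) and the example files are hypothetical and chosen only for illustration. The sketch records a hash of every file, then compares two snapshots to report what was added, removed, or modified. It does not record who made a change or when, which a real version control system would.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(directory: Path) -> dict[str, str]:
    """Map each file's relative path to a SHA-256 hash of its contents."""
    return {
        path.relative_to(directory).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(directory.rglob("*"))
        if path.is_file()
    }


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Report which files were added, removed, or modified between two snapshots."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "modified": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }


def demo() -> dict[str, list[str]]:
    """Create a throwaway project, change it, and compare the two snapshots."""
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)
        (project / "analysis.py").write_text("print('v1')\n")
        (project / "notes.txt").write_text("draft\n")
        first = snapshot(project)

        # Simulate a round of work: edit one file, delete one, add one.
        (project / "analysis.py").write_text("print('v2')\n")
        (project / "notes.txt").unlink()
        (project / "README.md").write_text("# My project\n")
        second = snapshot(project)

        return diff_snapshots(first, second)


if __name__ == "__main__":
    print(demo())
```

Because every change is detected from file contents rather than file names, there is no need for names like `analysis_final_v2_REALLY_final.py`.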
+One of the more popular and useful functions of GitHub is the issue + tracker, which is used to organise OSS development. The above link takes you to the issue tracker for + the development of this module! If you think there is something here that can be improved, or that you want to + comment on, you can add or contribute to an issue there!
+Other similar project hosting services include BitBucket, GitLab, and Launchpad. If the + recent acquisition of GitHub by Microsoft is a bit off-putting to you, these are great alternatives.
+However, we also know that GitHub can have quite a steep learning curve, which is why the first practical + task for this MOOC will teach you how to set up your first GitHub project repository!
+GO + TO TASK 1: Building your first GitHub repository
+Especially in scientific research, Open Source Software usage and development has become practically the + norm. There are a number of reasons for this beyond those that apply to the general acceptance of OSS by, for + example, consumers, industry, or government. Among these reasons are:
+Increasingly, algorithms implemented in analysis software form an integral part of the methods described + in scholarly publications. As such, it is completely at odds with rigorous peer review if these algorithm + implementations are closed to outsiders.
+Scientific collaboration more often than not spans multiple institutions and distributed research + networks where secrecy and command hierarchy is not maintained in a way that is ‘necessary’ for closed + source development.
+Many computational analyses are run in virtualized environments (such as institutional, national, or + international ‘cloud’ infrastructures) and hosted on multi-user servers. Closed-source, commercial + software often disallows such usage.
+OSS development often relies on volunteers. In a time of budgetary constraints for scientific research, + this is a clear advantage.
+For these and other reasons, Open Source tools are very commonly used in scientific research. This includes + usage in fields where many researchers are amateur developers themselves and rely on tools such as R for statistical analysis and scripting, which, in the last decade, + has almost completely displaced commercial software for statistical analysis such as SPSS or JMP in a lot of + fields. In fields such as bioinformatics, that involve a lot of file handling of the outputs of DNA sequencing + platforms, general purpose scripting languages such as Python and + commonly used libraries built on top of it (such as biopython) have become + a vital part of the toolkit of many researchers.
+
+
+
+ Python +
+Tools such as R and Python are essentially software for writing software. Although programming is an + increasingly common activity among researchers, of course not every scientist does this. One step + away from programming is the chaining together of the inputs and outputs of various analysis tools in longer + workflows. As an example from genomics, a very common workflow is to start out with high-throughput sequencing + reads and then i) do basic quality control checks; ii) map the reads against a reference genome; iii) identify + the points where the new data are at variance with the reference. These steps are routinely executed as a + workflow where a different Open Source executable is run in a Linux command-line environment for each of the + three steps. Although this is arguably not quite open source software development, it does involve the usage + and production of open source artifacts (such as Linux shell scripts) for which the principles that we discuss + in this module are applicable.
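To make the idea of chaining steps concrete, here is a toy Python sketch of the three-step workflow described above. It is an illustration under strong simplifying assumptions, not a real genomics pipeline: real workflows call dedicated Open Source tools for each step, whereas every function name, parameter, and sequence here is hypothetical. Reads are plain strings, "mapping" is a brute-force search for the best-matching position, and a "variant" is any base that differs from the reference.

```python
def quality_control(reads: list[str], min_length: int = 4) -> list[str]:
    """Step i: keep reads that are long enough and contain only A, C, G, T."""
    return [r for r in reads if len(r) >= min_length and set(r) <= set("ACGT")]


def map_reads(
    reads: list[str], reference: str, max_mismatches: int = 1
) -> list[tuple[int, str]]:
    """Step ii: place each read at the reference offset with the fewest mismatches."""
    placements = []
    for read in reads:
        best = None  # (mismatch count, offset) of the best placement so far
        for offset in range(len(reference) - len(read) + 1):
            window = reference[offset:offset + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if best is None or mismatches < best[0]:
                best = (mismatches, offset)
        # Discard reads that do not fit anywhere well enough.
        if best is not None and best[0] <= max_mismatches:
            placements.append((best[1], read))
    return placements


def call_variants(
    placements: list[tuple[int, str]], reference: str
) -> list[tuple[int, str, str]]:
    """Step iii: list (position, reference_base, read_base) wherever a read differs."""
    variants = set()
    for offset, read in placements:
        for i, base in enumerate(read):
            if base != reference[offset + i]:
                variants.add((offset + i, reference[offset + i], base))
    return sorted(variants)


def run_workflow(reads: list[str], reference: str) -> list[tuple[int, str, str]]:
    """Chain the three steps: the output of each step is the input of the next."""
    return call_variants(map_reads(quality_control(reads), reference), reference)


if __name__ == "__main__":
    reference = "ACGTTAGCCA"
    reads = ["ACGTTA", "TAGACA", "NNNN", "AC"]
    print(run_workflow(reads, reference))
```

The point of the sketch is the structure rather than the biology: each step has a clear input and output, so any single step could be swapped for a better tool without touching the others.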
+
+
+
+ R +
+Lastly, OSS is also used in scientific research for reasons that more closely mirror those that drive the
+ adoption of OSS in wider society, namely that it is cheap. For example, individuals or organizations might
+ decide to switch from Microsoft Office to LibreOffice for manuscript writing or spreadsheet processing because
+ the latter is free (both as in ‘free
+ beer’ and ‘free speech’). Likewise, the choice to switch from ArcGIS to QGIS for the analysis of geographic information might be prompted
+ simply by cost considerations.
I’m using X [e.g. Matlab, STATA, Excel] and I want to transition to something more open. What are the + next steps?
+Even if you are using proprietary software, you can usually still share your source code/documents etc. + The best first step is sharing whatever you can.
+Great! I can put them in my new GitHub repo.
+If that’s enough for you for now, great! If not, for most pieces of proprietary software there are Open Source + equivalents. Have a go with one and see what you think.
+| Closed | +Open | +
|---|---|
| Matlab | +Python, Julia | +
| STATA/SPSS | +R | +
| MS Office | +LibreOffice | +
| Mathematica | +JupyterLab | +
| Test out your new Pull Request + -PR- Skills … | +… by adding your own example here + | +
Cool! But if I make the switch will I be stuck: taking ages to learn a new tool/ without support + /with buggy software.
+Good question! The answer is it depends. The best thing to do is find someone who’s made the switch before + and learn from their experience. Or just do a Google search! Some OSS is much better than their closed + counterparts, some aren’t, so it’s worth choosing carefully.
+The most likely person who might want to re-use your software in the future is…you! So while sharing is + always better than not sharing, you can make your own life, and that of others, much easier through + appropriate documentation. Documentation can include several things, such as including helpful comments and + annotations in the code that help to explain why a particular action was performed, rather than what it is + intended to achieve.
+One of the most critical aspects of this is including an informative README file, that
+ accompanies almost every OSS project, and some times even more than one. It can be a good practice to include
+ one such file in every directory, that includes a list of files, a table of contents, and what the purpose of
+ the directory is. The README file is typically just plain text or markdown (again, such as all of
+ the ones for the MOOC!), and can include critical information for how to install and run software, previous
+ dependencies and requirements, as well as tutorials or examples.
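+<p>One possible way to get started with a per-directory <code>README</code> is to generate a skeleton that lists the files present and leaves room for a description. The sketch below is a minimal example under that assumption; the function name <code>build_readme</code> and the layout of the generated text are hypothetical choices, not a standard.</p>

```python
from pathlib import Path


def build_readme(directory, purpose):
    """Return README text that states a purpose and lists the files in `directory`."""
    directory = Path(directory)
    lines = [f"# {directory.resolve().name}", "", purpose, "", "## Files", ""]
    # List regular files only, sorted so the output is stable between runs.
    for path in sorted(p for p in directory.iterdir() if p.is_file()):
        if path.name != "README.md":
            lines.append(f"- `{path.name}`")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the skeleton for the current directory; saving it is left to the user.
    print(build_readme(".", "Describe the purpose of this directory here."))
```

+<p>The generated file list is only a starting point: installation steps, requirements, and examples still need to be written by hand.</p>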
++Did you know… The term
+READMEis some times playfully ascribed to the famous + scene in Lewis Carroll’s Alice’s Adventures In Wonderland in which Alice confronts magic munchies labeled + with “Eat Me”" and “Drink Me”. Potent.
The purpose here is to provide sufficient information to maximise the re-use and reproducibility of the + computational environment, such that someone with no experience with the project can easily access and re-use + the software (Sandve + et al., 2013). By lowering the barriers to entry, you increase the chances of others being able to + re-use your work, which is one of the ultimate goals of OSS (Ince + et al., 2012).
+An extension of this that can help to make things even easier for future re-use is ‘container’ technology. + Containers are like an ecosystem frozen in time, where the code, the data, any other dependencies, are all + perfectly preserved, packaged and saved in the present functioning versions. This means that anyone in the + future any one can come in and run the analyses again. As such, they are generally good for re-use, but this + can come at the sacrifice of modification or understanding by others, as often a lot of details can be hidden + within the source code and its dependencies. Common examples of container implementation in research include + Rocker + (a Docker container for the R language), Binder, and + Code Ocean.
+Sustainable software is good software.
+The 10 simple rules for making computational research more reproducible, based on Sandve + et al., (2013), are:
+
+
+
+ Infographic adapted from Sandve et al., (2013). Feel free to download this and print it out to keep handy + during your research! +
+If you follow these steps, along with the processes in Task + 1 and Task + 2, you should be fine!
+An Open Source license is a type of license designed specifically for software and code that make it explicit + what the legal conditions for sharing and re-use are. As mentioned above, the addition + of a suitable license is what differentiates publicly shared software from OSS. For example, the widely used + MATLAB is proprietary software, and Octave is an openly licensed alternative programming + language.
+There are currently more than 1,400 unique Open Source licenses, a complexity born from the difficulty in + understanding the differences between the legal implications across different license.
+Some of the more common licenses include:
+You don’t need to know all the legal itty gritty behind all of these, but it is good to at least know what + options are avaiilable to you.
+There are two ways in which contributions to a project become licensed:
+Thankfully, the process of selecting an Open Source license is relatively trivial, thanks to user-friendly + tools such as Choose A License. Each of these licenses allows other + users to use, copy, distribute, and build upon your work, often while ensuring that the creators are + appropriately recognised for their work. Here, the key is selecting an appropriate license for your work, + depending on what you want, or do not want, others to do with it.
+Citations provide one of the most important interactions in scholarly research, forming the basis of our + referencing and metrics systems. Typically, this is performed thanks to the assistance of a permanent unique + identifier such as a Digital Object + Identifiers (DOI). A DOI is a persistent identifier, implemented in the Handle System, that meets a common standard, + depending on the purpose, such as for identifying academic information. Such identification is critical for + tracking the genealogy and provenance of research, for reproducibility, as well as for giving appropriate + credit to those who have created the software. Importantly, software should be considered a legitimate output + from scholarly research, and citation is becoming an increasingly common way to indicate that.
+In 2016, Smith + et al., 2016 wrote a research paper about the principles of software citation as part of the FORCE11 + Software Citation Working Group. In the same way that you would want to cite software that you have used as + part of good research practices, it is important to make your research easily citable too. When citing any + software used for your own research, you should include at minimum:
+The six principles of software citation by Smith + et al., (2016) are provided here:
+Importance: Software should be considered a legitimate and citable product of research. + Software citations should be accorded the same importance in the scholarly record as citations of other + research products, such as publications and data; they should be included in the metadata of the citing + work, for example in the reference list of a journal article, and should not be omitted or separated. + Software should be cited on the same basis as any other research product such as a paper or a book, that + is, authors should cite the appropriate set of software products just as they cite the appropriate set of + papers.
+Credit and attribution: Software citations should facilitate giving scholarly credit and + normative, legal attribution to all contributors to the software, recognizing that a single style or + mechanism of attribution may not be applicable to all software.
+Unique identification: A software citation should include a method for identification + that is machine actionable, globally unique, interoperable, and recognized by at least a community of the + corresponding domain experts, and preferably by general public researchers.
+Persistence: Unique identifiers and metadata describing the software and its disposition + should persist - even beyond the lifespan of the software they describe.
+Accessibility: Software citations should facilitate access to the software itself and to + its associated metadata, documentation, data, and other materials necessary for both humans and machines + to make informed use of the referenced software.
+Specificity: Software citations should facilitate identification of, and access to, the + specific version of software that was used. Software identification should be as specific as necessary, + such as using version numbers, revision numbers, or variants such as platforms.
+Note: For instructions on ‘how to make your software citable’ see the section Using GitHub and Zenodo below and Task + 2: Linking GitHub and Zenodo.
+GitHub is a popular tool for project management, content storage, and version control. + Note that GitHub itself is not OSS. However, Git, the tool which it is based on, is. Git is designed to help + manage the source code files, and the updates to them, for a software-related project. However, it can also be + extended to other non-software projects; for example, this MOOC!
+However, getting research onto GitHub is just the first step. It is equally important to make it persistent + and re-usable, which is why having a Digital Object Identifier (DOI) associated with it can be useful. The + simplest way to do this is through a service called Zenodo, which is a free + and open source multi-disciplinary repository created by OpenAIRE and CERN, and can be used to assign a DOI to + individual GitHub repositories. There is a GitHub + Guide that explains the details, which involve linking GitHub repositories directly through to Zenodo so + that when developers create formal releases for their software, Zenodo creates and archives a that version of + the software.
+There’s nothing special about using Zenodo for creating DOIs, other than its free of cost; + other general repositories can also be used, such as DataCite DOI + Fabrica, or your own institutional repositories such as Caltech’s. +
+A lot of researchers might typically be afraid of sharing code which is incomplete, buggy, or imperfect. + However, in the OSS community, such a practice of sharing ‘raw’ code is fairly commonplace. Sharing code + openly enables others to re-use and improve it, as well as to engage in a deeper way with any research + associated with it. This is one of the fundamental aspects of peer-collaboration, perhaps best exemplified by + the traditional process of research manuscript peer review.
+Task 2 will guide you through the process of linking a GitHub repository to Zenodo for archiving.
+++Did you know… All content produced for this MOOC is available as part of a community in Zenodo?
+
GO + TO TASK 2: Linking GitHub and Zenodo
+Often, OSS is developed in a public, decentralised, collaborative manner between multiple contributors. The + purpose of this is to enhance the diversity and scope of a project and its design, in order to become more + beneficial and sustainable. Such an approach was famously likened to a ‘bazaar’ model by Eric Raymond, an + early OSS proponent. One of the major guiding principles of this is that of peer production, + which relies on self-organised communities to regulate the development of content, co-ordinated towards a + shared goal or outcome.
+OSS projects rely heavily on volunteer collaboration, which often entails a constant flux of newcomers in + order to become productive and sustainable (Steinmacher + et al., 2014). Creating the right social atmosphere for a project, and a welcoming engagement + environment, are often critical to successful collaboraitons in OSS.
+Hopefully now you have come to see the importance of software as a cornerstone of modern science, and the + importance that OSS plays in this.
+The learning outcomes from this should be:
+You will now be able to define the characteristics of OSS, and some of the ethical, legal, economic and + research impact arguments for and against it.
+Based on community standards, you will now be able to describe the quality requirements of sharing and + re-using open code.
+You will now be able to use a range of research tools that utilise OSS.
+You will now be able to transform code designed for their personal use into code that is accessible and + re-usable by others.
+Software developers will be able to make their software citable, and software users will know how to cite + the software they use.
+BONUS TASK
+If you have completed Task + 1 and Task + 2, we also have a BONUS TASK for you, if you want to take your skills a step further. + Task + 3 will take you a step deeper into integrating Git into a typical research workflow by showing you how + to integrate it with R Studio. It is recommended that you have completed the first 2 tasks before proceeding + with this one.
+However, your Open Source journey does not stop here! This was just the beginning, and there are some + incredible resources out there if you would like to do or learn more:
+If you feel particularly inspired by this, you can endorse the Science Code Manifesto, which is based on the five + principles of code, copyright, citation, credit, and curation.
+To launch and develop your own project, the Open Source Guides + program offers a range of practical guides and skills to help launch and advance your OSS projects.
+For a detailed look at OSS-based research workflows, the Open Science, Open Data, Open Source hand-guide by + Pedro L. Fernandes and Rutger A. Vos is one of the top resources online.
+More formalised journal venues also exist for software-based articles, including The Journal of Open Research Software and The Journal of Open Source Software. A list of such venues is also available.
+The PLOS Open Source Toolkit provides a + global forum for Open Source hardware and software research and applications.
+The NumFOCUS is a nonprofit organization that supports and promotes + world-class, innovative, open source scientific software. Some of the projects they sponsor include:
+IPython and Jupyter Notebook + initiatives.
+rOpenSci, which promotes the open source R statistical environment + for transparent and reproducible research.
+To gain more hands on experience with OSS, the Software + Carpentry community holds regular workshops to improve lab-based computing skills (Wilson + et al., 2017).
+These references here are just the beginning. They include some of the most useful general overviews of + the Open Source landscape in research. However, if you want to be find something more specific to your own + research field, then that path is there for you to explore!
+The Future of Research in Free/Open Source Software Development (Scacchi, + 2010).
+The Scientific Method in Practice: Reproducibility in the Computational Sciences (Stodden, + 2010).
+The case for open computer programs (Ince + et al., 2012).
+Current issues and research trends on open-source software communities (Martinez-Torres + and Diaz-Fernandez, 2013).
+Ten simple rules for reproducible computational research (Sandve + et al., 2013).
+A systematic literature review on the barriers faced by newcomers to open source software projects (Steinmacher + et al., 2014).
+Knowledge sharing in open source software communities: motivations and management (Iskoujina + and Roberts, 2015).
+Software citation principles (Smith + et al., 2016).
+An introduction to Rocker: Docker containers for R (Boettiger + and Eddelbuettel, 2017).
+Good enough practices in scientific computing (Wilson + et al., 2017).
+Four simple recommendations to encourage best practices in research software (Jiménez + et al., 2017).
+Know a way this content can be improved?
+Time to take your new GitHub skills for a test-run! All content development primarily happens here. + If you have a suggested improvement to the content, layout, or anything else, you can make it and then it + will automatically become part of the MOOC content after verification from a moderator!
+ +