diff --git a/MAIN.html b/MAIN.html new file mode 100644 index 0000000..100c8dd --- /dev/null +++ b/MAIN.html @@ -0,0 +1,9568 @@ + + + + + + + + + + + + + + MAIN.utf8.md + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + +
+

Module 5: Open Research Software and Open Source

+ +
+

Introduction

+

Welcome to Module 5 of the Open Science MOOC: Open Research Software and Open + Source.

+

This module has been developed in the open + through collaboration by an international team of Open + Source aficionados. Everything you see here has been developed in the open through interactive feedback + and collaboration from the wider community. It comprises a series of videos, infographics, text-based reading, + and practical tasks for you to sink your teeth into.

+

Don’t forget you can join in the discussions over at our open Slack channel. Please do introduce + yourself at #module5opensource, and tell us a bit about who you are, your background, and how you ended up + here!

+
+

Who is this module for?

+

This module is designed primarily for computational researchers at the graduate and undergraduate level, as + well as budding data scientists and any other researcher who uses analytical code or software. In a modern + day research environment, this covers pretty much anyone who uses a computer for their work.

+
+

“An article about computational result is advertising, not scholarship. The actual scholarship is the + full software environment, code and data, that produced the result.” - J. Buckheit and D. L. Donoho, 1995. +

+
+

Software and technology underpin much of modern research, which is now almost inevitably computational in + one way or another - search engines, social networking platforms, analytical software, and digital + publishing. With this, there is an ever-increasing demand for more sophisticated Open Source Software, + matched by an increasing willingness for researchers to openly collaborate on new tools.

+

The power of Open Source lies in lowering the barriers to collaboration and adoption, + allowing ideas and technology to spread more rapidly. This Module will introduce the tools + required for transforming software into something that can be openly accessed and re-used by others.

+

+

+ Image by Patrick Hochstenbach (CC0 1.0 Universal) +

+


+
+
+

Specific learning objectives for this Module:

+
    +
  1. Learn the characteristics of open software; understand the ethical, legal, economic, and + research impact arguments for and against Open Source Software, and further understand the + quality requirements of open code.
  2. Be able to turn code made for personal use into open code which is accessible by others.
  3. Use software (tools) that utilizes open content and encourages wider collaboration.
+


+
+
+
+

What is Open Source Software

+

Virtually all modern scientific research workflows rely on a range of software tools, either operating on + different datasets, with different parameters, and applied iteratively in various ways (data science) or + operating on different inputs and using models and methods to predict some output state (computational + science). Open Source Software (OSS) is computer software in which the full source code is available under a + specific license that enables other users to access, view, modify, and redistribute that code for any purpose. + Because OSS requires such a license, it typically remains free of charge by default. This explicit licensing + is also what differentiates OSS from free software. Re-using OSS for analysis, simulation and visualisation + for research is also typically easier and more flexible compared to proprietary software. Often, whether we + know it or not, we are already using OSS as part of our own research workflows.

+

OSS fits into the broader scheme of Open Science as it helps to make the full research environment, including + the software that produced the research results, fully accessible and re-usable. As such, it forms a necessary + component for the best practices (Jiménez + et al., 2018) and repeatability and reproducibility of research (both personally and by others), along + with other components, such as sharing data (Stodden, + 2010).

+

In some cases, sharing of source code can even be a condition for the acceptance of associated research + manuscripts (Shamir + et al., 2013). It is also generally perceived to increase research impact (Vandewalle, + 2012).

+

Some of the common advantages for developers include:

+
    +
  • +

    Increased developer loyalty and empowerment;

    +
  • +
  • +

    Lower costs of services and marketing;

    +
  • +
  • +

    Increased branding of services and products;

    +
  • +
  • +

    Production of high quality software at lower expense;

    +
  • +
  • +

    Flexibility and rapid innovation;

    +
  • +
  • +

    Customisation and modular integration;

    +
  • +
  • +

    Increased reliability and independence; and

    +
  • +
  • +

    Based on open standards available to everyone.

    +
  • +
+

As such, the main advantages for researchers (users) include lower costs, increased + transparency, increased security and stability, no vendor ‘lock in’ with + increased user control, and overall higher quality. Furthermore, sharing OSS + allows researchers to receive credit for their efforts, for example through direct software citation (Smith + et al., 2016).

+

Commonly used OSS include the Mozilla Firefox internet + browser and the LibreOffice full office suite. LibreOffice is + similar to the popular Microsoft Office, including a word processor, spreadsheet manager, and slide + presentation software, but is completely free and Open Source.

+

Some regard the OSS movement to represent a counter-movement to neoliberalism and privatisation, through + defiance of regulations and norms in the construction and re-use of information, and a potential + transformation of modern-day capitalism through making software abundantly available with minimal effort. See + The free/open source software movement: Resistance or + change? by Panayiota Georgopoulou for more on this topic.

+


+
+
+

Principles of Open Source Software

+

The Open Source Initiative, one of the pioneers of OSS, offers the + following definition:

+

Don’t worry, you don’t need to memorise all of this, but it’s good to know the principles that OSS is + coming from.

+
    +
  • +

    Free Redistribution: The license shall not restrict any party from selling or giving + away the software as a component of an aggregate software distribution containing programs from several + different sources. The license shall not require a royalty or other fee for such sale.

    +
  • +
  • +

    Source Code: The program must include source code, and must allow distribution in source + code as well as compiled form. Where some form of a product is not distributed with source code, there + must be a well-publicized means of obtaining the source code for no more than a reasonable reproduction + cost, preferably downloading via the Internet without charge. The source code must be the preferred form + in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. + Intermediate forms such as the output of a preprocessor or translator are not allowed.

    +
  • +
  • +

    Derived Works: The license must allow modifications and derived works, and must allow + them to be distributed under the same terms as the license of the original software.

    +
  • +
  • +

    Integrity of The Author’s Source Code: The license may restrict source-code from being + distributed in modified form only if the license allows the distribution of “patch files” with the source + code for the purpose of modifying the program at build time. The license must explicitly permit + distribution of software built from modified source code. The license may require derived works to carry a + different name or version number from the original software.

    +
  • +
  • +

    No Discrimination Against Persons or Groups: The license must not discriminate against + any person or group of persons.

    +
  • +
  • +

    No Discrimination Against Fields of Endeavour: The license must not restrict anyone from + making use of the program in a specific field of endeavour. For example, it may not restrict the program + from being used in a business, or from being used for genetic research.

    +
  • +
  • +

    Distribution of License: The rights attached to the program must apply to all to whom + the program is redistributed without the need for execution of an additional license by those parties.

    +
  • +
  • +

    License Must Not Be Specific to a Product: The rights attached to the program must not + depend on the program’s being part of a particular software distribution. If the program is extracted from + that distribution and used or distributed within the terms of the program’s license, all parties to whom + the program is redistributed should have the same rights as those that are granted in conjunction with the + original software distribution.

    +
  • +
  • +

    License Must Not Restrict Other Software: The license must not place restrictions on + other software that is distributed along with the licensed software. For example, the license must not + insist that all other programs distributed on the same medium must be open-source software.

    +
  • +
  • +

    License Must Be Technology-Neutral: No provision of the license may be predicated on any + individual technology or style of interface.

    +
  • +
+

Now, this all might be a little complex to remember. However, it can be summarised as making software as + re-usable as possible for future works, while also being freely available.

+


+
+
+

An Open Source checklist

+

There are a number of existing platforms and tools that support OSS and collaboration. The Open Science Training Handbook provides a + check-list to use for evaluating the ‘openness’ of existing research software, based on the Open Source + Definition above:

+
    +
  • +

    [ ] Is the software available to download and install?

    +
  • +
  • +

    [ ] Can the software easily be installed on different platforms?

    +
  • +
  • +

    [ ] Does the software have conditions on the use?

    +
  • +
  • +

    [ ] Is the source code available for inspection?

    +
  • +
  • +

    [ ] Is the full history of the source code available for inspection through a publicly available version + history?

    +
  • +
  • +

    [ ] Are the dependencies of the software (hardware and software) described properly? Do these + dependencies require only a reasonably minimal amount of effort to obtain and use?

    +
  • +
+
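Some items on this checklist can be checked roughly by a script. The following is a minimal sketch in Python, assuming the project is available as a local folder. The function name `openness_report` and the file names it looks for (README, LICENSE, a `.git` folder, common dependency files) are conventions chosen here for illustration and are not taken from the Handbook; a project can be perfectly open without matching them exactly.

```python
from pathlib import Path


def openness_report(project_dir):
    """Return a dict of rough, automatable checks for a local project folder.

    The file names below are common conventions and are assumptions made
    for this sketch, not requirements of the checklist itself.
    """
    root = Path(project_dir)
    names = {p.name.lower() for p in root.iterdir()}

    def has_prefix(prefix):
        # Matches e.g. README, README.md, LICENSE.txt
        return any(name.startswith(prefix) for name in names)

    return {
        "has_readme": has_prefix("readme"),
        "has_license": has_prefix("license") or has_prefix("licence") or has_prefix("copying"),
        # A .git folder suggests a version history exists (not that it is public).
        "has_version_history": (root / ".git").is_dir(),
        "declares_dependencies": bool(
            names & {"requirements.txt", "environment.yml", "pyproject.toml", "description"}
        ),
    }


if __name__ == "__main__":
    for check, passed in openness_report(".").items():
        print(f"[{'x' if passed else ' '}] {check}")
```

A script like this only covers the mechanical parts of the list. Whether the software installs easily on different platforms, or what conditions the license places on use, still needs a human to judge.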

Check, check, check, done! Simples.

+


+
+
+

The Open Source community and its governance

+

There are two main camps within the free software community: The free software movement, and + the OSS movement. Both have differing ideologies based on user liberties and the practical + applications of software. Often, the term ‘FLOSS’ is used to reconcile these two political camps, and means + ‘Free/Libre and Open Source Software’; Libre being French and Spanish for ‘free’ in the context of freedom. +

+

The core principle of re-use is what separates OSS from ‘Free Software’. Free and Open Source Software (FOSS) + is an inclusive term to describe software that can be classified as both free and Open Source. A good example + of FOSS is the Ubuntu Linux operating system.

+

The big difference between free software and OSS is that the former must distribute updated versions under + the same license as the original, whereas newer versions of OSS can be distributed under different licenses. + FOSS combines the best of both worlds.

+

These definitions have now become widely adopted, both by international governments, as well as some large + organisations such as the Mozilla Foundation and the + Wikimedia Foundation. Major organisations in the FLOSS + space include the UK’s Software Sustainability Institute, who + produce valuable resources such as their recent Software Deposit Guidance for + Researchers.

+
+

For individual projects

+

A typical open source project has the following types of formal roles:

+
    +
  • Author: The person(s) who created the project
  • +
  • Owner: The person(s) who have administrative ownership over the organization or + repository
  • +
  • Maintainers: Contributors who are responsible for driving the vision and managing the + organizational aspects of the project. (They may also be authors or owners of the project.)
  • +
  • Contributors: Everyone who has contributed to the project.
  • +
  • Community Members: People who use the project. They might be active in conversations, + create new issues or express their opinion on the future project improvements.
  • +
+

Typically, roles are made public through either the README file, a Contributors file, or a + separate team page for the project.

+


+
+
+
+

Existing platforms and tools for Open Source Software

+

Virtual environments and machines are becoming increasingly popular as high-powered research workflow + enablers, and many of these are built upon OSS (e.g., operating systems, programming languages, and data + processing frameworks). Popular services include Google Cloud + and Amazon Web Services, which also assist with database storage and + content delivery, as well as computational power. InsideDNA is a computing + platform for reproducible research in bioinformatics, genomics and the life sciences.

+

As mentioned above, LibreOffice provides an Open Source alternative to Microsoft + Office. The two are almost completely compatible, just with different default file formats. For citation + managers, Zotero is the most popular Open Source alternative to + proprietary platforms such as Mendeley or EndNote.

+

Zotero can export references in the BibTeX (pronounced ‘bib-tech’) format used with LaTeX + (pronounced ‘lay-tech’), and has browser plugins to make citation management simple. By integrating this with + other software such as LibreOffice, it is now possible to have a fully Open Source research workflow in many + cases.

+
+

GitHub

+
+

Did you know that this entire project was built as an open and collaborative community effort in GitHub?

+
+

GitHub is a popular hosting site for both software and non-software + content (often called ‘notebooks’), with added capabilities for version control, project management and + tracking, and storage services. GitHub is built on top of the OSS Git, + which enables users to work remotely to maintain, share, and collaborate on research software and other + non-software based projects.

+

Version control is essentially a process that takes snapshots of the files in a repository, and tracks + modifications to them. It records when the changes were made, what they were, and who made them. If several + people are working on one file at once, any overlapping changes are detected, and must be resolved prior to + continuing. This provides a much more streamlined and automated process than manually saving and recording + changes as projects develop. It also avoids the inevitable lists of confusingly named file versions…

+

+ +

+

+ GitHub helps us to avoid, er, sub-optimal file naming conventions (source: XKCD) +
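To make the idea of snapshots more concrete, here is a toy sketch in Python of what a version control system records: what changed, when, and who changed it. This is not how Git is implemented; the class `ToyRepository` and its fields are hypothetical names invented for this illustration.

```python
import hashlib
from datetime import datetime, timezone


class ToyRepository:
    """A toy, in-memory illustration of version-control snapshots (not Git)."""

    def __init__(self):
        self.history = []  # list of snapshots, oldest first

    def commit(self, files, author, message):
        """Record a snapshot of `files` (a dict of file name -> text content)."""
        previous = self.history[-1]["files"] if self.history else {}
        # A file counts as changed if it was added, removed, or edited.
        changed = sorted(
            name for name in set(files) | set(previous)
            if files.get(name) != previous.get(name)
        )
        snapshot = {
            "id": hashlib.sha1(repr(sorted(files.items())).encode()).hexdigest()[:8],
            "author": author,                    # who made the change
            "when": datetime.now(timezone.utc),  # when it was made
            "message": message,
            "changed": changed,                  # what was changed
            "files": dict(files),
        }
        self.history.append(snapshot)
        return snapshot


repo = ToyRepository()
repo.commit({"analysis.py": "print(1)"}, "alice", "First version")
second = repo.commit(
    {"analysis.py": "print(2)", "README": "How to run"},
    "bob",
    "Fix output, add README",
)
print(second["changed"])  # ['README', 'analysis.py']
```

Because every snapshot is kept in `history`, there is never a need for files named `final_v2_really_final`: the older versions stay available alongside the record of who changed what.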

+


+

One of the more popular and useful functions of GitHub is the issue + tracker, which is used to organise OSS development. The above link takes you to the issue tracker for + the development of this module! If you think there is something here that can be improved, or that you want to + comment on, anyone can add or contribute to an issue there!

+

Other similar project hosting services include BitBucket, GitLab, and Launchpad. If the + recent acquisition of GitHub by Microsoft is a bit off-putting to you, these are great alternatives.

+

However, we also know that GitHub can have quite a steep learning curve, which is why the first practical + task for this MOOC will teach you how to set up your first GitHub project repository!

+

GO + TO TASK 1: Building your first GitHub repository

+


+
+
+
+

Open Source Software used in research

+

Especially in scientific research, Open Source Software usage and development has become practically the + norm. There are a number of reasons for this beyond those that apply to the general acceptance of OSS by, for + example, consumers, industry, or government. Among these reasons are:

+
    +
  • +

    Increasingly, algorithms implemented in analysis software form an integral part of the methods described + in scholarly publications. As such, it is completely at odds with rigorous peer review if these algorithm + implementations are closed to outsiders.

    +
  • +
  • +

    Scientific collaboration more often than not spans multiple institutions and distributed research + networks where secrecy and command hierarchy are not maintained in a way that is ‘necessary’ for closed + source development.

    +
  • +
  • +

    Many computational analyses are run in virtualized environments (such as institutional, national, or + international ‘cloud’ infrastructures) and hosted on multi-user servers. Closed-source, commercial + software often disallows such usage.

    +
  • +
  • +

    OSS development often relies on volunteers. In a time of budgetary constraints for scientific research, + this is a clear advantage.

    +
  • +
+

For these and other reasons, Open Source tools are very commonly used in scientific research. This includes + usage in fields where many researchers are amateur developers themselves and rely on tools such as R for statistical analysis and scripting, which, in the last decade, + has almost completely displaced commercial software for statistical analysis such as SPSS or JMP in a lot of + fields. In fields such as bioinformatics, that involve a lot of file handling of the outputs of DNA sequencing + platforms, general purpose scripting languages such as Python and + commonly used libraries built on top of it (such as biopython) have become + a vital part of the toolkit of many researchers.

+

+ +

+

+ Python +

+


+

Tools such as R and Python are essentially software for writing software. Although programming is an + increasingly common activity among researchers, of course not every scientist does this. One step + away from programming is the chaining together of the inputs and outputs of various analysis tools in longer + workflows. As an example from genomics, a very common workflow is to start out with high-throughput sequencing + reads and then i) do basic quality control checks; ii) map the reads against a reference genome; iii) identify + the points where the new data are at variance with the reference. These steps are routinely executed as a + workflow where a different Open Source executable is run in a Linux command-line environment for each of the + three steps. Although this is arguably not quite open source software development, it does involve the usage + and production of open source artifacts (such as Linux shell scripts) for which the principles that we discuss + in this module are applicable.
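The three-step genomics workflow described above can be sketched as a chain of small functions, each consuming the output of the previous one. This is a deliberately simplified toy in Python: real workflows call dedicated tools for each step, and the function names, the quality threshold, and the tiny made-up sequences below are all assumptions for illustration only.

```python
def quality_filter(reads, min_quality=20):
    """Step i: keep reads whose mean base quality reaches the threshold."""
    return [
        (seq, quals) for seq, quals in reads
        if sum(quals) / len(quals) >= min_quality
    ]


def map_reads(reads, reference):
    """Step ii: place each read at the reference offset with fewest mismatches.

    Assumes every read is no longer than the reference.
    """
    placed = []
    for seq, _ in reads:
        offsets = range(len(reference) - len(seq) + 1)
        best = min(
            offsets,
            key=lambda i: sum(a != b for a, b in zip(seq, reference[i:i + len(seq)])),
        )
        placed.append((best, seq))
    return placed


def call_variants(placed, reference):
    """Step iii: report positions where a read differs from the reference."""
    variants = set()
    for offset, seq in placed:
        for i, base in enumerate(seq):
            if base != reference[offset + i]:
                variants.add((offset + i, reference[offset + i], base))
    return sorted(variants)


reference = "ACGTACGGTCA"
reads = [
    ("ACGTAC", [30, 32, 31, 30, 29, 33]),  # matches the reference
    ("CGGACA", [35, 30, 31, 34, 30, 32]),  # one mismatch
    ("TTTTTT", [5, 4, 6, 3, 5, 4]),        # low quality, filtered out
]

# Chain the three steps: the output of each is the input of the next.
variants = call_variants(map_reads(quality_filter(reads), reference), reference)
print(variants)  # [(8, 'T', 'A')]
```

Each tuple in the result gives a zero-based position, the reference base, and the observed base. Keeping each step as a separate function mirrors the way separate executables are chained in a shell script, and makes each step easy to swap out or test on its own.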

+

+ +

+

+ R +

+


+

Lastly, OSS is also used in scientific research for reasons that more closely mirror those that drive the + adoption of OSS in wider society, namely that it is cheap. For example, individuals or organizations might + decide to switch from Microsoft Office to LibreOffice for manuscript writing or spreadsheet processing because + the latter is free (both as in ‘free + beer’ and ‘free speech’). Likewise, the choice to switch from ArcGIS to QGIS for the analysis of geographic information might be prompted + simply by cost considerations.

+
+
+

Getting Started with OSS - FAQ

+

I’m using X (e.g. Matlab, STATA, Excel) and I want to transition to something more open. What are the + next steps?

+

Even if you are using proprietary software, you can usually still share your source code/documents etc. + The best first step is sharing whatever you can.

+

Great! I can put them in my new GitHub repo.

+

If that’s enough for you for now, great! If not, for most pieces of proprietary software there are Open Source + equivalents. Have a go with one and see what you think.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Closed | Open
Matlab | Python, Julia
STATA/SPSS | R
MS Office | LibreOffice
Mathematica | JupyterLab
Test out your new Pull Request (PR) skills … | … by adding your own example here
+

Cool! But if I make the switch, will I be stuck taking ages to learn a new tool, without support, + or with buggy software?

+

Good question! The answer is: it depends. The best thing to do is find someone who’s made the switch before + and learn from their experience. Or just do a Google search! Some OSS is much better than its closed + counterparts and some is not, so it’s worth choosing carefully.

+
+
+

Making good software for re-use

+

The most likely person who might want to re-use your software in the future is…you! So while sharing is + always better than not sharing, you can make your own life, and that of others, much easier through + appropriate documentation. Documentation can include several things, such as helpful comments and + annotations in the code that explain why a particular action was performed, rather than merely restating + what the code does.
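As a small illustration of a comment that explains the reasoning behind a step, consider the sketch below. The scenario (a data logger that writes -999 for missing readings) and the function name are made up for this example.

```python
def mean_temperature(readings_celsius):
    """Return the mean of valid temperature readings in degrees Celsius."""
    # Why: the (hypothetical) logger writes -999 when a sensor drops out.
    # Averaging those placeholders in would drag the mean far below any
    # real temperature, so they are excluded rather than treated as data.
    valid = [t for t in readings_celsius if t != -999]
    return sum(valid) / len(valid)


print(mean_temperature([20.0, 22.0, -999, 21.0]))  # 21.0
```

A comment such as "filter the list" would only repeat what the code already says. The comment above tells a future reader (possibly you) why the filter exists and what would go wrong without it.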

+

One of the most critical aspects of this is including an informative README file, that + accompanies almost every OSS project, and sometimes even more than one. It can be a good practice to include + one such file in every directory, that includes a list of files, a table of contents, and what the purpose of + the directory is. The README file is typically just plain text or markdown (again, such as all of + the ones for the MOOC!), and can include critical information for how to install and run software, previous + dependencies and requirements, as well as tutorials or examples.
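If writing a README for every directory sounds tedious, a starting point can be generated automatically and then filled in by hand. The following is a minimal sketch in Python; the function name `readme_skeleton` and the section headings it produces are choices made for this illustration, not a standard.

```python
from pathlib import Path


def readme_skeleton(directory, purpose):
    """Build plain-text README content listing the files in `directory`."""
    root = Path(directory)
    # Mark sub-directories with a trailing slash so they stand out in the list.
    entries = sorted(p.name + ("/" if p.is_dir() else "") for p in root.iterdir())
    lines = [
        f"README for {root.name}",
        "",
        f"Purpose: {purpose}",
        "",
        "Contents:",
    ]
    lines += [f"  - {name}: (describe this item)" for name in entries]
    lines += ["", "How to install and run: (fill in)", "Dependencies: (fill in)"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a skeleton for the current folder; redirect it to a file to keep it.
    print(readme_skeleton(".", "Scripts for the analysis"))
```

The generated text is only a scaffold. The valuable part of a README is the human-written description of what each file is for and how to run the software.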

+
+

Did you know… The term README is sometimes playfully ascribed to the famous + scene in Lewis Carroll’s Alice’s Adventures In Wonderland in which Alice confronts magic munchies labeled + with “Eat Me” and “Drink Me”. Potent.

+
+

The purpose here is to provide sufficient information to maximise the re-use and reproducibility of the + computational environment, such that someone with no experience with the project can easily access and re-use + the software (Sandve + et al., 2013). By lowering the barriers to entry, you increase the chances of others being able to + re-use your work, which is one of the ultimate goals of OSS (Ince + et al., 2012).

+

An extension of this that can help to make things even easier for future re-use is ‘container’ technology. + Containers are like an ecosystem frozen in time, where the code, the data, and any other dependencies are all + perfectly preserved, packaged and saved in the present functioning versions. This means that anyone can come + in at any point in the future and run the analyses again. As such, they are generally good for re-use, but this + can come at the sacrifice of modification or understanding by others, as often a lot of details can be hidden + within the source code and its dependencies. Common examples of container implementation in research include + Rocker + (a Docker container for the R language), Binder, and + Code Ocean.

+

Sustainable software is good software.

+


+
+
+

10 simple rules for reproducible computational research

+

The 10 simple rules for making computational research more reproducible, based on Sandve + et al., (2013), are:

+
    +
  1. For every result, keep track of how it was produced.
  2. Avoid manual data manipulation steps.
  3. Archive the exact versions of all external programs used.
  4. Version control all custom scripts.
  5. Record all intermediate results, when possible in standardised formats.
  6. For analyses that include randomness, note underlying random seeds.
  7. Always store raw data behind plots.
  8. Generate hierarchical analysis output, allowing layers of increasing detail to be inspected.
  9. Connect textual statements to underlying results.
  10. Provide public access to scripts, runs, and results.
+
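Several of these rules can be followed with only a few lines of code. The sketch below, in Python, illustrates the rule on noting random seeds together with the rules on keeping track of how a result was produced and which program versions were used. The function names, the seed value, and the fields of the record are assumptions chosen for illustration; they are not prescribed by Sandve et al.

```python
import json
import platform
import random
import sys


def run_analysis(seed):
    """A stand-in analysis with randomness: the mean of 1000 random draws."""
    rng = random.Random(seed)  # local generator, seeded explicitly
    draws = [rng.random() for _ in range(1000)]
    return sum(draws) / len(draws)


def provenance_record(seed, result):
    """Collect the details needed to trace how `result` was produced."""
    return {
        "seed": seed,
        "result": result,
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "command": sys.argv,
    }


seed = 2013  # arbitrary value chosen for this sketch
first = run_analysis(seed)
second = run_analysis(seed)
assert first == second  # same seed, same result

# Saving this record next to the result documents how it was produced.
print(json.dumps(provenance_record(seed, first), indent=2))
```

Using a dedicated `random.Random(seed)` object, rather than the module-level generator, keeps the analysis reproducible even if other code draws random numbers in between.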

+ +

+

+ Infographic adapted from Sandve et al., (2013). Feel free to download this and print it out to keep handy + during your research! +

+


+

If you follow these steps, along with the processes in Task + 1 and Task + 2, you should be fine!

+


+
+
+

Open Source licensing

+

An Open Source license is a type of license designed specifically for software and code that makes it explicit + what the legal conditions for sharing and re-use are. As mentioned above, the addition + of a suitable license is what differentiates publicly shared software from OSS. For example, the widely used + MATLAB is proprietary software, and Octave is an openly licensed alternative programming + language.

+

There are currently more than 1,400 unique Open Source licenses, a complexity born from the difficulty in + understanding the differences between the legal implications across different licenses.

+

Some of the more common licenses include:

+ +

You don’t need to know all the legal nitty-gritty behind all of these, but it is good to at least know what + options are available to you.

+

There are two ways in which contributions to a project become licensed:

+
    +
  1. Explicitly, whereby the individual contribution has a clearly indicated license independent of + the main project; or
  2. +
  3. Implicitly, whereby the contribution falls under the original licensing code of the main project. +
  4. +
+

Thankfully, the process of selecting an Open Source license is relatively trivial, thanks to user-friendly + tools such as Choose A License. Each of these licenses allows other + users to use, copy, distribute, and build upon your work, often while ensuring that the creators are + appropriately recognised for their work. Here, the key is selecting an appropriate license for your work, + depending on what you want, or do not want, others to do with it.

+


+
+
+

Software citation

+

Citations provide one of the most important interactions in scholarly research, forming the basis of our + referencing and metrics systems. Typically, this is performed thanks to the assistance of a permanent unique + identifier such as a Digital Object + Identifier (DOI). A DOI is a persistent identifier, implemented in the Handle System, that meets a common standard, + depending on the purpose, such as for identifying academic information. Such identification is critical for + tracking the genealogy and provenance of research, for reproducibility, as well as for giving appropriate + credit to those who have created the software. Importantly, software should be considered a legitimate output + from scholarly research, and citation is becoming an increasingly common way to indicate that.

+

Smith + et al. (2016) wrote a research paper about the principles of software citation as part of the FORCE11 + Software Citation Working Group. In the same way that you would want to cite software that you have used as + part of good research practices, it is important to make your research easily citable too. When citing any + software used for your own research, you should include at minimum:

+
    +
  • The author name(s),
  • +
  • Software title,
  • +
  • Version number, and
  • +
  • The unique identifier/locator (DOI or URL).
  • +
+
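The four minimum elements listed above can be captured in a small data structure, which makes it harder to forget one of them. The following Python sketch is only an illustration: the class name, the output layout, and every value in the example (authors, title, version, identifier) are made-up placeholders, and journals will have their own required reference styles.

```python
from dataclasses import dataclass


@dataclass
class SoftwareCitation:
    authors: list    # the author name(s)
    title: str       # software title
    version: str     # version number
    identifier: str  # the unique identifier/locator (DOI or URL)

    def format(self):
        """Render the four minimum elements as one reference string."""
        names = "; ".join(self.authors)
        return f"{names}: {self.title}, version {self.version}. {self.identifier}"


# Every value below is a made-up placeholder, not a real piece of software.
citation = SoftwareCitation(
    authors=["Doe, J.", "Roe, R."],
    title="example-analysis-tool",
    version="1.2.0",
    identifier="https://doi.org/10.xxxx/placeholder",
)
print(citation.format())
```

Because all four fields are required by the dataclass, leaving out the version number or the identifier raises an error when the citation is created, rather than being noticed only after publication.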

The six principles of software citation by Smith + et al., (2016) are provided here:

+
    +
  • +

    Importance: Software should be considered a legitimate and citable product of research. + Software citations should be accorded the same importance in the scholarly record as citations of other + research products, such as publications and data; they should be included in the metadata of the citing + work, for example in the reference list of a journal article, and should not be omitted or separated. + Software should be cited on the same basis as any other research product such as a paper or a book, that + is, authors should cite the appropriate set of software products just as they cite the appropriate set of + papers.

    +
  • +
  • +

    Credit and attribution: Software citations should facilitate giving scholarly credit and + normative, legal attribution to all contributors to the software, recognizing that a single style or + mechanism of attribution may not be applicable to all software.

    +
  • +
  • +

    Unique identification: A software citation should include a method for identification + that is machine actionable, globally unique, interoperable, and recognized by at least a community of the + corresponding domain experts, and preferably by general public researchers.

    +
  • +
  • +

    Persistence: Unique identifiers and metadata describing the software and its disposition + should persist - even beyond the lifespan of the software they describe.

    +
  • +
  • +

    Accessibility: Software citations should facilitate access to the software itself and to + its associated metadata, documentation, data, and other materials necessary for both humans and machines + to make informed use of the referenced software.

    +
  • +
  • +

    Specificity: Software citations should facilitate identification of, and access to, the + specific version of software that was used. Software identification should be as specific as necessary, + such as using version numbers, revision numbers, or variants such as platforms.

    +
  • +
+

Note: For instructions on ‘how to make your software citable’ see the section Using GitHub and Zenodo below and Task + 2: Linking GitHub and Zenodo.

+


+
+
+

Using GitHub and Zenodo

+

GitHub is a popular tool for project management, content storage, and version control. + Note that GitHub itself is not OSS. However, Git, the tool which it is based on, is. Git is designed to help + manage the source code files, and the updates to them, for a software-related project. However, it can also be + extended to other non-software projects; for example, this MOOC!

+

However, getting research onto GitHub is just the first step. It is equally important to make it persistent + and re-usable, which is why having a Digital Object Identifier (DOI) associated with it can be useful. The + simplest way to do this is through a service called Zenodo, which is a free + and open source multi-disciplinary repository created by OpenAIRE and CERN, and can be used to assign a DOI to + individual GitHub repositories. There is a GitHub + Guide that explains the details, which involve linking GitHub repositories directly through to Zenodo so + that when developers create formal releases for their software, Zenodo archives that version of + the software.

+

There’s nothing special about using Zenodo for creating DOIs, other than that it is free of cost; + other general repositories can also be used, such as DataCite DOI + Fabrica, or your own institutional repositories such as Caltech’s. +

+

A lot of researchers might typically be afraid of sharing code which is incomplete, buggy, or imperfect. + However, in the OSS community, such a practice of sharing ‘raw’ code is fairly commonplace. Sharing code + openly enables others to re-use and improve it, as well as to engage in a deeper way with any research + associated with it. This is one of the fundamental aspects of peer-collaboration, perhaps best exemplified by + the traditional process of research manuscript peer review.

+

Task 2 will guide you through the process of linking a GitHub repository to Zenodo for archiving.

+
+

Did you know… All content produced for this MOOC is available as part of a community in Zenodo?

+
+

GO + TO TASK 2: Linking GitHub and Zenodo

+


+
+
+

Collaborating and contributing through Open Source

+

Often, OSS is developed in a public, decentralised, collaborative manner between multiple contributors. The + purpose of this is to enhance the diversity and scope of a project and its design, in order to become more + beneficial and sustainable. Such an approach was famously likened to a ‘bazaar’ model by Eric Raymond, an + early OSS proponent. One of the major guiding principles of this is that of peer production, + which relies on self-organised communities to regulate the development of content, co-ordinated towards a + shared goal or outcome.

+

OSS projects rely heavily on volunteer collaboration, and often need a steady influx of newcomers in order to remain productive and sustainable (Steinmacher et al., 2014). Creating the right social atmosphere for a project, and a welcoming environment for engagement, is often critical to successful collaborations in OSS.

+


+
+
+

Where to go from here

+

Hopefully now you have come to see the importance of software as a cornerstone of modern science, and the role that OSS plays in this.

+

The learning outcomes from this should be:

+
  1. You will now be able to define the characteristics of OSS, and some of the ethical, legal, economic and research impact arguments for and against it.
  2. Based on community standards, you will now be able to describe the quality requirements of sharing and re-using open code.
  3. You will now be able to use a range of research tools that utilise OSS.
  4. You will now be able to transform code designed for your own personal use into code that is accessible and re-usable by others.
  5. Software developers will be able to make their software citable, and software users will know how to cite the software they use.
+


+

BONUS TASK

+

If you have completed Task + 1 and Task + 2, we also have a BONUS TASK for you, if you want to take your skills a step further. + Task + 3 will take you a step deeper into integrating Git into a typical research workflow by showing you how + to integrate it with R Studio. It is recommended that you have completed the first 2 tasks before proceeding + with this one.

+

However, your Open Source journey does not stop here! This was just the beginning, and there are some + incredible resources out there if you would like to do or learn more:

+
    +
  • +

    If you feel particularly inspired by this, you can endorse the Science Code Manifesto, which is based on the five + principles of code, copyright, citation, credit, and curation.

    +
  • +
  • +

    To launch and develop your own project, the Open Source Guides + program offers a range of practical guides and skills to help launch and advance your OSS projects.

    +
  • +
  • +

    For a detailed look at OSS-based research workflows, the Open Science, Open Data, Open Source hand-guide by + Pedro L. Fernandes and Rutger A. Vos is one of the top resources online.

    +
  • +
  • +

    More formalised journal venues also exist for software-based articles, including The Journal of Open Research Software and The Journal of Open Source Software. A list of such venues is also available.

    +
  • +
  • +

    The PLOS Open Source Toolkit provides a + global forum for Open Source hardware and software research and applications.

    +
  • +
  • +

    NumFOCUS is a nonprofit organization that supports and promotes world-class, innovative, open source scientific software. Some of the projects they sponsor include:

    + +
  • +
+


+
+

Further reading

+

These references are just the beginning. They include some of the most useful general overviews of the Open Source landscape in research. However, if you want to find something more specific to your own research field, then that path is there for you to explore!

+ +


+
+
+

Development Team

+ +

Know a way this content can be improved?

+

Time to take your new GitHub skills for a test-run! All content development primarily happens here. + If you have a suggested improvement to the content, layout, or anything else, you can make it and then it + will automatically become part of the MOOC content after verification from a moderator!

+

CC0 Public Domain Dedication

+
+
+
+ + + + +
+ + + + + + + + + \ No newline at end of file diff --git a/MOOC_planning_template.html b/MOOC_planning_template.html new file mode 100644 index 0000000..dfcd7e8 --- /dev/null +++ b/MOOC_planning_template.html @@ -0,0 +1,9202 @@ + + + + + + + + + + + + + + + + Mooc planning template + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + +
+

MOOC planning template

+
+

How to use this template

+

This is to provide a structured check list to track content development.

+
    +
  • For the ‘Delivered’ column, a simple Yes/No Scheme should be used.
  • +
  • For the ‘Status’ column, please use one of the three symbols below.
  • +
  • For the ‘Deadline’ column, please use a traditional dating scheme: 2018/05/10.
  • +
  • For the ‘Comments’ column, insert any text as necessary.
  • +
+
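The ‘Deadline’ format above can be checked mechanically. The following is a minimal sketch, assuming the intended scheme is year/month/day as in the example 2018/05/10; the function name `is_valid_deadline` is hypothetical and not part of the template.

```python
from datetime import datetime

# Assumed format, matching the example "2018/05/10" (year/month/day).
DEADLINE_FORMAT = "%Y/%m/%d"


def is_valid_deadline(text: str) -> bool:
    """Return True if text is a real calendar date written as YYYY/MM/DD."""
    try:
        datetime.strptime(text, DEADLINE_FORMAT)
    except ValueError:
        # Raised for the wrong layout and for dates that do not exist.
        return False
    return True


if __name__ == "__main__":
    for candidate in ["2018/05/10", "2018/06/31", "Dec 2018"]:
        print(candidate, is_valid_deadline(candidate))
```

A check like this flags entries that do not follow the scheme, and also entries that follow it but name a day that does not exist in that month.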

Status traffic light scheme:

+

Green: All looks good

+
+ Green +

Green

+
+

Orange: Issues that can impact launch date

+
+ Orange +

Orange

+
+

Red: Launch date in danger

+
+ Red +

Red

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Design Phase | Delivered | Status badge | Deadline | Comments |
| --- | --- | --- | --- | --- |
| Initiate and plan | | | | |
| Kick off | Yep | Green | 2018/05/10 | Sprint success! |
| Define target group | Yep | Green | 2018/05/31 | Sprint success! |
| Refine learning objectives/outcomes | Yep | Green | 2018/05/31 | Sprint success! |
| Design course outline | Yep | Green | 2018/05/31 | Sprint success! |
| Design project plan and timeline | Yep | Green | 2018/06/31 | |
| Identify promotion channels | Yep | Green | 2018/06/31 | |
| Design and scripting | | | | |
| Identify key resources | Yep | Green | 2018/06/31 | Sprint success! |
| Design learner activities | Yep | Green | 2018/06/31 | 3/3 completed |
| Find existing key resources | Yep | Green | 2018/06/31 | Sprint success! |
| Write audio/video scripts | In prep | Green | 2018/08/31 | 6/6 completed |
| Review all learning resources | In prep | | 2018/11/31 | |
| Finalise all scripts | In prep | | 2018/11/31 | |
| Copyright strategy | Yep | Green | 2018/08/31 | |
| Recording and editing | | | | |
| Record on location/in studio | | | | |
| Edit all audio/visual material | | | | |
| Internal reviewing | | | | |
| Cross-check and review content | In prep | Green | 2018/08/31 | Continuous process |
| Checks from Steering Committee | In prep | Green | 2018/08/31 | Continuous process |
| External testing and review | | | | |
| All reviewing conducted via GitHub | In prep | Green | 2018/08/31 | Continuous process |
| Existing channels from communications strategy | | | | |
| Internal reviewing and finalisation | | | | |
| Cross-review and check content | | | | |
| Final checks from Steering Committee | | | | |
| Implementation | | | | |
| Agreement on platform | In prep | Green | 2018/08/31 | |
| Module logo designed | Yep | Green | 2018/08/31 | |
| Module description and introduction | Yes | Green | 2018/07/31 | |
| Team member and guest lecturer agreements | Yes | Green | 2018/07/31 | |
| Team member and guest lecturer profiles | Yes | Green | 2018/07/31 | |
| Course readings acquired | Yes | Green | 2018/07/31 | |
| Port content to selected platform | | | | |
| All content deposited in Zenodo | Yep | Green | 2018/08/31 | Second release completed |
| Promotion | | | | |
| Content and communication calendar/strategy/timeline | In progress | Green | | |
| Identify relevant channels (mailing lists, social media and hashtags, organisations, individuals, websites, conferences) | Yes | Green | 2018/07/31 | |
| Images for use in social media | Yep | Green | 2018/07/31 | |
| Course title marketing check | Yes | Green | 2018/07/31 | |
| Launch | | | | |
| Publicity start | Yes | Green | Dec 2018 | |
| Open and free for all, continuous, self-paced learning, 100% online | Yes | Green | Dec 2018 | Continuous, self-paced |
| Soft launch | Yes | Green | Dec 2018 | |
| Course launch | Yes | Green | | |
| Monitoring of learner experiences and reactions | In progress | | Jan 2019 | |
| Prepare to provide additional information if required | Pending | | | |
| Reviewing and optimisation | | | | |
| Collate and review learner feedback at regular intervals | In prep | | | |
| Track any new information during course duration | In prep | | | |
| Prepare evaluation report | Pending | | | |
| Evaluation meeting | Pending | | | |
| Optimise content where relevant | Pending | | | |
+
+
+ + + + +
+ + + + + + + + + \ No newline at end of file diff --git a/Task_1.html b/Task_1.html new file mode 100644 index 0000000..e6d1511 --- /dev/null +++ b/Task_1.html @@ -0,0 +1,9147 @@ + + + + + + + + + + + + + + Task_1.utf8.md + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + +
+

Task 1: How to set up a repository on GitHub

+

This task is designed for students and researchers who want to create their first Open Source project (software + or non-software) on GitHub. GitHub is a place for you to come and play and experiment with new research + workflows, and is really just the beginning to help set the stage for your own pathways and ideas.

+

Don’t forget you can join in the discussions over at our open Slack channel. Please do introduce + yourself at #module5opensource, and tell us a bit about who you are, your background, and how you ended up here! +

+

PLEASE NOTE that a screen recording for this task is also available via YouTube.

+

Estimated time to complete: 30-45 minutes.

+

Estimated time saving once complete: Unimaginable.

+ +
+

Getting started

+

A ‘repository’ is really just a fancy name for a project on GitHub. GitHub is a place online where you can + manage projects, store files, and openly collaborate with others. This is all achieved by using version + control to track projects as they progress. As such, GitHub is a powerful tool for both software and + non-software projects.

+

One of the most important things to consider at this early stage is how you want the wider community to interact with your project. As you are working in the open, you want to make sure others feel comfortable accessing, viewing, and engaging with your work. Setting up a repository in a way that lowers the barriers to entry, and reduces the fear of being an ‘outsider’, is the first step towards maintaining a successful project.

+

+ +

+

+ Octocat, GitHub’s little mascot +

+


+
+

Setting up a GitHub profile

+

To set up a GitHub profile, simply head to the main page and click Sign + Up for GitHub. Here, you can create your personal account, with a username, email, and password as + standard.

+

+ +

+

+ Sign up for GitHub +

+


+

The next step is to set up a personal plan. For now, simply select the ‘Unlimited public repositories for + free’ plan, unless you are concerned about privacy, in which case select the private plan. If you intend to + set up a project for an organisation, you can select that option too.

+


+
+
+

The GitHub language

+

This is possibly the most confusing and off-putting aspect of GitHub. Here are some of the most commonly + used terms and their definitions:

+
    +
  • Initialise: Create a new, empty repository.
  • Checkout: Switch your working directory to a particular branch or commit.
  • Clone: Copy a repository into a local directory on your computer.
  • Fork: Create a personal copy of someone else’s repository on GitHub, so that you can work on it in parallel.
  • Branch: An independent and parallel line of development within a repository. Changes made on a branch do not affect the master branch until they are merged.
  • Master: The main and default branch for a repository.
  • Clean: The working directory has no uncommitted changes.
  • Stage: Add updates ready to be committed to a branch.
  • Commit: A revision to a repository, like a versioned ‘save’ function.
  • Commit message: A description of changes accompanying a commit.
  • Check: A status check.
  • Fetch: Nothing to do with dogs. Refers to getting the latest changes from an online repository without merging them.
  • Index: The ‘tree’ which acts as a staging area.
  • Working Directory: The ‘tree’ where the files are kept.
  • Head: The ‘tree’ which points to the last commit made.
  • Push: Send committed changes to your remote repository.
  • Merge: Combine the changes made in one branch with another branch, typically the master branch.
  • Pull: Update your local repository by fetching and merging the newest commits.
  • Pull request: A request to merge an updated branch into the master branch.
  • Issue: Suggested improvements, tasks, or questions related to a repository.
+

Whew! Don’t worry about memorising all of these for now. Like any new skill, familiarity comes + with experience.

+

You can probably see how some of these are fairly similar to things like save, copy, paste - standard + workflow operations, but adapted for a software management process. There are a few more too, but + this should do for getting started.
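To make the relationship between the working directory, the index, and the head more concrete, here is a toy, in-memory model written in Python. It is only a sketch for building intuition: the class `ToyRepository` and its methods are hypothetical, and real Git stores and compares content very differently.

```python
class ToyRepository:
    """A toy, in-memory model of Git's three 'trees'. This is not real Git."""

    def __init__(self):
        self.working_directory = {}  # file name -> current contents
        self.index = {}              # staged snapshot, ready to be committed
        self.commits = []            # list of (message, snapshot) pairs

    @property
    def head(self):
        """The most recent commit, or None if nothing has been committed."""
        return self.commits[-1] if self.commits else None

    def write(self, name, contents):
        """Edit a file in the working directory (nothing is staged yet)."""
        self.working_directory[name] = contents

    def stage(self, name):
        """Copy the current version of a file into the index."""
        self.index[name] = self.working_directory[name]

    def commit(self, message):
        """Record the staged snapshot as a new commit and return it."""
        snapshot = dict(self.index)
        self.commits.append((message, snapshot))
        return snapshot

    def is_clean(self):
        """In this toy model, clean means the files match the last commit."""
        last_snapshot = self.head[1] if self.head else {}
        return self.working_directory == last_snapshot


if __name__ == "__main__":
    repo = ToyRepository()
    repo.write("README.md", "Hello, world")
    print("clean after editing:", repo.is_clean())
    repo.stage("README.md")
    repo.commit("Add a README")
    print("clean after committing:", repo.is_clean())
```

The point of the sketch is the order of operations: an edit changes only the working directory, staging copies it to the index, and a commit records the index as the new head.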

+

If you are interested, most of these terms come from the underlying Git + system. Git was built to allow developers to manage different versions of source code in a distributed + manner, which is great. It has lots of features and the ability to do lots of complex stuff, written by a very + clever guy. However, the user interface was not designed with new + users in mind, so it can be hard to learn.

+

+ +

+

+ Unbeatable guide to using Git. (Source: XKCD) +

+


+
+
+

Creating a new repository

+

On your GitHub profile, click ‘Create new repository’. The first step is to choose a name, which acts as the brand for your project. Ideally, it should be memorable and give some indication of what the project does.

+

+ +

+

+ Create a new repository +

+


+

Make sure not to duplicate names, infringe upon other trademarks, or name it anything that could be + considered to be offensive.

+


+
+
+
+

The foundational steps

+

Any GitHub repository requires 4 key elements to get started and to begin developing a welcoming community: +

+
    +
  1. An Open Source license;
  2. A README file;
  3. Contributing guidelines; and
  4. A Code of Conduct.
+

These files are critical best practices for any project: they help users to understand their legal rights, what is expected of them, and the purpose of the project, and they improve the overall user experience.

+

All four of these files should be kept in the root directory of your project repository. It is convention to use the markdown file format (.md) for most of these files (though the license file is most often plain text, .txt), and to capitalise all file names. Instead of spaces in file names, make sure to use underscores (_).

+

So you should end up with a foundational file selection like this:

+
    +
  1. LICENSE.md
  2. README.md
  3. CONTRIBUTING.md
  4. CODE_OF_CONDUCT.md
+
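If you have a local copy of your repository, a few lines of Python can confirm that the four foundational files are in place. This is a minimal sketch: the function name `missing_foundational_files` is hypothetical, and it assumes the files sit in the repository root with any extension (for example .md or .txt).

```python
from pathlib import Path

# The four foundational files, compared without their extension.
FOUNDATIONAL_FILES = ["LICENSE", "README", "CONTRIBUTING", "CODE_OF_CONDUCT"]


def missing_foundational_files(repo_root):
    """Return the foundational files that are not present in repo_root."""
    root = Path(repo_root)
    # Path.stem drops the final extension, so LICENSE.md and LICENSE.txt both count.
    present = {path.stem.upper() for path in root.iterdir() if path.is_file()}
    return [name for name in FOUNDATIONAL_FILES if name not in present]


if __name__ == "__main__":
    # Check the current directory; an empty list means all four files were found.
    print(missing_foundational_files("."))
```

Running it from the root of a repository prints the names that still need to be added.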

+ +

+

+ The basic repository structure +

+


+
+

Choosing a license

+

Choosing an appropriate license is what differentiates an Open Source repository from software that is merely publicly available. While you are not obliged to choose a license, without one default copyright applies and others have no clear permission to re-use your work. Adding a license gives others a legal framework within which they can modify, share, re-use, and build upon your project.

+

To start with, you want to check Choose A License to find a + license that best suits your intentions for the repository.

+

The three primary ones to choose from are:

+
    +
  • MIT License: A permissive license that lets people do whatever they want with your code + as long as they provide appropriate attribution to you, and do not hold you liable.
  • +
  • Apache License 2.0: Similar permissions to the MIT License, but also provides an + express grant of patent rights from contributors to users.
  • +
  • GNU General Public License (GPL) v3: A copyleft license that requires anyone who + redistributes your code, or a derivative work, to make the source available under the same terms as the + original license; also provides an express grant of patent rights from contributors to users.
  • +
+

Thankfully, when you start a new repository on GitHub, you are given the option to select an existing + license from a drop-down menu. You should always (with very few exceptions) use an existing license, since + this is what potential users and contributors will see before they choose to use or contribute to your + software.

+

+ +

+

+ Choosing an example license +

+


+

If the drop-down menu does not include the license you want, you can add one manually. To do this, simply click ‘Create new file’ in the repository, and copy and paste the text of an existing license in. Name the file something like LICENSE.txt or LICENSE.md to make it clear, and keep it in the main repository folder (i.e., the root). Make sure to add a clear commit message, and you’re done!

+
+

Helping hand: This MOOC uses a different combination of licenses for code content and + non-code content. Here you can find an example of the MIT + License that we apply for all code and software generated as part of the MOOC production.

+
+


+
+
+

Creating a README file

+

When you initialise your new repository, there should be an option to do so with a README + file. Just like Alice in Wonderland, these do exactly what they say - provide key information about the + project. These are typically the first thing outside contributors will see when they come to your + repository, so making them informative and welcoming is key.

+

+ +

+

+ Part of the README file for this module +

+


+

The file will originally be in markdown (.md) format. This is a lightweight markup language + with a plain text format. To learn some basic markdown, see this cheatsheet. But for now, + we can just use plain text.

+

There are several things you will want to include in your README file:

+
    +
  • What is this project about and what does it do.
  • +
  • Why should people care, and why is it useful.
  • +
  • How can someone get started contributing to the project.
  • +
  • Who can be contacted in case someone needs help.
  • +
  • A link to the license, contributing guidelines, and code of conduct.
  • +
  • A description of the project structure.
  • +
  • Who is involved, and what are their roles.
  • +
  • The current status of the project.
  • +
+
+

Pro-tip: Later on as your project develops, you might want to add FAQs based on + community feedback, or a tutorial to help users understand how your project works.

+
+

Remember that not everyone coming to your project will be an expert, or understand what it is you are doing + and why. Having a well-documented README file will enhance the user experience for people with + a range of prior knowledge.

+

When the README file is included in the root directory, GitHub will automatically display this + on the homepage of your repository. This means it is the first thing people will often see, so make it + count!

+
+

Helping hand: Here, you can find the README file used for this MOOC + module. This includes information on the status, rationale, learning outcomes, development team, key + documents, and license to help. You can copy and adapt this structure for your own projects as needed.

+
+


+
+
+

Creating contributing guidelines

+

Contributing guidelines are a short guide that tells potential contributors how to engage with your project and community. You want to make sure to be welcoming, and indicate that you are eager for participants to engage with your project. Whenever a participant opens a new pull request or creates a new issue, they will see a link to your contribution file.

+

+ +

+

+ Part of the CONTRIBUTING guidelines for this module +

+


+

Sticking with the all caps file names, the next step is to create a CONTRIBUTING file. Click + ‘Create new file’, and make sure to save it in markdown format as before. This file will tell other users + how they can engage with and participate in your project. This is the first step towards establishing a + community around your project, so make it engaging, concise, and informative.

+

The CONTRIBUTING file should include information on:

+
    +
  • What sort of contributions you are looking for.
  • +
  • How to suggest updates or new features.
  • +
  • How to interact with the project using GitHub’s functions (e.g., the pull request protocol).
  • +
  • How to file a bug report or create an issue.
  • +
  • The ultimate goal, vision, or roadmap for the project.
  • +
  • How to contact those in charge of the project.
  • +
  • Links to any external documentation or websites.
  • +
+
+

Pro-tip: Consider starting off with a short thank you note for people taking the time to + consider contributing - they have clicked on the file to learn more after all! If there are other methods + of recognition that you have in mind, make sure to include them in here too.

+
+

Here, you are essentially trying to encourage people to volunteer their time to advance your project. Make + sure to be welcoming and friendly, and be precise about how people can engage. When writing this, make sure + to think about it from the user perspective - how can you make their life easier when submitting pull + requests and opening issues to make the whole project run more smoothly.

+
+

Helping hand: The Contributing guidelines for this MOOC module include some very specific things: an introduction to using Git and GitHub, tips for getting started, contact information, how to alter the content and report issues, a link to the README file, and information on the preferred content and code styles. Feel free to copy and adapt this for your own project as needed.

+
+


+
+
+

Creating a Code of Conduct

+

A code of conduct is important for setting the ground rules for expected behaviour and participation for + project contributors, and is an easily referenced document for showing that your project team takes + constructive dialogues seriously. Therefore, it is a critical element for creating and maintaining a healthy + community that engages in a constructive and productive manner within a positive social atmosphere.

+

A code of conduct not only provides expectations of behaviour, but also describes who those expectations apply to, when they apply, what to do should a violation of the code occur, and what action will be taken in response. As such, points of contact need to be made clear in the code of conduct. Typically, the contact route should be private, such as an email address.

+
+

Pro-tip: In case a violation needs to be reported about the person who receives those + reports, make sure to include an option to contact a secondary party.

+
+

To add a code of conduct, you can create your own from scratch by adding a new markdown file, or use + existing templates such as the Contributor Covenant. Name + your file CODE_OF_CONDUCT.md, and make sure it is visible in the README file.

+
+

Helping hand: This MOOC also has a Code + of Conduct based on the Contributor Covenant. As you can see, it includes information on expected + standards of behaviour, responsibilities of those in the community, and enforcement of the CoC including + contact details. Feel free again to re-use and adapt this to your project as you see fit.

+
+

+ +

+

+ Part of the CODE OF CONDUCT file for this module, based on the Contributor Covenant +

+


+

Making sure to enforce the code of conduct is important, as it shows that not only do you value the code, but you respect the influence that it has on your community. It is important to treat each member of the community with the respect, courtesy, and importance that they deserve. Should a violation occur, or should someone repeatedly violate the code, it is best to refer to the Open Source Guide to see how to enforce the code of conduct.

+


+
+
+

Making your code citable

+

If you want to make your code citable from the start, you should store the metadata needed for a citation alongside the code, by creating a [codemeta.json](https://codemeta.github.io) file or a [CITATION.cff](https://citation-file-format.github.io) file. Both allow tooling, some of which is still being developed, to create citation information automatically, rather than asking you to type it into a form later.
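As an illustration of what such a metadata file contains, the sketch below renders a minimal CITATION.cff as plain text. The helper `render_citation_cff` is hypothetical, the title and author are placeholder values, and the sketch assumes version 1.2.0 of the Citation File Format; it also does no escaping, so it assumes the values contain no double quotes.

```python
def render_citation_cff(title, authors, version, date_released):
    """Render a minimal CITATION.cff file as a string.

    authors is a list of (given_names, family_names) tuples, and
    date_released is a string in YYYY-MM-DD form.
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        f'version: "{version}"',
        f"date-released: {date_released}",
        "authors:",
    ]
    for given, family in authors:
        lines.append(f'  - given-names: "{given}"')
        lines.append(f'    family-names: "{family}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder metadata; replace with the details of your own project.
    text = render_citation_cff(
        title="My Analysis Code",
        authors=[("Ada", "Example")],
        version="1.0.0",
        date_released="2019-01-31",
    )
    print(text)
```

Saving the returned text as CITATION.cff in the root of the repository keeps the citation metadata versioned together with the code.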

+

If you’re interested, cite.research-software.org provides + further background information about software citation in academia.

+


+
+
+
+

Keeping your issues up to date

+

Issues are not necessarily problems with a project, but also suggestions for improvement, things to develop + in the future, and comments and feedback about the project to work through. They can be openly shared and + discussed with contributors as needed, sort of like a forum.

+

If you are a project lead, it is important to maintain a list of issues that make it clear to contributors + what aspects of the project need attention. It is also important to engage with as many issues as possible + from others in a positive manner, to show that you take their contributions seriously.

+

Key elements for issues include:

+
    +
  • An informative title and description;
  • +
  • Colour-coded labels/tags to help categorise and filter;
  • +
  • Milestones to associate issues with specific features or project phases;
  • +
  • Assignees to indicate who is responsible for working on an issue;
  • +
  • Comments for providing feedback.
  • +
+

+ +

+

+ The issue tracker for the Open Scholarship Strategy project +

+


+

Within issues it is possible to use @ mentions to notify other contributors about the issue, and to get the right people engaged in an effective manner. GitHub has an internal system of notifications, just like Facebook or Twitter, and can also send emails to people who are mentioned in the issue tracker. This can all be customised for individuals within the user settings.

+


+
+
+

Checklist for launching your project

+

So now you are ready to launch your project, begin advertising it, and start getting contributions! Before continuing, check each of the following:

+
    +
  • [ ] Project has a memorable and informative name
  • +
  • [ ] Project has a LICENSE file that is an exact copy of an Open Source license
  • +
  • [ ] Complete documentation including a README, CONTRIBUTING, and + CODE_OF_CONDUCT files
  • +
  • [ ] Project has at least one clearly labelled issue
  • +
  • [ ] Any code included at this stage is clearly structured and annotated
  • +
+

CONGRATULATIONS!

+

You have now launched an Open Source research project! Hopefully, from here on out, your work will act to + benefit the wider community, forge new collaborations, and create new and fantastic opportunities for you all. + Try and think about ways in which these skills can be applied to future projects, and how they might also have + helped with some in the past.

+

From now on, it is all up to you! Some advice is to:

+
    +
  • Write clean code;
  • +
  • Have a well-structured project;
  • +
  • Make frequent commits with clean messages;
  • +
  • Keep projects well-documented;
  • +
  • Have clear contributing guidelines;
  • +
  • Make use of the description and tag functions;
  • +
  • Don’t fork someone else’s repository unless you intend to work on it;
  • +
  • Make sure to contribute to other people’s projects too.
  • +
+

Know a way this content can be improved?

+

Time to take your new GitHub skills for a test-run! All content development primarily happens here. + If you have a suggested improvement to the content, layout, or anything else, you can make it and then it will + automatically become part of the MOOC content after verification from a moderator!

+

CC0 Public Domain Dedication

+
+
+ + + + +
+ + + + + + + + + \ No newline at end of file diff --git a/Task_2.html b/Task_2.html new file mode 100644 index 0000000..8461ea5 --- /dev/null +++ b/Task_2.html @@ -0,0 +1,8953 @@ + + + + + + + + + + + + + + Task_2.utf8.md + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + +
+

Task 2: How to make your code citable using GitHub and Zenodo

+

This task is designed for students and researchers who want to create and re-use GitHub-based + projects/repositories in the academic literature.

+

Don’t forget you can join in the discussions over at our open Slack channel. Please do introduce + yourself at #module5opensource, and tell us a bit about who you are, your background, and how you ended up here! +

+

Estimated time to complete: 45-60 minutes.

+
+

Table of contents

+ +

+ Task 2 workflow +

+

+ The workflow for Task 2. Keep this handy as you work through the task! +

+


Foreword

+

Although the integration of GitHub and Zenodo makes it much easier to work with these tools nowadays (January 2019), it is important to stress that there are alternatives to GitHub (GitLab, Bitbucket, …) and alternatives to Zenodo (other repositories might be better suited to your community; ask your colleagues). For instance, you can work with GitLab and manually upload each new version to your university repository, which then assigns it a DOI. The principles (working with an online version control system, and archiving major versions in a repository that provides a persistent unique identifier) can be applied in different workflows.

+
+
+

Set up a GitHub repository

+
+

Pro-tip: Make sure to include a LICENSE and README file in your repository. This will + indicate to people the purpose of the project, and how they can engage with it in the future.

+
+

Find out how to set up a GitHub repository in the other guide, Task 1: Building a GitHub repository, which is also part of ‘Module 5: Open Research Software and Open Source’.

+
+
+

Choose your GitHub repository

+

Once on your GitHub project listings page at github.com head to the + ‘Repositories’ tab. Select which repository you would like to archive, and open it up.

+


+
+
+

Login to Zenodo

+

Now head over to zenodo.org. Zenodo is a platform where you can permanently archive your code and other project elements. Zenodo assigns each archived project a Digital Object Identifier (DOI), which also helps to make the work more citable. This is different to GitHub, which is where the actual work on a project takes place, rather than a place for long-term archiving. On GitHub, content can be modified, deleted, rewritten, and irreversibly changed, which makes it unreliable for long-lasting referencing. Zenodo offers more security and permanence for research outputs.
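A DOI becomes a stable link once it is passed to the doi.org resolver. The sketch below normalises a DOI written in a few common ways into such a link. The function name `doi_to_url` is hypothetical, and the DOI used in the example is a placeholder rather than a real record.

```python
def doi_to_url(doi):
    """Turn a DOI such as 'doi:10.5281/zenodo.0000000' into a resolver URL."""
    doi = doi.strip()
    # Remove a leading resolver address or 'doi:' label if one is present.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI starts with the directory indicator '10.'.
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + doi


if __name__ == "__main__":
    # Placeholder DOI for illustration only.
    print(doi_to_url("doi:10.5281/zenodo.0000000"))
```

Using the resolver link in a reference list, rather than the address of the repository page, is what lets the citation keep working if the hosting location changes.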

+

+ +

+

+ Sign up for Zenodo +

+


+

If you already have a Zenodo account, this is easy. If not, follow the steps to create one; you can even log in using your GitHub account or ORCID profile to make things simpler, as Zenodo has built-in integrations for both. This might be easier than creating yet another research account and profile.

+


+
+
+

Authorise GitHub to connect with Zenodo

+

On the Zenodo website, authorise Zenodo to connect to your GitHub account in the ‘Using GitHub’ section. Zenodo will redirect you to GitHub to ask for permission to use ‘webhooks’ on your repositories. You want to grant Zenodo the permissions it needs to form those links.

+

+ +

+

+ Authorize Zenodo to connect with GitHub +

+


+

If you are trying to give Zenodo access to an organisational repository, you (or an administrator) will need + to make sure that Zenodo is granted third party access permissions. GitHub will send an authorisation email + that needs confirming. At this point, back in the settings of your repository on GitHub, you also need to make + sure that the repository is set to ‘public’, not private.

+


+
+
+

Select the repository to archive

+

If you have got this far, this means that Zenodo is now authorised to configure the repository webhooks that + it needs to archive the repository and issue it a DOI. To do this, on the Zenodo website navigate to the GitHub repository listing page and simply click the + ‘on’ button next to your repository.

+

+ +

+

+ Enable individual GitHub repositories to be preserved in Zenodo +

+


+
+
+

Check repository settings

+

Now you have set up a new webhook between Zenodo and your repository. In GitHub, click on the settings for your repository, and then the Webhooks tab in the left-hand menu. This should display the newly configured Zenodo webhook. Note that it may take a little time for the webhook listing to show up.

+

+ +

+

+ Check that webhooks are enabled for your GitHub repository. Example here using the Open Scholarship + Strategy +

+


+
+
+

Create a new release

+

The first time you archive a repository is known as the ‘first release’. Each time you create a new version + of that repository and archive it, you create a new release. This can be tracked in the ‘releases’ tab for + your repository on GitHub (top center).

+

+ +

+

+ Check that the repository first release was successful. Example here using the Open Scholarship + Strategy +

+


+

For the first archived version of your repository, click ‘Create a new release’ back in GitHub. Fill in the form and give some details as to what the release entails. For the first release, make sure to call it v1.0.0, as is standard practice.

+

+ +

+

+ Create a new release. Example here using the Open Scholarship Strategy, for which a first release already + exists +

+


+

Finally, click ‘publish release’, and your archive will be published and versioned on GitHub.
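Behind the scenes, a GitHub release is attached to a Git tag, so the version number above can also be created from the command line. Here is a minimal sketch in a throwaway repository; the name, email and commit message are placeholders, and as far as we know Zenodo only archives once the release itself is published on GitHub, so treat the tag as the first half of the job.

```shell
# Throwaway repository, so nothing real is touched
scratch="$(mktemp -d)"
cd "$scratch"
git init --quiet
git config user.name 'Ada Example'        # placeholder identity
git config user.email 'ada@example.org'
git commit --quiet --allow-empty -m 'Prepare the first release'

# Mark the current commit as version 1.0.0 with an annotated tag
git tag -a v1.0.0 -m 'First release'
git tag --list     # lists the tags in this repository
git describe       # names the most recent annotated tag reachable from here

# In a real project you would then send the tag to GitHub:
#   git push origin v1.0.0
```

The v1.0.0 tag should then be on offer when you fill in the release form on GitHub.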

+

To view your release on Zenodo you need to visit the Upload tab. To + finish the archiving a few more details are needed on Zenodo.

+

+ +

+

+ Check the new release has been uploaded. Example here shown using the Open Scholarship Strategy +

+


+
+
+

Getting a DOI

+

This is sometimes referred to as DOI ‘minting’, and requires a couple of extra bits of information about the repository on Zenodo. On Zenodo click the Upload tab in the main menu, and your newly uploaded repository should be there. Scroll down the page and fill in the extra information as needed. Required fields are marked with a red asterisk. Then click ‘Publish’.

+

Note: Only after this extra information has been added will your DOI become live. It may + also take a short time for the DOI to become active. Example DOI shown below (for the Open Scholarship + Strategy again).

+
+

Pro-tip: Copy the URL for the DOI into the README file for your GitHub repo to make cross-linking even easier, and to present a clear, highlighted DOI badge that users can see and use. You only need to do this once if you use the ‘concept DOI’, which Zenodo lists on the record page as representing all versions and which points to the latest release.

+
+
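The badge mentioned in the tip above is just one line of Markdown in your README. A hedged sketch follows: the DOI is the example one used in this module, and the badge address follows the pattern we understand Zenodo to show on a record page, so copy the exact snippet from your own record if it differs.

```shell
# Work in a scratch directory so your real README is left alone
scratch="$(mktemp -d)"
cd "$scratch"

# Example DOI taken from this module; swap in the DOI of your own record
doi='10.5281/zenodo.1323437'

# Append a Markdown badge that links to the DOI
printf '[![DOI](https://zenodo.org/badge/DOI/%s.svg)](https://doi.org/%s)\n' "$doi" "$doi" >> README.md
cat README.md
```

Once you are happy with the line, paste it near the top of the README in your actual repository and commit it.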

DOI

+

The GitHub/Zenodo integration will now assign a DOI to each version/release of a project repository. This + enables users to refer to and cite specific versions of projects. Also, the list of authors for the citation + is automatically determined by the GitHub user account names used by the repository - this means no-one gets + left out. Author details can be edited later on Zenodo. DOIs used in Zenodo are registered through the DataCite service.

+
+

Pro-tip: Create a ‘human-readable’ version of this citation in your project’s README file. + This will be helpful to researchers who might not be familiar with using DOIs to create citations, and make + it easier for others to cite your software and acknowledge your work. An example of this could be: Jon + Tennant. (2018, July 30). Foundations for Open Scholarship Strategy Development: First formal release + (Version 1.2). Zenodo. http://doi.org/10.5281/zenodo.1323437

+
+

CONGRATULATIONS!!

+

Your GitHub repository is now archived in Zenodo, and with a DOI that can be versioned to reflect updates to + the repository version through time. You should be able to see details of this on the GitHub Zenodo page for + your repository. This also means that your archived projects can get picked up by other indexing services and + search engines that use DOIs too.

+

Providing a long-term archive and a DOI for your work is required for others to be able to properly cite it, + as this provides basic citation metadata. For Open Science, it is important to be able to cite the software + that you use in your research, and this integrated workflow enables that to happen, in line with best + practices for research citation. Furthermore, this practice is important in elevating the standard of software + (and related projects) to that of the standard of other research outputs.

+
+

Pro-tip: Is your research funded by an EU grant? Now you can directly connect your + archived project to your grant by updating the grant section of the metadata on the project’s Zenodo record. + This massively helps to increase its discoverability!

+
+


+
+
+

Checklist for citing your project

+

So now you have a sustainably archived GitHub repository in Zenodo that is ready to be re-used and cited! + Before continuing, make sure that you have:

+
    +
  • [ ] Linked your GitHub project to Zenodo. If you see a complete copy of your GitHub repository in Zenodo then things are working.
  • [ ] Checked that the Zenodo and GitHub integration works nicely. For example, have all the author names and the correct project title come across to Zenodo? If not, or if authors just have nicknames, you can edit these details in Zenodo.
  • [ ] Given the project a first release, with a DOI. You should have a DOI displayed on your project’s Zenodo page. Zenodo also provides a ‘concept DOI’, the master DOI that links to all release DOIs. Copy this DOI link and embed it in your GitHub project’s README page. You’re done!
+
+

Additional resources

+

Making your code citable - GitHub Guides. +

+

Know a way this content can be improved?

+

Time to take your new GitHub skills for a test-run! All content development primarily happens here. + If you have a suggested improvement to the content, layout, or anything else, you can make it and then it + will automatically become part of the MOOC content after verification from a moderator!

+

CC0 Public Domain Dedication

+
+
+
+ + + + +
+ + + + + + + + + \ No newline at end of file diff --git a/Task_3.html b/Task_3.html new file mode 100644 index 0000000..c873b97 --- /dev/null +++ b/Task_3.html @@ -0,0 +1,9128 @@ + + + + + + + + + + + + + + Task_3.utf8.md + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + +
+

Task 3: How to integrate Git with RStudio

+

This task is designed for students and researchers who want to implement a system of version control within a standard R-based workflow. This can be applied to a range of software development, data analysis and project management tasks. Your future research self will thank you for the convenience.

+

Don’t forget you can join in the discussions over at our open Slack channel. Please do introduce + yourself at #module5opensource, and tell us a bit about who you are, your background, and how you ended up here! +

+

Estimated time to complete: 30 minutes

+

Estimated time saving once complete: Virtually infinite

+

NOTE A video guide version of this task is now available on YouTube.

+ +
+

Getting started

+

Congratulations on making it this far! If you’re reading this, you’ve survived pull requests, web-hooks, and can probably even tell us what the F in FOSS stands for (not Frustration…). Hopefully, you have overcome any scepticism or reluctance towards the benefits of GitHub and Open Source Software, and are ready to take the next step.

+

Before starting this Task, please make sure you have already completed Task + 1 and Task + 2, so that you are more familiar with GitHub and some standard Open Source practices.

+

This task will teach you how to integrate the version control software, Git, with the popular coding + environment, RStudio. And yes, it is Git as in gif or God, not Jit as in the wrong way of pronouncing things. +

+

If you are one of those researchers who thinks it is fine to have code spread across multiple hard drives that are waiting to break, Dropbox, Google Drive, or any other non-specialist software, this task is just for you. If you have ever experienced the mind-numbing process of having multiple ‘final’ versions of a paper bouncing between different co-authors, this is also for you.

+

All of us are guilty of these sorts of things once in a while, but there are ways to do it that are better + for you, future you, and those who might benefit from your work.

+


+
+

Getting Git

+

So, what is Git, and how is it different to GitHub? Git is a version control system, which enables you to save and track time-stamped copies of your work throughout the development process. It works with non-code items too, like this MOOC, the majority of which was written in markdown in RStudio, and integrated with a Git/GitHub workflow.

+

This is important, as all research goes through changes and sometimes we want to know what those changes were. Did you delete some text that you now think is important? Version control will save that for you. Did your code work perfectly in the past, but is now buggy beyond belief? Version control. It’s a great way to avoid that chaotic state where you have multiple copies of the same file, and it spares you from needing a stupid and annoying file naming convention. FINAL_Revised_2.2_supervisor_edits_ver1.7_scream.txt will be a thing of the past.
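To make the ‘deleted text’ case concrete, here is a small sketch of getting an older version of a file back. It runs in a throwaway repository, and the file name, contents and identity are invented for illustration.

```shell
# Throwaway repository, so nothing real is touched
scratch="$(mktemp -d)"
cd "$scratch"
git init --quiet
git config user.name 'Ada Example'        # placeholder identity
git config user.email 'ada@example.org'

# First version: two sentences
printf 'Introduction.\nAn important paragraph.\n' > paper.txt
git add paper.txt
git commit --quiet -m 'Draft the introduction'

# Second version: the important paragraph is deleted
printf 'Introduction.\n' > paper.txt
git commit --quiet -am 'Trim the introduction'

git log --oneline -- paper.txt      # one line for each saved version of the file
git show HEAD~1:paper.txt           # print the file as it was one commit ago
git checkout HEAD~1 -- paper.txt    # restore that older version into your working copy
```

Here HEAD~1 simply means ‘one commit before the current one’. You can also use the short ID shown by git log to reach further back.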

+

GitHub is the platform that allows you to seamlessly share code from your workspace (e.g., laptop) by hosting it in an online space. So, sort of like the public interface to Git. The advantages of Git/GitHub are:

+
    +
  1. You get to keep copies of all your work through time;
  2. You can compare different copies of your work through time, which helps to spot bugs or errors;
  3. Other people can collaborate openly on your work;
  4. You have both a local and an online copy of your work that remain in sync;
  5. It is fully transparent as to who made a contribution, why they made it, and when; and
  6. You can have multiple people working on the same project at once in parallel.
+

While this was primarily designed for source code, it should be instantly obvious how this becomes a + powerful tool for virtually all research workflows.

+


+
+
+

RStudio

+

RStudio is a popular coding environment for researchers who use the statistical programming language, R. It comes with a text editor, so you don’t have to install another one and switch between them. It also includes a graphical user interface (GUI) to Git and GitHub, which we will be using here.

+

Isn’t it nice when brilliant Open Source tools integrate seamlessly like that? This should help to make your daily use of Git much simpler.

+

If at any point you need to install new packages for R, simply use the following command:

+

install.packages("PACKAGE NAME", dependencies = TRUE)

+

Replacing PACKAGE NAME with the, er, package name. Some examples you can play with that might + come in useful include knitr, devtools or ggplot2.

+


+
+
+
+

Step one: Download all the things

+
    +
  1. You should already have a GitHub account by now if you have followed the previous tasks. If not, sign up here. Free unlimited repositories for all!
  2. Download and install the latest version of R. Also available for Mac and Linux.
  3. Download and install the latest version of RStudio. Oh, hey, look, it’s Open Source! Swish.
  4. Download and install the latest version of Git. Make sure to select “Use Git from the Windows Command Prompt” during installation.
+
+

Pro-tip: To update all of your R packages in one go, simply run the following code: update.packages(ask = FALSE, checkBuilt = TRUE)

+
+

For now, just choose all the usual default options for each install. Depending on your operating system (e.g., Mac, Windows, Linux), these might be different for each of you. For the rest of this task, we’re going to stick with doing things the easy-ish Windows way (but also provide some instructions for using the command line).

+

For users of Debian-based Linux (such as Ubuntu), simply use the following command to install Git:

+

sudo apt-get install git

+

For Mac users, this link, or purchase a new laptop with a + different operating system.

+

If you want, you can also download the local version of GitHub and + use it through the simple GUI. It’s available on Windows and Mac and Linux, and can make your life a little + easier, especially if you want to use a different platform to RStudio.
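Whichever route you took to install Git, a quick check from Git Bash or any other terminal confirms that it can be found. A minimal sketch:

```shell
command -v git    # prints the location of the git program if it can be found
git --version     # prints the installed version, e.g. 'git version 2.x.y'
```

If the first command prints nothing, Git is probably not on your PATH yet, and reinstalling with the default options is usually the simplest fix.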

+
+

Pro-tip: When installing Git, you will see the option ‘Use Git Bash as shell for Git projects?’. Git Bash is the place where you can use the command line to access Git from outside of RStudio. It’s a powerful beast. Try the following two commands to get started:

+
+

git config --global user.name 'YOUR USERNAME'
git config --global user.email 'YOUR EMAIL'

+
+
+

Step two: Configure Git inside RStudio

+

Right, that’s the easy bit done. Next, go into RStudio, and in the menus at the top go to Tools > Global Options > Git/SVN. SVN is just another version control system like Git, and we don’t need to worry about that here.

+

In the place where it says Git executable, enter the path to the git.exe file that you installed in the previous step. Make sure the box that says Enable version control interface for RStudio projects is ticked. This ties version control to future projects in RStudio, providing a really powerful additional dimension to collaborative or solo work.

+

+ +

+

+ The Global Options window inside RStudio +

+


+

Next, hit the button in this window that says Create RSA Key. This creates a pair of keys, one private and one public, that are used for authentication between different systems and save you from having to type in your password over and over. A new window will pop up with the public key, which you want to copy to your clipboard.

+

Head over to GitHub, go to your profile settings, and the SSH and GPG keys tab. Click New SSH + key. Here, paste in the key from RStudio, and call it something imaginative like ‘RStudio’.

+

+ +

+

+ Inside GitHub where you will want to enter the key you just generated in RStudio +

+


+

OK, now hold on to your butts, we’re going into the command line. Don’t worry if you’ve never used the shell + before because it’s quite similar to using R, or any other coding system. The main difference here though is + that instead of calling functions like in R, you call commands.

+

So back in RStudio, go to Tools > Shell, and it will open up a command prompt window. If + you already played with the Git Bash above, you should have done this step already. Enter the following two + commands:

+

git config --global user.name 'YOUR USERNAME'
git config --global user.email 'YOUR EMAIL'

+

Hopefully it does not have to be said to substitute in your own GitHub username and email here. You can + access this at any point just by finding the ‘Shell’ within Windows. Or, if you right click on any folder on + your Desktop that is linked to a GitHub repo, you can open up the Shell instantly and Bash away.

+

What this stage has done is tell Git, which is software that runs on your computer, who you are, so that your commits carry the same name and email as your account on GitHub, which is a repository website.
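If you want to see what Git has actually stored, you can ask it. The sketch below uses a throwaway repository and leaves out --global, so the placeholder name and email apply to that one repository only and your real settings stay untouched.

```shell
# Throwaway repository, so your real settings stay as they are
scratch="$(mktemp -d)"
cd "$scratch"
git init --quiet

# Without --global, these settings apply to this repository only
git config user.name 'Ada Example'        # placeholder identity
git config user.email 'ada@example.org'

git config --get user.name         # prints the name Git will attach to commits made here
git config --list --show-origin    # lists every setting with the file it was read from
```

The same trick is handy if you want a different email address for one particular project.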

+

Restart RStudio. Whew, that was tough. Next.

+


+
+
+

Step three: Why did I just do that?

+

OK, hold your breath, we’re going to pause here just to learn some basic Git commands. Some of the key ones you could do with learning are:

+
    +
  • Add: This is where you submit files to the staging area, ready to be committed.
  • Commit: This is like ‘saving’ your work by creating a new version or snapshot.
  • Push: This is how you send files from your local project to the online repository.
  • Pull: This is how you get files from your online repository to your local project.
+

Back in RStudio, type the following into the Terminal, or into a new Shell:

+

git add .

+

It won’t actually do anything for now, but in the future will add all files in your current working directory + (that’s what the . does) to staging ready for a commit.
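To see staging in action without risking anything, you can try it in a scratch folder. This is only a sketch, and the README contents are made up.

```shell
# Throwaway repository, so nothing real is touched
scratch="$(mktemp -d)"
cd "$scratch"
git init --quiet

# Create a file, then look at the repository before and after staging
echo '# My project' > README.md
git status --short    # '?? README.md' means Git sees the file but is not tracking it
git add .
git status --short    # 'A  README.md' means the file is staged, ready for a commit
```

Running git status between steps like this is a cheap way of checking what Git thinks is going on.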

+


+
+
+

Step four: The perfect marriage between Git and R

+

Now, in Task 1, you should have learned how to build your very first GitHub repository. If you haven’t done + that, we can wait here while you go and do that. If you have already, or have an existing GitHub repository, + we can move on.

+

So, you should have a repository on GitHub, complete with a README file, a LICENSE + file and some other bits and bobs.

+

What we are going to do now is connect that repository to RStudio through Git. Steady now.

+
    +
  1. Firstly, in RStudio, go to File > New Project > Version Control > Git.
  2. Back on GitHub, you should see a bit where there is a https:// URL. That is the link to your repository, and it gives you the option to clone it to your desktop. For now, just copy that link, switch back to RStudio, and paste it into the ‘Repository URL’ as indicated.
  3. Give the project a directory name, like test, Jim, or whatever you want.
  4. Next, browse for the place on your desktop where you want this project to live, its subdirectory.
  5. Click ‘Create Project’, and let the magic be done!
+

What you just did was tell RStudio to associate a new project in R with a specific repository on GitHub.
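Under the hood, RStudio runs a git clone for you. The steps above can be sketched on the command line as follows, with a local ‘bare’ repository standing in for GitHub so that the example runs offline. With a real project you would paste your https:// URL in its place.

```shell
scratch="$(mktemp -d)"
cd "$scratch"

# Stand-in for the repository on GitHub: a local 'bare' repository
git init --quiet --bare remote.git

# Clone it into a new directory called 'test'
# (Git warns that the stand-in is empty, which is expected here)
git clone --quiet remote.git test
cd test
git remote -v    # lists 'origin', the address this copy was cloned from
```

The name origin is simply the default label Git gives to the place you cloned from.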

+
+
+

Step four: Alternative

+

If you still haven’t built your first repository on GitHub, we can do something slightly different here. In RStudio, click New project and then New Directory. Call it what you want and change the directory as needed, make sure to tick Create a git repository, and then click Create Project. This creates an .Rproj file, which you can manage in the usual way through RStudio, including adding README.md and LICENSE.md files as discussed in Task 1.
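Ticking Create a git repository is roughly equivalent to running git init in the new folder. A minimal command-line sketch, with my-project as a made-up directory name:

```shell
scratch="$(mktemp -d)"
mkdir "$scratch/my-project"
cd "$scratch/my-project"

git init --quiet             # creates the hidden .git directory that stores the history
touch README.md LICENSE.md   # empty placeholder files
git status --short           # both files are listed as untracked ('??')
```

Nothing is online at this point. The repository lives only on your machine until you connect it to GitHub.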

+
+
+

Step five: Getting content with content

+

Remember that README file we created a while back? Well, it’s time to write it. Thinking back to + Task 1, there were some specific things that we said make a good README file. Do you remember + what any of them were? Just to refresh your memory, these were:

+
    +
  • What is this project about and what does it do?
  • Why should people care, and why is it useful?
  • How can someone get started contributing to the project?
  • Who can be contacted in case someone needs help?
  • A link to the license, contributing guidelines, and code of conduct.
  • A description of the project structure.
  • Who is involved, and what are their roles?
  • The current status of the project.
+

So, in RStudio, open that file and try adding just a bit of information about this for your project. If you are doing this for an actual project, try and make it useful. If you are just tinkering for now, you can add what you want.

+

Remember that your README file is in markdown (.md) format. For a refresher on some of the + simple syntax markdown uses, check this handy cheatsheet.

+

+ +

+

+ Screenshot of what this module looks like in markdown, during development. Meta. +

+


+
+
+

Step six: A brave commitment

+

OK, so now you should have a nicely edited README file. Now we are going to ‘commit’ this to the + project using Git. This is basically the equivalent of saving this version of your project, with a record of + what changes were made. Successive commits produce a history that can be examined at a later time, allowing + you to work with confidence.

+

There are a few ways of doing this.

+
    +
  1. Go to Tools > Version Control > Commit.
  2. In the environment pane in RStudio, there should be a new ‘Git’ tab. Handy.
  3. In your console pane, there should now be a new ‘Terminal’ tab, through which you can run Git commands.
+

Let’s just stick with the second option for now. This Git pane shows you which files have been changed and + includes buttons for the most important Git commands we saw earlier.

+

Select the README file in the Git window, which should show up automatically if you have made + any edits to it. This adds that file to the ‘staging’ area, which is sort of like the pre-saving space for + your work. Click ‘Commit’ and a new window should pop up.

+

Here, you have a chance to review your changes, and write a nice commit message. Type in something brief, but + informative about the changes that you have made in this version or snapshot of your work. You want this to be + enough information so that if you or someone else looks back on it, you’ll know why you made this commit and + the changes associated with it. These are like safety nets for your project in case you need to fall back for + some reason.

+
+

Pro-tip: Here, you will see a list of all the changes you have made since your last commit. Older removed lines are in red, and newly added lines are in green. Double check these to make sure that the edits you have made are the ones you intended to make. This is really helpful for spotting typos, stray edits, and any other little mistakes you might have accidentally introduced. Safety first.

+
+

Note If you are colour-blind and can’t see which lines have been added or removed, you can + use the line numbers in the two columns on the left of the window as a guide. Here, the number in the first + column identifies the older version, and the number in the second column identifies the new version.
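The same review is available as plain text from the Terminal, where removed and added lines are marked with symbols instead of colours. A sketch in a throwaway repository, with invented file contents:

```shell
# Throwaway repository, so nothing real is touched
scratch="$(mktemp -d)"
cd "$scratch"
git init --quiet
git config user.name 'Ada Example'        # placeholder identity
git config user.email 'ada@example.org'

printf 'First line\nSecond line\n' > README.md
git add README.md
git commit --quiet -m 'Add README'

# Change one line, then ask Git what differs from the last commit
printf 'First line\nSecond line, now edited\n' > README.md
git diff    # removed lines start with '-', added lines start with '+'
```

Once a change is staged, use git diff --staged to see it, as plain git diff only shows edits that are not staged yet.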

+

Now when you click ‘Commit’, another window will pop up, telling you how many files you have changed and the + number of lines within that file you have changed. Close that little window down.
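For comparison, this is roughly what the Commit button does when run from the Terminal. It is a sketch in a scratch repository, and the commit message is just an example of the brief but informative style suggested above.

```shell
# Throwaway repository, so nothing real is touched
scratch="$(mktemp -d)"
cd "$scratch"
git init --quiet
git config user.name 'Ada Example'        # placeholder identity
git config user.email 'ada@example.org'

echo '# My project' > README.md
git add README.md                                  # stage the file
git commit -m 'Add a first draft of the README'    # save the snapshot with a message
git log --oneline                                  # one line per commit: short ID, then message
```

The summary that git commit prints is the same information RStudio shows in its little pop-up window.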

+


+
+
+

Step seven: PUSH!

+

Click the Push button in the top right of the new window. A new window will pop up now. This synchronises the changes committed in your local repository, here the edited README file, with the online version of the project on GitHub.

+

To do this from the Shell, use the following command:

+

git push -u origin master

+

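Sometimes you will be prompted here for your GitHub username and password, which you should enter if asked. Note that GitHub no longer accepts your account password for this, and expects a personal access token in its place.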

+

Close that window down, and the next one. Go to your project on GitHub, refresh, and check that the + README file is still there in all its newly edited glory. You should see the commit message you + made next to the file too.
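If you would like to rehearse a push without touching GitHub, the sketch below uses a local ‘bare’ repository as a stand-in for the online one. It pushes HEAD, meaning the branch you are currently on, so it works whether your default branch is called master or main, as newer GitHub repositories tend to be.

```shell
scratch="$(mktemp -d)"
cd "$scratch"
git init --quiet --bare remote.git         # stand-in for the repository on GitHub
git clone --quiet remote.git project       # warns that the stand-in is empty, as expected
cd project
git config user.name 'Ada Example'         # placeholder identity
git config user.email 'ada@example.org'

echo '# My project' > README.md
git add README.md
git commit --quiet -m 'Add README'

# Push the current branch, whatever it is called, and remember the link (-u)
git push --quiet -u origin HEAD
git status --short --branch    # no 'ahead' note means local and remote agree
```

Thanks to -u, later pushes from this branch only need a plain git push.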

+


+

OPTIONAL ADVANCED/AWESOME STEP

+

Alright, so you just pushed some content to your first repo, awesome! Now let’s put it into practice for a + real project. Like, the one you are participating in right now. Let’s try this out:

+
    +
  1. Go to the repository for this project on GitHub.
  2. Fork the repository to your own GitHub account. The URL of the original is: https://github.com/OpenScienceMOOC/Module-5-Open-Research-Software-and-Open-Source.git
  3. Head into RStudio, go to File > New Project, choose Version Control, select Git, and then paste the forked repository URL found in your copy of the repository. You now have your own versioned copy of this whole module. Neat. Save this somewhere on your local machine.
  4. Now, you need to tell Git that a different version of this project exists. Open up the Shell, and enter the command: git remote add upstream https://github.com/OpenScienceMOOC/Module-5-Open-Research-Software-and-Open-Source
  5. What you just did was name the original repository upstream, just to keep things simple for now. Now, create a new branch to keep your changes separate from the main branch. Enter the command: git checkout -b proposed-changes master
  6. You just created a new branch called proposed-changes where you can now edit all of the content and files to your heart’s delight. Hopefully, the structure of this project is simple enough for you to navigate around. All of the raw files for the MOOC can be found in the content_development folder, and this is Task_3.md.
  7. If you scroll to the bottom of Task_3.md, you should see a place where you can edit in your name and affiliation. Add these in, and then go through the commit procedure detailed above. If you see anything else that needs editing, feel free to fix that too!
  8. Now, you want to push the changes to your fork on GitHub. Use the following command in your Shell: git push origin proposed-changes
  9. Go back to GitHub and find your fork. Click the little green button, and create a pull request. This is essentially a request to review your changes and integrate them into the original MOOC project.
  10. The owners in charge of the MOOC project will now get a notification of this. We will review it, and if it all went okay, your name will now appear for all eternity as someone who completed this advanced task.
  11. Have a cup of tea, coffee, or wine to celebrate!
+
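The Git side of the steps above can be rehearsed offline. In this sketch two local ‘bare’ repositories play the parts of the original MOOC repository and your fork, the name and affiliation are placeholders, and the branch is created from wherever you currently are, so the example does not depend on the default branch being called master.

```shell
scratch="$(mktemp -d)"
cd "$scratch"

# Stand-ins: 'original.git' plays the MOOC repository, 'fork.git' plays your fork
git init --quiet --bare original.git
git init --quiet --bare fork.git
git clone --quiet fork.git work            # warns that the stand-in is empty, as expected
cd work
git config user.name 'Ada Example'         # placeholder identity
git config user.email 'ada@example.org'
git commit --quiet --allow-empty -m 'Starting point'

git remote add upstream "$scratch/original.git"   # call the original repository 'upstream'
git checkout -b proposed-changes                  # new branch, starting from the current commit

echo '- Ada Example, Example University' >> Task_3.md
git add Task_3.md
git commit --quiet -m 'Add my name to the list of participants'

git push --quiet origin proposed-changes   # goes to your fork (origin), not to upstream
git remote -v                              # shows both origin and upstream
```

Note that nothing reaches upstream directly. That only happens when the maintainers accept your pull request on GitHub.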

CONGRATULATIONS

+

You just integrated Git with RStudio, and made your first change to a version-controlled project. Your life will now never be the same, and your research workflow will probably be more rapid, agile, and collaborative than ever. Good luck going back to Word.

+

The great thing is that this doesn’t have to just be used for code. You can use it for plain text, markdown, + html, and, well, R code. The possibilities are limitless - what you have just learned is a new form of openly + collaborative project management that works for an enormous range of tasks.

+

From now on, it is all up to you! Some advice is to:

+
    +
  • Make frequent commits. Treat Git like your puppy, in that it requires constant and special attention. Just a pat on the head every now and then is enough to keep it satisfied, but it’ll be happiest with sustained servicing.
  • The best way to do this is to make a commit each time you work on a specific problem. For example, writing a paragraph, running an analysis, or fixing a bug.
  • Push often. Don’t let those commits build up, otherwise you run more risk of getting into merge conflicts. Seeing as these can be the stuff of nightmares, just make sure to push often.
  • Pull often. If others are working remotely on the same project, you will want to stay up to date with their changes. Make sure to frequently pull in their changes from GitHub to make sure you are all in sync.
  • Experiment and explore! This task really only scratches the surface, and there are many different functions, tools, and ways this can be used. Really, it is up to you to find out how to use this information to improve your research workflow, and ultimately collaborate on better, more open and reliable research!
  • To learn more about issues, branches, merge conflicts, pull requests, and other advanced aspects of using Git and RStudio, check out this awesome guide by Hadley Wickham.
+
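The ‘push often, pull often’ advice is easiest to appreciate by watching a change travel between two people. In this sketch a local ‘bare’ repository stands in for GitHub, the folders theirs and yours stand in for two collaborators’ laptops, and all names and file contents are invented.

```shell
scratch="$(mktemp -d)"
cd "$scratch"
git init --quiet --bare remote.git          # stand-in for the project on GitHub

# A collaborator makes the first commit and pushes it
git clone --quiet remote.git theirs         # warns that the stand-in is empty, as expected
cd theirs
git config user.name 'Grace Example'        # placeholder identity
git config user.email 'grace@example.org'
echo 'Draft 1' > notes.md
git add notes.md
git commit --quiet -m 'Add notes'
git push --quiet -u origin HEAD

# You clone the project, and afterwards the collaborator pushes another change
cd "$scratch"
git clone --quiet remote.git yours
cd theirs
echo 'Draft 2' > notes.md
git commit --quiet -am 'Revise notes'
git push --quiet

# Pulling brings your copy up to date with theirs
cd "$scratch/yours"
git pull --quiet --ff-only
cat notes.md    # now shows the collaborator's latest text
```

The --ff-only option is a cautious choice: the pull only goes ahead when your copy can simply be moved forward, and stops with a message if the histories have diverged.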


+

Know a way this content can be improved?

+

Time to take your new GitHub skills for a test-run! All content development primarily happens here. + If you have a suggested improvement to the content, layout, or anything else, you can make it and then it will + automatically become part of the MOOC content after verification from a moderator!

+
+
+

List of participants who completed the ADVANCED version of this task

+
    +
  • Brendan Palmer, CRF-C, University College Cork
  • Lisa Matthias, Freie Universität Berlin
  • Hollie Marshall, University of Leicester
  • Eric D. Wilkey, Western University, Canada
  • José-Raúl Canay-Pazos, Universidade de Santiago de Compostela, Spain
  • Encarnación Martínez Álvarez, Spain
  • Alberto Albz Marocchino, Italy
  • Iratxe Rubio, Basque Centre for Climate Change BC3
+

CC0 Public Domain Dedication

+
+
+ + + + +
+ + + + + + + + + \ No newline at end of file diff --git a/content_development/MAIN.html b/content_development/MAIN.html index 927fdaf..121ca5e 100644 --- a/content_development/MAIN.html +++ b/content_development/MAIN.html @@ -4,723 +4,9565 @@ - - - + + + -MAIN.utf8.md + MAIN.utf8.md - - - - - + + + + - - + + + + + + + + - // Change the URL when tabs are clicked - $('a', context).on('click', function(e) { - history.pushState(null, null, this.href); - showStuffFromHash(context); - }); - return this; - }; -}(jQuery)); - -window.buildTabsets = function(tocID) { - - // build a tabset from a section div with the .tabset class - function buildTabset(tabset) { - - // check for fade and pills options - var fade = tabset.hasClass("tabset-fade"); - var pills = tabset.hasClass("tabset-pills"); - var navClass = pills ? "nav-pills" : "nav-tabs"; - - // determine the heading level of the tabset and tabs - var match = tabset.attr('class').match(/level(\d) /); - if (match === null) - return; - var tabsetLevel = Number(match[1]); - var tabLevel = tabsetLevel + 1; - - // find all subheadings immediately below - var tabs = tabset.find("div.section.level" + tabLevel); - if (!tabs.length) - return; - - // create tablist and tab-content elements - var tabList = $(''); - $(tabs[0]).before(tabList); - var tabContent = $('
'); - $(tabs[0]).before(tabContent); - - // build the tabset - var activeTab = 0; - tabs.each(function(i) { - - // get the tab div - var tab = $(tabs[i]); - - // get the id then sanitize it for use with bootstrap tabs - var id = tab.attr('id'); - - // see if this is marked as the active tab - if (tab.hasClass('active')) - activeTab = i; - - // remove any table of contents entries associated with - // this ID (since we'll be removing the heading element) - $("div#" + tocID + " li a[href='#" + id + "']").parent().remove(); - - // sanitize the id for use with bootstrap tabs - id = id.replace(/[.\/?&!#<>]/g, '').replace(/\s/g, '_'); - tab.attr('id', id); - - // get the heading element within it, grab it's text, then remove it - var heading = tab.find('h' + tabLevel + ':first'); - var headingText = heading.html(); - heading.remove(); - - // build and append the tab list item - var a = $('' + headingText + ''); - a.attr('href', '#' + id); - a.attr('aria-controls', id); - var li = $('
  • '); - li.append(a); - tabList.append(li); - - // set it's attributes - tab.attr('role', 'tabpanel'); - tab.addClass('tab-pane'); - tab.addClass('tabbed-pane'); - if (fade) - tab.addClass('fade'); - - // move it into the tab content div - tab.detach().appendTo(tabContent); - }); - // set active tab - $(tabList.children('li')[activeTab]).addClass('active'); - var active = $(tabContent.children('div.section')[activeTab]); - active.addClass('active'); - if (fade) - active.addClass('in'); - - if (tabset.hasClass("tabset-sticky")) - tabset.rmarkdownStickyTabs(); - } - - // convert section divs with the .tabset class to tabsets - var tabsets = $("div.section.tabset"); - tabsets.each(function(i) { - buildTabset($(tabsets[i])); - }); -}; - - - - - - - - - - - - + - - - - -
    - - - - - - - - - - - - - - -
    -

    Module 5: Open Research Software and Open Source

    - -
    -

    Introduction

    -

    Welcome to Module 5 of the Open Science MOOC: Open Research Software and Open Source.

    -

    This module has been developed in the open by an international team of Open Source aficionados, with interactive feedback and collaboration from the wider community. It comprises a series of videos, infographics, text-based reading, and practical tasks for you to sink your teeth into.

    -

    Don’t forget you can join in the discussions over at our open Slack channel. Please do introduce yourself at #module5opensource, and tell us a bit about who you are, your background, and how you ended up here!

    -
    -

    Who is this module for?

    -

    This module is designed primarily for computational researchers at the graduate and undergraduate level, as well as budding data scientists and any other researcher who uses analytical code or software. In a modern-day research environment, this covers pretty much anyone who uses a computer for their work.

    -
    -

    “An article about computational result is advertising, not scholarship. The actual scholarship is the full software environment, code and data, that produced the result.” - J. Buckheit and D. L. Donoho, 1995.

    -
    -

    Software and technology underpin much of modern research, which is now almost inevitably computational in one way or another - search engines, social networking platforms, analytical software, and digital publishing. With this, there is an ever-increasing demand for more sophisticated Open Source Software, matched by an increasing willingness for researchers to openly collaborate on new tools.

    -

    The power of Open Source is that it lowers the barriers to collaboration and adoption, allowing ideas and technology to spread more rapidly. This Module will introduce the tools required for transforming software into something that can be openly accessed and re-used by others.

    -

    -

    -Image by Patrick Hochstenbach (CC0 1.0 Universal) -

    -


    -
    -
    -

    Specific learning objectives for this Module:

    -
      -
    1. Learn the characteristics of open software; understand the ethical, legal, economic, and research impact arguments for and against Open Source Software, and further understand the quality requirements of open code.
    2. Be able to turn code made for personal use into open code which is accessible by others.
    3. Use software (tools) that utilizes open content and encourages wider collaboration.
    -


    -
    -
    -
    -

    What is Open Source Software?

    -

    Virtually all modern scientific research workflows rely on a range of software tools. These either operate on different datasets, with different parameters, applied iteratively in various ways (data science), or operate on different inputs and use models and methods to predict some output state (computational science). Open Source Software (OSS) is computer software in which the full source code is available under a specific license that enables other users to access, view, modify, and redistribute that code for any purpose. Because of the terms of such a license, OSS is typically available free of charge. This explicit licensing is also what differentiates OSS from software that is merely shared publicly or offered free of charge. Re-using OSS for analysis, simulation and visualisation in research is also typically easier and more flexible than re-using proprietary software. Often, whether we know it or not, we are already using OSS as part of our own research workflows.

    -

    OSS fits into the broader scheme of Open Science as it helps to make the full research environment, including the software that produced the research results, fully accessible and re-usable. As such, it forms a necessary component for the best practices (Jiménez et al., 2018) and repeatability and reproducibility of research (both personally and by others), along with other components, such as sharing data (Stodden, 2010).

    -

    In some cases, sharing source code can even be a condition for the acceptance of associated research manuscripts (Shamir et al., 2013). It is also generally perceived to increase research impact (Vandewalle, 2012).

    -

    Some of the common advantages for developers include:

    -
      -
    • Increased developer loyalty and empowerment;

    • -
    • Lower costs of services and marketing;

    • -
    • Increased branding of services and products;

    • -
    • Production of high quality software at lower expense;

    • -
    • Flexibility and rapid innovation;

    • -
    • Customisation and modular integration;

    • -
    • Increased reliability and independence; and

    • -
    • Based on open standards available to everyone.

    • -
    -

    As such, the main advantages for researchers (users) include lower costs, increased transparency, increased security and stability, no vendor ‘lock in’ with increased user control, and overall higher quality. Furthermore, sharing OSS allows researchers to receive credit for their efforts, for example through direct software citation (Smith et al., 2016).

    -

    Commonly used examples of OSS include the Mozilla Firefox internet browser and the LibreOffice office suite. LibreOffice is similar to the popular Microsoft Office, including a word processor, spreadsheet manager, and slide presentation software, but is completely free and Open Source.

    -

    Some regard the OSS movement to represent a counter-movement to neoliberalism and privatisation, through defiance of regulations and norms in the construction and re-use of information, and a potential transformation of modern-day capitalism through making software abundantly available with minimal effort. See The free/open source software movement: Resistance or change? by Panayiota Georgopoulou for more on this topic.

    -


    -
    -
    -

    Principles of Open Source Software

    -

    The Open Source Initiative, one of the pioneers of OSS, offers the following definition:

    -

    Don’t worry, you don’t need to memorise all of this, but it’s good to know the principles on which OSS is founded.

    -
      -
    • Free Redistribution: The license shall not restrict any party from selling or giving away the software as a component of an aggregate software distribution containing programs from several different sources. The license shall not require a royalty or other fee for such sale.

    • -
    • Source Code: The program must include source code, and must allow distribution in source code as well as compiled form. Where some form of a product is not distributed with source code, there must be a well-publicized means of obtaining the source code for no more than a reasonable reproduction cost, preferably downloading via the Internet without charge. The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor or translator are not allowed.

    • -
    • Derived Works: The license must allow modifications and derived works, and must allow them to be distributed under the same terms as the license of the original software.

    • -
    • Integrity of The Author’s Source Code: The license may restrict source-code from being distributed in modified form only if the license allows the distribution of “patch files” with the source code for the purpose of modifying the program at build time. The license must explicitly permit distribution of software built from modified source code. The license may require derived works to carry a different name or version number from the original software.

    • -
    • No Discrimination Against Persons or Groups: The license must not discriminate against any person or group of persons.

    • -
    • No Discrimination Against Fields of Endeavour: The license must not restrict anyone from making use of the program in a specific field of endeavour. For example, it may not restrict the program from being used in a business, or from being used for genetic research.

    • -
    • Distribution of License: The rights attached to the program must apply to all to whom the program is redistributed without the need for execution of an additional license by those parties.

    • -
    • License Must Not Be Specific to a Product: The rights attached to the program must not depend on the program’s being part of a particular software distribution. If the program is extracted from that distribution and used or distributed within the terms of the program’s license, all parties to whom the program is redistributed should have the same rights as those that are granted in conjunction with the original software distribution.

    • -
    • License Must Not Restrict Other Software: The license must not place restrictions on other software that is distributed along with the licensed software. For example, the license must not insist that all other programs distributed on the same medium must be open-source software.

    • -
    • License Must Be Technology-Neutral: No provision of the license may be predicated on any individual technology or style of interface.

    • -
    -

    Now, this all might be a little complex to remember. However, it can be summarised as making software as re-usable as possible for future works, while also being freely available.

    -


    -
    -
    -

    An Open Source checklist

    -

    There are a number of existing platforms and tools that support OSS and collaboration. The Open Science Training Handbook provides a check-list to use for evaluating the ‘openness’ of existing research software, based on the Open Source Definition above:

    -
      -
    • [ ] Is the software available to download and install?

    • -
    • [ ] Can the software easily be installed on different platforms?

    • -
    • [ ] Does the software have conditions on the use?

    • -
    • [ ] Is the source code available for inspection?

    • -
    • [ ] Is the full history of the source code available for inspection through a publicly available version history?

    • -
    • [ ] Are the dependencies of the software (hardware and software) described properly? Do these dependencies require only a reasonably minimal amount of effort to obtain and use?

    • -
    -

    Check, check, check, done! Simples.

    -


    -
    -
    -

    The Open Source community and its governance

    -

    There are two main camps within the free software community: The free software movement, and the OSS movement. Both have differing ideologies based on user liberties and the practical applications of software. Often, the term ‘FLOSS’ is used to reconcile these two political camps, and means ‘Free/Libre and Open Source Software’; Libre being French and Spanish for ‘free’ in the context of freedom.

    -

    The core principle of re-use is what separates OSS from ‘Free Software’. Free and Open Source Software (FOSS) is an inclusive term to describe software that can be classified as both free and Open Source. A good example of FOSS is the Ubuntu Linux operating system.

    -

    A commonly drawn distinction between free software and OSS concerns ‘copyleft’: free software is often associated with licenses that require updated versions to be distributed under the same license as the original, whereas many OSS licenses allow newer versions to be distributed under different licenses. FOSS combines the best of both worlds.

    -

    These definitions have now become widely adopted, both by governments around the world and by large organisations such as the Mozilla Foundation and the Wikimedia Foundation. Major organisations in the FLOSS space include the UK’s Software Sustainability Institute, which produces valuable resources such as its recent Software Deposit Guidance for Researchers.

    -
    -

    For individual projects

    -

    A typical open source project has the following types of formal roles:

    -
      -
    • Author: The person(s) who created the project.
    • Owner: The person(s) who have administrative ownership over the organization or repository.
    • Maintainers: Contributors who are responsible for driving the vision and managing the organizational aspects of the project. (They may also be authors or owners of the project.)
    • Contributors: Everyone who has contributed something to the project.
    • Community Members: People who use the project. They might be active in conversations, create new issues, or express their opinions on future improvements to the project.
    -

    Typically, roles are made public through either the README file, a Contributors file, or a separate team page for the project.

    -


    -
    -
    -
    -

    Existing platforms and tools for Open Source Software

    -

    Virtual environments and machines are becoming increasingly popular as high-powered research workflow enablers, and many of these are built upon OSS (e.g., operating systems, programming languages, and data processing frameworks). Popular services include Google Cloud and Amazon Web Services, which also assist with database storage and content delivery, as well as computational power. InsideDNA is a computing platform for reproducible research in bioinformatics, genomics and the life sciences.

    -

    As mentioned above, LibreOffice provides an Open Source alternative to Microsoft Office. The two are almost completely compatible, just with different default file formats. For citation managers, Zotero is a popular Open Source alternative to proprietary platforms such as Mendeley or EndNote.

    -

    Zotero can import and export references in the BibTeX (pronounced ‘bib-tech’) format used with LaTeX (pronounced ‘lay-tech’), and has browser plugins to make citation management simple. By integrating this with other software such as LibreOffice, it is now possible to have a fully Open Source research workflow in many cases.

    -
    -

    GitHub

    -
    -

    Did you know that this entire project was built as an open and collaborative community effort on GitHub?

    -
    -

    GitHub is a popular hosting site for both software and non-software content (often called ‘notebooks’), with added capabilities for version control, project management and tracking, and storage services. GitHub is built on top of the OSS Git, which enables users to work remotely to maintain, share, and collaborate on research software and other non-software based projects.

    -

    Version control is essentially a process that takes snapshots of the files in a repository, and tracks modifications to them. It records when the changes were made, what they were, and who made them. If several people are working on one file at once, any overlapping changes are detected, and must be resolved before continuing. This provides a much more streamlined and automated process than manually saving and recording changes as projects develop. It also avoids the inevitable lists of confusingly named file versions…

    -

    - -

    -

    -GitHub helps us to avoid, er, sub-optimal file naming conventions (source: XKCD) -
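To make the idea of snapshots a little more concrete, here is a minimal sketch in Python. It is a toy, in-memory model and not how Git is actually implemented: the `ToyRepository` and `Snapshot` names are made up for this illustration, and it only covers the "what, who, and when" bookkeeping described above.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Snapshot:
    """One recorded state of a file: what it contained, who saved it, and when."""
    author: str
    message: str
    content: str
    timestamp: str
    digest: str  # fingerprint of the content, used to detect changes


@dataclass
class ToyRepository:
    """A hypothetical, in-memory stand-in for a single version-controlled file."""
    history: list = field(default_factory=list)

    def commit(self, author: str, message: str, content: str) -> bool:
        """Store a snapshot only if the content differs from the latest one."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if self.history and self.history[-1].digest == digest:
            return False  # nothing changed, so nothing to record
        timestamp = datetime.now(timezone.utc).isoformat()
        self.history.append(Snapshot(author, message, content, timestamp, digest))
        return True

    def log(self) -> list:
        """Return (author, message) pairs, oldest first."""
        return [(s.author, s.message) for s in self.history]


repo = ToyRepository()
repo.commit("alice", "First draft of analysis", "x = 1\n")
repo.commit("bob", "Fix starting value", "x = 2\n")
repo.commit("bob", "Accidental re-save", "x = 2\n")  # ignored: identical content
print(repo.log())
```

Real version control systems do far more (branching, merging, and resolving overlapping edits), but the core habit is the same: every meaningful change is stored together with its author, its time, and a short message.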

    -


    -

    One of the more popular and useful functions of GitHub is the issue tracker, which is used to organise OSS development. The above link takes you to the issue tracker for the development of this module! If you think there is something here that can be improved, or that you want to comment on, you can add or contribute to an issue there!

    -

    Other similar project hosting services include BitBucket, GitLab, and Launchpad. If the 2018 acquisition of GitHub by Microsoft is a bit off-putting to you, these are great alternatives.

    -

    However, we also know that GitHub can have quite a steep learning curve, which is why the first practical task for this MOOC will teach you how to set up your first GitHub project repository!

    -

    GO TO TASK 1: Building your first GitHub repository

    -


    -
    -
    -
    -

    Open Source Software used in research

    -

    Especially in scientific research, Open Source Software usage and development has become practically the norm. There are a number of reasons for this beyond those that apply to the general acceptance of OSS by, for example, consumers, industry, or government. Among these reasons are:

    -
      -
    • Increasingly, algorithms implemented in analysis software form an integral part of the methods described in scholarly publications. As such, it is completely at odds with rigorous peer review if these algorithm implementations are closed to outsiders.

    • -
    • Scientific collaboration more often than not spans multiple institutions and distributed research networks, where secrecy and command hierarchy are not maintained in the way that is ‘necessary’ for closed source development.

    • -
    • Many computational analyses are run in virtualized environments (such as institutional, national, or international ‘cloud’ infrastructures) and hosted on multi-user servers. Closed-source, commercial software often disallows such usage.

    • -
    • OSS development often relies on volunteers. In a time of budgetary constraints for scientific research, this is a clear advantage.

    • -
    -

    For these and other reasons, Open Source tools are very commonly used in scientific research. This includes fields where many researchers are amateur developers themselves and rely on tools such as R for statistical analysis and scripting. In the last decade, R has largely displaced commercial statistical software such as SPSS or JMP in many fields. In fields such as bioinformatics, which involve a lot of file handling of the outputs of DNA sequencing platforms, general-purpose scripting languages such as Python, and commonly used libraries built on top of them (such as Biopython), have become a vital part of the toolkit of many researchers.

    -

    - -

    -

    -Python -

    -


    -

    Tools such as R and Python are essentially software for writing software. Although programming is an increasingly common activity among researchers, of course not every scientist does this. One step away from programming is the chaining together of the inputs and outputs of various analysis tools in longer workflows. As an example from genomics, a very common workflow is to start out with high-throughput sequencing reads and then i) do basic quality control checks; ii) map the reads against a reference genome; iii) identify the points where the new data are at variance with the reference. These steps are routinely executed as a workflow where a different Open Source executable is run in a Linux command-line environment for each of the three steps. Although this is arguably not quite open source software development, it does involve the usage and production of open source artifacts (such as Linux shell scripts) for which the principles that we discuss in this module are applicable.
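As a rough illustration of what "chaining together" means, the three genomics steps above can be sketched as three small Python functions whose outputs feed into one another. This is a deliberately simplified toy: the function names, the quality threshold, and the data are all invented for the example, and real workflows would call dedicated, well-tested tools for each step rather than code like this.

```python
def quality_control(reads, min_mean_quality=20):
    """Step i: keep reads whose mean base quality passes a threshold."""
    return [
        (seq, quals) for seq, quals in reads
        if sum(quals) / len(quals) >= min_mean_quality
    ]


def map_read(seq, reference):
    """Step ii: return the reference offset with the fewest mismatches."""
    best_offset, best_mismatches = 0, len(seq) + 1
    for offset in range(len(reference) - len(seq) + 1):
        window = reference[offset:offset + len(seq)]
        mismatches = sum(a != b for a, b in zip(seq, window))
        if mismatches < best_mismatches:
            best_offset, best_mismatches = offset, mismatches
    return best_offset


def call_variants(seq, offset, reference):
    """Step iii: list (position, reference_base, read_base) differences."""
    return [
        (offset + i, reference[offset + i], base)
        for i, base in enumerate(seq)
        if base != reference[offset + i]
    ]


# Made-up toy data: one reference sequence and two reads with quality scores.
reference = "ACGTACGTTAGC"
reads = [
    ("GTTCGC", [30, 32, 31, 30, 29, 33]),  # good quality, one mismatch
    ("ACGTAC", [5, 4, 6, 5, 3, 4]),        # low quality, dropped by QC
]

variants = []
for seq, _quals in quality_control(reads):
    offset = map_read(seq, reference)
    variants.extend(call_variants(seq, offset, reference))
print(variants)  # prints [(9, 'A', 'C')]
```

The point is the shape of the workflow rather than the biology: each step has a clear input and output, so the whole chain can be shared, re-run, and checked by someone else.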

    -

    - -

    -

    -R -

    -


    -

    Lastly, OSS is also used in scientific research for reasons that more closely mirror those that drive the adoption of OSS in wider society, namely that it is cheap. For example, individuals or organizations might decide to switch from Microsoft Office to LibreOffice for manuscript writing or spreadsheet processing because the latter is free (both as in ‘free beer’ and ‘free speech’). Likewise, the choice to switch from ArcGIS to QGIS for the analysis of geographic information might be prompted simply by cost considerations.

    -
    -
    -

    Getting Started with OSS - FAQ

    -

    I’m using X (e.g. Matlab, STATA, Excel) and I want to transition to something more open. What are the next steps?

    -

    Even if you are using proprietary software, you can usually still share your source code/documents etc. The best first step is sharing whatever you can.

    -

    Great! I can put them in my new GitHub repo.

    -

    If that’s enough for you for now, great! If not, there are Open Source equivalents for most pieces of proprietary software. Have a go with one and see what you think.

    Closed | Open
    Matlab | Python, Julia
    STATA/SPSS | R
    MS Office | LibreOffice
    Mathematica | JupyterLab
    Test out your new Pull Request (PR) skills… | …by adding your own example here
    -

    Cool! But if I make the switch, will I be stuck taking ages to learn a new tool, without support, or with buggy software?

    -

    Good question! The answer is: it depends. The best thing to do is find someone who’s made the switch before and learn from their experience. Or just do a Google search! Some OSS is much better than its closed counterparts and some isn’t, so it’s worth choosing carefully.

    -
    -
    -

    Making good software for re-use

    -

    The most likely person who might want to re-use your software in the future is…you! So while sharing is always better than not sharing, you can make your own life, and that of others, much easier through appropriate documentation. Documentation can include several things, such as helpful comments and annotations in the code that explain why a particular action was performed, rather than just what the code does.
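Here is a small, made-up Python example of the difference between a comment that restates the code and one that explains the reasoning. The sensor, its units, and the calibration offset are all hypothetical, invented purely to give the comments something to talk about.

```python
def celsius_from_raw(raw_reading: int) -> float:
    # What-style comment (adds little): divide by 16 and subtract 2.
    # Why-style comment (more useful): the hypothetical sensor reports
    # sixteenths of a degree, and our unit reads 2 degrees too warm
    # according to a (made-up) calibration run, so we correct for both here.
    return raw_reading / 16 - 2.0


print(celsius_from_raw(400))  # prints 23.0
```

Six months from now, the second comment is the one that will save you from wondering where the magic numbers came from.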

    -

    One of the most critical aspects of this is including an informative README file, which accompanies almost every OSS project; sometimes there is even more than one. It can be good practice to include one such file in every directory, containing a list of files, a table of contents, and the purpose of the directory. The README file is typically just plain text or markdown (like all of the ones for this MOOC!), and can include critical information on how to install and run the software, its dependencies and requirements, as well as tutorials or examples.
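If writing a README for every directory sounds tedious, part of it can be drafted automatically. The sketch below is one possible approach, not a standard tool: the `build_readme` function and the example file names are assumptions made for this illustration. It produces a Markdown starting point with the directory's purpose and a list of its files, which you would then edit by hand.

```python
from pathlib import Path
import tempfile


def build_readme(directory: Path, purpose: str) -> str:
    """Return Markdown text stating a directory's purpose and listing its files."""
    lines = [f"# {directory.name}", "", purpose, "", "## Files", ""]
    for path in sorted(directory.iterdir()):
        if path.is_file() and path.name != "README.md":
            lines.append(f"- `{path.name}`")
    return "\n".join(lines) + "\n"


# Demonstrate on a throwaway directory so nothing real is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "analysis"
    project.mkdir()
    (project / "clean_data.py").write_text("# cleaning script\n")
    (project / "plot_results.py").write_text("# plotting script\n")
    readme_text = build_readme(project, "Scripts that clean the data and draw the figures.")
    (project / "README.md").write_text(readme_text)
    print(readme_text)
```

A generated file list is only a skeleton. The installation steps, dependencies, and examples still need a human to write them.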

    -
    -

    Did you know… The term README is sometimes playfully ascribed to the famous scene in Lewis Carroll’s Alice’s Adventures In Wonderland in which Alice confronts magic munchies labeled with “Eat Me” and “Drink Me”. Potent.

    -
    -

    The purpose here is to provide sufficient information to maximise the re-use and reproducibility of the computational environment, such that someone with no experience with the project can easily access and re-use the software (Sandve et al., 2013). By lowering the barriers to entry, you increase the chances of others being able to re-use your work, which is one of the ultimate goals of OSS (Ince et al., 2012).

    -

    An extension of this that can help to make things even easier for future re-use is ‘container’ technology. Containers are like an ecosystem frozen in time, where the code, the data, and any other dependencies are all perfectly preserved, packaged and saved in their present functioning versions. This means that, in the future, anyone can come in and run the analyses again. As such, they are generally good for re-use, but this can come at the sacrifice of modification or understanding by others, as often a lot of details can be hidden within the source code and its dependencies. Common examples of container implementation in research include Rocker (a Docker container for the R language), Binder, and Code Ocean.

    -

    Sustainable software is good software.

    -


    -
    -
    -

    10 simple rules for reproducible computational research

    -

    The 10 simple rules for making computational research more reproducible, based on Sandve et al. (2013), are:

    -
      -
    1. For every result, keep track of how it was produced.
    2. Avoid manual data manipulation steps.
    3. Archive the exact versions of all external programs used.
    4. Version control all custom scripts.
    5. Record all intermediate results, when possible in standardised formats.
    6. For analyses that include randomness, note underlying random seeds.
    7. Always store raw data behind plots.
    8. Generate hierarchical analysis output, allowing layers of increasing detail to be inspected.
    9. Connect textual statements to underlying results.
    10. Provide public access to scripts, runs, and results.
    -

    - -

    -

    -Infographic adapted from Sandve et al. (2013). Feel free to download this and print it out to keep handy during your research! -

    -


    -

    If you follow these steps, along with the processes in Task 1 and Task 2, you should be fine!
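A few of the rules above are easy to put into practice with only a handful of lines. The following Python sketch is one possible way to do it, assuming a toy analysis invented for this example: it fixes and records the random seed, notes the exact Python version and the command that was run, and writes everything out in a standard format alongside the result.

```python
import json
import platform
import random
import sys


def run_analysis(seed: int) -> dict:
    """Run a toy randomised analysis and return the result with its provenance."""
    rng = random.Random(seed)  # the seed is explicit, so the run can be repeated
    sample = [rng.gauss(0, 1) for _ in range(1000)]
    return {
        "result": {"mean": sum(sample) / len(sample)},
        "provenance": {
            "seed": seed,  # note the underlying random seed
            "python_version": platform.python_version(),  # exact version used
            "command": sys.argv,  # how the result was produced
        },
    }


record = run_analysis(seed=42)
print(json.dumps(record, indent=2))  # JSON is a standard, machine-readable format
```

Running the function twice with the same seed gives the same result, which is exactly what someone trying to reproduce your work needs.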

    -


    -
    -
    -

    Open Source licensing

    -

    An Open Source license is a type of license designed specifically for software and code that makes explicit the legal conditions for sharing and re-use. As mentioned above, the addition of a suitable license is what differentiates OSS from software that is merely shared publicly. For example, the widely used MATLAB is proprietary software, whereas GNU Octave is an openly licensed alternative programming language.

    -

    There are currently more than 1,400 unique Open Source licenses, and it can be difficult to understand how the legal implications differ from one license to another.

    -

    Some of the more common licenses include:

    - -

    You don’t need to know all the legal nitty-gritty behind all of these, but it is good to at least know what options are available to you.

    -

    There are two ways in which contributions to a project become licensed:

    -
      -
    1. Explicitly, whereby the individual contribution has a clearly indicated license independent of the main project; or
    2. Implicitly, whereby the contribution falls under the original license of the main project.
    -
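One common way of making a license explicit is to put a short, machine-readable tag at the top of each source file, using the `SPDX-License-Identifier:` convention. The Python sketch below shows how the two cases above might be told apart in practice. It is an illustration only and certainly not legal advice: the `declared_license` helper and the choice of project license are assumptions made for this example, and it only handles simple single-word identifiers.

```python
import re

# SPDX short identifiers (for example "MIT" or "GPL-3.0-or-later") are a
# widely used convention for declaring a license at the top of a source file.
SPDX_PATTERN = re.compile(r"SPDX-License-Identifier:\s*(\S+)")


def declared_license(source_text: str):
    """Return the SPDX identifier declared in the first lines of a file, or None."""
    for line in source_text.splitlines()[:5]:
        match = SPDX_PATTERN.search(line)
        if match:
            return match.group(1)
    return None


contribution = "# SPDX-License-Identifier: MIT\nprint('hello')\n"
unlabelled = "print('hello')\n"
project_license = "GPL-3.0-or-later"  # hypothetical license of the main project

# An explicit declaration wins; otherwise the project's license applies implicitly.
print(declared_license(contribution) or project_license)  # prints MIT
print(declared_license(unlabelled) or project_license)    # prints GPL-3.0-or-later
```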

    Thankfully, selecting an Open Source license is relatively straightforward with user-friendly tools such as Choose A License. Each of these licenses allows other users to use, copy, distribute, and build upon your work, often while ensuring that the creators are appropriately recognised for their work. The key is selecting an appropriate license for your work, depending on what you want, or do not want, others to do with it.

    -


    -
    -
    -

    Software citation

    -

    Citations provide one of the most important interactions in scholarly research, forming the basis of our referencing and metrics systems. Typically, a citation relies on a permanent unique identifier such as a Digital Object Identifier (DOI). A DOI is a persistent identifier, implemented in the Handle System, that follows a common standard and is widely used for identifying academic information. Such identification is critical for tracking the genealogy and provenance of research, for reproducibility, as well as for giving appropriate credit to those who have created the software. Importantly, software should be considered a legitimate output from scholarly research, and citation is becoming an increasingly common way to indicate that.

    -

    Smith et al. (2016) wrote a paper about the principles of software citation as part of the FORCE11 Software Citation Working Group. In the same way that you would want to cite software that you have used as part of good research practice, it is important to make your own software easily citable too. When citing any software used for your own research, you should include at minimum:

    -
      -
    • The author name(s),
    • -
    • Software title,
    • -
    • Version number, and
    • -
    • The unique identifier/locator (DOI or URL).
    • -
    -
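The four minimum elements listed above map neatly onto a small data structure. The Python sketch below is one way to keep them together and render a reference-list entry. The `SoftwareCitation` class and its output style are assumptions made for this example rather than an official citation format, and every value in it is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class SoftwareCitation:
    authors: list
    title: str
    version: str
    identifier: str  # a DOI or URL

    def formatted(self) -> str:
        """Render the four minimum elements as one reference-list entry."""
        names = ", ".join(self.authors)
        locator = self.identifier
        if locator.startswith("10."):  # bare DOI: turn it into a resolvable link
            locator = f"https://doi.org/{locator}"
        return f"{names}. {self.title} (version {self.version}). {locator}"


# All values below are placeholders, not a real software record.
citation = SoftwareCitation(
    authors=["J. Doe", "R. Roe"],
    title="example-analysis-toolkit",
    version="1.2.0",
    identifier="10.1234/example.5678",
)
print(citation.formatted())
```

Whatever style your journal asks for, the version number and the identifier are the two pieces most often forgotten, and they are the ones that let a reader find exactly what you used.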

    The six principles of software citation by Smith et al. (2016) are provided here:

    -
      -
    • Importance: Software should be considered a legitimate and citable product of research. Software citations should be accorded the same importance in the scholarly record as citations of other research products, such as publications and data; they should be included in the metadata of the citing work, for example in the reference list of a journal article, and should not be omitted or separated. Software should be cited on the same basis as any other research product such as a paper or a book, that is, authors should cite the appropriate set of software products just as they cite the appropriate set of papers.

    • -
    • Credit and attribution: Software citations should facilitate giving scholarly credit and normative, legal attribution to all contributors to the software, recognizing that a single style or mechanism of attribution may not be applicable to all software.

    • -
    • Unique identification: A software citation should include a method for identification that is machine actionable, globally unique, interoperable, and recognized by at least a community of the corresponding domain experts, and preferably by general public researchers.

    • -
    • Persistence: Unique identifiers and metadata describing the software and its disposition should persist - even beyond the lifespan of the software they describe.

    • -
    • Accessibility: Software citations should facilitate access to the software itself and to its associated metadata, documentation, data, and other materials necessary for both humans and machines to make informed use of the referenced software.

    • -
    • Specificity: Software citations should facilitate identification of, and access to, the specific version of software that was used. Software identification should be as specific as necessary, such as using version numbers, revision numbers, or variants such as platforms.

    • -
    -

    Note: For instructions on ‘how to make your software citable’ see the section Using GitHub and Zenodo below and Task 2: Linking GitHub and Zenodo.

    -


    -
    -
    -

    Using GitHub and Zenodo

    -

    GitHub is a popular tool for project management, content storage, and version control. Note that GitHub itself is not OSS. However, Git, the tool which it is based on, is. Git is designed to help manage the source code files, and the updates to them, for a software-related project. However, it can also be extended to other non-software projects; for example, this MOOC!

    -

    However, getting research onto GitHub is just the first step. It is equally important to make it persistent and re-usable, which is why having a Digital Object Identifier (DOI) associated with it can be useful. The simplest way to do this is through a service called Zenodo, which is a free and open source multi-disciplinary repository created by OpenAIRE and CERN, and can be used to assign a DOI to individual GitHub repositories. There is a GitHub Guide that explains the details, which involve linking GitHub repositories directly to Zenodo so that, when developers create formal releases of their software, Zenodo creates and archives a copy of that version of the software.

    -

    There’s nothing special about using Zenodo for creating DOIs, other than that it is free of charge; other general repositories can also be used, such as DataCite DOI Fabrica, or your own institutional repository, such as Caltech’s.

    -

    A lot of researchers might typically be afraid of sharing code which is incomplete, buggy, or imperfect. However, in the OSS community, such a practice of sharing ‘raw’ code is fairly commonplace. Sharing code openly enables others to re-use and improve it, as well as to engage in a deeper way with any research associated with it. This is one of the fundamental aspects of peer-collaboration, perhaps best exemplified by the traditional process of research manuscript peer review.

    -

    Task 2 will guide you through the process of linking a GitHub repository to Zenodo for archiving.

    -
    -

    Did you know… All content produced for this MOOC is available as part of a community in Zenodo?

    -
    -

    GO TO TASK 2: Linking GitHub and Zenodo

    -


    -
    -
    -

    Collaborating and contributing through Open Source

    -

    Often, OSS is developed in a public, decentralised, collaborative manner between multiple contributors. The purpose of this is to enhance the diversity and scope of a project and its design, in order to become more beneficial and sustainable. Such an approach was famously likened to a ‘bazaar’ model by Eric Raymond, an early OSS proponent. One of the major guiding principles of this is that of peer production, which relies on self-organised communities to regulate the development of content, co-ordinated towards a shared goal or outcome.

    -

    OSS projects rely heavily on volunteer collaboration, and often need a constant flux of newcomers in order to become productive and sustainable (Steinmacher et al., 2014). Creating the right social atmosphere for a project, and a welcoming engagement environment, is often critical to successful collaborations in OSS.

    -


    -
    -
    -

    Where to go from here

    -

    Hopefully you have now come to see the importance of software as a cornerstone of modern science, and the role that OSS plays in this.

    -

    The learning outcomes from this should be:

    -
      -
    1. You will now be able to define the characteristics of OSS, and some of the ethical, legal, economic and research impact arguments for and against it.
    2. Based on community standards, you will now be able to describe the quality requirements of sharing and re-using open code.
    3. You will now be able to use a range of research tools that utilise OSS.
    4. You will now be able to transform code designed for your personal use into code that is accessible and re-usable by others.
    5. Software developers will be able to make their software citable, and software users will know how to cite the software they use.
    -


    -

    BONUS TASK

    -

    If you have completed Task 1 and Task 2, we also have a BONUS TASK for you, if you want to take your skills a step further. Task 3 will take you a step deeper into integrating Git into a typical research workflow by showing you how to integrate it with RStudio. It is recommended that you complete the first two tasks before proceeding with this one.

    -

    However, your Open Source journey does not stop here! This was just the beginning, and there are some incredible resources out there if you would like to do or learn more:

    -
      -
    • If you feel particularly inspired by this, you can endorse the Science Code Manifesto, which is based on the five principles of code, copyright, citation, credit, and curation.
    • To launch and develop your own project, the Open Source Guides program offers a range of practical guides and skills to help launch and advance your OSS projects.
    • For a detailed look at OSS-based research workflows, the Open Science, Open Data, Open Source hand-guide by Pedro L. Fernandes and Rutger A. Vos is one of the top resources online.
    • More formalised journal venues also exist for software-based articles, including The Journal of Open Research Software and The Journal of Open Source Software. A list of such venues is also available.
    • The PLOS Open Source Toolkit provides a global forum for Open Source hardware and software research and applications.
    • NumFOCUS is a nonprofit organization that supports and promotes world-class, innovative, open source scientific software. Some of the projects they sponsor include:

      -
    • -
    -


    -
    -

    Further reading

    -

    These references are just the beginning. They include some of the most useful general overviews of the Open Source landscape in research. However, if you want to find something more specific to your own research field, then that path is there for you to explore!

    - -


    -
    -
    -

    Development Team

    - -

    Know a way this content can be improved?

    -

    Time to take your new GitHub skills for a test-run! All content development primarily happens here. If you have a suggested improvement to the content, layout, or anything else, you can make it and then it will automatically become part of the MOOC content after verification from a moderator!

    -

    CC0 Public Domain Dedication

    -
    -
    -
    - - - - -
    - - - - - + + + + +
    + + + + + + + + + + + + + + +
    +

    Module 5: Open Research Software and Open Source

    + +
    +

    Introduction

    +

    Welcome to Module 5 of the Open Science MOOC: Open Research Software and Open Source.

    +

    This module has been developed in the open through collaboration by an international team of Open Source aficionados, with interactive feedback from the wider community. It comprises a series of videos, infographics, text-based reading, and practical tasks for you to sink your teeth into.

    +

    Don’t forget you can join in the discussions over at our open Slack channel. Please do introduce yourself at #module5opensource, and tell us a bit about who you are, your background, and how you ended up here!

    +
    +

    Who is this module for?

    +

    This module is designed primarily for computational researchers at the graduate and undergraduate level, as well as budding data scientists and any other researcher who uses analytical code or software. In a modern-day research environment, this covers pretty much anyone who uses a computer for their work.

    +
    +

    “An article about computational result is advertising, not scholarship. The actual scholarship is the full software environment, code and data, that produced the result.” - J. Buckheit and D. L. Donoho, 1995.

    +
    +

    Software and technology underpin much of modern research, which is now almost inevitably computational in one way or another - search engines, social networking platforms, analytical software, and digital publishing. With this, there is an ever-increasing demand for more sophisticated Open Source Software, matched by an increasing willingness for researchers to openly collaborate on new tools.

    +

    The power of Open Source lies in lowering the barriers to collaboration and adoption, thereby allowing ideas and technology to spread more rapidly. This Module will introduce the tools required for transforming software into something that can be openly accessed and re-used by others.

    +

    +

    Image by Patrick Hochstenbach (CC0 1.0 Universal)

    +


    +
    +
    +

    Specific learning objectives for this Module:

    +
      +
    1. Learn the characteristics of open software; understand the ethical, legal, economic, and research impact arguments for and against Open Source Software, and further understand the quality requirements of open code.
    2. Be able to turn code made for personal use into open code which is accessible by others.
    3. Use software (tools) that utilizes open content and encourages wider collaboration.
    +


    +
    +
    +
    +

    What is Open Source Software

    +

    Virtually all modern scientific research workflows rely on a range of software tools, either operating on different datasets, with different parameters, and applied iteratively in various ways (data science) or operating on different inputs and using models and methods to predict some output state (computational science). Open Source Software (OSS) is computer software in which the full source code is available under a specific license that enables other users to access, view, modify, and redistribute that code for any purpose. Because OSS requires such a license, it typically remains free of charge by default. This explicit licensing is also what differentiates OSS from free software. Re-using OSS for analysis, simulation and visualisation for research is also typically easier and more flexible compared to proprietary software. Often, whether we know it or not, we are already using OSS as part of our own research workflows.

    +

    OSS fits into the broader scheme of Open Science as it helps to make the full research environment, including the software that produced the research results, fully accessible and re-usable. As such, it forms a necessary component of best practices (Jiménez et al., 2018) and of the repeatability and reproducibility of research (both personally and by others), along with other components, such as sharing data (Stodden, 2010).

    +

    In some cases, sharing of source code can even be a condition for the acceptance of associated research manuscripts (Shamir et al., 2013). It is also generally perceived to increase research impact (Vandewalle, 2012).

    +

    Some of the common advantages for developers include:

    +
      +
    • Increased developer loyalty and empowerment;
    • Lower costs of services and marketing;
    • Increased branding of services and products;
    • Production of high quality software at lower expense;
    • Flexibility and rapid innovation;
    • Customisation and modular integration;
    • Increased reliability and independence; and
    • Based on open standards available to everyone.
    +

    As such, the main advantages for researchers (users) include lower costs, increased transparency, increased security and stability, no vendor ‘lock in’ with increased user control, and overall higher quality. Furthermore, sharing OSS allows researchers to receive credit for their efforts, for example through direct software citation (Smith et al., 2016).

    +

    Commonly used examples of OSS include the Mozilla Firefox internet browser and the LibreOffice full office suite. LibreOffice is similar to the popular Microsoft Office, including a word processor, spreadsheet manager, and slide presentation software, but is completely free and Open Source.

    +

    Some regard the OSS movement as a counter-movement to neoliberalism and privatisation, through defiance of regulations and norms in the construction and re-use of information, and as a potential transformation of modern-day capitalism through making software abundantly available with minimal effort. See The free/open source software movement: Resistance or change? by Panayiota Georgopoulou for more on this topic.

    +


    +
    +
    +

    Principles of Open Source Software

    +

    The Open Source Initiative, one of the pioneers of OSS, offers the following definition:

    +

    Don’t worry, you don’t need to memorise all of this, but it’s good to know the principles that OSS is built on.

    +
      +
    • Free Redistribution: The license shall not restrict any party from selling or giving away the software as a component of an aggregate software distribution containing programs from several different sources. The license shall not require a royalty or other fee for such sale.
    • Source Code: The program must include source code, and must allow distribution in source code as well as compiled form. Where some form of a product is not distributed with source code, there must be a well-publicized means of obtaining the source code for no more than a reasonable reproduction cost, preferably downloading via the Internet without charge. The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor or translator are not allowed.
    • Derived Works: The license must allow modifications and derived works, and must allow them to be distributed under the same terms as the license of the original software.
    • Integrity of The Author’s Source Code: The license may restrict source-code from being distributed in modified form only if the license allows the distribution of “patch files” with the source code for the purpose of modifying the program at build time. The license must explicitly permit distribution of software built from modified source code. The license may require derived works to carry a different name or version number from the original software.
    • No Discrimination Against Persons or Groups: The license must not discriminate against any person or group of persons.
    • No Discrimination Against Fields of Endeavour: The license must not restrict anyone from making use of the program in a specific field of endeavour. For example, it may not restrict the program from being used in a business, or from being used for genetic research.
    • Distribution of License: The rights attached to the program must apply to all to whom the program is redistributed without the need for execution of an additional license by those parties.
    • License Must Not Be Specific to a Product: The rights attached to the program must not depend on the program’s being part of a particular software distribution. If the program is extracted from that distribution and used or distributed within the terms of the program’s license, all parties to whom the program is redistributed should have the same rights as those that are granted in conjunction with the original software distribution.
    • License Must Not Restrict Other Software: The license must not place restrictions on other software that is distributed along with the licensed software. For example, the license must not insist that all other programs distributed on the same medium must be open-source software.
    • License Must Be Technology-Neutral: No provision of the license may be predicated on any individual technology or style of interface.
    +

    Now, this all might be a little complex to remember. However, it can be summarised as making software as re-usable as possible for future works, while also being freely available.

    +


    +
    +
    +

    An Open Source checklist

    +

    There are a number of existing platforms and tools that support OSS and collaboration. The Open Science Training Handbook provides a check-list to use for evaluating the ‘openness’ of existing research software, based on the Open Source Definition above:

    +
      +
    • [ ] Is the software available to download and install?
    • [ ] Can the software easily be installed on different platforms?
    • [ ] Does the software have conditions on the use?
    • [ ] Is the source code available for inspection?
    • [ ] Is the full history of the source code available for inspection through a publicly available version history?
    • [ ] Are the dependencies of the software (hardware and software) described properly? Do these dependencies require only a reasonably minimal amount of effort to obtain and use?
    +

    Check, check, check, done! Simples.

    +


    +
    +
    +

    The Open Source community and its governance

    +

    There are two main camps within the free software community: the free software movement, and the OSS movement. Both have differing ideologies based on user liberties and the practical applications of software. Often, the term ‘FLOSS’ is used to reconcile these two political camps, and means ‘Free/Libre and Open Source Software’; Libre being French and Spanish for ‘free’ in the context of freedom.

    +

    The core principle of re-use is what separates OSS from ‘Free Software’. Free and Open Source Software (FOSS) is an inclusive term to describe software that can be classified as both free and Open Source. A good example of FOSS is the Ubuntu Linux operating system.

    +

    A frequently cited practical difference concerns ‘copyleft’: copyleft free software licenses, such as the GNU GPL, require updated versions to be distributed under the same license as the original, whereas newer versions of software under permissive Open Source licenses can be distributed under different licenses. FOSS combines the best of both worlds.

    +

    These definitions have now become widely adopted, both by governments internationally and by large organisations such as the Mozilla Foundation and the Wikimedia Foundation. Major organisations in the FLOSS space include the UK’s Software Sustainability Institute, which produces valuable resources such as its Software Deposit Guidance for Researchers.

    +
    +

    For individual projects

    +

    A typical open source project has the following types of formal roles:

    +
      +
    • Author: The person/s who created the project.
    • Owner: The person/s who has administrative ownership over the organization or repository.
    • Maintainers: Contributors who are responsible for driving the vision and managing the organizational aspects of the project. (They may also be authors or owners of the project.)
    • Contributors: Everyone who has contributed something to the project.
    • Community Members: People who use the project. They might be active in conversations, create new issues or express their opinion on future project improvements.
    +

    Typically, roles are made public through either the README file, a Contributors file, or a separate team page for the project.

    +


    +
    +
    +
    +

    Existing platforms and tools for Open Source Software

    +

    Virtual environments and machines are becoming increasingly popular as high-powered research workflow enablers, and many of these are built upon OSS (e.g., operating systems, programming languages, and data processing frameworks). Popular services include Google Cloud and Amazon Web Services, which also assist with database storage and content delivery, as well as computational power. InsideDNA is a computing platform for reproducible research in bioinformatics, genomics and the life sciences.

    +

    As mentioned above, LibreOffice provides an Open Source alternative to Microsoft Office. The two are almost completely compatible, just with different default file formats. For citation managers, Zotero is the most popular Open Source alternative to proprietary platforms such as Mendeley or EndNote.

    +

    Zotero can import and export the BibTeX (pronounced ‘bib-tech’) format, which is used with LaTeX (pronounced ‘lay-tech’), and has browser plugins to make citation management simple. By integrating this with other software such as LibreOffice, it is now possible to have a fully Open Source research workflow in many cases.

    +
    +

    GitHub

    +
    +

    Did you know that this entire project was built as an open and collaborative community effort on GitHub?

    +
    +

    GitHub is a popular hosting site for both software and non-software content (often called ‘notebooks’), with added capabilities for version control, project management and tracking, and storage services. GitHub is built on top of the OSS Git, which enables users to work remotely to maintain, share, and collaborate on research software and other non-software based projects.

    +

    Version control is essentially a process that takes snapshots of the files in a repository, and tracks modifications to them. It records when the changes were made, what they were, and who made them. If several people are working on one file at once, any overlapping changes are detected, and must be resolved prior to continuing. This provides a much more streamlined and automated process than manually saving and recording changes as projects develop. It also avoids the inevitable lists of confusingly named file versions…

    +

    + +

    +

    GitHub helps us to avoid, er, sub-optimal file naming conventions (source: XKCD)
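To make the idea of snapshots more concrete, here is a minimal sketch in Python. It is not how Git works internally; `Snapshot` and `TinyHistory` are hypothetical names invented for this illustration, and the sketch only tracks a single file in memory. It does show the three things a version control system records for each change: what changed (here, a hash of the content), who made the change, and when.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Snapshot:
    """One recorded state of a file: what changed, who did it, and when."""
    author: str
    message: str
    content_hash: str
    timestamp: datetime


@dataclass
class TinyHistory:
    """A toy, in-memory version history for a single file (hypothetical helper)."""
    snapshots: list = field(default_factory=list)

    def commit(self, content: str, author: str, message: str) -> bool:
        """Record a snapshot only if the content differs from the latest one."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if self.snapshots and self.snapshots[-1].content_hash == digest:
            return False  # nothing changed, so nothing to record
        self.snapshots.append(
            Snapshot(author, message, digest, datetime.now(timezone.utc))
        )
        return True


history = TinyHistory()
history.commit("x = 1\n", "alice", "Add first draft of analysis")
history.commit("x = 1\n", "bob", "No real change")  # ignored: same content
history.commit("x = 2\n", "bob", "Correct the starting value")

for snap in history.snapshots:
    print(snap.timestamp.isoformat(), snap.author, snap.message)
```

Only two snapshots are stored, because the second call did not change the content. A real system such as Git does far more (whole directories, branches, merging of overlapping changes), but the record-keeping principle is the same.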

    +


    +

    One of the more popular and useful functions of GitHub is the issue tracker, which is used to organise OSS development. The above link takes you to the issue tracker for the development of this module! If you think there is something here that can be improved, or that you want to comment on, you can add or contribute to an issue there, as can anyone else!

    +

    Other similar project hosting services include BitBucket, GitLab, and Launchpad. If the recent acquisition of GitHub by Microsoft is a bit off-putting to you, these are great alternatives.

    +

    However, we also know that GitHub can have quite a steep learning curve, which is why the first practical task for this MOOC will teach you how to set up your first GitHub project repository!

    +

    GO TO TASK 1: Building your first GitHub repository

    +


    +
    +
    +
    +

    Open Source Software used in research

    +

    Especially in scientific research, Open Source Software usage and development has become practically the norm. There are a number of reasons for this beyond those that apply to the general acceptance of OSS by, for example, consumers, industry, or government. Among these reasons are:

    +
      +
    • Increasingly, algorithms implemented in analysis software form an integral part of the methods described in scholarly publications. As such, it is completely at odds with rigorous peer review if these algorithm implementations are closed to outsiders.
    • Scientific collaboration more often than not spans multiple institutions and distributed research networks where secrecy and command hierarchy are not maintained in a way that is ‘necessary’ for closed source development.
    • Many computational analyses are run in virtualized environments (such as institutional, national, or international ‘cloud’ infrastructures) and hosted on multi-user servers. Closed-source, commercial software often disallows such usage.
    • OSS development often relies on volunteers. In a time of budgetary constraints for scientific research, this is a clear advantage.
    +

    For these and other reasons, Open Source tools are very commonly used in scientific research. This includes usage in fields where many researchers are amateur developers themselves and rely on tools such as R for statistical analysis and scripting, which, in the last decade, has almost completely displaced commercial software for statistical analysis such as SPSS or JMP in a lot of fields. In fields such as bioinformatics, that involve a lot of file handling of the outputs of DNA sequencing platforms, general purpose scripting languages such as Python and commonly used libraries built on top of it (such as biopython) have become a vital part of the toolkit of many researchers.

    +

    + +

    +

    Python

    +


    +

    Tools such as R and Python are essentially software for writing software. Although programming is an increasingly common activity among researchers, of course not every scientist does this. One step away from programming is the chaining together of the inputs and outputs of various analysis tools in longer workflows. As an example from genomics, a very common workflow is to start out with high-throughput sequencing reads and then i) do basic quality control checks; ii) map the reads against a reference genome; iii) identify the points where the new data are at variance with the reference. These steps are routinely executed as a workflow where a different Open Source executable is run in a Linux command-line environment for each of the three steps. Although this is arguably not quite open source software development, it does involve the usage and production of open source artifacts (such as Linux shell scripts) for which the principles that we discuss in this module are applicable.
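The three-step genomics workflow above can be sketched as a chain of small functions, where the output of each step becomes the input of the next. This is a toy illustration only: real workflows call dedicated Open Source executables for each step, whereas the function names, the tiny reference sequence, and the brute-force mapping used here are all assumptions made up for the example.

```python
def quality_control(reads, min_length=4):
    """Step i: keep only reads that are long enough and contain valid bases."""
    valid = set("ACGT")
    return [r for r in reads if len(r) >= min_length and set(r) <= valid]


def map_reads(reads, reference):
    """Step ii: place each read at the reference offset with the fewest mismatches."""
    mapped = []
    for read in reads:
        offsets = range(len(reference) - len(read) + 1)
        best = min(
            offsets,
            key=lambda o: sum(a != b for a, b in zip(read, reference[o:o + len(read)])),
        )
        mapped.append((best, read))
    return mapped


def call_variants(mapped, reference):
    """Step iii: report positions where a mapped read disagrees with the reference."""
    variants = set()
    for offset, read in mapped:
        for i, base in enumerate(read):
            if base != reference[offset + i]:
                variants.add((offset + i, reference[offset + i], base))
    return sorted(variants)


# Made-up data: two usable reads, one with invalid bases, one too short.
reference = "ACGTACGTTAGC"
reads = ["ACGTAC", "GTTCGC", "NNNN", "AC"]

clean = quality_control(reads)
mapped = map_reads(clean, reference)
variants = call_variants(mapped, reference)
print(variants)  # [(9, 'A', 'C')]
```

Each tuple in the result gives a zero-based position, the reference base, and the observed base. The sketch assumes every read is no longer than the reference. Keeping each step as a separate, named unit is what makes a workflow like this easy to share, inspect, and re-run.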

    +

    + +

    +

    R

    +


    +

    Lastly, OSS is also used in scientific research for reasons that more closely mirror those that drive the adoption of OSS in wider society, namely that it is cheap. For example, individuals or organizations might decide to switch from Microsoft Office to LibreOffice for manuscript writing or spreadsheet processing because the latter is free (both as in ‘free beer’ and ‘free speech’). Likewise, the choice to switch from ArcGIS to QGIS for the analysis of geographic information might be prompted simply by cost considerations.

    +
    +
    +

    Getting Started with OSS - FAQ

    +

    I’m using X [e.g. MATLAB, Stata, Excel] and I want to transition to something more open. What are the next steps?

    +

    Even if you are using proprietary software, you can usually still share your source code, documents, etc. The best first step is sharing whatever you can.

    +

    Great! I can put them in my new GitHub repo.

    +

    If that’s enough for you for now, great! If not, for most pieces of proprietary software there are Open Source equivalents. Have a go with one and see what you think.

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    Closed | Open
    MATLAB | Python, Julia
    Stata/SPSS | R
    MS Office | LibreOffice
    Mathematica | JupyterLab
    Test out your new Pull Request (PR) skills by adding your own example here
    +

    Cool! But if I make the switch, will I be stuck taking ages to learn a new tool, without support, or with buggy software?

    +

    Good question! The answer is that it depends. The best thing to do is find someone who’s made the switch before and learn from their experience. Or just do a Google search! Some OSS tools are much better than their closed counterparts and some aren’t, so it’s worth choosing carefully.

    +
    +
    +

    Making good software for re-use

    +

    The most likely person who might want to re-use your software in the future is…you! So while sharing is always better than not sharing, you can make your own life, and that of others, much easier through appropriate documentation. Documentation can include several things, such as helpful comments and annotations in the code that explain why a particular action was performed, rather than only restating what the code does.
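As a small illustration of comments that explain the ‘why’, consider the sketch below. The function and the instrument behaviour it describes are hypothetical, invented only for this example; the point is that each comment records a reason that a future reader could not work out from the code alone.

```python
def mean_signal(values):
    """Return the mean of the valid measurements in `values`."""
    # Why: the (hypothetical) instrument writes -1 when a reading fails,
    # so those entries are placeholders, not real measurements.
    valid = [v for v in values if v != -1]

    # Why: returning None for an empty run makes missing data explicit
    # instead of crashing with a ZeroDivisionError further down.
    if not valid:
        return None
    return sum(valid) / len(valid)


print(mean_signal([2.0, -1, 4.0]))  # 3.0
```

A comment such as "filter the list" would add nothing here, because the code already says that. The reason for filtering is the part that is easy to forget six months later.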

    +

    One of the most critical aspects of this is an informative README file, which accompanies almost every OSS project; some projects have more than one. It can be good practice to include one such file in every directory, containing a list of files, a table of contents, and the purpose of the directory. The README file is typically just plain text or markdown (again, like all of the ones for the MOOC!), and can include critical information on how to install and run the software, dependencies and requirements, as well as tutorials or examples.
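If you would like a starting point for per-directory README files, the sketch below generates a skeleton that states the purpose of a directory and lists its contents. The helper names `build_readme` and `write_readme`, the `README.md` file name, and the section layout are assumptions chosen for this example, not a community standard; you would still need to add installation instructions, dependencies, and examples by hand.

```python
from pathlib import Path


def build_readme(directory, purpose):
    """Return plain-text README content describing `directory`."""
    root = Path(directory)
    # List every entry except the README itself, in a stable order.
    names = sorted(p.name for p in root.iterdir() if p.name != "README.md")
    lines = [f"# {root.name}", "", purpose, "", "## Contents", ""]
    lines += [f"- {name}" for name in names]
    return "\n".join(lines) + "\n"


def write_readme(directory, purpose):
    """Write README.md into `directory` and return its path."""
    target = Path(directory) / "README.md"
    target.write_text(build_readme(directory, purpose), encoding="utf-8")
    return target
```

For example, calling `write_readme("analysis", "Scripts that produce the figures.")` on an existing `analysis` directory would create `analysis/README.md` with a heading, the purpose line, and a bulleted list of the files found there.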

    +
    +

    Did you know… The term README is sometimes playfully ascribed to the famous scene in Lewis Carroll’s Alice’s Adventures In Wonderland in which Alice confronts magic munchies labeled with “Eat Me” and “Drink Me”. Potent.

    +
    +

    The purpose here is to provide sufficient information to maximise the re-use and reproducibility of the computational environment, such that someone with no experience with the project can easily access and re-use the software (Sandve et al., 2013). By lowering the barriers to entry, you increase the chances of others being able to re-use your work, which is one of the ultimate goals of OSS (Ince et al., 2012).

    +

    An extension of this that can help to make things even easier for future re-use is ‘container’ technology. Containers are like an ecosystem frozen in time, where the code, the data, and any other dependencies are all perfectly preserved, packaged and saved in their present functioning versions. This means that anyone can come in at a later date and run the analyses again. As such, they are generally good for re-use, but this can come at the sacrifice of modification or understanding by others, as often a lot of details can be hidden within the source code and its dependencies. Common examples of container implementation in research include Rocker (a Docker container for the R language), Binder, and Code Ocean.

    +

    Sustainable software is good software.

    +


    +
    +
    +

    10 simple rules for reproducible computational research

    +

    The 10 simple rules for making computational research more reproducible, based on Sandve et al. (2013), are:

    +
      +
    1. For every result, keep track of how it was produced.
    2. Avoid manual data manipulation steps.
    3. Archive the exact versions of all external programs used.
    4. Version control all custom scripts.
    5. Record all intermediate results, when possible in standardised formats.
    6. For analyses that include randomness, note underlying random seeds.
    7. Always store raw data behind plots.
    8. Generate hierarchical analysis output, allowing layers of increasing detail to be inspected.
    9. Connect textual statements to underlying results.
    10. Provide public access to scripts, runs, and results.
    +
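Several of these rules can be followed with very little code. The sketch below is one possible approach, not a prescribed one: it runs a stand-in analysis with a fixed random seed and stores the result together with the seed, the Python version, and the command that produced it, in JSON. The function names, the choice of recorded fields, and the use of JSON are assumptions made for illustration.

```python
import json
import platform
import random
import sys
from datetime import datetime, timezone


def run_analysis(seed):
    """A stand-in analysis with a random component (hypothetical example)."""
    rng = random.Random(seed)  # the seed fully determines the random draws
    sample = [rng.gauss(0.0, 1.0) for _ in range(100)]
    return sum(sample) / len(sample)


def run_with_provenance(seed):
    """Run the analysis and return the result together with how it was produced."""
    return {
        "result": run_analysis(seed),
        "seed": seed,                                 # note the random seed
        "python_version": platform.python_version(),  # record the software version
        "command": " ".join(sys.argv),                # how the result was produced
        "run_at": datetime.now(timezone.utc).isoformat(),
    }


record = run_with_provenance(seed=42)
print(json.dumps(record, indent=2))  # a standardised, human-readable format
```

Running the analysis twice with the same seed gives the same result, which is exactly what makes it possible for someone else to check your numbers. In a real project you would also record the versions of the external libraries you depend on.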

    + +

    +

    Infographic adapted from Sandve et al. (2013). Feel free to download this and print it out to keep handy during your research!

    +


    +

    If you follow these steps, along with the processes in Task 1 and Task 2, you should be fine!

    +


    +
    +
    +

    Open Source licensing

    +

    An Open Source license is a type of license designed specifically for software and code that makes explicit the legal conditions for sharing and re-use. As mentioned above, the addition of a suitable license is what differentiates publicly shared software from OSS. For example, the widely used MATLAB is proprietary software, whereas GNU Octave is an openly licensed alternative programming language.

    +

    There are currently more than 1,400 unique Open Source licenses, and this sheer number makes it difficult to understand how the legal implications differ from one license to another.

    +

    Some of the more common licenses include:

    + +

    You don’t need to know all the legal nitty-gritty behind all of these, but it is good to at least know what options are available to you.

    +

    There are two ways in which contributions to a project become licensed:

    +
      +
    1. Explicitly, whereby the individual contribution has a clearly indicated license independent of the main project; or
    2. Implicitly, whereby the contribution falls under the original license of the main project.
    +

    Thankfully, the process of selecting an Open Source license is relatively simple, thanks to user-friendly tools such as Choose A License. Each of these licenses allows other users to use, copy, distribute, and build upon your work, often while ensuring that the creators are appropriately recognised for their work. Here, the key is selecting an appropriate license for your work, depending on what you want, or do not want, others to do with it.

    +


    +
    +
    +

    Software citation

    +

    Citations provide one of the most important interactions in scholarly research, forming the basis of our referencing and metrics systems. Typically, this is performed with the help of a permanent unique identifier such as a Digital Object Identifier (DOI). A DOI is a persistent identifier, implemented in the Handle System, that meets a common standard, depending on the purpose, such as for identifying academic information. Such identification is critical for tracking the genealogy and provenance of research, for reproducibility, as well as for giving appropriate credit to those who have created the software. Importantly, software should be considered a legitimate output from scholarly research, and citation is becoming an increasingly common way to indicate that.

    +

    Smith + et al. (2016) wrote a research paper about the principles of software citation as part of the FORCE11 + Software Citation Working Group. In the same way that you would want to cite software that you have used as + part of good research practices, it is important to make your research easily citable too. When citing any + software used for your own research, you should include at minimum:

    +
      +
    • The author name(s),
    • +
    • Software title,
    • +
    • Version number, and
    • +
    • The unique identifier/locator (DOI or URL).
    • +
    +
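    The four minimum fields listed above can be assembled into a reference string with a few lines of code. The following is only a minimal sketch: the `SoftwareCitation` class, its output layout, and all example values (author names, title, version, and the placeholder DOI) are hypothetical and not taken from Smith et al. (2016) or any particular citation style.

```python
from dataclasses import dataclass


@dataclass
class SoftwareCitation:
    """Minimum citation fields; this class and its layout are illustrative assumptions."""
    authors: list[str]
    title: str
    version: str
    identifier: str  # DOI or URL

    def format(self) -> str:
        if not self.authors:
            raise ValueError("at least one author name is required")
        # Join all but the last author with commas, then add "and" before the last.
        if len(self.authors) > 1:
            names = ", ".join(self.authors[:-1]) + " and " + self.authors[-1]
        else:
            names = self.authors[0]
        return f"{names}. {self.title} (version {self.version}). {self.identifier}"


# Hypothetical example values, including a placeholder DOI.
citation = SoftwareCitation(
    authors=["A. Researcher", "B. Developer"],
    title="example-analysis-tool",
    version="1.2.0",
    identifier="https://doi.org/10.5281/zenodo.0000000",
)
print(citation.format())
```

    Keeping the version as its own field, rather than folding it into the title, makes it harder to forget the exact release that was used.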

    The six principles of software citation by Smith + et al. (2016) are provided here:

    +
      +
    • +

      Importance: Software should be considered a legitimate and citable product of research. + Software citations should be accorded the same importance in the scholarly record as citations of other + research products, such as publications and data; they should be included in the metadata of the citing + work, for example in the reference list of a journal article, and should not be omitted or separated. + Software should be cited on the same basis as any other research product such as a paper or a book, that + is, authors should cite the appropriate set of software products just as they cite the appropriate set of + papers.

      +
    • +
    • +

      Credit and attribution: Software citations should facilitate giving scholarly credit and + normative, legal attribution to all contributors to the software, recognizing that a single style or + mechanism of attribution may not be applicable to all software.

      +
    • +
    • +

      Unique identification: A software citation should include a method for identification + that is machine actionable, globally unique, interoperable, and recognized by at least a community of the + corresponding domain experts, and preferably by general public researchers.

      +
    • +
    • +

      Persistence: Unique identifiers and metadata describing the software and its disposition + should persist - even beyond the lifespan of the software they describe.

      +
    • +
    • +

      Accessibility: Software citations should facilitate access to the software itself and to + its associated metadata, documentation, data, and other materials necessary for both humans and machines + to make informed use of the referenced software.

      +
    • +
    • +

      Specificity: Software citations should facilitate identification of, and access to, the + specific version of software that was used. Software identification should be as specific as necessary, + such as using version numbers, revision numbers, or variants such as platforms.

      +
    • +
    +

    Note: For instructions on ‘how to make your software citable’ see the section Using GitHub and Zenodo below and Task + 2: Linking GitHub and Zenodo.

    +


    +
    +
    +

    Using GitHub and Zenodo

    +

    GitHub is a popular tool for project management, content storage, and version control. + Note that GitHub itself is not OSS. However, Git, the tool which it is based on, is. Git is designed to help + manage the source code files, and the updates to them, for a software-related project. However, it can also be + extended to other non-software projects; for example, this MOOC!

    +

    However, getting research onto GitHub is just the first step. It is equally important to make it persistent + and re-usable, which is why having a Digital Object Identifier (DOI) associated with it can be useful. The + simplest way to do this is through a service called Zenodo, which is a free + and open source multi-disciplinary repository created by OpenAIRE and CERN, and can be used to assign a DOI to + individual GitHub repositories. There is a GitHub + Guide that explains the details, which involve linking GitHub repositories directly through to Zenodo so + that when developers create formal releases for their software, Zenodo creates and archives that version of + the software.

    +

    There’s nothing special about using Zenodo for creating DOIs, other than that it is free of cost; + other general repositories can also be used, such as DataCite DOI + Fabrica, or your own institutional repositories such as Caltech’s. +

    +
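    Once a repository release has a DOI, readers reach the archived version through the public resolver at `https://doi.org/`. As a small sketch, the helper below normalises a DOI that may be written in several common ways and returns its resolver URL. The function name `doi_to_url` and the placeholder DOI are hypothetical, and the regular expression is only a loose plausibility check (an assumption), not a full validation of the DOI syntax.

```python
import re

# Loose syntactic check (an assumption, not the full DOI specification):
# "10.", a numeric registrant code, "/", then a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the doi.org resolver URL, accepting 'doi:' or resolver-URL prefixes."""
    cleaned = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"not a plausible DOI: {doi!r}")
    return f"https://doi.org/{cleaned}"


# Placeholder DOI for illustration only; it does not point to a real record.
print(doi_to_url("doi:10.5281/zenodo.0000000"))
```

    Storing the bare DOI and generating the URL on demand avoids mixing several spellings of the same identifier in a project's documentation.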

    A lot of researchers might typically be afraid of sharing code which is incomplete, buggy, or imperfect. + However, in the OSS community, such a practice of sharing ‘raw’ code is fairly commonplace. Sharing code + openly enables others to re-use and improve it, as well as to engage in a deeper way with any research + associated with it. This is one of the fundamental aspects of peer-collaboration, perhaps best exemplified by + the traditional process of research manuscript peer review.

    +

    Task 2 will guide you through the process of linking a GitHub repository to Zenodo for archiving.

    +
    +

    Did you know… All content produced for this MOOC is available as part of a community in Zenodo?

    +
    +

    GO + TO TASK 2: Linking GitHub and Zenodo

    +


    +
    +
    +

    Collaborating and contributing through Open Source

    +

    Often, OSS is developed in a public, decentralised, collaborative manner between multiple contributors. The + purpose of this is to enhance the diversity and scope of a project and its design, in order to become more + beneficial and sustainable. Such an approach was famously likened to a ‘bazaar’ model by Eric Raymond, an + early OSS proponent. One of the major guiding principles of this is that of peer production, + which relies on self-organised communities to regulate the development of content, co-ordinated towards a + shared goal or outcome.

    +

    OSS projects rely heavily on volunteer collaboration, which often entails a constant flux of newcomers in + order to become productive and sustainable (Steinmacher + et al., 2014). Creating the right social atmosphere for a project, and a welcoming engagement + environment, are often critical to successful collaborations in OSS.

    +


    +
    +
    +

    Where to go from here

    +

    Hopefully now you have come to see the importance of software as a cornerstone of modern science, and the + role that OSS plays in this.

    +

    The learning outcomes from this module should be:

    +
      +
    1. +

      You will now be able to define the characteristics of OSS, and some of the ethical, legal, economic and + research impact arguments for and against it.

      +
    2. +
    3. +

      Based on community standards, you will now be able to describe the quality requirements of sharing and + re-using open code.

      +
    4. +
    5. +

      You will now be able to use a range of research tools that utilise OSS.

      +
    6. +
    7. +

      You will now be able to transform code designed for your personal use into code that is accessible and + re-usable by others.

      +
    8. +
    9. +

      Software developers will be able to make their software citable, and software users will know how to cite + the software they use.

      +
    10. +
    +


    +

    BONUS TASK

    +

    If you have completed Task + 1 and Task + 2, we also have a BONUS TASK for you, if you want to take your skills a step further. + Task + 3 will take you a step deeper into integrating Git into a typical research workflow by showing you how + to integrate it with RStudio. It is recommended that you have completed the first two tasks before proceeding + with this one.

    +

    However, your Open Source journey does not stop here! This was just the beginning, and there are some + incredible resources out there if you would like to do or learn more:

    +
      +
    • +

      If you feel particularly inspired by this, you can endorse the Science Code Manifesto, which is based on the five + principles of code, copyright, citation, credit, and curation.

      +
    • +
    • +

      To launch and develop your own project, the Open Source Guides + program offers a range of practical guides and skills to help launch and advance your OSS projects.

      +
    • +
    • +

      For a detailed look at OSS-based research workflows, the Open Science, Open Data, Open Source hand-guide by + Pedro L. Fernandes and Rutger A. Vos is one of the top resources online.

      +
    • +
    • +

      More formalised journal venues also exist for software-based articles, including The Journal of Open Research Software and The Journal of Open Source Software. A list of such venues is also available.

      +
    • +
    • +

      The PLOS Open Source Toolkit provides a + global forum for Open Source hardware and software research and applications.

      +
    • +
    • +

      NumFOCUS is a nonprofit organization that supports and promotes + world-class, innovative, open source scientific software. Some of the projects they sponsor include:

      + +
    • +
    +


    +
    +

    Further reading

    +

    These references are just the beginning. They include some of the most useful general overviews of + the Open Source landscape in research. However, if you want to find something more specific to your own + research field, then that path is there for you to explore!

    + +


    +
    +
    +

    Development Team

    + +

    Know a way this content can be improved?

    +

    Time to take your new GitHub skills for a test-run! All content development primarily happens here. + If you have a suggested improvement to the content, layout, or anything else, you can make it and then it + will automatically become part of the MOOC content after verification from a moderator!

    +

    CC0 Public Domain Dedication

    +
    +
    +
    + + + + +
    + + + + + - + + \ No newline at end of file diff --git a/content_development/Task_1.html b/content_development/Task_1.html index 4116c52..3cbdfdc 100644 --- a/content_development/Task_1.html +++ b/content_development/Task_1.html @@ -4,637 +4,9144 @@ - - - + + + -Task_1.utf8.md + Task_1.utf8.md - - - - - + + + + - - + + + + + + + + - // Change the URL when tabs are clicked - $('a', context).on('click', function(e) { - history.pushState(null, null, this.href); - showStuffFromHash(context); - }); - return this; - }; -}(jQuery)); - -window.buildTabsets = function(tocID) { - - // build a tabset from a section div with the .tabset class - function buildTabset(tabset) { - - // check for fade and pills options - var fade = tabset.hasClass("tabset-fade"); - var pills = tabset.hasClass("tabset-pills"); - var navClass = pills ? "nav-pills" : "nav-tabs"; - - // determine the heading level of the tabset and tabs - var match = tabset.attr('class').match(/level(\d) /); - if (match === null) - return; - var tabsetLevel = Number(match[1]); - var tabLevel = tabsetLevel + 1; - - // find all subheadings immediately below - var tabs = tabset.find("div.section.level" + tabLevel); - if (!tabs.length) - return; - - // create tablist and tab-content elements - var tabList = $(''); - $(tabs[0]).before(tabList); - var tabContent = $('
    '); - $(tabs[0]).before(tabContent); - - // build the tabset - var activeTab = 0; - tabs.each(function(i) { - - // get the tab div - var tab = $(tabs[i]); - - // get the id then sanitize it for use with bootstrap tabs - var id = tab.attr('id'); - - // see if this is marked as the active tab - if (tab.hasClass('active')) - activeTab = i; - - // remove any table of contents entries associated with - // this ID (since we'll be removing the heading element) - $("div#" + tocID + " li a[href='#" + id + "']").parent().remove(); - - // sanitize the id for use with bootstrap tabs - id = id.replace(/[.\/?&!#<>]/g, '').replace(/\s/g, '_'); - tab.attr('id', id); - - // get the heading element within it, grab it's text, then remove it - var heading = tab.find('h' + tabLevel + ':first'); - var headingText = heading.html(); - heading.remove(); - - // build and append the tab list item - var a = $('' + headingText + ''); - a.attr('href', '#' + id); - a.attr('aria-controls', id); - var li = $('
  • '); - li.append(a); - tabList.append(li); - - // set it's attributes - tab.attr('role', 'tabpanel'); - tab.addClass('tab-pane'); - tab.addClass('tabbed-pane'); - if (fade) - tab.addClass('fade'); - - // move it into the tab content div - tab.detach().appendTo(tabContent); - }); - // set active tab - $(tabList.children('li')[activeTab]).addClass('active'); - var active = $(tabContent.children('div.section')[activeTab]); - active.addClass('active'); - if (fade) - active.addClass('in'); - - if (tabset.hasClass("tabset-sticky")) - tabset.rmarkdownStickyTabs(); - } - - // convert section divs with the .tabset class to tabsets - var tabsets = $("div.section.tabset"); - tabsets.each(function(i) { - buildTabset($(tabsets[i])); - }); -}; - - - - - - - - - - - - + - - - - -
    - - - - - - - - - - - - - - -
    -

    Task 1: How to set up a repository on GitHub

    -

    This task is designed for students and researchers who want to create their first Open Source project (software or non-software) on GitHub. GitHub is a place for you to come and play and experiment with new research workflows, and is really just the beginning to help set the stage for your own pathways and ideas.

    -

    Don’t forget you can join in the discussions over at our open Slack channel. Please do introduce yourself at #module5opensource, and tell us a bit about who you are, your background, and how you ended up here!

    -

    PLEASE NOTE that a screen recording for this task is also available via YouTube.

    -

    Estimated time to complete: 30-45 minutes.

    -

    Estimate time saving once complete: Unimaginable..

    - -
    -

    Getting started

    -

    A ‘repository’ is really just a fancy name for a project on GitHub. GitHub is a place online where you can manage projects, store files, and openly collaborate with others. This is all achieved by using version control to track projects as they progress. As such, GitHub is a powerful tool for both software and non-software projects.

    -

    One of the most important things to consider at this early stage is to think about how you want the wider community to interact with your project. As you are working in the open, you want to make sure others feel comfortable in accessing, viewing, and engaging with your work. Setting up a repository in a way that lowers the barriers to entry, and the fear of being an ‘outsider’ is the first step towards maintaining a successful project.

    -

    - -

    -

    -Octocat, GitHub’s little mascot -

    -


    -
    -

    Setting up a GitHub profile

    -

    To set up a GitHub profile, simply head to the main page and click Sign Up for GitHub. Here, you can create your personal account, with a username, email, and password as standard.

    -

    - -

    -

    -Sign up for GitHub -

    -


    -

    The next step is to set up a personal plan. For now, simply select the ‘Unlimited public repositories for free’ plan, unless you are concerned about privacy, in which case select the private plan. If you intend to set up a project for an organisation, you can select that option too.

    -


    -
    -
    -

    The GitHub language

    -

    This is possibly the most confusing and off-putting aspect of GitHub. Here are some of the most commonly used terms and their definitions:

    -
      -
    • Initialise: Create an empty repository.
    • -
    • Checkout: Create a working copy of a local repository.
    • -
    • Clone: Copy the repository into a local directory on your computer.
    • -
    • Fork: Create a personal offshoot of a repository to work on it in parallel.
    • -
    • Branch: An independent and parallel version of a repository. Changes do not affect the master branch.
    • -
    • Master: The main and default branch for a repository.
    • -
    • Clean: No commits pending on the branch.
    • -
    • Stage: Add updates ready to be committed to a branch.
    • -
    • Commit: A revision to a repository, like a versioned ‘save’ function.
    • -
    • Commit message: A description of changes accompanying a commit.
    • -
    • Check: A status check.
    • -
    • Fetch: Nothing to do with dogs. Refers to getting the latest changes from an online repository without merging them.
    • -
    • Index: The ‘tree’ which acts as a staging area.
    • -
    • Working Directory: The ‘tree’ where the files are kept.
    • -
    • Head: The ‘tree’ which indicates the last commits made.
    • -
    • Push: Add committed changes to the head of your remote repository.
    • -
    • Merge: Combining the changes made in one branch back with the master branch upon completion.
    • -
    • Pull: Update your repository by fetching and merging the newest commits.
    • -
    • Pull request: A request to merge an updated branch into the master branch.
    • -
    • Issue: Suggested improvements, tasks, or questions related to a repository.
    • -
    -

    Whew! Don’t worry about memorising all of these for now. Like any new skill, familiarity comes with experience.

    -

    You can probably see how some of these are fairly similar to things like save, copy, paste - standard workflow operations, but adapted for a software management process. There are a few more too, but this should do for getting started.

    -

    If you are interested, most of these terms come from the underlying Git system. Git was built to allow developers to manage different versions of source code in a distributed manner, which is great. It has lots of features and the ability to do lots of complex stuff, written by a very clever guy. However, the user interface was not designed with new users in mind, so it can be hard to learn.

    -

    - -

    -

    -Unbeatable guide to using Git. (Source: XKCD) -

    -


    -
    -
    -

    Creating a new repository

    -

    On your GitHub profile, click the ‘Create new repository’. The first step is to create a name as the brand for your project. Ideally, it should be memorable and give some indication of what the project does.

    -

    - -

    -

    -Create a new repository -

    -


    -

    Make sure not to duplicate names, infringe upon other trademarks, or name it anything that could be considered to be offensive.

    -


    -
    -
    -
    -

    The foundational steps

    -

    Any GitHub repository requires 4 key elements to get started and to begin developing a welcoming community:

    -
      -
    1. An Open Source license;
    2. -
    3. A README file;
    4. -
    5. Contributing guidelines; and
    6. -
    7. A Code of Conduct.
    8. -
    -

    These are critical aspects and best practices of any project for users to understand their legal rights, their expectations, the purpose of the project, and to improve the overall user experience.

    -

    All four of these files should be kept in the root directory for your project repository. It is convention to use markdown file formats (.md) for most of these files (though the license file is most often plain text (.txt)), and capitalise all file names. Instead of spaces in file names, make sure to use underscores _ .

    -

    So you should end up with a foundational file selection like this:

    -
      -
    1. LICENSE.md
    2. -
    3. README.md
    4. -
    5. CONTRIBUTING.md
    6. -
    7. CODE_OF_CONDUCT.md
    8. -
    -

    - -

    -

    -The basic repository structure -

    -


    -
    -

    Choosing a license

    -

    Choosing an appropriate license is what will differentiate your Open Source repository from publicly available software. While you are not obliged to choose a license, doing so guarantees that others will be able to modify, share, re-use, and build upon your project within a legal framework.

    -

    To start with, you want to check Choose A License to find a license that best suits your intentions for the repository.

    -

    The three primary ones to choose from are:

    -
      -
    • MIT License: A permissive license that lets people do whatever they want with your code as long as they provide appropriate attribution to you, and do not hold you liable.
    • -
    • Apache License 2.0: Similar permissions to the MIT License, but also provides an express grant of patent rights from contributors to users.
    • -
    • GNU General Public License (GPL) v3: A copyleft license that requires anyone who redistributes your code, or a derivative work, to make the source available under the same terms as the original license; also provides an express grant of patent rights from contributors to users.
    • -
    -

    Thankfully, when you start a new repository on GitHub, you are given the option to select an existing license from a drop-down menu. You should always (with very few exceptions) use an existing license, since this is what potential users and contributors will see before they choose to use or contribute to your software.

    -

    - -

    -

    -Choosing an example license -

    -


    -

    If they don’t have one you want, you can add one you like manually. To do this, simply click ‘Create new file’ in the repository, and copy and paste an existing license text in. Name the file something like LICENSE.txt or LICENSE.md to make it clear, and keep it in the main repository folder (i.e., the root). Make sure to add a clean commit message, and you’re done!

    -
    -

    Helping hand: This MOOC uses a different combination of licenses for code content and non-code content. Here you can find an example of the MIT License that we apply for all code and software generated as part of the MOOC production.

    -
    -


    -
    -
    -

    Creating a README file

    -

    When you initialise your new repository, there should be an option to do so with a README file. Just like Alice in Wonderland, these do exactly what they say - provide key information about the project. These are typically the first thing outside contributors will see when they come to your repository, so making them informative and welcoming is key.

    -

    - -

    -

    -Part of the README file for this module -

    -


    -

    The file will originally be in markdown (.md) format. This is a lightweight markup language with a plain text format. To learn some basic markdown, see this cheatsheet. But for now, we can just use plain text.

    -

    There are several things you will want to include in your README file:

    -
      -
    • What is this project about and what does it do.
    • -
    • Why should people care, and why is it useful.
    • -
    • How can someone get started contributing to the project.
    • -
    • Who can be contacted in case someone needs help.
    • -
    • A link to the license, contributing guidelines, and code of conduct.
    • -
    • A description of the project structure.
    • -
    • Who is involved, and what are their roles.
    • -
    • The current status of the project.
    • -
    -
    -

    Pro-tip: Later on as your project develops, you might want to add FAQs based on community feedback, or a tutorial to help users understand how your project works.

    -
    -

    Remember that not everyone coming to your project will be an expert, or understand what it is you are doing and why. Having a well-documented README file will enhance the user experience for people with a range of prior knowledge.

    -

    When the README file is included in the root directory, GitHub will automatically display this on the homepage of your repository. This means it is the first thing people will often see, so make it count!

    -
    -

    Helping hand: Here, you can find the README file used for this MOOC module. This includes information on the status, rationale, learning outcomes, development team, key documents, and license to help. You can copy and adapt this structure for your own projects as needed.

    -
    -


    -
    -
    -

    Creating contributing guidelines

    -

    Contributing guidelines are designed to communicate to potential contributors a short guide on how to engage with your project and community. You want to make sure to be welcoming, and indicate that you are eager for participants to engage with your project. Whenever a participant opens a new pull request or creates a new issue, they will see a link to your contribution file.

    -

    - -

    -

    -Part of the CONTRIBUTING guidelines for this module -

    -


    -

    Sticking with the all caps file names, the next step is to create a CONTRIBUTING file. Click ‘Create new file’, and make sure to save it in markdown format as before. This file will tell other users how they can engage with and participate in your project. This is the first step towards establishing a community around your project, so make it engaging, concise, and informative.

    -

    The CONTRIBUTING file should include information on:

    -
      -
    • What sort of contributions you are looking for.
    • -
    • How to suggest updates or new features.
    • -
    • How to interact with the project using GitHub’s functions (e.g., the pull request protocol).
    • -
    • How to file a bug report or create an issue.
    • -
    • The ultimate goal, vision, or roadmap for the project.
    • -
    • How to contact those in charge of the project.
    • -
    • Links to any external documentation or websites.
    • -
    -
    -

    Pro-tip: Consider starting off with a short thank you note for people taking the time to consider contributing - they have clicked on the file to learn more after all! If there are other methods of recognition that you have in mind, make sure to include them in here too.

    -
    -

    Here, you are essentially trying to encourage people to volunteer their time to advance your project. Make sure to be welcoming and friendly, and be precise about how people can engage. When writing this, make sure to think about it from the user perspective - how can you make their life easier when submitting pull requests and opening issues to make the whole project run more smoothly.

    -
    -

    Helping hand: The Contributing guidelines for this MOOC module include some very specific things: an introduction to using Git and GitHub, tips for getting started, contact information, how to alter the content and repor issues, a link to the README file, and information on the preferred content and code styles. Feel free to copy and adapt this for your own project as needed.

    -
    -


    -
    -
    -

    Creating a Code of Conduct

    -

    A code of conduct is important for setting the ground rules for expected behaviour and participation for project contributors, and is an easily referenced document for showing that your project team takes constructive dialogues seriously. Therefore, it is a critical element for creating and maintaining a healthy community that engages in a constructive and productive manner within a positive social atmosphere.

    -

    A code of conduct not only provides expectations of behaviour, but also describes who those expectations apply to, when they apply, what to do should a violation of the code occur, and what the action items for this will be. As such, points of contact need to be made clear in the code of conduct. Typically, this should be in a private way such as an email address.

    -
    -

    Pro-tip: In case a violation needs to be reported about the person who receives those reports, make sure to include an option to contact a secondary party.

    -
    -

    To add a code of conduct, you can create your own from scratch by adding a new markdown file, or use existing templates such as the Contributor Covenant. Name your file CODE_OF_CONDUCT.md, and make sure it is visible in the README file.

    -
    -

    Helping hand: This MOOC also has a Code of Conduct based on the Contributor Covenant. As you can see, it includes information on expected standards of behaviour, responsibilities of those in the community, and enforcement of the CoC including contact details. Feel free again to re-use and adapt this to your project as you see fit.

    -
    -

    - -

    -

    -Part of the CODE OF CONDUCT file for this module, based on the Contributor Covenant -

    -


    -

    Making sure to enforce the code of conduct is important, as it shows that not only do you value the code, but you respect the influence that it has on your community. It is important to treat each member of the community with the respect, courtesy, and importance that they deserve. Should a violation occur, or a repeat offender makes consistent violations, it is best to refer to the Open Source Guide to see how to enforce the code of conduct.

    -


    -
    -
    -

    Making your code citable

    -

    If you want to make your code citable from the start, you should store the metadata needed for a citation from the start, by creating a [codemeta.json](https://codemeta.github.io) file or a [CITATION.cff](https://citation-file-format.github.io) file. Both will allow tooling that is currently being developed to automatically create citation information, rather than asking you to type it in a form later.

    -

    If you’re interested, cite.research-software.org provides further background information about software citation in academia.

    -


    -
    -
    -
    -

    Keeping your issues up to date

    -

    Issues are not necessarily problems with a project, but also suggestions for improvement, things to develop in the future, and comments and feedback about the project to work through. They can be openly shared and discussed with contributors as needed, sort of like a forum.

    -

    If you are a project lead, it is important to maintain a list of issues that make it clear to contributors what aspects of the project need attention. It is also important to engage with as many issues as possible from others in a positive manner, to show that you take their contributions seriously.

    -

    Key elements for issues include:

    -
      -
    • An informative title and description;
    • -
    • Coloud-coded labels/tags to help categorise and filter;
    • -
    • Milestones to associate issues with specific features or project phases;
    • -
    • Assignees indicate who is responsible for working on an issue;
    • -
    • Comments for providing feedback.
    • -
    -

    - -

    -

    -The issue tracker for the Open Scholarship Strategy project -

    -


    -

    Within issues it is possible to use @ mentions to notify other contirbutors about the issue, and to get the right people engaged in an effective manner. GitHub has an internal system of notifications, just like Facebook or Twitter, and can also send emails to people who are mentioned in the issue tracker. This can all be customised for individuals within the user settings.

    -


    -
    -
    -

    Checklist for launching your project

    -

    So now you are ready to launch your project, begin advertising it, and getting contributions! Before continuing, make sure that you have:

    -
      -
    • [ ] Project has a memorable and informative name
    • -
    • [ ] Project has a LICENSE file that is an exact copy of an Open Source license
    • -
    • [ ] Complete documentation including a README, CONTRIBUTING, and CODE_OF_CONDUCT files
    • -
    • [ ] Project has at least one clearly labelled issue
    • -
    • [ ] Any code included at this stage is clearly structured and annotated
    • -
    -

    CONGRATULATIONS!

    -

    You have now launched an Open Source research project! Hopefully, from here on out, your work will act to benefit the wider community, forge new collaborations, and create new and fantastic opportunities for you all. Try and think about ways in which these skills can be applied to future projects, and how they might also have helped with some in the past.

    -

    From now on, it is all up to you! Some advice is to:

    -
      -
    • Write clean code;
    • -
    • Have a well-structured project;
    • -
    • Make frequent commits with clean messages;
    • -
    • Keep projects well-documented;
    • -
    • Have clear contributing guidelines;
    • -
    • Make use of the description and tag functions;
    • -
    • Don’t fork someone else’s repository unless you intend to work on it;
    • -
    • Make sure to contribute to other people’s projects too.
    • -
    -

    Know a way this content can be improved?

    -

    Time to take your new GitHub skills for a test-run! All content development primarily happens here. If you have a suggested improvement to the content, layout, or anything else, you can make it and then it will automatically become part of the MOOC content after verification from a moderator!

    -

    CC0 Public Domain Dedication

    -
    -
    - - - - -
    - - - - - + + + + +
    + + + + + + + + + + + + + + +
    +

    Task 1: How to set up a repository on GitHub

    +

    This task is designed for students and researchers who want to create their first Open Source project (software + or non-software) on GitHub. GitHub is a place for you to come and play and experiment with new research + workflows, and is really just the beginning to help set the stage for your own pathways and ideas.

    +

    Don’t forget you can join in the discussions over at our open Slack channel. Please do introduce + yourself at #module5opensource, and tell us a bit about who you are, your background, and how you ended up here! +

    +

    PLEASE NOTE that a screen recording for this task is also available via YouTube.

    +

    Estimated time to complete: 30-45 minutes.

    +

    Estimated time saving once complete: Unimaginable.

    + +
    +

    Getting started

    +

    A ‘repository’ is really just a fancy name for a project on GitHub. GitHub is a place online where you can + manage projects, store files, and openly collaborate with others. This is all achieved by using version + control to track projects as they progress. As such, GitHub is a powerful tool for both software and + non-software projects.

    +

    One of the most important things to consider at this early stage is to think about how you want the wider + community to interact with your project. As you are working in the open, you want to make sure others feel + comfortable in accessing, viewing, and engaging with your work. Setting up a repository in a way that lowers + the barriers to entry, and the fear of being an ‘outsider’ is the first step towards maintaining a successful + project.

    +

    + +

    +

    + Octocat, GitHub’s little mascot +

    +


    +
    +

    Setting up a GitHub profile

    +

    To set up a GitHub profile, simply head to the main page and click Sign + Up for GitHub. Here, you can create your personal account, with a username, email, and password as + standard.

    +

    + +

    +

    + Sign up for GitHub +

    +


    +

    The next step is to set up a personal plan. For now, simply select the ‘Unlimited public repositories for + free’ plan, unless you are concerned about privacy, in which case select the private plan. If you intend to + set up a project for an organisation, you can select that option too.

    +


    +
    +
    +

    The GitHub language

    +

    This is possibly the most confusing and off-putting aspect of GitHub. Here are some of the most commonly + used terms and their definitions:

    +
      +
    • Initialise: Create an empty repository.
    • +
    • Checkout: Create a working copy of a local repository.
    • +
    • Clone: Copy the repository into a local directory on your computer.
    • +
    • Fork: Create a personal offshoot of a repository to work on it in parallel.
    • +
    • Branch: An independent and parallel version of a repository. Changes do not affect the + master branch.
    • +
    • Master: The main and default branch for a repository.
    • +
    • Clean: No commits pending on the branch.
    • +
    • Stage: Add updates ready to be committed to a branch.
    • +
    • Commit: A revision to a repository, like a versioned ‘save’ function.
    • +
    • Commit message: A description of changes accompanying a commit.
    • +
    • Check: A status check.
    • +
    • Fetch: Nothing to do with dogs. Refers to getting the latest changes from an online + repository without merging them.
    • +
    • Index: The ‘tree’ which acts as a staging area.
    • +
    • Working Directory: The ‘tree’ where the files are kept.
    • +
    • Head: The ‘tree’ which indicates the last commits made.
    • +
    • Push: Add committed changes to the head of your remote repository.
    • +
    • Merge: Combining the changes made in one branch back with the master branch upon + completion.
    • +
    • Pull: Update your repository by fetching and merging the newest commits.
    • +
    • Pull request: A request to merge an updated branch into the master branch.
    • +
    • Issue: Suggested improvements, tasks, or questions related to a repository.
    • +
    +

    Whew! Don’t worry about memorising all of these for now. Like any new skill, familiarity comes + with experience.

    +

    You can probably see how some of these are fairly similar to things like save, copy, paste - standard + workflow operations, but adapted for a software management process. There are a few more too, but + this should do for getting started.

    +

    If you are interested, most of these terms come from the underlying Git + system. Git was built to allow developers to manage different versions of source code in a distributed + manner, which is great. It has lots of features and the ability to do lots of complex stuff, written by a very + clever guy. However, the user interface was not designed with new + users in mind, so it can be hard to learn.

    +

    + +

    +

    + Unbeatable guide to using Git. (Source: XKCD) +

    +


    +
    +
    +

    Creating a new repository

    +

    On your GitHub profile, click ‘Create new repository’. The first step is to choose a name, which acts as the brand + for your project. Ideally, it should be memorable and give some indication of what the project does.

    +

    + +

    +

    + Create a new repository +

    +


    +

    Make sure not to duplicate names, infringe upon other trademarks, or name it anything that could be + considered to be offensive.

    +


    +
    +
    +
    +

    The foundational steps

    +

    Any GitHub repository requires 4 key elements to get started and to begin developing a welcoming community: +

    +
      +
    1. An Open Source license;
    2. +
    3. A README file;
    4. +
    5. Contributing guidelines; and
    6. +
    7. A Code of Conduct.
    8. +
    +

    These are critical aspects and best practices of any project for users to understand their legal rights, + their expectations, the purpose of the project, and to improve the overall user experience.

    +

    All four of these files should be kept in the root directory for your project repository. It is convention to + use markdown file formats (.md) for most of these files (though the license file is most often + plain text (.txt)), and capitalise all file names. Instead of spaces in file names, make sure to + use underscores _ .

    +

    So you should end up with a foundational file selection like this:

    +
      +
    1. LICENSE.md
    2. +
    3. README.md
    4. +
    5. CONTRIBUTING.md
    6. +
    7. CODE_OF_CONDUCT.md
    8. +
    +
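    If you keep a local copy of your repository, a few lines of Python can confirm that the four foundational files sit in the root directory. This is only a minimal sketch: the function name `missing_foundational_files` is made up for illustration, and the exact file names (including the `.md` extensions) are an assumption taken from the list above, since a license is often stored as `LICENSE` or `LICENSE.txt` instead.

    ```python
    from pathlib import Path

    # File names follow the list above; adjust them if your project uses
    # LICENSE or LICENSE.txt rather than LICENSE.md (an assumption here).
    FOUNDATIONAL_FILES = [
        "LICENSE.md",
        "README.md",
        "CONTRIBUTING.md",
        "CODE_OF_CONDUCT.md",
    ]


    def missing_foundational_files(repo_root):
        """Return the foundational files that are absent from the repository root."""
        root = Path(repo_root)
        return [name for name in FOUNDATIONAL_FILES if not (root / name).is_file()]


    if __name__ == "__main__":
        # Run from the root of a local copy of the repository.
        missing = missing_foundational_files(".")
        if missing:
            print("Missing files:", ", ".join(missing))
        else:
            print("All four foundational files are present.")
    ```

    Running the script from the root of your project prints whichever files still need to be added.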

    + +

    +

    + The basic repository structure +

    +


    +
    +

    Choosing a license

    +

    Choosing an appropriate license is what will differentiate your Open Source repository from publicly + available software. While you are not obliged to choose a license, doing so guarantees that others will be + able to modify, share, re-use, and build upon your project within a legal framework.

    +

    To start with, you want to check Choose A License to find a + license that best suits your intentions for the repository.

    +

    The three primary ones to choose from are:

    +
      +
    • MIT License: A permissive license that lets people do whatever they want with your code + as long as they provide appropriate attribution to you, and do not hold you liable.
    • +
    • Apache License 2.0: Similar permissions to the MIT License, but also provides an + express grant of patent rights from contributors to users.
    • +
    • GNU General Public License (GPL) v3: A copyleft license that requires anyone who + redistributes your code, or a derivative work, to make the source available under the same terms as the + original license; also provides an express grant of patent rights from contributors to users.
    • +
    +

    Thankfully, when you start a new repository on GitHub, you are given the option to select an existing + license from a drop-down menu. You should always (with very few exceptions) use an existing license, since + this is what potential users and contributors will see before they choose to use or contribute to your + software.

    +

    + +

    +

    + Choosing an example license +

    +


    +

    If they don’t have one you want, you can add one you like manually. To do this, simply click ‘Create new + file’ in the repository, and copy and paste an existing license text in. Name the file something like + LICENSE.txt or LICENSE.md to make it clear, and keep it in the main repository + folder (i.e., the root). Make sure to add a clean commit message, and you’re done!

    +
    +

    Helping hand: This MOOC uses a different combination of licenses for code content and + non-code content. Here you can find an example of the MIT + License that we apply for all code and software generated as part of the MOOC production.

    +
    +


    +
    +
    +

    Creating a README file

    +

    When you initialise your new repository, there should be an option to do so with a README + file. Just like Alice in Wonderland, these do exactly what they say - provide key information about the + project. These are typically the first thing outside contributors will see when they come to your + repository, so making them informative and welcoming is key.

    +

    + +

    +

    + Part of the README file for this module +

    +


    +

    The file will originally be in markdown (.md) format. This is a lightweight markup language + with a plain text format. To learn some basic markdown, see this cheatsheet. But for now, + we can just use plain text.

    +

    There are several things you will want to include in your README file:

    +
      +
    • What is this project about and what does it do.
    • +
    • Why should people care, and why is it useful.
    • +
    • How can someone get started contributing to the project.
    • +
    • Who can be contacted in case someone needs help.
    • +
    • A link to the license, contributing guidelines, and code of conduct.
    • +
    • A description of the project structure.
    • +
    • Who is involved, and what are their roles.
    • +
    • The current status of the project.
    • +
    +
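    One way to make sure none of these points is forgotten is to generate a skeleton README and then fill it in by hand. The sketch below is an illustration only: the section headings, the `build_readme` function, and the project name `my-project` are hypothetical, and each prompt is stored as an HTML comment so that it stays hidden when GitHub renders the markdown.

    ```python
    # Each pair is (heading, prompt); the headings are illustrative choices,
    # the prompts restate the list of things to include in a README.
    README_SECTIONS = [
        ("About", "What is this project about and what does it do?"),
        ("Why it matters", "Why should people care, and why is it useful?"),
        ("Getting started", "How can someone get started contributing?"),
        ("Contact", "Who can be contacted in case someone needs help?"),
        ("Key documents", "Links to the license, contributing guidelines, and code of conduct."),
        ("Project structure", "A description of the project structure."),
        ("Team", "Who is involved, and what are their roles?"),
        ("Status", "The current status of the project."),
    ]


    def build_readme(project_name, sections=README_SECTIONS):
        """Return markdown text for a skeleton README with one heading per section."""
        lines = [f"# {project_name}", ""]
        for heading, prompt in sections:
            # The prompt is an HTML comment, so it is invisible in the rendered page.
            lines += [f"## {heading}", "", f"<!-- {prompt} -->", ""]
        return "\n".join(lines)


    if __name__ == "__main__":
        # "my-project" is a placeholder name.
        print(build_readme("my-project"))
    ```

    Save the printed text as `README.md` in the root of your repository, then replace each comment with real content.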
    +

    Pro-tip: Later on as your project develops, you might want to add FAQs based on + community feedback, or a tutorial to help users understand how your project works.

    +
    +

    Remember that not everyone coming to your project will be an expert, or understand what it is you are doing + and why. Having a well-documented README file will enhance the user experience for people with + a range of prior knowledge.

    +

    When the README file is included in the root directory, GitHub will automatically display this + on the homepage of your repository. This means it is the first thing people will often see, so make it + count!

    +
    +

    Helping hand: Here, you can find the README file used for this MOOC + module. This includes information on the status, rationale, learning outcomes, development team, key + documents, and license to help. You can copy and adapt this structure for your own projects as needed.

    +
    +


    +
    +
    +

    Creating contributing guidelines

    +

    Contributing guidelines are designed to communicate to potential contributors a short guide on how to + engage with your project and community. You want to make sure to be welcoming, and indicate that you are + eager for participants to engage with your project. Whenever a participant opens a new pull request or + creates a new issue, they will see a link to your contribution file.

    +

    + +

    +

    + Part of the CONTRIBUTING guidelines for this module +

    +


    +

    Sticking with the all caps file names, the next step is to create a CONTRIBUTING file. Click + ‘Create new file’, and make sure to save it in markdown format as before. This file will tell other users + how they can engage with and participate in your project. This is the first step towards establishing a + community around your project, so make it engaging, concise, and informative.

    +

    The CONTRIBUTING file should include information on:

    +
      +
    • What sort of contributions you are looking for.
    • +
    • How to suggest updates or new features.
    • +
    • How to interact with the project using GitHub’s functions (e.g., the pull request protocol).
    • +
    • How to file a bug report or create an issue.
    • +
    • The ultimate goal, vision, or roadmap for the project.
    • +
    • How to contact those in charge of the project.
    • +
    • Links to any external documentation or websites.
    • +
    +
    +

    Pro-tip: Consider starting off with a short thank you note for people taking the time to + consider contributing - they have clicked on the file to learn more after all! If there are other methods + of recognition that you have in mind, make sure to include them in here too.

    +
    +

    Here, you are essentially trying to encourage people to volunteer their time to advance your project. Make + sure to be welcoming and friendly, and be precise about how people can engage. When writing this, make sure + to think about it from the user perspective - how can you make their life easier when submitting pull + requests and opening issues to make the whole project run more smoothly.

    +
    +

    Helping hand: The Contributing + guidelines for this MOOC module include some very specific things: an introduction to using Git and + GitHub, tips for getting started, contact information, how to alter the content and report issues, a link + to the README file, and information on the preferred content and code styles. Feel free to + copy and adapt this for your own project as needed.

    +
    +


    +
    +
    +

    Creating a Code of Conduct

    +

    A code of conduct is important for setting the ground rules for expected behaviour and participation for + project contributors, and is an easily referenced document for showing that your project team takes + constructive dialogues seriously. Therefore, it is a critical element for creating and maintaining a healthy + community that engages in a constructive and productive manner within a positive social atmosphere.

    +

    A code of conduct not only provides expectations of behaviour, but also describes who those expectations + apply to, when they apply, what to do should a violation of the code occur, and what the action items for + this will be. As such, points of contact need to be made clear in the code of conduct. Typically, this + should be in a private way such as an email address.

    +
    +

    Pro-tip: In case a violation needs to be reported about the person who receives those + reports, make sure to include an option to contact a secondary party.

    +
    +

    To add a code of conduct, you can create your own from scratch by adding a new markdown file, or use + existing templates such as the Contributor Covenant. Name + your file CODE_OF_CONDUCT.md, and make sure it is visible in the README file.

    +
    +

    Helping hand: This MOOC also has a Code + of Conduct based on the Contributor Covenant. As you can see, it includes information on expected + standards of behaviour, responsibilities of those in the community, and enforcement of the CoC including + contact details. Feel free again to re-use and adapt this to your project as you see fit.

    +
    +

    + +

    +

    + Part of the CODE OF CONDUCT file for this module, based on the Contributor Covenant +

    +


    +

    Making sure to enforce the code of conduct is important, as it shows that not only do you value the code, + but you respect the influence that it has on your community. It is important to treat each member of the + community with the respect, courtesy, and importance that they deserve. Should a violation occur, or should a + repeat offender commit persistent violations, it is best to refer to the Open Source Guide to + see how to enforce the code of conduct.

    +


    +
    +
    +

    Making your code citable

    +

    If you want to make your code citable from the start, store the metadata needed for a citation + early on by creating a [codemeta.json](https://codemeta.github.io) file or a + [CITATION.cff](https://citation-file-format.github.io) file. Both will allow tooling that is + currently being developed to create citation information automatically, rather than asking you to type it into + a form later.

    +

    If you’re interested, cite.research-software.org provides + further background information about software citation in academia.
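    To give a feel for what such a metadata file contains, here is a minimal sketch that writes out the text of a `CITATION.cff` file. It assumes version 1.2.0 of the Citation File Format and uses only a handful of its fields; the helper name `build_citation_cff` and all example values are placeholders, so check the format's own documentation before relying on it.

    ```python
    import json


    def build_citation_cff(title, authors, version, date_released):
        """Return the text of a minimal CITATION.cff file.

        authors is a list of (given_names, family_names) tuples and
        date_released is a string in YYYY-MM-DD form.
        """
        # json.dumps produces double-quoted strings, which are also valid YAML scalars.
        quote = json.dumps
        lines = [
            "cff-version: 1.2.0",
            'message: "If you use this software, please cite it using these metadata."',
            f"title: {quote(title)}",
            "authors:",
        ]
        for given, family in authors:
            lines.append(f"  - given-names: {quote(given)}")
            lines.append(f"    family-names: {quote(family)}")
        lines.append(f"version: {quote(version)}")
        lines.append(f"date-released: {date_released}")
        return "\n".join(lines) + "\n"


    if __name__ == "__main__":
        # Placeholder values only; replace them with your own project details.
        print(build_citation_cff("Example Project", [("Ada", "Lovelace")], "1.0.0", "2019-01-01"))
    ```

    The resulting text would be saved as `CITATION.cff` in the root directory, alongside the other foundational files.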

    +


    +
    +
    +
    +

    Keeping your issues up to date

    +

    Issues are not necessarily problems with a project, but also suggestions for improvement, things to develop + in the future, and comments and feedback about the project to work through. They can be openly shared and + discussed with contributors as needed, sort of like a forum.

    +

    If you are a project lead, it is important to maintain a list of issues that make it clear to contributors + what aspects of the project need attention. It is also important to engage with as many issues as possible + from others in a positive manner, to show that you take their contributions seriously.

    +

    Key elements for issues include:

    +
      +
    • An informative title and description;
    • +
    • Colour-coded labels/tags to help categorise and filter;
    • +
    • Milestones to associate issues with specific features or project phases;
    • +
    • Assignees to indicate who is responsible for working on an issue;
    • +
    • Comments for providing feedback.
    • +
    +

    + +

    +

    + The issue tracker for the Open Scholarship Strategy project +

    +


    +

    Within issues it is possible to use @ mentions to notify other contributors about the issue, and to get the + right people engaged in an effective manner. GitHub has an internal system of notifications, just like + Facebook or Twitter, and can also send emails to people who are mentioned in the issue tracker. This can all + be customised for individuals within the user settings.

    +


    +
    +
    +

    Checklist for launching your project

    +

    So now you are ready to launch your project, begin advertising it, and getting contributions! Before + continuing, make sure that you have:

    +
      +
    • [ ] Project has a memorable and informative name
    • +
    • [ ] Project has a LICENSE file that is an exact copy of an Open Source license
    • +
    • [ ] Complete documentation including a README, CONTRIBUTING, and + CODE_OF_CONDUCT files
    • +
    • [ ] Project has at least one clearly labelled issue
    • +
    • [ ] Any code included at this stage is clearly structured and annotated
    • +
    +

    CONGRATULATIONS!

    +

    You have now launched an Open Source research project! Hopefully, from here on out, your work will act to + benefit the wider community, forge new collaborations, and create new and fantastic opportunities for you all. + Try and think about ways in which these skills can be applied to future projects, and how they might also have + helped with some in the past.

    +

    From now on, it is all up to you! Some advice is to:

    +
      +
    • Write clean code;
    • +
    • Have a well-structured project;
    • +
    • Make frequent commits with clean messages;
    • +
    • Keep projects well-documented;
    • +
    • Have clear contributing guidelines;
    • +
    • Make use of the description and tag functions;
    • +
    • Don’t fork someone else’s repository unless you intend to work on it;
    • +
    • Make sure to contribute to other people’s projects too.
    • +
    +

    Know a way this content can be improved?

    +

    Time to take your new GitHub skills for a test-run! All content development primarily happens here. + If you have a suggested improvement to the content, layout, or anything else, you can make it and then it will + automatically become part of the MOOC content after verification from a moderator!

    +

    CC0 Public Domain Dedication

    +
    +
    + + + + +
    + + + + + - + + \ No newline at end of file diff --git a/content_development/Task_2.html b/content_development/Task_2.html index 723a66b..f8e5899 100644 --- a/content_development/Task_2.html +++ b/content_development/Task_2.html @@ -4,493 +4,8950 @@ - - - + + + -Task_2.utf8.md + Task_2.utf8.md - - - - - + + + + - - + + + + + + + + - // Change the URL when tabs are clicked - $('a', context).on('click', function(e) { - history.pushState(null, null, this.href); - showStuffFromHash(context); - }); - return this; - }; -}(jQuery)); - -window.buildTabsets = function(tocID) { - - // build a tabset from a section div with the .tabset class - function buildTabset(tabset) { - - // check for fade and pills options - var fade = tabset.hasClass("tabset-fade"); - var pills = tabset.hasClass("tabset-pills"); - var navClass = pills ? "nav-pills" : "nav-tabs"; - - // determine the heading level of the tabset and tabs - var match = tabset.attr('class').match(/level(\d) /); - if (match === null) - return; - var tabsetLevel = Number(match[1]); - var tabLevel = tabsetLevel + 1; - - // find all subheadings immediately below - var tabs = tabset.find("div.section.level" + tabLevel); - if (!tabs.length) - return; - - // create tablist and tab-content elements - var tabList = $(''); - $(tabs[0]).before(tabList); - var tabContent = $('
    '); - $(tabs[0]).before(tabContent); - - // build the tabset - var activeTab = 0; - tabs.each(function(i) { - - // get the tab div - var tab = $(tabs[i]); - - // get the id then sanitize it for use with bootstrap tabs - var id = tab.attr('id'); - - // see if this is marked as the active tab - if (tab.hasClass('active')) - activeTab = i; - - // remove any table of contents entries associated with - // this ID (since we'll be removing the heading element) - $("div#" + tocID + " li a[href='#" + id + "']").parent().remove(); - - // sanitize the id for use with bootstrap tabs - id = id.replace(/[.\/?&!#<>]/g, '').replace(/\s/g, '_'); - tab.attr('id', id); - - // get the heading element within it, grab it's text, then remove it - var heading = tab.find('h' + tabLevel + ':first'); - var headingText = heading.html(); - heading.remove(); - - // build and append the tab list item - var a = $('' + headingText + ''); - a.attr('href', '#' + id); - a.attr('aria-controls', id); - var li = $('
  • '); - li.append(a); - tabList.append(li); - - // set it's attributes - tab.attr('role', 'tabpanel'); - tab.addClass('tab-pane'); - tab.addClass('tabbed-pane'); - if (fade) - tab.addClass('fade'); - - // move it into the tab content div - tab.detach().appendTo(tabContent); - }); - // set active tab - $(tabList.children('li')[activeTab]).addClass('active'); - var active = $(tabContent.children('div.section')[activeTab]); - active.addClass('active'); - if (fade) - active.addClass('in'); - - if (tabset.hasClass("tabset-sticky")) - tabset.rmarkdownStickyTabs(); - } - - // convert section divs with the .tabset class to tabsets - var tabsets = $("div.section.tabset"); - tabsets.each(function(i) { - buildTabset($(tabsets[i])); - }); -}; - - - - - - - - - - - - + - - - - -
    - - - - - - - - - - - - - - -
    -

    Task 2: How to make your code citable using GitHub and Zenodo

    -

    This task is designed for students and researchers who want to create and re-use GitHub-based projects/repositories in the academic literature.

    -

    Don’t forget you can join in the discussions over at our open Slack channel. Please do introduce yourself at #module5opensource, and tell us a bit about who you are, your background, and how you ended up here!

    -

    Estimated time to complete: 45-60 minutes.

    -
    -

    Table of contents

    - -

    -Task 2 workflow -

    -

    -The workflow for Task 2. Keep this handy as you work through the task! -

    -


    ## Foreword

    -

    Although the integration of GitHub and Zenodo makes it much easier to work with these tools (as of January 2019), it is important to stress that there are alternatives to GitHub (GitLab, Bitbucket, and others) and alternatives to Zenodo (other repositories might be better suited to your community, so ask your colleagues). For instance, you can work with GitLab and manually upload each new version to your university repository to get a DOI. The principles (working with an online version control system, and archiving major versions in a repository that provides a persistent unique identifier) can be applied in different workflows.

    -
    -
    -

    Set up a GitHub repository

    -
    -

    Pro-tip: Make sure to include a LICENSE and README file in your repository. This will indicate to people the purpose of the project, and how they can engage with it in the future.

    -
    -

    Find out how to set up a GitHub repository in this other guide Task 1: Building a GitHub repository which is also part of ‘Module 5: Open Research Software and Open Source’.

    -
    -
    -

    Choose your GitHub repository

    -

    Once on your GitHub project listings page at github.com head to the ‘Repositories’ tab. Select which repository you would like to archive, and open it up.

    -


    -
    -
    -

    Login to Zenodo

    -

    Now head over to zenodo.org. Zenodo is a platform where you can permanently archive your code and other project elements. Zenodo does this by assigning projects a Digital Object Identifier (DOI), which also helps to make the work more citable. This is different to GitHub, which acts as a place where the actual work on a project takes place, rather than long-term archiving of it. At GitHub, content can be modified, deleted, rewritten, and irreversibly changed, which makes it a bit concerning to be used for longer lasting referencing purposes. Zenodo offers more security and permanence for research outputs.

    -

    - -

    -

    -Sign up for Zenodo -

    -


    -

    If you already have a Zenodo account, this is easy. If not, follow the steps to create one; you can even log in using your GitHub account or ORCID profile to make things simpler, as Zenodo has a built-in integration for both. This might be easier than creating yet another research account and profile.

    -


    -
    -
    -

    Authorise GitHub to connect with Zenodo

    -

    On the Zenodo website authorise it to connect to your GitHub account in the ‘Using GitHub’ section. Here, Zenodo will redirect you to GitHub to ask for permissions to use ‘webhooks’ on your repositories. You want to authorise Zenodo here with the permissions it needs to form those links.

    -

    - -

    -

    -Authorize Zenodo to connect with GitHub -

    -


    -

    If you are trying to give Zenodo access to an organisational repository, you (or an administrator) will need to make sure that Zenodo is granted third party access permissions. GitHub will send an authorisation email that needs confirming. At this point, back in the settings of your repository on GitHub, you also need to make sure that the repository is set to ‘public’, not private.

    -


    -
    -
    -

    Select the repository to archive

    -

    If you have got this far, this means that Zenodo is now authorised to configure the repository webhooks that it needs to archive the repository and issue it a DOI. To do this, on the Zenodo website navigate to the GitHub repository listing page and simply click the ‘on’ button next to your repository.

    -

    - -

    -

    -Enable individual GitHub repositories to be preserved in Zenodo -

    -


    -
    -
    -

    Check repository settings

    -

    Now you have set up a new webhook between Zenodo and your repository. In GitHub, click on the settings for your repository, and the Webhooks tab on the left hand side menu. This should display the new Zenodo webhook configured to Zenodo. Note, it may take a little time for the webhook listing to show up.

    -

    - -

    -

    -Check that webhooks are enabled for your GitHub repository. Example here using the Open Scholarship Strategy -

    -


    -
    -
    -

    Create a new release

    -

    The first time you archive a repository is known as the ‘first release’. Each time you create a new version of that repository and archive it, you create a new release. This can be tracked in the ‘releases’ tab for your repository on GitHub (top center).

    -

    - -

    -

    -Check that the repository first release was successful. Example here using the Open Scholarship Strategy -

    -


    -

    For the first archived version of your repository, click ‘Create a new release’ back on GitHub. Fill in the form and give some details as to what the release entails. For the first release, a version tag such as v1.0.0 is a common convention.

    -

    - -

    -

    -Create a new release. Example here using the Open Scholarship Strategy, for which a first release already exists -

    -


    -

    Finally, click ‘publish release’, and your archive will be published and versioned on GitHub.

    -

    To view your release on Zenodo you need to visit the Upload tab. To finish the archiving a few more details are needed on Zenodo.

    -

    - -

    -

    -Check the new release has been uploaded. Example here shown using the Open Scholarship Strategy -

    -


    -
    -
    -

    Getting a DOI

    -

    This is sometimes referred to as DOI ‘minting’, and requires a couple of extra bits of information about the repository on Zenodo. On Zenodo click the Upload tab in the main menu, and your newly uploaded repository should be there. Scroll down the page and fill in the extra information as needed (required fields are marked with a red asterisk), and then click ‘Publish’.

    -

    Note: Only after this extra information has been added will your DOI become live. It may also take a short time for the DOI to become active. Example DOI shown below (for the Open Scholarship Strategy again).

    -
    -

    Pro-tip: Copy the URL for the DOI into the README file for your GitHub repo to make cross-linking even easier, as well as present a clear highlighted DOI badge for users to see and make use of your DOI. You only need to do this once with your first release DOI as it acts as a ‘concept DOI’ and is linked to all subsequent release DOIs.

    -
    -

    DOI

    -

    The GitHub/Zenodo integration will now assign a DOI to each version/release of a project repository. This enables users to refer to and cite specific versions of projects. Also, the list of authors for the citation is automatically determined by the GitHub user account names used by the repository - this means no-one gets left out. Author details can be edited later on Zenodo. DOIs used in Zenodo are registered through the DataCite service.

    -
    -

    Pro-tip: Create a ‘human-readable’ version of this citation in your project’s README file. This will be helpful to researchers who might not be familiar with using DOIs to create citations, and make it easier for others to cite your software and acknowledge your work. An example of this could be: Jon Tennant. (2018, July 30). Foundations for Open Scholarship Strategy Development: First formal release (Version 1.2). Zenodo. http://doi.org/10.5281/zenodo.1323437

    -
    -

    CONGRATULATIONS!!

    -

    Your GitHub repository is now archived in Zenodo, and with a DOI that can be versioned to reflect updates to the repository version through time. You should be able to see details of this on the GitHub Zenodo page for your repository. This also means that your archived projects can get picked up by other indexing services and search engines that use DOIs too.

    -

    Providing a long-term archive and a DOI for your work is required for others to be able to properly cite it, as this provides basic citation metadata. For Open Science, it is important to be able to cite the software that you use in your research, and this integrated workflow enables that to happen, in line with best practices for research citation. Furthermore, this practice is important in elevating the standard of software (and related projects) to that of the standard of other research outputs.

    -
    -

    Pro-tip: Is your research funded by an EU grant? Now you can directly connect your archived project to your grant by updating the grant section of the metadata on the project’s Zenodo record. This massively helps to increase its discoverability!

    -
    -


    -
    -
    -

    Checklist for citing your project

    -

    So now you have a sustainably archived GitHub repository in Zenodo that is ready to be re-used and cited! Before continuing, make sure that you have:

    -
      -
    • [ ] Linked your GitHub project to Zenodo. If you see a complete copy of your GitHub repository in Zenodo then things are working.
    • -
    • [ ] Zenodo and GitHub integrated setup works nicely. For example, check that all the author names and the correct project title have come across to Zenodo. If not, or if authors just have nicknames, you can edit these details in Zenodo.
    • -
    • [ ] Project has a first release, with a DOI. You should have a DOI displayed on your project’s Zenodo page. This first DOI is called the ‘concept DOI’ and is the master DOI linking to all subsequent release DOIs. Copy this DOI link and embed it in your GitHub project’s README page. You’re done!
    • -
    -
    -

    Additional resources

    -

    Making your code citable - GitHub Guides.

    -

    Know a way this content can be improved?

    -

    Time to take your new GitHub skills for a test-run! All content development primarily happens here. If you have a suggested improvement to the content, layout, or anything else, you can make it and then it will automatically become part of the MOOC content after verification from a moderator!

    -

    CC0 Public Domain Dedication

    -
    -
    -
    - - - - -
    - - - - - + + + + +
    + + + + + + + + + + + + + + +
    +

    Task 2: How to make your code citable using GitHub and Zenodo

    +

    This task is designed for students and researchers who want to create and re-use GitHub-based + projects/repositories in the academic literature.

    +

    Don’t forget you can join in the discussions over at our open Slack channel. Please do introduce + yourself at #module5opensource, and tell us a bit about who you are, your background, and how you ended up here! +

    +

    Estimated time to complete: 45-60 minutes.

    +
    +

    Table of contents

    + +

    + Task 2 workflow +

    +

    + The workflow for Task 2. Keep this handy as you work through the task! +

    +


    ## Foreword

    +

    Although the integration of GitHub and Zenodo makes it much easier to work with these tools nowadays + (January 2019), it is important to stress that there are alternatives to GitHub (GitLab, Bitbucket,…) and + alternatives to Zenodo (other repositories might be better suited to your community; ask your + colleagues). For instance, one can work with GitLab and manually upload each new version to a university + repository to get a DOI. The principles (working with an online version control system, and archiving major + versions in a repository which provides a persistent unique identifier) can be applied in different workflows. +

    +
    +
    +

    Set up a GitHub repository

    +
    +

    Pro-tip: Make sure to include a LICENSE and README file in your repository. This will + indicate to people the purpose of the project, and how they can engage with it in the future.

    +
    +

    Find out how to set up a GitHub repository in this other guide Task + 1: Building a GitHub repository which is also part of ‘Module 5: Open Research Software and Open + Source’.

    +
    +
    +

    Choose your GitHub repository

    +

    Once on your GitHub project listings page at github.com head to the + ‘Repositories’ tab. Select which repository you would like to archive, and open it up.

    +


    +
    +
    +

    Login to Zenodo

    +

    Now head over to zenodo.org. Zenodo is a platform where you can permanently + archive your code and other project elements. Zenodo does this by assigning projects a Digital Object + Identifier (DOI), which also helps to make the work more citable. This is different to GitHub, + which acts as a place where the actual work on a project takes place, rather than long-term archiving of it. + At GitHub, content can be modified, deleted, rewritten, and irreversibly changed, which makes it a bit + concerning to be used for longer lasting referencing purposes. Zenodo offers more security and permanence for + research outputs.

    +

    + +

    +

    + Sign up for Zenodo +

    +


    +

    If you already have a Zenodo account, this is easy. If not, follow the steps to create one; you can even + log in using your GitHub account or ORCID profile to make things simpler, as Zenodo has a built-in integration + for both. This might be easier than creating yet another research account and profile.

    +


    +
    +
    +

    Authorise GitHub to connect with Zenodo

    +

    On the Zenodo website authorise it to connect to your GitHub account in the ‘Using GitHub’ section. Here, Zenodo will redirect you + to GitHub to ask for permissions to use ‘webhooks’ on + your repositories. You want to authorise Zenodo here with the permissions it needs to form those links.

    +

    + +

    +

    + Authorize Zenodo to connect with GitHub +

    +


    +

    If you are trying to give Zenodo access to an organisational repository, you (or an administrator) will need + to make sure that Zenodo is granted third party access permissions. GitHub will send an authorisation email + that needs confirming. At this point, back in the settings of your repository on GitHub, you also need to make + sure that the repository is set to ‘public’, not private.

    +


    +
    +
    +

    Select the repository to archive

    +

    If you have got this far, this means that Zenodo is now authorised to configure the repository webhooks that + it needs to archive the repository and issue it a DOI. To do this, on the Zenodo website navigate to the GitHub repository listing page and simply click the + ‘on’ button next to your repository.

    +

    + +

    +

    + Enable individual GitHub repositories to be preserved in Zenodo +

    +


    +
    +
    +

    Check repository settings

    +

    Now you have set up a new webhook between Zenodo and your repository. In GitHub, click on the settings for + your repository, and the Webhooks tab on the left hand side menu. This should display the new Zenodo webhook + configured to Zenodo. Note, it may take a little time for the webhook listing to show up.

    +

    + +

    +

    + Check that webhooks are enabled for your GitHub repository. Example here using the Open Scholarship + Strategy +

    +


    +
    +
    +

    Create a new release

    +

    The first time you archive a repository is known as the ‘first release’. Each time you create a new version + of that repository and archive it, you create a new release. This can be tracked in the ‘releases’ tab for + your repository on GitHub (top center).

    +

    + +

    +

    + Check that the repository first release was successful. Example here using the Open Scholarship + Strategy +

    +


    +

    For the first archived version of your repository, click ‘Create a new release’ back in Zenodo. Fill in the + form and give some details as to what the release entails. For the first release, make sure to call it v1.0.0, + as is standard practice.

    +

    + +

    +

    + Create a new release. Example here using the Open Scholarship Strategy, for which a first release already + exists +

    +


    +

    Finally, click ‘publish release’, and your archive will be published and versioned on GitHub.
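A GitHub release is built on top of a Git tag, so it can help to see what the v1.0.0 label looks like from the command line. The following is only a minimal sketch in a throwaway repository: the name, email, file and messages are hypothetical placeholders. A tag on its own is assumed not to be enough for archiving, so the release still needs to be published on GitHub as described above.

```shell
# Sketch only: a throwaway repository in a temporary directory.
workdir="$(mktemp -d)"
cd "$workdir"
git init -q

# Repository-local placeholder identity (hypothetical values).
git config user.name 'Ada Example'
git config user.email 'ada@example.org'

# One commit to stand in for the state of your project at release time.
echo "# Demo project" > README.md
git add README.md
git commit -q -m "Prepare first release"

# Label this exact commit as version 1.0.0 with an annotated tag.
git tag -a v1.0.0 -m "First release"

# With a real GitHub remote you would then share the tag:
#   git push origin v1.0.0

# Show the most recent tag reachable from the current commit.
release_tag="$(git describe --tags --abbrev=0)"
echo "Latest tag: $release_tag"
```

Later releases follow the same pattern with a new tag name, such as v1.1.0.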

    +

    To view your release on Zenodo you need to visit the Upload tab. To + finish the archiving a few more details are needed on Zenodo.

    +

    + +

    +

    + Check the new release has been uploaded. Example here shown using the Open Scholarship Strategy +

    +


    +
    +
    +

    Getting a DOI

    +

    This is sometimes referred to as DOI ‘minting’, and requires a couple of extra bits of information about the + repository on Zenodo. On Zenodo, click the Upload tab in the main + menu, and your newly uploaded repository should be there. Scroll down the page and fill in the extra + information as needed (required fields are marked with a red asterisk), and then click ‘Publish’.

    +

    Note: Only after this extra information has been added will your DOI become live. It may + also take a short time for the DOI to become active. Example DOI shown below (for the Open Scholarship + Strategy again).

    +
    +

    Pro-tip: Copy the URL for the DOI into the README file for your GitHub repo to make + cross-linking even easier, as well as present a clear highlighted DOI badge for users to see and make use of + your DOI. You only need to do this once with your first release DOI as it acts as a ‘concept DOI’ and is + linked to all subsequent release DOIs.
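As a rough illustration of the tip above, the snippet below appends a DOI badge line to a README file. It is a sketch under assumptions: it reuses the example DOI quoted in this task, writes to a throwaway README in a temporary directory, and assumes Zenodo's usual badge URL pattern. The safest source for the exact badge markdown is the snippet shown on your own Zenodo record page.

```shell
# Sketch only: work on a throwaway README in a temporary directory.
workdir="$(mktemp -d)"
cd "$workdir"
echo "# Demo project" > README.md

# Example DOI taken from this task; replace it with your own.
doi="10.5281/zenodo.1323437"

# Append a badge that links to the DOI (badge URL pattern is an assumption).
cat >> README.md <<EOF

[![DOI](https://zenodo.org/badge/DOI/${doi}.svg)](https://doi.org/${doi})
EOF

# Print the result so you can check the line before committing it.
cat README.md
```

Commit and push the edited README afterwards so that the badge shows up on the GitHub project page.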

    +
    +

    DOI

    +

    The GitHub/Zenodo integration will now assign a DOI to each version/release of a project repository. This + enables users to refer to and cite specific versions of projects. Also, the list of authors for the citation + is automatically determined by the GitHub user account names used by the repository - this means no-one gets + left out. Author details can be edited later on Zenodo. DOIs used in Zenodo are registered through the DataCite service.

    +
    +

    Pro-tip: Create a ‘human-readable’ version of this citation in your project’s README file. + This will be helpful to researchers who might not be familiar with using DOIs to create citations, and make + it easier for others to cite your software and acknowledge your work. An example of this could be: Jon + Tennant. (2018, July 30). Foundations for Open Scholarship Strategy Development: First formal release + (Version 1.2). Zenodo. http://doi.org/10.5281/zenodo.1323437

    +
    +

    CONGRATULATIONS!!

    +

    Your GitHub repository is now archived in Zenodo, and with a DOI that can be versioned to reflect updates to + the repository version through time. You should be able to see details of this on the GitHub Zenodo page for + your repository. This also means that your archived projects can get picked up by other indexing services and + search engines that use DOIs too.

    +

    Providing a long-term archive and a DOI for your work is required for others to be able to properly cite it, + as this provides basic citation metadata. For Open Science, it is important to be able to cite the software + that you use in your research, and this integrated workflow enables that to happen, in line with best + practices for research citation. Furthermore, this practice is important in elevating the standard of software + (and related projects) to that of the standard of other research outputs.

    +
    +

    Pro-tip: Is your research funded by an EU grant? Now you can directly connect your + archived project to your grant by updating the grant section of the metadata on the project’s Zenodo record. + This massively helps to increase its discoverability!

    +
    +


    +
    +
    +

    Checklist for citing your project

    +

    So now you have a sustainably archived GitHub repository in Zenodo that is ready to be re-used and cited! + Before continuing, make sure that you have:

    +
      +
    • [ ] Linked your GitHub project to Zenodo. If you see a complete copy of your GitHub repository in Zenodo + then things are working.
    • +
    • [ ] Zenodo and GitHub integrated setup works nicely. For example, check that all the author names and the correct + project title have come across to Zenodo. If not, or if authors just have nicknames, you can edit these details in + Zenodo.
    • +
    • [ ] Project has a first release, with a DOI. You should have a DOI displayed on your project’s Zenodo page. + This first DOI is called the ‘concept DOI’ and is the master DOI linking to all subsequent release DOIs. + Copy this DOI link and embed it in your GitHub project’s README page. You’re done!
    • +
    +
    +

    Additional resources

    +

    Making your code citable - GitHub Guides. +

    +

    Know a way this content can be improved?

    +

    Time to take your new GitHub skills for a test-run! All content development primarily happens here. + If you have a suggested improvement to the content, layout, or anything else, you can make it and then it + will automatically become part of the MOOC content after verification from a moderator!

    +

    CC0 Public Domain Dedication

    +
    +
    +
    + + + + +
    + + + + + - + + \ No newline at end of file diff --git a/content_development/Task_3.html b/content_development/Task_3.html index 9282ff9..3a87a87 100644 --- a/content_development/Task_3.html +++ b/content_development/Task_3.html @@ -4,574 +4,9125 @@ - - - + + + -Task_3.utf8.md + Task_3.utf8.md - - - - - + + + + - - + + + + + + + + - // Change the URL when tabs are clicked - $('a', context).on('click', function(e) { - history.pushState(null, null, this.href); - showStuffFromHash(context); - }); - return this; - }; -}(jQuery)); - -window.buildTabsets = function(tocID) { - - // build a tabset from a section div with the .tabset class - function buildTabset(tabset) { - - // check for fade and pills options - var fade = tabset.hasClass("tabset-fade"); - var pills = tabset.hasClass("tabset-pills"); - var navClass = pills ? "nav-pills" : "nav-tabs"; - - // determine the heading level of the tabset and tabs - var match = tabset.attr('class').match(/level(\d) /); - if (match === null) - return; - var tabsetLevel = Number(match[1]); - var tabLevel = tabsetLevel + 1; - - // find all subheadings immediately below - var tabs = tabset.find("div.section.level" + tabLevel); - if (!tabs.length) - return; - - // create tablist and tab-content elements - var tabList = $(''); - $(tabs[0]).before(tabList); - var tabContent = $('
    '); - $(tabs[0]).before(tabContent); - - // build the tabset - var activeTab = 0; - tabs.each(function(i) { - - // get the tab div - var tab = $(tabs[i]); - - // get the id then sanitize it for use with bootstrap tabs - var id = tab.attr('id'); - - // see if this is marked as the active tab - if (tab.hasClass('active')) - activeTab = i; - - // remove any table of contents entries associated with - // this ID (since we'll be removing the heading element) - $("div#" + tocID + " li a[href='#" + id + "']").parent().remove(); - - // sanitize the id for use with bootstrap tabs - id = id.replace(/[.\/?&!#<>]/g, '').replace(/\s/g, '_'); - tab.attr('id', id); - - // get the heading element within it, grab it's text, then remove it - var heading = tab.find('h' + tabLevel + ':first'); - var headingText = heading.html(); - heading.remove(); - - // build and append the tab list item - var a = $('' + headingText + ''); - a.attr('href', '#' + id); - a.attr('aria-controls', id); - var li = $('
  • '); - li.append(a); - tabList.append(li); - - // set it's attributes - tab.attr('role', 'tabpanel'); - tab.addClass('tab-pane'); - tab.addClass('tabbed-pane'); - if (fade) - tab.addClass('fade'); - - // move it into the tab content div - tab.detach().appendTo(tabContent); - }); - // set active tab - $(tabList.children('li')[activeTab]).addClass('active'); - var active = $(tabContent.children('div.section')[activeTab]); - active.addClass('active'); - if (fade) - active.addClass('in'); - - if (tabset.hasClass("tabset-sticky")) - tabset.rmarkdownStickyTabs(); - } - - // convert section divs with the .tabset class to tabsets - var tabsets = $("div.section.tabset"); - tabsets.each(function(i) { - buildTabset($(tabsets[i])); - }); -}; - - - - - - - - - - - - + - - - - -
    - - - - - - - - - - - - - - -
    -

    Task 3: How to integrate Git with R Studio

    -

    This task is designed for students and researchers who want to implement a system of version control within a standard R-based workflow. This can be applied to a range of software development, data analysis and project management tasks. Your future research self will thank you for the convenience.

    -

    Don’t forget you can join in the discussions over at our open Slack channel. Please do introduce yourself at #module5opensource, and tell us a bit about who you are, your background, and how you ended up here!

    -

    Estimated time to complete: 30 minutes

    -

    Estimate time saving once complete: Virtually infinite

    -

    NOTE A video guide version of this task is now available on YouTube.

    - -
    -

    Getting started

    -

    Congratulations on making it this far! If you’re reading this, you’ve survived pull requests, web-hooks, and can probably even tell us what the F in FOSS stands for (not Frustration…). Hopefully, you have overcome any scepticism or reluctance towards the benefits of GitHub and Open Source Software, and are ready to take the next step.

    -

    Before starting this Task, please make sure you have already completed Task 1 and Task 2, so that you are more familiar with GitHub and some standard Open Source practices.

    -

    This task will teach you how to integrate the version control software, Git, with the popular coding environment, RStudio. And yes, it is Git as in gif or God, not Jit as in the wrong way of pronouncing things.

    -

    If you are one of those researchers who keeps code spread across multiple hard drives that are waiting to break, Dropbox, Google Drive, or any other non-specialist software, this task is just for you. If you have ever experienced the mind-numbing process of having multiple ‘final’ versions of a paper bouncing between different co-authors, this is also for you.

    -

    All of us are guilty of these sorts of things once in a while, but there are ways to do it that are better for you, future you, and those who might benefit from your work.

    -


    -
    -

    Getting Git

    -

    So, what is Git, and how is it different to GitHub? Git is a version control system, which enables you to save and track time-stamped copies of your work throughout the development process. It also works with non-code items too, like this MOOC, the majority of which was written in markdown in RStudio, and integrated with a Git/GitHub workflow.

    -

    This is important, as all research goes through changes and sometimes we want to know what those things were. Did you delete some text that you now think is important? Version control will save that for you. Did your code work perfectly in the past, but is now buggy beyond belief? Version control. It’s a great way to avoid that chaotic state where you have multiple copies of the same file, but without a stupid and annoying file naming convention. FINAL_Revised_2.2_supervisor_edits_ver1.7_scream.txt will be a thing of the past.

    -

    GitHub is the platform that allows you to seamlessly share code from your workspace (e.g., laptop) to be hosted in an online space. So, sort of like the public interface to Git. The advantages of Git/GitHub are:

    -
      -
    1. You get to keep copies of all your work through time;
    2. -
    3. You can compare work through different copies through time, which helps to spot bugs or errors;
    4. -
    5. Other people can collaborate openly with your work;
    6. -
    7. You have both a local and an online copy of your work that remain in sync;
    8. -
    9. It is fully transparent as to who made a contribution, why they made it, and when; and
    10. -
    11. You can have multiple people working on the same project at once in parallel.
    12. -
    -

    While this was primarily designed for source code, it should be instantly obvious how this becomes a powerful tool for virtually all research workflows.

    -


    -
    -
    -

    RStudio

    -

    RStudio is a popular coding environment for researchers who use the statistical programming language, R. It comes with a text editor, so you don’t have to install another and switch between. It also includes a graphical user interface (GUI) to Git and GitHub, which we will be using here.

    -

    Isn’t it nice when brilliant Open Source tools integrate seamlessly like that. This should help to make your daily use of Git much simpler.

    -

    If at any point you need to install new packages for R, simply use the following command:

    -

    install.packages("PACKAGE NAME", dependencies = TRUE)

    -

    Replacing PACKAGE NAME with the, er, package name. Some examples you can play with that might come in useful include knitr, devtools or ggplot2.

    -


    -
    -
    -
    -

    Step one: Download all the things

    -
      -
    1. You should already have a GitHub account by now if you have followed the previous tasks. If not, sign up here. Free unlimited repositories for all!
    2. -
    3. Download and install the latest version of R. Also available for Mac and Linux.
    4. -
    5. Download and install the latest version of RStudio. Oh, hey, look, it’s Open Source! Swish.
    6. -
    7. Download and install the latest version of Git. Make sure to select “Use Git from the Windows Command Prompt” during installation.
    8. -
    -
    -

    Pro-tip: To update all of your R packages in one, simply execute the following code update.packages(ask = FALSE, checkBuilt = TRUE)

    -
    -

    For now, just choose all the usual default options for each install. Depending on your operating system (e.g., Mac, Windows, Linux), these might be different for each of you. For the rest of this task, we’re going to stick with doing things the easy-ish Windows way (but also provide some instructions for using the command line).

    -

    For Debian or Ubuntu users (or any other Linux distribution that uses apt), simply use the following command to install Git:

    -

    sudo apt-get install git

    -

    For Mac users, this link, or purchase a new laptop with a different operating system.

    -

    If you want, you can also download the local version of GitHub and use it through the simple GUI. It’s available on Windows and Mac and Linux, and can make your life a little easier, especially if you want to use a different platform to RStudio.

    -
    -

    Pro-tip: You see when installing Git it says ‘Use Git Bash as shell for Git projects?’ This is the place where you can use the command-line to access Git from outside of RStudio. It’s a powerful beast. Try the following two commands to get started:

    -
    -

    git config --global user.name 'YOUR USERNAME'
    git config --global user.email 'YOUR EMAIL'

    -
    -
    -

    Step two: Configure Git inside RStudio

    -

    Right, that’s the easy bit done. Next, go into RStudio, and in the menus at the top go to Tools > Global Options > Git/SVN. SVN is just another version control system like Git, and we don’t need to worry about that here.

    -

    In the place where it says Git executable, enter the path to the git.exe file that you installed in the previous step. Make sure the box that says Enable version control interface for RStudio projects is ticked. This ties version control to future projects in RStudio, adding a really powerful extra dimension to collaborative or solo work.

    -

    - -

    -

    -The Global Options window inside RStudio -

    -


    -

    Next, hit the button in this window that says Create RSA Key. This generates a key pair (a private key that stays on your computer and a public key that you share) that is used for authentication between different systems, and saves you from having to type in your password over and over. A new window will pop up with the public key, which you want to copy to your clipboard.

    -

    Head over to GitHub, go to your profile settings, and the SSH and GPG keys tab. Click New SSH key. Here, paste in the key from RStudio, and call it something imaginative like ‘RStudio’.

    -

    - -

    -

    -Inside GitHub where you will want to enter the key you just generated in RStudio -

    -


    -

    OK, now hold on to your butts, we’re going into the command line. Don’t worry if you’ve never used the shell before because it’s quite similar to using R, or any other coding system. The main difference here though is that instead of calling functions like in R, you call commands.

    -

    So back in RStudio, go to Tools > Shell, and it will open up a command prompt window. If you already played with the Git Bash above, you should have done this step already. Enter the following two commands:

    -

    git config --global user.name 'YOUR USERNAME'
    git config --global user.email 'YOUR EMAIL'

    -

    Hopefully it does not have to be said to substitute in your own GitHub username and email here. You can access this at any point just by finding the ‘Shell’ within Windows. Or, if you right click on any folder on your Desktop that is linked to a GitHub repo, you can open up the Shell instantly and Bash away.

    -

    What this stage has done is tell Git, which is software that runs on your desktop, which name and email to attach to your commits. GitHub, which is a repository website, uses that email to link those commits to your account.
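If you want to check what the two git config commands above stored, you can read the values back. This is only a sketch: the name and email are hypothetical placeholders, and it points HOME at a temporary directory so that your real settings are left untouched while you experiment.

```shell
# Sketch only: use a throwaway HOME so your real Git settings stay untouched.
sandbox="$(mktemp -d)"
export HOME="$sandbox"
unset XDG_CONFIG_HOME

# Placeholder identity (hypothetical values); substitute your own.
git config --global user.name 'Ada Example'
git config --global user.email 'ada@example.org'

# Read the values back to confirm what Git will record on each commit.
configured_name="$(git config --global --get user.name)"
configured_email="$(git config --global --get user.email)"
echo "name:  $configured_name"
echo "email: $configured_email"
```

On your own machine, skip the first three commands and run only the two `--get` lines to inspect your real configuration.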

    -

    Restart RStudio. Whew, that was tough. Next.

    -


    -
    -
    -

    Step three: Why did I just do that?

    -

    OK, hold your breath, we’re going to pause here just to learn some basic Git commands. Some of the key ones you could do with learning are:

    -
      -
    • Add: This is where you submit files to the staging area before being committed.

    • -
    • Commit: This is like ‘saving’ your work by creating a new version or copy.

    • -
    • Push: This is how you send files from your local project to the online repository.

    • -
    • Pull: This is how you get files from your online repository to your local project.

    • -
    -

    Back in RStudio, type the following into the Terminal, or into a new Shell:

    -

    git add .

    -

    It won’t actually do anything for now, but in the future will add all files in your current working directory (that’s what the . does) to staging ready for a commit.
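To see staging in action without touching a real project, you can try it in a throwaway repository. The sketch below makes a few assumptions: the two file names are hypothetical stand-ins for your own files, and everything happens in a temporary directory.

```shell
# Sketch only: a throwaway repository in a temporary directory.
workdir="$(mktemp -d)"
cd "$workdir"
git init -q

# Two hypothetical files standing in for your project.
echo "# Demo project" > README.md
echo "print('hello')" > analysis.R

# Before staging, both files are listed as untracked ("??").
git status --short

# Stage everything in the current working directory.
git add .

# After staging, both files are listed as added ("A"), ready for a commit.
git status --short

# List the names of the staged files.
staged_files="$(git diff --cached --name-only)"
echo "$staged_files"
```

Nothing has been saved to the project history yet at this point. Staging only marks which changes will go into the next commit.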

    -


    -
    -
    -

    Step four: The perfect marriage between Git and R

    -

    Now, in Task 1, you should have learned how to build your very first GitHub repository. If you haven’t done that, we can wait here while you go and do that. If you have already, or have an existing GitHub repository, we can move on.

    -

    So, you should have a repository on GitHub, complete with a README file, a LICENSE file and some other bits and bobs.

    -

    What we are going to do now, is integrate that repository with Git. Steady now.

    -
      -
    1. Firstly, go to Project > Create Project > Version Control > Git.
    2. -
    3. Back on GitHub, you should see a bit where there is a https:// URL. That is the link to your repository, and it gives you the option to clone it in your desktop. For now, just copy that link, switch back to RStudio, and paste it into the ‘Repository URL’ as indicated.
    4. -
    5. Give the project a directory name, like test, Jim, or whatever you want.
    6. -
    7. Next, browse for the place on your desktop where you want this project to live, its subdirectory.
    8. -
    9. Click ‘Create Project’, and let the magic be done!
    10. -
    -

    What you just did was tell RStudio to associate a new project in R with a specific repository on GitHub.
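Behind the scenes, the steps above amount to a git clone. The sketch below imitates that from the command line, with one big assumption: a local, empty repository in a temporary directory stands in for the https:// URL you copied from GitHub, so the example runs without a network connection. The directory name test is a placeholder. Git may warn that the cloned repository is empty, which is expected here.

```shell
# Sketch only: a local bare repository stands in for your GitHub repository.
sandbox="$(mktemp -d)"
git init -q --bare "$sandbox/remote.git"

# With a real project, paste the https:// URL copied from GitHub here instead.
git clone -q "$sandbox/remote.git" "$sandbox/test"

# The clone remembers where it came from under the remote name 'origin'.
cd "$sandbox/test"
origin_url="$(git remote get-url origin)"
echo "origin -> $origin_url"
```

The name origin is what later push and pull commands use to refer to the online copy.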

    -
    -
    -

    Step four: Alternative

    -

    If you still haven’t built your first repository on GitHub, we can do something slightly different here. In RStudio, click New project and then New Directory. Call it what you want and change the directory as needed, make sure to tick Create a git repository, and then click Create Project. This creates an .Rproj file, which you can manage in the usual way through RStudio, including adding README.md and LICENSE.md files as discussed in Task 1.

    -
    -
    -

    Step five: Getting content with content

    -

    Remember that README file we created a while back? Well, it’s time to write it. Thinking back to Task 1, there were some specific things that we said make a good README file. Do you remember what any of them were? Just to refresh your memory, these were:

    -
      -
    • What is this project about and what does it do.
    • -
    • Why should people care, and why is it useful.
    • -
    • How can someone get started contributing to the project.
    • -
    • Who can be contacted in case someone needs help.
    • -
    • A link to the license, contributing guidelines, and code of conduct.
    • -
    • A description of the project structure.
    • -
    • Who is involved, and what are their roles.
    • -
    • The current status of the project.
    • -
    -

    So, in RStudio, open that file and try adding just a bit of information about this for your project. If you are doing this for an actual project, try and make it useful. If you are just tinkering for now, you can add what you want.

    -

    Remember that your README file is in markdown (.md) format. For a refresher on some of the simple syntax markdown uses, check this handy cheatsheet.

    -

    - -

    -

    -Screenshot of what this module looks like in markdown, during development. Meta. -

    -


    -
    -
    -

    Step six: A brave commitment

    -

    OK, so now you should have a nicely edited README file. Now we are going to ‘commit’ this to the project using Git. This is basically the equivalent of saving this version of your project, with a record of what changes were made. Successive commits produce a history that can be examined at a later time, allowing you to work with confidence.

    -

    There are a few ways of doing this.

    -
      -
    1. Go to Tools > Version Control > Commit
    2. -
    3. In the environment pane in RStudio, there should be a new ‘Git’ tab. Handy.
    4. -
    5. In your console pane, there should now be a new ‘Terminal’, which you can run Git command lines through.
    6. -
    -

    Let’s just stick with the second option for now. This Git pane shows you which files have been changed and includes buttons for the most important Git commands we saw earlier.

    -

    Select the README file in the Git window, which should show up automatically if you have made any edits to it. This adds that file to the ‘staging’ area, which is sort of like the pre-saving space for your work. Click ‘Commit’ and a new window should pop up.

    -

    Here, you have a chance to review your changes, and write a nice commit message. Type in something brief, but informative about the changes that you have made in this version or snapshot of your work. You want this to be enough information so that if you or someone else looks back on it, you’ll know why you made this commit and the changes associated with it. These are like safety nets for your project in case you need to fall back for some reason.

    -
    -

    Pro-tip: Here, you will see a list of all the changes you have made since your last commit. Older removed lines are in red, and newly added lines are in green. Double check these to make sure that the edits you have made are the ones you intended to make. This is really helpful for spotting typos, stray edits, and any other little mistakes you might have accidentially introduced. Safety first.

    -
    -

    Note If you are colour-blind and can’t see which lines have been added or removed, you can use the line numbers in the two columns on the left of the window as a guide. Here, the number in the first column identifies the older version, and the number in the second column identifies the new version.

    -

    Now when you click ‘Commit’, another window will pop up, telling you how many files you have changed and the number of lines within that file you have changed. Close that little window down.

    -


    -
    -
    -

    Step seven: PUSH!

    -

    Click the Push button in the top right of the new window. A new window will pop up now. What this is doing is synchronising the files changed on your local repository with the README file to the online version of the project on GitHub.

    -

    To do this from the Shell, use the following command:

    -

    git push -u origin master

    -

    Some times here you will be prompted to add your username and password from GitHub, which you should do if asked.

    -

    Close that window down, and the next one. Go to your project on GitHub, refresh, and check that the README file is still there in all its newly edited glory. You should see the commit message you made next to the file too.

    -


    -

    OPTIONAL ADVANCED/AWESOME STEP

    -

    Alright, so you just pushed some content to your first repo, awesome! Now let’s put it into practice for a real project. Like, the one you are participating in right now. Let’s try this out:

    -
      -
    1. Go to the repositors for this project on GitHub

    2. -
    3. Fork the repository to your own GitHub account. The URL for this should be: https://github.com/OpenScienceMOOC/Module-5-Open-Research-Software-and-Open-Source.git

    4. -
    5. Head into RStudio, go to File > New Project, choose Version Control, select Git, and then paste the forked repository URL found in your copy of the repository. You now have your own versioned copy of this whole module. Neat. Save this somewhere on your local machine.

    6. -
    7. Now, you need to tell Git that a different version of this project exists. Open up the Shell, and enter the command: git remote add upstream https://github.com/OpenScienceMOOC/Module-5-Open-Research-Software-and-Open-Source

    8. -
    9. What you just did was name the original branch here upstream, just to keep things simple for now. Now, create a new branch to document your changes to this independent of the main branch. Enter the command: git checkout -b proposed-changes master

    10. -
    11. You just created a new branch called proposed-changes where you can now edit all of the content and files to your heart’s delight. Hopefully, the structure of this project is simple enough for you to navigate around. All of the raw files for the MOOC can be found in the content_development folder, and this is Task_3.md.

    12. -
    13. If you scroll to the bottom of Task_3.md, you should see a place where you can edit in your name and affiliation. Add these in, and then go through the commit procedure detailed above. If you see anything else that needs editing too, feel free to add them in too!

    14. -
    15. Now, you want to push the changes back to the original branch. Use the following command in your Shell: git push origin proposed-changes

    16. -
    17. Go back to GitHub and find your fork here. Click the little green button, and create a pull request. This is essentially a review to integrate the changes made into the original branch for this MOOC project.

    18. -
    19. The owners in charge of the MOOC project will now get a notification of this, review it, and confirm it if everything went to plan! We will review it, and if it all went okay, your name will now appear for all eternity as someone who completed this advanced task.

    20. -
    21. Have a cup of tea, coffee, or wine to celebrate!

    22. -
    -

    CONGRATULATIONS

    -

    You just integrated Git with R Studio, and made your first change to a version controlled project. Your life will now never be the same, and your research workflow will probably be more rapid, agile, and collaborative than ever. Good luck going back to Word.

    -

    The great thing is that this doesn’t have to just be used for code. You can use it for plain text, markdown, html, and, well, R code. The possibilities are limitless - what you have just learned is a new form of openly collaborative project management that works for an enormous range of tasks.

    -

    From now on, it is all up to you! Some advice is to:

    -
      -
    • Make frequent commits. Treat Git like your puppy, in that it requires constant and special attention. Just a pat on the head every now and then is enough to keep it satisfied, but it’ll be happiest with sustained servicing.

    • -
    • The best way to do this is to make a commit each time you work on a specific problem. For example, writing a paragraph, running an analysis, or fixing a bug.

    • -
    • Push often. Don’t let those commits build up, otherwise you run more risk of getting into merge conflicts. Seeing as these can be the stuff of nightmares, just make sure to push often.

    • -
    • Pull often. If others are working remotely on the same project, you will want to stay up to date with their changes. Make sure to frequently pull in their changes from GitHub to make sure you are all in sync.

    • -
    • Experiment and explore! This task really only scratches the surface, and there are many different functions, tools, and ways this can be used. Really, it is up to you to find out how to use this information to improve your research workflow, and ultimately collaborate on better, more open and reliable research!

    • -
    • To learn more about issues, branches, merge conflicts, pull requests, and other advanced aspects of using Git and RStudio, check out this awesome guide by Hadley Wickham.

    • -
    -


    -

    Know a way this content can be improved?

    -

    Time to take your new GitHub skills for a test-run! All content development primarily happens here. If you have a suggested improvement to the content, layout, or anything else, you can make it and then it will automatically become part of the MOOC content after verification from a moderator!

    -
    -
    -

    List of participants who completed the ADVANCED version of this task

    -
      -
    • Brendan Palmer,CRF-C, University College Cork
    • -
    • Lisa Matthias, Freie Universität Berlin
    • -
    • Hollie Marshall, University of Leicester
    • -
    • Eric D. Wilkey, Western University, Canada
    • -
    • José-Raúl Canay-Pazos, Universidade de Santiago de Compostela, Spain
    • -
    • Encarnación Martínez Álvarez, Spain
    • -
    • Alberto Albz Marocchino, Italy
    • -
    • Iratxe Rubio, Basque Centre for Climate Change BC3
    • -
    -

    CC0 Public Domain Dedication

    -
    -
    - - - - -
    - - - - - + + + + +
    + + + + + + + + + + + + + + +
    +

    Task 3: How to integrate Git with RStudio

    +

    This task is designed for students and researchers who want to implement a system of version control within a standard R-based workflow. This can be applied to a range of software development, data analysis and project management tasks. Your future research self will thank you for the convenience.

    +

    Don’t forget you can join in the discussions over at our open Slack channel. Please do introduce + yourself at #module5opensource, and tell us a bit about who you are, your background, and how you ended up here! +

    +

    Estimated time to complete: 30 minutes

    +

    Estimated time saving once complete: Virtually infinite

    +

    NOTE A video guide version of this task is now available on YouTube.

    + +
    +

    Getting started

    +

    Congratulations on making it this far! If you’re reading this, you’ve survived pull requests, web-hooks, and can probably even tell us what the F in FOSS stands for (not Frustration…). Hopefully, you have overcome any scepticism or reluctance towards the benefits of GitHub and Open Source Software, and are ready to take the next step.

    +

    Before starting this Task, please make sure you have already completed Task + 1 and Task + 2, so that you are more familiar with GitHub and some standard Open Source practices.

    +

    This task will teach you how to integrate the version control software, Git, with the popular coding + environment, RStudio. And yes, it is Git as in gif or God, not Jit as in the wrong way of pronouncing things. +

    +

    If you are one of those researchers who keeps code spread across multiple hard-drives that are waiting to break, Dropbox, Google Drive, or any other non-specialist software, this task is just for you. If you have ever experienced the mind-numbing process of having multiple ‘final’ versions of a paper bouncing between different co-authors, this is also for you.

    +

    All of us are guilty of these sorts of things once in a while, but there are ways to do it that are better + for you, future you, and those who might benefit from your work.

    +


    +
    +

    Getting Git

    +

    So, what is Git, and how is it different to GitHub? Git is a version control system, which enables you to + save and track time-stamped copies of your work throughout the development process. It also works with + non-code items too, like this MOOC, the majority of which was written in markdown in RStudio, and integrated + with a Git/GitHub workflow.

    +

    This is important, as all research goes through changes and sometimes we want to know what those changes were. Did you delete some text that you now think is important? Version control will save that for you. Did your code work perfectly in the past, but is now buggy beyond belief? Version control. It’s a great way to avoid that chaotic state where you have multiple copies of the same file, and it does so without a stupid and annoying file naming convention. FINAL_Revised_2.2_supervisor_edits_ver1.7_scream.txt will be a thing of the past.

    +

    GitHub is the platform that allows you to seamlessly share code from your workspace (e.g., laptop) to be hosted in an online space. So, sort of like the public interface to Git. The advantages of Git/GitHub are:

    +
      +
    1. You get to keep copies of all your work through time;
    2. +
    3. You can compare different copies of your work through time, which helps to spot bugs or errors;
    4. +
    5. Other people can collaborate openly with your work;
    6. +
    7. You have both a local and an online copy of your work that remain in sync;
    8. +
    9. It is fully transparent as to who made a contribution, why they made it, and when; and
    10. +
    11. You can have multiple people working on the same project at once in parallel.
    12. +
    +

    While this was primarily designed for source code, it should be instantly obvious how this becomes a + powerful tool for virtually all research workflows.
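To make the "copies through time" idea concrete, here is a minimal sketch run in a throwaway directory. The file name, the commit messages, and the name and email are all made up for illustration. It saves two versions of a text file and then asks Git what changed between them.

```shell
# Throwaway repository in a temporary directory (hypothetical file names).
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Identity for this scratch repository only; placeholders, not real details.
git config user.name 'Example Researcher'
git config user.email 'researcher@example.org'

# Version 1 of the file.
echo 'First draft of the methods section.' > paper.md
git add paper.md
git commit --quiet -m 'Add first draft of methods'

# Version 2 of the file.
echo 'Second paragraph, added after supervisor feedback.' >> paper.md
git commit --quiet -am 'Expand methods after feedback'

git log --oneline          # one line per saved version, newest first
git diff HEAD~1 HEAD       # what changed between the last two versions
git show HEAD~1:paper.md   # the file as it was one version ago
```

Nothing here touches GitHub; the whole history lives in the hidden .git folder inside the project directory.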

    +


    +
    +
    +

    RStudio

    +

    RStudio is a popular coding environment for researchers who use the statistical programming language, R. It comes with a text editor, so you don’t have to install another and switch between the two. It also includes a graphical user interface (GUI) to Git and GitHub, which we will be using here.

    +

    Isn’t it nice when brilliant Open Source tools integrate seamlessly like that. This should help to make + your daily use of Git much simpler.

    +

    If at any point you need to install new packages for R, simply use the following command:

    +

    install.packages("PACKAGE NAME", dependencies = TRUE)

    +

    Replacing PACKAGE NAME with the, er, package name. Some examples you can play with that might + come in useful include knitr, devtools or ggplot2.

    +


    +
    +
    +
    +

    Step one: Download all the things

    +
      +
    1. You should already have a GitHub account by now if you have followed the previous tasks. If not, sign up here. Free unlimited repositories for all!
    2. +
    3. Download and install the latest version of R. Also available for + Mac and Linux.
    4. +
    5. Download and install the latest version of RStudio. Oh, hey, look, it’s Open Source! Swish.
    6. +
    7. Download and install the latest version of Git. Make sure to select “Use Git from the Windows Command Prompt” during installation.
    8. +
    +
    +

    Pro-tip: To update all of your R packages in one go, simply execute the following code: update.packages(ask = FALSE, checkBuilt = TRUE)

    +
    +

    For now, just choose all the usual default options for each install. Depending on your operating system (e.g., Mac, Windows, Linux), this might be different for each of you. For now, and for the rest of this task, we’re going to stick with doing things the easy-ish Windows way (but also provide some instructions for using the command line).

    +

    For Debian-based Linux users (including Ubuntu), simply use the following command to install Git:

    +

    sudo apt-get install git

    +

    For Mac users, follow this link, or purchase a new laptop with a different operating system.

    +

    If you want, you can also download the local version of GitHub and + use it through the simple GUI. It’s available on Windows and Mac and Linux, and can make your life a little + easier, especially if you want to use a different platform to RStudio.

    +
    +

    Pro-tip: You see when installing Git it says ‘Use Git Bash as shell for Git projects?’ + This is the place where you can use the command-line to access Git from outside of RStudio. It’s a powerful + beast. Try the following two commands to get started:

    +
    +

    git config --global user.name 'YOUR USERNAME'
    + git config --global user.email 'YOUR EMAIL'

    +
    +
    +

    Step two: Configure Git inside RStudio

    +

    Right, that’s the easy bit done. Next, go into RStudio, and in the tabs at the top go to Tools > Global Options > Git/SVN. SVN is just another version control system like Git, and we don’t need to worry about that here.

    +

    In the place where it says Git executable, add the path to the git.exe file that you just installed in the previous step. Make sure the box here that says Enable version control interface for RStudio projects is ticked. This ties version control to future projects in RStudio, providing a really powerful additional dimension to collaborative or solo work.

    +

    + +

    +

    + The Global Options window inside RStudio +

    +


    +

    Next, hit the button in this window that says Create RSA Key. This creates a pair of keys, one private and one public, that are used for authentication between different systems and save you from having to type in your password over and over. Here, it will pop up a new window with the public key, which you want to copy to your clipboard.

    +

    Head over to GitHub, go to your profile settings, and the SSH and GPG keys tab. Click New SSH + key. Here, paste in the key from RStudio, and call it something imaginative like ‘RStudio’.

    +

    + +

    +

    + Inside GitHub where you will want to enter the key you just generated in RStudio +

    +


    +

    OK, now hold on to your butts, we’re going into the command line. Don’t worry if you’ve never used the shell + before because it’s quite similar to using R, or any other coding system. The main difference here though is + that instead of calling functions like in R, you call commands.

    +

    So back in RStudio, go to Tools > Shell, and it will open up a command prompt window. If + you already played with the Git Bash above, you should have done this step already. Enter the following two + commands:

    +

    git config --global user.name 'YOUR USERNAME'
    + git config --global user.email 'YOUR EMAIL'

    +

    Hopefully it does not have to be said to substitute in your own GitHub username and email here. You can + access this at any point just by finding the ‘Shell’ within Windows. Or, if you right click on any folder on + your Desktop that is linked to a GitHub repo, you can open up the Shell instantly and Bash away.

    +

    What this stage has done is configure Git, which is software that runs on your desktop, to work with GitHub, which is a repository website.
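The --global flag stores your name and email for every project on your machine. As a small sketch, dropping the flag inside a repository sets them for that one repository only, which can be handy if some projects use a different email. The scratch directory, name, and email below are placeholders.

```shell
# Scratch repository so that your real settings are left alone.
scratch="$(mktemp -d)"
git init --quiet "$scratch"
cd "$scratch"

# Without --global, these apply to this repository only.
git config user.name 'Example Researcher'
git config user.email 'researcher@example.org'

# Read the values back to confirm what Git will record in your commits.
git config --get user.name
git config --get user.email
```

Running git config --get with the same keys inside any of your own projects is a quick way to check which identity your commits will carry.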

    +

    Restart RStudio. Whew, that was tough. Next.

    +


    +
    +
    +

    Step three: Why did I just do that?

    +

    OK, hold your breath, we’re going to pause here just to learn some basic Git commands. Some of the key ones you could do with learning are:

    +
      +
    • +

      Add: This is where you submit files to the staging area before being committed.

      +
    • +
    • +

      Commit: This is like ‘saving’ your work by creating a new version or copy.

      +
    • +
    • +

      Push: This is how you send files from your local project to the online repository.

      +
    • +
    • +

      Pull: This is how you get files from your online repository to your local project.

      +
    • +
    +

    Back in RStudio, type in the following into the Terminal, or by opening up a new Shell:

    +

    git add .

    +

    It won’t actually do anything for now, but in the future will add all files in your current working directory + (that’s what the . does) to staging ready for a commit.
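To see what staging looks like once there is something to stage, here is a minimal sketch in a throwaway directory. The file name notes.md is invented for the example. git status --short prints a two-letter code in front of each file, so you can watch the file move from untracked to staged.

```shell
# Throwaway repository with one new, untracked file (hypothetical name).
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
echo '# My project' > notes.md

git status --short   # shows "?? notes.md": Git sees the file but is not tracking it
git add .            # stage everything in the current directory
git status --short   # shows "A  notes.md": added to staging, ready for a commit
```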

    +


    +
    +
    +

    Step four: The perfect marriage between Git and R

    +

    Now, in Task 1, you should have learned how to build your very first GitHub repository. If you haven’t done + that, we can wait here while you go and do that. If you have already, or have an existing GitHub repository, + we can move on.

    +

    So, you should have a repository on GitHub, complete with a README file, a LICENSE + file and some other bits and bobs.

    +

    What we are going to do now, is integrate that repository with Git. Steady now.

    +
      +
    1. Firstly, go to Project > Create Project > Version Control > Git.
    2. +
    3. Back on GitHub, you should see a bit where there is a https:// URL. + That is the link to your repository, and it gives you the option to clone it in your desktop. For now, just + copy that link, switch back to RStudio, and paste it into the ‘Repository URL’ as indicated.
    4. +
    5. Give the project a directory name, like test, Jim, or whatever you want.
    6. +
    7. Next, browse for the place on your desktop where you want this project to live, its subdirectory.
    8. +
    9. Click ‘Create Project’, and let the magic be done!
    10. +
    +

    What you just did was tell RStudio to associate a new project in R with a specific repository on GitHub.
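Behind the scenes, this is roughly a git clone. The sketch below assumes nothing about your real repository: so that it runs without a network connection, an empty local folder (remote.git) stands in for the GitHub URL, and test is the directory name from the step above. With a real project you would paste your https:// URL in place of the stand-in path.

```shell
work="$(mktemp -d)"

# Stand-in for your repository on GitHub (a real one would be an https:// URL).
git init --quiet --bare "$work/remote.git"

# Clone it into a new directory called "test".
git clone --quiet "$work/remote.git" "$work/test"

# The clone remembers where it came from under the name "origin".
cd "$work/test"
git remote -v
```

Git may warn that you have cloned an empty repository here, which is expected for the stand-in.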

    +
    +
    +

    Step four: Alternative

    +

    If you still haven’t built your first repository on GitHub, we can do something slightly different here. In RStudio, click New project and then New Directory. Call it what you want and change the directory as needed, make sure to tick Create a git repository, and then click Create Project. This creates an .Rproj file, which you can manage in the usual way through RStudio, including adding README.md and LICENSE.md files as discussed in Task 1.

    +
    +
    +

    Step five: Getting content with content

    +

    Remember that README file we created a while back? Well, it’s time to write it. Thinking back to + Task 1, there were some specific things that we said make a good README file. Do you remember + what any of them were? Just to refresh your memory, these were:

    +
      +
    • What is this project about and what does it do.
    • +
    • Why should people care, and why is it useful.
    • +
    • How can someone get started contributing to the project.
    • +
    • Who can be contacted in case someone needs help.
    • +
    • A link to the license, contributing guidelines, and code of conduct.
    • +
    • A description of the project structure.
    • +
    • Who is involved, and what are their roles.
    • +
    • The current status of the project.
    • +
    +

    So, in RStudio, open that file and try adding just a bit of information about this for your project. If you are doing this for an actual project, try and make it useful. If you are just tinkering for now, you can add what you want.

    +

    Remember that your README file is in markdown (.md) format. For a refresher on some of the + simple syntax markdown uses, check this handy cheatsheet.

    +

    + +

    +

    + Screenshot of what this module looks like in markdown, during development. Meta. +

    +


    +
    +
    +

    Step six: A brave commitment

    +

    OK, so now you should have a nicely edited README file. Now we are going to ‘commit’ this to the + project using Git. This is basically the equivalent of saving this version of your project, with a record of + what changes were made. Successive commits produce a history that can be examined at a later time, allowing + you to work with confidence.

    +

    There are a few ways of doing this.

    +
      +
    1. Go to Tools > Version Control > Commit
    2. +
    3. In the environment pane in RStudio, there should be a new ‘Git’ tab. Handy.
    4. +
    5. In your console pane, there should now be a new ‘Terminal’, which you can run Git command lines through. +
    6. +
    +

    Let’s just stick with the second option for now. This Git pane shows you which files have been changed and + includes buttons for the most important Git commands we saw earlier.

    +

    Select the README file in the Git window, which should show up automatically if you have made + any edits to it. This adds that file to the ‘staging’ area, which is sort of like the pre-saving space for + your work. Click ‘Commit’ and a new window should pop up.

    +

    Here, you have a chance to review your changes, and write a nice commit message. Type in something brief, but + informative about the changes that you have made in this version or snapshot of your work. You want this to be + enough information so that if you or someone else looks back on it, you’ll know why you made this commit and + the changes associated with it. These are like safety nets for your project in case you need to fall back for + some reason.

    +
    +

    Pro-tip: Here, you will see a list of all the changes you have made since your last commit. Older removed lines are in red, and newly added lines are in green. Double check these to make sure that the edits you have made are the ones you intended to make. This is really helpful for spotting typos, stray edits, and any other little mistakes you might have accidentally introduced. Safety first.

    +
    +

    Note If you are colour-blind and can’t see which lines have been added or removed, you can + use the line numbers in the two columns on the left of the window as a guide. Here, the number in the first + column identifies the older version, and the number in the second column identifies the new version.

    +

    Now when you click ‘Commit’, another window will pop up, telling you how many files you have changed and the + number of lines within that file you have changed. Close that little window down.
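For the third option in the list above, the Terminal, the same stage, review and commit sequence could look something like the sketch below. It runs in a throwaway directory and creates its own README.md, so the file contents, commit message, name and email are all placeholders.

```shell
# Throwaway repository with a freshly written README (placeholder content).
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name 'Example Researcher'       # identity for this sketch only
git config user.email 'researcher@example.org'
echo '# Demo project' > README.md

git add README.md        # tick the file, i.e. move it to the staging area
git diff --staged        # review exactly what is about to be committed
git commit --quiet -m 'Describe the project in the README'
git log --oneline        # the new commit and its message
```

git diff --staged plays the role of the red and green review window: lines starting with + were added, and lines starting with - were removed.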

    +


    +
    +
    +

    Step seven: PUSH!

    +

    Click the Push button in the top right of the new window. A new window will pop up now. What this is doing is synchronising the changes committed in your local repository, here the edited README file, with the online version of the project on GitHub.

    +

    To do this from the Shell, use the following command:

    +

    git push -u origin master

    +

    Sometimes here you will be prompted to add your username and password from GitHub, which you should do if asked.

    +

    Close that window down, and the next one. Go to your project on GitHub, refresh, and check that the + README file is still there in all its newly edited glory. You should see the commit message you + made next to the file too.
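If you would like to try the push from the Shell without touching a real project, the sketch below uses an empty local folder as a stand-in for GitHub. All of the paths, the name and the email are invented, and git branch -M master is only there to make sure the branch name matches the command above, since newer Git installs may pick a different default.

```shell
work="$(mktemp -d)"

# Stand-in for the online repository on GitHub.
git init --quiet --bare "$work/github-stand-in.git"

# A local project with one commit.
git init --quiet "$work/project"
cd "$work/project"
git config user.name 'Example Researcher'       # identity for this sketch only
git config user.email 'researcher@example.org'
echo '# Demo project' > README.md
git add README.md
git commit --quiet -m 'Add README'
git branch -M master     # make sure the branch is called master

# Tell Git where "origin" lives, then push and remember the link (-u).
git remote add origin "$work/github-stand-in.git"
git push --quiet -u origin master

git status --short --branch   # shows which remote branch master is tracking
```

Because of -u, later pushes from this branch can be a plain git push.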

    +


    +

    OPTIONAL ADVANCED/AWESOME STEP

    +

    Alright, so you just pushed some content to your first repo, awesome! Now let’s put it into practice for a + real project. Like, the one you are participating in right now. Let’s try this out:

    +
      +
    1. +

      Go to the repository for this project on GitHub

      +
    2. +
    3. +

      Fork the repository to your own GitHub account. The URL for this should be: + https://github.com/OpenScienceMOOC/Module-5-Open-Research-Software-and-Open-Source.git

      +
    4. +
    5. +

      Head into RStudio, go to File > New Project, choose Version Control, select + Git, and then paste the forked repository URL found in your copy of the repository. You now have + your own versioned copy of this whole module. Neat. Save this somewhere on your local machine.

      +
    6. +
    7. +

      Now, you need to tell Git that a different version of this project exists. Open up the Shell, + and enter the command: + git remote add upstream https://github.com/OpenScienceMOOC/Module-5-Open-Research-Software-and-Open-Source +

      +
    8. +
    9. +

      What you just did was name the original repository upstream, just to keep things simple for now. Now, create a new branch to keep your changes separate from the main branch. Enter the command: git checkout -b proposed-changes master

      +
    10. +
    11. +

      You just created a new branch called proposed-changes where you can now edit all of the + content and files to your heart’s delight. Hopefully, the structure of this project is simple enough for + you to navigate around. All of the raw files for the MOOC can be found in the + content_development folder, and this is Task_3.md.

      +
    12. +
    13. +

      If you scroll to the bottom of Task_3.md, you should see a place where you can edit in your name and affiliation. Add these in, and then go through the commit procedure detailed above. If you see anything else that needs editing, feel free to fix that too!

      +
    14. +
    15. +

      Now, you want to push the changes to your fork on GitHub (the remote called origin). Use the following command in your Shell: git push origin proposed-changes

      +
    16. +
    17. +

      Go back to GitHub and find your fork here. Click the little green button, and create a pull request. This + is essentially a review to integrate the changes made into the original branch for this MOOC project.

      +
    18. +
    19. +

      The owners in charge of the MOOC project will now get a notification of this. We will review it, and if it all went okay, we will merge it and your name will now appear for all eternity as someone who completed this advanced task.

      +
    20. +
    21. +

      Have a cup of tea, coffee, or wine to celebrate!

      +
    22. +
    +
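The steps above can be rehearsed offline before you try them on the real module. In the sketch below, local folders stand in for the original repository (upstream) and for your fork on GitHub (fork.git). Every path, file content, name and email is a placeholder. The fork and the pull request themselves only exist on GitHub, so they are not shown.

```shell
# Identity for every repository in this sketch (placeholders).
export GIT_AUTHOR_NAME='Example Researcher' GIT_AUTHOR_EMAIL='researcher@example.org'
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work="$(mktemp -d)"

# Stand-in for the original project, with one commit on master.
git init --quiet "$work/upstream"
cd "$work/upstream"
echo 'Participants:' > Task_3.md
git add Task_3.md
git commit --quiet -m 'Add task file'
git branch -M master

# Stand-in for your fork on GitHub.
git clone --quiet --bare "$work/upstream" "$work/fork.git"

# Your local copy of the fork, as RStudio would create it.
git clone --quiet "$work/fork.git" "$work/local"
cd "$work/local"

# Point "upstream" at the original, then work on a separate branch.
git remote add upstream "$work/upstream"
git checkout --quiet -b proposed-changes master
echo '- Your Name, Your Affiliation' >> Task_3.md
git commit --quiet -am 'Add my name to the participant list'

# Send the branch to your fork (origin), not to the original.
git push --quiet origin proposed-changes

git remote -v        # origin is the fork, upstream is the original
git branch --all     # local and remote branches
```

Keeping the edit on its own branch means master in your fork stays identical to the original, which makes later updates from upstream much less painful.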

    CONGRATULATIONS

    +

    You just integrated Git with RStudio, and made your first change to a version controlled project. Your life will now never be the same, and your research workflow will probably be more rapid, agile, and collaborative than ever. Good luck going back to Word.

    +

    The great thing is that this doesn’t have to just be used for code. You can use it for plain text, markdown, + html, and, well, R code. The possibilities are limitless - what you have just learned is a new form of openly + collaborative project management that works for an enormous range of tasks.

    +

    From now on, it is all up to you! Some advice is to:

    +
      +
    • +

      Make frequent commits. Treat Git like your puppy, in that it requires constant and special attention. + Just a pat on the head every now and then is enough to keep it satisfied, but it’ll be happiest with + sustained servicing.

      +
    • +
    • +

      The best way to do this is to make a commit each time you work on a specific problem. For example, + writing a paragraph, running an analysis, or fixing a bug.

      +
    • +
    • +

      Push often. Don’t let those commits build up, otherwise you run more risk of getting into merge + conflicts. Seeing as these can be the stuff of nightmares, just make sure to push often.

      +
    • +
    • +

      Pull often. If others are working remotely on the same project, you will want to stay up to date with + their changes. Make sure to frequently pull in their changes from GitHub to make sure you are all in sync. +

      +
    • +
    • +

      Experiment and explore! This task really only scratches the surface, and there are many different + functions, tools, and ways this can be used. Really, it is up to you to find out how to use this + information to improve your research workflow, and ultimately collaborate on better, more open and + reliable research!

      +
    • +
    • +

      To learn more about issues, branches, merge conflicts, pull requests, and other advanced aspects of using + Git and RStudio, check out this awesome guide by Hadley + Wickham.

      +
    • +
    +
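As a rough illustration of the push often, pull often advice, the sketch below plays two collaborators on one machine. A local folder (shared.git) stands in for the GitHub repository. The names alice and bob, the file notes.md, and the commit messages are all invented.

```shell
# Identity for every repository in this sketch (placeholders).
export GIT_AUTHOR_NAME='Example Researcher' GIT_AUTHOR_EMAIL='researcher@example.org'
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work="$(mktemp -d)"
git init --quiet --bare "$work/shared.git"    # stand-in for GitHub

# Collaborator A starts the project and pushes.
git clone --quiet "$work/shared.git" "$work/alice"
cd "$work/alice"
echo 'Shared analysis notes' > notes.md
git add notes.md
git commit --quiet -m 'Start shared notes'
git branch -M master
git push --quiet -u origin master

# Collaborator B gets a copy.
git clone --quiet --branch master "$work/shared.git" "$work/bob"

# A keeps working and pushes again.
echo 'Result from run 2' >> notes.md
git commit --quiet -am 'Add run 2 result'
git push --quiet

# B pulls to catch up; --ff-only refuses to merge if the histories have diverged.
cd "$work/bob"
git pull --quiet --ff-only
git log --oneline
```

The --ff-only flag is a cautious choice rather than a requirement: if both people have committed different things, the pull stops and lets you decide how to combine them.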


    +

    Know a way this content can be improved?

    +

    Time to take your new GitHub skills for a test-run! All content development primarily happens here. + If you have a suggested improvement to the content, layout, or anything else, you can make it and then it will + automatically become part of the MOOC content after verification from a moderator!

    +
    +
    +

    List of participants who completed the ADVANCED version of this task

    +
      +
    • Brendan Palmer, CRF-C, University College Cork
    • +
    • Lisa Matthias, Freie Universität Berlin
    • +
    • Hollie Marshall, University of Leicester
    • +
    • Eric D. Wilkey, Western University, Canada
    • +
    • José-Raúl Canay-Pazos, Universidade de Santiago de Compostela, Spain
    • +
    • Encarnación Martínez Álvarez, Spain
    • +
    • Alberto Albz Marocchino, Italy
    • +
    • Iratxe Rubio, Basque Centre for Climate Change BC3
    • +
    +

    CC0 Public Domain Dedication

    +
    +
    + + + + +
    \ No newline at end of file
    diff --git a/production_toolkit/MOOC_planning_template.html b/production_toolkit/MOOC_planning_template.html
    index 1df8de8..95d6e62 100644
    --- a/production_toolkit/MOOC_planning_template.html
    +++ b/production_toolkit/MOOC_planning_template.html
    @@ -4,771 +4,9199 @@
    -Mooc planning template
    -    // Change the URL when tabs are clicked
    -    $('a', context).on('click', function(e) {
    -      history.pushState(null, null, this.href);
    -      showStuffFromHash(context);
    -    });
    -    return this;
    -  };
    -}(jQuery));
    -
    -window.buildTabsets = function(tocID) {
    -
    -  // build a tabset from a section div with the .tabset class
    -  function buildTabset(tabset) {
    -
    -    // check for fade and pills options
    -    var fade = tabset.hasClass("tabset-fade");
    -    var pills = tabset.hasClass("tabset-pills");
    -    var navClass = pills ? "nav-pills" : "nav-tabs";
    -
    -    // determine the heading level of the tabset and tabs
    -    var match = tabset.attr('class').match(/level(\d) /);
    -    if (match === null)
    -      return;
    -    var tabsetLevel = Number(match[1]);
    -    var tabLevel = tabsetLevel + 1;
    -
    -    // find all subheadings immediately below
    -    var tabs = tabset.find("div.section.level" + tabLevel);
    -    if (!tabs.length)
    -      return;
    -
    -    // create tablist and tab-content elements
    -    var tabList = $('<ul class="nav ' + navClass + '" role="tablist"></ul>');
    -    $(tabs[0]).before(tabList);
    -    var tabContent = $('<div class="tab-content"></div>');
    -    $(tabs[0]).before(tabContent);
    -
    -    // build the tabset
    -    var activeTab = 0;
    -    tabs.each(function(i) {
    -
    -      // get the tab div
    -      var tab = $(tabs[i]);
    -
    -      // get the id then sanitize it for use with bootstrap tabs
    -      var id = tab.attr('id');
    -
    -      // see if this is marked as the active tab
    -      if (tab.hasClass('active'))
    -        activeTab = i;
    -
    -      // remove any table of contents entries associated with
    -      // this ID (since we'll be removing the heading element)
    -      $("div#" + tocID + " li a[href='#" + id + "']").parent().remove();
    -
    -      // sanitize the id for use with bootstrap tabs
    -      id = id.replace(/[.\/?&!#<>]/g, '').replace(/\s/g, '_');
    -      tab.attr('id', id);
    -
    -      // get the heading element within it, grab its text, then remove it
    -      var heading = tab.find('h' + tabLevel + ':first');
    -      var headingText = heading.html();
    -      heading.remove();
    -
    -      // build and append the tab list item
    -      var a = $('<a role="tab" data-toggle="tab">' + headingText + '</a>');
    -      a.attr('href', '#' + id);
    -      a.attr('aria-controls', id);
    -      var li = $('<li role="presentation"></li>');
    -      li.append(a);
    -      tabList.append(li);
    -
    -      // set its attributes
    -      tab.attr('role', 'tabpanel');
    -      tab.addClass('tab-pane');
    -      tab.addClass('tabbed-pane');
    -      if (fade)
    -        tab.addClass('fade');
    -
    -      // move it into the tab content div
    -      tab.detach().appendTo(tabContent);
    -    });
    -    // set active tab
    -    $(tabList.children('li')[activeTab]).addClass('active');
    -    var active = $(tabContent.children('div.section')[activeTab]);
    -    active.addClass('active');
    -    if (fade)
    -      active.addClass('in');
    -
    -    if (tabset.hasClass("tabset-sticky"))
    -      tabset.rmarkdownStickyTabs();
    -  }
    -
    -  // convert section divs with the .tabset class to tabsets
    -  var tabsets = $("div.section.tabset");
    -  tabsets.each(function(i) {
    -    buildTabset($(tabsets[i]));
    -  });
    -};
    - - - - - - - - - - - - - - -
    -

    MOOC planning template

    -
    -

    How to use this template

    -

    This template provides a structured checklist for tracking content development.

    -
      -
    • For the ‘Delivered’ column, use a simple Yes/No scheme.
    • -
    • For the ‘Status’ column, use one of the three traffic-light symbols below.
    • -
    • For the ‘Deadline’ column, use the YYYY/MM/DD date format, e.g. 2018/05/10.
    • -
    • For the ‘Comments’ column, insert any text as necessary.
    • -
    -

    Status traffic light scheme:

    -

    Green: All looks good

    -
    -Green -

    Green

    -
    -

    Orange: Issues that can impact launch date

    -
    -Orange -

    Orange

    -
    -

    Red: Launch date in danger

    -
    -Red -

    Red

    -
    Design Phase | Delivered | Status badge | Deadline | Comments
    Initiate and plan
    Kick off | Yep | Green | 2018/05/10 | Sprint success!
    Define target group | Yep | Green | 2018/05/31 | Sprint success!
    Refine learning objectives/outcomes | Yep | Green | 2018/05/31 | Sprint success!
    Design course outline | Yep | Green | 2018/05/31 | Sprint success!
    Design project plan and timeline | Yep | Green | 2018/06/31 |
    Identify promotion channels | Yep | Green | 2018/06/31 |
    Design and scripting
    Identify key resources | Yep | Green | 2018/06/31 | Sprint success!
    Design learner activities | Yep | Green | 2018/06/31 | 3/3 completed
    Find existing key resources | Yep | Green | 2018/06/31 | Sprint success!
    Write audio/video scripts | In prep | Green | 2018/08/31 | 6/6 completed
    Review all learning resources | In prep | | 2018/11/31 |
    Finalise all scripts | In prep | | 2018/11/31 |
    Copyright strategy | Yep | Green | 2018/08/31 |
    Recording and editing
    Record on location/in studio
    Edit all audio/visual material
    Internal reviewing
    Cross-check and review content | In prep | Green | 2018/08/31 | Continuous process
    Checks from Steering Committee | In prep | Green | 2018/08/31 | Continuous process
    External testing and review
    All reviewing conducted via GitHub | In prep | Green | 2018/08/31 | Continuous process
    Existing channels from communications strategy
    Internal reviewing and finalisation
    Cross-review and check content
    Final checks from Steering Committee
    Implementation
    Agreement on platform | In prep | Green | 2018/08/31 |
    Module logo designed | Yep | Green | 2018/08/31 |
    Module description and introduction | Yes | Green | 2018/07/31 |
    Team member and guest lecturer agreements | Yes | Green | 2018/07/31 |
    Team member and guest lecturer profiles | Yes | Green | 2018/07/31 |
    Course readings acquired | Yes | Green | 2018/07/31 |
    Port content to selected platform
    All content deposited in Zenodo | Yep | Green | 2018/08/31 | Second release completed
    Promotion
    Content and communication calendar/strategy/timeline | In progress | Green | |
    Identify relevant channels (mailing lists, social media and hashtags, organisations, individuals, websites, conferences) | Yes | Green | 2018/07/31 |
    Images for use in social media | Yep | Green | 2018/07/31 |
    Course title marketing check | Yes | Green | 2018/07/31 |
    Launch
    Publicity start | Yes | Green | Dec 2018 |
    Open and free for all, continuous, self-paced learning, 100% online | Yes | Green | Dec 2018 | Continuous, self-paced
    Soft launch | Yes | Green | Dec 2018 |
    Course launch | Yes | Green | |
    Monitoring of learner experiences and reactions | In progress | | Jan 2019 |
    Prepare to provide additional information if required | Pending | | |
    Reviewing and optimisation
    Collate and review learner feedback at regular intervals | In prep | | |
    Track any new information during course duration | In prep | | |
    Prepare evaluation report | Pending | | |
    Evaluation meeting | Pending | | |
    Optimise content where relevant | Pending | | |
    -
    -
    - - - - -
    - - - - - + + + + +
    + + + + + + + + + + + + + + +
    +

    MOOC planning template

    +
    +

    How to use this template

    +

    This template provides a structured checklist for tracking content development.

    +
      +
    • For the ‘Delivered’ column, use a simple Yes/No scheme.
    • +
    • For the ‘Status’ column, use one of the three traffic-light symbols below.
    • +
    • For the ‘Deadline’ column, use the YYYY/MM/DD date format, e.g. 2018/05/10.
    • +
    • For the ‘Comments’ column, insert any text as necessary.
    • +
    +

    Status traffic light scheme:

    +

    Green: All looks good

    +
    + Green +

    Green

    +
    +

    Orange: Issues that can impact launch date

    +
    + Orange +

    Orange

    +
    +

    Red: Launch date in danger

    +
    + Red +

    Red

    +
    Design Phase | Delivered | Status badge | Deadline | Comments
    Initiate and plan
    Kick off | Yep | Green | 2018/05/10 | Sprint success!
    Define target group | Yep | Green | 2018/05/31 | Sprint success!
    Refine learning objectives/outcomes | Yep | Green | 2018/05/31 | Sprint success!
    Design course outline | Yep | Green | 2018/05/31 | Sprint success!
    Design project plan and timeline | Yep | Green | 2018/06/31 |
    Identify promotion channels | Yep | Green | 2018/06/31 |
    Design and scripting
    Identify key resources | Yep | Green | 2018/06/31 | Sprint success!
    Design learner activities | Yep | Green | 2018/06/31 | 3/3 completed
    Find existing key resources | Yep | Green | 2018/06/31 | Sprint success!
    Write audio/video scripts | In prep | Green | 2018/08/31 | 6/6 completed
    Review all learning resources | In prep | | 2018/11/31 |
    Finalise all scripts | In prep | | 2018/11/31 |
    Copyright strategy | Yep | Green | 2018/08/31 |
    Recording and editing
    Record on location/in studio
    Edit all audio/visual material
    Internal reviewing
    Cross-check and review content | In prep | Green | 2018/08/31 | Continuous process
    Checks from Steering Committee | In prep | Green | 2018/08/31 | Continuous process
    External testing and review
    All reviewing conducted via GitHub | In prep | Green | 2018/08/31 | Continuous process
    Existing channels from communications strategy
    Internal reviewing and finalisation
    Cross-review and check content
    Final checks from Steering Committee
    Implementation
    Agreement on platform | In prep | Green | 2018/08/31 |
    Module logo designed | Yep | Green | 2018/08/31 |
    Module description and introduction | Yes | Green | 2018/07/31 |
    Team member and guest lecturer agreements | Yes | Green | 2018/07/31 |
    Team member and guest lecturer profiles | Yes | Green | 2018/07/31 |
    Course readings acquired | Yes | Green | 2018/07/31 |
    Port content to selected platform
    All content deposited in Zenodo | Yep | Green | 2018/08/31 | Second release completed
    Promotion
    Content and communication calendar/strategy/timeline | In progress | Green | |
    Identify relevant channels (mailing lists, social media and hashtags, organisations, individuals, websites, conferences) | Yes | Green | 2018/07/31 |
    Images for use in social media | Yep | Green | 2018/07/31 |
    Course title marketing check | Yes | Green | 2018/07/31 |
    Launch
    Publicity start | Yes | Green | Dec 2018 |
    Open and free for all, continuous, self-paced learning, 100% online | Yes | Green | Dec 2018 | Continuous, self-paced
    Soft launch | Yes | Green | Dec 2018 |
    Course launch | Yes | Green | |
    Monitoring of learner experiences and reactions | In progress | | Jan 2019 |
    Prepare to provide additional information if required | Pending | | |
    Reviewing and optimisation
    Collate and review learner feedback at regular intervals | In prep | | |
    Track any new information during course duration | In prep | | |
    Prepare evaluation report | Pending | | |
    Evaluation meeting | Pending | | |
    Optimise content where relevant | Pending | | |
    +
    +
    + + + + +
    + + + + + - + + \ No newline at end of file