Skip to main content
All Posts By

Shane Coughlan

Shane Coughlan is an expert in communication, security and business development. His professional accomplishments include spearheading the licensing team that elevated Open Invention Network into the largest patent non-aggression community in history, establishing the leading professional network of Open Source legal experts and aligning stakeholders to launch both the first law journal and the first law book dedicated to Open Source. Shane has extensive knowledge of Open Source governance, internal process development, supply chain management and community building. His experience includes engagement with the enterprise, embedded, mobile and automotive industries.

Nokia Announces Adoption of OpenChain ISO/IEC 5230:2020

By Featured, News

Nokia announces conformance with OpenChain ISO/IEC 5230:2020 to manage open source license compliance management in its supply chain. Nokia has been using and contributing to open source initiatives for decades as an active member of the open source community. Nokia implemented an open source compliance program as far back as 2004 and has had a multi-disciplined Open Source Program Office (OSPO) since 2014.

Nokia leads the Telco Working Group in the OpenChain community.

“Nokia is excited to publicly announce our conformance with OpenChain”, says Jonne Soininen, Head of Open Source Initiatives. “Nokia’s mature open source processes and tools fit well within the OpenChain requirements. We believe it is important to have industry-wide recognized standards to provide predictability to various parties in the industry and across the supply chain.”

“The adoption of ISO 5230 by Nokia is a significant step forward for the telecommunications supply chain with respect to open source,” says Shane Coughlan, OpenChain General Manager. “Firstly, it shows how one of the most experienced companies in the world in the sphere is taking a leadership position in managing open source risk and efficiency. Secondly, it is a clear signal to the broader supply chain regarding how other companies can optimize their approach as well. We welcome this development and look forward to working closely with the Nokia OSPO in the years ahead.”

About Nokia

At Nokia, we create technology that helps the world act together.

As a B2B technology innovation leader, we are pioneering networks that sense, think and act by leveraging our work across mobile, fixed and cloud networks. In addition, we create value with intellectual property and long-term research, led by the award-winning Nokia Bell Labs.

With truly open architectures that seamlessly integrate into any ecosystem, our high-performance networks create new opportunities for monetization and scale. Service providers, enterprises and partners worldwide trust Nokia to deliver secure, reliable and sustainable networks today – and work with us to create the digital services and applications of the future.

About the OpenChain Project

The OpenChain Project has an extensive global community of over 1,000 companies collaborating to make the supply chain quicker, more effective and more efficient. It maintains OpenChain ISO/IEC 5230, the international standard for open source license compliance programs and OpenChain ISO/IEC 18974, the industry standard for open source security assurance programs

About The Linux Foundation

The Linux Foundation is the world’s leading home for collaboration on open source software, hardware, standards, and data. Linux Foundation projects are critical to the world’s infrastructure, including Linux, Kubernetes, Node.js, ONAP, PyTorch, RISC-V, SPDX, OpenChain, and more. The Linux Foundation focuses on leveraging best practices and addressing the needs of contributors, users, and solution providers to create sustainable models for open collaboration. For more information, please visit us at linuxfoundation.org.

Nokia Contributes Validator for the OpenChain Telco SBOM Guide

By Featured, News

As part of their engagement with the OpenChain Project, the Nokia Open Source team have contributed the ‘openchain-telco-sbom-validator’, a script to validate SBOMs against the OpenChain Telco SBOM Guide. This reference tool is available to everyone under the Apache 2.0 license. Marc-Etienne Vargenau of Nokia, chair of the OpenChain Telco Work Group, Gergely Csatari and their colleagues have been instrumental in helping to ensure the determination of SBOM quality is easier, faster and more effective.

Check out the Telco SBOM Guide (Written Document):

Access the Validator Code:

Usage

usage: python3 openchain-telco-sbom-validator.py [options] input

positional arguments:
  input                 The input SPDX file.

options:
  -h, --help            Shows this help message and exits.
  --debug               Prints debug logs.
  --nr-of-errors NR_OF_ERRORS
                        Sets a limit on the number of errors displayed.
  --strict-purl-check   Runs a strict check on the given purls. The default behaviour is to run a non strict purl check what means that it is not checked if the purl is translating to a downloadable URL.
  --strict-url-check    Runs a strict check on the URLs of the PackageHomepages. Strict check means that the validator checks also if the given URL can be accessed. The default behaviour is to run a non strict URL check what means that it is not checked if the URL points to a valid page. Strict URL check requires access to the internet and takes some time.')

Installation of prerequisites

This script is written in python and uses a requirements.txt to list its dependencies. To install python on an Ubuntu environment run sudo apt install python3-pip.

It is usually a good practice to install Python dependencies to a Python virtual environment. To be able to manage virtual environments you need to install venv with sudo apt install python3-venv.

If you do not have a virtual environment yet cretate it with python3 -m venv .env and install the dependencies with pip3 install -r requirements.txt, if you already have a virtual environment start it with . .env/bin/activate.

License

This software is Copyright Nokia and is licensed under the Apache 2.0 license.

Issues and contributions

In case of any issues please create a GitHub issue, while also any contributions are warmly welcome in the form of GitHub merge requests.

OpenChain Automotive Workshop – Stuttgart – 2024-09-10

By Featured, News

Agenda

  • 13:00: Opening and introductions
    • 13:00: ‘Opening Greeting’
      • by Shane Coughlan, OpenChain and Marcel Kurzmann, Bosch
    • 13:05: ‘Round Table Introductions’
    • 13:15: ‘Welcome back to the Automotive Work Group, and update from Toyota’
      • by Masato Endo, Toyota
  • 13:35: Government Regulations / Activity Impacting the Automotive Supply Chain
    • 13:35: ‘How OpenChain standards can help manage the United States Executive Order, NTIA Minimum Requirements, CRA and more’
      • by Shane Coughlan, OpenChain Project
    • 13:40: ‘The OpenChain SBOM Study Group – Rationale and Relevance’
      • by Shane Coughlan, OpenChain Project
    • 13:45: ‘The OpenChain AI Study Group – Rationale and Relevance’
      • by Shane Coughlan, OpenChain Project
    • 13:55: ‘The current EU approach to support the Open Source collaboration in the European Automotive Industry’
      • by Detlef Zerfowski, ETAS
    • 14:15: ‘Impact of regulations in the US on the Open Source Software Supply Chain’
      • by Russ Eling, OSS Consultants*
  • 14:35: Pause for coffee and networking
  • 14:55: End-to-End (Open Source) Software Management
    • 14:55: ‘The supply chain, tool updates and the need for standardized interfaces’
      • by Marcel Kurzmann, Bosch
    • 15:15: Catena-X / Eclipse Tractus-X – ‘Are there potential synergies with the OpenChain community and is there a supply chain for data?’
      • by Lars Geyer-Blaumeiser, Cofinity-X
    • 15:35: Linux ELISA / SPDX Safety Profile – ‘How can the safety requirements be covered along the automotive software supply chain?’
      • by Nicole Pappler, AlektoMetis
  • 15:55: Pause for coffee and networking
  • 16:25: Software Defined Vehicle
    • 16:25: Automotive Grade Linux and Software Defined Vehicle – Status and next steps
      • by Dan Cauchy, AGL
      • by Jan-Simon Moller, AGL
    • 16:45: Eclipse Kuksa and Software Defined Vehicle – Status and next steps
      • by Sebastian Schildt, ETAS
  • 17:15: Review of Core Topic
    • 17:15: ‘ISO/IEC 5230, ISO/IEC 18974 and ISO/IEC 5962 – How updates to international standards for open source license compliance, security assurance and SBOM impact the automotive supply chain’
      • Panel: Shane Coughlan @ OpenChain Project, Marcel Kurzmann @ Bosch, Masato Endo @ Toyota
  • 17:30: Open discussion and future planning
  • 17:40: A Brief Word on Open Source IP Yesterday and Today
    • by Keith Bergelt, Open Invention Network
  • 17:55: Close and Goodbye

OpenChain China Regular Meeting #2 – 2024-08-30 @ 06:30 UTC

By News

Our second regular meeting of the OpenChain China Work Group is taking place today, hosted by our current Chair ZhenHua Sun of ByteDance, and featuring a series of talks from community members.

Agenda

  • 14:30 Opening and welcome
  • 14:40 Introductions
  • 15:00 Current status of OpenChain – (Shane)
  • 15:20 – OpenChain Telco SBOM Guide: From Specification, Schema, Tool to Example in ByteDance (David, ByteDance) ;讲师文档:https://bytedance.larkoffice.com/docx/VWxGdskjuonS01xMmqUcgIWonsc
  • 15:50 Coffee break
  • 16:20 Building an OS community from scratch, (Qiuyue)
  • 16:50 Open Source Security Management in Ant Group (Kan Yu)
  • 17:20 Coffee break
  • 17:50 Space for discussion
  • 18:00 Closing
  • Dinner(for free)

Registration

Fun Stuff

David from ByteDance, our local 3D printing fan, made soon pretty cool keychains for attendees! There are also plushy toys and other swag for all to grab 🙂

First Regular SBOM Study Group Meeting – 2024-09-25 – 08:00 UTC

By News

In this first regularly scheduled SBOM Study Group meeting, Marc-Etienne Vargenau of Nokia will discuss the Telco SBOM Guide. This guide, which defines what a quality SBOM looks like in the context of the telecommunications industry, is a document released by the OpenChain Project earlier this year. It has set the foundation for broader discussions about cross-industry and cross-format SBOM quality determination and management.

Check Out Our Previous Kick-Off Meeting

Check Out The Published Telco SBOM Guide

Join Our Mailing List

Use Our Global Calendar To Dial Into The Call

Webinar: Implementing OpenChain ISO/IEC 5230 at endjin + Further Research on OpenChain ISO/IEC 18974

By community, licensing, News, security, standards, Webinar

Recent computer science graduate Charlotte Gayton shared her journey of implementing the OpenChain standard during her Year in Industry (ISO/IEC 5230) and her dissertation project (ISO/IEC 18974). She discussed the challenges she faced and the solutions she developed to achieve compliance. The session will provide a unique perspective on navigating OpenChain from the viewpoint of someone early in their career. Her work lead to the detailed case study recently published regarding OpenChain ISO/IEC 5230 adoption by endjin.

Watch the Recording:

View the Slides:

More About Our Webinars:

This event is part of the overarching OpenChain Project Webinar Series. Our series highlights knowledge from throughout the global OpenChain eco-system. Participants are discussing approaches, processes and activities from their experience, providing a free service to increase shared knowledge in the supply chain. Our goal, as always, is to increase trust and therefore efficiency. No registration or costs involved. This is user companies producing great informative content for their peers.

Check Out The Rest Of Our Webinars

This OpenChain Webinar was broadcast on 2024-08-08.

OpenChain Education Work Group Meeting – 2024-08-07 – Full Recording

By News

The Education Work Group held a hybrid in-person and virtual event on the 7th of August. There has been a lot of work around improving and expanding reference material to support companies adopting OpenChain standards or improving their open source business process management in general. One of the interesting topics recently has been momentum around providing reference capability maturity modeling material, with official partners such as Orcro and now Deloitte providing commitments towards a CC-0 licensed beta document for the Open Compliance Summit in Japan at the end of October.

Watch the Recording:

Be part of this:

You can get involved with the OpenChain Education Work Group through their dedicated mailing list. At this link, you will also find connections to other working groups around the world:

Samsung SDS Announces an OpenChain ISO/IEC 18974 Conformant Program

By Featured, News

Samsung SDS has adopted OpenChain ISO/IEC 18974, the international standard for open source security assurance. This builds on their previous adoption of OpenChain ISO/IEC 5230 in July 2022.

The adoption of ISO/IEC 18974 by Samsung SDS was supported by resources provided by the Linux Foundation’s OpenChain Project, founded in 2016, which maintains self-certification checklists and other materials to help global companies develop enhanced open source management processes.

“We are delighted to continue our relationship with the Samsung SDS team around the adoption and use of OpenChain standards for open source process management,” says Shane Coughlan, OpenChain General Manager. “The Samsung SDS team have long been involved with the OpenChain Korea Work Group, and provide an excellent example of how a company can have a measured, effective approach to community engagement, best practice adoption, and excellence in customer support.”

About Samsung SDS

Samsung SDS is the digital arm of the Samsung group and a global provider of cloud and digital transformation­ innovations. Samsung SDS delivers enterprise-grade solutions and services in cloud, secure mobility, analytics / AI, digital marketing and digital workspace. They enable customers in government, financial services, healthcare, and other industries to drive business in a hyper-connected economy, helping them to increase productivity, safeguard assets, and make smarter decisions.

About the OpenChain Project

The OpenChain Project has an extensive global community of over 1,000 companies collaborating to make the supply chain quicker, more effective and more efficient. It maintains OpenChain ISO/IEC 5230, the international standard for open source license compliance programs and OpenChain ISO/IEC 18974, the industry standard for open source security assurance programs

About The Linux Foundation

The Linux Foundation is the world’s leading home for collaboration on open source software, hardware, standards, and data. Linux Foundation projects are critical to the world’s infrastructure, including Linux, Kubernetes, Node.js, ONAP, PyTorch, RISC-V, SPDX, OpenChain, and more. The Linux Foundation focuses on leveraging best practices and addressing the needs of contributors, users, and solution providers to create sustainable models for open collaboration. For more information, please visit us at linuxfoundation.org.

Case Study: Implementing the OpenChain Specification @ endjin

By News
Implementing the OpenChain Specification
Charlotte Gayton

By Charlotte Gayton
Apprentice Engineer

As my industrial placement year comes to an end, so does my time with the OpenChain project. Over the past year I have worked through the full lifecycle of a project, from ideation & envisioning, to planning, implementation, testing, deployment, roll-out and maintenance to ensure that we are open-source license compliant, and therefore meet the specifications of the OpenChain project. Starting with discovering what open-source licenses are, to collecting and processing data, and finally to displaying it in a way that is useful to the company. In this blog I am going to take you through the processes and explain how we adapted and created our processes in checking our whole codebases components.

You can also read the Case Study as a Slide Deck:

What is OpenChain

The OpenChain Project was set up by the Linux Foundation to create an ISO standard for creating, using and modifying open-source software. Creating an ISO standard like this is important as more software is being created and used extensively without having a clear baseline of quality or structure. During the course of the year, the aim of the OpenChain project has expanded. Initially about license compliance, then it included threats, and now it’s looking at open source contribution process management.

Organisations that have OpenChain have greater control and knowledge about how their open-source software is being created or used, meaning they have a reduced risk of misuse and therefore less legal risk.

If you would like some extra information on what OpenChain is, the risks behind open-source software, and a more in-depth explanation of the specification, I have written a series of blogs:

Discovery

The first few weeks of the project I worked on getting familiar with the different terminology involved around licensing. This included the different types of open-source licensing:

  • Permissive – a free-software license that involved minimal restrictions on using the software e.g. MIT or Apache 2.0
  • Weak Copyleft – allow the code under the license to be combined with code under more permissive licenses without imposing the full copyleft requirements on the entire combined work e.g. LGPL
  • Strong Copyleft – any derivative of the software must also be distributed under the same strong copyleft terms

I learnt how compliance means you are satisfying the rules of the licenses you are using. I also started to understand the risks involved:

  • Legal issues
  • Ethical issues impacting companies reputation
  • Data breaches due to software vulnerabilities

If you want to know more about licensing and compliance I recommend taking these two courses:

Generating SBOMs

A software bill of materials (SBOM) is a list of all software components that make up a piece of software. Having an SBOM means that you can track all the components required to run your code and ensure they satisfy the license rules. I wrote about SBOMs in greater depth in part 3 of my OpenChain blog series.

We wanted to generate SBOMs for all our repositories across all our codebases so we used a tool called Covenant, created by Patrik Svensson, which generates an SBOM from either a directory or a specific file from .NET 5, .NET 6, .NET Core or NPM projects. To get the SBOM you simply have to run one command:

dotnet covenant generate 

We have CodeOps processes to manage our build scripts across our GitHub Organisations and repositories meaning we can easily modify the tasks that run within the build. This meant we could easily add in Covenant to generate our SBOMs.

There are a growing number of SBOM formats, however the two most popular: CycloneDx and SPDX are the formats that Covenant will generate. The current versions don’t add extensibility but Patrik (creator of Covenant) added a custom format which allowed us to capture more custom metadata, for example the branch that was being reviewed and the GitHub organisation it was being taken from.

The SBOMs are stored as JSON files which contain both a Components and Dependencies section. The Components section contains each component that has been identified along with information about it’s license. The Dependencies section brings together all the components and lists for each component which dependencies it has, which when mapped out would create a big tree-like structure of dependencies.

Below is a cut down version of what an SBOM could look like, you can see there is basic metadata at the top, then the components and dependencies section:

{
  "Name": "ExampleProject",
  "Version": "1.0.0",
  "ToolVendor": "Covenant",
  "ToolVersion": "1.2.3",
  "Metadata": [
    {
      "Key": "git_sha",
      "Value": "a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6"
    },
    {
      "Key": "git_branch",
      "Value": "main"
    },
    {
      "Key": "git_repo",
      "Value": "exampleuser/example-repo"
    }
  ],
  "Components": [
    {
      "Data": "2.0.0",
      "UUID": "a1b2c3d4-e5f6-4g7h-8i9j-k0l1m2n3o4p5",
      "Purl": "pkg:toolco/dotnet/example@2.0.0",
      "Name": "example",
      "Version": "2.0.0",
      "Groups": [
        "ExampleProject.sln"
      ],
      "IsRoot": true
    },
    {
      "Data": "4.7.0",
      "UUID": "24b55bc8-f9dd-481c-b43f-bd8d45f0410c",
      "Purl": "pkg:nuget/System.Security.Principal.Windows@4.7.0",
      "Name": "System.Security.Principal.Windows",
      "Version": "4.7.0",
      "Kind": "Library",
      "Copyright": "© Microsoft Corporation. All rights reserved.",
      "Hash": {
        "Algorithm": "SHA512",
        "Content": "F30A16D34C8792DB60B2240363A8B200CAB28BC2C7441405CF19ABF71DBF5FB0BF3BD1CBEC4D9B5EB4CF73EC482E4505D08D80AFDEF00B2B4B3BB56D6D4CAE96"
      },
      "License": {
        "Id": "MIT",
        "Name": "MIT License",
        "Url": "https://licenses.nuget.org/MIT"
      }
    }
  ],
  "Dependencies": [
    {
      "Purl": "pkg:toolco/dotnet/example@2.0.0",
      "Dependencies": [
        "pkg:nuget/MarkdownUtils@1.0.0",
        "pkg:nuget/ToolExtensions@2.0.0",
        "pkg:nuget/ThirdPartyLib@3.0.0"
      ]
    },
    {
      "Purl": "pkg:nuget/ExampleLibrary@3.1.2",
      "Dependencies": [
        "pkg:nuget/Microsoft.Build.Tasks.Git@1.1.1"
      ]
    },
    {
      "Purl": "pkg:nuget/ExampleLibrary.GitHub@3.1.2",
      "Dependencies": [
        "pkg:nuget/ExampleLibrary@3.1.2",
        "pkg:nuget/Microsoft.SourceLink.GitHub@1.1.1"
      ]
    }
  ]
}

Processing the SBOMs

Now that we were able to generate our SBOMs, we stored them in our CodeOps Azure Data Lake Storage Gen2 instance in Azure. We are keeping all historic versions so that if we ever need to look back we have the original SBOMs. As JSON is structured data but doesn’t have a schema, we needed to be able to process and clean it so we could extract the parts that were most important to us.

This is when we created the SBOM Wrangler, a PySpark notebook in Azure Synapse Analytics that would load in the data, transform it and then save it back to the datalake. The main benefit of using PySpark is making use of being able to horizontally partition the work, meaning we can process large amounts of data really quickly. To do this we had to design our code in a certain way (e.g. not using for loops) so that it was easily able to be partitioned.

As we had nested data in our JSON file, it made it more difficult to convert the data to be more structured. This was because we had to follow the schema it had, meaning if we imported a different format of SBOM, our wrangler wouldn’t work. In order to make use of the partitioning we used a function called explode which explodes our nested array into individual rows:

Components = df.select(
    explode(col('Components')),
    col('Name'),
    col('src_platform'),
    col('org'),
    col('repo'),
    col('file_timestamp'))

Below is an example of the table that we were generating, along with the column names and types of data we were using:

NamePurlUUIDLicense_IDLicense_NameLicense_UrlVersionisRootDataKindHash_AlgorithmHash_ContentCopyright
PySparkExamplepkg:covenant/dotnet/PySparkExample@1.0.0fd90c2b1-4450-4c72-b3f2-f395ba1617491.0.0true1.0.0
Newtonsoft.Jsonpkg:nuget/Newtonsoft.Json@13.0.2e2701915-6fcc-4a4c-aa5e-f068b43bebe4MITMIT Licensehttps://licenses.nuget.org/MIT13.0.213.0.2LibrarySHA512D743AE673BAC17FDBF53C05983DBA2FFDB99D7E6AF8CF5FE008D57AA30B6C6CA615D672C4140EEC516E529EB6AD5ACF29C20B5CC059C86F98C80865652ACDDE1Copyright © James Newton-King 2008

Creating scores for each SBOM

Now that we had collected the data we needed about each repository, we were going to apply some business logic to it which would give us a score for each SBOM. Our business logic included which licenses we were going to allow to be used, which ones we weren’t, and for the case in which they didn’t have a license, would we allow the given copyright notice.

Policy Hierarchy

From here we created a policy hierarchy. Policies may need to be tailored for individual repositories but this would be onerous to manage on a per-repository basis. So, using three categories: repository, organisation and company. For each different level, if there exists a policy that is lower down, that takes precedence over the higher level. For example, if there is a repo-specific policy, that will override the org-specific policy.

company

org

repo

We stored this information in our OpenChain repository under a set of different files. This allowed us to be able to access it from any of our other repositories and create a workflow to change it into a JSON file. Here is the structure of the file system:
.
├── company
│ └── company-level.yaml
├── organisation
│ ├── org-level-endjin.yaml
│ └── org-level-corvus-dotnet.yaml
└── repository
└── repo-level-corvus-extensions.yaml\

Each yaml file followed the same structure. Here is an example file:

### Corvus License rule set
accepted: 
  - Apache-2.0
  - BSD-2-Clause
  - BSD-3-Clause
  - MIT
  - AGPL-3.0-or-later
rejected:
  - GPL-2.0
  - GPL-3.0
copyright:
  - endjin
  - microsoft
  - corvus
overrideAccept:
  - Marain.Services.Tenancy
overrideReject:
  - Newtonsoft.Json

Each different category contributes to creating a score:

  • accepted: lists the licenses for this policy that are alright to use
  • rejected: lists the licenses for this policy that would break compliance and are not allowed
  • copyright: lists the key words identified in a copyright notice that would allow the component to be used
  • overrideAccept: lists the components that are always allowed, no matter their license
  • overrideReject: lists the components that are never allowed, no matter their license

We decided to add functionality to override the license policy for certain components as we found we were catching out some licenses that were our own. The Marain.Net open-source repositories are licensed with the AGPL 3.0 license which is a strong copyleft license. It enforces open source on all derivative work that uses the component. Generally, we don’t want to use copyleft licenses as it may require us to open source some code that we wish to keep as proprietary software rather than permissive. However, as it’s our own code, we are fine using it, meaning we want to override our current rules and accept those specific Marain repositories that have the AGPL 3.0 license.

To use this data later on in our SBOM Analyser we needed it to be in JSON form, so created a workflow to run a script that would pull together all the yaml files and save it as a JSON file to the datalake.

SBOM Analyser

We wanted to be able to produce scores for our SBOMs. The scores represent how many accepted, rejected and unknown components there are for each repository (SBOM). To determine whether they are accepted or not, it is checked against the policy for that specific repository. If there exists a policy for the repository, then that is used, else if there is a GitHub organisation policy, that is use, else there is a default company level policy as a default when no other has been defined. If a component exists in teh accepted list, it’s accepted, same for rejected, if it’s not on either list it checks for a specific match from the copyright section against the copyright notice. If none of those match, it gets assigned unknown.

We want to be able to produce scores for our SBOMs in two different places:

  1. Synapse: generating scores for all the data we have produced at once
  2. GitHub: generating a score for an SBOM in a PR when build runs to see if there are any breaking changes Pandas is now supported by PySpark so we were able to design a package that would run in both basic Python code and in PySpark (still making use of the horizontal partitioning). To do this, again similar to the SBOM Wrangler, we had to ensure we didn’t use for loops and instead other functions that could be partitioned out, whilst still working in normal Python.

We packaged the code up using poetry which allowed us to list the dependencies we needed and bundle it up into a .whl file which could be used in both our build workflows and in Synapse.

The tool consists of three separate parts:

  1. SBOM Wrangler:

When using the tool in the build to instantly generate a score, we need to have the SBOM in the right format, so this SBOM wrangler is the pure Python version of the SBOM Wrangler we have in Synapse. This part of the tool will only get used in GitHub as we already have our SBOMs wrangled in Synapse

  1. Ruleset Formatter

The ruleset formatter takes the JSON ruleset that is generated by the policy hierarchy manager and restructures the data into dataframes so it can be used by the SBOM Scorer

  1. SBOM Scorer

The SBOM Scorer is the main ‘brain’ of the tool. It uses the information generated by the other parts of the tool to score either singular or multiple SBOMs depending on how many accepted or rejected licenses they have.

First, the SBOM data is read in as a Pandas DataFrame. We merge the different policy hierarchy level’s data using left merging, first with the repository level. If the name of the repository matches any of the policy names, it gets merged in. Then the same with organisation level, filling the gaps. And finally if there are any empty spaces, the company level table is merged in.

We then create an extra column for each of the categories in the policies ‘accepted’, ‘rejected’, ‘copyright’, ‘overrideAccept’ and ‘overrideReject’ which checks for each component whether it can find a match in each category or not.

raw_data['licenseAccepted'] = raw_data.apply(lambda x: x['License'] in x['accepted'], axis=1)
raw_data['licenseRejected'] = raw_data.apply(lambda x: x['License'] in x['rejected'], axis=1)
raw_data['copyrightAccepted'] = raw_data.apply(lambda x: any(item in x['copyright'] for item in x['CopyrightSplit']) if isinstance(x['CopyrightSplit'], (list, tuple)) and x['CopyrightSplit'] is not None else False, axis=1)
raw_data['override-accepted'] = raw_data.apply(lambda x: x['Name'] in x['override-rejected'] if x['override-rejected'] is not None else False, axis=1)
raw_data['override-rejected'] = raw_data.apply(lambda x: x['Name'] in x['override-accepted'] if x['override-accepted'] is not None else False, axis=1)

These columns then get sorted logically into one column called ‘sorting’ which, for each component, lists whether it is accepted, rejected or unknown:

raw_data['scoring'] = raw_data.apply(
    lambda x: 'Accepted' if x['override-accepted'] else
    'Rejected' if x['override-rejected'] else
    'Accepted' if x['licenseAccepted'] else
    'Rejected' if x['licenseRejected'] else
    'Accepted' if x['copyrightAccepted'] else
    'Unknown',
    axis=1
)

The ‘override’ checks go first, having the highest weighting.

The scores are then summarised into a CSV table similar to below:

Repo NameAcceptedRejectedUnknown
Example 1153232
Example 2234712

Outputted alongside the scores are the unknown and rejected tables:

ComponentLicenseCopyright
Example Component 1AGPL
Example Component 2GNU General Public License

If there are any rejected components in our GitHub build, then it is failed. If there are unknowns then warnings will be thrown.

Using the Scores

Now that we had our scores, we had to find a way to either display them or create an alert saying that something wasn’t right.

Backstage

Backstage is an open source framework for building developer portals which was created by Spotify. Endjin uses it as it provides a unified view over all of the separate github organisations and repositories, so we decided it would be a good place to output our scores.

Backstage uses a plugin-based architecture. It allows you to create custom plugins to add new features and integrations. These plugins can integrate with current internal tools you use, or with third-party services. We brought in data from our Azure Data Lake Storage Gen2 account, and passed the data into tables to be displayed on both of the pages.

Backstage include UI customisation that you can align with your organisation’s brand preferences, whilst still making it easy to have consistency between different people working on different pages.

Overview Page

The first page we have in Backstage is the ‘SBOM Analysis’ summary page. I created this page and added components to it that Backstage provide as a template, these are like building blocks. Backstage offer a Storybook page that demonstrates the different components usages. The image below shows the tab to the left with the ‘SBOM Analysis’ page. This shows, from a top level view, all the repositories that are being checked, and how many rejected, accepted or unknown components they are using.

This makes it easy to spot the rejected components if scanning across the repos, as you can sort the rejected column from highest to lowest amount of rejected components.

Backstage SBOM Analysis summary page including a table with columns: repository, accepted, rejected and unknown

Repository specific page

The second type of page we have in Backstage are the repository specific pages which give more information about each repository. I built this page also using the components from the Backstage Storybook.

Below is an example of one page in Backstage. We have the summary section, similar to the overview page, which contains the ‘accepted’, ‘rejected’ and ‘unknown’ counts. Below are the unknown and rejected components tables which display the components, their license and copyright. These are the files generated with the scores from the ‘SBOM Analysis’ package.

Backstage repository specific SBOM Analysis page containing a summary of how many accepted, rejected and unknown components there are. Below are two tables: rejected components and unknown components which show which components have the problems, and what these are

Slack message

In order to communicate changes to repositories, and any errors that might be flagged up, we set up a Slack app which will send daily messages to a channel with updates on the current state of the codebase.

I built this into my Synapse notebook ‘SBOM Analyser’, which generates the scores using the SBOM Analyser Python package.

We use the Python package ‘requests’ to post the response to Slack. You can build up your message in the body variable using information you’ve generated previously in the notebook. This runs last in the SBOM Analyser notebook so can access all the information generated by the SBOM Analyser package.

import requests
from azure.keyvault.secrets import SecretClient
if rejectedSlack == 0:
    icon = ':white_check_mark:'
else:
    icon = ':x:'
url = webhookurl
body = {
    "blocks": [
        {
            "type": "section",
            "text": {
                "type": "mrkdwn",
                "text": f"OpenChain SBOM status summary:\n{icon} There are {rejectedSlack} rejected components {icon}\n\n• *Accepted*: {acceptedSlack}\n• *Unknown*: {unknownSlack}\n\n<[URL to Backstage]| More details in Backstage>"
            }
        }
    ]
}
response = requests.post(
    url,
    json = body
)
print(response.text)

Below is a screenshot of our Slack channel #openchain-notifications which gives a daily update on the status. If there aren’t any rejected components then the message will have green ticks.

Screenshot of slack channel that contains the OpenChain notifications

Alternatively, if there are some rejected components the message will present itself with red crosses. I made the decision to structure the message like this as it was short and consise, with obvious icons showing whether it’s passed or failed. This means if there is a problem, it’s less likely to be missed as it’s eye catching.

Screenshot of Slack channel that contains a failing OpenChain notification

IMM (IP Maturity Matrix)

Endjin has developed a framework called the IP Maturity Matrix (IMM) that measures the engineering practices ‘quality’ of a repository, for example how much of it is documented or how much code is covered by tests. As part of the OpenChain project, we wanted to link in OpenChain to the IMM as a new category, so that it would show whether a repository generates an SBOM.

IMM scores for each repository are stored in an imm.yaml file, with these scores being manually created and updated. However, as we wanted to display whether an SBOM is being created and checked for each repository, we were automatically creating the data we needed, so had to create a process which would update these files for us. This was especially important; as it isn’t a one-time update, our data is dynamic and can be updated from day to day.

Having already got processes from our build script manager that can update files across all our repositories, we decided to repurpose this for the new functionality. I rewrote the logic to instead target the imm.yaml files, if that repo had them, and update the SBOM scores we were accessing from our Azure Data Lake Storage Gen2. Then went through and set up the workflow, so that it can run daily on its own as a GitHub actions workflow. Now it updates all the yaml files to give insight into each one, whether they have an SBOM or not.

These IMM scores get added to the README for each repository, so if you’re looking to use the open-source library, or need some information about it, you can get a quick overview of each one. This includes our SBOM score, meaning it can increase confidence in the code, knowing that the repository is being checked.

100% means that the repository is generating an SBOM, and it is being checked. 0% means that an SBOM isn’t being generated.

Below is an image of what a repository could look like with an SBOM IMM score:

Example of IMM scores

Wrapping up

Finishing my Year in Industry, I am wrapping up my work with the OpenChain project. We covered the ETL (Extract Transform Load) process: extracted and ingested our data, visualised our data, and used it against business rules to gain insight on the whole of our codebase. We added additional output locations, such as Slack messages and displaying as IMM scores, so we get instant feedback without having to look for it. We will get notifications through Slack when compliance is breached, meaning we are significantly reducing the risk of missing a threat to our license compliance.

Now we are able to fully track and manage our open-source license usage, satisfying one of the main points in the OpenChain specification. I hope to build on this work, by implementing OpenChain ISO/IEC DIS 18974 – the industry standard for open source security assurance programs, for my final year project in my upcoming final year at the University of York, where I will be completing my Bachelor of Engineering in Computer Science.