How to connect the Trust Agent to your source code management tool

We offer multiple approaches to connecting the SCW Trust Agent to your Git-based source code management tool. Each approach has its benefits and limitations, and we can help you make a decision based on your organization’s specific environment and policies.

All approaches store commit metadata only, including:

Commit author name
Commit author email address
Timestamp of commit
List of file extensions modified in the commit
List of programming languages and frameworks associated with the commit based on the modified files

This summary data is stored by the SCW Trust Agent for the purposes of generating dashboards and reports, and to perform automation actions such as optimizing your Secure Code Warrior learning pathways or applying your configured policy.

Source code is inspected to enable accurate framework detection but is never stored. Only the results of the analysis are stored, in the form of the file extension, language and framework lists described above.

On-Premises

This approach provides a combination of flexibility and risk mitigation by running a container within your environment that is configured and managed by your team. This avoids the risk of having a third-party platform connected directly to your source code management tool.

The On-Premises Container is configurable and can work with both cloud-based and on-premises source code management platforms. The On-Premises Container can be run at a regular frequency to automatically connect to designated repositories and push summarised data back to the SCW Trust Agent. At no point does your organization’s source code, or keys to access your organization’s source code, leave your environment under this approach.

Please follow these steps to run the On-Premises Container within your environment:

We have published the On-Premises Docker image at ghcr.io/securecodewarrior/trust-agent:1
The On-Premises Docker image can be run on your container platform of choice (e.g. Docker or Kubernetes) or within any CI pipeline that supports Docker containers
Our customers typically use one of the following three deployment approaches:
- Run the container within each repository's CI pipeline - To be scalable, this approach typically requires a common CI project or workflow shared by all your repositories. Where one is available, it provides a neat way to configure the container once so that each repository runs it to import its own data. The specific method depends heavily on the CI approach and tooling adopted by your organization; the only requirement is that the CI tooling supports Docker containers.
- Run the container centrally in your organization's container platform - This approach allows the container configuration and execution to be managed from a single point, where it is configured to connect to all your Git platforms and batch import data from all repositories in bulk. However, container disk space and network bandwidth can become an issue if there are many repositories or the repositories are large. A persistent file system volume for the container is recommended to limit network traffic to commit deltas and reduce run times. The WORKDIR environment variable described below allows you to specify the persistent mounted volume as the container working directory.
- Run the container on a schedule from a laptop or workstation - This is usually the simplest approach, but the laptop or workstation must remain available for regular imports to continue. The container can be run manually on an ad hoc basis or automated using a tool such as cron or Windows Task Scheduler.
The container must be configured with:
- Network connectivity to your Git repositories via HTTPS
- Network connectivity to subdomains under prod.securecodewarrior.com
- Sufficient disk space for cloning your Git repositories
  - A persistent volume can be mounted so that each container run is an incremental pull rather than a full repository clone, reducing run time and network bandwidth. The WORKDIR environment variable described below allows you to specify the persistent mounted volume as the container working directory.
  - Alternatively, if disk space is limited, the PROGRESSIVE_CLEANUP environment variable described below can be used to reduce the disk space required to roughly the size of the largest repository you are importing.
The following environment variables must be passed to the container when run:
- SCW_API_KEY - Your SCW Admin API key. Follow this guide to create your Admin API key.
  - Alternatively, set SCW_API_KEY_FILE to the path of a file containing the Admin API key
- SCW_API_URL - Your SCW API endpoint URL. Go to Trust Agent Configuration > Git Connections > Add Provider > On-Premises > Setup Details to find your specific endpoint URL, e.g. https://trust-agent.prod-us.prod.securecodewarrior.com
- REPO_URLS - A comma-separated list of Git repository URLs with no spaces, e.g. https://git.local/projects/project1.git,https://git2.internal/projects/projectX.git
  - Please note that currently only HTTPS Git repository URLs are supported. Credentials can be included in the URL, as shown in the Docker examples below.
  - Alternatively, credentials can be supplied separately using GIT_USERNAME and GIT_PASSWORD, described below.
  - This environment variable can be omitted if repository autodiscovery (see below) is being used.
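Because REPO_URLS must be a single comma-separated string of HTTPS URLs with no spaces, it can help to keep the repository list in a file (one URL per line) and build the value from it. The helpers below are a minimal sketch under that assumption; the function names and the file layout are hypothetical, and the checks only mirror the format rules listed above rather than the container's own validation.

```shell
#!/bin/sh
# Hypothetical helpers for preparing REPO_URLS; not part of the Trust Agent.

# Join one URL per line (read from stdin) into a comma-separated list,
# removing spaces, tabs, carriage returns and blank lines.
join_repo_urls() {
  tr -d ' \t\r' | grep -v '^$' | paste -s -d, -
}

# Fail if the list is empty, contains spaces, or has a non-HTTPS entry.
# The body runs in a subshell, so the IFS and noglob changes stay local.
check_repo_urls() (
  case $1 in
    ''|*' '*) echo "REPO_URLS must be non-empty and contain no spaces" >&2; exit 1 ;;
  esac
  IFS=,
  set -f
  for url in $1; do
    case $url in
      https://*) ;;
      *) echo "Not an HTTPS URL: $url" >&2; exit 1 ;;
    esac
  done
)
```

For example, REPO_URLS=$(join_repo_urls < repos.txt) followed by check_repo_urls "$REPO_URLS" before passing -e REPO_URLS="$REPO_URLS" to docker run.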
The following environment variables are optional and can be passed to the container when run to control specific behaviour:
- GIT_USERNAME - An alternate method for providing credentials. This allows you to specify the username that will be passed to the repository during authentication.
  - Note: This will pass the same set of credentials to all repositories listed in REPO_URLS. If different credentials are required for different repositories, a separate container run will be needed for each set of credentials.
- GIT_PASSWORD - An alternate method for providing credentials. This allows you to specify the password that will be passed to the repository during authentication. In most Git platforms, API keys and access tokens can be provided using GIT_PASSWORD without needing to provide GIT_USERNAME.
  - Alternatively, set GIT_PASSWORD_FILE to the path of a file containing the Git password
  - Note: This will pass the same set of credentials to all repositories listed in REPO_URLS. If different credentials are required for different repositories, a separate container run will be needed for each set of credentials.
- WORKDIR - Specify the working directory that the container will use for storing Git repositories during analysis. This can be used in conjunction with a persistent mounted volume (e.g. using -v or --volume) to reduce network bandwidth usage and run time by persisting cloned Git repositories across multiple container runs. This results in the container only pulling incremental commit data for repositories that have already been cloned in a previous run.
- SKIP_CERTIFICATE_CHECK - Set this to true to skip certificate verification. This may be required when connecting to systems that use internally issued SSL/TLS certificates, because their root and issuing certificates are not publicly trusted. Scenarios where this may happen include:
  - Connecting to internal source code management servers with internally issued SSL/TLS certificates. Note that in this scenario the transport remains encrypted and the internal nature of the traffic greatly reduces the chance of network attacks that this verification is designed to prevent.
  - Connecting to SCW servers where SSL/TLS inspection or interception is being performed and SCW SSL/TLS server certificates are being dynamically replaced with internally issued SSL/TLS certificates.
- PROGRESSIVE_CLEANUP - Set this to true to configure the container to delete each repository after it has been processed and imported. This minimises the disk space requirements when a large number of repositories (e.g. from autodiscovery) are being imported. Note: Use of this option precludes the ability to mount a persistent disk volume to reduce network bandwidth usage and run times across container runs as described above.
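The WORKDIR and *_FILE options above are easiest to see together in one docker run command. The sketch below wraps such a command in a shell function: a named Docker volume keeps cloned repositories between runs, WORKDIR points the container at that volume, and the SCW API key and Git password are read from files on a read-only mount instead of appearing on the command line. The volume name trust-agent-data, the container paths /data and /run/secrets, and the secret file names are hypothetical choices, not defaults of the image; if the container does not run as root, also check that its user can write to the mounted volume.

```shell
#!/bin/sh
# Sketch only: the volume name, container paths and secret file names are
# hypothetical choices, not defaults of the Trust Agent image.

# Host directory containing the files scw_api_key and git_password.
SECRETS_DIR=${SECRETS_DIR:-$PWD/secrets}

run_trust_agent() {
  # trust-agent-data:/data  named volume that keeps clones between runs
  #                         (--rm removes the container, not the named volume)
  # WORKDIR=/data           makes the container clone into that volume
  # /run/secrets:ro         secret files, mounted read-only
  # PROGRESSIVE_CLEANUP is left unset: it would delete the clones that the
  # persistent volume is meant to keep.
  docker run --rm --pull=always \
    -v trust-agent-data:/data \
    -e WORKDIR=/data \
    -v "$SECRETS_DIR:/run/secrets:ro" \
    -e SCW_API_KEY_FILE=/run/secrets/scw_api_key \
    -e GIT_PASSWORD_FILE=/run/secrets/git_password \
    -e SCW_API_URL="$SCW_API_URL" \
    -e REPO_URLS="$REPO_URLS" \
    ghcr.io/securecodewarrior/trust-agent:1
}
```

Set SCW_API_URL and REPO_URLS in your shell, then call run_trust_agent. Later runs reuse the clones in the volume, so only new commits should need to be fetched.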
The following environment variables are optional and enable GitHub App authentication using a custom GitHub App installed in your GitHub instance. Please note that if GITHUBCLOUD_TOKEN is provided (via environment variable or file), it takes precedence over GitHub App authentication credentials. To use GitHub App authentication, please ensure GITHUBCLOUD_TOKEN is unset.
- GITHUB_APP_ID - The App ID assigned to your GitHub App.
  - Alternatively, set GITHUB_APP_ID_FILE to the path of a file containing the App ID
- GITHUB_INSTALLATION_ID - The unique Installation ID for the App installed on your Organization or User account.
  - Alternatively, set GITHUB_INSTALLATION_ID_FILE to the path of a file containing the Installation ID
- GITHUB_PRIVATE_KEY - The private key for the GitHub App, in raw PEM format.
  - Alternatively, set GITHUB_PRIVATE_KEY_FILE to the path of a file containing the private key in PEM format
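Because a GitHub token takes precedence over GitHub App credentials, a wrapper can refuse to start while a token is still set. The sketch below assumes the App credentials are used together with GitHub Cloud autodiscovery (REPO_PROVIDERS=githubcloud, described below), with GITHUBCLOUD_USER set to your organization name, and that the App's private key is saved as a file on a read-only mount. The function name, the GITHUB_ORG variable, the secrets directory and the file names are hypothetical. It only inspects the calling shell's environment, so a token set elsewhere (for example in an env file) is not detected.

```shell
#!/bin/sh
# Hypothetical wrapper for GitHub App authentication with autodiscovery.
# Expects GITHUB_APP_ID, GITHUB_INSTALLATION_ID and GITHUB_ORG to be set,
# and the App's PEM private key saved as <secrets dir>/github-app.pem.

run_with_github_app() {
  # A token would take precedence over the App credentials, so refuse to run.
  if [ -n "${GITHUBCLOUD_TOKEN:-}" ] || [ -n "${GITHUBCLOUD_TOKEN_FILE:-}" ]; then
    echo "Unset GITHUBCLOUD_TOKEN and GITHUBCLOUD_TOKEN_FILE to use GitHub App authentication" >&2
    return 1
  fi
  docker run --rm --pull=always \
    -v "${SECRETS_DIR:-$PWD/secrets}:/run/secrets:ro" \
    -e SCW_API_KEY_FILE=/run/secrets/scw_api_key \
    -e SCW_API_URL="$SCW_API_URL" \
    -e REPO_PROVIDERS=githubcloud \
    -e GITHUBCLOUD_HOST=https://api.github.com \
    -e GITHUBCLOUD_USER="$GITHUB_ORG" \
    -e GITHUB_APP_ID="$GITHUB_APP_ID" \
    -e GITHUB_INSTALLATION_ID="$GITHUB_INSTALLATION_ID" \
    -e GITHUB_PRIVATE_KEY_FILE=/run/secrets/github-app.pem \
    ghcr.io/securecodewarrior/trust-agent:1
}
```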
The following environment variables are optional and control repository autodiscovery behaviour:
- EXCLUDE_REPOS - A comma-separated list of repository URL patterns to exclude from processing. This allows you to filter out repositories that you don't want to import data for. Both full URLs and URL substrings (e.g. user/repo) are supported. Patterns support wildcards: use * to match any number of characters (e.g. company/internal-* or */archived-*). Please note that matching is performed on the URL string, in which repository names may differ slightly from their display names.
- REPO_PROVIDERS - A comma-separated list of repository providers to autodiscover repositories from, e.g. REPO_PROVIDERS=bitbucketdatacenter,githubcloud,gitlabcloud
  - Please include the environment variables below that correspond to your specified repository providers
  - Note that only Bitbucket Data Center (bitbucketdatacenter), Bitbucket Cloud (bitbucketcloud), GitHub Cloud (githubcloud), GitLab Cloud (gitlabcloud), GitLab On-Prem (gitlabonprem) and Azure DevOps (azuredevops) are currently supported but additional providers will be added over time
- Bitbucket Data Center
  - BITBUCKETDATACENTER_HOST - Specify the base Bitbucket Data Center API URL to use for repository autodiscovery, e.g. BITBUCKETDATACENTER_HOST=https://bitbucket.corp.internal (this value MUST start with http or https and MUST NOT end with /)
  - BITBUCKETDATACENTER_USER - Specify the username used to authenticate to the Bitbucket Data Center host, e.g. BITBUCKETDATACENTER_USER=somebody
  - BITBUCKETDATACENTER_TOKEN - Specify the authentication token for the Bitbucket Data Center API, used for repository autodiscovery, e.g. BITBUCKETDATACENTER_TOKEN=ATBBexampletoken1234
    - Alternatively, set BITBUCKETDATACENTER_TOKEN_FILE to the path of a file containing the authentication token
  - BITBUCKETDATACENTER_CLONE_TOKEN - (Optional) Specify a different authentication token to use for cloning discovered repositories instead of the API token specified above
    - Alternatively, set BITBUCKETDATACENTER_CLONE_TOKEN_FILE to the path of a file containing the authentication token
- Bitbucket Cloud
  - BITBUCKETCLOUD_HOST - Specify the base Bitbucket Cloud API URL to use for repository autodiscovery, e.g. BITBUCKETCLOUD_HOST=https://api.bitbucket.org/2.0 (this value MUST start with http or https and MUST NOT end with /)
  - BITBUCKETCLOUD_USER - Specify the username used to authenticate to the Bitbucket Cloud host (typically the email address you use to log in to Bitbucket Cloud), e.g. BITBUCKETCLOUD_USER=somebody
  - BITBUCKETCLOUD_TOKEN - Specify the API token for the Bitbucket Cloud API, used for repository autodiscovery (API tokens are managed in your Atlassian account settings), e.g. BITBUCKETCLOUD_TOKEN=ATATTexampletoken1234
    - Alternatively, set BITBUCKETCLOUD_TOKEN_FILE to the path of a file containing the authentication token
  - BITBUCKETCLOUD_CLONE_TOKEN - (Optional) Specify a different authentication token to use for cloning discovered repositories instead of the API token specified above
    - Alternatively, set BITBUCKETCLOUD_CLONE_TOKEN_FILE to the path of a file containing the authentication token
  - Notes:
    - Please ensure the API token has the read:project:bitbucket, read:repository:bitbucket, read:workspace:bitbucket, and read:user:bitbucket scopes
- GitHub Cloud
  - GITHUBCLOUD_HOST - Specify the GitHub API URL to use for repository autodiscovery, e.g. GITHUBCLOUD_HOST=https://api.github.com (this value MUST start with http or https and MUST NOT end with /)
  - GITHUBCLOUD_USER - Specify the organisation name or username to use for authenticating to the GitHub API, e.g. GITHUBCLOUD_USER=somebody
  - GITHUBCLOUD_TOKEN - Specify the authentication token for the GitHub API, used for repository autodiscovery, e.g. GITHUBCLOUD_TOKEN=ghp_exampletoken1234
    - Alternatively, set GITHUBCLOUD_TOKEN_FILE to the path of a file containing the authentication token
  - GITHUBCLOUD_CLONE_TOKEN - (Optional) Specify a different authentication token to use for cloning discovered repositories instead of the API token specified above
    - Alternatively, set GITHUBCLOUD_CLONE_TOKEN_FILE to the path of a file containing the authentication token
  - Notes:
    - You can find your organization name in the URL of your repositories (organization_name/repository_name)
    - The access token must be authorised with your Single Sign-On (SSO) organization if your organization is using SSO
- GitLab Cloud
  - GITLABCLOUD_HOST - Specify the GitLab API URL to use for repository autodiscovery. e.g. GITLABCLOUD_HOST=https://gitlab.com Note that this value MUST start with http or https and MUST NOT end with /
  - GITLABCLOUD_USER - Specify the group name or username to use for authenticating to the GitLab API. e.g. GITLABCLOUD_USER=somebody
  - GITLABCLOUD_TOKEN - Specify the authentication token for the GitLab API, used for repository autodiscovery, e.g. GITLABCLOUD_TOKEN=glpat_exampletoken1234
    - Alternatively, set GITLABCLOUD_TOKEN_FILE to the path of a file containing the authentication token
  - GITLABCLOUD_CLONE_TOKEN - (Optional) Specify a different authentication token to use for cloning discovered repositories instead of the API token specified above
    - Alternatively, set GITLABCLOUD_CLONE_TOKEN_FILE to the path of a file containing the authentication token
  - GITLABCLOUD_VISIBILITY_SCOPES - A comma-separated list of visibilities. Options are private, internal and public
    - e.g. GITLABCLOUD_VISIBILITY_SCOPES=private,internal
    - WARNING: public will retrieve all public repositories and is not recommended for use on GitLab Cloud
  - Notes:
    - Please take the group name from the URL of your group, as the display name on the page can be different
    - Please ensure your authentication token has read_api and read_repository scopes
- GitLab On-Premises
  - GITLABONPREM_HOST - Specify the GitLab API URL to use for repository autodiscovery. e.g. GITLABONPREM_HOST=https://gitlab.corp.internal Note that this value MUST start with http or https and MUST NOT end with /
  - GITLABONPREM_USER - Specify the group name or username to use for authenticating to the GitLab API. e.g. GITLABONPREM_USER=somebody
  - GITLABONPREM_TOKEN - Specify the authentication token for the GitLab API, used for repository autodiscovery, e.g. GITLABONPREM_TOKEN=glpat_exampletoken1234
    - Alternatively, set GITLABONPREM_TOKEN_FILE to the path of a file containing the authentication token
  - GITLABONPREM_CLONE_TOKEN - (Optional) Specify a different authentication token to use for cloning discovered repositories instead of the API token specified above
    - Alternatively, set GITLABONPREM_CLONE_TOKEN_FILE to the path of a file containing the authentication token
  - GITLAB_VISIBILITY_SCOPES - A comma-separated list of visibilities. Options are private, internal and public
    - e.g. GITLAB_VISIBILITY_SCOPES=private,internal
    - WARNING: public will retrieve all public repositories and is only recommended for use on GitLab On-Premises
  - Notes:
    - Please take the group name from the URL of your group, as the display name on the page can be different
- Azure DevOps
  - AZUREDEVOPS_HOST - Specify the Azure DevOps API URL to use for repository autodiscovery. This must include the “organization” or “collection” part of the URL. e.g. AZUREDEVOPS_HOST=https://dev.azure.com/ORGANIZATION_NAME (Azure DevOps Cloud) or AZUREDEVOPS_HOST=https://azdo.internal/COLLECTION_NAME (Azure DevOps Server). Note that this value MUST start with http or https and MUST NOT end with /
  - AZUREDEVOPS_USER - Specify the group name or username to use for authenticating to the Azure DevOps API. e.g. AZUREDEVOPS_USER=somebody
  - AZUREDEVOPS_TOKEN - Specify the authentication token for the Azure DevOps API, used for repository autodiscovery, e.g. AZUREDEVOPS_TOKEN=exampletoken1234
    - Alternatively, set AZUREDEVOPS_TOKEN_FILE to the path of a file containing the authentication token
  - AZUREDEVOPS_CLONE_TOKEN - (Optional) Specify a different authentication token to use for cloning discovered repositories instead of the API token specified above
    - Alternatively, set AZUREDEVOPS_CLONE_TOKEN_FILE to the path of a file containing the authentication token
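Each *_HOST value above must start with http or https and must not end with /. A quick pre-flight check over whichever provider host variables are set in your shell can catch both mistakes before a scheduled run fails. This is a hypothetical helper, not part of the Trust Agent; it checks only these format rules, not whether the host is reachable or the credentials are valid.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the *_HOST autodiscovery variables.

# Succeeds if $2 starts with http:// or https:// and does not end with "/".
# $1 is the variable name, used only in the error message.
check_host() {
  case $2 in
    http://*|https://*) ;;
    *) echo "$1 must start with http:// or https:// (got '$2')" >&2; return 1 ;;
  esac
  case $2 in
    */) echo "$1 must not end with / (got '$2')" >&2; return 1 ;;
  esac
}

# Check every provider host variable that is set and report all problems.
check_provider_hosts() {
  status=0
  for name in BITBUCKETDATACENTER_HOST BITBUCKETCLOUD_HOST GITHUBCLOUD_HOST \
              GITLABCLOUD_HOST GITLABONPREM_HOST AZUREDEVOPS_HOST; do
    # Indirect lookup of the variable named by $name (names are fixed above).
    eval "value=\${$name:-}"
    [ -z "$value" ] && continue
    check_host "$name" "$value" || status=1
  done
  return "$status"
}
```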
Examples for passing environment variables to Docker are shown below. Please refer to the documentation for other container platforms.
- Passing environment variables via command line options
  - docker run --pull=always -e SCW_API_KEY='your_api_key' -e SCW_API_URL='https://trust-agent.prod-us.prod.securecodewarrior.com' -e REPO_URLS='https://service_user:personal_access_token@git.local/projects/project1.git,https://service_user:service_password@git.local/projects/project2.git,https://git2.internal/projects/projectX.git' ghcr.io/securecodewarrior/trust-agent:1
  - docker run --pull=always -e SCW_API_KEY='your_api_key' -e SCW_API_URL='https://trust-agent.prod-us.prod.securecodewarrior.com' -e REPO_PROVIDERS='githubcloud,gitlabonprem' -e GITHUBCLOUD_HOST=https://api.github.com -e GITHUBCLOUD_USER=somebody -e GITHUBCLOUD_TOKEN=ghp_exampletoken1234 -e GITLABONPREM_HOST=https://gitlab.corp.internal -e GITLABONPREM_USER=somebody -e GITLABONPREM_TOKEN=glpat_exampletoken1234 ghcr.io/securecodewarrior/trust-agent:1
- Passing environment variables via a file
  - docker run --pull=always --env-file env.list ghcr.io/securecodewarrior/trust-agent:1
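An env file keeps long values and secrets out of the command line and shell history. Docker reads each line as VAR=value literally, with no shell quoting or expansion, so quotes would become part of the value, while a | inside JAVA_TOOL_OPTIONS needs no escaping. The sketch below writes an env.list for GitHub Cloud autodiscovery behind an HTTPS proxy. Every value is a placeholder: replace SCW_API_URL with the endpoint from your Setup Details, and treat proxy.corp.internal, port 3128 and the /run/secrets file paths as assumptions. Only the https.* proxy properties are set because both the Git and SCW connections use HTTPS.

```shell
#!/bin/sh
# Write a Docker env file: one VAR=value per line, read literally by Docker.
# All values are placeholders. The *_FILE paths are paths inside the
# container, so the host directory holding those files must be mounted with -v.
cat > env.list <<'EOF'
SCW_API_URL=https://trust-agent.prod-us.prod.securecodewarrior.com
SCW_API_KEY_FILE=/run/secrets/scw_api_key
REPO_PROVIDERS=githubcloud
GITHUBCLOUD_HOST=https://api.github.com
GITHUBCLOUD_USER=somebody
GITHUBCLOUD_TOKEN_FILE=/run/secrets/github_token
JAVA_TOOL_OPTIONS=-Dhttps.proxyHost=proxy.corp.internal -Dhttps.proxyPort=3128 -Dhttp.nonProxyHosts=localhost|*.corp.internal
EOF
```

The file could then be used with a secrets mount, e.g. docker run --pull=always --env-file env.list -v "$PWD/secrets:/run/secrets:ro" ghcr.io/securecodewarrior/trust-agent:1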
HTTP proxy support is provided by the underlying JVM and can be enabled by supplying the standard JVM system properties in the JAVA_TOOL_OPTIONS environment variable. For example:
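- JAVA_TOOL_OPTIONS="-Dhttp.proxyHost=<host> -Dhttp.proxyPort=<port> -Dhttps.proxyHost=<host> -Dhttps.proxyPort=<port> -Dhttp.nonProxyHosts=host1|host2"
  - When passing this on the command line, keep the whole value in quotes (e.g. -e JAVA_TOOL_OPTIONS="...") so that the shell does not interpret | as a pipe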
We suggest running the On-Premises Container on a daily or weekly schedule, but the frequency can be adjusted to suit your organization's requirements.
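For the laptop, workstation or central-host approaches, the container can be started from cron. A long import could still be running when the next scheduled run starts, so the sketch below uses a lock directory (mkdir either creates it or fails atomically) to skip overlapping runs, which matters most when runs share a persistent volume. The script path, lock and log locations, the trust-agent-data volume mounted at /data, and the crontab line in the comment are all hypothetical.

```shell
#!/bin/sh
# Hypothetical scheduled wrapper, e.g. saved as /opt/trust-agent/run-trust-agent.sh
# and started daily at 02:00 by a crontab entry such as:
#   0 2 * * * . /opt/trust-agent/run-trust-agent.sh && scheduled_run >> /var/log/trust-agent.log 2>&1

LOCK_DIR=${LOCK_DIR:-/tmp/trust-agent.lock}
ENV_FILE=${ENV_FILE:-/opt/trust-agent/env.list}

timestamp() { date -u '+%Y-%m-%dT%H:%M:%SZ'; }

scheduled_run() {
  # mkdir fails if the lock directory already exists,
  # so at most one run proceeds at a time.
  if ! mkdir "$LOCK_DIR" 2>/dev/null; then
    echo "$(timestamp) previous run still in progress, skipping"
    return 0
  fi
  echo "$(timestamp) starting Trust Agent import"
  docker run --rm --pull=always --env-file "$ENV_FILE" \
    -v trust-agent-data:/data -e WORKDIR=/data \
    ghcr.io/securecodewarrior/trust-agent:1
  status=$?
  # Release the lock and pass on docker's exit code.
  rmdir "$LOCK_DIR"
  echo "$(timestamp) finished with exit code $status"
  return "$status"
}
```

If a run is interrupted before it releases the lock (for example by a shutdown), the lock directory has to be removed by hand before the next run will proceed, unless /tmp is cleared at boot.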

GitHub App

This approach is specific to organizations that use GitHub Cloud or GitHub Enterprise Cloud as their source code management tool. It involves a simple installation of a GitHub App that works within GitHub’s ecosystem and is considered the easiest approach overall. However, the App requires read access to repository contents in order to access commit data, although only summarised commit metadata is stored by Secure Code Warrior. Any source code inspected for framework detection is discarded immediately after analysis and is never stored. If you would prefer that source code is not inspected, please let your CSM know and we can disable this behaviour; note that this will also disable the framework detection capability.

Note: One limitation of this approach is that repositories that have over 30,000 commits within the last 12 months will be automatically excluded due to API request limits.
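To estimate in advance whether a repository is likely to hit this limit, you can count its recent commits in a local clone. The sketch below counts commits reachable from a ref (HEAD by default) over the last 12 months. How the GitHub App counts commits (which branches, whether merges are included) is not specified here, so treat the result as an estimate; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical estimate of whether a clone exceeds the GitHub App's
# 30,000-commits-in-12-months limit. Run inside a local clone.

LIMIT=30000

# Print the number of commits reachable from $1 (default HEAD)
# that were committed in the last 12 months.
recent_commit_count() {
  git rev-list --count --since="12 months ago" "${1:-HEAD}"
}

# Print the count and succeed when it is at or below LIMIT.
within_github_app_limit() {
  count=$(recent_commit_count "$1") || return 2
  echo "$count commits in the last 12 months"
  [ "$count" -le "$LIMIT" ]
}
```

Passing --all (e.g. recent_commit_count --all) counts commits reachable from every ref instead of only HEAD.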

Please follow these steps to connect using the GitHub App:

Navigate to the Trust Agent Configuration screen and click Add Provider under Git Connections
Click Connect under GitHub App
Click Add under Step 1 to open the installation process in a new tab. Note that installing the App on a GitHub Organization requires an Organization Admin.

If you are not an Organization Admin, you will be able to request installation of the App by an Organization Admin

If you are an Organization Admin, follow the steps to select the repositories the App will have access to and install it into your GitHub Organization

Once installed, return to the Trust Agent Configuration screen and click “Authorize” under Step 2 to link your Secure Code Warrior tenant to your GitHub Organization

Manual Upload

This approach requires no direct integration with your Git-based source code management tool and relies on an ad hoc manual process to export summarised commit metadata for uploading into the SCW Trust Agent. As a result, it is the preferred approach for trialing the SCW Trust Agent, but it only produces a point-in-time snapshot and does not scale to a large number of repositories.

The Manual Upload process involves cloning your selected repositories onto your local system and running a provided shell script to generate the commit summaries that can be uploaded via the SCW Trust Agent web interface. It can be run manually at a frequency of your choosing. At no point does your organization’s source code, or keys to access your organization’s source code, leave your environment under this approach.

Please follow these steps to manually upload commit metadata:

Clone the Git repositories you would like to import into the SCW Trust Agent onto your laptop or workstation, placing them all within a single parent directory
Copy the provided shell script, reproduced below for reference, into that parent directory

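git-export.sh

```shell
#!/bin/sh
# Export summarised commit metadata for every Git repository below the
# current directory, writing one scw_gci_<project>.yaml file per repository.
BASEDIR=$(pwd)

# Read one .git directory per line so that paths containing spaces work.
find . -type d -name .git -prune | while IFS= read -r GITDIR
do
  SUBDIR=$(dirname "$GITDIR")
  (
    cd "$SUBDIR" || exit 1

    # Name the export after the origin remote, falling back to the folder name.
    REMOTE_URL=$(git config --get remote.origin.url)
    if [ -n "$REMOTE_URL" ]; then
      PROJECT_NAME=$(basename "$REMOTE_URL" .git)
    else
      PROJECT_NAME=$(basename "$PWD")
    fi

    echo "Exporting commit data for $SUBDIR"
    # sed 1: turn each file path ending in .<ext> into "    - <ext>"
    # sed 2: drop file paths without an extension
    # sed 3: drop consecutive duplicate lines
    git -c core.quotePath=false log --name-only --since="3 years ago" --no-merges \
      --pretty=format:'- hash: %H%n  authorDate: %aI%n  authorEmail: %ae%n  authorName: %an%n  modifiedFileExtensions:' |
      sed -e '/^[^ -]/ s/.*\.\([^./]*\)$/    - \1/' |
      sed -e '/^[^ -]/d' |
      sed '$!N; /^\(.*\)\n\1$/!P; D' > "$BASEDIR/scw_gci_$PROJECT_NAME.yaml"
  )
done
```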

On your laptop or workstation, change the working directory to the parent directory of your cloned Git repositories
Run the git-export.sh shell script (e.g. sh git-export.sh) to produce one YAML file per repository, named scw_gci_<repository name>.yaml
- The YAML export is a summary of the commit data and can be inspected prior to uploading
Inspect the contents of the YAML file(s) and redact or remove any information if needed
Navigate to the Trust Agent Configuration screen and click Add Provider under Git Connections
Click Upload under Manual Upload
Select the YAML file to upload, specify an organization and repository name, and then click Upload Repository. The organization name can be considered a folder to group repositories under. Repeat this step for all exported YAML files.
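Before uploading, a short summary of each export can make the inspection step easier, for example to confirm the number of commits, spot unexpected author email addresses, or see which file extensions were picked up. The helper below is a hypothetical sketch that relies only on the layout written by git-export.sh (each commit starts with "- hash:", emails are on "  authorEmail:" lines and extensions on "    - " lines); it only reads the file and does not modify it.

```shell
#!/bin/sh
# Hypothetical summary of a scw_gci_*.yaml export produced by git-export.sh.
summarize_export() {
  file=$1
  echo "file: $file"
  # Every commit entry starts with "- hash:".
  echo "commits: $(grep -c '^- hash:' "$file")"
  echo "author emails:"
  sed -n 's/^  authorEmail: //p' "$file" | sort -u | sed 's/^/  /'
  # How often each extension appears, most frequent first.
  echo "extensions:"
  sed -n 's/^    - //p' "$file" | sort | uniq -c | sort -rn
}
```

For example: for f in scw_gci_*.yaml; do summarize_export "$f"; done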
