We offer multiple approaches to connecting the SCW Trust Agent to your Git-based source code management tool. Each approach has its benefits and limitations, and we can help you make a decision based on your organization’s specific environment and policies.
All approaches store commit metadata only, including:
- Commit author name
- Commit author email address
- Timestamp of commit
- List of file extensions modified in the commit
- List of programming languages and frameworks associated with the commit based on the modified files
This summary data is stored by the SCW Trust Agent for the purposes of generating dashboards and reports, and to perform automation actions such as optimizing your Secure Code Warrior learning pathways or applying your configured policy.
Source code is inspected to facilitate accurate framework detection but is never stored. Only the results of the analysis are stored, in the form of the file extension, language and framework lists described above.
On-Premises
This approach provides a combination of flexibility and risk mitigation by running a container within your environment that is configured and managed by your team. This offsets the risk of having a third-party platform connected directly to your source code management tool.
The On-Premises Container is configurable and can work with both cloud-based and on-premises source code management platforms. The On-Premises Container can be run at a regular frequency to automatically connect to designated repositories and push summarised data back to the SCW Trust Agent. At no point does your organization’s source code, or keys to access your organization’s source code, leave your environment under this approach.
- We have published the On-Premises Docker image at ghcr.io/securecodewarrior/trust-agent:1
- The On-Premises Docker image can be run in your container platform of choice (e.g. Docker or Kubernetes) or within a CI pipeline that has Docker container support
- Our customers typically use one of the following three deployment approaches:
- Run the container within each repository's CI pipeline - This approach typically requires a common CI project or workflow shared by all your repositories in order to scale. Where one is available, it provides a neat way to configure the container once so that each repository runs it individually to import data. The specific method will depend heavily on the CI approach and tooling adopted by your organization; the only requirement is that the CI tooling has Docker container support.
- Run the container centrally in your organization's container platform - This approach allows the container configuration and execution to be managed from a single point, where it is configured to connect to all your Git platforms and batch import data from all repositories in bulk. However, container disk space and network bandwidth can become an issue if there are a large number of repositories or if the repositories are large. A persistent file system volume for the container is recommended to limit network traffic to commit deltas and reduce run times. The WORKDIR environment variable described below allows you to specify the persistent mounted volume as the container working directory.
- Run the container on a schedule from a laptop or workstation - This is usually the simplest approach but requires the individual's laptop or workstation to be available to continue running regular imports. It can be run ad hoc on a manual schedule or automated using a tool such as cron or Task Scheduler.
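For the workstation approach, scheduling with cron could look like the sketch below. It only prints a crontab entry; the 02:00 daily schedule and the env file path are illustrative assumptions, not values from Secure Code Warrior.

```shell
#!/bin/sh
# Print a crontab entry that runs the container every day at 02:00.
# $1 is the path to an env file; /opt/trust-agent/env.list is a hypothetical location.
trust_agent_cron_line() {
    env_file="$1"
    printf '0 2 * * * docker run --pull=always --env-file %s ghcr.io/securecodewarrior/trust-agent:1\n' "$env_file"
}

# Show the line, then paste it into the editor opened by `crontab -e`.
trust_agent_cron_line /opt/trust-agent/env.list
```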
- The container must be configured with:
- Network connectivity to your Git repositories via HTTPS
- Network connectivity to subdomains under prod.securecodewarrior.com
- Sufficient disk space for cloning your Git repositories
- A persistent volume can be mounted so that each container run is an incremental pull rather than a full repository clone, reducing run time and network bandwidth. The WORKDIR environment variable described below allows you to specify the persistent mounted volume as the container working directory.
- Alternatively, if disk space is limited, the PROGRESSIVE_CLEANUP environment variable described below can be used to reduce the disk space required to that of the largest repository you are importing.
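The two disk strategies above are mutually exclusive, which a small helper can make explicit. This is a minimal sketch: the function name and the /work mount point inside the container are assumptions chosen for illustration.

```shell
#!/bin/sh
# Print the extra `docker run` options for one of the two disk strategies.
#   storage_opts persistent <volume>  -> reuse clones across runs via WORKDIR
#   storage_opts cleanup              -> delete each repository after import
# /work is a hypothetical mount point inside the container.
storage_opts() {
    case "$1" in
        persistent) printf '%s\n' "-v $2:/work -e WORKDIR=/work" ;;
        cleanup)    printf '%s\n' "-e PROGRESSIVE_CLEANUP=true" ;;
        *)          echo "usage: storage_opts persistent <volume> | cleanup" >&2; return 1 ;;
    esac
}

storage_opts persistent trust-agent-data
```

The output can be spliced into a run, for example `docker run --pull=always $(storage_opts cleanup) --env-file env.list ghcr.io/securecodewarrior/trust-agent:1`.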
- The following environment variables must be passed to the container when run:
- SCW_API_KEY - Your SCW Admin API key. Follow this guide to create your Admin API key. Alternatively, set SCW_API_KEY_FILE to the path of a file containing the Admin API key.
- SCW_API_URL - Your SCW API endpoint URL. Please head to Trust Agent Configuration > Git Connections > Add Provider > On-Premises > Setup Details for your specific endpoint URL. An example is shown below:
- REPO_URLS - A comma separated list of Git repository URLs with no spaces, e.g. https://git.local/projects/project1.git,https://git2.internal/projects/projectX.git
- Please note currently only HTTPS Git repository URLs are supported, e.g.
- An alternate method to providing credentials in the URL is provided - please see below.
- This environment variable can be omitted if repository autodiscovery (see below) is being used.
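Because REPO_URLS must be comma separated, free of spaces and HTTPS-only, it can help to assemble it programmatically. The helper below is a sketch; its name is hypothetical and the URLs are the placeholders used in this article.

```shell
#!/bin/sh
# Join repository URLs into the comma separated, space-free form REPO_URLS expects.
# Rejects URLs that contain a space or comma, and anything that is not https://.
join_repo_urls() {
    joined=""
    for url in "$@"; do
        case "$url" in
            *" "*|*,*) echo "URL contains a space or comma: $url" >&2; return 1 ;;
            https://*) joined="${joined:+$joined,}$url" ;;
            *)         echo "not an HTTPS URL: $url" >&2; return 1 ;;
        esac
    done
    printf '%s\n' "$joined"
}

REPO_URLS=$(join_repo_urls \
    https://git.local/projects/project1.git \
    https://git2.internal/projects/projectX.git)
echo "$REPO_URLS"
```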
- The following environment variables are optional and can be passed to the container when run to control specific behaviour:
- GIT_USERNAME - An alternate method for providing credentials. This allows you to specify the username that will be passed to the repository during authentication. Note: This will pass the same set of credentials to all repositories listed in REPO_URLS. If different credentials are required for different repositories, a separate container run will be needed for each set of credentials.
- GIT_PASSWORD - An alternate method for providing credentials. This allows you to specify the password that will be passed to the repository during authentication. In most Git platforms, API keys and access tokens can be provided using GIT_PASSWORD without needing to provide GIT_USERNAME. Alternatively, set GIT_PASSWORD_FILE to the path of a file containing the Git password. Note: This will pass the same set of credentials to all repositories listed in REPO_URLS. If different credentials are required for different repositories, a separate container run will be needed for each set of credentials.
- WORKDIR - Specify the working directory that the container will use for storing Git repositories during analysis. This can be used in conjunction with a persistent mounted volume (e.g. using -v or --volume) to reduce network bandwidth usage and run time by persisting cloned Git repositories across multiple container runs. This results in the container only pulling incremental commit data for repositories that have already been cloned in a previous run.
- SKIP_CERTIFICATE_CHECK - Set this to true to skip certificate verification. This may be required when connecting to systems with internally issued SSL/TLS certificates as the root and issuing certificates are not public. Some scenarios where this may happen are outlined below:
- Connecting to internal source code management servers with internally issued SSL/TLS certificates. Note that in this scenario the transport remains encrypted and the internal nature of the traffic greatly reduces the chance of network attacks that this verification is designed to prevent.
- Connecting to SCW servers where SSL/TLS inspection or interception is being performed and SCW SSL/TLS server certificates are being dynamically replaced with internally issued SSL/TLS certificates.
- PROGRESSIVE_CLEANUP - Set this to true to configure the container to delete each repository after it has been processed and imported. This minimises the disk space requirements when a large number of repositories (e.g. from autodiscovery) are being imported. Note: Use of this option precludes the ability to mount a persistent disk volume to reduce network bandwidth usage and run times across container runs as described above.
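The GIT_PASSWORD_FILE option keeps the secret out of the command line. The sketch below assumes the file has to be readable inside the container, which this article does not state explicitly, so it bind-mounts the file read-only. The function name and the /run/secrets/git_password path are hypothetical.

```shell
#!/bin/sh
# Print `docker run` options that hand the Git password to the container as a file.
# $1 is the absolute host path of the secret file.
# /run/secrets/git_password is a hypothetical path inside the container.
git_password_file_opts() {
    host_file="$1"
    [ -r "$host_file" ] || { echo "cannot read $host_file" >&2; return 1; }
    printf '%s\n' "-v $host_file:/run/secrets/git_password:ro -e GIT_PASSWORD_FILE=/run/secrets/git_password"
}

# Usage (not executed here):
#   docker run --pull=always $(git_password_file_opts /secure/git_token) \
#       --env-file env.list ghcr.io/securecodewarrior/trust-agent:1
```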
- The following environment variables are optional and enable GitHub App authentication using a custom GitHub App installed in your GitHub instance. Please note that if GITHUBCLOUD_TOKEN is provided (via environment variable or file), it takes precedence over GitHub App authentication credentials. To use GitHub App authentication, please ensure GITHUBCLOUD_TOKEN is unset.
- GITHUB_APP_ID - The App ID assigned to your GitHub App. Alternatively, set GITHUB_APP_ID_FILE to the path of a file containing the App ID.
- GITHUB_INSTALLATION_ID - The unique Installation ID for the App installed on your Organization or User account. Alternatively, set GITHUB_INSTALLATION_ID_FILE to the path of a file containing the Installation ID.
- GITHUB_PRIVATE_KEY - The private key in raw PEM format for the GitHub App. Alternatively, set GITHUB_PRIVATE_KEY_FILE to the path of a file containing the private key in PEM format.
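Since a leftover GITHUBCLOUD_TOKEN silently overrides GitHub App credentials, a pre-flight check in the shell that launches the container can catch the mistake early. This is a sketch of such a check, not part of the Trust Agent; the function name is hypothetical.

```shell
#!/bin/sh
# Check the current shell environment before starting the container with GitHub App auth.
# Fails if GITHUBCLOUD_TOKEN (or its _FILE form) is set, because it would take precedence,
# or if any of the three GitHub App values is missing in both plain and _FILE form.
check_github_app_env() {
    if [ -n "${GITHUBCLOUD_TOKEN:-}${GITHUBCLOUD_TOKEN_FILE:-}" ]; then
        echo "GITHUBCLOUD_TOKEN is set and would override GitHub App auth" >&2
        return 1
    fi
    for name in GITHUB_APP_ID GITHUB_INSTALLATION_ID GITHUB_PRIVATE_KEY; do
        # Look up $name and ${name}_FILE indirectly
        eval "plain=\${$name:-} file=\${${name}_FILE:-}"
        if [ -z "$plain$file" ]; then
            echo "missing $name or ${name}_FILE" >&2
            return 1
        fi
    done
}
```

A typical use is `check_github_app_env && docker run ...`, so the container only starts when the variables are consistent.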
- The following environment variables are optional and control repository autodiscovery behaviour:
- EXCLUDE_REPOS - A comma-separated list of repository URL patterns to exclude from processing. This allows you to filter out repositories that you don't want to import data for. Supports both full URLs and URL substrings (e.g. user/repo). Pattern matching supports wildcards - use * to match any number of characters (e.g. company/internal-* or */archived-*). Please note that this works on the URL string, which may contain slightly different repository names than their display versions.
- REPO_PROVIDERS - Specify the repository providers to autodiscover repositories from as a comma separated list, e.g. REPO_PROVIDERS=bitbucketdatacenter,githubcloud,gitlabcloud
- Please include the environment variables below that correspond to your specified repository providers
- Note that only Bitbucket Data Center (bitbucketdatacenter), Bitbucket Cloud (bitbucketcloud), GitHub Cloud (githubcloud), GitLab Cloud (gitlabcloud), GitLab On-Prem (gitlabonprem) and Azure DevOps (azuredevops) are currently supported, but additional providers will be added over time
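To get a feel for how an EXCLUDE_REPOS list might behave before running an import, the documented rules (substring match, * as a wildcard) can be approximated with shell globs. This is only an illustration of the described behaviour under that assumption; it is not the agent's actual matcher, and edge cases may differ.

```shell
#!/bin/sh
# Approximate EXCLUDE_REPOS matching: a pattern matches when it occurs anywhere
# in the URL, and * stands for any run of characters.
# $1 is a repository URL, $2 a comma-separated pattern list.
is_excluded() {
    url="$1"; patterns="$2"
    old_ifs=$IFS; IFS=,
    set -f                      # keep * literal while splitting the list on commas
    for pattern in $patterns; do
        case "$url" in
            *$pattern*) IFS=$old_ifs; set +f; return 0 ;;
        esac
    done
    IFS=$old_ifs; set +f
    return 1
}

is_excluded "https://github.com/company/internal-tools.git" "company/internal-*,*/archived-*" && echo excluded
```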
- Bitbucket Data Center
- BITBUCKETDATACENTER_HOST - Specify the base Bitbucket Data Center API URL to use for repository autodiscovery (e.g. BITBUCKETDATACENTER_HOST=https://bitbucket.corp.internal). Note that this value MUST start with http or https and MUST NOT end with /
- BITBUCKETDATACENTER_USER - Specify the username to use for authenticating to the Bitbucket Data Center host (e.g. BITBUCKETDATACENTER_USER=somebody)
- BITBUCKETDATACENTER_TOKEN - Specify the authentication token to use for authenticating to the Bitbucket Data Center API host for repository autodiscovery (e.g. BITBUCKETDATACENTER_TOKEN=ATBBexampletoken1234). Alternatively, set BITBUCKETDATACENTER_TOKEN_FILE to the path of a file containing the authentication token.
- BITBUCKETDATACENTER_CLONE_TOKEN - (Optional) Specify a different authentication token to be used for cloning discovered repositories, instead of the API token specified above. Alternatively, set BITBUCKETDATACENTER_CLONE_TOKEN_FILE to the path of a file containing the authentication token.
- Bitbucket Cloud
- BITBUCKETCLOUD_HOST - Specify the base Bitbucket Cloud API URL to use for repository autodiscovery (e.g. BITBUCKETCLOUD_HOST=https://api.bitbucket.org/2.0). Note that this value MUST start with http or https and MUST NOT end with /
- BITBUCKETCLOUD_USER - Specify the username to use for authenticating to the Bitbucket Cloud host (e.g. BITBUCKETCLOUD_USER=somebody). This is typically the email address you use to log in to Bitbucket Cloud
- BITBUCKETCLOUD_TOKEN - Specify the API token to use for authenticating to the Bitbucket Cloud API host for repository autodiscovery (e.g. BITBUCKETCLOUD_TOKEN=ATATTexampletoken1234). API tokens can be managed here. Alternatively, set BITBUCKETCLOUD_TOKEN_FILE to the path of a file containing the authentication token.
- BITBUCKETCLOUD_CLONE_TOKEN - (Optional) Specify a different authentication token to be used for cloning discovered repositories, instead of the API token specified above. Alternatively, set BITBUCKETCLOUD_CLONE_TOKEN_FILE to the path of a file containing the authentication token.
- Notes:
- Please ensure the API token has the read:project:bitbucket, read:repository:bitbucket, read:workspace:bitbucket, and read:user:bitbucket scopes
- GitHub Cloud
- GITHUBCLOUD_HOST - Specify the GitHub API URL to use for repository autodiscovery (e.g. GITHUBCLOUD_HOST=https://api.github.com). Note that this value MUST start with http or https and MUST NOT end with /
- GITHUBCLOUD_USER - Specify the organisation name or username to use for authenticating to the GitHub API (e.g. GITHUBCLOUD_USER=somebody)
- GITHUBCLOUD_TOKEN - Specify the authentication token to use for authenticating to the GitHub API host for repository autodiscovery (e.g. GITHUBCLOUD_TOKEN=ghp_exampletoken1234). Alternatively, set GITHUBCLOUD_TOKEN_FILE to the path of a file containing the authentication token.
- GITHUBCLOUD_CLONE_TOKEN - (Optional) Specify a different authentication token to be used for cloning discovered repositories, instead of the API token specified above. Alternatively, set GITHUBCLOUD_CLONE_TOKEN_FILE to the path of a file containing the authentication token.
- Notes:
- You can find your organization name in the URL of your repositories (organization_name/repository_name)
- The access token must be authorised with your Single Sign-On (SSO) organization if your organization is using SSO
- GitLab Cloud
- GITLABCLOUD_HOST - Specify the GitLab API URL to use for repository autodiscovery (e.g. GITLABCLOUD_HOST=https://gitlab.com). Note that this value MUST start with http or https and MUST NOT end with /
- GITLABCLOUD_USER - Specify the group name or username to use for authenticating to the GitLab API (e.g. GITLABCLOUD_USER=somebody)
- GITLABCLOUD_TOKEN - Specify the authentication token to use for authenticating to the GitLab API host for repository autodiscovery (e.g. GITLABCLOUD_TOKEN=glpat_exampletoken1234). Alternatively, set GITLABCLOUD_TOKEN_FILE to the path of a file containing the authentication token.
- GITLABCLOUD_CLONE_TOKEN - (Optional) Specify a different authentication token to be used for cloning discovered repositories, instead of the API token specified above. Alternatively, set GITLABCLOUD_CLONE_TOKEN_FILE to the path of a file containing the authentication token.
- GITLABCLOUD_VISIBILITY_SCOPES - Comma separated list of visibilities. Options include private, internal and public (e.g. GITLABCLOUD_VISIBILITY_SCOPES=private,internal). WARNING: public will retrieve all public repositories and is not recommended for use on GitLab Cloud
- Notes:
- Please take the group name from the URL of your group, as the display name on the page can be different
- Please ensure your authentication token has read_api and read_repository scopes
- GitLab On-Premises
- GITLABONPREM_HOST - Specify the GitLab API URL to use for repository autodiscovery (e.g. GITLABONPREM_HOST=https://gitlab.corp.internal). Note that this value MUST start with http or https and MUST NOT end with /
- GITLABONPREM_USER - Specify the group name or username to use for authenticating to the GitLab API (e.g. GITLABONPREM_USER=somebody)
- GITLABONPREM_TOKEN - Specify the authentication token to use for authenticating to the GitLab API host for repository autodiscovery (e.g. GITLABONPREM_TOKEN=glpat_exampletoken1234). Alternatively, set GITLABONPREM_TOKEN_FILE to the path of a file containing the authentication token.
- GITLABONPREM_CLONE_TOKEN - (Optional) Specify a different authentication token to be used for cloning discovered repositories, instead of the API token specified above. Alternatively, set GITLABONPREM_CLONE_TOKEN_FILE to the path of a file containing the authentication token.
- GITLAB_VISIBILITY_SCOPES - Comma separated list of visibilities. Options include private, internal and public (e.g. GITLAB_VISIBILITY_SCOPES=private,internal). WARNING: public will retrieve all public repositories and is only recommended for use on GitLab On-Premises
- Notes:
- Please take the group name from the URL of your group, as the display name on the page can be different
- Azure DevOps
- AZUREDEVOPS_HOST - Specify the Azure DevOps API URL to use for repository autodiscovery. This must include the “organization” or “collection” part of the URL, e.g. AZUREDEVOPS_HOST=https://dev.azure.com/ORGANIZATION_NAME (Azure DevOps Cloud) or AZUREDEVOPS_HOST=https://azdo.internal/COLLECTION_NAME (Azure DevOps Server). Note that this value MUST start with http or https and MUST NOT end with /
- AZUREDEVOPS_USER - Specify the group name or username to use for authenticating to the Azure DevOps API (e.g. AZUREDEVOPS_USER=somebody)
- AZUREDEVOPS_TOKEN - Specify the authentication token to use for authenticating to the Azure DevOps API host for repository autodiscovery (e.g. AZUREDEVOPS_TOKEN=exampletoken1234). Alternatively, set AZUREDEVOPS_TOKEN_FILE to the path of a file containing the authentication token.
- AZUREDEVOPS_CLONE_TOKEN - (Optional) Specify a different authentication token to be used for cloning discovered repositories, instead of the API token specified above. Alternatively, set AZUREDEVOPS_CLONE_TOKEN_FILE to the path of a file containing the authentication token.
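Every provider's *_HOST value carries the same rule: it must start with http or https and must not end with /. A quick check before launching the container could look like the sketch below. The function name is hypothetical, and "start with http or https" is interpreted here as requiring the full http:// or https:// scheme.

```shell
#!/bin/sh
# Return 0 when a *_HOST value follows the stated rules:
# it starts with http:// or https:// and does not end with /.
valid_host() {
    case "$1" in
        */)                   return 1 ;;
        http://?*|https://?*) return 0 ;;
        *)                    return 1 ;;
    esac
}

for host in https://api.github.com https://gitlab.corp.internal/ gitlab.com; do
    if valid_host "$host"; then echo "ok:  $host"; else echo "bad: $host"; fi
done
```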
- Examples for passing environment variables to Docker are shown below. Please refer to the documentation for other container platforms.
- Passing environment variables via command line options
docker run --pull=always -e SCW_API_KEY='your_api_key' -e SCW_API_URL='https://trust-agent.prod-us.prod.securecodewarrior.com' -e REPO_URLS='https://service_user:personal_access_token@git.local/projects/project1.git,https://service_user:service_password@git.local/projects/project2.git,https://git2.internal/projects/projectX.git' ghcr.io/securecodewarrior/trust-agent:1
docker run --pull=always -e SCW_API_KEY='your_api_key' -e SCW_API_URL='https://trust-agent.prod-us.prod.securecodewarrior.com' -e REPO_PROVIDERS='githubcloud,gitlabonprem' -e GITHUBCLOUD_HOST=https://api.github.com -e GITHUBCLOUD_USER=somebody -e GITHUBCLOUD_TOKEN=ghp_exampletoken1234 -e GITLABONPREM_HOST=https://gitlab.corp.internal -e GITLABONPREM_USER=somebody -e GITLABONPREM_TOKEN=glpat_exampletoken1234 ghcr.io/securecodewarrior/trust-agent:1
- Passing environment variables via a file
docker run --pull=always --env-file env.list ghcr.io/securecodewarrior/trust-agent:1
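The env.list file referenced above is a plain list of NAME=value lines. Docker keeps quotes in such a file literally, so values should be written without them. The sketch below writes an example file using this article's placeholder values; the function name is hypothetical, and restricting the file to the current user is a suggested precaution rather than a documented requirement.

```shell
#!/bin/sh
# Write an env file for `docker run --env-file`.
# Each line is NAME=value; quotes would become part of the value, so none are used.
write_env_list() {
    target="$1"
    ( umask 077   # the file holds secrets, so restrict it to the current user
      cat > "$target" <<'EOF'
SCW_API_KEY=your_api_key
SCW_API_URL=https://trust-agent.prod-us.prod.securecodewarrior.com
REPO_PROVIDERS=githubcloud
GITHUBCLOUD_HOST=https://api.github.com
GITHUBCLOUD_USER=somebody
GITHUBCLOUD_TOKEN=ghp_exampletoken1234
EOF
    )
}

# Usage (not executed here):
#   write_env_list ./env.list
#   docker run --pull=always --env-file env.list ghcr.io/securecodewarrior/trust-agent:1
```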
- HTTP proxy support is provided by the underlying JVM and can be enabled by supplying the standard JVM system properties in the JAVA_TOOL_OPTIONS environment variable. For example: JAVA_TOOL_OPTIONS="-Dhttp.proxyHost=<host> -Dhttp.proxyPort=<port> -Dhttps.proxyHost=<host> -Dhttps.proxyPort=<port> -Dhttp.nonProxyHosts=host1|host2"
- We suggest running the On-Premises Container on a daily or weekly schedule but the frequency can be adjusted based on your organization-specific requirements
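The JAVA_TOOL_OPTIONS value for a proxy is easy to mistype, so it can be generated instead. In this sketch the function name, proxy.corp.internal and port 3128 are hypothetical; the property names are the standard JVM ones mentioned above.

```shell
#!/bin/sh
# Build a JAVA_TOOL_OPTIONS value from a proxy host, a port and an optional
# |-separated list of hosts that should bypass the proxy.
proxy_java_opts() {
    host="$1"; port="$2"; bypass="${3:-}"
    opts="-Dhttp.proxyHost=$host -Dhttp.proxyPort=$port -Dhttps.proxyHost=$host -Dhttps.proxyPort=$port"
    [ -n "$bypass" ] && opts="$opts -Dhttp.nonProxyHosts=$bypass"
    printf '%s\n' "$opts"
}

proxy_java_opts proxy.corp.internal 3128 'git.local|*.corp.internal'
```

The result can be passed to the container with `-e JAVA_TOOL_OPTIONS="$(proxy_java_opts proxy.corp.internal 3128)"`. Keeping the whole value inside double quotes stops the shell from treating | as a pipe.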
GitHub App
This approach is specific to organizations that use GitHub Cloud or GitHub Enterprise Cloud as their source code management tool. It involves a simple process for installing a GitHub App that works within GitHub’s ecosystem framework and is considered the easiest approach overall. However, the App requires read permissions to access repository contents in order to access commit data, although only summarised commit metadata is actually stored by Secure Code Warrior. Any source code inspected for the purposes of framework detection is immediately discarded after analysis and is never stored. If you would prefer that source code is not inspected, please let your CSM know and we can explicitly disable this behaviour but please note that this will also disable the framework detection capability.
Note: One limitation of this approach is that repositories that have over 30,000 commits within the last 12 months will be automatically excluded due to API request limits.
- Navigate to the Trust Agent Configuration screen and click Add Provider under Git Connections
- Click Connect under GitHub App
- Click Add under Step 1 to open up the installation process in a new tab. Note that App installation for a GitHub Organization requires an Organization Admin
- If you are not an Organization Admin, you will be able to request installation of the App by an Organization Admin
- If you are an Organization Admin, follow the steps to select the repositories the App will have access to and install it into your GitHub Organization
- Once installed, return to the Trust Agent Configuration screen and click “Authorize” under Step 2 to link your Secure Code Warrior tenant to your GitHub Organization
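To gauge whether a repository is likely to hit the 30,000-commit limit noted above, the commits in a local clone can be counted. This is a rough estimate only: how the GitHub App counts commits (for example, which branches are included) is not described here, and the sketch counts commits reachable from any ref. The function name is hypothetical.

```shell
#!/bin/sh
# Count commits from the last 12 months that are reachable from any ref
# in the local clone at path $1.
commits_last_12_months() {
    git -C "$1" rev-list --count --all --since="12 months ago"
}

# Usage (not executed here):
#   commits_last_12_months ~/src/my-repo
```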
Manual Upload
This approach requires no direct integration with your Git-based source code management tool and relies on an ad hoc manual process to export summarised commit metadata for uploading into the SCW Trust Agent. As a result, it is the preferred approach for trialling the SCW Trust Agent, but it produces a point-in-time snapshot only and does not scale to a large number of repositories.
The Manual Upload process involves cloning your selected repositories onto your local system and running a provided shell script to generate the commit summaries that can be uploaded via the SCW Trust Agent web interface. It can be run manually at a frequency of your choosing. At no point does your organization’s source code, or keys to access your organization’s source code, leave your environment under this approach.
- Clone the Git repository or repositories (within a single parent directory) that you would like to import into the SCW Trust Agent locally on your laptop or workstation
- Copy the provided shell script, reproduced below for reference, to the locally cloned Git repository or repositories parent directory
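git-export.sh
# Find every Git working tree below the current directory
SUBDIRS=$(find . -type d -name ".git" -exec dirname {} \;)
BASEDIR=$(pwd)
for SUBDIR in $SUBDIRS
do
  cd "$SUBDIR"
  # Name the export after the repository in the origin remote URL
  PROJECT_NAME=$(basename -s .git "$(git config --get remote.origin.url)")
  echo "Exporting commit data for $SUBDIR"
  # One YAML entry per commit. The sed steps turn file names into extensions,
  # drop files without an extension, and remove consecutive duplicate lines.
  # The leading spaces in the format lines matter: sed treats unindented lines as file names.
  git log --name-only --since="3 years ago" --no-merges --pretty=format:'- hash: %H
  authorDate: %aI
  authorEmail: %ae
  authorName: %an
  modifiedFileExtensions:' | sed -e '/^[^ -]/ s/.*\.\(.*\)/    - \1/' | sed -e '/^[^ -]/d' | sed '$!N; /^\(.*\)\n\1$/!P; D' > "$BASEDIR/scw_gci_$PROJECT_NAME.yaml"
  cd "$BASEDIR"
done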
- On your laptop or workstation, change working directory to the locally cloned Git repository or repositories parent directory
- Run the git-export.sh shell script to produce a YAML file per repository
- The YAML export is a summary of the commit data and can be inspected prior to uploading
- Inspect the contents of the YAML file(s) and redact or remove any information if needed
- Navigate to the Trust Agent Configuration screen and click Add Provider under Git Connections
- Click Upload under Manual Upload
- Select the YAML file to upload, specify an organization and repository name, and then click Upload Repository. The organization name can be considered a folder to group repositories under. Repeat this step for all exported YAML files.
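When inspecting an export before upload, a short summary can be quicker than reading the whole file. The sketch below relies on the field names written by git-export.sh (hash and authorEmail); the function name is hypothetical.

```shell
#!/bin/sh
# Summarise an exported file before upload: the number of commits it contains,
# followed by the distinct author email addresses.
summarise_export() {
    file="$1"
    printf 'commits: %s\n' "$(grep -c '^- hash:' "$file")"
    sed -n 's/^ *authorEmail: //p' "$file" | sort -u
}

# Usage (not executed here):
#   for f in scw_gci_*.yaml; do echo "== $f"; summarise_export "$f"; done
```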