We offer multiple approaches to connecting the SCW Trust Agent to your Git-based source code management tool. Each approach has its benefits and limitations, and we can help you make a decision based on your organization’s specific environment and policies.
All approaches store commit metadata only, including:
- Commit author name
- Commit author email address
- Timestamp of commit
- List of file extensions modified in the commit
- List of programming languages and frameworks associated with the commit based on the modified files
This summary data is stored by the SCW Trust Agent for the purposes of generating dashboards and reports, and to perform automation actions such as optimizing your Secure Code Warrior learning pathways or applying your configured policy.
Source code is inspected to facilitate accurate framework detection but is never stored. Only the results of the analysis are stored, in the form of the file extension, language and framework lists described above.
On-Premises
This approach combines flexibility with risk mitigation by running a container within your environment that is configured and managed by your team. This offsets the risk of having a third-party platform connected directly to your source code management tool.
The On-Premises Container is configurable and can work with both cloud-based and on-premises source code management platforms. The On-Premises Container can be run at a regular frequency to automatically connect to designated repositories and push summarised data back to the SCW Trust Agent. At no point does your organization’s source code, or keys to access your organization’s source code, leave your environment under this approach.
- We have published the On-Premises Docker image at ghcr.io/securecodewarrior/trust-agent:1
- The On-Premises Docker image can be run in your container platform of choice, e.g. Docker, Kubernetes, etc, or within your CI pipeline with Docker container support
- Our customers typically use one of the following three deployment approaches:
- Run the container within each repository's CI pipeline: To be scalable, this approach typically requires a common CI project or workflow shared by all your repositories. Where one is available, it provides a neat way to configure the container once so that each repository runs it individually to import data. The specific method depends heavily on the CI approach and tooling adopted by your organization. The only requirement is that the CI tooling has Docker container support.
- Run the container centrally in your organization's container platform: This approach allows container configuration and execution to be managed from a single point, where the container is configured to connect to all your Git platforms and import data from all repositories in bulk. However, container disk space and network bandwidth can become an issue if there are many repositories or if the repositories are large. A persistent file system volume for the container is recommended to reduce disk space usage and to limit network traffic to commit deltas. The WORKDIR environment variable described below allows you to specify the persistent mounted volume as the container working directory.
- Run the container on a schedule from a laptop or workstation: This is usually the simplest approach, but it requires the individual's laptop or workstation to be available to keep running regular imports. It can be run ad hoc on a manual schedule or automated using a tool such as cron or Task Scheduler.
- The container must be configured with:
- Network connectivity to your Git repositories via HTTPS
- Network connectivity to subdomains under prod.securecodewarrior.com
- Sufficient disk space for cloning your Git repositories
- Note that a persistent volume can be mounted so that each container run is an incremental pull rather than a full repository clone. The WORKDIR environment variable described below allows you to specify the persistent mounted volume as the container working directory.
- The following environment variables must be passed to the container when run:
- SCW_API_KEY: Your SCW Admin API key. Follow this guide to create your Admin API key.
- SCW_API_URL: Your SCW API endpoint URL. Please head to Trust Agent Configuration > Git Connections > Add Provider > On-Premises > Setup Details for your specific endpoint URL. The Docker examples later in this section use https://trust-agent.prod-us.prod.securecodewarrior.com
- REPO_URLS: A comma-separated list of Git repository URLs with no spaces, e.g. https://git.local/projects/project1.git,https://git2.internal/projects/projectX.git
- Please note that currently only HTTPS Git repository URLs are supported. Credentials can be embedded in the URL, as the first Docker example later in this section does with https://service_user:personal_access_token@git.local/projects/project1.git
- An alternative to providing credentials in the URL is available. Please see GIT_USERNAME and GIT_PASSWORD below.
- REPO_URLS can be omitted if repository autodiscovery (see below) is being used.
- The following environment variables are optional and can be passed to the container when run to control specific behaviour:
- GIT_USERNAME: An alternate method for providing credentials. This allows you to specify the username that will be passed to the repository during authentication.
- Note: This will pass the same set of credentials to all repositories listed in REPO_URLS. If different credentials are required for different repositories, a separate container run will be needed for each set of credentials.
- GIT_PASSWORD: An alternate method for providing credentials. This allows you to specify the password that will be passed to the repository during authentication. In most Git platforms, API keys and access tokens can be provided using GIT_PASSWORD without needing to provide GIT_USERNAME.
- Note: This will pass the same set of credentials to all repositories listed in REPO_URLS. If different credentials are required for different repositories, a separate container run will be needed for each set of credentials.
- WORKDIR: Specify the working directory that the container will use for storing Git repositories during analysis. This can be used in conjunction with a persistent mounted volume (e.g. using -v or --volume) to reduce bandwidth and disk usage by persisting cloned Git repositories across multiple container runs. This results in the container only pulling incremental commit data for repositories that have already been cloned in a previous run.
- SKIP_CERTIFICATE_CHECK: Set this to true to skip certificate verification. This may be required when connecting to systems with internally issued SSL/TLS certificates, as the root and issuing certificates are not public. Some scenarios where this may happen are outlined below:
- Connecting to internal source code management servers with internally issued SSL/TLS certificates. Note that in this scenario the transport remains encrypted and the internal nature of the traffic greatly reduces the chance of network attacks that this verification is designed to prevent.
- Connecting to SCW servers where SSL/TLS inspection or interception is being performed and SCW SSL/TLS server certificates are being dynamically replaced with internally issued SSL/TLS certificates.
- The following environment variables are optional and control repository autodiscovery behaviour:
- EXCLUDE_REPOS: A comma-separated list of repository URL patterns to exclude from processing. This allows you to filter out repositories that you don't want to import data for. Supports both full URLs and shortened patterns (e.g. user/repo). Pattern matching supports wildcards: use * to match any number of characters (e.g. company/internal-* or */archived-*).
- REPO_PROVIDERS: Specify the repository providers to autodiscover repositories from as a comma-separated list, e.g. REPO_PROVIDERS=bitbucketdatacenter,githubcloud,gitlabcloud
- Please include the environment variables below that correspond to your specified repository providers
- Note that only Bitbucket Data Center (bitbucketdatacenter), GitHub Cloud (githubcloud), GitLab Cloud (gitlabcloud) and GitLab On-Prem (gitlabonprem) are currently supported, but additional providers will be added over time
- Bitbucket Data Center
- BITBUCKETDATACENTER_HOST: Specify the base Bitbucket Data Center API URL to use for repository autodiscovery, e.g. BITBUCKETDATACENTER_HOST=https://bitbucket.corp.internal
- Note that this value MUST start with http or https and MUST NOT end with /
- BITBUCKETDATACENTER_USER: Specify the username to use for authenticating to the Bitbucket Data Center host, e.g. BITBUCKETDATACENTER_USER=somebody
- BITBUCKETDATACENTER_TOKEN: Specify the authentication token to use for authenticating to the Bitbucket Data Center API host for repository autodiscovery, e.g. BITBUCKETDATACENTER_TOKEN=ATBBexampletoken1234
- BITBUCKETDATACENTER_CLONE_TOKEN: (Optional) Specify a different authentication token to be used for cloning discovered repositories, instead of the API token specified above
- GitHub Cloud
- GITHUBCLOUD_HOST: Specify the GitHub API URL to use for repository autodiscovery, e.g. GITHUBCLOUD_HOST=https://api.github.com
- Note that this value MUST start with http or https and MUST NOT end with /
- GITHUBCLOUD_USER: Specify the organisation name or username to use for authenticating to the GitHub API, e.g. GITHUBCLOUD_USER=somebody
- GITHUBCLOUD_TOKEN: Specify the authentication token to use for authenticating to the GitHub API host for repository autodiscovery, e.g. GITHUBCLOUD_TOKEN=ghp_exampletoken1234
- GITHUBCLOUD_CLONE_TOKEN: (Optional) Specify a different authentication token to be used for cloning discovered repositories, instead of the API token specified above
- Notes:
- You can find your organization name in the URL of your repositories (organization_name/repository_name)
- The access token must be authorised with your Single Sign-On (SSO) organization if your organization is using SSO
- GitLab Cloud
- GITLABCLOUD_HOST: Specify the GitLab API URL to use for repository autodiscovery, e.g. GITLABCLOUD_HOST=https://gitlab.com
- Note that this value MUST start with http or https and MUST NOT end with /
- GITLABCLOUD_USER: Specify the group name or username to use for authenticating to the GitLab API, e.g. GITLABCLOUD_USER=somebody
- GITLABCLOUD_TOKEN: Specify the authentication token to use for authenticating to the GitLab API host for repository autodiscovery, e.g. GITLABCLOUD_TOKEN=glpat_exampletoken1234
- GITLABCLOUD_CLONE_TOKEN: (Optional) Specify a different authentication token to be used for cloning discovered repositories, instead of the API token specified above
- Notes:
- Please take the group name from the URL of your group, as the display name on the page can be different
- GitLab On-Premises
- GITLABONPREM_HOST: Specify the GitLab API URL to use for repository autodiscovery, e.g. GITLABONPREM_HOST=https://gitlab.corp.internal
- Note that this value MUST start with http or https and MUST NOT end with /
- GITLABONPREM_USER: Specify the group name or username to use for authenticating to the GitLab API, e.g. GITLABONPREM_USER=somebody
- GITLABONPREM_TOKEN: Specify the authentication token to use for authenticating to the GitLab API host for repository autodiscovery, e.g. GITLABONPREM_TOKEN=glpat_exampletoken1234
- GITLABONPREM_CLONE_TOKEN: (Optional) Specify a different authentication token to be used for cloning discovered repositories, instead of the API token specified above
- Notes:
- Please take the group name from the URL of your group, as the display name on the page can be different
- Examples for passing environment variables to Docker are shown below. Please refer to the documentation for other container platforms.
- Passing environment variables via command line options
docker run --pull=always -e SCW_API_KEY='your_api_key' -e SCW_API_URL='https://trust-agent.prod-us.prod.securecodewarrior.com' -e REPO_URLS='https://service_user:personal_access_token@git.local/projects/project1.git,https://service_user:service_password@git.local/projects/project2.git,https://git2.internal/projects/projectX.git' ghcr.io/securecodewarrior/trust-agent:1
docker run --pull=always -e SCW_API_KEY='your_api_key' -e SCW_API_URL='https://trust-agent.prod-us.prod.securecodewarrior.com' -e REPO_PROVIDERS='githubcloud,gitlabonprem' -e GITHUBCLOUD_HOST=https://api.github.com -e GITHUBCLOUD_USER=somebody -e GITHUBCLOUD_TOKEN=ghp_exampletoken1234 -e GITLABONPREM_HOST=https://gitlab.corp.internal -e GITLABONPREM_USER=somebody -e GITLABONPREM_TOKEN=glpat_exampletoken1234 ghcr.io/securecodewarrior/trust-agent:1
- Passing environment variables via a file
docker run --pull=always --env-file env.list ghcr.io/securecodewarrior/trust-agent:1
- We suggest running the On-Premises Container on a daily or weekly schedule but the frequency can be adjusted based on your organization-specific requirements
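As a rough illustration of the environment file and persistent volume options described above, the sketch below writes an env.list file, checks a provider host value against the stated rules, and wraps a single import run in a function. All values are placeholders. The /data mount point, the trust-agent-data volume name and the helper names check_host and run_trust_agent are assumptions made for this example, not part of the product.

```shell
#!/bin/sh
# Sketch only: every value below is a placeholder, and /data is an assumed mount point.

# Write the environment file consumed by `docker run --env-file`.
# Docker reads each line as VAR=value and does not strip quotes, so none are used.
cat > env.list <<'EOF'
SCW_API_KEY=your_api_key
SCW_API_URL=https://trust-agent.prod-us.prod.securecodewarrior.com
REPO_PROVIDERS=gitlabonprem
GITLABONPREM_HOST=https://gitlab.corp.internal
GITLABONPREM_USER=somebody
GITLABONPREM_TOKEN=glpat_exampletoken1234
WORKDIR=/data
EOF
# The file holds secrets, so restrict it to the current user.
chmod 600 env.list

# Return 0 when a *_HOST value starts with http:// or https:// and has no trailing slash.
check_host() {
  case "$1" in
    */) return 1 ;;
    http://*|https://*) return 0 ;;
    *) return 1 ;;
  esac
}

# Run one import, keeping cloned repositories in a named Docker volume
# (hypothetical name) so that later runs only pull new commits.
run_trust_agent() {
  docker run --rm --pull=always \
    --env-file env.list \
    -v trust-agent-data:/data \
    ghcr.io/securecodewarrior/trust-agent:1
}
```

Calling run_trust_agent from a script that cron or Task Scheduler invokes daily or weekly would match the suggested schedule. Whether a named volume or a bind-mounted host directory suits you better depends on your container platform.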
GitHub App
This approach is specific to organizations that use GitHub Cloud or GitHub Enterprise Cloud as their source code management tool. It involves a simple process for installing a GitHub App that works within GitHub’s ecosystem framework and is considered the easiest approach overall. However, the App requires read permissions to access repository contents in order to access commit data, although only summarised commit metadata is actually stored by Secure Code Warrior. Any source code inspected for the purposes of framework detection is immediately discarded after analysis and is never stored. If you would prefer that source code is not inspected, please let your CSM know and we can explicitly disable this behaviour but please note that this will also disable the framework detection capability.
Note: One limitation of this approach is that repositories that have over 30,000 commits within the last 12 months will be automatically excluded due to API request limits.
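To get a rough idea of whether a repository is near this threshold, you can count recent commits in a local clone. The sketch below is an approximation only: it counts commits reachable from HEAD, and which branches the App counts toward the limit is an assumption here. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: counts commits reachable from HEAD in a local clone.
# Which branches count toward the limit is an assumption.

LIMIT=30000

# Print the number of commits made in the last 12 months in the clone at $1.
commits_last_12_months() {
  git -C "$1" rev-list --count --since="12 months ago" HEAD
}

# Return 0 when the commit count in $1 is over the limit.
exceeds_limit() {
  [ "$1" -gt "$LIMIT" ]
}

# Print a one-line verdict for the clone at $1.
report_repo() {
  count=$(commits_last_12_months "$1") || return 1
  if exceeds_limit "$count"; then
    echo "$1: $count commits in the last 12 months (over $LIMIT)"
  else
    echo "$1: $count commits in the last 12 months (within $LIMIT)"
  fi
}
```

For example, report_repo /path/to/clone prints the count for one repository. Repositories reported as over the limit may be better served by the On-Premises or Manual Upload approaches.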
- Navigate to the Trust Agent Configuration screen and click Add Provider under Git Connections
- Click Connect under GitHub App
- Click Add under Step 1 to open up the installation process in a new tab. Note that App installation for a GitHub Organization requires an Organization Admin
- If you are not an Organization Admin, you will be able to request installation of the App by an Organization Admin
- If you are an Organization Admin, follow the steps to select the repositories the App will have access to and install it into your GitHub Organization
- Once installed, return to the Trust Agent Configuration screen and click “Authorize” under Step 2 to link your Secure Code Warrior tenant to your GitHub Organization
Manual Upload
This approach requires no direct integration with your Git-based source code management tool and relies on an ad hoc manual process to export summarised commit metadata for uploading into the SCW Trust Agent. As a result, it is the preferred approach for trialing the SCW Trust Agent, but it produces a point-in-time snapshot only and does not scale to a large number of repositories.
The Manual Upload process involves cloning your selected repositories onto your local system and running a provided shell script to generate the commit summaries that can be uploaded via the SCW Trust Agent web interface. It can be run manually at a frequency of your choosing. At no point does your organization’s source code, or keys to access your organization’s source code, leave your environment under this approach.
- Clone the Git repository or repositories (within a single parent directory) that you would like to import into the SCW Trust Agent locally on your laptop or workstation
- Copy the provided shell script, reproduced below for reference, to the locally cloned Git repository or repositories parent directory
git-export.sh
# Find every Git working tree below the current directory
SUBDIRS=$(find . -type d -name ".git" -exec dirname {} \;)
BASEDIR=$(pwd)
for SUBDIR in $SUBDIRS
do
  cd "$SUBDIR"
  # Name the export after the repository in the origin remote URL
  PROJECT_NAME=$(basename -s .git `git config --get remote.origin.url`)
  echo "Exporting commit data for $SUBDIR"
  # Emit one YAML entry per commit, then reduce the modified file paths to unique extensions
  git log --name-only --since="3 years ago" --no-merges --pretty=format:'- hash: %H
  authorDate: %aI
  authorEmail: %ae
  authorName: %an
  modifiedFileExtensions:' | sed -e '/^[^ -]/ s/.*\.\(.*\)/    - \1/' | sed -e '/^[^ -]/d' | sed '$!N; /^\(.*\)\n\1$/!P; D' > "$BASEDIR/scw_gci_$PROJECT_NAME.yaml"
  cd "$BASEDIR"
done
- On your laptop or workstation, change working directory to the locally cloned Git repository or repositories parent directory
- Run the git-export.sh shell script to produce a YAML file per repository
- The YAML export is a summary of the commit data and can be inspected prior to uploading
- Inspect the contents of the YAML file(s) and redact or remove any information if needed
- Navigate to the Trust Agent Configuration screen and click Add Provider under Git Connections
- Click Upload under Manual Upload
- Select the YAML file to upload, specify an organization and repository name, and then click Upload Repository. The organization name can be considered a folder to group repositories under. Repeat this step for all exported YAML files.
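When inspecting the exported files before upload, a quick summary can help you decide what to redact. The sketch below is one possible way to do that. It relies only on the hash and authorEmail field names from the export script's format string, and the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; the field names come from
# the export script's git log format string.

# Print the number of commits recorded in the export file at $1.
count_commits() {
  grep -c '^- hash:' "$1"
}

# Print each distinct author email in the export file at $1 once.
list_author_emails() {
  sed -n 's/^ *authorEmail: *//p' "$1" | sort -u
}

# Summarise every export file matching scw_gci_*.yaml in the current directory.
summarise_exports() {
  for f in scw_gci_*.yaml; do
    [ -f "$f" ] || continue
    echo "$f: $(count_commits "$f") commits"
    list_author_emails "$f"
  done
}
```

Running summarise_exports from the parent directory after git-export.sh has finished lists each file with its commit count and author emails.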