Most modern web applications come with a REST API that returns JSON. That output needs to be parsed, and it often feeds into scripts and service daemons that poll the API for automation.

Starting with a new REST API and its endpoints can often be overwhelming. Documentation may suggest looking into a set of SDKs and libraries for various languages, or instruct you to use curl or wget on the CLI to send a request. Both CLI tools come with a variety of parameters which help to download and print the response string, for example in JSON format.

The response string retrieved from curl may get long and confusing. It can require parsing the JSON format and filtering for a smaller subset of results. This helps with viewing the results on the CLI, and minimizes the data to process in scripts. The following example retrieves all projects from GitLab and returns a paginated result set with the first 20 projects:

$ curl "https://gitlab.com/api/v4/projects"

Raw JSON as API response

The GitLab REST API documentation guides you through the first steps with error handling and authentication. In this blog post, we will be using the Personal Access Token as the authentication method. Alternatively, you can use project access tokens for automated authentication that avoids the use of personal credentials.

REST API authentication

Not all endpoints allow anonymous access; some require authentication. Try fetching user profile data with this request:

$ curl "https://gitlab.com/api/v4/user"
{"message":"401 Unauthorized"}

The API request against the /user endpoint requires passing the personal access token with the request, for example, as a request header. To avoid exposing credentials on the terminal, you can export the token and its value into the user's environment. You can automate the variable export with ZSH and the .env plugin in your shell environment. You can also source the .env file once in the existing shell environment.

$ vim ~/.env

export GITLAB_TOKEN="..."

$ source ~/.env

Scripts and commands run in your shell environment can reference the $GITLAB_TOKEN variable. Try querying the user API endpoint again, this time adding the authorization header to the request:

$ curl -H "Authorization: Bearer $GITLAB_TOKEN" "https://gitlab.com/api/v4/user"
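
The same header can be attached from a script. The following is a minimal sketch using only the Python standard library; the helper name build_user_request is made up for illustration, and the request is only constructed, not sent, so the example runs offline.

```python
import os
import urllib.request

API_URL = "https://gitlab.com/api/v4/user"


def build_user_request(token: str, url: str = API_URL) -> urllib.request.Request:
    """Create (but do not send) a GET request carrying the token as a Bearer header.

    build_user_request is a hypothetical helper name, not part of any GitLab SDK.
    """
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


if __name__ == "__main__":
    # Read the token exported via ~/.env; fall back to an empty string if it is unset.
    request = build_user_request(os.environ.get("GITLAB_TOKEN", ""))
    print(request.get_method(), request.full_url)
    # Sending is left commented out on purpose:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
```

Reading the token from the environment keeps it out of the script source, in the same way the exported variable keeps it out of the shell history.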

Keep in mind that only administrators can see all attributes of all users. Regular users only see their own full profile; for example, the email address is hidden from public view.

How to request responses in JSON

The GitLab API provides many resources and URL endpoints. You can manage almost anything with the API that you’d otherwise configure using the graphical user interface.

After sending the API request, the response message contains the body as a string, for example with a JSON content type. curl can provide more information about the request and response headers, which is helpful for debugging. Repeating the verbose flag as -vvv enables the full debug output:

$ curl -vvv "https://gitlab.com/api/v4/projects"
[...]
* SSL connection using TLSv1.2 / ECDHE-RSA-CHACHA20-POLY1305
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=gitlab.com
*  start date: Jan 21 00:00:00 2021 GMT
*  expire date: May 11 23:59:59 2021 GMT
*  subjectAltName: host "gitlab.com" matched cert's "gitlab.com"
*  issuer: C=GB; ST=Greater Manchester; L=Salford; O=Sectigo Limited; CN=Sectigo RSA Domain Validation Secure Server CA
*  SSL certificate verify ok.
[...]
> GET /api/v4/projects HTTP/2
> Host: gitlab.com
> User-Agent: curl/7.64.1
> Accept: */*
[...]
< HTTP/2 200
< date: Mon, 19 Apr 2021 11:25:31 GMT
< content-type: application/json
[...]
[{"id":25993690,"description":"project for adding issues","name":"project-for-issues-1e1b6d5f938fb240","name_with_namespace":"gitlab-qa-sandbox-group / qa-test-2021-04-19-11-13-01-d7d873fd43cd34b6 / project-for-issues-1e1b6d5f938fb240","path":"project-for-issues-1e1b6d5f938fb240","path_with_namespace":"gitlab-qa-sandbox-group/qa-test-2021-04-19-11-13-01-d7d873fd43cd34b6/project-for-issues-1e1b6d5f938fb240"

[... JSON content ...]

"avatar_url":null,"web_url":"https://gitlab.com/groups/gitlab-qa-sandbox-group/qa-test-2021-04-19-11-12-56-7f3128bd0e41b92f"}}]
* Closing connection 0

The curl command output provides helpful insights into TLS ciphers and versions, the request lines starting with > and response lines starting with <. The response body string is encoded as JSON.

How to see the structure of the returned JSON

To get a quick look at the structure of the returned JSON file, try these tips:

The values in JSON consist of specific types - a string value is put in double-quotes. Boolean true/false, numbers, and floating-point numbers are also present as types. If a key exists but its value is not set, REST APIs often return null.

Verify the data structure by running "linters". Python's JSON module can parse and lint JSON strings. The example below misses a closing square bracket to showcase the error:

$ echo '[{"key": "broken"}' | python -m json.tool
Expecting object: line 1 column 19 (char 18)

jq – a lightweight and flexible CLI processor – can be used as a standalone tool to parse and validate JSON data.

$ echo '[{"key": "broken"}' | jq
parse error: Unfinished JSON term at EOF at line 2, column 0

jq is available in the package managers of most operating systems.

$ brew install jq
$ apt install jq
$ dnf install jq
$ zypper in jq
$ pacman -S jq
$ apk add jq
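
The Python linting tip above also works from inside a script. This is a small sketch with the standard json module; the function name validate_json is an assumption chosen for this example, and the exact error wording differs between Python versions.

```python
import json
from typing import Tuple


def validate_json(text: str) -> Tuple[bool, str]:
    """Return (True, "ok") for valid JSON, or (False, reason) on a parse error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as error:
        # msg, lineno and colno are attributes of json.JSONDecodeError
        return False, f"{error.msg}: line {error.lineno} column {error.colno}"
    return True, "ok"


# The first string is missing its closing square bracket.
for candidate in ('[{"key": "broken"}', '[{"key": "fixed"}]'):
    print(candidate, "->", validate_json(candidate))
```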

Dive deep into JSON data structures

The true power of jq lies in how it can be used to parse JSON data:

jq is like sed for JSON data. It can be used to slice, filter, map, and transform structured data with the same ease that sed, awk, grep etc., let you manipulate text.

The output below shows how it looks to run the request against the project API again, but this time, the output is piped to jq.

$ curl "https://gitlab.com/api/v4/projects" | jq
[
  {
    "id": 25994891,
    "description": "...",
    "name": "...",

[...]

    "forks_count": 0,
    "star_count": 0,
    "last_activity_at": "2021-04-19T11:50:24.292Z",
    "namespace": {
      "id": 11528141,
      "name": "...",

[...]

    }
  }
]

The first difference is the format of the JSON data structure: it is now pretty-printed. New lines and indents in data structure scopes help your eyes and allow you to identify the inner and outer data structures involved. Knowing this structure is needed to determine which jq filters and methods you want to apply next.

About arrays and dictionaries

The set of results from an API often is returned as a list (or "array") of items. An item itself can be a single value or a JSON object. The following example mimics the response from the GitLab API and creates an array of dictionaries as a nested result set.

$ vim result.json
[
  {
    "id": 1,
    "name": "project1"
  },
  {
    "id": 2,
    "name": "project2"
  },
  {
    "id": 3,
    "name": "project-internal-dev",
    "namespace": {
      "name": "🦊"
    }
  }
]

Use cat to print the file content on stdout and pipe it into jq. The outer data structure is an array – use -c .[] to access and print all items.

$ cat result.json | jq -c '.[]'
{"id":1,"name":"project1"}
{"id":2,"name":"project2"}
{"id":3,"name":"project-internal-dev","namespace":{"name":"🦊"}}
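
For comparison, here is a rough Python counterpart to jq -c '.[]', assuming the same sample data is embedded as a string instead of being read from result.json. The helper name compact_lines is hypothetical.

```python
import json

RESULT_JSON = """
[
  {"id": 1, "name": "project1"},
  {"id": 2, "name": "project2"},
  {"id": 3, "name": "project-internal-dev", "namespace": {"name": "🦊"}}
]
"""

# The outer structure is a list; every item is a dictionary.
projects = json.loads(RESULT_JSON)


def compact_lines(items):
    """Serialize each item on one line without extra whitespace, similar to jq -c."""
    return [json.dumps(item, separators=(",", ":"), ensure_ascii=False) for item in items]


for line in compact_lines(projects):
    print(line)
```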

How to filter data structures with jq

Filter items by passing | select (...) to jq. The filter takes a lambda callback function as a comparator condition. When the item matches the condition, it is returned to the caller.

Use the dot indexer . to access dictionary keys and their values. Try to filter for all items where the name is project2:

$ cat result.json | jq -c '.[] | select (.name == "project2")'
{"id":2,"name":"project2"}

Practice this example by selecting the id with the value 2 instead of the name.
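
If it helps to think of select as a callback-based filter, the idea can be sketched in Python. The select function below is a hand-written stand-in for illustration, not a jq binding, and the sample list repeats the data from result.json.

```python
projects = [
    {"id": 1, "name": "project1"},
    {"id": 2, "name": "project2"},
    {"id": 3, "name": "project-internal-dev", "namespace": {"name": "🦊"}},
]


def select(items, predicate):
    """Keep the items for which predicate(item) is true, loosely like jq's select()."""
    return [item for item in items if predicate(item)]


# The lambda plays the role of the comparator condition (.name == "project2").
by_name = select(projects, lambda project: project["name"] == "project2")
print(by_name)
```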

Filter with matching a string

During tests, you may need to match different patterns instead of knowing the full name. Think of projects that match a specific path or are located in a group where you only know the prefix. Simple string matches can be achieved with the | contains (...) function. It allows you to check whether the given string is inside the target string – which requires the selected attribute to be of the string type.

For a filter with the select chain, the comparison condition needs to be changed from the equal operator == to checking the attribute .name with | contains ("dev").

$ cat result.json | jq -c '.[] | select (.name | contains ("dev") )'
{"id":3,"name":"project-internal-dev","namespace":{"name":"🦊"}}

Simple matches can be achieved with the contains function.

Filter with matching regular expressions

For advanced string pattern matching, it is recommended to use regular expressions. jq provides the test function for this use case. Try to filter for all projects whose name ends with a number, represented by \d+. Note that the backslash \ needs to be escaped as \\ because the pattern is written as a jq string literal; the single quotes already protect it from the shell. ^ anchors the match at the beginning of the string, and $ anchors it at the end.

$ cat result.json | jq -c '.[] | select (.name | test ("^project\\d+$") )'
{"id":1,"name":"project1"}
{"id":2,"name":"project2"}

Tip: You can test and build the regular expression with regex101 before test-driving it with jq.
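
Both string filters, the substring match and the regular expression, have close Python equivalents. The sketch below assumes the project names from result.json; note that Python's re module and jq's regex engine are different implementations, so more advanced patterns may behave differently.

```python
import re

names = ["project1", "project2", "project-internal-dev"]

# Substring check, comparable to: select(.name | contains("dev"))
contains_dev = [name for name in names if "dev" in name]

# A raw string (r"...") keeps the backslash as-is, so \d needs no doubling here.
ends_with_number = re.compile(r"^project\d+$")
numbered = [name for name in names if ends_with_number.search(name)]

print(contains_dev)
print(numbered)
```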

Access nested values

Key value pairs in a dictionary may have a dictionary or array as a value. jq filters need to take this factor into account when filtering or transforming the result. The example data structure provides project-internal-dev which has the key namespace and a value of a dictionary type.

  {
    "id": 3,
    "name": "project-internal-dev",
    "namespace": {
      "name": "🦊"
    }
  }

jq defines a sort order across JSON types: null, false, true, numbers, strings, arrays, and objects, in that order. Comparing an attribute against the empty dictionary {} in a select chain takes advantage of this ordering. The >= {} comparison matches items whose namespace attribute is a dictionary, while <= {} matches items where the attribute is missing and therefore evaluates to null (an empty dictionary would match both).

$ cat result.json | jq -c '.[] | select (.namespace >={} )'
{"id":3,"name":"project-internal-dev","namespace":{"name":"🦊"}}

$ cat result.json | jq -c '.[] | select (.namespace <={} )'
{"id":1,"name":"project1"}
{"id":2,"name":"project2"}

These methods can be used to access the name attribute of the namespace, but only if the namespace contains values. Tip: You can chain multiple jq calls by piping the result into another jq call. .name is a subkey of the primary .namespace key.

$ cat result.json | jq -c '.[] | select (.namespace >={} )' | jq -c '.namespace.name'
"🦊"

The additional select on items that have a namespace dictionary ensures that only initialized values for .namespace.name are returned. This safety check avoids null values in the result that you would otherwise need to filter out again.

$ cat result.json | jq -c '.[]' | jq -c '.namespace.name'
null
null
"🦊"

By using the additional check with | select (.namespace >={} ), you only get the expected results and do not have to filter out null values.
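
The same guard against missing keys can be written in Python. This is a sketch under the assumption that the sample data from result.json is available as a list of dictionaries; namespace_names is a made-up helper name.

```python
projects = [
    {"id": 1, "name": "project1"},
    {"id": 2, "name": "project2"},
    {"id": 3, "name": "project-internal-dev", "namespace": {"name": "🦊"}},
]


def namespace_names(items):
    """Collect .namespace.name, skipping items without a namespace dictionary."""
    names = []
    for item in items:
        namespace = item.get("namespace")  # None when the key is missing
        if isinstance(namespace, dict) and "name" in namespace:
            names.append(namespace["name"])
    return names


print(namespace_names(projects))
```

Using dict.get() instead of indexing avoids a KeyError for the first two projects, which mirrors how jq returns null for a missing key.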

How to expand the GitLab endpoint response

Save the result from the API projects call and retry the examples above with jq.

$ curl --silent "https://gitlab.com/api/v4/projects" -o result.json

Validate CI/CD YAML with jq for Git hooks

While writing this blog post, I learned that you can escape and encode YAML into JSON with jq. This trick comes in handy when automating YAML linting on the CLI, for example as a Git pre-commit hook.

Let’s take a look at the simplest way to test GitLab CI/CD from our community meetup workshops. A common mistake with the first steps of the process can be missing the two spaces indent or missing whitespace between the dash and following command. The following examples use .gitlab-ci.error.yml as a filename to showcase errors and .gitlab-ci.main.yml for working examples.

$ vim .gitlab-ci.error.yml

image: alpine:latest

test:
script:
  -exit 1

Committing the change and waiting for the CI/CD pipeline to validate at runtime can be time-consuming. The GitLab API provides a resource endpoint /ci/lint. A POST request with JSON-encoded YAML content will return a linting result faster.

Parse CI/CD YAML into JSON with jq

You can use jq to read the raw YAML file and encode it as a single JSON string:

$ jq --raw-input --slurp < .gitlab-ci.error.yml
"image: alpine:latest\n\ntest:\nscript:\n  -exit 1\n"

The /ci/lint API endpoint requires a JSON dictionary with content as key, and the raw YAML string as a value. You can use jq to format the input by using the arg parser:

$ jq --null-input --arg yaml "$(<.gitlab-ci.error.yml)" '.content=$yaml'
{
  "content": "image: alpine:latest\n\ntest:\nscript:\n  -exit 1"
}
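
The same wrapping step can be sketched in Python with the standard json module. The function name build_lint_payload is hypothetical, and the YAML text is inlined here rather than read from .gitlab-ci.error.yml so that the example is self-contained.

```python
import json


def build_lint_payload(yaml_text: str) -> str:
    """Wrap raw YAML text in a JSON dictionary under the key "content"."""
    return json.dumps({"content": yaml_text})


# Same content as the .gitlab-ci.error.yml example, including the trailing newline.
yaml_text = "image: alpine:latest\n\ntest:\nscript:\n  -exit 1\n"

payload = build_lint_payload(yaml_text)
print(payload)
```

Unlike the "$(<file)" shell substitution, which strips trailing newlines, reading a file in Python keeps them, so the two payloads can differ by a final \n.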

Send POST request to /ci/lint

The next building block is to send a POST request to the /ci/lint endpoint. The request needs to specify the Content-Type header for the body. Using the pipe | character, the JSON-encoded YAML configuration is fed into the curl command call.

$ jq --null-input --arg yaml "$(<.gitlab-ci.error.yml)" '.content=$yaml' \
| curl "https://gitlab.com/api/v4/ci/lint?include_merged_yaml=true" \
--header 'Content-Type: application/json' --data @-
{"status":"invalid","errors":["jobs test config should implement a script: or a trigger: keyword","jobs script config should implement a script: or a trigger: keyword","jobs config should contain at least one visible job"],"warnings":[],"merged_yaml":"---\nimage: alpine:latest\ntest: \nscript: \"-exit 1\"\n"}

The CLI command returns JSON output. You can use jq again to format the response in a more readable way.

$ jq --null-input --arg yaml "$(<.gitlab-ci.error.yml)" '.content=$yaml' \
| curl "https://gitlab.com/api/v4/ci/lint?include_merged_yaml=true" \
--header 'Content-Type: application/json' --data @- \
| jq --raw-output '.errors'
[
  "jobs test config should implement a script: or a trigger: keyword",
  "jobs script config should implement a script: or a trigger: keyword",
  "jobs config should contain at least one visible job"
]
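
A script can evaluate the same response without jq. The sketch below parses a shortened copy of the response shown earlier (the merged_yaml attribute is left out); the helper names lint_errors and is_valid are assumptions for this example.

```python
import json

# Shortened copy of the /ci/lint response from the example above.
LINT_RESPONSE = (
    '{"status":"invalid","errors":['
    '"jobs test config should implement a script: or a trigger: keyword",'
    '"jobs script config should implement a script: or a trigger: keyword",'
    '"jobs config should contain at least one visible job"],'
    '"warnings":[]}'
)


def lint_errors(response_body: str):
    """Return the list of error messages, or an empty list if the key is absent."""
    return json.loads(response_body).get("errors", [])


def is_valid(response_body: str) -> bool:
    """Treat the configuration as valid only when status says so and no errors exist."""
    result = json.loads(response_body)
    return result.get("status") == "valid" and not result.get("errors")


for message in lint_errors(LINT_RESPONSE):
    print(message)
print("valid:", is_valid(LINT_RESPONSE))
```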

Expanded CI/CD configuration

When you are using GitLab 13.8+ and the pipeline editor, the API endpoint also includes the merged YAML output for further processing. This feature has a limitation: Remote includes work while other include types do not. Push the changes to the repository in a draft MR and trigger a remote full lint as an alternative.

The example below shows CI/CD job templates with extends.

$ vim .gitlab-ci.main.yml

.job-tmpl:
  image: alpine:latest
  variables:
    BUILD_TYPE: "Debug"
  script:
    - echo "Hello from GitLab 🦊"

test-extends-stage:
  extends: .job-tmpl

test-extends-prod:
  extends: .job-tmpl
  variables:
    BUILD_TYPE: "Release"
  script:
    - echo "Hello from GitLab 🦊🌈"

Validate and extract the .merged_yaml attribute by sending the YAML config to the GitLab API.

$ jq --null-input --arg yaml "$(<.gitlab-ci.main.yml)" '.content=$yaml' \
| curl "https://gitlab.com/api/v4/ci/lint?include_merged_yaml=true" \
--header 'Content-Type: application/json' --data @- \
| jq --raw-output '.merged_yaml'
---
".job-tmpl":
  image: alpine:latest
  variables:
    BUILD_TYPE: Debug
  script:
  - "echo \"Hello from GitLab \U0001F98A\""
test-extends-stage:
  image: alpine:latest
  variables:
    BUILD_TYPE: Debug
  script:
  - "echo \"Hello from GitLab \U0001F98A\""
  extends: ".job-tmpl"
test-extends-prod:
  image: alpine:latest
  variables:
    BUILD_TYPE: Release
  script:
  - "echo \"Hello from GitLab \U0001F98A\U0001F308\""
  extends: ".job-tmpl"

Do more with jq

You can build on the CI lint command for your own ideas, for example by wrapping it in a Git pre-commit hook that triggers an API call to /ci/lint on your GitLab host. Make sure to edit the variables to fit your environment. If you run a self-hosted instance, GITLAB_URL needs to point to it.

$ vim lint.sh

#!/bin/bash

GITLAB_CI_YML=".gitlab-ci.yml"

GITLAB_URL="https://gitlab.com"
GITLAB_CI_LINT_URL="${GITLAB_URL}/api/v4/ci/lint"

GITLAB_CI_YML_CONTENT=$(<"$GITLAB_CI_YML")

errors=()
while read -r value; do
        errors+=("$value")
done < <(jq --null-input --arg yaml "${GITLAB_CI_YML_CONTENT}" '.content=$yaml' \
| curl "${GITLAB_CI_LINT_URL}?include_merged_yaml=true" \
--header 'Content-Type: application/json' --data @- --silent \
| jq --raw-output '.errors' | jq -c '.[]')

echo -e "Analysing CI/CD config lint results ..."

count_err=0

for error in "${errors[@]}"; do
        echo "${error}"
        count_err=$((count_err + 1))
done

if [[ $count_err -gt 0 ]]; then
        echo -e "GitLab CI/CD linting errors found. Aborting."
        exit 1
else
        echo -e "GitLab CI/CD linting ok."
        exit 0
fi

Save the file and make it executable with chmod.

$ chmod +x lint.sh

When the script lint.sh is run with the working .gitlab-ci.main.yml file, the output looks like this:

$ rm .gitlab-ci.yml
$ ln -s .gitlab-ci.main.yml .gitlab-ci.yml

$ ./lint.sh
Analysing CI/CD config lint results ...
GitLab CI/CD linting ok.

If you change the symlink to the .gitlab-ci.error.yml file and run the lint.sh script again, you can see the errors, and the script exits with a non-zero exit code:

$ rm .gitlab-ci.yml
$ ln -s .gitlab-ci.error.yml .gitlab-ci.yml

$ ./lint.sh
Analysing CI/CD config lint results ...
"jobs test config should implement a script: or a trigger: keyword"
"jobs script config should implement a script: or a trigger: keyword"
"jobs config should contain at least one visible job"
GitLab CI/CD linting errors found. Aborting.

The Git Hook is located in the CI/CD API lint hook repository in the Developer Evangelism group.

Git hook with CI/CD YAML linting with the GitLab API

Use cases for programmatic API Clients

Sometimes shell programming cannot solve a requirement or a specific language integration is required for communicating with the API. Our community provides awesome API clients for many different programming languages.

Status and error handling

The GitLab API is designed to return different status codes depending on the context and requests. The HTTP response headers and response body describe possible errors, and API clients expose them through a programmatic interface.
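
As a small illustration, the 401 response from the beginning of this post can be turned into a readable message. This is only a sketch: error_message is a hypothetical helper, and it assumes that error bodies are JSON dictionaries with a message key, as in that example.

```python
import json
from typing import Optional


def error_message(status: int, body: str) -> Optional[str]:
    """Return None for 2xx responses, otherwise the best available error text."""
    if 200 <= status < 300:
        return None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Not JSON (for example an HTML error page from a proxy).
        return f"HTTP {status}"
    if isinstance(payload, dict) and "message" in payload:
        return str(payload["message"])
    return f"HTTP {status}"


print(error_message(401, '{"message":"401 Unauthorized"}'))
print(error_message(200, "[]"))
```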

Large result sets and pagination

The REST API can return a lot of results, which stresses both the server and the client on every request. Returning a smaller subset of results, a page with a defined number of items, limits the response size and helps save resources. This is called "pagination" in the context of a REST API.

Pagination is enabled by default for the GitLab API. It requires you to fetch multiple pages to retrieve a full result set. The Link headers specify the next/previous page to follow.
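
To make the Link header less abstract, here is a minimal parsing sketch in Python. The header value uses a placeholder host (gitlab.example.com) and is made up for illustration; the parser assumes that rel is the first parameter after each URL, which is a simplification.

```python
import re
from typing import Dict

# Matches one <url>; rel="name" pair inside a Link header value.
LINK_PATTERN = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def parse_link_header(value: str) -> Dict[str, str]:
    """Map each rel name (next, prev, first, last) to its URL."""
    return {rel: url for url, rel in LINK_PATTERN.findall(value)}


# Hypothetical header value, not copied from a real response.
example = (
    '<https://gitlab.example.com/api/v4/projects?page=2&per_page=20>; rel="next", '
    '<https://gitlab.example.com/api/v4/projects?page=1&per_page=20>; rel="first", '
    '<https://gitlab.example.com/api/v4/projects?page=5&per_page=20>; rel="last"'
)

links = parse_link_header(example)
print(links.get("next"))
```

A client would keep requesting links["next"] until the key is no longer present.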

Parsing the response header with Bash and jq can get complicated and is prone to error. Programming languages like Python, Perl, etc., provide abstract interfaces for HTTP requests and responses, header parsing and error handling. API client libraries are available that provide full support for pagination in a few lines of code.

The monitoring scripts for Docker Hub rate limits use a similar approach in Python where parsing the response headers is required to determine the rate limit values.

The following code provides a pagination example using the python-gitlab library and works with Python 3:

$ vim requirements.txt

python-gitlab

$ pip3 install -r requirements.txt

$ vim pagination.py

#!/usr/bin/env python3

import gitlab
import os

# https://python-gitlab.readthedocs.io/en/stable/api-usage.html#getting-started-with-the-api
SERVER='https://gitlab.com'
GROUP_NAME='everyonecancontribute'

# Prefer keyset pagination
# https://python-gitlab.readthedocs.io/en/stable/api-usage.html#pagination
gl = gitlab.Gitlab(SERVER, private_token=os.environ['GITLAB_TOKEN'], pagination="keyset", order_by="id", per_page=100)

# Iterate lazily over the result set; further pages are requested on demand while looping
# (older python-gitlab releases spelled this option as_list=False)
groups = gl.groups.list(iterator=True)

found_page = 0

for group in groups:
    if GROUP_NAME in group.name:
        print(group.attributes)
        found_page = groups.current_page
        break

print("Pagination API example for Python with %s %s - result on page %d" % ("GitLab", "🦊", found_page))

Run the pagination.py script with the Python interpreter as shown below. Adjust the interpreter name (python3 or python) as needed for your environment.

$ python3 pagination.py

Pagination API example for Python with GitLab 🦊 - result on page 5

The full code example can be found in my API playground repository.

What's next?

Programming language libraries and SDKs provide abstractions for requests, response, and error handling. Depending on the use case, language libraries and SDKs can help with tests and code quality and be used instead of CLI calls. CLI, curl, and jq are a great combination to quickly test the response on a remote server shell. There are many more API endpoints and tips and tricks beyond what is described in this blog post. Read the posts below to learn more about API endpoint strategies.

What cool API integrations have you built with jq and/or a programming language (library)? Tweet your favorites to @dnsmichi @gitlab :)

Cover image by Gert Boers on Unsplash
