
What is JSON linting?
To understand JSON linting, let’s quickly break down the two concepts of JSON and linting.
JSON is an acronym for JavaScript Object Notation, a lightweight, text-based, open standard format for representing structured data based on JavaScript object syntax. It is most commonly used for transmitting data in web applications. It is typically faster to parse than XML and is easy for humans to read and write.
Linting is a process that automatically checks and analyzes static source code for programming and stylistic errors, bugs and suspicious constructs.
JSON has become popular because it is human-readable and doesn’t require a complete markup structure like XML. It is easy to parse into its logical syntactic components, especially in JavaScript, and JSON libraries exist for most programming languages.
Benefits of JSON linting
Finding an error in JSON code can be challenging and time-consuming. The best way to find and correct errors while saving time is to use a linting tool. When JSON code is copied and pasted into a linting editor, the tool validates and reformats it. Such tools are easy to use and support a wide range of browsers, so applications developed with JSON don’t require a lot of effort to make them browser-compatible.
JSON linting is an efficient way to reduce errors and improve the overall quality of JSON code. This can help accelerate development and reduce costs because errors are discovered earlier.
Some common JSON linting errors
In instances where a JSON transaction fails, the error information is conveyed to the user by the API gateway. By default, the API gateway returns a very basic fault to the client when a message filter has failed.
One common JSON linting error is a parsing error. A “parse: unexpected character” error occurs when a value that is not a valid JSON string, for example a native JavaScript object, is passed to the JSON.parse method. To solve the error, make sure to pass only valid JSON strings to JSON.parse.
Other common errors are NULL or inaccurate data errors. They come from not using the right data type per column, not using the right extension for JSON files, or not ensuring that every row in the JSON table is in JSON format.
How to fix JSON linting errors
If you encounter a NULL or inaccurate data error in parsing, the first step is to make sure you use the right data type per column. For example, in the case of “age,” use 12 instead of twelve.
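As a minimal sketch of that data type check, the Python snippet below flags records whose "age" field is not a whole number. The record layout and the `find_bad_ages` helper are assumptions made up for illustration, not part of any particular linting tool.

```python
import json


def find_bad_ages(rows):
    """Return the rows whose "age" value is not an integer."""
    # bool is a subclass of int in Python, so it is excluded explicitly
    return [
        row for row in rows
        if not isinstance(row.get("age"), int) or isinstance(row.get("age"), bool)
    ]


# Hypothetical sample data: the second row uses a word instead of a number
rows = json.loads('[{"name": "a", "age": 12}, {"name": "b", "age": "twelve"}]')
bad_rows = find_bad_ages(rows)
print(bad_rows)
```

Rows with a missing "age" key are reported as well, because `row.get("age")` returns `None` for them.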
Also make sure you are using the right extension for JSON files. When using a compressed JSON file, its name must end with “.json” followed by the extension of the compression format, such as “.gz”.
Next, make sure the JSON format is used for every row in the JSON table. Create a table with a delimiter that is not in the input files. Then, run a query that returns the file name, row position, and file path for the null JSON rows.
Sometimes you may find files that are not your source code files but are generated by the system when compiling your project. When such a file has a .js extension, ESLint needs to exclude it when searching for errors. One way to do this is to add “ignorePatterns” to the .eslintrc.json file, either before or after the “rules” key.
"ignorePatterns": ["temp.js", "**/vendor/*.js"],
"rules": {
Alternatively, you can create a separate file named .eslintignore and list the files to be excluded, for example:
**/*.js
If you opt to correct instead of ignore, look for the error code in the last column. Correct all the errors in one file, rerun ‘npx eslint . >errfile’, and ensure all the errors of that type are cleared. Then look for the next error code and repeat the procedure until all errors are cleared.
Of course, there will be instances when you won’t understand an error, so in that case, open https://eslint.org/docs/user-guide/getting-started and type the error code in the ‘Search’ field on the top of the document. There you will find very detailed instructions as to why that error is raised and how to fix it.
Finally, you can forcibly fix errors automatically while generating the error list using:
npx eslint . --fix
This is not recommended until you become more well-versed with lint errors and how to fix them. Also, you should keep a backup of the files you are linting because while fixing errors, certain code may get overwritten, which could cause your program to fail.
JSON linting best practices
Here are some tips for helping your consumers use your output:
First, always enclose keys and string values in double quotes. It may be convenient to generate JSON with single quotes, but JSON parsers reject objects that use single quotes.
Numeric values can be written without quotes. Keep in mind that a number enclosed in double quotes is parsed as a string, so quote numbers only when your consumers expect strings.
Next, avoid hyphens in your key fields because they can cause problems for some Python and Scala parsers. Use underscores (_) instead.
It’s a good idea to always create a root element, especially when you’re creating a complicated JSON.
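To illustrate the quoting tip, here is a small Python sketch. It shows one way single-quoted output can end up in a pipeline (printing a Python dictionary directly) and how `json.dumps` produces double-quoted JSON instead. The `record` data is hypothetical.

```python
import json

# Hypothetical record with a root element and underscore-style keys
record = {"root": {"user_name": "alice", "age": 12}}

python_repr = str(record)       # Python's own representation uses single quotes
json_text = json.dumps(record)  # json.dumps emits double-quoted JSON

print(python_repr)
print(json_text)
```

Only the second string can be parsed by a JSON parser; the first one is a Python literal, not JSON.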
Modern web applications come with a REST API which returns JSON. The format needs to be parsed, and often feeds into scripts and service daemons polling the API for automation.
Starting with a new REST API and its endpoints can often be overwhelming. Documentation may suggest looking into a set of SDKs and libraries for various languages, or instruct you to use curl or wget on the CLI to send a request. Both CLI tools come with a variety of parameters which help to download and print the response string, for example in JSON format.
The response string retrieved from curl may get long and confusing. It can require parsing the JSON format and filtering for a smaller subset of results. This helps with viewing the results on the CLI, and minimizes the data to process in scripts. The following example retrieves all projects from GitLab and returns a paginated result set with the first 20 projects:
$ curl "https://gitlab.com/api/v4/projects"
The GitLab REST API documentation guides you through the first steps with error handling and authentication. In this blog post, we will be using the Personal Access Token as the authentication method. Alternatively, you can use project access tokens for automated authentication that avoids the use of personal credentials.
REST API authentication
Not all endpoints allow anonymous access; some require authentication. Try fetching user profile data with this request:
$ curl "https://gitlab.com/api/v4/user"
{"message":"401 Unauthorized"}
The API request against the /user endpoint requires passing the personal access token into the request, for example, as a request header. To avoid exposing credentials on the terminal, you can export the token and its value into the user's environment. You can automate the variable export with ZSH and the .env plugin in your shell environment. You can also source the .env file once in the existing shell environment.
$ vim ~/.env
export GITLAB_TOKEN="..."
$ source ~/.env
Scripts and commands being run in your shell environment can reference the $GITLAB_TOKEN variable. Try querying the user API endpoint again, adding the authorization header to the request:
$ curl -H "Authorization: Bearer $GITLAB_TOKEN" "https://gitlab.com/api/v4/user"
A reminder that only administrators can see the attributes of all users; individuals can only see their own user profile. For example, email is hidden from the public domain.
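The same authenticated request can be sketched in Python with the standard library. The snippet only builds the request and does not send it, so it runs without network access. The `build_user_request` helper and the placeholder fallback token are assumptions for illustration.

```python
import os
import urllib.request

API_URL = "https://gitlab.com/api/v4/user"


def build_user_request(token):
    """Build (but do not send) a GET request carrying a bearer token header."""
    request = urllib.request.Request(API_URL)
    request.add_header("Authorization", f"Bearer {token}")
    return request


# Read the token from the environment; fall back to a placeholder so the
# sketch still runs when GITLAB_TOKEN is not set.
request = build_user_request(os.environ.get("GITLAB_TOKEN", "placeholder-token"))
print(request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```

Reading the token from the environment keeps it out of the script itself, which mirrors the exported shell variable used with curl.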
How to request responses in JSON
The GitLab API provides many resources and URL endpoints. You can manage almost anything with the API that you’d otherwise configure using the graphical user interface.
After sending the API request, the response message contains the body as a string, for example with a JSON content type. curl can provide more information about the response headers, which is helpful for debugging. Multiple verbose levels enable the full debug output with -vvv:
$ curl -vvv "https://gitlab.com/api/v4/projects"
[...]
* SSL connection using TLSv1.2 / ECDHE-RSA-CHACHA20-POLY1305
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=gitlab.com
* start date: Jan 21 00:00:00 2021 GMT
* expire date: May 11 23:59:59 2021 GMT
* subjectAltName: host "gitlab.com" matched cert's "gitlab.com"
* issuer: C=GB; ST=Greater Manchester; L=Salford; O=Sectigo Limited; CN=Sectigo RSA Domain Validation Secure Server CA
* SSL certificate verify ok.
[...]
> GET /api/v4/projects HTTP/2
> Host: gitlab.com
> User-Agent: curl/7.64.1
> Accept: */*
[...]
< HTTP/2 200
< date: Mon, 19 Apr 2021 11:25:31 GMT
< content-type: application/json
[...]
[{"id":25993690,"description":"project for adding issues","name":"project-for-issues-1e1b6d5f938fb240","name_with_namespace":"gitlab-qa-sandbox-group / qa-test-2021-04-19-11-13-01-d7d873fd43cd34b6 / project-for-issues-1e1b6d5f938fb240","path":"project-for-issues-1e1b6d5f938fb240","path_with_namespace":"gitlab-qa-sandbox-group/qa-test-2021-04-19-11-13-01-d7d873fd43cd34b6/project-for-issues-1e1b6d5f938fb240"
[... JSON content ...]
"avatar_url":null,"web_url":"https://gitlab.com/groups/gitlab-qa-sandbox-group/qa-test-2021-04-19-11-12-56-7f3128bd0e41b92f"}}]
* Closing connection 0
The curl command output provides helpful insights into TLS ciphers and versions, the request lines starting with > and the response lines starting with <. The response body string is encoded as JSON.
How to see the structure of the returned JSON
To get a quick look at the structure of the returned JSON file, try these tips:
- Enclosing square brackets [ … ] identify an array.
- Enclosing curly brackets { … } identify a dictionary. Dictionaries are also called associative arrays, maps, etc.
- "key": value indicates a key-value pair in a dictionary, which is identified by curly brackets enclosing the key-value pairs.
The values in JSON consist of specific types: a string value is put in double quotes. Boolean true/false, numbers, and floating-point numbers are also present as types. If a key exists but its value is not set, REST APIs often return null.
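To make the value types concrete, the following Python sketch parses a made-up response fragment and prints the Python type each JSON value maps to. The `sample` string is hypothetical and not taken from a real API response.

```python
import json

# Hypothetical response fragment covering each JSON value type
sample = (
    '{"id": 1, "ratio": 0.5, "name": "project1", "archived": false, '
    '"tags": [], "namespace": {}, "avatar_url": null}'
)

parsed = json.loads(sample)
type_names = {key: type(value).__name__ for key, value in parsed.items()}
for key, name in type_names.items():
    print(f"{key}: {name}")
```

Arrays become lists, dictionaries become dicts, and a JSON null becomes Python's `None`.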
Verify the data structure by running "linters". Python's JSON module can parse and lint JSON strings. The example below misses a closing square bracket to showcase the error:
$ echo '[{"key": "broken"}' | python -m json.tool
Expecting object: line 1 column 19 (char 18)
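The same check can be done from a script. This is a minimal sketch using Python's `json` module; the `lint_json` helper name is an assumption for illustration.

```python
import json


def lint_json(text):
    """Return None for valid JSON, or the decoder error for invalid JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError as error:
        return error
    return None


error = lint_json('[{"key": "broken"}')
if error is not None:
    # msg, lineno and colno are attributes of json.JSONDecodeError
    print(f"{error.msg}: line {error.lineno} column {error.colno}")
```

The exact wording of the message depends on the Python version, so scripts should rely on the exception type and position rather than the text.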
jq – a lightweight and flexible CLI processor – can be used as a standalone tool to parse and validate JSON data.
$ echo '[{"key": "broken"}' | jq
parse error: Unfinished JSON term at EOF at line 2, column 0
jq is available in the package managers of most operating systems.
$ brew install jq
$ apt install jq
$ dnf install jq
$ zypper in jq
$ pacman -S jq
$ apk add jq
Dive deep into JSON data structures
The true power of jq lies in how it can be used to parse JSON data:
jq is like sed for JSON data. It can be used to slice, filter, map, and transform structured data with the same ease that sed, awk, grep etc., let you manipulate text.
The output below shows how it looks to run the request against the project API again, but this time, the output is piped to jq.
$ curl "https://gitlab.com/api/v4/projects" | jq
[
  {
    "id": 25994891,
    "description": "...",
    "name": "...",
    [...]
    "forks_count": 0,
    "star_count": 0,
    "last_activity_at": "2021-04-19T11:50:24.292Z",
    "namespace": {
      "id": 11528141,
      "name": "...",
      [...]
    }
  }
]
The first difference is the format of the JSON data structure, so-called pretty-printed. New lines and indents in data structure scopes help your eyes and allow you to identify the inner and outer data structures involved. This format is needed to determine which jq filters and methods you want to apply next.
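Pretty-printing is also available without jq. As a small sketch, Python's `json.dumps` with an `indent` argument produces a comparable layout. The compact input string below is made up for illustration.

```python
import json

# Hypothetical compact response, as it would arrive from an API
compact = '[{"id":1,"name":"project1","namespace":{"name":"demo"}}]'

pretty = json.dumps(json.loads(compact), indent=2)
print(pretty)
```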
About arrays and dictionaries
The set of results from an API often is returned as a list (or "array") of items. An item itself can be a single value or a JSON object. The following example mimics the response from the GitLab API and creates an array of dictionaries as a nested result set.
$ vim result.json
[
  {
    "id": 1,
    "name": "project1"
  },
  {
    "id": 2,
    "name": "project2"
  },
  {
    "id": 3,
    "name": "project-internal-dev",
    "namespace": {
      "name": "🦊"
    }
  }
]
Use cat to print the file content on stdout and pipe it into jq. The outer data structure is an array; use -c .[] to access and print all items.
$ cat result.json | jq -c '.[]'
{"id":1,"name":"project1"}
{"id":2,"name":"project2"}
{"id":3,"name":"project-internal-dev","namespace":{"name":"🦊"}}
How to filter data structures with jq
Filter items by passing | select (...) to jq. The filter takes a lambda callback function as a comparator condition. When the item matches the condition, it is returned to the caller.
Use the dot indexer . to access dictionary keys and their values. Try to filter for all items where the name is project2:
$ cat result.json | jq -c '.[] | select (.name == "project2")'
{"id":2,"name":"project2"}
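For comparison, the select filter can be sketched in Python as a list comprehension over the parsed array. The data is an inline copy of result.json so that the snippet is self-contained.

```python
import json

# Inline copy of the result.json example data
RESULT_JSON = """
[
  {"id": 1, "name": "project1"},
  {"id": 2, "name": "project2"},
  {"id": 3, "name": "project-internal-dev", "namespace": {"name": "🦊"}}
]
"""

projects = json.loads(RESULT_JSON)

# Rough equivalent of: jq -c '.[] | select (.name == "project2")'
matches = [item for item in projects if item.get("name") == "project2"]
for item in matches:
    # Compact separators mimic the one-line output of jq -c
    print(json.dumps(item, separators=(",", ":")))
```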
Practice this example by selecting the id with the value 2 instead of the name.
Filter with matching a string
During tests, you may need to match different patterns instead of knowing the full name. Think of projects that match a specific path or are located in a group where you only know the prefix. Simple string matches can be achieved with the | contains (...) function. It allows you to check whether the given string is inside the target string, which requires the selected attribute to be of the string type.
For a filter with the select chain, the comparison condition needs to be changed from the equal operator == to checking the attribute .name with | contains ("dev").
$ cat result.json | jq -c '.[] | select (.name | contains ("dev") )'
{"id":3,"name":"project-internal-dev","namespace":{"name":"🦊"}}
Simple matches can be achieved with the contains function.
Filter with matching regular expressions
For advanced string pattern matching, it is recommended to use regular expressions. jq provides the test function for this use case. Try to filter for all projects which end with a number, represented by \d+. Note that the backslash \ needs to be escaped as \\ because the pattern is written inside a jq string literal. ^ tests for the beginning of the string, $ is the ending check.
$ cat result.json | jq -c '.[] | select (.name | test ("^project\\d+$") )'
{"id":1,"name":"project1"}
{"id":2,"name":"project2"}
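The substring and regular expression filters have direct counterparts in Python, sketched below with the `in` operator and the `re` module. The list is a trimmed inline copy of the example data.

```python
import re

# Trimmed inline copy of the example data
projects = [
    {"id": 1, "name": "project1"},
    {"id": 2, "name": "project2"},
    {"id": 3, "name": "project-internal-dev"},
]

# Substring match, like: select (.name | contains ("dev"))
with_dev = [p for p in projects if "dev" in p["name"]]

# Regex match, like: select (.name | test ("^project\\d+$"))
numbered = re.compile(r"^project\d+$")
with_number = [p for p in projects if numbered.search(p["name"])]

print(with_dev)
print(with_number)
```

The raw string prefix `r"..."` avoids the double backslash that the jq string literal requires.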
Tip: You can test and build the regular expression with regex101 before test-driving it with jq.
Access nested values
Key-value pairs in a dictionary may have a dictionary or array as a value. jq filters need to take this factor into account when filtering or transforming the result. The example data structure provides project-internal-dev which has the key namespace and a value of a dictionary type.
{
  "id": 3,
  "name": "project-internal-dev",
  "namespace": {
    "name": "🦊"
  }
}
jq sorts values of different types in a fixed order (null, false, true, numbers, strings, arrays, objects), so the empty array [] and the empty dictionary {} can be used in select chains with greater-than and less-than comparisons. Comparing the namespace attribute with >= {} selects items where it is a dictionary, while <= {} selects items where it is null (a missing key evaluates to null) or an empty dictionary.
$ cat result.json | jq -c '.[] | select (.namespace >={} )'
{"id":3,"name":"project-internal-dev","namespace":{"name":"🦊"}}
$ cat result.json | jq -c '.[] | select (.namespace <={} )'
{"id":1,"name":"project1"}
{"id":2,"name":"project2"}
These methods can be used to access the name attribute of the namespace, but only if the namespace contains values. Tip: You can chain multiple jq calls by piping the result into another jq call. .name is a subkey of the primary .namespace key.
$ cat result.json | jq -c '.[] | select (.namespace >={} )' | jq -c '.namespace.name'
"🦊"
The additional select command for items with a namespace dictionary ensures that only initialized values for .namespace.name are returned. This is a safety check, and avoids receiving null values in the result that you would need to filter again.
$ cat result.json| jq -c '.[]' | jq -c '.namespace.name'
null
null
"🦊"
By using the additional check with | select (.namespace >={} ), you only get the expected results and do not have to filter empty null values.
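The difference between guarded and unguarded access to the nested key can be sketched in Python as follows. The list is an inline copy of the example data; the variable names are chosen for illustration.

```python
# Inline copy of the example data
projects = [
    {"id": 1, "name": "project1"},
    {"id": 2, "name": "project2"},
    {"id": 3, "name": "project-internal-dev", "namespace": {"name": "🦊"}},
]

# Without a guard, a missing namespace yields None (jq prints null)
unguarded = [(p.get("namespace") or {}).get("name") for p in projects]

# With a guard, only items that carry a namespace dictionary are kept
guarded = [
    p["namespace"]["name"]
    for p in projects
    if isinstance(p.get("namespace"), dict)
]

print(unguarded)
print(guarded)
```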
How to expand the GitLab endpoint response
Save the result from the API projects call and retry the examples above with jq.
$ curl "https://gitlab.com/api/v4/projects" -o result.json >/dev/null 2>&1
Validate CI/CD YAML with jq for Git hooks
While writing this blog post, I learned that you can escape and encode YAML into JSON with jq. This trick comes in handy when automating YAML linting on the CLI, for example as a Git pre-commit hook.
Let’s take a look at the simplest way to test GitLab CI/CD from our community meetup workshops. A common mistake in the first steps is a missing two-space indent or missing whitespace between the dash and the following command. The following examples use .gitlab-ci.error.yml as a filename to showcase errors and .gitlab-ci.main.yml for working examples.
$ vim .gitlab-ci.error.yml
image: alpine:latest

test:
script:
 -exit 1
Committing the change and waiting for the CI/CD pipeline to validate at runtime can be time-consuming. The GitLab API provides a resource endpoint /ci/lint. A POST request with JSON-encoded YAML content will return a linting result faster.
Parse CI/CD YAML into JSON with jq
You can use jq to parse the raw YAML string into JSON:
$ jq --raw-input --slurp < .gitlab-ci.error.yml
"image: alpine:latest\n\ntest:\nscript:\n -exit 1\n"
The /ci/lint API endpoint requires a JSON dictionary with content as key, and the raw YAML string as a value. You can use jq to format the input by using the arg parser:
$ jq --null-input --arg yaml "$(<.gitlab-ci.error.yml)" '.content=$yaml'
{
  "content": "image: alpine:latest\n\ntest:\nscript:\n -exit 1"
}
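The same payload can be built in Python, where `json.dumps` takes care of escaping the newlines. This is a minimal sketch; the `build_lint_payload` helper and the inline YAML text are assumptions for illustration.

```python
import json


def build_lint_payload(yaml_text):
    """Wrap raw YAML text in a {"content": ...} dictionary, encoded as JSON."""
    return json.dumps({"content": yaml_text})


# Inline sample config; in practice the text would be read from a file
yaml_text = "image: alpine:latest\n\ntest:\n  script:\n    - exit 0\n"
payload = build_lint_payload(yaml_text)
print(payload)
```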
Send POST request to /ci/lint
The next building block is to send a POST request to the /ci/lint endpoint. The request needs to specify the Content-Type header for the body. Using the pipe | character, the JSON-encoded YAML configuration is fed into the curl command call.
$ jq --null-input --arg yaml "$(<.gitlab-ci.error.yml)" '.content=$yaml' \
| curl "https://gitlab.com/api/v4/ci/lint?include_merged_yaml=true" \
--header 'Content-Type: application/json' --data @-
{"status":"invalid","errors":["jobs test config should implement a script: or a trigger: keyword","jobs script config should implement a script: or a trigger: keyword","jobs config should contain at least one visible job"],"warnings":[],"merged_yaml":"---\nimage: alpine:latest\ntest: \nscript: \"-exit 1\"\n"}
The CLI command returns JSON output. You can use jq again to format the response in a more readable way.
$ jq --null-input --arg yaml "$(<.gitlab-ci.error.yml)" '.content=$yaml' \
| curl "https://gitlab.com/api/v4/ci/lint?include_merged_yaml=true" \
--header 'Content-Type: application/json' --data @- \
| jq --raw-output '.errors'
[
  "jobs test config should implement a script: or a trigger: keyword",
  "jobs script config should implement a script: or a trigger: keyword",
  "jobs config should contain at least one visible job"
]
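The request and the error extraction can also be sketched in Python, assuming the /ci/lint endpoint accepts the payload as described above. The snippet builds the POST request without sending it and parses a shortened copy of the response shown earlier, so it runs offline. The helper names are hypothetical.

```python
import json
import urllib.request

LINT_URL = "https://gitlab.com/api/v4/ci/lint?include_merged_yaml=true"


def build_lint_request(yaml_text):
    """Build (but do not send) the POST request for the /ci/lint endpoint."""
    body = json.dumps({"content": yaml_text}).encode("utf-8")
    return urllib.request.Request(
        LINT_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_errors(response_body):
    """Return the list stored under "errors" in a JSON response string."""
    return json.loads(response_body).get("errors", [])


request = build_lint_request("image: alpine:latest\n")
# Sending it would be: urllib.request.urlopen(request).read().decode("utf-8")

# Shortened copy of the response shown above, used instead of a live call
sample_response = (
    '{"status":"invalid",'
    '"errors":["jobs config should contain at least one visible job"],'
    '"warnings":[]}'
)
print(extract_errors(sample_response))
```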
Expanded CI/CD configuration
When you are using GitLab 13.8+ and the pipeline editor, the API endpoint also includes the merged YAML output for further processing. This feature has a limitation: Remote includes work while other include types do not. Push the changes to the repository in a draft MR and trigger a remote full lint as an alternative.
The example below shows CI/CD job templates with extends.
$ vim .gitlab-ci.main.yml
.job-tmpl:
  image: alpine:latest
  variables:
    BUILD_TYPE: "Debug"
  script:
    - echo "Hello from GitLab 🦊"
test-extends-stage:
  extends: .job-tmpl
test-extends-prod:
  extends: .job-tmpl
  variables:
    BUILD_TYPE: "Release"
  script:
    - echo "Hello from GitLab 🦊🌈"
Validate and extract the .merged_yaml attribute by sending the YAML config to the GitLab API.
$ jq --null-input --arg yaml "$(<.gitlab-ci.main.yml)" '.content=$yaml' \
| curl "https://gitlab.com/api/v4/ci/lint?include_merged_yaml=true" \
--header 'Content-Type: application/json' --data @- \
| jq --raw-output '.merged_yaml'
---
".job-tmpl":
  image: alpine:latest
  variables:
    BUILD_TYPE: Debug
  script:
  - "echo \"Hello from GitLab \U0001F98A\""
test-extends-stage:
  image: alpine:latest
  variables:
    BUILD_TYPE: Debug
  script:
  - "echo \"Hello from GitLab \U0001F98A\""
  extends: ".job-tmpl"
test-extends-prod:
  image: alpine:latest
  variables:
    BUILD_TYPE: Release
  script:
  - "echo \"Hello from GitLab \U0001F98A\U0001F308\""
  extends: ".job-tmpl"
Do more with jq
You can use the CI lint command for your own ideas, for example by wrapping it in a Git pre-commit hook which triggers an API call to /ci/lint on your GitLab host. Make sure to edit the variables to fit your environment. In this case, GITLAB_URL needs to point to your self-managed instance.
$ vim lint.sh
#!/bin/bash
GITLAB_CI_YML=".gitlab-ci.yml"
GITLAB_URL="https://gitlab.com"
GITLAB_CI_LINT_URL="${GITLAB_URL}/api/v4/ci/lint"
GITLAB_CI_YML_CONTENT=$(<"$GITLAB_CI_YML")
# Collect each lint error returned by the API into an array
errors=()
while read -r value; do
  errors+=("$value")
done < <(jq --null-input --arg yaml "${GITLAB_CI_YML_CONTENT}" '.content=$yaml' \
  | curl "${GITLAB_CI_LINT_URL}?include_merged_yaml=true" \
      --header 'Content-Type: application/json' --data @- --silent \
  | jq --raw-output '.errors' | jq -c '.[]')
echo -e "Analysing CI/CD config lint results ..."
count_err=0
for error in "${errors[@]}"; do
  echo "${error}"
  # Arithmetic expansion keeps count_err an integer
  count_err=$((count_err + 1))
done
if [[ $count_err -gt 0 ]]; then
  echo -e "GitLab CI/CD linting errors found. Aborting."
  exit 1
else
  echo -e "GitLab CI/CD linting ok."
  exit 0
fi
Save the file and make it executable with chmod.
$ chmod +x lint.sh
When the script lint.sh is run with the working .gitlab-ci.main.yml file, the output looks like this:
$ rm .gitlab-ci.yml
$ ln -s .gitlab-ci.main.yml .gitlab-ci.yml
$ ./lint.sh
Analysing CI/CD config lint results ...
GitLab CI/CD linting ok.
If you change the symlink to the .gitlab-ci.error.yml file and run the lint.sh script again, you can see the error and exit code:
$ rm .gitlab-ci.yml
$ ln -s .gitlab-ci.error.yml .gitlab-ci.yml
$ ./lint.sh
Analysing CI/CD config lint results ...
"jobs test config should implement a script: or a trigger: keyword"
"jobs script config should implement a script: or a trigger: keyword"
"jobs config should contain at least one visible job"
GitLab CI/CD linting errors found. Aborting.
The Git Hook is located in the CI/CD API lint hook repository in the Developer Evangelism group.
Use cases for programmatic API Clients
Sometimes shell programming cannot solve a requirement or a specific language integration is required for communicating with the API. Our community provides awesome API clients for many different programming languages.
Status and error handling
The GitLab API is designed to return different status codes depending on the context and requests. The HTTP response headers and response body tell about possible errors and API clients provide a programmatic interface.
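As a rough sketch of such handling, the Python function below turns a status code and response body into a short summary. The `describe_response` helper is hypothetical, and the only response body it relies on is the 401 message shown earlier in this post.

```python
import json


def describe_response(status, body):
    """Summarize an API response from its HTTP status code and body."""
    if 200 <= status < 300:
        return "ok"
    try:
        message = json.loads(body).get("message", "")
    except (json.JSONDecodeError, AttributeError):
        # Body was not JSON, or not a JSON object
        message = body
    return f"error {status}: {message}"


print(describe_response(401, '{"message":"401 Unauthorized"}'))
```

API client libraries usually wrap this kind of check in exceptions, so application code does not have to inspect status codes directly.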
Large result sets and pagination
The REST API can return a lot of results, which stresses both the server and the client on each new request. Returning a smaller subset of results, a page with a defined number of results, limits the response size and helps save resources. This is called "Pagination" in the context of a REST API.
Pagination is enabled by default for the GitLab API. It requires you to fetch multiple pages to retrieve a full result set. The Link headers specify the next/previous page to follow.
Parsing the response header with Bash and jq can get complicated and is prone to error. Programming languages like Python, Perl, etc., provide abstract interfaces for HTTP requests and responses, header parsing and error handling. API client libraries are available that provide full support for pagination in a few lines of code.
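To show what parsing a Link header involves, here is a minimal Python sketch that extracts the URL for each rel value with a regular expression. The header value uses a made-up host name, and the `parse_link_header` helper is an assumption for illustration; a client library would normally do this for you.

```python
import re


def parse_link_header(header):
    """Map each rel name (for example next or first) to its URL."""
    links = {}
    for url, rel in re.findall(r'<([^>]+)>;\s*rel="([^"]+)"', header):
        links[rel] = url
    return links


# Hypothetical header value in the standard Link header format
sample = (
    '<https://gitlab.example.com/api/v4/projects?page=2>; rel="next", '
    '<https://gitlab.example.com/api/v4/projects?page=1>; rel="first"'
)
links = parse_link_header(sample)
print(links.get("next"))
```

A script would follow the "next" URL until the header no longer contains one.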
The monitoring scripts for Docker Hub rate limits use a similar approach in Python where parsing the response headers is required to determine the rate limit values.
The following code provides an example with pagination using the python-gitlab docs and works with Python 3:
$ vim requirements.txt
python-gitlab
$ pip3 install -r requirements.txt
$ vim pagination.py
#!/usr/bin/env python
import gitlab
import os
# https://python-gitlab.readthedocs.io/en/stable/api-usage.html#getting-started-with-the-api
SERVER='https://gitlab.com'
GROUP_NAME='everyonecancontribute'
# Prefer keyset pagination
# https://python-gitlab.readthedocs.io/en/stable/api-usage.html#pagination
gl = gitlab.Gitlab(SERVER, private_token=os.environ['GITLAB_TOKEN'], pagination="keyset", order_by="id", per_page=100)
# Iterate over the list, and fire new API calls in case the result set does not match yet
# Note: newer python-gitlab releases use iterator=True in place of as_list=False
groups = gl.groups.list(as_list=False)
found_page = 0
for group in groups:
    if GROUP_NAME in group.name:
        print(group.attributes)
        found_page = groups.current_page
        break
print("Pagination API example for Python with %s %s - result on page %d" % ("GitLab", "🦊", found_page))
Run the pagination.py script with the Python interpreter as shown below. Adjust the python3 command as needed for your environment.
$ python3 pagination.py
Pagination API example for Python with GitLab 🦊 - result on page 5
The full code example can be found in my API playground repository.
What's next?
Programming language libraries and SDKs provide abstractions for requests, response, and error handling. Depending on the use case, language libraries and SDKs can help with tests and code quality and be used instead of CLI calls. CLI, curl, and jq are a great combination to quickly test the response on a remote server shell. There are many more API endpoints and tips and tricks beyond what is described in this blog post. Read the posts below to learn more about API endpoint strategies.
- Variable assignment with Bash and JSON
- Manage Personal Access Tokens with jq
- Parse ISO timestamps with jq and milliseconds issue
What cool API integrations have you built with jq and/or a programming language (library)? Tweet your favorites to @dnsmichi @gitlab :)
Cover image by Gert Boers on Unsplash
“Learn JSON superpowers with jq, the @GitLab API & automated CI/CD Linting” – Michael Friedrich