So, a few weeks back, I was working on a Bash script that uses cURL to drive a set of REST APIs, and parses the responses, separating out the HTTP response code e.g. 200, 201, 301, 403 etc. from the actual response body.
This allowed me to write conditional logic based upon the return code, using IF/ELSE/FI.
This worked, and did the job for which it was intended, and everyone was happy .....
Fast forward a few weeks, and ... the script starts breaking ...
When I dig into the WHAT and WHY, I realise that the HTTP response codes are no longer being parsed from the response, even though the response still includes them.
After some serious digging around, including a major refactor of my code, I think I've worked out what went wrong ....
I don't necessarily know WHY, but I'm gonna blame the REST APIs, working on the assumption that my code didn't change, and therefore something outside my control did ....
At one point, I even wondered whether Bash on my Mac had changed, perhaps as a result of the latest macOS updates, but I was then able to reproduce the problem on an Ubuntu VM, so kinda suspect that Bash is NOT the villain.
Rather than sharing the full script here, I've written a prototype that demonstrates the problem, and then the solution.
So here's the bit of code that fails: -
curl_output=$(curl -s -k http://www.bbc.com --write-out "|%{http_code}")
IFS='|' read -r RESPONSE HTTP_CODE <<< "$curl_output"
echo -e "Response is" $RESPONSE "\n"
echo -e "HTTP Code is" $HTTP_CODE "\n"
Now, to break it down a bit, we've got four lines of code; the first merely calls the BBC website, which actually responds with HTTP 301 ( redirect ).
This grabs the HTTP response code, and appends it to the end of the string, captured within the curl_output variable, prepended by a vertical bar ( | ) separator.
For reference, the HTTP response code is obtained using the cURL parameter --write-out "|%{http_code}".
The second line of code does two things: -
ii. Parses the curl_output variable, using the read command ( which is internal to Bash ), parsing the content of the variable into two sub-variables, RESPONSE and HTTP_CODE, using IFS to separate the two
The third and fourth lines of code merely output the content of the two sub-variables; I'm using echo -e merely to allow me to add the "\n" instruction to the end of the output, to make it look nice.
When I run the script, this is what I see: -
Curl Output is 301 Moved Permanently
Moved Permanently
The document has moved here.
|301
First Pass
==========
Response is
HTTP Code is
which demonstrates the problem that I was seeing with my real REST APIs.
Note that the output from the cURL command is returned, including the HTTP 301 response code: -
Curl Output is 301 Moved Permanently
Moved Permanently
The document has moved here.
|301
but also note that, whilst the HTTP response is partially returned within the RESPONSE sub-variable, the HTTP return code is NOT returned via the HTTP_CODE variable.
There's a clue in this behaviour but it took me a while to find it.
I went down a rabbit hole for a while, replacing the use of IFS and the read command, with awk, because who doesn't love awk. This works BUT it does require one to know precisely how many columns are being returned in the output from the cURL command i.e. if I use awk '{print $3}' I need to be sure that the thing I want will always be in column 3.
But, back to the root cause ....
Notice that, in the above example, the value of the RESPONSE sub-variable is only PART of the actual curl_output variable, specifically the first piece of HTML: -
This gave me a clue; for some reason, the read command was treating that first element as the entirety of the response, and not seeing the vertical bar separator, meaning that IFS was, in essence, being ignored, meaning that the HTTP_CODE variable never gets populated.
So there's something between: -
and the next element of the curl_output variable: -
such as a space character ( 0x20 ).
I even resorted to using the hexdump and hexedit commands ( via brew install hexedit on macOS and apt-get install -y hexedit on Ubuntu ), to confirm what I was seeing: -
00000000 43 75 72 6C 20 4F 75 74 70 75 74 20 69 73 20 20 3C 21 44 4F 43 54 59 50 45 20 48 54 4D 4C 20 50 55 42 4C 49 43 20 22 2D 2F 2F 49 45 Curl Output is
0000002C 54 46 2F 2F 44 54 44 20 48 54 4D 4C 20 32 2E 30 2F 2F 45 4E 22 3E 20 3C 68 74 6D 6C 3E 3C 68 65 61 64 3E 20 3C 74 69 74 6C 65 3E 33 TF//DTD HTML 2.0//EN"> 3
00000058 30 31 20 4D 6F 76 65 64 20 50 65 72 6D 61 6E 65 6E 74 6C 79 3C 2F 74 69 74 6C 65 3E 20 3C 2F 68 65 61 64 3E 3C 62 6F 64 79 3E 20 3C 01 Moved Permanently <
00000084 68 31 3E 4D 6F 76 65 64 20 50 65 72 6D 61 6E 65 6E 74 6C 79 3C 2F 68 31 3E 20 3C 70 3E 54 68 65 20 64 6F 63 75 6D 65 6E 74 20 68 61 h1>Moved Permanently
The document ha
000000B0 73 20 6D 6F 76 65 64 20 3C 61 20 68 72 65 66 3D 22 68 74 74 70 73 3A 2F 2F 77 77 77 2E 62 62 63 2E 63 6F 6D 2F 22 3E 68 65 72 65 3C s moved here<
000000DC 2F 61 3E 2E 3C 2F 70 3E 20 3C 2F 62 6F 64 79 3E 3C 2F 68 74 6D 6C 3E 20 7C 33 30 31 20 0A 0A /a>.
|301 ..
( For reference, the ASCII codes are documented
here ).
Interestingly, there are multiple space characters ( 0x20 ) in the output, between DOCTYPE and HTML and PUBLIC etc. and yet ......
I'm not 100% sure why read would treat them differently but ..... c'est la vie.
This led me down a different path, reading up on the
read command ( apologies for the punning ) led me to the
-d parameter: -
-d delim continue until the first character of DELIM is read, rather than newline
This allowed me to tell read to effectively ignore space characters, rather than delimiting at the "first" space.
Thus I changed my script: -
IFS='|' read -d '' -r RESPONSE HTTP_CODE <<< "$curl_output"
echo -e "Response is" $RESPONSE "\n"
echo -e "HTTP Code is" $HTTP_CODE "\n"
adding in -d ''
Having created a script that does both types of read, one without -d and one with -d, this is what I see: -
Curl Output is 301 Moved Permanently
Moved Permanently
The document has moved here.
|301
First Pass
==========
Response is
HTTP Code is
Second Pass
===========
Response is 301 Moved Permanently
Moved Permanently
The document has moved here.
HTTP Code is 301
Note that, in the second pass, I see the entirety of the curl_output variable, specifically that before the vertical bar ( | ) separator in the RESPONSE sub-variable and, more importantly, I see the HTTP response code ( 301 ) in the HTTP_CODE sub-variable.
This is the entire script: -
#!/bin/bash
curl_output=$(curl -s -k http://www.bbc.com --write-out "|%{http_code}")
echo -e "Curl Output is " $curl_output "\n"
echo -e "First Pass"
echo -e "==========\n"
IFS='|' read -r RESPONSE HTTP_CODE <<< "$curl_output"
echo -e "Response is" $RESPONSE "\n"
echo -e "HTTP Code is" $HTTP_CODE "\n"
echo -e "Second Pass"
echo -e "===========\n"
IFS='|' read -d '' -r RESPONSE HTTP_CODE <<< "$curl_output"
echo -e "Response is" $RESPONSE "\n"
echo -e "HTTP Code is" $HTTP_CODE "\n"
Further more, whilst writing this post, with this little prototype script, I discovered something else that validated my original hypothesis, that the REST APIs had changed between the original script working and breaking ....
Recognising that the BBC homepage isn't really a good example of a REST API, even though cURL doesn't really care, I changed my script to use a "real" REST REST API ( I am using the
Dummy REST API Example ).
When I run my script, as per the above, I get this: -
Curl Output is {"status":"success","data":[{"id":"1","employee_name":"Tiger Nixon","employee_salary":"320800","employee_age":"61","profile_image":""},{"id":"2","employee_name":"Garrett Winters","employee_salary":"170750","employee_age":"63","profile_image":""},{"id":"3","employee_name":"Ashton Cox","employee_salary":"86000","employee_age":"66","profile_image":""},{"id":"4","employee_name":"Cedric Kelly","employee_salary":"433060","employee_age":"22","profile_image":""},{"id":"5","employee_name":"Airi Satou","employee_salary":"162700","employee_age":"33","profile_image":""},{"id":"6","employee_name":"Brielle Williamson","employee_salary":"372000","employee_age":"61","profile_image":""},{"id":"7","employee_name":"Herrod Chandler","employee_salary":"137500","employee_age":"59","profile_image":""},{"id":"8","employee_name":"Rhona Davidson","employee_salary":"327900","employee_age":"55","profile_image":""},{"id":"9","employee_name":"Colleen Hurst","employee_salary":"205500","employee_age":"39","profile_image":""},{"id":"10","employee_name":"Sonya Frost","employee_salary":"103600","employee_age":"23","profile_image":""},{"id":"11","employee_name":"Jena Gaines","employee_salary":"90560","employee_age":"30","profile_image":""},{"id":"12","employee_name":"Quinn Flynn","employee_salary":"342000","employee_age":"22","profile_image":""},{"id":"13","employee_name":"Charde Marshall","employee_salary":"470600","employee_age":"36","profile_image":""},{"id":"14","employee_name":"Haley Kennedy","employee_salary":"313500","employee_age":"43","profile_image":""},{"id":"15","employee_name":"Tatyana Fitzpatrick","employee_salary":"385750","employee_age":"19","profile_image":""},{"id":"16","employee_name":"Michael Silva","employee_salary":"198500","employee_age":"66","profile_image":""},{"id":"17","employee_name":"Paul Byrd","employee_salary":"725000","employee_age":"64","profile_image":""},{"id":"18","employee_name":"Gloria Little","employee_salary":"237500","employee_age":"59","profile_image":""},{"id":"19","employee_name":"Bradley Greer","employee_salary":"132000","employee_age":"41","profile_image":""},{"id":"20","employee_name":"Dai Rios","employee_salary":"217500","employee_age":"35","profile_image":""},{"id":"21","employee_name":"Jenette Caldwell","employee_salary":"345000","employee_age":"30","profile_image":""},{"id":"22","employee_name":"Yuri Berry","employee_salary":"675000","employee_age":"40","profile_image":""},{"id":"23","employee_name":"Caesar Vance","employee_salary":"106450","employee_age":"21","profile_image":""},{"id":"24","employee_name":"Doris Wilder","employee_salary":"85600","employee_age":"23","profile_image":""}]}|200
First Pass
==========
Response is {"status":"success","data":[{"id":"1","employee_name":"Tiger Nixon","employee_salary":"320800","employee_age":"61","profile_image":""},{"id":"2","employee_name":"Garrett Winters","employee_salary":"170750","employee_age":"63","profile_image":""},{"id":"3","employee_name":"Ashton Cox","employee_salary":"86000","employee_age":"66","profile_image":""},{"id":"4","employee_name":"Cedric Kelly","employee_salary":"433060","employee_age":"22","profile_image":""},{"id":"5","employee_name":"Airi Satou","employee_salary":"162700","employee_age":"33","profile_image":""},{"id":"6","employee_name":"Brielle Williamson","employee_salary":"372000","employee_age":"61","profile_image":""},{"id":"7","employee_name":"Herrod Chandler","employee_salary":"137500","employee_age":"59","profile_image":""},{"id":"8","employee_name":"Rhona Davidson","employee_salary":"327900","employee_age":"55","profile_image":""},{"id":"9","employee_name":"Colleen Hurst","employee_salary":"205500","employee_age":"39","profile_image":""},{"id":"10","employee_name":"Sonya Frost","employee_salary":"103600","employee_age":"23","profile_image":""},{"id":"11","employee_name":"Jena Gaines","employee_salary":"90560","employee_age":"30","profile_image":""},{"id":"12","employee_name":"Quinn Flynn","employee_salary":"342000","employee_age":"22","profile_image":""},{"id":"13","employee_name":"Charde Marshall","employee_salary":"470600","employee_age":"36","profile_image":""},{"id":"14","employee_name":"Haley Kennedy","employee_salary":"313500","employee_age":"43","profile_image":""},{"id":"15","employee_name":"Tatyana Fitzpatrick","employee_salary":"385750","employee_age":"19","profile_image":""},{"id":"16","employee_name":"Michael Silva","employee_salary":"198500","employee_age":"66","profile_image":""},{"id":"17","employee_name":"Paul Byrd","employee_salary":"725000","employee_age":"64","profile_image":""},{"id":"18","employee_name":"Gloria Little","employee_salary":"237500","employee_age":"59","profile_image":""},{"id":"19","employee_name":"Bradley Greer","employee_salary":"132000","employee_age":"41","profile_image":""},{"id":"20","employee_name":"Dai Rios","employee_salary":"217500","employee_age":"35","profile_image":""},{"id":"21","employee_name":"Jenette Caldwell","employee_salary":"345000","employee_age":"30","profile_image":""},{"id":"22","employee_name":"Yuri Berry","employee_salary":"675000","employee_age":"40","profile_image":""},{"id":"23","employee_name":"Caesar Vance","employee_salary":"106450","employee_age":"21","profile_image":""},{"id":"24","employee_name":"Doris Wilder","employee_salary":"85600","employee_age":"23","profile_image":""}]}
HTTP Code is 200
Second Pass
===========
Response is {"status":"success","data":[{"id":"1","employee_name":"Tiger Nixon","employee_salary":"320800","employee_age":"61","profile_image":""},{"id":"2","employee_name":"Garrett Winters","employee_salary":"170750","employee_age":"63","profile_image":""},{"id":"3","employee_name":"Ashton Cox","employee_salary":"86000","employee_age":"66","profile_image":""},{"id":"4","employee_name":"Cedric Kelly","employee_salary":"433060","employee_age":"22","profile_image":""},{"id":"5","employee_name":"Airi Satou","employee_salary":"162700","employee_age":"33","profile_image":""},{"id":"6","employee_name":"Brielle Williamson","employee_salary":"372000","employee_age":"61","profile_image":""},{"id":"7","employee_name":"Herrod Chandler","employee_salary":"137500","employee_age":"59","profile_image":""},{"id":"8","employee_name":"Rhona Davidson","employee_salary":"327900","employee_age":"55","profile_image":""},{"id":"9","employee_name":"Colleen Hurst","employee_salary":"205500","employee_age":"39","profile_image":""},{"id":"10","employee_name":"Sonya Frost","employee_salary":"103600","employee_age":"23","profile_image":""},{"id":"11","employee_name":"Jena Gaines","employee_salary":"90560","employee_age":"30","profile_image":""},{"id":"12","employee_name":"Quinn Flynn","employee_salary":"342000","employee_age":"22","profile_image":""},{"id":"13","employee_name":"Charde Marshall","employee_salary":"470600","employee_age":"36","profile_image":""},{"id":"14","employee_name":"Haley Kennedy","employee_salary":"313500","employee_age":"43","profile_image":""},{"id":"15","employee_name":"Tatyana Fitzpatrick","employee_salary":"385750","employee_age":"19","profile_image":""},{"id":"16","employee_name":"Michael Silva","employee_salary":"198500","employee_age":"66","profile_image":""},{"id":"17","employee_name":"Paul Byrd","employee_salary":"725000","employee_age":"64","profile_image":""},{"id":"18","employee_name":"Gloria Little","employee_salary":"237500","employee_age":"59","profile_image":""},{"id":"19","employee_name":"Bradley Greer","employee_salary":"132000","employee_age":"41","profile_image":""},{"id":"20","employee_name":"Dai Rios","employee_salary":"217500","employee_age":"35","profile_image":""},{"id":"21","employee_name":"Jenette Caldwell","employee_salary":"345000","employee_age":"30","profile_image":""},{"id":"22","employee_name":"Yuri Berry","employee_salary":"675000","employee_age":"40","profile_image":""},{"id":"23","employee_name":"Caesar Vance","employee_salary":"106450","employee_age":"21","profile_image":""},{"id":"24","employee_name":"Doris Wilder","employee_salary":"85600","employee_age":"23","profile_image":""}]}
HTTP Code is 200
Yes, it works perfectly in both cases.
In other words, a real REST API that's returning JSON doesn't seem to break my script, regardless of whether I use read -d '' or not.
Which is nice.
However, I learned a lot whilst digging into this, both to fix the actual script AND to write this post.
And ....
EVERY DAY IS A SCHOOL DAY