Running Drain Script fails for api_z1 during deployment with: no such file or directory (vSphere Cloudfoundry Deployment)

Johannes Hiemer
25. March 2015

While we have done many deployments of Cloudfoundry on Openstack, this time we deployed Cloudfoundry (v205) on vSphere. Changing the deployment manifest from Openstack to vSphere was quite easy and will be covered in a future blog post. The deployment ran quite smoothly except for the nginx upload limit of 5 GB, which has been addressed in the latest stemcell build (v2891, see http://boshartifacts.cloudfoundry.org/file_collections?type=stemcells). The error caused by that limit was:

E, [2015-03-24 19:08:48 #29701] [task:8] ERROR -- DirectorJobRunner: Failed to extract release archive '/var/vcap/data/tmp/director/0000000003' into dir '/var/vcap/data/tmp/director/d20150324-29701-1wvtgjm', tar returned 2, output:
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now

This can easily be fixed in microbosh, either by changing the nginx configuration or by updating the microbosh director to the current stemcell. If you cannot easily update the stemcell, set the property client_max_body_size to 8192m in the following file on your microbosh:

/var/vcap/jobs/director/config/nginx.conf
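As a rough sketch of that change, the helper below rewrites the value of an existing client_max_body_size directive in a given file and keeps a backup next to it. The function name set_max_body_size and the .bak suffix are made up for this example, and it assumes the directive is already present on a line of its own.

```shell
#!/usr/bin/env bash
# Sketch: change the value of an existing client_max_body_size directive.
# set_max_body_size is a hypothetical helper, not part of BOSH or nginx.
set_max_body_size() {
  local conf="$1" size="$2"
  # Keep a copy of the original file before rewriting it.
  cp "$conf" "$conf.bak"
  # Read from the backup and write the modified config back to the original path.
  sed -E "s/^([[:space:]]*client_max_body_size[[:space:]]+)[^;]+;/\1${size};/" \
    "$conf.bak" > "$conf"
}

# Example call on the microbosh VM (path from the text above; likely needs root):
# set_max_body_size /var/vcap/jobs/director/config/nginx.conf 8192m
```

nginx only picks up the new value after it reloads its configuration. Also keep in mind that a manual edit like this is probably overwritten when the director job is re-rendered, for example by a stemcell update.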

After the nginx configuration had been fixed, the next issue we got stuck on was:

api_z1/0 (canary). Failed: Action Failed get_task: Task 4e6d2c5a-2dc2-4535-5980-57717c2329a4 result: Running Drain Script: Running drain script: Starting command /var/vcap/jobs/cloud_controller_ng/bin/drain job_shutdown hash_unchanged: fork/exec /var/vcap/jobs/cloud_controller_ng/bin/drain: no such file or directory (00:00:01)

Error 450001: Action Failed get_task: Task 4e6d2c5a-2dc2-4535-5980-57717c2329a4 result: Running Drain Script: Running drain script: Starting command /var/vcap/jobs/cloud_controller_ng/bin/drain job_shutdown hash_unchanged: fork/exec /var/vcap/jobs/cloud_controller_ng/bin/drain: no such file or directory

At the moment the cause of this error is not clear to me. Dmitriy Kalinin of the Cloudfoundry team recommended the following steps:

  • Delete the VM in vSphere
  • Run `bosh cck`
  • Run `bosh deploy`
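The two command-line steps can be chained so that the deploy only starts once the cloud check has finished without error. This is only a sketch: it assumes the broken VM has already been deleted by hand in vSphere, and both the function name recover_deployment and the BOSH_CMD variable are invented here to allow a dry run.

```shell
#!/usr/bin/env bash
# Sketch of the recovery sequence after the VM was deleted in vSphere.
# recover_deployment and BOSH_CMD are hypothetical names for this example.
recover_deployment() {
  # Use the real bosh CLI unless BOSH_CMD overrides it (e.g. for a dry run).
  local bosh="${BOSH_CMD:-bosh}"
  # Cloud check first; only redeploy if it exits successfully.
  "$bosh" cck && "$bosh" deploy
}

# Dry run that only prints the subcommands instead of calling bosh:
# BOSH_CMD=echo recover_deployment
```

Note that `bosh cck` asks interactively how to resolve each problem it finds, so this is not meant for unattended use.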

Afterwards, updating the API job took a while, but it completed successfully:

Started updating job api_z1 > api_z1/0 (canary). Done (00:06:06)