Saturday, January 20, 2018

Configuring Streamsets Data Collector With Hashicorp Vault Using AppRole

In my previous post I detailed how to install and configure Hashicorp Vault with the AppID auth backend so that it works with Streamsets Data Collector. The AppID auth backend has since been deprecated, and as of Data Collector 2.7.0 the AppRole auth backend is the Vault backend of choice. In this post I'll build on the previous one and show how to enable the AppRole auth backend and configure Data Collector to work with it.

First we need to enable the AppRole auth backend:

curl \
-X POST \
-H "X-Vault-Token:<roottoken>" \
-d '{"type":"approle"}' \
http://<vaultserver>:8200/v1/sys/auth/approle
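If you run these calls from a script, you can keep the server address and token in environment variables instead of pasting them into every command. The sketch below assumes the variable names VAULT_ADDR and VAULT_TOKEN (the same names the vault CLI reads). The helper names vault_url and enable_approle are hypothetical.

```shell
#!/bin/sh
# Sketch: enable the AppRole auth backend using environment variables.
# Assumes VAULT_ADDR (e.g. http://<vaultserver>:8200) and VAULT_TOKEN
# (the root token) are exported. Function names are hypothetical.

# Build a full API URL from a path such as "sys/auth/approle",
# tolerating a trailing slash on VAULT_ADDR and a leading slash on the path.
vault_url() {
    printf '%s/v1/%s\n' "${VAULT_ADDR%/}" "${1#/}"
}

# Send the same request as the curl command above. --fail makes curl
# exit non-zero when Vault answers with an HTTP error status.
enable_approle() {
    curl --silent --show-error --fail \
        -X POST \
        -H "X-Vault-Token:${VAULT_TOKEN:?VAULT_TOKEN is not set}" \
        -d '{"type":"approle"}' \
        "$(vault_url sys/auth/approle)"
}

# Usage (hypothetical values):
#   export VAULT_ADDR=http://<vaultserver>:8200 VAULT_TOKEN=<roottoken>
#   enable_approle
```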


We then create a role for streamsets, and associate it with our existing secret-policy policy:

curl \
-H "X-Vault-Token:<roottoken>" \
-X POST \
-d '{"token_ttl": "500h", "token_max_ttl": "500h", "secret_id_num_uses": 0,
"policies": "secret-policy", "period": 0, "bind_secret_id": true}' \
http://<vaultserver>:8200/v1/auth/approle/role/streamsets



token_ttl is the time to live (TTL) of tokens issued through this role. token_max_ttl is the maximum lifetime of a token, after which it can no longer be renewed. secret_id_num_uses is the number of times a secret ID can be used to fetch a token from this auth backend; we set it to 0, which means unlimited. policies attaches the policy we created in the previous post to tokens issued by this role, and it can list more than one policy. period, if set to a non-zero value, makes issued tokens periodic, so they never expire as long as they are renewed. Finally, bind_secret_id requires a secret ID to be presented when logging in.
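To see each parameter described above in one place, the role payload can be built from shell variables. This is a minimal sketch. The function names role_payload and create_streamsets_role are hypothetical, and it assumes VAULT_ADDR and VAULT_TOKEN are exported. It sends the same fields as the curl command above.

```shell
#!/bin/sh
# Sketch: build the AppRole role payload so every parameter is explicit.
# Function names are hypothetical.

# $1 = TTL used for both token_ttl and token_max_ttl (e.g. 500h)
# $2 = policy name(s), comma-separated (e.g. secret-policy)
role_payload() {
    ttl=$1
    policies=$2
    # secret_id_num_uses=0 -> a secret ID can be used an unlimited number of times
    # period=0             -> issued tokens are not periodic
    # bind_secret_id=true  -> a secret ID must accompany the role ID at login
    printf '{"token_ttl":"%s","token_max_ttl":"%s","secret_id_num_uses":0,"policies":"%s","period":0,"bind_secret_id":true}\n' \
        "$ttl" "$ttl" "$policies"
}

# Create (or update) the streamsets role with that payload.
# Assumes VAULT_ADDR and VAULT_TOKEN (the root token) are exported.
create_streamsets_role() {
    curl --silent --show-error --fail \
        -X POST \
        -H "X-Vault-Token:${VAULT_TOKEN}" \
        -d "$(role_payload 500h secret-policy)" \
        "${VAULT_ADDR%/}/v1/auth/approle/role/streamsets"
}
```

Building the JSON with printf keeps the example dependency-free. If jq is installed, `jq -n` is a safer way to build payloads whose values might contain quotes.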
With this command executed we can verify our role was created:

curl \
--header "X-Vault-Token:<roottoken>" \
--request LIST \
http://<vaultserver>:8200/v1/auth/approle/role
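Vault list endpoints return names under data.keys, for example `{"data":{"keys":["streamsets"]}}`. A script can check the LIST output for the role name instead of reading it by eye. The sketch below is hypothetical (role_listed is not a Vault or curl command). It parses the JSON with sed, assuming the response is on a single line, so it is not a general JSON parser. With jq available, `jq '.data.keys'` is more robust.

```shell
#!/bin/sh
# Sketch: succeed if a role name appears in data.keys of a LIST response.
# Assumes single-line JSON; role_listed is a hypothetical helper.

# $1 = role name, $2 = raw JSON response from the LIST request
role_listed() {
    # Pull out the contents of the "keys" array and drop whitespace.
    keys=$(printf '%s\n' "$2" |
        sed -n 's/.*"keys":[[:space:]]*\[\([^]]*\)].*/\1/p' |
        tr -d '[:space:]')
    # Match the quoted name as a whole list element, not a substring.
    case ",$keys," in
        *,\""$1"\",*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (assumes VAULT_ADDR and VAULT_TOKEN are exported):
#   resp=$(curl --silent --header "X-Vault-Token:$VAULT_TOKEN" \
#       --request LIST "$VAULT_ADDR/v1/auth/approle/role")
#   role_listed streamsets "$resp" && echo "streamsets role exists"
```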
After verifying that our role was created successfully, we need to grab our Role Id:

curl \
-X GET \
-H "X-Vault-Token:<roottoken>" \
http://<vaultserver>:8200/v1/auth/approle/role/streamsets/role-id
Now we need to grab the Secret Id associated with this Role Id:
curl \
-X POST \
-H "X-Vault-Token:<roottoken>" \
http://<vaultserver>:8200/v1/auth/approle/role/streamsets/secret-id
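Vault returns the two IDs as data.role_id and data.secret_id. The sketch below captures them into shell variables instead of copying them by hand. json_string and fetch_approle_credentials are hypothetical helpers. The sed-based extraction assumes single-line JSON with plain string values; `jq -r '.data.role_id'` is the more robust choice if jq is available. The secret ID is a credential, so avoid echoing it into logs.

```shell
#!/bin/sh
# Sketch: capture the Role Id and Secret Id from Vault's responses.
# Helper names are hypothetical; assumes VAULT_ADDR and VAULT_TOKEN
# (the root token) are exported.

# $1 = field name, $2 = JSON text; prints that field's string value.
# Not a general JSON parser: no escaped quotes, single-line input.
json_string() {
    printf '%s\n' "$2" |
        sed -n 's/.*"'"$1"'":[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Sets ROLE_ID and SECRET_ID for the streamsets role.
fetch_approle_credentials() {
    role_resp=$(curl --silent --show-error --fail \
        -H "X-Vault-Token:${VAULT_TOKEN}" \
        "${VAULT_ADDR%/}/v1/auth/approle/role/streamsets/role-id") || return 1
    secret_resp=$(curl --silent --show-error --fail -X POST \
        -H "X-Vault-Token:${VAULT_TOKEN}" \
        "${VAULT_ADDR%/}/v1/auth/approle/role/streamsets/secret-id") || return 1
    ROLE_ID=$(json_string role_id "$role_resp")
    SECRET_ID=$(json_string secret_id "$secret_resp")
}
```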

With these two credentials we can now log in to Vault:

curl \
-X POST \
-d '{"role_id":"<roleid>","secret_id":"<secretid>"}' \
http://<vaultserver>:8200/v1/auth/approle/login
The response contains a new client token (under auth.client_token); so far we have been using the root token.
We can test this token by reading a secret we created in
the previous post:

curl \
-X GET \
-H "X-Vault-Token:<newtoken>" \
http://<vaultserver>:8200/v1/secret/source/username
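The login and the test read can be chained in a script. In this setup the login response carries the token in auth.client_token. Reading the secret returns its fields under data, here the key "value", which matches how the secret is referenced later with `&value`. This is a sketch under assumptions: ROLE_ID, SECRET_ID and VAULT_ADDR are assumed to hold the values from the earlier steps, and the helper names are hypothetical. The sed extraction again assumes single-line JSON.

```shell
#!/bin/sh
# Sketch: AppRole login, then read one secret with the returned token.
# Assumes ROLE_ID, SECRET_ID and VAULT_ADDR are set; names are hypothetical.

# $1 = field name, $2 = single-line JSON; prints the field's string value.
json_string() {
    printf '%s\n' "$2" |
        sed -n 's/.*"'"$1"'":[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Prints the client token issued for the role/secret ID pair.
approle_login() {
    resp=$(curl --silent --show-error --fail -X POST \
        -d "{\"role_id\":\"${ROLE_ID}\",\"secret_id\":\"${SECRET_ID}\"}" \
        "${VAULT_ADDR%/}/v1/auth/approle/login") || return 1
    json_string client_token "$resp"
}

# $1 = token, $2 = secret path such as secret/source/username
# Prints the secret's "value" field.
read_secret_value() {
    resp=$(curl --silent --show-error --fail \
        -H "X-Vault-Token:$1" \
        "${VAULT_ADDR%/}/v1/$2") || return 1
    json_string value "$resp"
}

# Usage:
#   token=$(approle_login) && read_secret_value "$token" secret/source/username
```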

Now we need to configure Streamsets to use AppRole. In Cloudera Manager,
go to the configuration for the Streamsets service. In the Data Collector
Advanced Configuration Snippet (Safety Valve) for sdc.properties text block, enter
these values:


credentialStore.vault.config.addr=http://<vaultserver>:8200/
credentialStores=vault
credentialStore.vault.def=streamsets-datacollector-vault-credentialstore-lib::com_streamsets_datacollector_credential_vault_VaultCredentialStore
credentialStore.vault.config.secret.id=<secretid>
credentialStore.vault.config.role.id=<roleid>
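To avoid typos when pasting the IDs into the safety valve, the five properties can be printed with the values filled in. This minimal sketch only reproduces the property names shown above; sdc_vault_properties is a hypothetical function name. Keep in mind that the output contains the secret ID in plain text.

```shell
#!/bin/sh
# Sketch: print the sdc.properties lines above with real values substituted.
# Property names are copied from the configuration above; the function
# name is hypothetical.

# $1 = Vault address, $2 = Role Id, $3 = Secret Id
sdc_vault_properties() {
    cat <<EOF
credentialStore.vault.config.addr=$1
credentialStores=vault
credentialStore.vault.def=streamsets-datacollector-vault-credentialstore-lib::com_streamsets_datacollector_credential_vault_VaultCredentialStore
credentialStore.vault.config.secret.id=$3
credentialStore.vault.config.role.id=$2
EOF
}

# Usage:
#   sdc_vault_properties http://<vaultserver>:8200/ "$ROLE_ID" "$SECRET_ID"
```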
You'll notice that we used the Role Id and Secret Id obtained in the previous steps. 
After entering these you'll need to restart the Streamsets service. In the Streamsets 
pipelines we need to change the expression language we use to access secrets. 
In the previous example we used:

curl \
-X GET \
-H "X-Vault-Token:<newtoken>" \
http://<vaultserver>:8200/v1/secret/source/username
to access the source username. To do the same in a Streamsets pipeline, use this syntax
on the Credentials tab of an origin or destination:


${credential:get("vault", "all", "secret/source/username&value")}


Likewise, to get the password for the source we would enter:


${credential:get("vault", "all", "secret/source/password&value")}
Figure 1. Credentials in Streamsets Pipeline
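Comparing the expressions with the earlier curl call, the name argument splits at "&". In these examples the part before it is the Vault path (secret/source/username) and the part after it is the key inside that secret (value). The small sketch below makes that mapping explicit, which can help when debugging a credential that fails to resolve. credential_to_vault is a hypothetical helper, and it assumes the name contains exactly one "&" as in the examples above.

```shell
#!/bin/sh
# Sketch: split a credential:get name like "secret/source/username&value"
# into the Vault path and the key inside the secret.
# credential_to_vault is a hypothetical helper.

# $1 = name as written in ${credential:get("vault", "all", "<name>")}
credential_to_vault() {
    path="${1%%&*}"   # everything before the first "&"
    key="${1#*&}"     # everything after the first "&"
    printf 'path=%s key=%s\n' "$path" "$key"
}

# Usage:
#   credential_to_vault 'secret/source/username&value'
#   The path part corresponds to http://<vaultserver>:8200/v1/<path>
```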
That's it! The pipelines will now interact with Vault, and get your usernames and passwords for your source systems.