Troubleshooting the Installation

This section provides the steps to troubleshoot the DX Platform installation:
System Pods in CrashLoopBackOff State
Symptom:
The status of the kube-system pods displays CrashLoopBackOff.
Solution:
Restart Docker. To restart, run the following command:
systemctl restart docker
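To confirm which system pods are affected before and after the restart, the pod list can be filtered by status. The following is a minimal sketch, assuming kubectl is configured for the cluster; the function name list_crashloop_pods is illustrative and not part of the product.

```shell
# List kube-system pods whose STATUS column reads CrashLoopBackOff.
# With --no-headers the columns are: NAME READY STATUS RESTARTS AGE.
list_crashloop_pods() {
  kubectl get pods -n kube-system --no-headers |
    awk '$3 == "CrashLoopBackOff" { print $1 }'
}

# Usage (requires kubectl access to the cluster):
#   list_crashloop_pods
```

An empty result after the restart suggests that the system pods have recovered.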
Pods are Not in Running State
Symptom:
Pods are not in the running state.
Solution:
Run the following commands to set the iptables FORWARD policy to ACCEPT and to print the resulting rules:
iptables -P FORWARD ACCEPT
iptables-save
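To check whether the FORWARD policy is the likely cause, the current policy can be read before changing it. This is a small sketch that assumes root access on the node; forward_policy is an illustrative name.

```shell
# Print the current default policy of the FORWARD chain (for example, ACCEPT or DROP).
# `iptables -S FORWARD` lists the chain in rule-spec form; the policy line
# has the form "-P FORWARD <target>".
forward_policy() {
  iptables -S FORWARD | awk '$1 == "-P" && $2 == "FORWARD" { print $3 }'
}

# Usage (as root):
#   forward_policy
```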
Unable to Push the Image and Installation Fails
Symptom:
The installation fails because I am unable to push the image due to limited memory as shown:
[oerth-scx.ca.com:4443/analytics/jarvis-ldds:2.6.4] Tagging image: localhost:5000/jarvis/ldds:2.6.4.
Error during callback
com.github.dockerjava.api.exception.NotFoundException: {"message":"no such id: oerth-scx.ca.com:4443/dxi/platform/dxi-postgres:aop_v19.1.0.6_1.0.14"}
  at com.github.dockerjava.netty.handler.HttpResponseHandler.channelRead0(HttpResponseHandler.java:103)
  at com.github.dockerjava.netty.handler.HttpResponseHandler.channelRead0(HttpResponseHandler.java:33)
  at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
  at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340
Solution:
Follow these steps:
  1. Run the following command to stop Docker:
    systemctl stop docker
  2. Run the following command to remove the docker-storage directory:
    rm -rf /mnt/docker-storage
  3. Run the following command to start Docker:
    systemctl start docker
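Because removing the storage directory is destructive, it can help to preview the commands first. The following sketch wraps the three steps in a function that only prints them by default. The DRY_RUN and STORAGE_DIR variables and the function name are assumptions introduced for illustration.

```shell
# Run the stop, remove, and start sequence for Docker storage.
# DRY_RUN defaults to 1, which only prints the commands; set DRY_RUN=0 to run them.
# STORAGE_DIR defaults to the directory named in the procedure.
reset_docker_storage() {
  local storage_dir="${STORAGE_DIR:-/mnt/docker-storage}"
  local run="echo"
  if [ "${DRY_RUN:-1}" = "0" ]; then run=""; fi
  $run systemctl stop docker &&
    $run rm -rf "$storage_dir" &&
    $run systemctl start docker
}

# Usage:
#   reset_docker_storage             # preview only
#   DRY_RUN=0 reset_docker_storage   # run the commands (as root)
```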
Onboard the Tenant, Product, and Document Type Manually
The DX Platform installer onboards the product, the default tenant, and the document type. You can check the logs to verify whether the onboarding was successful. Alternatively, you can use the dxi-tenant-check.sh script to verify the same. For more information, see the section.
Perform the following steps if the onboarding was not successful.
Onboard the Product
To onboard the product, update the following curl command with the appropriate URL and run it. If the product is already onboarded, the API returns an error stating that the product is already onboarded. Ignore the error and proceed.
Onboard the product manually only if the onboarding by the installer was not successful.
curl -v -XPOST -H "Content-Type: application/json" -H "Cache-Control: no-cache" "<JARVIS_API_BASE_URL>/onboarding/products" -d '{ "product_id": "ao", "product_name": "Agile Operations", "product_description": "Agile Operations" }'
Onboard the Default Tenant
To onboard the default tenant, update the following curl command with the appropriate URL and run it. If the tenant is already onboarded, the API returns an error stating that the tenant is already onboarded. Ignore the error and proceed.
Onboard the default tenant manually only if the onboarding by the installer was not successful.
curl -v -XPOST -H "Content-Type: application/json" -H "Cache-Control: no-cache" "<JARVIS_API_BASE_URL>/onboarding/tenants" -d '{ "product_id" : "ao", "tenant_id": "DEFAULTORG" }'
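The product and tenant calls can be combined so that the base URL is supplied once. The following is a sketch only: the function name is illustrative, the payloads are copied from the commands above, and the base URL is passed as the first argument.

```shell
# Onboard the product, then the default tenant, against one Jarvis API base URL.
# An "already onboarded" error from either call can be ignored, so the second
# call runs regardless of the result of the first.
onboard_product_and_tenant() {
  local base_url="$1"
  curl -s -X POST -H "Content-Type: application/json" -H "Cache-Control: no-cache" \
    "${base_url}/onboarding/products" \
    -d '{ "product_id": "ao", "product_name": "Agile Operations", "product_description": "Agile Operations" }'
  echo
  curl -s -X POST -H "Content-Type: application/json" -H "Cache-Control: no-cache" \
    "${base_url}/onboarding/tenants" \
    -d '{ "product_id": "ao", "tenant_id": "DEFAULTORG" }'
  echo
}

# Usage:
#   onboard_product_and_tenant "<JARVIS_API_BASE_URL>"
```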
Onboard the Document Types
Onboard the document types manually, only if the onboarding by the installer was not successful.
Follow these steps:
  1. Navigate to the <INSTALL_DIR>/resources/dbscripts/jaas/jarvis directory.
  2. Update the Jarvis API host and port in the setup.sh script. If the port uses HTTPS, update the URL in the curl command to use the https URI scheme.
    JARVIS_INGESTION_HOST=<JARVIS_API_HOSTNAME>
    JARVIS_REST_PORT=<JARVIS_API_PORT>
  3. Execute the script.
    sh setup.sh
  4. Navigate to the <INSTALL_DIR>/resources/dbscripts/jaas/jarvis/children directory.
    cd <INSTALL_DIR>/resources/dbscripts/jaas/jarvis/children
  5. Update the Jarvis API host and port in the setup.sh script. If the port uses HTTPS, update the URL in the curl command to use the https URI scheme.
    JARVIS_INGESTION_HOST=<JARVIS_API_HOSTNAME>
    JARVIS_REST_PORT=<JARVIS_API_PORT>
    Run the following command to get these values:
    echo "http://$(kubectl get ingress -n <namespace> |fgrep jarvis-apis|awk '{print $2}')"
  6. Execute the script:
    sh setup.sh
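Once both setup.sh files contain the correct host and port, the two script runs can be chained. The sketch below assumes the directory layout described in the steps; run_jarvis_setup is an illustrative name, and the install directory is passed as the first argument.

```shell
# Run setup.sh in the jarvis directory and then in its children directory.
# Assumes both setup.sh files were already edited with the Jarvis API host and port.
run_jarvis_setup() {
  local base="$1/resources/dbscripts/jaas/jarvis"
  local dir
  for dir in "$base" "$base/children"; do
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$dir" && sh setup.sh ) || return 1
  done
}

# Usage:
#   run_jarvis_setup "<INSTALL_DIR>"
```

The function stops at the first script that fails, so the children script does not run against a partially onboarded parent.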
Pods are Not in Running State
Symptom:
Pods are not in the running state.
Solution:
Follow these steps:
  1. Run the following command to get the namespace:
    kubectl get ns
  2. Run the following command to get the list of pods:
    kubectl get pods -n <namespace>
  3. Run the following command to check the logs of the pod:
    kubectl logs <podname> -n <namespace>
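When many pods are deployed, it can be quicker to list only the pods that need attention. This is a minimal sketch, assuming the default kubectl column layout; pods_not_running is an illustrative name.

```shell
# Print the names of pods in a namespace whose STATUS is neither Running nor Completed.
# With --no-headers the columns are: NAME READY STATUS RESTARTS AGE.
pods_not_running() {
  local ns="$1"
  kubectl get pods -n "$ns" --no-headers |
    awk '$3 != "Running" && $3 != "Completed" { print $1 }'
}

# Usage: show the last 50 log lines of every such pod.
#   for pod in $(pods_not_running <namespace>); do
#     kubectl logs "$pod" -n <namespace> --tail=50
#   done
```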
Installation Fails with Error
Symptom:
During the installation, when I select option 2 (Medium Installation) at the prompt, the installation fails with an error:
Specify the size of the Elasticsearch.
1. Small Installation: One Elasticsearch Hot node is installed.
2. Medium Installation: Three Elasticsearch Hot nodes are installed
Enter your choice: 2
Do you want to enable OI? (Y/N) [N]: y
[es-validate] [ FAILED ] Expected to find 3 nodes with labels master-data-1, master-data-2, master-data-3. Need exactly one node tagged for each label.
Sun Jul 21 19:36:39 PDT 2019 Installation failed.
Solution:
Perform the following steps:
  • Delete the config file from the <install directory> and run the installer again.
  • Ensure that there is only one node with each of the following labels:
    • dxi-es-node=master-data-1
    • dxi-es-node=master-data-2
    • dxi-es-node=master-data-3
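The label requirement can be checked before rerunning the installer. The sketch below counts the nodes that carry each label. It assumes the label key dxi-es-node for all three values; check_es_node_labels is an illustrative name.

```shell
# Report how many nodes carry each required Elasticsearch label.
# Each count should be exactly 1; the function returns non-zero otherwise.
check_es_node_labels() {
  local value count status=0
  for value in master-data-1 master-data-2 master-data-3; do
    count="$(kubectl get nodes -l "dxi-es-node=${value}" --no-headers 2>/dev/null | wc -l | tr -d ' ')"
    echo "dxi-es-node=${value}: ${count}"
    [ "$count" = "1" ] || status=1
  done
  return "$status"
}

# Usage:
#   check_es_node_labels
# A missing label can be added with:
#   kubectl label node <node name> dxi-es-node=master-data-2
```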
apm-logstash Pod Error
Symptom:
The following error is displayed in the apm-logstash pod log:
[2020-01-16T16:13:19,839][WARN ][org.apache.kafka.clients.NetworkClient] [Consumer clientId=logstash-3, groupId=logstash-apm-E612F4BB-A1C8-4175-9EBA-6D94592A422E] Error while fetching metadata with correlation id 8983 : {barouting_E612F4BB_A1C8_4175_9EBA_6D94592A422E=UNKNOWN_TOPIC_OR_PARTITION}
Solution:
This error is displayed until the first application is created in DX App Experience Analytics or until DX APM starts monitoring the web applications. You can safely ignore the warning message.
User Access Logs Are Not in Enterprise Manager
Symptom:
After several unsuccessful logins, I want to see the user access in the log. The Enterprise Manager log does not show any authorization message.
Solution:
Check the apmservices-manager logs for login failures, because authentication is done in apmservices-manager and not in Enterprise Manager. A simple way to do this is to log in to the Cluster Management console and get the logs from the Services screen. Another option is to use kubectl to query the logs from the running pod:
kubectl logs ${POD_NAME}
Resulting log entry:
2019-08-01 15:11:43.240 ERROR [manager,3064d4205d623f77,35211da6bb868168,false] 1 --- [http-nio-8008-exec-4] com.ca.apm.ess.services.EssManagerImpl : { "code" : 1351129, "msg" : "Authentication Failed: Invalid Username, Password, or Tenant ID", "desc" : "Authentication Failed: Invalid Username, Password, or Tenant ID" }
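The kubectl approach can be reduced to one helper that finds the pod and filters its log. This is a sketch under two assumptions: the pod name contains apmservices-manager, and failed logins contain the text "Authentication Failed" as in the entry above. The function name auth_failures is illustrative.

```shell
# Print authentication failures from the apmservices-manager pod log.
# The first pod whose name contains "apmservices-manager" is used.
auth_failures() {
  local ns="$1" pod
  pod="$(kubectl get pods -n "$ns" --no-headers | awk '/apmservices-manager/ { print $1; exit }')"
  if [ -z "$pod" ]; then
    echo "no apmservices-manager pod found in namespace $ns" >&2
    return 1
  fi
  kubectl logs "$pod" -n "$ns" | grep 'Authentication Failed'
}

# Usage:
#   auth_failures <namespace>
```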
Kibana Not Populating
Symptom:
After the installation, Kibana is not populated.
Solution:
Follow these steps:
  1. Stop the Kibana pod.
  2. Stop the init-es-kibana pod.
  3. Stop the Jarvis-es pod.
  4. Restart the pods in the following order:
    • init-es-kibana
    • Jarvis-es
    • Kibana
      Restart the pods in this sequence, waiting 30 seconds between each restart.
  5. Recreate the Kibana tenant index to populate the dashboards properly.
    curl -X DELETE <ES_URL>/.kibana_<tenant_id in lower case>
    For example:
    curl -X DELETE http://es.10.175.56.55.nip.io/.kibana_a3b92267-2a8a-40ba-a308-0568e69b32d8
  6. Onboard the tenant using the following command:
    curl -X POST <ES_URL>/_ca_es_acl/onboard?tenant=<tenant id>
    For example:
    curl -X POST http://es.10.175.56.55.nip.io/_ca_es_acl/onboard?tenant=a3b92267-2a8a-40ba-a308-0568e69b32d8
You may scale down the init-es-kibana pod once the dashboards are populated.
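The index name in step 5 must use the tenant ID in lower case, which is easy to get wrong when the ID is copied by hand. The following sketch builds the URL and then runs steps 5 and 6. Both function names are illustrative; the endpoints are the ones shown in the steps.

```shell
# Build the Kibana tenant index URL used in step 5; the tenant ID is lowercased
# and a trailing slash on the Elasticsearch URL is dropped.
kibana_index_url() {
  local es_url="$1" tenant_id="$2"
  printf '%s/.kibana_%s\n' "${es_url%/}" \
    "$(printf '%s' "$tenant_id" | tr '[:upper:]' '[:lower:]')"
}

# Delete the tenant index, then onboard the tenant again (steps 5 and 6).
recreate_kibana_tenant() {
  local es_url="$1" tenant_id="$2"
  curl -X DELETE "$(kibana_index_url "$es_url" "$tenant_id")" &&
    curl -X POST "${es_url%/}/_ca_es_acl/onboard?tenant=${tenant_id}"
}

# Usage:
#   recreate_kibana_tenant "<ES_URL>" "<tenant id>"
```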
Verify the Installation
After the installation is successful, run the following commands to verify the status:
Verify
Steps
Verify if the containers are deployed successfully.
Run the following command to check the deployment status of the pods:
kubectl get pods -n <namespace>
The status must be Running.
If some of the pods report an error, run the following command to view the details:
kubectl get events -n <namespace> | fgrep <service name> | more
Get Ingresses for components
Run the following command to get all the ingresses:
kubectl get ingress -n <namespace>
Log in to the Admin UI Console
Run the following command to display the Admin Console URL:
echo "http://$(kubectl get ingress -n <namespace> |fgrep apmservices-gateway|awk '{print $2}')/dxiportal"
Open the URL that was displayed after the installation. Log in as masteradmin with the password provided during the installation.
Verify the Elasticsearch Data
Run the following command:
echo "http://$(kubectl get ingress -n <namespace> |fgrep jarvis-es|awk '{print $2}')"
Open the following URL:
<URL printed above>/_cat/indices
Verify that you can see multiple rows of data.
Check the Jarvis Health
Run the following command:
echo "http://$(kubectl get ingress -n <namespace> |fgrep jarvis-apis|awk '{print $2}')"
Run the following command to check the health of Jarvis. Replace the URL with the APIs Ingress URL. The status of all the components should report green.
curl -X GET "<URL printed above>/health" -H "accept: application/json;charset=UTF-8"
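The three ingress lookups above follow the same pattern, so they can share one helper. The sketch below prints all three URLs at once. It assumes that the host is in column 2 of the kubectl get ingress output, as in the commands above; some newer kubectl versions print a CLASS column before HOSTS, in which case the host is in column 3. Both function names are illustrative.

```shell
# Print http://<host> for the first ingress whose name contains the given text.
ingress_url() {
  local ns="$1" name="$2"
  echo "http://$(kubectl get ingress -n "$ns" | awk -v n="$name" 'index($1, n) { print $2; exit }')"
}

# Print the URLs used by the Admin Console, Elasticsearch, and Jarvis checks.
print_dx_urls() {
  local ns="$1"
  echo "Admin Console: $(ingress_url "$ns" apmservices-gateway)/dxiportal"
  echo "Elasticsearch: $(ingress_url "$ns" jarvis-es)/_cat/indices"
  echo "Jarvis health: $(ingress_url "$ns" jarvis-apis)/health"
}

# Usage:
#   print_dx_urls <namespace>
```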