Troubleshoot the Installation

Use the following solutions when you encounter issues with installation or upgrade:
If you are using your own provisioned host, all Docker and Portal installation commands must be prepended with sudo, for example, sudo ./portal.sh.
How to Inspect Services
To view all Docker services:
docker service ls
To view details for a specific service:
docker service inspect <serviceName>
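For a more readable summary, the following commands can also help (a minimal sketch; replace <serviceName> with your own service name):
# Human-readable summary instead of raw JSON
docker service inspect --pretty <serviceName>
# List the tasks behind the service, including failed ones, with full error messages
docker service ps --no-trunc <serviceName>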
Deployment Failure
Symptom
The API Portal deployment failed.
Solution
Perform the following steps to troubleshoot a failed deployment:
  1. Ensure that Docker is running.
    For Docker information, see https://docs.docker.com/engine/admin/.
  2. View the services using the following command:
    docker service ls
    Look for services with 0/X under the REPLICAS column in the following example:
    ID            NAME                      MODE    REPLICAS  ...
    h8uty1lrwq9r  portal_analytics-server   global  0/1       ...
    2vs9wy6mwvac  portal_apim               global  1/1       ...
  3. View the logs for the failed services using the following command (a combined sketch for steps 2 and 3 follows this list):
    docker service logs -f <service_name>
  4. View the Docker logs using the following command:
    journalctl -fu docker
  5. Restart the failed service using the following command:
    docker service update --force <service_name | service_id>
  6. View the status of all nodes using the following command:
    docker node ls
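The following is a minimal sketch that combines steps 2 and 3: it lists every service with zero running replicas and tails its recent logs. The awk filter and the --tail value are assumptions; adjust them to your environment.
# List services whose REPLICAS column starts with 0/ and show their recent logs
for svc in $(docker service ls --format '{{.Name}} {{.Replicas}}' | awk '$2 ~ /^0\// {print $1}'); do
  echo "=== $svc ==="
  docker service logs --tail 100 "$svc"
done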
Script Failure
Symptom
Executing config.sh and portal.sh failed.
Solution
Perform the following steps to troubleshoot a failed execution of config.sh and portal.sh:
The API Portal installation requires external network connectivity to retrieve the required packages when the installation packages are not on the host system.
  1. Ensure that you can reach external resources, using the following example command:
    ping google.com
  2. Ensure that the network service is running using the following example command (a combined check follows this list):
    service network status
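A quick connectivity check that combines both steps; google.com is only an example target, and on systemd-based hosts such as CentOS 7 you can also query the service through systemctl:
# Confirm DNS resolution and outbound reachability
ping -c 3 google.com
# systemd equivalent of 'service network status'
systemctl status network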
External Network Unavailable
Symptom
External network connectivity is not available.
Solution
When external network connectivity is not available, use the following method to bypass external resource access:
  1. Back up the existing portal.sh file using the following command:
    cp portal.sh portal.sh.bak
  2. Execute the following command (see the explanation after this list):
    sed -i 's/docker \(login\|pull\)/true/' portal.sh
  3. Execute portal.sh to start the API Portal.
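The sed command in step 2 replaces every docker login and docker pull invocation in portal.sh with true, so those lines become no-ops and the script skips contacting the external registry. You can review exactly what changed by comparing against the backup:
# Show the lines that the sed substitution modified
diff portal.sh.bak portal.sh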
Database Lockup during Upgrade 
Symptom
Services from Authenticator, portal-data, and Portal Enterprise are not started when upgrading to the latest version, causing the database to lock up.
The following sample log messages show a changelog lock exception:
[email protected] | 2018-04-13 22:36:44.402 [INFO ] org.springframework.boot.liquibase.CommonsLoggingLiquibaseLogger - Waiting for changelog lock....
[email protected] | 2018-04-13 22:42:24.966 [INFO ] org.springframework.boot.liquibase.CommonsLoggingLiquibaseLogger - Waiting for changelog lock....
[email protected] | 2018-04-13 22:52:49.660 [INFO ] org.springframework.boot.web.servlet.AbstractFilterRegistrationBean - Mapping filter: 'loggingFilter' to: [/*]
[email protected] | 2018-04-13 22:52:49.661 [INFO ] org.springframework.boot.web.servlet.AbstractFilterRegistrationBean - Mapping filter: 'characterEncodingFilter' to: [/*]
[email protected] | 2018-04-13 22:36:49.693 [DEBUG] org.springframework.security.saml.metadata.MetadataManager - Reloading metadata
Solution
To unlock the database and proceed with the upgrade:
  • If you’re using the out-of-the-box PostgreSQL database, run the following SQL query on the PostgreSQL database (a verification sketch follows this list):
    docker exec -it <container ID of 'portal_portaldb', obtained from 'docker ps'> psql -U admin portal --command "UPDATE DATABASECHANGELOGLOCK SET locked='false', lockgranted=null, lockedby=null WHERE id=1;"
  • If you’re using your own MySQL database, run the following commands on the MySQL console:
    mysql> use portal;
    mysql> UPDATE DATABASECHANGELOGLOCK SET locked=0, lockgranted=null, lockedby=null WHERE id=1;
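The following verification sketch assumes the out-of-the-box PostgreSQL deployment. It locates the portal_portaldb container and shows the current state of the lock row, which should report locked as f (false) after the update:
# Find the container ID of the PostgreSQL service (pick the primary if portal_portaldb-slave is also listed)
docker ps --filter name=portal_portaldb --format '{{.ID}} {{.Names}}'
# Inspect the Liquibase lock row (replace <containerID> with the ID from the previous command)
docker exec -it <containerID> psql -U admin portal --command "SELECT id, locked, lockgranted, lockedby FROM DATABASECHANGELOGLOCK;"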
For more information on Portal services, see API Portal Architecture.
Health Check Failure during Upgrade
Symptom
The Portal Enterprise service cannot be started due to a health check failure after upgrading, showing 0/X under the REPLICAS column in the output of the docker service ls command. This is due to a health check timeout while patches are being applied after the upgrade.
The following sample log messages show a health check failure:
[email protected] | 2018-04-18 20:15:47,149 main ERROR Unable to register shutdown hook because JVM is shutting down. java.lang.IllegalStateException: Cannot add new shutdown hook as this is not started. Current state: STOPPED
[email protected] | at org.apache.logging.log4j.core.util.DefaultShutdownCallbackRegistry.addShutdownCallback(DefaultShutdownCallbackRegistry.java:113)
[email protected] | at org.apache.logging.log4j.core.impl.Log4jContextFactory.addShutdownCallback(Log4jContextFactory.java:273)
[email protected] | at org.apache.logging.log4j.core.LoggerContext.setUpShutdownHook(LoggerContext.java:256)
[email protected] | at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:216)
Solution
Run the following command to increase the health check start period and restart the services:
sudo docker service update portal_portal-enterprise --health-start-period=10m
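After updating the start period, you can watch the service recover; the REPLICAS column should eventually read 1/1:
# Check the replica count for the Portal Enterprise service
docker service ls --filter name=portal_portal-enterprise
# Follow its logs while the post-upgrade patches are applied
docker service logs -f portal_portal-enterprise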
For more information on Portal services, see API Portal Architecture.
Restart Deployment
API Portal installation issues are resolved automatically when the Docker daemon is running. If you must restart the deployment, follow these steps:
  1. Remove the existing deployment (services only) using the following command:
    docker stack rm portal
    Note: Removing the existing deployment does not remove the persistent data. To remove persistent data, use:
    docker volume rm $(docker volume ls -q)
  2. Run the deployment again (a combined sketch follows this list).
  3. If you encounter any errors, repeat step 1.
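A combined sketch of the restart sequence, assuming the stack is named portal and portal.sh is in the current directory; uncomment the volume removal only if you intend to discard all persistent data:
# Remove the running services (persistent data is kept)
docker stack rm portal
# Destructive, optional: removes ALL Docker volumes on this host, including Portal data
# docker volume rm $(docker volume ls -q)
# Run the deployment again
sudo ./portal.sh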
Portal Policies Missing from the Enrolled Gateway
Symptom
A number of Portal integration policies (for example, the Standard Policy Template Fragment) are not created in the enrolled Gateway after the enrollment process is completed.
Solution
Perform the following steps to troubleshoot a failed enrollment:
  1. Inside the Gateway appliance, ping the FQDN of the portal to make sure the Gateway can reach the Portal server through the customer’s DNS server.
    ping apim.<portal domain>
  2. If the FQDN of the portal is not reachable through the customer’s DNS server, add the FQDN of the portal to /etc/hosts inside the Gateway appliance as a short-term fix (an example entry follows this list). Refer to Configure Your DNS Server for more information.
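A hypothetical /etc/hosts entry on the Gateway appliance, assuming the Portal host resolves to 192.0.2.10 and the Portal domain is example.com; substitute your own IP address and domain:
# Short-term workaround only: map the Portal FQDN to its IP address
192.0.2.10   apim.example.com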
Unable to Send Mail through External Mail Server
Symptom
You are unable to send mail through your external mail server. This occurs when your mail server is on the 172.18.0.0/16 subnet and you are using the default settings for the docker_gwbridge network: Docker uses the 172.18.0.0/16 subnet internally by default, so your mail server is unreachable.
Sample log messages that accompany this situation are shown next:
May 03 13:05:44 yourserver.yourdomain.com dockerd[23046]: WARNING: 4: Unable to send email: Could not connect to SMTP host: 172.18.5.22, port: 25. Exception caught!
May 03 13:05:44 yourserver.yourdomain.com dockerd[23046]: May 03, 2018 7:35:44 AM com.l7tech.server.policy.assertion.alert.ServerEmailAlertAssertion
May 03 13:05:44 yourserver.yourdomain.com dockerd[23046]: WARNING: 4: Unable to send email: Could not connect to SMTP host: 172.18.5.22, port: 25. Exception caught!
May 03 13:05:44 yourserver.yourdomain.com dockerd[23046]: May 03, 2018 7:35:44 AM com.l7tech.server.policy.assertion.alert.ServerEmailAlertAssertion
May 03 13:05:44 yourserver.yourdomain.com dockerd[23046]: WARNING: 4: Unable to send email: Unknown SMTP host: mail.ca.com. Exception caught!
May 03 13:05:44 yourserver.yourdomain.com dockerd[23046]: May 03, 2018 7:35:44 AM com.l7tech.server.policy.assertion.ServerAuditDetailAssertion
May 03 13:05:44 yourserver.yourdomain.com dockerd[23046]: INFO: -4: transactionId:,sessionId:,requestId:00000163210b978a-13b20,username:,statusCode:500,domain:
May 03 13:05:44 yourserver.yourdomain.com dockerd[23046]: WARNING: 3016: Request routing failed with status 600 (Assertion Falsified)
May 03 13:05:44 yourserver.yourdomain.com dockerd[23046]: May 03, 2018 7:35:44 AM com.l7tech.server.message
May 03 13:05:44 yourserver.yourdomain.com dockerd[23046]: WARNING: Message was not processed: Assertion Falsified (600)
Solution
Remove and re-create the docker_gwbridge network with custom settings using a subnet that is not in use on your network:
  1. Stop the API Portal:
    docker stack rm portal
  2. Delete the existing docker_gwbridge interface:
    docker network rm docker_gwbridge
  3. (Optional) If you receive errors while running the previous command, this may be caused by a Docker bug that prevents the docker_gwbridge network from being properly removed. Run the following command, then repeat step 2:
    docker network disconnect --force docker_gwbridge gateway_ingress-sbox
  4. Re-create the docker_gwbridge network using custom settings. In this example, you assign the 172.20.0.0/16 subnet to the docker_gwbridge network with a default gateway address of 172.20.0.1 (a verification command follows this procedure):
     
    docker network create \
      --subnet 172.20.0.0/16 \
      --opt com.docker.network.bridge.name=docker_gwbridge \
      --opt com.docker.network.bridge.enable_icc=false \
      --opt com.docker.network.bridge.enable_ip_masquerade=true \
      --gateway 172.20.0.1 \
      docker_gwbridge
  5. Restart the Portal:
    sudo ./portal.sh
Info: For more information about the docker_gwbridge network, see How do I change the docker gwbridge address?
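After re-creating the network, you can confirm that the custom subnet and gateway were applied; the exact output format may vary by Docker version:
# Should print the subnet and gateway configured above, for example: 172.20.0.0/16 172.20.0.1
docker network inspect docker_gwbridge --format '{{range .IPAM.Config}}{{.Subnet}} {{.Gateway}}{{end}}'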
Unable to Start Jarvis Elasticsearch and ZooKeeper Containers after Updating to CentOS 7.5
Symptom
The Elasticsearch and ZooKeeper containers fail to start on a CA-provided hardened image, or on any system with SELinux in enforcing mode, after updating to CentOS 7.5. This is caused by a recently fixed SELinux bug: https://www.seimaxim.com/knowledgebase/263/Container-running-systemd-fails-to-run-after-upgrade-to-Red-Hat-Enterprise-Linux-75.html
Troubleshooting
Use the following commands to confirm whether you are affected by this issue:
  • docker service ls shows that the Elasticsearch and ZooKeeper containers are not started:
    ID            NAME                        MODE        REPLICAS  IMAGE  PORTS
    xzc769r8yzpk  portal_analytics-server     global      1/1       apim-portal.packages.ca.com/apim-portal/analytics-server:4.2.7.1
    k0mbl0hvs7b6  portal_apim                 global      1/1       apim-portal.packages.ca.com/apim-portal/ingress:4.2.7.1
    d3lxvmb7t118  portal_apis                 replicated  1/1       jarvis.packages.ca.com/analytics/jarvis_api:2.3.1.99  *:8080->8080/tcp
    0xor8ejvqe88  portal_authenticator        global      1/1       apim-portal.packages.ca.com/apim-portal/authenticator:4.2.7.1
    ti8kb703ifcs  portal_dispatcher           global      1/1       apim-portal.packages.ca.com/apim-portal/dispatcher:4.2.7.1
    7bv3d141s3ds  portal_elasticsearch        global      0/1       jarvis.packages.ca.com/analytics/elasticsearch-5.6.5:2.3.1.134
    1ki05ncmwtjc  portal_indexer              replicated  1/1       jarvis.packages.ca.com/analytics/jarvis_indexer:2.3.1.47
    1pex8jsogwgz  portal_kafka1               replicated  1/1       jarvis.packages.ca.com/analytics/kafka-0.10.1.0:2.3.1.130
    o3rce3oqre8g  portal_kron                 replicated  1/1       jarvis.packages.ca.com/analytics/jarvis_kron:2.3.1.23  *:8081->8080/tcp
    opgg4276d9xr  portal_ldds-web             replicated  1/1       jarvis.packages.ca.com/analytics/ldds:2.1.8.1
    w9bgxtodmpq4  portal_portal-data          global      1/1       apim-portal.packages.ca.com/apim-portal/portal-data:4.2.7.1
    y150u38dm29u  portal_portal-enterprise    global      1/1       apim-portal.packages.ca.com/apim-portal/portal-enterprise:4.2.7.1
    mcy62djmnxji  portal_portaldb             replicated  1/1       apim-portal.packages.ca.com/apim-portal/postgres:4.2.7.1
    oj0p2n8wf4d7  portal_portaldb-slave       replicated  1/1       apim-portal.packages.ca.com/apim-portal/postgres:4.2.7.1
    036lz8zhwilp  portal_pssg                 global      1/1       apim-portal.packages.ca.com/apim-portal/pssg:4.2.7.1
    a3dh5sfcganz  portal_rabbitmq             replicated  1/1       apim-portal.packages.ca.com/apim-portal/message-broker:4.2.7.1
    rikvjuqrlvjv  portal_rabbitmq-worker      global      1/1       apim-portal.packages.ca.com/apim-portal/message-broker:4.2.7.1
    7rdm5ze4felp  portal_schemaregistry       replicated  1/1       jarvis.packages.ca.com/analytics/jarvis_schema_registry:2.3.1.17
    cww0kowsk1ao  portal_smtp                 replicated  1/1       apim-portal.packages.ca.com/apim-portal/smtp:4.2.7.1
    v5f3dnssir0n  portal_solr                 replicated  1/1       apim-portal.packages.ca.com/apim-portal/solr:4.2.7.1
    k7xl21iz1fu9  portal_tenant-provisioner   replicated  1/1       apim-portal.packages.ca.com/apim-portal/tenant-provisioning-service:4.2.7.1
    r60cpsbxzvht  portal_utils                replicated  1/1       jarvis.packages.ca.com/analytics/jarvis_es_utils:2.3.1.87
    94ix3ao23au3  portal_verifier             replicated  1/1       jarvis.packages.ca.com/analytics/jarvis_verifier:2.3.1.49
    m8cz5qibq20t  portal_zookeeper1           replicated  0/1       jarvis.packages.ca.com/analytics/zookeeper-3.4.8:2.3.1.126
  • journalctl -u docker shows output with the following errors:
    cgget: cannot read group '/': Permission denied
    System max memory limit (mb): 32012
    cgroup limit enforced with max memory limit (mb):
    Ram ratio to use: 0.8
    Memory limits: min=0m, max=0m
    ZooKeeper JMX enabled by default
    Using config: /opt/ca/zookeeper/bin/../conf/zoo.cfg
    Invalid maximum heap size: -Xmx0m
    Error: Could not create the Java Virtual Machine.
    Error: A fatal exception has occurred. Program will exit.
    cgget: cannot read group '/': Permission denied
    System max memory limit (mb): 32012
    cgroup limit enforced with max memory limit (mb):
    Ram ratio to use: 0.6
    Memory limits: min=0m, max=0m
    Adding node.attr.box_type to elasticsearch.yml
    Invalid maximum heap size: -Xmx0m
    Error: Could not create the Java Virtual Machine.
    Error: A fatal exception has occurred. Program will exit.
Solution
Run the following command:
sudo setsebool -P container_manage_cgroup 1
This change is effective immediately with no further action required. The ZooKeeper and Elasticsearch containers should start automatically after you run this command.
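To confirm that the boolean was set persistently, you can query it with getsebool:
# Expected output: container_manage_cgroup --> on
getsebool container_manage_cgroup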