
*Eucalyptus Troubleshooting (1.6) [#eaaf158a]

Eucalyptus cloud admins are encouraged to consult the Known Bugs page before diving into the investigation of unexpected behavior.

The instructions below rely on the euca2ools command-line tools distributed by the Eucalyptus Team. Please install them if you have not done so already.

**1. Restarting [#j33ee349]
Eucalyptus components can be restarted using the init scripts at any time with the 'restart' operation:

 /etc/init.d/eucalyptus-cloud restart
 /etc/init.d/eucalyptus-cc restart
 /etc/init.d/eucalyptus-nc restart

If you need to make a change to the cluster controller or node controller configuration through modification of $EUCALYPTUS/etc/eucalyptus/eucalyptus.conf, you will typically be required to 'stop' and then 'start' the service after the modification has been made:

 /etc/init.d/eucalyptus-cc stop
 /etc/init.d/eucalyptus-cc start
 /etc/init.d/eucalyptus-nc stop
 /etc/init.d/eucalyptus-nc start

Warning: depending on your configuration of Eucalyptus, changes to eucalyptus.conf that drastically alter the way Eucalyptus handles non-Eucalyptus resources (network, hypervisor, etc.) may require that all currently running VMs be terminated before the configuration changes can be successfully applied. In addition, if you are running in any network mode (VNET_MODE) other than SYSTEM, correct VM network connectivity is only ensured while the CC that launched the VMs is running: if the machine hosting a CC that has previously launched VMs fails or reboots, those VMs will lose network connectivity.

If the administrator needs to terminate running VMs for the reasons described above, they can use the client tools to terminate all instances. Alternatively, the admin can manually stop all Eucalyptus components, destroy all running Xen instances using 'xm shutdown' or 'xm destroy' on the nodes, and then start all Eucalyptus components again to return to a clean state.
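The clean-state procedure above can be sketched as follows. This is a hedged sketch, not part of the official tooling: the node hostnames and the dry-run wrapper are assumptions to adapt to your site, and it assumes your admin credentials (eucarc) are already sourced. It prints the commands by default; set DO_IT=1 to actually execute them.

```shell
#!/bin/sh
# Hedged sketch of returning a cloud to a clean state.
# Dry-run by default: set DO_IT=1 to actually execute the commands.
run() { if [ "${DO_IT:-0}" = "1" ]; then "$@"; else echo "would run: $*"; fi; }

# Terminate all instances through the client tools
run sh -c "euca-describe-instances | awk '/^INSTANCE/ {print \$2}' \
           | xargs -r euca-terminate-instances"

# Or manually: stop components, destroy leftover Xen domains on each node
for node in node01 node02; do    # hypothetical node hostnames
    run ssh "$node" "/etc/init.d/eucalyptus-nc stop"
    run ssh "$node" "xm list | awk 'NR>1 && \$1!=\"Domain-0\" {print \$1}' \
                     | xargs -r -n 1 xm destroy"
done
run /etc/init.d/eucalyptus-cc stop
run /etc/init.d/eucalyptus-cloud stop
# ...then start eucalyptus-cloud, eucalyptus-cc, and eucalyptus-nc back up.
```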

**2. Diagnostics [#d06bd0f9]
***Installation/Discovering resources [#t40907ec]
If something is not working right with your Eucalyptus installation, the best first step (after making sure that you have followed the installation/configuration/networking documents faithfully) is to verify that your cloud is up and running, that all of the components are communicating properly, and that there are resources available to run instances. After you have set up and configured Eucalyptus, set up your environment with your admin credentials and use the following command to see the status of your cloud:

 euca-describe-availability-zones verbose

You should see output similar to the following:

 AVAILABILITYZONE        cluster <hostname of your front-end>
 AVAILABILITYZONE        |- vm types     free / max   cpu   ram  disk
 AVAILABILITYZONE        |- m1.small     0128 / 0128   1    128    10
 AVAILABILITYZONE        |- c1.medium    0128 / 0128   1    256    10
 AVAILABILITYZONE        |- m1.large     0064 / 0064   2    512    10
 AVAILABILITYZONE        |- m1.xlarge    0064 / 0064   2   1024    20
 AVAILABILITYZONE        |- c1.xlarge    0032 / 0032   4   2048    20
 AVAILABILITYZONE        |- <node-hostname-a>        certs[cc=true,nc=true] @ Sun Jan 04 15:13:30 PST 2009
 AVAILABILITYZONE        |- <node-hostname-b>        certs[cc=true,nc=true] @ Sun Jan 04 15:13:30 PST 2009
 AVAILABILITYZONE        |- <node-hostname-c>        certs[cc=true,nc=true] @ Sun Jan 04 15:13:30 PST 2009
 AVAILABILITYZONE        |- <node-hostname-d>        certs[cc=true,nc=true] @ Sun Jan 04 15:13:30 PST 2009
 AVAILABILITYZONE        |- <node-hostname-e>        certs[cc=true,nc=true] @ Sun Jan 04 15:13:30 PST 2009
 AVAILABILITYZONE        |- <node-hostname-f>        certs[cc=true,nc=true] @ Sun Jan 04 15:13:30 PST 2009
 ...

Next, the administrator should consult the Eucalyptus logfiles. On each machine running a Eucalyptus component, the logfiles are located in:

 $EUCALYPTUS/var/log/eucalyptus/

On the front-end, the Cloud Controller (CLC) logs primarily to 'cloud-output.log' and 'cloud-debug.log'. Consult these files if your client tools (e.g., the ec2 API tools) print exception messages, or if you suspect that none of your operations are ever being executed (you never see Xen activity on the nodes, network configuration activity on the front-end, etc.).

The Cluster Controller (CC) also resides on the front-end, and logs to 'cc.log' and 'httpd-cc_error_log'. Consult these logfiles in general, but especially if you suspect a problem with networking. 'cc.log' contains log entries from the CC itself, and 'httpd-cc_error_log' contains the STDERR/STDOUT of any external commands that the CC executes at runtime.

A Node Controller (NC) will run on every machine in the system that you have configured to run VM instances. The NC logs to 'nc.log' and 'httpd-nc_error_log'. Consult these files in general, but especially if you believe that there is a problem with VM instances actually running (i.e., instances appear to try to run - they get submitted, go into the 'pending' state, then go directly into 'terminated' - but fail to stay running).
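As a quick starting point, all of the logfiles described above can be scanned for trouble in one pass. This is only a sketch: the /opt/eucalyptus prefix and the search pattern are assumptions, so adjust both to your installation.

```shell
#!/bin/sh
# Hedged sketch: scan all Eucalyptus logs on this machine for problems.
# /opt/eucalyptus is an assumed default; set EUCALYPTUS to your prefix.
LOGDIR="${EUCALYPTUS:-/opt/eucalyptus}/var/log/eucalyptus"
# -i: case-insensitive match; tail keeps the output manageable on busy clouds
grep -iE 'error|exception|fatal' "$LOGDIR"/*.log 2>/dev/null | tail -n 50
```

Run this on each machine (front-end and nodes), since each component logs locally.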

***Node Controller troubleshooting [#g6a9697f]
-If nc.log reports "Failed to connect to hypervisor," Xen/KVM plus libvirt is not functioning correctly on that node.
-If the NC cannot be contacted, make sure that you have synchronized keys to the nodes and that the keys are owned by the user that you are running the NC as (EUCA_USER in eucalyptus.conf).
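Both checks can be performed directly on a node. This is a hedged sketch: the keys directory below is the typical 1.6 default and may differ on your installation, and the libvirt check should be run as EUCA_USER.

```shell
#!/bin/sh
# Hedged sketch: check NC prerequisites on a node.
# The keys directory is an assumed 1.6 default; adjust EUCALYPTUS as needed.
EUCALYPTUS="${EUCALYPTUS:-/opt/eucalyptus}"
KEYDIR="$EUCALYPTUS/var/lib/eucalyptus/keys"

# Keys must exist and be owned by EUCA_USER from eucalyptus.conf
ls -l "$KEYDIR" 2>/dev/null || echo "no keys found in $KEYDIR"

# libvirt must be able to reach the hypervisor (run this as EUCA_USER)
if command -v virsh >/dev/null 2>&1; then
    virsh version || echo "libvirt cannot reach the hypervisor"
else
    echo "virsh not found; check your libvirt installation"
fi
```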

***Walrus troubleshooting [#r5580801]
"ec2-upload-bundle" will report a "409" error when uploading to a bucket that already exists. This is a known compatibility issue between the ec2 tools and Eucalyptus. The workaround is to use ec2-delete-bundle with the "--clear" option to delete the bundle and the bucket before uploading to a bucket with the same name, or to use a different bucket name.
Note: if you are using Euca2ools, this workaround is not necessary.

When using "ec2-upload-bundle," make sure that there is no "/" at the end of the bucket name.
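The workaround above can be scripted roughly as follows. The bucket and prefix names are placeholders, and AMI-tools credentials are assumed to be configured in the usual way; the snippet also strips a trailing "/" from the bucket name, per the note above.

```shell
#!/bin/sh
# Hedged sketch: recover from a 409 on ec2-upload-bundle.
# BUCKET and PREFIX are placeholders; credentials are assumed to be
# set up the usual way for the ec2 AMI tools.
BUCKET=mybucket
PREFIX=myimage
BUCKET=${BUCKET%/}   # a trailing "/" on the bucket name also causes failures

if command -v ec2-delete-bundle >/dev/null 2>&1; then
    # --clear removes the bundle and the bucket itself
    ec2-delete-bundle -b "$BUCKET" -p "$PREFIX" --clear
    ec2-upload-bundle -b "$BUCKET" -m "/tmp/$PREFIX.manifest.xml"
else
    echo "ec2 AMI tools not found in PATH; commands shown for reference only"
fi
```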

***Block storage troubleshooting [#w524c7e6]
Unable to attach volumes when the front end and the NC are running on the same machine. This is a known issue with ATA over Ethernet (AoE). AoE will not export to the same machine that the server is running on. The workaround is to run the front end and the node controller on different hosts.
Volume ends up in "deleted" state when created, instead of showing up as "available." Look for error messages in $EUCALYPTUS/var/log/eucalyptus/cloud-error.log. A common problem is that ATA-over-Ethernet may not be able to export the created volume (this will appear as a "Could not export..." message in cloud-error.log). Make sure that "VNET_INTERFACE" in eucalyptus.conf on the front end is correct.
Failure to create volume/snapshot. Make sure you have enough loopback devices. If you are installing from packages, you will get a warning. On most distributions, the loopback driver is installed as a module. The following will increase the number of loopback devices available,
rmmod loop ; modprobe loop max_loop=256
If block devices do not automatically appear in your VMs, make sure that you have the "udev" package installed.
If you are running gentoo and you get "which: no vblade in ((null)).", try compiling "su" without pam.
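To make the larger loop-device count survive a reboot, a modprobe options line can be used. The file path below is an assumption; the exact location of modprobe configuration varies by distribution, so check your distro's documentation.

```
# /etc/modprobe.d/loop.conf  -- path varies by distribution
options loop max_loop=256
```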