Debugging Tips

Sometimes things don't work the way they should. Here are some tips for debugging issues in production.

Starting a Rails console session

Troubleshooting and debugging your GitLab instance often requires a Rails console.

How you start a Rails console depends on your type of GitLab installation.

Enabling Active Record logging

You can enable output of Active Record debug logging in the Rails console session by running:

ActiveRecord::Base.logger = Logger.new(STDOUT)

This will show information about database queries triggered by any Ruby code you may run in the console. To turn off logging again, run:

ActiveRecord::Base.logger = nil
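
If you only want query logging around a few statements, one option is to swap the logger in for the duration of a block and restore the previous one afterwards. The following is a minimal sketch, not something GitLab ships: with_temporary_logger and FakeModel are hypothetical names, and FakeModel only stands in for ActiveRecord::Base so the snippet runs outside a Rails console.

```ruby
require 'logger'
require 'stringio'

# Hypothetical helper: point target.logger at io while the block runs,
# then restore whatever logger was set before.
def with_temporary_logger(target, io = $stdout)
  previous = target.logger
  target.logger = Logger.new(io)
  yield
ensure
  target.logger = previous
end

# Stand-in for ActiveRecord::Base so the sketch runs without Rails.
FakeModel = Struct.new(:logger)
model = FakeModel.new(nil)
output = StringIO.new

with_temporary_logger(model, output) do
  model.logger.debug('SELECT 1')
end

puts output.string          # the captured log line
puts model.logger.inspect   # the previous logger (nil here) is back in place
```

In a Rails console you would pass ActiveRecord::Base as the target and run your queries inside the block.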

Disabling database statement timeout

You can disable the PostgreSQL statement timeout for the current Rails console session by running:

ActiveRecord::Base.connection.execute('SET statement_timeout TO 0')

Note that this change only affects the current Rails console session and will not be persisted in the GitLab production environment or in the next Rails console session.

Output Rails console session history

If you'd like to output your Rails console command history in a format that's easy to copy and save for future reference, you can run:

puts Readline::HISTORY.to_a
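
To keep the history in a file instead of copying it from the terminal, you could write the entries out directly. This is a minimal sketch: save_history and the file name are hypothetical, and the sample array stands in for Readline::HISTORY so the snippet runs outside a console.

```ruby
require 'tmpdir'

# Hypothetical helper: write history entries to path, one per line,
# and return the number of entries written.
def save_history(entries, path)
  lines = entries.to_a
  File.write(path, lines.join("\n") + "\n")
  lines.size
end

# Sample entries standing in for Readline::HISTORY.
sample = ['user = User.first', 'puts user.username']
history_path = File.join(Dir.mktmpdir, 'console_history.rb')
saved = save_history(sample, history_path)
puts "Saved #{saved} entries to #{history_path}"
```

In a console session you would call save_history(Readline::HISTORY, '/path/of/your/choice') instead.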

Using the Rails runner

If you need to run some Ruby code in the context of your GitLab production environment, you can do so using the Rails runner. When executing a script file, the script must be accessible by the git user.

For Omnibus installations

sudo gitlab-rails runner "RAILS_COMMAND"

# Example with a two-line Ruby script
sudo gitlab-rails runner "user = User.first; puts user.username"

# Example with a ruby script file (make sure to use the full path)
sudo gitlab-rails runner /path/to/script.rb

For installations from source

sudo -u git -H bundle exec rails runner -e production "RAILS_COMMAND"

# Example with a two-line Ruby script
sudo -u git -H bundle exec rails runner -e production "user = User.first; puts user.username"

# Example with a ruby script file (make sure to use the full path)
sudo -u git -H bundle exec rails runner -e production /path/to/script.rb

Mail not working

A common problem is that mails are not being sent for some reason. Suppose you configured an SMTP server, but you're not seeing mail delivered. Here's how to check the settings:

  1. Run a Rails console.

  2. Look at the ActionMailer delivery_method to make sure it matches what you intended. If you configured SMTP, it should say :smtp. If you're using Sendmail, it should say :sendmail:

    irb(main):001:0> ActionMailer::Base.delivery_method
    => :smtp

  3. If you're using SMTP, check the mail settings:

    irb(main):002:0> ActionMailer::Base.smtp_settings
    => {:address=>"localhost", :port=>25, :domain=>"localhost.localdomain", :user_name=>nil, :password=>nil, :authentication=>nil, :enable_starttls_auto=>true}

    In the example above, the SMTP server is configured for the local machine. If this is intended, you may need to check your local mail logs (e.g. /var/log/mail.log) for more details.

  4. Send a test message via the console.

    irb(main):003:0> Notify.test_email('youremail@email.com', 'Hello World', 'This is a test message').deliver_now

    If you do not receive an e-mail and/or see an error message, then check your mail server settings.
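
Before digging into mail server settings, it can help to confirm that the configured SMTP address and port accept connections at all. The sketch below is an assumption-laden illustration: smtp_reachable? is a hypothetical helper, and it only tests TCP reachability, not authentication, TLS, or actual delivery.

```ruby
require 'socket'

# Hypothetical helper: return true if a TCP connection to address:port
# succeeds within timeout seconds, false otherwise.
def smtp_reachable?(address, port, timeout: 5)
  Socket.tcp(address, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Address and port taken from the smtp_settings example above.
puts smtp_reachable?('localhost', 25)
```

In a Rails console you could pass ActionMailer::Base.smtp_settings[:address] and ActionMailer::Base.smtp_settings[:port] instead of the literal values.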

Advanced Issues

For more advanced issues, gdb is a must-have debugging tool.

The GNU Project Debugger (GDB)

To install on Ubuntu/Debian:

sudo apt-get install gdb

On CentOS:

sudo yum install gdb

rbtrace

GitLab 11.2 ships with rbtrace, which allows you to trace Ruby code, view all running threads, take memory dumps, and more. However, it is not enabled by default. To enable it, define the ENABLE_RBTRACE variable in the environment. For example, in Omnibus:

gitlab_rails['env'] = {"ENABLE_RBTRACE" => "1"}

Then reconfigure the system and restart Unicorn and Sidekiq. To run rbtrace in Omnibus, run the following as root:

/opt/gitlab/embedded/bin/ruby /opt/gitlab/embedded/bin/rbtrace

Common Problems

Many of the diagnostic tips below apply to a wide range of situations. We'll use one concrete example to illustrate how you can find out what is going wrong.

502 Gateway Timeout after Unicorn spins at 100% CPU

This error occurs when the web server times out (default: 60 s) after not hearing back from the Unicorn worker. If the CPU spins to 100% while this is in progress, something may be taking longer than it should.

To fix this issue, we first need to figure out what is happening. The following tips are only recommended if you do NOT mind users being affected by downtime. Otherwise skip to the next section.

  1. Load the problematic URL

  2. Run sudo gdb -p <PID> to attach to the Unicorn process.

  3. In the GDB window, type:

    call (void) rb_backtrace()

  4. This forces the process to generate a Ruby backtrace. Check /var/log/gitlab/unicorn/unicorn_stderr.log for the backtrace. For example, you may see:

    from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:33:in `block in start'
    from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:33:in `loop'
    from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:36:in `block (2 levels) in start'
    from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:44:in `sample'
    from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:68:in `sample_objects'
    from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:68:in `each_with_object'
    from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:68:in `each'
    from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:69:in `block in sample_objects'
    from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:69:in `name'

  5. To see the current threads, run:

    thread apply all bt

  6. Once you're done debugging with gdb, be sure to detach from the process and exit:

    detach
    exit

Note that if the Unicorn process terminates before you are able to run these commands, GDB will report an error. To buy more time, you can raise the Unicorn timeout. For Omnibus users, edit /etc/gitlab/gitlab.rb and increase it from 60 seconds to 300:

unicorn['worker_timeout'] = 300

For source installations, edit config/unicorn.rb.

Reconfigure GitLab for the changes to take effect.
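
Step 2 above needs the PID of the busy Unicorn worker. One way to find it without watching top is to rank processes by the CPU column of ps output. The following is a rough sketch: busiest_processes is a hypothetical helper and the sample output is made up for illustration. Keep in mind that on Linux, ps typically reports CPU usage averaged over the lifetime of the process, so top remains the better view for short spikes.

```ruby
# Hypothetical helper: parse `ps -eo pid,pcpu,args` output and return the
# `limit` busiest processes whose command line matches `pattern`.
def busiest_processes(ps_output, pattern: /unicorn/, limit: 3)
  rows = ps_output.lines.drop(1).map do |line|
    pid, cpu, command = line.strip.split(/\s+/, 3)
    { pid: pid.to_i, cpu: cpu.to_f, command: command.to_s }
  end
  rows.select { |row| row[:command].match?(pattern) }
      .max_by(limit) { |row| row[:cpu] }
end

# Made-up sample output; on a real host use: `ps -eo pid,pcpu,args`
sample = <<~PS
    PID %CPU COMMAND
   1201  0.3 unicorn master -D -E production
   1342 99.7 unicorn worker[0] -D -E production
   1343  1.2 unicorn worker[1] -D -E production
   2044  4.0 sidekiq 5.2.7 gitlab-rails
PS

busiest_processes(sample).each do |row|
  puts format('%-6d %5.1f %s', row[:pid], row[:cpu], row[:command])
end
```

The first row printed is the candidate to attach to with gdb.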

Troubleshooting without affecting other users

The previous section attached to a running Unicorn process, which may have undesirable effects for users trying to access GitLab during this time. If you are concerned about affecting others on a production system, you can run a separate Rails process to debug the issue:

  1. Log in to your GitLab account.

  2. Copy the URL that is causing problems (e.g. https://gitlab.com/ABC).

  3. Create a Personal Access Token for your user (User Settings -> Access Tokens).

  4. Bring up the GitLab Rails console.

  5. At the Rails console, run:

    app.get '<URL FROM STEP 2>/?private_token=<TOKEN FROM STEP 3>'

    For example:

    app.get 'https://gitlab.com/gitlab-org/gitlab-foss/-/issues/1?private_token=123456'

  6. In a new window, run top. It should show this Ruby process using 100% CPU. Write down the PID.

  7. Follow step 2 from the previous section on using GDB.
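
When the problematic URL from step 2 already carries query parameters, appending ?private_token=... by hand produces a malformed URL. A small helper can build the string passed to app.get in step 5. This is only a sketch: url_with_token is a hypothetical name, and gitlab.example.com and the token value are placeholders.

```ruby
require 'uri'

# Hypothetical helper: append private_token to url, keeping any query
# parameters the URL already has.
def url_with_token(url, token)
  uri = URI.parse(url)
  params = URI.decode_www_form(uri.query.to_s)
  params << ['private_token', token]
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

puts url_with_token('https://gitlab.example.com/group/project/-/issues/1', '123456')
puts url_with_token('https://gitlab.example.com/dashboard/issues?state=opened', '123456')
```

The result can then be passed to app.get at the Rails console.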

GitLab: API is not accessible

This often occurs when GitLab Shell attempts to request authorization via the internal API (e.g., http://localhost:8080/api/v4/internal/allowed), and something in the check fails. There are many reasons why this may happen:

  1. Timeout connecting to a database (e.g., PostgreSQL or Redis)
  2. Error in Git hooks or push rules
  3. Error accessing the repository (e.g., stale NFS handles)

To diagnose this problem, try to reproduce it and then use top to see whether a Unicorn worker is spinning. Try the gdb techniques above. In addition, using strace may help isolate issues:

strace -ttTfyyy -s 1024 -p <PID of unicorn worker> -o /tmp/unicorn.txt

If you cannot isolate which Unicorn worker is the issue, try to run strace on all the Unicorn workers to see where the /internal/allowed endpoint gets stuck:

ps auwx | grep '[u]nicorn' | awk '{ print " -p " $2}' | xargs strace -ttTfyyy -s 1024 -o /tmp/unicorn.txt

The output in /tmp/unicorn.txt may help diagnose the root cause.
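
Because the -T option makes strace append the time spent in each call as <seconds> at the end of the line, sorting on that value is a quick way to spot where a worker is stuck. The following is a minimal sketch: slowest_syscalls is a hypothetical helper, and the sample lines are made up to resemble the output format rather than taken from a real trace.

```ruby
# Hypothetical helper: return the `limit` slowest calls from strace output
# recorded with -T, which appends the time spent in each call as <seconds>.
def slowest_syscalls(lines, limit: 5)
  timed = lines.filter_map do |line|
    match = line.match(/<(\d+\.\d+)>\s*\z/)
    { seconds: match[1].to_f, line: line.strip } if match
  end
  timed.max_by(limit) { |entry| entry[:seconds] }
end

# Made-up sample lines in the shape produced by the strace options above.
sample = [
  '4721 10:15:01.000100 read(12</dev/urandom>, "...", 16) = 16 <0.000021>',
  '4721 10:15:01.000300 poll([{fd=14, events=POLLIN}], 1, 60000) = 1 <4.998713>',
  '4721 10:15:06.000900 write(14<TCP:[127.0.0.1:51000->127.0.0.1:5432]>, "...", 64) = 64 <0.000035>',
  '4721 10:15:06.001200 exit_group(0) = ?'
]

slowest_syscalls(sample, limit: 2).each do |entry|
  puts format('%.6f  %s', entry[:seconds], entry[:line])
end
```

On a real host you could pass File.foreach('/tmp/unicorn.txt') instead of the sample array. Lines without a trailing duration, such as unfinished calls, are skipped.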

More information