Wednesday, June 5, 2019

How to find out what processes were killed by OOM on Linux

Problem

Services with high memory utilization are dying sporadically.

Solution

Run the following command to list the processes killed by the OOM killer.

dmesg | grep -Ei 'killed process'
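
To pull just the PID and process name out of those kernel messages, a small filter like the sketch below may help. It assumes the usual kernel wording "Killed process <pid> (<name>)"; the function name oom_victims is made up for this example.

```shell
# oom_victims: read kernel log text on stdin and print "<pid> <name>"
# for every line containing "Killed process <pid> (<name>)".
# Assumption: the kernel uses this wording; other lines are ignored.
oom_victims() {
  sed -n -E 's/.*[Kk]illed process ([0-9]+) \(([^)]*)\).*/\1 \2/p'
}

# Usage (reading the kernel ring buffer may require root):
#   dmesg | oom_victims
```

The filter only reads text, so it can also be pointed at saved logs instead of the live ring buffer.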


Useful articles:

1. How to troubleshoot OOM
https://access.redhat.com/solutions/2612861

2. How to add swap file to Linux Azure VM
https://support.microsoft.com/en-us/help/4010058/how-to-add-a-swap-file-in-linux-azure-virtual-machines

Edit /etc/waagent.conf (for example with vi) and set:

ResourceDisk.Format=y
ResourceDisk.EnableSwap=y
ResourceDisk.SwapSizeMB=xx

Then restart the agent:

service walinuxagent restart

How to find out how long the process has been running on Linux

Problem

Need to find out how long a process has been running.

Solution

To get the elapsed time since the process started, run the following command:

ps -o etime= -p <your pid>
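
The etime value is printed as [[dd-]hh:]mm:ss, which is awkward to compare in scripts. The sketch below converts it to seconds; etime_to_seconds is a hypothetical helper name, not part of ps. Newer procps-ng builds may also offer `ps -o etimes=`, which prints seconds directly.

```shell
# etime_to_seconds: convert a ps "etime" value ([[dd-]hh:]mm:ss) to seconds.
# Leading/trailing blanks (ps pads the column) are stripped first.
etime_to_seconds() {
  printf '%s\n' "$1" | awk '{
    gsub(/[ \t]/, "")
    days = 0
    # Optional "dd-" prefix.
    if (index($0, "-") > 0) { split($0, d, "-"); days = d[1]; $0 = d[2] }
    # Remainder is either hh:mm:ss or mm:ss.
    n = split($0, p, ":")
    if (n == 3) { h = p[1]; m = p[2]; s = p[3] } else { h = 0; m = p[1]; s = p[2] }
    print days * 86400 + h * 3600 + m * 60 + s
  }'
}

# Usage:
#   etime_to_seconds "$(ps -o etime= -p <your pid>)"
```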


Thursday, May 30, 2019

Azure VM is not accessible /proc/net/route contains no routes

Problem

Azure VMs running Ubuntu 18.04 LTS (West US region) suddenly became unreachable.
The agent reports: /proc/net/route contains no routes


Solution

Connected to the Azure VM via Console and saw the following error message repeating:

2019/05/30 18:24:20.747552 ERROR ExtHandler /proc/net/route contains no routes

Restarting the waagent service produced the following:

INFO Daemon Azure Linux Agent Version:2.2.32.2
2019/05/30 18:24:18.242022 INFO Daemon OS: ubuntu 18.04
2019/05/30 18:24:18.246632 INFO Daemon Python: 3.6.7
2019/05/30 18:24:18.251476 INFO Daemon Add daemon process pid 2330 to walinuxagent systemd cgroup
2019/05/30 18:24:18.258645 INFO Daemon CGroups: ok
2019/05/30 18:24:18.263173 INFO Daemon Run daemon
2019/05/30 18:24:18.268142 INFO Daemon Clean protocol
2019/05/30 18:24:18.272685 INFO Daemon Provisioning already completed, skipping.
2019/05/30 18:24:18.277567 INFO Daemon RDMA capabilities are not enabled, skipping
2019/05/30 18:24:18.285862 INFO Daemon Determined Agent WALinuxAgent-2.2.40 to be the latest agent
2019/05/30 18:24:18.613217 INFO ExtHandler Agent WALinuxAgent-2.2.40 is running as the goal state agent
2019/05/30 18:24:18.635339 INFO ExtHandler Detect protocol endpoints
2019/05/30 18:24:18.642625 INFO ExtHandler Clean protocol
2019/05/30 18:24:18.651472 INFO ExtHandler WireServer endpoint is not found. Rerun dhcp handler
2019/05/30 18:24:18.658883 INFO ExtHandler Test for route to 168.63.129.16
2019/05/30 18:24:18.665334 WARNING ExtHandler No route exists to 168.63.129.16
2019/05/30 18:24:18.670932 INFO ExtHandler Checking for dhcp lease cache
2019/05/30 18:24:18.675843 INFO ExtHandler looking for leases in path [/var/lib/dhcp/dhclient.*.leases]
2019/05/30 18:24:18.684096 INFO ExtHandler cached endpoint not found
2019/05/30 18:24:18.690504 INFO ExtHandler Cache exists [False]
2019/05/30 18:24:18.695994 INFO ExtHandler Send dhcp request
2019/05/30 18:24:18.700569 INFO ExtHandler Examine /proc/net/route for primary interface
2019/05/30 18:24:18.705882 ERROR ExtHandler /proc/net/route contains no routes
2019/05/30 18:24:18.710763 WARNING ExtHandler Could not determine primary interface, please ensure /proc/net/route is correct
2019/05/30 18:24:18.717827 WARNING ExtHandler Contents of /proc/net/route:
Iface   Destination     Gateway         Flags   RefCnt  Use     Metric  Mask            MTU     Window  IRTT

2019/05/30 18:24:18.732775 WARNING ExtHandler Primary interface examination will retry silently
2019/05/30 18:24:20.747552 ERROR ExtHandler /proc/net/route contains no routes
2019/05/30 18:24:22.755975 ERROR ExtHandler /proc/net/route contains no routes

Rebooting the VM did not help, but stopping and starting the VM instance did.

References
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1822133
https://github.com/Azure/WALinuxAgent/issues/980
https://github.com/Azure/WALinuxAgent/issues/1439

Tuesday, May 14, 2019

Tibco Jaspersoft throws error java.lang.NoClassDefFoundError: Could not initialize class java.awt.Color

Problem

Tibco Jaspersoft Report server throws java.lang.NoClassDefFoundError: Could not initialize class java.awt.Color when attempting to render a report containing graphical elements.

Solution

Check for missing dependencies using the following command:

ldd /opt/jasperreports-server-7.1.1/java/lib/amd64/libawt_xawt.so


Results:

        linux-vdso.so.1 (0x00007ffff17f4000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f346bc2f000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f346b891000)
        libawt.so => /opt/jasperreports-server-7.1.1/java/lib/amd64/libawt.so (0x00007f346b5bf000)
        libXext.so.6 => /usr/lib/x86_64-linux-gnu/libXext.so.6 (0x00007f346b3ad000)
        libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f346b075000)
        libXrender.so.1 => not found
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f346ae71000)
        libXtst.so.6 => not found
        libXi.so.6 => not found
        libjava.so => /opt/jasperreports-server-7.1.1/java/lib/amd64/libjava.so (0x00007f346ac45000)
        libjvm.so => not found
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f346a854000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f346c0a7000)
        libjvm.so => not found
        libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f346a62c000)
        libjvm.so => not found
        libverify.so => /opt/jasperreports-server-7.1.1/java/lib/amd64/libverify.so (0x00007f346a41d000)
        libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f346a219000)
        libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f346a013000)
        libjvm.so => not found
        libbsd.so.0 => /lib/x86_64-linux-gnu/libbsd.so.0 (0x00007f3469dfe000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f3469bf6000)
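
With long ldd listings it is easy to miss an entry. As a rough sketch, the helper below (missing_libs is a name invented here) keeps only the unresolved libraries and prints each one once, in the order first seen.

```shell
# missing_libs: read ldd output on stdin and print every library that
# ldd reported as "not found", without duplicates.
missing_libs() {
  awk '/=> not found/ && !seen[$1]++ { print $1 }'
}

# Usage:
#   ldd /opt/jasperreports-server-7.1.1/java/lib/amd64/libawt_xawt.so | missing_libs
```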

Install the missing packages using the following command; libjvm.so can be ignored.

sudo apt-get install libfontconfig1 libxrender1 libxi6 libxtst6

Restart the Jasper Report server using the following commands:

sudo /opt/jasperreports-server-7.1.1/ctlscript.sh stop
sudo /opt/jasperreports-server-7.1.1/ctlscript.sh start




Wednesday, May 1, 2019

Create GitLab backup without repositories

Problem

Need to create a GitLab backup that skips the git repositories themselves. GitLab is installed using the Omnibus package.

Solution

According to the GitLab Omnibus documentation and "Backing up and restoring GitLab", the required backup can be created using the command below:

gitlab-rake gitlab:backup:create BACKUP=gitlab_20190420 GZIP_RSYNCABLE=yes SKIP=repositories

Please note that the following files are not included in the backup and must be backed up separately:

/etc/gitlab/gitlab.rb
/etc/gitlab/gitlab-secrets.json

Add user to sudoers

Problem

Need to add an existing user to sudoers.

Solution

Ubuntu

usermod -aG sudo <username>

Install and configure Domain Controller in Microsoft Azure

Problem

Need to install and configure a domain controller in MS Azure.

Solution

There is an excellent article on how to set up a DC in Azure:
https://www.assistanz.com/steps-to-create-new-active-directory-forest-in-azure-portal/

Configure Azure Firewall ports required to join Domain Controller

Problem

There are two virtual networks: Network 1 (contains the Domain Controller, Windows Server 2016) and Network 2 (contains workload VMs, Windows Server 2016). Traffic between them goes through Azure Firewall and User Defined Routes, so the firewall must allow the ports required for the workload VMs to join the domain.

Solution

After some experiments, we came up with the following rules:

Name       Proto  Src  Dest         Ports
tcp-to-dc  tcp    *    <dc server>  53,88,135,139,389,445,464,3268,3269,49152-64000
udp-to-dc  udp    *    <dc server>  53,88,123,135,137,138,389,464,49152-64000


References
https://support.microsoft.com/en-us/help/179442/how-to-configure-a-firewall-for-domains-and-trusts
http://powershell365.com/2016/01/19/firewall-ports-required-to-join-ad-domain/

Sync files using rsync with sudo privileges

Problem

Need to copy files from one Linux VM to another (Ubuntu 18). The root account is not available, but accounts with sudo privileges are.

Solution

A sample command is listed below:

sudo rsync -e "ssh" --rsync-path="sudo rsync" -Pav remoteuser@<remote server>:/mnt/disk01/folder_to_sync/ /mnt/disk01/folder_to_sync/


When executed, you will be prompted first for the local user's password (for sudo) and then for the password of remoteuser.

Note that remoteuser must be able to run sudo without a password prompt.

Refer to https://sk.solutionmentors.com/2019/05/run-commands-with-sudo-without-having.html for instructions.

Run commands with sudo without having to enter a password

Problem

Need to temporarily allow a user to run commands with sudo without typing a password (for batch jobs).

Solution

Add the following entry to /etc/sudoers to allow remoteuser to execute commands with sudo without entering a password.

remoteuser ALL=(ALL) NOPASSWD:ALL
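
Since a typo in sudoers can lock everyone out of sudo, it may be safer to generate the entry and syntax-check it before it takes effect. The sketch below is one way to do that; the helper name sudoers_nopasswd_line and the drop-in file name are assumptions for illustration, and it presumes /etc/sudoers includes /etc/sudoers.d (the Ubuntu default).

```shell
# sudoers_nopasswd_line USER: print a NOPASSWD sudoers entry for USER.
# Rejects empty names and names with characters outside [a-z0-9_-].
sudoers_nopasswd_line() {
  case $1 in
    ''|*[!a-z0-9_-]*)
      echo "invalid user name: $1" >&2
      return 1 ;;
  esac
  printf '%s ALL=(ALL) NOPASSWD:ALL\n' "$1"
}

# Usage (as root); visudo -cf only checks syntax, it does not edit:
#   sudoers_nopasswd_line remoteuser > /etc/sudoers.d/remoteuser
#   chmod 0440 /etc/sudoers.d/remoteuser
#   visudo -cf /etc/sudoers.d/remoteuser
```

Because the access is meant to be temporary, removing the drop-in file afterwards reverts the change without touching /etc/sudoers itself.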

References
https://askubuntu.com/questions/334318/sudoers-file-enable-nopasswd-for-user-all-commands/340669

Tuesday, April 30, 2019

Azure icon sets

Problem

Need to get PNG and SVG icons for Microsoft Azure services, components, etc. Anyone who has struggled with presentations will understand.

Solution

Excellent article from Microsoft MVP Chris Pietschmann - Microsoft Azure Icon Set Download – Visio stencil, PowerPoint, PNG, SVG

Icon set can be found at: https://www.microsoft.com/en-us/download/details.aspx?id=41937

ORA-01102: cannot mount database in EXCLUSIVE mode

Problem

ORA-01102: cannot mount database in EXCLUSIVE mode during database instance startup

sqlplus / as sysdba
SQL*Plus: Release 12.2.0.1.0 Production on Tue Apr 30 21:26:57 2019

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup;
ORACLE instance started.

Total System Global Area 6174015488 bytes
Fixed Size                  8634320 bytes
Variable Size            1241514032 bytes
Database Buffers         4915724288 bytes
Redo Buffers                8142848 bytes
ORA-01102: cannot mount database in EXCLUSIVE mode

Solution

As with https://sk.solutionmentors.com/2019/04/ora-01012-not-logged-on-startup-failed.html, this problem seems to have appeared because the database was not shut down properly.

The steps below are from this great article: http://www.dba-oracle.com/t_ora_01102_cannot_mount_database_in_exclusive_mode.htm

+++++++++++

POSSIBLE SOLUTION:
Verify that the database was shutdown cleanly by doing the following:

1. Verify that there is not a "sgadef<sid>.dbf" file in the directory
"ORACLE_HOME/dbs".

% ls $ORACLE_HOME/dbs/sgadef<sid>.dbf

If this file does exist, remove it.

% rm $ORACLE_HOME/dbs/sgadef<sid>.dbf

2. Verify that there are no background processes owned by "oracle"

% ps -ef | grep ora_ | grep $ORACLE_SID

If background processes exist, remove them by using the Unix
command "kill". For example:

% kill -9 <Process_ID_Number>

3. Verify that no shared memory segments and semaphores that are owned
by "oracle" still exist

% ipcs -b

If there are shared memory segments and semaphores owned by "oracle",
remove the shared memory segments

% ipcrm -m <Shared_Memory_ID_Number>

and remove the semaphores

% ipcrm -s <Semaphore_ID_Number>

NOTE: The example shown above assumes that you only have one
database on this machine. If you have more than one
database, you will need to shutdown all other databases
before proceeding with Step 4.

4. Verify that the "$ORACLE_HOME/dbs/lk<sid>" file does not exist. This is what caused issue in our case. Simple removal of this file did the trick.

5. Startup the instance

Related issues
https://sk.solutionmentors.com/2019/04/ora-01012-not-logged-on-startup-failed.html
https://sk.solutionmentors.com/2019/04/ora-27125-unable-to-create-shared.html

Reference
http://www.dba-oracle.com/t_ora_01102_cannot_mount_database_in_exclusive_mode.htm

ORA-01012: not logged on startup failed

Problem

Getting ORA-01012: not logged on (startup failed) during database startup.

Solution

In our case this behavior was caused by an ungraceful database shutdown. It seems that zombie Oracle processes remained and prevented the database instance from starting up.

The sysresv utility helps diagnose orphaned shared memory segments.

> sysresv

IPC Resources for ORACLE_SID "<your SID>" :
Maximum shared memory segment size (shmmax): 8589934592 bytes
Total system shared memory (shmall): 6871949312 bytes
Total system shared memory count (shmmni): 4096
*********************** Dumping ipcs output ********************
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 0          oracle     600        8634368    891
0x00000000 32769      oracle     600        6157238272 446
0x00000000 65538      oracle     600        8142848    446
0x9ab568e0 98307      oracle     600        28672      446

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x1d625978 98304      oracle     600        227
0x1d625979 131073     oracle     600        227
0x1d62597a 163842     oracle     600        227

*********************** End of ipcs commanddump **************

***************** Dumping Resource Limits(s/h) *****************
core file size                         0 KB/UNLIMITED
data seg size                     UNLIMITED/UNLIMITED
scheduling priority                    0 KB/0 KB
file size                         UNLIMITED/UNLIMITED
pending signals                       63 KB/63 KB
max locked memory                     14 GB/14 GB
max memory size                   UNLIMITED/UNLIMITED
open files                            64 KB/64 KB
POSIX message queues                 800 KB/800 KB
real-time priority                     0 KB/0 KB
stack size                            32 MB/32 MB
cpu time                          UNLIMITED/UNLIMITED
max user processes                    16 KB/16 KB
virtual memory                    UNLIMITED/UNLIMITED
file locks                        UNLIMITED/UNLIMITED

***************** End of Resource Limits Dup ******************

Maximum map count configured per process:  65530
Total /dev/shm size: 8403361792 bytes, used: 0 bytes
Shared Memory:
ID              KEY
32769           0x00000000
65538           0x00000000
0               0x00000000
98307           0x9ab568e0
Semaphores:
ID              KEY
98304           0x1d625978
131073          0x1d625979

Remove the shared memory segments:

ipcrm -m <Shared Memory ID>

Remove the semaphores:

ipcrm -s <Semaphore>

Try to start up the database instance again.
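
When there are many segments, collecting the IDs by hand gets tedious. A minimal sketch, assuming the Linux ipcs column layout shown above (key, id, owner, ...); ipc_ids_for_owner is a hypothetical helper, and it only prints IDs so the list can be reviewed before anything is removed.

```shell
# ipc_ids_for_owner OWNER: read `ipcs -m` or `ipcs -s` output on stdin and
# print the id (2nd column) of each row whose owner (3rd column) is OWNER.
# Header and separator lines are skipped because they do not start with 0x.
ipc_ids_for_owner() {
  awk -v owner="$1" '$1 ~ /^0x/ && $3 == owner { print $2 }'
}

# Usage:
#   ipcs -m | ipc_ids_for_owner oracle
#   ipcs -s | ipc_ids_for_owner oracle
```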

ORA-27125: unable to create shared memory segment during startup

Problem

After reboot, unable to startup Oracle 12c database instance (Red Hat Enterprise Server 7.6)

ORA-27125: unable to create shared memory segment
Linux-x86_64 Error: 28: No space left on device
Additional information: 3822
Additional information: 6157238272

Solution

Verify the OS kernel.shmall memory setting.

1. Get current value 
cat /proc/sys/kernel/shmall
1677722

This seems to be too low.

2. Determine page size
getconf PAGE_SIZE
4096

3. Calculate recommended value for shmall

shmall = <total size of SGA>/<page size>

In our case we sized it against the full 16 GB of RAM, so shmall = 16 * 1024 * 1024 * 1024 / 4096 = 4194304

4. Update /etc/sysctl.conf
vi /etc/sysctl.conf
kernel.shmall=4194304
sudo sysctl -p

5. Verify kernel.shmall again
cat /proc/sys/kernel/shmall
4194304

6. Start Oracle instance
sudo su - oracle
sqlplus / as sysdba
startup;

This, of course, led to another error.
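
The shmall arithmetic from step 3 can be sketched as a small shell function. The name shmall_pages is made up here; it simply divides a byte count by the page size, rounding up so a partial page is still covered.

```shell
# shmall_pages BYTES PAGE_SIZE: number of pages needed to hold BYTES,
# rounded up. kernel.shmall is expressed in pages, not bytes.
shmall_pages() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Usage, with the 16 GB figure from the text and the system page size:
#   shmall_pages $((16 * 1024 * 1024 * 1024)) "$(getconf PAGE_SIZE)"
```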



Formula to set proper values for max processes, sessions and transactions in Oracle

Problem

Need to adjust max number of sessions in Oracle DB.

Solution

The standard formula looks like this:

PROCESSES = Operating System Dependent
SESSIONS = (1.1 * PROCESSES) + 5
TRANSACTIONS = 1.1 * SESSIONS

For example, with PROCESSES = 905:

alter system set sessions=1000 scope=spfile;
alter system set processes=905 scope=spfile;
alter system set transactions=1100 scope=spfile;

Restart the instance for the spfile changes to take effect:

shutdown immediate;
startup;
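
The formula can be checked quickly with integer arithmetic. This is only a sketch: oracle_limits is an invented helper name, and it truncates fractions (1.1 is computed as 11/10), which reproduces the values used above for PROCESSES = 905.

```shell
# oracle_limits PROCESSES: derive SESSIONS and TRANSACTIONS from PROCESSES
# using SESSIONS = 1.1 * PROCESSES + 5 and TRANSACTIONS = 1.1 * SESSIONS.
# The body runs in a subshell so its variables do not leak to the caller.
oracle_limits() (
  processes=$1
  sessions=$(( processes * 11 / 10 + 5 ))
  transactions=$(( sessions * 11 / 10 ))
  echo "processes=$processes sessions=$sessions transactions=$transactions"
)

# Usage:
#   oracle_limits 905
```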

Wednesday, April 24, 2019

How to get external IP for Linux VMs on Azure

Problem

How to get external IP for Linux VMs on Azure

Solution

dig +short myip.opendns.com @resolver1.opendns.com 

Change mount point for /mnt folder in Ubuntu Linux on Azure

Problem

When provisioning Linux VMs on Azure, the /mnt folder is automatically created and pointed to the Resource Disk. In some cases, we need more fine-grained control over the /mnt folder (for instance, for backward compatibility) and want to map the Resource Disk to one of its sub-folders, e.g. /mnt/resources.

Solution

The Microsoft Azure Linux Agent (waagent) controls where the Resource Disk is mounted; the mount point is set in /etc/waagent.conf.

sudo service waagent restart
vi /etc/waagent.conf 
# Mount point for the resource disk
ResourceDisk.MountPoint=/mnt/resources 

Restart the waagent service:

sudo service waagent restart
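
For scripted provisioning, the edit to /etc/waagent.conf can be done without an interactive editor. A rough sketch follows; set_conf_key is a hypothetical helper, it relies on GNU sed's -i option, and it assumes the value contains no "|" character and that the key is not commented out in the file.

```shell
# set_conf_key FILE KEY VALUE: set KEY=VALUE in a key=value config file.
# Replaces an existing "KEY=..." line, or appends one if none exists.
set_conf_key() {
  if grep -q "^$2=" "$1"; then
    sed -i "s|^$2=.*|$2=$3|" "$1"
  else
    printf '%s=%s\n' "$2" "$3" >> "$1"
  fi
}

# Usage (as root), followed by the agent restart:
#   set_conf_key /etc/waagent.conf ResourceDisk.MountPoint /mnt/resources
```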

Install telnet client from command line on Windows Server

Problem

Need to install the telnet client on Windows Server 2012 or 2016.

Solution

pkgmgr /iu:"TelnetClient"