elasticsearch.yml

Elasticsearch Configuration

  • By default elasticsearch.yml ships with sensible out-of-the-box settings and rarely needs many changes
  • The configuration file also contains templates for all the required settings. Using these as a starting point is the best way to build a production-level Elasticsearch deployment

Cluster Name

  • Do not forget to give your production cluster a name as this is used to discover and auto-join other nodes
  • Don’t reuse the same cluster names in different environments otherwise you might end up with nodes joining the wrong cluster
  • For instance you could use logging-dev, logging-stage and logging-prod for the development, staging, and production clusters
cluster:
  name: <NAME OF YOUR CLUSTER>

cluster.name: elasticsearch_netmon_prod

Node Name

  • You may also want to change the default node name for each node to something like the display hostname
  • By default Elasticsearch will randomly pick a Marvel character name from a list of around 3000 names when your node starts up if you haven't chosen a name
node:
  name: <NAME OF YOUR NODE>
  • The hostname of the machine is provided in the environment variable HOSTNAME
  • If you run only a single Elasticsearch node per machine for that cluster, you can set the node name to the hostname using the ${...} notation
node:
  name: ${HOSTNAME}

node.name: elasticnetstack-6

Node Rack

  • This is used to specify custom node attributes, such as the rack in which the node is located
  • These attributes also appear in the JSON returned by the Elasticsearch nodes info API
node.rack: r1
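For example, once the node is up you can check that the attribute was picked up by looking at the "attributes" section of the nodes info output (this assumes a node listening on localhost:9200):
curl http://localhost:9200/_nodes?pretty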

Paths

  • In production use you will almost certainly want to change paths for data and log files
path:
  logs: /var/log/elasticsearch
  data: /var/data/elasticsearch
  conf: /usr/local/etc/elasticsearch
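The same settings can also be written in dotted form, and path.data accepts a comma-separated list if you want to spread data across several disks (the paths below are only examples):
path.logs: /var/log/elasticsearch
path.data: /var/data/elasticsearch1,/var/data/elasticsearch2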

Memory

  • Memory configuration is one of the key parts of running Elasticsearch well
  • Most operating systems try to use as much memory as possible for file system caches and eagerly swap out unused application memory, possibly resulting in the Elasticsearch process being swapped
  • Swapping is very bad for performance and for node stability, so it should be avoided at all costs

  • Disable swap

    • The simplest option is to completely disable swap
    • Usually Elasticsearch is the only service running on a box and its memory usage is controlled by the ES_HEAP_SIZE environment variable. There should be no need to have swap enabled
    • On Linux systems, you can disable swap temporarily by running: sudo swapoff -a
    • To disable it permanently, you will need to edit the /etc/fstab file and comment out any lines that contain the word swap
  • Configure swappiness

    • The second option is to ensure that the sysctl value vm.swappiness is set to 0. This reduces the kernel’s tendency to swap and should not lead to swapping under normal circumstances while still allowing the whole system to swap in emergency conditions

From kernel version 3.5-rc1 and above a swappiness of 0 will cause the OOM killer to kill the process instead of allowing swapping. You will need to set swappiness to 1 to still allow swapping in emergencies
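As an illustration, the value can be applied at runtime and persisted via sysctl configuration (the file name under /etc/sysctl.d/ is just an example and may differ on your distribution):
sudo sysctl -w vm.swappiness=1
echo "vm.swappiness = 1" | sudo tee /etc/sysctl.d/70-elasticsearch.conf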

  • mlockall
    • The third option is to use mlockall on Linux/Unix systems preventing any Elasticsearch memory from being swapped out
bootstrap.mlockall: true

mlockall might cause the JVM or shell session to exit if it tries to allocate more memory than is available!

After starting Elasticsearch, you can check whether mlockall was applied by looking for "mlockall" : true in the process section of the nodes info API:
curl http://localhost:9200/_nodes/process?pretty

  • If the output shows mlockall as false, the most probable reason on Linux/Unix systems is that the user running Elasticsearch doesn’t have permission to lock memory
  • This can be granted by running ulimit -l unlimited as root before starting Elasticsearch
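To make the limit persistent you can also add an entry to /etc/security/limits.conf (this assumes Elasticsearch runs as the elasticsearch user):
elasticsearch - memlock unlimited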

Network

  • Elasticsearch binds to localhost only by default
  • Set the bind address to a specific IP (IPv4 or IPv6)
  • Binding to a specific address is one of the simplest ways to control who can reach the data held in Elasticsearch
  • You can use an IP address, hostname or an array of any combination of these
network.host: 127.0.0.1
  • You can also specify a custom port based on your use case
http.port: 9200
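For example, a node that should be reachable from other hosts on the network might bind to a specific interface address and a non-default HTTP port (the values below are placeholders):
network.host: 192.168.1.10
http.port: 9201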

Discovery

  • Elasticsearch is a peer-to-peer based system in which nodes communicate with one another directly when operations are delegated or broadcast
  • In order to join a cluster, a node needs to know the hostname or IP address of at least some of the other nodes in the cluster
  • This setting provides the initial list of other nodes that this node will try to contact, it accepts IP addresses or hostnames
  • Defaults to ["127.0.0.1", "[::1]"]
discovery.zen.ping.unicast.hosts: ["host1", "host2"]
  • All the main APIs (index, delete, search) do not communicate with the master node
  • The responsibility of the master node is to maintain the global cluster state and act if nodes join or leave the cluster by reassigning shards
  • Each time a cluster state is changed the state is made known to the other nodes in the cluster
  • To prevent "split brain", set minimum_master_nodes to the smallest possible majority of master-eligible nodes: (number of master-eligible nodes / 2) + 1
discovery.zen.minimum_master_nodes: 3
  • Example configuration
discovery.zen.minimum_master_nodes: 5
discovery.zen.ping.unicast.hosts: ["10.64.14.66", "10.64.3.249", "10.9.248.64", "100.64.15.128", "10.9.240.151", "10.9.212.223", "10.65.9.238", "10.65.3.43"]
discovery.zen.ping.multicast.enabled: false
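Once the cluster has formed, you can check which node was elected master with the cat API (this assumes a local node on port 9200):
curl http://localhost:9200/_cat/master?v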

Gateway

  • The local gateway module stores the cluster state and shard data across full cluster restarts
  • This must be set on every data node in the cluster
gateway.recover_after_nodes: 3
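The local gateway has related settings that are often combined with recover_after_nodes; a possible combination for an eight-node cluster might look like this (the values are illustrative only):
gateway.recover_after_nodes: 5
gateway.recover_after_time: 5m
gateway.expected_nodes: 8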

These settings only take effect on a full cluster restart

Various-Other

  • Disable starting multiple nodes on a single system
node.max_local_storage_nodes: 1
  • Require explicit names when deleting indexes
    • This prevents accidental deletion via wildcards or _all, since indices must be named explicitly to be deleted
action.destructive_requires_name: true
  • To set up master and data nodes (see the example after the settings below)
    • This is helpful when you are building a cluster for production
node.master: true
node.data: true
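For example, in a larger production cluster you might give some nodes dedicated roles; the split below is just one possible layout:
# dedicated master-eligible node
node.master: true
node.data: false
# dedicated data node
node.master: false
node.data: true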
  • Index related configuration changes
    • This predefines the number of primary shards per index; by default Elasticsearch creates 5 primary shards
index.number_of_shards: 5
  • This setting defines the number of replicas per shard (see the example after the setting below)
    • One of the most important configuration settings for High Availability environments
    • Replicas keep redundant copies of each shard within the cluster, which protects against node failures
index.number_of_replicas: 1
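Note that number_of_shards is fixed when an index is created, while number_of_replicas can be changed at any time; for example, the replica count of an existing index could be raised like this (the index name is a placeholder):
curl -XPUT http://localhost:9200/netmon-2016.01/_settings -d '{ "index": { "number_of_replicas": 2 } }'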
  • This enables dynamic mapping, so the index mapping is updated automatically when documents contain new fields
    • This is the default behaviour and is convenient when the document structure is not known in advance
index.mapper.dynamic: true
  • This setting is enabled by default and allows an index to be created automatically when a document is first indexed into it, for example by Logstash output
action.auto_create_index: true
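If you want auto-creation only for specific index names, this setting also accepts a list of allow/deny patterns; for example, to allow only Logstash-style indices (the pattern is illustrative):
action.auto_create_index: "+logstash-*,-*"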
  • This disables the ability to delete all indices with a single request
action.disable_delete_all_indices: true

Example configuration samples

https://gist.github.com/zsprackett/8546403

https://joinup.ec.europa.eu/svn/opencities/tags/opencities-core/release-3.4.3/core-module/src/main/resources/elasticsearch.yml

References

https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration.html
