Big Data Interview Questions [Beginner to Advanced] 2025

1. What is Splunk?

Splunk is a software platform that lets users search, analyze, and visualize machine data and other data generated by sources such as networks, servers, IoT devices, mobile app logs, and more.

The data collected from these sources is processed, analyzed, and transformed into operational intelligence that offers real-time insight. Splunk is widely used to search, visualize, and monitor machine data, and to understand and optimize machine performance.

Splunk stores data in indexes and gathers the required information into a central index, which helps users narrow down the specific data they need from a massive volume of data. Processed machine data is extremely valuable for monitoring, understanding, and optimizing machine performance.

2. How does Splunk work?

Splunk is software for searching, analyzing, monitoring, visualizing, and examining large amounts of machine-generated data through a web-style interface. If you want to use Splunk in your architecture, you need to understand how it works. Data processing in Splunk happens in three stages.

Data Input Stage:

In the Data Input stage, Splunk consumes raw data from one or more sources, breaks it into 64K blocks, and annotates each block with metadata keys. The metadata keys include the source, hostname, and source type of the data.

Data Storage Stage:

The Data Storage stage is divided into two phases: parsing and indexing.

In the Parsing phase, Splunk analyzes and transforms the data and extracts the relevant information. This process is also called ‘event processing’, as it breaks the data stream into individual events.
In the Indexing phase, Splunk passes the parsed events from the Parsing phase through the index queue and writes them to the index on disk. This is what makes the data quickly accessible to users at search time.

Data Searching Stage:

This stage controls how the indexed data from the previous stage is viewed, accessed, and used by the user. Reports, dashboards, event types, alerts, visualizations, and other knowledge objects can be created based on the user's reporting requirements.
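To make the three stages concrete, here is a toy Python model of the flow described above: 64K blocks annotated with host/source/sourcetype, parsing into one event per line, an inverted index, and a search that reads the index instead of scanning every event. All class and function names are hypothetical; this is a sketch of the idea, not how splunkd is implemented.

```python
# Toy model of the three stages described above. All names are hypothetical;
# real Splunk performs these steps inside splunkd, not in Python.
from dataclasses import dataclass

BLOCK_SIZE = 64 * 1024  # input stage: raw data is cut into 64K blocks


@dataclass
class Block:
    data: str
    host: str
    source: str
    sourcetype: str


@dataclass
class Event:
    raw: str
    host: str
    source: str
    sourcetype: str


def input_stage(raw: str, host: str, source: str, sourcetype: str) -> list[Block]:
    """Split raw text into 64K blocks and annotate each one with metadata keys."""
    return [Block(raw[i:i + BLOCK_SIZE], host, source, sourcetype)
            for i in range(0, len(raw), BLOCK_SIZE)]


def parsing_phase(blocks: list[Block]) -> list[Event]:
    """Rejoin the blocks (a line may span two) and emit one event per non-empty line."""
    if not blocks:
        return []
    meta = blocks[0]  # this sketch assumes all blocks come from a single source
    stream = "".join(b.data for b in blocks)
    return [Event(line, meta.host, meta.source, meta.sourcetype)
            for line in stream.splitlines() if line.strip()]


def indexing_phase(events: list[Event]) -> dict[str, set[int]]:
    """Build an inverted index: lower-cased term -> ids of events containing it."""
    index: dict[str, set[int]] = {}
    for event_id, event in enumerate(events):
        for term in event.raw.lower().split():
            index.setdefault(term, set()).add(event_id)
    return index


def search_stage(events: list[Event], index: dict[str, set[int]], term: str) -> list[str]:
    """Look the term up in the index instead of scanning every event."""
    return [events[i].raw for i in sorted(index.get(term.lower(), set()))]


if __name__ == "__main__":
    raw = "user=alice action=login\nuser=bob action=logout\n"
    events = parsing_phase(input_stage(raw, "web01", "/var/log/app.log", "app_log"))
    print(search_stage(events, indexing_phase(events), "action=login"))
    # ['user=alice action=login']
```

The index is what makes the search stage fast: a lookup touches only the matching events rather than the whole data set.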

3. What are the features not available in Splunk Free?

Splunk Free does not include the below features:

Authentication and user management
Scheduled searches and alerting
Distributed search
Forwarding in TCP/HTTP (to non-Splunk)
Deployment management
Agile statistics and reporting with Real-time architecture

4. Explain various components of Splunk architecture.

Splunk is software for searching, analyzing, monitoring, visualizing, and examining large amounts of machine-generated data through a web-style interface. It indexes, captures, and correlates real-time data in a searchable repository, from which it can generate graphs, reports, dashboards, alerts, and visualizations. The Splunk architecture is composed of the components below.

Splunk Search head:

A Splunk Enterprise instance can act as a search head, a search peer, or both. The search head handles search management: it directs search requests to a set of search peers (indexers), then collects and merges their results and returns them to the user.
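The search head's scatter-gather role can be sketched in a few lines of Python. This is a hypothetical model, not Splunk code: each peer is just a list of (timestamp, event) tuples, the peers are queried in parallel, and the already-sorted partial results are merged newest-first.

```python
# Hypothetical sketch of the search head's role: fan a query out to search
# peers (indexers) in parallel, then merge their partial results newest-first.
import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor

# Each peer is modelled as a list of (timestamp, raw_event) tuples.
Peer = list[tuple[int, str]]


def search_peer(peer: Peer, term: str) -> list[tuple[int, str]]:
    """What one indexer does: return its matching events, newest first."""
    return sorted((e for e in peer if term in e[1]), reverse=True)


def search_head(peers: list[Peer], term: str, limit: int = 10) -> list[tuple[int, str]]:
    """Dispatch the search to every peer, then merge the sorted partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda p: search_peer(p, term), peers))
    # Each partial list is already sorted newest-first, so a k-way merge suffices.
    merged = heapq.merge(*partials, reverse=True)
    return list(itertools.islice(merged, limit))


if __name__ == "__main__":
    peer_a = [(100, "error disk full"), (300, "error timeout")]
    peer_b = [(200, "error auth failed"), (250, "info ok")]
    print(search_head([peer_a, peer_b], "error"))
    # [(300, 'error timeout'), (200, 'error auth failed'), (100, 'error disk full')]
```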

Splunk Forwarder:

The Splunk Forwarder is an important component of the Splunk infrastructure that acts as an agent for collecting logs from remote machines. It forwards the collected logs to the Splunk Indexer for storage and further processing.

Splunk Indexer:

The Splunk Indexer indexes data: it creates events from raw data and stores them in an index. It also processes incoming search requests and returns the results that match them.

5. Explain various uses of Splunk

The various uses of Splunk are as follows.

Splunk enables searching with its Search Processing Language (SPL). This language makes it easy to inspect large amounts of data and run statistical operations on it in any required context.
Splunk provides various apps, add-ons, and data sources. It collects data from sources such as Windows event logs, log files, syslog, and SNMP, and uses it to find out, for example, when an application starts or how a user is interacting with it. If the built-in inputs are not enough, add-ons from Splunkbase (Splunk's app directory) can collect the required data. These add-ons gather data at various endpoints and send it to Splunk for further processing.
Splunk creates indexes of the data in the system. It accepts any kind of data after installation. During indexing, incoming data is processed and prepared for storage, and the data is segregated into events.
Splunk contains tools for creating reports such as pie charts, graphs, and bar charts. These reports can capture almost anything, from frequencies to statistics. Users can easily customize dashboards to show the required data. Splunk also provides an alert mechanism that makes log monitoring easier to manage.
Splunk can be installed on a wide range of platforms. Data can be distributed across multiple servers, which protects against a single server failure or running out of disk space and speeds up processing because several machines share the work. This greatly reduces the impact of any single failure.
Understanding the infrastructure and identifying the root causes of issues are important and difficult tasks in DevOps. Splunk helps analyze system performance and find the root causes of issues using logs that can be easily stored and monitored.

6. What are the common ports that are used in Splunk?

The common port numbers for Splunk are:

Splunk Web Port: 8000
Splunk Management Port: 8089
Splunk Network port: 514
Splunk Index Replication Port: 8080
Splunk Indexing Port: 9997
KV store: 8191
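A quick way to confirm which of these ports are listening is a TCP connection test. The Python sketch below uses the port numbers from the list above; the host name is an assumption you would replace. Note that port 514 is often used for UDP syslog, which a TCP check cannot detect.

```python
# Sketch: check which of the Splunk ports listed above accept TCP connections.
# Port numbers come from the list; the host passed in is up to you.
import socket

SPLUNK_DEFAULT_PORTS = {
    "Splunk Web": 8000,
    "Management (splunkd)": 8089,
    "Network input (syslog)": 514,   # frequently UDP, which this check cannot see
    "Index replication": 8080,
    "Receiving/indexing": 9997,
    "KV store": 8191,
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_splunk_ports(host: str) -> dict[str, bool]:
    """Map each port name to whether it is reachable on the given host."""
    return {name: port_is_open(host, port) for name, port in SPLUNK_DEFAULT_PORTS.items()}


if __name__ == "__main__":
    for name, is_open in check_splunk_ports("localhost").items():
        print(f"{name:<24} {'open' if is_open else 'closed'}")
```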

7. What are the various processes/commands in Splunk to perform the below operations?

Command for restarting just the Splunk web server?
Command for restarting just the Splunk daemon?
Command to check for running Splunk processes on Unix/Linux?
Command to enable Splunk boot-start?
Command to disable Splunk boot-start?
Command to stop the Splunk service?
Command to start the Splunk service?

The commands are:

The command for restarting just the Splunk web server:

splunk restart splunkweb

The command for restarting just the Splunk daemon:

splunk restart splunkd

The command to check for running Splunk processes on Unix/Linux:

ps aux | grep splunk

The command to enable Splunk boot-start:

$SPLUNK_HOME/bin/splunk enable boot-start

The command to disable Splunk boot-start:

$SPLUNK_HOME/bin/splunk disable boot-start

The command to stop the Splunk service:

./splunk stop

The command to start the Splunk service:

./splunk start
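If you need to run these commands from automation, a thin Python wrapper around the CLI is one option. This is a sketch: it assumes SPLUNK_HOME is set and falls back to /opt/splunk only because that is a common Linux install path. The helper names are hypothetical.

```python
# Sketch: run the Splunk CLI commands listed above from a Python script.
# Assumes SPLUNK_HOME is set; /opt/splunk is only a common fallback path.
import os
import subprocess
from pathlib import Path
from typing import Optional


def splunk_cmd(*args: str, splunk_home: Optional[str] = None) -> list[str]:
    """Build the argument list for $SPLUNK_HOME/bin/splunk <args>."""
    home = splunk_home or os.environ.get("SPLUNK_HOME", "/opt/splunk")
    return [str(Path(home) / "bin" / "splunk"), *args]


def run_splunk(*args: str) -> int:
    """Run a Splunk CLI command and return its exit code (0 means success)."""
    return subprocess.run(splunk_cmd(*args), check=False).returncode


if __name__ == "__main__":
    # Example calls, left commented out because they change the running service:
    # run_splunk("restart", "splunkd")
    # run_splunk("enable", "boot-start")   # usually needs root
    print(" ".join(splunk_cmd("stop")))
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems when SPLUNK_HOME contains spaces.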

8. What other log management and analysis tools are available?

SolarWinds Log Analyzer, Sematext Logs, Datadog, Site24x7, Splunk, Fluentd, ManageEngine EventLog Analyzer, LogDNA, Graylog, and Logalyze are among the most popular log management tools used worldwide.

Please find below a brief about some log management tools:

1. SolarWinds Log Analyzer: A log analyzer that helps investigate machine data to identify the root cause of issues faster.

2. Sematext Logs: Sematext Logs is a log-management-as-a-service offering. It collects logs from any part of the software stack, IoT devices, network hardware, and more. Using log shippers, logs from all parts are centralized and indexed in one place. Sematext Logs supports sending logs from infrastructure, containers, AWS, applications, custom events, and more, through an Elasticsearch API or syslog. It is a cheaper alternative to Splunk or Logz.io.

3. Datadog: Datadog is a SaaS-based monitoring and analytics platform for large-scale applications and infrastructure. It uses a Go-based agent, and its backend is built on Apache Cassandra, PostgreSQL, and Kafka. It combines real-time metrics from servers, containers, databases, and applications with end-to-end tracing, and delivers actionable alerts and powerful visualizations to provide full-stack observability. It also includes many vendor-supported integrations and APM libraries for several languages.

4. Site24x7: Site24x7 offers unified cloud monitoring for DevOps and IT operations with monitoring capabilities extending to analyze the experience of the real users accessing websites and applications from desktop and mobile devices.

5. Splunk: Splunk is one of the best-known log monitoring and analysis platforms on the market, offering both free and paid plans.

It collects, stores, indexes, analyzes, visualizes, and reports on machine-generated data in any form, whether structured, unstructured, or complex application logs.

With Splunk, users can search through both real-time and historical log data.

It helps users create custom reports and dashboards for a better view of system performance, and set up alerts that automatically send notifications, for example by email, when defined criteria are met.

6. ManageEngine EventLog Analyzer: ManageEngine EventLog Analyzer is a web-based, real-time log monitoring system that collects log data from various sources across the network infrastructure, including servers, applications, and network devices, and then monitors user behavior and identifies network anomalies, system downtime, and policy violations.

EventLog Analyzer is also a compliance management solution: it provides Security Information and Event Management (SIEM) capabilities and detects security threats, which helps meet IT audit requirements.

EventLog Analyzer can also extract meaningful information from log data in the form of reports, dashboards, and alerts. Alerts can be sent as SMS or email notifications when indicators of compromise, network anomalies, or threshold violations are detected.

7. LogDNA: LogDNA is a centralized log management service, available both in the cloud and on-premises, that collects data from applications, servers, platforms, and systems and sends it to a web viewer where log files can be monitored and analyzed in real time. With LogDNA, you can search, save, tail, and store data from any application, platform, server, or system, such as AWS, Heroku, Elastic, Python, Linux, Windows, or Docker, and it can handle one million log events per second.

8. Fluentd: Fluentd is an open-source data collector that gathers event logs from multiple sources such as application logs, system logs, server logs, and access logs, and unifies data collection into one logging layer, making the data easier to consume and understand.

Fluentd can filter, buffer, and ship logging data to various systems such as Elasticsearch, AWS, Hadoop, and more.

It is widely used by teams because its library of 500+ plugins lets it connect to many data sources and drive better analysis.

Other than these, Fluentd has the following features:

Unified logging with JSON, which makes processing data downstream much easier.
Pluggable architecture: a system of 500+ community-contributed plugins lets the community extend its functionality.
Minimal resources required
Built-in reliability

9. Logalyze: Logalyze is open-source, centralized log management and network monitoring software. It supports Linux/Unix servers, network devices, and Windows hosts, and provides real-time event detection and extensive search capabilities. With this open-source application, you can collect log data from any device, analyze, normalize, and parse it with any custom-made Log Definition, and use the built-in Statistics and Report Definitions.

10. Graylog: Graylog is a fast, affordable, effective, open-source log management platform that collects data from different locations across the infrastructure. It is popular among system administrators for its scalability, user-friendly interface, and functionality, along with its speed and scale in capturing, storing, and enabling real-time analysis of machine data.

Graylog also provides customizable dashboards on which we can choose the metrics or data sources to monitor and analyze with charts, graphs, and so on. We can also set alerts and triggers to monitor data failures or detect potential security risks.

9. How can we troubleshoot Splunk performance issues?

There are many approaches to troubleshooting Splunk performance issues. From an interview point of view, the following points can be covered:

Check splunkd.log for any errors.
Check server performance, such as CPU usage, memory usage, and disk I/O.
Install the Splunk on Splunk (SOS) app and check its dashboard for warnings and errors. In newer Splunk versions, the Monitoring Console provides this functionality.
Check the number of saved searches currently running and their system resource consumption.
Use browser developer tools. Firebug, a Firefox extension, was traditionally used for this: once it is installed and enabled, log into Splunk in Firefox, open the Firebug panels, and switch to the ‘Net’ panel (you will have to enable it). The Net panel shows the HTTP requests and responses along with the time spent on each. Firebug has since been discontinued, and the Network panel in a browser's built-in developer tools provides the same information.

This quickly reveals a lot of information, such as which requests are hanging Splunk for a few seconds.
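The first step, checking splunkd.log for errors, is easy to script. The Python sketch below counts ERROR/WARN/FATAL lines per component. It assumes the usual line layout `<date> <time> <tz> <LEVEL> <Component> ... - <message>`; if your version's format differs, adjust `LINE_RE`. The log path is the standard `$SPLUNK_HOME/var/log/splunk/splunkd.log`, with /opt/splunk as an assumed fallback.

```python
# Sketch: summarise ERROR/WARN/FATAL lines in splunkd.log by component.
# Assumes lines look like "<date> <time> <tz> <LEVEL> <Component> ... - <message>".
import os
import re
from collections import Counter
from pathlib import Path

LINE_RE = re.compile(r"\s(ERROR|WARN|FATAL)\s+(\S+)")


def summarise(lines) -> Counter:
    """Count (level, component) pairs for problem lines."""
    counts: Counter = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def default_log_path() -> Path:
    home = os.environ.get("SPLUNK_HOME", "/opt/splunk")
    return Path(home) / "var" / "log" / "splunk" / "splunkd.log"


if __name__ == "__main__":
    path = default_log_path()
    if path.exists():
        with path.open(encoding="utf-8", errors="replace") as log:
            for (level, component), n in summarise(log).most_common(10):
                print(f"{n:>6} {level:<5} {component}")
    else:
        print(f"splunkd.log not found at {path}")
```

Components that dominate the top of this list are usually the best place to start reading the full messages.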

10. How can we add folder access logs from a Windows machine to Splunk?

To add folder access logs from a Windows machine to Splunk, follow the steps below:

In Group Policy, enable Object Access auditing on the Windows machine where the folder is located.
Once Object Access auditing is enabled, enable auditing on the specific folder whose access logs you want to monitor.
Install the Splunk Universal Forwarder on the Windows machine.
Finally, configure the Universal Forwarder to send the security logs to the Splunk Indexer.

11. How can we exclude certain events from being indexed by Splunk?

This can be done by defining a regex that matches the events to keep and sending everything else to the null queue. Here is a basic example.

The following example drops everything except events that contain the string login.

In props.conf:

[source::/var/log/foo]
# Transforms must be applied in this order
# to make sure events are dropped on the
# floor prior to making their way to the
# index processor
TRANSFORMS-set = setnull, setparsing

In transforms.conf:

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setparsing]
REGEX = login
DEST_KEY = queue
FORMAT = indexQueue
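Why does the order of `setnull, setparsing` matter? Each transform whose REGEX matches overwrites the event's queue, so the last match wins. The Python simulation below models that behavior for the configuration above; it is an illustration of the routing logic, not Splunk code.

```python
# Toy model of the props.conf/transforms.conf example above: transforms run in
# the listed order, and each one whose REGEX matches overwrites the event's
# destination queue, so the last matching transform wins. Not Splunk code.
import re

# (name, regex, destination queue) in the order given by TRANSFORMS-set
TRANSFORMS = [
    ("setnull", re.compile(r"."), "nullQueue"),
    ("setparsing", re.compile(r"login"), "indexQueue"),
]


def route(event: str, default: str = "indexQueue") -> str:
    """Return the queue an event ends up in after all transforms run."""
    queue = default
    for _name, regex, dest in TRANSFORMS:
        if regex.search(event):
            queue = dest
    return queue


def filter_events(events: list[str]) -> list[str]:
    """Keep only the events that would reach the index queue."""
    return [e for e in events if route(e) == "indexQueue"]


if __name__ == "__main__":
    events = ["user=alice action=login", "GET /health 200", "user=bob action=logout"]
    print(filter_events(events))  # ['user=alice action=login']
```

If the order were reversed (`setparsing, setnull`), the catch-all `.` would run last and send every event, including the login events, to the null queue.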

12. What are the different types of Splunk forwarders?

The Splunk forwarder is a free version of Splunk Enterprise used to collect machine logs and send them to the indexer. Data transfer is a major concern with almost every tool on the market: when there is minimal processing on the data before it is forwarded, a lot of unnecessary data is also sent to the indexer, resulting in performance overhead.

Compared to traditional monitoring tools, the Splunk forwarder has very low CPU utilization, approximately 1-2%.

There are three types of forwarders:

Universal Forwarder: A universal forwarder is ideal for sending the raw data collected at the source to an indexer. It is a simple component that performs minimal processing on the incoming data streams before forwarding them to an indexer.

The universal forwarder can get data from a variety of inputs and forward it to a Splunk deployment for indexing and searching. It can also forward data to another forwarder as an intermediate step before the data reaches an indexer.

The universal forwarder is also a separately downloadable piece of software. Unlike the heavy and light forwarders, it is not enabled from a full Splunk Enterprise instance.

Heavy Forwarder: A heavy forwarder processes data at the source before forwarding it to the indexer. It parses the data at the source and can intelligently route it to the indexer, saving bandwidth and storage space. Because the heavy forwarder has already parsed the data, the indexer only needs to handle the indexing segment.

One key advantage of the heavy forwarder is that it can index data locally as well as forward data to another Splunk instance.

Light Forwarder: A light forwarder forwards data to another Splunk Enterprise instance or to a third-party system. It has less impact on system resources than a heavy forwarder because it has less functionality. The light forwarder has been deprecated in favor of the universal forwarder.

It can be configured through the CLI or through Splunk Web.

13. What are Splunk alerts, and what options are available when setting them up? What types of alerts are available in Splunk?

Splunk alerts are actions that are triggered when a user-defined criterion is met. The action is typically an email, a script, or another notification, depending on the configured action. Splunk alerts continuously monitor whether the applied condition is met and perform the configured action when it is.

There are two main types of Splunk alerts:

Scheduled Alerts: These alerts evaluate the search condition at scheduled times and share the result. The schedule can be chosen from the available timing options or defined with a cron expression.
Real-Time Alerts: Splunk runs the search continuously and triggers the action as soon as the search criterion is met.

To set up a Splunk alert, run the search query and then click Save As > Alert in the top-right corner. You can then add other details such as the alert action, run window, and schedule.
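Scheduled alerts can use a standard five-field cron expression (minute, hour, day of month, month, day of week). The Python sketch below is a simplified matcher written for illustration, not Splunk's scheduler: it supports only `*`, `*/n`, single values, `a-b` ranges, and comma lists, and it shows whether an alert with a given expression would run at a given minute.

```python
# Simplified cron matcher for illustration; not Splunk's scheduler.
# Supports "*", "*/n", single values, "a-b" ranges and comma-separated lists.
from datetime import datetime


def _field_matches(spec: str, value: int, lowest: int) -> bool:
    """Match one cron field; `lowest` is the field's minimum (0 for minutes, 1 for days)."""
    for part in spec.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if (value - lowest) % int(part[2:]) == 0:
                return True
        elif "-" in part:
            start, end = map(int, part.split("-"))
            if start <= value <= end:
                return True
        elif int(part) == value:
            return True
    return False


def cron_fires(expr: str, when: datetime) -> bool:
    """Return True if the 5-field cron expression matches the given minute."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # cron uses 0 = Sunday; Python uses 0 = Monday
    # Simplification: every field must match. Real cron ORs day-of-month and
    # day-of-week when both are restricted.
    return (_field_matches(minute, when.minute, 0)
            and _field_matches(hour, when.hour, 0)
            and _field_matches(dom, when.day, 1)
            and _field_matches(month, when.month, 1)
            and _field_matches(dow, cron_dow, 0))


if __name__ == "__main__":
    # Every 15 minutes, 09:00-17:45, Monday to Friday.
    business_hours = "*/15 9-17 * * 1-5"
    print(cron_fires(business_hours, datetime(2024, 1, 1, 9, 30)))  # Monday -> True
    print(cron_fires(business_hours, datetime(2024, 1, 6, 9, 30)))  # Saturday -> False
```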

14. How can we create dashboards in Splunk, and what are the different types of Splunk dashboards?

Splunk dashboard panels display charts and table data visually. A single dashboard can hold multiple panels, reports, and charts. Splunk dashboards are popular because of their many customization options.

To create a dashboard, save the search query as a Dashboard Panel and then fill in a few other details such as the title, description, and panel content settings.

There are three kinds of dashboards we can create with Splunk:

Dynamic form-based dashboards: These allow Splunk users to change the dashboard data without leaving the page. This is done by adding input fields (such as time pickers, radio buttons, text boxes, checkboxes, and dropdowns) to the dashboard, which change the data based on the current selection. This is an effective type of dashboard for teams that troubleshoot issues and analyze data.

Static Real-time Dashboards: They are often kept on a big panel screen for constant viewing, simply because they are so useful. Even though they are called static, in fact, the data changes in real-time without refreshing the page; it is just the format that stays constant. The dashboard will also have indicators and alerts that allow operators to easily identify a problem and act on it. Static Real-time Dashboards usually show the current state of the network or business systems, using indicators for web performance and traffic, revenue flow, and other important measures.

Scheduled Dashboards: This type of dashboard will typically have multiple panels included on the same page. Also, the dashboard will not be exposed for viewing; it will generally be saved as a PDF file and sent to e-mail recipients at scheduled times. This format is ideal when you need to send information updates to multiple recipients at regular intervals.

Some of the Splunk dashboard examples include security analytics dashboard, patient treatment flow dashboard, eCommerce website monitoring dashboard, exercise tracking dashboard, runner data dashboard, etc.

15. What are the different types of Splunk products?

Splunk is available in three different product categories:

Splunk Enterprise: Used by companies with large infrastructure and IT-driven businesses. It gathers and analyzes data from diverse websites, applications, devices, sensors, and more, so that data from your IT or business infrastructure can be searched, analyzed, and visualized.
Splunk Cloud: A cloud-hosted platform with many of the same features as the Enterprise version. It can be obtained from Splunk directly or through the AWS cloud platform.
Splunk Light: A free version of Splunk that lets you search, report, and alert on all log data in real time from one place. It has limited functionality and fewer features than the other versions.

16. What are the “Search factor” and the “Replication factor”?

Splunk Enterprise provides high reliability in terms of data duplication and redundant search capability by offering the ability to specify a replication factor and search factor in configuration settings for clustered environments. Search Factor and Replication Factor are terms associated with Clustering techniques i.e., Search head clustering & Indexer clustering.

Search Factor: It is only associated with indexer clustering. The search factor determines the number of searchable copies of data the indexing cluster maintains. The default value for a search factor is 2, meaning that the cluster maintains two searchable copies of all the data buckets.

Replication Factor: It specifies the number of raw data copies of indexed data we want to maintain across the indexing cluster. Indexers store incoming data in buckets, and the cluster will maintain copies of each bucket distributed across the nodes in the indexing tier (as many copies as you specify for the replication factor) so that if one or more individual indexers go down, the data still resides elsewhere in the cluster.

This provides both the ability to search all the data in the presence of one or more missing nodes and to redistribute copies of the data to other nodes and so maintain the specified number of duplicate copies.

The indexing cluster can tolerate the failure of (replication factor - 1) indexers (or peer nodes, in Splunk nomenclature). If we use a replication factor (RF) of two, the cluster maintains two copies of the data, so we can lose one peer node without losing the data altogether; if we use an RF of three, we can lose up to two nodes and still keep at least one copy; and so on.

The default value for the replication factor is 3.

In summary, the replication factor is the number of copies of the raw data maintained across the indexing tier, and the search factor is the number of searchable copies of that data (copies that include the index files needed for searching). The search factor must be less than or equal to the replication factor.
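The arithmetic above can be captured in a few small Python helpers. The checks restate the rules in the text (SF must not exceed RF, the cluster survives RF - 1 peer failures) rather than reproduce Splunk's own validation code; the peer-count check and the `failures_before_rebuild` helper (the same reasoning applied to searchable copies) are illustrative assumptions.

```python
# Worked arithmetic for the indexer-cluster rules described above.
# The checks restate the text; they are not Splunk's own validation code.

def validate(replication_factor: int, search_factor: int, peers: int) -> None:
    """Raise ValueError for settings the cluster cannot satisfy."""
    if search_factor > replication_factor:
        raise ValueError("search factor must be <= replication factor")
    if replication_factor > peers:
        raise ValueError("need at least as many peer nodes as the replication factor")


def tolerated_peer_failures(replication_factor: int) -> int:
    """Peers that can fail with at least one raw-data copy surviving: RF - 1."""
    return replication_factor - 1


def failures_before_rebuild(search_factor: int) -> int:
    """Peers that can fail while a searchable copy still exists: SF - 1."""
    return search_factor - 1


if __name__ == "__main__":
    rf, sf, peers = 3, 2, 3  # the default RF and SF on a three-peer cluster
    validate(rf, sf, peers)
    print(tolerated_peer_failures(rf))   # 2
    print(failures_before_rebuild(sf))   # 1
```

With the defaults (RF 3, SF 2), losing one peer still leaves a searchable copy; losing two leaves the raw data intact, but searchable copies must be rebuilt from it.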

17. How many types of search modes are there in Splunk?

There are three types of search modes in Splunk:

Fast mode: speeds up searches by limiting the types of data returned (field discovery is turned off).
Verbose mode: slower than Fast mode, but returns as much event and field information as possible.
Smart mode: the default mode; it toggles search behavior based on the type of search, behaving like Fast mode when the search contains transforming commands and returning full event data with field discovery otherwise.
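The Smart mode decision can be sketched as a simple check for transforming commands in the search pipeline. In the Python sketch below, the command set is a small, assumed subset (stats, chart, timechart, top, rare), and the naive split on "|" ignores pipes inside quoted strings, so treat it as an illustration of the rule rather than how Splunk parses SPL.

```python
# Sketch of the Smart mode decision: a search whose pipeline contains a
# transforming command runs like Fast mode; otherwise field discovery stays
# on, as in Verbose mode. The command set is a small, assumed subset, and the
# naive split on "|" ignores pipes inside quoted strings.
TRANSFORMING_COMMANDS = {"stats", "chart", "timechart", "top", "rare"}


def smart_mode_behaviour(spl: str) -> str:
    """Return 'fast' or 'verbose' depending on the commands after the first pipe."""
    commands = [segment.split()[0].lower()
                for segment in spl.split("|")[1:] if segment.strip()]
    return "fast" if any(c in TRANSFORMING_COMMANDS for c in commands) else "verbose"


if __name__ == "__main__":
    print(smart_mode_behaviour("index=web error | stats count by host"))  # fast
    print(smart_mode_behaviour("index=web error | head 10"))              # verbose
```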

18. State the differences between ELK and Splunk.

There are various tools on the market that help process and store machine data efficiently. Splunk and Elasticsearch serve the same goal: handling log management problems seamlessly. The right tool can be chosen based on business requirements.

Parameter	ELK	Splunk
Overview	ELK stands for Elasticsearch (a RESTful search/analytics engine), Logstash (a data processing pipeline), and Kibana (data visualization), which together form an open-source log management platform provided by the company Elastic.	Splunk is one of the top DevOps tools for log management and analysis. It also provides security information and event management solutions for determining the collective state of a company's systems.
Agent for data loading	Logstash is used as an agent to collect log file data from the target servers and load it into the destination.	The Splunk Universal Forwarder is used as an agent to collect log file data from the target servers and load it into the destination.
Visualizations	ELK uses Kibana for visualizations. Visualizations such as tables and line charts can be easily created and added to a dashboard in Kibana. Unlike Splunk, the open-source stack historically did not include user management; hosted ELK solutions could be used to add it (recent Elastic releases include basic security features).	The Splunk web UI has flexible controls for adding new components to a dashboard or editing existing ones. It supports user management and per-user controls, and each user can customize their own dashboard. Using XML, users can also customize the application and visualizations for mobile devices.
Cost	ELK is an open-source log management platform so it is free of cost.	We need to buy a license to use Splunk. We can either buy an annual subscription or pay one time for a lifetime subscription. This fee is dependent on the daily log volume that is getting indexed.

19. State the differences between Spark and Splunk.

The various differences between Spark and Splunk are as follows.

Parameter	Spark	Splunk
Overview	Apache Spark is a fast, general engine for large-scale data processing in Big Data. It is compatible with Hadoop data: it can run in Hadoop clusters through YARN or in Spark's standalone mode and process data in HDFS.	Splunk is one of the top DevOps tools for log management and analysis solutions. It is used for searching, monitoring, analyzing, and visualizing machine data.
Working mode	It has both batch and streaming modes.	It has only one working mode i.e., streaming mode.
Cost	Spark is an open-source tool so it is free of cost.	We need to buy a license to use Splunk. We can either buy an annual subscription or pay one time for a lifetime subscription. This fee is dependent on the daily log volume that is getting indexed.
Ease of use	We can easily call and use APIs using Spark.	It is very easy to use via console.
Runtime	Processes are run very fast compared to Hadoop	It has a very high runtime

20. What is Splunk DB connect?

Splunk DB connect is the generic SQL database plugin that helps in integrating the database with Splunk queries and reports. Through DB connect, we can combine the structured data from databases with the unstructured machine data, and then use Splunk Enterprise to provide insights into all of that combined data.

DB Connect also allows us to output data from Splunk Enterprise back to a relational database, mapping Splunk Enterprise fields to the database tables we want to write to.

DB Connect performs database lookups, which match fields in the event data against an external database. With these matches, users can enrich the event data by adding more meaningful information and searchable fields.

Other than this, DB Connect is beneficial in the following scenarios:

When we want to quickly get data from a database into Splunk Enterprise, or perform lookups from data warehouses or state tables within Splunk Enterprise.
When we want to preview data and validate settings, such as locale and time zone, rising column, and metadata choices, before indexing begins, to prevent accidental duplication or other issues later.
When we need to scale, distribute, and monitor database read or write jobs, which helps prevent overloading and provides notice of failures.
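The "rising column" setting mentioned above is what lets an input fetch only new rows: the input remembers the highest value seen so far (a checkpoint) and queries for rows above it on the next run. The Python sketch below shows the idea with an in-memory SQLite table; the table and column names are hypothetical, and this is not DB Connect's code.

```python
# Sketch of the rising-column idea used for incremental database inputs:
# remember the highest value seen and fetch only newer rows on each run.
# In-memory SQLite; table and column names are hypothetical.
import sqlite3


def fetch_new_rows(conn: sqlite3.Connection, checkpoint: int):
    """Return rows whose rising column (id) exceeds the checkpoint, plus the new checkpoint."""
    rows = conn.execute(
        "SELECT id, message FROM app_events WHERE id > ? ORDER BY id",
        (checkpoint,),
    ).fetchall()
    new_checkpoint = rows[-1][0] if rows else checkpoint
    return rows, new_checkpoint


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE app_events (id INTEGER PRIMARY KEY, message TEXT)")
    conn.executemany("INSERT INTO app_events (message) VALUES (?)", [("start",), ("login",)])
    rows, checkpoint = fetch_new_rows(conn, 0)           # first run: both rows
    conn.execute("INSERT INTO app_events (message) VALUES ('logout')")
    rows, checkpoint = fetch_new_rows(conn, checkpoint)  # second run: only the new row
    print(rows, checkpoint)  # [(3, 'logout')] 3
```

Because the checkpoint only moves forward, rerunning the input does not re-index rows it has already collected, which is how the rising column helps prevent duplication.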

Reference: Splunk Official Document

21. What is the difference between Search Head Pooling and Search Head Clustering?

Search Head Pooling: Pooling here refers to sharing resources. Search head pooling uses shared storage so that multiple search heads can share user data and configuration. Search head pooling has been deprecated in favor of search head clustering.

Adding search heads helps with horizontal scaling during peak traffic, when many users are searching the same data.

Search Head Clustering: A search head cluster is a group of Splunk Enterprise search heads that share configurations, search job scheduling, and search artifacts, which are the results and associated metadata from a completed search job.

A search head cluster can be used in a distributed Splunk deployment to handle more users and concurrent searches, and to provide multiple search heads so that search capability is not lost if one or more members go down.

22. How can we disable the Splunk launch message?

To disable the Splunk launch message, set OFFENSIVE=less in splunk-launch.conf.

This suppresses the messages shown on the CLI during start-up.

23. What is the dispatch directory?

Each search or alert that runs creates a search artifact that must be saved to disk. The artifacts are stored in directories under the dispatch directory, with one search-specific directory per search job.

The dispatch directory contains a directory for each search that is running or has completed. When a job expires, its search-specific directory is deleted. The dispatch directory is located at:

$SPLUNK_HOME/var/run/splunk/dispatch

For example, a job directory might be named 1346978195.13.

This directory contains a CSV file of the search results, a search.log with details about the search execution, and other relevant information.

24. What are the types of Splunk Licenses?

The different Splunk licenses are as below:

The Splunk Enterprise license - Splunk Enterprise licenses are the most common license type. They provide access to the full set of Splunk Enterprise features within a defined limit of indexed data per day or CPU count. There are several types of Splunk Enterprise licenses, including the Splunk Enterprise Trial license, the Splunk for Industrial IoT license, and the Developer license.
The Free license - Some features are disabled in the Free license, and it allows only a limited indexing volume.
The Forwarder license - The Forwarder license is an embedded license within Splunk Enterprise. It is designed to allow unlimited forwarding, along with the subset of Splunk Enterprise features needed for configuration management, authentication, and sending data.

The universal forwarder installs the Forwarder license by default. Heavy forwarders and light forwarders must be manually configured to use the Forwarder license.

The Beta license: The Beta license has features similar to the Enterprise license but is restricted to Splunk software Beta releases.
Splunk Premium App license: A license for a Splunk premium app is used together with a Splunk Enterprise license to access the functionality of the app. Splunk premium apps include, but are not limited to, Splunk Enterprise Security and ITSI.

25. How can we reset the Splunk admin password?

For resetting the Splunk admin password in versions prior to 7.1:

We can follow the steps below:

Stop Splunk Enterprise.
Find the password file for the instance (generally located at $SPLUNK_HOME/etc/passwd) and rename it to passwd.bk.
Start Splunk Enterprise and log in to the instance from Splunk Web using the default credentials admin/changeme.
You will be prompted to enter a new password for the admin account.
If you previously created other users and know their login details, copy their credentials from the passwd.bk file into the passwd file and restart Splunk.

For resetting the Splunk admin password in version 7.1 and later:

We can follow the steps below:

Stop Splunk Enterprise.
Find the passwd file for your instance ($SPLUNK_HOME/etc/passwd) and rename it to passwd.bk.
Create a file named user-seed.conf in your $SPLUNK_HOME/etc/system/local/ directory.
In the file, add the following text:
[user_info]
PASSWORD = NEW_PASSWORD

Replace "NEW_PASSWORD" with the password you would like to use.

Start Splunk Enterprise and use the new password to log in to your instance from Splunk Web. If you previously created other users and know their login details, copy their credentials from the passwd.bk file into the passwd file and restart Splunk.
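The file-handling steps of the 7.1+ procedure can be sketched in Python as below. This is a minimal, hypothetical helper (prepare_admin_reset is not a Splunk tool): it assumes Splunk has already been stopped, renames etc/passwd to passwd.bk, and writes user-seed.conf with the [user_info] stanza that file uses. Starting Splunk and copying other users back remain manual steps.

```python
from pathlib import Path


def prepare_admin_reset(splunk_home, new_password):
    """Hypothetical helper for the 7.1+ reset steps; run only while Splunk is stopped.

    Renames $SPLUNK_HOME/etc/passwd to passwd.bk (if present) and writes
    $SPLUNK_HOME/etc/system/local/user-seed.conf containing the new password.
    """
    home = Path(splunk_home)
    passwd = home / "etc" / "passwd"
    if passwd.exists():
        # Keep the old file so other users' entries can be copied back later.
        passwd.rename(home / "etc" / "passwd.bk")

    seed = home / "etc" / "system" / "local" / "user-seed.conf"
    seed.parent.mkdir(parents=True, exist_ok=True)
    seed.write_text(f"[user_info]\nPASSWORD = {new_password}\n")
    return seed
```

Keeping passwd.bk rather than deleting the file is what makes the final "copy other users back" step possible.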

26. What is a lookup in Splunk and what is it used for?

When we look at the data returned by a Splunk search, some field values don't convey much meaning on their own. For example, a process ID alone doesn't tell us which application process it refers to, so it is difficult for a human to interpret. Linking the process ID with the process name gives a much clearer picture.

Linking the values of a field in one dataset to a matching field in another dataset, using the values that are equal in both, is called a lookup.

This lets us retrieve related values from two different datasets. Lookups enrich event data by adding field-value combinations from lookup tables: Splunk software matches field values in events against an external source, such as a CSV file, and adds the corresponding fields to those events.

To create a lookup, navigate to Settings > Lookups, fill in the required fields, and define the lookup for the required dataset.

There are four types of lookups, each suited to different scenarios:

CSV lookups
External lookups
KV Store lookups
Geospatial lookups
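The process ID example can be imitated in plain Python to show what a CSV lookup does conceptually, roughly what an SPL search such as `| lookup process_lookup pid OUTPUT process_name` would do. This is a sketch, not Splunk's implementation; process_lookup, the table contents, and the helper names are hypothetical.

```python
import csv
import io

# Hypothetical lookup table mapping process IDs to process names.
PROCESS_LOOKUP_CSV = """pid,process_name
101,nginx
202,sshd
"""


def load_lookup(csv_text, key_field):
    """Read a CSV lookup table into a dict keyed by key_field."""
    return {row[key_field]: row for row in csv.DictReader(io.StringIO(csv_text))}


def apply_lookup(events, table, key_field, output_fields):
    """Add output_fields to each event whose key_field value is in the table.

    Events without a match are passed through unchanged.
    """
    enriched = []
    for event in events:
        new_event = dict(event)
        match = table.get(event.get(key_field))
        if match is not None:
            for field in output_fields:
                new_event[field] = match[field]
        enriched.append(new_event)
    return enriched


events = [{"pid": "101", "action": "start"}, {"pid": "999", "action": "stop"}]
table = load_lookup(PROCESS_LOOKUP_CSV, "pid")
print(apply_lookup(events, table, "pid", ["process_name"]))
```

The first event gains process_name, while the event with an unknown PID is left as it was.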

27. What are the inputlookup and outputlookup commands?

The inputlookup command, as the name suggests, takes a lookup table as input: it is used to search the contents of a lookup table and return its rows as search results. For example, a product table containing the product ID, product name, and price can be loaded with inputlookup and then matched against a field such as a product ID or item ID. The outputlookup command, in contrast, writes the fields in the search results to a static lookup table file or KV Store collection. In short, inputlookup is used to read lookup data for enriching or inspecting results, and outputlookup is used to build or update lookup tables.
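The read/write direction of the two commands can be illustrated with a small Python sketch over a CSV file. The function names only echo the SPL commands and are hypothetical; this sketch ignores KV Store collections and append options and simply writes and reads a CSV table.

```python
import csv


def outputlookup(rows, path):
    """Write search-result rows (dicts) to a CSV lookup file, like outputlookup."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(path, "w", newline="") as f:
        # Missing fields in a row are written as empty strings.
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def inputlookup(path):
    """Read a CSV lookup file back as a list of rows, like inputlookup."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

A product table written with outputlookup can later be read with inputlookup and matched on product_id.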

28. What are some of the most important configuration files in Splunk?

Some of the important configuration files in Splunk are:

app.conf - It is used to configure app properties.
authorize.conf - It is used to configure roles and capabilities, including role-based access controls.
bookmarks.conf - It is used to bookmark Monitoring Console URLs.
collections.conf - It is used to configure KV Store collections for apps.
eventtypes.conf - It is used to create event type definitions.
indexes.conf - It is used to manage and configure index settings.
inputs.conf - It is used to set up data inputs.
searchbnf.conf - It is used to configure the search assistant.
web.conf - It is used to configure Splunk Web, including enabling HTTPS.
alert_actions.conf - It is used to configure alert actions (in ITSI, it also generates notable events and configures episode actions).
datamodels.conf - Attribute/value pairs for configuring data models.
commands.conf - Connects search commands to custom search scripts.
visualizations.conf - Declares common visualizations that other modules can use.
restmap.conf - Creates custom REST endpoints.
props.conf - Sets indexing property configurations, including time zone offset, custom source type rules, and pattern collision priorities. It also maps transforms to event properties.
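These files share an INI-like format of [stanza] headers followed by key = value settings. The Python sketch below parses simple .conf text with the standard configparser module to show that structure; it assumes plain key = value lines and is not how Splunk itself reads its files (line continuations and the special meaning of [default] are not handled).

```python
import configparser


def parse_conf(text):
    """Parse simple Splunk-style .conf text into {stanza: {key: value}}.

    Assumption: plain "key = value" lines only. Splunk's [default] stanza is
    treated here as an ordinary stanza, not as defaults for the others.
    """
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, delimiters=("=",)
    )
    parser.optionxform = str  # keep key case; Splunk setting names are case-sensitive
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


INPUTS_CONF = """
[monitor:///var/log/messages]
sourcetype = syslog
index = main

[default]
host = web01
"""

print(parse_conf(INPUTS_CONF))
```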

29. What is the eval command? What is the difference between the eval and stats commands?

The eval command in Splunk calculates an expression and puts the resulting value into a search results field. The eval command evaluates mathematical, string, and Boolean expressions.

If the field name specified by the user does not match a field in the output, a new field is added to the search results. If the field name matches a field that already exists in the search results, the results of the eval expression overwrite the values in that field.

The stats command calculates statistics based on fields in the given events, while the eval command creates new fields in events by using existing fields and an arbitrary expression. In other words, eval works on each event individually and keeps every event, whereas stats aggregates many events into a summary table (optionally grouped with a BY clause).

Reference: Splunk Official Doc
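The per-event versus aggregate distinction can be seen in a short Python imitation of `eval kb = bytes / 1024` and `stats sum(bytes) AS total_bytes BY host`. The event data and function names are hypothetical; the point is only that eval returns one row per event while stats returns one row per group.

```python
from collections import defaultdict

events = [
    {"host": "web01", "bytes": 500},
    {"host": "web01", "bytes": 1500},
    {"host": "db01", "bytes": 2048},
]


def eval_kb(rows):
    """Per event, like `eval kb = bytes / 1024`: every event gets (or overwrites) kb."""
    return [dict(row, kb=row["bytes"] / 1024) for row in rows]


def stats_sum_by_host(rows):
    """Aggregate, like `stats sum(bytes) AS total_bytes BY host`: one row per host."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["host"]] += row["bytes"]
    return [{"host": h, "total_bytes": t} for h, t in sorted(totals.items())]


print(eval_kb(events))            # still three rows
print(stats_sum_by_host(events))  # two rows, one per host
```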

30. How to clear Splunk Search History?

The Splunk search history can be cleared by deleting the following file from the Splunk server:

$SPLUNK_HOME/var/log/splunk/searches.log

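31. What is the MapReduce algorithm?

MapReduce is a programming model that divides a task into small parts, assigns them to multiple systems to process in parallel (the map step), and then combines their partial results (the reduce step).

In Splunk, the MapReduce approach sends the map portion of a search to the appropriate servers in a cluster (the indexers, or search peers) and combines their partial results on the search head, which makes searching large volumes of data faster.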
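A toy Python version of the idea: each hypothetical search peer counts log levels in its own events (map), and the search head merges the partial counts (reduce). This is a conceptual sketch, not Splunk's internal implementation; here the peers run sequentially rather than in parallel.

```python
from collections import Counter
from functools import reduce

# Hypothetical events already split across three indexers (search peers).
PEER_EVENTS = [
    ["error", "info", "error"],
    ["info", "warn"],
    ["error", "warn", "warn"],
]


def map_phase(events):
    """Runs on each peer: count log levels in that peer's own events."""
    return Counter(events)


def reduce_phase(partials):
    """Runs on the search head: merge the peers' partial counts."""
    return reduce(lambda a, b: a + b, partials, Counter())


partials = [map_phase(events) for events in PEER_EVENTS]
print(reduce_phase(partials))
```

Only the small partial counts travel to the search head, which is why this split speeds up searches over large data.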

32. How to enable and disable Splunk boot start?

To enable Splunk boot-start, we need to use the following command:

$SPLUNK_HOME/bin/splunk enable boot-start

To disable Splunk boot-start, we need to use the following command:

$SPLUNK_HOME/bin/splunk disable boot-start

33. What are the important commands in Splunk which can be used for searching?

Below is the list of some of the important search commands in Splunk:

erex: Allows us to specify example or counter-example values to automatically extract fields that have similar values.
abstract: Produces a summary of each search result.
typer: Calculates the event types for the search results.
rename: Renames a specified field; wildcards can be used to specify multiple fields.
anomalies: Computes an "unexpectedness" score for an event.
filldown: Replaces NULL values with the last non-NULL value.
accum: Keeps a running total of the specified numeric field.
addtotals: Computes the sum of all numeric fields for each result.
from: Retrieves data from a dataset, such as a data model dataset, a CSV lookup, a KV Store lookup, a saved search, or a table dataset.
concurrency: Uses a duration field to find the number of "concurrent" events for each event.
chart: Returns results in a tabular output for charting.
uniq: Removes any search result that is an exact duplicate of a previous result.
where: Performs arbitrary filtering on your data.
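To make the semantics of filldown and accum concrete, here is a Python imitation working on lists of dicts. The function names mirror the SPL commands but are hypothetical stand-ins; the accum sketch assumes every row has the numeric field, and the optional new_field plays the role of `accum n AS running_n`.

```python
def filldown(rows, field):
    """Replace missing/None values of `field` with the last non-null value seen."""
    last = None
    out = []
    for row in rows:
        new_row = dict(row)
        if new_row.get(field) is None:
            if last is not None:
                new_row[field] = last
        else:
            last = new_row[field]
        out.append(new_row)
    return out


def accum(rows, field, new_field=None):
    """Keep a running total of a numeric field, in place or in new_field.

    Assumption: every row contains `field`.
    """
    total = 0
    out = []
    for row in rows:
        new_row = dict(row)
        total += row[field]
        new_row[new_field or field] = total
        out.append(new_row)
    return out


rows = [{"user": "alice", "n": 1}, {"user": None, "n": 2}, {"n": 3}, {"user": "bob", "n": 4}]
print(filldown(rows, "user"))
print(accum(rows, "n", "running_n"))
```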

34. What is the difference between a Splunk App and a Splunk Add-on?

Splunk apps and add-ons are packaged with the same file extension, but in general they serve quite different purposes.

Splunk App: An app is an application that runs on the Splunk platform. Apps are used to analyze and display knowledge about a particular source or set of data. Because apps provide a navigable GUI, they are useful to a wide range of users. Each Splunk app consists of a collection of Splunk knowledge objects (lookups, tags, saved searches, event types, etc.).

An app can be built by combining several add-ons, and the same add-ons can be reused to build something completely different.

We can also apply user/role-based permissions and access controls to apps, which provides a level of control when deploying and sharing apps across the organization. Example: the Splunk Enterprise Security app.

Splunk Add-on: An add-on offers specific features that help collect, standardize, and enrich data sources. Add-ons come in both free and paid versions. They are built on top of the Splunk platform and add features and functionality to other apps.

An add-on can contain:

Data source input configurations.
Data parsing and transformation settings for structuring the data.
Lookup files for data enrichment.
Supporting knowledge objects.

We can use an add-on on its own or bundle several together to form the basis of a Splunk app. In this way, add-ons are reusable and modular, so you can construct your apps more rapidly.

35. What is the fishbucket? How does it work?

The fishbucket in Splunk is a sub-directory that Splunk uses internally to track how far the content of each monitored file has been indexed. It does this using two pieces of information for each file: a seek pointer and a CRC (cyclic redundancy check).

The default location of the fishbucket sub-directory is $SPLUNK_HOME/var/lib/splunk/fishbucket. To see the contents of the fishbucket, we can search index=_thefishbucket in the Splunk GUI.

Working: The Splunk monitoring processor selects and reads the data of a new file and then hashes the data into a beginning and ending cyclic redundancy check (CRC), which works as a fingerprint representing the file content.

This CRC is then used to look up an entry in a database that contains the beginning CRCs of all the files Splunk has seen before.

First, the file monitor processor searches the fishbucket to see whether the CRC from the beginning of the file is already present.

This can lead to three possible scenarios:

Case 1: If the CRC is not present in the fishbucket, the file is indexed as new. This implies the file has never been indexed. After indexing, Splunk stores the CRC and the seek pointer in the fishbucket.
Case 2: If the CRC is present in the fishbucket and the seek pointer is the same as the current end of the file, the file has already been indexed and nothing has changed in it since then. The seek pointer records how far into the file Splunk has already read.
Case 3: If the CRC is present in the fishbucket and the seek pointer is beyond the current end of the file, part of the file that was already read has changed. Because Splunk cannot know what has changed, it re-indexes the whole file.
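The decision logic above can be sketched in Python with zlib.crc32 standing in for Splunk's CRC and a plain dict standing in for the fishbucket. This is a simplified model under stated assumptions: the 256-byte "begin" window is an assumed value, the end-CRC check is omitted, and a fourth branch covers a file that has simply grown, where reading resumes from the seek pointer.

```python
import zlib

HEAD_BYTES = 256  # assumption: number of leading bytes hashed into the "begin" CRC


def begin_crc(content: bytes) -> int:
    return zlib.crc32(content[:HEAD_BYTES])


def check_file(fishbucket: dict, content: bytes) -> str:
    """Decide what to do with a monitored file and update the fishbucket in place.

    fishbucket maps begin-CRC -> seek pointer (bytes already read). Files shorter
    than HEAD_BYTES are not handled specially in this sketch.
    """
    crc = begin_crc(content)
    eof = len(content)
    if crc not in fishbucket:
        fishbucket[crc] = eof  # Case 1: never seen, index the whole file
        return "index_new"
    seek = fishbucket[crc]
    if seek == eof:
        return "skip"  # Case 2: nothing changed since the last read
    if seek > eof:
        fishbucket[crc] = eof  # Case 3: already-read data changed, re-index
        return "reindex"
    fishbucket[crc] = eof  # File grew: read only the bytes after the seek pointer
    return "read_appended"
```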

36. Explain pivot and data models.

Below is the difference between pivot and data models:

A Pivot is a dashboard panel in Splunk used to create front-end views of the output with the help of filters. The main purpose of Pivot is to let users avoid writing SPL queries to populate the Pivot, and to make searching in Splunk easier by using existing datasets.

Data models are commonly used to create a structured, hierarchical model of data. Within a data model, datasets are arranged into parent and child datasets, which is helpful when working with a large amount of unstructured data.

37. How would you handle/troubleshoot Splunk License Violation Warning?

A license violation warning means that Splunk has indexed more data than our purchased daily quota.

To handle a license violation warning, we first identify which index or source type has received more data than its usual daily volume. Once we have identified a data source that is using a lot of licensed volume, we find the source machine that is sending the large number of logs and the root cause.

Depending on the scenario, troubleshooting can be done as follows:

Check whether this was a one-time data ingestion issue.
Check whether this is a new average license usage caused by changes in the infrastructure.
Check whether you can filter and drop some of the incoming data.
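The first troubleshooting step, finding the source type whose volume jumped, can be sketched in Python over per-day usage figures. In practice such figures typically come from Splunk's license usage data in the _internal index; here the numbers are made up, and both the helper find_spikes and the 1.5x threshold are arbitrary assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily license usage in GB: (day number, sourcetype, GB indexed).
USAGE = [
    (1, "syslog", 10), (1, "access_combined", 20),
    (2, "syslog", 11), (2, "access_combined", 19),
    (3, "syslog", 42), (3, "access_combined", 21),
]


def find_spikes(usage, day, factor=1.5):
    """Return sourcetypes whose usage on `day` exceeds `factor` times their
    average over earlier days. Sourcetypes with no earlier history are also
    returned, since a brand-new data source can cause an overage."""
    history = defaultdict(list)
    today = {}
    for d, sourcetype, gb in usage:
        if d == day:
            today[sourcetype] = today.get(sourcetype, 0) + gb
        elif d < day:
            history[sourcetype].append(gb)
    return sorted(
        st for st, gb in today.items()
        if not history[st] or gb > factor * mean(history[st])
    )


print(find_spikes(USAGE, day=3))
```

Once a spiking source type is found, the same breakdown by host points to the machine that is sending the extra logs.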

38. How does Splunk avoid the duplicate indexing of logs?

One method is to partition the set of files among different Splunk instances, so that each instance reads and forwards only its own share.

For example, we can divide the logs into part 1 and part 2, then whitelist part 1 on one set of nodes (/var/log/[a-m]*) and part 2 on another set of nodes (/var/log/[n-z]*).
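The partitioning example can be checked with a small Python sketch that applies the two patterns to sample paths and shows which node set would pick up each file. It treats the patterns as shell-style globs, as written above (note that the whitelist setting in inputs.conf actually takes a regular expression); the node-set names are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns from the example above, one per hypothetical set of forwarder nodes.
NODE_PATTERNS = {
    "node_set_1": "/var/log/[a-m]*",
    "node_set_2": "/var/log/[n-z]*",
}


def assign(paths):
    """Map each path to the node sets whose pattern matches it (case-sensitive)."""
    return {
        path: [node for node, pattern in NODE_PATTERNS.items() if fnmatchcase(path, pattern)]
        for path in paths
    }


print(assign(["/var/log/auth.log", "/var/log/syslog", "/var/log/Xorg.0.log"]))
```

The design caveat this exposes: a file starting with an uppercase letter or a digit matches neither pattern, so the partitions must be chosen to cover every file exactly once.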

39. What is the importance of the License Master in Splunk? If the License Master is not reachable, what will happen?

The license master ensures that the right amount of data is indexed. It limits the environment to the daily indexing volume purchased with the license, in a balanced way throughout the license period.

The license master controls all of its associated license slaves and provides them with access to the Splunk Enterprise license. After a license master instance is configured and license slaves are added to it, the license slaves connect to the license master every minute.

If, for any reason, the license master is not reachable or not available, the license slave starts a 72-hour timer. If the license slave still cannot connect to the license master after 72 hours, search is blocked on the license slave, but indexing continues, which means that the Splunk deployment still receives and indexes data. Users cannot search data on the license slave until the connection between the license slave and the license master is restored. When the indexing limit is reached, the user gets a warning to reduce the data intake. Users can upgrade their licenses to increase volume capacity.

40. What is a Splunk bucket? Explain the Splunk bucket lifecycle.

A bucket in Splunk is basically a directory for storing data and index files. Each bucket contains data events from a particular time frame. As data ages, buckets move through different stages, as given below:

Hot bucket: A hot bucket contains newly indexed data. It is open for writing. Every index contains one or more hot buckets.
Warm bucket: This bucket contains data that has been rolled out of a hot bucket. There are typically many warm buckets.
Cold bucket: This bucket contains data that has been rolled out of a warm bucket. There are also typically many cold buckets.
Frozen bucket: A frozen bucket contains data rolled out of a cold bucket. The indexer deletes frozen data by default, but we can archive it. Archived data can later be thawed (data in a frozen bucket is not searchable).

For the default index, hot and warm buckets are located by default in the following folder:

$SPLUNK_HOME/var/lib/splunk/defaultdb/db
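The lifecycle can be summarized as a small state machine. The Python sketch below encodes only the transitions and searchability described above; it is a conceptual model with hypothetical names, not a Splunk API, and it leaves out the size and age policies that actually trigger each roll.

```python
from enum import Enum


class BucketState(Enum):
    HOT = "hot"        # open for writing, searchable
    WARM = "warm"      # rolled from hot, searchable
    COLD = "cold"      # rolled from warm, searchable
    FROZEN = "frozen"  # deleted by default or archived; not searchable
    THAWED = "thawed"  # archived data restored for searching


# Allowed transitions in the lifecycle described above.
NEXT = {
    BucketState.HOT: {BucketState.WARM},
    BucketState.WARM: {BucketState.COLD},
    BucketState.COLD: {BucketState.FROZEN},
    BucketState.FROZEN: {BucketState.THAWED},  # only possible if the bucket was archived
    BucketState.THAWED: set(),
}


def roll(state, target):
    """Move a bucket to `target`, rejecting transitions the lifecycle does not allow."""
    if target not in NEXT[state]:
        raise ValueError(f"cannot roll {state.value} -> {target.value}")
    return target


def is_searchable(state):
    return state is not BucketState.FROZEN
```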

41. What is the importance of time zone property in Splunk?

The time zone property is important when we search for events, for example while investigating a security breach or fraud. Splunk uses the default time zone defined by your browser settings, which the browser picks up from the computer or machine you are working on.

If you search for an event in the wrong time zone, you won't be able to find it. Splunk picks up the time zone when data is ingested, and the time zone is very important when data from different sources is searched and compared. For example, an event logged at 5:00 PM IST corresponds to a different local time in your Vietnam or Singapore data centre. So, the time zone property is crucial when comparing such events.
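The 5:00 PM IST example can be worked through with Python's datetime module. The sketch uses fixed UTC offsets (none of these three zones observes daylight saving time) and an arbitrary date; it shows that one instant has three different local clock times, which is why a search window in the wrong zone misses the event.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets; India, Vietnam and Singapore do not observe daylight saving time.
IST = timezone(timedelta(hours=5, minutes=30), "IST")  # India
ICT = timezone(timedelta(hours=7), "ICT")              # Vietnam
SGT = timezone(timedelta(hours=8), "SGT")              # Singapore

event_ist = datetime(2025, 1, 15, 17, 0, tzinfo=IST)  # 5:00 PM IST on an arbitrary date

for zone in (timezone.utc, ICT, SGT):
    print(zone, event_ist.astimezone(zone).strftime("%H:%M"))
```

The same event reads as 11:30 in UTC, 18:30 in Vietnam, and 19:30 in Singapore.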

42. What do you mean by File precedence in Splunk?

File precedence plays an important role when an administrator, developer, or architect troubleshoots Splunk. All of Splunk's configuration is written in plain-text .conf files, and most aspects of Splunk's behaviour are determined by these configuration files.

There can be multiple copies of each of these files, so it is important to know how these copies are used while a Splunk instance is running or restarted. To modify configuration files, the user must know how the Splunk software evaluates those files.

File precedence is also important to understand for several other reasons, including:

To be able to plan Splunk upgrades
To be able to plan app upgrades
To be able to provide different data inputs
To distribute configurations during Splunk deployments

To determine the priority among copies of a configuration file, Splunk considers the context of each configuration file. A configuration file is evaluated either in a) a global context or b) the context of the current application/user.

Directory priority descends as follows when the file context is global:

System local directory --> highest priority
Application local directories
Application default directories
System default directory --> lowest priority

Directory priority descends from user to application and then to the system when the file context is current application/user:

User directories for the current user --> highest priority
Application directories for the currently running application (local, followed by default)
Application directories for all the other applications (local, followed by default) --> for exported settings only
System directories (local, followed by default) --> lowest priority
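The effect of precedence can be sketched as a layered merge in Python: each directory contributes {stanza: {key: value}} settings, and a key set in a higher-priority layer wins while unset keys fall through from lower layers. This is a simplified model (it ignores how multiple apps are ordered among themselves), the helper merge_settings is hypothetical, and the indexes.conf values are illustrative only; btool is the tool that reports Splunk's actual merged result.

```python
def merge_settings(layers):
    """Merge configuration layers into the effective settings.

    `layers` is a list of {stanza: {key: value}} dicts ordered from highest to
    lowest priority, e.g. for the global context:
    [system/local, app local dirs, app default dirs, system/default].
    """
    merged = {}
    for layer in reversed(layers):  # apply the lowest-priority layer first...
        for stanza, settings in layer.items():
            # ...so that higher-priority layers overwrite it key by key.
            merged.setdefault(stanza, {}).update(settings)
    return merged


# Illustrative indexes.conf values for the [main] stanza.
system_default = {"main": {"maxDataSize": "auto", "frozenTimePeriodInSecs": "188697600"}}
app_local = {"main": {"frozenTimePeriodInSecs": "7776000"}}
system_local = {"main": {"maxDataSize": "auto_high_volume"}}

print(merge_settings([system_local, app_local, system_default]))
```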

43. Explain what Splunk Btool is.

Btool in Splunk is a command-line tool that is used to troubleshoot and help us with configuration files. Btool is a utility that is included with Splunk Enterprise and comes to the rescue while troubleshooting .conf files.

It specifically helps identify the "merged" configuration that results from the .conf files written to disk, and which .conf files contribute settings at the time of execution.

Few useful btool commands:

splunk help btool # display usage and list of options
splunk btool check # check all configurations for typos/errors
splunk btool <conf file name> list [options] # conf file name w/o '.conf'
To get a list of all of the configured indexes and their configuration settings, perform the following:
./splunk btool indexes list
./splunk btool indexes list | grep '\[' # list just the stanzas
Btool is located in the $SPLUNK_HOME/bin directory and displays the merged on-disk configuration values from the .conf files in the system.
Note that btool displays on-disk configuration file settings. If a user changes a setting and doesn't restart Splunk, btool will reflect the new setting, but that is not necessarily what is running in memory.

44. What is the difference between Splunk SDK and Splunk Framework?

The Splunk Framework is a platform that resides within the Splunk web server. It allows us to build dashboards in Splunk Web, where users access Splunk through a browser, log in as normal, and interact with Splunk applications; it can also be used to build dashboards with a web interface other than Splunk Web. The Splunk Framework does not require a separate license to allow users to modify anything in Splunk.

The Splunk SDKs are a set of tools designed to allow developers to build applications from scratch that interact with the APIs exposed by splunkd.

Building such applications generally doesn't require Splunk Web or any components of the Splunk App Framework. The license for the Splunk SDKs is separate from that of the Splunk software.
