If you are looking to build your career as a SQL Developer then prepare in advance with these frequently asked interview questions on SQL. This SQL based interview questions covers from basic to advanced level of SQL concepts which include types of statement in SQL, Schema Vs. Database, operators in SQL, normalization, nested queries, etc. which will help you get acquainted with the SQL concepts and answer every question with confidence.
A page is a fundamental unit of storage that stores data. The page is the size of 8KB and we have 8 types of pages in SQL Server.
An identity column in the table has auto-generate & auto increase value with each new row insert. The user cannot insert value in the identity column.
The sequence is a new feature introduced with SQL Server 2012 similar to Oracle’s sequence objects. A sequence object generates a sequence of unique numeric values as per the specifications mentioned. Next VALUE for a SEQUENCE object can be generated using the NEXT VALUE FOR clause.
IDENTITY is column level property & tied to a particular table. This cannot be shared among multiple tables.
SEQUENCE is an object defined by the user with specific details and can be shared by multiple tables. This is not tied to any particular table.
IDENTITY property cannot be reset to its initial value but the SEQUENCE object can be reset to initial value any time.
Maximum value cannot be defined for IDENTITY whereas this can be done for SEQUENCE object.
The DMV (Dynamic Management Views) is a set of system views introduced with SQL Server 2005. DMV’s are a new tool of DBA to get internal information of the system and it’s working.
SQL Server supports different types of isolation level.
SQL Server supports the following types of database backups:
SQL Server database backups are supported by all 3 recovery models:
With Compressed backups, SQL Server works on the pre-allocation algorithm to determine how much space the compressed backup will require and will allocate this space at the start of the backup process. At the time of backup completion, the backup process might expand or shrink the backup file on the basis of actual requirement.
Pre-Allocation helps SQL Server save the overhead of constantly growing the backup file.
Sometimes when we have a very large database and the difference in uncompressed and compressed backups is very high, preallocated and actual backup space has a significant difference. Trace flag 3042 promote better space management with little over-head to SQL server to grow backup file on a regular basis. This is useful with limited disk space.
Enabling trace flag 3014 may result in a little spike in CPU & memory but not significant to cause performance issues.
Pseudo-simple Recovery Model is situational based recovery model. This is something not given by project but ones with circumstances. It’s a situation where the database may be configured in the FULL recovery model but behaves like a SIMPLE recovery model.
The database will behave like a SIMPLE recovery model until a full database backup is taken after switching recovery model to Full.
For example, You have a database in Simple Recovery Model and now you have switched it to the FULL recovery model. You need to perform a full backup after switching recovery model otherwise database will keep behaving like simple and keeps truncating Transaction log on commit.
Consider a scenario where the server is performing a full backup at Sunday 2 AM, Differential backup Monday to Saturday 2 AM and Transaction Log backup in each 30 Min. You have been reported system crash around Wednesday 7 AM, Please share the recovery path with required backups to recover the system.
You need to restore Last Sunday full backup happen at 2 AM then Wednesday Differential backup of 2 AM followed by all Transaction log backups after. Consider all backups are valid, in case of any bad backup file recovery path, will be changed.
SQL Server Agent provides multiple functionalities like:-
SQL Server agent has lots of features and sometimes you need to give rights on an agent to manage and view jobs. Allowing every one with sysadmin permission is not appropriate.
SQL Server has 3 fixed DB roles for permission on SQL Agent & Jobs.
The Guest user account is created by default with SQL server installation. Guest user is not mapped to any login but can be used by any login when explicit permissions are not granted to access an object. You can drop the guest user from all databases except Master and TempDB.
When any user login to the SQL Server, it has 2 parts Server & database. First, at the Server level, user authentication verifies at the server level and User is able to login to SQL Servers. Second, Login and mapping to the database are verified. In case, Login is not mapped to any user but able to log in on SQL server. SQL automatically map that login to Guest and grant him database access.
One of the security recommendations of Microsoft is to Drop or disable a Guest user in every database except Master & TempDB database. By having Guest user, you are at risk of unauthorized access to data.
An SQL injection is a web hacking techniques done by unauthorized personnel or processes that might destroy your database.
An SQL injection is a web security vulnerability that allowed hackers to access application code and queries used to interact with the database. Hacker uses those methods to retrieve data belonging to other users and data not authorized to them. The major challenge is SQL injection can cause a system crash, data stolen, data corruption, etc.
An SQL injection is not a task of one man or team. Complete support and architecture team work together to get it prevented to happen.
Developers \ DBAs are responsible for DB security and proper SQL code.
Applications developers are responsible for application code and Db access methods.
Infra team is responsible for network, firewall & OS security.
Proper SQL instance, OS & Farwell security with a well-written application can help to reduce the risk of SQL injection.
Each table can have only one Cluster & multiple nonclustered indexes. All nonclustered indexes use index key of cluster index or directly depends on the clustered index.
Because of this dependency, we usually got this question if rebuilding the clustered index will rebuild the nonclustered index on a table or not.
The answer is NO. Cluster index rebuild will not rebuild or reorganize the nonclustered index. You need to plan a nonclustered index rebuild or reorganization separately.
Push & Pull is a type of Replication configuration. It decides how data will be replicated to the subscriber.
Push – In Push subscription, the publisher is responsible for sending data. Publisher pushes data from the publisher to the subscriber. Distributor agent will run on the Distributor to push changes. The distributor can be configured on the Publisher or separate server. Changes can be pushed to subscribers on demand, continuously, or on a scheduled basis.
Pull is best suited configured when the subscriber is connected all the time and need the latest data all the time accessed.
Pull - In Pull subscription, Subscribers is responsible for fetching data. Subscribers requests changes from the Publisher. Distributor agent will run on the subscriber to Pull changes. The subscriber can pull data as & when needed.
Pull is best suited when subscribers are not online every time. The subscriber can allow data lag and can wait for delay in data refresh
Log shipping is a Microsoft SQL Server DR solution from the very beginning. In Log Shipping, Backup and restores of a database from one Server to another standby server happens automatically.
Log Shipping works with 3 mandatory & 1 optional Job.
These 3 jobs are created separately for each database configured in log shipping.
Also, You can use the Transaction Log Shipping Report at Server Level to view the current status.
Principal Server:- The main server holding a primary copy of the database and serving client application & requests.
Mirror Server:- The secondary server which holds a mirrored copy of Principal database and acts as a hot or warm standby server on basis Synchronous & Asynchronous configuration.
Witness Server: - The witness server is an optional server and it controls automatic failover to the mirror if the principal becomes unavailable.
The FQDN (fully qualified domain name) is computer name of each server with the domain name. This can be found running the following from the command prompt:
Concatenate the “Host Name” and “Primary DNS Suffix”
Host Name . . . . . . . . . . . . : XYZ
Primary DNS Suffix . . . . . . . : corp.mycompany.com
The FQDN of your computer name will be XYZ.corp.mycompany.com.
UNION blends the contents of two structurally-compatible tables into a solitary joined table. The distinction among UNION and UNION ALL is that UNION will discard duplicate records through UNION ALL will incorporate duplicate records. Note that the presentation of UNION ALL will commonly be superior to UNION since UNION requires the server to do the extra work of expelling any duplicate. In this way, in situations where there is a surety that there won't be any copies, or where having copies isn't an issue, utilization of UNION ALL eventual suggested for performance reasons. Let's have a look at the below examples explaining the usage of both. In the first, we have used UNION and in the second we have explained UNION ALL.
Inner Join- This is the simplest one and most widely used JOIN. This is default one if we do not specify any JOIN between tables. It returns all matching records from both tables where the matching condition is satisfied.
LEFT JOIN- We call this LEFT OUTER JOIN as well. When we have situations like we want all columns from one table and only matching records from another table, we go for this type of JOIN. There are two types of same. The LEFT one is the first type where we would like to have all records from LEFT table and only matching records from RIGHT one then we go for this type of JOIN. When we do not have any matching condition in the right table then all columns from the right one will be NULL while LEFT table will have all the records.
RIGHT JOIN- We call this RIGHT OUTER JOIN as well. This is just the reverse of what we discussed for LEFT JOIN. The result set will be having all records from the right table but only matching records from the left one. Even if the ON clause which gives matching record fails, it will ensure rows are returned from the right table and corresponding columns from the LEFT table with NULL values.
FULL JOIN- It is also called FULL OUTER JOIN. It is having characteristics of both LEFT /RIGHT outer join. The result set will be having rows whenever we have the match in any of the tables either LEFT or RIGHT one. We can also say that it gives the same result if we apply UNION on LEFT and RIGHT OUTER JOIN resultset.
CROSS JOIN- This is a cartesian product of two tables where each row from the primary table is joined with each and every row of the secondary table. Even if we use SELECT statement on two tables separated by a comma and without any WHERE condition, it will give the same result as we get from applying CROSS JOIN.
Master: It contains framework catalogs that keep data about disk space, record portions, use, system-wide setup settings, login accounts, the presence of other databases, and the presence of other SQL Servers (for appropriate activities). If this database does not exist or corrupted then the SQL Server instance cannot start. Although we user objects in the master database, it is not advised to do so. This database should always remain as static as possible. In the case of a master database being rebuilt, all user objects will be lost.
Model: It is basically a template database. Each time you make another database, SQL Server makes a duplicate of a model to frame the premise of the new database. Any changes made to this database, related DB size, collation, recovery model, and any other configurations, are applied to any new database created afterward.
Tempdb: Temporary database, tempdb, is a workspace. SQL Server tempdb database is one of a kind among every single other database since it is reproduced not recuperated each time SQL Server is started.
Msdb: This database is utilized by the SQL Server Agent Service, which performs planned exercises, for example, backup and replication assignments.
Normalization and denormalization are the strategies utilized in databases. The terms are differential where Normalization is a procedure of limiting the addition, removal and update peculiarities through disposing of the redundant information. Then again, Denormalization is the reverse procedure of Normalization where the repetition is added to the information to improve the exhibition of the particular application and information integrity. Normalization prevents the disk space wastage by limiting or disposing of the redundancy.
The indexing is required to quicken search results in the database. If we compare index in our real world then page number of books and keywords mostly on the back side of book work similar as Index. We can quickly go to respective pages if we know the page number and also if we have an idea of keywords, we are looking into the book then just visiting keywords section will make our job easier as keywords are linked with page numbers. There are two types of indexes that exist in SQL database. One is called clustered while other is called non-clustered. The page number of the book is similar to the clustered index while the keyword section of the book represents non-clustered indexes. They exist in the database as a B-Tree structure. Let's go into the details of each index.
We can have only one clustered index per table. Once we create a primary key, by default it ends up creating a clustered index on that key. We can also specify the index type as well in case we would like to have non clustered index as well. The table is called heap if it does not have any clustered index. As we said earlier that indexes are B-Tree structures. The leaf node is mapped to data pages of the table in case of a clustered index. The data is physically sorted based on the clustered index. This is the reason why it is quickest in two of the indexes.
Earlier it was possible to have only 249 non clustered indexes on the table but after SQL server 2008 we can have 999 non clustered indexes. It gets created by default when we create unique key constraint. We can also decide index type as well while creating Unique constraint. It is not mandatory to have any non clustered index. The leaf nodes of non clustered index map to index pages having details of clustering key or RID to get the actual row. In case the table does not have any clustered index, leaf nodes are mapped to the physical location of the row which is called as RID and if present leaf nodes mapped to clustering key. It is much quicker for DML operations like adding /deleting and updating records.
The magic tables are an integral part of Triggers. They facilitate Triggers and make the job easier. There are two magic tables that exist, one is inserted one and the other one is called deleted one. These are built by SQL server itself to hold the data while processing gets triggered. They hold the inserted, deleted and updated records while doing DML operations on a table. If we have triggers for INSERT operation then the newly inserted values are stored in the INSERTED magic table. If triggers get fired for UPDATE /DELETE operation then updated and deleted values of records are stored in DELETED magic tables.
INSERTED Magic table
Let me explain the INSERTED magic table by looking at the below example. In the below screenshot we have triggers for INSERT operation and we can see two tables INSERTED/DELETED which we can use in the trigger for performing any manipulation.
DELETED Magic table
The below example illustrates the use case of a DELETED magic table which comes into the picture when we have UPDATE/DELETE trigger. This table holds the old record which was either updated/deleted.
SQL server favors all three relationships and is well supported by SQL server design. Let me explain to you each relationship one by one:
One to One – This type of relationship is supported by a single table and in some cases, we can have two tables as well. As the name suggests, we can have only a single record from each entity, primary and secondary. A person can have only one passport, he can not have more than one. In below example we have two tables Person and Passport having one to one relationship by leveraging foreign key and forcing unique key constraint on the foreign key. In this case, person ID which is the primary key in one table works as the foreign key.
One to Many – This type of relations is always supported by two tables. This relationship focuses on at least one entry in the secondary table for each entry in the primary table. But in the primary table, we will always have a single entry for each record against each record of a secondary table. Let me try to understand you with below examples where we have two table book and author. A book can have more than one writer so there will always be more than one entry of book in author table but there will always be a single entry of author in the author table. This type of relationship is supported by a primary key-foreign key relationship. Here Book Id is working as a foreign key in the Author table to support one to many.
Many to Many – This type of relationship is realized by more than two tables where one table works as Join table between the first two tables. In the below example we have Students, Enrollments, and Classes as three tables where Enrollment table is working as a bridge between Students and classes. One student can enroll in multiple classes and one class can be mapped to multiple students.
There are several scenarios in which we use temp tables. In case we have no DDL or DML access to a table. You can utilize your current read access to maneuver the information into a SQL Server temp table and make modifications from that point. Or on the other hand, you don't have the authorization to make a table in the current database, you can make a SQL Server temp table that you can control. At long last, you may be in a circumstance where you need the information to be available just in the present session.
The temp tables have "hashtag" followed by the table name. For instance: #Table_name. SQL temp tables are made in the tempdb database. A local SQL Server temp table is just available to the present session. It can't be accessed or utilized by procedures or queries outside of the session. A standout frequently utilized situation for SQL Server temp tables is when we use a loop in the query. For instance, you need to process information for a SQL query and we need a spot to store information for our loop to peruse. It gives a speedy and productive way to do as such. Another motivation to utilize SQL Server temp tables is when we have to execute some expensive processing. Suppose that we make a join, and each time we have to pull records from that result-set then it needs to process this join again and again. Why not simply process this outcome set once and toss the records into a SQL temp table? At that point, you can have the remainder of the SQL query allude to the SQL temp table name. In addition to the fact that this saves costly processing, it might even make our code look a little cleaner.
The SQL temp table is dropped or demolished once the session drops. Most of the time we will see: DROP #tbl command but it is not mandatory. Temp tables are always created in the tempdb database. It resolves name conflict by adding a suffix in case of a similar name for temp tables.
Global SQL temp tables are helpful when we need to reference tables in all sessions. Anybody can add, modify, or delete records from the table. Additionally, note that anybody can DROP the table. Like Local SQL Server temp tables, they are dropped once the session drops and there are never again any more references to the table. We can generally utilize the "DROP" command to tidy it up manually.
Table variables are more like some other variables. It is a common understanding that table variables exist just in memory, yet that is basically not correct. They live in the tempdb database just like local SQL Server temp tables. Additionally like local SQL temp tables, table variables are accessible just inside the session in which they are declared. The table variable is just available inside the present batch.
If performance is criterion then table variables are helpful with modest quantities of data. Generally, a SQL Server temp table is helpful when filtering through a lot of information. So for most times, we will in all probability observe the utilization of a SQL Server temp table rather than a table variable.
Self-join is the one in which the same table is joined by itself to get the desired output. We can understand self join in case of the below scenario :
Let's assume we have below table structure for Employee :
If we need to extract employee and manager information from the above table then the only option we have is to using self join. We can make a self join on the Employee table on joining ManagerID with EmployeeID. Please find below query which explains self join clearly :
Although we can have other alternatives as well to get the same desired result set. It uses left outer Join:
CTE is virtually created temporary storage which holds query output created by SQL syntax SELECT, INSERT, UPDATE, DELETE OR CREATE VIEW statement. It holds the data until the time of the query session. It is similar to the derived table only. We can also use the CTE in case of VIEW in some scenarios:
WITH common_table_expression Railroad diagram
expression_name: Is a valid identifier for the common table expression. It must be different from the others defined within the same WITH clause, but it can be the same as the name of a base table or view. Any reference to it in the query uses the common table expression and not the base object
column_name: Specifies a unique column name in the common table expression. The number of column names specified must match the number of columns in CTE _query_definition
If the query definition supplies distinct names for all columns then the column names are optional
CTE_query_definition: This is a SELECT statement whose result set populates the common table expression. This must meet the same requirements as for creating a view except that a CTE cannot define another CTE
Let me list down some benefits of using CTE :
There are two types of CTE :
One of the foremost common issues that occur when running parallel transactions is the Dirty read problem. A dirty read happens once when the transaction is permissible to read the information that's being changed by another one that is running at the same time however which has not nevertheless committed itself.
If the transaction that modifies the information and does the commits itself, the dirty read problem never occurs. but if the transaction, that has triggered the changes in the information, is rolled back when the opposite transaction has read the data, the latter transaction has dirty information that doesn’t truly exist.
Let us try to understand the scenario when a user tries to buy a product. The transaction which does the acquisition task for the user. the primary step within the transaction would be to update the Items in Stock.
Before the transaction, there are 12 items in the stock; the transaction can update this to 11 items. The transaction concurrently communicates with a third party payment gateway. If at this time in time, another transaction, let’s say Transaction 2, reads Items In Stock for laptops, it'll read 11. However, if after, the user who has triggered the first transaction A, seems to possess light funds in his account, transaction A is rolled back and therefore ItemsInStock column can revert to again 12. However, transaction B has 11 items because it read old data from ItemsInStock column. This is often dirty information and therefore the drawback is termed a dirty scan problem.
Both are a set of SQL statements which are encapsulated inside the block to complete some task. But they are very different in so many ways.
Stored Procedure- The set of SQL statements in Stored procedure is pre-compiled objects which get compiled for the very first time only and afterward it gets saved and reused.
Functions- These are executed and compiled every time it gets called. It is mandatory for the function to always return some value also it cannot do DML operations on data.
Let me list down some major differences for both :
It is very critical for any database to maintain data integrity and having consistent data. For choosing any database this is one of the important considerations. Data architects evaluate database on ACID properties only. The ACID is an acronym and stands for Atomicity, Consistency Isolation, Durability.
Let me explain to you four pillars of ACID property:
Atomicity -These properties ensure that either all operation part of any database transaction will commit or none of the operations will be successful. This will ensure the consistency of data. Nothing partially gets committed in the database either all or none.
Consistency-This properties ensure that data will never be in a half-finished state. It ensures changes in data always happen in a consistent way.
Isolation-This ensure that all transactions run independently and without intervening one another and till the time each transaction is finished in its own way.
Durability- This property is very critical as it ensures that even in case of any failover, there is enough mechanism to recover data in the system.
A normalization which ensures a reduction in redundancy and dependability. It is a popular database breaking down large tables into smaller ones with the help of a relationship. Edgar code was the first one who coined the term normalization and came with three different forms of normalization(1NF,2NF,3NF). Raymond F. Boyce later added another form of normalization which has been named after both inventors (Boyce-Codd NF). Till now we have below normalization forms available so far:
1st Normal Form
4th 5th 6th
But in most practical scenario database design is well supported till 3NF only. Now, let us understand the first four forms of normalization:
The above 1NF table can be extended to 2NF by diving above tables into below two tables:
For the above table structure, we can have below design to support 3 NF
The Primary key is the single column or combination of the column which uniquely distinguishes individual rows in the table. It must be unique but can not have NULL values which make it different then UNIQUE KEY. A table can have only a single primary key and contain a single column or combination of columns called a composite key. The foreign key in the table is actually the to the primary key of another table. Please find below the list of differences between two Keys which look similar in nature :
UDF represents user-defined represents which are designed to accept the parameter and does processing on parameter before returning a result set of processing or actions. The result could be either a scalar value or a complete result set. The UDF’s are widely used in scripts, stored procedures and within other’s UDF as well.
There are several benefits of using UDF :
There are two types of UDF based on the output value of UDF’s
Let me list down some basic difference then we will try to understand with examples as well :
Please consider below example where even after deleting records when transaction was rolled back, it reverted the changes :
The WHERE clause is the most widely used command in SQL and used for filtering records on the result set. The HAVING clause does the same thing but on the grouped result set. The WHERE clause will be executed first before having a clause trigger. Let me try to explain the differences with examples. Any SQL statement follows below syntax:
The order of execution of SQL statements follows from top to bottom. It implies that records are filtered first on the WHERE clause and once the result set is grouped, HAVING clause comes into the picture.
If you find that the product level is not consistent across all the nodes, you will need to fool the 2005 patch installer into only patching the nodes that need updating. To do so, you will have to perform the following steps:
Why do you need to do this? Well when the patch installer determines that not all nodes in the cluster are at the same patch level, a passive node operation will fail and will prevent you from moving forward with any further patching.
Event Group: Performance
Event: ShowPlan_ALL (BinaryData column must be selected)
Event Group: T-SQL
Event Group: Stored Procedures
Event Group: Locks
Event: Lock: Deadlock Graph
Event: Lock: Lock Deadlock Chain (Series of events that leads to a deadlock)
Event Group: Sessions
Event: Existing Connection
Event Group: Security Audit
Event: Audit Login
Event: Audit Log Out
Below are the most common columns that help us in understanding the trace file to troubleshoot the problems.
All these columns need not be available for all of the events but depend on the event select we have to choose the appropriate columns.
Snapshot Agent: Copy Schema+Data to snapshot folder on distributor. Used in all types of replication.
Log reader Agent: Sends transactions from Publisher to Distributor. Used in transactional replication
Distribution Agent: Applies Snapshots / Transactions to all subscribers’ runs at a distributor in PUSH and Runs at Subscriber in PULL. Used in transactional and transactional with updatable subscriptions.
Queue reader Agent: Runs at distributor send back transactions from subscriber to publisher. Used in Transactional With updatable subscriptions.
Merge Agent: Applies initial snapshot to subscribers, from the next time synchronize by resolving the conflicts.
Replication does not continue after a log shipping failover. If a failover occurs, replication agents do not connect to the secondary, so transactions are not replicated to Subscribers. If a fallback to the primary occurs, replication resumes. All transactions that log shipping copies from the secondary back to the primary are replicated to Subscribers.
For transactional replication, the behaviour of log shipping depends on the sync with a backup option. This option can be set on the publication database and distribution database; in log shipping for the Publisher, only the setting on the publication database is relevant.
Setting this option on the publication database ensures that transactions are not delivered to the distribution database until they are backed up at the publication database. The last publication database backup can then be restored at the secondary server without any possibility of the distribution database having transactions that the restored publication database does not have. This option guarantees that if the Publisher fails over to a secondary server, consistency is maintained between the Publisher, Distributor, and Subscribers. Latency and throughput are affected because transactions cannot be delivered to the distribution database until they have been backed up at the Publisher.
Before choosing the RAID (Redundant Array of Independent Disks) we should have a look into the usage of SQL Server files.
As a basic thumb rule “Data Files” need random access, “Log files” need sequential access and “TempDB” must be on the fastest drive and must be separated from data and log files.
We have to consider the below factors while choosing the RAID level:
As an Admin, we have to consider all of these parameters in choosing the proper RAID level. Obviously, the choice is always between RAID-5 and RAID-10
There are three methods.
Publisher.SP_ReplTran: Checks the pending transactions at p
Distributor.MSReplCommands and MSReplTransactions: Gives the transactions and commands details. Actual T_SQL data is in binary format. From the entry time, we can estimate the latency.
Distributor.SP_BrowseReplCmds: It shows the eaxct_seqno along with the corresponding T-SQL command
sp_replmonitorsubscriptionpendingcmds: It shows the total number of pending commands to be applied at subscriber along with the estimated time.
Available from Replication Monitor or via TSQL statements, Tracer Tokens are special timestamp transactions written to the Publisher’s Transaction Log and picked up by the Log Reader. They are then read by the Distribution Agent and written to the Subscriber. Timestamps for each step are recorded in tracking tables in the Distribution Database and can be displayed in Replication Monitor or via TSQL statements.
When Log Reader picks up Token it records time in MStracer_tokens table in the Distribution database. The Distribution Agent then picks up the Token and records Subscriber(s) write time in the MStracer_history tables also in the Distribution database.
Below is the T-SQL code to use Tracer tokens to troubleshoot the latency issues.
–A SQL Agent JOB to insert a new Tracer Token in the publication database.
EXEC sys.sp_posttracertoken @publication = <PublicationName>
–Token Tracking Tables
SELECT Top 20 * FROM MStracer_tokens Order by tracer_id desc
SELECT Top 20 * FROM MStracer_history Order by parent_tracer_id desc
The physical database architecture is a description of the way the database and files are organized in a SQL server.
Pages and extents- A page is the size of 8KB and set of 8 Pages are called extent. This is the fundamental unit where data is stored.
Physical Database Files and Filegroups- Database files visible at file system to store data and logs.
Table and Index Architecture- Database objects inside the database to store & access data.
Database- Database is a set of data & Log file which resides over the filesystem and managed by the operating system.
SQL Instance- SQL instance is a logical entity controlled databases. One SQL instance can have multiple databases. Security and accessibility is also part of the SQL instance.
Write-ahead transaction log controls & defined recording of data modifications to disk. To maintain ACID (Atomicity, Consistency, Isolation, and Durability) property of DBMS, SQL Server uses a write-ahead log (WAL), which guarantees that no data modifications are written to disk before the associated log record is written to disk.
Write-ahead log work as below:-
To understand Row versioning, you should first understand RCSI (Read Committed Snapshot Isolation). RCSI does not hold a lock on the table during the transaction so that the table can be modified in other sessions, eliminate the use of shared locks on read operations.
RCSI isolation maintains versioning in Tempdb for old data called Row Versioning. Row versioning increases overall system performance by reducing the resources used to manage locks.
When any transaction updates the row in the table, New row version was generated. With each upcoming update, If DB is already having the previous version of this row, the previous version of the row is stored in the version store and the new row version contains a pointer to the old version of the row in the version store.
SQL Server keeps running clean up task to remove old versions which are not in use. Until the transaction is open, all versions of rows that have been modified by that transaction must be kept in the version store. Long-running open transactions and multiple old row versions can cause a huge tempDB issue.
Sometimes we have a situation when we cannot perform such activities using SSIS due to multiple types of databases, Different domains, Older version, etc.
We have several commands available to import using T-SQL.
We need to run the SQL Server patch installation minimally 4 times to patch both SQL instance on all 3 cluster nodes.
SQL 2005 - 1 installation, SQL 2005 support remote installation. SQL 2005 patch installation will install the patch on all cluster nodes in one go but only for cluster objects like DB engine or agent.
SQL 2008 R2 – 3 installation, SQL 2008 R2 does not support remote installation. We need to patch all 3 nodes separately.
Additionally, We may need to run SQL 2005 setup on other 2 nodes to patch non-cluster objects like SSMS but that’s the additional part.
Full backups are set of the complete database with all data. AT the time of any crash or data recovery it’s the starting point to select which full to sue to plan recovery path.
To make things clear and doubt free, SQL Server includes all data or transactions data into full backup till the END TIME of backup.
If your backup took 2 hours that backup will also contain data changes and transaction happen in this 2 hours.
A full backup includes all data and transaction at the completion of backup time.
The full backup cover complete transaction with all the changes that were made after the backup start checkpoint to apply those changes during the database recovery process.
DCM (Differential Changed map): DCM is used by SQL server for differential backup. DCM pages keep track of all extents which has changed since the last full database backup.
During SQL server differential backup, database engine reads the DCM page and takes a backup of only those pages which has been changed since the last full backup. If the value is 1 means extent has been modified and 0 means not modified. After each full backup, all these extents are marked to 0.
BCM (Bulk Changed map): BCM page is used in bulk-logged recovery model to track extends changed due to bulk-logged or minimally logged operations.
During log backup, the database engine reads BCM page and includes all the extents which have been modified by the bulk-logged process. If the Value 1 means modified extent and 0 means not modified. After each log backup, all these extents are marked to 0
While trying to take a differential backup of MASTER database, I am getting below error. Differential backup is supported by all recovery model then why it’s failing for MASTER database. Can you explain what can be the reason?
“You can only perform a full backup of the master database. Use BACKUP DATABASE to back up the entire master database.”
To restore the differential backup of any database, DB needs to be in restoring mode which means DB will not accessible.
The MASTER database is a startup database for any SQL server instance. SQL instance will be in an offline state if the MASTER database is not accessible.
If we combined both statements, We can see that differential backup of the MASTER database is unnecessary as we can not restore. That’s why SQL server will not allow you to do take a differential backup of MASTER DB.
SQL Server provides the feature of managing jobs from Master \ central server on target servers called multi-server administration. Jobs and steps information is stored on Master Server. When the jobs complete on the target servers notification is sent to the master server so this server has the updated information. This is an enterprise level solution where a consistent set of jobs need to run on numerous SQL Servers.
As the name implies, SQL Server Agent Proxy is an account that grant privilege to the user to execute a particular process or an action when a user does not have rights. The SQL Server Agent Proxies include multiple sub-systems:
All these sub-systems are available under Proxy in SSMS and you can create as per your requirement.
To secure your SQL Server instance, it’s advisable to hide your SQL instance. SQL Server instance can be marked as hidden from the SQL Server Configuration Manager.
SQL Server Configuration Manager > Select the instance of SQL Server, > Right click and select Properties > After selecting properties you will just set Hide Instance to "Yes" and click OK or Apply.
You need to restart the instance of SQL Server.
An SQL injection is a web hacking techniques done by unauthorized personnel or processes that might destroy your database.
The major challenge is SQL injection can cause a system crash, data stolen, data corruption, etc.
Proper SQL instance, OS & Farwell security with the well-written application can help to reduce the risk of SQL injection.
Powershell is windows scripting and powerful enough to manage things from the core in a very efficient manner.
Powershell help in deep automation and quick action on DBA activities. The new face of DBA automation.
We can perform multiple actions from a DBA perspective using Powershell, like:-
Update Statistics has performed a recalculation of query optimization statistics for a table or indexed view. Although, with Auto Stats Update option, Query optimization statistics are automatically recomputed, in some cases, a query may benefit from updating those statistics more frequently. UPDATE STATISTICS uses tempdb for its processing.
Please note that update statistics causes queries to be recompiled. This may lead to performance issues for initial execution.
You can perform Update Statistics by 2 methods:
UPDATE STATISTICS - This need Alter right on the table and has more controlled options of performing an update only for one table or specific stats. This can’t be used for the complete database in one go.
SP_UPDATESTATS - This need sysadmin rights on SQL instance. This can help in performing update stats for all the stats of all table in the database.
If a column in the table is having >50% NULL values then index selection if very selective.
Index schema is based on B-Tree structure and if the column is having more than 50% NULL values then all data will reside on one side of the tree result ZERO benefits of the index on query execution.
The SQL server is having a special index called filtered index which can be used here. You can create an index on column only on NON NULL values. NULL data will not be included in the index.
By Default, SQL Server works in Simple Parameterization where SQL Server internally will add parameters to an SQL statement executed without parameters so that it can try to reuse a cached execution plan. But Simple parametrization has lots of limitation but did not work in the following cases:-
To force SQL Server to have Auto Parameterization for every SQL execution, You need to enable Force Parameterization at the database level.
Force Parameterization settings need proper testing as it may also create an adverse impact on performance.
In the existing replication setup, Sometimes we need to add new articles. When you perform such operations in existing setup, it will mark your snapshot invalid which results in snapshot re-initialize for all articles.
This sounds normal but considers the situation where a number of articles are very high or the size of the remaining article is huge, reinitialization can bring down your system.
SQL server provides you option of generating a snapshot of only newly added article. This can be achieved by T-SQL like:-
Log shipping may fail due to multiple issues can cause data long, Here are some possible reasons:
TUF means Transaction Undo file used when log shipping is configured in standby mode on secondary.
TUF contains details of transaction not applied on secondary in last Log restore. Basically, these transactions were not committed to the Primary database when the Transaction log backup was in progress. On Next T-Log restore on secondary, Log shipping refer TUF file or state of active transactions.
If TUF is missing, You can’t recover your log shipping.
We have 3 Operating Modes in DB Mirroring.
We need to perform force failover to bring mirror online when Principle is down. We can perform this using below command:
ALTER DATABASE <DB Name> SET PARTNER FORCE_SERVICE_ALLOW_DATA_LOSS
If you try to do failover like the normal situation when the principle is online [ALTER DATABASE <DB Name> SET PARTNER FAILOVER] then you will receive below error.
ALTER DATABASE <DB Name> SET PARTNER FAILOVER
Msg 1404, Level 16, State 10, Line 1
Above command failed because the database mirror is busy. Reissue the command later.d later.
AlwaysOn provided you with the feature of Readable replica. This is a long-awaited feature where you can utilize your passive node.
Unlike cluster where your passive node consume all your resources but you cannot utilize them before the primary node goes.
The Remote Procedure Call (RPC) is a system service for interprocess communication (IPC) between different processes. This can be in over on the same computer, on the LAN, or in a remote location, and it can be accessed over a WAN connection or over a VPN connection.
RPC working on a dynamic range of ports and connect using any available port from the available range.
There is a long list of services that depends on RPC like Telnet, DHCP, DNS, COM+, DTC, WINS, etc.
You may face multiple errors when RPC is not working like
An execution plan, basically, is the consequence of the query optimizer endeavor to ascertain the most proficient approach to implement the request represented by the SQL query you submitted. They are, in this way, the DBA's essential methods for investigating an ineffectively performing query. Instead of conjecture at why a given inquiry is performing a large number of outputs, putting your I/O through the rooftop, you can utilize the execution intend to distinguish the precise bit of SQL code that is causing the issue. For instance, it might filter a whole table-worth of information when, with the best possible list, it could basically backpack out just the row you need. This and more are shown in the execution plan. Despite the fact that SQL Server more often than not creates a decent arrangement, now and again it's not sufficiently brilliant to approve its arrangements and fix the poor ones. You can get an expected execution plan and a real graphical execution plan in SQL Server. Produce these plans by utilizing the command ctrl M or ctrl L or by utilizing the symbols, which are put to one side of the execute symbol on the standard device bar of SQL Server Management Studio (SSMS).
There are two sorts of execution plans:
Estimated execution plan: Estimated plans give an estimation of the work that the SQL server is required to perform to get the information.
Actual execution plan: Actual execution plans are produced after the Transact-SQL questions or batches are executed. Along these lines, a real execution plan contains run time data, for example, the genuine resource usage metrics and any run time alerts. You ought to always check the actual execution plan while investigating.
When you launch SQL Server Management Studio (SSMS), you see the choice to connect with a DB instance. Whenever wanted, you can browse instances running on your system. Simply click on the drop down and at the base, there is a ‘Browse for more…’ alternative:
This enables you to look for local or network instances. I have four instances running on my workstation you see them showed in this list:
We should expect one of these instances is top secret and we don't need clients to see the instance name. That is conceivable through SQL Server Configuration Manager (SSCM). Open up SSCM and explore to your SQL Server instances:
select the instance of SQL Server, right click and select Properties. Subsequent to choosing properties you will simply set Hide Instance to "Yes" and click OK or Apply. After the change is made, you have to restart the instance of SQL Server to not uncover the name of the instance.
The answer to this question depends on the View type. There are two types of Views that exist. One is Simple View created from a single table and which does not have any derived columns based on in-built grouping functions like average, sum, etc and the other one is a complex View. We can delete rows from views if it is a simpler one but not from Complex one. Let me try to make you understand by going into VIEW definition and syntax.
In SQL, a view is a virtual table dependent on the outcome set of a SQL query. A view contains rows much the same as a real physical table. The fields in a view are fields from at least one real table in the database. You can include SQL functions, WHERE, and JOIN clauses to a view and present the information as though the information were originating from one single table.
There are two types of views that exist. One is the normal one (derived from a single table) and the other one is a complex one (derived from multiple tables). We can delete/update data from view only when it is a normal one and does not have any derived columns based on AVG, COUNT, SUM, MIN, MAX, etc. We can not delete rows from a complex view.
These are two approaches for taking database backup. One approach supports the offline mode while the other supports online mode. The cold back up is the one which deals in offline mode while copying data and log files. This is not standard practice to bring your database offline and take a back up. But there are situations where we need to bring our databases offline before taking a back up. There is a very simple command to do the same (BACKUP DATABASE). We want to ensure our data is in a consistent state before back up as there might be a scenario where any running query has inserted an entry in the log but yet to make an entry in data files. Please find below scenarios where we go with a cold back up approach:
The hot backup is just a reversal of the cold one. In a real-world scenario, we do not want to suffer business losses by bringing our database offline and at the same time, we want to keep a copy of the database to recover from any disaster situation.
There are lots of things that need to be considered before going into designing a logical and physical data model. We must consider first what type of data we are going to store, how the data is being accessed, understanding system playing the role of upstream, understanding downstream system and finally planning by understanding volume of data that is going to come on a daily If we are going to upgrade any existing platform then understanding existing pain area also help developers to design the database to meet future needs along with remediating those bottlenecks.Understanding your data is important but unless we have clarity of different components of data, we would not be able to make a better plan. Also, we need to revisit our design at several stages of a project as we work in an evolving world where requirement changes overnight. So Agility and flexibility of database design are very important. We should be able to meet any future requirements as well with a flexible approach. We should always revisit data relationships, volume and indexes to ensure we stick to the flexible requirement. Also, we should always frequently profile our database server using tools like SQL server profiler to identify any problem areas on a regular basis.
The extended stored procedure is programs written in c/c++ and similar to a stored procedure like they accept parameter but the result set is returned through SQL server’s open data services API. It runs from SQL server process memory only and stored in a master database. It does not run from the current database context and to trigger extended stored procedure we need to pass the fully qualified name like master.dbo.xp_*. The best is wrapping extended stored proc under system procedure so you do not need to call extended one with the fully qualified name.
The cleaning of parameters to be passed to extended stored procedure xp_varbintohexstr is taken care of by
sp_hexstring. sp_hextsring can be executed from any database context without being referenced by a qualifier. The above is an example of wrapping extended procedures inside the system procedure. Once you call the extended stored procedure, the request transfers in the tabular data stream or in SOAP format from calling application to SQL server.
The performance of any application largely depends upon the backend process which extracts data. If it is faster then the application will be also responsive. The designing of the stored procedure is very critical and need to be careful to remove any We can consider the following points while designing stored procedure :
SQL Server cursors are immaculate when we need to work one record at any given moment, as opposed to taking every one of the information from a table as a solitary bulk. Be that as it may, they ought to be utilized with consideration as they can influence execution, particularly when the volume of information increments. From an amateur's perspective, I truly do feel that cursors ought to be maintained a strategic distance from each time in such a case that they are badly composed, or manage an excess of data, they truly will affect system performance. There will be times when it is absurd to expect to evade cursors, and I doubt if numerous system exists without them. In the event that you do discover you have to utilize them, attempt to lessen the number of records to process by utilizing a temp table first, and afterward assembling the cursor from this. The lower the number of records to process, the quicker the cursor will wrap up. Please find below syntax for CURSORS widely used :
The process flow of CURSOR look like below diagram:
Locks allow seamless functioning of the SQL server even in concurrent user sessions. As we as a whole know, different clients need to get to databases simultaneously. So locks come for rescue to keep information from being undermined or negated when numerous clients endeavor to do data manipulation tasks DML operations, for example, read, compose and update on database. "Lock is characterized as a component to guarantee information integration, consistency while enabling simultaneous access to information. It is utilized to execute simultaneous control when various clients get to Database to control its information in the meantime".
Locks can be applied to various database components. Please find below areas where a lock can be applied :
The locks which get applied on various resources of SQL server can be classified into below models :
Locking hierarchy of database objects can be understood by the below diagram and lock is always obtained from top to bottom:
Below is the lock compatibility matrix between different modes of lock available:
This is the strategy used by SQL server to avoid locking a large number of database resources. The locking on resource use lot of memory spaces. Let's try to understand this by a use case where we have to apply locks on 30,000 records where each record size is 500 bytes for triggering DELETE operation. If we consider the memory space requirement for the above use case then we would be requiring one shared lock on database, one intent lock on a table, exclusive locks on the pages would be in the range of 1875 and around 30000 exclusive locks on rows. If we consider 96 bytes size for each lock then total memory space requirement would be 3MB for single delete operation.
To avoid such a SQL server leverage lock escalation strategy. This will prevent the need of large memory space requirement by escalating those locks to single lock instead of many locks. So in cases where we need thousands of locks on many resources, lock escalation will ensure single lock which will fulfill our objective and at the same time taking care of memory space issue. The exclusive lock on the table will ensure that we do not need page level lock to ensure data integrity. Instead of having so many locks required on many columns and pages lock, SQL Server will escalate to the exclusive lock on a table.
An optimistic lock is the widely used strategy utilized in n tier applications were while reading data from the database we check versioning of data before writing it back to the database. The versioning is done on a date, timestamps or checksum/hashes. This approach is used to prevent dirty read scenario where we take a decision based on data which is not yet committed in the database. The optimistic approach will ensure that in case dirty read is identified based on versioning we will start from fresh. This strategy is popular where system deals with a high volume of data and even in the case of n tier applications where we are not always connected to a database with single connections. It is a pool of connection from which we connect to the database with any connection which is free and available at that time. We can not apply lock in such cases. This strategy is applied in most of the banking operations.
Pessimistic is just opposite of Optimistic lock as it takes an exclusive lock on resources till the time we finish with our operations. It keeps data integrity high but performance will be always slower in this case. We need to connect to a database with an active connection (which is the case in two-tier application).
The performance of the SQL query gets improved by NOLOCK as it does not lockup rows so it will not affect reading /processing data from the table. The NOLOCK can be used while dealing with large tables having a million records and with a table where data rarely gets changed. If we have too many locks in SQL then row level locking can be extended to block and even on table level which might stall processing of query till the time operation which acquired lock does not get completed. We can extract committed and uncommitted data using NOLOCK. This situation may result in dirty read if the system has not been designed to handle such a scenario.
Generally to avoid dirty read we use the TIMESTAMP column which can be referenced while executing DELETE /UPDATE commands to know if data has been changed or not. This helps us in preventing data changes based on dirty read problem. Comparing each column values is very expensive operation so this is the reason industry standard suggest TIMESTAMP column
Essentially we have to identify the bottleneck which is filling the log file.
As a quick resolution check all possible solutions as below:
If in case we can’t resolve just by providing a simple solution we have to shrink the transaction log file. Below are two methods.
To shrink the transaction log file:
To truncate the transaction log file:
In any case, we are not able to provide the solution against the increasing logfile the final solution is to disable the replication, truncate the log and reinitialize the subscribers.
Yes! We can do that. Follow the below steps to publish a new article to the existing publication.
There are two parameters that we need to change to “False”. 1. Immediate Sync and 2. Allow_Ananymous.
Both the fields were set to ON by default. If the Immediate_sync is enabled every time you add a new article it will cause the entire snapshot to be applied and not the one for the particular article alone.
Distributed transactions are transactions that worked across the databases, instances in the given session. Snapshot isolation level does not support distributed transactions.
We can explicitly start a distributed transaction using “BEGIN DISTRIBUTED TRANSACTION <TranName>”
For example, if BEGIN DISTRIBUTED TRANSACTION is issued on ServerA, the session calls a stored procedure on ServerB and another stored procedure on ServerC. The stored procedure on ServerC executes a distributed query against ServerD, and then all four computers are involved in the distributed transaction. The instance of the Database Engine on ServerA is the originating controlling instance for the transaction.
When a distributed query is executed in a local transaction, the transaction is automatically promoted to a distributed transaction if the target OLE DB data source supports ITransactionLocal. If the target OLE DB data source does not support ITransactionLocal, only read-only operations are allowed in the distributed query.
In order to work with these transactions, make sure below settings are done.
We can check the current settings and thread allocation using the below queries.
select max_workers_count from sys.dm_os_sys_info
select count(*) from sys.dm_os_threads
The default value is 255.
Increasing the number of worker threads may actually decrease the performance because too many threads cause context switching which could take so much of the resources that the OS starts to degrade in overall performance.
When you find that the last full backup is corrupted or otherwise unrestorable, making all differentials after that point useless. You then need to go back a further week to the previous full backup (taken 13 days ago) and restore that, plus the differential from 8 days ago, and the subsequent 8 days of transaction logs (assuming none of those ended up corrupted!).
If you’re taking daily full backups, a corrupted full backup only introduces an additional 24 hours of logs to restore.
Alternatively, a log shipped copy of the database could save your bacon (you have a warm standby, and you know the log backups are definitely good).
Are you the kind of DBA who rebuilds all indexes nightly? Your differential backups can easily be nearly as large as your full backup. That means you’re taking up nearly twice the space just to store the backups, and even worse, you’re talking about twice the time to restore the database.
To avoid these issues with diff backups, ideally, schedule the index maintenance to happen right before the full backup.
.TUF file is the Transaction Undo File, which is created when performing log shipping to a server in Standby mode.
When the database is in Standby mode the database recovery is done when the log is restored, and this mode also creates a file on the destination server. TUF extension which is the transaction undo file.
This file contains information on all the modifications performed at the time backup is taken.
The file plays an important role in the Standby mode… the reason is very obvious while restoring the log backup all uncommitted transactions are recorded to the undo file with only committed transactions written to disk which enables the users to read the database. So when we restore the next transaction log backup; the SQL server will fetch all the uncommitted transactions from undo file and check with the new transaction log backup whether committed or not.
If found to be committed the transactions will be written to disk else it will be stored in undoing file until it gets committed or rolled back.
If .tuf file is got deleted there is no way to repair log shipping except reconfiguring it from scratch.
SQL is a database computer language designed to interact with a database. The interaction part includes updating, deleting and requesting information from database. SQL is ISO and ANSI standard and various database products like Oracle and Microsoft SQL Server support SQL. It is used in every industry to deal with complex databases.
The rising demand of SQL has created lots of opportunities for the SQL developers from many prime companies around the world. According to a recent report released by Statistica, SQL is the most popular database management systems (DBMS) in the world and the market is expected to grow by 70% in the year 2020. So, you have a chance to go ahead make your career in SQL.
SQL Developers are paid highly. According to Indeed, they earn an average of $85,483 per year and can play the roles like Database developer, Data/ETL Developer Analyst (SQL Server), SQL Database developer, SQL Server Developer, Data Visualization Developer, Business Intelligence developer, and so on. Amdocs, IBM, Oracle, Wipro, Infosys, TCS, CTS, are the top companies hire SQL Developers.
These PL/SQL interview questions are very common and specially framed for basic to advanced level individual to clear any type of tricky SQL job interview. These top SQL interview questions and answers will increase your confidence level and let you ace the interview easily.
Preparing with these interview questions for SQL will aid you achieve your dream job and face the toughest of SQL questions in the best possible way. PL/SQL interview questions are framed by the experts and proved to be more effective questions.
So, feel confident during an interview with these SQL interview questions and answers given below. All the best and excel the interview!