Introduction
In a production environment, it is not always easy to determine why the sql job has failed and what has happened during the job step execution. The checklist provided here will help in arriving at a decision or give clear direction to troubleshoot
Checklist: First thing first, so let us go sequentially:
- Check that the job is enabled.
- If the job runs fine while we start it manually then verify that the job’s schedule is enabled.
- Go through the history of the job to know the last successful run.
- To know more about the error, go through the SQL server error log, SQL server agent error log, windows application log. IF you think that your sql server job entries disappearing mysteriously then check the job history log configuration.
- If not all the jobs are working, then check the SQL agent service is up. If not running check the account under which service is running , it should be a member of Domain user group. Other possibility could be if the password of the Domain account has been changed and not updated here. In worst case, MSDB database may get corrupt or unavailable due to which sql server agent service may not start.
- Some common reasons could be timeout errors or the deadlock, look for the logged events or activity monitor and work accordingly. This happens especially when the job is long running and configured to work under high isolation transactional level and queries scanning the bigger tables. Try to break the functionality, change the order of the query or rewrite them for efficiency.
- If the scope of the job is spread across different servers then check that if there are any issues with security or linked server configuration
- If invoking system commands or external application, sql agent should have appropriate permissions to carry out the task.
- As a last step, check if there is a data issue, to troubleshoot this take the code that you suspect and check in your test server with the latest database restored from production.
Recommendations:
- It is good to follow the change management process in order to be in sync with the every small change made to the system configuration and will help in troubleshooting. Before troubleshooting any failures we can see the latest changes happened in the platform with respect to database and account expiry due date.
- Since MSDB will be storing all the data related to SQL jobs so make sure that the database configurations support the growth of .mdf and .ldf files in the disk.
- Work on the amount of data to retain in history.
- Take the regular backup of Master and MSDB databases, because in case you fall into situation where you are rebuilding the master database with setup, it will rebuild all system databases in that instance.
- Make a daily check on your scheduled jobs.
- Create custom category for the sql jobs specific to your application to segregate them from regular maintenance jobs.
- Make sure that amount of data to rollback on each step has been taken care.
- Periodically collect the performance monitor statistics for agent jobs, which use significant amount of resources, as part of performance and improvement strategies.
- Try that no two job’s schedule overlaps that work on same set of tables.
- Plan to schedule regular maintenance jobs during off peak hours or during weekends or as per business requirements.
- Have your sql server agent send you a notification about success/failure of the job.
- 12. Keep sql server updated with the latest service pack.
Code Snippets:
/**** Below query can be used to directly pull the failed job on a
particular day and other relevant information. It can be modified to
match the exact requirement ****/
SELECT sj.[name],
sh.step_id,
sh.step_name,
sh.run_date,
sh.run_time,
sh.sql_severity,
sh.message,
sh.server
FROM msdb.dbo.sysjobs sj
INNER JOIN msdb.dbo.sysjobhistory sh
ON sh.job_id = sj.job_id
INNER JOIN msdb.dbo.sysjobsteps ss
ON sj.job_id = ss.job_id
AND sh.step_id = ss.step_id
WHERE sh.run_status = 0 /*Failure-0, Success-1*/
AND sh.run_date > 'Provide Date here'
ORDER BY sh.instance_id ASC
Conclusion
By following the checklist and resolution steps i hope we can easily troubleshoot SQL Server Job Failures. Drop down your comments
Related Posts
- SQL Server 2000 Maintenance Jobs Failed with Error 22029
- SQL Server 2005 database DBA Checklist for database administrator
- Troubleshooting SSRS Error “The report server is not responding”
- Database Mail in SQL Server 2005/2008
- Encryption and Decryption in SQL Server 2005
Tags: Jobs Failure Checklist, Jobs Failure Resolution, SQL Server 2005, Troubleshooting Failure




