• 0

Help: Get oldest record(s) per course, per user, and mark them.


Question

I have ~68000 records.

Key field is [iD].

Shows courses each employee has been on with the date they started, ended the course, and the date the qualification expires.

Many employees have the same course in more than once as they have resat the course, but old record cannot be deleted as it's needed for historic records.

 

I have added a new bit column to mark old duplicate courses as "Invalidated"

 

For each employee in the table, I want to find where someone has a course in more than once and set the oldest records as 'Invalidated'.

 



+-------+-------+------------+------------+------------+-------------+-------------+
| ID    | Staff | Course Ref | StartDate  | EndDate    | Expires     | Invalidated |
+-------+-------+------------+------------+------------+-------------+-------------+
| 10000 | 00016 | 2648       | 2006-03-21 | 2006-03-26 | 2010-03-26  |             |
+-------+-------+------------+------------+------------+-------------+-------------+
| 10001 | 00016 | 1176       | 2009-07-29 | 2009-07-29 | 2011-07-29  |             |
+-------+-------+------------+------------+------------+-------------+-------------+
| 10002 | 00016 | 1176       | 2010-05-20 | 2010-05-20 | 2012-05-20  |             |
+-------+-------+------------+------------+------------+-------------+-------------+
| 10003 | 10224 | 1010       | 2008-06-06 | 2008-06-06 | 2009-06-06  |             |
+-------+-------+------------+------------+------------+-------------+-------------+
| 10004 | 10224 | 1010       | 2009-05-26 | 2009-05-26 | 2010-05-26  |             |
+-------+-------+------------+------------+------------+-------------+-------------+
| 10005 | 10224 | 1010       | 2010-02-04 | 2010-02-04 | 2011-02-04  |             |
+-------+-------+------------+------------+------------+-------------+-------------+
|       |       |            |            |            |             |             |


ID 10000 would be left alone as that staff only has course 2648 in once.

ID 10001 would be invalidated as that staff has the course in twice and 10001 is the oldest. 

ID 10003, and 10004 would be invalidated as that staff has the course in three times and those two are older.

 

Could I do this in SQL? How?

Link to comment
Share on other sites

4 answers to this question

Recommended Posts

  • 0

It's possible, but might require a temporary table.

create temporary table InvalidateRegistration (
select ID FROM CourseRegistration
WHERE Invalidated IS NULL AND EndDate != (SELECT max(EndDate) FROM CourseRegistration CourseRegistration2 WHERE CourseRegistration2.`Course Ref`=CourseRegistration.`Course Ref` AND CourseRegistration2.Staff=CourseRegistration.Staff)
);

UPDATE CourseRegistration INNER JOIN InvalidateRegistration ON InvalidateRegistration.ID=CourseRegistration.ID SET Invalidated=1;

If whatever database you're using allows you to update while doing a subselect, then it won't be necessary but at least MySQL won't do it.  The other option is a stored procedure where you select all the non-invalidated ones sorted by Staff ASC, Course Ref ASC,  EndDate DESC and then loop through and keep variables of the last row examined and if you see the same Staff and Course Ref the next time, update that row.

Link to comment
Share on other sites

  • 0

WITH CTE AS (
SELECT ROW_COUNT() OVER (PARTITION BY [staff], [Course Ref] ORDER BY [Expires] DESC) AS [RowOrder], [iD] FROM [TableName]
)
UPDATE T

SET T.[invalidated] = 1

FROM [TableName] T

JOIN CTE c

ON T.[iD] = c.[iD]

WHERE c.[RowOrder] <> 1;
 

 

should work for TSQL/MSSQL assuming i understood the requirements correctly.

 

Replace 'TableName' with the name of your table.

 

This uses a window query to return a result set of the ID and a row number for each unique staff/course ordered by the expiration date descending. This means that the 'cte' (common table expression) window function will return the following italics/underlined values shown with the table you pasted...

 

+-------+-------+------------+------------+------------+-------------+-------------+
| ID    | Staff | Course Ref | StartDate  | EndDate    | Expires     | Invalidated |  RowCount
+-------+-------+------------+------------+------------+-------------+-------------+
| 10000 | 00016 | 2648       | 2006-03-21 | 2006-03-26 | 2010-03-26  |             | 1
+-------+-------+------------+------------+------------+-------------+-------------+
| 10001 | 00016 | 1176       | 2009-07-29 | 2009-07-29 | 2011-07-29  |             | 2
+-------+-------+------------+------------+------------+-------------+-------------+
| 10002 | 00016 | 1176       | 2010-05-20 | 2010-05-20 | 2012-05-20  |             | 1
+-------+-------+------------+------------+------------+-------------+-------------+

 

Therefore, the query I gave you should, in this example, only set Invalidated on the RowCount 2 value since we said <> 1, aka value 10001.

Link to comment
Share on other sites

  • 0

Forgot to mention. It's a disgusting old dirty horrible MS SQL 2000 database. So CTE won't work.

 

However, kjordan2001 you've given me a good basis. I think I've got a working solution. Will be testing it out today.

 

Will confirm if Solved later.

 

Thanks

Link to comment
Share on other sites

This topic is now closed to further replies.