Application of data linkage in Ukrainian Cancer Registry


While developing the technology of the Ukrainian cancer registry, the application of automated linkage procedures and automated cancer registration tools in international cancer registration practice was analyzed. With the help of linkage procedures the following tasks are accomplished:

Automated linking of records for the same patient in different registries.
Automated search and elimination of duplicate records within one registry.
Automated transfer of data from one registry to another (data from hospital cancer registry are transferred to population-based cancer registry).

Probabilistic linkage methods are applicable, for example, when there are two rather large (over 10 000 records) independent sources of computerized person-level information and records about the same patients must be identified in both. Unfortunately, computer databases are not yet widely used in the public health services of Ukraine, so automated linkage cannot be applied to tasks such as finding data about patients' deaths or obtaining other relevant information. The search for duplicate records, however, is a more critical task for us. The reasons are the following:

First, there is no uniform registration number for a patient in Ukraine similar to the medical insurance number in some countries. The passport number is not usually filled in the medical documents; besides, it can change. Therefore the patient is usually identified by surname, year of birth, place of residence, etc.
Third, keeping paper files for many years did not allow detection of duplicate records, which resulted in over-registration and inflated rates of disease.

Applying the usual international practice of probabilistic linkage is complicated by the fact that, although there are many studies of probabilistic linkage for English (NYSIIS coding, methods of batching, etc.), similar studies for Russian have not been conducted or their results are inaccessible. Simply transferring English algorithms to Russian reduces the quality of linkage. We are making efforts to adapt these algorithms, but the result is still far from satisfactory.

Furthermore, probabilistic linkage accepts that some false links will be established (links are usually correct with probability 99.5%, so about 0.5% are false), which is quite acceptable when two registries are linked for scientific research. However, fully automated use of these algorithms for duplicate search or automated data transfer could result in 1 in every 200 patients (0.5%) being assigned another patient's data. This does not suit our purpose.
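
The arithmetic behind this concern can be checked with a short sketch. The 0.5% false-link rate comes from the text; the figure of 100 000 links is a purely illustrative assumption and not a real registry statistic.

```python
# Expected number of wrongly linked patients if links were accepted
# fully automatically at a given false-link rate.
def expected_false_links(n_links: int, false_link_rate: float) -> float:
    return n_links * false_link_rate


FALSE_LINK_RATE = 0.005       # 0.5%, i.e. 1 in 200 (from the text)
ILLUSTRATIVE_LINKS = 100_000  # assumption: not a real registry figure

links_per_error = round(1 / FALSE_LINK_RATE)
wrong_links = expected_false_links(ILLUSTRATIVE_LINKS, FALSE_LINK_RATE)
print(links_per_error, wrong_links)
```

Even a small error rate therefore produces hundreds of wrongly merged patients in a registry of that size, which is why the final decision is left to a person.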

Therefore, a new linkage algorithm has been developed for the Ukrainian cancer registry. Its essence is an automated search for suspected duplicates, followed by interactive review of the pairs found (as shown in the picture, by phases):

  1. Phase 1. The search for record pairs that are potential duplicates is similar to that used in probabilistic linkage. The search uses the patient's surname, name, patronymic and date of birth; both complete and partial coincidence of these parameters are taken into account. At the present stage, this procedure finds duplicates with higher probability than the probabilistic linkage procedure we previously adapted to the Russian language. In the future, however, our procedure may be replaced by the probabilistic one, or the two may be used concurrently.
  2. Phase 2. The final decision on whether a found pair is a true link is made. Note that only the search for suspected duplicate pairs is automated. The final conclusion about the concordance or discordance of a found pair is made by the responsible person at the registry. Sometimes this conclusion cannot be reached from the computer files alone, and it is necessary to examine the primary paper forms or to consult the staff of the patient's oncological department. However, the benefit of this individual consideration is that the conclusion may be considered practically error-free.
  3. Phase 3. After a pair of records is recognized as a duplicate, the information contained in both records is automatically combined. In practice, each of the duplicate records almost always contains some part of the pertinent data, and these parts have to be pieced together. For example, if a duplicate resulted from the wrong registration of multiple cancers, it is most likely that the separate records have separate diagnoses, and the resulting final record contains them both. A special algorithm has been developed for analyzing and transferring the information from each relevant field so that the integrated record is as reliable as possible.
    The record recognized by the operator as more reliable should be chosen as the source. For each record of diagnosis, treatment, etc. in the other record, the corresponding information is searched for in the source record. If such information is missing, the record is transferred whole. If a similar record (by date and other content) is present in the source record, only those fields are transferred that are empty in the source or are filled with fewer details.
    For example, if a record of diagnosis specifies the morphological type "Malignant neoplasm", and the duplicate record with a similar tumor site and date of diagnosis specifies the morphological type "Alveolar adenocarcinoma", we select the resulting "Alveolar adenocarcinoma" irrespective of which record was recognized as the primary source. This is illustrated in the scheme.
  4. Phase 4. After automated data transfer the unwanted record is deleted, and the operator can edit the resulting record (if necessary).
  5. Phase 5. In any case, termination of data revision is followed by a procedure of complete automated logical check of the resulting record.
  6. Phase 6. Only when the computerized record check procedure is complete and possible errors are corrected is the resulting record considered valid for the database.
    Thus, we, on the one hand, automate all labor-consuming and routine work of search and elimination of duplicates, and, on the other hand, the registry personnel exercise control over this process and participate in the solution of occasional non-standard situations and logical contradictions.
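
Phases 1 and 3 can be illustrated with a minimal sketch. It is not the registry's actual algorithm: the scoring weights, the shared-prefix definition of "partial coincidence", the review threshold, the `GENERIC_VALUES` set and all sample records are assumptions made for illustration. Only the four key fields and the rule of preferring the more detailed value come from the text, and the transfer of whole missing diagnosis or treatment records is left out.

```python
from itertools import combinations

KEY_FIELDS = ("surname", "name", "patronymic", "birth_date")  # from Phase 1
PREFIX_LEN = 4                               # assumption: what counts as "partial"
GENERIC_VALUES = {"", "malignant neoplasm"}  # assumption: unspecific values


def field_score(a: str, b: str) -> float:
    """1.0 for complete coincidence, 0.5 for a shared prefix, else 0.0."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return 0.0
    if a == b:
        return 1.0
    if len(a) >= PREFIX_LEN and a[:PREFIX_LEN] == b[:PREFIX_LEN]:
        return 0.5
    return 0.0


def find_suspected_duplicates(records, threshold=3.0):
    """Phase 1: return (id_a, id_b, score) pairs for operator review."""
    pairs = []
    for a, b in combinations(records, 2):
        score = sum(field_score(a[f], b[f]) for f in KEY_FIELDS)
        if score >= threshold:
            pairs.append((a["id"], b["id"], score))
    return pairs


def merge_fields(source: dict, other: dict) -> dict:
    """Phase 3: keep the source, but take a field from the other record
    when the source value is empty or generic and the other is specific."""
    merged = dict(source)
    for name, value in other.items():
        current = str(merged.get(name, "")).strip().lower()
        candidate = str(value).strip().lower()
        if current in GENERIC_VALUES and candidate not in GENERIC_VALUES:
            merged[name] = value
    return merged


def person(rec_id, surname, name, patronymic, birth_date):
    return {"id": rec_id, "surname": surname, "name": name,
            "patronymic": patronymic, "birth_date": birth_date}


# Hypothetical sample data, invented for this example.
records = [
    person(1, "Petrenko", "Ivan", "Mykolayovych", "1950-03-12"),
    person(2, "Petrenko", "Ivan", "Mykolaiovych", "1950-03-12"),
    person(3, "Shevchenko", "Olha", "Petrivna", "1962-11-02"),
]
suspects = find_suspected_duplicates(records)

generic = {"site": "lung", "diagnosis_date": "1999-05-10",
           "morphology": "Malignant neoplasm"}
specific = dict(generic, morphology="Alveolar adenocarcinoma")
merged = merge_fields(generic, specific)
print(suspects, merged["morphology"])
```

The pairs returned by `find_suspected_duplicates` are only candidates; in the scheme described above an operator confirms each pair (Phase 2) before `merge_fields` is applied. Swapping the arguments of `merge_fields` gives the same morphology, matching the rule that the more detailed value wins irrespective of which record is the source.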

This duplicate search mechanism has also proven useful for automated data transfer from the hospital cancer registry to the population-based one. The "Abstract from medical in-patient card for patient with malignant neoplasm" is sent to the oncological dispensary at the patient's place of residence in order to register the patient or to update the information already stored in the oblast cancer registry. Many patients receive treatment in the local oncological dispensary where they are being followed up. Their data are stored in the unified computerized system of the hospital cancer registry.

Automated data transfer required solution of the following problems:

In the hospital cancer registry, a procedure for creating a file of computer abstracts has been developed. The file contains the same information as the paper copies (Form 027/onco) and is created at the moment when input of the medical in-patient record into the computerized hospital cancer registry is complete. These electronic abstracts are stored in files in the same format as in the population-based cancer registry.

When transferred to the population-based cancer registry, the electronic abstracts are placed in the so-called exchange data buffer, from which the population cancer registry personnel transfer them into the database. A preliminary search by the key fields is performed, similar to the one used when creating a new record. If a matching record is found in the database of the population registry, the same procedure as in automated duplicate checking is performed. Checking for duplicates among the hospital registry records themselves, however, is carried out in the exchange data buffer. Once the transfer is complete, the record in the buffer is deleted and the resulting record in the population registry's database is checked.
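
The flow through the exchange data buffer can be sketched as follows. This is a simplified illustration under stated assumptions, not the registry software: the key fields in `match_key`, the exact-match search, the `confirm` callback standing in for the operator's decision, the in-memory `log` list and the sample records are all hypothetical, and the final logical check of the resulting record is omitted.

```python
def match_key(record: dict) -> tuple:
    """Key fields for the preliminary search (assumed set)."""
    return (record["surname"].lower(), record["name"].lower(),
            record["birth_year"])


def transfer(buffer: list, registry: list, log: list, confirm) -> None:
    """Move every abstract from the exchange buffer into the registry."""
    while buffer:
        abstract = buffer.pop(0)  # the buffer copy is removed once handled
        key = match_key(abstract)
        match = next((r for r in registry if match_key(r) == key), None)
        if match is not None and confirm(match, abstract):
            # Same patient: fill in only what the registry record lacks.
            for name, value in abstract.items():
                if not match.get(name):
                    match[name] = value
            log.append(("updated", key))
        else:
            # No match, or the operator rejected it: register a new record.
            registry.append(dict(abstract))
            log.append(("added", key))


# Hypothetical sample data, invented for this example.
registry = [{"surname": "Petrenko", "name": "Ivan",
             "birth_year": 1950, "treatment": ""}]
buffer = [
    {"surname": "Petrenko", "name": "Ivan",
     "birth_year": 1950, "treatment": "surgery"},
    {"surname": "Shevchenko", "name": "Olha",
     "birth_year": 1962, "treatment": "radiotherapy"},
]
log = []
transfer(buffer, registry, log, confirm=lambda existing, incoming: True)
print(len(registry), log)
```

Passing the operator's decision in as a callback keeps the automated part (search, transfer, logging) separate from the interactive part, in the spirit of the phases described earlier.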

Automated data transfer from the hospital registry to the population one reduces the time needed to handle a new record for a patient of this dispensary from several minutes to several seconds (significantly reducing the working hours needed to run the registry), and reduces the flow of paper documents within the dispensary. It also reduces the probability of introducing new mistakes through repeated input from paper documents. All actions that add data, both automated and manual, are recorded in a protocol (log) file.

The automated data exchange technology is now being introduced in all cancer registries of Ukraine. For this purpose we use e-mail. Last year the Ukrainian Research Institute of Oncology and Radiology (URIOR) transferred to oblast dispensaries not only current electronic abstracts, but also abstracts for all patients from those oblasts whom it had treated during the previous 10 years. After these data were processed, it was discovered that a significant number of patients treated in the URIOR were not yet registered at their place of residence (because paper "Abstracts" and "Notifications" had not been sent).
The development and introduction of the automated data exchange technology became possible thanks to the unified software package for population-based cancer registries in all oblasts and to the wide introduction of the compatible computerized hospital cancer registry information system.
We now have an opportunity to create a uniform oncological information environment in Ukraine.


In 2002 we developed new software for solving most record linkage problems.

This software is not limited to the databases of the hospital and population-based cancer registries of the UCR. Any data source may be processed.

Version 1.03 of Link_It software is available now!

The new technology helps us solve many problems more easily and quickly. Some details can be found in the abstract and poster presentation of Yevgeniy Gorokh (Tampere, 2002). Some features may be useful for you too. Contact us. We would like to share our technologies!