Electronic health records are invaluable for medical research, but much information is stored as free text rather than in a coded form. For example, in the UK General Practice Research Database (GPRD), causes of death and test results are sometimes recorded only in free text. Free text can be difficult to use for research if it requires time-consuming manual review. Our aim was to develop an automated method for extracting coded information from free text in electronic patient records.

We reviewed the electronic patient records in GPRD of a random sample of 3310 patients who died in 2001, to identify the cause of death. We developed a computer program called the Freetext Matching Algorithm (FMA) to map diagnoses in text to the Read Clinical Terminology. The program uses lookup tables of synonyms and phrase patterns to identify diagnoses, dates and selected test results. We tested it on two random samples of free text from GPRD (1000 texts associated with death in 2001, and 1000 general texts from cases and controls in a coronary artery disease study), comparing the output to the U.S. National Library of Medicine’s MetaMap program and the gold standard of manual review.

Results

Among 3310 patients registered in the GPRD who died in 2001, the cause of death was recorded in coded form in 38.1% of patients, and in the free text alone in 19.4%. On the 1000 texts associated with death, FMA coded 683 of the 735 positive diagnoses, with precision (positive predictive value) 98.4% (95% confidence interval (CI) 97.2, 99.2) and recall (sensitivity) 92.9% (95% CI 90.8, 94.7). On the general sample, FMA detected 346 of the 447 positive diagnoses, with precision 91.5% (95% CI 88.3, 94.1) and recall 77.4% (95% CI 73.2, 81.2), which was similar to MetaMap.

We have developed an algorithm to extract coded information from free text in GP records with good precision. It may facilitate research using free text in electronic patient records, particularly for extracting the cause of death.

Electronic health records are an important source of information for medical research, but much of the information is stored as unstructured free text rather than in a structured way. Research to date has predominantly used the coded data, which are readily available for analysis, but the free text may contain important additional information relevant to study outcomes, concomitant diseases, procedures, interventions or test results in observational studies. Manual review of free text records is time-consuming, so there has been interest in developing software algorithms to extract diagnoses and other clinical information from free text. This is a difficult task, because clinical text can contain a wide range of complex language structures and terminology, and also context-specific abbreviations and acronyms.

Computer programs have been developed to extract specific categories of information from free text including smoking status, diagnosis of angina or heart failure, family history and quality of life scores. The MedLEE natural language processing system is used at the Columbia Presbyterian Hospital to encode discharge summaries using ICD-10 codes. The U.S. National Library of Medicine’s MetaMap program is widely used for data mining and indexing of biomedical text. However, overall few programs have been implemented outside the laboratory where they were developed, despite considerable research interest in recent years.

The UK General Practice Research Database (GPRD) is a large database of primary care records and is an important source of clinical information for epidemiological and drug safety research. It contains details of consultations, diagnoses, interventions, test results, prescriptions and referrals from general practitioners (GPs) in England, Wales, Scotland and Northern Ireland. GPs code diagnoses using a structured clinical terminology. Currently the ‘Read’ clinical terminology is used, but SNOMED-CT (Systematized Nomenclature of Medicine–Clinical Terms) will be introduced in the next few years.
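
To make the idea of matching free text against lookup tables of synonyms and phrase patterns more concrete, here is a minimal Python sketch. It is not the FMA implementation: the synonym table, the placeholder codes (such as `DX_MI`, which are not real Read codes), the date pattern and the function name `extract` are all assumptions made for illustration.

```python
import re

# Hypothetical synonym table: phrase -> placeholder diagnosis code.
# These codes are invented for illustration and are NOT real Read codes.
SYNONYMS = {
    "myocardial infarction": "DX_MI",
    "heart attack": "DX_MI",
    "heart failure": "DX_HF",
    "ccf": "DX_HF",
    "angina": "DX_ANGINA",
}

# Hypothetical phrase pattern for dates written as dd/mm/yyyy.
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def extract(text):
    """Return placeholder diagnosis codes and date strings found in free text."""
    lowered = text.lower()
    codes = set()
    for phrase, code in SYNONYMS.items():
        # Whole-word match so that a short synonym cannot match inside a longer word.
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            codes.add(code)
    dates = DATE_PATTERN.findall(text)
    return {"codes": sorted(codes), "dates": dates}


if __name__ == "__main__":
    print(extract("Died 12/03/2001 of heart attack; known CCF."))
```

A sketch like this ignores negation, misspellings and context-specific abbreviations, which are among the reasons clinical text is hard to process, so it should be read only as an illustration of the lookup principle.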