Extracting psychosis prodrome symptoms from medical texts to create training datasets

dc.contributor.advisor: Reisberg, Sulev, juhendaja (supervisor)
dc.contributor.advisor: Sirts, Kairit, juhendaja (supervisor)
dc.contributor.author: Agu, Kristel
dc.contributor.other: Tartu Ülikool. Loodus- ja täppisteaduste valdkond (University of Tartu, Faculty of Science and Technology) [et]
dc.contributor.other: Tartu Ülikool. Arvutiteaduse instituut (University of Tartu, Institute of Computer Science) [et]
dc.date.accessioned: 2024-10-02T09:50:49Z
dc.date.available: 2024-10-02T09:50:49Z
dc.date.issued: 2024
dc.description.abstract: This master's thesis aimed to create, using semi-automatic methods, three annotated training datasets for extracting psychosis prodrome symptoms from medical texts. The source data were medical documents from a randomly selected 10% of the Estonian population for the years 2012-2019. These documents were filtered by the ICD-10 diagnoses that occur during the psychosis prodrome (2,780 texts) and split into 31,009 sentences to simplify the subsequent workflow. An initial dataset was built from sentences selected with a regular expression and annotated manually by the author, and this dataset was used to train an initial logistic regression model. As features for the model, each word of a sentence was mapped to its embedding from a Word2Vec model pre-trained on the Estonian Reference Corpus, and the word embeddings were averaged into a single sentence embedding. An iterative process followed: the model predicted further sentences containing the symptom from the remaining data, the author annotated them, and they were added to the existing dataset; this was repeated until the model found no new sentences. Using the logistic regression model to extract psychosis prodrome symptoms simplified dataset creation and reduced the manual effort of searching for sentences. The thesis produced three annotated training datasets: 799 sentences for the prodrome symptom “odd behaviour”, 643 sentences for the symptoms “depersonalization” and/or “derealization”, and 1,176 sentences for the symptoms “paranoid delusions” and/or “suspiciousness”.
dc.identifier.uri: https://hdl.handle.net/10062/105017
dc.language.iso: et
dc.publisher: Tartu Ülikool (University of Tartu) [et]
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Estonia [en]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/ee/
dc.subject: Psühhoosi prodroom
dc.subject: psühhiaatriliste sümptomite eraldamine
dc.subject: treeningandmestiku koostamine
dc.subject: Psychosis prodrome
dc.subject: extraction of psychiatric symptoms
dc.subject: training dataset generation
dc.subject.other: magistritööd (master's theses) [et]
dc.subject.other: informaatika [et]
dc.subject.other: infotehnoloogia [et]
dc.subject.other: informatics [en]
dc.subject.other: infotechnology [en]
dc.title: Psühhoosi prodroomi sümptomite eraldamine meditsiinitekstidest treeningandmestike loomiseks (Extracting psychosis prodrome symptoms from medical texts to create training datasets)
dc.type: Thesis [en]
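The pipeline described in the abstract (regular-expression seeding, averaged word embeddings as sentence features, logistic regression, and iterative expansion until no new sentences are proposed) can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the thesis code: the two-dimensional `EMBEDDINGS` table is a hypothetical stand-in for the Word2Vec model pre-trained on the Estonian Reference Corpus, the `annotate` oracle is a hypothetical stand-in for the author's manual annotation, logistic regression is fitted by plain gradient descent instead of any particular library, and the English example sentences, regex, learning rate and 0.5 threshold are illustrative choices.

```python
import math
import re

# Hypothetical 2-D toy vectors standing in for a pre-trained Word2Vec model
# (the thesis used one trained on the Estonian Reference Corpus).
EMBEDDINGS = {
    "patient": [0.1, 0.0],
    "suspicious": [0.9, 0.8],
    "followed": [0.8, 0.9],
    "spied": [0.9, 1.0],
    "paranoid": [1.0, 0.9],
    "calm": [-0.8, -0.9],
    "sleeps": [-0.9, -0.7],
    "well": [-0.7, -0.8],
}
DIM = 2


def sentence_vector(sentence):
    """Average the vectors of known words; return a zero vector if none are known."""
    words = re.findall(r"\w+", sentence.lower())
    vecs = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def predict_proba(model, x):
    """Logistic regression probability that sentence vector x is positive."""
    w, b = model
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit weights and bias with batch gradient descent on the log loss."""
    w, b = [0.0] * DIM, 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * DIM, 0.0
        for x, target in zip(X, y):
            err = predict_proba((w, b), x) - target
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def expand_dataset(pool, seed_pattern, annotate, threshold=0.5):
    """Seed with a regex, then alternate training, prediction and annotation."""
    # Step 1: regex-selected sentences are annotated to form the initial dataset.
    labeled = {s: annotate(s) for s in pool if re.search(seed_pattern, s, re.I)}
    if not labeled:
        return labeled  # nothing to train on
    while True:
        # Step 2: (re)train on everything annotated so far.
        model = train_logreg([sentence_vector(s) for s in labeled],
                             list(labeled.values()))
        # Step 3: predict symptom sentences among those not yet annotated.
        candidates = [s for s in pool if s not in labeled
                      and predict_proba(model, sentence_vector(s)) >= threshold]
        # Stop once the model proposes no new sentences.
        if not candidates:
            return labeled
        # Step 4: annotate candidates; rejected ones become negative examples.
        for s in candidates:
            labeled[s] = annotate(s)


# Usage with a hypothetical oracle in place of manual annotation.
TRUE_POSITIVES = {
    "Patient is suspicious of neighbours.",
    "Patient feels followed and spied on.",
    "Patient is paranoid.",
}
POOL = [
    "Patient is suspicious of neighbours.",
    "Patient feels followed and spied on.",
    "Patient is paranoid.",
    "Patient is calm and sleeps well.",
    "Patient sleeps well.",
    "No complaints.",
]
result = expand_dataset(POOL, r"suspicious|calm",
                        lambda s: int(s in TRUE_POSITIVES))
print(result)
```

Here the regex seeds only one positive and one negative sentence; the sentences the model then proposes are labelled by the oracle, so false positives it suggests are not lost but re-enter training as negative examples. Because every round annotates at least one new sentence, the loop always terminates once the model stops proposing unseen sentences, mirroring the stopping rule in the abstract.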

Files

Original bundle

Name: agu_infotehnoloogiamitteinformaatikutele_2024.pdf
Size: 880.84 KB
Format: Adobe Portable Document Format