Browsing by Author "Khajuria, Tarun, supervisor"


Content based analysis of compositionality in Vision Transformers
(University of Tartu, 2023) Dias, Braian Olmiro; Khajuria, Tarun, supervisor; University of Tartu. Faculty of Science and Technology; University of Tartu. Institute of Computer Science
Although neural network models have achieved state-of-the-art results in various vision and language tasks, questions remain about their logical reasoning capabilities. In particular, it is not clear whether these models can reason beyond analogy. For example, an image captioning model can either learn to map a scene representation to a caption (i.e., to text space), or learn to bind objects explicitly and then use the explicit composition of their individual representations. The inability of models to do the latter has been linked to their failure to generalise to wider scenarios across tasks. Transformer-based models have achieved high performance in various language and vision tasks, and their success has been attributed to their ability to model long-range relations between sequence elements. For vision transformers in particular, it has been argued that using image patches as tokens, together with the interactions between those tokens, lets the model flexibly bind objects and model compositional relations between them at different distances, which would indicate explicit compositional abilities. In this thesis, we perform experiments on the Vision Transformer (ViT)-based encoder of an image captioning model. In particular, we probe the encoder's internal representations at various layers to examine whether a single token captures the representation of (1) an object, (2) related objects in the scene, and (3) the composition of two objects in the scene. We find some evidence hinting that object properties are bound into a single token as the image is processed by the transformer. This work also provides a list of methods for creating and setting up a dataset to study internal compositionality in Vision Transformer models, and suggests future lines of research to extend this analysis.
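Layer-wise probing of this kind is often done with a linear classifier (a "linear probe") trained on frozen activations taken from one layer at a time. The sketch below is a minimal, dependency-free illustration of that general technique, not the thesis's own code. The function names `train_linear_probe`, `probe_accuracy` and `sigmoid`, the toy 8-dimensional features, and the learning-rate and epoch settings are all hypothetical stand-ins for real ViT token activations and labels such as "this token's patch belongs to object X".

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(features, labels, lr=0.1, epochs=200):
    """Fit a logistic-regression probe on frozen token features with plain SGD.

    Hypothetical helper: `features` is a list of equal-length float vectors
    (e.g. one token's activations at one layer); `labels` are 0/1 targets.
    """
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            g = sigmoid(z) - y  # d(binary cross-entropy)/dz
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def probe_accuracy(w, b, features, labels):
    """Fraction of examples whose predicted class (z > 0) matches the label."""
    correct = 0
    for x, y in zip(features, labels):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        correct += int((z > 0) == bool(y))
    return correct / len(labels)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-in for one layer's token activations: the label is linearly
    # encoded along dimension 0, all other dimensions are pure noise.
    labels = [rng.randint(0, 1) for _ in range(200)]
    feats = [[rng.gauss(0, 1) + (4.0 * y if d == 0 else 0.0) for d in range(8)]
             for y in labels]
    w, b = train_linear_probe(feats, labels)
    print(f"probe accuracy: {probe_accuracy(w, b, feats, labels):.2f}")
```

Training one such probe per layer and comparing accuracies (ideally on held-out images) is one way to ask at which depth a property becomes linearly decodable from a single token. Whether the thesis measured it this way is an assumption here.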
Graphical User Interface for Constellations Image Generator
(University of Tartu, 2022) Kaurson, Kalmer; Khajuria, Tarun, supervisor; University of Tartu. Faculty of Science and Technology; University of Tartu. Institute of Computer Science
This thesis describes the design and development of a graphical user interface. The aim of the thesis is to create an interface for an application in which the user can insert an image and choose which region of the image will be used to generate the constellation image. When an area is selected, the program generates a border from the object in that area. The web application also lets the user remove redundant lines from the generated images and choose how closely spaced the points are, both in the constellation and in the background. The work is important because the bachelor's thesis, the UI application and its source code will be made publicly available, so that scientists can generate data for experiments with both humans and machine learning models.
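A constellation image can be thought of as dots placed along an object's outline, mixed with background dots whose spacing the user controls. The sketch below illustrates only that idea under stated assumptions: the border is taken to be a closed polygon, outline dots are placed at a fixed spacing along it, and background dots are scattered uniformly at random. The function names `sample_border` and `sample_background` and their parameters are hypothetical and are not taken from the thesis's source code.

```python
import math
import random


def sample_border(polygon, spacing):
    """Place points every `spacing` units along a closed polygon's perimeter.

    `polygon` is a list of (x, y) vertices; the last vertex connects back to
    the first. Hypothetical helper, for illustration only.
    """
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    points = []
    offset = 0.0  # distance into the current edge where the next point falls
    n = len(polygon)
    for i in range(n):
        (x0, y0), (x1, y1) = polygon[i], polygon[(i + 1) % n]
        edge = math.hypot(x1 - x0, y1 - y0)
        t = offset
        while t < edge:  # zero-length edges are skipped, so no division by 0
            f = t / edge
            points.append((x0 + f * (x1 - x0), y0 + f * (y1 - y0)))
            t += spacing
        offset = t - edge  # carry the leftover distance onto the next edge
    return points


def sample_background(width, height, density, rng):
    """Scatter round(density * area) points uniformly over a width x height canvas."""
    count = round(density * width * height)
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(count)]


if __name__ == "__main__":
    square = [(20, 20), (80, 20), (80, 80), (20, 80)]  # stand-in object outline
    outline = sample_border(square, spacing=6.0)
    background = sample_background(100, 100, density=0.02, rng=random.Random(0))
    print(len(outline), "outline dots,", len(background), "background dots")
```

In a GUI like the one described above, `spacing` and `density` would play the role of the user-adjustable point closeness. The actual generator may work differently.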
Iterative versus amortized inference solutions to the constellation problem
(University of Tartu, 2022) Hasanov, Farid; Aru, Jaan, supervisor; Khajuria, Tarun, supervisor; Luik, Taavi, supervisor; University of Tartu. Faculty of Science and Technology; University of Tartu. Institute of Computer Science
Making sense of visual input is an essential part of human intelligence. While processing in the human visual cortex has been observed to be recurrent in nature, machine vision systems that map input to prediction in a single feedforward pass have dominated computer vision benchmarks. This discrepancy may be explained by a lack of challenging datasets in which gradual refinement of a solution is necessary to reach the correct one. Such a dataset, in which local information about the encoded objects has been erased, was recently proposed. This thesis represents the first attempt to solve the task posed by this novel dataset. We propose using the generative models DCGAN and VAE together with the optimization algorithm CMA-ME to refine solutions iteratively (iterative inference), and the generative models Pix2pix and CycleGAN for feedforward, or amortized, inference. By solving the problem posed by this dataset, we show the advantage of iterative refinement of hypotheses over the single-prediction paradigm, encouraging further research in the field of iterative inference.
Recognition as Navigation in Energy-Based Models
(University of Tartu, 2021) Laiho, Henri Harri; Zafra, Raul Vicente, supervisor; Aru, Jaan, supervisor; Khajuria, Tarun, supervisor; University of Tartu. Faculty of Science and Technology; University of Tartu. Institute of Computer Science
Human vision has an exceptional ability to recognize complex signals from limited and ambiguous observations. This ability is believed to rely on lower-level processes that generate possible explanations for the observations and on higher-level systems that select the most plausible of them. Modern artificial intelligence solutions for visual recognition lack comparable mechanisms, which could improve their generalization and robustness. This thesis proposes and studies a novel brain-inspired algorithm for face recognition that tackles the problem from a new angle: recognition is treated as a navigation problem in a space of latent representations. We further show that the steps of this navigation correspond to sensible images that the model "imagines" along the way, much as a person imagines possible explanations for observations they are trying to recognize as an object or a person. In addition, we show that with some parameter tuning the algorithm can improve the separability of correct and incorrect navigation trajectories (analogous to the explanations proposed by lower-level processes in the brain), measured as Fisher's discriminant ratio, by up to 0.14. By our rough estimate, this corresponds to an increase in accuracy of 5 to 15%.
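Fisher's discriminant ratio, the separability measure mentioned above, is commonly defined for two classes as the squared difference of their means divided by the sum of their variances. The snippet below is a small sketch of that common textbook form. It is an assumption that the thesis used exactly this variant (some definitions use sample variances or weight the classes differently), and the function name and example scores are hypothetical.

```python
from statistics import mean, pvariance


def fisher_discriminant_ratio(scores_a, scores_b):
    """Fisher's discriminant ratio for two 1-D score distributions.

    FDR = (mean_a - mean_b)**2 / (var_a + var_b), using population variances.
    Larger values mean a single threshold separates the classes more cleanly.
    """
    a = [float(s) for s in scores_a]
    b = [float(s) for s in scores_b]
    spread = pvariance(a) + pvariance(b)
    if spread == 0.0:
        raise ValueError("both classes have zero variance; ratio is undefined")
    return (mean(a) - mean(b)) ** 2 / spread


if __name__ == "__main__":
    # Hypothetical scores (not data from the thesis): one score per correct
    # and per incorrect navigation trajectory.
    correct = [0.9, 0.8, 0.85, 0.7]
    incorrect = [0.4, 0.55, 0.5, 0.6]
    print(f"FDR = {fisher_discriminant_ratio(correct, incorrect):.3f}")
```

Because the ratio grows when the class means move apart or when either class becomes tighter, an increase of 0.14 can come from either effect. How that maps to accuracy depends on the score distributions, which is why the abstract treats the 5 to 15% figure as an estimate.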