OMG-Fuser: Fusion Transformer with Object Mask Guidance for Image Forgery Analysis
A framework for robust image manipulation detection and localization by semantically combining diverse forensic information.

OMG-Fuser detects and localizes image manipulations by employing a deep-learning architecture for semantically combining several clues related to visible and low-level inconsistencies of an image.
The OMG-Fuser framework aims to detect whether and where an image has been forged. It employs a novel transformer-based deep-learning architecture that semantically combines a wide range of clues related to visible and low-level image inconsistencies. To this end, it uses an object-guided attention mechanism to build representations associated with the different objects depicted in the image, and a token-fusion transformer to combine such representations from an arbitrary number of network streams related to different forensic clues. A long-range dependencies transformer then captures the relationships among the image objects to determine whether, and which, image regions have been altered. The framework improves both the performance of forgery detection and localization and the robustness to post-processing operations applied to an image when it is shared online.
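
The idea behind object-guided attention can be illustrated with a minimal sketch in which each image-patch token attends only to patches that belong to the same object. This is not the OMG-Fuser implementation: the function name `object_masked_attention`, the use of raw tokens as queries, keys, and values (no learned projections), and the per-patch `object_ids` labels are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def object_masked_attention(tokens, object_ids):
    """Self-attention restricted by an object mask (illustrative only).

    tokens:     list of patch feature vectors (lists of floats).
    object_ids: hypothetical per-patch object label; a patch attends
                only to patches sharing its label.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for i, query in enumerate(tokens):
        # Positions allowed by the object mask for this query patch.
        allowed = [j for j, oid in enumerate(object_ids) if oid == object_ids[i]]
        # Scaled dot-product scores against the allowed patches only.
        scores = [scale * sum(q * k for q, k in zip(query, tokens[j])) for j in allowed]
        weights = softmax(scores)
        # Weighted sum of the allowed patch features.
        out = [sum(w * tokens[j][d] for w, j in zip(weights, allowed)) for d in range(dim)]
        outputs.append(out)
    return outputs


# Four patches: the first two belong to object 0, the last two to object 1.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
object_ids = [0, 0, 1, 1]
mixed = object_masked_attention(tokens, object_ids)
```

Because the mask blocks attention across objects, each output in `mixed` is a weighted average of features from a single object, so features of object 0 never leak into the representation of object 1. In the framework described above, such per-object representations would be computed for each forensic stream and then combined by the token-fusion transformer, which this sketch does not attempt to model.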