Blog

Alternative Matting Laplacian (Theory)

My paper on an alternative Matting Laplacian has recently been accepted at ICIP 2016. I have uploaded the preprint to arXiv.org, and the reference code is available on my GitHub repository:

git clone https://github.com/frcs/alternative-matting-laplacian.git

Theory

Consider the picture below (taken from www.alphamatting.com):

original image

Say we want to cut out the trolls from the background. We need to extract an opacity mask of the foreground. This mask is called the $\alpha$ matte. When working in the linear RGB colour space, the observed colour $C_i$ at pixel $i$ is a blend between the foreground colour $F_i$ and the background $B_i$:

$$ C_i = \alpha_i F_i + (1 - \alpha_i) B_i $$

We need to solve for $F_i$, $B_i$ and $\alpha_i$. This is unfortunately non-linear and over-parameterised. In their remarkable work, Levin et al. propose that the transparency values can be approximated as a linear combination of the colour components:

$$ \alpha_i \approx a_i^T C_i + b_i $$

with $a_i = [a_i^R, a_i^G, a_i^B]$. We can think of $a$ and $b$ as a colour filter that we apply to the picture to obtain an intensity map of $\alpha$. In a way, $a$ is the colour through which $\alpha$ is revealed. Below is an example with $\alpha=2.2 C^R -1.36 C^G - 0.41 C^B -0.04$:
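
To make this concrete, here is a minimal numpy sketch (hypothetical code, not taken from the reference implementation; the function name and the clamping are my own choices) of applying such a global colour filter to a linear-RGB image to obtain an intensity map:

import numpy as np

def alpha_from_colour_filter(image, a, b):
    # image: (H, W, 3) float array in linear RGB
    # a: length-3 weights (the colour filter), b: scalar offset
    alpha = image @ np.asarray(a, dtype=float) + b   # per-pixel dot product with the filter
    return np.clip(alpha, 0.0, 1.0)                  # clamp to the valid alpha range

# e.g. alpha_map = alpha_from_colour_filter(img, [2.2, -1.36, -0.41], -0.04)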

This map is a good approximation of $\alpha$ for the top left of the picture, but not for the rest of it. Ideally, then, we would fit a model to each pixel of the image. The problem is that we would end up with too many unknown parameters. Levin et al. solve this problem by introducing a spatial constraint that makes the problem tractable and yields a global closed-form solution. They state that each local matting model should fit a local $3\times3$ image patch $w_i$ that overlaps with its neighbourhood:

$$ J(\alpha, a, b) = \sum_i \left( \sum_{j \in w_i} \left( \alpha_j - a_i^T C_j - b_i \right)^2 + \epsilon \lVert a_i \rVert^2 \right) $$

This is a quadratic expression in $a$, $b$ and $\alpha$. Levin et al. show that a closed-form solution for $\alpha$ can be found without having to explicitly compute the model parameters $a_i$ and $b_i$:

$$ J(\alpha) = \min_{a, b} J(\alpha, a, b) = \alpha^T L \alpha $$

where $L$ is the so-called matting Laplacian.
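
For reference, the entries of $L$ worked out by Levin et al. are (with $\mu_k$ and $\Sigma_k$ the mean and covariance of the colours in window $w_k$, $|w_k|$ the number of pixels in the window, and $\epsilon$ the regularisation weight; see their paper for the derivation):

$$ L_{ij} = \sum_{k \mid (i,j) \in w_k} \left( \delta_{ij} - \frac{1}{|w_k|} \left( 1 + (C_i - \mu_k)^T \Big( \Sigma_k + \tfrac{\epsilon}{|w_k|} I_3 \Big)^{-1} (C_j - \mu_k) \right) \right) $$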

This leads to a sparse linear system $L \alpha = 0$, which can be solved using iterative solvers. It is then easy to add constraints on the values of $\alpha$. Below is a result based on this method:
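
As an illustration, here is a minimal Python/SciPy sketch (hypothetical code, not the reference implementation) of such a constrained solve, once a sparse matting Laplacian $L$ and some user scribbles are available. It uses the usual formulation $(L + \lambda D)\,\alpha = \lambda D \beta$, where $D$ is a diagonal matrix selecting the scribbled pixels and $\beta$ holds their values:

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def solve_alpha(L, scribble_mask, scribble_values, lam=100.0):
    # L: (N, N) sparse matting Laplacian
    # scribble_mask: (N,) boolean array, True where alpha is user-constrained
    # scribble_values: (N,) array holding the constrained values (0 or 1)
    D = sp.diags(scribble_mask.astype(np.float64))   # diagonal selector of constrained pixels
    A = (L + lam * D).tocsr()                        # quadratic smoothness term + data term
    b = lam * (D @ scribble_values)
    alpha, info = spla.cg(A, b)                      # iterative (conjugate gradient) solve
    return np.clip(alpha, 0.0, 1.0)

With hard scribbles, $\lambda$ is simply set to a large value so that the constraints are effectively enforced.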

Now, let us look at what the implicitly computed model parameters look like. The image below is a map of $a/5 + 0.5$:

Recall that $a$ is in fact the colour through which we can reveal $\alpha$; thus what we see here are the colours (up to some brightness and contrast adjustment) of the local colour filter used to reveal $\alpha$. What is striking is that $a$ shows some smoothness, which we should try to exploit. Unfortunately this is hard to do in Levin et al.'s formulation, as the model parameters are not exposed in the equations.

Our contribution is to take the opposite approach to Levin et al. and explicitly solve for the model parameters instead:

We then show that a closed-form solution can also be found for $a$ and $b$. Interestingly, the problem turns out to be an anisotropic diffusion of $a$ and $b$:

This also leads to a sparse linear system $A [\begin{array}{cc} a & b \end{array}]^T = 0$. But since we now have an explicit representation of the model parameters, it is easier to add further smoothness priors to them. For instance, below is the result of increasing the spatial smoothness of $a$:

The resulting transparency map still shows little difference from that of Levin et al.:

The advantages of this approach are

  1. Computational. Our equations are simpler than in Levin et al.
  2. Modelling. It is easier to set meaningful priors on $a$ and $b$ (a generic example of such a prior is sketched below).
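
As a sketch of what such a prior can look like (this is a generic quadratic smoothness term, not necessarily the exact prior used in the paper), one can simply penalise differences of $a$ between neighbouring patches:

$$ J_{\text{smooth}}(a) = \lambda \sum_{(i,j) \in \mathcal{N}} \lVert a_i - a_j \rVert^2 $$

Because this term is quadratic, it just adds another sparse contribution to the linear system above.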

From Within, From Without

The first movement of the spatial audio 360 VR piece From Within, From Without by Enda Bates is now on YouTube. Enda has published a very detailed post about the project and immersive spatial audio. The music was performed by Pedro López López, Trinity Orchestra, and Cue Saxophone Quartet in Trinity College Dublin on April 8th, 2016 (see my previous post and Enda's blog).

At the moment the best way to watch it is with Google’s Cardboard VR headset and an Android phone as only YouTube’s Android app supports spatial audio.

This is the first piece in a series of immersive spatial audio experiences that we plan to record in Trinity College Dublin.

At the moment we've simply stitched the footage from the 12 GoPros of our 360 rig using off-the-shelf software (VideoStitch 2). There are still some visible artefacts, mainly due to parallax. We'll get our own video processing algorithms working at some point and try to improve on this.

FFMPEG commands for uploading 360 videos with spatial audio to YouTube

It has been a few weeks since we recorded the Trinity360 event, and we have started rendering a 360 video with spatial audio.

Thanks to the audio team of Prof Boland from the Sigmedia research group of Trinity College Dublin, Google has brought spatial audio support to Google Cardboard’s virtual reality system (see Google Developers Blog). So now we can experience spatial audio on YouTube!

I've detailed below the ffmpeg commands we used to preview our 360 videos with spatial audio on the Jump Inspector, and then to upload them to YouTube.

1. Encoding for the Jump Inspector (Preview)

As full processing of the spatial audio by YouTube takes a bit of time, it was very useful to quickly preview our videos on an Android phone using the Jump Inspector App. The Jump Inspector requires videos to be in a specific format that is detailed here.

1.1. Video encoding for the Jump Inspector

Our stitched 360-mono video is named trinity360-stitched.video.mov. The Jump Inspector requires us to target a video stream with the following specs:

  • h264 main profile
  • 40 Mbit/s
  • 3840 by 2160 resolution
  • 30 fps
  • YUV 4:2:0 progressive

ffmpeg -i trinity360-stitched.video.mov              \
       -c:v libx264 -b:v 40M -vf scale=3840:2160     \
       -r 30 -profile:v main -pix_fmt yuv420p        \
       trinity360.encodedforjump.video.360.mono.mp4

Now, it is important for the Jump Inspector that the file name ends with .360.mono.mp4.

1.2. Audio encoding for the Jump Inspector

Our Ambisonics mix is a 4-channel WAV file (44.1 kHz, 16-bit) in the ACN SN3D Ambisonics format specified by YouTube. To work with the Jump Inspector, we converted it to AAC at 128 kbit/s as follows:

ffmpeg -i trinity360-Tetra-B-format-ACN-SN3D-4ch.wav     \
       -channel_layout 4.0 -c:a aac -b:a 128k -strict -2 \
       trinity360-ACN-SN3D-4ch-aac128.mp4

1.3. Combining Audio and Video for the Jump Inspector

ffmpeg -i trinity360.encodedforjump.video.360.mono.mp4   \
       -i trinity360-ACN-SN3D-4ch-aac128.mp4             \
       -channel_layout 4.0 -c:a copy -c:v copy -shortest \
       trinity360.encodedforjump.360.mono.mp4

Then we just transferred our file to the Jump directory of our Nexus 5.

2. Encoding for YouTube

The video spec requirements are less stringent for YouTube. There are no constraints on video resolution or audio compression, besides having the Ambisonics as 4 channels in the ACN SN3D format and setting the metadata as described in the YouTube help.

2.1. ffmpeg Encoding

We kept the audio as uncompressed PCM s16 (pcm_s16le). It is supported in MOV containers, but not in MP4. The command is thus simply:

ffmpeg -i trinity360-stitched.video.mov              \
       -i trinity360-Tetra-B-format-ACN-SN3D-4ch.wav \
       -channel_layout 4.0                           \
       -c:v copy -c:a copy trinity360.youtube.mov

2.2. Setting the Metadata

We've downloaded Google's 360 Video Metadata app and selected Spherical and Spatial Audio:

Screenshot of the 360 Video Metadata app with Spherical and Spatial Audio selected

2.3. Upload to YouTube

Then the video was uploaded to YouTube. Nothing special needs to be done here; you just have to wait a couple of hours for the spatial audio to be fully processed, so be patient.


Edit: I have changed the post to clearly separate the instructions for YouTube and for the Jump Inspector.

For completeness, this is the ffmpeg version we’re using:

ffmpeg version 2.8.6 Copyright (c) 2000-2016 the FFmpeg developers
  built with Apple LLVM version 7.0.2 (clang-700.1.81)
    configuration: --prefix=/opt/local --enable-swscale --enable-avfilter --enable-avresample --enable-libmp3lame --enable-libvorbis --enable-libopus --enable-libtheora --enable-libschroedinger --enable-libopenjpeg --enable-libmodplug --enable-libvpx --enable-libsoxr --enable-libspeex --enable-libass --enable-libbluray --enable-lzma --enable-gnutls --enable-fontconfig --enable-libfreetype --enable-libfribidi --disable-indev=jack --disable-outdev=xv --mandir=/opt/local/share/man --enable-shared --enable-pthreads --cc=/usr/bin/clang --enable-vda --enable-videotoolbox --arch=x86_64 --enable-yasm --enable-gpl --enable-postproc --enable-libx264 --enable-libxvid

Trinity 360

Enda Bates, composer and teaching fellow in the Music and Media Technology Programme at Trinity College Dublin, composed a multi-movement spatial music work entitled From Within, From Without. The piece was performed in Trinity's Exam Hall on the 8th of April as part of Trinity's Creative Challenge Showcase.

The concert comprised acoustic, electroacoustic, and electronic spatial music and was filmed using 360˚ cameras and microphones for Virtual Reality (VR) presentation. We are currently working on the video side of the VR capture. For this occasion, we designed a compact home-brew stereo 360 rig of 12 GoPros (timelapse below).

You can follow Enda’s blog for more information about the project.

ADAPT SFI Centre - RTÉ news

Today was the opening day for the ADAPT SFI centre. I was demonstrating, on behalf of Sigmedia, some of our 3D technology for creative artists. There was a bit of news coverage on RTÉ news.

Fun with stereo3D feedback

A bit of fun with 3D.

Colour Transfer code on GitHub

For reference, I’ve put the code for my colour transfer papers on a GitHub repository:

git clone https://github.com/frcs/colour-transfer.git

iDuet - Steve Woods

We have just finished working on a stereo 3D short movie with Steve Woods. This is a 2D version of the 3D stereoscopic dance film, performed by dancers Michelle Boulé and Philip Connaughton and choreographed by John Scott of the Irish Modern Dance Theatre.