Issue 5

VIRTUAL REALITY – TECHNOLOGIES THAT TAKE YOU THERE!

 4   Enjoy VR
10   Real people in virtual 3D worlds
14   Television with an all-round view
16   Standardization on light field modalities

TECHNOLOGIES AND TOOLS FOR PRODUCTION AND IMMERSIVE MEDIA

20   OmniCam-360 degree first live transmission with the Berlin Philharmonics
22   MPEG-H: Delivering immersive audio to the world’s first terrestrial UHD-TV system
24   xHE-AAC: The codec of choice for streaming and Digital Radio
28   Post-production tools
32   MICO – multimodal media annotation and search

36   FRAUNHOFER TECHNOLOGY INSIDE – SOLUTIONS AND PRODUCTS

38   ABOUT US

PREFACE

Virtual Reality is one of the most appealing and promising new experiences when it comes to enjoying media content. The interactive selection of the view provides more than audio and picture – especially for all users who have grown up with Star Trek and computer games. Wearing VR glasses is no longer a barrier – what counts is the real-time view selection of video and sound content in best quality, very close to reality. This immersive experience of real-life content makes VR a door-opener for new media business, in a way that is different from CGI-created content.

Advances in image sensor technology provide higher resolution, a wider color gamut, or higher sensitivity, which allows for better capturing of real scenes. New display technologies enable better reproduction and smaller dimensions. 360 degree capturing combined with 3D audio takes the user into the scene. Computation of high data volumes is possible even in smartphones. By combining VR content with binaural rendering from object-based audio, the realism will be intensified. What is more, the viewer can navigate through the scene and interact with objects or content in the VR world.

With 360 degree video, the viewer can turn his or her head to the left and to the right, up and down. New production technologies, e.g. light field, even extend these three dimensions of freedom to six and allow the viewer to move forward and backwards, or step to one or the other side, too. But for VR, there is still some way to go. Bottlenecks such as parallax-compensated image stitching are demanding challenges to solve. And the questions of efficient business models for VR, which include the foundation of platforms and video channels, have still not been answered satisfactorily. We will give you an overview of new technologies that take you to your VR experience.

Dr. Siegfried Fößel

ENJOY VR

New technologies from Fraunhofer laboratories ensure that VR worlds can be enjoyed without limitation.

We need eyes and ears to perceive the world around us optimally. This also applies to virtual realities. We are only convinced by them when both the picture quality and sound are right.

Convincing all-round panorama

Researchers at Fraunhofer HHI have developed a technology to give us a realistic picture impression. At its core is the OmniCam-360: it records 360 degree panorama images, or “all-round” images, without parallax. This means that there are no kinks in the overall visuals that could make a violin bow or a hand “disappear”. With other recordings, this happens frequently wherever the images from two cameras overlap. “Instead of placing the various cameras back to back in a star-shaped arrangement, the OmniCam-360 makes use of ten cameras that are arranged vertically. With the aid of mirrors they seem as if they are all looking at the scene from exactly the same point,” says Christian Weissig, Project Manager at Fraunhofer HHI. “For the first time, automatic post-production is possible without manual correction.”

Thanks to their OmniCam-360, the researchers were now able to support live-streaming applications for the first time – to transmit a concert by the Berlin Philharmonic. The users could enjoy the concert either on a smart TV or a computer, or even put on VR glasses to be in the “live” audience. Either way, the viewers had a much better view than the actual concert attendees, as they were “standing” on the stage right behind the conductor.

The recordings made by the OmniCam-360 have a resolution that is ten times that of HD, or, to be more precise, 10,000 x 2,000 pixels. As televisions and VR glasses are currently unable to display this resolution, the researchers are working on adapting the data transmitted to match the resolution of the end device in question. Furthermore, in the future they would only like to transmit the data that lies within the viewer’s point of view – the “region of interest”, as it is known.

Enveloping audio impressions

A convincing VR experience is not only based on what the eyes can see: 50 percent of the experience depends on authentic audio reproduction. Flaws in the audio instantly ruin the illusion of an artificial world. Done properly, however, 3D sound is authentic and allows users to enter the virtual world. The audio impressions lead users through the virtual environment, where they become a part of what is happening around them. For example, a loud noise can simply change the direction of a user’s gaze, allowing for entirely new types of storytelling.

Creating convincing audio content for virtual worlds can be challenging; however, the developers at Fraunhofer IIS are working on innovative solutions to deliver this to consumers. One answer is Fraunhofer Cingo, today’s leading solution for playback of enveloping audio for VR applications via headphones, which supports every audio format for surround and 3D sound. For a highly realistic listening experience, Cingo dynamically adapts the sound to the user’s head movements in real time. Fraunhofer Cingo is already being used in Samsung and LG’s VR headsets.

In addition to playback, proper recording, production, and transport of audio signals are critical to a successful VR experience. For this reason, Fraunhofer IIS developed technologies for the complete VR audio production chain. A specially optimized algorithm that can be applied to different types of cameras with either built-in or external microphones enables convincing 3D sound recordings. Additionally, post-production tools allow straightforward mixing and mastering of 3D sound, while the audio codecs HE-AAC and MPEG-H, co-developed at Fraunhofer IIS, efficiently transmit surround and 3D sound, e.g. in a live-streaming app.

Thanks to Fraunhofer’s audio and video technologies for VR, the next virtual concert will be a feast for the eyes and ears, where the users’ experience will be as if they were attending a live event in a concert hall.
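The head-tracked playback described here can be pictured as a small control loop: the renderer expresses each sound source's direction relative to the listener's current head orientation, so the source stays fixed in the virtual scene when the head turns. The sketch below is an illustrative simplification (constant-power stereo panning rather than the HRTF processing a renderer such as Cingo actually performs); all names and parameter choices are assumptions.

```python
import math

def binaural_gains(source_azimuth_deg, head_yaw_deg):
    """Constant-power stereo gains for a source, compensated for head yaw.

    Toy stand-in for head-tracked rendering: the source azimuth is
    re-expressed relative to the current head orientation, so the source
    stays put in the virtual scene when the listener turns.
    (Illustrative only -- a real binaural renderer uses HRTF filters.)
    """
    # Azimuth relative to where the listener is now facing, in (-180, 180].
    rel = (source_azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0
    # Map [-90 deg (hard left) .. +90 deg (hard right)] to a pan position.
    pan = max(-1.0, min(1.0, rel / 90.0))
    theta = (pan + 1.0) * math.pi / 4.0       # 0 .. pi/2
    return math.cos(theta), math.sin(theta)   # (left gain, right gain)
```

A source placed 30 degrees to the listener's right becomes centered again once the listener turns 30 degrees toward it, which is exactly the "sound stays in place" effect the text describes.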


The immersive media experience with Fraunhofer VR technologies: as if being part of the live audience



REAL PEOPLE IN VIRTUAL 3D WORLDS

A 3D recording system replaces the synthetic avatar: At the moment, avatars – virtual people – stumble quite unnaturally through virtual worlds. A new technology will make the movements of virtual humans appear smoother and more natural.

The background is deceptively realistic: trees and houses look like real life, even the cars are not noticeably different from real vehicles. If you weren’t wearing VR glasses, you might imagine that you were in real life instead of virtual reality. Only the “humans” are still moving somewhat unnaturally, and the textures – the weave in the characters’ sweater material, their five o’clock shadow, or moles on their skin – also still have quite a way to go to appear natural. In order to copy how humans move, the creators of the avatars usually use motion capture: this tracking process takes a real person and places markers on their body; the movements of these markers are then recorded by a camera. Alternatively, the person can wear a special suit containing a large number of sensors that capture the movements of the arms, legs, and other body parts, and forward the data directly to the PC. This information is then used to animate the avatar in question.

Fluid, lifelike movements

Researchers at the Fraunhofer Heinrich-Hertz-Institute HHI have now developed a more expedient technology. “We record the real person from all spatial directions using 20 to 30 cameras and then calculate a robust three-dimensional model,” explains Dr. Oliver Schreer, Group Leader at the Fraunhofer HHI. “We can then add this model directly to the virtual world. We plan on accelerating capture and 3D modeling to allow us to integrate the person into the virtual world in real time. It is no longer necessary to animate the person any further.” The result is that the person moves as fluidly and naturally in the VR world as in the real world; the person’s motions are carried over one-to-one instead of having to be animated as previously. The texture also looks real to the viewer. The viewer can observe the avatar from any point of view of his or her choice – it is, for example, possible to move 360 degrees around the avatar. This makes one’s virtual companion appear to be extremely true to life.

A 3D model makes it possible

The real trick is in the software that creates the 3D model. But let’s start at the beginning: the cameras – all of which are in pairs – are distributed at optimum positions around the room. Each pair of cameras captures three-dimensional information about the real person. Similar to our two eyes, these cameras receive information about how far each body part is from the camera in each recording. The software calculates this depth information for each camera pair up to 50 times a second, which is also the rate at which each camera pair takes pictures. The software then fuses the data from the individual camera pairs together. The result is a lifelike three-dimensional copy of the person, together with their movements, which can be rendered for any viewing angle. Mapping faults and undercuts are now a thing of the past. For users, this means that they can see areas that were previously hidden when only a single camera pair was used – hidden by the hand that the person is holding in front of their torso, for example. This three-dimensional model can be integrated directly into the VR world – the real person “hatches” from the camera set and into the artificial world with all their natural movements.
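The triangulation behind each camera pair works like our two eyes: the farther away a body part is, the less it shifts (its "disparity") between the pair's two images. A minimal sketch of this standard stereo relation follows; the focal length and baseline values in the example are hypothetical, not the parameters of the HHI rig.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Distance of a point from a camera pair, via the stereo relation
    z = f * B / d: f = focal length in pixels, B = baseline (distance
    between the two cameras) in meters, d = disparity in pixels."""
    if disparity_px <= 0:
        raise ValueError("point must be visible in both cameras")
    return focal_length_px * baseline_m / disparity_px

# Hypothetical example: f = 1000 px, cameras 20 cm apart; a point that
# shifts 40 px between the two images lies 5 m away.
distance = depth_from_disparity(40, 1000, 0.2)
```

Evaluating this for every pixel, up to 50 times a second, yields the per-pair depth maps that the software then fuses into the full 3D model.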


TELEVISION WITH AN ALL-ROUND VIEW

In the future, TV viewers will be their own “cameramen”. The established standard HbbTV can be used for 360 degree TV transmission.

The helicopter blades whir as it soars over the mountain peaks, an eagle glides past the side window. How nice it would be to be able to turn your head and watch where the eagle goes. But, alas, you’re watching the scene as part of a TV documentary from the comfort of your own couch and you have to settle for the perspective that the cameraman chose. But with 360 degree TV, it’s different: here, the end user can play director and decide which direction he or she would like to look. Dr. Stephan Steglich, Division Director at the Fraunhofer Institute for Open Communication Systems FOKUS, explains how it works.

Dr. Steglich, how should we imagine 360 degree television?
A special camera setup records the film in all spatial directions; the sound is also recorded spherically. For playback, we start by deciding on a specific viewing direction – the viewer’s attention should be principally focused on the direction in which the action is happening. From here, the viewer can use his or her remote control to move their point of view from side to side, up, back, or down. It’s no longer just one film; it’s several films in one.

This requires large quantities of data to be transmitted. What technological requirements need to be fulfilled to bring 360 degree TV to viewers?
One possible solution we have identified is the recognized standard HbbTV, which is short for hybrid broadcast broadband TV. TV stations usually use this standard in conjunction with an Internet connection in order to show their viewers additional content. We use HbbTV in order to move functions that television sets previously couldn’t perform onto the Internet – into a cloud, to be precise. So we make sure that the TV set only does what it’s able to. This means that if the viewer chooses a specific viewing angle, the calculation is performed in real time in the cloud. The television doesn’t even “notice” what it’s doing.

What does the user need to have in order to be able to benefit from this technology in the future?
Basically, all he or she needs is a modern television that is HbbTV ready. We deliberately left out additional devices such as VR glasses in order to be able to reach as wide a public as possible. A test version of our 360 degree television is already being broadcast. Our technology is even of interest for mobile end devices such as smartphones: the more complexity I can move from the device into the cloud, the longer the device’s battery will last. The data volume is also reduced.
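The cloud-side work Dr. Steglich describes amounts to cutting the viewer's chosen viewport out of the full panorama, so the TV set only ever receives an ordinary video. A toy sketch of that selection step, assuming an equirectangular panorama and the approximately 10,000-pixel width quoted for the OmniCam-360 (this is not the actual FOKUS implementation):

```python
def viewport_columns(yaw_deg, fov_deg, pano_width=10000):
    """Pixel-column range of an equirectangular panorama that covers a
    horizontal field of view centred on the chosen yaw angle.

    Only this slice -- the "region of interest" -- needs to be rendered
    and transmitted, instead of the full 360 degree panorama.
    """
    px_per_deg = pano_width / 360.0
    center = (yaw_deg % 360.0) * px_per_deg
    half = fov_deg / 2.0 * px_per_deg
    # Start/end columns, wrapping around the 360 degree seam.
    start = round(center - half) % pano_width
    end = round(center + half) % pano_width
    return start, end
```

For a 90 degree field of view looking straight ahead (yaw 0), the slice wraps around the seam of the panorama, which is why both edges must be computed modulo the panorama width.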


STANDARDIZATION ON LIGHT FIELD MODALITIES

One of the most important achievements of mankind is the transfer of knowledge to future generations. This is done by several methods: verbally, by drawings, in writing and printing, or lately by photographs and video. All these methods are exchangeable in some way; however, it is a great deal more effort to describe a situation verbally than by a photograph. The phrase “A picture is worth a thousand words” became popular in the early 20th century and well describes the phenomenon that people can incorporate knowledge by visual sensors much faster than by hearing and interpreting words. In the last hundred years, this fact was implemented by capturing images with cameras, in the form of a still image as photograph, or in combination with sound as motion picture or video. Technology in the 20th century allowed fast imaging of the scene on a flat planar sensor, either on a photosensitive emulsion or on an electronic sensor. Several attempts were made to increase the dimensions of the reproduction of the scene, e.g. for stereoscopic 3D, and also to get some depth information. But it is difficult to capture and reproduce the scene in the exact same way as an individual with only two images.

A revolutionary approach to image production

With the advent of low-cost imagers, additional sensors for localization or depth acquisition, and new electronic display devices, true three-dimensional acquisition and reproduction of a scene are in sight. The approach is not only to capture and reproduce the two images seen by human eyes, but to capture the complete scene in multiple dimensions, to model the scene, and to render the images based on demands as in computer-generated imagery (CGI). This can be the viewpoint, the viewing direction, or the focal point.

The theory behind this has been known since the end of the 19th or the beginning of the 20th century. It was developed by Faraday, Lippmann, and Gershun and is called light field or plenoptic theory. It states that, at every point in space, the light rays form a function based on position, direction, and intensity. The traditional cameras of today capture the light rays at a specific position, but bundle light rays from one or more directions by the lens and record this on a photosensitive sensor. The aperture of the lens defines the number of bundled rays.

It is clear that we cannot capture the complete light field, as we cannot position a camera at every point in space without the cameras influencing each other, but it is possible to sample the light field in different ways. The simplest way is to use a circular camera array for capturing 360 degrees, allowing Virtual Reality (VR) reproductions in VR glasses. By using different arrangements of cameras and special sensors and optics, other applications are possible, such as VR with translational movement, 3D modeling, or the generation of point clouds with the option of merging real scenes with CGI scenes. Today, the industry is investigating which technologies allow which applications. This relates to Virtual Reality, Augmented Reality (AR), point clouds, light field cameras, visual effects (VFX), depth-enriched images, and 3D models; in general, every process of mixing real scenes with computer-generated scenes.

Standardization of light/sound field technology

Within the standardization committees ISO/IEC JTC1 SC29 WG1 (called JPEG) and WG11 (called MPEG), several groups are working on these topics, discussing the possible modalities and correct representation forms. Attached is a list of the current working groups.


–	Joint Ad Hoc Group for Digital representations of Light/Sound Field for immersive media applications
	This group was a joint ad hoc group between JPEG and MPEG to determine the state of the art, potential modalities, and applications. The summary technical report can be found as the JPEG Pleno Joint AhG Report on the JPEG website.

–	JPEG Pleno
	The ad hoc group JPEG Pleno targets a standard framework for the representation and exchange of new imaging modalities such as light field, point-cloud, and holographic imaging. Several workshops on this topic were held. At the last ICME 2016 conference, a grand challenge for light field image compression was offered.

–	JPEG Systems
	The subgroup JPEG Systems investigates the backward-compatible extension of JPEG for integrating multiple images and depth maps, allowing depth-enriched images.

–	MPEG Wave Field Audio
	As well as images, audio experiences are also position dependent and used for object localization by viewers. A group in MPEG investigates the use of wave field/sound field technology to support less restrictive viewing conditions, such as flexible point of view and object-based sound representations. As in the image field, microphone arrays or microphones for individual sound objects are used to capture the sound field.

–	MPEG Tools for VR
	VR is a fast-progressing technology with a strong market momentum. Several technologies are necessary to realize a VR workflow, such as image stitching, metadata enrichment, and the development of optimized video coding technologies. To avoid fragmentation in this market, the standardization of formats, 3D projection methods, metadata, and signaling is necessary.

–	MPEG Light-Field Coding
	Light field enables many different special effects, like the “Matrix bullet effect”, refocusing, or depth-based editing. For next-generation cinematic movies in particular, 3D processing allows a much more immersive viewing experience. In combination with audio wave field processing, new workflows are possible. The focus in light field coding is the representation and coding of light fields captured by lenslet light-field cameras or camera arrays.

–	MPEG Point Cloud Compression
	Images from cameras used in light field capturing are projections of light rays onto the sensor. A more universal representation is a point cloud. In this cloud, the position, light rays, and Bidirectional Reflectance Distribution Function (BRDF) are defined for specific objects. Because of the limited number of data points during capturing, point clouds are sparsely distributed in most cases. The benefit of point clouds is the 3D representation, which allows new renderings from different viewpoints.

In the next few years, light field technology will become a highly interesting technology for improving the viewing experience. Today, this field is being very intensively developed. It is still unclear which representation formats are necessary and useful to allow optimized workflows.

The JPEG and MPEG committees will investigate and standardize new formats and methods for this new technology.

Overview by Dr. Siegfried Foessel, Fraunhofer Digital Media Alliance
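The plenoptic theory that underlies all of these activities can be stated compactly. In its common textbook form (the symbols below are the standard ones, not taken from this article), the light in a scene is described by the radiance arriving at every position, from every direction:

```latex
% The full plenoptic function: radiance L as a function of
% position (x, y, z), direction (theta, phi), wavelength lambda, time t.
L = L(x, y, z, \theta, \phi, \lambda, t)

% In free space, radiance is constant along a ray, so for a fixed
% wavelength and instant the 7D function reduces to the 4D light field
% sampled by camera arrays and lenslet cameras: a ray is indexed by its
% intersections (u, v) and (s, t) with two parallel planes.
L = L(u, v, s, t)
```

A conventional photograph integrates this function over the lens aperture and shutter time; light field capture samples it instead, which is what makes refocusing and viewpoint changes possible after the fact.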

TECHNOLOGIES AND TOOLS FOR PRODUCTION AND IMMERSIVE MEDIA

OMNICAM-360 DEGREE FIRST LIVE TRANSMISSION WITH THE BERLIN PHILHARMONICS

Since the OmniCam system was first developed at Fraunhofer Heinrich Hertz Institute in 2009, numerous video productions have been realized in cooperation with the Berlin Philharmonics – a reliable top-class partner. After various productions, the OmniCam-360 was installed to record the concert given on the occasion of the 25th anniversary of the fall of the Berlin Wall in 2014. In 2015, the documentary “Playing the Space” was released, in which the Philharmonics plays a leading role and which was also filmed with the panoramic camera system.

The compact OmniCam-360 from the Fraunhofer HHI is equipped with ten HD cameras that are attached to a mirror system. The individually produced images are corrected in real time and put together into a parallax-free UHD video panorama with a resolution of approx. 10,000 x 2,000 pixels. The OmniCam-360 measures about 50 x 50 centimeters and weighs around 15 kilos. Therefore, it can be utilized in many different settings.

The collaboration with the Berlin Philharmonics was taken to a new level when the OmniCam-360 was used for the first live transmission: the panoramic images were successfully streamed in UHD quality through the Philharmonic’s Digital Concert Hall platform and shown online. By wiping and/or zooming, viewers can focus on the part of the picture that interests them the most. In broadcasts of concerts this could be the section of the orchestra with their favorite instruments, or the conductor, or they could just let their gaze wander.


MPEG-H: DELIVERING IMMERSIVE AUDIO TO THE WORLD'S FIRST TERRESTRIAL UHD-TV SYSTEM

Ultra High Definition Television (UHDTV) is the next best thing to reality. UHDTV transports crystal-clear, high-resolution pictures that make you want to touch the screen to check that the race car will not drive through your living room. The 4K and 8K image resolutions allow UHDTV to create the illusion of reality. In addition, it offers the extension to 3D video reproduction, High Dynamic Range Imaging (HDRI) to display image luminance in more detail, and High Frame Rate (HFR), which provides more fluent picture sequencing and better image quality by exceeding the frame rates typical today.

But video reproduction is only one part of the equation for a convincing TV experience. Sound is the other central ingredient of authenticity. Fortunately, technological developments on the audio side are equally innovative: the next-generation audio standard MPEG-H 3D Audio takes three-dimensional sound from the real world into the living room. MPEG-H provides a fully immersive audio experience for TV viewers, which creates the feeling of "being there" rather than just watching a TV program.

The MPEG-H Audio standard, mainly developed by Fraunhofer IIS, was selected as A/342 Candidate Standard for the new television standard ATSC 3.0 in May 2016. Shortly after, the South Korean standards organization Telecommunications Technology Association (TTA) announced that the audio technology would be part of the Korean standard for terrestrial UHD TV. If governmental officials approve the standard in September, MPEG-H 3D Audio will provide an entirely new sound experience in the world's first 4K terrestrial TV system.

If approved, South Korean consumers would be the first to experience interactive and immersive sound, the two main features of MPEG-H 3D Audio. The technology enables users to adjust the sound mix to their preferences. For example, consumers would have the ability to choose between different commentators in a sporting event, or to improve the intelligibility of dialogue by decreasing the ambience volume. Furthermore, by adding 3D audio components for additional height information, the codec delivers true audio immersion.

With the 2018 Winter Olympic Games in Pyeongchang, South Korea on the horizon, the new terrestrial TV system is expected to launch in February 2017, starting with the Seoul metropolitan area. According to Korean government representatives, it will then be expanded to cities near the Olympic Game venues. By 2021, the service will be available nationwide.

The MPEG-H 3D Audio codec is ready to use in broadcast systems, as demonstrated by Fraunhofer IIS in a series of events. Fraunhofer's latest presentation of a complete ATSC 3.0 broadcast chain with MPEG-H Audio was at the KOBA Show in Seoul, South Korea in May 2016.

MPEG-H 3D Audio is already being integrated into professional broadcasting equipment. Most recently, the broadcast equipment manufacturers DS Broadcast and Kai Media announced the availability of MPEG-H 3D Audio in their latest 4K encoder products. More product announcements, including TV sets from major manufacturers for the Korean market, are expected this fall.
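The interactivity described above rests on object-based audio: commentary, dialogue and ambience travel as separate objects, and the receiver applies the viewer's gain choices before summing. A minimal Python sketch of that mixing idea (object names and gain values are our illustration, not the MPEG-H renderer):

```python
import numpy as np

def render_mix(objects: dict, gains_db: dict) -> np.ndarray:
    """Sum audio objects after applying per-object gains (in dB).

    Objects absent from gains_db keep unity gain; a gain of -inf mutes.
    """
    out = np.zeros_like(next(iter(objects.values())), dtype=float)
    for name, samples in objects.items():
        gain_db = gains_db.get(name, 0.0)
        out += samples * (10.0 ** (gain_db / 20.0))      # dB -> linear factor
    return out

# Two commentary objects plus ambience; the viewer picks commentator B
# (muting A) and turns the crowd noise down by 12 dB.
objects = {
    "commentary_a": np.ones(4),
    "commentary_b": np.ones(4),
    "ambience":     np.ones(4),
}
prefs = {"commentary_a": float("-inf"), "ambience": -12.0}
mix = render_mix(objects, prefs)
```

Swapping commentators or lowering the ambience is then just a change in the `prefs` dictionary; no new broadcast stream is needed.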



As featured by Radio World International on 13 June 2016

Today, mobile music streaming services are facing a series of challenges. Service providers are suffering due to the inability to serve the full range of potential customers because of real-world bandwidth limitations combined with a high playout/CDN cost. In addition, today's mobile user experience is far from perfect. Streaming services are eating into consumers' monthly data allowances, and service reliability is far from what we have grown to appreciate from classic broadcast coverage.

However, there is an answer. Recently, the MPEG working group in ISO standardized and published the latest member of the highly successful AAC family of audio codecs: "xHE-AAC" (Extended High-Efficiency Advanced Audio Coding).

xHE-AAC combines and significantly improves the functionality of two formerly separate worlds: on the one hand, general-purpose audio codecs, in particular today's de-facto standard MPEG HE-AAC; on the other hand, codecs for speech-only content at very low bit rates, represented by AMR-WB+.

xHE-AAC has the ability to provide good-quality audio down to 8 kbps for mono and 16 kbps for stereo services regardless of the type of content (e.g. talk radio, a music program or a jingle). Even in the mid bit-rate range, xHE-AAC shows significant quality improvements over established codecs, while for higher bit rates it converges to the same transparent quality known from the other members of the AAC family.

Due to its universality, the global Digital Radio Mondiale (DRM) digital radio standard has adopted MPEG xHE-AAC as its primary audio codec. India is on the verge of becoming the world's largest digital radio deployment. In February 2016, attendees at the Broadcast Engineering Society of India Convention in New Delhi witnessed the first DRM transmission based on xHE-AAC by India's public broadcaster All India Radio. DRM-ready receiver chipsets and receivers support xHE-AAC out of the box, as do all providers of professional DRM encoders and multiplexers.

For audio streaming over IP to mobile devices, xHE-AAC promises a revolutionary impact on both user experience and business opportunities. For the first time, streaming service providers can now start targeting potential customers limited to affordable 2G contracts, who so far could not be served using existing audio codecs but who, in the case of India, represent roughly 90 percent of mobile users. Furthermore, service providers can benefit from a dramatic cut in the monthly costs of data distribution. Sample calculations for mid-size services show a saving potential of 75 percent of the CDN cost once mobile apps start requesting the low bit-rate xHE-AAC streams instead of the mp3 and HE-AAC versions.

Consumers with 4G/LTE contracts around the world benefit from the increase in service reliability when leaving well-covered city centers and falling back to 2G coverage, or when joining mass events with heavy mobile data usage.

In April, Via Licensing announced the start of the official xHE-AAC licensing program, which includes the functionality of the existing AAC program. This reflects the fact that xHE-AAC-capable audio decoders are backward compatible with all existing AAC-based content. The licensing program also makes it more convenient for manufacturers to mix xHE-AAC and AAC-only devices and apps.

Professional streaming encoders are already available from Telos Alliance and StreamS.
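The economics behind the sample calculation are straightforward: CDN cost scales roughly with delivered bytes, so halving the bit rate halves the bill. A small Python sketch (the 64 kbps HE-AAC and 16 kbps xHE-AAC figures are our worked example, not Fraunhofer's actual calculation):

```python
def cdn_saving(old_kbps: float, new_kbps: float) -> float:
    """Fractional CDN-traffic saving when streams move to a lower bit rate."""
    return 1.0 - new_kbps / old_kbps

def monthly_gigabytes(kbps: float, listeners: int, hours: float) -> float:
    """Data delivered per month for a given bit rate, audience and listening time."""
    return kbps * 1000 / 8 * 3600 * hours * listeners / 1e9

# Replacing a 64 kbps HE-AAC stereo stream with a 16 kbps xHE-AAC stream
# cuts delivered bytes, and hence CDN traffic, by 75 percent.
saving = cdn_saving(64, 16)

# Per listener, one hour at 16 kbps is only about 7.2 MB of the data allowance.
gb_per_listener_hour = monthly_gigabytes(16, listeners=1, hours=1)
```

The same arithmetic applies on the consumer side: the lower bit rate eats proportionally less of the monthly data allowance.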


POST-PRODUCTION TOOLS

IMF – a uniform exchange format for post-production

easyDCP, the post-production software from Fraunhofer IIS, has offered processing of IMPs (Interoperable Master Packages) in addition to DCP generation for quite a while now. The underlying Interoperable Master Format, IMF, has successfully proven its interoperability over and over at various Plugfest events and serves as a universal exchange format within movie production. Since standardization by the SMPTE (Society of Motion Picture and Television Engineers), IMF has been establishing itself more and more in the media industry.

The advantages of IMF are obvious: a uniform, internationally standardized post-production format guarantees the seamless exchange of content in the highest picture quality, and it is no longer necessary to support many different exchange formats, so the risk of quality losses due to unwanted decoding, image-conversion and encoding steps is minimized. At the same time, the IMP serves as a master for the creation of distribution formats. In practice, it is not uncommon that up to 100 versions in different distribution formats need to be created from a single master and checked simultaneously. IMF allows for the automatic creation of videos with different technical parameters such as video compression format, spatial resolution of the target device, audio speaker set-up, or different national versions (subtitles, soundtrack in the local language, etc.). Using the IMF concept called Output Profile List (OPL), this work can be automated, since the conversion of the IMP into a certain distribution format (e.g. iTunes) is described there.

Integrated quality check

The great number of largely automatically created videos means that a comprehensive and, indeed, automatic quality check of the transcoding process is desirable in order to save time and money. Usually the material is inspected for errors and artifacts that may arise during lossy transcoding or conversion into other color spaces or bit depths. The Fraunhofer researchers' approach is the direct integration of various quality-check modules into the transcoding processes. To this end, the scientists will present a version of the software that can be used both to create the IMP and, in particular, to check the generation of distribution formats from an IMP automatically.

At IBC 2016, the Fraunhofer Institutes IIS and IDMT will showcase software components that detect problematic sections of video and summarize them in a test report. In the first demonstration version, the test results will be shown as a simple "traffic light function" with red/amber/green marking. In future, the user will also receive a detailed quality report for those parts of the distribution formats that need further attention. The quality report is presented in a visual timeline directly integrated in the easyDCP software suite in order to establish an intuitive way to check any problematic scenes. The first fault modules for quality checking with predefined parameters have already been integrated into the overall easyDCP system.

For 2017, the IMF suite with transcoding and expanded checking functionality is planned to be rolled out to the first pilot users.
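A traffic-light check of this kind can be sketched in a few lines of Python: compare each transcoded frame against the master with a quality metric and map the score to red/amber/green. We use PSNR with made-up thresholds here purely as an illustration; the actual easyDCP fault modules and their parameters are not public:

```python
import numpy as np

def psnr(reference: np.ndarray, encoded: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio between a master frame and its transcode."""
    mse = np.mean((reference.astype(float) - encoded.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def traffic_light(psnr_db: float, amber: float = 35.0, green: float = 45.0) -> str:
    """Map a quality score to a red/amber/green marking for the test report."""
    if psnr_db >= green:
        return "green"
    return "amber" if psnr_db >= amber else "red"

ref = np.full((8, 8), 128, dtype=np.uint8)   # toy 8x8 master frame
bad = ref.copy()
bad[0, 0] = 0                                # one badly damaged pixel
lossless = traffic_light(psnr(ref, ref))     # identical frames score "green"
damaged = traffic_light(psnr(ref, bad))
```

Running such a check inside the transcoding loop, rather than as a separate pass, is what saves the time and money mentioned above.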


The Output Profile List (OPL) allows for the automatic creation of different distribution formats.



In order to provide improved search and recommendation capabilities for ever-increasing amounts of content, cost-effective technologies for the automatic extraction of metadata from raw media objects are needed. One of the core challenges in this domain is the need to shift from individual standalone extractors to multimodal, context-aware extractor workflows, which can provide substantially improved search functionality and results.

One example: even to realize a seemingly simple search for video segments where a (specific) person talks about a specific topic, a news video provider needs these extractors for annotation:
– Speaker recognition
– Speech recognition
– Named entity recognition
– Temporal video segmentation
– Face detection and recognition

Through multimodal fusion and the reuse of results from extractors, annotation quality can be improved at the same time, e.g. by combining speaker and face recognition, or by combining speaker and speech recognition. In order to realize this potential, the following challenges need to be addressed:
– Cost-effective integration of heterogeneous extractors
– Flexible extractor workflow orchestration
– An extensible, common metadata model
– Availability of an expressive, media-aware query language
– Recommender systems using both annotations and collaborative filtering

Fraunhofer IDMT has been active in the domains of A/V extractor implementation, integration and orchestration, and recommendation for many years. We believe that multimodal, complex annotation and recommendation are keys to future media systems, but they require an integrated technology platform addressing all the aforementioned aspects. Such a platform has now been developed by the EU R&D project MICO, with all core functionalities being provided as business-friendly OSS.

Fraunhofer IDMT has provided significant contributions to the project and, together with other MICO partners, intends to further build on and extend the platform to realize the potential of multimodal annotation, search and recommendation.
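The combination of speaker and face recognition mentioned above is a classic case of late fusion: each extractor reports per-person confidences for a segment, and a weighted sum decides. A minimal Python sketch of that idea (the extractor names, weights and confidences are hypothetical, not the MICO platform's API):

```python
def fuse(per_extractor: dict, weights: dict) -> str:
    """Late fusion: weighted sum of per-person confidences across extractors.

    per_extractor maps extractor name -> {person: confidence};
    the person with the highest combined score wins.
    """
    scores = {}
    for extractor, results in per_extractor.items():
        w = weights.get(extractor, 1.0)                  # unknown extractors get weight 1
        for person, confidence in results.items():
            scores[person] = scores.get(person, 0.0) + w * confidence
    return max(scores, key=scores.get)

# Speaker recognition alone is unsure about this segment;
# adding the face-recognition result settles the annotation.
segment = {
    "speaker_recognition": {"alice": 0.55, "bob": 0.45},
    "face_recognition":    {"alice": 0.20, "bob": 0.75},
}
who = fuse(segment, weights={"speaker_recognition": 1.0, "face_recognition": 1.0})
```

With equal weights, "bob" wins (1.20 vs. 0.75), even though the speaker model alone leaned slightly towards "alice"; this is exactly the quality gain multimodal fusion promises.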


easyDCP – ONLY A FEW CLICKS

In order to create a DCP (Digital Cinema Package) easily and in compliance with all requirements, or to test such packages prior to delivery, Fraunhofer IIS developed the easyDCP software suite. More information about easyDCP, which is available as a standalone tool-set as well as plug-ins for various post-production solutions, including BMD's Resolve, can be found online.

FRAUNHOFER CINGO®

Fraunhofer Cingo enables an immersive sound experience on mobile and virtual reality devices. For device manufacturers and service providers, Cingo is available as an optimized software implementation for all major PC and mobile platforms, including iOS and Android. Equipment manufacturers using Cingo include Samsung (Samsung Gear VR), LG (LG 360 VR) and Google (Nexus family).

LIGHTWEIGHT IMAGE CODING FOR VIDEO OVER IP AND MEDIA CONTRIBUTION

The Lici® codec ensures image-by-image, visually lossless transmission of high-resolution video at compression ratios of 1:2 to 1:6. It features extremely low latency and high throughput, and requires little logic to implement. Lici is used for Video-over-IP applications and for contribution in professional movie production. IP cores are available; licensing conditions can be sent on request.

MPEG-H

MPEG-H Audio provides interactive, immersive sound for TV and VR applications. It is available as a software implementation to chip manufacturers, broadcasters and consumer electronics manufacturers. The first professional broadcast encoders to support MPEG-H Audio are DS Broadcast's BGE9000 4K Ultra HD Encoder and the new 4K UHD Live Broadcast Encoder KME-U4K from Kai Media.



As a one-stop competence center for digital media, we provide our customers with scientific know-how and the development of solutions that can be integrated into workflows and optimize process steps.

The members of the Digital Media Network are actively working in renowned organizations and bodies such as the International Organization for Standardization (ISO), the ISDCF (Inter-Society Digital Cinema Forum), SMPTE (Society of Motion Picture and Television Engineers), the FKTG (German Society for Broadcast and Motion Picture), and the EDCF (European Digital Cinema Forum).

Fraunhofer Institutes in the Digital Media Alliance jointly offer innovative solutions and products for the transition to the digital movie and media world of tomorrow. The Institutes in the Alliance are available as renowned contacts and partners for all digital topics: digital media, digital movies, and standardization, as well as new cinematography, audio, and projection technologies, post-production, distribution, and archiving. The goal of the Fraunhofer Digital Media Alliance is to quickly and easily help find the right contacts, partners, and suitable technology.

The Fraunhofer Institute members are
– Digital Media Technology IDMT, Ilmenau
– Integrated Circuits IIS, Erlangen
– Telecommunications, Heinrich-Hertz-Institut HHI, Berlin
– Open Communication Systems FOKUS, Berlin

Contact
Fraunhofer Digital Media Alliance
Angela Raguse M.A.
Phone +49 9131 776-5105

Publication Information
Fraunhofer Digital Media Alliance
c/o Fraunhofer Institute for Integrated Circuits IIS
Am Wolfsmantel 33
91058 Erlangen, Germany

Concept and Editor
Angela Raguse, Fraunhofer-Allianz Digital Media

Layout and production
Kathrin Brohasga, Kerstin Krinke, Paul Pulkert

Photo acknowledgements
Cover picture: Fraunhofer IIS/
Page 3: Fraunhofer IIS/Karoline Glasow
Page 4: Fraunhofer IIS/David Hartfiel
Page 8/9: Fraunhofer HHI
Page 11: Fraunhofer HHI
Page 15: Fraunhofer FOKUS
Page 20: Fraunhofer HHI/Berlin Philharmonic Orchestra/Monika Rittershaus
Page 23: Fraunhofer IIS/Frank Boxler/Valentin Schilling
Page 24: Fraunhofer IIS/Matthias Heyde
Page 27:
Page 29: Fraunhofer IIS/Fraunhofer IDMT
Page 30/31: Fraunhofer IIS
Page 33: CC0 Public Domain
Page 37: Fraunhofer IIS/Kurt Fuchs
Page 37: DS Broadcast

© Fraunhofer-Gesellschaft
