Basics of the 3GP File Structure

10/07/06

http://www.rfc-archive.org/getrfc.php?rfc=4396

13.  Basics of the 3GP File Structure

   This section provides a coarse overview of the 3GP file structure,
   which follows the ISO Base Media File Format [2].

   Each 3GP file consists of "Boxes".  In general, a 3GP file contains
   the File Type Box (ftyp), the Movie Box (moov), and the Media Data
   Box (mdat).  The File Type Box identifies the type and properties of
   the 3GP file itself.  The Movie Box and the Media Data Box, serving
   as containers, include their own boxes for each medium.  Each box
   starts with a header that indicates both its size and its type (the
   corresponding fields are named "size" and "type").  Additionally, a
   box may itself contain further boxes.
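
   As an illustration of the box header described above, the following
   minimal Python sketch iterates over the top-level boxes of a byte
   buffer.  It assumes the header layout of the ISO Base Media File
   Format: a 32-bit big-endian "size" followed by a four-character
   "type", where a size of 1 announces a 64-bit size after the type and
   a size of 0 means the box extends to the end of the data.  The names
   iter_boxes and make_box are hypothetical helpers, and the sample
   buffer is a toy example, not a valid 3GP file.

```python
import struct


def iter_boxes(data, offset=0, end=None):
    """Yield (type, payload_start, box_end) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            if offset + 16 > end:
                break
            size = struct.unpack_from(">Q", data, offset + 8)[0]
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing data.
            size = end - offset
        if size < header or offset + size > end:
            break  # malformed or truncated box; stop iterating
        yield box_type.decode("latin-1"), offset + header, offset + size
        offset += size


def make_box(box_type, payload=b""):
    """Build a box with a 32-bit size field (toy helper for the example)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Toy buffer: the payloads are placeholders, not real box contents.
sample = (make_box(b"ftyp", b"3gp6")
          + make_box(b"moov")
          + make_box(b"mdat", b"\x00\x00"))

print([(t, e - s) for t, s, e in iter_boxes(sample)])
# → [('ftyp', 4), ('moov', 0), ('mdat', 2)]
```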

   In the following, only those boxes are mentioned that are useful for
   the purposes of this payload format.

   The Movie Box (moov) contains one or more Track Boxes (trak), which
   include information about each track.  A Track Box contains, among
   others, the Track Header Box (tkhd) and, nested inside its Media Box
   (mdia), the Media Header Box (mdhd) and the Media Information Box
   (minf).
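
   The nesting of these boxes can be sketched as a lookup by path.  The
   example below is a simplified illustration, not a complete parser:
   it handles only 32-bit box sizes, and the helpers children, find,
   and box are hypothetical names.  Note that in the ISO Base Media
   File Format the Media Header Box and the Media Information Box sit
   inside a Media Box (mdia) within the Track Box, which is why the
   lookup path for mdhd passes through mdia.

```python
import struct


def children(data, start, end):
    """Yield (type, payload_start, payload_end) for boxes with 32-bit sizes."""
    pos = start
    while pos + 8 <= end:
        size, kind = struct.unpack_from(">I4s", data, pos)
        if size < 8 or pos + size > end:
            break  # special sizes (0, 1) and malformed boxes are not handled
        yield kind, pos + 8, pos + size
        pos += size


def find(data, path, start=0, end=None):
    """Return (payload_start, payload_end) of the first box on path, or None."""
    end = len(data) if end is None else end
    for kind, s, e in children(data, start, end):
        if kind == path[0]:
            if len(path) == 1:
                return s, e
            hit = find(data, path[1:], s, e)
            if hit is not None:
                return hit
    return None


def box(kind, payload=b""):
    """Build a box with a 32-bit size field (toy helper for the example)."""
    return struct.pack(">I4s", 8 + len(payload), kind) + payload


# Toy hierarchy with one-byte placeholder payloads instead of real fields.
media = box(b"mdia", box(b"mdhd", b"M") + box(b"minf", box(b"stbl")))
trak = box(b"trak", box(b"tkhd", b"T") + media)
movie = box(b"ftyp") + box(b"moov", trak)

s, e = find(movie, [b"moov", b"trak", b"mdia", b"mdhd"])
print(movie[s:e])
# → b'M'
```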

   The Track Header Box specifies the characteristics of a single track,
   where a track is, in this case, the streamed text during a session.
   Exactly one Track Header Box is present for a track.  It contains
   information about the track, such as the spatial layout (width and
   height), the video transformation matrix, and the layer number.
   Since these pieces of information are essential and static (i.e.,
   constant) for the duration of the session, they must be sent prior to
   the transmission of any text samples.
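
   To make the Track Header Box fields concrete, the sketch below reads
   the layer, the transformation matrix, and the width and height from
   a tkhd payload (the bytes after the box header).  The field layout
   is an assumption taken from the ISO Base Media File Format, where
   version 1 uses 64-bit times and duration, and width and height are
   16.16 fixed-point values.  The function name parse_tkhd and the
   example values are hypothetical.

```python
import struct


def parse_tkhd(payload):
    """Extract selected fields from a Track Header Box payload."""
    version = payload[0]
    wide = version == 1                 # version 1 uses 64-bit time fields
    pos = 4                             # skip version (1 byte) and flags (3)
    pos += 16 if wide else 8            # creation_time, modification_time
    (track_id,) = struct.unpack_from(">I", payload, pos)
    pos += 4 + 4                        # track_ID, reserved
    pos += 8 if wide else 4             # duration
    pos += 8                            # reserved
    (layer,) = struct.unpack_from(">h", payload, pos)
    pos += 2 + 2 + 2 + 2                # layer, alternate_group, volume, reserved
    matrix = struct.unpack_from(">9i", payload, pos)
    pos += 36
    width, height = struct.unpack_from(">II", payload, pos)
    return {
        "track_id": track_id,
        "layer": layer,
        "matrix": matrix,
        "width": width / 65536,         # 16.16 fixed point to float
        "height": height / 65536,
    }


# Example version 0 payload: identity matrix, 176x144 region, layer -1.
identity = (0x00010000, 0, 0, 0, 0x00010000, 0, 0, 0, 0x40000000)
payload = struct.pack(
    ">B3s5I8x4h9i2I",
    0, b"\x00\x00\x01",                 # version, flags
    0, 0, 1, 0, 0,                      # creation, modification, ID, reserved, duration
    -1, 0, 0, 0,                        # layer, alternate_group, volume, reserved
    *identity,
    176 << 16, 144 << 16,               # width, height
)

info = parse_tkhd(payload)
print(info["width"], info["height"], info["layer"])
# → 176.0 144.0 -1
```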

   The Media Header Box contains the "timescale", i.e., the number of
   time units that pass in one second (cycles per second, or Hertz).
   The Media Information Box includes the Sample Table Box (stbl), which
   contains all the time and data indexing of the media samples in a
   track.  Using this box, it is possible to locate samples in time and
   to determine their type, size, container, and offset into that
   container.  Inside the Sample Table Box, we can find the Sample
   Description Box (stsd, for finding sample descriptions), the Decoding
   Time to Sample Box (stts, for finding sample durations), the Sample
   Size Box (stsz, for finding sample sizes), and the Sample to Chunk
   Box (stsc, for finding the sample description index).
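
   The relation between the timescale and the sample durations can be
   sketched as follows.  The example assumes the payload layouts of the
   ISO Base Media File Format: in mdhd, the timescale follows the
   version/flags word and the two time fields; in stts, an entry count
   is followed by (sample_count, sample_delta) pairs.  The function
   names and the example values are hypothetical.

```python
import struct


def parse_mdhd_timescale(payload):
    """Return the timescale (time units per second) from an mdhd payload."""
    version = payload[0]
    # Skip version/flags, then creation and modification times
    # (64-bit each in version 1, 32-bit each in version 0).
    pos = 4 + (16 if version == 1 else 8)
    return struct.unpack_from(">I", payload, pos)[0]


def parse_stts(payload):
    """Return the list of (sample_count, sample_delta) entries."""
    (entry_count,) = struct.unpack_from(">I", payload, 4)
    return [struct.unpack_from(">II", payload, 8 + 8 * i)
            for i in range(entry_count)]


def sample_times(entries, timescale):
    """Return (start_seconds, duration_seconds) for every sample."""
    times, ticks = [], 0
    for count, delta in entries:
        for _ in range(count):
            times.append((ticks / timescale, delta / timescale))
            ticks += delta
    return times


# Example: timescale of 1000 units per second; two samples lasting 1500
# units each, followed by one sample lasting 2000 units.
mdhd = struct.pack(">IIIIIHH", 0, 0, 0, 1000, 5000, 0x55C4, 0)
stts = struct.pack(">II", 0, 2) + struct.pack(">IIII", 2, 1500, 1, 2000)

timescale = parse_mdhd_timescale(mdhd)
print(sample_times(parse_stts(stts), timescale))
# → [(0.0, 1.5), (1.5, 1.5), (3.0, 2.0)]
```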

   Finally, the Media Data Box contains the media data itself.  In timed
   text tracks, this box contains text samples.  Their equivalents in
   audio and video tracks are audio and video frames, respectively.  The
   text sample consists of the text length, the text string, and one or
   several Modifier Boxes.  The text length is the size of the text in
   bytes.
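
   The layout of a text sample can be illustrated with a short sketch.
   It assumes, following the 3GPP timed text format, that the text
   length is a 16-bit big-endian integer, that the text is UTF-8 unless
   it begins with a UTF-16 byte order mark, and that each Modifier Box
   carries the usual 32-bit size and four-character type.  The function
   name parse_text_sample and the example bytes are hypothetical.

```python
import struct


def parse_text_sample(sample):
    """Split a text sample into its text and a list of modifier boxes."""
    (text_len,) = struct.unpack_from(">H", sample, 0)
    raw = sample[2:2 + text_len]
    # A leading byte order mark signals UTF-16; otherwise assume UTF-8.
    if raw[:2] in (b"\xfe\xff", b"\xff\xfe"):
        text = raw.decode("utf-16")
    else:
        text = raw.decode("utf-8")

    modifiers = []
    pos = 2 + text_len
    while pos + 8 <= len(sample):
        size, kind = struct.unpack_from(">I4s", sample, pos)
        if size < 8 or pos + size > len(sample):
            break  # malformed or truncated modifier box
        modifiers.append((kind.decode("latin-1"), sample[pos + 8:pos + size]))
        pos += size
    return text, modifiers


# Example: the text "Hello" followed by one modifier box of type 'hlit'
# whose payload is two 16-bit values (0 and 5).
sample = (struct.pack(">H", 5) + b"Hello"
          + struct.pack(">I4sHH", 12, b"hlit", 0, 5))

print(parse_text_sample(sample))
# → ('Hello', [('hlit', b'\x00\x00\x00\x05')])
```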
