Basics of the 3GP File Structure (10/07/06)
13. Basics of the 3GP File Structure
This section provides a coarse overview of the 3GP file structure,
which follows the ISO Base Media File Format.
Each 3GP file consists of "Boxes". In general, a 3GP file contains
the File Type Box (ftyp), the Movie Box (moov), and the Media Data
Box (mdat). The File Type Box identifies the type and properties of
the 3GP file itself. The Movie Box and the Media Data Box serve as
containers and include their own boxes for each medium. Every box
starts with a header that gives its size and type (the fields are
named "size" and "type", respectively). In addition, a box of a given
type may itself contain a number of other boxes.
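
The header layout described above can be sketched in a few lines of
Python. This is an illustration only, not part of the payload format:
the names iter_boxes and make_box are hypothetical, and the handling of
the special size values 1 (a 64-bit size follows the type) and 0 (the
box runs to the end of the data) reflects the ISO Base Media File
Format as understood here.

```python
import struct

def make_box(box_type, payload=b""):
    """Build a box with a 32-bit size field (hypothetical helper for the demo)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

def iter_boxes(data, offset=0, end=None):
    """Yield (type, payload_start, box_end) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        # Header: 32-bit big-endian size, then a four-character type code.
        size, raw_type = struct.unpack_from(">I4s", data, offset)
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit "largesize" follows the type field.
            if offset + 16 > end:
                raise ValueError("truncated largesize at offset %d" % offset)
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header_len = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the given range.
            size = end - offset
        if size < header_len or offset + size > end:
            raise ValueError("malformed box at offset %d" % offset)
        yield raw_type.decode("latin-1"), offset + header_len, offset + size
        offset += size

# A tiny, made-up file: ftyp, a moov holding one empty trak, and an mdat.
sample = (make_box(b"ftyp", b"3gp6" + b"\x00\x00\x01\x00" + b"3gp6")
          + make_box(b"moov", make_box(b"trak"))
          + make_box(b"mdat", b"\x00\x05hello"))
top_level = [(t, e - s) for t, s, e in iter_boxes(sample)]
print(top_level)  # [('ftyp', 12), ('moov', 8), ('mdat', 7)]
```

Because the function only yields offsets, a caller can descend into a
container by calling iter_boxes again with the payload range of that
box.
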
In the following, only those boxes are mentioned that are useful for
the purposes of this payload format.
The Movie Box (moov) contains one or more Track Boxes (trak), which
include information about each track. A Track Box contains, among
others, the Track Header Box (tkhd) and, nested in its Media Box
(mdia), the Media Header Box (mdhd) and the Media Information Box
(minf).
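
The nesting can be made visible by walking the box tree recursively.
The sketch below is hypothetical (box_paths, box, and the CONTAINERS
set are names chosen for the example) and handles only 32-bit box
sizes. Note that in the ISO Base Media File Format the mdhd and minf
boxes sit inside the Media Box (mdia) of a track, so that level shows
up in the paths. The leaf boxes in the demo are empty placeholders, not
valid box contents.

```python
import struct

# Box types treated as pure containers in this sketch.
CONTAINERS = {"moov", "trak", "mdia", "minf", "stbl"}

def box(box_type, payload=b""):
    """Build a box with a 32-bit size field (hypothetical demo helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

def box_paths(data, start=0, end=None, prefix=""):
    """Return slash-separated paths of all boxes, in depth-first order."""
    end = len(data) if end is None else end
    paths = []
    pos = start
    while pos + 8 <= end:
        size, raw_type = struct.unpack_from(">I4s", data, pos)
        # Sizes 0 and 1 (special cases) are rejected to keep the sketch short.
        if size < 8 or pos + size > end:
            raise ValueError("unsupported or malformed box at offset %d" % pos)
        name = raw_type.decode("latin-1")
        path = prefix + "/" + name
        paths.append(path)
        if name in CONTAINERS:
            # The payload of a container is itself a sequence of boxes.
            paths.extend(box_paths(data, pos + 8, pos + size, path))
        pos += size
    return paths

# Placeholder hierarchy with empty leaf boxes.
stbl = box(b"stbl", box(b"stsd") + box(b"stts") + box(b"stsz") + box(b"stsc"))
mdia = box(b"mdia", box(b"mdhd") + box(b"minf", stbl))
moov = box(b"moov", box(b"trak", box(b"tkhd") + mdia))
paths = box_paths(moov)
for p in paths:
    print(p)
```
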
The Track Header Box specifies the characteristics of a single track,
where a track is, in this case, the streamed text during a session.
Exactly one Track Header Box is present for a track. It contains
information about the track, such as the spatial layout (width and
height), the video transformation matrix, and the layer number.
Since these pieces of information are essential and static (i.e.,
constant) for the duration of the session, they must be sent prior to
the transmission of any text samples.
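
As an illustration of where these values live, the following sketch
reads the layer, matrix, width, and height from a Track Header Box
payload. The field layout is the tkhd definition of the ISO Base Media
File Format as understood here (version 0 uses 32-bit times, version 1
uses 64-bit times; width and height are 16.16 fixed-point numbers).
parse_tkhd and the demo values are hypothetical.

```python
import struct

def parse_tkhd(payload):
    """Parse a tkhd payload, i.e. the bytes after the 8-byte box header."""
    version = payload[0]
    flags = int.from_bytes(payload[1:4], "big")
    if version == 0:
        # creation_time, modification_time, track_ID, reserved, duration
        _, _, track_id, _, duration = struct.unpack_from(">IIIII", payload, 4)
        pos = 4 + 20
    elif version == 1:
        # Same fields, with 64-bit times and duration.
        _, _, track_id, _, duration = struct.unpack_from(">QQIIQ", payload, 4)
        pos = 4 + 32
    else:
        raise ValueError("unknown tkhd version %d" % version)
    pos += 8  # two reserved 32-bit words
    layer, _alternate_group, _volume = struct.unpack_from(">hhh", payload, pos)
    pos += 8  # layer, alternate_group, volume, and a reserved 16-bit word
    matrix = struct.unpack_from(">9i", payload, pos)  # raw fixed-point values
    pos += 36
    width, height = struct.unpack_from(">II", payload, pos)
    return {
        "track_id": track_id,
        "flags": flags,
        "duration": duration,
        "layer": layer,
        "matrix": matrix,
        "width": width / 65536.0,   # 16.16 fixed point
        "height": height / 65536.0,
    }

# Made-up version 0 header: track 1, layer -1, identity matrix, 176x36 region.
identity = (0x00010000, 0, 0, 0, 0x00010000, 0, 0, 0, 0x40000000)
demo = (bytes([0]) + (1).to_bytes(3, "big")
        + struct.pack(">IIIII", 0, 0, 1, 0, 5000)
        + bytes(8)
        + struct.pack(">hhhH", -1, 0, 0, 0)
        + struct.pack(">9i", *identity)
        + struct.pack(">II", 176 << 16, 36 << 16))
info = parse_tkhd(demo)
print(info["layer"], info["width"], info["height"])
```
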
The Media Header Box contains the "timescale" or number of time units
that pass in one second, i.e., cycles per second or Hertz. The Media
Information Box includes the Sample Table Box (stbl), which contains
all the time and data indexing of the media samples in a track. Using
this box, it is possible to locate samples in time and to determine
their type, size, container, and offset into that container. Inside
the Sample Table Box, we can find the Sample Description Box (stsd,
for finding sample descriptions), the Decoding Time to Sample Box
(stts, for finding sample duration), the Sample Size Box (stsz), and
the Sample to Chunk Box (stsc, for finding the sample description
index).
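
To show how the timescale and the Decoding Time to Sample Box work
together, the sketch below expands stts entries into per-sample start
times and durations in seconds. The stts payload layout assumed here is
a 4-byte version/flags field, a 32-bit entry count, and then pairs of
32-bit (sample_count, sample_delta) values. The function names and the
numbers in the demo are hypothetical.

```python
import struct

def parse_stts(payload):
    """Return the (sample_count, sample_delta) pairs of an stts payload."""
    (entry_count,) = struct.unpack_from(">I", payload, 4)  # skip version/flags
    return [struct.unpack_from(">II", payload, 8 + 8 * i)
            for i in range(entry_count)]

def sample_times(timescale, stts_entries):
    """Return one (start_seconds, duration_seconds) tuple per sample."""
    times = []
    ticks = 0  # running decoding time in timescale units
    for sample_count, sample_delta in stts_entries:
        for _ in range(sample_count):
            times.append((ticks / timescale, sample_delta / timescale))
            ticks += sample_delta
    return times

# Made-up track: 1000 units per second, two samples of 3000 units, one of 1500.
stts_payload = bytes(4) + struct.pack(">I", 2) + struct.pack(">IIII", 2, 3000, 1, 1500)
entries = parse_stts(stts_payload)
timeline = sample_times(1000, entries)
print(timeline)  # [(0.0, 3.0), (3.0, 3.0), (6.0, 1.5)]
```
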
Finally, the Media Data Box contains the media data itself. In timed
text tracks, this box contains text samples. The equivalents in audio
and video tracks are audio and video frames, respectively. The text sample
consists of the text length, the text string, and, optionally, one or
more Modifier Boxes. The text length is the size of the text in bytes.
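
A text sample can be taken apart as sketched below. Several points are
assumptions rather than statements of this section: the text length is
read as a 16-bit big-endian field, the string is decoded as UTF-16 when
it starts with a byte order mark and as UTF-8 otherwise, and the
Modifier Boxes are returned as opaque (type, payload) pairs without
interpreting their contents. parse_text_sample is a hypothetical name,
and the "styl" payload in the demo is a run of placeholder bytes.

```python
import struct

def parse_text_sample(sample):
    """Split a text sample into its text and a list of modifier boxes."""
    (text_len,) = struct.unpack_from(">H", sample, 0)
    raw = sample[2:2 + text_len]
    if len(raw) != text_len:
        raise ValueError("text length exceeds sample size")
    # Assumption: a leading byte order mark signals UTF-16, otherwise UTF-8.
    if raw[:2] in (b"\xfe\xff", b"\xff\xfe"):
        text = raw.decode("utf-16")
    else:
        text = raw.decode("utf-8")
    modifiers = []
    pos = 2 + text_len
    while pos + 8 <= len(sample):
        # Each modifier is an ordinary box: 32-bit size, four-character type.
        size, raw_type = struct.unpack_from(">I4s", sample, pos)
        if size < 8 or pos + size > len(sample):
            raise ValueError("malformed modifier box at offset %d" % pos)
        modifiers.append((raw_type.decode("latin-1"), sample[pos + 8:pos + size]))
        pos += size
    return text, modifiers

# Made-up sample: the text "Hello" followed by one modifier box.
placeholder = bytes(14)  # opaque stand-in for real modifier contents
demo_sample = (struct.pack(">H", 5) + b"Hello"
               + struct.pack(">I4s", 8 + len(placeholder), b"styl") + placeholder)
text, modifiers = parse_text_sample(demo_sample)
print(text, [m[0] for m in modifiers])
```
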