Python Binary Sequence Types

Python binary data is implemented as Python binary sequence types which are handled by predefined ~~bytes~~, ~~bytearray~~, and ~~memoryview~~ iteration objects. Both ~~bytes~~, and ~~bytearray~~ are used to manipulate binary data and are supported by ~~memoryview~~. ~~memoryview~~ is used to access the binary data of a binary object stored in the memory through the buffer protocol directly without making a copy. Besides, ~~bytearray~~ is also supported by the array module with efficient storage of basic data types like 32-bit integers and IEEE754 double-precision floating values.

Python Memory Views

Python ~~memoryview~~ is used to access the internal data of an object that stored in the memory through the buffer protocol without copying.

Python Memoryview Constructor

class memoryview

class memoryview(obj)

Parameter

~~class memoryview~~to return a memoryview that references ~~obj~~. ~~obj~~to specify an object. ~~obj~~ must support the buffer protocol. Built-in objects that support the buffer protocol include ~~bytes~~ and ~~bytearray~~.


A memoryview has the notion of an element, which is the atomic memory unit handled by the originating object obj. For many simple types such as bytes and bytearray, an element is a single byte, but other types such as array.array may have bigger elements.

len(view) is equal to the length of tolist. If view.ndim = 0, the length is 1. If view.ndim = 1, the length is equal to the number of elements in the view. For higher dimensions, the length is equal to the length of the nested list representation of the view. The itemsize attribute will give you the number of bytes in a single element.

A memoryview supports slicing and indexing to expose its data. One-dimensional slicing will result in a subview:
>>>

>>> v = memoryview(b'abcefg')
>>> v[1]
98
>>> v[-1]
103
>>> v[1:4]
<memory at 0x7f3ddc9f4350>
>>> bytes(v[1:4])
b'bce'

If format is one of the native format specifiers from the struct module, indexing with an integer or a tuple of integers is also supported and returns a single element with the correct type. One-dimensional memoryviews can be indexed with an integer or a one-integer tuple. Multi-dimensional memoryviews can be indexed with tuples of exactly ndim integers where ndim is the number of dimensions. Zero-dimensional memoryviews can be indexed with the empty tuple.

Here is an example with a non-byte format:
>>>

>>> import array
>>> a = array.array('l', [-11111111, 22222222, -33333333, 44444444])
>>> m = memoryview(a)
>>> m[0]
-11111111
>>> m[-1]
44444444
>>> m[::2].tolist()
[-11111111, -33333333]

If the underlying object is writable, the memoryview supports one-dimensional slice assignment. Resizing is not allowed:
>>>

>>> data = bytearray(b'abcefg')
>>> v = memoryview(data)
>>> v.readonly
False
>>> v[0] = ord(b'z')
>>> data
bytearray(b'zbcefg')
>>> v[1:4] = b'123'
>>> data
bytearray(b'z123fg')
>>> v[2:3] = b'spam'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: memoryview assignment: lvalue and rvalue have different structures
>>> v[2:6] = b'spam'
>>> data
bytearray(b'z1spam')

One-dimensional memoryviews of hashable (read-only) types with formats ‘B’, ‘b’ or ‘c’ are also hashable. The hash is defined as hash(m) == hash(m.tobytes()):
>>>

>>> v = memoryview(b'abcefg')
>>> hash(v) == hash(b'abcefg')
True
>>> hash(v[2:4]) == hash(b'ce')
True
>>> hash(v[::-2]) == hash(b'abcefg'[::-2])
True

Changed in version 3.3: One-dimensional memoryviews can now be sliced. One-dimensional memoryviews with formats ‘B’, ‘b’ or ‘c’ are now hashable.

Changed in version 3.4: memoryview is now registered automatically with collections.abc.Sequence

Changed in version 3.5: memoryviews can now be indexed with tuple of integers.

Memoryview Method

The methods supported by memoryview are: ~~__eq__(exporter)~~A memoryview and a PEP 3118 exporter are equal if their shapes are equivalent and if all corresponding values are equal when the operands’ respective format codes are interpreted using struct syntax. For the subset of struct format strings currently supported by tolist(), v and w are equal if v.tolist() == w.tolist(): >>> >>> import array >>> a = array.array('I', [1, 2, 3, 4, 5]) >>> b = array.array('d', [1.0, 2.0, 3.0, 4.0, 5.0]) >>> c = array.array('b', [5, 3, 1]) >>> x = memoryview(a) >>> y = memoryview(b) >>> x == a == y == b True >>> x.tolist() == a.tolist() == y.tolist() == b.tolist() True >>> z = y[::-2] >>> z == c True >>> z.tolist() == c.tolist() True If either format string is not supported by the struct module, then the objects will always compare as unequal (even if the format strings and buffer contents are identical): >>> >>> from ctypes import BigEndianStructure, c_long >>> class BEPoint(BigEndianStructure): ... _fields_ = [("x", c_long), ("y", c_long)] ... >>> point = BEPoint(100, 200) >>> a = memoryview(point) >>> b = memoryview(point) >>> a == point False >>> a == b False Note that, as with floating point numbers, v is w does not imply v == w for memoryview objects. Changed in version 3.3: Previous versions compared the raw memory disregarding the item format and the logical array structure. ~~tobytes(order=None)~~Return the data in the buffer as a bytestring. This is equivalent to calling the bytes constructor on the memoryview. >>> >>> m = memoryview(b"abc") >>> m.tobytes() b'abc' >>> bytes(m) b'abc' For non-contiguous arrays the result is equal to the flattened list representation with all elements converted to bytes. tobytes() supports all format strings, including those that are not in struct module syntax. New in version 3.8: order can be {‘C’, ‘F’, ‘A’}. When order is ‘C’ or ‘F’, the data of the original array is converted to C or Fortran order. For contiguous views, ‘A’ returns an exact copy of the physical memory. In particular, in-memory Fortran order is preserved. For non-contiguous views, the data is converted to C first. order=None is the same as order=’C’. ~~hex([sep[, bytes_per_sep]])~~Return a string object containing two hexadecimal digits for each byte in the buffer. >>> >>> m = memoryview(b"abc") >>> m.hex() '616263' New in version 3.5. Changed in version 3.8: Similar to bytes.hex(), memoryview.hex() now supports optional sep and bytes_per_sep parameters to insert separators between bytes in the hex output. ~~tolist()~~to return the data in the buffer as a list of elements. >>> >>> memoryview(b'abc').tolist() [97, 98, 99] >>> import array >>> a = array.array('d', [1.1, 2.2, 3.3]) >>> m = memoryview(a) >>> m.tolist() [1.1, 2.2, 3.3] Changed in version 3.3: tolist() now supports all single character native formats in struct module syntax as well as multi-dimensional representations. ~~toreadonly()~~to return a readonly version of the memoryview object. The original ~~memoryview~~ object is unchanged. >>> >>> m = memoryview(bytearray(b'abc')) >>> mm = m.toreadonly() >>> mm.tolist() [89, 98, 99] >>> mm[0] = 42 Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: cannot modify read-only memory >>> m[0] = 43 >>> mm.tolist() [43, 98, 99] New in version 3.8. ~~release()~~to release the underlying buffer exposed by ~~memoryview~~ object. Many objects take special actions when a view is held on them (for example, a bytearray would temporarily forbid resizing); therefore, calling release() is handy to remove these restrictions (and free any dangling resources) as soon as possible. After this method has been called, any further operation on the view raises a ValueError (except release() itself which can be called multiple times): >>> >>> m = memoryview(b'abc') >>> m.release() >>> m[0] Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: operation forbidden on released memoryview object The context management protocol can be used for a similar effect, using the with statement: >>> >>> with memoryview(b'abc') as m: ... m[0] ... 97 >>> m[0] Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: operation forbidden on released memoryview object New in version 3.2. ~~cast(format[, shape])~~ Cast a memoryview to a new format or shape. shape defaults to [byte_length//new_itemsize], which means that the result view will be one-dimensional. The return value is a new memoryview, but the buffer itself is not copied. Supported casts are 1D -> C-contiguous and C-contiguous -> 1D. The destination format is restricted to a single element native format in struct syntax. One of the formats must be a byte format (‘B’, ‘b’ or ‘c’). The byte length of the result must be the same as the original length. Cast 1D/long to 1D/unsigned bytes: >>> >>> import array >>> a = array.array('l', [1,2,3]) >>> x = memoryview(a) >>> x.format 'l' >>> x.itemsize 8 >>> len(x) 3 >>> x.nbytes 24 >>> y = x.cast('B') >>> y.format 'B' >>> y.itemsize 1 >>> len(y) 24 >>> y.nbytes 24 Cast 1D/unsigned bytes to 1D/char: >>> >>> b = bytearray(b'zyz') >>> x = memoryview(b) >>> x[0] = b'a' Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: memoryview: invalid value for format "B" >>> y = x.cast('c') >>> y[0] = b'a' >>> b bytearray(b'ayz') Cast 1D/bytes to 3D/ints to 1D/signed char: >>> >>> import struct >>> buf = struct.pack("i"*12, *list(range(12))) >>> x = memoryview(buf) >>> y = x.cast('i', shape=[2,2,3]) >>> y.tolist() [[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]] >>> y.format 'i' >>> y.itemsize 4 >>> len(y) 2 >>> y.nbytes 48 >>> z = y.cast('b') >>> z.format 'b' >>> z.itemsize 1 >>> len(z) 48 >>> z.nbytes 48 Cast 1D/unsigned long to 2D/unsigned long: >>> >>> buf = struct.pack("L"*6, *list(range(6))) >>> x = memoryview(buf) >>> y = x.cast('L', shape=[2,3]) >>> len(y) 2 >>> y.nbytes 48 >>> y.tolist() [[0, 1, 2], [3, 4, 5]] New in version 3.3. Changed in version 3.5: The source format is no longer restricted when casting to a byte view. There are also several readonly attributes available: obj The underlying object of the memoryview: >>> >>> b = bytearray(b'xyz') >>> m = memoryview(b) >>> m.obj is b True New in version 3.3. nbytes nbytes == product(shape) * itemsize == len(m.tobytes()). This is the amount of space in bytes that the array would use in a contiguous representation. It is not necessarily equal to len(m): >>> >>> import array >>> a = array.array('i', [1,2,3,4,5]) >>> m = memoryview(a) >>> len(m) 5 >>> m.nbytes 20 >>> y = m[::2] >>> len(y) 3 >>> y.nbytes 12 >>> len(y.tobytes()) 12 Multi-dimensional arrays: >>> >>> import struct >>> buf = struct.pack("d"*12, *[1.5*x for x in range(12)]) >>> x = memoryview(buf) >>> y = x.cast('d', shape=[3,4]) >>> y.tolist() [[0.0, 1.5, 3.0, 4.5], [6.0, 7.5, 9.0, 10.5], [12.0, 13.5, 15.0, 16.5]] >>> len(y) 3 >>> y.nbytes 96 New in version 3.3. readonly A bool indicating whether the memory is read only. format A string containing the format (in struct module style) for each element in the view. A memoryview can be created from exporters with arbitrary format strings, but some methods (e.g. tolist()) are restricted to native single element formats. Changed in version 3.3: format 'B' is now handled according to the struct module syntax. This means that memoryview(b'abc')[0] == b'abc'[0] == 97. itemsize The size in bytes of each element of the memoryview: >>> >>> import array, struct >>> m = memoryview(array.array('H', [32000, 32001, 32002])) >>> m.itemsize 2 >>> m[0] 32000 >>> struct.calcsize('H') == m.itemsize True ndim An integer indicating how many dimensions of a multi-dimensional array the memory represents. shape A tuple of integers the length of ndim giving the shape of the memory as an N-dimensional array. Changed in version 3.3: An empty tuple instead of None when ndim = 0. strides A tuple of integers the length of ndim giving the size in bytes to access each element for each dimension of the array. Changed in version 3.3: An empty tuple instead of None when ndim = 0. suboffsets Used internally for PIL-style arrays. The value is informational only. c_contiguous A bool indicating whether the memory is C-contiguous. New in version 3.3. f_contiguous A bool indicating whether the memory is Fortran contiguous. New in version 3.3. contiguous A bool indicating whether the memory is contiguous. New in version 3.3.