
Python Bytes and Bytearray

In beginner Python programs, you spend most of your time working with strings (human-readable text) and numbers (ints, floats). Underneath, though, computers store everything as raw bytes: groups of eight bits (1s and 0s).

Python exposes that raw data through two closely related types:

  • bytes: an immutable (read-only) sequence of bytes
  • bytearray: a mutable (editable) sequence of bytes

You may not use them every day, but you will see them often when working with files, networking, images, APIs, encryption, hashing, and many third‑party libraries.


What is a bytes object?

A bytes object is simply a sequence of integer values in the range 0–255 (each value is one byte). Python doesn’t treat it as “text” by default; it treats it as raw data.

This is important because text must have an encoding (like UTF‑8). Bytes, on the other hand, don’t automatically carry meaning like “letters” or “emojis”. They are just data.

When you print a bytes object, you’ll notice a leading b:

b"hello"

That b prefix is Python’s way of saying: “This is not a normal string; it’s bytes.”
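As a quick check of the description above, the following minimal sketch inspects a bytes literal. The variable name is illustrative only.

```python
data = b"hello"

print(type(data))   # <class 'bytes'>
print(len(data))    # 5
print(list(data))   # [104, 101, 108, 108, 111]
```

Converting to a list makes the "sequence of integers" nature visible: each element is the numeric value of one byte.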


Creating bytes: the most common ways

bytes(n) creates n zero-bytes

This one surprises many people the first time:

data = bytes(4)
print(data)

Output:

b'\x00\x00\x00\x00'

This does not convert the integer 4 into a byte value. Instead, it creates a bytes object of length 4, filled with zero bytes.

Each \x00 means a single byte shown in hexadecimal form.

[ Output showing b'\x00\x00\x00\x00' and length 4]
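To make the difference concrete, here is a small sketch that compares passing an integer with passing a one-element list. The names zeros and single are chosen just for this example.

```python
zeros = bytes(4)      # an integer argument means "this many zero bytes"
single = bytes([4])   # a list argument means "these byte values"

print(zeros, len(zeros))    # b'\x00\x00\x00\x00' 4
print(single, len(single))  # b'\x04' 1
```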


Bytes from a list/iterable of integers (0–255)

If you actually want specific byte values, you can create bytes from a list (or any iterable) of numbers:

data = bytes([65, 66, 67])
print(data)

Output:

b'ABC'

Here, 65, 66, and 67 are ASCII codes for A, B, and C.

[Creating bytes from a list of integers]
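The sketch below shows two details that follow from the 0–255 rule: any iterable of integers is accepted, and a value outside the range is rejected with a ValueError. The variable names are illustrative.

```python
# Any iterable of integers works, not only a list.
letters = bytes(range(65, 70))
print(letters)  # b'ABCDE'

# Values outside 0-255 do not fit in one byte and are rejected.
error_message = None
try:
    bytes([65, 256])
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)
```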


Bytes from text using encoding

Text and bytes are not the same thing. To convert a string to bytes, you must choose an encoding.

UTF‑8 is the most widely used encoding on the web, and it is also the default for str.encode() in Python 3.

text = "hello"
data = text.encode("utf-8")
print(data)

Output:

b'hello'

Now, let’s try an emoji (emojis require Unicode, so encoding matters):

emoji = "🙄"  # eye roll
emoji_bytes = emoji.encode("utf-8")
print(emoji_bytes)

You’ll see something like:

b'\xf0\x9f\x99\x84'

That output is the raw UTF‑8 representation of the emoji.

[Encoding an emoji to UTF‑8 bytes]
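One consequence worth seeing directly is that a single character can occupy more than one byte in UTF‑8. The sketch below compares character counts with byte counts for a few sample strings (the samples themselves are arbitrary choices for illustration).

```python
samples = ["A", "é", "€", "🙄"]
byte_lengths = []

for text in samples:
    encoded = text.encode("utf-8")
    byte_lengths.append(len(encoded))
    print(repr(text), "->", len(text), "character,", len(encoded), "byte(s):", encoded)

print(byte_lengths)  # [1, 2, 3, 4]
```

This is why len() on a string and len() on its encoded bytes often disagree.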


Using bytes(string, encoding)

Python also allows:

emoji_bytes = bytes("🙄", "utf-8")
print(emoji_bytes)

This does the same thing as .encode("utf-8"), but many people prefer .encode() because it reads more clearly.


Understanding \x.. and hexadecimal (base-16)

When bytes are printed as \x.., Python is showing each byte in hexadecimal (base‑16).

  • A byte has 8 bits
  • 8 bits can represent 256 values (0 to 255)
  • Hex uses two digits to represent 0–255 (00 to FF)

Examples:

  • \x00 = 0 in decimal
  • \x41 = 65 in decimal (ASCII 'A')
  • \xFF = 255 in decimal

You can convert hex strings to integers like this:

value = int("85", 16)  # "85" in base-16
print(value)

[Converting hexadecimal to decimal using int(…, 16)]
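Python also has built-in helpers for moving between bytes and hex text. A short sketch using bytes.hex(), bytes.fromhex(), hex() and int():

```python
data = b"ABC"

hex_text = data.hex()
round_trip = bytes.fromhex(hex_text)

print(hex_text)                 # 414243
print(round_trip)               # b'ABC'
print(hex(65), int("41", 16))   # 0x41 65
print(int("85", 16))            # 133
```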


Converting bytes back to a string

To go from bytes → string, you decode using the same encoding that was used to encode the data.

emoji_bytes = b"\xf0\x9f\x99\x84"
emoji = emoji_bytes.decode("utf-8")
print(emoji)

Output:

🙄

If you decode with the wrong encoding, you might get errors like UnicodeDecodeError or corrupted text.

[ Decoding UTF‑8 bytes back into an emoji]
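Both failure modes can be reproduced in a few lines. The sketch below uses Latin‑1 as the "wrong" encoding purely for illustration; any mismatched pair of encodings can cause similar problems.

```python
original = "café"
utf8_bytes = original.encode("utf-8")      # b'caf\xc3\xa9'

# Latin-1 maps every byte to some character, so this "succeeds" with wrong text.
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # cafÃ©

# The Latin-1 form of the same text is not valid UTF-8, so decoding raises.
latin1_bytes = original.encode("latin-1")  # b'caf\xe9'
decode_failed = False
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    decode_failed = True
    print("UnicodeDecodeError:", exc.reason)
```

The silent case is usually the more dangerous one, because no exception tells you the text is wrong.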


bytes are immutable

A bytes object behaves like a tuple: once created, it cannot be changed.

data = b"ABC"
# data[0] = 90  # Uncommenting this will raise a TypeError
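A runnable version of the same idea, as a sketch: it catches the TypeError, then shows the usual workaround of building a new bytes object. The names changed and lookup are illustrative.

```python
data = b"ABC"

try:
    data[0] = 90
except TypeError as exc:
    print("TypeError:", exc)

# To "change" bytes, build a new object from slices instead.
changed = b"Z" + data[1:]
print(changed)  # b'ZBC'
print(data)     # b'ABC' (the original is untouched)

# Because bytes are immutable, they are hashable and can be dictionary keys.
lookup = {data: "original"}
print(lookup[b"ABC"])  # original
```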

That immutability is useful for safety and performance, but sometimes you really do need editable byte data.

That’s where bytearray comes in.


bytearray: mutable bytes you can modify

A bytearray is like bytes, but editable.

barr = bytearray(b"ABC")
print(barr)

The output looks similar, but notice that it says bytearray(...):

bytearray(b'ABC')

Indexing and slicing still work

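barr = bytearray("🙄", "utf-8")
print(barr)        # bytearray(b'\xf0\x9f\x99\x84')
print(len(barr))   # 4
print(barr[0])     # 240 (indexing returns an int)
print(barr[1:3])   # bytearray(b'\x9f\x99') (slicing returns a bytearray)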

Now you can modify individual bytes:

barr = bytearray("🙄", "utf-8")
print("Before:", barr, barr.decode("utf-8"))

# Change the last byte
barr[-1] = int("85", 16)  # set last byte to 0x85

print("After:", barr)
print("Decoded:", barr.decode("utf-8"))

Depending on the final byte value, decoding may produce a different character/emoji (or even fail if the new byte sequence is not valid UTF‑8). This is a great example of why bytes are “raw” and meaning depends on encoding rules.

[ Modifying a bytearray and decoding the result]
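The sketch below covers both outcomes mentioned above: one replacement byte that still forms valid UTF‑8, and one that does not. It also shows a couple of in-place methods that bytearray offers. The specific byte values are chosen only to demonstrate each case.

```python
barr = bytearray("🙄", "utf-8")   # bytes f0 9f 99 84

barr[-1] = 0x85                   # same value as int("85", 16)
decoded = barr.decode("utf-8")
print(decoded)                    # 🙅 (U+1F645)

barr[-1] = 0x41                   # 0x41 ('A') cannot be a continuation byte
invalid = False
try:
    barr.decode("utf-8")
except UnicodeDecodeError as exc:
    invalid = True
    print("Invalid UTF-8:", exc.reason)

# A bytearray can also grow in place and be frozen back into bytes.
buf = bytearray(b"AB")
buf.append(67)        # append a single byte value
buf.extend(b"DE")     # append several bytes
frozen = bytes(buf)
print(frozen)         # b'ABCDE'
```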


Treating bytes like sequences

Both bytes and bytearray support many sequence operations:

data = b"hello"

print(data[0])       # first byte value as an int
print(data[1:4])     # slice returns bytes
print(list(data))    # convert to list of integers

Key detail: indexing gives an int, not a one-character string.

[Indexing and slicing bytes]
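A few more sequence operations, sketched with the same sample value. Note the difference between data[0] and data[0:1].

```python
data = b"hello"

first_value = data[0]     # an int
first_slice = data[0:1]   # a bytes object of length 1

print(first_value)        # 104
print(first_slice)        # b'h'
print(chr(first_value))   # h

print(b"ell" in data)     # True (subsequence search)
print(104 in data)        # True (single byte value)

for value in data[:2]:
    print(value)          # 104, then 101
```

If you need a one-byte bytes object rather than an integer, slice instead of indexing.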


Where bytes appear in real projects

You’ll commonly encounter bytes in situations like:

Reading a file in binary mode

with open("photo.jpg", "rb") as f:
    raw = f.read(20)

print(raw)

The result is bytes because the file was opened in binary mode ("rb"). An image file is raw binary data, so decoding it as text would not make sense.
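If you do not have an image at hand, the following self-contained sketch writes a few bytes to a temporary file and reads them back in binary mode. The payload values are arbitrary and the temporary file is only a stand-in for a real binary file.

```python
import os
import tempfile

payload = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])

# Write to a temporary file so the example does not depend on an existing file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

with open(path, "rb") as f:
    head = f.read(4)   # at most 4 bytes
    rest = f.read()    # everything that remains

os.remove(path)

print(head)  # b'\x89PNG'
print(rest)  # b'\x00\xff'
```

Bytes that happen to match printable ASCII (here P, N, G) are shown as characters; the others are shown as \x.. escapes.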

Network responses

Many HTTP clients return bytes for raw response bodies. You then decide whether to decode as text (and with which encoding) or keep it binary.
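As a rough illustration without any network access, the sketch below uses a hard-coded raw_body value as a hypothetical stand-in for what an HTTP client might hand back, and assumes the body is UTF‑8 encoded JSON.

```python
import json

# Hypothetical raw response body; a real client would supply these bytes.
raw_body = b'{"city": "Z\xc3\xbcrich"}'

text = raw_body.decode("utf-8")   # assumption: the server sent UTF-8
payload = json.loads(text)

print(type(raw_body).__name__)    # bytes
print(payload["city"])            # Zürich
```

In practice the correct encoding should come from the response headers or the API documentation rather than a guess.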

Hashing and encryption

Hash functions (like SHA‑256) and encryption algorithms operate on bytes, not on Python strings.
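A minimal sketch with the standard library's hashlib module: passing a str raises a TypeError, while the encoded bytes are accepted.

```python
import hashlib

message = "hello"

rejected = False
try:
    hashlib.sha256(message)   # a str is rejected
except TypeError as exc:
    rejected = True
    print("TypeError:", exc)

digest = hashlib.sha256(message.encode("utf-8"))
hex_digest = digest.hexdigest()   # hex text, as a str
raw_digest = digest.digest()      # raw result, as bytes

print(len(hex_digest))   # 64
print(len(raw_digest))   # 32
```

Note that the hash result itself is also available as bytes (digest()) or as a hex string (hexdigest()).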

[ Example of reading binary data from a file]


Common mistakes

Mistake: Mixing up text and bytes

If a function expects bytes, passing a string may fail:

# Some library_function expects bytes, not str
# library_function("hello")
# library_function(b"hello")

When in doubt: check the documentation or print the type.
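The sketch below shows the kind of error that mixing the two types produces, plus one possible way to normalize input. The to_bytes helper is hypothetical, written only for this example, and it assumes UTF‑8 unless told otherwise.

```python
mix_failed = False
try:
    b"hello" + " world"   # bytes and str cannot be concatenated
except TypeError as exc:
    mix_failed = True
    print("TypeError:", exc)


def to_bytes(value, encoding="utf-8"):
    """Hypothetical helper: accept str or bytes-like input and return bytes."""
    if isinstance(value, str):
        return value.encode(encoding)
    return bytes(value)


print(to_bytes("hello"))            # b'hello'
print(to_bytes(b"hello"))           # b'hello'
print(to_bytes(bytearray(b"hi")))   # b'hi'
```

Converting once at the boundary of your program (where data enters or leaves) tends to be easier to reason about than converting in many places.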

Mistake: Decoding with the wrong encoding

If you don’t know the encoding, decoding may fail:

data = b"..."
# text = data.decode("utf-8")  # may fail if not actually UTF-8

In such cases, you can handle errors:

text = data.decode("utf-8", errors="replace")

This keeps your program running, but it may replace unknown bytes with placeholder characters.


Practice

Try these small exercises to build confidence:

  • Create bytes(10) and print it. Also print len(...).
  • Create bytes([0, 1, 2, 255]) and print it.
  • Encode your name using UTF‑8 and print the bytes.
  • Decode it back into a string.
  • Convert it into a bytearray, change one byte, and see what happens when you decode again.

Summary

A bytes object is Python’s way of representing raw binary data—a sequence of byte values (0–255). It often shows up when data is coming from files, networks, or libraries where the data may not be “text” yet.

  • Use bytes when you need a read-only representation of binary data.
  • Use bytearray when you need to modify the byte values.
  • Converting between text and bytes always involves an encoding: str.encode() turns text into bytes, and bytes.decode() turns bytes back into text.

Once you understand this text ↔ bytes boundary, many “mysterious” errors in file handling, API responses, and Unicode text processing become much easier to debug.