In most beginner Python programs, you spend most of your time working with strings (human-readable text) and numbers (ints, floats). But underneath the surface, computers store everything as raw bytes—a stream of 1s and 0s.
Python exposes that raw data through two closely related types: bytes, which is immutable, and bytearray, which is mutable.
You may not use them every day, but you will see them often when working with files, networking, images, APIs, encryption, hashing, and many third‑party libraries.
A bytes object is simply a sequence of integer values in the range 0–255 (each value is one byte). Python doesn’t treat it as “text” by default; it treats it as raw data.
This is important because text must have an encoding (like UTF‑8). Bytes, on the other hand, don’t automatically carry meaning like “letters” or “emojis”. They are just data.
When you print a bytes object, you’ll usually notice a leading b:
b"hello"
That b prefix is Python’s way of saying: “This is not a normal string; it’s bytes.”
This one surprises many people the first time:
data = bytes(4)
print(data)
Output (your exact view may vary):
b'\x00\x00\x00\x00'
This does not convert the integer 4 into a byte value. Instead, it creates a bytes object of length 4, filled with zero bytes.
Each \x00 means a single byte shown in hexadecimal form.
[ Output showing b'\x00\x00\x00\x00' and length 4]
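Because the difference between a length and a value is easy to miss, here is a small sketch that puts three spellings side by side. The variable names are illustrative only.

```python
# bytes(n) with an int builds n zero bytes; it does not store the value n.
zeros = bytes(4)

# To store the value 4 in a single byte, wrap it in a list (or another iterable).
single = bytes([4])

# int.to_bytes() is another way to turn one integer into a fixed number of bytes.
two_wide = (4).to_bytes(2, "big")

print(zeros, len(zeros))        # b'\x00\x00\x00\x00' 4
print(single, len(single))      # b'\x04' 1
print(two_wide, len(two_wide))  # b'\x00\x04' 2
```

If you ever see a long run of \x00 where you expected one small number, an accidental bytes(n) call is a likely cause.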
If you actually want specific byte values, you can create bytes from a list (or any iterable) of numbers:
data = bytes([65, 66, 67])
print(data)
Output:
b'ABC'
Here, 65, 66, and 67 are ASCII codes for A, B, and C.
[Creating bytes from a list of integers]
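As a quick sketch of the 0–255 rule in action, the snippet below builds bytes from a range and then shows what happens when a value does not fit in one byte. The names letters and error_message are just for illustration.

```python
# Any iterable of integers in the range 0-255 works, not just a list.
letters = bytes(range(65, 70))
print(letters)  # b'ABCDE'

# Values outside 0-255 do not fit in one byte, so Python refuses them.
try:
    bytes([65, 256])
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)
```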
Text and bytes are not the same thing. To convert a string to bytes, you must choose an encoding.
UTF‑8 is the standard encoding used almost everywhere on the internet.
text = "hello"
data = text.encode("utf-8")
print(data)
Output:
b'hello'
Now, let’s try an emoji (emojis require Unicode, so encoding matters):
emoji = "🙄" # eye roll
emoji_bytes = emoji.encode("utf-8")
print(emoji_bytes)
You’ll see something like:
b'\xf0\x9f\x99\x84'
That output is the raw UTF‑8 representation of the emoji.
[Encoding an emoji to UTF‑8 bytes]
Python also allows:
emoji_bytes = bytes("🙄", "utf-8")
print(emoji_bytes)
This does the same thing as .encode("utf-8"), but many people prefer .encode() because it reads more clearly.
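One consequence of encoding is that the number of characters and the number of bytes can differ. The following sketch compares them; the sample characters are arbitrary and are written as escapes so the code does not depend on how your editor saves the file.

```python
# One character is not always one byte: UTF-8 uses 1 to 4 bytes per character.
samples = ["A", "\u00e9", "\u20ac", "\U0001F644"]  # A, e-acute, euro sign, eye-roll emoji
for ch in samples:
    encoded = ch.encode("utf-8")
    print(repr(ch), "->", encoded, "=", len(encoded), "byte(s)")

# len() counts characters on a str and bytes on a bytes object.
text = "h\u00e9llo"
utf8_data = text.encode("utf-8")
print(len(text), len(utf8_data))  # 5 6

# A different encoding produces different bytes for the same text.
utf16_data = text.encode("utf-16-le")
print(len(utf16_data))  # 10
```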
\x.. and hexadecimal (base-16)
When bytes are printed as \x.., Python is showing each byte in hexadecimal (base‑16).
A single byte holds a value from 0 to 255, which fits in exactly two hexadecimal digits (00 to FF).
Examples:
\x00 = 0 in decimal
\x41 = 65 in decimal (ASCII 'A')
\xFF = 255 in decimal
You can convert hex strings to integers like this:
value = int("85", 16) # "85" in base-16
print(value)
[Converting hexadecimal to decimal using int(…, 16)]
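Going the other way, and handling whole byte strings rather than single values, can be sketched with a few built-ins. The sample bytes reuse the emoji from earlier.

```python
data = b"\xf0\x9f\x99\x84"

# bytes.hex() gives the hex digits as a plain string.
hex_text = data.hex()
print(hex_text)  # f09f9984

# bytes.fromhex() goes the other way: hex string -> bytes.
round_trip = bytes.fromhex(hex_text)
print(round_trip == data)  # True

# For a single value, hex() and format() convert int -> hex text.
print(hex(133))           # 0x85
print(format(65, "02x"))  # 41
print(int("ff", 16))      # 255
```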
To go from bytes → string, you decode using the same encoding that was used to encode the data.
emoji_bytes = b"\xf0\x9f\x99\x84"
emoji = emoji_bytes.decode("utf-8")
print(emoji)
Output:
🙄
If you decode with the wrong encoding, you might get errors like UnicodeDecodeError or corrupted text.
[ Decoding UTF‑8 bytes back into an emoji]
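Both failure modes can be reproduced in a few lines. This is a minimal sketch; Latin-1 and ASCII are chosen here only because they behave differently on the same input.

```python
emoji_bytes = "\U0001F644".encode("utf-8")  # the eye-roll emoji as UTF-8

# Latin-1 maps every byte to some character, so this "succeeds" but yields garbage.
garbled = emoji_bytes.decode("latin-1")
print(len(garbled))  # 4 characters instead of 1

# ASCII only covers bytes 0-127, so decoding these bytes raises an error.
try:
    emoji_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    decode_error = exc
    print("UnicodeDecodeError:", exc.reason)
```

The silent case is often the more dangerous one, because nothing tells you the text is wrong until someone reads it.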
A bytes object behaves like a tuple: once created, it cannot be changed.
data = b"ABC"
# data[0] = 90 # Uncommenting this will raise a TypeError
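To see the error without crashing the script, and to see what you do instead of modifying bytes in place, here is a short sketch. The dictionary at the end is an illustrative extra showing that immutable bytes can be used as keys.

```python
data = b"ABC"

# Item assignment is rejected because bytes is immutable.
try:
    data[0] = 90
except TypeError as exc:
    print("TypeError:", exc)

# "Changing" bytes really means building a new object.
changed = b"Z" + data[1:]
print(changed)  # b'ZBC'
print(data)     # b'ABC' (the original is untouched)

# Immutability also makes bytes hashable, so it can serve as a dict key.
lookup = {b"ABC": "original"}
print(lookup[data])  # original
```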
That immutability is useful for safety and performance, but sometimes you really do need editable byte data.
That’s where bytearray comes in.
A bytearray is like bytes, but editable.
barr = bytearray(b"ABC")
print(barr)
Output looks similar, but notice it says bytearray(...):
bytearray(b'ABC')
You can also create a bytearray directly from a string by passing an encoding:
barr = bytearray("🙄", "utf-8")
print(barr)
print(len(barr))
Now you can modify individual bytes:
barr = bytearray("🙄", "utf-8")
print("Before:", barr, barr.decode("utf-8"))
# Change the last byte
barr[-1] = int("85", 16) # set last byte to 0x85
print("After:", barr)
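print("Decoded:", barr.decode("utf-8"))
Depending on the final byte value, decoding may produce a different character/emoji (or even fail if the new byte sequence is not valid UTF‑8). Here, 0x85 happens to form another valid sequence, so the result decodes to the neighbouring code point U+1F645 instead of U+1F644. This is a great example of why bytes are “raw” and their meaning depends on encoding rules.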
[ Modifying a bytearray and decoding the result]
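Assigning to one index is only one of the in-place operations a bytearray supports. The sketch below walks through a few others using plain ASCII values so each step is easy to follow.

```python
buf = bytearray(b"ABC")

buf[0] = 0x5A       # replace one byte in place (0x5A is 'Z')
buf.append(33)      # add a single byte (33 is '!')
buf.extend(b"??")   # add several bytes at once
buf[1:3] = b"ip"    # slice assignment can swap out a run of bytes
print(buf)          # bytearray(b'Zip!??')

del buf[-2:]        # remove the last two bytes
print(buf)          # bytearray(b'Zip!')

# Convert to immutable bytes once editing is finished.
frozen = bytes(buf)
print(frozen)       # b'Zip!'
```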
Both bytes and bytearray support many sequence operations:
data = b"hello"
print(data[0]) # first byte value as an int
print(data[1:4]) # slice returns bytes
print(list(data)) # convert to list of integers
Key detail: indexing gives an int, not a one-character string.
[Indexing and slicing bytes]
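The int-versus-bytes distinction shows up in a few other operations too. Here is a short sketch of the ones you are most likely to run into.

```python
data = b"hello"

print(data[0])          # 104, an int
print(data[0:1])        # b'h', a bytes object of length 1
print(104 in data)      # True: membership test with an int
print(b"ell" in data)   # True: membership test with a sub-sequence

# Iterating yields ints, so chr() is needed to see ASCII characters.
chars = [chr(b) for b in data]
print(chars)            # ['h', 'e', 'l', 'l', 'o']

# Many str-like methods exist, but they take and return bytes.
print(data.upper())               # b'HELLO'
print(data.replace(b"l", b"L"))   # b'heLLo'
print(data + b" world")           # b'hello world'
```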
You’ll commonly encounter bytes in situations like:
Reading a file in binary mode
with open("photo.jpg", "rb") as f:
    raw = f.read(20)
print(raw)
The result is bytes because an image file is raw binary data.
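If you don’t have a photo.jpg handy, the same idea can be tried with a temporary file. In this sketch the file contents are made up, apart from the first 8 bytes, which are the standard PNG signature; checking such leading "magic bytes" is one common use of binary reads.

```python
import os
import tempfile

# Write a few bytes to a temporary file so the example does not need a real image.
# The first 8 bytes are the PNG signature; the payload after it is made up.
png_signature = b"\x89PNG\r\n\x1a\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(png_signature + b"made-up payload")
    path = tmp.name

# "rb" means read + binary: read() returns bytes, with no decoding applied.
with open(path, "rb") as f:
    header = f.read(8)

print(type(header))             # <class 'bytes'>
print(header == png_signature)  # True

os.remove(path)  # clean up the temporary file
```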
Many HTTP clients return bytes for raw response bodies. You then decide whether to decode as text (and with which encoding) or keep it binary.
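The decision described above can be sketched without any network access. In this example raw_body and content_type are made-up stand-ins for what a client might return, and falling back to UTF-8 when no charset is declared is an assumption of the sketch, not a rule.

```python
from email.message import Message

# Stand-ins for what an HTTP client might hand back; both values are made up.
raw_body = "caf\u00e9 menu".encode("utf-8")
content_type = "text/plain; charset=utf-8"

# Parse the charset parameter out of the Content-Type header value.
header = Message()
header["Content-Type"] = content_type
charset = header.get_content_charset() or "utf-8"  # fallback is an assumption

body_text = raw_body.decode(charset)
print(type(raw_body), type(body_text))
print(body_text)
```

For binary content such as images, you would skip the decode step and keep raw_body as bytes.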
Hash functions (like SHA‑256) and encryption algorithms operate on bytes, not on Python strings.
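For hashing, this means a string has to be encoded before it is passed in. Here is a minimal sketch with the standard hashlib module; the message is arbitrary.

```python
import hashlib

message = "hello"

# hashlib works on bytes, so a str must be encoded first.
digest = hashlib.sha256(message.encode("utf-8")).hexdigest()
print(digest)

# Passing the str directly raises TypeError.
try:
    hashlib.sha256(message)
except TypeError as exc:
    print("TypeError:", exc)
```

Note that the choice of encoding affects the result: the same text encoded differently produces different bytes, and therefore a different hash.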
[ Example of reading binary data from a file]
If a function expects bytes, passing a string may fail:
# Suppose some library_function expects bytes, not str
# library_function("hello")   # wrong type: str
# library_function(b"hello")  # correct type: bytes
When in doubt: check the documentation or print the type.
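A runnable version of this mismatch, plus one way to guard against it, might look like the sketch below. The to_bytes helper is hypothetical and not part of any library; defaulting to UTF-8 is an assumption.

```python
def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it first if it is a str (hypothetical helper)."""
    if isinstance(value, str):
        return value.encode(encoding)
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)
    raise TypeError(f"expected str or bytes-like, got {type(value).__name__}")

# Mixing the two types directly is an error in Python 3.
try:
    b"id=" + "42"
except TypeError as exc:
    print("TypeError:", exc)

print(type("hello"), type(b"hello"))
print(b"id=" + to_bytes("42"))  # b'id=42'
```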
If you don’t know the encoding, decoding may fail:
data = b"..."
# text = data.decode("utf-8") # may fail if not actually UTF-8
In such cases, you can handle errors:
text = data.decode("utf-8", errors="replace")
This keeps your program running, but it may replace unknown bytes with placeholder characters.
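To compare the options concretely, here is a small sketch using a byte string that is deliberately not valid UTF-8. The sample data is made up for the demonstration.

```python
# 0xE9 is an accented "e" in Latin-1, but on its own it is not valid UTF-8.
data = b"caf\xe9"

try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decoding failed at byte offset", exc.start)  # 3

replaced = data.decode("utf-8", errors="replace")
ignored = data.decode("utf-8", errors="ignore")
escaped = data.decode("utf-8", errors="backslashreplace")

print(replaced)  # 'caf' followed by U+FFFD, the replacement character
print(ignored)   # caf
print(escaped)   # caf\xe9
```

All three handlers lose or alter information in some way, so they are best kept for logging and display rather than for data you need to write back unchanged.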
Try these small exercises to build confidence:
Create bytes(10) and print it. Also print len(...).
Create bytes([0, 1, 2, 255]) and print it.
Create a bytearray, change one byte, and see what happens when you decode again.
A bytes object is Python’s way of representing raw binary data: a sequence of byte values (0–255). It often shows up when data is coming from files, networks, or libraries where the data may not be “text” yet.
bytes is immutable, bytearray is its mutable counterpart, and converting between strings and bytes requires encoding (encode) and decoding (decode).
Once you understand this text ↔ bytes boundary, many “mysterious” errors in file handling, API responses, and Unicode text processing become much easier to debug.