When transitioning to a new hardware setup, unexpected issues can arise. This was the case when I moved from a Windows machine to a MacBook Pro with an Apple M3 Pro chip running macOS Sonoma (version 14.5). A peculiar problem emerged: importing pandas before TensorFlow caused the script to freeze, while reversing the import order worked flawlessly. This blog post details the issue, the debugging process, and the solution.
When pandas is imported before TensorFlow/Keras, the script freezes without raising an exception, so wrapping the code in try/except blocks does not help. Here is the minimal script that reproduces the issue:
import numpy as np
import os
import pandas as pd
from tensorflow.keras import layers, models
print("Creating simple model...")
try:
    model = models.Sequential([
        layers.Input(shape=(10,)),
        layers.Dense(64, activation='relu'),
        layers.Dense(1, activation='linear')
    ])
    print("Model created successfully.")
except Exception as e:
    print(f"Error creating model: {e}")
x_train = np.random.rand(100, 10)
y_train = np.random.rand(100, 1)
# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
# Train the model
try:
    model.fit(x_train, y_train, epochs=5, batch_size=32)
    print("Model training completed successfully.")
except Exception as e:
    print(f"Error training model: {e}")
With pandas imported first, the script prints the following and then stops responding:
Creating simple model...
2024-05-31 18:04:07.639131: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M3 Pro
2024-05-31 18:04:07.639149: systemMemory: 36.00 GB
2024-05-31 18:04:07.639154: maxCacheSize: 13.50 GB
2024-05-31 tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-05-31 18:04:07.639186: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
The script freezes at this point and must be terminated.
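Because the freeze raises no exception, one way to see where a script is stuck is Python's standard faulthandler module, which can dump every thread's traceback after a timeout. The following is a minimal sketch, not part of my original debugging session; the helper name hang_watchdog and the timeout value are hypothetical choices.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def hang_watchdog(timeout_s=30.0, file=sys.stderr):
    """Dump all Python thread tracebacks if the guarded block exceeds timeout_s.

    hang_watchdog is a hypothetical helper name; `file` must be a real file
    object with a file descriptor (faulthandler writes to it directly).
    """
    # The timer runs in a separate watchdog thread, so it still fires while
    # the main thread is blocked inside a native call.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=file)
    try:
        yield
    finally:
        # Disarm the timer once the block finishes or raises.
        faulthandler.cancel_dump_traceback_later()


# Usage (assumed placement): wrap the suspect statements.
# with hang_watchdog(timeout_s=30.0):
#     model = models.Sequential([...])
```

If the guarded block hangs, the traceback written to stderr shows the Python frame that was executing at the timeout, which narrows the search before reaching for a full debugger.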
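With the import order reversed (TensorFlow/Keras before pandas), the same script runs to completion:
Creating simple model...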
2024-05-31 18:07:18.879661: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M3 Pro
2024-05-31 18:07:18.879680: systemMemory: 36.00 GB
2024-05-31 18:07:18.879685: maxCacheSize: 13.50 GB
2024-05-31 tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-05-31 18:07:18.879717: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Model created successfully.
Epoch 1/5
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 0.4620
Epoch 2/5
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 636us/step - loss: 0.3263
Epoch 3/5
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 0.2322
Epoch 4/5
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 629us/step - loss: 0.1395
Epoch 5/5
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 690us/step - loss: 0.1251
Model training completed successfully.
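The two runs above can be compared automatically by executing each import order in a fresh interpreter with a timeout, so a hang is reported instead of blocking the terminal. This is a rough sketch: run_snippet, compare_import_orders, and the shortened one-line model are hypothetical, and I am assuming the shortened model triggers the same freeze as the full script.

```python
import subprocess
import sys


def run_snippet(code, timeout_s=60.0):
    """Run `code` in a fresh interpreter; return 'ok', 'error', or 'hung'."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising the timeout.
        return "hung"
    return "ok" if result.returncode == 0 else "error"


# Assumption: building a tiny model is enough to reproduce the freeze.
BUILD_MODEL = "models.Sequential([layers.Input(shape=(10,)), layers.Dense(1)])\n"

# Identical statements; only the import order differs.
PANDAS_FIRST = (
    "import pandas as pd\n"
    "from tensorflow.keras import layers, models\n" + BUILD_MODEL
)
TENSORFLOW_FIRST = (
    "from tensorflow.keras import layers, models\n"
    "import pandas as pd\n" + BUILD_MODEL
)


def compare_import_orders(timeout_s=60.0):
    """Print the outcome for each import order."""
    for label, code in [("pandas first", PANDAS_FIRST),
                        ("tensorflow first", TENSORFLOW_FIRST)]:
        print(f"{label}: {run_snippet(code, timeout_s)}")
```

Calling compare_import_orders() on an affected machine should print "hung" for one order and "ok" for the other; an "error" result usually means a package is missing from the environment.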
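Before digging deeper, I tried the usual troubleshooting steps.
Updating pandas and TensorFlow:
pip install --upgrade pandas tensorflow
Installing both packages in a fresh virtual environment:
python -m venv myenv
source myenv/bin/activate
pip install pandas tensorflow
Setting TF_CPP_MIN_LOG_LEVEL to filter out TensorFlow's INFO and WARNING log messages:
export TF_CPP_MIN_LOG_LEVEL=2
Setting CUDA_VISIBLE_DEVICES to -1 to hide CUDA GPUs:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
Logging device placement:
import tensorflow as tf
tf.debugging.set_log_device_placement(True)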
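When comparing environments, it helps to record exactly which package versions are installed. The sketch below reads them from package metadata without importing the packages, which matters here because the imports themselves are under suspicion. The function names and the default list of distributions (including tensorflow-metal and keras) are my assumptions; adjust them to your setup.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            # Reads package metadata only; the package is never imported.
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def report(distributions=("tensorflow", "tensorflow-metal", "keras", "pandas", "numpy")):
    """Print one line per distribution (the default names are assumptions)."""
    for name, version in installed_versions(distributions).items():
        print(f"{name}: {version or 'not installed'}")
```

Running report() inside the activated virtual environment gives a short listing that can be pasted into a bug report or compared between a working and a failing setup.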
To step into library code, I set justMyCode to false in the debugger's launch configuration. Using the debugger, I stepped through the script until it froze, placing breakpoints and stepping into function calls until I pinpointed the line where the freeze occurred:
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name, inputs, attrs, num_outputs)
This line is in the quick_execute function in TensorFlow's execute.py:
def quick_execute(op_name, num_outputs, inputs, attrs, ctx, name=None):
    """Execute a TensorFlow operation."""
    device_name = ctx.device_name
    try:
        ctx.ensure_initialized()
        tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name, inputs, attrs, num_outputs)
    except core._NotOkStatusException as e:
        if name is not None:
            e.message += " name: " + name
        raise core._status_to_exception(e) from None
    except TypeError as e:
        keras_symbolic_tensors = [x for x in inputs if _is_keras_symbolic_tensor(x)]
        if keras_symbolic_tensors:
            raise core._SymbolicException(
                "Inputs to eager execution function cannot be Keras symbolic "
                "tensors, but found {}".format(keras_symbolic_tensors)
            )
        raise e
    return tensors
The quickest workaround I found is to import TensorFlow/Keras before pandas. This order avoids the freeze and lets the script run normally. I would suggest this workaround only if you specifically need TensorFlow 2.16.x.
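Since the workaround depends on import order, it is easy to break by accident when someone reorders imports or adds a new module. A small guard can make the requirement explicit. This is only a sketch; imported_before is a hypothetical helper that inspects sys.modules, which is keyed by module name.

```python
import sys


def imported_before(first, second, loaded=None):
    """Return True if `first` is already loaded and `second` is not yet loaded.

    `loaded` defaults to sys.modules; passing a plain dict makes the check
    easy to test without importing anything.
    """
    loaded = sys.modules if loaded is None else loaded
    return first in loaded and second not in loaded


# Usage (assumed placement, at the top of the entry-point script):
# import tensorflow as tf
# assert imported_before("tensorflow", "pandas"), "import tensorflow before pandas"
# import pandas as pd
```

The check fails fast with a clear message instead of letting the script freeze later, at the cost of one extra line in the entry point.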
After extensive testing and debugging, I found that downgrading TensorFlow to version 2.15.0 resolved the issue. Here is the command I used:
pip install tensorflow==2.15.0
The import order of pandas and TensorFlow/Keras can cause a script to freeze on macOS Sonoma with an Apple M3 Pro chip, possibly due to a memory or lock issue. Importing TensorFlow/Keras before pandas resolves the issue. Additionally, downgrading TensorFlow to version 2.15.0 is an effective solution. If you encounter similar issues, ensure all packages are updated, use a virtual environment, and set appropriate environment variables.