Preprocessing X-ray Images for MedGemma: Step-by-Step Code You Can Use As-Is

MedGemma X-ray preprocessing in one line

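The MedGemma 4B multimodal model has a built-in SigLIP image encoder, so a typical X-ray image can be passed in as a PIL Image and the processor preprocesses it automatically.[hyoo14.github]
For blog posts and lecture demos where you want more predictable results, adding the five steps below gives you explicit control over channels, contrast, and input size.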

Step-by-step X-ray preprocessing code (runnable)

STEP 1. Load a medical X-ray image and run basic checks


python
from io import BytesIO

from PIL import Image
import requests
import numpy as np
import matplotlib.pyplot as plt

# Public chest X-ray example image (Wikimedia Commons)
image_url = "https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png"
response = requests.get(image_url, headers={"User-Agent": "medgemma-demo"}, timeout=30)
response.raise_for_status()  # fail early on HTTP errors
# Read the fully downloaded bytes, then convert to 3-channel RGB
image = Image.open(BytesIO(response.content)).convert("RGB")

# Check the original image size
print(f"Original size: {image.size}")  # e.g. (512, 512) or (1024, 1024)
plt.figure(figsize=(8, 8))
plt.imshow(image)
plt.title("Original X-ray (before preprocessing)")
plt.axis('off')
plt.show()

Purpose: MedGemma expects 3-channel RGB input, so a grayscale X-ray is also converted to RGB.[fornewchallenge.tistory]
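
To make the channel conversion concrete, here is a minimal sketch that uses a small synthetic grayscale image in place of a real X-ray (the 4x4 gradient is an assumption made purely for illustration). It suggests that `convert("RGB")` copies the single gray channel into three identical channels, so the intensity values themselves do not change.

```python
import numpy as np
from PIL import Image

# Synthetic 4x4 grayscale gradient standing in for an X-ray (illustrative only)
gray_pixels = (np.arange(16, dtype=np.uint8) * 16).reshape(4, 4)
gray_image = Image.fromarray(gray_pixels)  # 2-D uint8 array -> mode "L"

rgb_image = gray_image.convert("RGB")
rgb_pixels = np.array(rgb_image)

print(gray_image.mode, gray_pixels.shape)
print(rgb_image.mode, rgb_pixels.shape)

# Each RGB channel should be an exact copy of the original gray channel
channels_identical = all(
    np.array_equal(rgb_pixels[..., c], gray_pixels) for c in range(3)
)
print("channels identical:", channels_identical)
```

Because the three channels are identical, the conversion only changes the array shape, not the radiographic content.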

STEP 2. Intensity normalization for medical images (windowing)


python
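import cv2
import numpy as np
from PIL import Image

def medical_windowing(img, window_center=40, window_width=400):
    """
    Apply a window/level transform.
    window_center, window_width: given in the same units as the input pixels.
    The defaults (40 / 400) are the original example values; they match a
    common CT soft-tissue window in HU, so for an 8-bit PNG or a raw
    radiograph pick values that fit that image's own intensity range.
    """
    # Cast to float so subtracting a negative bound cannot overflow uint8
    img_array = np.asarray(img, dtype=np.float32)

    # Lower bound of the window (the upper bound is min_val + window_width)
    min_val = window_center - window_width / 2

    # Map [min_val, min_val + window_width] linearly onto [0, 255]
    windowed = np.clip((img_array - min_val) / window_width * 255, 0, 255)
    windowed = windowed.astype(np.uint8)

    return Image.fromarray(windowed)

# Windowing call with the original example values
windowed_image = medical_windowing(image, window_center=40, window_width=400)
print(f"After windowing: {np.array(windowed_image).shape}")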

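Purpose: compress a wide-range (for example 16-bit) image into 8-bit RGB while stretching contrast over the range of interest, such as the lungs and heart. Note that Hounsfield units (HU) are defined for CT rather than plain radiographs, so the window values should be chosen in the image's own pixel units.[hyoo14.github]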
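
The window/level arithmetic can be checked on a few hand-picked values. The sketch below is an illustration under stated assumptions, not the original author's code: `apply_window` is a hypothetical helper and the sample values are made up. With center 40 and width 400 the window spans -160 to 240, so anything below that range should map to 0 and anything above it to 255.

```python
import numpy as np

def apply_window(values, center, width):
    """Map [center - width/2, center + width/2] linearly onto 0..255 (hypothetical helper)."""
    values = np.asarray(values, dtype=np.float64)
    lower = center - width / 2
    scaled = (values - lower) / width * 255.0
    # Clip to the displayable range, then round to the nearest integer level
    return np.rint(np.clip(scaled, 0, 255)).astype(np.uint8)

# Made-up sample intensities: far below, at the lower edge, inside,
# at the upper edge, and far above the window
samples = [-1000, -160, 0, 240, 1000]
result = apply_window(samples, center=40, width=400)
print(result.tolist())
```

Running the same helper on values taken from your own image (for example its minimum, median, and maximum) is a quick way to see whether a chosen window actually covers the intensities you care about.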

STEP 3. Resize (optimal resolution for MedGemma: 896x896)


python
# Resolution used when training MedGemma (per the paper)
TARGET_SIZE = (896, 896)

def resize_for_medgemma(img, target_size=TARGET_SIZE):
    """
    Resize for MedGemma's SigLIP encoder
    - keep the aspect ratio and pad the remainder
    - fixed 896x896 output
    """
    img_array = np.array(img)
    
    # Resize while preserving the aspect ratio
    h, w = img_array.shape[:2]
    ratio = min(target_size[0]/w, target_size[1]/h)
    
    new_w, new_h = int(w * ratio), int(h * ratio)
    resized = cv2.resize(img_array, (new_w, new_h), interpolation=cv2.INTER_LANCZOS4)
    
    # Pad with a black background, centering the resized image
    padded = np.zeros((target_size[1], target_size[0], 3), dtype=np.uint8)
    y_offset = (target_size[1] - new_h) // 2
    x_offset = (target_size[0] - new_w) // 2
    padded[y_offset:y_offset+new_h, x_offset:x_offset+new_w] = resized
    
    return Image.fromarray(padded)

resized_image = resize_for_medgemma(windowed_image)
print(f"MedGemma input size: {resized_image.size}")  # (896, 896)

plt.figure(figsize=(12, 4))
plt.subplot(1, 3, 1)
plt.imshow(image)
plt.title("Original")
plt.axis('off')

plt.subplot(1, 3, 2)
plt.imshow(windowed_image)
plt.title("Windowing")
plt.axis('off')

plt.subplot(1, 3, 3)
plt.imshow(resized_image)
plt.title("896x896 final")
plt.axis('off')
plt.tight_layout()
plt.show()

Purpose: MedGemma was trained at 896x896 resolution, so the input is matched to exactly that size.[hyoo14.github]
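
The resize-and-pad geometry is easier to follow with concrete numbers. The sketch below isolates just the arithmetic in a hypothetical helper, `letterbox_geometry` (the name and the sample image sizes are assumptions for illustration), using only the standard library.

```python
def letterbox_geometry(width, height, target=896):
    """Return (new_w, new_h, x_offset, y_offset) for an aspect-preserving fit (hypothetical helper)."""
    # Scale by the tighter of the two constraints so the image fits inside the square
    ratio = min(target / width, target / height)
    new_w, new_h = int(width * ratio), int(height * ratio)
    # Center the resized image; the remaining border is padding
    x_offset = (target - new_w) // 2
    y_offset = (target - new_h) // 2
    return new_w, new_h, x_offset, y_offset

wide = letterbox_geometry(1024, 512)     # landscape image
tall = letterbox_geometry(512, 1024)     # portrait image
square = letterbox_geometry(1792, 1792)  # already square

print("wide:  ", wide)
print("tall:  ", tall)
print("square:", square)
```

A landscape image ends up with padding bars above and below, a portrait image with bars on the left and right, and a square image with no padding at all.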

STEP 4. Normalization and tensor conversion (built into the MedGemma processor)


python
from transformers import AutoProcessor
import torch

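# Load the MedGemma processor (includes automatic normalization)
model_id = "google/medgemma-1.5-4b-it"
processor = AutoProcessor.from_pretrained(model_id)

# Final conversion of the preprocessed image through the processor
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": resized_image},
            {"type": "text", "text": "Describe the main findings in this chest X-ray."}
        ]
    }
]

# The processor automatically handles:
# 1) image -> tensor conversion
# 2) SigLIP normalization (mean=[0.5,0.5,0.5], std=[0.5,0.5,0.5])
# 3) text tokenization
# 4) building the multimodal input tensors
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,      # without this, only the prompt string is returned
    return_dict=True,   # return a dict of tensors (input_ids, pixel_values, ...)
    return_tensors="pt"
)

print("Input tensor info:")
for k, v in inputs.items():
    if k != 'text':
        print(f"{k}: {v.shape}")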

Purpose: the MedGemma processor automatically handles the normalization and tokenization that the SigLIP encoder expects.[fornewchallenge.tistory]
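
To show what the normalization step amounts to numerically, here is a small sketch. The mean and std of 0.5 come from the comment in the processor code above; the initial rescale of 0..255 to 0..1 is an assumption about the processor's configuration, and `normalize_like_siglip` is a hypothetical name. This is not the processor's actual implementation, only the arithmetic it is described as performing.

```python
import numpy as np

MEAN = 0.5  # per-channel mean quoted in the processor comment above
STD = 0.5   # per-channel std quoted in the processor comment above

def normalize_like_siglip(pixels_uint8):
    """Rescale 0..255 to 0..1 (assumed), then apply (x - mean) / std."""
    x = np.asarray(pixels_uint8, dtype=np.float32) / 255.0
    return (x - MEAN) / STD

# The darkest and brightest possible 8-bit pixel values
pixels = np.array([0, 255], dtype=np.uint8)
normalized = normalize_like_siglip(pixels)
print(normalized.tolist())
```

Under these assumptions, black pixels land at -1 and white pixels at +1, which is why there is no need to normalize the image by hand before passing it to the processor.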

STEP 5. Run model inference and check the result


python
from transformers import AutoModelForImageTextToText
import torch

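# Load the model
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model.eval()

# Move inputs to the model device; floating-point tensors are cast to bfloat16
inputs = inputs.to(model.device, dtype=torch.bfloat16)
input_len = inputs["input_ids"].shape[-1]

# Run inference
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.1,
        pad_token_id=processor.tokenizer.eos_token_id
    )

# Decode only the newly generated tokens (skip the echoed prompt)
generated_text = processor.batch_decode(
    output_ids[:, input_len:],
    skip_special_tokens=True
)[0]

print("MedGemma X-ray reading result:")
print("="*50)
print(generated_text)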

Example result: the output takes a form such as "This chest X-ray shows findings consistent with pneumonia... review by a specialist is required".[fornewchallenge.tistory]

Full integrated code (run in one go)


python
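# Complete version combining the steps above
# (relies on medical_windowing, resize_for_medgemma, and processor defined earlier)
from io import BytesIO

def preprocess_xray_for_medgemma(image_path_or_url, question="Describe the main findings."):
    """
    Fully automated path from an X-ray to MedGemma inputs
    """
    # 1. Load the image (send a User-Agent, as in STEP 1)
    if image_path_or_url.startswith('http'):
        response = requests.get(image_path_or_url, headers={"User-Agent": "medgemma-demo"}, timeout=30)
        response.raise_for_status()
        image = Image.open(BytesIO(response.content)).convert("RGB")
    else:
        image = Image.open(image_path_or_url).convert("RGB")
    
    # 2. Windowing
    windowed = medical_windowing(image)
    
    # 3. Resize to 896x896
    processed_image = resize_for_medgemma(windowed)
    
    # 4. Apply the processor
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": processed_image},
            {"type": "text", "text": question}
        ]
    }]
    
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt"
    )
    return inputs, processed_image

# Usage example
inputs, final_image = preprocess_xray_for_medgemma(image_url)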

Advantage: with this one function, either a local X-ray file or a URL is converted directly into MedGemma inputs.[hyoo14.github]

Cautions and tips

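PIL Image can be used as-is: the MedGemma processor handles most preprocessing automatically, so for a simple demo you can pass the original PIL Image directly.[fornewchallenge.tistory]
Windowing: the example uses window_center=40, window_width=400 for chest X-rays. These numbers match a common CT soft-tissue window, so check the result visually and tune them to your image's own intensity range.[hyoo14.github]
Fixed 896x896: this is the training resolution given in the paper; other sizes may reduce performance.[hyoo14.github]
RGB 3 channels: convert even grayscale X-rays to RGB so that the SigLIP encoder receives the input it expects.[hyoo14.github]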
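
The size and channel tips can be turned into a quick pre-flight check. The sketch below is one possible way to do that, assuming only Pillow; `check_medgemma_input` and `EXPECTED_SIZE` are hypothetical names introduced here, and the two test images are blank placeholders rather than real X-rays.

```python
from PIL import Image

EXPECTED_SIZE = (896, 896)  # resolution stated in the tips above

def check_medgemma_input(img):
    """Return a list of problems found; an empty list means the image looks ready (hypothetical helper)."""
    problems = []
    if img.mode != "RGB":
        problems.append(f"mode is {img.mode}, expected RGB")
    if img.size != EXPECTED_SIZE:
        problems.append(f"size is {img.size}, expected {EXPECTED_SIZE}")
    return problems

# Blank placeholder images: one that satisfies both tips, one that violates both
ready = Image.new("RGB", EXPECTED_SIZE)
not_ready = Image.new("L", (512, 512))

print(check_medgemma_input(ready))
print(check_medgemma_input(not_ready))
```

Returning a list of problems rather than raising on the first one makes it easier to report everything that needs fixing in a single pass, which is convenient in a demo or notebook.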

Summary

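MedGemma X-ray preprocessing comes down to four stages: PIL Image → windowing → 896x896 resize → processor.[fornewchallenge.tistory]
Because the processor handles SigLIP normalization and tokenization automatically, windowing and resizing are the only parts you need to manage yourself. Wrapping the code above in a function lets you quickly build a "drop a file → automatic reading" demo for a blog post or lecture.[fornewchallenge.tistory]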

December 28, 2025

capstone