API Crawling and Image Download with Airflow (M1, macOS)

Overview

  • An example of downloading images with Airflow

Development Environment Setup

  • macOS, M1
  • Python environment managed with uv

Installing uv

curl -LsSf https://astral.sh/uv/install.sh | sh

Setting Up the Virtual Environment

  • Create the virtual environment as follows (pinning Python 3.11)
$ uv venv -p 3.11
$ source .venv/bin/activate
  • Check the Python version
$ python --version 

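Setting the AIRFLOW_HOME Environment Variable

  • Airflow is sensitive to environment variables; AIRFLOW_HOME in particular decides where it keeps its configuration and, by default, where it looks for DAGs.
  • From the project directory, set it as follows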
$ export AIRFLOW_HOME=$(pwd)/airflow
$ echo $AIRFLOW_HOME
/Users/evan/Desktop/your/project/directory/airflow
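
  • The DAG code later in this post reads AIRFLOW_HOME from the environment, so it can help to confirm from Python what the variable resolves to. The following is a minimal sketch, not part of Airflow: the helper name resolve_airflow_home is hypothetical, and the ~/airflow fallback mirrors Airflow's documented default when the variable is unset.

```python
import os
from pathlib import Path


def resolve_airflow_home(env=None):
    """Return the directory Airflow would treat as its home.

    Hypothetical helper for illustration only. When AIRFLOW_HOME is
    unset, Airflow's documented default is ~/airflow.
    """
    env = os.environ if env is None else env
    raw = env.get("AIRFLOW_HOME")
    if raw:
        return Path(raw).expanduser()
    return Path.home() / "airflow"


# Show what the current shell session would hand to Airflow.
home = resolve_airflow_home()
print(f"AIRFLOW_HOME resolves to: {home}")
print(f"Default DAGs folder: {home / 'dags'}")
```

  • If the first line does not point inside the project directory, the export above was probably run in a different shell session.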

Writing the Install Script

  • File name: install_airflow.sh
AIRFLOW_VERSION=2.8.0

# Pin the Python version to 3.11
PYTHON_VERSION="3.11"

CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
# For example, with Airflow 2.8.0 and Python 3.11 this resolves to: https://raw.githubusercontent.com/apache/airflow/constraints-2.8.0/constraints-3.11.txt

uv pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
# Pin the PostgreSQL provider package version
uv pip install apache-airflow-providers-postgres==5.7.1
uv pip install -r requirements.txt
  • Create requirements.txt with the following contents
pandas
numpy
seaborn
matplotlib
requests
  • Make the script executable, then run it
$ chmod +x install_airflow.sh
$ ./install_airflow.sh
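
  • The constraint URL in install_airflow.sh is assembled from the two version variables. The sketch below reproduces that string substitution in Python so the resulting URL can be inspected before running the script. The function name constraint_url is made up for this illustration.

```python
def constraint_url(airflow_version: str, python_version: str) -> str:
    """Build the constraints-file URL the same way install_airflow.sh does."""
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


# Versions taken from the install script above.
url = constraint_url("2.8.0", "3.11")
print(url)
```

  • Changing either version in the script changes the URL accordingly, so the Python version used here should match the one the virtual environment was created with.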

Checking and Testing the API

  • Query the next five upcoming launches
$ curl "https://ll.thespacedevs.com/2.2.0/launch/upcoming/?limit=5"
{"count":338,"next":"https://ll.thespacedevs.com/2.2.0/launch/upcoming/?limit=5&offset=5","previous":null,"results":[{"id":"1a105ccb-e59f-48e8-b853-c424bd8cc699","url":"https://ll.thespacedevs.com/2.2.0/launch/1a105ccb-e59f-48e8-b853-c424bd8cc699/","slug":"falcon-9-block-5-starlink-group-6-75","name":"Falcon 9 Block 5 | Starlink Group 6-75","status":{"id":6,"name":"Launch in Flight","abbrev":"In Flight","description":"The launch vehicle has lifted off from the launchpad."},"last_updated":"2025-05-02T01:52:11Z","net":"2025-05-02T01:51:10Z","window_end":"2025-05-02T05:51:00Z","window_start":"2025-05-02T01:51:10Z","net_precision":{"id":0,"name":"Second","abbrev":"SEC","description":"The T-0 is accurate to the second."},"probability":99,"weather_concerns":null,"holdreason":"","failreason":"","hashtag":null,"launch_service_provider":{"id":121,"url":"https://ll.thespacedevs.com/2.2.0/agencies/121/","name":"SpaceX","type":"Commercial"},"rocket":{"id":8596,"configuration":{"id":164,"url":"https://ll.thespacedevs.com/2.2.0/config/launcher/164/","name":"Falcon 9","family":"Falcon","full_name":"Falcon 9 Block 5","variant":"Block 5"}},"mission":{"id":7188,"name":"Starlink Group 6-75","description":"A batch of 28 satellites for the Starlink mega-constellation - SpaceX's project for space-based Internet communication system.","launch_designator":null,"type":"Communications","orbit":{"id":8,"name":"Low Earth Orbit","abbrev":"LEO"},"agencies":[{"id":121,"url":"https://ll.thespacedevs.com/2.2.0/agencies/121/","name":"SpaceX","featured":true,"type":"Commercial","country_code":"USA","abbrev":"SpX","description":"Space Exploration Technologies Corp., known as SpaceX, is an American aerospace manufacturer and space transport services company headquartered in Hawthorne, California. It was founded in 2002 by entrepreneur Elon Musk with the goal of reducing space transportation costs and enabling the colonization of Mars. 
SpaceX operates from many pads, on the East Coast of the US they operate from SLC-40 at Cape Canaveral Space Force Station and historic LC-39A at Kennedy Space Center. They also operate from SLC-4E at Vandenberg Space Force Base, California, usually for polar launches. Another launch site is being developed at Boca Chica, Texas.","administrator":"CEO: Elon Musk","founding_year":"2002","launchers":"Falcon | Starship","spacecraft":"Dragon","launch_library_url":null,"total_launch_count":502,"consecutive_successful_launches":24,"successful_launches":487,"failed_launches":14,"pending_launches":118,"consecutive_successful_landings":71,"successful_landings":449,"failed_landings":26,"attempted_landings":474,"info_url":"http://www.spacex.com/","wiki_url":"http://en.wikipedia.org/wiki/SpaceX","logo_url":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/spacex_logo_20220826094919.png","image_url":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/spacex_image_20190207032501.jpeg","nation_url":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/spacex_nation_20230531064544.jpg"}],"info_urls":[],"vid_urls":[]},"pad":{"id":80,"url":"https://ll.thespacedevs.com/2.2.0/pad/80/","agency_id":121,"name":"Space Launch Complex 40","description":"","info_url":null,"wiki_url":"https://en.wikipedia.org/wiki/Cape_Canaveral_Air_Force_Station_Space_Launch_Complex_40","map_url":"https://www.google.com/maps?q=28.56194122,-80.57735736","latitude":"28.56194122","longitude":"-80.57735736","location":{"id":12,"url":"https://ll.thespacedevs.com/2.2.0/location/12/","name":"Cape Canaveral SFS, FL, USA","country_code":"USA","description":"Cape Canaveral Space Force Station (CCSFS) is an installation of the United States Space Force's Space Launch Delta 45, located on Cape Canaveral in Brevard County, 
Florida.","map_image":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/map_images/location_12_20200803142519.jpg","timezone_name":"America/New_York","total_launch_count":1020,"total_landing_count":64},"country_code":"USA","map_image":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/map_images/pad_80_20200803143323.jpg","total_launch_count":304,"orbital_launch_attempt_count":304},"webcast_live":true,"image":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/falcon2520925_image_20221009234147.png","infographic":null,"program":[{"id":25,"url":"https://ll.thespacedevs.com/2.2.0/program/25/","name":"Starlink","description":"Starlink is a satellite internet constellation operated by American aerospace company SpaceX","agencies":[{"id":121,"url":"https://ll.thespacedevs.com/2.2.0/agencies/121/","name":"SpaceX","type":"Commercial"}],"image_url":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/starlink_program_20231228154508.jpeg","start_date":"2018-02-22T14:17:00Z","end_date":null,"info_url":"https://starlink.com","wiki_url":"https://en.wikipedia.org/wiki/Starlink","mission_patches":[{"id":7,"name":"Space X Starlink Mission Patch","priority":10,"image_url":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/mission_patch_images/space2520x252_mission_patch_20221011205756.png","agency":{"id":121,"url":"https://ll.thespacedevs.com/2.2.0/agencies/121/","name":"SpaceX","type":"Commercial"}}],"type":{"id":3,"name":"Communication 
Constellation"}}],"orbital_launch_attempt_count":6943,"location_launch_attempt_count":1020,"pad_launch_attempt_count":304,"agency_launch_attempt_count":502,"orbital_launch_attempt_count_year":94,"location_launch_attempt_count_year":26,"pad_launch_attempt_count_year":24,"agency_launch_attempt_count_year":53,"type":"normal"},{"id":"7b685ef7-f610-413f-bd4a-cc58aed97be2","url":"https://ll.thespacedevs.com/2.2.0/launch/7b685ef7-f610-413f-bd4a-cc58aed97be2/","slug":"falcon-9-block-5-starlink-group-15-3","name":"Falcon 9 Block 5 | Starlink Group 15-3","status":{"id":8,"name":"To Be Confirmed","abbrev":"TBC","description":"Awaiting official confirmation - current date is known with some certainty."},"last_updated":"2025-04-30T19:54:30Z","net":"2025-05-03T18:13:00Z","window_end":"2025-05-03T22:13:00Z","window_start":"2025-05-03T18:13:00Z","net_precision":{"id":2,"name":"Hour","abbrev":"HR","description":"The T-0 is accurate to the hour."},"probability":null,"weather_concerns":null,"holdreason":"","failreason":"","hashtag":null,"launch_service_provider":{"id":121,"url":"https://ll.thespacedevs.com/2.2.0/agencies/121/","name":"SpaceX","type":"Commercial"},"rocket":{"id":8594,"configuration":{"id":164,"url":"https://ll.thespacedevs.com/2.2.0/config/launcher/164/","name":"Falcon 9","family":"Falcon","full_name":"Falcon 9 Block 5","variant":"Block 5"}},"mission":{"id":7186,"name":"Starlink Group 15-3","description":"A batch of 26 satellites for the Starlink mega-constellation - SpaceX's project for space-based Internet communication system.","launch_designator":null,"type":"Communications","orbit":{"id":8,"name":"Low Earth Orbit","abbrev":"LEO"},"agencies":[{"id":121,"url":"https://ll.thespacedevs.com/2.2.0/agencies/121/","name":"SpaceX","featured":true,"type":"Commercial","country_code":"USA","abbrev":"SpX","description":"Space Exploration Technologies Corp., known as SpaceX, is an American aerospace manufacturer and space transport services company headquartered in Hawthorne, 
California. It was founded in 2002 by entrepreneur Elon Musk with the goal of reducing space transportation costs and enabling the colonization of Mars. SpaceX operates from many pads, on the East Coast of the US they operate from SLC-40 at Cape Canaveral Space Force Station and historic LC-39A at Kennedy Space Center. They also operate from SLC-4E at Vandenberg Space Force Base, California, usually for polar launches. Another launch site is being developed at Boca Chica, Texas.","administrator":"CEO: Elon Musk","founding_year":"2002","launchers":"Falcon | Starship","spacecraft":"Dragon","launch_library_url":null,"total_launch_count":502,"consecutive_successful_launches":24,"successful_launches":487,"failed_launches":14,"pending_launches":118,"consecutive_successful_landings":71,"successful_landings":449,"failed_landings":26,"attempted_landings":474,"info_url":"http://www.spacex.com/","wiki_url":"http://en.wikipedia.org/wiki/SpaceX","logo_url":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/spacex_logo_20220826094919.png","image_url":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/spacex_image_20190207032501.jpeg","nation_url":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/spacex_nation_20230531064544.jpg"}],"info_urls":[],"vid_urls":[]},"pad":{"id":16,"url":"https://ll.thespacedevs.com/2.2.0/pad/16/","agency_id":null,"name":"Space Launch Complex 4E","description":"Space Launch Complex 4 East (SLC-4E) is a launch site at Vandenberg Space Force Base, California, U.S.\r\n\r\nThe pad was previously used by Atlas and Titan rockets between 1963 and 2005. 
The pad was built for use by Atlas-Agena rockets, but was later rebuilt to handle Titan rockets.","info_url":null,"wiki_url":"https://en.wikipedia.org/wiki/Vandenberg_Space_Launch_Complex_4#SLC-4E","map_url":"https://www.google.com/maps?q=34.632,-120.611","latitude":"34.632","longitude":"-120.611","location":{"id":11,"url":"https://ll.thespacedevs.com/2.2.0/location/11/","name":"Vandenberg SFB, CA, USA","country_code":"USA","description":"Vandenberg Space Force Base is a United States Space Force Base in Santa Barbara County, California. Established in 1941, Vandenberg Space Force Base is a space launch base, launching spacecraft from the Western Range, and also performs missile testing. The United States Space Force's Space Launch Delta 30 serves as the host delta for the base, equivalent to an Air Force air base wing. In addition to its military space launch mission, Vandenberg Space Force Base also hosts space launches for civil and commercial space entities, such as NASA and SpaceX.","map_image":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/map_images/location_11_20200803142416.jpg","timezone_name":"America/Los_Angeles","total_launch_count":804,"total_landing_count":26},"country_code":"USA","map_image":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/map_images/pad_16_20200803143532.jpg","total_launch_count":190,"orbital_launch_attempt_count":190},"webcast_live":false,"image":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/falcon2520925_image_20221009234147.png","infographic":null,"program":[{"id":25,"url":"https://ll.thespacedevs.com/2.2.0/program/25/","name":"Starlink","description":"Starlink is a satellite internet constellation operated by American aerospace company 
SpaceX","agencies":[{"id":121,"url":"https://ll.thespacedevs.com/2.2.0/agencies/121/","name":"SpaceX","type":"Commercial"}],"image_url":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/starlink_program_20231228154508.jpeg","start_date":"2018-02-22T14:17:00Z","end_date":null,"info_url":"https://starlink.com","wiki_url":"https://en.wikipedia.org/wiki/Starlink","mission_patches":[{"id":7,"name":"Space X Starlink Mission Patch","priority":10,"image_url":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/mission_patch_images/space2520x252_mission_patch_20221011205756.png","agency":{"id":121,"url":"https://ll.thespacedevs.com/2.2.0/agencies/121/","name":"SpaceX","type":"Commercial"}}],"type":{"id":3,"name":"Communication Constellation"}}],"orbital_launch_attempt_count":6944,"location_launch_attempt_count":805,"pad_launch_attempt_count":191,"agency_launch_attempt_count":503,"orbital_launch_attempt_count_year":95,"location_launch_attempt_count_year":19,"pad_launch_attempt_count_year":17,"agency_launch_attempt_count_year":54,"type":"normal"},{"id":"8ffb4866-43c9-46c1-aaac-05bd37891b0a","url":"https://ll.thespacedevs.com/2.2.0/launch/8ffb4866-43c9-46c1-aaac-05bd37891b0a/","slug":"falcon-9-block-5-starlink-group-6-84","name":"Falcon 9 Block 5 | Starlink Group 6-84","status":{"id":8,"name":"To Be Confirmed","abbrev":"TBC","description":"Awaiting official confirmation - current date is known with some certainty."},"last_updated":"2025-05-02T01:01:57Z","net":"2025-05-04T08:48:00Z","window_end":"2025-05-04T12:48:00Z","window_start":"2025-05-04T08:48:00Z","net_precision":{"id":2,"name":"Hour","abbrev":"HR","description":"The T-0 is accurate to the 
hour."},"probability":null,"weather_concerns":null,"holdreason":"","failreason":"","hashtag":null,"launch_service_provider":{"id":121,"url":"https://ll.thespacedevs.com/2.2.0/agencies/121/","name":"SpaceX","type":"Commercial"},"rocket":{"id":8598,"configuration":{"id":164,"url":"https://ll.thespacedevs.com/2.2.0/config/launcher/164/","name":"Falcon 9","family":"Falcon","full_name":"Falcon 9 Block 5","variant":"Block 5"}},"mission":{"id":7190,"name":"Starlink Group 6-84","description":"A batch of 29 satellites for the Starlink mega-constellation - SpaceX's project for space-based Internet communication system.","launch_designator":null,"type":"Communications","orbit":{"id":8,"name":"Low Earth Orbit","abbrev":"LEO"},"agencies":[{"id":121,"url":"https://ll.thespacedevs.com/2.2.0/agencies/121/","name":"SpaceX","featured":true,"type":"Commercial","country_code":"USA","abbrev":"SpX","description":"Space Exploration Technologies Corp., known as SpaceX, is an American aerospace manufacturer and space transport services company headquartered in Hawthorne, California. It was founded in 2002 by entrepreneur Elon Musk with the goal of reducing space transportation costs and enabling the colonization of Mars. SpaceX operates from many pads, on the East Coast of the US they operate from SLC-40 at Cape Canaveral Space Force Station and historic LC-39A at Kennedy Space Center. They also operate from SLC-4E at Vandenberg Space Force Base, California, usually for polar launches. 
Another launch site is being developed at Boca Chica, Texas.","administrator":"CEO: Elon Musk","founding_year":"2002","launchers":"Falcon | Starship","spacecraft":"Dragon","launch_library_url":null,"total_launch_count":502,"consecutive_successful_launches":24,"successful_launches":487,"failed_launches":14,"pending_launches":118,"consecutive_successful_landings":71,"successful_landings":449,"failed_landings":26,"attempted_landings":474,"info_url":"http://www.spacex.com/","wiki_url":"http://en.wikipedia.org/wiki/SpaceX","logo_url":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/spacex_logo_20220826094919.png","image_url":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/spacex_image_20190207032501.jpeg","nation_url":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/spacex_nation_20230531064544.jpg"}],"info_urls":[],"vid_urls":[]},"pad":{"id":87,"url":"https://ll.thespacedevs.com/2.2.0/pad/87/","agency_id":121,"name":"Launch Complex 39A","description":"","info_url":null,"wiki_url":"https://en.wikipedia.org/wiki/Kennedy_Space_Center_Launch_Complex_39#Launch_Pad_39A","map_url":"https://www.google.com/maps?q=28.60822681,-80.60428186","latitude":"28.60822681","longitude":"-80.60428186","location":{"id":27,"url":"https://ll.thespacedevs.com/2.2.0/location/27/","name":"Kennedy Space Center, FL, USA","country_code":"USA","description":"The John F. Kennedy Space Center, located on Merritt Island, Florida, is one of NASA's ten field centers. Since 1968, KSC has been NASA's primary launch center of American spaceflight, research, and technology. Launch operations for the Apollo, Skylab and Space Shuttle programs were carried out from Kennedy Space Center Launch Complex 39 and managed by KSC. 
Located on the east coast of Florida, KSC is adjacent to Cape Canaveral Space Force Station (CCSFS).","map_image":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/map_images/location_27_20200803142447.jpg","timezone_name":"America/New_York","total_launch_count":263,"total_landing_count":0},"country_code":"USA","map_image":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/map_images/pad_87_20200803143537.jpg","total_launch_count":205,"orbital_launch_attempt_count":204},"webcast_live":false,"image":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/falcon2520925_image_20221009234147.png","infographic":null,"program":[{"id":25,"url":"https://ll.thespacedevs.com/2.2.0/program/25/","name":"Starlink","description":"Starlink is a satellite internet constellation operated by American aerospace company SpaceX","agencies":[{"id":121,"url":"https://ll.thespacedevs.com/2.2.0/agencies/121/","name":"SpaceX","type":"Commercial"}],"image_url":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/starlink_program_20231228154508.jpeg","start_date":"2018-02-22T14:17:00Z","end_date":null,"info_url":"https://starlink.com","wiki_url":"https://en.wikipedia.org/wiki/Starlink","mission_patches":[{"id":7,"name":"Space X Starlink Mission Patch","priority":10,"image_url":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/mission_patch_images/space2520x252_mission_patch_20221011205756.png","agency":{"id":121,"url":"https://ll.thespacedevs.com/2.2.0/agencies/121/","name":"SpaceX","type":"Commercial"}}],"type":{"id":3,"name":"Communication 
Constellation"}}],"orbital_launch_attempt_count":6945,"location_launch_attempt_count":264,"pad_launch_attempt_count":206,"agency_launch_attempt_count":504,"orbital_launch_attempt_count_year":96,"location_launch_attempt_count_year":12,"pad_launch_attempt_count_year":12,"agency_launch_attempt_count_year":55,"type":"normal"},{"id":"82aef7fd-9664-4e94-970c-5e99eff1b331","url":"https://ll.thespacedevs.com/2.2.0/launch/82aef7fd-9664-4e94-970c-5e99eff1b331/","slug":"long-march-12-satnet-leo-group-tbd","name":"Long March 12 | SatNet LEO Group TBD?","status":{"id":1,"name":"Go for Launch","abbrev":"Go","description":"Current T-0 confirmed by official or reliable sources."},"last_updated":"2025-04-30T07:27:01Z","net":"2025-05-05T11:05:00Z","window_end":"2025-05-05T11:47:00Z","window_start":"2025-05-05T10:57:00Z","net_precision":{"id":2,"name":"Hour","abbrev":"HR","description":"The T-0 is accurate to the hour."},"probability":null,"weather_concerns":null,"holdreason":"","failreason":"","hashtag":null,"launch_service_provider":{"id":88,"url":"https://ll.thespacedevs.com/2.2.0/agencies/88/","name":"China Aerospace Science and Technology Corporation","type":"Government"},"rocket":{"id":8600,"configuration":{"id":517,"url":"https://ll.thespacedevs.com/2.2.0/config/launcher/517/","name":"Long March 12","family":"Long March","full_name":"Long March 12","variant":"12"}},"mission":{"id":7192,"name":"SatNet LEO Group TBD?","description":"A batch of Low Earth Orbit communication satellites for the Chinese state owned SatNet constellation operated by the China Satellite Network Group.\r\n\r\nThe constellation will eventually consists of 13000 satellites.","launch_designator":null,"type":"Communications","orbit":{"id":8,"name":"Low Earth Orbit","abbrev":"LEO"},"agencies":[],"info_urls":[],"vid_urls":[]},"pad":{"id":219,"url":"https://ll.thespacedevs.com/2.2.0/pad/219/","agency_id":null,"name":"Commercial 
LC-2","description":"","info_url":null,"wiki_url":"https://en.wikipedia.org/wiki/Wenchang_Commercial_Space_Launch_Site","map_url":"https://www.google.com/maps?q=19.59755,110.936481","latitude":"19.59755","longitude":"110.936481","location":{"id":8,"url":"https://ll.thespacedevs.com/2.2.0/location/8/","name":"Wenchang Space Launch Site, People's Republic of China","country_code":"CHN","description":"The Wenchang Space Launch Site is a rocket launch site located in Wenchang on the island of Hainan, in China.\r\n\r\nFormally a suborbital test center, it currently serves as China's southernmost spaceport. The site was selected for its low latitude, 19° north of the equator, allowing for larger payloads to be launched. It is capable of launching the Long March 5, the heaviest Chinese rocket. Unlike launch facilities on the mainland, Wenchang uses its seaport for deliveries.","map_image":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/map_images/location_8_20200803142445.jpg","timezone_name":"Asia/Shanghai","total_launch_count":38,"total_landing_count":0},"country_code":"CHN","map_image":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/map_images/pad_commercial_lc-2_20231225074048.jpg","total_launch_count":1,"orbital_launch_attempt_count":1},"webcast_live":false,"image":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/cz-12_on_its_la_image_20241128132937.jpg","infographic":null,"program":[],"orbital_launch_attempt_count":6946,"location_launch_attempt_count":39,"pad_launch_attempt_count":2,"agency_launch_attempt_count":521,"orbital_launch_attempt_count_year":97,"location_launch_attempt_count_year":5,"pad_launch_attempt_count_year":1,"agency_launch_attempt_count_year":20,"type":"normal"},{"id":"d5e8b971-0138-42d7-a6ba-7d43bf529d5e","url":"https://ll.thespacedevs.com/2.2.0/launch/d5e8b971-0138-42d7-a6ba-7d43bf529d5e/","slug":"falcon-9-block-5-starlink-group-6-93","name":"Falcon 9 Block 5 | Starlink Group 
6-93","status":{"id":8,"name":"To Be Confirmed","abbrev":"TBC","description":"Awaiting official confirmation - current date is known with some certainty."},"last_updated":"2025-05-01T02:37:15Z","net":"2025-05-06T00:48:00Z","window_end":"2025-05-06T04:48:00Z","window_start":"2025-05-06T00:48:00Z","net_precision":{"id":2,"name":"Hour","abbrev":"HR","description":"The T-0 is accurate to the hour."},"probability":null,"weather_concerns":null,"holdreason":"","failreason":"","hashtag":null,"launch_service_provider":{"id":121,"url":"https://ll.thespacedevs.com/2.2.0/agencies/121/","name":"SpaceX","type":"Commercial"},"rocket":{"id":8599,"configuration":{"id":164,"url":"https://ll.thespacedevs.com/2.2.0/config/launcher/164/","name":"Falcon 9","family":"Falcon","full_name":"Falcon 9 Block 5","variant":"Block 5"}},"mission":{"id":7191,"name":"Starlink Group 6-93","description":"A batch of satellites for the Starlink mega-constellation - SpaceX's project for space-based Internet communication system.","launch_designator":null,"type":"Communications","orbit":{"id":8,"name":"Low Earth Orbit","abbrev":"LEO"},"agencies":[{"id":121,"url":"https://ll.thespacedevs.com/2.2.0/agencies/121/","name":"SpaceX","featured":true,"type":"Commercial","country_code":"USA","abbrev":"SpX","description":"Space Exploration Technologies Corp., known as SpaceX, is an American aerospace manufacturer and space transport services company headquartered in Hawthorne, California. It was founded in 2002 by entrepreneur Elon Musk with the goal of reducing space transportation costs and enabling the colonization of Mars. SpaceX operates from many pads, on the East Coast of the US they operate from SLC-40 at Cape Canaveral Space Force Station and historic LC-39A at Kennedy Space Center. They also operate from SLC-4E at Vandenberg Space Force Base, California, usually for polar launches. 
Another launch site is being developed at Boca Chica, Texas.","administrator":"CEO: Elon Musk","founding_year":"2002","launchers":"Falcon | Starship","spacecraft":"Dragon","launch_library_url":null,"total_launch_count":502,"consecutive_successful_launches":24,"successful_launches":487,"failed_launches":14,"pending_launches":118,"consecutive_successful_landings":71,"successful_landings":449,"failed_landings":26,"attempted_landings":474,"info_url":"http://www.spacex.com/","wiki_url":"http://en.wikipedia.org/wiki/SpaceX","logo_url":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/spacex_logo_20220826094919.png","image_url":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/spacex_image_20190207032501.jpeg","nation_url":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/spacex_nation_20230531064544.jpg"}],"info_urls":[],"vid_urls":[]},"pad":{"id":80,"url":"https://ll.thespacedevs.com/2.2.0/pad/80/","agency_id":121,"name":"Space Launch Complex 40","description":"","info_url":null,"wiki_url":"https://en.wikipedia.org/wiki/Cape_Canaveral_Air_Force_Station_Space_Launch_Complex_40","map_url":"https://www.google.com/maps?q=28.56194122,-80.57735736","latitude":"28.56194122","longitude":"-80.57735736","location":{"id":12,"url":"https://ll.thespacedevs.com/2.2.0/location/12/","name":"Cape Canaveral SFS, FL, USA","country_code":"USA","description":"Cape Canaveral Space Force Station (CCSFS) is an installation of the United States Space Force's Space Launch Delta 45, located on Cape Canaveral in Brevard County, 
Florida.","map_image":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/map_images/location_12_20200803142519.jpg","timezone_name":"America/New_York","total_launch_count":1020,"total_landing_count":64},"country_code":"USA","map_image":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/map_images/pad_80_20200803143323.jpg","total_launch_count":304,"orbital_launch_attempt_count":304},"webcast_live":false,"image":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/falcon2520925_image_20221009234147.png","infographic":null,"program":[{"id":25,"url":"https://ll.thespacedevs.com/2.2.0/program/25/","name":"Starlink","description":"Starlink is a satellite internet constellation operated by American aerospace company SpaceX","agencies":[{"id":121,"url":"https://ll.thespacedevs.com/2.2.0/agencies/121/","name":"SpaceX","type":"Commercial"}],"image_url":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/starlink_program_20231228154508.jpeg","start_date":"2018-02-22T14:17:00Z","end_date":null,"info_url":"https://starlink.com","wiki_url":"https://en.wikipedia.org/wiki/Starlink","mission_patches":[{"id":7,"name":"Space X Starlink Mission Patch","priority":10,"image_url":"https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/mission_patch_images/space2520x252_mission_patch_20221011205756.png","agency":{"id":121,"url":"https://ll.thespacedevs.com/2.2.0/agencies/121/","name":"SpaceX","type":"Commercial"}}],"type":{"id":3,"name":"Communication Constellation"}}],"orbital_launch_attempt_count":6947,"location_launch_attempt_count":1021,"pad_launch_attempt_count":305,"agency_launch_attempt_count":505,"orbital_launch_attempt_count_year":98,"location_launch_attempt_count_year":27,"pad_launch_attempt_count_year":25,"agency_launch_attempt_count_year":56,"type":"normal"}]}
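
  • The raw response is hard to read, but only a few fields matter for this example: results, and within each launch name, net, and image. The sketch below pulls those out of a trimmed copy of two entries from the response above (all other fields omitted). The function name summarize_launches is hypothetical.

```python
# Two entries trimmed from the response above; every other field is omitted.
SAMPLE_PAYLOAD = {
    "count": 338,
    "results": [
        {
            "name": "Falcon 9 Block 5 | Starlink Group 6-75",
            "net": "2025-05-02T01:51:10Z",
            "image": "https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/falcon2520925_image_20221009234147.png",
        },
        {
            "name": "Long March 12 | SatNet LEO Group TBD?",
            "net": "2025-05-05T11:05:00Z",
            "image": "https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/cz-12_on_its_la_image_20241128132937.jpg",
        },
    ],
}


def summarize_launches(payload):
    """Return (name, launch date, image URL) for each launch with an image."""
    rows = []
    for launch in payload.get("results", []):
        image = launch.get("image")
        if not image:
            continue  # skip entries without an image, as the DAG code does
        net = launch.get("net") or ""
        rows.append((launch.get("name", ""), net.split("T")[0], image))
    return rows


rows = summarize_launches(SAMPLE_PAYLOAD)
for name, date, image in rows:
    print(f"{date}  {name}  {image}")
```

  • The same three fields are what the DAG code uses to name and download each image.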

File Structure So Far

  • The file structure is as follows.
    • Create the airflow folder
    • Create the dags folder inside it
    • Create step06_rocket_image_download_filename.py inside the dags folder
$ tree
.
├── README.md
├── airflow
│   └── dags
│       └── step06_rocket_image_download_filename.py
├── install_airflow.sh
└── requirements.txt
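
  • The folders can be created by hand, or with a few lines of Python. The sketch below builds the airflow/dags layout and an empty DAG file under a given project root. The function name create_layout is hypothetical, and the demo runs against a temporary directory rather than a real project.

```python
import tempfile
from pathlib import Path


def create_layout(project_root):
    """Create airflow/dags under project_root and an empty DAG file in it."""
    dags = Path(project_root) / "airflow" / "dags"
    dags.mkdir(parents=True, exist_ok=True)
    dag_file = dags / "step06_rocket_image_download_filename.py"
    dag_file.touch(exist_ok=True)
    return dag_file


# Demo against a throwaway directory; pass the real project path instead.
demo_root = tempfile.mkdtemp()
dag_file = create_layout(demo_root)
print(dag_file.relative_to(demo_root))
```

  • Running it twice is harmless because both mkdir and touch are called with exist_ok=True.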

Writing the Python File

  • File name: step06_rocket_image_download_filename.py
  • The code is as follows.
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.operators.bash import BashOperator
import requests
import os
from urllib.parse import urlparse
import urllib.parse
import json

# Work around Airflow network request issues on macOS
os.environ['NO_PROXY'] = '*'

# Default arguments for the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2024, 1, 1),
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

def setup_airflow_home():
    """Set the Airflow home directory and the AIRFLOW_HOME environment variable."""
    try:
        # Get the current working directory
        current_dir = os.getcwd()
        # Build the Airflow home directory path
        airflow_home = os.path.join(current_dir, 'airflow')
        # Set the environment variable
        os.environ['AIRFLOW_HOME'] = airflow_home
        print(f"AIRFLOW_HOME set to: {airflow_home}")
        return "AIRFLOW_HOME environment variable set"
    except Exception as e:
        print(f"Error while setting AIRFLOW_HOME: {str(e)}")
        raise

def get_launch_images():
    """Fetch rocket launch image URLs from the Launch Library 2 API."""
    api_url = "https://ll.thespacedevs.com/2.2.0/launch/upcoming/?limit=5"
    print(f"Starting API request: {api_url}")

    try:
        print(f"NO_PROXY environment variable: {os.environ.get('NO_PROXY')}")
        response = requests.get(api_url, timeout=30)
        print(f"API response received: status code {response.status_code}")

        if response.status_code == 200:
            data = response.json()
            launches = data['results']
            image_urls = [launch['image'] for launch in launches if launch.get('image')]
            print(f"Found {len(image_urls)} image URLs in total")
            return image_urls
        else:
            raise Exception(f"API request failed: {response.status_code}")

    except Exception as e:
        print(f"Error during API request: {str(e)}")
        # Fall back to two hard-coded image URLs
        return [
            "https://spacelaunchnow-prod-east.nyc3.digitaloceanspaces.com/media/launch_images/falcon2520925_image_20230804070848.jpg",
            "https://spacelaunchnow-prod-east.nyc3.digitaloceanspaces.com/media/launcher_images/falcon_9_block__image_20210506060831.jpg"
        ]

def create_rocket_images_dir():
    """Create the rocket_images directory, only inside the airflow directory."""
    try:
        airflow_home = os.environ.get('AIRFLOW_HOME')
        if not airflow_home:
            raise Exception("The AIRFLOW_HOME environment variable is not set.")

        # Build the rocket_images path inside the airflow directory
        image_dir = os.path.join(airflow_home, 'rocket_images')
        os.makedirs(image_dir, exist_ok=True)
        print(f"rocket_images directory created: {image_dir}")
        return image_dir
    except Exception as e:
        print(f"Error while creating directory: {str(e)}")
        raise

def create_output_dir():
    """Create the output directory, only inside the airflow directory."""
    try:
        airflow_home = os.environ.get('AIRFLOW_HOME')
        if not airflow_home:
            raise Exception("The AIRFLOW_HOME environment variable is not set.")

        # Build the output path inside the airflow directory
        output_dir = os.path.join(airflow_home, 'output')
        os.makedirs(output_dir, exist_ok=True)
        print(f"output directory created: {output_dir}")
        return output_dir
    except Exception as e:
        print(f"Error while creating directory: {str(e)}")
        raise

def download_json_data():
    """Download JSON data from the Launch Library 2 API."""
    try:
        print("Starting JSON data download")
        api_url = "https://ll.thespacedevs.com/2.2.0/launch/upcoming/?limit=5"
        response = requests.get(api_url, timeout=30)

        if response.status_code == 200:
            data = response.json()

            # Save the JSON file in the output directory
            output_dir = create_output_dir()
            json_file_path = os.path.join(output_dir, 'launch_data.json')

            with open(json_file_path, 'w', encoding='utf-8') as f:
                json.dump(data, f, ensure_ascii=False, indent=4)

            print(f"JSON data saved: {json_file_path}")
            return data
        else:
            raise Exception(f"API request failed: {response.status_code}")

    except Exception as e:
        print(f"Error while downloading JSON data: {str(e)}")
        raise

def download_images():
    """Download rocket launch images."""
    try:
        print("download_images started")
        # Fetch the JSON data (this calls the API again rather than reusing the upstream task's result)
        data = download_json_data()
        launches = data['results']

        # Read the AIRFLOW_HOME environment variable
        airflow_home = os.environ.get('AIRFLOW_HOME')
        if not airflow_home:
            raise Exception("The AIRFLOW_HOME environment variable is not set.")

        # Build the rocket_images path under the airflow directory
        image_dir = os.path.join(airflow_home, 'rocket_images')
        os.makedirs(image_dir, exist_ok=True)
        print(f"Save path: {os.path.abspath(image_dir)}")

        downloaded_paths = []
        for idx, launch in enumerate(launches):
            # Read the URL before the try block so the except handler can always reference it
            url = launch.get('image')
            if not url:
                continue

            try:
                print(f"Attempting image download {idx+1}: {url}")
                response = requests.get(url, timeout=10)

                if response.status_code == 200:
                    # Build the file name from the launch name and date
                    rocket_name = launch.get('name', f'rocket_{idx + 1}')
                    launch_date = launch.get('net', '').split('T')[0]  # keep only the date part
                    safe_name = "".join(c for c in rocket_name if c.isalnum() or c in (' ', '-', '_')).rstrip()
                    file_extension = os.path.splitext(urlparse(url).path)[1] or '.jpg'
                    file_name = f'{safe_name}_{launch_date}{file_extension}'
                    file_path = os.path.join(image_dir, file_name)

                    with open(file_path, 'wb') as f:
                        f.write(response.content)
                    downloaded_paths.append(file_path)
                    print(f"Image saved: {file_path}")
                else:
                    print(f"Image download failed - status code: {response.status_code}")

            except Exception as e:
                print(f"Failed to download image (URL: {url}): {str(e)}")

        return f"Downloaded {len(downloaded_paths)} image(s) in total"

    except Exception as e:
        print(f"Whole process failed: {str(e)}")
        # Re-raise so Airflow marks the task as failed instead of succeeded
        raise

# DAG definition
dag = DAG(
    'step06_rocket_image_download',
    default_args=default_args,
    description='Download rocket launch images',
    schedule_interval=timedelta(days=1),
    catchup=False
)

# Task that sets AIRFLOW_HOME
setup_env_task = PythonOperator(
    task_id='setup_airflow_home',
    python_callable=setup_airflow_home,
    dag=dag
)

# Hello Airflow task
hello_task = BashOperator(
    task_id='hello_task',
    bash_command='echo "Hello Airflow" && echo "AIRFLOW_HOME: $AIRFLOW_HOME"',
    dag=dag
)

# Directory creation task
create_dir_task = PythonOperator(
    task_id='create_dir_task',
    python_callable=create_rocket_images_dir,
    dag=dag
)

# JSON data download task
download_json_task = PythonOperator(
    task_id='download_json_data',
    python_callable=download_json_data,
    dag=dag
)

# Image download task
download_task = PythonOperator(
    task_id='download_rocket_images',
    python_callable=download_images,
    dag=dag
)

# Set task dependencies
setup_env_task >> hello_task >> create_dir_task >> download_json_task >> download_task
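
The file-name rules inside download_images can be checked in isolation, without network access. The following is a minimal sketch, assuming a hypothetical helper named build_image_filename (it is not part of the original DAG) that applies the same rules: drop characters other than letters, digits, spaces, hyphens, and underscores, keep only the date part of net, and fall back to .jpg when the URL path has no extension. The example URL is made up.

```python
import os
from urllib.parse import urlparse


def build_image_filename(name: str, net: str, url: str) -> str:
    """Hypothetical helper mirroring the file-name rules used in download_images."""
    # Keep only the date part of an ISO timestamp such as 2025-05-02T01:51:10Z
    launch_date = net.split('T')[0]
    # Drop characters that are not alphanumeric, space, hyphen, or underscore
    safe_name = "".join(c for c in name if c.isalnum() or c in (' ', '-', '_')).rstrip()
    # Take the extension from the URL path; default to .jpg when there is none
    file_extension = os.path.splitext(urlparse(url).path)[1] or '.jpg'
    return f'{safe_name}_{launch_date}{file_extension}'


if __name__ == "__main__":
    # The URL below is a made-up example, not a real image location
    print(build_image_filename(
        "Falcon 9 Block 5 | Starlink Group 6-75",
        "2025-05-02T01:51:10Z",
        "https://example.com/media/starlink.png",
    ))
```

Because the | separator is dropped while the spaces on both sides are kept, the resulting name contains two consecutive spaces, which is consistent with the file names listed under rocket_images in the final file structure.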

Test

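  • Once the DAG file is written, initialize the metadata database before starting Airflow. airflow db reset wipes any existing metadata, and on Airflow 2.7+ airflow db init is deprecated in favor of airflow db migrate.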
$ airflow db reset -y
$ airflow db init
  • Create a user
$ airflow users create \
    --username admin \
    --firstname admin \
    --lastname admin \
    --role Admin \
    --email admin@example.com \
    --password 1234
  • Start the web server
$ airflow webserver -p 8080
  • Open another bash terminal, set AIRFLOW_HOME again, and start the scheduler
$ export AIRFLOW_HOME=$(pwd)/airflow
$ airflow scheduler


Final file structure

  • The file structure is as follows.
$ tree 
.
├── README.md
├── airflow
│   ├── airflow-webserver.pid
│   ├── airflow.cfg
│   ├── airflow.db
│   ├── dags
│   │   ├── __pycache__
│   │   │   ├── get_data_dag.cpython-311.pyc
│   │   │   ├── hello_world_dag.cpython-311.pyc
│   │   │   ├── step04_feature_engineering_dag.cpython-311.pyc
│   │   │   ├── step05_rocket_image_download.cpython-311.pyc
│   │   │   ├── step06_rocket_image_download_filename.cpython-311.pyc
│   │   │   └── test_file_dag.cpython-311.pyc
│   │   ├── get_data_dag.py
│   │   ├── hello_world_dag.py
│   │   ├── step04_feature_engineering_dag.py
│   │   ├── step05_rocket_image_download.py
│   │   ├── step06_rocket_image_download_filename.py
│   │   └── test_file_dag.py
│   ├── logs
│   │   ├── dag_id=step06_rocket_image_download
│   │   │   ├── run_id=manual__2025-05-02T02:48:32.427761+00:00
│   │   │   │   ├── task_id=create_dir_task
│   │   │   │   │   └── attempt=1.log
│   │   │   │   ├── task_id=download_json_data
│   │   │   │   │   └── attempt=1.log
│   │   │   │   ├── task_id=download_rocket_images
│   │   │   │   │   └── attempt=1.log
│   │   │   │   ├── task_id=hello_task
│   │   │   │   │   └── attempt=1.log
│   │   │   │   └── task_id=setup_airflow_home
│   │   │   │       └── attempt=1.log
│   │   │   └── run_id=scheduled__2025-05-01T00:00:00+00:00
│   │   │       ├── task_id=create_dir_task
│   │   │       │   └── attempt=1.log
│   │   │       ├── task_id=download_json_data
│   │   │       │   └── attempt=1.log
│   │   │       ├── task_id=download_rocket_images
│   │   │       │   └── attempt=1.log
│   │   │       ├── task_id=hello_task
│   │   │       │   └── attempt=1.log
│   │   │       └── task_id=setup_airflow_home
│   │   │           └── attempt=1.log
│   │   ├── dag_processor_manager
│   │   │   └── dag_processor_manager.log
│   │   └── scheduler
│   │       ├── 2025-05-02
│   │       │   ├── get_data_dag.py.log
│   │       │   ├── hello_world_dag.py.log
│   │       │   ├── native_dags
│   │       │   │   └── example_dags
│   │       │   │       ├── example_bash_operator.py.log
│   │       │   │       ├── example_branch_datetime_operator.py.log
│   │       │   │       ├── example_branch_day_of_week_operator.py.log
│   │       │   │       ├── example_branch_labels.py.log
│   │       │   │       ├── example_branch_operator.py.log
│   │       │   │       ├── example_branch_operator_decorator.py.log
│   │       │   │       ├── example_branch_python_dop_operator_3.py.log
│   │       │   │       ├── example_complex.py.log
│   │       │   │       ├── example_dag_decorator.py.log
│   │       │   │       ├── example_datasets.py.log
│   │       │   │       ├── example_dynamic_task_mapping.py.log
│   │       │   │       ├── example_dynamic_task_mapping_with_no_taskflow_operators.py.log
│   │       │   │       ├── example_external_task_marker_dag.py.log
│   │       │   │       ├── example_kubernetes_executor.py.log
│   │       │   │       ├── example_latest_only.py.log
│   │       │   │       ├── example_latest_only_with_trigger.py.log
│   │       │   │       ├── example_local_kubernetes_executor.py.log
│   │       │   │       ├── example_nested_branch_dag.py.log
│   │       │   │       ├── example_params_trigger_ui.py.log
│   │       │   │       ├── example_params_ui_tutorial.py.log
│   │       │   │       ├── example_passing_params_via_test_command.py.log
│   │       │   │       ├── example_python_decorator.py.log
│   │       │   │       ├── example_python_operator.py.log
│   │       │   │       ├── example_sensor_decorator.py.log
│   │       │   │       ├── example_sensors.py.log
│   │       │   │       ├── example_setup_teardown.py.log
│   │       │   │       ├── example_setup_teardown_taskflow.py.log
│   │       │   │       ├── example_short_circuit_decorator.py.log
│   │       │   │       ├── example_short_circuit_operator.py.log
│   │       │   │       ├── example_skip_dag.py.log
│   │       │   │       ├── example_sla_dag.py.log
│   │       │   │       ├── example_subdag_operator.py.log
│   │       │   │       ├── example_task_group.py.log
│   │       │   │       ├── example_task_group_decorator.py.log
│   │       │   │       ├── example_time_delta_sensor_async.py.log
│   │       │   │       ├── example_trigger_controller_dag.py.log
│   │       │   │       ├── example_trigger_target_dag.py.log
│   │       │   │       ├── example_xcom.py.log
│   │       │   │       ├── example_xcomargs.py.log
│   │       │   │       ├── plugins
│   │       │   │       │   ├── event_listener.py.log
│   │       │   │       │   ├── listener_plugin.py.log
│   │       │   │       │   └── workday.py.log
│   │       │   │       ├── subdags
│   │       │   │       │   └── subdag.py.log
│   │       │   │       ├── tutorial.py.log
│   │       │   │       ├── tutorial_dag.py.log
│   │       │   │       ├── tutorial_objectstorage.py.log
│   │       │   │       ├── tutorial_taskflow_api.py.log
│   │       │   │       └── tutorial_taskflow_api_virtualenv.py.log
│   │       │   ├── step04_feature_engineering_dag.py.log
│   │       │   ├── step05_rocket_image_download.py.log
│   │       │   ├── step06_rocket_image_download_filename.py.log
│   │       │   └── test_file_dag.py.log
│   │       └── latest -> /Users/evan/programming_edu/airflow_tutorial/evan_airflow_tutorial/chapter06/airflow/logs/scheduler/2025-05-02
│   ├── output
│   │   └── launch_data.json
│   ├── rocket_images
│   │   ├── Falcon 9 Block 5  Starlink Group 15-3_2025-05-03.png
│   │   ├── Falcon 9 Block 5  Starlink Group 6-75_2025-05-02.png
│   │   ├── Falcon 9 Block 5  Starlink Group 6-84_2025-05-04.png
│   │   ├── Falcon 9 Block 5  Starlink Group 6-93_2025-05-06.png
│   │   └── Long March 12  SatNet LEO Group TBD_2025-05-05.jpg
│   └── webserver_config.py
├── get_launch_images.py
├── install_airflow.sh
└── requirements.txt
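
After a run, the outputs can also be checked from Python instead of the Airflow UI. The following is a minimal sketch, assuming the layout shown above (output/launch_data.json and rocket_images/ under AIRFLOW_HOME). The function name summarize_outputs is hypothetical and not part of the original project.

```python
import json
import os


def summarize_outputs(airflow_home: str) -> dict:
    """Hypothetical check: count launches in the saved JSON and list downloaded images."""
    json_path = os.path.join(airflow_home, 'output', 'launch_data.json')
    image_dir = os.path.join(airflow_home, 'rocket_images')

    with open(json_path, encoding='utf-8') as f:
        data = json.load(f)

    # A missing image directory is reported as an empty list rather than an error
    images = sorted(os.listdir(image_dir)) if os.path.isdir(image_dir) else []
    return {'launch_count': len(data.get('results', [])), 'images': images}


if __name__ == "__main__":
    home = os.environ.get('AIRFLOW_HOME', '')
    if os.path.isfile(os.path.join(home, 'output', 'launch_data.json')):
        print(summarize_outputs(home))
    else:
        print("launch_data.json not found; check AIRFLOW_HOME and run the DAG first")
```

If the number of images is lower than launch_count, some launches either had no image URL or their download failed; the task log for download_rocket_images shows which.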