
[Data Lake]Atlas Data Federation

by IT Journeyman 2024. 7. 30.

TL;DR

This feature lets you upload JSON or Parquet files to an AWS S3 bucket and query them from MongoDB (Atlas only) as external tables. It works well in data pipelines that need to tie in log files or external data.

(The benefit is probably greater if you already use Atlas as an ODL (Operational Data Layer, MongoDB's name for an ODS).)
(This post is long. If you are short on time, just look at the figures, including 2-14.)

IT Journeyman

These days, Fast Object Storage seems to be proposed quite often for traditional analytics.
It is a pity that this MongoDB feature does not also support S3-compatible storage.

(Traditional analytics: DWs built on-premises, such as MF Vertica, Teradata, VMware Greenplum, and Oracle Exadata)
(As of 2024.07.31, AWS S3 and Azure Blob are supported)

(Fast Object Storage: SSD-based storage or services that support S3)



 

Table of Contents

 

1. MongoDB Atlas Data Lake Overview

1-1 Atlas Data Federation(MongoDB Atlas Data Lake)

1-2 AWS Regions

1-3 Billing

2. Setting Up Data Federation(Atlas Data Lake)

2-1 Data Federation > Create New Federated Database > Set up Manually

2-2 Federated Database Instance Name - Atlas UI

2-3 Add Data Sources - Atlas UI

2-4 Amazon S3 - Atlas UI

2-4 Configure AWS S3 Data Store - Atlas UI

2-5 Create New Role with the AWS CLI 1 - Atlas UI

2-6 Create New Role with the AWS CLI 2 - Atlas UI

2-7 Create New Role with the AWS CLI 3 - AWS CLI

2-8 Check AWS IAM roles - AWS Console

2-9 Check AWS IAM role ARN - AWS Console

2-10 Type AWS role ARN and click Next - Atlas UI

2-11 Create S3 Bucket - AWS Console

2-12 Enter S3 Bucket and Next - Atlas UI

2-13 Assign an access policy to your AWS IAM role - AWS CLI

2-14 S3 Bucket Layout

2-15 Define Path for S3 Data - Atlas UI

2-16 Connect to Data Federation - Atlas UI

2-17 Check S3 Data - MongoDB Compass

2-18 Data Query - mongosh CLI

 

 

1. MongoDB Atlas Data Lake Overview

(Link: Set Up and Query Data Federation - MongoDB Atlas)

1-1 Atlas Data Federation(MongoDB Atlas Data Lake)

MongoDB Atlas Data Lake is now an analytic-optimized object storage service for extracted data. Atlas Data Lake provides an analytic storage service optimized for flat or nested data with low latency query performance.

1-2 AWS Regions




1-3 Billing

2. Setting Up Data Federation(Atlas Data Lake)

2-1 Data Federation > Create New Federated Database > Set up Manually

2-2 Federated Database Instance Name - Atlas UI

2-3 Add Data Sources - Atlas UI

2-4 Amazon S3 - Atlas UI

2-4 Configure AWS S3 Data Store - Atlas UI

2-5 Create New Role with the AWS CLI 1 - Atlas UI

 

2-6 Create New Role with the AWS CLI 2 - Atlas UI

2-7 Create New Role with the AWS CLI 3 - AWS CLI
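For reference, here is a minimal sketch of the trust policy that this kind of role usually carries. It assumes the Atlas UI in the previous steps gives you an Atlas AWS account ARN and an external ID. The function name `buildTrustPolicy` and both values passed to it are hypothetical placeholders, not values from this setup.

```javascript
// Sketch only: builds the trust policy document for the IAM role that Atlas assumes.
// The ARN and external ID are placeholders; copy the real values from the Atlas UI.
function buildTrustPolicy(atlasAwsAccountArn, externalId) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Principal: { AWS: atlasAwsAccountArn },
        Action: "sts:AssumeRole",
        // Only callers presenting this external ID may assume the role.
        Condition: { StringEquals: { "sts:ExternalId": externalId } },
      },
    ],
  };
}

const trustPolicy = buildTrustPolicy(
  "arn:aws:iam::123456789012:root", // hypothetical Atlas AWS account ARN
  "00000000-0000-0000-0000-000000000000" // hypothetical external ID
);
console.log(JSON.stringify(trustPolicy, null, 2));
```

Saved to a file such as role-trust-policy.json (the file name is also an assumption), the JSON can be passed to `aws iam create-role` through `--assume-role-policy-document file://role-trust-policy.json`.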

2-8 Check AWS IAM roles - AWS Console

 

2-9 Check AWS IAM role ARN - AWS Console

 

2-10 Type AWS role ARN and click Next - Atlas UI

 

2-11 Create S3 Bucket - AWS Console

2-12 Enter S3 Bucket and Next - Atlas UI



2-13 Assign an access policy to your AWS IAM role - AWS CLI
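As a rough illustration of this step, the sketch below builds a read-only S3 access policy for the role. It assumes read-only access is enough for querying, and the bucket name and the helper `buildS3ReadPolicy` are hypothetical. The policy that Atlas generates for your project may differ, so treat the Atlas UI output as authoritative.

```javascript
// Sketch only: a read-only access policy for the bucket that Data Federation queries.
// The bucket name is a placeholder.
function buildS3ReadPolicy(bucketName) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: [
          "s3:ListBucket",
          "s3:GetObject",
          "s3:GetObjectVersion",
          "s3:GetBucketLocation",
        ],
        // The bucket itself (for listing) and every object inside it (for reads).
        Resource: [
          `arn:aws:s3:::${bucketName}`,
          `arn:aws:s3:::${bucketName}/*`,
        ],
      },
    ],
  };
}

const accessPolicy = buildS3ReadPolicy("my-trees-bucket"); // hypothetical bucket
console.log(JSON.stringify(accessPolicy, null, 2));
```

The resulting JSON can be attached to the role with `aws iam put-role-policy`, using `--role-name`, `--policy-name`, and `--policy-document file://<your-file>.json`.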

2-14 S3 Bucket Layout

The 221,771 records are split into 225 files of 1,000 records each.
(split -l 1000 trees.json trees-chunk-)
As the figure below shows, the Data Lake can also be laid out, and later changed, as a folder tree.
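If `split` is not available, the same chunking can be approximated in Node.js. The sketch below works on an in-memory string and assumes one JSON document per line. The helper names `chunkLines` and `chunkSuffix` are made up for this example, and the two-letter suffix only imitates the default naming of `split` for up to 676 files.

```javascript
// Rough equivalent of `split -l 1000 trees.json trees-chunk-`, shown on a string.
function chunkLines(text, linesPerFile) {
  const lines = text.split("\n");
  if (lines[lines.length - 1] === "") lines.pop(); // ignore the trailing newline
  const chunks = [];
  for (let i = 0; i < lines.length; i += linesPerFile) {
    chunks.push(lines.slice(i, i + linesPerFile).join("\n") + "\n");
  }
  return chunks;
}

// Two-letter suffixes in the style of split's defaults: aa, ab, ..., az, ba, ...
function chunkSuffix(index) {
  const a = "a".charCodeAt(0);
  return String.fromCharCode(a + Math.floor(index / 26), a + (index % 26));
}

// Tiny demonstration with hypothetical documents, 2 lines per file.
const demo = '{"id":1}\n{"id":2}\n{"id":3}\n';
chunkLines(demo, 2).forEach((chunk, i) => {
  const lineCount = chunk.trim().split("\n").length;
  console.log(`trees-chunk-${chunkSuffix(i)}: ${lineCount} line(s)`);
});
```

For a real file, the text would come from `fs.readFileSync` and each chunk would be written with `fs.writeFileSync`. For very large files a streaming reader would be the safer choice.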

 

 

2-15 Define Path for S3 Data - Atlas UI

(Link: Define Path for S3 Data - MongoDB Atlas)
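What the UI builds in this step corresponds to a storage configuration document that maps S3 paths to a virtual database and collection. The sketch below shows the general shape. The `trees` database and collection names match the mongosh session later in this post. The store name, bucket, region, and path are hypothetical, as is the helper `buildStorageConfig`.

```javascript
// Sketch of a Data Federation storage configuration for one S3 bucket.
// Store name, bucket, region, and path are placeholders.
function buildStorageConfig({ storeName, bucket, region, database, collection, path }) {
  return {
    // Where the data physically lives.
    stores: [
      { name: storeName, provider: "s3", region, bucket, delimiter: "/" },
    ],
    // How it appears to MongoDB clients.
    databases: [
      {
        name: database,
        collections: [
          { name: collection, dataSources: [{ storeName, path }] },
        ],
      },
    ],
  };
}

const storageConfig = buildStorageConfig({
  storeName: "s3store",
  bucket: "my-trees-bucket",
  region: "ap-northeast-2",
  database: "trees",
  collection: "trees",
  path: "/*", // wildcard intended to map the files in the bucket to this collection
});
console.log(JSON.stringify(storageConfig, null, 2));
```

With a folder tree in the bucket, the `path` of each data source would point at the matching prefix instead of the bucket root.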

 

2-16 Connect to Data Federation - Atlas UI

2-17 Check S3 Data - MongoDB Compass

2-18 Data Query - mongosh CLI

 

AtlasDataFederation test> use trees
switched to db trees
AtlasDataFederation trees> db.trees.countDocuments();
221771

AtlasDataFederation trees> match={$match:{circonferenceCM:{$gt:40}}}
{ '$match': { circonferenceCM: { '$gt': 40 } } }
AtlasDataFederation trees> group={$group:{_id:"$libelleFrancais",total:{$sum:1}}}
{ '$group': { _id: '$libelleFrancais', total: { '$sum': 1 } } }
AtlasDataFederation trees> db.trees.aggregate([group]);
[
  { _id: 'Paulownia', total: 1513 },
  { _id: 'Cèdre', total: 988 },
  { _id: 'Prunier à fruits', total: 102 },
  { _id: 'Lilas', total: 10 },
  { _id: 'Camphrier', total: 1 },
  { _id: 'Goyavier', total: 1 },
  { _id: 'Prunus n. sp.', total: 741 },
  { _id: 'Noisetier', total: 70 },
  { _id: 'Aubepine', total: 220 },
  { _id: 'Tapiscia', total: 2 },
  { _id: 'Marronnier', total: 27081 },
  { _id: 'Chicot du Canada', total: 174 },
  { _id: 'Maackie', total: 1 },
  { _id: 'Photinia', total: 86 },
  { _id: 'Sterculier', total: 14 },
  { _id: 'Clerodendron', total: 47 },
  { _id: 'Idesia', total: 1 },
  { _id: 'Callistemon', total: 1 },
  { _id: 'Poirier à fleurs', total: 3411 },
  { _id: 'If', total: 2278 }
]
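The session above defines a `match` stage but passes only `group` to `aggregate`, so the totals cover every tree regardless of circumference. Below is a sketch of how the two stages could be combined, with a `$sort` stage added for readability (the `$sort` is my addition, not part of the original session). The sample documents are hypothetical, and the plain-JS loop only emulates what the pipeline computes so that the snippet also runs outside mongosh.

```javascript
// Stage definitions as in the session, plus a $sort stage (an addition).
const match = { $match: { circonferenceCM: { $gt: 40 } } };
const group = { $group: { _id: "$libelleFrancais", total: { $sum: 1 } } };
const sort = { $sort: { total: -1 } };
const pipeline = [match, group, sort];
// In mongosh: db.trees.aggregate(pipeline)

// Plain-JS emulation of the same three stages on hypothetical sample documents.
const sample = [
  { libelleFrancais: "Marronnier", circonferenceCM: 120 },
  { libelleFrancais: "Marronnier", circonferenceCM: 35 },
  { libelleFrancais: "Marronnier", circonferenceCM: 80 },
  { libelleFrancais: "If", circonferenceCM: 60 },
];
const counts = new Map();
for (const doc of sample) {
  if (doc.circonferenceCM > 40) { // $match
    const key = doc.libelleFrancais; // $group _id
    counts.set(key, (counts.get(key) || 0) + 1); // $sum: 1
  }
}
const result = [...counts]
  .map(([_id, total]) => ({ _id, total }))
  .sort((a, b) => b.total - a.total); // $sort: { total: -1 }
console.log(result);
```

Putting `$match` first keeps the filter ahead of the grouping, so only trees with a circumference above 40 cm are counted.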

 
