Data-35022 : On-Demand Scans Configured for Sensitive Information Discovery #760
base: main

Commits: ee6ddea, 5e93907, 4571511, 41d7346, 5bf589c, 4ca3b72, 22bedca, 9fe6d2a, 76ae416

`@@ -0,0 +1,17 @@`

Organizations with large volumes of historical content in SharePoint, OneDrive, and Exchange that predates their auto-labeling policies lack visibility into how much unclassified sensitive data exists across their tenants. Auto-labeling policies classify only new and modified content going forward; existing files and emails remain unclassified and invisible to data loss prevention (DLP) policies that depend on label detection. Without on-demand scans, organizations cannot establish a baseline of the sensitive information already present in their environments, which makes it impossible to quantify compliance risk, plan remediation, or verify that DLP controls protect all sensitive data.

On-demand scans let organizations manually trigger sensitive information type detection across specified SharePoint sites, OneDrive accounts, and Exchange mailboxes, identifying where sensitive data exists and enabling targeted classification through retroactive labeling. Configuring at least one on-demand scan lets an organization discover and classify historical sensitive data, extending its view of information protection beyond the forward-looking coverage of auto-labeling policies and providing a complete baseline for compliance and risk management.

**Remediation action**

To configure on-demand scans for sensitive information discovery and classification:

1. **Plan your scan strategy** by identifying locations likely to hold historical sensitive data that predates your auto-labeling policies (for example, finance, HR, and legal sites).
2. **Open the scan creation page** in the Microsoft Purview portal: **Information Protection** > **Classifiers** > **On-demand classification** (or **Data Loss Prevention** > **Classifiers** > **On-demand classification**).
3. **Select target locations** (specific SharePoint sites, OneDrive accounts, and/or Exchange mailboxes) and **choose the sensitive information types to detect** (for example, credit card numbers, SSNs, healthcare identifiers, or trade secrets).
4. **Configure scan settings**, including confidence thresholds and file type filters. A lower confidence threshold returns more matches but more false positives; a higher threshold returns fewer false positives but may miss sensitive data. If you use trainable classifiers, make sure they were trained on high-quality data.
5. **Run or schedule the scan**: run it immediately for a baseline scan, or set a recurring schedule (daily, weekly, or monthly). Large scans can take days or weeks depending on data volume and can affect resource utilization.
6. **Monitor progress and analyze results** by tracking completion in the Microsoft Purview portal. When the scan completes, identify where sensitive data is located, review prevalence by type, and decide on remediation actions.

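Step 6 can also be checked from Security & Compliance PowerShell. The sketch below is a minimal, hypothetical helper (`Get-ZtScanStatusSummary` is not part of any module) that counts scans per status. It assumes the scan objects expose the `SensitiveInformationScanStatus` property that the assessment script reads, and it does not assume any particular set of status values.

```powershell
# Hypothetical helper: counts on-demand scans per reported status.
# Assumes each scan object exposes SensitiveInformationScanStatus,
# the same property the assessment script reads.
function Get-ZtScanStatusSummary {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyCollection()]
        [object[]] $Scans
    )

    $Scans |
        ForEach-Object {
            # Scans that report no status are grouped under 'Unknown'
            $status = $_.SensitiveInformationScanStatus
            if ([string]::IsNullOrEmpty($status)) { 'Unknown' } else { [string]$status }
        } |
        Group-Object |
        Sort-Object -Property Name |
        ForEach-Object { [PSCustomObject]@{ Status = $_.Name; Count = $_.Count } }
}

# Usage, assuming an authenticated Security & Compliance PowerShell session:
# Connect-IPPSSession
# Get-ZtScanStatusSummary -Scans @(Get-SensitiveInformationScan) | Format-Table
```

Unlike the script's `$statusCounts`, which skips scans with an empty status, this sketch keeps them visible under `Unknown`.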
- [On-demand classification in Microsoft Purview](https://learn.microsoft.com/en-us/purview/on-demand-classification)
- [Sensitive information types entity reference](https://learn.microsoft.com/en-us/purview/sensitive-information-type-entity-definitions)

<!--- Results --->
%TestResult%

`@@ -0,0 +1,328 @@`

```powershell
<#
.SYNOPSIS
    On-Demand Scans Configured for Sensitive Information Discovery

.DESCRIPTION
    Checks if on-demand scans are configured for sensitive information discovery in
    SharePoint, OneDrive, and Exchange. Implements dynamic SIT GUID -> friendly name
    resolution and generates a markdown result suitable for inclusion in test reports.

    Reference: https://learn.microsoft.com/en-us/purview/on-demand-classification

.NOTES
    Test ID: 35022
    Pillar: Data
    Risk Level: Medium
    User Impact: Low
    Implementation Cost: Medium
#>

function Test-Assessment-35022 {
    [ZtTest(
        Category = 'Information Protection',
        ImplementationCost = 'Medium',
        MinimumLicense = 'Microsoft 365 E5',
        Pillar = 'Data',
        RiskLevel = 'Medium',
        SfiPillar = 'Protect tenants and production systems',
        TenantType = 'Workforce',
        TestId = 35022,
        Title = 'On-Demand Scans Configured for Sensitive Information Discovery',
        UserImpact = 'Low'
    )]
    [CmdletBinding()]
    param()

    #region Data Collection
    Write-PSFMessage '🟦 Start' -Tag Test -Level VeryVerbose

    $activity = 'Checking On-Demand Scans Configured for Sensitive Information Discovery'
    Write-ZtProgress -Activity $activity -Status 'Getting SIT Catalog'

    $sitGuidMap = @{}
    $scansList = $null
    $errorMsg = $null

    try {
        # Build a dynamic SIT catalog (GUID -> name) from the tenant
        $sitCatalog = Get-DlpSensitiveInformationType -ErrorAction Stop
        foreach ($sit in $sitCatalog) {
            # Key by the string form of Identity so lookups by GUID string match;
            # entries without an Identity are skipped
            if ($sit.Identity) {
                $sitGuidMap[$sit.Identity.ToString()] = $sit.Name
            }
        }
    }
    catch {
        Write-PSFMessage "Warning: Failed to build SIT catalog from tenant: $($_.Exception.Message)" -Level Warning
    }

    # Fallback mapping for common built-in SITs
    $fallbackMap = @{
        '50842eb7-edc8-4019-85dd-5a5c1f2bb085' = 'Credit Card Number'
        'a44669fe-0d48-453d-a9b1-2cc83f2cba77' = 'U.S. Social Security Number (SSN)'
        'ed36cf51-9d63-40f3-a9a6-5a865c418d21' = 'U.S. Bank Account Number'
        '48ee9090-3f74-4238-89c9-6c0a93767a8f' = 'SWIFT Code'
        '50f56e32-3a6f-459f-82e9-e2b27b96b430' = 'Drivers License Number (U.S.)'
        '65ce4b3d-79b3-46c0-ba9d-8226d98130c8' = 'IBAN (International Banking Account Number)'
        '3b35900d-fd2d-446b-b3ad-b4723419e2d5' = 'ABA Routing Number'
        'f3dbc5dd-e2d4-4487-b43c-ebd87f349aa4' = 'Canada Social Insurance Number'
        'f87b75b6-570d-465d-a91a-f0d9b9e0b000' = 'U.K. National Insurance Number (NINO)'
        'b3a2fd72-cc1b-40fc-b0dc-6c5ca0e00f6f' = 'International Medical Record Number (MRN)'
    }

    Write-ZtProgress -Activity $activity -Status 'Getting On-Demand Scans'

    try {
        $scansList = Get-SensitiveInformationScan -ErrorAction Stop
    }
    catch {
        $errorMsg = $_
        Write-PSFMessage "Error querying on-demand scans: $_" -Level Error
    }
    #endregion Data Collection

    #region Assessment Logic
    $scanCount = 0
    $passed = $false
    $tableData = @()
    $statusCounts = @{}
    $hasSharePoint = 0
    $hasOneDrive = 0
    $hasExchange = 0
    $customStatus = $null
    $mostRecentScan = $null

    if ($errorMsg) {
        $passed = $false
    }
    else {
        $scanCount = @($scansList).Count
        $passed = $scanCount -ge 1

        if ($scanCount -gt 0) {
            foreach ($scan in $scansList) {
                # Get-SensitiveInformationScan already returns full scan details; normalize fields
                $name = $scan.Name
                $status = $scan.SensitiveInformationScanStatus

                # Workload may be a string or an array
                if ($scan.Workload -is [System.Collections.IEnumerable] -and $scan.Workload -isnot [string]) {
                    $workload = $scan.Workload -join ', '
                }
                else {
                    $workload = $scan.Workload
                }

                # Parse ItemStatistics.SIT (SIT GUID -> match count)
                $sitDetails = @()
                try {
                    if ($scan.ItemStatistics -and $scan.ItemStatistics.SIT) {
                        $sits = $scan.ItemStatistics.SIT

                        # Determine SIT keys depending on object type
                        if ($sits -is [System.Collections.IDictionary]) {
                            $sitKeys = $sits.Keys
                        }
                        elseif ($sits -is [PSCustomObject]) {
                            $sitKeys = $sits.PSObject.Properties | ForEach-Object { $_.Name }
                        }
                        else {
                            $sitKeys = @()
                        }

                        foreach ($guid in $sitKeys) {
                            $guidString = $guid.ToString().Trim()

                            # Obtain the match count for this GUID
                            $count = 0
                            if ($sits -is [System.Collections.IDictionary]) {
                                $count = $sits[$guid]
                            }
                            else {
                                try {
                                    $count = $sits.$guid
                                }
                                catch {
                                    $count = 0
                                }
                            }

                            # Resolve the SIT GUID to a friendly name:
                            # tenant catalog, then fallback map, then a per-GUID tenant query
                            $friendlyName = $null
                            if ($sitGuidMap.ContainsKey($guidString)) {
                                $friendlyName = $sitGuidMap[$guidString]
                            }
                            elseif ($fallbackMap.ContainsKey($guidString)) {
                                $friendlyName = $fallbackMap[$guidString]
                            }
                            else {
                                # Query the tenant by Identity as a last resort
                                try {
                                    $sitObj = Get-DlpSensitiveInformationType -Identity $guidString -ErrorAction SilentlyContinue
                                    if ($sitObj) {
                                        if ($sitObj.PSObject.Properties['Name']) {
                                            $friendlyName = $sitObj.Name
                                        }
                                        elseif ($sitObj.PSObject.Properties['DisplayName']) {
                                            $friendlyName = $sitObj.DisplayName
                                        }
                                        else {
                                            $friendlyName = $sitObj.ToString()
                                        }
                                    }
                                }
                                catch {
                                    $friendlyName = "Unknown SIT - $guidString"
                                }
                            }

                            if (-not $friendlyName) {
                                $friendlyName = "Unknown SIT - $guidString"
                            }

                            $sitDetails += "$friendlyName`: $count matches"
                        }
                    }
                }
                catch {
                    $sitDetails += "Unable to parse ItemStatistics: $($_.Exception.Message)"
                }

                $sitString = if ($sitDetails.Count -gt 0) {
                    $sitDetails -join '; '
                }
                else {
                    'None'
                }

                $createdUtc = ''
                if ($scan.WhenCreatedUTC) {
                    $createdUtc = $scan.WhenCreatedUTC
                }
                $lastScanStart = ''
                if ($scan.LastScanStartTime) {
                    $lastScanStart = $scan.LastScanStartTime
                }

                # Build the output row
                $row = [PSCustomObject]@{
                    Name              = $name
                    Status            = $status
                    Workload          = $workload
                    'SIT Detected'    = $sitString
                    'Created (UTC)'   = $createdUtc
                    'Last Scan Start' = $lastScanStart
                }
                $tableData += $row

                # Count scans per status; a null or empty status would make the hashtable key invalid, so skip it
                if (-not [string]::IsNullOrEmpty($status)) {
                    if ($statusCounts.ContainsKey($status)) {
                        $statusCounts[$status]++
                    }
                    else {
                        $statusCounts[$status] = 1
                    }
                }
            }
        }

        # Workload coverage: number of scans that include each workload.
        # @() guarantees a Count even when exactly one scan matches.
        $hasSharePoint = @(@($scansList) | Where-Object { $_.Workload -and (($_.Workload -contains 'SharePoint') -or (($_.Workload -join ',') -match 'SharePoint')) }).Count
        $hasOneDrive = @(@($scansList) | Where-Object { $_.Workload -and (($_.Workload -contains 'OneDrive') -or (($_.Workload -join ',') -match 'OneDrive')) }).Count
        $hasExchange = @(@($scansList) | Where-Object { $_.Workload -and (($_.Workload -contains 'Exchange') -or (($_.Workload -join ',') -match 'Exchange')) }).Count

        # Most recent scan start time across all scans
        $mostRecentScan = @($scansList) |
            Where-Object { $_.LastScanStartTime } |
            Sort-Object LastScanStartTime -Descending |
            Select-Object -First 1 |
            ForEach-Object { $_.LastScanStartTime }
    }
    #endregion Assessment Logic

    #region Report Generation
    $testResultMarkdown = ""

    if ($errorMsg) {
        $testResultMarkdown = "Unable to determine on-demand scan configuration due to a permissions issue or a query failure.`n`n"
        $customStatus = 'Investigate'
    }
    else {
        if ($passed) {
            $testResultMarkdown = "✅ At least one on-demand scan is configured in the organization, enabling discovery and classification of historical sensitive information.`n`n"
        }
        else {
            $testResultMarkdown = "❌ No on-demand scans are configured in the organization; historical sensitive data cannot be discovered.`n`n"
```

Comment on lines +257 to +263

```powershell
        }

        $testResultMarkdown += "### On-demand scan configuration summary`n`n"

        if ($scanCount -gt 0 -and $tableData) {
            $testResultMarkdown += "**Scan details:**`n`n"
            $testResultMarkdown += "| Name | Sensitive information scan status | Workload | Sensitive information types detected | When created UTC | Last scan start time |`n"
            $testResultMarkdown += "|------|--------|----------|--------------|---------------|-----------------|`n"

            foreach ($row in $tableData) {
                $nameEsc = $row.Name
                $statusEsc = $row.Status
                $workEsc = $row.Workload
                $sitEsc = $row.'SIT Detected'
```
||||||
| $sitEsc = 'SIT Detected' | |
| $sitEsc = $row.'SIT Detected' |
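The `'SIT Detected'` value that the suggestion reads is produced by the script's three-tier GUID lookup: the tenant catalog from `Get-DlpSensitiveInformationType`, then the hard-coded `$fallbackMap`, then a per-GUID `Get-DlpSensitiveInformationType -Identity` query. Pulling that order into a function that takes its inputs as parameters makes it testable without a tenant connection. This is a sketch; `Resolve-ZtSitName` and its `-TenantMap`, `-FallbackMap`, and `-Lookup` parameters are hypothetical.

```powershell
# Hypothetical, tenant-independent version of the script's SIT name lookup.
# Order: tenant catalog map, then fallback map, then an optional lookup scriptblock.
function Resolve-ZtSitName {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [string] $Guid,
        [hashtable] $TenantMap = @{},
        [hashtable] $FallbackMap = @{},
        # Optional last-resort lookup; receives the trimmed GUID and returns a name or $null
        [scriptblock] $Lookup
    )

    $key = $Guid.Trim()
    if ($TenantMap.ContainsKey($key)) { return $TenantMap[$key] }
    if ($FallbackMap.ContainsKey($key)) { return $FallbackMap[$key] }

    if ($Lookup) {
        try {
            $name = & $Lookup $key
            if ($name) { return [string]$name }
        }
        catch {
            # A failing lookup falls through to the 'Unknown SIT' label below
        }
    }

    return "Unknown SIT - $key"
}

# Usage against a live session (assumes Connect-IPPSSession has been run):
# $lookup = { param($id) (Get-DlpSensitiveInformationType -Identity $id -ErrorAction SilentlyContinue).Name }
# Resolve-ZtSitName -Guid $guid -TenantMap $sitGuidMap -FallbackMap $fallbackMap -Lookup $lookup
```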

The `.DESCRIPTION` states that the test "checks if on-demand scans are configured for sensitive information discovery in SharePoint, OneDrive, and Exchange", but the assessment logic only sets `$passed` based on `($scanCount -ge 1)` without considering per-workload coverage. This creates a mismatch between the documented behavior and the implemented pass criteria; either the description should be relaxed to match the current implementation, or the logic should be updated to require scans for the stated workloads if that is the intended requirement.
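If the second option in the comment is chosen, the pass criterion could require coverage of each stated workload in addition to the existing `$scanCount -ge 1` check. A minimal sketch, assuming the same substring matching on `Workload` that the script uses to compute `$hasSharePoint`, `$hasOneDrive`, and `$hasExchange`; `Test-ZtScanWorkloadCoverage` and `-RequiredWorkloads` are hypothetical names.

```powershell
# Hypothetical stricter pass criterion: at least one scan AND every required workload covered.
# Workload matching mirrors the script: flatten Workload to a string and substring-match it.
function Test-ZtScanWorkloadCoverage {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [AllowEmptyCollection()]
        [object[]] $Scans,
        [string[]] $RequiredWorkloads = @('SharePoint', 'OneDrive', 'Exchange')
    )

    # Start with every required workload marked as not covered
    $covered = @{}
    foreach ($workload in $RequiredWorkloads) { $covered[$workload] = $false }

    foreach ($scan in $Scans) {
        # Workload may be a single string or a collection
        $text = @($scan.Workload) -join ','
        foreach ($workload in $RequiredWorkloads) {
            if ($text -match [regex]::Escape($workload)) { $covered[$workload] = $true }
        }
    }

    $missing = @($RequiredWorkloads | Where-Object { -not $covered[$_] })
    [PSCustomObject]@{
        Passed  = ($Scans.Count -ge 1) -and ($missing.Count -eq 0)
        Missing = $missing
    }
}
```

Returning the missing workloads, not just a boolean, would let the report state which location type still lacks a scan.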